classes – CSS-Tricks
https://css-tricks.com

How to Cycle Through Classes on an HTML Element
https://css-tricks.com/cycle-through-classes-html-element/
Wed, 26 Jan 2022 19:48:49 +0000

Say you have three HTML classes, and a DOM element should only have one of them at a time:

<div class="state-1"></div>
<div class="state-2"></div>
<div class="state-3"></div>

Now your job is to rotate them. That is, cycle through classes on an HTML element. When some event occurs, if the element has state-1 on it, remove state-1 and add state-2. If it has state-2 on it, remove that and add state-3. On the last state, remove it, and cycle back to state-1.

Example of how to Cycle Through Classes on an HTML Element. Here a large <button> with an <svg> inside cycles through state-1, state-2, and state-3 classes, turning from red to yellow to green.

It’s notable that we’re talking about 3+ classes here. The DOM has a .classList.toggle() function, even one that takes a conditional as a second parameter, but that’s primarily useful in a two-class on/off situation, not cycling through classes.

Why? There are a number of reasons. Changing a class name gives you lots of power to re-style things in the DOM, and state management like that is a cornerstone of modern web development. But to be specific, in my case, I wanted to do FLIP animations where I’d change a layout and trigger a tween animation between the different states.

Careful about existing classes! I saw some ideas that overwrote .className, which isn’t friendly toward other classes that might be on the DOM element. All these are “safe” choices for cycling through classes in that way.

Because this is programming, there are lots of ways to get this done. Let’s cover a bunch of them — for fun. I tweeted about this issue, so many of these solutions are from people who chimed into that discussion.

A verbose if/else statement to cycle through classes

This is what I did at first to cycle through classes. That’s how my brain works. Just write out very specific instructions for exactly what you want to happen:

if (el.classList.contains("state-1")) {
  el.classList.remove("state-1");
  el.classList.add("state-2");
} else if (el.classList.contains("state-2")) {
  el.classList.remove("state-2");
  el.classList.add("state-3");
} else {
  el.classList.remove("state-3");
  el.classList.add("state-1");
}

I don’t mind the verbosity here, because to me it’s super clear what’s going on and will be easy to return to this code and “reason about it,” as they say. You could consider the verbosity a problem — surely there is a way to cycle through classes with less code. But a bigger issue is that it isn’t very extensible. There is no semblance of configuration (e.g. change the names of the classes easily) or simple way to add classes to the party, or remove them.

We could use constants, at least:

const STATE_1 = "state-1";
const STATE_2 = "state-2";
const STATE_3 = "state-3";

if (el.classList.contains(STATE_1)) {
  el.classList.remove(STATE_1);
  el.classList.add(STATE_2);
} else if (el.classList.contains(STATE_2)) {
  el.classList.remove(STATE_2);
  el.classList.add(STATE_3);
} else {
  el.classList.remove(STATE_3);
  el.classList.add(STATE_1);
}

But that’s not wildly different or better.

RegEx off the old class, increment state, then re-add

This one comes from Tab Atkins. Since we know the format of the class, state-N, we can look for that, pluck off the number, use a little ternary to increment it (but not higher than the highest state), then add/remove the classes as a way of cycling through them:

const oldN = +/\bstate-(\d+)\b/.exec(el.getAttribute('class'))[1];
const newN = oldN >= 3 ? 1 : oldN+1;
el.classList.remove(`state-${oldN}`);
el.classList.add(`state-${newN}`);

Find the index of the class, then remove/add

A bunch of techniques to cycle through classes center around setting up an array of classes up front. This acts as configuration for cycling through classes, which I think is a smart way to do it. Once you have that, you can find the relevant classes for adding and removing them. This one is from Christopher Kirk-Nielsen:

const classes = ["state-1", "state-2", "state-3"];
const activeIndex = classes.findIndex((c) => el.classList.contains(c));
const nextIndex = (activeIndex + 1) % classes.length;

el.classList.remove(classes[activeIndex]);
el.classList.add(classes[nextIndex]);

Christopher had a nice idea for making the add/remove technique shorter as well. Turns out it’s the same…

el.classList.remove(classes[activeIndex]);
el.classList.add(classes[nextIndex]);

// Does the same thing.
el.classList.replace(classes[activeIndex], classes[nextIndex]);

Mayank had a similar idea for cycling through classes by finding the class in an array, only rather than using classList.contains(), you check the classes currently on the DOM element against what is in the array.

const states = ["state-1", "state-2", "state-3"];
const current = [...el.classList].find(cls => states.includes(cls));
const next = states[(states.indexOf(current) + 1) % states.length];
el.classList.remove(current);
el.classList.add(next);

Variations of this were the most common idea. Here’s Jhey’s and here’s Mike Wagz which sets up functions for moving forward and backward.

Cascading replace statements

Speaking of that replace API, Chris Calo had a clever idea where you chain them with the or operator and rely on the fact that it returns true/false if it works or doesn’t. So you do all three and one of them will work!

 el.classList.replace("state-1", "state-2") ||
 el.classList.replace("state-2", "state-3") ||
 el.classList.replace("state-3", "state-1");

Nicolò Ribaudo came to the same conclusion.
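
If you wanted that replace chain to work for any number of classes, here is one way it might look. This is only a sketch: cycleWithReplace is a made-up name, and it leans on the same fact as above, namely that classList.replace() returns true when it swapped something and false when it didn't.

```javascript
// Hypothetical helper: generalizes the chained replace() calls to any list.
// `el` is any element, `classes` is the ordered list of state classes.
function cycleWithReplace(el, classes) {
  // some() stops at the first callback that returns true, just like ||,
  // so only one swap ever happens per call.
  return classes.some((cls, i) =>
    el.classList.replace(cls, classes[(i + 1) % classes.length])
  );
}

// In a browser:
// cycleWithReplace(el, ["state-1", "state-2", "state-3"]);
```

The return value tells you whether the element was in any of the states at all, which the plain chain also does but is easy to overlook.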

Just cycle through class numbers

If you set up a counter upfront, you could cycle through classes 1-3 and add/remove them based on that. This is from Timothy Leverett who lists another similar option in the same tweet.

// Assumes a `let s = 0` upfront (zero-based, so the element starts on state-1)
el.classList.remove(`state-${s + 1}`);
s = (s + 1) % 3;
el.classList.add(`state-${s + 1}`);

Use data-* attributes instead

Data attributes have the same specificity power, so I have no issue with this. They might actually be more clear in terms of state handling, but even better, they have a special API that makes them nice to manipulate. Munawwar Firoz had an idea that gets this down to a one-liner:

el.dataset.state = (+el.dataset.state % 3) + 1

A data attribute state machine

You can count on David Khourshid to be ready with a state machine:

const simpleMachine = {
  "1": "2",
  "2": "3",
  "3": "1"
};
el.dataset.state = simpleMachine[el.dataset.state];

You’ll almost surely want a function

Give yourself a little abstraction, right? Many of the ideas were written as functions, but so far I’ve moved that part out to focus on the idea itself. Here, I’ll leave the function in. This one is from Andrea Giammarchi in which a unique function for cycling through classes is set up ahead of time, then you call it as needed:

const rotator = (classes) => ({ classList }) => {
  const current = classes.findIndex((cls) => classList.contains(cls));
  classList.remove(...classes);
  classList.add(classes[(current + 1) % classes.length]);
};

const rotate = rotator(["state-1", "state-2", "state-3"]);
rotate(el);

I heard from Kyle Simpson who had this same idea, almost character for character.

Others?

There were more ideas in the replies to my original tweet, but they are, as best I can tell, variations on what I’ve already shared above. Apologies if I missed yours! Feel free to share your idea again in the comments here. I see nobody used a switch statement, so that could be a possibility!
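
For what it's worth, a switch version might look something like this. Nobody in the thread posted it, so treat it as my own sketch: nextState and cycleWithSwitch are made-up names, and falling back to state-1 when no state class is present is an assumption.

```javascript
// Pure lookup: given the current state class, return the next one.
function nextState(current) {
  switch (current) {
    case "state-1":
      return "state-2";
    case "state-2":
      return "state-3";
    case "state-3":
      return "state-1";
    default:
      // Assumption: an unknown or missing state restarts the cycle.
      return "state-1";
  }
}

// Hypothetical wrapper that applies the lookup to an element.
function cycleWithSwitch(el) {
  const states = ["state-1", "state-2", "state-3"];
  const current = states.find((cls) => el.classList.contains(cls));
  if (current) el.classList.remove(current);
  el.classList.add(nextState(current));
}
```

It has the same extensibility problem as the if/else version, since every new state means touching the switch.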

David DeSandro went as far as recording a video, which is wonderful as it slowly abstracts the concepts further and further until it’s succinct but still readable and much more flexible:

State variable and cycling through an array: https://www.youtube.com/embed/hXrHZ_LSzkk

And here’s a demo Pen with all the code for each example in there. They are numbered, so to test out another one, comment out the one that is uncommented, and uncomment another example:


How to Cycle Through Classes on an HTML Element originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.

Could Grouping HTML Classes Make Them More Readable?
https://css-tricks.com/could-grouping-html-classes-make-them-more-readable/
Mon, 22 Apr 2019 19:45:46 +0000

You can have multiple classes on an HTML element:

<div class="module p-2"></div>

Nothing incorrect or invalid there at all. It has two classes. In CSS, both of these will apply:

.module { }
.p-2 { }

And in JavaScript, classList can tell you whether a given class is on the element:

const div = document.querySelector("div");
console.log(div.classList.contains("module")); // true
console.log(div.classList.contains("p-3"));    // false

But what about grouping them? All we have here is a space-separated string. Maybe that’s fine. But maybe we can make things more clear!

Years ago, Harry Roberts talked about grouping them. He wrapped groups of classes in square brackets:

<div class="[ foo  foo--bar ]  [ baz  baz--foo ]">

The example class names above are totally abstract just to demonstrate the grouping. Imagine they are like primary names and variations as one group, then utility classes as another group:

<header class="[ site-header site-header-large ]  [ mb-10 p-15 ]">

Those square brackets? Meaningless. Those are there to visually represent the groups to us developers. Technically, they are also classes, so if some sadist wrote .\[ {} (the bracket has to be escaped in a selector), it would do stuff in your CSS. But that’s so unlikely that, hopefully, the clarity from the groups outweighs it and is more helpful.
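
To see that the brackets really are just more class names, and to get the groups back out in script, something like this could work. It's a sketch: parseClassGroups is a made-up helper, and it assumes the brackets are always separated from the class names by whitespace, as in the examples here.

```javascript
// Hypothetical helper: turn a bracket-grouped class attribute into arrays.
// Assumes "[" and "]" are standalone, whitespace-separated tokens.
function parseClassGroups(classAttr) {
  const groups = [];
  let current = null; // the group being filled, or null when outside brackets
  for (const token of classAttr.split(/\s+/).filter(Boolean)) {
    if (token === "[") {
      current = [];
      groups.push(current);
    } else if (token === "]") {
      current = null;
    } else if (current) {
      current.push(token);
    } else {
      groups.push([token]); // an ungrouped class stands on its own
    }
  }
  return groups;
}

// The brackets are ordinary tokens in the class list:
const tokens = "[ foo  foo--bar ]".split(/\s+/); // ["[", "foo", "foo--bar", "]"]
```

In a browser you would pass it el.getAttribute("class"), since classList de-duplicates tokens and would fold the repeated brackets into one.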

That example groups the primary name and a variation in one group and some example utility classes in another. I’m not necessarily recommending that particular split; they are simply groups of classes that you might have.

Here’s the same style of grouping, with different groups:

<button class="[ link-button ] [ font-base text-xs color-primary ] [ js-trigger ]" type="button" hidden>

That example has a single primary name, utility classes with different naming styles, and a third group for JavaScript specific selectors.

Harry wound up shunning this approach a few years ago, saying that the look of it was just too weird for the variety of people and teams he worked with. It caused enough confusion that the benefits of grouped classes weren’t worth it. He suggested line breaks instead:

<div class="media  media--large
            testimonial  testimonial--main"> 

That seems similarly clear to me. The line breaks in HTML are totally fine. Plus, the browser will have no trouble with that and JSX is generally written with lots of line breaks in HTML anyway because of how much extra stuff is plopped onto elements in there, like event handlers and props.

Perhaps we combine the ideas of line breaks as separators and identified groups… with emojis!

See the Pen Grouping Classes by Chris Coyier (@chriscoyier) on CodePen.

Weird, but fun. Emojis are totally valid there. Like the square brackets, they could also do things if someone wrote a class name for them, but that’s generally unlikely and something for a team to talk about.

Another thing I’ve seen used is data-* attributes for groups instead of classes, like…

<div 
  class="primary-name"
  data-js="js-hook-1 js-hook-2"
  data-utilities="padding-large"
>

You can still select and style based on attributes in both CSS and JavaScript, so it’s functional, though slightly less convenient because of the awkward selectors like [data-js~="js-hook-1"] (the ~= is needed to match one value in a space-separated list) and lack of convenient APIs like classList.
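
If you did go the data attribute route, you could paper over the missing classList-style API with a couple of small helpers. This is a rough sketch, and hasToken and toggleToken are names I made up. In CSS, the matching selector for one value in a space-separated list is [data-js~="js-hook-1"].

```javascript
// Split a space-separated attribute value into its tokens.
function tokensOf(value) {
  return (value || "").split(/\s+/).filter(Boolean);
}

// Hypothetical stand-in for classList.contains() on a data-* value.
function hasToken(value, token) {
  return tokensOf(value).includes(token);
}

// Hypothetical stand-in for classList.toggle(): returns the new attribute value.
function toggleToken(value, token) {
  const tokens = tokensOf(value);
  const next = tokens.includes(token)
    ? tokens.filter((t) => t !== token)
    : [...tokens, token];
  return next.join(" ");
}

// In a browser:
// el.dataset.js = toggleToken(el.dataset.js, "js-hook-2");
```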

How about you? Do you have any other clever ideas for class name groups?


Could Grouping HTML Classes Make Them More Readable? originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.

Random Interesting Facts on HTML/SVG usage
https://css-tricks.com/random-interesting-facts-htmlsvg-usage/
Fri, 25 Nov 2016 14:28:35 +0000

Last time, we saw what the average web page looks like using data from about 8 million websites. That’s a lot of data, and we’ve been continuing to sift through it. We’re back again this time to showcase some random and hopefully interesting facts on markup usage.

Hiding DOM elements

There are various ways of hiding DOM elements: completely, semantically, or visually.

Considering the current practices and recommendations, check out the findings on the most used methods to hide things via HTML or CSS:

Selector Count
[aria-hidden] 2,609,973
.hidden 1,556,017
.hide 1,389,540
.sr-only 583,126
.visually-hidden 136,635
.visuallyhidden 116,269
.invisible 113,473
[hidden] 31,290

no-js HTML class

When JavaScript libraries like Modernizr run, the no-js class is removed and it’s replaced with js. This way you can apply different CSS rules depending on whether JavaScript is enabled or not in your browser.
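
The swap itself is tiny. Here is a rough sketch of the idea, not Modernizr's actual code: swapNoJs is a made-up function that works on the class attribute as a string, so that only the exact no-js token is touched.

```javascript
// Hypothetical helper: replace the exact "no-js" token with "js".
function swapNoJs(classAttr) {
  return classAttr
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => (token === "no-js" ? "js" : token))
    .join(" ");
}

// In a browser, the same swap on the root element could be written as:
// document.documentElement.classList.replace("no-js", "js");
```

Since the script only runs when JavaScript is enabled, CSS rules under .no-js keep applying for everyone else.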

We found a total number of 844,242 elements whose HTML class list contains the no-js string. More than 92% of them are html elements.

If you’re wondering about the remaining 8%, check out the top 10:

Element Count
html 782,613
body 31,256
a 17,833
div 7,364
meta 1,104
ul 905
li 789
nav 768
span 431
article 263

noscript

The HTML noscript element defines a section of markup that acts as alternate content for users who have client-side scripting disabled, or whose browser lacks support for it. The client-side scripting language is usually JavaScript.

We found 3,536,247 noscript elements within the 8 million top twenty Google results.

AMP

Accelerated Mobile Pages (AMP) is a Google initiative which aims to speed up the mobile web. Most publishers are making their content available in parallel in the AMP format.

To let Google and other platforms know about it, you need to link the AMP and non-AMP pages together.

Within the 8 million pages we looked at, we found only 1,944 non-AMP pages referencing their AMP version using rel=amphtml.

Links attributes & values

href="javascript:void(0)"

We found 2,002,716 a elements with href="javascript:void(0)". Whether you’re coding a button or coding a link, you’re doing it wrong.

href=”javascript:void(0)”
(a) You’re coding a button with the wrong element
(b) You’re coding a link with the wrong technology
Heydon Pickering

target=_blank w/ or w/o rel=noopener

43,924,869 of the anchors we analyzed use target="_blank", but only 35,604 of them pair it with rel="noopener". If rel="noopener" is missing, you are leaving your users open to a phishing attack, which is considered a security vulnerability.

Anchor/Link Count
[target=_blank] 43,924,869
[rel=noopener] 40,756
[target=_blank][rel=noopener] 35,604

MDN:

When using target you should consider adding rel=”noopener noreferrer” to avoid exploitation of the window.opener API.

Ben Halpern and Mathias Bynens also wrote some good articles on this matter and the common advice is: don’t use target=_blank, unless you have good reasons.
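
If you do need target=_blank, patching up the rel value is straightforward. A small sketch, where withNoopener is a made-up helper that keeps whatever rel tokens are already there and adds the two from the MDN advice:

```javascript
// Hypothetical helper: return a rel value that includes noopener and noreferrer.
function withNoopener(rel) {
  // A Set keeps existing tokens in order and avoids duplicates.
  const tokens = new Set((rel || "").split(/\s+/).filter(Boolean));
  tokens.add("noopener");
  tokens.add("noreferrer");
  return [...tokens].join(" ");
}

// In a browser:
// document.querySelectorAll('a[target="_blank"]').forEach((a) => {
//   a.rel = withNoopener(a.rel);
// });
```

Fixing it in the markup is better than fixing it at runtime, but a script like this can be handy for auditing.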

href=#top

It seems to be a common practice to use #top as an href value to send the user to the top of the current page. We found 377,486 a elements with href=#top values.

lang

Léonie Watson:

The HTML lang attribute is used to identify the language of text content on the web. This information helps search engines return language specific results, and it is also used by screen readers that switch language profiles to provide the correct accent and pronunciation.

Of the 8,021,323 pages that we were able to look into, 5,368,133 use the lang attribute on the html element. That’s about 67%!

div

The average web page has around 72 divs. This number was computed by dividing the count of all the div elements encountered (576,067,185) by the 8,021,323 pages analyzed, which gives roughly 71.8.

header vs footer

2,358,071 pages use the header element while the footer is used by 2,363,665 pages. Also, we found that only 2,117,448 pages are using both header and footer.

Element Count
footer 2,363,665
header 2,358,071

Links are not buttons

Neither are divs and spans.

Element Attribute & Value Count
a class=btn 3,251,114
a class=button 2,776,660
span class=button 292,168
div class=button 278,996
span class=btn 202,054
div class=btn 131,950

For comparison, here are the native button statistics:

Selector Count
button 4,237,743
input[type=image] 1,030,802
input[type=button] 916,268

Buttons without a specified type

Speaking of buttons, the button element has a default type of submit, so make sure you always specify the button type. We found around 1,336,990 button elements with a missing type attribute. That’s around 31.5% of all the buttons found in the wild.

BEM syntax

If you’re a CSS addict, you may have heard about BEM, which is a popular naming convention for HTML classes.

Knowing that the BEM naming style consists of strings containing double underscores and/or double dashes, we were able to guess that only 20,463 elements use it.
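
A guess like that can be as simple as a regular expression. The sketch below is an assumption about how such a check might look, not the code used for these stats, and looksLikeBem and hasBemClass are made-up names.

```javascript
// Heuristic: a class name "looks like BEM" if it contains __ or --.
function looksLikeBem(className) {
  return /__|--/.test(className);
}

// True if any class in a class attribute matches the heuristic.
function hasBemClass(classAttr) {
  return classAttr.split(/\s+/).filter(Boolean).some(looksLikeBem);
}
```

It is only a guess, as the text says: a class like sqs-svg-icon--social passes the check whether or not the site actually follows BEM.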

Bootstrap & Font Awesome

Apparently, we found only 1,711 pages that link to CSS or JavaScript resources whose names contain bootstrap[.min.].js|.css. Also, it looks like 379 pages link to CSS resources whose names contain font-awesome[.min.].css.

I would have expected more.

WordPress

1,866,241 pages, from the total that we analyzed, contain <meta name="generator" content="*WordPress*">. We can only assume there are more that use WordPress, but some chose to remove this meta info from their sources.

.clearfix VS .clear VS .cf

There are many naming styles for this well-known CSS utility that helps clear floats. Here’s the breakdown:

Selector Count
.clearfix 19,573,521
.clear 10,925,887
.cf 1,102,698

Favicon

Modern browsers fetch /favicon.ico automatically and asynchronously. So don’t manually specify its root location; just place it there, unless for some reason you prefer a different location for it.

It looks like 354,024 publishers still link the /favicon.ico in the head.

Void elements

To close or not to close the void elements, that is the question. Although HTML is fine with it either way, it is recommended not to close the void elements, at least for the sake of brevity.

Element Count
<img/> 121,463,561
<br/> 67,595,285
<link/> 61,746,984
<meta/> 46,688,572
<br> 34,492,680
<input/> 27,845,389
<img> 17,844,667
<meta> 15,133,457
<link> 11,740,839
<input> 7,231,827
<hr/> 2,610,890
<hr> 1,690,891
<param/> 1,390,339
<area/> 1,336,974
<area> 1,025,183
<param> 698,611
<source/> 435,877
<base/> 389,717
<embed/> 304,954
<source> 286,380
<wbr> 237,606
<col/> 151,757
<col> 145,434
<base> 105,688
<wbr/> 77,922
<embed> 56,610
<track/> 376
<track> 310
<keygen/> 1
<keygen>

tabindex

On hijacking the tab order: using tabindex to patch up some disconnected UI elements usually only pushes the issue up to the document level.

The common advice is to use it with caution. We did notice that 552,109 HTML elements are using the tabindex attribute to override the defaults when navigating with a keyboard.

Missing alt for images

This eternal SEO and accessibility issue still seems to be pretty common, judging by this set of data. Of the 139,308,228 images in total, almost half are missing the alt attribute or use it with a blank value.

Element Count
img 139,308,228
img alt="*" 73,430,818
img alt="" 32,603,650
img w/ missing alt 33,273,760

Custom elements

Excluding the Web Component tags, here is a list of made up tags or custom elements, different to MDN’s HTML element reference.

Element Count
<o:p> 808,253
<g:plusone> 273,166
<fb:like> 111,806
<asp:label> 76,501
<inline> 53,026
<noindex> 51,604
<icon> 42,703
<block> 34,167
<red> 33,424
<ss> 27,451

We did find 21,403 h7 elements too.

A11Y

First rule of ARIA use:

If you can use a native HTML element [HTML51] or attribute with the semantics and behaviour you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.

Landmark roles

ARIA Landmark Roles help users using assistive technology devices to navigate your site.

You might have seen this warning message when validating a document: “The banner role is unnecessary for element header”. The warning appears because header already has an implicit banner role. Browsers like iOS Safari do not currently support these implicit mappings, though, so for now it’s a good practice to keep adding these landmark roles and ignore the HTML validation warnings.

Regarding the HTML5 implicit mappings, here are the stats:

Element Count
<nav role=navigation> 1,144,750
<header role=banner> 675,970
<footer role=contentinfo> 613,454
<main role=main> 236,484
<article role=article> 129,845
<aside role=complementary> 105,627
<section role=region> 4,326

autoplay

Video and audio autoplay is considered a bad practice, not only for accessibility, but also for usability.

So, don’t auto-play and it will please all of your users.

Check out below the findings from the total of 108,321 video and 64,212 audio elements.

Element Count
<video autoplay> 31,653
<video autoplay=true> 5,601
<audio autoplay> 2,595
<audio autoplay=true> 339
<video autoplay=false> 79
<audio autoplay=false> 22

maximum-scale

maximum-scale defines the maximum zoom level, and when it is set to maximum-scale=1, users can’t zoom your page. You shouldn’t do that, since zooming is an important accessibility feature that a lot of people rely on because it provides a better experience by meeting their needs.

Warning from HTML 5.2 Editor’s Draft, 4 October 2016:

Authors should not suppress or limit the ability of users to resize a document, as this causes accessibility and usability issues.

However, we did find 1,047,294 websites using maximum-scale=1 and 87,169 websites with a user-scalable=no value set. At the same time, 326,658 pages are using both maximum-scale=1 and user-scalable=no.
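
Checking your own pages for this is easy enough. Here's a sketch that parses the content value of a viewport meta tag; parseViewport and blocksZoom are made-up names, the parsing is simplified to comma-separated key=value pairs, and treating any maximum-scale of 1 or lower as blocking zoom is my assumption.

```javascript
// Parse "width=device-width, initial-scale=1" into { width: "device-width", ... }.
function parseViewport(content) {
  const settings = {};
  for (const pair of content.split(",")) {
    const [key, value] = pair.split("=").map((part) => part.trim().toLowerCase());
    if (key) settings[key] = value;
  }
  return settings;
}

// Hypothetical check: does this viewport value stop users from zooming?
function blocksZoom(content) {
  const viewport = parseViewport(content);
  return (
    viewport["user-scalable"] === "no" ||
    parseFloat(viewport["maximum-scale"]) <= 1 // NaN when absent, so false
  );
}

// In a browser:
// const meta = document.querySelector('meta[name="viewport"]');
// if (meta && blocksZoom(meta.content)) console.warn("Zoom is restricted");
```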

role=button

Setting role=button on a button is allowed but not recommended, as the button already has role=button as its default implicit ARIA semantics. Still, we did find 26,360 button elements with role=button set.

Here’s a breakdown on other notable elements, whose behavior was overridden by role=button:

Element Count
<a role=button> 577,905
<div role=button> 85,565
<span role=button> 21,466
<input role=button> 8,286

On making clickable things correctly, MDN sums it up:

Be careful when marking up links with the button role. Buttons are expected to be triggered using the Space key, while links are expected to be triggered using the Enter key. In other words, when links are used to behave like buttons, adding role=”button” alone is not sufficient. It will also be necessary to add a key event handler that listens for the Space key in order to be consistent with native buttons.
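
That Space key handler could be sketched like this. The function names are made up, and the markup in the usage comment is an assumption; the point is only that Space has to be handled by hand on a link, while Enter already works.

```javascript
// Returns true when the key press is the Space key.
function isSpaceKey(event) {
  // "Spacebar" is the legacy value some older browsers report.
  return event.key === " " || event.key === "Spacebar";
}

// Hypothetical keydown handler for a link that has role="button".
function onButtonLinkKeydown(event, activate) {
  if (!isSpaceKey(event)) return false;
  event.preventDefault(); // keep Space from scrolling the page
  activate(event);
  return true;
}

// In a browser (assumed markup: <a role="button" href="#">...</a>):
// link.addEventListener("keydown", (e) => onButtonLinkKeydown(e, () => link.click()));
```

Of course, using a real button element makes all of this unnecessary.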

SVG

There are several ways of including SVG in HTML. Summing them up, we found a total of 5,610,764 SVG references.

How to use SVG %
Inline SVG code within HTML 97.05%
Using SVG as an <img> 2.88%
Using SVG as an <object> 0.05%
Using SVG as an <embed> 0.02%
Using SVG as an <iframe>

The object, iframe and embed methods usage is under 1%.

data-*=svg

There are 17,920 elements whose data-* attribute value contains the string svg. Most of the elements are <svg> or <img>.

Top 5 data-* values:

  1. http://www.w3.org/2000/svg – 471
  2. hg-svg – 127
  3. svg-siteline-facebook – 114
  4. icon-facebook.svg – 95
  5. twitter.svg – 95

id*=svg

There are 141,813 elements whose id attribute value contains the string “svg”. Most of the elements are <svg> or its inner elements.

Top 5 id values:

  1. emotion-header-title-svg – 16,281
  2. cicLeftBtnSVG – 5,793
  3. cicPauseBtnSVG – 5,793
  4. cicPlayBtnSVG – 5,793
  5. cicRightBtnSVG – 5,793

class*=svg

There are 329,004 elements whose class attribute value contains the string “svg”. Most of the elements are <svg>, <i>, <img> or inner elements.

Top 10 class values:

  1. sqs-svg-icon--social – 58,501
  2. nav_svg – 29,826
  3. svg – 28,604
  4. mk-svg-icon – 24,193
  5. svg-icon – 12,255
  6. icon_svg – 7,956
  7. ico ico-svg ico-40-svg – 3,980
  8. svg temp_no_id temp_no_img_src – 3,794
  9. svgIcon-use – 3,127
  10. svg temp_no_img_src – 3,017

Regarding the list above, maybe it’s worth mentioning that sqs-svg-icon--social follows a (BEM-like) naming convention used by Squarespace website templates.

currentColor

There are 868,194 SVG inner elements that contain the value currentColor, mainly for the fill or stroke attributes.

Top 10 SVG elements

  1. <symbol> – 845,626
  2. <path> – 12,834
  3. <g> – 6,073
  4. <path> – 3,207
  5. <circle> – 1,885
  6. <svg> – 1,061
  7. <polygon> – 575
  8. <rect> – 480
  9. <line> – 412
  10. <use> – 206

SVG as background-image (The journey!)

To figure out if an element used SVG for a background-image, things were more complicated. Most of our data only used the HTML documents, but we worked out a solution to get the active stylesheets.

From the total of 6,359,031 domains we were able to gather data from, 84.5% (5,371,806) are using HTML elements with CSS background images, whilst only 1.3% (82,008) are using at least one SVG background image.

Also, from the total of 92,388,615 HTML elements with CSS background images, 0.5% (439,447) are using an SVG background image.

The process

We went through all of the HTML files and transformed local/relative CSS file references into absolute ones, e.g. <link rel="stylesheet" href="style.css"> became <link rel="stylesheet" href="http://www.domain.com/style.css">.
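
We did that step in our own tooling, but the transformation itself is easy to illustrate in JavaScript with the standard URL constructor. A minimal sketch, where toAbsolute is a made-up name and the page URL is assumed to be known for each document:

```javascript
// Resolve a stylesheet href against the URL of the page that references it.
function toAbsolute(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Relative, parent-relative and protocol-relative references all resolve:
const plain = toAbsolute("style.css", "http://www.domain.com/");
const parent = toAbsolute("../css/a.css", "http://www.domain.com/blog/post/");
const cdn = toAbsolute("//cdn.example.com/x.css", "https://www.domain.com/");
```

References that are already absolute pass through unchanged, so the function can be applied to every link without checking first.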

This took some time, since we sampled a couple of the results from our first runs, found inconsistencies and had to restart the process. With a zipped file size of 65GB (323GB unzipped), it was no surprise that processing needed a couple of days to produce the above set of results.

Trying and aborting PhantomJS

Since background images can be applied via CSS, we needed something to render the DOM and apply styles to it. We thought of a tool we were very familiar with: PhantomJS. We ran a couple of tests with actual pages and saw that everything seemed to work properly. We then built our Java client to interface with the PhantomJS webserver: starting, opening pages, extracting output, handling responses, saving results and then cleaning up, but ran into disastrous performance results when trying to use and scale the rendering process on even one machine.

Rendering one HTML file would take anything from a couple of seconds to a couple of minutes and we had no way of knowing what PhantomJS was doing. This, coupled with the fact that resource usage goes up exponentially the larger the DOM is, caused us to ditch it and look for alternatives.

Better luck with Selenium

As luck would have it, a colleague was experimenting with Selenium on top of headless Chrome. Since he had encouraging results in all areas where PhantomJS was lacking, we thought about leaving the Java-do-it-all comfort zone and delegating stuff to other tools if needed. The test results were very promising – headless Chrome looked like it suited our needs marvelously: super fast startup time, great rendering time, and full control over stopping a process.

The Selenium web driver would actually close the binary, as opposed to us sending an exit command to PhantomJS and hoping it wasn’t in 100% load so it would actually process it. This allowed us to control each process individually, without having to use killall every couple of minutes and stopping all processes in case just one of them went rogue and throttled the CPU.

The only problem with this approach was that the JavaScript could no longer be contained in a single, standalone JS file we’d pass onto the PhantomJS executable, but had to be included inline in the actual HTML files. Here’s a simplified version of the script we used, relying on the Window.getComputedStyle() method:

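// Collect the computed background-image value of every element on the page.
const backgroundImages = [];
const allElem = document.querySelectorAll("*");

for (let i = 0; i < allElem.length; i++) {
  // The second argument (a pseudo-element) is optional, so it is left out.
  const style = window.getComputedStyle(allElem[i]);
  // Strip the leading `url(` and the trailing `)`; any quotes remain.
  // An element with no background image (`none`) yields an empty string.
  const backgroundImage = style.backgroundImage.slice(4, -1);
  backgroundImages.push(backgroundImage);
}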
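
Once you have those values, deciding whether one of them is an SVG is a matter of string matching. The following is a sketch of how that check might look, not the code we ran: extractUrls and isSvgUrl are made-up names, and "ends in .svg or is an SVG data URI" is an assumed definition of an SVG background.

```javascript
// Pull every url(...) out of a computed background-image value.
function extractUrls(backgroundImage) {
  const urls = [];
  // Matches url(x), url('x') and url("x"); the quote, if any, must be the same on both sides.
  const pattern = /url\((['"]?)(.*?)\1\)/g;
  let match;
  while ((match = pattern.exec(backgroundImage)) !== null) {
    urls.push(match[2]);
  }
  return urls;
}

// Assumed definition: an SVG data URI, or a path ending in .svg
// (optionally followed by a query string or fragment).
function isSvgUrl(url) {
  return url.startsWith("data:image/svg+xml") || /\.svg(?:[?#]|$)/i.test(url);
}

function hasSvgBackground(backgroundImage) {
  return extractUrls(backgroundImage).some(isSvgUrl);
}
```

Unlike the slice(4, -1) approach, this also copes with elements that have several background images.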

Saving data would be done by calling a simple PHP script. We ran a couple of larger-scale tests to validate our choice and everything performed flawlessly, so we went on with setting up a scalable environment.

We processed all HTML files (again) and injected the above JavaScript snippet. The next challenge was uploading everything to Amazon. S3Browser, which we use for “casual” listing and downloading/uploading, didn’t seem fast enough for this job (not the free version, at least). So, we looked for an alternative and came across s3-parallel-put.

We set it up on a local Linux machine, moved over the SSD and had 65GB worth of zipped text data uploaded in no time. It crippled our machine and the local Jenkins server that was running on it – until we upgraded the old Q9550 CPU :).

The problems showed up when starting to scale up. We saw that our single web server would become overwhelmed and stop saving results, even though the Selenium driver was reporting the page had rendered successfully. This also meant many of our queue messages would be wasted (consumed and deleted from the queue), without producing any results.

We thus decided to keep closer track of processed/unprocessed files by using Redis: each time we’d start processing a file, we’d insert the domain name into a Redis set. Each time we’d finish processing a file (that is, our PHP script was called), we’d insert the domain name into another Redis set. The point was to keep the difference between the two to a minimum (anything over a certain value would mean something wasn’t working properly) and to make retrying easy if it was ever going to be needed.
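
The bookkeeping idea is simple enough to sketch with two plain JavaScript Sets standing in for the two Redis sets. Everything here is illustrative: createTracker and its method names are made up, and in the real setup the same thing would be done with Redis set commands such as SADD and SDIFF.

```javascript
// In-memory stand-in: two Sets play the role of the two Redis sets.
function createTracker() {
  const started = new Set(); // domains whose file we began processing
  const finished = new Set(); // domains whose result was actually saved

  // Domains that were started but never reported back.
  const pending = () => [...started].filter((domain) => !finished.has(domain));

  return {
    markStarted: (domain) => started.add(domain),
    markFinished: (domain) => finished.add(domain),
    pending,
    // Too large a gap means results are being lost somewhere.
    isHealthy: (maxPending) => pending().length <= maxPending,
  };
}
```

The pending list doubles as the retry list, which is what makes reprocessing cheap.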

Hardware setup

For our hardware setup, we started by running 10 threads * 1 Chrome instance each on 10 Amazon c4.large machines, served by one Apache webserver running on an m3.medium, which initially did a very lousy job. After toying with Apache’s settings, we scaled everything up gradually and got to 40 c4.large machines being served by Apache webservers running on 4 m3.medium machines behind a load balancer. Our Redis instance was serving all 10 threads * 40 machines * 3-4 requests per 5-20 seconds off an r3.large machine. So, that’s about 60-320 requests per second.
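
In case the 60-320 range looks like it came out of nowhere, here is the arithmetic spelled out. The function name is made up; the numbers are the ones from the paragraph above.

```javascript
// Requests per second = (threads * requests per file) / seconds per file.
function requestsPerSecond(threads, requestsPerFile, secondsPerFile) {
  return (threads * requestsPerFile) / secondsPerFile;
}

const threads = 10 * 40; // 10 threads on each of 40 machines

// Slowest case: 3 requests every 20 seconds per thread.
const low = requestsPerSecond(threads, 3, 20); // 60

// Fastest case: 4 requests every 5 seconds per thread.
const high = requestsPerSecond(threads, 4, 5); // 320
```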

On costs, it’s pretty hard to give a total amount of money spent or CPU-time, since we ran into many issues before having a fully functional and stable ecosystem. Ideally, a single machine would need about 45 seconds for processing 100 files: downloading, unzipping, rendering and cleaning up.

Q&A / Follow up

Why so many tbody elements?

For the new data above, we performed another full scan of the 8 million documents and also fixed a parsing sanitization issue where the jsoup parser was automatically adding the tbody element to every table. This is the answer to the question asked by some of you in the comments: “Why so many tbody elements?”.

As a consequence, the number of elements used on most pages is now 25, and the tbody numbers have dropped.

body at 99%?

A little refresher: according to the specifications, omitting the body is fine: Start tag: optional, End tag: optional.

So, one of the most surprising numbers, based on your comments, was the missing 1% of body elements. I guess I owe you an answer, so I went a bit further and ran the parser again to get some insights:

  • As some of you already guessed, most of the pages are missing the body tag due to the high frameset usage.
  • The client-side redirect method using meta http-equiv="refresh", followed by no body content, is another reason.
  • It’s pretty disappointing to see that, among the pages expected to rank high on Google, lots of them use the rough JavaScript window.location solution to redirect people to other domains. Again, these kinds of pages don’t include the body section at all.
  • Some of the pages marked with a missing body were completely broken, due to PHP errors for example. Some were omitting the starting body tag but not the end tag.
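The categories above can be sketched as a small classifier. To be clear, this is not the jsoup-based code used for the scan: it is a rough illustration built on Python’s standard html.parser module, and the class name, function name, and reason labels are all made up for the example. It only looks at start tags, so a page with a stray closing body tag still counts as missing its body.

```python
from html.parser import HTMLParser


class BodyAudit(HTMLParser):
    """Records which start tags appear and whether the page redirects."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.meta_refresh = False
        self.js_redirect = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        attributes = dict(attrs)
        # Attribute values can be None, hence the "or ''" guard.
        if tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            self.meta_refresh = True
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and "window.location" in data:
            self.js_redirect = True


def missing_body_reason(html):
    """Return None if a body start tag exists, else a rough reason label."""
    audit = BodyAudit()
    audit.feed(html)
    audit.close()
    if "body" in audit.tags:
        return None
    if "frameset" in audit.tags:
        return "frameset"
    if audit.meta_refresh:
        return "meta refresh"
    if audit.js_redirect:
        return "javascript redirect"
    return "other"  # e.g. pages broken by server-side errors


print(missing_body_reason('<html><frameset cols="50%,50%"><frame src="a.html"></frameset></html>'))
```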

Want more?

Have an element/attribute you’d want to see the numbers for? Give me a shout on Twitter or leave a comment below and we’ll figure out something!

Also, make sure you check out the full stats here.


Random Interesting Facts on HTML/SVG usage originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.

(To use or not-to-use) Classes https://css-tricks.com/use-not-use-classes/ Fri, 20 May 2016 12:35:56 +0000 http://css-tricks.com/?p=241910 Love me some blog debates! Papa Bear, Mama Bear, Baby Bear.

(To use or not-to-use) Classes originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.

Templates are easy to change. Content usually isn’t. https://css-tricks.com/class-up-templates-not-content/ https://css-tricks.com/class-up-templates-not-content/#comments Fri, 04 Mar 2016 15:58:10 +0000 http://css-tricks.com/?p=238670

There are two kinds of HTML:

  1. HTML that makes up templates
  2. HTML that is content

I feel like some discussions about HTML are clouded by not making this distinction.

For example, I like Harry Roberts’ approach to classes on header elements. Harry was talking about “apps”, so perhaps it was implied, but let’s put a point on it: those classes are for headers in HTML templates, not HTML content.

(This is just a little random example I thought of; this concept applies more broadly.)

WordPress is used here as an example, but the same goes for any system of content and templates.

If it’s a chunk of HTML that goes in a database, it’s content

It’s not impossible to change content, but it’s likely much harder and more dangerous.

Websites can last a long time. Content tends to grow and grow. For instance, on CSS-Tricks there are 2,260 Posts and 1,369 Pages. Over the years I’ve sprinkled in classes here and there to do certain stylistic things, and over time I always regret it.

Why the regret over classes in content?

Maybe you’ll find you named the class wrong and start hating it.
Maybe you’ll change the class to something you like better.
Maybe you’ll stop using that class.
Maybe you’ll forget that class even existed, and not address it in a redesign.
Maybe you’ll use that old name again, only it does something new now and messes up old content.

Those are just a few possibilities.

But the pain comes when you decide you’d like to “fix” old content. What do you do? Find all old content that uses those classes and clean them out? Try to run a database query to strip classes? Tedious or dangerous work.
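A safer first step than rewriting anything is to take inventory of which classes the content actually uses. Here’s a rough sketch of that, assuming you can get each post’s HTML out of the database as a string. The sample posts, class name, and function name are made up for the example; it only reads content and never changes it.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Counts every class name found on any element in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def class_inventory(posts):
    """Return a Counter of class names across all post bodies."""
    total = Counter()
    for html in posts:
        collector = ClassCollector()
        collector.feed(html)
        collector.close()
        total.update(collector.counts)
    return total


# Hypothetical post bodies standing in for rows pulled from a database.
posts = [
    '<p class="intro big">Hello</p>',
    '<img class="alignright" src="photo.png"><p class="intro">More</p>',
]

for name, count in class_inventory(posts).most_common():
    print(name, count)
```

With a list like that in hand, at least you know what you’re dealing with before deciding whether a cleanup is worth the risk.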

Content in Markdown Helps

Markdown is comfortable to write in, once you get the hang of it. But its best feature is that it doesn’t put, or even allow you to put, classes on any of the HTML it generates. Unless you write HTML directly in the Markdown, which always feels a little dirty (as it should).

Styling Content

How do you style the content then, without classes? Hopefully content is fairly consistently styled. You can scope styles to content.

.content h2 { }
.content figure { }
.content table { } 

Say you absolutely need different variations of things in content

If you have to break the “rule” of no classes in content, perhaps you can still apply some smarts to how you handle it.

  • If you’re using a CMS, perhaps it has some default classes that are likely to stick around. For example, WordPress spits out classes like alignright on images if you choose that alignment.
  • Can you insert the content in such a way it isn’t just a part of a “blob” of content? Like a custom field?
  • Can you insert the content via an abstraction? In WordPress parlance, like a shortcode? [table data="foo" style="bar"]
  • If you have to use a class, can you name it in a way that feels forever-proof-ish?
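To show why an abstraction like a shortcode helps, here’s a small sketch of the idea. It is not how WordPress implements shortcodes; the regular expressions, the `render_table` function, and the class names it outputs are all assumptions for illustration. The point is that the stored content only holds the shortcode, while the markup and classes live in one function you can change later.

```python
import re

# Matches things like [table data="foo" style="bar"]
SHORTCODE = re.compile(r'\[(\w+)((?:\s+\w+="[^"]*")*)\s*\]')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def render_table(attrs):
    # Hypothetical template: classes live here, not in the stored content.
    style = attrs.get("style", "default")
    source = attrs.get("data", "")
    return f'<table class="table table--{style}" data-source="{source}"></table>'


RENDERERS = {"table": render_table}


def expand_shortcodes(content):
    """Replace known shortcodes with rendered HTML; leave unknown ones alone."""

    def replace(match):
        name, raw_attrs = match.group(1), match.group(2)
        renderer = RENDERERS.get(name)
        if renderer is None:
            return match.group(0)
        return renderer(dict(ATTRIBUTE.findall(raw_attrs)))

    return SHORTCODE.sub(replace, content)


print(expand_shortcodes('<p>Intro</p>[table data="foo" style="bar"]'))
```

If the class naming ever changes, only `render_table` needs updating; every post using the shortcode picks up the new markup.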

Templates are easy to change

That’s what they are for! You probably have a handful of templates compared to how many pages they generate. Change them, and all content using them is updated. The HTML that surrounds the content will all have whatever useful classes you use.

So

  • HTML for templates = classes, yay!
  • HTML for content = keep it raw!

Templates are easy to change. Content usually isn’t. originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.
