HTML5 Semantics

Much of the excitement we’ve seen so far about HTML5 has been for the new APIs: local storage, application cache, Web workers, 2-D drawing and the like. But let’s not overlook that HTML5 brings us 30 new elements to mark up documents and applications, boosting the total number of elements available to us to over 100.

Sexy yet hollow demos [1] aside, even the most JavaScript-astic Web 2.0-alicious application will likely have textual content that needs to be marked up sensibly, so let’s look at some of the new elements to make sure that your next project is as semantic as it is interactive.

To keep this article from turning into a book, we won’t look at each in depth. Instead, this is a taster menu: you can see what’s available, and there are links that I’ve vetted for when you want to learn more.

Along the way, we’ll see that HTML5 semantics are carefully designed to extend the current capabilities of HTML, while always enabling users of older browsers to access the content. We’ll also see that semantic markup is not “nice to have,” but is rather a cornerstone of Web development, because it is what enhances accessibility, searchability, internationalization and interoperability.

A human language like English, with its vocabulary of a million words, can’t express every nuance of thought unambiguously, so with only 100 or so words that we can use in HTML, there will be situations when it’s not clear-cut which element to use for which piece of content. In that case, choose one; be consistent across the site.

Some Presentational Elements Are Gone

Purely presentational elements such as center, font and big are now obsolete. Their role has been perfectly usurped by Cascading Style Sheets. Now, this doesn’t mean you have to rush out and recode all of those ancient pages; HTML5 makes them obsolete for authors, but because HTML5 strives not to break the Web, browsers will still render those cobwebbed legacy pages.

For the same reason, presentational attributes have been removed from current elements; for example, align on img and table, background on body, and bgcolor on table.
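As a rough sketch of what moving from presentational markup to CSS can look like, the fragment below keeps a legacy snippet in a comment and shows one possible replacement. The class names notice and prices, and the colour and size values, are made up for illustration.

```html
<!-- Legacy, presentational markup (obsolete for authors in HTML5):
     <center><font color="red" size="5">Sale ends Friday</font></center>
     <table bgcolor="#eeeeee" align="center"> ... </table>
-->

<style>
  /* Hypothetical class names; the values are illustrative only */
  .notice { text-align: center; color: red; font-size: 1.5em; }
  .prices { background-color: #eeeeee; margin: 0 auto; }
</style>

<p class="notice">Sale ends Friday</p>
<table class="prices">
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
```

The markup now says only what the content is, and the style sheet carries everything about how it looks.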

The evil frame element is absent in HTML5. Frames caused usability and accessibility nasties. If you get the urge to use them, use an older DOCTYPE so that your pages still validate.

Beyond this short overview, see the W3C’s exhaustive list of removed elements and attributes [2].

Some Presentational Elements Have Been Redefined To Be Semantic

Not all presentational elements have been taken out and shot. Some have undergone an extensive re-education program and emerged with shiny new semantics. For example, the small element no longer means “use a small font,” although it will display that way in browser style sheets. Now it represents side comments, such as small print:

Small print typically features disclaimers, caveats, legal restrictions, or copyrights. Small print is also sometimes used for attribution, or for satisfying licensing requirements.

Some of the redefinitions feel to me to be a mop-up. While I can get behind <b> for drawing attention to product names, keywords and so forth, without any special emphasis implied, specifying the semantics for marking up ship names (<i>, if you’re so inclined) feels weirdly precise. But I get seasick, and your nautical mileage may vary. With similar niche precision:

The u element [now] represents a span of text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt.
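To make the redefinitions concrete, here is a small sketch that uses b, i, u and small in the senses described above. The product name, the deliberately misspelt place name and the class name spelling are invented for the example.

```html
<p>
  <!-- b: draws attention to a product name, with no extra emphasis implied -->
  <b>SuperWidget 3000</b> is now in stock.
  <!-- i: a ship name -->
  It arrived on the <i>Cutty Sark</i>,
  <!-- u: an unarticulated annotation, here flagging a misspelling -->
  so delivery to <u class="spelling">Grenwich</u> is quick.
</p>
<!-- small: side comments such as small print -->
<p><small>Offer valid while stocks last. Terms and conditions apply.</small></p>
```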

You can read more about changed elements and attributes [3] on the W3C website.

Sexy New Semantics

We all know about video [4] and audio. And canvas is particularly popular at the moment because it allows for 3-D graphics using WebGL, so game designers can port their products to the Web. Like good ol’ img, these semantics are embedded content, because they drag in content from another source: either a file, a data URI or JavaScript.

Unlike img, however, they have opening and closing tags, allowing for fallbacks. Therefore, browsers that don’t support the new semantics can be fed some content: an image could be the fallback for a canvas, for example, or a Flash movie could be the fallback for video, a technique called “video for everybody” [5].

The source and track elements are empty elements (with no closing tags) that are children of video or audio.

The source element gets past the codec Tower of Babel that we have. Each element points to a different source file (WebM, MP4, Ogg Theora), and the browser will play the first one it knows how to deal with:

<audio controls>
  <source src=bieber.ogg type=audio/ogg>
  <source src=bieber.mp3 type=audio/mpeg>
  <!-- fallback content: -->
  Download <a href=bieber.ogg>Ogg</a> or <a href=bieber.mp3>MP3</a> formats.
</audio>

In this example, Opera, Firefox and Chrome will download the Ogg version of Master Bieber’s latest toe-tappin’ masterpiece, while Safari and IE will grab the MP3 version. Chrome can play both Ogg and MP3, but browsers will download the first source file that they understand. The fallback content between the opening and closing tags is a link to download the content to the desktop and play it via a separate media player, and it is only shown in browsers that can’t play native multimedia.

For video, the fallback could be an embedded Flash movie hosted on YouTube:

<video controls>
  <source src=best-video-ever.webm type=video/webm>
  <source src=best-video-ever.mp4 type=video/mp4>
    <!-- fallback content: -->
    <iframe width="480" height="360" 
      src="http://www.youtube.com/embed/xzMUyqmaqcw?rel=0" 
      frameborder="0" allowfullscreen>
    </iframe>
</video>

This way, users of older browsers, such as IE 6-8, will see a YouTube movie (as long as they have the Flash Player), so they will at least be able to see the video, while users with modern browsers will get the full native-video experience. Everyone gets the content, then, which is what your website is there for, after all.

The track element is a newer addition to the HTML5 family and is being implemented by Opera, Chrome and IE at the moment. It points to a subtitle file that contains text and timing information. When implemented, it synchronizes captions with the media file to enable on-demand subtitling and captioning; useful not only for viewers who are hard of hearing, but also for those who do not speak the language used in the audio or video file.
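A minimal sketch of track in use might look like the following. It assumes caption files in the WebVTT format; the file names captions-en.vtt and subtitles-fr.vtt are hypothetical.

```html
<video controls>
  <source src="best-video-ever.webm" type="video/webm">
  <source src="best-video-ever.mp4" type="video/mp4">
  <!-- Hypothetical caption files; kind, srclang and label describe each track -->
  <track kind="captions" src="captions-en.vtt" srclang="en" label="English" default>
  <track kind="subtitles" src="subtitles-fr.vtt" srclang="fr" label="Français">
  <!-- fallback content for browsers without native video: -->
  <a href="best-video-ever.mp4">Download the video</a>
</video>
```

The default attribute marks the track to enable when the user has expressed no other preference; the viewer can still switch tracks or turn them off.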

Semantics For Internationalization

Less woo! than the semantics for multimedia and games are the semantics for internationalization. It may surprise the cool kids in Silicon Valley to learn that a worldwide Web of people use languages other than English and even use different writing systems.

Languages such as Arabic and Hebrew are written right to left, unlike European languages, which are written left to right. On pages that use only one writing system, this doesn’t present a problem, but on pages with bi-directional (“bidi”) writing, browsers have to decide where to put punctuation, bullets, numbers and the like. Browsers usually do a pretty good job using the Unicode bidirectional algorithm, but it gets it wrong in some cases, which can seriously dent the comprehensibility of content.

HTML5 gives us a bdi element, which enables authors to isolate a span of text so that the Unicode bidirectional algorithm treats it independently of its surroundings, making the text more comprehensible. For a further description of the problem and to see how bdi solves it, see “HTML5’s New bdi Element” [6] by Richard Ishida [7], the W3C’s internationalization activity lead.
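Here is a small sketch of the kind of case bdi is meant for: a list that mixes user-supplied names, one of them written right to left, with left-to-right text and numbers. The user names and post counts are invented.

```html
<ul>
  <!-- Each name is isolated, so the colon and number stay in place
       whatever direction the name itself runs in -->
  <li>User <bdi>jdoe</bdi>: 3 posts</li>
  <li>User <bdi>إيان</bdi>: 5 posts</li>
</ul>
```

Without the bdi wrapper, a browser may place the colon and the number on the wrong side of the Arabic name, because the algorithm treats them as part of the same right-to-left run.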

Some languages have scripts that are not alphabetic at all, but that express an idea rather than a sound. Occasionally, an author will have to assist readers with pronunciation for especially rare or awkward characters, usually by providing an alternate script in a small font above the relevant character. In print, this was traditionally done with a very small 5-point font called “ruby,” and HTML5 gives us three new elements for marking up ruby text: ruby, rt and rp.

For more information, see “The HTML5 ruby Element in Words of One Syllable or Less” [8] by Daniel Davis.
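As a brief sketch, this is how the three elements fit together for the Japanese word kanji. The rt elements hold the pronunciation, and the rp elements supply parentheses that only browsers without ruby support will show.

```html
<ruby>漢<rp>(</rp><rt>kan</rt><rp>)</rp>字<rp>(</rp><rt>ji</rt><rp>)</rp></ruby>
```

A browser that understands ruby draws "kan" and "ji" in small type above the characters; an older browser falls back to 漢(kan)字(ji), so the reader still gets the pronunciation.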

Structural Semantics

Most people are aware that HTML5 gives us many new elements to describe parts of a Web page, such as header, footer, nav, section, article, aside and so on. These exist because we Web developers actually wanted such semantics. How did the authors of the HTML5 specification know this? Because in 2005 Google analyzed 1 billion pages [9] to see what authors were using as class names on divs and other elements. More recently, in 2008, Opera MAMA analyzed 3 million URLs to see the top class names [10] and top IDs [11] used in the wild. These analyses revealed that authors wanted to mark up these areas of the page but had no elements to do so, other than the humble and generic div, to which they then added descriptive classes and IDs.

(HTML5 Doctor has many articles about HTML5 semantics [12], so we won’t bloat this article by going in depth here. Warning: some were written by me.)
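For orientation only, here is one way a page skeleton might use these elements where we once wrote div id="header", div class="nav" and so on. The site name, headings and URLs are invented, and other arrangements are just as valid.

```html
<body>
  <header>
    <h1>The Daily Pirate</h1>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/archive">Archive</a></li>
      </ul>
    </nav>
  </header>
  <article>
    <header><h2>Why parrots make poor navigators</h2></header>
    <section>
      <h3>The evidence</h3>
      <p>They mostly repeat whatever the last captain said.</p>
    </section>
    <aside><p>Related: a short history of the eyepatch.</p></aside>
    <footer><p>Filed under: seafaring</p></footer>
  </article>
  <footer><p><small>Copyright notice goes here.</small></p></footer>
</body>
```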

The new semantics were built to degrade gracefully. For example, consider what the specification has to say about the new figure element:

The figure element represents some flow content, optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document.

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc…

This isn’t a new idea. HTML3 proposed a fig element [13] (which never made it into the final HTML 3.2 specification). It looked like this:

<FIG SRC="nicodamus.jpeg">
   <CAPTION>Ground dweller: <I>Nicodamus bicolor</I> builds silk snares</CAPTION>
   <P>A small hairy spider.
   <CREDIT>J. A. L. Cooke/OSF</CREDIT></P>
</FIG>

There’s a big problem with this. In browsers that do not support fig (and none do), the image wouldn’t be displayed because the fig element would be completely ignored. The contents of the credit element would be displayed, because it’s just text. So you’d get a credit with no image on older browsers.

In HTML5, you would code the same example like so:

<figure>
   <img src="nicodamus.jpeg">
   <figcaption>
      <p>Ground dweller: <i>Nicodamus bicolor</i> builds silk snares.</p>
      <p>A small hairy spider.
      <small>J. A. L. Cooke/OSF</small></p>
   </figcaption>
</figure>

Unlike the aborted HTML3 syntax, the HTML5 version is backwards-compatible: a browser that doesn’t “know” about the figure element will still show the img and the text inside figcaption (as the HTML3 credit element would similarly display its content). Note that we’re using the redefined small element, instead of minting a new credit element. Remember that “Small print is also sometimes used for attribution.”

HTML5 also gives us a new figcaption element. Originally, the specification’s authors tried to reuse caption, as suggested in HTML3, but there were legacy problems, because caption had previously only been a child of table.

One of the design principles on which HTML5 is based [14] is that new features should degrade gracefully [15]. When they can’t, the language allows for fallback content. It tries to reuse elements rather than mint new ones, but it’s a pragmatic language: when minting something new is necessary, it does so.

Interactive Semantics

The structural elements of HTML5 currently don’t do much in visual browsers, although software that sits on top of browsers (such as screen readers) is starting to use them (see “HTML5, ARIA Roles, and Screen Readers in March 2011” [16] and “JAWS, IE and Headings in HTML5” [17]).

Other elements do have a visual effect. The details element [18], for example, is a groovy interactive element that functions as “a disclosure widget from which the user can obtain additional information or controls.”

Most browsers will implement it as an “expando box”: when the user clicks on some browser-generated icon (such as a triangle or downwards-pointing arrow) or the word “Details” (which can be replaced by the author’s own rubric in a child summary), the element will slide open, revealing its details within. The details could be a full description of an image or graph, a description of a complex table, advanced options for a search form, or just about anything else. This is a common need on the Web today, now made native and obviating the need for custom JavaScript.
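A minimal sketch of such an expando box, using the advanced-search case mentioned above, could look like this. The field names after and titles-only are hypothetical.

```html
<details>
  <!-- summary replaces the browser's default "Details" label -->
  <summary>Advanced search options</summary>
  <p><label>Published after <input type="date" name="after"></label></p>
  <p><label><input type="checkbox" name="titles-only"> Search titles only</label></p>
</details>
```

In a browser that doesn’t support details, everything is simply displayed, so no content is lost.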

Most of us have seen HTML5’s new form semantics [19]. Most of these are new values for the type attribute of the input element, thereby ensuring graceful degradation to <input type=text> in older browsers. New elements include datalist [20], output, progress and meter [21].
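The sketch below pulls a few of these together in one form. The field names, the list of browsers, the unit price of 5 and the values shown in progress and meter are all assumptions for the sake of the example.

```html
<form oninput="total.value = qty.valueAsNumber * 5">
  <!-- New input types fall back to a plain text field in older browsers -->
  <p><label>Email <input type="email" name="email" required></label></p>

  <!-- datalist offers suggestions while still allowing free text -->
  <p><label>Browser <input name="browser" list="browsers"></label></p>
  <datalist id="browsers">
    <option value="Opera">
    <option value="Firefox">
    <option value="Chrome">
  </datalist>

  <!-- output shows a result calculated from other fields -->
  <p><label>Quantity <input type="number" id="qty" name="qty" value="1" min="1"></label>
     Total: <output name="total" for="qty">5</output></p>

  <!-- progress is for task completion; meter is for a scalar measurement -->
  <p><progress value="2" max="5">Step 2 of 5</progress></p>
  <p><meter value="0.6" min="0" max="1">60%</meter></p>
</form>
```

The text inside progress and meter is fallback content for browsers that cannot draw the gauges.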

Do We Have The Right Semantics?

So, we have many new semantics, but are they the right ones? After all, the Google research on which they were based was conducted in 2005 — quite some time ago! Perhaps the semantics are already somewhat behind the times? Many have noted that they’re document-centric rather than application-centric. Do we need more application-centered semantics, such as a login or share element, or some kind of modal element for modal dialogue boxes?

I don’t know; I’m not an app developer. But at least HTML is a “living standard,” and so these can be added if strong enough use cases are presented to the Working Group.

I think most coders would welcome a new way to embed images that respond to the device’s context. Borrowing from the video element, which displays source video according to what media queries instruct, I can imagine a new element such as picture:

<picture alt="angry pirate">
   <source src=hires.png media="(min-width:800px)">
   <source src=midres.png media="(min-width:480px)">
   <source src=lores.png>
      <!-- fallback for browsers without support -->
      <img src=midres.png alt="angry pirate">
</picture>

This would pull in hires.png for widescreen devices, midres.png for devices between 480 and 800 pixels wide, and lores.png for everything else, thereby rendering moot the question that designers currently ask themselves, “Do I make every browser download a high-resolution image and then squash it down for small screens, thus wasting bandwidth, or do I send a low-resolution image to every browser and scale it up for big screens, potentially sacrificing quality?”

Taking a leaf from the other popular semantics we’ve seen, there would be a fallback in the middle — in this case, a conventional img element — so everyone would get the right content.

Sending the right-sized image to devices without wasting bandwidth is one of the knottiest problems in cross-device and responsive design at the moment. Perhaps we’ll see a solution to this in HTML6. At the moment, the best solutions, which include Matt Wilcox’s Adaptive Images [22] and Filament Group’s Responsive Images [23], require JavaScript and tweaks to the server’s .htaccess file. The worst solutions require old-fashioned techniques, such as browser-sniffing [24], now rebranded as “device detection” but still the same old user-agent string-pattern matching, which is hilariously fragile [25], not future-proof or scalable, and straight out of the days of “Best viewed in Netscape Navigator at 800 × 600” badges on websites.

When, Where, Who?

A lot of data depends on three pieces of information: when, where and who?

HTML5 has a time element (which has been a bit of a battleground [26] lately). This enables you to annotate a human-readable date with an unambiguous machine-readable one. It doesn’t matter what goes between the tags, because that’s the content for people to read. So, you could use either of the following:

<time datetime="1982-07-18">The day the woman I love was born</time>

<time datetime="1982-07-18">Priyanka Chopra’s birthday</time>

Whichever you choose, the machine would still know the date you mean because of the datetime attribute, formatted as YYYY-MM-DD. If you wanted to add a time, you could: separate the time from the date with a T, put the time in 24-hour format, and end with either a Z (meaning UTC) or a time-zone offset. So, 2011-11-13T20:00Z would be 8:00 pm on 13 November 2011 UTC, while 2011-11-13T23:26:00.083-05:00 would be 11:26 pm and 83 milliseconds in the time zone 5 hours behind UTC. A Sri Lankan-localised browser could use this information to automatically convert dates into the Buddhist calendar. Search engines could use timestamps to help evaluate “freshness” [27].
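To show the three common shapes side by side, here is a short sketch. A valid value ends in either a Z or a numeric offset, never both; the visible text between the tags is invented and can be anything a human would want to read.

```html
<!-- Date only -->
<time datetime="2011-11-13">13 November 2011</time>

<!-- Date and time in UTC: the trailing Z means a zero offset -->
<time datetime="2011-11-13T20:00Z">8 pm UTC on 13 November</time>

<!-- Date and time with an explicit offset, five hours behind UTC -->
<time datetime="2011-11-13T23:26:00.083-05:00">late evening, New York time</time>
```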

It’s perhaps surprising that, even though geolocation is so prevalent now, we don’t have a location element that simply takes three attributes: latitude, longitude and (optionally) altitude. It would be great to be able to write the following:

<location lat=51.502064 long=-0.131981>London SW1A 4WW</location>

The browser would then offer to show you a map or give you directions from the current GPS location or any other location-based service.

(Since I gave the talk that this article is based on, Ian Hickson, the HTML5 editor, said that he expects to add a new <geo> element [28]. If I could choose, I’d prefer place, so I could wear a T-shirt with the slogan “I’ve got the time if you’ve got the place”.)

HTML3 had a person element [29], “used for names of people to allow these to be extracted automatically by indexing programs,” but it was never implemented. In HTML4, the cite element [30] could be used to wrap names of people, but this has been removed in HTML5, controversially (see “Incite a Riot” [31] by Jeremy Keith). In HTML5, then, we’re left with no way to unambiguously denote a person. People’s names are, however, a hard problem to solve. Whereas dates and times have well-known standardized ISO formats (YYYY-MM-DD and HH:MM:SS.mmm, respectively), and location is always latitude, longitude and altitude, personal names are harder to break down into useful parts: there are Russian patronymics, Indonesian single-word names, multiple family names, and Thai nicknames to consider. (See Richard Ishida’s excellent article “Personal Names Around the World” [32] for more information and discussion.)

The new data element, which replaces time [33], has a value attribute that passes machine-readable information, but it has no required or implied format, so there is no way for a browser or search engine to know, for example, whether 1936-10-19 is a date, a part number or a postal code.
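A quick sketch shows the ambiguity. Both lines below are well-formed uses of data, and nothing in the markup tells a machine which value is a date and which is a stock code; the product and the code SKU-1936 are invented.

```html
<!-- Intended as a date, but only a human reader can tell -->
<p>Born on <data value="1936-10-19">19 October 1936</data>.</p>

<!-- Intended as a hypothetical stock code -->
<p>Order the <data value="SKU-1936">Angry Pirate T-shirt</data> today.</p>
```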

Microdata

HTML5, like HTML4, is extensible (but not in the oh-so-dirty eXtensibility way of XML formats, so loathed by the Working Group). You can use the tried and tested microformats, which use HTML classes, or the full RDFa specification, which doesn’t validate in HTML4 or HTML5. Because RDFa was considered to be too hard for authors to write (Google has conducted research that finds that authors make 30% more mistakes with RDFa than with other formats), HTML5 specifies microdata, a mechanism for adding common semantics via agreed-upon markup patterns. HTML5 Doctor has more information on HTML5 microdata [34], and Opera 11.60 [35] supports the Microdata DOM API.

Like microformats and RDFa, the extra semantics added to the markup make sense only if you have a cheat sheet that tells you what each piece means. This means that the data has to point to a vocabulary that tells any crawler how to interpret the lump of data it finds. For microdata, there is the newly established Schema.org [36], which is “a collection of schemas, i.e. HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers.”
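As an illustrative sketch, here is a sentence about a person marked up with microdata and the Schema.org Person vocabulary. The itemtype URL is the cheat sheet: it tells a crawler how to read the itemprop names. The choice of properties (name, jobTitle, worksFor) is mine, picked for the example.

```html
<p itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bruce Lawson</span> is an
  <span itemprop="jobTitle">Open Web Standards evangelist</span> at
  <!-- A nested item: the employer is itself described by a vocabulary -->
  <span itemprop="worksFor" itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Opera</span></span>.
</p>
```

A browser ignores the extra attributes and renders an ordinary sentence, so the page degrades gracefully for everything that doesn’t consume microdata.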

Do Semantics Matter Anyway?

Now that more and more markup is generated by JavaScript, some people are tempted to think that semantics don’t matter. We see various products marketed as HTML5 which simply make divs fly around the screen with JavaScript  —  simple DHTML techniques unchanged from 10 years ago.

I’ve even seen some Web pages with no markup at all. Some frameworks emit skeletal HTML with empty body tags and inject all the HTML with script. If you’re squirting some minified JavaScript down the wire, with no markup at all, you’re closer to Flash than you are to the Web.

In the same way that 47 minutes is (apparently) too long to struggle making a CSS layout, at which point you should just give up and use tables, some people suggest that thinking about which element to use is a waste of time. “There are two types of developers: those who argue about div’s not being semantic and those who create epic shit” writes Thomas Fuchs, as if the two activities were mutually exclusive.

A better argument is that no software cares about or consumes semantics anyway, so why bother? This isn’t true (work is underway already to map assistive technologies to new semantics [37]), but even if it were true, it ignores that this is a chicken-and-egg argument. It assumes that no new search engine will ever come to the market and be able to use new elements, or that browsers will never release new versions that can make use of these semantics, and that developers will write no new extensions. In short, it assumes that the evolution of the Web is complete.

Semantics do matter. Semantics communicate meaning, and once that is established, machines can do something meaningful with that data, without having to develop and use algorithms to guess. A browser extension might allow a user to jump straight to the nav with a single keystroke. It can do this because it looks for nav rather than having to employ heuristics to find a div with an id or class that would suggest it’s being used as navigation (assuming the author decided to use something sensible like nav, navigation, sidebar, or menu  —  and a restaurant site with a div called “menu” might be a list of foods rather than other pages…ah, the ambiguity of natural language). A crawler might dynamically assemble articles on a timeline. There are many more possibilities than my meagre imagination can dream up.
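To give a feel for how little code the jump-to-nav idea needs once the markup is semantic, here is a toy page with a script standing in for the extension. The Alt+N shortcut and the links are my own invention, and a real extension would need more care (conflicts with existing shortcuts, pages with several nav elements).

```html
<!DOCTYPE html>
<title>Jump to navigation (sketch)</title>
<nav><a href="/">Home</a> <a href="/articles">Articles</a></nav>
<p>Press Alt+N to move focus to the navigation.</p>
<script>
  // Hypothetical shortcut: Alt+N moves focus to the first nav element
  document.addEventListener('keydown', function (event) {
    if (!event.altKey || event.code !== 'KeyN') return;
    var nav = document.querySelector('nav'); // no class-name guessing needed
    if (!nav) return;                        // this page has no nav element
    nav.tabIndex = -1;                       // make the element focusable from script
    nav.focus();
    event.preventDefault();
  });
</script>
```

The lookup is a single querySelector call; with div-based markup the same script would have to guess among ids and classes such as nav, navigation, sidebar or menu.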

The Web is based on simple technologies, mashed up together to bring surprising results  —  results which have certainly surpassed the inventors’ original intents or expectations. The Web will continue to do so. What makes the Web so great, so flexible and so powerful is the fact that content is in open formats that can be parsed and mashed up in new and surprising ways.

These can happen if the content is marked up for meaning by the author  —  and if the language has the right markup elements for authors to use as a vocabulary. HTML5 extends our vocabulary. We’ll need more words  —  and those will come about with HTML6 etc.

If, like me, you believe the Web to be a system that works across browsers, across operating systems, across devices, across languages, that is View-sourcable, hackable, mash-uppable, accessible, indexable, reusable, then we need to ensure that we use the small number of semantic tools at our disposal properly, and we’ll all benefit.

(This article is based on a talk I gave at the Fronteers Conference [38].)

About the Author

Bruce [39] evangelizes Open Web Standards for Opera [40]. He wrote the book Introducing HTML5 together with Remy Sharp. The book points out the good and bad parts of the HTML5 specifications and shows you how to use the language; some areas of the spec are discussed theoretically, as they’re not yet implemented anywhere. It’s the first full-length book on HTML5 (New Riders, now in its 2nd edition).

Footnotes

  1. http://www.brucelawson.co.uk/2011/html5-and-hollow-demos/
  2. http://www.w3.org/TR/html5-diff/#absent-elements
  3. http://www.w3.org/TR/html5-diff/#changed-elements
  4. http://dev.opera.com/articles/view/introduction-html5-video/
  5. http://camendesign.com/code/video_for_everybody
  6. http://rishida.net/blog/?p=564
  7. http://twitter.com/r12a
  8. http://my.opera.com/tagawa/blog/the-html5-ruby-element-in-words-of-one-syllable-or-less
  9. http://code.google.com/webstats/
  10. http://devfiles.myopera.com/articles/572/classlist-url.htm
  11. http://devfiles.myopera.com/articles/572/idlist-url.htm
  12. http://html5doctor.com/article-archive/
  13. http://www.w3.org/MarkUp/html3/figures
  14. http://www.w3.org/TR/html-design-principles/
  15. http://www.w3.org/TR/html-design-principles/#degrade-gracefully
  16. http://www.accessibleculture.org/articles/2011/04/html5-aria-2011/
  17. http://www.accessibleculture.org/articles/2011/10/jaws-ie-and-headings-in-html5/
  18. http://html5doctor.com/the-details-and-summary-elements/
  19. http://dev.opera.com/articles/view/new-form-features-in-html5/
  20. http://adactio.com/journal/4272/
  21. http://html5doctor.com/measure-up-with-the-meter-tag/
  22. http://adaptive-images.com/
  23. https://github.com/filamentgroup/Responsive-Images
  24. http://farukat.es/journal/2011/02/499-lest-we-forget-or-how-i-learned-whats-so-bad-about-browser-sniffing
  25. http://webaim.org/blog/user-agent-string-history/
  26. http://www.brucelawson.co.uk/2011/goodbye-html5-time-hello-data/
  27. http://www.googleblog.blogspot.com/2011/11/giving-you-fresher-more-recent-search.html
  28. http://www.netmagazine.com/news/ian-hickson-responds-over-html5-getting-time-element-back-111552
  29. http://www.w3.org/MarkUp/html3/logical.html
  30. http://www.w3.org/TR/html401/struct/text.html#h-9.2.1
  31. http://www.24ways.org/2009/incite-a-riot
  32. http://www.w3.org/International/questions/qa-personal-names
  33. http://www.html5doctor.com/time-and-data-element/
  34. http://www.html5doctor.com/tag/microdata/
  35. http://www.opera.com/next
  36. http://www.schema.org/
  37. http://www.paciellogroup.com/blog/2011/11/html5-semantics-and-accessibility/
  38. http://fronteers.nl/congres/2011/sessions/html5-semantics-bruce-lawson
  39. http://twitter.com/brucel
  40. http://www.opera.com/developer

  1. 1

    Oh, that imaginary picture tag is getting me all hot under the collar. I would love to see that become a reality. Haven’t had time to read this properly yet – instapapered for later though.

    • 2

      Honestly, the picture tag was a big turnoff for me. This triples the amount of work to simply embed an image, nevermind all the troubles that come with formatting that image. Margins change on smaller devices, and there is every size of handheld device so you need to make at least 3 versions of each image, likely more.

      Not ideal.

      • 3

        What’s the ideal alternative then? Serve a gigantic image that works from desktop down to mobile and cripples user data allowances? Current solutions involve either JS or htaccess hacks that aren’t ideal. I’d love to do this natively with the suggested fallback. This is how audio and video tags work, where the appropriate version is served for the end context, why not do this natively with images?

        It’s difficult to discuss an imaginary tag but I’d say you don’t always have to load 3 versions every time. Perhaps some areas of a layout will only require the standard img tag if they are small enough to begin with.

        • 4

          Nathan Gardner

          March 9, 2012 12:24 pm

          I wrote a PHP script that dynamically resizes the image based on the user agent. So my img src is something like image.php?f=myimage.jpg

          This way I only have 1 high res version of each image, and the image.php script generates new ones on demand and caches them for future requests.

          Also, if you want to display the same image but at different sizes on the same page (great for thumbnails), you could pass a width or height to image.php to force a size.

    • 5

      Is it too late to submit that picture tag to get it in the spec? It’s just what we need to do responsive images.

      Every solution available still requires desktop browsers to download 2 images, or uses an .htaccess file and feels like a hack (and no one has ported them to nginx).

      A picture tag would allow authors who care enough to create and serve small images to mobile browsers to do so, and everyone else could still use the img tag.

  2. 6

    I am currently reading the article… but I clicked on the link to the video (http://www.youtube.com/embed/xzMUyqmaqcw?rel=0), and it’s crazy

  3. 7

    Bruce,

    Because pedantry is part of my make-up, I just want to point out a slightly incorrect statement you made with regard to RDFa.

    There is in fact a W3C specification which allows for the conformant use of RDFa with HTML5 – see: HTML+RDFa 1.1. Not everyone considers RDFa “too hard for authors to write”, only some folks do, and it should be pointed out that the latest version of Drupal (Content Management System) has RDFa support baked in under the hood. Small, under-visited sites such as Yahoo!/Flickr, Best-Buy and Whitehouse.gov also use RDFa today, so don’t be too quick to accept the WHATWG hype that RDFa is a lost cause – that is a biased and incomplete assessment of the real situation on the web.

    Outside of that, great recap. Semantics Matter.

    • 8

      heh. Pedantry is great. So note that I said “Because RDFa was considered to be too hard for authors to write”. I don’t say that statement is true, or that I agree with it, merely that it’s the perception that led to microdata being specced.

    • 9

      We’re planning on using Drupal7 to produce RDFa in output for our worldwide sites. One thing we’re discussing internally is whether to contribute to schema.org for ontology around learning classes, courses, programmes of study.
      The use of HTML5 and semantic tags to enable accessibility for screen readers is a big plus. Also the ability to mark, say, Art photos from our Arts sites to make them discoverable by Google users is a win.

  4. 10

    My current dilemma is one where my supervisor believes that using HTML5 will magically promote the website higher in search results. I even pointed out to him the words of John Mu from Google in the thread “Does semantic HTML5 matter to Google yet?”, and he still thinks using HTML5 should position the website better in search results.

    I’m having a hard time explaining to him that it takes much more than HTML5 to receive a better placement in search results. Any suggestions?

    • 11

      Well, unless you’re still making Flash sites, and have no organic SEO, then yeah.. switching to HTML5 will help. Otherwise, no.

    • 12

      Evan 'OldWorld' Skuthorpe

      November 21, 2011 7:31 am

      A suggestion? Tell him he’s an idiot.

  5. 13

    So much talk about semantics these days, when everyone should focus on content and content quality….

    • 14

      This is a seriously great article. Semantics complement great content, so it is important to have both. For once, I don’t need to explain my point because the article has already done it for me.

    • 15

      Well….

      Semantic markup was introduced to classify data properly and to focus on content, rather than on endless similar tags that wrap everything from header to footer.

      Google is tuning its search engine in a way that makes SEO obsolete; content is what is going to matter.

      0
  6. 16

    Honestly, cookie-based adaptive images are little less evil than UA sniffing.

    It’s way better to load all images asynchronously with Ajax according to device width and pixel ratio, and provide a fallback. It’s more work, but way better than playing with cookies and htaccess.
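
    As a rough illustration of the deferred-loading idea described above (a sketch, not any particular library's implementation), the markup below picks an image source from the viewport width and pixel ratio and keeps a noscript fallback. The file names, the data-small and data-large attribute names and the 800-pixel threshold are all made up for the example, and it simply sets src rather than issuing an XMLHttpRequest. Note that an img without a src does not validate.

    ```html
    <!-- Hypothetical markup: data-small / data-large are illustrative attribute names -->
    <img class="deferred" alt="Sample photo"
         data-small="photo-480.jpg" data-large="photo-1200.jpg">
    <!-- Fallback for users without JavaScript -->
    <noscript><img src="photo-480.jpg" alt="Sample photo"></noscript>

    <script>
      // Assumed threshold: 800 device pixels or more counts as "large".
      var ratio = window.devicePixelRatio || 1;
      var devicePixels = document.documentElement.clientWidth * ratio;
      var images = document.querySelectorAll('img.deferred');
      for (var i = 0; i < images.length; i++) {
        var img = images[i];
        // Choose the source only after the page has started rendering.
        img.src = devicePixels >= 800
          ? img.getAttribute('data-large')
          : img.getAttribute('data-small');
      }
    </script>
    ```

    As the comment itself notes, this still assumes that a smaller screen means a slower connection.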

    There are lots of big problems with cookies :

    – JavaScript on the top
    – race conditions
    – if mobile first, users on desktop browsers will see crappy images on their first visit (the most important one)
    – if not, mobile on 3G will take minutes to load your site (and leave)

    Last but not least, it assumes smaller = slower, which is false (I know, deferred loading does the same, but there’s no real solution to detect bandwidth cross-browser).

    What we need is not a new img tag (well, we need it, but it’s not the most important) : what we need is a bandwidth indicator in the user agent.

    0
    • 17

      JS is not required to set the cookie in AI (although it’s recommended over the alternative ‘false image in CSS’ method).

      AI handles the no-cookie + mobile first setting (and cookie race condition) better than you think because of the fallback browser sniffing it does, which is not as nasty as you’re thinking, as it only needs to detect a desktop OS, and that’s simpler and more reliable than I’d expected.

      Take a look at the changelog if you want more details on what it’s doing and why :)

      AI’s not perfect, but it’s not as bad as you’re making the cookie-based technique out to be.

      We do need some ‘real’ solutions though, rather than work-arounds like AI.

      0
  7. 18

    I do believe that many designers think that replacing a Flash button with an animated HTML5 button counts as semantic SEO. In my opinion that doesn’t matter: there are many HTML5 sites that don’t even have good SEO content. They just look good with HTML5 and no Flash, but that isn’t going to put your site at the top of the Google results for [keyword]. SEO is a large programming effort built into the website.

    Having an awesome design with animation in HTML5 doesn’t help you that much with SEO; it just makes the site visible on mobile devices, nothing more.

    Of course you can design something awesome and have good semantic SEO and good SEO programming on the site, but I’ve seen in many CSS galleries that some designers say they do SEO on their sites, when many of those sites are only one page with many pictures and little text (but awesome “jQuery” animation).

    Think about it: having cool (jQuery) sliders, buttons or cool fonts will NOT put your site at the top of the results for [search keyword]; it just helps your site be visible on iPhone, iPad or Android.

    As Designers we must design and code our sites so they can be visible to all of our potential customers.

    [BTW I love HTML5 and FLASH SUX]

    Damian Rivas | hybridixstudio.com

    0
    • 19

      Stupid comment, crappy arguments and final spam at the end of the comment.

      Congratulations, you are an ignoramus who believes himself wise. Next time, shut up and read a book; you need it.

      0
  8. 20

    I hate priyanka chopra :p

    0
  9. 22

    I have been building standard document-style sites for years, so the new HTML5 semantics are a breath of fresh air and help standardize markup. I do see the need for more tags to define application UI and functionality; that would help clean the slate of heavily nested divs that exist only for the sake of visual elements. CSS3 of course will help with this, but the battle of the browsers on this front seems to never end.
    The new “data-” attribute is interesting to me, and since this attribute is custom I am sure we will start seeing trends happen especially in application development.
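
    A minimal sketch of what a custom data- attribute might look like in practice. The attribute names and values here are invented for illustration:

    ```html
    <ul>
      <!-- data-track-id and data-length are made-up names; any data-* name is allowed -->
      <li data-track-id="42" data-length="3m15s">Song title</li>
    </ul>

    <script>
      var item = document.querySelector('li[data-track-id]');
      // getAttribute works in older browsers; newer ones also expose item.dataset.trackId
      var trackId = item.getAttribute('data-track-id');
    </script>
    ```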

    0
  10. 23

    Great article! Read earlier this month that time, datetime, and pubdate were eliminated from the HTML5 spec recently. Not sure I’m excited about this, but that’s the way it is.

    0
    • 24

      The time element has returned to the HTML5 specification although the pubdate attribute is still under threat.
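
      For reference, a small sketch of the time element with a machine-readable datetime attribute (the date is just an example, and pubdate is left out because its status was uncertain):

      ```html
      <!-- Human-readable text inside, machine-readable date in the attribute -->
      <p>Posted on <time datetime="2011-11-21">November 21, 2011</time>.</p>
      ```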

      0
  11. 25

    Structure != Semantics. From 1994: ‘lets nip this one in the bud before the masses get ahold of it.’

    http://1997.webhistory.org/www.lists/www-html.1994q4/0094.html

    From this article I can see they weren’t successful.

    0
  12. 27

    I think this article is really useful to me! Thanks to the writer.

    0
  13. 28

    Nice article Bruce, you do a great job of covering some of the new *functionality* in HTML5, but fwiw I think the new *semantics* are a separate issue, on which I’ll rant briefly below :)

    IMO conflating semantics with functionality misses the point of the previous articles, which were more focused on the structural elements.

    On the structural elements, you say: “Most people are aware that HTML5 gives us many new elements to describe parts of a Web page, such as header, footer, nav, section, article, aside and so on. These exist because we Web developers actually wanted such semantics. How did the authors of the HTML5 specification know this? Because in 2005 Google analyzed 1 billion pages to see what authors were using as class names on divs and other elements. ”

    This is, unfortunately, a myth, and I hope we can put an end to it. Please? :) When doing research for my own HTML5 book (http://itsninja.com/html5book/) I asked Hickson about the new structural elements, and he said he (and a few others) added them *prior* to any research. As far as I could tell from the WHATWG archives, Hickson drew up the new elements on a whiteboard in 2004, without any consultation or research.

    But the research backs him up though, right?

    Well, on the face of it, you’d think so (apart from the vast absence of any classes!). But then you look at what the spec actually says (and you’d know this better than most), and it’s vastly different to what web designers and other authors actually want. For example, how is sectioning a product of the research? It’s an old concept from 1991; yet it’s the foundation of the new structural elements. How is header and footer in the spec — which are intended for any section, not specifically the “overall” page section — at all similar to how authors actually use them? (Which is far more like ARIA banner and contentinfo landmarks.) Who wanted an “aside” element for both sidebars and pull-quotes? Who has been using “article” to denote comments, or forum posts, or widgets (!) as the spec suggests?

    This is the biggest problem with HTML5’s semantics: Hickson says they’re just to make styling easier, they’re just what everyone has been doing, they’re just what the research says. But then you look at the spec, and it’s nothing like what we’ve been doing at all! I’m not surprised it’s such a flustercuck of confusion — when you tell everyone it’s what they’re already doing, give elements names which intuitively *look* like it’s what we’ve been doing, and then write them up in the spec in pretty esoteric ways which don’t reflect reality at all, you end up with a mess. And that’s what we’ve got.

    Then when it comes to the supposed benefits of what search engines or AT etc. are going to do with it… well, what are they going to do with it? Do they follow the incorrect and extremely messy real world usage, or do they follow the not-at-all-followed-by-authors spec? (I also noted in a WHATWG exchange with yourself Hickson said he doesn’t think UAs will ever do anything with them, which is maybe for the better!)

    And I know from the WHATWG archives what a mess the whole lack of “main” or “content” element is regarding what authors actually want — I quoted your WHATWG comments in my book :)

    HTML5 semantics, in terms of structural elements, are dead on arrival. No search engine here and now asked for, or needs, them as currently spec’d (they’ve defined what they actually want with Schema.org), no non-HTML-book-writing HTML authors understand them correctly (ask an HTML5-aware designer what their backwards compatible document outline looks like), and no one *will* be able to use them meaningfully as per the spec because the spec has become, as Hickson often says he wants to avoid, a work of fiction in terms of real world use of these elements.

    As for nav, it’s an accessibility disaster for IE8 and below with JS disabled (according to Yahoo 2010 research 1-2% of all traffic to their sites have JS disabled: http://developer.yahoo.com/blogs/ydn/posts/2010/10/how-many-users-have-javascript-disabled/). Those users don’t get the JS fix, styling blows up, and the page now has very broken navigation. So for a theoretical future benefit we do real harm now. I’ve found it quite perplexing that the web standards community has been happy to implicitly declare JavaScript mandatory for a significant subset of users — I don’t think this issue has gotten the attention it deserves. Fortunately the ARIA landmark seems a much safer & saner approach.
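
    A minimal sketch of the ARIA landmark approach mentioned above, assuming a simple list of links (the class name and URLs are placeholders):

    ```html
    <!-- A plain div can be styled in old IE without any script; the role carries the landmark -->
    <div role="navigation" class="site-nav">
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/articles/">Articles</a></li>
      </ul>
    </div>
    ```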

    Anyway, rant over, but it’s really bugged me to see the HTML5 elements (and the story behind them) taught in a way that isn’t reflected in the spec, or in their original creation.

    I enjoyed the rest of the article (especially those ideas about baking some responsive image stuff into HTML), and if the WHATWG and W3C can sustain their marriage of convenience for the foreseeable future and we actually get a HTML6/HTML.next/HTML-uh-it’s-still-versionless-but-updated-HTML then it will be interesting to see what makes it in :)

    0
    • 29

      Great comment Luke.

      I can’t answer regarding your assertion that “Hickson drew up the new elements on a whiteboard in 2004, without any consultation or research” as you reference a private email that I haven’t seen.

      But your statement that “This is the biggest problem with HTML5’s semantics: Hickson says they’re just to make styling easier” seems contrary to this mailing list conversation from August 2004 (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2004-August/002114.html), in which James Graham of Opera says

      “I think that explicit markup for document sections is good (although I would like to see more single-purpose elements such as header or footer to provide addiational semantics for UAs – the ability to seperate out sitewide elements from page-specific content is, in my opinion, particularly important)”

      and Hixie replies

      “Yeah, header and footer or similar elements are almost certainly going to be defined at some point, along with content (for the main body of the page), entry or post or article to refer to a unit of text bigger than a section but smaller than a page, sidebar to mean a, well, side bar, note to mean a note… and so forth. Suggestions welcome.
      We’ll probably keep it to a minimum though. The idea is just to relieve the most common pseudo-semantic uses of div.”

      Seems to me that here, they’re talking about semantics and UAs rather than just styling. (If styling were the primary use-case, we’d certainly have some kind of content element, I’m sure).

      You can see the history of the aside element (as sidebar was eventually renamed) in a post by Lachy at http://lists.w3.org/Archives/Public/public-html/2009Sep/0257.html

      “As for nav, it’s an accessibility disaster for IE8 and below with JS disabled”

      I agree that it’s a styling disaster for old IE without JS. The accessibility (in an AT sense) should be unaffected. Pragmatically, if a user is surfing the web with IE6 and JS off, his or her experience of the Web is pretty nasty, and about to get a whole lot worse, not because of unstyled nav but because of increased JS use on websites. (I’m not suggesting that this is laudable, just that it is the way we seem to be headed.)

      0
  14. 30

    I was going to say something similar. Maybe not so offensively, but yeah!

    My frustration is:
    1) new markup where existing markup will work fine. Figcaption is a perfect example. We already have a caption tag. If caption is within a fig it belongs to a fig and can be styled according to how you want figs captioned. There’s no need for both caption and figcaption.

    2) re-purposing b/i/u… some things should just DIE! The problem is that we want to promote using this markup because people just won’t stop using it. The problem is that it will continue to be misused and have ZERO meaning for the most part. People will just continue to use it as the shortest tag they can tack on for some other purpose (e.g. making headings, corners, borders, placeholders, etc.). You’re putting the em-Phasis on the wrong syl-Lable.

    3) confusion around div/section/article/aside. I’ve been to a dozen sites with articles about HTML5 and how to use it properly. I’ve been to conventions and heard very smart people in the design and development community speak. The one thing that seems consistent is a lot of people just don’t get when or how to use these elements correctly. Oh sure, everyone has an article about how to use them, but then weeks later they usually come back and say “my bad, here’s the real story…” or “well, it’s still a work in progress.” Yes, one can fall back to the good old div… trusty old divitis… or one can move forward into the section? article? no definitely section… no, this is on the right side so it must be an aside? Oh crap, no that’s layout, not semantics… where was I again?

    It seems simple. To me, div = generic container, section = section of content, article = syndicated content/feed.

    Divs can hold anything.
    Sections imply a logical hierarchy (an outline).
    Articles imply reuse.
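
    To make that rule of thumb concrete, here is one way it might look in markup. This follows the commenter's reading rather than the spec's exact wording, and the headings and class name are invented:

    ```html
    <!-- article: a self-contained item that could be syndicated on its own -->
    <article>
      <h1>HTML5 Semantics</h1>

      <!-- section: a thematic part of the article that belongs in its outline -->
      <section>
        <h2>New elements</h2>
        <p>...</p>
      </section>

      <!-- div: no meaning of its own, here only a styling hook -->
      <div class="share-links">...</div>
    </article>
    ```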

    But then why doesn’t it FEEL simple when writing it? Is it just the newness? I wouldn’t think so since people have been writing “div class=’section/article'” for years now. Or is it taking us back to our high school English class and the pain of diagramming sentences that causes us so much grief? That’s a lot of what this feels like some days: trying to figure out if you’re following the correct grammar rule, and whether some grammar nazi is going to whack you with the big book of HTML5 at some point.

    I guess what I’m looking for from HTML5 is for it to simplify how I design and develop. I like section, I like new form controls, I like headers and footers (why no body/copy/content?). I know I can fall back to div, but that’s no good in the long run. I need to be able to say without any guesswork/wailing/gnashing of teeth, that [X] is the tag I use [HERE].

    0
    • 31

      “We already have a caption tag. If caption is within a fig it belongs to a fig and can be styled according to how you want figs captioned. There’s no need for both caption and figcaption.”

      Unfortunately, that’s not the case. It was looked at, but it caused problems in older browsers. If you used a figure inside a table, and the figure’s caption were marked up re-using the caption element, the browser would think that it is the caption for the table rather than the figure.

      There was a similar problem when the working group then tried to re-use legend for figures (and what is now summary in details). If you’ve ever tried to style a form legend in IE6, you’ll be glad they didn’t reuse it.
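
      A sketch of the situation described above, with a figure nested in a table cell (the file name and text are placeholders):

      ```html
      <table>
        <!-- caption here belongs to the table -->
        <caption>Quarterly results</caption>
        <tr>
          <td>
            <figure>
              <img src="chart.png" alt="Bar chart of sales by quarter">
              <!-- A caption element in this position is what older browsers mistook for table markup -->
              <figcaption>Sales by quarter</figcaption>
            </figure>
          </td>
        </tr>
      </table>
      ```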

      “It seems simple. To me, div = generic container, section = section of content, article = syndicated content/feed.

      Divs can hold anything.
      Sections imply a logical hierarchy (an outline).
      Articles imply reuse.

      But then why doesn’t it FEEL simple when writing it? Is it just the newness? ”

      I *think* it’s newness. But I really don’t know.

      ” re-purposing b/i/u… some things should just DIE! ”

      well, die is a bit strong. But it does seem to me to be navel gazing, as I hope I indicated in the article.

      0
      • 32

        It seems schizophrenic to say in one breath “we’re going to add all these tags that aren’t supported stylistically in older browsers” and then say “we’re not going to use this tag because it’s not supported stylistically in older browsers.”

        At what point do we say “IE6 is dead”, forget about the nuances of that platform and actually move FORWARD with a sane normalized standard that isn’t hacked together because of some niggle in a copy of IE5.2 Mac, IE6 or Netscape Gold that happens to still be floating around for 1.4% of the population?

        While the content should still display in those older browsers I don’t think we should put a burden on authors due to outdated implementations. Just do like some developers have been doing, like Andy Clarke, and give older browsers flat content with no CSS, or a serious CSS reset and a hint to upgrade to a newer browser.

        0
        • 33

          Do what I do. If I detect an outdated browser, I forward the user to the following message:

          “You are using an outdated browser. This site requires the latest version of . Please go to to download the latest version of , you lazy incompetent twig.” :-)

          0
    • 35

      Well said Michael, I think you pretty much nailed it with this part, I’ve been developing with this in mind since the new elements became available

      “div = generic container, section = section of content, article = syndicated content/feed.”

      0
  15. 36

    “It may surprise the cool kids in Silicon Valley to learn that a worldwide Web of people use languages other than English and even use different writing systems.”

    Do you really fucking think that? Is your command of English too limited to understand what a bunch of fatuous bullshit that is?

    0
  16. 37

    Fredrik Ekelund

    November 20, 2011 2:14 am

    That “removed elements and attributes”-link needs an http:// in its href attribute. It links back to the current article right now.

    0
  17. 39

    Why the fu*k is everyone using unquoted attributes now? I’ve started to miss XHTML.

    0
    • 40

      It’s up to you. No style is preferred over the other. To quote, or not to quote: that is the question. Do as you like; the browsers don’t care (so neither does the validator)
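
      A quick illustration of the two styles (the element and attribute values are arbitrary):

      ```html
      <!-- Both forms are valid HTML5 and are parsed identically -->
      <input type=text name=q>
      <input type="text" name="q">

      <!-- Quotes are still needed when the value contains a space -->
      <p class="intro lead">...</p>
      ```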

      0
  18. 41

    Stop worrying about semantics and all that so much and start building great sites. All this talking just leads to talking. Getting tired of the HTML5 bubble. Playtime’s over, get to work.

    0
    • 42

      But it’s our jobs to care about doing it right… Soooo, yeah.

      0
    • 43

      Like Bruce said in the article, it’s important that we do this properly. I don’t see this as talking unnecessarily or messing about (although some of it does seem a tad confusing and makes my head spin). It’s important that we mark content up properly, and as Elliot says, it’s our job to care about doing it right!

      0
  19. 44

    There are also revisions to the structure, syntax, and semantics of HTML, some of which Lachlan Hunt covered in “A Preview of HTML 5.” …
    4 The elements of HTML — HTML5
    The semantics of the protocol used (e.g. HTTP) must be followed when fetching external resources. (For example, redirects will be followed and 404 responses …
    Don,
    bestbusinessbrands.blogspot.com/

    0
  20. 45

    SCUMBAG SMASHINGMAG

    Adds comment voting icons. Doesn’t let you vote.

    -2
