HTML5 Semantics


Much of the excitement we’ve seen so far about HTML5 has been for the new APIs: local storage, application cache, Web workers, 2-D drawing and the like. But let’s not overlook that HTML5 brings us 30 new elements to mark up documents and applications, boosting the total number of elements available to us to over 100.

Sexy yet hollow demos1 aside, even the most JavaScript-astic Web 2.0-alicious application will likely have textual content that needs to be marked up sensibly, so let’s look at some of the new elements to make sure that your next project is as semantic as it is interactive.

To keep this article from turning into a book, we won’t look at each in depth. Instead, this is a taster menu: you can see what’s available, and there are links that I’ve vetted for when you want to learn more.

Along the way, we’ll see that HTML5 semantics are carefully designed to extend the current capabilities of HTML, while always enabling users of older browsers to access the content. We’ll also see that semantic markup is not “nice to have,” but is rather a cornerstone of Web development, because it is what enhances accessibility, searchability, internationalization and interoperability.

A human language like English, with its vocabulary of a million words, can’t express every nuance of thought unambiguously, so with only 100 or so words that we can use in HTML, there will be situations when it’s not clear-cut which element to use for which piece of content. In that case, choose one; be consistent across the site.

Some Presentational Elements Are Gone

Purely presentational elements such as center, font and big are now obsolete. Their role has been perfectly usurped by Cascading Style Sheets. Now, this doesn’t mean you have to rush out and recode all of those ancient pages; HTML5 makes them obsolete for authors, but because HTML5 strives not to break the Web, browsers will still render those cobwebbed legacy pages.

For the same reason, presentational attributes have been removed from current elements; for example, align on img and table, background on body, and bgcolor on table.

The evil frame element is absent in HTML5. Frames caused usability and accessibility nasties. If you get the urge to use them, use an older DOCTYPE so that your pages still validate.

Beyond this short overview, see the W3C’s exhaustive list of removed elements and attributes2.

Some Presentational Elements Have Been Redefined To Be Semantic

Not all presentational elements have been taken out and shot. Some have undergone an extensive re-education program and emerged with shiny new semantics. For example, the small element no longer means “use a small font,” although it will display that way in browser style sheets. Now it represents side comments3, such as small print:

Small print typically features disclaimers, caveats, legal restrictions, or copyrights. Small print is also sometimes used for attribution, or for satisfying licensing requirements.

Some of the redefinitions feel to me to be a mop-up. While I can get behind <b> for drawing attention to product names, keywords and so forth, without any special emphasis implied, specifying the semantics for marking up ship names (<i>, if you’re so inclined) feels weirdly precise. But I get seasick, and your nautical mileage may vary. With similar niche precision4:

The u element [now] represents a span of text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt.

You can read more about changed elements and attributes5 on the W3C website.

Sexy New Semantics

We all know about video6 and audio. And canvas is particularly popular at the moment because it allows for 3-D graphics using WebGL7, so game designers can port their products to the Web. Like good ol’ img, these elements are embedded content, because they drag in content from another source: a file, a data URI or JavaScript.

Unlike img, however, they have opening and closing tags, allowing for fallbacks. Therefore, browsers that don’t support the new semantics can be fed some content: an image could be the fallback for a canvas, for example, or a Flash movie could be the fallback for video, a technique called “video for everybody8.”

The source and track elements are empty elements (with no closing tags) that are children of video or audio.

The source element gets past the codec Tower of Babel that we have. Each element points to a different source file (WebM, MP4, Ogg Theora), and the browser will play the first one it knows how to deal with:

<audio controls>
  <source src=bieber.ogg type=audio/ogg>
  <source src=bieber.mp3 type=audio/mpeg>
    <!-- fallback content: -->
    Download <a href=bieber.ogg>Ogg</a> or <a href=bieber.mp3>MP3</a> formats.
</audio>

In this example, Opera, Firefox and Chrome will download the Ogg version of Master Bieber’s latest toe-tappin’ masterpiece, while Safari and IE will grab the MP3 version. Chrome can play both Ogg and MP3, but browsers will download the first source file that they understand. The fallback content between the opening and closing tags is a link to download the content to the desktop and play it via a separate media player, and it is only shown in browsers that can’t play native multimedia.

For video, the fallback could be an embedded Flash movie hosted on YouTube:

<video controls>
  <source src=best-video-ever.webm type=video/webm>
  <source src=best-video-ever.mp4 type=video/mp4>
    <!-- fallback content: -->
    <iframe width="480" height="360" 
      src="http://www.youtube.com/embed/xzMUyqmaqcw?rel=0" 
      frameborder="0" allowfullscreen>
    </iframe>
</video>

This way, users of older browsers, such as IE 6-8, will see a YouTube movie (as long as they have the Flash Player), so they will at least be able to see the video, while users with modern browsers will get the full native-video experience. Everyone gets the content, then, which is what your website is there for, after all.

The track element is a newer addition to the HTML5 family and is being implemented by Opera, Chrome and IE at the moment. It points to a subtitle file that contains text and timing information. When implemented, it synchronizes captions with the media file to enable on-demand subtitling and captioning; useful not only for viewers who are hard of hearing, but also for those who do not speak the language used in the audio or video file.
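
As a rough sketch of how track slots into the earlier video example, assuming subtitle files in the WebVTT format (the .vtt file names below are made up for illustration):

<video controls>
  <source src=best-video-ever.webm type=video/webm>
  <source src=best-video-ever.mp4 type=video/mp4>
  <!-- hypothetical subtitle files; kind, srclang and label describe each track -->
  <track kind=subtitles src=best-video-ever.en.vtt srclang=en label="English" default>
  <track kind=subtitles src=best-video-ever.fr.vtt srclang=fr label="Français">
  <!-- fallback content: -->
  Download the <a href=best-video-ever.mp4>MP4</a> version.
</video>

A supporting browser would offer each track by its label; the default attribute marks the one to enable unless the user’s preferences say otherwise.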

Semantics For Internationalization

Less woo! than the semantics for multimedia and games are the semantics for internationalization. It may surprise the cool kids in Silicon Valley to learn that a worldwide Web of people use languages other than English and even use different writing systems.

Languages such as Arabic and Hebrew are written right to left, unlike European languages, which are written left to right. On pages that use only one writing system, this doesn’t present a problem, but on pages with bi-directional (“bidi”) writing, browsers have to decide where to put punctuation, bullets, numbers and the like. Browsers usually do a pretty good job using the Unicode bidirectional algorithm, but it gets it wrong in some cases, which can seriously dent the comprehensibility of content.

HTML5 gives us a bdi element, which enables authors to override the Unicode bidirectional algorithm and make their text more comprehensible. For a further description of the problem and to see how bdi solves it, see “HTML5’s New bdi Element9” by Richard Ishida10, the W3C’s internationalization activity lead.
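
A typical case is a name supplied by a user, whose direction the author can’t know in advance. The sketch below follows the pattern used in the specification; the user names are placeholders:

<ul>
  <li>User <bdi>jcranmer</bdi>: 12 posts.</li>
  <li>User <bdi>hober</bdi>: 5 posts.</li>
  <li>User <bdi>إيان</bdi>: 3 posts.</li>
</ul>

Without bdi, the Arabic name may pull the colon and the number to the wrong side of it; wrapped in bdi, the name is isolated from the text around it.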

Some languages have scripts that are not alphabetic at all, but that express an idea rather than a sound. Occasionally, an author will have to assist readers with pronunciation for especially rare or awkward characters, usually by providing an alternate script in a small font above the relevant character. In print, this was traditionally done with a very small 5-point font called “ruby,” and HTML5 gives us three new elements for marking up ruby text: ruby, rt and rp.

For more information, see “The HTML5 ruby Element in Words of One Syllable or Less11” by Daniel Davis.
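
As a small illustration, here is the Japanese word “kanji” with a reading attached to each character. The rp elements wrap parentheses that only browsers without ruby support will display:

<ruby>
  漢 <rp>(</rp><rt>kan</rt><rp>)</rp>
  字 <rp>(</rp><rt>ji</rt><rp>)</rp>
</ruby>

An older browser would fall back to showing each reading in parentheses after its character, so the pronunciation hint survives either way.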

Structural Semantics

Most people are aware that HTML5 gives us many new elements to describe parts of a Web page, such as header, footer, nav, section, article, aside and so on. These exist because we Web developers actually wanted such semantics. How did the authors of the HTML5 specification know this? Because in 2005 Google analyzed 1 billion pages12 to see what authors were using as class names on divs and other elements. More recently, in 2008, Opera MAMA13 analyzed 3 million URLs to see the top class names14 and top IDs15 used in the wild. These analyses revealed that authors wanted to mark up these areas of the page but had no elements to do so, other than the humble and generic div, to which they then added descriptive classes and IDs.
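
To make that concrete, here is a sketch of a page skeleton written both ways. The class names in the first version are illustrative, of the kind those surveys turned up:

<!-- before: generic divs with descriptive classes -->
<div class="header">...</div>
<div class="nav">...</div>
<div class="article">...</div>
<div class="footer">...</div>
<!-- after: HTML5 elements that carry the same meaning natively -->
<header>...</header>
<nav>...</nav>
<article>...</article>
<footer>...</footer>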

(HTML5 Doctor has many articles about HTML5 semantics16, so we won’t bloat this article by going in depth here. Warning: some were written by me.)

The new semantics were built to degrade gracefully. For example, consider what the specification has to say17 about the new figure element:

The figure element represents some flow content, optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document.

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc…

This isn’t a new idea. HTML3 proposed a fig element18 (which never made it into the final HTML 3.2 specification). It looked like this:

<FIG SRC="nicodamus.jpeg">
   <CAPTION>Ground dweller: <I>Nicodamus bicolor</I> builds silk snares</CAPTION>
   <P>A small hairy spider.
   <CREDIT>J. A. L. Cooke/OSF</CREDIT></P>
</FIG>

There’s a big problem with this. In browsers that do not support fig (and none do), the image wouldn’t be displayed because the fig element would be completely ignored. The contents of the credit element would be displayed, because it’s just text. So you’d get a credit with no image on older browsers.

In HTML5, you would code the same example like so:

<figure>
   <img src="nicodamus.jpeg">
   <figcaption>
      <p>Ground dweller: <i>Nicodamus bicolor</i> builds silk snares.</p>
      <p>A small hairy spider.
      <small>J. A. L. Cooke/OSF</small></p>
   </figcaption>
</figure>

Unlike the aborted HTML3 syntax, the HTML5 version is backwards-compatible: a browser that doesn’t “know” about the figure element will still show the img and the text inside figcaption (as the HTML3 credit element would similarly display its content). Note that we’re using the redefined small element, instead of minting a new credit element. Remember that19 “Small print is also sometimes used for attribution.”

HTML5 also gives us a new figcaption element. Originally, the specification’s authors tried to reuse caption, as suggested in HTML3, but there were legacy problems, because caption had previously only been a child of table.

One of the design principles on which HTML5 is based20 is that new features should degrade gracefully21. When they can’t, the language allows for fallback content. It tries to reuse elements rather than mint new ones — but it’s a pragmatic language: when minting something new is necessary, it does so.

Interactive Semantics

The structural elements of HTML5 currently don’t do much in visual browsers, although software that sits on top of browsers (such as screen readers) is starting to use them (see “HTML5, ARIA Roles, and Screen Readers in March 201122” and “JAWS, IE and Headings in HTML523”).

Other elements do have a visual effect. The details element24, for example, is a groovy interactive element that functions as “a disclosure widget from which the user can obtain additional information or controls.”

Most browsers will implement it as an “expando box”: when the user clicks on some browser-generated icon (such as a triangle or downwards-pointing arrow) or the word “Details” (which can be replaced by the author’s own rubric in a child summary), the element will slide open, revealing its details within. The details could be a full description of an image or graph, a description of a complex table, advanced options for a search form, or just about anything else. This is a common need on the Web today, now made native and obviating the need for custom JavaScript.
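
A minimal sketch of the markup, with invented content; summary supplies the author’s label in place of the default “Details”:

<details>
  <summary>Advanced search options</summary>
  <p><label>Published after <input type=date name=after></label></p>
  <p><label><input type=checkbox name=exact> Match exact phrase</label></p>
</details>

Adding the open attribute would render the widget expanded from the start. Browsers without support simply show everything, so no content is lost.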

Most of us have seen HTML5’s new form semantics25. Most of these are attributes of the input element, thereby ensuring graceful degradation to <input type=text> in older browsers. New elements include datalist26, output, progress and meter27.
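
The sketch below gathers several of these into one form; the field names, ids and values are made up for illustration:

<form onsubmit="return false" oninput="total.value = a.valueAsNumber + b.valueAsNumber">
  <!-- new input types fall back to type=text in older browsers -->
  <input type=email name=contact placeholder="you@example.com">
  <!-- datalist offers suggestions for a free-text field -->
  <input name=browser list=browsers>
  <datalist id=browsers>
    <option value="Opera">
    <option value="Firefox">
    <option value="Chrome">
  </datalist>
  <!-- output displays the result of the calculation in oninput -->
  <input type=number id=a value=1> + <input type=number id=b value=2> =
  <output id=total for="a b">3</output>
  <!-- progress shows task completion; meter shows a measurement within a range -->
  <progress value=30 max=100>30%</progress>
  <meter min=0 max=100 value=75>75%</meter>
</form>

The text inside progress and meter is the fallback for browsers that don’t draw the widgets.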

Do We Have The Right Semantics?

So, we have many new semantics, but are they the right ones? After all, the Google research on which they were based was conducted in 2005 — quite some time ago! Perhaps the semantics are already somewhat behind the times? Many have noted that they’re document-centric rather than application-centric. Do we need more application-centered semantics, such as a login or share element, or some kind of modal element for modal dialogue boxes?

I don’t know; I’m not an app developer. But at least HTML is a “living standard,” and so these can be added if strong enough use cases are presented to the Working Group.

I think most coders would welcome a new way to embed images that respond to the device’s context. Borrowing from the video element, which displays source video according to what media queries instruct, I can imagine a new element such as picture:

<picture alt="angry pirate">
   <source src=hires.png media="(min-width:800px)">
   <source src=midres.png media="(min-width:480px)">
   <source src=lores.png>
      <!-- fallback for browsers without support -->
      <img src=midres.png alt="angry pirate">
</picture>

This would pull in hires.png for widescreen devices, midres.png for devices between 480 and 800 pixels wide, and lores.png for everything else, thereby rendering moot the question that designers currently ask themselves, “Do I make every browser download a high-resolution image and then squash it down for small screens, thus wasting bandwidth, or do I send a low-resolution image to every browser and scale it up for big screens, potentially sacrificing quality?”

Taking a leaf from the other popular semantics we’ve seen, there would be a fallback in the middle — in this case, a conventional img element — so everyone would get the right content.

Sending the right-sized image to devices without wasting bandwidth is one of the knottiest problems in cross-device and responsive design at the moment. Perhaps we’ll see a solution to this in HTML6. At the moment, the best solutions, which include Matt Wilcox’s Adaptive Images28 and Filament Group’s Responsive Images29, require JavaScript and tweaks to the server’s htaccess file. The worst solutions require old-fashioned techniques, such as browser-sniffing30, now rebranded as “device detection” but still the same old user-agent string-pattern matching, which is hilariously fragile31, not future-proof or scalable, and straight out of the days of “Best viewed in Netscape Navigator at 800 × 600” badges on websites.

When, Where, Who?

A lot of data depends on three pieces of information: when, where and who?


HTML5 has a time element (which has been a bit of a battleground32 lately). This enables you to annotate a human-readable date with an unambiguous machine-readable one. It doesn’t matter what goes between the tags, because that’s the content for people to read. So, you could use either of the following:

<time datetime="1982-07-18">The day the woman I love was born</time>

<time datetime="1982-07-18">Priyanka Chopra’s birthday</time>

Whichever you choose, the machine would still know the date you mean because of the datetime attribute, formatted as YYYY-MM-DD. If you wanted to add a time, you could: separate the time from the date with a T, put the time in 24-hour format, and end with either a Z (meaning UTC) or a time-zone offset. So, 2011-11-13T20:00Z would be 8:00 pm on 13 November 2011 UTC33, while 2011-11-13T23:26:00.083-05:00 would be 23:26 and 83 milliseconds in the time zone lying 5 hours behind UTC. A Sri Lankan-localised browser could use this information to automatically convert dates into the Buddhist calendar. Search engines could use timestamps to help evaluate “freshness”34.
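
A few datetime values side by side, as a sketch; the human-readable text between the tags is invented:

<!-- date only -->
<time datetime="2011-11-13">13 November 2011</time>
<!-- date and time in UTC, marked by the trailing Z -->
<time datetime="2011-11-13T20:00Z">8:00 pm UTC on 13 November</time>
<!-- date and time with an explicit offset, five hours behind UTC -->
<time datetime="2011-11-13T23:26:00.083-05:00">late that night in New York</time>

Note that a value carries either a Z or an offset such as -05:00, never both.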

It’s perhaps surprising that, even though geolocation35 is so prevalent now, we don’t have a location element that simply takes three attributes: latitude, longitude and (optionally) altitude. It would be great to be able to write the following:

<location lat=51.502064 long=-0.131981>London SW1A 4WW</location>

The browser would then offer to show you a map or give you directions from the current GPS location or any other location-based service.

(Since I gave the talk that this article is based on, Ian Hickson, the HTML5 editor, said that he expects to add a new <geo> element36. If I could choose, I’d prefer place, so I could wear a T-shirt with the slogan “I’ve got the time if you’ve got the place”.)

HTML3 had a person element37, “used for names of people to allow these to be extracted automatically by indexing programs,” but it was never implemented. In HTML4, the cite element38 could be used to wrap names of people, but that use has, controversially, been removed in HTML5 (see “Incite a Riot39” by Jeremy Keith). In HTML5, then, we’re left with no way to unambiguously denote a person. People’s names are, however, a hard problem to solve. Whereas dates and times have well-known standardized ISO formats (YYYY-MM-DD and HH:MM:SS.mmm, respectively), and location is always latitude, longitude and altitude, personal names are harder to break down into useful parts: there are Russian patronymics, Indonesian single-word names40, multiple family names, and Thai nicknames to consider. (See Richard Ishida’s excellent article “Personal Names Around the World41” for more information and discussion.)

The new data element, which was introduced as a replacement for time42, has a value attribute that passes machine-readable information, but it has no required or implied format, so there is no way for a browser or search engine to know, for example, whether 1936-10-19 is a date, a part number or a postal code.

Microdata

HTML5, like HTML4, is extensible (but not in the oh-so-dirty eXtensibility way of XML formats, so loathed by the Working Group). You can use the tried and tested microformats, which use HTML classes, or the full RDFa specification, which doesn’t validate in HTML4 or HTML5. Because RDFa was considered to be too hard for authors to write (Google has conducted research that finds that authors make 30% more mistakes with RDFa43 than with other formats), HTML5 specifies microdata, a mechanism for adding common semantics via agreed-upon markup patterns. HTML5 Doctor has more information on HTML5 microdata44, and Opera 11.6045 supports the Microdata DOM API46.

Like microformats and RDFa, the extra semantics added to the markup make sense only if you have a cheat sheet that tells you what each piece means. This means that the data has to point to a vocabulary that tells any crawler how to interpret the lump of data it finds. For microdata, there is the newly established Schema.org47, which is “a collection of schemas, i.e. HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers.”
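
As a sketch of what this looks like in markup, assuming the Person type from Schema.org (the person and her details are invented):

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span> is a
  <span itemprop="jobTitle">Web developer</span>. Her site is
  <a itemprop="url" href="http://example.com/">example.com</a>.
</div>

Here itemscope starts an item, itemtype points to the vocabulary that defines it, and each itemprop labels one property of that item.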

Do Semantics Matter Anyway?

Now that more and more markup is generated by JavaScript, some people are tempted to think that semantics don’t matter. We see various products marketed as HTML5 which simply make divs fly around the screen with JavaScript  —  simple DHTML techniques unchanged from 10 years ago.

I’ve even seen some Web pages with no markup at all. Some frameworks emit skeletal HTML with empty body tags and inject all the HTML with script. If you’re squirting some minified JavaScript down the wire, with no markup at all, you’re closer to Flash than you are to the Web.

In the same way that 47 minutes is (apparently) too long to struggle making a CSS layout, at which point you should just give up and use tables48, some people suggest that thinking about which element to use is a waste of time. “There are two types of developers: those who argue about div’s not being semantic and those who create epic shit” writes Thomas Fuchs49, as if the two activities were mutually exclusive.

A better argument is that no software cares about or consumes semantics anyway, so why bother? This isn’t true (work is underway already to map assistive technologies to new semantics50), but even if it were true, it ignores that this is a chicken-and-egg argument. It assumes that no new search engine will ever come to the market and be able to use new elements, or that browsers will never release new versions that can make use of these semantics, and that developers will write no new extensions  —  in short, it assumes that the evolution of the Web is complete.

Semantics do matter. Semantics communicate meaning, and once that is established, machines can do something meaningful with that data, without having to develop and use algorithms to guess. A browser extension might allow a user to jump straight to the nav with a single keystroke. It can do this because it looks for nav rather than having to employ heuristics to find a div with an id or class that would suggest it’s being used as navigation (assuming the author decided to use something sensible like nav, navigation, sidebar, or menu  —  and a restaurant site with a div called “menu” might be a list of foods rather than other pages…ah, the ambiguity of natural language). A crawler might dynamically assemble articles on a timeline. There are many more possibilities than my meagre imagination can dream up.

The Web is based on simple technologies, mashed up together to bring surprising results  —  results which have certainly surpassed the inventors’ original intents or expectations. The Web will continue to do so. What makes the Web so great, so flexible and so powerful is the fact that content is in open formats that can be parsed and mashed up in new and surprising ways.

These can happen if the content is marked up for meaning by the author  —  and if the language has the right markup elements for authors to use as a vocabulary. HTML5 extends our vocabulary. We’ll need more words  —  and those will come about with HTML6 etc.

If, like me, you believe the Web to be a system that works across browsers, across operating systems, across devices, across languages, that is View-sourcable, hackable, mash-uppable, accessible, indexable, reusable, then we need to ensure that we use the small number of semantic tools at our disposal properly, and we’ll all benefit.

(This article is based on a talk I gave at the Fronteers Conference51.)

About the Author


Bruce53 evangelizes Open Web Standards for Opera54. He wrote the book Introducing HTML555 together with Remy Sharp. The book points out the good and bad parts of the HTML5 specifications and shows you how to use the language; some areas of the spec are discussed only theoretically, as they’re not yet implemented anywhere. It’s the first full-length book on HTML5 (New Riders, now in its 2nd edition).


  1. 1

    Oh, that imaginary picture tag is getting me all hot under the collar. I would love to see that become a reality. Haven’t had time to read this properly yet – instapapered for later though.

    • 2

      Honestly, the picture tag was a big turnoff for me. This triples the amount of work to simply embed an image, nevermind all the troubles that come with formatting that image. Margins change on smaller devices, and there is every size of handheld device so you need to make at least 3 versions of each image, likely more.

      Not ideal.

      • 3

        What’s the ideal alternative then? Serve a gigantic image that works from desktop down to mobile and cripples user data allowances? Current solutions involve either JS or htaccess hacks that aren’t ideal. I’d love to do this natively with the suggested fallback. This is how audio and video tags work, where the appropriate version is served for the end context, why not do this natively with images?

        It’s difficult to discuss an imaginary tag but I’d say you don’t always have to load 3 versions every time. Perhaps some areas of a layout will only require the standard img tag if they are small enough to begin with.

        • 4

          I wrote a PHP script that dynamically resizes the image based on the user agent. So my img src is something like image.php?f=myimage.jpg

          This way I only have 1 high res version of each image, and the image.php script generates new ones on demand and caches them for future requests.

          Also, if you want to display the same image but at different sizes on the same page (great for thumbnails), you could pass a width or height to image.php to force a size.

    • 5

      Is it too late to submit that picture tag to get it in the spec? It’s just what we need to do responsive images.

      Every solution available still requires desktop browsers to download 2 images, or uses an .htaccess file and feels like a hack (and no one has ported them to nginx).

      Having a picture tag would allow authors who care enough to create and serve small images to mobile browsers to do so, and everyone could still use the img tag.

  2. 6

    Bruce,

    Because pedantry is part of my make-up, I just want to point out a slightly incorrect statement you made with regard to RDFa.

    There is in fact a W3C specification which allows for the conformant use of RDFa with HTML5 – see: HTML+RDFa 1.1. Not everyone considers RDFa “too hard for authors to write”, only some folks do, and it should be pointed out that the latest version of Drupal (Content Management System) has RDFa support baked in under the hood. Small, under-visited sites such as Yahoo!/Flickr, Best-Buy and Whitehouse.gov also use RDFa today, so don’t be too quick to accept the WHATWG hype that RDFa is a lost cause – that is a biased and incomplete assessment of the real situation on the web.

    Outside of that, great recap. Semantics Matter.

    • 7

      heh. Pedantry is great. So note that I said “Because RDFa was considered to be too hard for authors to write”. I don’t say that statement is true, or that I agree with it, but merely that’s the perception that led to microdata being specced.

    • 8

      We’re planning on using Drupal7 to produce RDFa in output for our worldwide sites. One thing we’re discussing internally is whether to contribute to schema.org for ontology around learning classes, courses, programmes of study.
      The use of HTML5 and semantic tags to enable accessibility for screen readers is a big plus. Also the ability to mark, say, Art photos from our Arts sites to make them discoverable by Google users is a win.

  3. 9

    My current dilemma is one where my supervisor believes that using HTML5 will magically promote the website higher in search results. I even pointed out to him the words of John Mu from Google in the thread “Does semantic HTML5 matter to Google yet?”, and he still thinks using HTML5 should position the website better in search results.

    I’m having a hard time explaining to him that it takes much more than HTML5 to receive a better placement in search results. Any suggestions?

  4. 12

    So much semantics talks these days when everyone should focus on content and content quality….

    • 13

      This is a seriously great article. Semantics complement great content, so it is important to have both. For once, I don’t need to explain my point because the article has already done it for me.

    • 14

      Well….

      Seeing as Semantic was introduced to classify data properly and to focus on content rather than endless similar tags that wrap everything from header to footer.

      Google is tuning their search engine in a way to make SEO obsolete, content is what is going to matter.

  5. 15

    I am currently reading the article… but I clicked on the link to the video (http://www.youtube.com/embed/xzMUyqmaqcw?rel=0); it’s crazy

  6. 16

    Honestly, cookie-based adaptive images are little less evil than UA sniffing.

    It’s way better to load all img async with ajax according to device width and pixel ratio, and provide a fallback. It’s more work, but way better than playing with cookies and htaccess.

    There are lots of big problems with cookies :

    - JavaScript on the top
    - race conditions
    - if mobile first, users on browser will see crappy img on their first visit (the most important one)
    - if not, mobile on 3G will take minutes to load your site (and leave)

    Last but not least, it assumes smaller = slower, which is false (I know, deferred loading is the same, but there’s no real solution to detect bandwidth cross-browser)

    What we need is not a new img tag (well, we need it, but it’s not the most important) : what we need is a bandwidth indicator in the user agent.

    • 17

      JS is not required to set the cookie in AI (although it’s recommended over the alternative ‘false image in CSS’ method).

      AI handles the no-cookie + mobile first setting (and cookie race condition) better than you think because of the fallback browser sniffing it does – which is not as nasty as you’re thinking, as it only needs to detect a desktop OS, and that’s simpler and more reliable than I’d expected.

      Take a look at the changelog if you want more details on what it’s doing and why :)

      AI’s not perfect, but it’s not as bad as you’re making the cookie-based technique out to be.

      We do need some ‘real’ solutions though, rather than work-arounds like AI.

  7. 18

    I do believe that many designers think that having an animated button instead of a Flash button counts as semantic SEO. In my opinion that doesn’t matter: there are many HTML5 sites that don’t even have good SEO content. They just look good with HTML5 and no Flash, but that isn’t going to put your site in the first places of a Google search for [keyword]. SEO is a huge piece of programming included in the website.

    Having an awesome design with animation in HTML5 doesn’t help you that much in SEO; it is just going to be visible on mobile devices, but nothing more.

    Of course you can design something awesome and have good semantic SEO and good SEO programming on the site, but I’ve seen in many CSS galleries that some designers say they do SEO with their sites when many of them are only 1 page with many pictures and little text (but awesome animation with jQuery).

    Think about it: having cool sliders (jQuery), buttons or cool fonts will NOT put your site in the first places for [search keyword]; it just helps your site be visible on iPhone, iPad or Android.

    As Designers we must design and code our sites so they can be visible to all of our potential customers.

    [BTW I love HTML5 and FLASH SUX]

    Damian Rivas | hybridixstudio.com

    • 19

      Stupid comment, crappy arguments and final spam at the end of the comment.

      Congratulations, you are an ignorant who believes to be wise. The next time, shut up and read a book; you need it.

  8. 20

    I hate priyanka chopra :p

  9. 22

    Have been building standard document style sites for years so the new HTML5 semantics are a breath of fresh air and help standardize markup. I do see the need for more tags to define application UI and functionality, would help clean the slate of heavy nested divs for the sake of visual elements. CSS3 of course will help with this, but the battle of the browsers on this front seems to never end.
    The new “data-” attribute is interesting to me, and since this attribute is custom I am sure we will start seeing trends happen especially in application development.
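
    As a minimal sketch of the custom data- attributes mentioned above: any attribute whose name starts with data- is valid on any element, and scripts can read it through the element's dataset property in browsers that support it (getAttribute works elsewhere). The attribute names and values below (data-product-id, data-state) are made up for illustration.

    ```html
    <!-- Hypothetical application markup: the data-* names are illustrative only -->
    <ul id="cart">
      <li data-product-id="1234" data-state="in-stock">Teapot</li>
      <li data-product-id="5678" data-state="backordered">Kettle</li>
    </ul>
    <script>
      // dataset exposes data-product-id as productId (hyphens become camelCase)
      var items = document.querySelectorAll('#cart li');
      for (var i = 0; i < items.length; i++) {
        console.log(items[i].dataset.productId, items[i].dataset.state);
      }
    </script>
    ```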

    0
  10. 23

    Great article! Read earlier this month that time, datetime, and pubdate were eliminated from the HTML5 spec recently. Not sure I’m excited about this, but that’s the way it is.

    0
    • 24

      The time element has returned to the HTML5 specification although the pubdate attribute is still under threat.

      0
  11. 25

    Structure != Semantics. From 1994: ‘lets nip this one in the bud before the masses get ahold of it.’

    http://1997.webhistory.org/www.lists/www-html.1994q4/0094.html

    From this article I can see they weren’t successful.

    0
  12. 27

    I think this article is really useful to me! Thanks to the writer.

    0
  13. 28

    Nice article Bruce, you do a great job of covering some of the new *functionality* in HTML5, but fwiw I think the new *semantics* are a separate issue, on which I’ll rant briefly below :)

    IMO conflating semantics with functionality misses the point of the previous articles, which were more focused on the structural elements.

    On the structural elements, you say: “Most people are aware that HTML5 gives us many new elements to describe parts of a Web page, such as header, footer, nav, section, article, aside and so on. These exist because we Web developers actually wanted such semantics. How did the authors of the HTML5 specification know this? Because in 2005 Google analyzed 1 billion pages to see what authors were using as class names on divs and other elements. ”

    This is, unfortunately, a myth, and I hope we can put an end to it. Please? :) When doing research for my own HTML5 book (http://itsninja.com/html5book/) I asked Hickson about the new structural elements, and he said he (and a few others) added them *prior* to any research. As far as I could tell from the WHATWG archives, Hickson drew up the new elements on a whiteboard in 2004, without any consultation or research.

    But the research backs him up though, right?

    Well, on the face of it, you’d think so (apart from the vast absence of any classes!). But then you look at what the spec actually says (and you’d know this better than most), and it’s vastly different to what web designers and other authors actually want. For example, how is sectioning a product of the research? It’s an old concept from 1991; yet it’s the foundation of the new structural elements. How is header and footer in the spec — which are intended for any section, not specifically the “overall” page section — at all similar to how authors actually use them? (Which is far more like ARIA banner and contentinfo landmarks.) Who wanted an “aside” element for both sidebars and pull-quotes? Who has been using “article” to denote comments, or forum posts, or widgets (!) as the spec suggests?

    This is the biggest problem with HTML5’s semantics: Hickson says they’re just to make styling easier, they’re just what everyone has been doing, they’re just what the research says. But then you look at the spec, and it’s nothing like what we’ve been doing at all! I’m not surprised it’s such a flustercuck of confusion: when you tell everyone it’s what they’re already doing, give elements names which intuitively *look* like it’s what we’ve been doing, and then write them up in the spec in pretty esoteric ways which don’t reflect reality at all, you end up with a mess. And that’s what we’ve got.

    Then when it comes to the supposed benefits of what search engines or AT etc is going to do with it… well, what are they going to do with it? Do they follow the incorrect and extremely messy real world usage, or do they follow the not-at-all-followed-by-authors spec? (I also noted in a WHATWG exchange with yourself Hickson said he doesn’t think UA’s will ever do anything with them, which is maybe for the better!)

    And I know from the WHATWG archives what a mess the whole lack of “main” or “content” element is regarding what authors actually want — I quoted your WHATWG comments in my book :)

    HTML5 semantics, in terms of structural elements, are dead on arrival. No search engine here and now asked for, or needs, them as currently spec’d (they’ve defined what they actually want with Schema.org), no non-HTML-book-writing HTML authors understand them correctly (ask an HTML5-aware designer what their backwards compatible document outline looks like), and no one *will* be able to use them meaningfully as per the spec because the spec has become, as Hickson often says he wants to avoid, a work of fiction in terms of real world use of these elements.

    As for nav, it’s an accessibility disaster for IE8 and below with JS disabled (according to Yahoo 2010 research 1-2% of all traffic to their sites have JS disabled: http://developer.yahoo.com/blogs/ydn/posts/2010/10/how-many-users-have-javascript-disabled/). Those users don’t get the JS fix, styling blows up, and the page now has very broken navigation. So for a theoretical future benefit we do real harm now. I’ve found it quite perplexing that the web standards community has been happy to implicitly declare JavaScript mandatory for a significant subset of users — I don’t think this issue has gotten the attention it deserves. Fortunately the ARIA landmark seems a much safer & saner approach.

    Anyway, rant over, but it’s really bugged me to see the HTML5 elements (and the story behind them) taught in a way that isn’t reflected in the spec, or in their original creation.

    I enjoyed the rest of the article (especially those ideas about baking some responsive image stuff into HTML), and if the WHATWG and W3C can sustain their marriage of convenience for the foreseeable future and we actually get a HTML6/HTML.next/HTML-uh-it’s-still-versionless-but-updated-HTML then it will be interesting to see what makes it in :)

    0
    • 29

      Great comment Luke.

      I can’t answer regarding your assertion that “Hickson drew up the new elements on a whiteboard in 2004, without any consultation or research” as you reference a private email that I haven’t seen.

      But your statement that “This is the biggest problem with HTML5’s semantics: Hickson says they’re just to make styling easier” seems contrary to this mailing list conversation from August 2004 (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2004-August/002114.html) in which James Graham of Opera says

      “I think that explicit markup for document sections is good (although I would like to see more single-purpose elements such as header or footer to provide addiational semantics for UAs – the ability to seperate out sitewide elements from page-specific content is, in my opinion, particularly important)”

      and Hixie replies

      “Yeah, header and footer or similar elements are almost certainly going to be defined at some point, along with content (for the main body of the page), entry or post or article to refer to a unit of text bigger than a section but smaller than a page, sidebar to mean a, well, side bar, note to mean a note… and so forth. Suggestions welcome.
      We’ll probably keep it to a minimum though. The idea is just to relieve the most common pseudo-semantic uses of div.”

      Seems to me that here, they’re talking about semantics and UAs rather than just styling. (If styling were the primary use-case, we’d certainly have some kind of content element, I’m sure).

      You can see the history of the aside element (as sidebar was eventually renamed) in a post by Lachy at http://lists.w3.org/Archives/Public/public-html/2009Sep/0257.html

      “As for nav, it’s an accessibility disaster for IE8 and below with JS disabled”

      I agree that it’s a styling disaster for old IE without JS. The accessibility (in an AT sense) should be unaffected. Pragmatically, if a user is surfing the web with IE6 and JS off, his or her experience of the Web is pretty nasty, and about to get a whole lot worse, not because of unstyled nav but because of increased JS use on websites. (I’m not suggesting that this is laudable, just that it is the way we seem to be headed.)
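
      For readers following the nav discussion, here is a rough sketch of the two approaches being compared. The first relies on the widely used document.createElement trick (placed in the head) so that old IE will apply CSS to the unknown nav element, which is exactly what fails when JavaScript is off; the second uses a plain div carrying the ARIA navigation landmark, which needs no script. The class name and link targets are placeholders.

      ```html
      <!-- Approach 1: nav element; IE8 and below need the script to apply CSS to it -->
      <!--[if lt IE 9]>
      <script>document.createElement('nav');</script>
      <![endif]-->
      <nav role="navigation">
        <ul>
          <li><a href="/">Home</a></li>
          <li><a href="/articles/">Articles</a></li>
        </ul>
      </nav>

      <!-- Approach 2: div with an ARIA landmark; styleable everywhere without script -->
      <div class="nav" role="navigation">
        <ul>
          <li><a href="/">Home</a></li>
          <li><a href="/articles/">Articles</a></li>
        </ul>
      </div>
      ```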

      0
  14. 30

    I was going to say something similar. Maybe not so offensively, but yeah!

    My frustration is:
    1) new markup where existing markup will work fine. Figcaption is a perfect example. We already have a caption tag. If caption is within a fig it belongs to a fig and can be styled according to how you want figs captioned. There’s no need for both caption and figcaption.

    2) re-purposing b/i/u… some things should just DIE! The problem is that we want to promote using this markup because people just won’t stop using it. The problem is that it will continue to be misused and have ZERO meaning for the most part. People will just continue to use it as the shortest tag they can tack on for some other purpose (e.g. making headings, corners, borders, placeholders, etc.). You’re putting the em-Phasis on the wrong syl-Lable.

    3) confusion around div/section/article/aside. I’ve been to a dozen sites with articles about HTML5 and how to use it properly. I’ve been to conventions and heard very smart people in the design and development community speak. The one thing that seems consistent is a lot of people just don’t get when or how to use these elements correctly. Oh sure, everyone has an article about how to use them, but then weeks later they usually come back and say “my bad, here’s the real story…” or “well, it’s still a work in progress.” Yes, one can fall back to the good old div… trusty old divitis… or one can move forward into the section? article? no definitely section… no, this is on the right side so it must be an aside? Oh crap, no that’s layout, not semantics… where was I again?

    It seems simple. To me, div = generic container, section = section of content, article = syndicated content/feed.

    Divs can hold anything.
    Sections imply a logical hierarchy (an outline).
    Articles imply reuse.
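
    That reading could be sketched in markup roughly as follows. This is only an illustration of the mental model above, not a ruling from the spec; the headings and class name are invented.

    ```html
    <div class="wrapper">        <!-- generic container, no meaning of its own -->
      <article>                  <!-- self-contained, could be syndicated as is -->
        <h1>Review: HTML5 semantics</h1>
        <section>                <!-- thematic part of the article, appears in the outline -->
          <h2>Structural elements</h2>
          <p>…</p>
        </section>
        <section>
          <h2>Text-level elements</h2>
          <p>…</p>
        </section>
      </article>
    </div>
    ```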

    But then why doesn’t it FEEL simple when writing it? Is it just the newness? I wouldn’t think so since people have been writing “div class=’section/article’” for years now. Or is it taking us back to our high school English class and the pain of diagramming sentences that causes us so much grief? That’s a lot of what this feels like some days, trying to figure out if you’re following the correct grammar rule and whether some grammar nazi is going to whack you with the big book of HTML5 at some point.

    I guess what I’m looking for from HTML5 is for it to simplify how I design and develop. I like section, I like new form controls, I like headers and footers (why no body/copy/content?). I know I can fall back to div, but that’s no good in the long run. I need to be able to say without any guesswork/wailing/gnashing of teeth, that [X] is the tag I use [HERE].

    0
    • 31

      “We already have a caption tag. If caption is within a fig it belongs to a fig and can be styled according to how you want figs captioned. There’s no need for both caption and figcaption.”

      Unfortunately, that’s not the case. It was looked at, but it caused problems in older browsers. If you used a figure inside a table, and the figure’s caption were marked up re-using the caption element, the browser would think that it is the caption for the table rather than the figure.

      There was a similar problem when the working group then tried to re-use legend for figures (and what is now summary in details). If you’ve ever tried to style a form legend in IE6, you’ll be glad they didn’t reuse it.
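
      To illustrate the table case described above, here is a small sketch (the file name and text are placeholders). With a dedicated figcaption element there is no ambiguity about which caption belongs to the table and which to the figure, whereas a caption element nested in the cell is what, per the explanation above, older browsers would have attached to the table.

      ```html
      <table>
        <caption>Quarterly results</caption>   <!-- caption of the table -->
        <tr>
          <td>
            <figure>
              <img src="chart-q1.png" alt="Bar chart of first-quarter sales">
              <figcaption>Figure 1: first-quarter sales</figcaption>  <!-- caption of the figure -->
            </figure>
          </td>
        </tr>
      </table>
      ```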

      “It seems simple. To me, div = generic container, section = section of content, article = syndicated content/feed.

      Divs can hold anything.
      Sections imply a logical hierarchy (an outline).
      Articles imply reuse.

      But then why doesn’t it FEEL simple when writing it? Is it just the newness? ”

      I *think* it’s newness. But I really don’t know.

      ” re-purposing b/i/u… some things should just DIE! ”

      well, die is a bit strong. But it does seem to me to be navel gazing, as I hope I indicated in the article.

      0
      • 32

        It seems schizophrenic to say in one breath “we’re going to add all these tags that aren’t supported stylistically in older browsers” and then say “we’re not going to use this tag because it’s not supported stylistically in older browsers.”

        At what point do we say “IE6 is dead” and forget about the nuances for that platform and actually move FORWARD with a sane normalized standard that isn’t hacked together because of some niggle in a copy of IE5.2 mac, IE6 or Netscape Gold that happens to still be floating around for 1.4% of the population?

        While the content should still display in those older browsers I don’t think we should put a burden on authors due to outdated implementations. Just do like some developers have been doing, like Andy Clarke, and give older browsers flat content with no CSS, or a serious CSS reset and a hint to upgrade to a newer browser.

        0
        • 33

          Do what I do. If I detect an outdated browser, I forward the user to the following message:

          “You are using an outdated browser. This site requires the latest version of . Please go to to download the latest version of , you lazy incompetent twig.” :-)

          0
    • 35

      Well said Michael, I think you pretty much nailed it with this part, I’ve been developing with this in mind since the new elements became available

      “div = generic container, section = section of content, article = syndicated content/feed.”

      0
  15. 36

    “It may surprise the cool kids in Silicon Valley to learn that a worldwide Web of people use languages other than English and even use different writing systems.”

    Do you really fucking think that? Is your command of English too limited to understand what a bunch of fatuous bullshit that is?

    0
  16. 37

    That “removed elements and attributes”-link needs an http:// in its href attribute. It links back to the current article right now.

    0
  17. 39

    Why the fu*k is everyone using unquoted attributes now? I’ve started to miss XHTML.

    0
    • 40

      It’s up to you. No style is preferred over the other. To quote, or not to quote: that is the question. Do as you like; the browsers don’t care (so neither does the validator)
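
      As a quick illustration, these three forms are all valid HTML5 and are treated the same by browsers. Quotes only become necessary when the value is empty or contains spaces or certain characters such as quotes, =, <, > or a backtick; the class names here are arbitrary.

      ```html
      <p class=intro>Unquoted attribute value</p>
      <p class='intro'>Single-quoted attribute value</p>
      <p class="intro">Double-quoted attribute value</p>

      <!-- Quotes are required here because the value contains a space -->
      <p class="intro lead">Two classes</p>
      ```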

      0
  18. 41

    Stop worrying about semantics and all that so much and start building great sites. All this talking just leads to talking. Getting tired of the HTML5 bubble. Playtime’s over, get to work.

    0
    • 42

      But it’s our jobs to care about doing it right… Soooo, yeah.

      0
    • 43

      Like Bruce said in the article it’s important that we do this properly, I don’t see this as talking unnecessarily or messing about, (although some of it does seem a tad confusing and makes my head spin). It’s important that we mark content up properly, and as Elliot says – it’s our job to care about doing it right!

      0
  19. 44

    There are also revisions to the structure, syntax, and semantics of HTML, some of which Lachlan Hunt covered in “A Preview of HTML 5.” …
    4 The elements of HTML — HTML5
    The semantics of the protocol used (e.g. HTTP) must be followed when fetching external resources. (For example, redirects will be followed and 404 responses …
    Don,
    bestbusinessbrands.blogspot.com/

    0
  20. 45

    I want to make two points:
    1. Embedding YouTube videos already serves HTML5 to non-Flash-supporting devices, so if you have a YouTube video, you don’t need the fallback on your site as it already does that.
    2. “you’re closer to Flash than you are to the Web” – Flash IS part of the Web. It’s not a web standard, but it is, undeniably, part of the Web.

    A good article, thanks.

    0
    • 46

      1) Use-case: I want to script my own player in capable browsers rather than use the default YouTube one.

      2) Philosophically I’d say that Flash, like Silverlight and PDF, is content that’s delivered through the web, but it isn’t part of the web in the sense that it needs a browser plugin to render and doesn’t easily interact with other web technologies.

      0
  21. 47

    SCUMBAG SMASHINGMAG

    Adds comment voting icons. Doesn’t let you vote.

    0
  22. 49

    Some great new features. Looking forward to using them on my new portfolio design over at stephencostello.com in the coming weeks.

    0
  23. 50

    This is great. Thank you.

    0
  24. 51

    Hi Bruce

    I was wondering why your site looks like a highschool student’s site ?

    For quite a while I have been trying to learn the semantic side of HTML5, and I have found that all the famous 5 star developers keep preaching principles they don’t practice.

    Do you yourself make use of the principle you teach others to do ?

    If the answer is NO, then why not ?

    0
    • 52

      Hi Paul

      Yup, you’re right. Feel free to disregard everything I write about aesthetics, proportion, composition, colour balance and whitespace.

      You’ll find I haven’t written anything about that, because I’m not a designer and have never claimed to be. (Which is why my site looks crap. I am however revamping it over Xmas to be slightly less unpleasant.)

      As an aside, I generally find that the worst kind of pseudo-designers are those that disregard or discount the content because they dislike the colours.

      0
      • 53

        Hello again Bruce

        First of all, I apologize if my words are a little too harsh; English is not my first language.

        I don’t want to disregard everything you and many others point to. I want to learn new stuff and at the same time see real working examples that reflect real life.

        Aesthetics aside, what percentage of the HTML5 stuff that you (and many others) teach is being used in your own work?

        I feel like readers are being used as beta testers here. Look at this site (smashingmagazine) it doesn’t use ‘time’ element, there is no ‘article’ or ‘section’ – the divs are all over.

        How do you think readers should feel ?

        0
        • 54

          “I feel like readers are being used as beta testers here. Look at this site (smashingmagazine) it doesn’t use ‘time’ element, there is no ‘article’ or ‘section’ – the divs are all over.”

          Fair point. I don’t control Smashing Mag’s templates. Look at my site. Or any WordPress site using the 2011 theme. Reddit uses time, as does Github. HTML5Doctor, which I co-curate, uses all the new elements as appropriate.

          0
    • 55

      As a highschool student myself, I’d like to point out that is a really offensive thing to say about highschool students’ websites. (Admittedly, my site does look exactly like Bruce’s).

      0
  25. 56

    Such a nice article. Really great.

    0
  26. 57

    I’m trying to get myself into HTML5 as best as I can, and in all honesty, aside from its rather ambiguous nature (which makes it rather hard to get any solid information on), I’ve yet to find a resource that really puts best practices forward in a nice, succinct manner.

    Instead, there just seems to be an overwhelming amount of people creating things that give you more options, but end up confusing you because you have no grasp of the core mechanics. Do I use Modernizr? What about LESS? What are the hundreds of thousands of “but wait…” situations that crop up from all of this interplay with non-standardized technology?

    In the end, I just go home defeated and more confused. The perfectionist in me wants to put the best foot forward, but with everyone chiming up on Feature A and Feature B, the message itself isn’t coming through loud enough.

    1
  27. 58

    No double quotes on the tag elements?

    0
  28. 60

    Well Bruce one thing that we share is the LOVE towards Priyanka. After your affair with Peepli Live go for Dostana to see the super gorgeous Priyanka.

    Coming back to HTML5. I have been tracking you and Molly a lot on this topic and also reading the book that you co-authored with Remy Sharp. I really love the content that you guys create and I have learnt a lot about HTML5 and the new semantics from you guys.

    Even though you had to face some harsh comments keep up this awesome task that you have taken up to enlighten people like us.

    Enjoy Dostana. ;) Get in touch if you come over to Bombay some time :)

    0
  29. 61

    That’s a great post. Thank you for sharing.

    0
  30. 62

    What the fuck is wrong with you people!? If you are still designing and coding your sites to include the older versions of browsers, then you are a part of the PROBLEM and should be taken out back and beaten with the Flash Bible! This is the MAIN reason why HTML5 will take 20 years to evolve, because designers are soooo worried about what grandma and grandpa are going to think of their website when they visit it with IE3. If people are too stupid or unwilling to upgrade their browsers, then that is their loss and should get with the times!

    0
    • 63

      I do agree with you, but you must consider that it’s not about the grampa with IE3.

      It’s the crappy companies that don’t let their workers install anything. This means that YOU could find yourself in a ‘big’ company that still uses XP and has only IE6 installed.

      So, even at work, you will browse stuff. And if the developer of a website you visit looks at the monthly access rates by browser (and you are adding % to IE6) and sees that a nice amount of people are still using it, they WILL code for older browsers. Because visits can mean more income. And if the visitors can’t see the website, or it looks too crappy, this means less money.

      And no one wants to lose money.

      But yeah, I’m a mobile dev and I already don’t give a sh!t about what people using Windows Mobile (6.5 and lower) or Blackberry (6.0 and lower) see. Because there is more work to do than trying to code for those things that come with the OS and that call themselves ‘browsers’.

      0
  31. 64

    I don’t care much about the media attribute on the source element of a picture. I’d rather see width, height and maybe also a size attribute tell the browser the dimensions of an image so that it can choose the right image based on the space it has available or the current bandwidth of the connection. (So you get a hires image on a retina screen if connected to WIFI, and lowres on a laptop with a 3G dongle.)

    I also wish browsers would use the width and height attributes of the old IMG element as the native width and height of the image until the image starts loading. (To keep not yet loaded images from collapsing if CSS width:100%; height:auto override the native width and height to make them shrink to the available space.)
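
    For what it’s worth, the responsive image features that were later standardized in HTML cover much of this wish. The sketch below uses the srcset and sizes attributes on img so the browser can choose a file from the candidates based on layout width and screen density, while width and height give it the dimensions it needs to reserve space before the image loads. File names and breakpoints are placeholders, and whether a given browser also weighs bandwidth is left to that browser.

    ```html
    <img src="photo-800.jpg"
         srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
         sizes="(max-width: 600px) 100vw, 50vw"
         width="800" height="600"
         alt="Description of the photo">

    <style>
      /* Scales the image down to its container; modern browsers derive the
         aspect ratio from the width and height attributes above */
      img { max-width: 100%; height: auto; }
    </style>
    ```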

    0
  32. 65

    Gunnar Bittersmann

    December 13, 2011 7:08 am

    A couple of days ago, my first attempt to comment has never shown up, so I give it another try.

    Yeah, those redefinitions of previously presentational elements in HTML5 are questionable. The spec once read “The i element represents a span of text […] whose typical typographic presentation is italicized.” In English typography, it is. But you cannot write a spec for the world-wide web based on the habits of just a small part of the world, can you?

    Now the reference to italicized presentation has been dropped. There’s still a reference to “Western texts” though.

    b, i, s and u elements make sense with class attributes describing their contents. Which might raise the question: why not just spans with classes?

    In the section “When, Where, Who?”:

    “put the time in 24-hour format, terminated by a Z, along with any time-zone offset” is not correct. It can be either but not both. ‘Z’ is just an abbreviation for ‘+00:00’.

    In “2011-11-13T23:26.083Z-05.00 would be 23:26 pm and 83 milliseconds” the seconds ‘00’ are missing; it has to be 2011-11-13T23:26:00.083-05:00. No ‘Z’; ‘:’ in the time-zone offset.

    “[…] in the time zone lying 5 hours before UTC” is confusing to me. Noon in New York (-05:00) is 5 hours _later_ than noon in London (UTC). Isn’t that the time zone lying 5 hours _behind_ UTC?
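
    The corrected forms described above would look something like this in markup. This is a sketch that follows this comment’s reading of the format: an offset or a Z, never both. The visible text inside each element is free-form and only illustrative.

    ```html
    <!-- Local time with an explicit offset: 23:26:00.083, five hours behind UTC -->
    <time datetime="2011-11-13T23:26:00.083-05:00">13 November 2011, 11:26 pm</time>

    <!-- The same instant expressed in UTC, where Z stands for +00:00 -->
    <time datetime="2011-11-14T04:26:00.083Z">14 November 2011, 4:26 am UTC</time>
    ```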

    0
    • 66

      Gunnar

      Thanks for your comment. Yup, I made a mistake leaving the seconds out in that example. The format of the time element recently changed, so maybe that’s where the Z confusion came from. I’m (pretty) sure it was right when I checked it.

      As I write this it’s 19:01 GMT and 14:01 in New York. Thus, New York is GMT -5.00

      0
  33. 67

    I like html5 semantics. It has very neat and clean markup. It requires a lot of practice and it is great for responsive design. Start using now!

    0
  34. 68

    The main difference between XML and earlier versions of HTML was this very issue of semantics: the writer of an XML document could choose any elements they wanted and, having “invented” those elements, knew exactly what they “meant”. To say it another way, HTML is a *formatting* language, while XML is a *semantic* language. With HTML5, we can see more and more the need for semantic documents.

    Back in 2006, I remember having a discussion about this with another web developer, and the question was raised: why not replace HTML with XML (which is fully style-able via CSS), where a user can *name* an element whatever they like and associate with that element whatever *meaning* they like.

    The main problems with the idea were that:
    1. There is an implicit “meaning” in most HTML elements. For example, a browser knows to replace its window’s caption with the contents of the HTML <title> element. With XML it wouldn’t know this. As another example, how would a browser know that a custom XML <image> element should be replaced with an actual image, obtained from an HTTP “GET” request to the URL identified by the “src” attribute? With HTML, this is implied.
    2. Search engines would have difficulty in determining the most relevant content. For example, currently search engines prioritise the textual content of <h> heading elements. With a custom heading element, it wouldn’t know that the contents are more indicative of the content than a <span> element.

    In an ideal world, I think all markup should be semantic, and I can think of two solutions to the above:
    1. Have standard “xml:” namespaced elements. For example, a custom XML <image> tag could have an “xml:source” attribute (e.g., “http://site.com/image.png”) and an “xml:type” attribute (corresponding to the MIME-type, e.g., “image/png”). Due to the “xml:” namespacing, the browser would know to embed the content (this “embedding” of content could also work with other media types, whether they be CSS files, video files, etc). Similarly, there could be an <xml:script> element with the implied meaning of containing content.
    2. Have two types of XML documents – the first being the XML document containing your content, the second document “explaining” to search engines what each element “means”.

    Anyway, head trip.

    0
  35. 69

    I’m still suspicious of HTML5. Support is still very patchy and getting it to play nicely in some browsers requires a bit of work. I have this theory; HTML5 is like an unstable mental patient…it requires support to get it to behave properly in public. And if something requires extra effort to get it to cooperate then would you fully trust it? I can understand the argument that we should embrace the future but I’ll be stepping into the future with caution for the short term…

    I want HTML5 to be my new best friend. We’ll see…c’mon browsers, get supporting it fully!

    0
