When One Word Is More Meaningful Than A Thousand


You may be wondering why you’re reading about good old semantics on Smashing Magazine. Why doesn’t this article deal with HTML5 or another fancy new language, anything but plain, clear, tired old semantics? You may even find the subject boring, being a devoted front-end developer. You don’t need a lecture on semantics. You’ve done a good job keeping up with the Web these last 10 years, and you know pretty much all there is to know.

I’m writing about HTML semantics because I’ve noticed that semantic values are often handled sloppily and are sometimes neglected, even today. A huge void remains in semantic consistency and clarity, begging to be filled. We need better and more consistent naming conventions and smarter ways to construct HTML templates, to give us more consistent, clearer and readable HTML code. If that doesn’t sound like paradise, I don’t know what does.

The Bare Necessities Of Semantics

With all the functional mumbo jumbo hidden away in HTML5, some of us seem to have forgotten what HTML is really all about. Native video support is considered way cooler than the new header tags, somewhat understandably, but from a semantic and structural point of view, these latter elements present the most valuable improvement.

Semantic importance got a serious boost when accessibility became a big deal to us Web developers. But its powers go way beyond making our content available to those lacking the skills to surf the Web in regular ways. For one, making content recognizable to all kinds of crawlers (but most importantly search engines) could greatly improve the results of search queries on the Web. Rather than wading through trailers, film websites and product pages, wouldn’t it be much nicer to filter reviews directly and find out how a certain film has been received? Currently, no trustworthy mechanism exists to recognize or filter a broad range of content types, which is a serious loss for the Web as a whole.

Screenshot
When looking for reviews, you don’t want to end up on a page with grayed-out links.

If all of that sounds like a far-off dream, then note that once you’ve distinguished between all the elements on your website, you will have little to no trouble styling or adding functional behavior to the page. The combination of context and proper semantics ensures a solid structure for all further front-end work, which is only made stronger by making sure every element is defined correctly.

The (Very Simple) Basics

Absolutely nothing is complex about semantics, and the basics have been preached for a long time now. A recap of the bare minimum won’t hurt anybody, though, so here it goes.

The HTML language has a range of tags with semantic meaning. If none of the available tags suits your needs, then two generic tags (span and div) are the HTML equivalents of the word “thing,” which can be used in combination with one or more classes to add (not necessarily standardized) semantic value to your elements. It’s the microformats idea without the actual microformats. Some basic examples:

  • Main navigation: nav.main (HTML5) or div.navMain;
  • An article: article (HTML5) or div.article;
  • Article header: article>header (HTML5) or div.article>div.header

That’s all there is to it, really. Adding semantic value is about choosing the correct tag(s) and/or applying the correct label(s) to an element. It really makes you wonder why applying this simple concept consistently to professionally developed websites has proven to be so difficult, even today.

If you don’t like the microformats ideology, you could also go all HTML5 and look at the HTML5 Microdata proposal [1]. What follows in this article reflects both methodologies equally, so the choice is entirely up to you.

Sampling The Web

To illustrate my point, I took some quick samples from some of today’s leading websites. By no means do these samples hold any scientific validity, nor is this a purposeful bash of the websites I’ve singled out. They are simply chosen because I believe they best represent their kind. I hope the data speaks for itself either way.

To grasp the semantic consistency within a website, I tried finding some common content types. Content types are easy to recognize and even easier to label. Before I get to the data, though, let’s look at one way we could label products in a Web store:

  • Product detail: div.product;
  • Products added to your basket: .basket li.product;
  • Promo product in a list: .categoryList .product.promo;
  • Etc.

Products are everywhere in a Web store, so it seems logical that the product class would reappear across the pages for every instance of a product on the website. After all, whether a product is located in a “Related items” list, added to a basket or shown in full doesn’t really change its semantic nature, so why change its structure or class name?
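As a rough illustration of why one recurring label helps, the sketch below uses Python’s standard html.parser module to collect every element that carries a given class. The page fragment and the ClassFinder name are made up for this example; only the class names (product, promo, basket, categoryList) come from the list above.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a promo product in a category list,
# a product in the basket, and a product shown in full.
PAGE = """
<ul class="categoryList"><li class="product promo">Teapot</li></ul>
<ul class="basket"><li class="product">Mug</li></ul>
<div class="product">Kettle</div>
"""


class ClassFinder(HTMLParser):
    """Records the tag name of every element that carries the wanted class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.found.append(tag)


finder = ClassFinder("product")
finder.feed(PAGE)
print(finder.found)  # ['li', 'li', 'div']
```

Because every instance shares the same class, a single lookup finds all three products, whatever tag or context they appear in.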

Screenshot
These are all products, appearing as variants or in different contexts.

For my sample, I picked five content types (story, product, video, person, blog post) and picked four websites to represent each content type. To check for semantic consistency, I looked at the labels on a shortlist (a list of content type instances) and the content type’s detail. The following table summarizes my findings:

Type      | Website            | Shortlist                     | Detail
----------|--------------------|-------------------------------|-------------------------------------------------
Story     | BBC [2]            | div.hpData                    | table.storycontent
Story     | New York Times [3] | div.story                     | div#article
Story     | CNN [4]            | ul.cnn_bulletbin li           | div.cnn_storyarea
Story     | MSN [5]            | li.ter                        | div.w649 (?)
Product   | Amazon [6]         | div.asinItem                  | -
Product   | Apple Store [7]    | li.product                    | div.product-selection
Product   | Play.com [8]       | div.info                      | div.dvd
Product   | YesAsia [9]        | div.item                      | div#productpage
Video     | YouTube [10]       | div.video-cell                | div.video-info
Video     | Vimeo [11]         | div.item                      | div.video_container_hd
Video     | Dailymotion [12]   | div.video                     | div.dmco_box
Video     | eBaum’s World [13] | div.mediaitem                 | div#videoContentContainer
Person    | Facebook [14]      | div.UIFullListing             | div.profile_top_wash and div.profile_bottom_wash
Person    | Last.fm [15]       | div.user                      | div.user
Person    | Virb [16]          | table.people td               | div#profile_wrapper.artist
Person    | Twitter [17]       | div#following_list span.vcard | div#profile
Blog post | Zeldman [18]       | -                             | -
Blog post | A List Apart [19]  | div.item                      | - or body.articles
Blog post | Jens Meiert [20]   | div.item                      | .content .col-1
Blog post | Webaim [21]        | div#features                  | div.section

Apart from Last.fm, none of the websites I checked got it right, even though the content types I chose were very easy to label. Apple and the New York Times came quite close, but some of the others are miles away from what you’d expect to find. And that’s just looking at the root tag of the content type. The structure and classes within are often even worse, bordering on complete randomness. Another thing to note is that blogs about Web design seem to score the worst.
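The comparison can be made mechanical. The sketch below is only one possible reading of “consistent”: it assumes a site passes when the root elements of the shortlist and the detail view share at least one class name. The four rows are copied from the table above; the function names are invented for this example.

```python
import re

# A few rows from the table: (website, shortlist root, detail root).
ROWS = [
    ("New York Times", "div.story", "div#article"),
    ("Apple Store", "li.product", "div.product-selection"),
    ("Dailymotion", "div.video", "div.dmco_box"),
    ("Last.fm", "div.user", "div.user"),
]


def root_classes(selector):
    """Return the class names on the last (innermost) part of a selector."""
    innermost = selector.split()[-1]
    return set(re.findall(r"\.([\w-]+)", innermost))


def is_consistent(shortlist, detail):
    """Assumed criterion: both views share at least one class name."""
    return bool(root_classes(shortlist) & root_classes(detail))


consistent = [site for site, short, detail in ROWS if is_consistent(short, detail)]
print(consistent)  # ['Last.fm']
```

Under this strict criterion, li.product and div.product-selection do not count as a match, which is why Apple only comes close.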

Think Components, Not Pages

There is, of course, not one single cause of this problem, nor is the solution simple. But you can make one important mental shift as a front-end developer to give your work more semantic consistency. The key is to stop thinking of a website as a collection of pages and to instead look for common components.

Front-end developers tend to work the same as designers: start with the home page, finish that, and then move on to the second wireframe — copy the reusable components, adapt if needed, and then repeat until all pages are done. This process requires a lot of copying, adapting and checking older pages to find reusable elements. It is a true killer of consistency — invoking spur-of-the-moment labels and destroying semantic consistency.

Because we want consistency, both in structure and semantics, focusing on a single component at a time is better. When you need to write the HTML code for a product, check each wireframe for variations within and across products. Write code that can handle all existing variants. Once that is done, you will have a consistent and solid model to describe your component that you can use wherever you want.

Making It All Happen

I know from experience that this mental shift takes some time to get used to, and the only way to get it working is to throw yourself in and practice. I’ll share some quick pointers to make the whole process a little less daunting.

Think Beyond Styling Needs and Performance

Option 1: .productList li or .products li

Option 2: ul li.product

Consider the example above. As Web developers, we’ve been taught that the first option should be preferred. From a performance and styling perspective, this is indeed the case. But putting on your semantic hat, you’ll notice that to recognize the list items in the first example as products, you need to make a deduction. Singling out all products on a page isn’t as easy as looking for the product class. Automated systems should also account for the possibility that a product is defined as a list item inside a parent that refers to a collection of products. Not such a big deal for the human brain, but writing a foolproof, fully automated implementation isn’t as easy.

On top of that, the second option allows for more flexibility because it makes it possible to drop instances of other content types into the same list without running into styling hell, while at the same time ensuring semantic integrity. It wouldn’t be the first time I was asked to merge a news and event shortlist into one big list just because there wasn’t sufficient content to warrant separate lists. The second option would give you a smaller headache, especially if you’re nearing an important deadline.
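To make the “deduction” concrete, here is a minimal sketch of an automated product counter built on Python’s standard html.parser module. The markup strings, the ProductCounter class and the parent_hints parameter are all hypothetical. The sketch also assumes well-formed markup without void elements, since it tracks nesting with a simple stack.

```python
from html.parser import HTMLParser

# Hypothetical markup for the two options discussed above.
OPTION_1 = '<ul class="productList"><li>Teapot</li><li>Mug</li></ul>'
OPTION_2 = '<ul><li class="product">Teapot</li><li class="product">Mug</li></ul>'


class ProductCounter(HTMLParser):
    """Counts products, optionally deducing them from a known parent class."""

    def __init__(self, parent_hints=()):
        super().__init__()
        self.parent_hints = set(parent_hints)
        self.stack = []  # class lists of the currently open elements
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        parent = self.stack[-1] if self.stack else []
        direct = "product" in classes
        # Deduction: a list item counts if its parent is a known product list.
        deduced = tag == "li" and bool(self.parent_hints & set(parent))
        if direct or deduced:
            self.count += 1
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()


def count_products(markup, parent_hints=()):
    counter = ProductCounter(parent_hints)
    counter.feed(markup)
    return counter.count


print(count_products(OPTION_2))                   # 2
print(count_products(OPTION_1))                   # 0
print(count_products(OPTION_1, ["productList"]))  # 2
```

With the second option, looking for the product class is enough. With the first, the counter finds nothing until it is told which parent classes imply products, and that list has to be kept up to date for every naming variant (.productList, .products and so on).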

Bottom line: try to minimize semantic deductions, and keep the code clear and simple. Pick unique class names for components, and stick with them throughout the entire project.

Don’t Mix Responsibilities

I know that many people like to mix wireframing, HTML and even design into one organic and homogeneous process. The downside to this is that you will have a hard time not compromising your work. When you’re designing, writing HTML and CSS is not priority number one; and once the design is done, you’ll find it tough to go back and rework your code to match HTML and CSS standards.

It’s also refreshing to try to build a website based purely on a set of wireframes, without the slightest notion of design. It helps you focus on meaning and makes it easier to spot components that are actually the same but could differ wildly design-wise. And if you’ve done it right, you’ll find that during CSS development, you don’t have to adapt the HTML at all, unless the design calls for major structural changes.

Try to build your HTML templates based on wireframes, and save the design and CSS for when your static HTML templates are completed.

Automate Your Job

Automation is a major key to success. Whether you use existing tools (such as a CMS) or build your own (as we do), automating the job of building static templates could help you to define a component once and reuse the code everywhere that the component is featured in your templates. The process itself (when done right) ensures semantic consistency and is sure to bring you new insight when constructing HTML templates.

At my current job, we built such a tool based on components (recurring HTML code blocks) and schemes (outlines of each template that refer to these components). Throw in some simple program logic (if and loop statements, parameters) and allow for proper nesting methods, and you’re good to go.
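The idea of components and schemes can be sketched in a few lines of Python. This is not the tool described above, just a minimal stand-in built on the standard string.Template class; the component markup, the parameter names and HOME_SCHEME are invented for illustration, and conditionals are left out for brevity.

```python
from string import Template

# Components: recurring HTML blocks, each defined exactly once (hypothetical markup).
COMPONENTS = {
    "product": Template('<li class="product"><h3>$name</h3><p>$price</p></li>'),
    "news": Template('<li class="news"><h3>$title</h3></li>'),
}


def render(component, **params):
    """Render a single component with its parameters."""
    return COMPONENTS[component].substitute(**params)


def render_list(entries):
    """Loop over (component, params) pairs and nest the results in one list."""
    items = "".join(render(name, **params) for name, params in entries)
    return f"<ul>{items}</ul>"


# Scheme: the outline of a template, which only refers to components.
HOME_SCHEME = [
    ("product", {"name": "Teapot", "price": "12.00"}),
    ("news", {"title": "Store opens"}),
    ("product", {"name": "Mug", "price": "4.50"}),
]

html = render_list(HOME_SCHEME)
print(html)
```

Because the product markup lives in one place, every template that refers to it gets the same structure and the same class name, and a change to the component reaches all templates at once.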

Semantic Consistency Across Projects

Finally, keep a list of components you’ve made over multiple projects. Many components will be relevant for each new project and will be semantically identical, meaning that the HTML structure should be identical as well (save some wrappers for visual CSS trickery, if you’re into that).

Once you have such a list of components, starting up a new project will be a lot faster, and you’ll have the added benefit of semantic consistency across all of your projects.

Banana ≠ Curvy Yellow Fruit

Semantics is all about identifying objects, but it goes beyond simply slapping a label on every object that comes your way. If you have a blog, and you randomly throw around classes like article, story, blogpost and news, then your website will lack semantic consistency, making all your hard work amount to very little. Semantics have no point when they are not applied consistently, even though today’s technology does very little with them — which, by the way, is no surprise given that locating a simple “product” on most Web stores is nearly impossible these days.

Screenshot
People looking for bananas might think twice before buying these.

The next time you begin a project, try to view a Web page as a collection of building blocks. Start by constructing these building blocks first, and worry about building the pages later. Come up with a single label for an HTML component, and use it consistently across your website. It won’t make styling harder, and it won’t affect the way you write JavaScript. Over time, you can take it further by being semantically consistent over multiple projects.

If your main job is to develop static HTML templates, try to automate your work. You’ll find that you spend more time writing flexible and solid HTML structures and less time copying and adapting code from point A to point B. It makes your job more interesting and makes the Web a better and more meaningful place.

Further Resources

  • Microformats [22]
    Summarizes the microformats ideology. Read more about using class names as semantic aids.
  • HTML5 Microdata [23]
    Explains how HTML5 is built to standardize the use of flexible semantics.

(al)

Footnotes

  1. http://dev.w3.org/html5/md/
  2. http://www.bbc.co.uk/
  3. http://www.nytimes.com/
  4. http://www.cnn.com/
  5. http://www.msn.com/
  6. http://www.amazon.com/
  7. http://store.apple.com/
  8. http://www.play.com/
  9. http://www.yesasia.com/
  10. http://www.youtube.com/
  11. http://www.vimeo.com/
  12. http://www.dailymotion.com/
  13. http://www.ebaumsworld.com/
  14. http://www.facebook.com/
  15. http://www.last.fm/
  16. http://www.virb.com/
  17. http://www.twitter.com/
  18. http://www.zeldman.com/
  19. http://www.alistapart.com/
  20. http://www.meiert.com/
  21. http://www.webaim.com/
  22. http://microformats.org/about
  23. http://dev.w3.org/html5/md/


Niels Matthijs spends his spare time combining his luxury life with the agonizing pressure of blogging under his Onderhond moniker. As a front-end developer he is raised at Internet Architects, investing plenty of time in making the web a more accessible and pleasant place.


  1. 1

    Great Article…

  2. 2

    Really helpful tips, thanks a lot.

  3. 3

    Very Useful…
    Thanks

  4. 4
  5. 5

    That’s Yellow Fatty Beans BTW… :)

  6. 6

    I love the “curved yellow fruit” a lot!
    Much better than a simple, standard “banana”… no?

  7. 7

    Great tips. Semantics is so important in development and I’m glad it’s finally becoming the norm in code :)

  8. 8

    I have to say, I was a bit skeptical as I read through your article, not really seeing the point of enforcing such strict semantics in class names and IDs… but by the end, I had considered how some of my own lack of semantics has caused me problems in the past, and how this style might have made things a bit easier.

    Thanks for giving us some practical, use-it-today advice as well as some previews of what HTML 5 will bring!

  9. 9

    Quite a different post for smashing but a really good and useful read.

  10. 10

    Great article! I agree with you about approaching a site as components rather than pages… Thanks for the tips.

  11. 11

    The problem with semantics is that there’s only little use for it at the moment. Even when using consistent semantics within one site, there’s no guarantee that a crawler will be able to extract the correct semantics, simply because it doesn’t know what you had in mind when creating it. There might be cultural differences, too. Let’s say you want to find a review of a movie. The article tags won’t help much, since previews, trivia or whatever else are also wrapped in article tags. The movie name itself will be extracted from the content. So the only thing we’ve won is that the search results only contain articles (if and only if all pages use them). This is for sure a more accurate result than now, but my point is that just because people use consistent semantics within their page doesn’t necessarily mean that the system as a whole works better.

  12. 12

    I even think that semantics is not only important for developers but also for designers. If designers focus more on reusable components, it will help to make the designs more consistent and flexible.

  13. 13

    This is true of course, but it’s a bit like the chicken and the egg. Without consistent semantics crawlers have no place to start from. So I believe it’s better to just lay an egg and hope that the chicken will hatch.

    Going back to the movie review example, if crawlers could just include (and give preference) to article.review sections found in pages we could make serious leaps forward. The problem remains that classnames are free to choose, but “review” is a pretty common and logical label. The chance of cross-site consistency is quite big (or at least, that’s what you would expect, the table above shows it’s not always the case).

    I guess the main point I’m trying to make here is that semantic consistency on the web is impossible to achieve if we can’t even make it happen on one site (or even one page). Apart from the semantic value there are other minor benefits, so there’s really no reason not to do it :)

  14. 14

    It’s funny you mention the chicken vs. egg thing, because British scientists recently came up with the definitive answer to that: the chicken was first.

    Following that line of thought, crawlers would have to take the first steps and start paying attention to semantic classnames. And that happens to be how I think about it.

    I agree with the main point of your article (semantic consistency on the web is impossible to achieve if we can’t even make it happen on one site), but I’m pretty sure it’s impossible to enforce consistent semantics over multiple sites unless it’s defined in a specification.

    Note that article.review is not guaranteed to be a movie review; it could be a review of a hotel, or a restaurant, etc… as well. Perhaps the parent element could/should indicate this with another classname? Or would one big wrapper e.g. article.movies or section.movies suffice? See, if there was a specification for these simplified microformats you’re proposing, I wouldn’t be having these questions.

  15. 15

    Whether it’s a movie or a hotel review is determined by the content. Current search engines do a good job of finding relevant content, so I don’t think that will be much of a problem.

  16. 16

    In an ideal world crawlers should get things rolling, but then there’s that thing about Mohammed and the mountain. If they don’t think it’s worth supporting, maybe we should make it worth supporting first.

    As for the movie review remark, you’re right of course. Though it would only be useful if you’re searching with the term “movie” explicitly added to your search query (or could it be that we could search filtering on content types in say … 5 years time?). Mostly I just search for “-title movie- review” as in most cases the movie title is unique enough to eliminate any hotel and car reviews. :)

    I completely agree on the simplified specs for microformats too. Defining them can’t be too hard, having enough authority as to be noticed and respected by the web development community as a whole is something entirely different I’m afraid. That said, if such a list should exist crawlers would have an easy time taking into account these specific classes.

    Let’s hope this article can kick-start something.

  17. 17

    Very useful indeed. Thanks!!

  18. 18

    I’m a strong advocate of constructing sites with building blocks, like you suggest. When I start a new site, I use one of two custom systems (one Rails app template, one WordPress theme) based on the idea that each piece of discrete data (a blog post, for instance) can be rendered with one of several templates: as an “entry” (individually as a full page), a “piece” of a collection (like a list or grid), or an “atom” (icon + link). “Pieces” are rendered as either list “items” (thin and horizontal) or “blocks” (a grid cell, rectangular). CSS with multiple classes then singles them out, so .movie.block is a movie rendered in a grid, and .movie.item is a movie in a list. This technique makes initial development much, much faster; tasks that would take days are done in a matter of hours, and the game afterwards is one of customization. But I’d never thought of this as a means to implement a semantic model. Thanks!

  19. 19

    I really liked the idea of this article, but I found that where I’m coming from I wasn’t really sure what/why/how enough.. there were not enough clearly explained examples for me to really understand what the benefits were supposed to be or how to implement them. I think it was aimed at people who have already played with the concepts in depth, because to me I was left a bit perplexed still.

  20. 20

    I’m confused by the premise of this article. It seems like you’re talking about two things:

    1. Global semantics, where a crawler/search engine recognizes markup conventions used across multiple sites in order to aggregate data.

    2. Local semantics, where you use conventions in your markup to better label its content.

    The problem with global semantics (the example with movie reviews) is that they will be gamed if gaming them has any value. If search engines implement hreview, so will spammers. Without a trusted middleperson, it becomes meta@name=”keywords” all over again.

    The problem with local semantics is that they’re constraining. The data structure you set up for one section might not work in another and probably won’t work in the one your designer hands you tomorrow. It’s possible, but very difficult to predict the future.

  21. 21

    Am I alone in thinking “duh!”? Maybe 5 years of doing this classifies as “experienced” enough to consider this routine, but I still feel like an uber novice. As I was going through learning this stuff five years ago, these concepts were drilled into my workflow everywhere I read. I guess I was hoping for more. :

  22. 22

    Thinking “duh” is great!

    But I do hope this article illustrates that not everyone is at that point yet. “Not everyone” being a serious understatement here. The data clearly shows that big players (and even leading web development bloggers) don’t apply the theory in real life.

    Wouldn’t it be great if in a couple of years’ time everyone is thinking “duh” and this article has become completely obsolete?

  23. 23

    Great points – thank-you!

  24. 24

    I do agree with you. Writing semantic code is a good practice and makes the mark-up logical and structured.

  25. 25

    Really useful to me – I want to start with HTML5 ASAP as that sounds like a giant leap in the right direction, and makes more semantic sense ;-) Cheers

  26. 26

    Songkick looks great. If they can distinguish their service from the already strong Last.fm events offering then it will be a very powerful service. I think the most exciting thing about Songkick is the thinking and theory behind what they are doing. There are obviously some very talented people in the Songkick team. Highlights such as their ‘Battle Of The Bands’ and their blog plugin ‘Bandsense’ show glimpses of their full potential and the powers of today’s emerging openmediaweb.

  27. 27

    Wow! Talk about a posting knocking my socks off!
