Responsive Image Container: A Way Forward For Responsive Images?


The aim of republishing the original article by Yoav is to raise awareness and support the discussion about solutions for responsive images. We look forward to your opinions and thoughts in the comments section! – Ed.

It’s been a year since I last wrote about it, but the dream of a “magical” image format that will solve world hunger and/or the responsive images problem (whichever comes first) lives on. A few weeks back, I started wondering if such an image format could be used to solve both the art direction and resolution-switching use cases.

I had a few ideas on how this could be done, so I created a prototype to prove its feasibility. The prototype is now available, ready to be tinkered with. In this post, I’ll explain what this prototype does, what it cannot do, how it works, and its advantages and disadvantages relative to markup solutions. I’ll also try to de-unicorn the concept of a responsive-image format and make it more tangible and less magical.

“Got Something Against Markup Solutions?”

No, I don’t! Honestly! Some of my best friends are markup solutions.

I’ve been participating in the Responsive Images Community Group for a while now, prototyping, promoting and presenting markup solutions. Current markup solutions (picture and srcset) are great and can cover all of the important use cases for responsive images. And if it were up to me, I’d vote to start shipping both picture and srcset (i.e. its resolution-switching version) in all browsers tomorrow.

But the overall markup-based solution has some flaws. Here’s some of the criticism I’ve been hearing for the last year or so when talking about markup solutions for responsive images.

Too Verbose

Markup solutions are, by definition, verbose because they must enumerate all of the various resources. When art direction is involved, they must also list the breakpoints, which adds to that verbosity.

Mixing Presentation and Content

A markup solution that is art direction-oriented needs to keep layout breakpoints in the markup. This mixes presentation and content and means that any layout changes would force changes to the markup.

Constructive discussions have taken place on how this can be resolved — particularly by bringing back the media query definitions into CSS — but it’s not certain when any of this will be defined and implemented.

Define Viewport-Based Breakpoints

This one comes up often among developers. For performance reasons, markup-based solutions are based on viewport size rather than on image dimensions: because the layout dimensions of images are not yet known to the browser by the time it starts fetching images, it cannot rely on them to decide which resources to fetch.

This means that developers will need to store some sort of table of viewports and dimensions on the server side, or maintain one in their head, in order to create images that are ideally sized for certain viewport dimensions and layouts.

While the addition of a build step could resolve this issue in many cases, it can get complicated in cases where a single component is used over multiple pages, with varying dimensions on each.
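To make that burden concrete, here is a minimal sketch of the kind of table a developer (or a build step) ends up maintaining for a single image component. The layout rules and pixel ratios below are hypothetical values picked for illustration, not recommendations.

```python
import math

# Hypothetical layout rules for one image component:
# (minimum viewport width in CSS px, fraction of the viewport the image spans).
LAYOUT_RULES = [
    (0, 1.0),      # single column: full width
    (600, 0.5),    # two columns
    (1024, 0.33),  # three columns
]

# Hypothetical device pixel ratios the site chooses to support.
PIXEL_RATIOS = [1, 2]


def image_width(viewport_width, dpr):
    """Ideal image width in device pixels for a viewport width and DPR."""
    fraction = LAYOUT_RULES[0][1]
    for min_width, frac in LAYOUT_RULES:
        if viewport_width >= min_width:
            fraction = frac  # rules are sorted, so the last match wins
    return math.ceil(viewport_width * fraction * dpr)


def build_table(viewports):
    """Map every (viewport width, DPR) pair to the image width it needs."""
    return {
        (vw, dpr): image_width(vw, dpr)
        for vw in viewports
        for dpr in PIXEL_RATIOS
    }
```

Every change to the layout rules, and every page that reuses the component at a different size, means regenerating these numbers and the image variants behind them.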

Results in Excessive Downloading in Some Cases

OK, this one I hear mostly in my head (and from other Web performance freaks on occasion).

From a performance perspective, any solution that’s based on separate resources for different screen sizes and dimensions will require entire images to be redownloaded if the screen size or dimensions change and a higher resolution is needed. Because most of that image data will very likely already be in the browser’s memory or cache, having to redownload everything from scratch would be sad.

All of the above makes me wonder (again) how wonderful life would be if we had a solution based on a file format that addresses these issues.

Why Would A File Format Be Better?

A solution based on a file format would do better for the following reasons:

  • The burden is put on the image encoder. The markup would remain identical to what it is today: a single tag with a single resource.
  • Automatically converting websites to such a responsive-image solution might be easier because the automation layer would focus only on the images themselves, rather than on the page’s markup and layout.
  • Changes to image layout (as a result of changes to the viewport’s dimensions) could be handled by downloading only the difference between the current image and the higher-resolution one, without having to redownload the data that the browser already has in memory.
  • Web developers would not need to maintain multiple versions of each image resource, although they would have to keep a non-responsive version of the image for the purpose of content negotiation.

This is my attempt at a simpler solution based on a file format that would relieve Web developers of much grunt work and would prevent useless image data from being downloaded (even when conditions change), while keeping preloaders working.

Why Not Progressive JPEG?

Progressive JPEG could serve this role for the resolution-switching case, but it’s extremely rigid. It comes with strict limits on the lowest image quality and, from what I’ve seen, is often too data-heavy. Also, the minimal difference between resolutions is limited and doesn’t give enough control to encoders that want to do better. Furthermore, progressive JPEG cannot handle art direction at all.

What Would It Look Like?

We’re talking about a responsive image container, containing internal layers that could be WebP or JPEG-XR or any future format. It uses resizing and cropping operations to cover both the resolution-switching and the art direction use cases.

The decoder (i.e. the browser) would then be able to download just the number of layers it needs (and their bytes) in order to show a certain image. Each layer would enhance the layer before it, giving the decoder the data it needs to show it properly at a higher resolution.

How Does It Work?

  1. The encoder takes the original image, along with a description of the required output resolutions and, optionally, directives on art direction.
  2. It then outputs one layer per resolution that the final image should be perfectly rendered in.
  3. Each layer represents the difference in image data between the previous layer (when “stretched” on the current layer’s canvas) and the current layer’s “original” image. This way, the decoder can construct the layers one by one, each time using the previous layer to recreate the current one, creating a higher resolution image as it goes.
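The three steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the prototype’s code: images are grayscale 2D lists of integers, “stretching” is nearest-neighbor upscaling (a real encoder would likely use a better filter), the per-layer sizes are supplied by the caller, and residuals are kept as plain signed integers with no compression.

```python
def resize_nearest(img, width, height):
    """Stretch or shrink a 2D grayscale image using nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def encode_layers(original, sizes):
    """Split `original` into a base layer plus residual layers.

    `sizes` lists (width, height) pairs from smallest to largest. Each residual
    is the layer's downscaled "original" minus the stretched previous layer.
    (With lossy layer codecs, an encoder would diff against the *decoded*
    previous layer instead, so that errors do not accumulate.)
    """
    layers = []
    previous = None
    for width, height in sizes:
        target = resize_nearest(original, width, height)
        if previous is None:
            layers.append(target)  # base layer: just a small image
        else:
            stretched = resize_nearest(previous, width, height)
            layers.append([
                [t - s for t, s in zip(t_row, s_row)]
                for t_row, s_row in zip(target, stretched)
            ])
        previous = target
    return layers


def decode_layers(layers, count):
    """Rebuild the image from the first `count` layers only."""
    image = layers[0]
    for residual in layers[1:count]:
        height, width = len(residual), len(residual[0])
        stretched = resize_nearest(image, width, height)
        image = [
            [s + r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(stretched, residual)
        ]
    return image
```

Stopping after fewer layers yields the intermediate resolutions, which is what would let a browser fetch only as many layers as its layout needs.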

Support for resolution-switching is obvious in this case, but art direction could also be supported by positioning the previous layer on the current one and being able to give it certain dimensions. Let’s look at some examples.

Art Direction

Here’s a photo that is often used in discussions about the art direction use case:

Obama in a jeep factory - original with context

Let’s see what the smallest layer would look like:

Obama in a jeep factory - cropped to show only Obama

That’s just a cropped version of the original. Nothing special.

Now, one layer above that:

Obama in a jeep factory - some context + diff from previous layer

You can see that pixels that don’t appear in the previous layer are shown as normal, while pixels that do appear there contain only the difference between them and the equivalent ones in the previous layer.
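Here is a sketch of how such an art-directed layer could be computed and decoded. It is not taken from the prototype: the placement parameters (`x`, `y`, `width`, `height`, describing where and at what size the previous layer sits on the current canvas) are illustrative assumptions, images are grayscale 2D lists, scaling is nearest-neighbor, and the placed area is assumed to fit inside the current canvas.

```python
def resize_nearest(img, width, height):
    """Stretch or shrink a 2D grayscale image using nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [[img[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def encode_art_directed(current, previous, x, y, width, height):
    """Layer data for `current`, given `previous` placed at (x, y) and
    stretched to width x height on the current canvas.

    Pixels covered by the previous layer are stored as differences;
    pixels outside it are stored as-is.
    """
    placed = resize_nearest(previous, width, height)
    layer = [row[:] for row in current]  # start from the raw pixels
    for dy in range(height):
        for dx in range(width):
            layer[y + dy][x + dx] = current[y + dy][x + dx] - placed[dy][dx]
    return layer


def decode_art_directed(layer, previous, x, y, width, height):
    """Invert encode_art_directed using the already-decoded previous layer."""
    placed = resize_nearest(previous, width, height)
    image = [row[:] for row in layer]
    for dy in range(height):
        for dx in range(width):
            image[y + dy][x + dx] = layer[y + dy][x + dx] + placed[dy][dx]
    return image
```

When the previous layer is an exact crop at the same scale, the covered region becomes all zeros, which is cheap to compress.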

And here’s the third, final layer:

Obama in a jeep factory - full context + diff from previous layer


Resolution Switching

A high-resolution photo of a fruit:

iPhone - original resolution

Here is the first layer, showing a significantly downsized version:

iPhone - significantly downsized

The second layer shows the difference between a medium-sized version and the “stretched” previous layer:

iPhone - medium sized

The third layer shows the difference between the original and the “stretched” previous layer:

iPhone - full sized

If you’re interested in more detail, including the container’s structure, you can go to the repository.

“But I Need More From Art Direction”

I’ve seen cases where rotation and image repositioning are required for art direction, usually in order to add a logo at different locations around the image itself, depending on the viewport’s dimensions.

This use case is probably better served by CSS. CSS transforms can handle rotation, while CSS positioning, along with media-specific background images, could probably handle the rest.

Note: If your art direction is special and can’t be handled by either one of these, I’d love to hear about it.

How Is It Fetched?

This is where things get tricky. A special fetching mechanism must be created to fetch this type of image. I can’t say that I have figured that part out, but here’s my rough idea on how it could work.

My proposed mechanism relies on HTTP ranges, similar to the fetching mechanisms of the <video> element, when seeks are involved.

More specifically:

  • Resources that should be fetched progressively should be flagged as such. One possibility is to add a progressive attribute to the element that describes the resource.
  • Once the browser detects an image resource with a progressive attribute on it, it picks the initial requested range for that resource. The initial range request could be:
    • a relatively small fixed range for all images (like 8 KB);
    • specified by the author (for example, as a value of the progressive attribute);
    • some heuristic;
    • based on a manifest (which we’ll get to later).
  • The browser can fetch this initial range at the same time that it requests the entire resource today, or even sooner, because the chances of starving critical path resources (including CSS and JavaScript) are slimmer once the payloads are of a known size.
  • Once the browser has downloaded the image’s initial range, it has the file’s offset table box, which maps byte offsets to resolutions. This means that once the browser has calculated the page’s layout, it will know exactly which byte range it needs in order to display the image correctly.
  • Assuming that the browser sees fit, it could heuristically fetch follow-up layers (i.e. of higher resolutions) even before it knows for certain that they are needed.
  • Once the browser has the page’s layout, it can complete fetching all of the required image layers.
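Assuming a simplified offset table (a hypothetical structure here: a list of `(layer_width, end_offset)` entries, smallest layer first, where `end_offset` is the last byte of that layer), choosing what to fetch once layout is known boils down to a lookup plus an HTTP `Range` header. The 8 KB initial range mirrors the example in the list above; the function names and table shape are illustrative.

```python
INITIAL_RANGE_BYTES = 8 * 1024  # fixed initial range, one of the options above


def initial_range_header():
    """Range header for the first request (header, offset table, first layers)."""
    return f"bytes=0-{INITIAL_RANGE_BYTES - 1}"


def layers_needed(offset_table, display_width, device_pixel_ratio=1.0):
    """How many layers are needed to render at the given layout size.

    offset_table: list of (layer_width, end_offset), smallest layer first.
    """
    required = display_width * device_pixel_ratio
    for index, (layer_width, _) in enumerate(offset_table):
        if layer_width >= required:
            return index + 1
    return len(offset_table)  # even the largest layer is too small: fetch all


def follow_up_range_header(offset_table, count, already_have):
    """Range header for the bytes still missing, or None if nothing is missing.

    already_have: number of bytes received so far, starting at byte 0.
    """
    end = offset_table[count - 1][1]  # inclusive last byte of the last layer
    if end < already_have:
        return None
    return f"bytes={already_have}-{end}"
```

A real implementation would also have to cope with servers that ignore `Range` and reply with the full resource (status 200 instead of 206).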

The mechanism above will increase the number of HTTP requests, which in an HTTP/1.1 world would introduce some delay in many cases.

This mechanism may be optimized by defining a manifest that describes the byte ranges of the image resources to the browser. The idea of adding a manifest was proposed by Cyril Concolato at the W3C’s Technical Plenary / Advisory Committee meetings week last year, and it makes a lot of sense, borrowing from our collective experience with video streaming. It enables browsers to avoid fetching an arbitrary initial range (at least once the manifest itself has been downloaded).

Adding a manifest will prevent these extra requests for everything requested after the layout, and it might help to prevent them (using heuristics) even before the layout.

Creating a manifest could be easily delegated either to development tools or to the server-side layer, so that developers don’t have to manually deal with these image-specific details.
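As an illustration of delegating this to tooling, a build step could emit a manifest from the encoded layer sizes. The JSON shape below (each image URL mapped to per-layer widths and the last byte needed for each) is purely hypothetical; no manifest format has been defined, and the URL is made up.

```python
import json


def build_manifest(images):
    """Build a JSON manifest (hypothetical format).

    images maps an image URL to (header_size, layers), where header_size is
    the byte length of the container header plus offset table, and layers is
    a list of (layer_width, layer_size_in_bytes), smallest layer first.

    Each entry records the last byte (inclusive) a browser must fetch to
    render the image at that layer's width, since every layer builds on the
    ones before it.
    """
    manifest = {}
    for url, (header_size, layers) in images.items():
        end = header_size - 1
        entries = []
        for width, size in layers:
            end += size
            entries.append({"width": width, "end": end})
        manifest[url] = entries
    return json.dumps(manifest, sort_keys=True)
```

For an image with a 200-byte header and a 5,000-byte first layer, the manifest would tell the browser to request `bytes=0-5199` up front, skipping the arbitrary initial range.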

“Couldn’t We Simply Reset the Connection?”

In theory, we could address this by fetching the entire image and then resetting the connection once the browser has all the necessary data, but that would most likely introduce serious performance issues.

Here are the problems with resetting a TCP connection during a browsing session:

  • It terminates an already connected, warmed-up TCP connection, whose set-up had a significant performance cost and which could have been reused for future resources.
  • It causes at least a round-trip time’s worth of data (the time it takes for the browser’s reset to reach the server) to be sent down the pipe. That data is never read by the browser, which means wasted bandwidth and slower loading times.
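To put a rough number on the second point: the data sent after the reset is on the order of the connection’s bandwidth-delay product. This is a back-of-the-envelope estimate that assumes the connection is running at full speed; the link figures below are arbitrary examples, not measurements.

```python
def wasted_bytes(bandwidth_mbps: int, rtt_ms: int) -> int:
    """Approximate bytes still sent during one round trip after a reset.

    Bandwidth-delay product: (bits per second) x (seconds) / 8 bits per byte,
    kept in integer arithmetic.
    """
    return bandwidth_mbps * 1_000_000 * rtt_ms // (1000 * 8)


# Example: a 5 Mbit/s link with a 100 ms round trip.
print(wasted_bytes(5, 100))  # 62500
```

That is several times the 8 KB initial range mentioned earlier, all of it thrown away.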

Downsides To This Approach

There are a few downsides to this approach:

  • It involves touching and modifying many pieces of the browser stack, which means that standardization and implementation could be painful and take a while.
  • The monochrome and print use case cannot be addressed by this type of a solution.
  • The decoding algorithm involves per-layer upscaling, which could be processing-heavy. Therefore, decoding performance could be an issue. Moving this to the GPU might help, but I don’t know that area well enough to judge. If you have an opinion on the subject, I’d appreciate your comments.
  • Introducing a new file format is a long process. As we have seen with the introduction of previous image formats, the lack of a client-side mechanism makes this a painful process for Web developers. Because new file formats start out being supported in some browsers but not others, a server-side mechanism must be used (hopefully based on the Accept header, rather than on the User-Agent header). I’m hoping that this new file format’s simplicity and its reliance on other file formats to do the heavy lifting will help here, but I’m not sure they will.
  • As discussed, it would likely increase the number of requests, and could introduce some delay in HTTP/1.1.
  • This solution cannot address the need for “pixel-perfect” images, which is mainly needed to improve decoding speed. Even if it did, it’s not certain that decoding speed would benefit from it.
  • Relying on HTTP ranges for the fetching mechanism could cause problems with intermediate cache servers that don’t support them.
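The Accept-based negotiation mentioned above could look roughly like this on the server. The MIME type `image/x-responsive-container`, the `.ric` extension and the paths are placeholders invented for this sketch (no type has been registered), and the parser handles only explicit listings with `q` values, not the full Accept grammar. The non-responsive fallback is the extra version developers would keep for content negotiation.

```python
RIC_TYPE = "image/x-responsive-container"  # hypothetical MIME type


def accepts(accept_header, mime_type):
    """True if the Accept header explicitly lists mime_type with q > 0.

    Wildcards such as image/* are deliberately ignored: a browser that has
    never heard of the format would still send them.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != mime_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def choose_variant(accept_header, base_path):
    """Pick the responsive container or the non-responsive fallback."""
    if accepts(accept_header, RIC_TYPE):
        return base_path + ".ric", RIC_TYPE
    return base_path + ".jpg", "image/jpeg"
```

Whichever variant is served, the response should carry `Vary: Accept` so that intermediate caches don’t hand the container to browsers that can’t decode it.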

So, Should We Dump Markup-Based Solutions?

Not at all. This is a prototype, showing how most of the responsive-image use cases could be solved by such a container.

Reaching consensus on this solution, defining it in detail and implementing it in an interoperable way could be a long process. The performance implications on HTTP/1.1 websites and decoding speed still need to be explored.

I believe this might be a way to simplify responsive images in the future, but I don’t think we should wait for the ideal solution.

To Sum Up

If you’ve just skipped to here, that’s OK. This is a long post.

To sum it up, I’ve demonstrated (along with a prototype) how a responsive-image format could work and how it could resolve most responsive-image use cases. I also went into some detail about which other bits would have to be added in order to make it a viable solution.

I consider this to be a long-term solution because some key issues need to be addressed before it can be made practical. In my opinion, the main issue is decoding performance, with the impact of downloading performance on HTTP/1.1 being a close second.

Continuing to explore this option is worthwhile, but let’s not wait for it. Responsive images needed an in-browser, real-life solution two years ago, not two years from now.

Yoav Weiss is a developer who likes to get his hands dirty fiddling with various layers of the Web platform stack. Constantly striving towards a faster Web, he's trying to make the world a better place, one Web performance issue at a time. He recently prototyped the picture element in a Chromium build as part of the Responsive Images Community Group. You can follow his rants on Twitter or have a peek at his latest prototypes on GitHub.

  1. 1

    It would be nice to have a standard whereby web browsers simply looked for the correct image: with the default image being src="image.png", the browser could automatically look for alternate versions such as src="image.dpi72.png", src="image.dpi300.png" or src="image.dpi600.png".

    But from real-world experience, a lot of people use a CMS to add images, and these people do not have the skills or the tools to resize images; finding 4000px-wide photos displayed at 300px wide is typical when sorting out client problems. Of course, higher-end websites, those with a budget, would be able to get someone capable of producing scaled image options.

    So the end solution would be to have a file that is scaled on the server, by PHP for example, just like PHP detecting whether the browser is a mobile, tablet or desktop.

    • 2

      A good server-based solution is . The only problem with this is that it just scales the image rather than crafting a different aspect ratio depending on screen size.

      • 3

        With Pixtulate you can define focal points and no-crop boundaries to achieve art direction in both the scaling and cropping of responsive images. This approach will also adapt the image to any desired new aspect ratio in your design if you so wish.

        I am not really sure a brand new file format is needed here, but I agree with Yoav that current markup-based approaches seem very verbose… including the newly proposed srcset and picture. Any new standard or approach should really allow designers to do less, not more, and still accomplish the desired effect. In my opinion, adapting the image to the device’s screen size or resolution is a content negotiation problem, and content negotiation should be extended to include images. The basic protocols and mechanisms already exist (HTTP, HTML, URI); we just need to standardize their use for responsive images.

        • 4

          That looks like an interesting solution, it would be great if some popular CDNs would implement this kind of feature. It would make the move so much easier for people who already rely on CDN based delivery.

  2. 5

    Why don’t we have an option whereby we target the user’s device dimensions to determine whether it is a mobile, tablet or desktop? Based on that, we could tell the browser which image size to download, because downloading a whole image and then cropping it to whatever size is not mobile-friendly. I don’t like downloading resources, in this case images, that I will not use or that are too large. I think we should focus on building intelligent browsers that know what image size to download.

    In other words, assuming we are able to determine what device is accessing our website, the browser then loads all images based on the device class (mobile, tablet, desktop), which means that on the server we would have three images.

  3. 6

    I am very intrigued by your proposal here, and it shows much promise. But there are some things that worry me.

    First off, the picture you’ve taken as an example is simple in content, if I may say so: basically the only important bit there is the individual, as he conveys the entire message of the image itself. So cropping to the smallest layer is easy, as repetitive features such as his legs and the background can be cropped with virtually no loss of information. But imagine a photo of a group of objects, all of which share equal value relative to the image itself. In the best-case scenario, you would have a compressed group picture, such as some tourists smiling, which would still give you a headache and a very constrained ‘smallest layer’ dimension. Worst case? Imagine a picture of the stars with labels…

    Now, my second issue is with the ‘fruit’ picture you’ve shown. I myself am quite new to the industry and I might be totally wrong on this, but I do not understand why we don’t use SVGs or other vector formats. They’re lightweight, mathematical and resize just about perfectly. Any time you have to put in a logo, some iconography or something of that nature, why don’t we use vector images?

    Thanks for the article! Great read.

    • 7

      You can understand why we don’t use vector formats if you trace one landscape image at the highest possible fidelity in Adobe Illustrator :)

      • 8

        Are you serious?

        I think you misunderstood my previous claim regarding the usage of vector images. I think I clearly specified using vector art for logos, icons or anything of that nature. I did not say it would be better to trace images on a high-fidelity setting.

        And truly speaking, you can use vectors – most likely SVG images – on a large amount of content. You could render mathematical graphs and other constructs (think circles, polygons), you could render your logos and icons, you could render simple UI with it and I’m sure there’s a lot more than that.

        And yes, there are some downsides to SVGs, namely the fact that IE8 does not support them. Some people seem to dislike copying and pasting SVG nodes directly into HTML, but there’s an easy solution for that: just encode it as a base64 data URI and use HTML to display that (though that sometimes seems to be slower).

  4. 9

    Really interesting article.

    I’ve long thought about responsive, ‘progressive’ loading of images…
    We commonly see progressively loaded PNGs on websites; the decoding technology is there, the difficult part is trying to stop loading once you’ve loaded the image to optimum fidelity. As you mentioned in your post, ‘cancelling’ the download would require another HTTP request and would also waste a little bandwidth. I’d love to see some sort of file type/protocol in the near future because we’re catering for a seemingly endless array of screen resolutions!

    For now, I’m going to stick to my own JavaScript rendering solution, but I look forward to seeing how things develop!

    • 10

      I agree that the difficult part is “stopping” the download, or better yet, avoid asking for parts of the image you do not need to begin with.

      I believe the key here is to use HTTP ranges as a fetching mechanism for these images.
      In today’s HTTP/1.1, this will probably introduce some delay. I’ve looked into ways to diminish that delay (downloading a manifest, heuristically downloading follow-up ranges, etc), but it’d probably not be zero.

      When using SPDY/HTTP2, this delay will disappear, and using ranges will have virtually no cost. That will enable the browser to stop downloading when the number of bytes is sufficient, download images progressively across the entire page (giving a significantly better experience), and other optimizations of that sort.

      So, I agree it is hard, but it will get easier.

  5. 11

    Got Something Against Markup Solutions? Yes. Clients.

    The boom in clients managing their own content, with its drag-and-drop simplicity, means clients expect increased ease and minimal … not more decisions and work. I think some developers are forgetting how important the client is in maintaining websites these days.

    When 4K displays become common, all these hacks will be as embarrassing as trying to make a web page load under 30Kb would be now, when most JavaScript libraries in use are several times that size.

    To me, a 4K future spells better broadband, and as such, I will stick to larger images, scaling them through the trusty single img src. At least in the future, the processing power of devices will likely be able to automatically deliver the appropriate image for a user, rather than making the client manage multiple versions (and even if servers manage multiple versions, this goes against the efficiency and eco-responsibility that website hosting has been striving for in recent years).

  6. 12

    JPEG2000 is a responsive image file format by design (RLCP packet ordering can mimic hierarchical JPEG), and tiles could be used to implement art direction. JPEG2000 is not widely used, but it has far more advanced features than plain old JPEG, like ROI (Region Of Interest), and offers better compression; it also has relatively good support (Adobe Photoshop, Apple OS X…). Why fiddle with WebP or JPEG-XR when JPEG2000 could be an answer on its own?
    The JPEG committee was aware of the design flaws and weaknesses of the original JPEG; that’s why they continued their work and proposed new standards: the lossless mode of the original JPEG evolved into JPEG-LS, and JPEG2000 offered more flexibility and better compression.
    JPEG’s overwhelming predominance is a real problem, since it slows the adoption of newer formats to a crawl (or should I say halt?). One of the forthcoming JPEG standards, JPEG-XT (ISO 18477-1), is mostly based on the original JPEG standard in the hope that this will foster its adoption.
    So far the Web has overlooked JPEG2000. In my opinion, this “historical error” should be corrected: the world should close this technology gap and benefit from the advanced features JPEG2000 offers.

    • 13

      Hey Frédéric, thanks for the useful information. I much prefer this method of using a single file reference as a container (one image rather than set of images and/or scripts) from which to get live, responsive information … having had a quick look, it does seem there has been some traction in server-side image delivery supporting the JPEG2000 format.

      I can see how many front-end developers and the HTML5 spec want to find a solution to live-publishing of images, but this is by far the best solution I have come across.

    • 14

      JPEG2K had 13 years to catch on. It didn’t.

      AFAIK, the reasons were patents (typical for image formats), but most importantly decoding complexity and a lack of real advantage in terms of compression.

      Internal tests I ran in 2004 showed very little to negative file size improvements for Web images using JPEG2K. OTOH, WebP/JPEGXR offer 30-40% improvement for Web images.

      I agree that many of the things I tried to show with this container could’ve been done by JPEG2K, if it was supported everywhere. But after 13 years I’ve lost faith that this is going to happen. What I offer here is a much lighter alternative.

  7. 15

    On a related note, one issue that I have been thinking about recently is the way browsers render images, especially when it comes to fluid layouts like one might apply to responsive sites.

    Conventional wisdom says that you should specify image dimensions to allow the browser to lay out the page before the images are downloaded. If you don’t, the browser lays them out as 0px × 0px and then reflows and repaints once each image has loaded, causing a delay in page load. If you’ve got a lot of images on the page, for example a product category page, this delay can be significant.

    However, therein lies the rub. Our new responsive site uses media queries to specify different image sources for our product category page (the content for which is loaded in dynamically post-load) at two major break-points, but specifies the images as % width/height so that the page is fluid and scales well between the breakpoints. It looks great, but we do get the repaints as the images load in.

    I’ve thought about taking the section of the page out of the render tree until the images are loaded and then adding it back in, which would cause only one repaint, but I would then lose the progressive loading. I’ve thought about giving them all a fixed width to start with and then setting them all to % in one go. I’ve also thought of setting the fixed dimensions using JS when the content is dynamically loaded and just keeping them fixed width.

    None of these are tested yet, so I don’t know which will perform better. Has anyone seen any good solutions to this?

