Cache Invalidation Strategies With Varnish Cache


Phil Karlton once said, “There are only two hard things in Computer Science: cache invalidation and naming things.” This article is about the harder of the two: cache invalidation. It’s directed at readers who already work with Varnish Cache. If you are new to it, you’ll find background information in “Speed Up Your Mobile Website With Varnish” [1].

10 microseconds versus 250 milliseconds: That’s the difference between delivering a cache hit and delivering a cache miss. How often you get the latter depends on the efficiency of the cache, which is known as the “hit rate.” The hit rate depends on two factors: the volume of traffic and the average time to live (TTL), a number indicating how long the cache is allowed to keep an object. As system administrators and developers, we can’t do much about the traffic, but we can influence the TTL.

However, to have a high TTL, we need to be able to invalidate objects from the cache so that we avoid serving stale content. With Varnish Cache, there are myriad ways to do this. We’ll explore the most common ways and how to deploy them.

Varnish does a whole lot of other stuff as well, but its caching services are most popular. Caches speed up Web services by serving cached static content. When Varnish Cache is delivering a cache hit, it usually just dumps a chunk of memory into a socket. Varnish Cache is so fast that, on modern hardware, we actually measure response time in microseconds!


Caching isn’t always as simple as we think; a few gotchas and problems may take quite some of our time to master. (Image credits [2])

When using a cache, you need to know when to evict content from it. If you have no way to evict content, then you have to rely on the cache to time out the object after a predetermined amount of time. That works, but it is hardly optimal. A better approach is to let Varnish Cache keep the object in memory (almost) forever and then tell Varnish when to refresh it. Let’s go into detail on how to achieve this.

HTTP Purging

HTTP Purging is the most straightforward of these methods. Instead of sending a GET /url to Varnish, you would send PURGE /url. Varnish would then discard that object from the cache. Add an access control list to Varnish so that not just anyone can purge objects from your cache; other than that, though, you’re home free.

acl purge {
	# clients that are allowed to purge
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	# allow PURGE from localhost and 192.168.55...
	if (req.request == "PURGE") {
		if (!client.ip ~ purge) {
			error 405 "Not allowed.";
		}
		# look the object up so that vcl_hit or vcl_miss can purge it
		return (lookup);
	}
}

sub vcl_hit {
	if (req.request == "PURGE") {
		# drop the cached object and confirm to the client
		purge;
		error 200 "Purged.";
	}
}

sub vcl_miss {
	if (req.request == "PURGE") {
		purge;
		error 200 "Purged.";
	}
}

Shortcomings of Purging

HTTP purging falls short when a piece of content has a complex relationship to the URLs it appears on. A news article, for instance, might show up on a number of URLs. The article might have a desktop view and a mobile view, and it might show up on a section page and on the front page. Therefore, you would have to either get the content management system to keep track of all of these manifestations or let Varnish do it for you. To let Varnish do it, you would use bans, which we’ll get into now.

Bans

A ban is a feature specific to Varnish and one that is frequently misunderstood. It enables you to ban Varnish from serving certain content in memory, forcing Varnish to fetch new versions of these pages.

An interesting aspect is how you specify which pages to ban. Varnish has a language that provides quite a bit of flexibility. You could tell Varnish to ban by giving the ban command in the command-line interface, typically connecting to it with varnishadm. You could also do it through the Varnish configuration language (VCL), which provides a simple way to implement HTTP-based banning.

Let’s start with an example. Suppose we need to purge our website of all images.

> ban obj.http.content-type ~ "^image/"

As a result, every object in memory whose Content-Type response header matches the regular expression ^image/ is invalidated immediately.

Here’s what happens in Varnish. First, the ban command puts the ban on the “ban list.” Once the ban is on the list, every cache hit on an object older than the ban makes Varnish compare that object to the bans on the list. If the object matches, then Varnish kills it and fetches a newer one. If the object doesn’t match, then Varnish makes a note of it so that it does not check again.

Let’s build on our example. Now, we’ll only ban images that are placed somewhere in the /feature URL. Note the logical “and” operator, &&.

> ban obj.http.content-type ~ "^image/" && req.url ~ "^/feature"

You’ll notice that it says obj.http.content-type and req.url. In the first part of the ban, we refer to an attribute of an object stored in Varnish. In the latter, we refer to a part of a request for an object. This might be a bit unconventional, but you can actually use attributes on the request to invalidate objects in cache. Now, req.url isn’t normally stored in the object, so referring to the request is the only thing we can do here. You could use this to do crazy things, like ban everything being requested by a particular client’s IP address, or ban everything being requested by the Chromium browser. As these requests hit Varnish, objects are invalidated and refreshed from the originating server.

Issuing bans that depend on the request opens up some interesting possibilities. However, there is one downside to the process: A very long list of bans could slow down content delivery.

A worker thread called the “ban lurker” is assigned to the task of shortening the list of bans. The ban lurker tries to match a ban against applicable objects. When a ban has been matched against all objects older than itself, it is discarded.

As the ban lurker iterates through the bans, it doesn’t have an HTTP request that it is trying to serve. So, any bans that rely on data from the request cannot be tested by the ban lurker. To keep ban performance up, then, we would recommend not using request data in the bans. If you need to ban something that is typically in the request, like the URL, you can copy the data from the request to the object in vcl_fetch, like this:

set beresp.http.x-url = req.url;

Now, you’ll be able to use bans on obj.http.x-url. Remember that the beresp object turns into obj as it gets stored in cache.
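As a minimal sketch of how these pieces might fit together in Varnish 3 VCL: the back-end response gets a copy of the URL, the copy is stripped again before delivery, and a BAN request adds a ban that refers to obj only. The BAN method name, the 405 and 200 responses and the reuse of the purge ACL defined earlier are assumptions made for illustration. The request URL is treated as a regular expression, which may be too permissive for real use.

```vcl
# Sketch for Varnish 3.x; assumes the "purge" ACL from the purging example.

sub vcl_fetch {
	# Copy the URL onto the object so that bans can match it
	# without needing a request.
	set beresp.http.x-url = req.url;
}

sub vcl_deliver {
	# The header is internal bookkeeping only; hide it from clients.
	unset resp.http.x-url;
}

sub vcl_recv {
	# "BAN" is an arbitrary method name chosen for this sketch.
	if (req.request == "BAN") {
		if (!client.ip ~ purge) {
			error 405 "Not allowed.";
		}
		# The request URL is used as a regular expression against the
		# stored x-url header, so the ban refers to obj only.
		ban("obj.http.x-url ~ " + req.url);
		error 200 "Ban added.";
	}
}
```

With something like this in place, sending BAN /feature to Varnish would ban every object whose stored URL matches the regular expression /feature. If your VCL already defines these subroutines, the bodies would need to be merged with what is there.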

Tagging Content For Bans

Bans are often a lot more effective when you give Varnish a bit of help. If the object has an X-Article-id header, then you can ban on that header and don’t need to know all of the URLs under which the object is presented.

For pages that depend on several objects, you could have the content management system add an X-depends-on header. Here, you’d list the objects that should trigger an update of the current document. To take our news website again, you might use this to list all articles mentioned on the front page:

X-depends-on: 3483 4376 32095 28372

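Naturally, then, if one of the articles changes, you would issue a ban like the one below. The pattern accepts either a non-digit or the start or end of the header on each side of the ID, so that the ID matches wherever it sits in the list:

ban obj.http.x-depends-on ~ "(^|\D)4376(\D|$)"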

This is potentially very powerful. Imagine making the database issue these invalidation requests through triggers, thus eliminating the need to change the middleware layer. Neat, eh?
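One way to expose such a ban over HTTP, sketched here in Varnish 3 VCL, is to have the invalidating party name the changed article in a request header. The BANTAG method, the X-Ban-Article-Id header and the response codes are hypothetical names chosen for this example, and the purge ACL is the one defined earlier. The pattern is written with [^0-9] and explicit start and end alternatives, so that an ID also matches at either edge of the list and no backslashes have to pass through string handling.

```vcl
# Sketch for Varnish 3.x; method and header names are made up for illustration.

sub vcl_recv {
	if (req.request == "BANTAG") {
		if (!client.ip ~ purge) {
			error 405 "Not allowed.";
		}
		# Accept digits only, so that nothing else can leak into the ban.
		if (req.http.X-Ban-Article-Id !~ "^[0-9]+$") {
			error 400 "Missing or invalid article id.";
		}
		# Ban every object whose X-depends-on list contains this ID
		# as a whole number.
		ban("obj.http.x-depends-on ~ (^|[^0-9])" + req.http.X-Ban-Article-Id + "([^0-9]|$)");
		error 200 "Ban added.";
	}
}
```

Because the ban refers only to obj.http.x-depends-on, it does not rely on any request data when it is evaluated.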

Graceful Cache Invalidations

Imagine purging something from Varnish and then the origin server that was supposed to replace the content suddenly crashes. You’ve just thrown away your only workable copy of the content. What have you done?! Turns out that quite a few content management systems crash on a regular basis.

Ideally, we would want to put the object in a third state — to invalidate it on the condition that we’re able to get some new content. This third state exists in Varnish: It is called “grace,” and it is used with TTL-based invalidations. After an object expires, it is kept in memory in case the back-end server crashes. If Varnish can’t talk to the back end, then it checks to see whether any graced objects match, and it serves those instead.
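Grace has to be configured before expired objects are kept around. The following is a rough sketch in Varnish 3 VCL. The back-end address, the probe settings and the 30-second and one-hour windows are placeholder values chosen for illustration, not recommendations.

```vcl
# Sketch for Varnish 3.x; all addresses and durations are placeholders.

backend default {
	.host = "127.0.0.1";
	.port = "8080";
	# A health probe lets Varnish know when the back end is down.
	.probe = {
		.url = "/";
		.interval = 5s;
		.timeout = 1s;
		.window = 5;
		.threshold = 3;
	}
}

sub vcl_fetch {
	# Keep objects for up to an hour past their TTL.
	set beresp.grace = 1h;
}

sub vcl_recv {
	if (req.backend.healthy) {
		# Healthy back end: accept only slightly stale objects.
		set req.grace = 30s;
	} else {
		# Sick back end: accept anything still within its grace period.
		set req.grace = 1h;
	}
}
```

The split between beresp.grace and req.grace means the object decides how long it may be kept past its TTL, while each request decides how stale a copy it is willing to accept.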

One Varnish module (or VMOD), named softpurge, allows you to invalidate an object by putting it into the grace state. Using it is simple. Just replace the PURGE VCL with the VCL that uses the softpurge VMOD.

import softpurge;

sub vcl_hit {
	if (req.request == "PURGE") {
		# invalidate the object but keep it available for grace
		softpurge.softpurge();
		error 200 "Successful softpurge";
	}
}

sub vcl_miss {
	if (req.request == "PURGE") {
		softpurge.softpurge();
		error 200 "Successful softpurge";
	}
}

Distributing Cache Invalidation Events

All of the methods listed above describe the process of invalidating content on a single cache server. Most serious configurations would have more than one Varnish server. If you have two, which should give enough oomph for most websites, then you would want to issue one invalidation event for each server. However, if you have 20 or 30 Varnish servers, then you really wouldn’t want to bog down the application by having it loop through a huge list of servers.

Instead, you would want a single API end point to which you can send your purges, having it distribute the invalidation event to all of your Varnish servers. For reference, here is a very simple invalidation service written as a shell script. It listens on port 2000 and invalidates each URL it receives on three different servers (alfa, beta and gamma) using cURL.

nc -l 2000 | while read -r url
do
	for srv in "alfa" "beta" "gamma"
	do
		# send the PURGE through $srv as a proxy, giving up after two seconds
		curl -m 2 -x "$srv" -X PURGE "$url"
	done
done

It might not be suitable for production because the error handling leaves something to be desired!

Cache invalidation is almost as important as caching. Therefore, having a sound strategy for invalidating the content is crucial to maintaining high performance and having a high cache-hit ratio. If you maintain a high hit rate, then you’ll need fewer servers and will have happier users and probably less downtime. With this, you’re hopefully more comfortable using tools like these to get stale content out of your cache.


Footnotes

  1. http://www.smashingmagazine.com/2013/12/04/speed-up-your-mobile-website-with-varnish/
  2. https://twitter.com/varnishcache


Per Buer is the CTO and founder of Varnish Software, the company behind the open source project Varnish Cache. Buer is a former programmer turned sysadmin, then manager turned entrepreneur. He runs, cross-country skis and tries to keep his two boys from tearing down the house.

  1.

    Bjørn Johansen

    April 23, 2014 2:00 pm

    Thanks a lot for this excellent article, Per!

    I didn’t know that Tagging Content For Bans existed, but been missing it many times. No more banning of all pages just because one article changed. Yay :-)

    •

      Thanks for this great article !

      And what about the grace with ban method ? Is it possible to “softban” in varnish 4.x ?

      Thank you !

      •

        Sebastien,

        Bans have not changed in 4.0, really. Soft purges are still available as a module, but there are no soft bans. Personally, I would like to see bans and purges being “soft” per default in the next major release. Grace is such a useful feature so making the purges soft would be really nice.

  2.

    Nice article! What would you have to say about using Varnish against GWAN? Especially when it comes to their benchmarks: http://gwan.com/benchmark ?

    •

      Bjørn Johansen

      April 23, 2014 6:26 pm

      Forget about G-WAN. The benchmarks might look impressive, but benchmarks is all it’s good for. Ref.: http://tomoconnor.eu/blogish/gwan-snakeoil-beware/

      Here is a different, and shorter, explanation why you shouldn’t trust G-WAN:
      https://riccardo.forina.me/why-i-ll-never-trust-g-wan/

      I haven’t looked much into G-WAN myself, but it looks like most (all?) in-depth info on it is like these two.

    •

      Hi Rob.

      We’re not really concerned with G-Wan. It seems rather niche and I’ve never seen a website actually use it. Some people seem to love though. Whatever rocks your boat. :-)
      Besides, with only trivial tuning (increasing the thread count and tuning the TCP stack) we typically get Varnish to saturate a 10Gbe-link, so for 99.9% of our users we’re more than fast enough. So, even if G-Wan manages to be a bit faster it is hardly noticeable. I believe the functionality and the fact that Varnish is pretty battle tested ads quite a bit of value.

  3.

    Varnish is very good at what it does, if you are using a word press website I recommend adding the below to prevent invalidation when upgrading WP/Plugins

    backend default {
        .host = "localhost";
        .port = "8080";
        .max_connections = 120;
        .connect_timeout = 10.0s;
        .first_byte_timeout = 5000s;
        .between_bytes_timeout = 600s;
    }

  4.

    We’ll be donating the money (if we manage to figure out the paperwork) we get for the article to the newly minted LibreSSL project. I’m not 100% convinced they will succeed, but I hope the project will at some point be able to create something we can use to add SSL support (in some shape or form) to Varnish.

    •

      Wow, that’s excellent news! Especially since I’m pretty sure I’ve read a statement from Poul-Henning Kamp that Varnish won’t ever get SSL support. I’d love to get rid of the SSL terminator I use in front.

      •

        I think it is fair to say that we’ve vastly underestimated the complexities of (Open)SSL. The SSL-support in Varnish, when it arrives, will most likely be in the form of better integration with a SSL terminator, probably using the PROXY protocol. So, far proper PROXY support will, AFAIK, not have any significant downsides compared to native SSL support.

        As things are today, native SSL support in Varnish itself is not likely to happen before we modularize the frontend protocol support, which might not happen.

  5.

    One other trick I’ve used is the combination of ESI and RESTful GETs to denote a list of the desired objects. On a large site with only a small subset of its content frequently changing (and this content being present on most pages, commonly banners/offers/upsells/etc), it dramatically reduces backend load to stuff those parts of pages into ESI blocks which are purged separately from the rest of the content. This also works great for serving user-specific content within mostly-static pages.

    ESI does run in sequence (rather than parallel on Varnish 3, so there’s a bit of slowdown (a ms rather than 20 us), but that just means a couple more CPUs on a varnish box, rather than 10x the backend & DB servers.

  6.

    Nice article, Per — thanks for getting this in front of a big audience — the more people are aware, the faster the web will become :)

    I found it particularly interesting that you specifically called out tagging (“Tagging Content For Bans”) … because that is precisely what Drupal 8 will ship with!
    Drupal 8 sets “cache tags” on each “page part”: those cache tags that are associated with the things (entities) being depended upon to generate that “page part” (an article, a user, an image gallery, a comment…). As the page is assembled, the cache tags are bubbled up, which allows Drupal 8 to send a comprehensive X-Drupal-Cache-Tags header (see https://drupal.org/node/2222835 for details).
    Precisely what you’ve described with X-depends-on: 3483 4376 32095 28372, but slightly more advanced: X-Drupal-Cache-Tags: node:5 user:33 file:57 taxonomy_term:42 taxonomy_term:1337 .

    I have one (broad) question for you: how does this scale? Could you share numbers around that? Because surely, you will have more data and insight on this than anyone else!
    For example, which is the primary bottleneck: the number of tags per response (e.g. >100), or the number of unique tags across all responses cached by Varnish? (Say 10 million.) Or maybe even something else, such as a high ban frequency being problematic? (For example: Drupal 8 will require a ban for every new comment that is posted, and comment frequency varies greatly by site, of course.)

    Over at https://www.varnish-software.com/blog/advanced-cache-invalidation-strategies, you say:

    Hashtwo was developed earlier this year to solve the scalability issues we’ve seen with bans. Hashtwo maintains an additional hash, in addition to the main hash that links objects and content in Varnish. You can put whatever you want into the hash and it will maintain a many-to-many relationship between keys and objects.

    Can you share some performance numbers on how Hashtwo improves upon regular bans?

    P.S.: Smashing Magazine, previewing comments is broken!

  7.

    Thanks for Brilliant Article!
    After reading Varnish User Guid I was not sure, how to invalidate dependant objects from backend (didn’t understand how ban-lurker works, whether growing ban list is reliable and which ban-rules exactly aren’t safe for ban-lurker).
    This article answers most of my questions. Thanks!
    Maybe You may consider link it from:
    https://www.varnish-cache.org/docs/trunk/users-guide/purging.html or
