Kristian Lyngstøl's Blog

Magic Grace

Posted on 2015-09-25

I was hacking together a JavaScript varnishstat implementation for a customer a few days ago when I noticed something strange. I have put Varnish in front of the agent delivering stats, but I'm only caching the statistics for 1 second.
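
The setup is nothing fancy. A minimal sketch of the kind of VCL involved - the backend address here is made up, and the real config has a bit more to it:

vcl 4.0;

backend agent {
    .host = "127.0.0.1";
    .port = "6085";
}

sub vcl_backend_response {
    // Cache the statistics for just 1 second
    set beresp.ttl = 1s;
}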

But the cache hit rate was 100%.

And the stats were updating?

Logically speaking, how can you hit cache 100% of the time and still get fresh content all the time?

Enter Grace

Grace mode is a feature Varnish has had since version 2.0 back in 2008. It is a fairly simple mechanic: add a little bit of extra cache duration to an object. This is the grace period. If a request is made for the object during that grace period, Varnish updates the object, and the cached copy is served while the update is in progress.

This reduces the thundering horde problem when a large amount of users request recently expired content, and it can drastically improve user experience when updating content is expensive.
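
In VCL terms, the mechanic boils down to two durations on each object: the regular TTL and the grace period on top of it. A sketch for Varnish 4, with arbitrary numbers:

sub vcl_backend_response {
    // Fresh for one minute...
    set beresp.ttl = 60s;
    // ...and usable for another ten minutes while a new copy is fetched.
    set beresp.grace = 10m;
}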

The big change that happened in Varnish 4 was background fetches.

Varnish uses a very simple thread model (so to speak). Essentially, each session is handled by one thread. In prior versions of Varnish, requests to the backend were always tied to a client request.

  • Thread 1: Accept request from client 1
  • Thread 1: Look up content in cache
  • Thread 1: Cache miss
  • Thread 1: Request content from web server
  • Thread 1: Block
  • Thread 1: Get content from web server
  • Thread 1: Respond

If the cache is empty, there isn't much of a reason NOT to do this. Grace mode always complicated this. What PHK did to solve this was, in my opinion, quite brilliant in its simplicity. Even if it was a trade-off.

With grace mode, you HAVE the content, you just need to make sure it's updated. It looked something like this:

  • Thread 1: Accept request from client 1
  • Thread 1: Look up content in cache
  • Thread 1: Cache miss
  • Thread 1: Request content from web server
  • Thread 1: Block
  • Thread 1: Get content from web server
  • Thread 1: Respond

So ... NO CHANGE. For a single client, you don't have grace mode in earlier Varnish versions.

But enter client number 2 (or 3, 4, 5...):

  • Thread 1: Accept request from client 1
  • Thread 1: Look up content in cache
  • Thread 1: Cache miss
  • Thread 1: Request content from web server
  • Thread 1: Block
  • Thread 2: Accept request from client 2
  • Thread 2: Look up content in cache
  • Thread 2: Cache hit - grace copy is now eligible - Respond
  • Thread 1: Get content from web server
  • Thread 1: Respond

So with Varnish 2 and 3, only the first client will block waiting for new content. This is still an issue, but it does the trick for the majority of use cases.

Background fetches!

Background fetches changed all this. It's more complicated in many ways, but from a grace perspective, it massively simplifies everything.

With Varnish 4 you get:

  • Thread 1: Accept request from client 1
  • Thread 1: Look up content in cache
  • Thread 1: Cache hit - grace copy is now eligible - Respond
  • Thread 2: Request content from web server
  • Thread 2: Block
  • Thread 3: Accept request from client 2
  • Thread 3: Look up content in cache
  • Thread 3: Cache hit - grace copy is now eligible - Respond
  • Thread 2: Get content from web server

And so forth. Strictly speaking, I suppose this makes grace /less/ magical...

In other words: The first client will also get a cache hit, but Varnish will update the content in the background for you.

It just works.
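
Part of the reason it just works is that the built-in VCL already handles grace for you. From memory, the Varnish 4 built-in vcl_hit looks roughly like this (check builtin.vcl for the authoritative version):

sub vcl_hit {
    if (obj.ttl >= 0s) {
        // A pure unadulterated hit, deliver it
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Object is in grace: deliver the stale copy,
        // which automatically triggers a background fetch
        return (deliver);
    }
    // Neither fresh nor in grace: fetch and deliver once we get the result
    return (fetch);
}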

Statistics?

What is a cache hit?

If I tell you that I have 100% cache hit rate, how much backend traffic would you expect?

We want to keep track of two ratios:

  • Cache hit rate - how much content is delivered directly from cache (same as today). Target value: 100%.
  • Fetch/request ratio - how many backend fetches do you initiate per client request. Target value: 0%.

For my application, a single user will result in a 100% cache hit rate, but also a fetch/request ratio of 100%. The cache isn't really offloading the backend load significantly until I have multiple users of the app. Mind you, if the application was slow, this would still benefit that one user.
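
If you want to measure this yourself, a rough sketch of how the two ratios can be derived from the standard varnishstat counters (assuming the usual MAIN counters in Varnish 4):

cache hit rate      = MAIN.cache_hit / (MAIN.cache_hit + MAIN.cache_miss)
fetch/request ratio = MAIN.backend_req / MAIN.client_req

A grace hit counts as a cache hit, which is exactly why the hit rate alone doesn't tell you how much the backend is actually being offloaded.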

The fetch/request ratio is also interesting from a security point of view. If you find the right type of request, you could end up with more backend fetches than client requests (e.g. due to restarts/retries).

How to use grace

You already have it, most likely. Grace is turned on by default, using a 10 second grace period. For frequently updated content, this is enough.

Varnish 4 changed some of the VCL and parameters related to grace. The important bits are:

  • Use beresp.grace in VCL to adjust grace for an individual object - see the sketch after this list.
  • Use the default_grace parameter to adjust the ... default grace for objects.
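
As an example of the first of those, here is a sketch that gives a hypothetical, expensive-to-generate URL a longer grace period than everything else:

sub vcl_backend_response {
    if (bereq.url ~ "^/expensive-report") {
        // Made-up URL: keep serving stale copies for up to an hour
        // while fresh content is fetched.
        set beresp.grace = 1h;
    }
}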

If you want to override the grace mechanics, you can do so either in vcl_recv, by setting req.ttl to define a max TTL to be used for an object regardless of the actual TTL. That bit is a bit mysterious.

Or you can look at vcl_hit. Here you'll be able to do:

if (obj.ttl + obj.grace > 0s && obj.ttl <= 0s) {
        // We are in grace mode: the TTL has expired, but we still have an object
        if (req.http.x-magic-skip-grace-header ~ "yes") {
                return (miss);
        } else {
                return (deliver);
        }
}

The above example snippet will evaluate whether the object has an expired TTL but is still within the grace period. If that is the case, it looks for a client header called "X-Magic-Skip-Grace-Header" and checks if it contains the string "yes". If so, the request is treated as a cache miss; otherwise, the cached object is delivered.