Intelligent content caching is one of the most effective ways to improve the experience for your site’s visitors. Caching, or temporarily storing content from previous requests, is part of the core content delivery strategy implemented within the HTTP protocol. Components throughout the delivery path can all cache items to speed up subsequent requests, subject to the caching policies declared for the content.
In this guide, we will discuss some of the basic concepts of web content caching. This will mainly cover how to select caching policies to ensure that caches throughout the internet can correctly process your content. We will talk about the benefits that caching affords, the side effects to be aware of, and the different strategies to employ to provide the best mixture of performance and flexibility.
Caching is the term for storing reusable responses in order to make subsequent requests faster. There are many different types of caching available, each of which has its own characteristics. Application caches and memory caches are both popular for their ability to speed up certain responses.
Web caching, the focus of this guide, is a different type of cache. Web caching is a core design feature of the HTTP protocol meant to minimize network traffic while improving the perceived responsiveness of the system as a whole. Caches are found at every level of a content’s journey from the original server to the browser.
Web caching works by caching the HTTP responses for requests according to certain rules. Subsequent requests for cached content can then be fulfilled from a cache closer to the user instead of sending the request all the way back to the web server.
Effective caching aids both content consumers and content providers. Some of the benefits that caching brings to content delivery are:
When dealing with caching, there are a few terms that you are likely to come across that might be unfamiliar. Some of the more common ones are below:
There are plenty of other caching terms, but the ones above should help you get started.
Certain content lends itself more readily to caching than others. Some very cache-friendly content for most sites are:
These tend to change infrequently, so they can benefit from being cached for longer periods of time.
Some items that you have to be careful in caching are:
Some items that should almost never be cached are:
In addition to the above general rules, it’s possible to specify policies that allow you to cache different types of content appropriately. For instance, if authenticated users all see the same view of your site, it may be possible to cache that view anywhere. If authenticated users see a user-sensitive view of the site that will be valid for some time, you may tell the user’s browser to cache, but tell any intermediary caches not to store the view.
Content can be cached at many different points throughout the delivery chain:
Each of these locations can and often do cache items according to their own caching policies and the policies set at the content origin.
Caching policy is dependent upon two different factors. The caching entity itself gets to decide whether or not to cache acceptable content. It can decide to cache less than it is allowed to cache, but never more.
The majority of caching behavior is determined by the caching policy, which is set by the content owner. These policies are mainly articulated through the use of specific HTTP headers.
Through various iterations of the HTTP protocol, a few different cache-focused headers have arisen with varying levels of sophistication. The ones you probably still need to pay attention to are below:
Expires
: The Expires
header is very straight-forward, although fairly limited in scope. Basically, it sets a time in the future when the content will expire. At this point, any requests for the same content will have to go back to the origin server. This header is probably best used only as a fall back.Cache-Control
: This is the more modern replacement for the Expires
header. It is well supported and implements a much more flexible design. In almost all cases, this is preferable to Expires
, but it may not hurt to set both values. We will discuss the specifics of the options you can set with Cache-Control
a bit later.Etag
: The Etag
header is used with cache validation. The origin can provide a unique Etag
for an item when it initially serves the content. When a cache needs to validate the content it has on-hand upon expiration, it can send back the Etag
it has for the content. The origin will either tell the cache that the content is the same, or send the updated content (with the new Etag
).Last-Modified
: This header specifies the last time that the item was modified. This may be used as part of the validation strategy to ensure fresh content.Content-Length
: While not specifically involved in caching, the Content-Length
header is important to set when defining caching policies. Certain software will refuse to cache content if it does not know in advanced the size of the content it will need to reserve space for.Vary
: A cache typically uses the requested host and the path to the resource as the key with which to store the cache item. The Vary
header can be used to tell caches to pay attention to an additional header when deciding whether a request is for the same item. This is most commonly used to tell caches to key by the Accept-Encoding
header as well, so that the cache will know to differentiate between compressed and uncompressed content.The Vary
header provides you with the ability to store different versions of the same content at the expense of diluting the entries in the cache.
In the case of Accept-Encoding
, setting the Vary
header allows for a critical distinction to take place between compressed and uncompressed content. This is needed to correctly serve these items to browsers that cannot handle compressed content and is necessary in order to provide basic usability. One characteristic that tells you that Accept-Encoding
may be a good candidate for Vary
is that it only has two or three possible values.
Items like User-Agent
might at first glance seem to be a good way to differentiate between mobile and desktop browsers to serve different versions of your site. However, since User-Agent
strings are non-standard, the result will likely be many versions of the same content on intermediary caches, with a very low cache hit ratio. The Vary
header should be used sparingly, especially if you do not have the ability to normalize the requests in intermediate caches that you control (which may be possible, for instance, if you leverage a content delivery network).
Above, we mentioned how the Cache-Control
header is used for modern cache policy specification. A number of different policy instructions can be set using this header, with multiple instructions being separated by commas.
Some of the Cache-Control
options you can use to dictate your content’s caching policy are:
no-cache
: This instruction specifies that any cached content must be re-validated on each request before being served to a client. This, in effect, marks the content as stale immediately, but allows it to use revalidation techniques to avoid re-downloading the entire item again.no-store
: This instruction indicates that the content cannot be cached in any way. This is appropriate to set if the response represents sensitive data.public
: This marks the content as public, which means that it can be cached by the browser and any intermediate caches. For requests that utilized HTTP authentication, responses are marked private
by default. This header overrides that setting.private
: This marks the content as private
. Private content may be stored by the user’s browser, but must not be cached by any intermediate parties. This is often used for user-specific data.max-age
: This setting configures the maximum age that the content may be cached before it must revalidate or re-download the content from the origin server. In essence, this replaces the Expires
header for modern browsing and is the basis for determining a piece of content’s freshness. This option takes its value in seconds with a maximum valid freshness time of one year (31536000 seconds).s-maxage
: This is very similar to the max-age
setting, in that it indicates the amount of time that the content can be cached. The difference is that this option is applied only to intermediary caches. Combining this with the above allows for more flexible policy construction.must-revalidate
: This indicates that the freshness information indicated by max-age
, s-maxage
or the Expires
header must be obeyed strictly. Stale content cannot be served under any circumstance. This prevents cached content from being used in case of network interruptions and similar scenarios.proxy-revalidate
: This operates the same as the above setting, but only applies to intermediary proxies. In this case, the user’s browser can potentially be used to serve stale content in the event of a network interruption, but intermediate caches cannot be used for this purpose.no-transform
: This option tells caches that they are not allowed to modify the received content for performance reasons under any circumstances. This means, for instance, that the cache is not able to send compressed versions of content it did not receive from the origin server compressed and is not allowed.These can be combined in different ways to achieve various caching behavior. Some mutually exclusive values are:
no-cache
, no-store
, and the regular caching behavior indicated by absence of eitherpublic
and private
The no-store
option supersedes the no-cache
if both are present. For responses to unauthenticated requests, public
is implied. For responses to authenticated requests, private
is implied. These can be overridden by including the opposite option in the Cache-Control
header.
In a perfect world, everything could be cached aggressively and your servers would only be contacted to validate content occasionally. This doesn’t often happen in practice though, so you should try to set some sane caching policies that aim to balance between implementing long-term caching and responding to the demands of a changing site.
There are many situations where caching cannot or should not be implemented due to how the content is produced (dynamically generated per user) or the nature of the content (sensitive banking information, for example). Another problem that many administrators face when setting up caching is the situation where older versions of your content are out in the wild, not yet stale, even though new versions have been published.
These are both frequently encountered issues that can have serious impacts on cache performance and the accuracy of content you are serving. However, we can mitigate these issues by developing caching policies that anticipate these problems.
While your situation will dictate the caching strategy you use, the following recommendations can help guide you towards some reasonable decisions.
There are certain steps that you can take to increase your cache hit ratio before worrying about the specific headers you use. Some ideas are:
In terms of selecting the correct headers for different items, the following can serve as a general reference:
no-cache
or no-store
options can be set in the Cache-Control
header to achieve this.Etag
and the Last-Modified
headers allow caches to validate their content and re-serve it if it has not been modified at the origin, further reducing load.The key is to strike a balance that favors aggressive caching where possible while leaving opportunities to invalidate entries in the future when changes are made. Your site will likely have a combination of:
The goal is to move content into the first categories when possible while maintaining an acceptable level of accuracy.
Taking the time to ensure that your site has proper caching policies in place can have a significant impact on your site. Caching allows you to cut down on the bandwidth costs associated with serving the same content repeatedly. Your server will also be able to handle a greater amount of traffic with the same hardware. Perhaps most importantly, clients will have a faster experience on your site, which may lead them to return more frequently. While effective web caching is not a silver bullet, setting up appropriate caching policies can give you measurable gains with minimal work.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
This is amazing, thanks for taking the time to share this info, Justin. I’ve been working with web for the last couple of years and the caching thing has always been a bit of a grey area for me (A grey area you’ve just helped to clear up).
This comment has been deleted
Excellent article. Thanks!
Great one. Thanks.