Future of Web Apps – Session 4 – Web Caching Techniques – Wim Godden
This was live blogged – there will be mistakes.
In order to understand the next step, we need some history
In the Stone Age, if you wanted to publish news you would have to write it on a wall. This isn’t portable or shareable – it’s just a bunch of pictures.
This is what HTML was at the start.
In the Egyptian era, we drew on tablets. These were more portable, but we still had to make copies by drawing everything all over again.
We had something similar in the early web: with a dynamic site, we have to fetch that data every time.
In the industrial revolution, we made the printing press to make easy copies of documents
In the web, we used caching to keep results of database queries etc without getting them each time
In the 1800s, newspapers just changed the latest news – “extra, extra!”.
We can also do this with websites, caching different parts of the page.
In the modern era, we use television to push out to people but we need the infrastructure to support it.
On the web, if we have twice as many users, we need to make sure the site can support it. Also, if we have multiple servers, we need to make sure the information is the same on each one. We could use memcached or Redis for this.
Today, things are different. We don’t just watch TV, and we don’t just use the Internet at home sitting at a desk. We get data all the time, putting load on our infrastructure – we can’t simply keep adding more and more servers.
So companies started using reverse proxies, e.g. Varnish, to serve reusable content to the user.
On a typical website, we have various parts of the site and we want to cache each part independently. Doing this in code is expensive. We can use reverse proxies to take the work off us, which means the web server does less work.
You can cache all GET requests for static data like HTML, JS or CSS in Varnish. But you can’t cache POST requests or requests carrying cookies. Varnish is therefore pointless on a site with logins, as those requests will all contain cookies.
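As a rough sketch of that behaviour in Varnish’s configuration language (modern VCL 4.0 syntax; the backend address and port are assumptions):

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";   # assumed application server
    .port = "8080";
}

sub vcl_recv {
    # Only GET and HEAD requests are cacheable.
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }
    # A request with a cookie usually means a logged-in user,
    # so Varnish will not serve it from the cache.
    if (req.http.Cookie) {
        return (pass);
    }
    return (hash);
}
```

This mirrors Varnish’s default behaviour: anything with a cookie bypasses the cache, which is exactly why a login-heavy site gets almost no benefit.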
There are a couple of solutions to this.
Instead of fetching pages from Varnish and then getting database information, we connect the reverse proxy directly to the caching layer.
Nginx is a web server but also a lightweight, fast reverse proxy with a low memory footprint. It’s the second most used web server in the world and uses no threads because it’s event-driven.
Imagine you request /page. Nginx doesn’t know about that page yet, so it fetches it from the web server, which returns the page as placeholder blocks with our content in them. Nginx fills the placeholders with the content and stores the result in memcached, keyed on session information.
On the second request, nginx knows it has the data, so it just reads from the cache and doesn’t hit the web server at all.
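A minimal sketch of the nginx side of this flow (the addresses and port are assumptions): try memcached first, and fall back to the application server on a miss.

```nginx
location / {
    set $memcached_key $uri;          # cache key: the requested path
    memcached_pass 127.0.0.1:11211;   # serve straight from the cache
    default_type text/html;
    error_page 404 502 504 = @app;    # cache miss or error: go to backend
}

location @app {
    proxy_pass http://127.0.0.1:8080; # the real web server generates the page
}
```

The memcached module returns 404 on a miss, which is why the `error_page` fallback to the named location does the job.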
But what about updates to the data? For example, when you receive a news message you update the database but, crucially, also update the cache, so that future requests still read from the cache but see the current state.
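The write-through idea can be sketched in a few lines of Python. Plain dicts stand in for memcached and the database here so the sketch is self-contained; with a real cache you would call its set() instead.

```python
# Write-through caching: every update goes to the database AND the cache,
# so reads keep hitting the cache and still see the current state.
cache = {}      # stand-in for memcached
database = {}   # stand-in for the real datastore

def publish_news(item_id, text):
    """Store a news item and refresh the cached copy in the same step."""
    database[item_id] = text          # 1. persist the update
    cache[f"news:{item_id}"] = text   # 2. update the cache at the same time

def get_news(item_id):
    """Serve from the cache; only fall back to the database on a miss."""
    key = f"news:{item_id}"
    if key in cache:
        return cache[key]
    text = database[item_id]          # cache miss: read through and repopulate
    cache[key] = text
    return text

publish_news(1, "Extra, extra!")
print(get_news(1))  # -> Extra, extra! (served from the cache)
```

Because the cache is refreshed at write time, readers never see stale data and never have to wait for a database query.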
You can also preload lots of information about the user into the cache.
You can do this for non-user-specific content too: push the latest news into the database when it’s added, but also into the cache, ready for requests.
If you use SSI (Server Side Includes), you can also use variables and conditionals to display different information from the cache.
This allows us to cache information once and fill the placeholders with the user specific data.
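As an illustration, a cached page shell using nginx’s SSI module might look like this (the block URLs and cookie name are invented for the example, and nginx needs `ssi on;` for the directives to be processed):

```html
<!-- Cached page shell, stored once; placeholders are filled per request -->
<!--# include virtual="/block/latest-news" -->   <!-- shared cached block -->
<!--# include virtual="/block/user-info" -->     <!-- user-specific block -->

<!--# if expr="$cookie_loggedin" -->             <!-- conditional display -->
  <a href="/logout">Log out</a>
<!--# else -->
  <a href="/login">Log in</a>
<!--# endif -->
```

The surrounding HTML is cached once for everyone; only the small included blocks differ per user.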
To identify the user, we can pass variables via nginx – for example the session ID.
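A sketch of how that could look in nginx: build a per-user cache key from the session cookie so each session gets its own cached copy (the cookie name PHPSESSID and the addresses are assumptions).

```nginx
location /page {
    ssi on;                                       # expand SSI placeholders
    set $memcached_key "$cookie_PHPSESSID:$uri";  # session ID in the cache key
    memcached_pass 127.0.0.1:11211;
    error_page 404 502 504 = @app;                # miss: regenerate at backend
}

location @app {
    proxy_pass http://127.0.0.1:8080;             # assumed application server
}
```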
Nginx has native memcached support and an excellent subrequest system, while handling thousands of connections per worker.
Using these techniques dramatically reduces load, because these requests don’t hit the web servers at all – everything comes straight from the cache.