Instant, Fresh and Easy Cache

While working on your website you have probably noticed the wait each time a PHP script or another time-consuming script runs. Web development is just one of many cases where saving some data and keeping it around for fast later access really helps. Our devices solve this problem at many different levels. In software, there are SQL servers, which get bombarded with queries, many of them exactly the same. Then there are web browsers, which cache files from most sites so that requests to the server don't have to be made at all. On the hardware side, there are hybrid drives, which combine two kinds of storage to offer both speed and high capacity. And most importantly, there are processors, which have several levels of ultrafast cache memory of their own.

You get the point. Recently I implemented the Music widget you can see on the right, and I will tell you how I used a cache for it. You are about to see which implementations I tried and which one turned out to work.

What's the problem, bro?

Well, first of all: the delay. When we ask for data, either in software or hardware, we always have to wait. Sometimes the wait is short, but other times we could move house in the meantime. OK, maybe I'm exaggerating here, but again, you get the point. Often the data stays valid and can be reused a few times before we are forced to ask for it again. This mainly affects clients.

The second thing hits the server side: the load. The time and processing power we save by not preparing a response that didn't need to be prepared can be allocated elsewhere. This is especially useful in systems that use many machines and handle many requests all the time. Or we could just let the machines idle when there is nothing to do and save power, I guess.

So whichever side you are on, there could be benefits for you. OwnTime improved on both. Page load times dropped by over 0.7s, because my Music widget is placed on every page and that's how long the requests to the Spotify and LastFM servers take. And my $35 server, which I am very proud of as it has exceeded my expectations dozens of times, can take a breath or be ready for my next challenges.

"Caching is for lamers, you should just make your code faster"

Said no one, nowhere, never, I hope. And that is unusual for me to say, because I am the one who cares a lot about speed and algorithms. But we know there is always a limit to how fast we can make things. My PHP script already spent ~97% of its time on those two external requests, and I couldn't improve on those times much (unless LastFM or Spotify hire me in the very near future). At least that's what I thought at the beginning. When I thought a bit longer, I came up with the idea of caching one of them. The Spotify data didn't need to be refreshed every time someone requested my widget. I fetched only my number of followers and two links from them. Those change very little from day to day, or not at all. So I could simply cut the time in half just by saving this to a file. So that's what I did.

How did I do it?

PHP is feature rich so I didn't have to work too much to implement my simple caching 'system'.

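// Return the output of "$file.php", using a cached copy no older than $time seconds.
function getFile($file, $time){
	$cachedfile = 'cached-'.$file.'.html';
	// Regenerate when the cache is missing or its age has reached $time seconds.
	if (!file_exists($cachedfile) || time() - filemtime($cachedfile) >= $time)
		// Escape both names so an unusual $file cannot inject shell commands.
		exec('php '.escapeshellarg($file.'.php').' > '.escapeshellarg($cachedfile));
	return file_get_contents($cachedfile);
}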

And now every time I need a file which could be cached I use this function. Let's say we need the result of 'XYZ.php'. With this function, we can get it by passing 'XYZ' as $file, and it will make sure that the result is no older than $time seconds. Keep in mind it is not a foolproof solution though. It does not prevent writing to a cache file while another request is already writing to it. But it suits my needs for now. Maybe I will improve it in the future when OwnTime grows. Feel free to take it and modify it to your needs.

The cached-XYZ.html file would be created in our example. And every time we ask for this file again, our code would check if a cached version of it exists and if it's fresh enough. If not, it will run the script and update the cache file before returning its output.
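
The caveat about overlapping writes is commonly handled by writing to a temporary file and renaming it over the cache file. Here is a minimal sketch of that, assuming the cache lives on a single Linux filesystem. writeCacheAtomically is a hypothetical helper name, not something the widget actually uses:

```php
<?php
// Hypothetical helper: replace $cachedfile with $contents in one step.
// rename() within the same filesystem is atomic on Linux, so a reader sees
// either the old file or the new one, never a partial write.
function writeCacheAtomically($cachedfile, $contents){
	// Create the temporary file next to the target so both share a filesystem.
	$tmp = tempnam(dirname($cachedfile), 'cache-');
	if ($tmp === false)
		return false;
	if (file_put_contents($tmp, $contents) === false){
		unlink($tmp);
		return false;
	}
	return rename($tmp, $cachedfile);
}
```

One way to use it would be to capture the script output with shell_exec() and pass the result to this helper instead of redirecting straight into the cache file. Note that tempnam() creates the file readable only by its owner, which is fine as long as the same user writes and reads the cache.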

So both of us would agree now on the 'easy' part. But what about the rest?

How could cache not be instant or fresh?

I tried a few things before I settled on a solution. My first idea was to return the data immediately, even when the cache is too old, and refresh it afterward. This way the response would always be immediate. But it created a problem: when no one had visited my site for a day, the next visitor would get very old data, which wasn't OK with me. And I couldn't manage to end the connection properly in PHP before executing the second half. Sometimes ob_end_flush();ob_flush();flush(); worked, but sometimes it left the browser hanging.
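
The lookup half of that first idea can be sketched as a function that always answers from the cache and only reports whether a refresh is due. This is an illustration under my own assumptions, not the code I actually ran; getStale and its return shape are hypothetical:

```php
<?php
// Hypothetical helper for the "answer first, refresh later" idea.
// Returns the cached data immediately, together with a flag telling the
// caller whether the copy is at least $time seconds old.
function getStale($cachedfile, $time){
	// filemtime() results are cached within a request, so clear them first.
	clearstatcache(true, $cachedfile);
	if (!file_exists($cachedfile))
		return array('data' => null, 'stale' => true);
	$age = time() - filemtime($cachedfile);
	return array(
		'data'  => file_get_contents($cachedfile),
		'stale' => $age >= $time,
	);
}
```

The caller would send 'data' to the visitor and, when 'stale' is true, regenerate the file after the response has gone out. That second step is the part that gave me trouble. Under PHP-FPM, fastcgi_finish_request() is one way to close the connection first, but that depends on the server setup and I have not tested it here.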

The second idea was to do the refresh part with a daemon. This could work for some of you, but the idea of spawning a daemon for each update bothered me. What if someone inserted a never-ending script by mistake? Would they remember to kill those daemons? I wasn't so sure about it, so I dropped this idea, as I don't want to write code that others can misuse.

I even thought about merging those ideas and sending the data twice if the cache was old. But it wasn't scalable as every client would need to implement the update function and it basically created more problems than solutions.

The solution

So my final idea, and the one I eventually chose, was to have a program refresh the cache every $time seconds. I was even going to write one myself in C++. But then I thought that I was making it too complicated for just those few files. I already knew a program that could solve my simple problem. It is cron, and I had used it a few times before. Most of you have probably used it, or at least heard of it, if you are using Linux. cron runs as a background service and we can tell it to run commands at specific times.

So now the cache is:

  1. fresh (not old),
  2. instant as I can immediately return it,
  3. and it was very easy as I only needed one function and one line for every file I wanted to cache.
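
The "one line for every file" part could look roughly like the sketch below: a small script that regenerates one cache file, plus a crontab entry that calls it. The script name refresh.php, the 5-minute schedule and the /path/to/site directory are assumptions for illustration, not my exact setup:

```php
<?php
// refresh.php (hypothetical name): regenerate the cache for one script.
// Assumed crontab entry, one line per cached file, here every 5 minutes:
//   */5 * * * * cd /path/to/site && php refresh.php XYZ
function refreshCache($file){
	$cachedfile = 'cached-'.$file.'.html';
	$tmp = $cachedfile.'.tmp';
	// Write to a temporary file first so visitors never read a partial cache.
	// PHP_BINARY is the path of the PHP interpreter running this script.
	$cmd = escapeshellarg(PHP_BINARY).' '.escapeshellarg($file.'.php')
		.' > '.escapeshellarg($tmp);
	exec($cmd, $output, $status);
	if ($status !== 0){
		@unlink($tmp);
		return false; // keep the previous cache if the script failed
	}
	return rename($tmp, $cachedfile);
}

// Run only when invoked from the command line with a file name argument.
if (PHP_SAPI === 'cli' && isset($argv[1]))
	refreshCache($argv[1]);
```

With something like this in place, the page itself only ever reads cached-XYZ.html, so no visitor waits for the external requests.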

For now it works perfectly, but if my list of 'files to update' grows, I am going to write a dedicated program that dynamically decides how often each file should be refreshed based on requests. But I will write about that in another article. And as of now, you can spy on, er, follow my music taste without any delay. Have a good day 🙂