In web and server-side development, a memcache is a remote service that keeps transient data in a temporary, in-memory store and allows for fast access. It is usually organized around keys, which identify the values being stored and retrieved. Because data lives in RAM rather than on actual disks, it offers great speed (often less than a few milliseconds per operation) while introducing the possibility of values being evicted (deleted) if the memcache runs out of space. Should that happen, the oldest values are usually disposed of first.
This functionality makes memcache a good secondary storage that complements the usual persistent database, increasing the efficiency of the system as a whole. The usual scenario is to poll memcache first, and resort to querying the database only if the result cannot be found in the cache. Once the query is made, its results are memcached for some reasonable amount of time (usually called TTL: time to live), depending on the acceptable trade-off between speed and having up-to-date results.
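As an illustration, this cache-aside pattern might be sketched as follows. A toy in-memory cache stands in for a real memcache client here, and all the names (`get_user`, `query_database`, the key format) are made up for the example:

```python
import time

class ToyCache(object):
    """Dict-backed stand-in for a memcache client (illustration only)."""
    def __init__(self):
        self._store = {}              # key -> (value, expiration timestamp)
    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() >= expires:    # TTL elapsed: behave like an eviction
            del self._store[key]
            return None
        return value
    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.time() + ttl)

cache = ToyCache()

def query_database(user_id):
    """Stand-in for a slow query against the persistent database."""
    return {"id": user_id, "name": "user-%d" % user_id}

def get_user(user_id, ttl=60):
    key = "user:%d" % user_id
    result = cache.get(key)
    if result is None:                # cache miss: fall back to the database
        result = query_database(user_id)
        cache.set(key, result, ttl)   # keep it around for the next TTL seconds
    return result
```

Subsequent `get_user` calls within the TTL are then served straight from memory, without touching the database.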
While making applications more responsive is the primary use case for memcache, it turns out that we can also utilize it for something completely different. That's because memcache implementations offer something more besides the obvious `get`/`set` commands. They also have operations which make it possible to use memcache as a synchronization facility.
Well, not necessarily – it depends on the scope. Having an entirely global lock on some resource is obviously not acceptable, as it's a direct offense to scalability. (That's why it's not so easy to implement something as deceptively simple as a counter.) But if we're talking about something much more local – like data specific to a single user – it might make sense to allow some kind of mutual exclusion. The most important point here is to make sure that the scope of such a critical section remains constant, irrespective of the future growth of the whole system.
Bearing this in mind, we can use memcache to build some useful synchronization primitives.
Individual memcache operations – such as `get` and `set` – are atomic, as memcache servers ensure they are always properly ordered and fully applied. This is very similar to reading and writing most variables of scalar types in imperative programming languages. The analogy extends further, though: a sequence of memcache operations is not atomic, for the exact same reason that updating the value of a variable isn't – it results in a race condition if multiple threads attempt to do so at the same time.
As a possible remedy for this problem, concurrency frameworks usually offer compound atomic operations, such as get & set or incrementation (which consists of a read, an add and a write). They “just work”, as their atomicity is usually guaranteed by the underlying machine code, which simply prevents any collision (race condition) from happening.
Unfortunately, compound operations on memcache aren't that simple: in the general case, a collision cannot be prevented but only detected. The upside is that we don't need a multitude of specialized atomic operations – a single one will suffice, and it's usually Compare & Set, or CAS for short.
How does it work? In essence, it's almost like a simple `set` command, because (somewhat surprisingly) the ‘compare’ part does not apply to the value being overwritten. Instead, it refers to checking whether the value has been changed by someone else – a symptom of a race condition. If it has, CAS fails and must be reattempted. In other words, what we have here is not collision prevention, but collision detection.
For this to work, we must first somehow “reserve” the value for updating via CAS. This is done by a special form of the `get` operation, usually named `gets`. It retrieves not only the current value for the specified key, but also a unique identifier of that value. For example, in Google App Engine's memcache this identifier is called `cas_id` and is essentially a timestamp of the value's last modification. This timestamp is sent along with the updated value and compared against by the memcache server. CAS will succeed only if it's still current; otherwise, we have detected a concurrent modification.
Putting it all into code, we can create a generic `update` function for values in memcache:
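A sketch of such a function might look like this. Since I can't assume a live memcache here, a tiny in-memory client with App Engine-style `gets`/`cas` stands in for the real thing (its version counter plays the role of `cas_id`):

```python
class FakeMemcache(object):
    """Tiny in-memory, CAS-capable stand-in for a memcache client."""
    def __init__(self):
        self._data = {}    # key -> (value, version); version acts as cas_id
        self._seen = {}    # key -> version observed by the last gets()
    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)
    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version          # remember the cas_id
        return value
    def cas(self, key, value):
        if key not in self._data or self._seen.get(key) != self._data[key][1]:
            return False                   # concurrent modification detected
        self._data[key] = (value, self._data[key][1] + 1)
        return True

def update(client, key, func):
    """Atomically replace the value under ``key`` with ``func(value)``."""
    while True:
        value = client.gets(key)           # fetch value along with its cas_id
        if client.cas(key, func(value)):   # store iff nobody changed it
            break                          # success; otherwise spin and retry
```

For instance, `update(client, "counter", lambda n: n + 1)` then behaves as an atomic increment.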
The result is pretty much a generalization of the CAS-based counter increment described by Guido van Rossum. It also has mostly the same caveats, with nasty busy waiting being the most obvious one. It serves well as a reminder, though: if this loop can often run for more than just a few spins, our synchronization is likely too excessive or too wide in scope.
Right, it would seem that I forgot about the gist: we were to create a synchronization primitive. But in fact, that's exactly what we've already done! The aforementioned `update` through CAS allows us to implement at least the simplest one: the mutual exclusion primitive, or mutex:
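Here's a sketch of how such a mutex might look (a toy in-memory client is bundled so the example is self-contained; real code would pass the actual memcache client):

```python
class FakeMemcache(object):
    """Tiny in-memory, CAS-capable stand-in for a memcache client."""
    def __init__(self):
        self._data = {}    # key -> (value, version); version acts as cas_id
        self._seen = {}    # key -> version observed by the last gets()
    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)
    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version
        return value
    def cas(self, key, value):
        if key not in self._data or self._seen.get(key) != self._data[key][1]:
            return False                   # concurrent modification detected
        self._data[key] = (value, self._data[key][1] + 1)
        return True

class Mutex(object):
    """Mutex built on gets/cas: the lock is a boolean value under a key."""
    def __init__(self, client, key):
        self.client = client
        self.key = key
    def acquire(self):
        while True:
            locked = self.client.gets(self.key)
            # try to flip False -> True; cas fails on concurrent modification
            if not locked and self.client.cas(self.key, True):
                return
    def release(self):
        self.client.set(self.key, False)   # mark as unlocked
```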
We could enhance it with `__enter__` and `__exit__` methods in order to make it usable in `with` clauses, but that's a Pythonic detail for something which is pretty much language-independent. Usage is still very straightforward:
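Something along these lines (the toy in-memory client and mutex are repeated so the snippet runs on its own; the key name is illustrative):

```python
class FakeMemcache(object):
    """Tiny in-memory, CAS-capable stand-in for a memcache client."""
    def __init__(self):
        self._data = {}    # key -> (value, version); version acts as cas_id
        self._seen = {}
    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)
    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version
        return value
    def cas(self, key, value):
        if key not in self._data or self._seen.get(key) != self._data[key][1]:
            return False
        self._data[key] = (value, self._data[key][1] + 1)
        return True

class Mutex(object):
    def __init__(self, client, key):
        self.client = client
        self.key = key
    def acquire(self):
        while True:
            locked = self.client.gets(self.key)
            if not locked and self.client.cas(self.key, True):
                return
    def release(self):
        self.client.set(self.key, False)

client = FakeMemcache()
client.set("user:42:lock", False)   # the key must exist beforehand – more on that in a moment

mutex = Mutex(client, "user:42:lock")
mutex.acquire()
try:
    pass    # ... critical section: operate on the user's data ...
finally:
    mutex.release()
```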
Everything would be nice and dandy if not for the fact that – at least on App Engine – such a mutex doesn't work: it loops forever in `acquire`, as `cas` never succeeds!
It turns out that the problem arises because the mutex's key does not exist in memcache when `acquire` is invoked for the first time. Since `cas` is unable to update a non-existent value, it will always fail. It would seem that we need to handle this case separately, so it's best to unroll the `update` into our `acquire` method:
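Unrolled, `acquire` might look roughly like this (the toy in-memory client again stands in for real memcache; as the next paragraph shows, this version is still broken):

```python
class FakeMemcache(object):
    """Tiny in-memory, CAS-capable stand-in for a memcache client."""
    def __init__(self):
        self._data = {}    # key -> (value, version); version acts as cas_id
        self._seen = {}
    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)
    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version
        return value
    def cas(self, key, value):
        if key not in self._data or self._seen.get(key) != self._data[key][1]:
            return False
        self._data[key] = (value, self._data[key][1] + 1)
        return True

class Mutex(object):
    """Mutex with acquire() unrolled to handle a missing key."""
    def __init__(self, client, key):
        self.client = client
        self.key = key
    def acquire(self):
        while True:
            locked = self.client.gets(self.key)
            if locked is None:
                # key not in memcache yet: initialize it as unlocked
                # (beware: a plain set makes this step racy)
                self.client.set(self.key, False)
            elif not locked and self.client.cas(self.key, True):
                return
    def release(self):
        self.client.set(self.key, False)
```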
This seems sufficient, as we are explicitly setting the mutex's value to `False` if it isn't present in the memcache yet. But it's not hard to spot a serious problem here: we have introduced a race condition between the `gets` and `set` calls! It might result in two processes both receiving `None` from `gets` and both entering the critical section, as `set` will happily overwrite the `True` that the other process has already put in the memcache.
What to do, then? A quick fix is available, and it involves using `add` instead of `set`:
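With `add`, the initialization step can no longer clobber another process's lock, since only one of the competing `add` calls can succeed. A sketch (toy client included so the snippet is runnable):

```python
class FakeMemcache(object):
    """Tiny in-memory, CAS-capable stand-in for a memcache client."""
    def __init__(self):
        self._data = {}    # key -> (value, version); version acts as cas_id
        self._seen = {}
    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)
    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version
        return value
    def cas(self, key, value):
        if key not in self._data or self._seen.get(key) != self._data[key][1]:
            return False
        self._data[key] = (value, self._data[key][1] + 1)
        return True
    def add(self, key, value):
        if key in self._data:
            return False       # atomic: fails if the key already exists
        self._data[key] = (value, 1)
        return True

class Mutex(object):
    def __init__(self, client, key):
        self.client = client
        self.key = key
    def acquire(self):
        while True:
            locked = self.client.gets(self.key)
            if locked is None:
                # add() is atomic, so only one process can initialize the key;
                # the losers simply loop and go through the cas path
                self.client.add(self.key, False)
            elif not locked and self.client.cas(self.key, True):
                return
    def release(self):
        self.client.set(self.key, False)
```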
Our simple solution, however, has grown to be actually pretty complex now – and unnecessarily so. Having laid our eyes on the `add` operation, we might notice that it is also atomic – just like its counterpart, `delete`. Perhaps unsurprisingly, those two facts alone are enough to implement our mutex in an astonishingly simple manner:
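Here the whole mutex boils down to `add` and `delete` (toy in-memory client included for self-containment):

```python
class FakeMemcache(object):
    """Tiny in-memory stand-in for a memcache client (illustration only)."""
    def __init__(self):
        self._data = {}
    def add(self, key, value):
        if key in self._data:
            return False        # atomic: fails if the key already exists
        self._data[key] = value
        return True
    def delete(self, key):
        return self._data.pop(key, None) is not None

class Mutex(object):
    """Mutex represented by the mere existence of a memcache key."""
    def __init__(self, client, key):
        self.client = client
        self.key = key
    def acquire(self):
        # only one process can create the key; the rest busy-wait
        while not self.client.add(self.key, True):
            pass
    def release(self):
        self.client.delete(self.key)
```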
The difference between this and the previous version is the mechanism used to represent the lock. In the CAS-based solution, it is a boolean value stored under a memcache key. In the `add`/`delete` one, it's the mere existence of such a value.
This outcome doesn't mean that CAS is entirely useless for synchronization. It still provides protection against concurrent modification, which is absolutely necessary in this context. It doesn't lend a hand in solving the initialization problem, though.
On the other hand, using `add` and `delete` takes advantage of memcache's “natural” synchronization mechanism, revolving around the existence of cached values. For our purposes, we can consider all non-existent memcache values to be mutexes which are implicitly initialized to the ‘unlocked’ state. Thus the initialization problem disappears completely – it's solved before it even surfaces.
So what about CAS?… In theory, having a mutex (also known as a binary semaphore) enables the implementation of all other synchronization objects, such as counting semaphores. In practice, those are normally better implemented from scratch – using atomic increments and decrements, for example. Once we get to a sufficiently high level – say, around monitors – a need for CAS would likely arise again.
But in general, we do not want to reach this point. Synchronization really is bad, and if you find yourself using more than an occasional simple mutex, you may be asking for trouble in the long run.
I would go with `add`/`delete` instead of `set`. Then you can (and should) set an expiration time for the key, so that if your process dies, the system will not stay locked forever. It might also help with deadlocks.
How many instances of memcached do you use?
If you had a single memcached server, then it is an obvious SPOF – it goes down and you can no longer use locks.
If you had multiple instances, then this locking solution won't work, due to the CAP theorem.
Setting the timeout – as well as namespacing the locks so they don't collide with actual cached items – are details I intentionally omitted. The post has grown sufficiently long even without them :)
As for the number of servers, I assumed the memcache itself is consistent. That generally requires either a single server (which – if I recall correctly – is the case for Heroku's standard memcache, for example) or some magic behind the scenes (which is probably the case with GAE). If this condition cannot be met, the lock may no longer be reliable – which kinda confirms the main point that ‘synchronization is bad’.
I like the point about synchronization being bad :-)
As to consistency – memcached doesn't replicate anything, so there is nothing to be inconsistent. However, for a given key, you might get a value on one node and `None` on another ;-) (by default, the Python memcached library returns `None` when the server is not available).
That’s exactly what I meant by lack of consistency :)
I guess you could work around that by assuming that non-existent values correspond to locked mutexes, and never busy-wait on acquiring them. That severely limits the usefulness of such locks, though.