Scaling Memcached clusters is kind of a pain. Because it’s consistently hashed, clients all need to know about new cache instances and you can’t scale up or down too quickly without blowing your entire cache.
I took a couple evenings after Mesosphere’s MesosCon to solve this problem by building out a Memcached auto-scaling framework for Mesos. It’s a simple implementation that schedules Memcached servers and watches cpu and memory utilization across the cluster. When thresholds are breached, it intelligently schedules another Memcached instance to join the cluster.
Understanding Mesos Frameworks
There’s some documentation on this in the official Mesos app framework development guide, but it leaves you a lot to figure out on your own. At this point, reading the source code, particularly the protocol buffers definitions, is the most useful place to start to understand the interplay between components.
The scheduler is the brains of the outfit. It receives resource offers from Mesos and decides whether or not to accept them and how many resources to grab.
The executor runs on Mesos agents and launches tasks when the scheduler tells it to. They can unreliably report back to the scheduler using framework messages as needed.
A framework consists of an implementation of both a scheduler and an Executor that interplay with Mesos in a message passing kind of workflow.
The APIs for each are a little complicated because of the diverse needs that a framework might have. I expect that more tools will be built with useful abstractions for developing frameworks on these APIs (like Netflix’s Fenzo), but it’s still a pretty limited ecosystem.
I wanted the first version to be relatively simple so I didn’t want to depend on other tools. Because of that, it needed to handle status monitoring, reporting, and provide some service discovery mechanism.
The scheduler starts by scheduling a single task and starting a web-service that reports the locations of running tasks Memcached instances. Services consuming Memcached should use this endpoint to setup their local Memcached client server set. The scheduler receives resource offers, decides whether it should start another instance, and if so, schedules configurable amounts of CPU, RAM and a single port for the new task.
When the executor gets word that it should start a task, it starts a Memcached Docker container and bridges the allocated port into the container. It also continuously watches the resource utilization of the container and reports it back to the scheduler.
The scheduler watches the global state of the cluster and when average resources get too high, it allocates more of them upon receiving new resource offers.
Because Memcached uses a consistent hashing algorithm to route requests, there are negative consequences to making drastic changes to the cluster size. Right now there’s a basic throttle on starting new instances (obviously this algorithm could be significantly improved). There are also tools like Facebook’s Mcrouter that could be used by the framework to help drive scheduling and lessen the impact of cluster changes.
Get involved. Feedback welcome!
Need help getting started with Mesos? Reach out to Must Win.
We handle deployment, management and custom framework development.
Mike Ihbe is a founding partner of The Must Win All Star Web & Mobile Consultancy.
Mike is an expert in a dizzying array of technologies and has loads of experience managing fast-moving dev teams, designing systems, and scaling large applications. When not on the job, he can be found cooking delicious meals on ski slopes in exotic locales.