A Handy Guide to the Mesos-Kubernetes-Swarm Jungle
MUST WIN BLOG

How to choose the best container orchestrator for you

As Marc Andreessen said, “Software is eating the world,” so companies of every age, size, and industry are now turning into software companies.

As these companies’ software organizations mature, they will need to change the way they think about their software architecture, their teams, and the platform used to run their software.

Having one big “mainframe” running your whole application in a dusty datacenter worked well in the 70s, but now enterprises are running high-availability software on-premises and in the cloud, all while trying to enable their agile teams to deliver quickly.

In the old days, software teams ran huge applications with millions of lines of code on those big machines. Now more and more companies are adopting a “divide and conquer” strategy where, instead of having one “Monolith,” your application is divided into tens or even hundreds of smaller applications. We call these small applications microservices.

Running that many microservices can be complicated, so managing them requires an orchestrator (also sometimes called a scheduler) for distributing services around your datacenter. There are lots of choices for schedulers.

This blog post will talk about what a scheduler is and how to pick the one that best fits the needs of your team.

Microservices are Awesome

There are many articles out there about what a microservice is and why you should adopt them. So instead of a lengthy explanation of why they are Awesome, we’ll just highlight some of their best characteristics:

  • Microservices have smaller codebases that are faster to build and easier to test and maintain
  • Microservices allow standardizing tools across the org
  • Microservices reduce risk by separating the mission critical parts of your code from others that have lower priority

So, You’ve decided you want to “Do Microservices”

You have done a beta test with microservices, and your team has deployed its first or second microservice in your datacenter. Slowly you’re realizing that these deployments are not going as fast as you would like…

What slows down a microservice deployment

The traditional provisioning cycle can take up to 6 weeks. This “Confidential” slide comes from this presentation

a. Having to deal with Dependencies and Base images

If you want to run a service, you need a machine to run it on. This machine needs not only access to the code it has to run, but to the whole environment. This machine needs a Java/Ruby/PHP environment, all the packages/gems/modules and any static libraries your code relies on.

While tools like Packer simplify creating this base image, having to maintain multiple different images can be time consuming for your org.

b. Having to constantly update your Infrastructure

Every time you change something about your environment, either a certificate, a credential or a dependency, you need to propagate these changes to your whole infrastructure.

Tools like Terraform, Chef and Puppet simplify keeping your infrastructure up to date, but can create a dependency on your cloud provider (AWS, Azure, Google Compute Engine, etc.).

c. Having and maintaining multiple availability zones is not easy

To guarantee your application’s uptime, it’s important to have redundant services. Having multiple availability zones is now a requirement for any large software company. But even with Infrastructure as Code this is not easy, and keeping all zones in sync can be an arduous chore for even the best development and operations teams.

Containers to the rescue

A Microservice can be run in multiple ways, either on a machine alongside your existing app (creating a single point of failure, and a maintenance nightmare), on a separate machine (more expensive), or in a container on an orchestrator, which provides a standard deployment strategy for all your applications.

A container is a lightweight virtualization method: instead of creating a whole virtual machine, it packages only your application code and its environment into an image. This allows a single machine to run multiple containers without having to deal with each app’s environment or dependencies. Popular container technologies include Docker, LXC, and rkt.

Running applications in containers provides the most flexibility, both optimizing resource usage and allowing services to scale up or down as required.

Using Schedulers to manage containers instead of machines

Schedulers come in different colors and sizes from noble Kubernetes Ships, Nomad groups, to Docker swarms.

The most popular schedulers are:

  • Mesos (Mesosphere)
  • Kubernetes (Google)
  • Nomad (HashiCorp)
  • Docker Swarm (Docker)
  • Cloud Foundry (Pivotal, HP, IBM, Others)
  • OpenShift (Red Hat)

A scheduler is the ship where your application (illustrated as a giraffe here) lives.

How are they Different?

While most schedulers run Docker (or rkt) containers, they differ in two main ways: strategies and features.

Strategies:

  • Monolithic: Hadoop YARN, Docker Swarm

The Monolithic strategy has a single point of coordination that knows the full state of the entire system and makes smart allocations. These are typically pretty static systems, like YARN, which allows running things like Hadoop and Spark on the same hardware, so there aren’t a lot of moving parts.
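To make the idea concrete, here is a minimal sketch of a monolithic scheduler. It is a toy, not how YARN or Docker Swarm are actually implemented: the `Node` and `MonolithicScheduler` names and the "most free CPU wins" rule are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical view of one machine in the cluster.
    name: str
    free_cpus: int
    tasks: list = field(default_factory=list)


class MonolithicScheduler:
    """Single coordinator that sees every node and makes every placement."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def place(self, task_name, cpus):
        # Because the scheduler knows the full cluster state, it can compare
        # all nodes at once; here "best" simply means the most free CPU.
        candidates = [n for n in self.nodes if n.free_cpus >= cpus]
        if not candidates:
            return None  # no node has room for this task
        best = max(candidates, key=lambda n: n.free_cpus)
        best.free_cpus -= cpus
        best.tasks.append(task_name)
        return best.name


scheduler = MonolithicScheduler([Node("node-a", 4), Node("node-b", 2)])
print(scheduler.place("web", 3))     # only node-a is big enough
print(scheduler.place("worker", 2))  # node-a is nearly full, so node-b
print(scheduler.place("batch", 2))   # nothing fits any more, so None
```

Every placement goes through the one `place` method, which is what makes the design simple to reason about and also what makes it a single point of coordination.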

  • Two-Level (Pessimistically concurrent): Mesos

Two-level schedulers are a little more of a free-for-all: a central authority offers resources to each framework running on it, and those frameworks accept resources as they see fit. So if we’re using Mesos as a scheduler, then as a consumer we’d actually talk to something like Marathon, a framework for starting long-running processes.
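The offer-and-accept flow can be sketched in a few lines. This is a simplified illustration and not the Mesos API: the `Master` and `Framework` classes, their methods, and the CPU-only resource model are all hypothetical.

```python
class Framework:
    """Hypothetical framework (think of something in the role of Marathon)."""

    def __init__(self, name, cpus_per_task, tasks_wanted):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.tasks_wanted = tasks_wanted

    def consider(self, offered_cpus):
        # The framework, not the central authority, decides how much of the
        # offer to use. Returns the number of CPUs it accepts.
        launchable = min(self.tasks_wanted, offered_cpus // self.cpus_per_task)
        self.tasks_wanted -= launchable
        return launchable * self.cpus_per_task


class Master:
    """Central authority: tracks free resources and hands out offers."""

    def __init__(self, free_cpus):
        self.free_cpus = free_cpus

    def offer_round(self, frameworks):
        # Pessimistic concurrency: the free resources are offered to one
        # framework at a time, so two frameworks never claim the same CPU.
        accepted = {}
        for fw in frameworks:
            used = fw.consider(self.free_cpus)
            self.free_cpus -= used
            accepted[fw.name] = used
        return accepted


master = Master(free_cpus=8)
services = Framework("services", cpus_per_task=2, tasks_wanted=3)
batch = Framework("batch", cpus_per_task=1, tasks_wanted=5)
print(master.offer_round([services, batch]))
print(batch.tasks_wanted)  # batch tasks still waiting for a later offer
```

Note that the master never looks inside a framework's queue. It only knows what is free, which is the split of responsibilities the "two-level" name refers to.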

  • Shared state (Optimistically Concurrent): Nomad, Kubernetes

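Then there are shared state schedulers, which start containers the fastest. Each scheduler works from its own view of the cluster state and commits placements optimistically, retrying if another scheduler got there first. Nomad has a benchmark in which it starts 1 million containers on 5,000 nodes in 5 minutes. Pretty amazing.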
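Optimistic concurrency is easier to see in code than in prose. The sketch below is a toy model, not Nomad's or Kubernetes' implementation: `SharedClusterState`, `schedule`, and the single cluster-wide version counter are assumptions for illustration (real systems detect conflicts at a much finer grain).

```python
class SharedClusterState:
    """Hypothetical shared state that any number of schedulers can read."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)
        self.version = 0

    def snapshot(self):
        # Schedulers plan against a private copy, without taking a lock.
        return self.version, dict(self.free_cpus)

    def commit(self, seen_version, node, cpus):
        # Reject the placement if the state changed since the snapshot
        # was taken, or if the node no longer has enough room.
        if seen_version != self.version or self.free_cpus[node] < cpus:
            return False
        self.free_cpus[node] -= cpus
        self.version += 1
        return True


def schedule(state, cpus, max_attempts=3):
    # Plan on a snapshot, try to commit, and retry on conflict.
    for _ in range(max_attempts):
        version, view = state.snapshot()
        fits = [node for node, free in view.items() if free >= cpus]
        if not fits:
            return None
        node = max(fits, key=view.get)
        if state.commit(version, node, cpus):
            return node
    return None


state = SharedClusterState({"node-a": 4, "node-b": 4})
v1, _ = state.snapshot()  # scheduler 1 takes a snapshot
v2, _ = state.snapshot()  # scheduler 2 takes one at the same moment
print(state.commit(v1, "node-a", 3))  # first writer wins
print(state.commit(v2, "node-a", 3))  # stale snapshot, so it is rejected
print(schedule(state, 3))             # a retry on fresh state succeeds
```

Nobody waits for an offer or a lock, which is where the speed comes from. The price is that some work is thrown away and redone when two schedulers collide.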

For most applications, the strategy is not important, so most of us will focus on features instead.

Features:

Scheduler Feature Matrix

Enterprise support

Enterprise support is for the most part available from all providers, with the exception of Google. Google’s Kubernetes support is only available for its GCE cloud offering, but some third-party providers offer support.

Multi-Datacenter / Availability Zones Support

Not all schedulers support multiple availability zones.

Bare Metal

Most schedulers, with the notable exception of Cloud Foundry, can be installed on “bare metal,” meaning physical machines inside your datacenter. This can save you big on hypervisor licensing fees.

Volume Mounts

Volume mounts allow you to persist data across container deployments. This is a key differentiator depending on your applications’ needs. Mesos is the leader here, and Kubernetes is slowly catching up.

Secrets Management

Secrets Management is a big piece of configuration orchestration that doesn’t just go away when you start using a scheduler. Docker and Mesos don’t have built-in solutions here, but everyone else does at this point.

How to Choose?

Here is where it gets tricky. While all solutions run containers and scale well, you need to pick the one that best fits your engineering team.

Features

The feature differences might seem small, but they end up being very significant depending on your application needs. Missing or overly restrictive support for things like persistent volumes and private Docker registries is often a showstopper for some platforms.
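One way to turn this into a repeatable exercise is to write your must-have features down and filter the candidates against them. The sketch below is only an illustration: the `FEATURES` table encodes just two of the claims made in this post (built-in secrets management and bare-metal installs), `None` marks anything the post does not state, and the `shortlist` helper is hypothetical. Verify every entry against current vendor documentation before relying on it.

```python
# True / False reflect claims made in this post; None means the post does
# not say, so the feature still has to be checked for that scheduler.
FEATURES = {
    "Mesos":         {"bare_metal": None,  "builtin_secrets": False},
    "Kubernetes":    {"bare_metal": None,  "builtin_secrets": True},
    "Nomad":         {"bare_metal": None,  "builtin_secrets": True},
    "Docker Swarm":  {"bare_metal": None,  "builtin_secrets": False},
    "Cloud Foundry": {"bare_metal": False, "builtin_secrets": True},
    "OpenShift":     {"bare_metal": None,  "builtin_secrets": True},
}


def shortlist(required):
    """Split schedulers into 'meets everything' and 'needs verification'."""
    keep, verify = [], []
    for name, feats in FEATURES.items():
        values = [feats.get(feature) for feature in required]
        if False in values:
            continue  # a must-have is known to be missing: showstopper
        if None in values:
            verify.append(name)  # not ruled out, but not confirmed either
        else:
            keep.append(name)
    return keep, verify


keep, verify = shortlist(["bare_metal", "builtin_secrets"])
print("Meets all requirements:", keep)
print("Needs verification:", verify)
```

Treating a known gap as an immediate rejection mirrors how a showstopper works in practice: one missing must-have outweighs any number of nice-to-haves.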

Compliance Needs

Do you have strict auditability or compliance needs? Only Cloud Foundry provides any kind of user/permission model, and it might be too basic for your requirements.

Momentum / Support

As schedulers become more and more popular, more options keep appearing every day.

Maturity

Cloud Foundry — 2011
Mesosphere — 2013
Kubernetes — 2014
Nomad — 2015

Need help choosing one?

MustWin actively contributes to many of these projects and communities. We’d be happy to help you create requirements for your app and evaluate, test and deploy schedulers for your company.


Gonzalo Maldonado is a Tech Lead at The Must Win All Star Web & Mobile Consultancy.

Gonzalo is a customer-focused engineer who has helped teams scale from 5 to 500+. He has experience with everything from mobile apps, to DevOps, to a myriad of web frameworks. After hours Gonzalo likes hunting for vinyl, running, and grilling.