Docker: evil spawn or useful tool

2016-01-30 5-minute read

There are plenty of criticisms of Docker, the system for building a container-based virtual machine that runs just a single application. I’ve read many of them and have consistently been either in agreement or at least amused.

The most relevant criticism is about the basic approach of building single-application virtual machines. To understand this criticism, let’s remember - in traditional application deployment there are at least three distinct jobs:

  • Distribution developers - the team that integrates all the various packages available into a coherent and well-functioning system and, importantly, monitors the upstream of all those packages to ensure that security fixes are properly packaged and made easily available for installation. These people don’t give a shit about your PHP parse error and don’t really care if you can’t figure out the command to create a new Postgres database.
  • System administrators - these are the people who ensure that the security updates provided by the distribution developers get installed on a regular basis, and if something breaks during this process, they are the ones who fix it. They know how to create your new Postgres database. They don’t really care about your PHP parse error either, but they may get roped in to tell you that it’s a PHP parse error.
  • Application developers - these are the people that care about the PHP parse error. They also know how to create beautiful things that end users can interact with.

These three groups have lived in happy tension for years.

Now we have containers.

The problem with containers is that suddenly the system administrators are out entirely and the distribution developers’ role may be dramatically minimized or circumvented altogether. Meanwhile, the application developer is liberated from any constraints imposed by either the system administrators or the distribution developers (hooray!) but is also saddled with the enormous responsibilities of those two roles - responsibilities that may not be apparent to the application developer at first.

I have practically no distribution development experience, but I do have considerable experience as a system administrator and an application developer. And, it took me months to sort through how to properly develop and deploy an application using Docker in a way that I thought was responsible and secure.

I started by learning about how Docker wants you to deploy images - by downloading them from its shared registry. As I mentioned, I have very little experience in the realm of distribution development, but I at least know enough about Debian to know that a lot of time and thought has gone into cryptographically verifying the packages that I install, which apparently is not done at all with Docker images. Is that obvious to everyone using Docker??

Fortunately, you can work around this problem by creating your own base image, which is trivially easy. So, now I build all of my images, from scratch, locally. That helps put the Debian developers back into the mix.
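In practice that boils down to a couple of commands along these lines - a rough sketch only, where the mirror, the working directory and the image name (local/debian:jessie) are placeholders of my own, not anything Docker prescribes:

    # Build a minimal Debian jessie root filesystem locally with debootstrap
    # (no registry download involved).
    sudo debootstrap --variant=minbase jessie ./jessie-rootfs http://ftp.debian.org/debian

    # Import that root filesystem as a local base image; the name is arbitrary.
    sudo tar -C ./jessie-rootfs -c . | sudo docker import - local/debian:jessie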

Next, I started looking at Dockerfiles that I could use to construct my images and discovered something else troubling. Take, for example, the official nginx Docker image. It is based on Debian Jessie (hooray - our distribution developers are in the mix!). However, it then proceeds to install nginx from the nginx repository, not the Debian repository. Well, I guess if you are nginx you want to have full control, but still, the Debian developers’ version of nginx has been vetted to ensure it works with Debian Jessie, so you are really losing something here.

So… my next break from Docker convention is to use only Debian packages in my Dockerfiles.
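To illustrate, a Dockerfile in that spirit might look like this - a sketch rather than the exact file I deploy, and it assumes the locally imported base image from above:

    # Start from the locally built base image, not one pulled from the registry.
    FROM local/debian:jessie

    # Install nginx from the Debian jessie repository rather than from nginx.org.
    RUN apt-get update \
     && apt-get install -y nginx \
     && rm -rf /var/lib/apt/lists/*

    # Run nginx in the foreground so the container stays up.
    CMD ["nginx", "-g", "daemon off;"]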

Once my images were built and my application was tested and running, I was done, right?

Wrong. Well, I would have been done if I didn’t care about upgrading my systems, backing up my data or running cron jobs. Remember those things? Those are things that distribution developers and system administrators have been perfecting for decades. And, they are not trivial.

I then built a series of scripts to help alert me when an image I am using has updates (you can’t just use cron-apt any more, since cron isn’t running in your container) and to help me update the image and deploy it to all of my applications (which involves restarting the application). Backing up data is a whole different can of worms - sometimes it involves interacting with your container (if it’s a database container, you have to launch a database dump), and sometimes it just means copying files from the host, assuming you got the right Docker volume strategy (which took me days to fully understand). Lastly, I had to run a cron job from the host that then runs whatever commands are needed in each of my containers.
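To give a flavour of that last piece, a host crontab along these lines covers a database dump and a crude stand-in for cron-apt - the container names, database name and backup path are placeholders of mine, not anything Docker provides:

    # Host crontab - the containers themselves run no cron daemon.

    # 02:30 - dump the database by exec'ing into the database container.
    30 2 * * * docker exec app_db pg_dump -U postgres appdb | gzip > /srv/backups/appdb-$(date +\%F).sql.gz

    # 06:00 - list pending package updates inside the web container;
    #         any output gets mailed by cron (assuming mail is set up).
    0 6 * * * docker exec app_web sh -c 'apt-get update -qq && apt-get -s dist-upgrade' | grep '^Inst'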

This was complicated.

In the end, was it worth it? Yes, I think so. However, not because it was simple, which seems to be the Docker mantra. I think it’s worth it because I can run 100 instances of my application using significantly fewer resources than when I was using full virtualization, and because I can more easily and flexibly adjust the resource allocation. However, check in with me in a year and I’ll probably have a different opinion.