Friday, August 29, 2014

Dockerfile Golf (or optimizing the Docker build process)

I was working with a friend of mine on his startup Down For Whatever and he wanted to use Docker.

I created a docker farm for him and we're using Centurion for deployments, we signed up for a docker hub account to store images and started pushing code.

A few days later he emailed me saying that he wanted to switch to capistrano for code deployments instead of building a docker image each time because pushing the image to docker hub took too damn long. (upwards of 10 minutes for him at it's worst)

That felt wrong, and kind of dirty. To me, docker is about creating a bundle of code that you can deploy anywhere. Not about creating a bundle of infrastructure that then you can then deploy into.

It was also surprising because I had started off with a Dockerfile based on Brian Morearty's blog post about skipping the bundle install each time you build a docker image. So I didn't think I had a lot of optimization left available to me.

But once we got into Golfing around with the Dockerfile and the .dockerignore file we found massive improvements to be had.

The .dockerignore file

One of the issues that we found out right away was that he had 500M of log files in his app's log directory. So we added log/* to the .dockerignore. We were also picking up the .git directory, the db/*.sqlite3 databases, etc.

Once we removed those we dropped around 600M off of the size of our docker image, and that helped quite a bit.

Here's what we ended up with in the .dockerignore

The Dockerfile

Next we looked into the Dockerfile itself. Here's what we started with.
FROM dfw1/base:latest

USER root
RUN mkdir -p /opt/app/dfw
WORKDIR /opt/app/dfw
ADD Gemfile /opt/app/dfw/Gemfile
ADD Gemfile.lock /opt/app/dfw/Gemfile.lock
RUN chown -R app:app /opt/app/dfw
USER app
RUN jruby -S bundle install --without development test --no-color --path /opt/app/dfw
ADD . /opt/app/dfw
USER root
RUN chown -R app:app /opt/app/dfw
ADD ./container/dfw-supervisor.conf /etc/supervisor.d/dfw.conf
ADD ./container/dfw-nginx.conf /etc/nginx/sites-enabled/dfw.conf
USER app
RUN EXECJS_RUNTIME='Node' JRUBY_OPTS="-X-C" RAILS_ENV=production jruby -S bundle exec rake assets:precompile
USER root 

Yeah, that kind of sucks. So we started some Dockerfile golf.

( The base image that I call is basically "Install a bunch of stuff and then run supervisord as my CMD" )

Optimization Goals

I kept a couple of goals in mind while going through the dockerfile.
  • Create as few layers as possible
    • I actually had this drilled in me from the early days of Docker back when AUFS had a 42 layer limit so I was already grouping my 'RUN' commands and trying to be as sparse as possible elsewhere.
  • There should only be one big layer that isn't cached and that's the layer with the code in it. 
    • We found that with the above dockefile we were pushing a 70 MB layer and then a 100 MB layer and 15 MB layer. That was too many layers and too large. 

What We Learned

Avoid USER switching (if you can)
The first thing that I didn't like was all of the USER switching. Each switch is a layer that needs to be pushed, and though it's small pushing still takes a few seconds each for the transfer and for the layer verification.

In my new dockerfile all of the RUN commands run as root. The application itself runs as the app user thanks to supervisord.

You just need to know where to fix the permissions after the fact.

Sequence Matters!
Look at all the work i'm doing AFTER the line: ADD . /opt/app/dfw

Nothing after that line will ever be cached (because we're building a docker image because we've changed code) even though a lot of it will rarely change.

Move the static ADDs like supervisord config's to the top of the Dockerfile before the un-cacheable add.

After those two changes we're looking better
FROM dfw1/base:latest

RUN mkdir -p /opt/app/dfw
WORKDIR /opt/app/dfw
ADD ./container/dfw-supervisor.conf /etc/supervisor.d/dfw.conf
ADD ./container/dfw-nginx.conf /etc/nginx/sites-enabled/dfw.conf
ADD Gemfile /opt/app/dfw/Gemfile
ADD Gemfile.lock /opt/app/dfw/Gemfile.lock
RUN chown -R app:app /opt/app/dfw
RUN jruby -S bundle install --without development test --no-color --path /opt/app/dfw
ADD . /opt/app/dfw
RUN chown -R app:app /opt/app/dfw
RUN EXECJS_RUNTIME='Node' JRUBY_OPTS="-X-C" RAILS_ENV=production jruby -S bundle exec rake assets:precompile

But there are a few things we can still do.

Try to DRY up the Dockerfile

My "dfw1/base" image has a massive 'RUN' in it that's joined by a bunch of &&'s 
I have some things in this file that could be moved up into that RUN so that it's just 1 layer.

Specifically my mkdir -p /opt/app/dfw.

Sure, not all of the apps that use that base image will use /opt/app/dfw but it doesn't hurt
those apps for it to be there as an empty directory.

Also, since all of my apps will live in /opt/app/ and be run as 'app' I can
take out the first chown -R app:app /opt/app/dfw and have it in the 'base' image as just
chown -R app:app /opt/app (that also helps squash the dockerfile for other apps i build off that image)

Beware the chown!

This one was a huge /facepalm for me.

Look at what i do:
ADD . /opt/app/dfw
RUN chown -R app:app /opt/app/dfw

I HAVE to chown the dfw directory to app:app because Docker ADDs create files with UID 0. 
So app can't write to it's logs or create any cache files, etc.

But think about what we've done.
Layer 1: ADD ~72M of source
Layer 2: touch every file in that 72M of source to update it's user info.

Basically I now have two 72M layers that need to be pushed!

This one is a puzzler to fix. 

I solved it by changing my CMD strategy. 
The only CMD in my images was CMD ["supervisord","-c","/etc/supervisord.conf"]

I switched that to call a shell script:
chown -R app:app /opt/app
supervisord -c /etc/supervisord.conf

Now I could take all the chown's out of my images and we're not duplicating layers unnecessarily.
It also opened up quite a few optimization paths for my general workflow to add other things to that shell script instead of doing it in the layers.

Not everything needs to be done in the Dockerfile

At this point we're really well optimized. However there's one thing that still took quite a long time (as it usually does) and that's precompiling the assets for Rails.

My friend said, "Can't we just do that outside of docker?" I'm not a rails expert by any means, but he is, and if he thought we could do it I'd give it a try.

We were already using a build script to create / push our docker images. It's effectively.:
docker build -t dfw1/dfw:latest . && docker push dfw1/dfw:latest

We're developing on macs using boot2docker, which means that really all of the 'docker build' work is being done inside a virtualbox vm, which isn't the fastest way.

Also, in the container we only have jruby, which takes a bit of time to bootstrap itself.

Running the rake ahead of time cuts down the build time pretty significantly.


In the end we knocked about 9 minutes or more off the build/push of our dockerfiles and my friend was no longer calling for moving to capistrano. Our Dockerfile ended up being really nice and concise

FROM dfw1/base:latest
ADD Gemfile /opt/app/dfw/Gemfile
ADD Gemfile.lock /opt/app/dfw/Gemfile.lock
RUN RAILS_ENV=production /opt/jruby/bin/jruby -S bundle install --without development test --no-color --path /opt/app/dfw
ADD . /opt/app/dfw

It's still not perfect. The push phase still takes way to long, and the 'already pushed' images take a couple seconds each to determine that they've already been pushed.

We've speculated that if the push phase could run in parallel, or just faster we could shave 20 seconds or more off the build.  But as it is, we're content with what we have.

Wednesday, June 25, 2014

PostgreSQL Performance on Docker

While preparing my presentation on Postgres and Docker to the PDX Postgres User Group for the June meeting, I thought it would be cool to run Postgres on Docker through some rough benchmarks. What I ended up finding was fairly surprising.

The Setup

For this test I used Digital Ocean's smallest droplet 512M RAM/1 CPU/20G SSD Disks.
They have an awesome feature where you can chose the type of application you want to run on the droplet and they'll pre-configure it for you. They even have one for Docker, so that's the one I chose.

Here's a list of all of the versions of the software I used.

Host OS: Ubuntu 14.04 LTS
Kernel: 3.11.0-12-generic
Docker version 1.0.1, build 990021a
PostgreSQL Version: 9.3.4

I used pgtune to generate my configuration, i used these same values across all tests:
maintenance_work_mem = 30MB 
checkpoint_completion_target = 0.7
effective_cache_size = 352MB 
work_mem = 2304kB 
wal_buffers = 4MB 
checkpoint_segments = 8 
shared_buffers = 120MB 
max_connections = 200 

Dockerfiles can be found here

The Tests

I used Greg Smith's pgbench-tools to run my tests with the following configuration
SCALES="1 10 100"
SETCLIENTS="1 2 4 8 16"

My plan was to test how much overhead things such as Docker's virtual networking and virtual I/O incurred, and generally how much slower Postgres would be on Docker. So the test that I thought would be representative were:

  • Baseline Postgres on the system w/o Docker
  • Postgres inside Docker with virtual networking and virtual I/O
  • Postgres inside Docker with virtual networking but no virtual I/O
  • Postgres inside Docker with no virtual networking or I/O
After the results of those tests came in I ran another group of tests
  • Baseline Postgres with fsync=on
  • Baseline Postgres with fsync=off
  • Docker Postgres with fsync=on
  • Docker Postgres with fsync=off
After all of those tests were run it occurred to me that the base image I was using for docker was CentOS and my "baseline Postgres" was ubuntu. So i added one final test:
  • Docker Postgres w/Ubuntu base image
I dropped caches, restarted docker and or postgres between each run to ensure that I wasn't dealing with different caches.

EDIT: 2014-07-09

This original post was targeted at DBAs and folks who hadn't heard much of Docker, so I glossed over some of the specifics above. So I'd like to attempt to rectify that.

When I say "With no virtual networking" what I mean is Host networking --net==host
When I say "With no Virtual I/O above" what I mean is that I used a volume 

I didn't add my volume to my Docker image so that I could use the exact same image for all tests.

With regards to the validity of the test on a VM.

I ran the full suite of tests 20 to 30 times (because I didn't believe the results).
It's certainly possible that runs I managed to hit Digital Ocean at Just.The.Right.Time(tm) each time, but i doubt it.

Regardless, I don't claim that these results are perfect. I've posted my methodology and given access to the source. So please reproduce and dispute. Honestly if you found that I did something significantly wrong in these benchmarks that would make them "fit" into more sensible expectations I'd be very grateful.

The Results

The results for the first round of tests were pretty surprising.

What wasn't surprising is that with the virtual I/O and networking Docker performed predictably slower than the baseline.

However, Docker with host networking and no virtual I/O was by far faster. This was very surprising. And I still don't trust these results, despite repeated runs confirming them.

I presented these results to the PDXPUG and received some good feedback on things to try to help validate the results. 

A prevailing theory was that Docker may be messing with/caching/breaking fsync so I ran the test with fsync on and fsync off in Postgres - here are the results:

So dockerized postgres with fsync ON is faster than vanilla postgres with fsync off...
That seems really wrong, however Digital Ocean gives SSDs to their droplets which may mean that the I/O really isn't much of a factor. That would mean that the difference is elsewhere.

Another theory is that Postgres on CentOS is simply faster than Postgres on Ubuntu despite the fact that they're sharing the same Kernel, it's possible that the CentOS C library has some optimizations that the Ubuntu one does not. I ran a test on Dockerized Postgres on Ubuntu vs Vanilla Postgres on the host machine (ubuntu as well). Here's the output from that run:

Dockerized Ubuntu is pretty much on par with Dockerized CentOS. So that's good at least.

Further thoughts and analysis

First, let me state that I wouldn't be surprised at all if I missed some major fact in these tests or if I was somehow tricked by the results. The results don't feel right at all. I've run the tests over and over again and have gotten consistent results, but that doesn't mean that there isn't some sort of environmental explanation. Perhaps the Digital Ocean droplet itself is biased towards Docker due to something underneath the covers. 

But, let's suppose for a minute that these results are legit and reproducible elsewhere how could we explain it? My theory is that since Docker is a wrapper around LXC that it's possible that LCX has a more efficient execution path within the Kernel and is allowing the Dockerized Postgres to use more resources with less interruptions. 

I have some support for that, during my tests I was also gathering low level system statistics and found that for Dockerized Postgres there are significantly fewer context switches than with normal Postgres.

Dockerized Postgres

Normal Postgres

The Conclusion

It's definitely a surprising result, and one that goes against my expectations. Obviously further research is needed to confirm the result.

The thought that LXC somehow has an optimized path within the kernel is very satisfying, but at the same time there are too many variables to know for sure.

I look forward to suggestions for improving my tests and any conflicting evidence from your own tests. 

Wednesday, May 7, 2014

Stackdriver Acquired by google

Grats to Stackdriver on their Acquisition by Google.

Now if they'd like to bring me, and my Stackdriver Pluggable Custom Metrics along. That'd be just peachy.

Web Statistics