Wednesday, June 25, 2014

PostgreSQL Performance on Docker

While preparing my presentation on Postgres and Docker to the PDX Postgres User Group for the June meeting, I thought it would be cool to run Postgres on Docker through some rough benchmarks. What I ended up finding was fairly surprising.

The Setup

For this test I used Digital Ocean's smallest droplet 512M RAM/1 CPU/20G SSD Disks.
They have an awesome feature where you can chose the type of application you want to run on the droplet and they'll pre-configure it for you. They even have one for Docker, so that's the one I chose.

Here's a list of all of the versions of the software I used.

Host OS: Ubuntu 14.04 LTS
Kernel: 3.11.0-12-generic
Docker version 1.0.1, build 990021a
PostgreSQL Version: 9.3.4

I used pgtune to generate my configuration, i used these same values across all tests:
maintenance_work_mem = 30MB 
checkpoint_completion_target = 0.7
effective_cache_size = 352MB 
work_mem = 2304kB 
wal_buffers = 4MB 
checkpoint_segments = 8 
shared_buffers = 120MB 
max_connections = 200 

Dockerfiles can be found here

The Tests

I used Greg Smith's pgbench-tools to run my tests with the following configuration
SCALES="1 10 100"
SETCLIENTS="1 2 4 8 16"

My plan was to test how much overhead things such as Docker's virtual networking and virtual I/O incurred, and generally how much slower Postgres would be on Docker. So the test that I thought would be representative were:

  • Baseline Postgres on the system w/o Docker
  • Postgres inside Docker with virtual networking and virtual I/O
  • Postgres inside Docker with virtual networking but no virtual I/O
  • Postgres inside Docker with no virtual networking or I/O
After the results of those tests came in I ran another group of tests
  • Baseline Postgres with fsync=on
  • Baseline Postgres with fsync=off
  • Docker Postgres with fsync=on
  • Docker Postgres with fsync=off
After all of those tests were run it occurred to me that the base image I was using for docker was CentOS and my "baseline Postgres" was ubuntu. So i added one final test:
  • Docker Postgres w/Ubuntu base image
I dropped caches, restarted docker and or postgres between each run to ensure that I wasn't dealing with different caches.

EDIT: 2014-07-09

This original post was targeted at DBAs and folks who hadn't heard much of Docker, so I glossed over some of the specifics above. So I'd like to attempt to rectify that.

When I say "With no virtual networking" what I mean is Host networking --net==host
When I say "With no Virtual I/O above" what I mean is that I used a volume 

I didn't add my volume to my Docker image so that I could use the exact same image for all tests.

With regards to the validity of the test on a VM.

I ran the full suite of tests 20 to 30 times (because I didn't believe the results).
It's certainly possible that runs I managed to hit Digital Ocean at Just.The.Right.Time(tm) each time, but i doubt it.

Regardless, I don't claim that these results are perfect. I've posted my methodology and given access to the source. So please reproduce and dispute. Honestly if you found that I did something significantly wrong in these benchmarks that would make them "fit" into more sensible expectations I'd be very grateful.

The Results

The results for the first round of tests were pretty surprising.

What wasn't surprising is that with the virtual I/O and networking Docker performed predictably slower than the baseline.

However, Docker with host networking and no virtual I/O was by far faster. This was very surprising. And I still don't trust these results, despite repeated runs confirming them.

I presented these results to the PDXPUG and received some good feedback on things to try to help validate the results. 

A prevailing theory was that Docker may be messing with/caching/breaking fsync so I ran the test with fsync on and fsync off in Postgres - here are the results:

So dockerized postgres with fsync ON is faster than vanilla postgres with fsync off...
That seems really wrong, however Digital Ocean gives SSDs to their droplets which may mean that the I/O really isn't much of a factor. That would mean that the difference is elsewhere.

Another theory is that Postgres on CentOS is simply faster than Postgres on Ubuntu despite the fact that they're sharing the same Kernel, it's possible that the CentOS C library has some optimizations that the Ubuntu one does not. I ran a test on Dockerized Postgres on Ubuntu vs Vanilla Postgres on the host machine (ubuntu as well). Here's the output from that run:

Dockerized Ubuntu is pretty much on par with Dockerized CentOS. So that's good at least.

Further thoughts and analysis

First, let me state that I wouldn't be surprised at all if I missed some major fact in these tests or if I was somehow tricked by the results. The results don't feel right at all. I've run the tests over and over again and have gotten consistent results, but that doesn't mean that there isn't some sort of environmental explanation. Perhaps the Digital Ocean droplet itself is biased towards Docker due to something underneath the covers. 

But, let's suppose for a minute that these results are legit and reproducible elsewhere how could we explain it? My theory is that since Docker is a wrapper around LXC that it's possible that LCX has a more efficient execution path within the Kernel and is allowing the Dockerized Postgres to use more resources with less interruptions. 

I have some support for that, during my tests I was also gathering low level system statistics and found that for Dockerized Postgres there are significantly fewer context switches than with normal Postgres.

Dockerized Postgres

Normal Postgres

The Conclusion

It's definitely a surprising result, and one that goes against my expectations. Obviously further research is needed to confirm the result.

The thought that LXC somehow has an optimized path within the kernel is very satisfying, but at the same time there are too many variables to know for sure.

I look forward to suggestions for improving my tests and any conflicting evidence from your own tests. 
Web Statistics