Tuesday, November 19, 2013

Tune your Postgres RDS instance via parameter groups

It seems like the folks at Amazon set some strange defaults for their RDS Postgres instances, and they make it pretty darn difficult to allow for dynamically sized instances.

You tune your Postgres RDS instance via Parameter Groups. The parameter group configuration contains all of the normal PG tuning parameters from your postgresql.conf.

They provide you with a variable, {DBInstanceClassMemory}, which returns the memory in bytes available to the instance, and you can use it in some limited ways to dynamically set parameters based on the instance type you chose for your RDS database. There may be more of these variables, but I wasn't able to find them.
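As an illustration, here's roughly how you'd set a parameter to one of these formulas from the command line (using the shared_buffers default, covered below). This is just a sketch assuming the current aws CLI and a hypothetical parameter group named my-pg-params; the console works just as well:

    # my-pg-params is a made-up parameter group name -- substitute your own
    aws rds modify-db-parameter-group \
      --db-parameter-group-name my-pg-params \
      --parameters "ParameterName=shared_buffers,ParameterValue={DBInstanceClassMemory/32768},ApplyMethod=pending-reboot"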

One commenter pointed out that DBInstanceClassMemory is possibly not the entire memory of the machine. So, for example, DBInstanceClassMemory on an m1.xlarge would not be 16106127360 (15GB); instead they lower it to take into account the memory allocated to the OS. I hope this will be changed in the future, since most Postgres tuning guides are based on total system memory, not memory minus some arbitrary OS allowance.

It'd be easy to just hop in and hardcode your values, but that's not really very fun now, is it? Besides, we want to be able to grow our instances or spin up smaller / larger read replicas without defining a Parameter Group for each size we might want. So I'm endeavoring to make my parameter group as dynamic as possible.

I spun up an m1.xlarge RDS instance and did some playing around.

Before we get started: one thing to note is that you don't want any spaces between the {}'s when doing calculations, otherwise AWS will respond with a format error. It also seems to have problems with multiple divisions or multiplications; sometimes it accepts them, but often it doesn't.
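For example, the spacing rule in practice (from my experiments; your mileage may vary):

    {DBInstanceClassMemory/32768}         accepted
    { DBInstanceClassMemory / 32768 }     rejected with a format error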

Below is a list of configuration parameters that I usually set in all my databases (which I base on PgTune). Naturally you'd want to set these based on your needs, but I find that PgTune gets you 90% of the way there most of the time.

Edit: As some commenters have pointed out, I got a few of the calculations wrong by not taking into account that RDS uses kB for some parameters, among other reasons (I actually suspect that Amazon changed how they calculate these parameters). I've updated the values below with the correct / current information.

max_connections 

We all know that max_connections is best expressed as a function of the # of CPUs; however, Amazon went with this formula: {DBInstanceClassMemory/12582880}, which on the m1.large would leave you with 604 connections... Unless Amazon added a connection pooler to Postgres, I don't see how that could possibly work.

Folks migrating from MySQL or Oracle could easily get tripped up by not paying proper attention to that parameter.

I usually keep max_connections at around 200, just because I usually deal with Ruby/Java where I need to oversubscribe a bit.
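To hardcode that, something like this should do it (same hypothetical group name as before; max_connections needs a reboot to take effect):

    aws rds modify-db-parameter-group \
      --db-parameter-group-name my-pg-params \
      --parameters "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot"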


shared_buffers 

The default here is {DBInstanceClassMemory/32768}, which is correct. Originally it didn't seem to calculate correctly, so I had recommended a change.
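To see why it's correct, assuming shared_buffers (like effective_cache_size below) is counted in 8kB blocks:

    DBInstanceClassMemory/32768 blocks * 8192 bytes/block = DBInstanceClassMemory/4

In other words, the classic 25% of memory.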


checkpoint_segments

The default # of checkpoint segments is 16, which is a pretty good base; however, if you're running a web application (which you probably are if you're using RDS), PgTune recommends that you set it to 8.


work_mem

work_mem is a little tricky. PgTune and I usually base this value on max_connections. However, I haven't managed to find a way to reference one parameter from another within an RDS parameter group; for example, {max_connections} doesn't work.

So you'll need to hardcode something like your connection count in there.

So work_mem: {DBInstanceClassMemory/204800}  # 200 (max_connections) * 1024 = 204800

On an m1.xlarge that would give you ~74MB, which is what pgtune would give you on the same instance.
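Spelled out, assuming work_mem is measured in kB here:

    DBInstanceClassMemory/204800 = DBInstanceClassMemory/(200 * 1024) kB
    16106127360/204800 ≈ 78643kB ≈ 77MB on a full 15GB

The slightly lower ~74MB you actually see reflects the OS deduction mentioned earlier.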


maintenance_work_mem

For this parameter RDS just uses the Postgres default, which is 16MB, so creating indexes on large tables would end up taking a few years.

I try to keep this one around 1GB if the machine has enough memory. A nice rule of thumb, though, is system memory (in bytes) / 16 (which is what pgtune generates).

We need to take into account the fact that RDS deals in kB for this parameter, so: {DBInstanceClassMemory/16384}  # 16 * 1024

This gives us 924MB on an m1.xlarge.
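If you want to confirm what the instance actually picked up after a reboot, you can ask Postgres directly. A sketch, with a made-up endpoint, user, and database:

    psql -h mydb.example.us-east-1.rds.amazonaws.com -U myuser -d mydb \
      -c "SHOW maintenance_work_mem;"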


effective_cache_size

Effective cache size is generally tricky on AWS, since even with provisioned IOPS you're getting only slightly better than thumb-drive speed out of your disks. So you may benefit from a smaller effective_cache_size.

As commenter Alexandre Russel pointed out, RDS's unit of measure for effective_cache_size is 8kB blocks.

The effective_cache_size default on RDS is {DBInstanceClassMemory/16384}. In 8kB blocks that works out to DBInstanceClassMemory/16384 * 8192 = DBInstanceClassMemory/2, or in other words 50% of memory.

To get it closer to the 75% we would normally use, we can multiply by 75/(100*8192): {DBInstanceClassMemory*75/819200}, which brings us up to a little over 10GB for an m1.xlarge.
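Spelling out the blocks math:

    DBInstanceClassMemory*75/819200 = DBInstanceClassMemory*75/(100 * 8192)
                                    = (0.75 * DBInstanceClassMemory)/8192

i.e. 75% of memory, expressed in 8kB blocks.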


wal_buffers

RDS uses the Postgres default here of 2048 bytes. I usually set this to 512 * checkpoint_segments, so you may need to just hardcode it at 4096.


checkpoint_completion_target

Again, there's no good way to set this one dynamically. Fortunately it's pretty fixed at 0.7 for a web workload.
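Since these last few values end up hardcoded anyway, you can batch them into one call. Again a sketch, assuming the current aws CLI and the same hypothetical parameter group:

    # checkpoint_completion_target is dynamic; pending-reboot is the
    # safe ApplyMethod for the other two
    aws rds modify-db-parameter-group \
      --db-parameter-group-name my-pg-params \
      --parameters \
        "ParameterName=checkpoint_segments,ParameterValue=8,ApplyMethod=pending-reboot" \
        "ParameterName=wal_buffers,ParameterValue=4096,ApplyMethod=pending-reboot" \
        "ParameterName=checkpoint_completion_target,ParameterValue=0.7,ApplyMethod=immediate"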
