Thursday, January 15, 2015

Introducing MrTuner and RDSTune

Two simple gems to help tune your Postgres databases.

I'm a big fan of PgTune. I think that in many cases you can run PgTune and set-it-and-forget-it for your Postgres parameters. I like it so much that I often wish I had access to it in my code - especially when working with Puppet to provision new databases servers.

When I started looking into RDS Postgres a while back I realized that the default configuration for those instances was lacking and I really wished I could run PgTune on the RDS instances.

It was to solve those problems above that these two projects formed.

  • RDSTune will create a MrTuner-ized RDS Parameter Group 
  • MrTuner is a Ruby gem that follows in the sprit of PgTune if not directly in it's footsteps.

Both will run from the command line but, more importantly, they can be `required` by your ruby projects to allow you to access these values programmatically. 

Both Gems are available on rubygems and source, examples, configuration and docks available at their respective bitbucket pages.

RDSTune - https://bitbucket.org/davidkerr/rdstune
MrTuner - https://bitbucket.org/davidkerr/mrtuner

Feedback and Pull requests very welcome!

10 comments:

  1. Thank you for the RDSTune link. Used it today and it helped immensely.

    I do have a Postgres RDS question that I cannot seem to find an answer for and am hoping you might know.

    I have a very large de-normalized table (250M records) that is non-volatile (doesn't change). To increase performance I made this table unlogged.

    Once I populated all my data I then did an RDS Snapshot. Unfortunately, today I needed to restore that snapshot and when I did none of the data was there.

    Does Amazon's RDS Snapshot logic clear out all UNLOGGED declarations in a Postgres instance?

    ReplyDelete
    Replies
    1. Glad they were useful!

      Sorry to hear about your data - I found this bit in the PG docs to be very insightful.
      "an unlogged table is automatically truncated after a crash or unclean shutdown."

      A snapshot is effectively a server crash, the system relies on the WAL to replay to bring the system back into consistency.

      I'd suggest that instead of going with unlogged for loading the data. To, instead, drop all of the indexes and use \copy to get the data in there, that should be fairly quick and should survive a crash.

      Delete
    2. Yes, it definitely seems as if the RDS snapshot is not the basic "VM" snapshot and is considered a "risk event". This is a bummer as I was really hoping it'd be a nice redundancy strategy.

      Are you currently using Postgres RDS in production? I am currently evaluating the possibilities and am trying to see what RDS can do compared to XC/XL. And with 250M+ records... I'm really not seeing a big improvement over a standard EC2 implementation.

      We're actually an Oracle shop but since we have spatial query requirements I figured I would see what RDS+PostGis could do. Any recommendations besides the usual?

      Thanks again!

      Delete
    3. Even a VM snapshot will perform instance recovery on startup. Also most people don't bother to throw postgres into hot backup mode.

      I'm not sure if unlogged tables will get truncated in the event of recovering from a Hot Backup.

      That might be an interesting experiment throw PG into hot backup mode, make a snapshot and then take it out of hot backup and see if your data is still present.

      I'm in the process of migrating from EC2 to RDS for my Postgres instances.
      There are a number of reasons for that. Automatic Failover being a big one.

      I actually haven't used PostGIS, and so far I'm treating RDS just like a database on EC2. So nothing really special: other than the fact that I think RDS is best suited for 1 database / RDS Instance.

      Good luck!

      Delete
  2. Hi David, thanks for your work on RDSTune, which I've used and is great, and your presentation on migration to RDS using Bucardo. I had a question about your presentation here: http://www.slideshare.net/DavidKerr17/migrating-postgres-from-ec2-to-rds where you mention that you can't set the session_replication_role in PGOPTIONS, but can when connecting to RDS using psql. So I'm wondering how you did your dump/restore successfully given this? As using psql to set it only works for that session.

    The only way I could think of was to use a raw-string (non custom format) pg dump and then prepend 'set replication_session_role=replica;' to the file. But it seemed like you managed to make things work without this clunkiness?!

    Thanks,

    Nic

    ReplyDelete
    Replies
    1. Hi Nic,

      I actually cheated and used pg_dump | pg_restore -Fc --disable-triggers ...

      Disabling triggers is really all the session replication role was being used for in this case. So this actually worked for me!

      Delete
    2. Ah interesting. When I've tried that, I get 'ERROR: permission denied: "RI_ConstraintTrigger_c_30951" is a system trigger' unfortunately. Perhaps I need to augment my user permissions - although I am using the main RDS superuser(-like) role that is the master account.

      Delete
    3. Are you bringing everything over in 2 steps:
      First step: pg_dump -s -n public | psql -h -U user
      Second Step: pg_dump -Fc --data-only -N bucardo | pg_restore -Fc --disable-triggers -U user -h host -d database

      make sure your rds_user has been granted rds_superuser;

      Delete
  3. Hi David, just to say we migrated to RDS successfully quite a while ago now (September time) and your blog posts and tools were incredibly helpful on this.

    I've finally written a blog post about the process, which you might find interesting:

    http://www.theguardian.com/info/developer-blog/2016/feb/04/migrating-postgres-to-rds-without-downtime

    Thanks for the help again,

    Nic

    p.s. couldn't see another way to contact you so hoping you notice this!

    ReplyDelete

Web Statistics