Monday, October 15, 2012

Mention in the Postgres release notes

I recently submitted a (very) small patch for PostgreSQL to fix a problem we were having with the backup label not getting fsynced to disk (which is a problem if you're using something like AWS or SAN snapshots).

I got a kind mention in the release notes for it, which is pretty cool. (Thanks, guys!)

Wednesday, September 12, 2012

Presentation @PDXPUG

I recently gave a talk about vertically scaling Postgres for PDXPUG, the Portland PostgreSQL Users Group.
The presentation is mostly about observations and challenges that I've run into while designing and running a very large Postgres install.
They were kind enough to record the video and post it on Vimeo.
(Also, it turns out I'm not the greatest public speaker.)

PDXPUG: Vertically Scaling Postgres from Mark Wong on Vimeo.

Monday, August 13, 2012

Streaming Replication, syncing mirrors and PANIC: WAL contains references to invalid pages

I've recently come across a problem with Streaming Replication where, on failover, the system generates a "PANIC: WAL contains references to invalid pages" error message.
Aug  9 14:06:22 db01 postgres[11005]: [8-1] user=,db=,host= LOG:  trigger file found: /db/9.1/data/failover
Aug  9 14:06:22 db01 postgres[11012]: [3-1] user=,db=,host= FATAL:  terminating walreceiver process due to administrator command
Aug  9 14:06:23 db01 postgres[11005]: [9-1] user=,db=,host= LOG:  invalid record length at 398/1D0002F0
Aug  9 14:06:23 db01 postgres[11005]: [10-1] user=,db=,host= LOG:  redo done at 398/1D000298
Aug  9 14:06:23 db01 postgres[11005]: [11-1] user=,db=,host= LOG:  last completed transaction was at log time 2012-08-09 09:16:08.497124+00
Aug  9 14:06:23 db01 postgres[11005]: [12-1] user=,db=,host= LOG:  selected new timeline ID: 10
Aug  9 14:06:23 db01 postgres[11005]: [13-1] user=,db=,host= LOG:  archive recovery complete
Aug  9 14:06:23 db01 postgres[11005]: [14-1] user=,db=,host= WARNING:  page 1563020 of relation base/16436/47773 was uninitialized
Aug  9 14:06:23 db01 postgres[11005]: [15-1] user=,db=,host= WARNING:  page 1563023 of relation base/16436/47773 was uninitialized
Aug  9 14:06:23 db01 postgres[11005]: [16-1] user=,db=,host= WARNING:  page 1563021 of relation base/16436/47773 was uninitialized
Aug  9 14:06:23 db01 postgres[11005]: [17-1] user=,db=,host= WARNING:  page 1563022 of relation base/16436/47773 was uninitialized
Aug  9 14:06:23 db01 postgres[11005]: [18-1] user=,db=,host= WARNING:  page 792619 of relation base/16436/47776 was uninitialized
Aug  9 14:06:23 db01 postgres[11005]: [19-1] user=,db=,host= PANIC:  WAL contains references to invalid pages
Aug  9 14:06:49 db01 postgres[11003]: [2-1] user=,db=,host= LOG:  startup process (PID 11005) was terminated by signal 6: Aborted
Aug  9 14:06:49 db01 postgres[11003]: [3-1] user=,db=,host= LOG:  terminating any other active server processes
I believe that this has something to do with the methodology that I use to resync my mirrors after a failover. The environments that I've seen this on have gone through a lot of failover testing. The scenario looks something like:
A[Master] => B[Standby]
resync A from B
B[Master] => A[Standby]
resync B from A
The process I use to sync and resync doesn't change:


#!/bin/sh
SOURCE=$1
if [ -z "$SOURCE" ]; then
 echo "Usage: $0 <master postgres host>"
 exit 1
fi

. /etc/sysconfig/pgsql/postgresql-9.1

export PGDATA
export PGPORT

rm -f $PGDATA/failover

pg_ctl stop -D $PGDATA -m immediate
psql -h $SOURCE -p $PGPORT -c "select pg_start_backup('mirror')"
# source/destination paths below are reconstructed; the original
# listing was truncated after the excludes
rsync -avv --delete-delay \
  --exclude postgresql.conf \
  --exclude pg_hba.conf \
  --exclude server.conf \
  --exclude archive.conf \
  --exclude recovery.conf \
  --exclude recovery.done \
  --exclude pg_ident.conf \
  --exclude failover \
  --exclude pg_xlog \
  $SOURCE:$PGDATA/ $PGDATA/
psql -h $SOURCE -p $PGPORT -c "select pg_stop_backup()"
cp $PGDATA/recovery.done $PGDATA/recovery.conf
pg_ctl start -D $PGDATA
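For reference, the recovery.done that gets copied into place is just a standard 9.1 streaming replication recovery.conf. A minimal sketch (the hostname and user here are placeholders; the trigger_file path matches the log output above):

```
standby_mode = 'on'
primary_conninfo = 'host=db02 port=5432 user=replication'
trigger_file = '/db/9.1/data/failover'
```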
I don't think there's anything inherently wrong with this method, so I could be running into a Linux bug or a Postgres bug.

Some investigation with the fine fellows at 2nd Quadrant has shown that the failed pages are all in index relations.

Our next step in debugging the process will be to start the server with DEBUG3 and see if we can get more info. Unfortunately I've blown away that mirror, so I have to wait for it to happen again.

Stay tuned.

Update 10/15/2012
With the release of Postgres 9.1.6, I believe that what we were running into was the critical bug regarding WAL replay that was fixed in that release.
So, mystery solved!

Tuesday, July 10, 2012

I/O Spikes during Checkpoint

I run Postgres on a fairly large Linux server with 256GB of RAM.

During high load, I found that the I/O of the $PGDATA volume was spiking to 100% making the database slow down to a crawl for seconds at a time, despite having a fairly fast I/O subsystem.

This is what the spike looked like from an iostat output:
Date                    r/s     w/s     rsec/s  wsec/s          await   svctm   %util
07/10/12 00:35:36       0       69.8    0       2233.6          0.63    0.07    0.46
07/10/12 00:35:41       1.2     810     99.2    22200           4.13    0.05    4.02
07/10/12 00:35:46       0       111.6   0       5422.4          1.82    0.08    0.9
07/10/12 00:35:51       0       299.2   0       5670.4          1.27    0.04    1.24
07/10/12 00:35:56       0.8     176.6   41.6    3654.4          2.16    0.07    1.32
07/10/12 00:36:01       0       364.8   0       6670.4          1.1     0.04    1.62
07/10/12 00:36:06       0.8     334.6   12.8    5953.6          1.18    0.05    1.64
07/10/12 00:36:11       0       118.6   0       6948.8          1.82    0.07    0.82
07/10/12 00:36:16       0       8274.6  0       148764.8        10.55   0.07    61.18
07/10/12 00:36:21       0.2     8577.4  3.2     161806.4        16.68   0.12    99.62
07/10/12 00:36:26       0.8     9244.6  12.8    167841.6        15.01   0.11    99.82
07/10/12 00:36:31       0.8     9434.2  44.8    208156.8        16.22   0.11    99.7
07/10/12 00:36:36       0       9582.8  0       202508.8        14.84   0.1     99.72
07/10/12 00:36:41       0       9830.2  0       175326.4        14.42   0.1     99.5
07/10/12 00:36:46       0       8208.6  0       149372.8        17.82   0.12    99.64
07/10/12 00:36:51       3       1438.4  102.4   26748.8         8.49    0.12    18
07/10/12 00:36:56       0.6     2004.6  9.6     27400           1.25    0.03    5.74
07/10/12 00:37:01       0.6     1723    9.6     23758.4         1.85    0.03    5.08
07/10/12 00:37:06       0.4     181.2   35.2    2928            1.49    0.06    1.06

The Linux I/O subsystem can get overwhelmed with dirty buffers on a system with a lot of memory and a high write volume.

If you consider the defaults for RHEL 6.2 on a 256GB system:
vm.dirty_ratio = 10 (about 26GB)
vm.dirty_background_ratio = 5 (about 13GB)

you can see that there can be quite a bit of dirty data to flush out when fsync() is called as part of a checkpoint.

To alleviate the problem there are two newer kernel parameters:
vm.dirty_bytes and vm.dirty_background_bytes

These settings allow you to specify the limits as absolute byte values, which lets you set a much smaller threshold for the dirty page cache.

These settings take some tweaking to get right - you may consider the size of your RAID cache or your general I/O throughput.

In my case I found that the following settings worked well with my disk subsystem.
vm.dirty_background_bytes = 33554432 # 32MB
vm.dirty_bytes = 268435456 # 256MB
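To make these survive a reboot, the same values can go in /etc/sysctl.conf (apply immediately with sysctl -p):

```
# Limit the dirty page cache so a checkpoint fsync() never has
# tens of gigabytes to flush at once.
# 32MB: background writeback kicks in above this
vm.dirty_background_bytes = 33554432
# 256MB: writers start blocking above this
vm.dirty_bytes = 268435456
```

Note that setting the *_bytes variants zeroes out the corresponding *_ratio settings, and vice versa.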

You can see the difference in the following chart, the blue line is the default and the red line is with the new VM settings.

You can see that the new settings use slightly more I/O overall, but the spike doesn't happen.

Thanks to Maxim, Jeff and Andres in this thread for pointing me in the right direction, and to Greg, who tried to explain it to me earlier when I was too blockheaded to get it.

Monday, June 25, 2012

pg_service.conf in redhat

I'm toying around with pg_service.conf. It wasn't obvious, but I was able to determine that if you add pg_service.conf to /etc/sysconfig/pgsql/ it will get picked up globally.
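A minimal pg_service.conf looks something like this (the service name and connection parameters here are made up for illustration):

```
[reporting]
host=db01
port=5432
dbname=reports
user=report_ro
```

You can then connect with psql service=reporting, or by setting the PGSERVICE environment variable.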

Thursday, June 14, 2012

psql - reverse search history

psql, the PostgreSQL command line client, is usually compiled with libreadline support. As a result you can use some neat functionality, such as reverse history search, by hitting Ctrl-R within the client.
Hit Ctrl-R and start typing:
(reverse-i-search)`sel':  select * from report.audit where

Wednesday, June 6, 2012

Create a backup manifest in PostgreSQL with oid2name

oid2name is a nifty little program that comes with PostgreSQL; it allows you to take those obscure relfilenode-named datafiles at the OS level and map them to database objects.

The only downside to this little tool is that it doesn't work when your database is down, since it queries the pg_* catalogs to figure out this info. So if you're in a situation where you're restoring the database and find out that you're missing a file from the backup, you may want to know which table was lost.

If you thought ahead and have a manifest.txt in your backups, you'll be able to see that the following file maps to pgbench_accounts:
-rw------- 1 postgres postgres 13434880 Jun  7 04:44 /db/9.1/data/base/1952441/1952454

If you use the -ix flags you get schema, OID and tablespace. If you also add the -S flag you get all the system tables too, which is very important in a recovery situation.
/usr/pgsql-9.1/bin/oid2name -ix -d pgbench
From database "pgbench":
  Filenode             Table Name      Oid  Schema  Tablespace
   1952454       pgbench_accounts  1952448  public  pg_default
   1952459  pgbench_accounts_pkey  1952459  public  pg_default
   1952442       pgbench_branches  1952442  public  pg_default
   1952455  pgbench_branches_pkey  1952455  public  pg_default
   1952451        pgbench_history  1952451  public  pg_default
   1952445        pgbench_tellers  1952445  public  pg_default
   1952457   pgbench_tellers_pkey  1952457  public  pg_default
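A sketch of how such a manifest.txt could be generated as part of a backup job (the paths and the pgbench database name follow the examples above and are placeholders; oid2name is skipped if it isn't on the PATH):

```shell
#!/bin/sh
# Write a backup manifest: a file-level listing of the datafiles
# plus the filenode-to-table mapping from oid2name.
PGDATA=${PGDATA:-/db/9.1/data}
MANIFEST=${MANIFEST:-manifest.txt}

# File-level listing of every datafile under base/
ls -lR "$PGDATA/base" > "$MANIFEST" 2>/dev/null

# Map filenodes to tables, including system catalogs (-S)
if command -v oid2name >/dev/null 2>&1; then
    oid2name -ix -S -d pgbench >> "$MANIFEST"
fi
```

Run it right before pg_start_backup() so the manifest reflects what the backup should contain.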

Friday, June 1, 2012


I remember back in my Oracle days, one of the first things I'd turn off in my OLTP environments was hash_join_enabled. Oracle would spend more time creating the hash table than it would actually processing the query.

I ran into that again this week, but this time with Postgres 9.1 and this time with Materialize.

Here are the query stats, which pretty clearly showed me where the problem was.

node type              count   sum of times        % of query
Bitmap Heap Scan           5       61251.792 ms       0.1 %
Bitmap Index Scan          6         666.669 ms       0.0 %
BitmapAnd                  1           0.225 ms       0.0 %
Hash                       6           4.904 ms       0.0 %
Hash Join                  6       37540.586 ms       0.1 %
HashAggregate              1          98.755 ms       0.0 %
Index Scan                 5        3777.276 ms       0.0 %
Materialize                9    28084279.438 ms      54.9 %
Merge Left Join            1         100.578 ms       0.0 %
Nested Loop                7         607.531 ms       0.0 %
Nested Loop Left Join     10    22904169.450 ms      44.8 %
Seq Scan                  15       72686.868 ms       0.1 %
Sort                       3         137.154 ms       0.0 %

Here's the same query in PG9.0.

node type              count   sum of times       % of query
Bitmap Heap Scan           3        238.963 ms       0.1 %
Bitmap Index Scan          4         41.682 ms       0.0 %
BitmapAnd                  1          0.458 ms       0.0 %
Hash                      15      29947.841 ms       9.4 %
Hash Join                  7      73363.019 ms      22.9 %
Hash Left Join             8      41505.547 ms      13.0 %
HashAggregate              1        164.592 ms       0.1 %
Index Scan                 6        187.713 ms       0.1 %
Materialize                1       5122.977 ms       1.6 %
Nested Loop                6         85.415 ms       0.0 %
Nested Loop Left Join      3      10780.593 ms       3.4 %
Seq Scan                  16     158396.545 ms      49.5 %
Sort                       1         15.215 ms       0.0 %

It seems like PG9.1 favors materializing a little more than 9.0 does. I even knocked the statistics target up to 10k in the PG9.1 environment to try to get it to knock that off, but no luck.

To get past the problem, while we tune the query and look at some other PG tuning options, I just disabled materializing for this specific application user.
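Disabling it per user can be done with a role-level setting; something like this (the role name is a placeholder):

```sql
-- Turn off materialization for just this application user;
-- takes effect on the user's next connection.
alter role app_user set enable_material = off;
```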

Sunday, May 13, 2012

A script to see row counts and table size in PostgreSQL

Here is a little perl script that I use to give me a list of tables in a schema, a rowcount and the pg_relation_size of each table. The usage is just:
pg_rowcount <database> <schema>
ex: ./pg_rowcount david.kerr pg_catalog
The output looks like this:
Table                                Rows                 Size
pg_statistic                         1853              1672 kB
pg_type                               910               168 kB
pg_attribute                         9569              1488 kB
pg_class                             1371               416 kB
pg_authid                              52                16 kB
pg_index                              766               112 kB
pg_operator                           705               104 kB
pg_database                            44                16 kB
You can download the script here
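The script is essentially a wrapper around a catalog query; a rough SQL-only equivalent looks something like this (using the planner's reltuples estimate, refreshed by ANALYZE, rather than an exact per-table count(*), which would be slower):

```sql
-- Approximate row counts and on-disk size for tables in one schema
select c.relname as table,
       c.reltuples::bigint as rows,
       pg_size_pretty(pg_relation_size(c.oid)) as size
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'pg_catalog'
  and c.relkind = 'r'
order by pg_relation_size(c.oid) desc;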

How To Fix Stalled CUPS Printing

Every once in a while CUPS printing just stalls for no reason that I can tell (I blame the network, naturally), but what ends up happening is that the print jobs get backed up and you need to clear out the queue. The fastest way to do this is:
rm /var/spool/cups/*
/etc/init.d/cups restart

How To Reset Sequences in PostgreSQL

I use a lot of "serial" data type fields, and since I do a lot of moving data around, sometimes the nextval of my sequence doesn't reflect the max() of my table. The basic command that you want to use to reset a sequence is:
select setval('sequence_name', value);
However, the cool thing about Postgres is that the value can be a query too, so I use:
select setval('sequence_name', (select max(column_name) from tablename));
So I wrote a little perl script that looks through all of the columns that have a default like "nextval" (which, if you look in the database, is how a serial column actually looks) and sets each sequence to the max of the values in its table. You can get it here; the usage is: -d <database> [-s <schema>] [-u <user>] [-p <port>]
If you specify a database it will reset all of the columns in the database. If you specify a schema it will reset all of the columns within that schema. If you specify a user it will reset all of the columns for that user.

Using generate_series() in PostgreSQL

If you want to dummy up a bunch of data in PostgreSQL you can use this neat little trick
create table test (a int);
insert into test (select generate_series(0,999,1));
INSERT 0 1000
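generate_series() also works with timestamps, which is handy for dummying up time-series data; for example:

```sql
-- One row per day for the last week
select generate_series(now() - interval '6 days', now(), interval '1 day');
```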

Using insert .. returning with perl and PostgreSQL

PostgreSQL provides the ability to return a value on insert. So, for example:

create table foo (id serial, name text);
insert into foo (name) values ('dave') returning id;
Will return the auto-generated value of "id" assigned to 'dave'.
You can leverage this from within Perl by doing something like this:
my $foo_insert = "insert into foo (name) values (?) returning id";
my $foo_cur = $dbh->prepare($foo_insert);
$foo_cur->execute('dave');
my $foo_rec = $foo_cur->fetchrow_hashref();
print $foo_rec->{id};

Parsing Large files with Pgfouine

pgfouine is a nice logfile analyzer for PostgreSQL written in PHP.

When doing a trace on a very long running ETL process the logfile generated was ~11GB.

This left me running up against a 2GB barrier in PHP's fopen().

If you've got a 64-bit machine and can recompile PHP with -D_FILE_OFFSET_BITS=64 then you're good to go. But in my case, I wasn't able to do either.

The error I'd get is: PHP Fatal error: File is not readable. in /var/lib/pgsql/pgfouine-1.1/include/GenericLogReader.class.php on line 85

The simple solution was to use a named pipe since pgfouine expects a file and doesn't seem to be able to read from stdin.

mknod /tmp/pg2 p
cat /var/log/postgres > /tmp/pg2 | ./pgfouine.php -file /tmp/pg2 > bla.html
Another variation on this, for those of us using UTC who find that our postgres log is rotated mid-day, is to do something like:
(gzcat /var/log/ && cat /var/log/postgres) > /tmp/pg2 | ./pgfouine.php -file /tmp/pg2 > bla.html
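The named-pipe trick isn't specific to pgfouine; it works for any tool that insists on a regular file. A minimal self-contained demonstration of the same pattern, with wc standing in for the log analyzer:

```shell
#!/bin/sh
# Feed a "file-only" consumer through a named pipe.
fifo=/tmp/demo_fifo.$$
mkfifo "$fifo"

# The writer runs in the background, like the cat in the
# pgfouine example; the reader treats the pipe as a file.
printf 'line1\nline2\nline3\n' > "$fifo" &

lines=$(wc -l < "$fifo" | tr -d ' ')
wait
rm -f "$fifo"
echo "$lines"   # prints 3
```

The writer blocks until the reader opens the pipe, so the two sides stay in lockstep and no 11GB intermediate file ever hits disk.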

How To Add Quotes to results in dynamic SQL

I frequently find the need for single-quoted output when I generate dynamic SQL.

It's always a pain to remember the exact number of ticks needed to get the quoted output.

Here is a reminder of how to do it:
select ''''||schemaname||'.'||tablename||'''' from pg_tables;
i.e., four quotes when there's no adjacent text, or three quotes on the outside and one on the inside when there is text.
select '''public.'||tablename||''';' from pg_tables;
select 'select pg_size_pretty(pg_relation_size('''||tablename||'''));' from pg_tables where schemaname = 'bla';
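If counting ticks gets old, quote_literal() will do the escaping for you:

```sql
-- Same quoted output as the four-quote version above
select quote_literal(schemaname || '.' || tablename) from pg_tables;
```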

Moving to

Until now, I've more or less preferred to run my own hosted blogging platform. However, my server is getting old and unreliable and with the plethora of free services available, I can't justify the cost of an upgrade or really continued hosting.

So I'll give this a try and see if I can live with it. So far I like the customization options.

I've migrated my most popular posts; however, I haven't kept the permalinks intact, so you may need to search or check the tags. (There wasn't much there anyway.)