Tuesday, July 10, 2012

I/O Spikes during Checkpoint

I run Postgres on a fairly large Linux server with 256GB of RAM.

During periods of high load, I found that I/O on the $PGDATA volume was spiking to 100% utilization, slowing the database to a crawl for several seconds at a time, despite a fairly fast I/O subsystem.

This is what the spike looked like in the iostat output:
Date                    r/s     w/s     rsec/s  wsec/s          await   svctm   %util
[...]
07/10/12 00:35:36       0       69.8    0       2233.6          0.63    0.07    0.46
07/10/12 00:35:41       1.2     810     99.2    22200           4.13    0.05    4.02
07/10/12 00:35:46       0       111.6   0       5422.4          1.82    0.08    0.9
07/10/12 00:35:51       0       299.2   0       5670.4          1.27    0.04    1.24
07/10/12 00:35:56       0.8     176.6   41.6    3654.4          2.16    0.07    1.32
07/10/12 00:36:01       0       364.8   0       6670.4          1.1     0.04    1.62
07/10/12 00:36:06       0.8     334.6   12.8    5953.6          1.18    0.05    1.64
07/10/12 00:36:11       0       118.6   0       6948.8          1.82    0.07    0.82
07/10/12 00:36:16       0       8274.6  0       148764.8        10.55   0.07    61.18
07/10/12 00:36:21       0.2     8577.4  3.2     161806.4        16.68   0.12    99.62
07/10/12 00:36:26       0.8     9244.6  12.8    167841.6        15.01   0.11    99.82
07/10/12 00:36:31       0.8     9434.2  44.8    208156.8        16.22   0.11    99.7
07/10/12 00:36:36       0       9582.8  0       202508.8        14.84   0.1     99.72
07/10/12 00:36:41       0       9830.2  0       175326.4        14.42   0.1     99.5
07/10/12 00:36:46       0       8208.6  0       149372.8        17.82   0.12    99.64
07/10/12 00:36:51       3       1438.4  102.4   26748.8         8.49    0.12    18
07/10/12 00:36:56       0.6     2004.6  9.6     27400           1.25    0.03    5.74
07/10/12 00:37:01       0.6     1723    9.6     23758.4         1.85    0.03    5.08
07/10/12 00:37:06       0.4     181.2   35.2    2928            1.49    0.06    1.06
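
For reference, numbers like these come from iostat's extended device statistics (the columns above are a subset of what it prints), collected with something like:
iostat -x -t 5    # -x: extended per-device stats, -t: timestamp each interval, sample every 5 seconds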

On a system with a lot of memory and a high write volume, the Linux I/O subsystem can accumulate a huge amount of dirty buffers before the kernel starts writing them out.
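
If you want to watch the dirty data pile up on your own box, the kernel reports the current totals in /proc/meminfo:
watch -n 5 "grep -E '^(Dirty|Writeback):' /proc/meminfo"
# Dirty = modified pages not yet queued for writeback, Writeback = pages currently being written out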

Consider the defaults for RHEL 6.2 on a 256GB system:
vm.dirty_ratio = 10 # 10% of RAM == ~26GB
vm.dirty_background_ratio = 5 # 5% of RAM == ~13GB
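
You can verify what your kernel is actually using with sysctl; on a stock RHEL 6.2 box it will report the defaults above:
sysctl vm.dirty_ratio vm.dirty_background_ratio
# vm.dirty_ratio = 10
# vm.dirty_background_ratio = 5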

You can see that there can be quite a bit of dirty data waiting to be flushed when fsync() is called as part of a checkpoint.
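
If you want to confirm that the slowdowns really do line up with checkpoints, the easiest way is to turn on checkpoint logging; Postgres will then log each checkpoint along with how long its write and sync (fsync) phases took. This is just the one relevant postgresql.conf line:
log_checkpoints = on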

To alleviate the problem, there are two newer kernel parameters:
vm.dirty_bytes and vm.dirty_background_bytes

These settings let you specify the dirty-memory thresholds as absolute byte counts instead of percentages of RAM, so you can cap the amount of buffered dirty data at a much smaller value.

These settings take some tweaking to get right - you may want to factor in the size of your RAID controller's write cache and your general I/O throughput.

In my case I found that the following settings worked well with my disk subsystem:
vm.dirty_background_bytes = 33554432 # 32MB
vm.dirty_bytes = 268435456 # 256MB
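
One way to apply them is with sysctl as root, then add the same two lines to /etc/sysctl.conf so they survive a reboot. Note that the byte-based and ratio-based settings are mutually exclusive - once you set one of the *_bytes parameters, the kernel reports the corresponding *_ratio as 0.
sysctl -w vm.dirty_background_bytes=33554432
sysctl -w vm.dirty_bytes=268435456
# make them permanent by adding the two settings (in "key = value" form)
# to /etc/sysctl.conf and reloading with: sysctl -p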


You can see the difference in the following chart: the blue line is with the default settings and the red line is with the new VM settings.

You can see that you use slightly more I/O with the new settings, but the spike doesn't end up happening.




Thanks to Maxim, Jeff and Andres in this thread for pointing me in the right direction, and to Greg, who tried to explain it to me earlier but I was too block-headed at the time to get it.

