Tweaking PostgreSQLs automatic statistics collector

PostgreSQL, like many RDBMSs uses a cost based optimizer. CBOs rely on database “statistics” (number of rows, data distribution, etc.) to generate a good execution plan for any query that is sent to the engine. If you have bad statistics, then you potentially have bad query plans. So a query that should take 1 second ends up taking 3 hours.

I recently ran into a problem like that in PostgreSQL, the query ran fine *most* of the time, but every once in a while it would go off into the weeds.

After a little digging I realized that the table’s statistics hadn’t been updated in quite a while. This struck me as odd since I knew that PostgreSQL automatically updates statistics on a regular basis. So I did some digging to figure out exactly what triggers statistics to be updated in PostgreSQL.

First off, it’s important to note that in postgresql the vacuum process is what handles statistics. “vacuum analyze” is the command although you can just use “analyze” too. Vacuuming and analyzing are different operations, and have different, but very similar parameters that control them.

The key postgresql.conf parameters that affect whether or not a table gets auto-analyzed are:

#autovacuum_analyze_threshold = 50
#autovacuum_analyze_scale_factor = 0.1

(You can see the defaults in the postgresql.conf above)

The formula that Postgres uses to determine if a table needs to be auto-analyzed is
([ # of rows in table ] * [ scale factor ]) + [ threshold ]

So, for example, if you have a table with 10,000 rows in it, then it would look like this:
( 10000 * 0.1 ) + 50 = 1050

So if you were to add 1049 rows, the table would not get auto-analyzed.

So how do you change it? Well, there are 2 ways. you could change the 2 parameters in the postgresql.conf, and this is a good idea if you think you need to globally lower or raise them.
However, I think in most cases you’ll want to make the change just for one table, which means you need to utilize the pg_autovacuum table.

postgres=# \d pg_autovacuum
    Table "pg_catalog.pg_autovacuum"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 vacrelid         | oid     | not null
 enabled          | boolean | not null
 vac_base_thresh  | integer | not null
 vac_scale_factor | real    | not null
 anl_base_thresh  | integer | not null
 anl_scale_factor | real    | not null
 vac_cost_delay   | integer | not null
 vac_cost_limit   | integer | not null
 freeze_min_age   | integer | not null
 freeze_max_age   | integer | not null
Indexes:
    "pg_autovacuum_vacrelid_index" UNIQUE, btree (vacrelid)

Remember when I mentioned above that vacuuming and analyzing were handled via similar mechanisms? you can see that here, the table has fields related to vacuuming and analyzing.

The fields we care about for this particular problem are: vacrelid, enabled, anl_base_thresh and anl_scale_factor.

  • vacrelid is the OID of your table. you can get that via:
  • select oid from pg_class where relname = [tablename]

  • enabled is “true”
  • anl_base_thresh is your new autovacuum_analyze_threshold
  • anl_scale_factor is your new autovacuum_scale_factor

To make the parameter change for just one table we then need to insert into the pg_autovacuum table.

postgres@postgres=# select oid from pg_class where relname = 'test_vac';
   oid
---------
 1579952

postgres@postgres=# insert into pg_autovacuum(1579952,true,50,0.2,50,0.09,20,-1,100000000,200000000)

I’m changing the scale factor from 0.1 to 0.9, the rest are the default values for those parameters
What that means is in our example:
( 10000 * 0.1 ) + 50 = 1050
Changes to:
( 10000 * 0.09 ) + 50 = 950

In other words, the autovacuum would kick off 100 records sooner, not a huge difference, I know however to the planner it could make all of the difference in the world.

Leave a comment

0 Comments.

Leave a Reply


[ Ctrl + Enter ]