<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>david kerr &#187; postgresql</title>
	<atom:link href="http://www.davidmkerr.com/tag/postgresql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.davidmkerr.com</link>
	<description>Weapons designer. Innovator, inventor, world changer</description>
	<lastBuildDate>Thu, 20 May 2010 14:30:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>HowTo: Add Single Quotes to results in dynamic SQL</title>
		<link>http://www.davidmkerr.com/databases/howto-add-single-quotes-to-results-in-dynamic-sql/</link>
		<comments>http://www.davidmkerr.com/databases/howto-add-single-quotes-to-results-in-dynamic-sql/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 20:13:08 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=252</guid>
		<description><![CDATA[I frequently find the need to have single quoted output when i generate dynamic SQL. It&#8217;s always a pain to remember the exact number of ticks needed to get the quoted output. Here is a reminder on how to do it: select ''''&#124;&#124;schemaname&#124;&#124;'.'&#124;&#124;tablename&#124;&#124;'''' from pg_tables 4 quotes when there&#8217;s no text, or 3 quotes on [...]]]></description>
			<content:encoded><![CDATA[<p>I frequently need single-quoted output when I generate dynamic SQL.</p>
<p>It&#8217;s always a pain to remember the exact number of ticks needed to get the quoted output.</p>
<p>Here is a reminder on how to do it:</p>
<pre>select ''''||schemaname||'.'||tablename||'''' from pg_tables</pre>
<p>Use 4 quotes when the literal is nothing but the quote character. When the literal also contains text, use 3 quotes on the outside and one on the inside:</p>
<pre>select '''public.'||tablename||'''' from pg_tables</pre>
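<p>If counting ticks gets tedious, PostgreSQL&#8217;s built-in quote_literal() function is another option. A minimal sketch, assuming the same pg_tables query as above:</p>
<pre>-- quote_literal() wraps its argument in single quotes
-- and doubles any single quotes embedded in the value
select quote_literal(schemaname||'.'||tablename) from pg_tables;</pre>
<p>This should return the same quoted names as the first query, without any hand-counted quotes.</p>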
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/howto-add-single-quotes-to-results-in-dynamic-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tweaking PostgreSQL's automatic statistics collector</title>
		<link>http://www.davidmkerr.com/databases/tweaking-postgresqls-automatic-statistics-collector/</link>
		<comments>http://www.davidmkerr.com/databases/tweaking-postgresqls-automatic-statistics-collector/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 23:20:43 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=225</guid>
		<description><![CDATA[PostgreSQL, like many RDBMSs uses a cost based optimizer. CBOs rely on database &#8220;statistics&#8221; (number of rows, data distribution, etc.) to generate a good execution plan for any query that is sent to the engine. If you have bad statistics, then you potentially have bad query plans. So a query that should take 1 second [...]]]></description>
			<content:encoded><![CDATA[<p>PostgreSQL, like many RDBMSs, uses a cost-based optimizer (CBO). CBOs rely on database &#8220;statistics&#8221; (number of rows, data distribution, etc.) to generate a good execution plan for any query that is sent to the engine. If you have bad statistics, then you potentially have bad query plans, and a query that should take 1 second ends up taking 3 hours.</p>
<p>I recently ran into a problem like that in PostgreSQL: the query ran fine *most* of the time, but every once in a while it would go off into the weeds.</p>
<p>After a little digging I realized that the table&#8217;s statistics hadn&#8217;t been updated in quite a while. This struck me as odd since I knew that PostgreSQL automatically updates statistics on a regular basis. So I did some digging to figure out exactly what triggers statistics to be updated in PostgreSQL.</p>
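<p>One way to check when a table was last analyzed is the pg_stat_user_tables view. A quick sketch, assuming a version that exposes these columns (8.2 and later, as far as I know); test_vac is just a placeholder table name:</p>
<pre>-- last_analyze is set by a manual ANALYZE,
-- last_autoanalyze by the autovacuum daemon
select relname, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname = 'test_vac';</pre>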
<p>First off, it&#8217;s important to note that in PostgreSQL the vacuum process is what handles statistics. &#8220;vacuum analyze&#8221; is the command, although you can just use &#8220;analyze&#8221; too. Vacuuming and analyzing are different operations, controlled by separate but very similar parameters.</p>
<p>The key postgresql.conf parameters that affect whether or not a table gets auto-analyzed are:</p>
<pre>
#autovacuum_analyze_threshold = 50
#autovacuum_analyze_scale_factor = 0.1
</pre>
<p>(The commented-out values above are the defaults shipped in postgresql.conf.)</p>
<p>The threshold that Postgres uses to decide whether a table needs to be auto-analyzed is:<br />
<code>([ # of rows in table ] * [ scale factor ]) + [ threshold ]</code></p>
<p>So, for example, if you have a table with 10,000 rows in it, then it would look like this:<br />
<code>( 10000 * 0.1 ) + 50 = 1050</code></p>
<p>So if you were to insert, update, or delete only 1049 rows since the last analyze, the table would not get auto-analyzed; the number of changed rows has to exceed 1050.</p>
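<p>To see where that threshold sits for your own tables, something like the following should work. It is a rough sketch that only uses the global settings (it ignores any per-table overrides) and relies on reltuples, which is itself only an estimate:</p>
<pre>-- estimated auto-analyze threshold per table, from the global settings
select relname,
       reltuples,
       current_setting('autovacuum_analyze_threshold')::integer
         + current_setting('autovacuum_analyze_scale_factor')::float8 * reltuples
         as analyze_threshold
from pg_class
where relkind = 'r'
order by reltuples desc;</pre>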
<p>So how do you change it? Well, there are 2 ways. You could change the 2 parameters in postgresql.conf, which is a good idea if you think you need to lower or raise them globally.<br />
However, I think in most cases you&#8217;ll want to make the change for just one table, which means you need to use the pg_autovacuum table.</p>
<pre>postgres=# \d pg_autovacuum
    Table "pg_catalog.pg_autovacuum"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 vacrelid         | oid     | not null
 enabled          | boolean | not null
 vac_base_thresh  | integer | not null
 vac_scale_factor | real    | not null
 anl_base_thresh  | integer | not null
 anl_scale_factor | real    | not null
 vac_cost_delay   | integer | not null
 vac_cost_limit   | integer | not null
 freeze_min_age   | integer | not null
 freeze_max_age   | integer | not null
Indexes:
    "pg_autovacuum_vacrelid_index" UNIQUE, btree (vacrelid)</pre>
<p>Remember when I mentioned above that vacuuming and analyzing are controlled by similar parameters? You can see that here: the table has fields for both vacuuming and analyzing.</p>
<p>The fields we care about for this particular problem are: vacrelid, enabled, anl_base_thresh and anl_scale_factor.</p>
<ul>
<li><strong>vacrelid</strong> is the OID of your table. You can get that via:<br />
<code>select oid from pg_class where relname = '[tablename]'</code></li>
<li><strong>enabled </strong>is &#8220;true&#8221;</li>
<li><strong>anl_base_thresh </strong>is your new autovacuum_analyze_threshold</li>
<li><strong>anl_scale_factor </strong>is your new autovacuum_analyze_scale_factor</li>
</ul>
<p>To make the parameter change for just one table we then need to insert into the pg_autovacuum table.</p>
<pre style="overflow-x:auto; overflow-y:hidden">
postgres@postgres=# select oid from pg_class where relname = 'test_vac';
   oid
---------
 1579952

postgres@postgres=# insert into pg_autovacuum values (1579952,true,50,0.2,50,<strong>0.09</strong>,20,-1,100000000,200000000);
</pre>
<p>I&#8217;m changing the analyze scale factor from 0.1 to 0.09; the rest are the default values for those parameters.<br />
What that means is in our example:<br />
<code>( 10000 * 0.1 ) + 50 = 1050</code><br />
Changes to:<br />
<code>( 10000 * 0.09 ) + 50 = 950</code></p>
<p>In other words, the auto-analyze would kick off 100 changed rows sooner. That&#8217;s not a huge difference, I know, but to the planner it could make all the difference in the world.</p>
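<p>To confirm the override is in place, you can read the row back. Note that the pg_autovacuum catalog only exists up through PostgreSQL 8.3; from 8.4 on, per-table settings are storage parameters set with ALTER TABLE. A sketch of both, reusing the test_vac table from above:</p>
<pre>-- 8.3 and earlier: read back the row we just inserted
select * from pg_autovacuum where vacrelid = 'test_vac'::regclass;

-- 8.4 and later: the equivalent per-table override
alter table test_vac set (autovacuum_analyze_scale_factor = 0.09);</pre>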
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/tweaking-postgresqls-automatic-statistics-collector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Clustering: HA-JDBC</title>
		<link>http://www.davidmkerr.com/databases/postgresql-clustering-ha-jdbc/</link>
		<comments>http://www.davidmkerr.com/databases/postgresql-clustering-ha-jdbc/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 19:03:16 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=210</guid>
		<description><![CDATA[A little while ago I posted about PostgreSQL clustering and I said that I was going to evaluate HA-JDBC as an option. The reason I wanted to use HA-JDBC is that I was looking for a no-coding required solution for seamless fail over. (Similar to Oracle RAC) for PostgreSQL. I&#8217;ll be using a Shared Disk [...]]]></description>
			<content:encoded><![CDATA[<p>A little while ago I <a href="http://www.davidmkerr.com/databases/postgresql-ha-clustering-options/">posted</a> about PostgreSQL clustering and I said that I was going to evaluate HA-JDBC as an option.</p>
<p>The reason I wanted to use HA-JDBC is that I was looking for a no-coding-required solution for seamless failover for PostgreSQL (similar to Oracle RAC).</p>
<p>I&#8217;ll be using a Shared Disk / Heartbeat cluster on the server side; however, when a node fails the application will register an error, which is undesirable.</p>
<p>After doing more research it&#8217;s been determined that HA-JDBC won&#8217;t work.</p>
<p>It seems that HA-JDBC is, at best, a SQL replicator, where you have 2 active nodes and HA-JDBC will perform inserts and updates across both databases to keep them in sync. This is fine if you&#8217;re not using the &#8220;serial&#8221; data type in PostgreSQL, triggers, functions, time-based default values, etc. With any sort of trigger, the code fires independently on each node, and you end up with out-of-sync databases.</p>
<p>Another reason why HA-JDBC won&#8217;t work is that if a node is unreachable, HA-JDBC removes it from consideration. So when your second &#8220;standby&#8221; node becomes active, HA-JDBC won&#8217;t consider it without some manual intervention.</p>
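<p>To make the drift concrete, here is a small sketch. The orders table is made up for illustration, and the comments describe plain statement replay on two nodes in general rather than anything specific to HA-JDBC&#8217;s internals:</p>
<pre>-- hypothetical table, for illustration only
create table orders (
    id         serial primary key,
    created_at timestamp not null default now()
);

-- if this statement is executed separately on node A and node B,
-- each node evaluates nextval() and now() on its own,
-- so the stored id and created_at values can differ between nodes
insert into orders default values;</pre>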
<p><del datetime="2010-01-20T01:51:19+00:00">And finally, I&#8217;d advise steering away from HA-JDBC at this point even if the above works for you. I posted a few clarifying questions regarding the above to both the HA-JDBC forums and their mailing list and received no response. If your business is looking into true high availability for their servers you need to choose all of your components with care. A non-responsive community either means the project is dead, or un-caring, both of which are unacceptable when you&#8217;re looking into HA solutions.</del></p>
<p>I finally got a response from Paul Ferraro. Nice guy, very helpful. His reply is posted below:</p>
<pre>Sorry for the slow response...

HA-JDBC is not the right tool for this job.  HA-JDBC is an
*alternative* to shared disk failover - it was not designed
to be used in concert with it.  Instead, you want something
like JBoss HA DataSource or Weblogic multi-pools.  These are
DataSource proxies whose getConnection() method returns a
raw connection from the first available data source.
DataSource-level.  Connections returned by HA-JDBC's DataSource,
on the other hand, are proxies to connections to each active
database in your cluster.

I can go into more detail if you'd like, such as the advantages/
disadvantages of HA-JDBC over shared-disk failover, if you're
interested.

Paul</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/postgresql-clustering-ha-jdbc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing large files with pgfouine in linux</title>
		<link>http://www.davidmkerr.com/databases/parsing-large-files-with-pgfouine-in-linux/</link>
		<comments>http://www.davidmkerr.com/databases/parsing-large-files-with-pgfouine-in-linux/#comments</comments>
		<pubDate>Fri, 08 Jan 2010 00:01:45 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Operating Systems]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=206</guid>
		<description><![CDATA[pgfouine is a nice logfile analyzer for PostgreSQL written in php. I&#8217;m doing a trace on a very long running ETL process and the logfile generated is ~11GB. I&#8217;m running up against a 2GB barrier in php for fopen(). If you&#8217;ve got a 64bit machine and can recompile php with -D_FILE_OFFSET_BITS=64 then you&#8217;re good to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://pgfouine.projects.postgresql.org/">pgfouine</a> is a nice logfile analyzer for PostgreSQL written in php.</p>
<p>I&#8217;m doing a trace on a very long running ETL process and the logfile generated is ~11GB.</p>
<p>I&#8217;m running up against a 2GB barrier in php for fopen(). If you&#8217;ve got a 64-bit machine and can recompile php with -D_FILE_OFFSET_BITS=64 then you&#8217;re good to go. But in my case, I can&#8217;t do either.</p>
<p>The error I&#8217;d get is:</p>
<pre style="overflow-x:auto; overflow-y:hidden">
PHP Fatal error:  File  is not readable. in /var/lib/pgsql/pgfouine-1.1/include/GenericLogReader.class.php on line 85
</pre>
<p>So for Plan B I had to remember back to the days when 64 bit wasn&#8217;t even an option (back in my day, we had 8 bits and we liked &#8216;em!)</p>
<p>I used a named pipe since pgfouine expects a file and doesn&#8217;t seem to be able to read from stdin.</p>
<pre style="overflow-x:auto; overflow-y:hidden">
mknod /tmp/pg2 p
cat /var/log/postgres > /tmp/pg2 &amp;
./pgfouine.php -file /tmp/pg2 > bla.html
</pre>
<p>Once that kicked off I stopped getting that error and pgfouine was able to process the file.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/parsing-large-files-with-pgfouine-in-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Clustering: JDBC</title>
		<link>http://www.davidmkerr.com/databases/postgresql-clustering-jdbc/</link>
		<comments>http://www.davidmkerr.com/databases/postgresql-clustering-jdbc/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 21:17:12 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=175</guid>
		<description><![CDATA[Now that I&#8217;ve got my basic active/passive cluster setup using the shared disk Linux heartbeat method mentioned]]></description>
			<content:encoded><![CDATA[<p>Now that I&#8217;ve got my basic active/passive cluster set up using the shared disk Linux heartbeat method mentioned <a href="http://www.davidmkerr.com/?p=164">here</a>, one thing is left: allowing my Java app to fail over to the new database without re-coding the app.</p>
<p>Without updating the JDBC driver you would have to catch the failure at the Java container level or in the app itself and manage the switch from the down node to the active node.<br />
I don&#8217;t think that&#8217;s &#8220;industry standard&#8221; and it&#8217;s certainly not easy by any means.<br />
The normal way is to let the JDBC driver manage it.</p>
<p>Unfortunately the PostgreSQL JDBC driver doesn&#8217;t handle this event out of the box so we need to invoke a 3rd party.</p>
<p>There aren&#8217;t a lot of options in this area. Here are two:</p>
<ul>
<li><a href="http://ha-jdbc.sourceforge.net/">HA-JDBC</a>
</li>
<li><a href="http://www.continuent.com/community/tungsten-sql-router">Tungsten SQL Router</a> which used to be called <a href="http://ha-jdbc.sourceforge.net/faq.html#faq-N10143">Sequoia</a>
</li>
</ul>
<p>I found a good discussion around HA-JDBC <a href="http://tom.jteam.nl/?p=5">here</a> </p>
<p>I&#8217;m using Hibernate + Geronimo, so I need to do some testing to see if that&#8217;s going to work with HA-JDBC, but from the sounds of it, it should work just fine.</p>
<p>I&#8217;ll need to evaluate both of these to determine which is the best.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/postgresql-clustering-jdbc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL HA Clustering Options</title>
		<link>http://www.davidmkerr.com/databases/postgresql-ha-clustering-options/</link>
		<comments>http://www.davidmkerr.com/databases/postgresql-ha-clustering-options/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 00:26:07 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[drbd]]></category>
		<category><![CDATA[gndb]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[slony]]></category>

		<guid isPermaLink="false">http://www.davidmkerr.com/?p=164</guid>
		<description><![CDATA[I&#8217;ve been evaluating PostgreSQL clustering options for my current project. The reason I&#8217;m looking at clustering is that the DB server will be handling a large number of users and any downtime is catastrophic. So reliability comes before any performance or administrative concerns in a clustering solution. My platform is PostgreSQL 8.3 and SLES Linux. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been evaluating PostgreSQL clustering options for my current project.</p>
<p>The reason I&#8217;m looking at clustering is that the DB server will be handling a large number of users and any downtime is catastrophic. So reliability comes before any performance or administrative concerns in a clustering solution.</p>
<p>My platform is PostgreSQL 8.3 and SLES Linux.</p>
<p>I looked at 4 solutions:<br />
Option 1: Shared Disk (Heartbeat) Cluster (<a href="http://www.linux-ha.org/">Heartbeat: SLES</a>)<br />
Option 2: Filesystem Replication Based (<a href="http://www.drbd.org/">DRBD</a> / <a href="http://sourceware.org/cluster/gnbd/">GNBD</a>)<br />
Option 3: DB Replication Based (<a href="http://www.slony.info">Slony I</a>)<br />
Option 4: DB Replication Based (<a href="http://pgcluster.projects.postgresql.org/">PGCluster</a>)</p>
<p>I weighed the pros and cons of each of them and eventually chose Option 1 as the best for my needs.</p>
<p>I like the heartbeat solution because:</p>
<ul>
<li>It&#8217;s simple
</li>
<li>There&#8217;s no data loss in a shared disk cluster
</li>
<li>There&#8217;s no replication overhead so no performance impact
</li>
</ul>
<p>Unfortunately, there is very little public documentation regarding heartbeat clusters used with PostgreSQL. I hope to rectify that over the next weeks and months, so stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidmkerr.com/databases/postgresql-ha-clustering-options/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
