I attended PGDay on January 31st, in Brussels. This event was held just before FOSDEM, which I also attended (expect another blog post). Here are some of the notes I took during the conference.
Due to transit delays, I only caught the last half of that talk.
The main goal of this talk was to explain some of Postgres' per-column statistics. In a nutshell, Postgres needs to have some idea about tables' content in order to choose an appropriate query plan.
Heikki explained which sorts of statistics gathers, such as most common values and histograms. Another interesting stat is the correlation between physical pages and data ordering (see CLUSTER).
Column statistics are gathered when running ANALYZE and stored in the pg_statistic system catalog. The pg_stats view provides a human-readable version of these stats.
Heikki also explained how to determine whether performance issues are due to out-of-date statistics or not. As it turns out, EXPLAIN ANALYZE shows for each step of the query planner how many rows it expects to process and how many it actually processed. The rule of thumb is that similar values (no more than an order of magnitude apart) mean that column statistics are doing their job. A wider margin between expected and actual rows mean that statistics are possibly preventing the query planner from picking a more optimized plan.
It was noted though that statistics-related performance issues often happen on tables with very frequent modifications. Running ANALYZE manually or increasing the frequency of the automatic ANALYZE may help in those situations.
Dimitri explained with very simple cases the use of some of Postgres' lesser-known extensions and the overall extension mechanism.
Here's a grocery-list of the extensions and types he introduced:
An all-around great talk with simple but meaningful examples.
What Magnus presented almost amounted to a tutorial on caching strategies for busy web sites. He went through simple examples, using the ubiquitous Django framework for the web view part and Varnish for the HTTP caching part.
The whole talk revolved around adding private (X-prefixed) HTTP headers in replies containing one or more "entity IDs" so that Varnish's cache can be purged whenever said entities change. The hard problem lies in how and when to call PURGE on Varnish.
The obvious solution is to override Django's save() method on Model-derived objects. One can then use httplib (or better yet requests) to purge the cache. This solution can be slightly improved by using Django's signal mechanism instead, which sound an awful-lot like CubicWeb's hooks.
The problem with the above solution is that any DB modification not going through Django (and they will happen) will not invalidate the cached pages. So Magnus then presented how to write the same cache-invalidating code in PL/Python in triggers.
While this does solve that last issue, it introduces synchronous HTTP calls in the DB, killing write performance completely (or killing it completely if the HTTP calls fail). So to fix those problems, while introducing limited latency, is to use SkyTools' PgQ, a simple message queue based on Postgres. Moving the HTTP calls outside of the main database and into a Consumer (a class provided by PgQ's python bindings) makes the cache-invalidating trigger asynchronous, reducing write overhead.
A clear, concise and useful talk for any developer in charge of high-traffic web sites or applications.
Christophe humorously went back to that dreadful day in the collective Postgres memory: the release of 9.3.1 and the streaming replication chaos.
My overall impression of the talk: Thank $DEITY I'm not a DBA!
But Christophe also gave some valuable advice, even for non-DBAs:
Thom presented on of the new features of Postgres 9.3, materialized views.
In a nutshell, materialized views (MV) are read-only snapshots of queried data that's stored on disk, mostly for performance reasons. An interesting feature of materialized views is that they can have indexes, just like regular tables.
The REFRESH MATERIALIZED VIEW command can be used to update an MV: it will simply run the original query again and store the new results.
There are a number of caveats with the current implementation of MVs:
Looking towards 9.4 and beyond, here are some of the upcoming MV features:
Markus' goal with this talk showed that very few people in the SQL world actually know - let alone really care - about indexes. According to his own experience and that of others (even with competing RDBMS), poorly written SQL is still a leading cause of production downtime (he puts the number at around 50% of downtime though others he quoted put that number higher). SQL queries can indeed put such stress on DB systems and cause them to fail.
One major issue, he argues, is poorly designed indexes. He went back in time to explain possible reasons for the lack of knowledge about indexes with both SQL developers and DBAs. One such reason may be that indexes are not part of the SQL standard and are left as implementation-specific details. Thus many books about SQL barely cover indexes, if at all.
He then took us through a simple quiz he wrote on the topic, with only 5 questions. The questions and explanations were very insightful and I must admit my knowledge of indexes was not up to par. I think everyone in the room got his message loud and clear: indexes are part of the schema, devs should care about them too.
Try out the test : http://use-the-index-luke.com/3-minute-test
For the last talk of the day, Michael went back to the history of the Postgres project and its community. Unlike other IT domains such as email, HTTP servers or even operating systems, RDBMS are still largely dominated by proprietary vendors such as Oracle, IBM and Microsoft. He argues that the reasons are not technical: from a developer stand point, Postgres has all the features of the leading RDMBS (and many more) and the few missing administrative features related to scalability are being addressed.
Instead, he argues decision makers inside companies don't yet fully trust Postgres due to its (perceived) lack of corporate backers.
He went on to suggest ways to overcome those perceptions, for example with an "official" Postgres certification program.
A motivational talk for the Postgres community.