Blog entries

  • Febuary 2013: Mercurial channel "tour"

    2013/01/22 by Pierre-Yves David

    The Release candidate version of Mercurial 2.5 was released last sunday.

    This new version makes a major change in the way "hidden" changesets are handled. In 2.4 only hg log (and a few others) would support effectively hiding "hidden" changesets. Now all hg commands are transparently compatible with the hidden revision concept. This is a considerable step towards changeset evolution, the next-generation collaboration technology that I'm developing for Mercurial.

    The 2.5 cycle is almost over, but there is no time to rest yet, Saturday the 2th of February, I will give a talk about changeset evolution concept at FOSDEM in the Mozilla Room. This talk in an updated version of the one I gave at 2012 (video in french).

    The week after, I'm crossing the channel to attend the Mercurial 2.6 Sprint hosted by Facebook London. I expect a lot of discussion about the user interface and network access of changeset evolution.

    The HG 2.3 sprint

  • FOSDEM 2013

    2013/02/12 by Pierre-Yves David

    I was in Bruxelles for FOSDEM 2013. As with previous FOSDEM there were too many interesting talks and people to see. Here is a summary of what I saw:

    In the Mozilla's room:

    1. The html5 pdf viewer pdfjs is impressive. The PDF specification is really scary but this full featured "native" viewer is able to renders most of it with very good performance. Have a look at the pdfjs demo!
    1. Firefox debug tools overview with a specific focus of Firefox OS emulator in your browser.
    1. Introduction to webl10n: an internationalization format and library used in Firefox OS. A successful mix that results in a format that is idiot-proof enough for a duck to use, that relies on Unicode specifications to handle complex pluralization rules and that allows cascading translation definitions.
    typical webl10n user
    1. Status of html5 video and audio support in Firefox. The topic looks like a real headache but the team seems to be doing really well. Special mention for the reverse demo effect: The speaker expected some format to be still unsupported but someone else apparently implemented them over night.
    2. Last but not least I gave a talk about the changeset evolution concept that I'm putting in Mercurial. Thanks goes to Feth for asking me his not-scripted-at-all-questions during this talk. (slides)

    In the postgresql room:

    1. Insightful talk about more event trigger in postgresql engine and how this may becomes the perfect way to break your system.
    2. Full update of the capability of postgis 2.0. The postgis suite was already impressive for storing and querying 2D data, but it now have impressive capability regarding 3D data.

    On python related topic:
    • Victor Stinner has started an interesting project to improve CPython performance. The first one: astoptimizer breaks some of the language semantics to apply optimisation on compiling to byte code (lookup caching, constant folding,…). The other, registervm is a full redefinition of how the interpreter handles reference in byte code.

    After the FOSDEM, I crossed the channel to attend a Mercurial sprint in London. Expect more on this topic soon.

  • FOSDEM PGDay 2014

    2014/02/11 by Rémi Cardona

    I attended PGDay on January 31st, in Brussels. This event was held just before FOSDEM, which I also attended (expect another blog post). Here are some of the notes I took during the conference.

    Statistics in PostgreSQL, Heikki Linnakangas

    Due to transit delays, I only caught the last half of that talk.

    The main goal of this talk was to explain some of Postgres' per-column statistics. In a nutshell, Postgres needs to have some idea about tables' content in order to choose an appropriate query plan.

    Heikki explained which sorts of statistics gathers, such as most common values and histograms. Another interesting stat is the correlation between physical pages and data ordering (see CLUSTER).

    Column statistics are gathered when running ANALYZE and stored in the pg_statistic system catalog. The pg_stats view provides a human-readable version of these stats.

    Heikki also explained how to determine whether performance issues are due to out-of-date statistics or not. As it turns out, EXPLAIN ANALYZE shows for each step of the query planner how many rows it expects to process and how many it actually processed. The rule of thumb is that similar values (no more than an order of magnitude apart) mean that column statistics are doing their job. A wider margin between expected and actual rows mean that statistics are possibly preventing the query planner from picking a more optimized plan.

    It was noted though that statistics-related performance issues often happen on tables with very frequent modifications. Running ANALYZE manually or increasing the frequency of the automatic ANALYZE may help in those situations.

    Advanced Extension Use Cases, Dimitri Fontaine

    Dimitri explained with very simple cases the use of some of Postgres' lesser-known extensions and the overall extension mechanism.

    Here's a grocery-list of the extensions and types he introduced:

    • intarray extension, which adds operators and functions to the standard ARRAY type, specifically tailored for arrays of integers,
    • the standard POINT type which provides basic 2D flat-earth geometry,
    • the cube extension that can represent N-dimensional points and volumes,
    • the earthdistance extension that builds on cube to provide distance functions on a sphere-shaped Earth (a close-enough approximation for many uses),
    • the pg_trgm extension which provides text similarity functions based on trigram matching (a much simpler thus faster alternative to Levenshtein distances), especially useful for "typo-resistant" auto-completion suggestions,
    • the hstore extension which provides a simple-but-efficient key value store that has everyone talking in the Postgres world (it's touted as the NoSQL killer),
    • the hll extensions which implements the HyperLogLog algorithm which seems very well suited to storing and counting unique visitor on a web site, for example.

    An all-around great talk with simple but meaningful examples.

    Integrated cache invalidation for better hit ratios, Magnus Hagander

    What Magnus presented almost amounted to a tutorial on caching strategies for busy web sites. He went through simple examples, using the ubiquitous Django framework for the web view part and Varnish for the HTTP caching part.

    The whole talk revolved around adding private (X-prefixed) HTTP headers in replies containing one or more "entity IDs" so that Varnish's cache can be purged whenever said entities change. The hard problem lies in how and when to call PURGE on Varnish.

    The obvious solution is to override Django's save() method on Model-derived objects. One can then use httplib (or better yet requests) to purge the cache. This solution can be slightly improved by using Django's signal mechanism instead, which sound an awful-lot like CubicWeb's hooks.

    The problem with the above solution is that any DB modification not going through Django (and they will happen) will not invalidate the cached pages. So Magnus then presented how to write the same cache-invalidating code in PL/Python in triggers.

    While this does solve that last issue, it introduces synchronous HTTP calls in the DB, killing write performance completely (or killing it completely if the HTTP calls fail). So to fix those problems, while introducing limited latency, is to use SkyTools' PgQ, a simple message queue based on Postgres. Moving the HTTP calls outside of the main database and into a Consumer (a class provided by PgQ's python bindings) makes the cache-invalidating trigger asynchronous, reducing write overhead.

    A clear, concise and useful talk for any developer in charge of high-traffic web sites or applications.

    The Worst Day of Your Life, Christophe Pettus

    Christophe humorously went back to that dreadful day in the collective Postgres memory: the release of 9.3.1 and the streaming replication chaos.

    My overall impression of the talk: Thank $DEITY I'm not a DBA!

    But Christophe also gave some valuable advice, even for non-DBAs:

    • Provision 3 times the necessary disk space, in case you need to pg_dump or otherwise do a snapshot of your currently running database,
    • Do backups and test them:
      • give them to developers,
      • use them for analytics,
      • test the restore, make it foolproof, try to automate it,
    • basic Postgres hygiene:
      • fsync = on (on by default, DON'T TURN IT OFF, there are better ways)
      • full_page_writes = on (on by default, don't turn it off)
      • deploy minor versions as soon as possible,
      • plan upgrade strategies before EOL,
      • 9.3+ checksums (createdb option, performance cost is minimal),
      • application-level consistency checks (don't wait for auto vacuum to "discover" consistency errors).

    Materialised views now and in the future, Thom Brown

    Thom presented on of the new features of Postgres 9.3, materialized views.

    In a nutshell, materialized views (MV) are read-only snapshots of queried data that's stored on disk, mostly for performance reasons. An interesting feature of materialized views is that they can have indexes, just like regular tables.

    The REFRESH MATERIALIZED VIEW command can be used to update an MV: it will simply run the original query again and store the new results.

    There are a number of caveats with the current implementation of MVs:

    • pg_dump never saves the data, only the query used to build it,
    • REFRESH requires an exclusive lock,
    • due to implementation details (frozen rows or pages IIRC), MVs may exhibit non-concurrent behavior with other running transactions.

    Looking towards 9.4 and beyond, here are some of the upcoming MV features:

    • 9.4 adds the CONCURRENTLY keyword:
      • + no longer needs an exclusive lock, doesn't block reads
      • - requires a unique index
      • - may require VACUUM
    • roadmap (no guarantees):
      • unlogged (disables the WAL),
      • incremental refresh,
      • lazy automatic refresh,
      • planner awareness of MVs (would use MVs as cache/index).

    Indexes: The neglected performance all-rounder, Markus Winand

    Markus' goal with this talk showed that very few people in the SQL world actually know - let alone really care - about indexes. According to his own experience and that of others (even with competing RDBMS), poorly written SQL is still a leading cause of production downtime (he puts the number at around 50% of downtime though others he quoted put that number higher). SQL queries can indeed put such stress on DB systems and cause them to fail.

    One major issue, he argues, is poorly designed indexes. He went back in time to explain possible reasons for the lack of knowledge about indexes with both SQL developers and DBAs. One such reason may be that indexes are not part of the SQL standard and are left as implementation-specific details. Thus many books about SQL barely cover indexes, if at all.

    He then took us through a simple quiz he wrote on the topic, with only 5 questions. The questions and explanations were very insightful and I must admit my knowledge of indexes was not up to par. I think everyone in the room got his message loud and clear: indexes are part of the schema, devs should care about them too.

    Try out the test :

    PostgreSQL - Community meets Business, Michael Meskes

    For the last talk of the day, Michael went back to the history of the Postgres project and its community. Unlike other IT domains such as email, HTTP servers or even operating systems, RDBMS are still largely dominated by proprietary vendors such as Oracle, IBM and Microsoft. He argues that the reasons are not technical: from a developer stand point, Postgres has all the features of the leading RDMBS (and many more) and the few missing administrative features related to scalability are being addressed.

    Instead, he argues decision makers inside companies don't yet fully trust Postgres due to its (perceived) lack of corporate backers.

    He went on to suggest ways to overcome those perceptions, for example with an "official" Postgres certification program.

    A motivational talk for the Postgres community.

  • Nous allons à FOSDEM 2016 et cfgmgmtcamp

    2016/01/18 by Arthur Lutz

    FOSDEM est le rendez-vous incontournable du logiciel libre en Europe. Logilab y participe depuis plusieurs années (en 2013 avec changeset evolution dans mercurial , puis en 2014 sur PostgreSQL, 2015 en tant que participants).

    Cette année, nous allons donc participer à FOSDEM dans le track Configuration Management devroom, je presenterai Salt en mettant l'accent sur la supervision orchestrée par ce framework en collectant les données dans graphite et les exploitant dans grafana.

    Nous profitons d'être en Belgique pour enchaîner avec cfgmgmtcamp "Config Management Camp" qui se déroulera les jours suivants à Gent (1er et 2 février). J'y présenterai à peu près la même chose dans le cadre du Track "Salt". N'hésitez pas à jeter un coup d'œil à l'ensemble des présentations qui promettent d'être passionnantes.

    Si vous allez à l'un de ces événements, faites-nous signe qu'on en profite pour se voir. Nous nous efforcerons de partager une partie de ce que nous aurons appris lors de nos pérégrinations dans un article de blog, donc "watch this space" et nos comptes twitter (@arthurlutz, @douardda, @logilab).

  • We went to FOSDEM 2016 (and cfgmgmtcamp)

    2016/02/09 by Arthur Lutz

    David & I went to FOSDEM and cfgmgmtcamp this year to see some conferences, do two presentations, and discuss with the members of the open source communities we contribute to.

    At FOSDEM, we started early by doing a presentation at 9.00 am in the "Configuration Management devroom", which to our surprise was a large room which was almost full. The presentation was streamed over the Internet and should be available to view shortly.

    I presented "Once you've configured your infrastructure using salt, monitor it by re-using that definition". (mirrored on slideshare. The main part was a demo, the code being published on bitbucket.

    The presentation was streamed live (I came across someone that watched it on the Internet to "sleep in"), and should be available to watch when it gets encoded on

    FOSDEM video box

    We then saw the following talks :

    • Unified Framework for Big Data Foreign Data Wrappers (FDW) by Shivram Mani in the Postgresql Track
    • Mainflux Open Source IoT Cloud
    • EzBench, a tool to help you benchmark and bisect the Graphics Stack's performance
    • The RTC components in the debian infrastructure
    • CoreOS: A Linux distribution designed for application containers that scale
    • Using PostgreSQL for Bibliographic Data (since we've worked on with and PostgreSQL)
    • The FOSDEM infrastructure review

    Congratulations to all the FOSDEM organisers, volunteers and speakers. We will hopefully be back for more.

    We then took the train to Gent where we spent two days learning and sharing about Configuration Management Systems and all the ecosystem around it (orchestration, containers, clouds, testing, etc.).

    More on our cfmgmtcamp experience in another blog post.

    Photos under creative commons CC-BY, by Ludovic Hirlimann and Deborah Bryant here and here

  • We went to cfgmgmtcamp 2016 (after FOSDEM)

    2016/02/09 by Arthur Lutz

    Following a day at FOSDEM (another post about it), we spend two days at cfgmgmtcamp in Gent. At cfgmgmtcamp, we obviously spent some time in the Salt track since it's our tool of choice as you might have noticed. But checking out how some of the other tools and communities are finding solutions to similar problems is also great.

    cfgmgmtcamp logo

    I presented Roll out active Supervision with Salt, Graphite and Grafana (mirrored on slideshare), you can find the code on bitbucket.

    We saw :

    Day 1

    • Mark Shuttleworth from Canonical presenting Juju and its ecosystem, software modelling. MASS (Metal As A Service) was demoed on the nice "OrangeBox". It promises to spin up an OpenStack infrastructure in 15 minutes. One of the interesting things with charms and bundles of charms is the interfaces that need to be established between different service bricks. In the salt community we have salt-formulas but they lack maturity in the sense that there's no possibility to plug in multiple formulas that interact with each other... yet.
    juju deploy of openstack
    • Mitch Michell from Hashicorp presented vault. Vault stores your secrets (certificates, passwords, etc.) and we will probably be trying it out in the near future. A lot of concepts in vault are really well thought out and resonate with some of the things we want to do and automate in our infrastructure. The use of Shamir Secret Sharing technique (also used in the debian infrastructure team) for the N-man challenge to unvault the secrets is quite nice. David is already looking into automating it with Salt and having GSSAPI (kerberos) authentication. bikes!

    Day 2

    • Gareth Rushgrove from PuppetLabs talked about the importance of metadata in docker images and docker containers by explaining how these greatly benefit tools like dpkg and rpm and that the container community should be inspired by the amazing skills and experience that has been built by these package management communities (think of all the language-specific package managers that each reinvent the wheel one after the other).
    • Testing Immutable Infrastructure: we found some inspiration from test-kitchen and running the tests inside a docker container instead of vagrant virtual machine. We'll have to take a look at the SaltStack provisioner for test-kitchen. We already do some of that stuff in docker and OpenStack using salt-cloud. But maybe we can take it further with such tools (or testinfra whose author will be joining Logilab next month).
    coreos, rkt, kubernetes
    • How CoreOS is built, modified, and updated: From repo sync to Omaha by Brian "RedBeard" Harrington. Interesting presentation of the CoreOS system. Brian also revealed that CoreOS is now capable of using the TPM to enforce a signed OS, but also signed containers. Official CoreOS images shipped through Omaha are now signed with a root key that can be installed in the TPM of the host (ie. they didn't use a pre-installed Microsoft key), along with a modified TPM-aware version of GRUB. For now, the Omaha platform is not open source, so it may not be that easy to build one's own CoreOS images signed with a personal root key, but it is theoretically possible. Brian also said that he expect their Omaha server implementation to become open source some day.
    • The use of Salt in Foreman was presented and demoed by Stephen Benjamin. We'll have to retry using that tool with the newest features of the smart proxy.
    • Jonathan Boulle from CoreOS presented "rkt and Kubernetes: What’s new with Container Runtimes and Orchestration" In this last talk, Johnathan gave a tour of the rkt project and how it is used to build, coupled with kubernetes, a comprehensive, secure container running infrastructure (which uses saltstack!). He named the result "rktnetes". The idea is to use rkt as the kubelet's (primany node agent) container runtime of a kubernetes cluster powered by CoreOS. Along with the new CoreOS support for TPM-based trust chain, it allows to ensure completely secured executions, from the bootloader to the container! The possibility to run fully secured containers is one of the reasons why CoreOS developed the rkt project.

    We would like to thank the cfgmgmntcamp organisation team, it was a great conference, we highly recommend it. Thanks for the speaker event the night before the conference, and the social event on Monday evening. (and thanks for the chocolate!).