blog entries created by Philippe Pepiot
  • Reduce docker image size with multi-stage builds

    2018/03/21 by Philippe Pepiot

    At Logilab we use Docker more and more, both for testing and for production purposes.

    For a CubicWeb application, I had to write a Dockerfile that runs pip install and compiles JavaScript code using npm.

    The first version of the Dockerfile was:

    FROM debian:stretch
    RUN apt-get update && apt-get -y install \
        wget gnupg ca-certificates apt-transport-https \
        python-pip python-all-dev libgecode-dev g++
    RUN echo "deb https://deb.nodesource.com/node_9.x stretch main" > /etc/apt/sources.list.d/nodesource.list
    RUN wget https://deb.nodesource.com/gpgkey/nodesource.gpg.key -O - | apt-key add -
    RUN apt-get update && apt-get -y install nodejs
    COPY . /sources
    RUN pip install /sources
    RUN cd /sources/frontend && npm install && npm run build && \
        mv /sources/frontend/bundles /app/bundles/
    # ...
    

    The resulting image size was about 1.3GB, which caused issues when uploading it to registries and with the disk space required on production servers.

    So I looked into how to reduce the image size. An important thing to know about Dockerfiles is that each operation results in a new Docker layer, so removing useless files in a later step will not reduce the image size.
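    The trap can be illustrated with a hypothetical fragment (not taken from the actual Dockerfile):

```dockerfile
# Hypothetical example: this does NOT shrink the image.
RUN pip install /sources     # this layer stores the installed files and pip's cache
RUN rm -rf /root/.pip/cache  # a new layer only marks the cache as deleted; the
                             # bytes written by the previous layer still ship
                             # with the image
```

    Files must be removed in the same RUN instruction that created them, or never created at all, for the layers to stay small.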

    https://www.logilab.org/file/10128049/raw/docker-multi-stage.png

    The first change was to use debian:stretch-slim as the base image instead of debian:stretch; this reduced the image size by 18MB.

    Also, by default apt-get pulls in a lot of extra optional packages that are "Suggests" or "Recommends" of the requested ones. We can simply disable this globally using:

    RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
        echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf
    

    This reduced the image size by 166MB.

    Then I looked at the content of the image and saw a lot of space used in /root/.pip/cache. By default pip builds and caches Python packages (as wheels); this can be disabled by adding --no-cache-dir to pip install calls. This reduced the image size by 26MB.

    The image also contains a full nodejs build toolchain, which is useless once the bundles are generated. The old workaround is to install nodejs, build the files, remove the useless build artifacts (node_modules) and uninstall nodejs in a single RUN operation, but this results in an ugly Dockerfile and is not an optimal use of the layer cache. Instead, we can set up a multi-stage build.

    The idea behind multi-stage builds is to be able to build multiple images within a single Dockerfile (only the last one is tagged) and to copy files from one image to another within the same build using COPY --from= (a previous stage can also be used as the base image in a FROM clause).

    Let's extract the JavaScript build into a separate stage:

    FROM debian:stretch-slim as base
    RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
        echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf
    
    FROM base as node-builder
    RUN apt-get update && apt-get -y install \
        wget gnupg ca-certificates apt-transport-https
    RUN echo "deb https://deb.nodesource.com/node_9.x stretch main" > /etc/apt/sources.list.d/nodesource.list
    RUN wget https://deb.nodesource.com/gpgkey/nodesource.gpg.key -O - | apt-key add -
    RUN apt-get update && apt-get -y install nodejs
    COPY . /sources
    RUN cd /sources/frontend && npm install && npm run build
    
    FROM base
    RUN apt-get update && apt-get -y install python-pip python-all-dev libgecode-dev g++
    COPY . /sources
    RUN pip install --no-cache-dir /sources
    COPY --from=node-builder /sources/frontend/bundles /app/bundles
    

    This reduced the image size by 252MB.

    The Gecode build toolchain is required to build rql with its gecode extension (which is a lot faster than the native Python implementation). My next idea was to build rql as a wheel inside a staging image, so that the resulting image would only need the gecode library:

    FROM debian:stretch-slim as base
    RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
        echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf
    
    FROM base as node-builder
    RUN apt-get update && apt-get -y install \
        wget gnupg ca-certificates apt-transport-https
    RUN echo "deb https://deb.nodesource.com/node_9.x stretch main" > /etc/apt/sources.list.d/nodesource.list
    RUN wget https://deb.nodesource.com/gpgkey/nodesource.gpg.key -O - | apt-key add -
    RUN apt-get update && apt-get -y install nodejs
    COPY . /sources
    RUN cd /sources/frontend && npm install && npm run build
    
    FROM base as wheel-builder
    RUN apt-get update && apt-get -y install python-pip python-all-dev libgecode-dev g++ python-wheel
    RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels rql
    
    FROM base
    RUN apt-get update && apt-get -y install python-pip
    COPY . /sources
    COPY --from=wheel-builder /wheels /wheels
    RUN pip install --no-cache-dir --find-links /wheels /sources
    # a test to be sure that rql is built with the gecode extension
    RUN python -c 'import rql.rql_solve'
    COPY --from=node-builder /sources/frontend/bundles /app/bundles
    

    This reduced the image size by 297MB.

    So the final image size went from 1.3GB to 300MB, which is much more suitable for use in a production environment.

    Unfortunately I didn't find a way to copy packages from a staging image, install them and remove the package files in a single Docker layer; a possible workaround is to use an intermediate HTTP server.
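    A sketch of that workaround, with hypothetical names (a wheelhouse container serving the wheels over HTTP on the build network; untested, as in the original idea):

```dockerfile
# Hypothetical: fetch wheels over HTTP at build time instead of COPYing them,
# so they never get stored in a layer of the final image
FROM base
RUN apt-get update && apt-get -y install python-pip
COPY . /sources
RUN pip install --no-cache-dir --find-links http://wheelhouse:8080/ /sources
```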

    The next step could be to build a Debian package inside a staging image, so that the build process would be separated from the Dockerfile and we could provide Debian packages along with the Docker image.

    Another approach could be to use Alpine as base image instead of Debian.


  • Logilab at pgDay Toulouse

    2017/06/16 by Philippe Pepiot

    On June 8th, 2017, we attended pgDay, the gathering and conference event of the French-speaking PostgreSQL community, held at the Météo France campus in Toulouse.

    https://www.logilab.org/file/10126216/raw/logo_pgfr_sans_900_400x400.png

    Partitioning

    Gilles Darold gave us an overview of partitioning solutions, from the manual method with triggers and table inheritance, through the pg_partman extension, up to the declarative partitioning of the upcoming version 10 with the PARTITION OF syntax.

    Partitioning makes it easier to manage the maintenance and performance of tables with many records.

    Autonomous transactions

    Gilles Darold also told us about autonomous transactions, that is, transactions that execute within a parent transaction and can be committed or rolled back independently of it, which can be useful for logging events.

    PostgreSQL buffers

    Vik Fearing explained the workings and interactions of the various memory buffers in PostgreSQL.

    Pages are loaded from disk into the shared_buffers, which are shared by all connections, and have a usageCount between one and five that is incremented each time the page is accessed. When a new page must be loaded, a clock-sweep mechanism loops over the cache and decrements each usageCount; when it reaches zero, the page is replaced by the new one. So, for a page with a usageCount of five, at least five passes of the clock-sweep over the shared_buffers are needed before it gets evicted.
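    This clock-sweep eviction can be sketched in a few lines of Python (a toy model for illustration, not PostgreSQL's actual implementation):

```python
def clock_sweep(buffers, hand=0):
    """Find a victim buffer: loop over the cache, decrementing each
    usageCount, until a buffer whose usageCount is zero is found."""
    while True:
        if buffers[hand] == 0:
            return hand  # evict this buffer, load the new page here
        buffers[hand] -= 1  # the page survives this pass, but ages
        hand = (hand + 1) % len(buffers)

# Three pages with usageCount 5, 1 and 3: the second page is the
# first to reach zero, so it gets evicted.
buffers = [5, 1, 3]
victim = clock_sweep(buffers)
```

    A frequently accessed page keeps its usageCount high and survives many sweeps, while a page read once ages out quickly.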

    To avoid flushing the whole cache when a large table is accessed, PostgreSQL uses a size-limited circular buffer (or ring buffer) for that table.

    Temporary tables use a dedicated buffer, the temp_buffers.

    When a page is modified, the change is first written to the wal buffers, which are written to disk at commit time by the wal writer process.

    The writer process scans the shared_buffers every bgwriter_delay (200ms) and writes to disk a certain number of modified pages; this number is controlled by the bgwriter_lru_maxpages and bgwriter_lru_multiplier parameters.

    Checkpoints also run every checkpoint_timeout, or more frequently when the size of the WAL exceeds the max_wal_size parameter. During a checkpoint, the pages to write (the dirty pages) are collected and sorted to avoid random writes to disk. The checkpoint_completion_target parameter spreads the writes between two checkpoints. Ideally, checkpoints should always be triggered by the timeout and the writes should be spread out as much as possible, in order to get constant read and write performance.

    To debug buffer usage and disk I/O there are the pg_stat_bgwriter table, the pg_buffercache extension, and the track_io_timing parameter to use with EXPLAIN (ANALYZE, BUFFERS).

    The worst PostgreSQL practices

    Thomas Reiss and Philippe Beaudoin presented some of the worst practices with PostgreSQL, notably the widespread one of having too few or too many indexes. On this subject, Dalibo developed the PoWA tool, which analyzes the activity of a database and makes index suggestions. Also beware of the temptation to (over-)destructure data: PostgreSQL has many types that offer strong consistency guarantees on the data and many operations on them, for example the range types.

    The PostgreSQL developer community

    Daniel Vérité gave us a history of Ingres, then Postgres, and finally PostgreSQL, before showing statistics on the commits and on the mailing list used for PostgreSQL development.

    Do elephants eat cubes?

    Cédric Villemain talked about PostgreSQL features for OLAP-style queries: the TABLESAMPLE implementation, which allows querying a random sample of a table, and the default_statistics_target parameter and the new version 10 command CREATE STATISTICS, which make it possible to obtain better statistics on the distribution of a table and therefore better execution plans.

    Also, since version 9.5, the GROUP BY ROLLUP syntax makes it possible to compute aggregates over several GROUP BY clauses in a single query. Previously, several UNIONs were needed to obtain the same result.
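    What ROLLUP computes can be sketched in Python (an illustrative toy with made-up data, not SQL):

```python
from collections import defaultdict

# GROUP BY ROLLUP (region, city) computes, in a single query, the
# aggregates for (region, city), (region,) and the grand total ().
rows = [("EU", "Paris", 10), ("EU", "Lyon", 5), ("US", "NYC", 7)]

totals = defaultdict(int)
for region, city, amount in rows:
    totals[(region, city)] += amount  # finest level: GROUP BY region, city
    totals[(region,)] += amount       # subtotal per region
    totals[()] += amount              # grand total
```

    Before ROLLUP, each of these three grouping levels required its own SELECT, glued together with UNION ALL.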

    Also worth noting is the use of BRIN and BLOOM indexes.

    How does full text search work?

    Adrien Nayrat presented the full text search functions in PostgreSQL and how to improve them by creating your own configurations and word dictionaries, as well as how to make them faster with GIN and GiST indexes.

    GeoDataScience

    Olivier Courtin showed us, with a concrete example, how PostgreSQL can be an ideal environment for geomatics and machine learning, with the PostGIS extension and plpythonu, which is used to run Python code directly on the PostgreSQL server. The dedicated crankshaft extension provides APIs based on scipy and scikit-learn that can be called via SQL procedures.

    https://www.logilab.org/file/10126217/raw/freefall.gif

  • Testing salt formulas with testinfra

    2016/07/21 by Philippe Pepiot

    In a previous post we talked about an environment for developing Salt formulas. To add some spicy requirements, the formula must now handle multiple target OSes (Debian and CentOS), have tests, and have a continuous integration (CI) server setup.

    http://testinfra.readthedocs.io/en/latest/_static/logo.png

    I started writing a framework for this purpose a year ago; it's called testinfra, and it is used to execute commands on remote systems and make assertions on the state and behavior of the system. The modules API provides a pythonic way to inspect the system. It integrates smoothly with pytest, which adds some useful features out of the box, like parametrization to run the tests against multiple systems.

    Writing useful tests is not an easy task; my advice is to test code that triggers implicit actions, code that has caused issues in the past, or simply to check that the application is working correctly, as you would do in a shell.

    For instance, here is one of the tests I wrote for the saemref formula:

    def test_saemref_running(Process, Service, Socket, Command):
        assert Service("supervisord").is_enabled
    
        supervisord = Process.get(comm="supervisord")
        # Supervisor runs as root
        assert supervisord.user == "root"
        assert supervisord.group == "root"
    
        cubicweb = Process.get(ppid=supervisord.pid)
        # Cubicweb should run as saemref user
        assert cubicweb.user == "saemref"
        assert cubicweb.group == "saemref"
        assert cubicweb.comm == "uwsgi"
        # Should have 2 worker processes with 8 threads each and 1 http process with one thread
        child_threads = sorted([c.nlwp for c in Process.filter(ppid=cubicweb.pid)])
        assert child_threads == [1, 8, 8]
    
        # uwsgi should bind on all ipv4 addresses
        assert Socket("tcp://0.0.0.0:8080").is_listening
    
        html = Command.check_output("curl http://localhost:8080")
        assert "<title>accueil (Référentiel SAEM)</title>" in html
    

    Now we can run the tests against a running container by giving its name or Docker id to testinfra:

    % testinfra --hosts=docker://1a8ddedf8164 test_saemref.py
    [...]
    test/test_saemref.py::test_saemref_running[docker:/1a8ddedf8164] PASSED
    

    The immediate advantage of writing such tests is that you can reuse them for monitoring purposes, since testinfra can behave like a Nagios plugin:

    % testinfra -qq --nagios --hosts=ssh://prod test_saemref.py
    TESTINFRA OK - 1 passed, 0 failed, 0 skipped in 2.31 seconds
    .
    

    We can now integrate the test suite into our run-tests.py by adding some code to build and run a provisioned Docker image, plus a test command that runs the testinfra tests against it.

    provision_option = click.option('--provision', is_flag=True, help="Provision the container")
    
    @cli.command(help="Build an image")
    @image_choice
    @provision_option
    def build(image, provision=False):
        dockerfile = "test/{0}.Dockerfile".format(image)
        tag = "{0}-formula:{1}".format(formula, image)
        if provision:
            dockerfile_content = open(dockerfile).read()
            dockerfile_content += "\n" + "\n".join([
                "ADD test/minion.conf /etc/salt/minion.d/minion.conf",
                "ADD {0} /srv/formula/{0}".format(formula),
                "RUN salt-call --retcode-passthrough state.sls {0}".format(formula),
            ]) + "\n"
            dockerfile = "test/{0}_provisioned.Dockerfile".format(image)
            with open(dockerfile, "w") as f:
                f.write(dockerfile_content)
            tag += "-provisioned"
        subprocess.check_call(["docker", "build", "-t", tag, "-f", dockerfile, "."])
        return tag
    
    
    @cli.command(help="Spawn an interactive shell in a new container")
    @image_choice
    @provision_option
    @click.pass_context
    def dev(ctx, image, provision=False):
        tag = ctx.invoke(build, image=image, provision=provision)
        subprocess.call([
            "docker", "run", "-i", "-t", "--rm", "--hostname", image,
            "-v", "{0}/test/minion.conf:/etc/salt/minion.d/minion.conf".format(BASEDIR),
            "-v", "{0}/{1}:/srv/formula/{1}".format(BASEDIR, formula),
            tag, "/bin/bash",
        ])
    
    
    @cli.command(help="Run tests against a provisioned container",
                 context_settings={"allow_extra_args": True})
    @click.pass_context
    @image_choice
    def test(ctx, image):
        import pytest
        tag = ctx.invoke(build, image=image, provision=True)
        docker_id = subprocess.check_output([
            "docker", "run", "-d", "--hostname", image,
            "-v", "{0}/test/minion.conf:/etc/salt/minion.d/minion.conf".format(BASEDIR),
            "-v", "{0}/{1}:/srv/formula/{1}".format(BASEDIR, formula),
            tag, "tail", "-f", "/dev/null",
        ]).strip()
        try:
            ctx.exit(pytest.main(["--hosts=docker://" + docker_id] + ctx.args))
        finally:
            subprocess.check_call(["docker", "rm", "-f", docker_id])
    

    Tests can be run on a local CI server or on Travis; they "just" require a Docker server. Here is an example .travis.yml:

    sudo: required
    services:
      - docker
    language: python
    python:
      - "2.7"
    env:
      matrix:
        - IMAGE=centos7
        - IMAGE=jessie
    install:
      - pip install testinfra
    script:
      - python run-tests.py test $IMAGE -- -v
    

    I wrote a dummy formula with the code above; feel free to use it as a template for your own formula, or to open pull requests and break some tests.

    There is a highly enhanced version of this code in the saemref formula repository, including:

    • Building a provisioned docker image with custom pillars, we use it to run an online demo
    • Destructive tests where each test is run in a dedicated "fresh" container
    • Run Systemd in the containers to get a system close to the production one (this enables the use of Salt service module)
    • Run a PostgreSQL container linked to the tested container for specific tests, like upgrading a CubicWeb instance.

    Destructive tests rely on advanced pytest features that may produce weird bugs when mixed together; there is too much magic involved. Also, handling systemd in Docker is really painful and adds a lot of complexity: for instance, some systemctl commands require a running systemd as PID 1, which is not the case during the docker build phase. So the trade-off between complexity and these features may not be worth it.

    There are also quite a few newer tools for developing and testing infrastructure code that you could include in your stack, like test-kitchen, serverspec, and goss. Choose your weapon and go test your infrastructure code.


  • Developing salt formulas with docker

    2016/07/21 by Philippe Pepiot
    https://www.logilab.org/file/248336/raw/Salt-Logo.png

    While developing Salt formulas I was looking for a simple and reproducible environment to allow faster development, fewer bugs and more fun. The formula must handle multiple target OSes (Debian and CentOS).

    The first barrier is the master/minion installation of Salt, but fortunately Salt has a masterless mode. The idea is quite simple: bring up a virtual machine, install a Salt minion on it, expose the code inside the VM and call the Salt states.

    https://www.logilab.org/file/7159870/raw/docker.png

    At Logilab we like to work with Docker, a lightweight OS-level virtualization solution. One of its key features is Docker volumes, which share local files inside the container. So I started to write a simple Python script to build a container with a Salt minion installed and to run it with the formula states and a few config files shared inside the container.

    The formula I was working on is used to deploy the saemref project, which is a CubicWeb-based application:

    % cat test/centos7.Dockerfile
    FROM centos:7
    RUN yum -y install epel-release && \
        yum -y install https://repo.saltstack.com/yum/redhat/salt-repo-latest-1.el7.noarch.rpm && \
        yum clean expire-cache && \
        yum -y install salt-minion
    
    % cat test/jessie.Dockerfile
    FROM debian:jessie
    RUN apt-get update && apt-get -y install wget
    RUN wget -O - https://repo.saltstack.com/apt/debian/8/amd64/latest/SALTSTACK-GPG-KEY.pub | apt-key add -
    RUN echo "deb http://repo.saltstack.com/apt/debian/8/amd64/latest jessie main" > /etc/apt/sources.list.d/saltstack.list
    RUN apt-get update && apt-get -y install salt-minion
    
    % cat test/minion.conf
    file_client: local
    file_roots:
      base:
        - /srv/salt
        - /srv/formula
    

    And finally the run-tests.py file, using the beautiful click module:

    #!/usr/bin/env python
    import os
    import subprocess
    
    import click
    
    @click.group()
    def cli():
        pass
    
    formula = "saemref"
    BASEDIR = os.path.abspath(os.path.dirname(__file__))
    
    image_choice = click.argument("image", type=click.Choice(["centos7", "jessie"]))
    
    
    @cli.command(help="Build an image")
    @image_choice
    def build(image):
        dockerfile = "test/{0}.Dockerfile".format(image)
        tag = "{0}-formula:{1}".format(formula, image)
        subprocess.check_call(["docker", "build", "-t", tag, "-f", dockerfile, "."])
        return tag
    
    
    @cli.command(help="Spawn an interactive shell in a new container")
    @image_choice
    @click.pass_context
    def dev(ctx, image):
        tag = ctx.invoke(build, image=image)
        subprocess.call([
            "docker", "run", "-i", "-t", "--rm", "--hostname", image,
            "-v", "{0}/test/minion.conf:/etc/salt/minion.d/minion.conf".format(BASEDIR),
            "-v", "{0}/{1}:/srv/formula/{1}".format(BASEDIR, formula),
            tag, "/bin/bash",
        ])
    
    
    if __name__ == "__main__":
        cli()
    

    Now I can quickly run multiple containers and test my Salt states inside them while editing the code locally:

    % ./run-tests.py dev centos7
    [root@centos7 /]# salt-call state.sls saemref
    
    [ ... ]
    
    [root@centos7 /]# ^D
    % # The container is destroyed when it exits
    

    Notice that we could add custom pillar and state files simply by adding specific Docker shared volumes.
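    For instance, the docker run call in the dev command could grow two extra volume options (a sketch with hypothetical paths, assuming the pillar and state files live under test/ in the repository):

```python
import os

# Stand-in for os.path.dirname(__file__) as used in run-tests.py
BASEDIR = os.path.abspath(os.curdir)

# Hypothetical extra shared volumes for custom pillar and state files,
# to be appended to the docker run arguments in the dev command
extra_volumes = [
    "-v", "{0}/test/pillar:/srv/pillar".format(BASEDIR),
    "-v", "{0}/test/salt:/srv/salt".format(BASEDIR),
]
```

    The minion.conf file_roots would then pick up /srv/salt as usual, and a pillar_roots entry pointing at /srv/pillar could be added in the same way.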

    With a few lines of code we created a lightweight Vagrant-like tool, but faster, using Docker instead of VirtualBox, and it remains fully customizable for future needs.