latest blogs

Following the recent introduction of Python type annotations (aka "type hints") in Mercurial (see, e.g. this changeset by Augie Fackler), I've been playing a bit with this and pytype.

pytype is a static type analyzer for Python code. It compares with the more popular mypy but I don't have enough perspective to make a meaningful comparison at the moment. In this post, I'll illustrate how I worked with pytype to gradually add type hints in a Mercurial module and while doing so, fix bugs!

The module I focused on is mercurial.mail, which contains mail utilities and that I know quite well. Other modules are also being worked on, this one is a good starting point because it has a limited number of "internal" dependencies, which both makes it faster to iterate with pytype and reduces side effects of other modules not being correctly typed already.

$ pytype mercurial/
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[19/19] check mercurial.mail
Success: no errors found

The good news is that the module apparently already type-checks. Let's go deeper and merge the type annotations generated by pytype:

$ merge-pyi -i mercurial/ out/mercurial/mail.pyi

(In practice, we'd use --as-comments option to write type hints as comments, so that the module is still usable on Python 2.)

Now we have all declarations annotated with types. Typically, we'd get many things like:

def codec2iana(cs) -> Any:
   cs = pycompat.sysbytes(email.charset.Charset(cs).input_charset.lower())
   # "latin1" normalizes to "iso8859-1", standard calls for "iso-8859-1"
   if cs.startswith(b"iso") and not cs.startswith(b"iso-"):
      return b"iso-" + cs[3:]
   return cs

The function signature has been annotated with Any (omitted for parameters, explicit for return value). This somehow means that type inference failed to find the type of that function. As it's (quite) obvious, let's change that into:

def codec2iana(cs: bytes) -> bytes:

And re-run pytype:

$ pytype mercurial/
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[1/1] check mercurial.mail
FAILED: .pytype/pyi/mercurial/mail.pyi
pytype-single --imports_info .pytype/imports/mercurial.mail.imports --module-name mercurial.mail -V 3.7 -o .pytype/pyi/mercurial/mail.pyi --analyze-annotated --nofail --quick mercurial/
File "mercurial/", line 253, in codec2iana: Function Charset.__init__ was called with the wrong arguments [wrong-arg-types]
  Expected: (self, input_charset: str = ...)
  Actually passed: (self, input_charset: bytes)

For more details, see
ninja: build stopped: subcommand failed.

Interesting! email.charset.Charset is apparently instantiated with the wrong argument type. While it's not exactly a bug, because Python will handle bytes instead of str well in general, we can again change the signature (and code) to:

def codec2iana(cs: str) -> str:
   cs = email.charset.Charset(cs).input_charset.lower()
   # "latin1" normalizes to "iso8859-1", standard calls for "iso-8859-1"
   if cs.startswith("iso") and not cs.startswith("iso-"):
      return "iso-" + cs[3:]
   return cs

Obviously, this involves a larger refactoring in client code of this simple function, see respective changeset for details.

Another example is this function:

def _encode(ui, s, charsets) -> Any:
    '''Returns (converted) string, charset tuple.
    Finds out best charset by cycling through sendcharsets in descending
    order. Tries both encoding and fallbackencoding for input. Only as
    last resort send as is in fake ascii.
    Caveat: Do not use for mail parts containing patches!'''
    sendcharsets = charsets or _charsets(ui)
    if not isinstance(s, bytes):
        # We have unicode data, which we need to try and encode to
        # some reasonable-ish encoding. Try the encodings the user
        # wants, and fall back to garbage-in-ascii.
        for ocs in sendcharsets:
                return s.encode(pycompat.sysstr(ocs)), ocs
            except UnicodeEncodeError:
            except LookupError:
                ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
            # Everything failed, ascii-armor what we've got and send it.
            return s.encode('ascii', 'backslashreplace')
    # We have a bytes of unknown encoding. We'll try and guess a valid
    # encoding, falling back to pretending we had ascii even though we
    # know that's wrong.
    except UnicodeDecodeError:
        for ics in (encoding.encoding, encoding.fallbackencoding):
            ics = pycompat.sysstr(ics)
                u = s.decode(ics)
            except UnicodeDecodeError:
            for ocs in sendcharsets:
                    return u.encode(pycompat.sysstr(ocs)), ocs
                except UnicodeEncodeError:
                except LookupError:
                    ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
    # if ascii, or all conversion attempts fail, send (broken) ascii
    return s, b'us-ascii'

It quite clear from the return value (last line) that we can change its type signature to:

def _encode(ui, s:Union[bytes, str], charsets: List[bytes]) -> Tuple[bytes, bytes]

And re-run pytype:

$ pytype mercurial/
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[1/1] check mercurial.mail
FAILED: .pytype/pyi/mercurial/mail.pyi
pytype-single --imports_info .pytype/imports/mercurial.mail.imports --module-name mercurial.mail -V 3.7 -o .pytype/pyi/mercurial/mail.pyi --analyze-annotated --nofail --quick mercurial/
File "mercurial/", line 342, in _encode: bad option in return type [bad-return-type]
  Expected: Tuple[bytes, bytes]
  Actually returned: bytes

For more details, see
ninja: build stopped: subcommand failed.

That's a real bug. Line 371 contains return s.encode('ascii', 'backslashreplace') in the middle of the function. We indeed return a bytes value instead of a Tuple[bytes, bytes] as elsewhere in the function. The following changes fixes the bug:

diff --git a/mercurial/ b/mercurial/
--- a/mercurial/
+++ b/mercurial/
@@ -339,7 +339,7 @@ def _encode(ui, s, charsets) -> Any:
                 ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
             # Everything failed, ascii-armor what we've got and send it.
-            return s.encode('ascii', 'backslashreplace')
+            return s.encode('ascii', 'backslashreplace'), b'us-ascii'
     # We have a bytes of unknown encoding. We'll try and guess a valid
     # encoding, falling back to pretending we had ascii even though we
     # know that's wrong.

Steps by steps, by replacing Any with real types, adjusting "wrong" types like in the first example and fixing bugs, we finally get the whole module (almost) completely annotated.

blog entry of

Hier, je suis allé au Meetup Cloud Native Computing Nantes sur KubeCon et la sécurité avec Kubernetes.

Gaëlle Acas et Eric Briand ont présenté un retour sur la KubeCon Europe 2019.

Alexandre Roman a présenté "La sécurité avec Kubernetes et les conteneurs Docker".

Beaucoup de technologies ont été mentionnées ou discutées. À défaut de faire un résumé de ce qui s'y est dit, voici quelque liens collecté pendant le meetup :

Merci au Meetup Cloud Native Computing Nantes, au VMware User Group, à Dell pour le sponsoring.

blog entry of

Mercurial Paris conference will take place on May 28th at Mozilla's headquarters, in Paris.

Mercurial is a free distributed Source Control Management system. It offers an intuitive interface to efficiently handle projects of any size. With its powerful extension system, Mercurial can easily adapt to any environment.

This first edition targets organizations that are currently using Mercurial or considering switching from another Version Control System, such as Subversion.

Attending the conference will allow users to share ideas and version control experiences in different industries and at a different scale. It is a great opportunity to connect with Mercurial core developers and get updates about modern workflow and features.

You are welcome to register here and be part of this first edition with us!

Mercurial conference is co-organized by Logilab, Octobus & Rhode-Code.

blog entry of

Le partage de la connaissance est une composante importante à Logilab. Elle se décline en de nombreux formats dont je ne pourrais pas faire une liste exhaustive, parmi lesquels : la documentation interne, les communautés de logiciel libre, les listes de discussion, stackoverflow ou autres supports de ce type, l'organisation ou la participation à des conférences techniques et meetup en tant qu'auditeur ou en tant qu'orateur... et les "5mintalks".

Les 5mintalk sont des présentations internes à Logilab, qui durent rarement 5 minutes et visent les objectifs suivants :

  • partager sa connaissance et diffuser les bonnes pratiques ;
  • faire connaître les atouts et les difficultés d'un projet aux autres salariés ;
  • faire des point d'étape sur des nouveautés liées aux services internes ;
  • fournir un moment d'attention partagé pour une entreprise distribuée sur deux sites géographiques (Paris et Toulouse) et de nombreuses personnes en télétravail (Nantes, Valence, Nice) ;
  • fournir un "espace protégé et bienveillant" pour que les personnes puissent s'exercer à la prise de parole en public, ce qui peut ensuite se transformer en prise de parole en public dans le cadre de conférences ;
  • s'entraîner à synthétiser et transmettre ses idées.

Tout cela se fait sur la base du volontariat.

Depuis 2018, nous utilisons de plus en plus la visioconférence pour faire ces présentations, afin que les personnes en télétravail et les sites géographiques distribués puissent les suivre.

Depuis 2018 aussi, nous encourageons et nous formons à l'enregistrement de ces présentations sous forme de screencast pour que les absents puissent bénéficier de ces contenus en différé. Ces contenus sont ensuite partagés sur une instance peertube interne (peertube est un formidable logiciel de partage vidéo). Ces contenus peuvent aussi être utiles aux nouvelles recrues.

Bien qu'on nous le demande parfois, ces contenus ne sont pas visibles de l'extérieur pour deux raisons principales. La première est l'aspect "espace protégé", qui permet l'expérimentation, alors qu'un certain niveau de qualité serait requis pour un publication externe. Avant de publier sur notre peertube interne, les personnes redemandent souvent pour vérifier que ce ne sera pas rendu public. La seconde raison est que nous faisons librement référence aux projets clients et à la manière dont ils sont réalisés, en incluant des détails qui ne sont pas partageables à l'extérieur.

Les contenus sont très variés :

  • nos pratiques techniques ;
  • notre veille technologique ;
  • métiers de certains projets clients ;
  • retour suite à une conférence ;
  • activités en dehors du cadre professionnel.

Vous pouvez retrouver la description succincte du contenu des présentations sur le mot-dièse #5mintalk sur notre instance mastodon, pour participer à ces échanges de pratiques... rejoignez nous.

blog entry of

Logilab co-organizes with Octobus, a mini-sprint Mercurial to be held from Thursday 4 to Sunday 7 April in Paris.

Logilab will host mercurial mini-sprint in its Paris premises on Thursday 4 and Friday 5 April. Octobus will be communicating very soon the place chosen to sprint during the weekend.

To participate to mercurial mini-sprint, please complete the survey by informing your name and which days you will be joining us.

Some of the developers working on mercurial or associated tooling plan to focus on improving the workflow and tool and documentation used for online collaboration through Mercurial (Kalithea, RhodeCode, Heptapod, Phabricator, sh.rt, etc. You can also fill the pad below to indicate the themes that you want to tackle during this sprint:

Let's code together!

blog entry of

Hier soir j'ai participé à un workshop/meetup sur Prometheus et Grafana co-organisé par le Meetup Nantes Monitoring et le Meetup Cloud Native Computing Nantes.

Le workshop était super bien préparé sous forme d'un dépôt git avec des instructions, des questions, des solutions : Si ces technos vous intéressent, je vous encourage vivement à dérouler ce workshop.

J'ai parcouru la liste des exporters

La liste des exporters qui pourraient nous intéresser à Logilab (technologies qu'on utilise) :

On a aussi discuté avec des membres du workshop d'autres outils de métriques :

Pour avoir des storage-schema.conf comme dans graphite, il faut avoir plusieurs prometheus qui se "scrape" les uns les autres avec des schémas de rétention et de granularité différents.

Apparemment prometheus n'est pas facile à scaler, cortex est un projet qui essaye de le faire :

Bref, un excellent meetup mené avec brio par Samuel Berthe et Mickael Alibert. Merci à Epitech Nantes pour l'acceuil et à Zenika pour l'apéro à la fin du workshop.

blog entry of

A very large conference

This year I attended the FOSDEM in Brussels for the first time. I have been doing free software for more than 20 years, but for some reason, I had never been to FOSDEM. I was pleasantly surprised to see that it was much larger than I thought and that it gathered thousands of people. This is by far the largest free software event I have been to. My congratulations to the organizers and volunteers, since this must be a huge effort to pull off.

My presentation about CubicWeb

I went to FOSDEM to present Logilab's latest project, a reboot of CubicWeb to turn it into a web extension to browse the web of data. The recording of the talk, the slides and the video of the demo are online, I hope you enjoy them and get in touch with me if you are to comment or contribute.

My highlights

As usual, the "hallway track" was the most useful for me and there are more sets of slides I read than talks I could attend.

I met with Bram, the author of redbaron and we had a long discussion about programming in different languages.

I also met with Octobus. We discussed Heptapod, a project to add Mercurial support to Gitlab. Logilab would back such a project with money if it were to become usable and (please) move towards federation (with ActivityPub?) and user queries (with GraphQL?). We also discussed the so-called oxydation of Mercurial, which consists in rewriting some parts in Rust. After a quick search I saw that tools like PyO3 can help writing Python extensions in Rust.

Some of the talks that interested me included:

  • Memex, that reuses the name of the very first hypertext system described in the litterature, and tries to implement a decentralized annotation system for the web. It reminded me of Web hypothesis and W3C's annotations recommendation which they say they will be compatible with.
  • GraphQL was presented both in GraphQL with Python and Testing GraphQL with Javascript. I have been following GraphQL about two years because it compares to the RQL language of CubicWeb. We have been testing it at Logilab with cubicweb-graphql.
  • Web Components are one of the options to develop user interfaces in the browser. I had a look at the Future of Web Components, which I relate to the work we are doing with the CubicWeb browser (see above) and the work the Virtual Assembly has been doing to implement Transiscope.
  • Pyodide, the scientific python stack compiled to Web Assembly, I try to compare it to using Jupyter notebooks.
  • Chat-over-IMAP another try to rule them all chat protocols. It is right that everyone has more than one email address, that email addresses are more and more used as logins in many web sites and that using these email addresses as instant-messaging / chat addresses would be nice. We will see if it takes off!
blog entry of

Le mardi 29 janvier 2019 entre 19h à 21h, des contributeurs et utilisateurs de Debian se sont rencontré pour échanger sur Debian. Debian est un système d'exploitation libre pour votre ordinateur. Un système d'exploitation est une suite de programmes de base et d’utilitaires permettant à un ordinateur de fonctionner.

Merci à Capensis pour l’accueil dans ses locaux.

Trois présentations ont été faites :

On a aussi parlé de l'association Debian France qui soutien nos rencontres. Rendez vous à la mini debconf Marseille ?

blog entry of

Nous étions présents à la conférence Pas Sage en Seine 2018 pour assister aux conférences, mais aussi pour participer à la tenue du stand de l'APRIL dont Logilab est adhérente depuis sa création pour soutenir la promotion et la défense du logiciel libre.

Voici un court retour sous forme de notes sur les conférences auxquelles nous avons assistées et qui sont en lien avec notre activité.

Le programme était chargé, retrouvez le sur

Conf zero knowledge webapp

M4Dz de AlwaysData a effectué avec brio cette présentation.

Article wikipedia sur Zero Knowledge Proof

Dans le navigateur, certains acronymes sont familiers, d'autres un peu moins :

  • CORS
  • CSP
  • SRI - protection (signature de code)
  • Referrer-Policy
  • Key-storage (WebCrypto, File-API), éviter LocalStorage (peut être purgé comme du cache)

WebAssembly (ASMjs) - prévient la lecture du code executé et rend l'extraction de données complexe. Plus difficile d'exploiter un binaire que un bout de JS où on peut se brancher où on veut (ralentir les attaques).

Pendant les questions/réponses il a été question de "chiffrement homomorphique" comme potentielle solution pour les questions d'indexations de contenus chiffrés.

Applications où y des choses similaires, dont certaines que nous utilisons à Logilab :

Full-remote : guide de survie en environnement distant

Encore une fois M4Dz de AlwaysData.

Beaucoup de contenu, mais j'ai bien aimé les notions suivantes :

  • mentorat
  • mini-projet interne pour débuter - pour aller discuter avec les anciens
  • manifesto : exemple (auquel on peut apporter quelques critiques)
  • exemple de remote le matin, pour ensuite être dans les bureaux l'après-midi
  • environnement sonore (à personnaliser)

Pas mal de propos sur les canaux de communication:

  • tchat / voice / video / documents
  • exemple de github qui a enlevé le mail de ses outils de communication
  • virtualopenspace, notamment pour pause café (voir notre inspiration peopledoc et gitlab)
  • abuser des status type jabber/xmpp - pour communiquer sur notre disponibilité (notamment quand on a besoin d'être concentré et pas interrompu)

Le temps de trajet peut être utile pour décompreser ou se mettre en condition pour travailler. Selon M4Dz, c'est reproductible en teletravail (aller faire un tour du quartier).

Exemples de rencontres informelles entre télétravailleurs : nextcloud a une équipe très distribuée, ils vont travailler chez les uns les autres (ceux qui ont des affinités entre eux).

À voir absolument si votre entreprise a une reflexion sur le télétravail, même en partiel (comme c'est le cas à Logilab).

Jour 2

Privacy by design

Notes :

  • GRDP check list
  • déléger l'authentification à l'autres par exemple : openid
  • libjs : jwcrypto, jsencrypt, js-nacl (lien dans les slides pour la liste complète)
  • séparer les consentements (par service tiers)
  • exemple de nextcloud qui fait bien les choses
  • matomo (ex-piwik) fait bien les choses en terme de respect de vie privée
  • documentation des api (swagger et apiary pour supprimer les données

Sur la pseudonimiation (pour publier des jeux de données), M4Dz nous a parlé de Differential privacy

Crypto quantique

Comment commencer à tester la crypto quantique en pratique :

Conclusion, il est urgent d'attendre, on a plein d'autres problèmes de securité à résoudre.


Super retour d'experience sur comment aborder un texte de loi ou une reglementation compliquée de manière collective.

La RGPD pour les noobs

Excellent complément de la conférence précédente sur le RGPD.


Voilà. Plein d'autres conférences méritent d'être visionnées sur (peertube, what else?), bravo pour la captation, la diffusion en direct et la mise à disposition des contenus.

Bravo à tout l'équipe d'organisation pour les contenus riches et variés de cette édition du festival. Et merci à toutes les personnes avec qui j'ai pu échanger en marge des conférences

Si vous êtes arrivés jusqu'ici, déjà bravo, et puis sachez que Logilab recrute, rejoignez nous pour travailler sur ces questions.

Logilab recrute!

blog entry of

"Will my refactoring break my code ?" is the question the developer asks himself because he is not sure the tests cover all the cases. He should wonder, because tests that cover all the cases would be costly to write, run and maintain. Hence, most of the time, small decisions are made day after day to test this and not that. After some time, you could consider that in a sense, the implementation has become the specification and the rest of the code expects it not to change.

Enters Scientist, by GitHub, that inspired a Python port named Laboratory.

Let us assume you want to add a cache to a function that reads data from a database. The function would be named read_from_db, it would take an int as parameter item_id and return a dict with attributes of the items and their values.

You could experiment with the new version of this function like so:

import laboratory

def read_from_db_with_cache(item_id):
    data = {}
    # some implementation with a cache
    return data

def read_from_db(item_id):
     data = {}
     #  fetch data from db
     return data

When you run the above code, calling read_from_db returns its result as usual, but thanks to laboratory, a call to read_from_db_with_cache is made and its execution time and result are compared with the first one. These measurements are logged to a file or sent to your metrics solution for you to compare and study.

In other words, things continue to work as usual as you keep the original function, but at the same time you experiment with its candidate replacement to make sure switching will not break or slow things down.

I like the idea ! Thank you for Scientist and Laboratory that are both available under the MIT license.
blog entry of

At logilab we use more and more docker both for test and production purposes.

For a CubicWeb application I had to write a Dockerfile that does a pip install and compiles javascript code using npm.

The first version of the Dockerfile was:

FROM debian:stretch
RUN apt-get update && apt-get -y install \
    wget gnupg ca-certificates apt-transport-https \
    python-pip python-all-dev libgecode-dev g++
RUN echo "deb stretch main" > /etc/apt/sources.list.d/nodesource.list
RUN wget -O - | apt-key add -
RUN apt-get update && apt-get -y install nodejs
COPY . /sources
RUN pip install /sources
RUN cd /sources/frontend && npm install && npm run build && \
    mv /sources/frontend/bundles /app/bundles/
# ...

The resulting image size was about 1.3GB which cause issues while uploading it to registries and with the required disk space on production servers.

So I looked how to reduce this image size. What is important to know about Dockerfile is that each operation result in a new docker layer, so removing useless files at the end will not reduce the image size.

The first change was to use debian:stretch-slim as base image instead of debian:stretch, this reduced the image size by 18MB.

Also by default apt-get would pull a lot of extra optional packages which are "Suggestions" or "Recommends". We can simply disable it globally using:

RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
    echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf

This reduced the image size by 166MB.

Then I looked at the content of the image and see a lot of space used in /root/.pip/cache. By default pip build and cache python packages (as wheels), this can be disabled by adding --no-cache to pip install calls. This reduced the image size by 26MB.

In the image we also have a full nodejs build toolchain which is useless after the bundles are generated. The old workaround is to install nodejs, build files, remove useless build artifacts (node_modules) and uninstall nodejs in a single RUN operation, but this result in a ugly Dockerfile and will not be an optimal use of the layer cache. Instead we can setup a multi stage build

The idea behind multi-stage builds is be able to build multiple images within a single Dockerfile (only the latest is tagged) and to copy files from one image to another within the same build using COPY --from= (also we can use a base image in the FROM clause).

Let's extract the javascript building in a single image:

FROM debian:stretch-slim as base
RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
    echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf

FROM base as node-builder
RUN apt-get update && apt-get -y install \
    wget gnupg ca-certificates apt-transport-https \
RUN echo "deb stretch main" > /etc/apt/sources.list.d/nodesource.list
RUN wget -O - | apt-key add -
RUN apt-get update && apt-get -y install nodejs
COPY . /sources
RUN cd /sources/frontend && npm install && npm run build

FROM base
RUN apt-get update && apt-get -y install python-pip python-all-dev libgecode-dev g++
RUN pip install --no-cache /sources
COPY --from=node-builder /sources/frontend/bundles /app/bundles

This reduced the image size by 252MB

The Gecode build toolchain is required to build rql with gecode extension (which is a lot faster that native python), my next idea was to build rql as a wheel inside a staging image, so the resulting image would only need the gecode library:

FROM debian:stretch-slim as base
RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf && \
    echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf

FROM base as node-builder
RUN apt-get update && apt-get -y install \
    wget gnupg ca-certificates apt-transport-https \
RUN echo "deb stretch main" > /etc/apt/sources.list.d/nodesource.list
RUN wget -O - | apt-key add -
RUN apt-get update && apt-get -y install nodejs
COPY . /sources
RUN cd /sources/frontend && npm install && npm run build

FROM base as wheel-builder
RUN apt-get update && apt-get -y install python-pip python-all-dev libgecode-dev g++ python-wheel
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels rql

FROM base
RUN apt-get update && apt-get -y install python-pip
COPY --from=wheel-builder /wheels /wheels
RUN pip install --no-cache --find-links /wheels /sources
# a test to be sure that rql is built with gecode extension
RUN python -c 'import rql.rql_solve'
COPY --from=node-builder /sources/frontend/bundles /app/bundles

This reduced the image size by 297MB.

So the final image size goes from 1.3GB to 300MB which is more suitable to use in a production environment.

Unfortunately I didn't find a way to copy packages from a staging image, install it and remove the package in a single docker layer, a possible workaround is to use an intermediate http server.

The next step could be to build a Debian package inside a staging image, so the build process would be separated from the Dockerfile and we could provide Debian packages along with the docker image.

Another approach could be to use Alpine as base image instead of Debian.

blog entry of

En janvier, j'ai fait une présentation interactive sur :

Le tout en mode "on monte une infra en live/atelier"...

Les diapos de la présentation sont sur :

Pour les prochaines rencontres, inscrivez vous sur la page du meetup

blog entry of

Et de 2 !

La semaine reprend tranquillement après ma participation au 2ème hackathon organisé par la BnF. Tout comme pour le premier, il faut tout d'abord saluer une organisation exemplaire de la BnF et la grande disponibilité de ses membres représentés par l'équipe jaune pour aider l'équipe rouge des participants. Quelqu'un a fait une allusion à une célèbre émission de télévision où le dernier survivant gagne… je ne sais pas dans notre cas si c'est un rouge ou un jaune.

Lors de ce hackathon, j'ai eu le plaisir de retrouver certains de mes coéquipiers de l'an dernier et même si nous étions cette année sur des projets différents, je leur ai donné un petit coup de main sur des requêtes SPARQL dans Peut-être que je pourrai demander un T-shirt mi-jaune mi-rouge l'an prochain ?

Pour ma part, j'ai intégré, avec une personne des Archives Nationales, une équipe pré-existante (des personnes de la société PMB et de Radio-France qui travaillent déjà ensemble dans le cadre du projet doremus) qui avait envisagé le projet suivant : partir d'une programmation de concert à la Philharmonie de Paris ou Radio-France et découvrir l'environnement des œuvres musicales qui y sont jouées. Pour y aboutir, nous avons choisi de :

  • récupérer la programmation des concerts depuis doremus,
  • récupérer les liens des œuvres de ce concert vers les notices de la BnF lorsqu'ils existent,
  • récupérer des informations sur (date de création, style, compositeur, chorégraphe, etc.)
  • récupérer des extraits sonores de Gallica en utilisant l'API SRU,
  • récupérer les extraits des journaux de Gallica qui parlent de l'œuvre en question au moment de sa création,
  • récupérer des informations de wikipedia et IMSLP en utilisant les alignements fournis par,
  • récupérer un extrait sonore et une jaquette de CD en utilisant l'API de Deezer à partir des EAN ou des codes ISRC récupérés de ou des alignements sur MusicBrainz

Finalement, rien de trop compliqué pour que ça rentre dans un hackathon de 24h mais comme l'an dernier, tout est allé très vite entre les différents points d'étape devant le jury, les requêtes qui ne marchent pas comme on voudrait, la fatigue et la difficulté à bien répartir le travail entre tous les membres de l'équipe…

Félicitations à MusiViz qui a remporté l'adhésion du jury ! Je suis pour ma part très heureux des échanges que j'ai eus avec mes coéquipiers et du résultat auquel nous avons abouti. Nous avons plein d'idées pour continuer le projet dont le code se trouve désormais sur et un démonstrateur temporaire ici.

Yapuka continuer…
blog entry of


In late September, I participated on behalf of Logilab to the Mercurial 4.4 sprint that held at Facebook Dublin. It was the opportunity to meet developers of the project, follow active topics and forthcoming plans for Mercurial. Here, I'm essentially summarizing the main points that held my attention.

Amongst three days of sprint, the first two mostly dedicated to discussions and the last one to writing code. Overall, the organization was pretty smooth thanks to Facebook guys doing a fair amount of preparation. Amongst an attendance of 25 community members, some companies had a significant presence: noticeably Facebook had 10 employees and Google 5, the rest consisting of either unaffiliated people and single-person representing their affiliation. No woman, unfortunately, and a vast majority of people with English as a native or dominant usage language.

The sprint started by with short talks presenting the state of the Mercurial project. Noticeably, in the state of the community talk, it was recalled that the project governance now holds on the steering committee while some things are still not completely formalized (in particular, the project does not have a code of conduct yet). Quite surprisingly, the committee made no explicit mention of the recurring tensions in the project which recently lead to a banishment of a major contributor.

Facebook, Google, Mozilla and Unity then presented by turns the state of Mercurial usages in their organization. Both Facebook and Google have a significant set of tools around hg, most of them either aiming at making the user experience more smooth with respect to performance problems coming from their monorepos or towards GUI tools. Other than that, it's interesting to note most of these corporate users have now integrated evolve on the user side, either as is or with a convenience wrapper layer.

After that, followed several "vision statements" presentations combined with breakout sessions. (Just presenting a partial selection.)

The first statement was about streamlining the style of the code base: that one was fairly consensual as most people agreed upon the fact that something has to be done in this respect; it was finally decided (on the second day) to adopt a PEP-like process. Let's see how things evolve!

Second, Boris Feld and I talked about the development of the evolve extension and the ongoing task about moving it into Mercurial core (slides). We talked about new usages and workflows around the changeset evolution concept and the topics concept. Despite the recent tensions on this technical topic in the community, we tried to feel positive and reaffirmed that the whole community has interests in moving evolve into core. On the same track, in another vision statement, Facebook presented their restack workflow (sort of evolve for "simple" cases) and suggested to push this into core: this is encouraging as it means that evolve-like workflows tend to get mainstream.


Another interesting vision statement was about using Rust in Mercurial. Most people agreed that Mercurial would benefit from porting its native C code in Rust, essentially for security reasons and hopefully to gain a bit of performance and maintainability. More radical ideas were also proposed such as making the hg executable a Rust program (thus embedding a Python interpreter along with its standard library) or reimplementing commands in Rust (which would pose problems with respect to the Python extension system). Later on, Facebook presented mononoke, a promising Mercurial server implemented in Rust that is expected to scale better with respect to high committing rates.

Back to a community subject, we discussed about code review and related tooling. It was first recalled that the project would benefit from more reviewers, including people without a committer status. Then the discussion essentially focused on the Phabricator experiment that started in July. Introduction of a web-based review tool in Mercurial was arguably a surprise for the community at large since many reviewers and long-time contributors have always expressed a clear preference over email-based review. This experiment is apparently meant to lower the contribution barrier so it's nice to see the project moving forward on this topic and attempt to favor diversity by contribution. On the other hand, the choice of Phabricator was quite controversial. From the beginning (see replies to the announcement email linked above), several people expressed concerns (about notifications notably) and some reviewers also complained about the increase of review load and loss of efficiency induced by the new system. A survey recently addressed to the whole community apparently (no official report yet at the time of this writing) confirms that most employees from Facebook or Google seem pretty happy with the experiment while other community members generally dislike it. Despite that, it was decided to keep the "experiment" going while trying to improve the notification system (better threading support, more diff context, subscription-based notification, etc.). That may work, but there's a risk of community split as non-corporate members might feel left aside. Overall, adopting a consensus-based decision model on such important aspects would be beneficial.

Yet another interesting discussion took place around branching models. Noticeably, Facebook and Google people presented their recommended (internal) workflow respectively based on remotenames and local bookmarks while Pulkit Goyal presented the current state to the topics extension and associated workflows. Bridging the gap between all these approaches would be nice and it seems that a first step would be to agree upon the definition of a stack of changesets to define a current line of work. In particular, both the topics extension and the show command have their own definition, which should be aligned.

(Comments on reddit.)

blog entry of

For about a year, Logilab has been involved in Mercurial development in the framework of a Mozilla Open Source Support (MOSS) program. Mercurial is a foundational technology for Mozilla, as the VCS used for the development of Firefox. As the main protagonist of the program, I'd first like to mention this has been a very nice experience for me, both from the social and technical perspectives. I've learned a lot and hope to continue working on this nice piece of software.

The general objective of the program was to improve "code archaeology" workflows, especially when using hgweb (the web UI for Mercurial). Outcomes of program spread from versions 4.1 to 4.4 of Mercurial and I'm going to present the main (new) features in this post.

Better navigation in "blame/annotate" view (hgweb)

The first feature concerns the "annotate" view in hgweb; the idea was to improve navigation along "blamed" revisions, a process that is often tedious and involves a lot of clicks in the web UI (not easier from the CLI, for that matter). Basically, we added an hover box on the left side of file content displaying more context on blamed revision along with a couple of links to make navigation easier (links to parent changesets, diff and changeset views). See below for an example about mercurial.error module (to try it at

The hover box in annotate view in hgweb.


While this wasn't exactly in the initial scope of the program, the most interesting result of my work is arguably the introduction of the "followlines" feature set. The idea is to build upon the log command instead of annotate to make it easier to follow changes across revisions and the new thing is to make this possible by filtering changes only affecting a block of lines in a particular file.

The first component introduced a revset, named followlines, which accepts at least two arguments a file path and a line range written as fromline:toline (following Python slice syntax). For instance, say we are interested the history of LookupError class in mercurial/ module which, at tag 4.3, lives between line 43 and 59, we'll use the revset as follows:

$ hg log -r 'followlines(mercurial/, 43:59)'
changeset:   7633:08cabecfa8a8
user:        Matt Mackall <>
date:        Sun Jan 11 22:48:28 2009 -0600
summary:     errors: move revlog errors

changeset:   24038:10d02cd18604
user:        Martin von Zweigbergk <>
date:        Wed Feb 04 13:57:35 2015 -0800
summary:     error: store filename and message on LookupError for later

changeset:   24137:dcfdfd63bde4
user:        Siddharth Agarwal <>
date:        Wed Feb 18 16:45:16 2015 -0800
summary:     error.LookupError: rename 'message' property to something

changeset:   25945:147bd9e238a1
user:        Gregory Szorc <>
date:        Sat Aug 08 19:09:09 2015 -0700
summary:     error: use absolute_import

changeset:   34016:6df193b5c437
user:        Yuya Nishihara <>
date:        Thu Jun 01 22:43:24 2017 +0900
summary:     py3: implement __bytes__() on most of our exception classes

This only yielded changesets touching this class.

This is not an exact science (because the algorithm works on diff hunks), but in many situations it gives interesting results way faster that with an iterative "annotate" process (in which one has to step from revision to revision and run annotate every times).

The followlines() predicate accepts other arguments, in particular, descend which, in combination with the startrev, lets you walk the history of a block of lines in the descending direction of history. Below the detailed help (hg help revset.followlines) of this revset:

$ hg help revsets.followlines
    "followlines(file, fromline:toline[, startrev=., descend=False])"
      Changesets modifying 'file' in line range ('fromline', 'toline').

      Line range corresponds to 'file' content at 'startrev' and should hence
      be consistent with file size. If startrev is not specified, working
      directory's parent is used.

      By default, ancestors of 'startrev' are returned. If 'descend' is True,
      descendants of 'startrev' are returned though renames are (currently)
      not followed in this direction.

In hgweb

The second milestone of the program was to port this followlines filtering feature into hgweb. This has been implemented as a line selection mechanism (using mouse) reachable from both the file and annotate views. There, you'll see a small green ± icon close to the line number. By clicking on this icon, you start the line selection process which can be completed by clicking on a similar icon on another line of the file.

Starting a line selection for followlines in hgweb.

After that, you'll see a box inviting you to follow the history of lines <selected range> , either in the ascending (older) or descending (newer) direction.

Line selection completed for followlines in hgweb.

Here clicking on the "newer" link, we get:

Followlines result in hgweb.

As you can notice, this gives a similar result as with the command line but it also displays the patch of each changeset. In these patch blocks, only the diff hunks affecting selected line range are displayed (sometimes with some extra context).

What's next?

At this point, the "followlines" feature is probably complete as far as hgweb is concerned. In the remainder of the MOSS program, I'd like to focus on a command-line interface producing a similar output as the one above in hgweb, i.e. filtering patches to only show followed lines. That would take the form of a --followlines/-L option to hg log command, e.g.:

$ hg log -L mercurial/,43:59 -p

That's something I'd like to tackle at the upcoming Mercurial 4.4 sprint in Dublin!

blog entry of