Supporting _ast and compiler

Python 2.5 introduced a new module, _ast, for the Abstract Syntax Tree (AST) representation of Python code. It is much faster than the compiler.ast representation that logilab-astng (and therefore pylint) has used until now, and the compiler module has been removed in Python 3.0.

Faster is good, but the representations of Python code are quite different in _ast and in compiler: some nodes exist in one AST but not in the other, and almost all child nodes have different names.

We had to engage in a big refactoring to use the new _ast module, since we wanted to stay compatible with Python versions < 2.5, which meant keeping support for the compiler module. A lot of work was done to find a common representation for the two different trees. In most cases we used _ast-like representations and names, but in some cases we kept ideas or attribute names from compiler.

Abstract Syntax Trees

Let's look at an example to compare both representations. Here is a seemingly harmless snippet of code:

CODE = """
if cond:
    del delvar
elif next:
    print
"""

Now, compare the respective _ast and compiler representations (node names are capitalized and their attributes are in lower case).

compiler representation

Module
    node =
    Stmt
        nodes = [
        If
            tests = [
            Name
                name = 'cond'
            Stmt
                nodes = [
                AssName
                    flags = 'OP_DELETE'
                    name = 'delvar'
                ]
            Name
                name = 'next'
            Stmt
                nodes = [
                Printnl
                ]
            ]
        ]

_ast representation

Module
    body = [
    If
        test =
        Name
            id = 'cond'
        body = [
        Delete
            targets = [
            Name
                id = 'delvar'
            ]
        ]
        orelse = [
        If
            test =
            Name
                id = 'next'
            body = [
            Print
                nl = True
            ]
        ]
    ]

Can you spot any differences? I would say they differ quite a lot... For instance, compiler flattens "elif" statements into a single list called 'tests', whereas _ast treats "elif cond:" as if it were "else: if cond:".
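The nesting of "elif" can still be observed with the public ast module, the descendant of _ast. This is a minimal sketch, assuming a modern Python 3 interpreter rather than Python 2.5: print is no longer a statement there, so the last branch parses as a plain expression instead of a Print node.

```python
import ast

CODE = """
if cond:
    del delvar
elif next:
    print
"""

tree = ast.parse(CODE)
outer_if = tree.body[0]

# The "elif" branch is not a sibling test: it is a second If node
# stored as the only statement of the outer If's orelse list.
inner_if = outer_if.orelse[0]

print(type(outer_if).__name__, outer_if.test.id)
print(type(inner_if).__name__, inner_if.test.id)
print(type(outer_if.body[0]).__name__, outer_if.body[0].targets[0].id)
```

The script prints the node type and name for the outer test, the nested test, and the Delete target, which is enough to see the "else: if" shape.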

Tree Rebuilding

We transform these trees by renaming attributes and nodes, and by removing nodes or introducing new ones: with compiler, we remove the Stmt node, introduce a Delete node, and recursively build the If nodes coming from an "elif"; with _ast, we reintroduce the AssName node. This might be only a temporary step towards a full _ast-like representation.

This is done by the TreeRebuilder visitors, one for each representation, which live in astng._nodes_compiler and astng._ast respectively.

In the simplest case, the TreeRebuilder method looks like this (_nodes_compiler):

def visit_list(self, node):
    # compiler stores the list items in "nodes"; _ast calls them "elts"
    node.elts = node.nodes
    del node.nodes

(and nothing to do for _ast).
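To show how such a method gets called, here is a small self-contained sketch of a visitor that dispatches on the node's class name. The names FakeList and SketchRebuilder, and the dispatch scheme itself, are assumptions made for illustration; they are not the actual astng classes.

```python
class FakeList(object):
    """Hypothetical stand-in for a compiler-style List node."""

    def __init__(self, nodes):
        self.nodes = nodes


class SketchRebuilder(object):
    """Hypothetical rebuilder that dispatches on the node's class name."""

    def visit(self, node):
        # Look up visit_<classname>; nodes without a handler are left as-is.
        method = getattr(self, 'visit_' + type(node).__name__.lower(), None)
        if method is not None:
            method(node)
        return node

    def visit_fakelist(self, node):
        # Same renaming as visit_list: "nodes" becomes "elts".
        node.elts = node.nodes
        del node.nodes


node = SketchRebuilder().visit(FakeList([1, 2, 3]))
print(node.elts)
print(hasattr(node, 'nodes'))
```

With one rebuilder per input representation, a node type that already matches the target shape simply needs no handler, which corresponds to the "nothing to do" case.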

So, after doing all this and a lot more, we get the following representation from both input trees:

Module()
    body = [
    If()
        test =
        Name(cond)
        body = [
        Delete()
            targets = [
            DelName(delvar)
            ]
        ]
        orelse = [
        If()
            test =
            Name(next)
            body = [
            Print()
                dest =
                None
                values = [
                ]
            ]
            orelse = [
            ]
        ]
    ]
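The nested If nodes in this unified tree can be derived from a flat list of tests roughly as follows. This is only a sketch under stated assumptions: the If class and the build_if helper are hypothetical, the tests are assumed to arrive as (test, body) pairs, and strings stand in for real child nodes.

```python
class If(object):
    """Hypothetical unified If node with _ast-like attribute names."""

    def __init__(self, test, body, orelse):
        self.test = test
        self.body = body
        self.orelse = orelse


def build_if(tests, else_body):
    """Turn a flat list of (test, body) pairs into nested If nodes."""
    test, body = tests[0]
    remaining = tests[1:]
    if remaining:
        # Each "elif" becomes an If nested in the previous orelse list.
        orelse = [build_if(remaining, else_body)]
    else:
        # The last test receives the final "else" body, possibly empty.
        orelse = list(else_body)
    return If(test, body, orelse)


# Strings stand in for real child nodes in this sketch.
node = build_if([('cond', ['del delvar']), ('next', ['print'])], [])
print(node.test, node.orelse[0].test, node.orelse[0].orelse)
```

The recursion stops at the last pair, so a chain of any number of "elif" branches ends with a single innermost orelse list.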

Faster towards Py3k

Of course, as you can imagine, these modifications had some API repercussions and thus required a lot of smaller Pylint modifications. But everything was done so that you should see no difference in Pylint's behavior whether you use Python < 2.5 or Python >= 2.5, except that with the _ast module pylint is around two times faster!

Oh, and we fixed small bugs on the way and maybe introduced a few new ones...

Finally, it is a major step towards Pylint Py3k!
