Python 2.5 introduced a new module, _ast,
for the Abstract Syntax Tree (AST) representation of Python code.
It is considerably faster than the compiler.ast representation that
logilab-astng (and therefore Pylint) has used until now, and the compiler
module was removed in Python 3.0.
Faster is good, but the representations of Python code are quite different in
_ast and in compiler: some nodes exist in one AST but not the other, and
almost all child nodes have different names.
We had to undertake a big refactoring to use the new _ast module,
since we wanted to stay compatible with Python versions < 2.5,
which meant keeping support for the compiler module. A lot of work went
into finding a common representation for the two different trees.
In most cases we used _ast-like representations and names, but in some
cases we kept ideas or attribute names from compiler.
Let's look at an example to compare both representations.
Here is a seemingly harmless snippet of code:
    CODE = """
    if cond:
        del delvar
    elif next:
        print
    """
Now compare the respective compiler and _ast representations
(node names are capitalized and their attributes are in lower case).
compiler:

    Module
        node =
        Stmt
            nodes = [
            If
                tests = [
                Name
                    name = 'cond'
                Stmt
                    nodes = [
                    AssName
                        flags = 'OP_DELETE'
                        name = 'delvar'
                    ]
                Name
                    name = 'next'
                Stmt
                    nodes = [
                    Printnl
                    ]
                ]
            ]
_ast:

    Module
        body = [
        If
            test =
            Name
                id = 'cond'
            body = [
            Delete
                targets = [
                Name
                    id = 'delvar'
                ]
            ]
            orelse = [
            If
                test =
                Name
                    id = 'next'
                body = [
                Print
                    nl = True
                ]
            ]
        ]
Can you spot any differences? I would say they differ quite a lot...
For instance, compiler turns an "elif" statement into one more entry in a list
called 'tests', whereas _ast treats "elif cond:" as if it were "else: if cond:".
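The nesting that _ast uses for "elif" can still be checked today. The sketch below assumes a current Python 3 interpreter, where the public ast module re-exports the _ast node classes; under Python 3 the bare "print" parses as an ordinary expression rather than a Print node, so only the If and Delete parts match the tree shown above.

```python
import ast

CODE = """
if cond:
    del delvar
elif next:
    print
"""

tree = ast.parse(CODE)
outer_if = tree.body[0]          # the "if cond:" statement
inner_if = outer_if.orelse[0]    # the "elif next:" branch, stored as a nested If

print(type(outer_if).__name__, outer_if.test.id)   # If cond
print(type(inner_if).__name__, inner_if.test.id)   # If next
print(ast.dump(outer_if.body[0]))                  # the Delete node and its targets
```

The "elif" does not appear as a node of its own: it is simply the single If found in the 'orelse' list of the outer If.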
We transform these trees by renaming attributes and nodes, and by removing or
introducing nodes: with compiler, we remove the Stmt node,
introduce a Delete node, and recursively build the If nodes coming
from an "elif"; with _ast, we reintroduce the AssName node.
The latter may be only a temporary step towards a fully _ast-like representation.
These transformations are done by the TreeRebuilder visitors, one for each
representation, which live in astng._nodes_compiler and astng._ast respectively.
In the simplest case, the TreeRebuilder method looks like this
(in _nodes_compiler):

    def visit_list(self, node):
        # compiler calls the children 'nodes'; _ast calls them 'elts'
        node.elts = node.nodes
        del node.nodes

(and there is nothing to do for _ast).
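A less trivial case is the recursive rebuilding of If nodes coming from an "elif". The following is only a minimal sketch of the idea, not the actual astng code: Node, List, If and MiniRebuilder are hypothetical stand-ins, and it assumes the compiler-style If carries 'tests' as a list of (condition, body) pairs plus an 'else_' attribute.

```python
class Node:
    """Hypothetical stand-in for a compiler-style node (not an astng class)."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


class List(Node):
    pass


class If(Node):
    pass


class MiniRebuilder:
    """Illustrative visitor that dispatches on the lowercased class name."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__.lower(), None)
        if method is not None:
            method(node)
        return node

    def visit_list(self, node):
        # Same rename as in the text: 'nodes' becomes 'elts'.
        node.elts = node.nodes
        del node.nodes

    def visit_if(self, node):
        # Assumption: 'tests' is [(condition, body), ...] and 'else_' is the
        # final else body, or None when there is no else clause.
        (test, body), remaining = node.tests[0], node.tests[1:]
        if remaining:
            # Each "elif" becomes an If nested in the previous 'orelse'.
            nested = If(tests=remaining, else_=node.else_)
            node.orelse = [self.visit(nested)]
        else:
            node.orelse = node.else_ or []
        node.test, node.body = test, body
        del node.tests, node.else_


rebuilder = MiniRebuilder()
tree = rebuilder.visit(
    If(tests=[("cond", ["del delvar"]), ("next", ["print"])], else_=None)
)
print(tree.test, tree.orelse[0].test)
```

After the visit, the flat 'tests' list is gone and the second condition sits in an If nested under 'orelse', which is the shape _ast produces directly.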
So, after doing all this and a lot more, we get the following
representation from both input trees:
    Module()
        body = [
        If()
            test =
            Name(cond)
            body = [
            Delete()
                targets = [
                DelName(delvar)
                ]
            ]
            orelse = [
            If()
                test =
                Name(next)
                body = [
                Print()
                    dest =
                    None
                    values = [
                    ]
                ]
                orelse = [
                ]
            ]
        ]
Of course, as you can imagine, these modifications had some API
repercussions and thus required a lot of smaller Pylint modifications.
But everything was done so that you should see no difference in Pylint's behavior
whether you use Python < 2.5 or Python >= 2.5, except that with the _ast
module Pylint is about twice as fast!
Oh, and we fixed some small bugs along the way and maybe introduced a few new ones...
Finally, this is a major step towards Pylint on Py3k!