blog entries created by Emile Anclin

Thoughts on the python3 conversion workflow

2010/11/30 by Emile Anclin

Python3

The 2to3 script is a very useful tool. We can simply run it over the whole code base and end up with python3 compatible code while keeping a python2 code base. To make our code python3 compatible, we do (or did) two things:

  • small python2 compatible modifications of our source code
  • run 2to3 over our code base to generate a python3 compatible version

However, we not only want to have one python3 compatible version, but also want to keep developing our software. Hence, we want to be able to easily test it for both python2 and python3. Furthermore, if we use patches to get nice commits, this starts to get quite messy. Let's consider this in the case of Pylint, where the workflow described above proved to be unsatisfactory.

  • I have two repositories, one for python2, one for python3. On the python3 side, I run 2to3 and store the modifications in a patch or a commit.

  • Whenever I implement a fix or a functionality on either side, I have to test if it still works on the other side; but as the 2to3 modifications are often quite heavy, directly creating patches on one side and applying them on the other side won't work most of the time.

  • Now say, I implement something in my python2 base and hold it in a patch or commit it. I can then pull it to my python3 repo:

    • running 2to3 on all Pylint is quite slow: around 30 sec for Pylint without the tests, and around 2 min with the tests. (I'd rather not imagine how long it would take for say CubicWeb).

    • even if I have all my 2to3 modifications in a patch, it takes 5-6 sec to "qpush" or "qpop" them all. Committing the 2to3 changes instead and using:

      hg pull -u --rebase
      

      is not much faster. If I don't use --rebase, I will have merges on each pull up. Furthermore, we often have either a patch application failure, merge conflict or end up with something which is not python3 compatible (like a newly introduced "except Error, exc").

  • So quite often, I will have to fix it with:

    hg revert -r REV <broken_files>
    2to3 -nw <broken_files>
    hg qref # or hg resolve -m; hg rebase -c
    
  • Suppose that 2to3 transition worked fine, or that we fixed it. I run my tests with python3 and see it does not work; so I modify the patch: it all starts again; and the new patch or the patch modification will create a new head in my python3 repo...
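
A cheap way to notice the "not python3 compatible" case before running the whole test suite is to ask the python3 interpreter whether a converted file still compiles. The following is only a sketch: the helper name py3_syntax_error is made up, it has to be run with python3, and an old-style raise statement is used as the sample of python2-only syntax.

```python
def py3_syntax_error(source, filename="<string>"):
    """Return None if `source` compiles under the running interpreter,
    otherwise a short description of the first syntax error."""
    try:
        # compile() only parses the code, nothing is executed
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "%s:%s: %s" % (filename, exc.lineno, exc.msg)
    return None


# python2-only spelling that a pulled changeset might reintroduce
OLD_STYLE = 'raise ValueError, "bad value"\n'
# spelling accepted by both python2 and python3
NEW_STYLE = 'raise ValueError("bad value")\n'

if __name__ == "__main__":
    for sample in (OLD_STYLE, NEW_STYLE):
        print(py3_syntax_error(sample))
```

Such a check only catches syntax problems; semantic differences still need the tests.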

2to3 Fixers

Considering all that, let's investigate 2to3: it comes with a lot of fixers that can be activated or deactivated. Many of them only fix rarely used constructs or things that have been deprecated for years. On the other hand, the 2to3 fixers work by pattern matching, so the more we remove, the faster 2to3 should be. Depending on the project, most cases will simply not appear, and for the others, we should be able to find other means of disabling them. The lists proposed below are only suggestions; which fixers can actually be disabled, and how, will depend on the code base and other overall considerations.

python2 compatible

The following fixers are 2.x compatible and should be run once and for all (they can then be disabled for daily conversion use):

  • apply
  • execfile (?)
  • exitfunc
  • getcwdu
  • has_key
  • idioms
  • ne
  • nonzero
  • paren
  • repr
  • standarderror
  • sys_exc
  • tuple_params
  • ws_comma
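
Once these fixers have been applied for good, the daily conversion can skip them with the -x option of the 2to3 script. A minimal sketch of how such a command line could be assembled; build_2to3_command is a hypothetical helper, only the -w, -n and -x options belong to 2to3 itself.

```python
# Fixers from the list above that were already applied once and for all.
PY2_COMPATIBLE_FIXERS = [
    "apply", "execfile", "exitfunc", "getcwdu", "has_key", "idioms", "ne",
    "nonzero", "paren", "repr", "standarderror", "sys_exc", "tuple_params",
    "ws_comma",
]


def build_2to3_command(paths, disabled=(), write=True, backups=False):
    """Return the argument list for a 2to3 run that skips `disabled` fixers."""
    cmd = ["2to3"]
    if write:
        cmd.append("-w")          # write the changes back to the files
        if not backups:
            cmd.append("-n")      # no .bak files (only valid with -w)
    for name in disabled:
        cmd.extend(["-x", name])  # one -x per fixer to exclude
    cmd.extend(paths)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_2to3_command(["pylint"], PY2_COMPATIBLE_FIXERS)))
```

The resulting list can be passed to subprocess.call, which makes it easy to time runs with different sets of fixers.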

compat

These can be fixed using imports from a "compat" module, like logilab.common.compat, which holds convenient compatible objects.

  • callable
  • exec
  • filter (Wraps filter() usage in a list call)
  • input
  • intern
  • itertools_imports
  • itertools
  • map (Wraps map() in a list call)
  • raw_input
  • reduce
  • zip (Wraps zip() usage in a list call)
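
To give an idea of what such a module can contain, here is a minimal sketch covering three of the names above. It is not the content of logilab.common.compat, just an illustration of the technique: each name is defined once, depending on what the running interpreter provides, and the rest of the code imports it from there.

```python
"""Sketch of a compat module: version-dependent names live in one place."""

try:
    # available from functools in python 2.6+ and the only location in 3.x
    from functools import reduce
except ImportError:
    reduce = reduce  # older 2.x: fall back to the builtin

try:
    input = raw_input  # python2: raw_input is the one returning a plain string
except NameError:
    pass               # python3: the builtin input() already behaves that way

try:
    callable = callable
except NameError:
    # python 3.0 and 3.1 have no callable() builtin
    def callable(obj):
        return hasattr(obj, "__call__")


if __name__ == "__main__":
    print(reduce(lambda a, b: a + b, [1, 2, 3]))
    print(callable(len), callable(3))
```

Client code then writes "from compat import reduce, callable" and the corresponding fixers have nothing left to do.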

strings and bytes

Maybe they could also be handled by compat:

  • basestring
  • unicode
  • print

For print for example, we could think of a once-and-for-all custom fixer, that would replace it by a convenient echo function (or whatever name you like) defined in compat.
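
Such an echo function could look like the sketch below. The name echo and its keyword arguments are only a suggestion, modelled on the python3 print function, and the code avoids any syntax that python2 would not accept.

```python
import sys


def echo(*values, **options):
    """Write the values to a stream, like python3's print() function."""
    sep = options.pop("sep", " ")
    end = options.pop("end", "\n")
    stream = options.pop("file", None)
    if stream is None:
        stream = sys.stdout
    if options:
        raise TypeError("unexpected keyword arguments: %s"
                        % ", ".join(sorted(options)))
    stream.write(sep.join([str(value) for value in values]) + end)


if __name__ == "__main__":
    echo("converted", 3, "files")
    echo("warning:", "something odd", file=sys.stderr)
```

Since echo is a plain function call, it means the same thing under both interpreters and the print fixer has nothing to convert.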

manually

The following issues could probably be fixed manually:

  • dict (it fixes dict iterator methods; it should be possible to have code where we can disable this fixer)
  • import (Detects sibling imports; we could convert them to absolute import)
  • imports, imports2 (renamed modules)
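
As an illustration of what "manually" could mean for renamed modules and dict methods, here is a small sketch. configparser is just one example of a renamed module, and sorted_items is a made-up helper; the point is that the code reads the same under both interpreters.

```python
try:
    import configparser                   # python3 name
except ImportError:
    import ConfigParser as configparser   # python2 name


def sorted_items(mapping):
    """Return the items as a new sorted list, so that callers never depend
    on whether items() gives a list (2.x) or a view (3.x)."""
    return sorted(mapping.items())


if __name__ == "__main__":
    parser = configparser.ConfigParser()
    print(sorted_items({"b": 2, "a": 1}))
```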

necessary

These changes seem to be necessary:

  • except
  • long
  • funcattrs
  • future
  • isinstance (Fixes duplicate types in the second argument of isinstance(). For example, isinstance(x, (int, int)) is converted to isinstance(x, (int)))
  • metaclass
  • methodattrs
  • numliterals
  • next
  • raise

Consider however that a lot of them might never be used in some projects, like long, funcattrs, methodattrs and numliterals or even metaclass. Also, isinstance is probably motivated by long to int and unicode to str conversions and hence might also be somehow avoided.
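
Even the except fixer can be sidestepped where it matters, at the price of slightly clumsier code: the exception object can be fetched from sys.exc_info() instead of being bound in the except clause. A sketch (parse_int is just an example function):

```python
import sys


def parse_int(text, default=0):
    """Convert text to int, reporting bad values without binding the
    exception in the except clause."""
    try:
        return int(text)
    except ValueError:
        # Neither "except ValueError, exc" (2.x only) nor
        # "except ValueError as exc" (2.6+ and 3.x) is needed here.
        exc = sys.exc_info()[1]
        sys.stderr.write("ignoring bad value: %s\n" % exc)
        return default


if __name__ == "__main__":
    print(parse_int("12"), parse_int("twelve", default=-1))
```

Whether this is worth it depends on how many except clauses actually use the exception object.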

don't know

Can we also fix these with compat?

  • renames
  • throw
  • types
  • urllib
  • xrange
  • xreadlines

2to3 and Pylint

Pylint is a special case since its test suite has a lot of bad and deprecated code which should stay there. However, in order to have a reasonable workflow, it seems that something must be done to reduce the 1 min 30 s that 2to3 spends parsing the tests. Probably nothing can be gained from the above considerations, since most of these cases are supposed to be in the tests, and actually are. Keep in mind that we can expect to be supporting python2 and python3 in parallel for several years.

After a quick look, we see that 90% of the refactorings of test/input files only concern print statements; moreover, most of them have nothing to do with the tested functionality. Hence a solution might be to avoid running 2to3 on the test/input directory, since we already have a mechanism to select, depending on the python version, whether a test file should be tested or not. To some extent, astng is a similar case, but the test suite and the whole project are much smaller.
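
To make the idea of version-dependent test selection concrete, here is a sketch of such a mechanism. The naming convention is hypothetical and only serves the example: a file name ending in _pyXY.py requires at least python X.Y, one ending in _py_XY.py requires at most python X.Y, and untagged files always run.

```python
import re
import sys

# Hypothetical tags: "_py25.py" = minimum version, "_py_27.py" = maximum.
# Only single-digit major and minor numbers are handled in this sketch.
_VERSION_TAG = re.compile(r"_py(_?)(\d)(\d)\.py$")


def should_run(filename, version=None):
    """Tell whether a test input file applies to the given (major, minor)."""
    if version is None:
        version = sys.version_info[:2]
    match = _VERSION_TAG.search(filename)
    if match is None:
        return True  # untagged files run everywhere
    is_max, major, minor = match.groups()
    tagged = (int(major), int(minor))
    if is_max:
        return tuple(version) <= tagged
    return tuple(version) >= tagged


if __name__ == "__main__":
    for name in ("func_print.py", "func_with_py25.py", "func_old_py_27.py"):
        print(name, should_run(name))
```

With something like this, python2-only inputs can stay untouched in the repository and simply be skipped when the suite runs under python3.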


Notes on making "logilab-common" Py3k-compatible

2010/09/28 by Emile Anclin

Version 3 of Python is incompatible with the 2.x series. In order to make pylint usable with Python3, I did some work on making the logilab-common library Python3 compatible, since pylint depends on it.

The strategy is to have one source code version, and to use the 2to3 tool for publishing a Python3 compatible version.

Pytest vs. Unittest

The first problem was that we use the pytest runner, that depends on logilab.common.testlib which extends the unittest module.

Without major modification we could use unittest2 instead of unittest in Python2.6. I thought that the unittest2 module was equivalent to the unittest in Python3, but then realized I was wrong:

  • Python3.1/unittest is some strange "forward port" of unittest. Both are a single file, but they must be quite different since 3.1 has 1623 lines compared to 875 from 2.6...
  • Python2.x/unittest2 is a python package, backported from the alpha-release of Python3.2/unittest.

I did not investigate if there are other unittest and unittest2 versions corresponding.

What we can see is that the 3.1 version of unittest is different from everything else, whereas the 2.6-unittest2 is equivalent to 3.2-unittest. So, after trying to run pytest on Python3.1, and since there is a backport of unittest2 for Python3.1, it became clear that the best option is to ignore py3.1-unittest and work on Python3.2 and unittest2 directly.

Meanwhile, some work was being done on logilab-common to switch from unittest to unittest2. This was included in logilab.common-0.52.

'python2.6 -3' and 2to3

The -3 option of python2.6 warns about Python3 incompatible stuff.

Since I already knew that pytest would work with unittest2, I wanted to know as fast as possible if pytest would run on Python3.x. So I ran all logilab.common tests with "python2.6 -3 bin/pytest" and found a couple of problems that I quick-fixed or discarded, waiting to know the real solution.

The 2to3 script (from the 2to3 library) does its best to transform Python2.x code into Python3 compatible code, but manual work is often needed to handle some cases. For example, file is not recognized as a removed base class, and calls to raw_input(...) are handled, but not the use of raw_input as an instance attribute. At times, 2to3 can be overzealous and make modifications such as:

-                for name, local_node in node.items():
+                for name, local_node in list(node.items()):
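
Whether this list() call is superfluous depends on the loop body, which 2to3 does not analyse. The sketch below (function names are made up) shows the case the wrapper protects against under Python3, where items() returns a live view of the dictionary:

```python
def add_aliases(mapping):
    """Safe: iterate over a snapshot, so the dict may grow in the loop."""
    for name, value in list(mapping.items()):
        mapping["alias_" + name] = value
    return mapping


def add_aliases_unsafe(mapping):
    """Under Python3 this raises RuntimeError, because the dict changes
    size while its items view is being iterated."""
    for name, value in mapping.items():
        mapping["alias_" + name] = value
    return mapping


if __name__ == "__main__":
    print(sorted(add_aliases({"a": 1, "b": 2})))
    try:
        add_aliases_unsafe({"a": 1, "b": 2})
    except RuntimeError as exc:
        print("unsafe variant failed:", exc)
```

When the loop body does not modify the dictionary, as in the diff above, the extra list() only costs a copy and can be removed by hand.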

Procedure

After a while, I found that the best solution was to adopt the following working procedure:

  • run the tests with python2.6 -3 and solve the appearing issues.
  • run 2to3 on all that has to be transformed:
2to3-2.6 -n -w *py test/*py ureports/*py

Since we are in a mercurial repository we don't need backups (-n) and we can write the modifications to the files directly (-w).

  • create a 223.diff patch that will be applied and removed repeatedly.

    Now, we will push and pop this patch (which is much faster than running 2to3), and only regenerate it from time to time to make sure it still works:

  • run "python3.2 bin/pytest -x", to find problems and solutions for crashes and tests that do not work. Note that after some quick fixes on logilab.common.testlib, pytest works quite well, and that we can use the "-x" option. Using Python's Whatsnew_3.0 documentation for hints is quite useful.

  • hg qpop 223.diff

  • write the solution into the 2.x code, convert it into a patch or a commit, and run the tests: some trivial things might not work or not be 2.4 compatible.

  • hg qpush 223.diff

  • repeat the procedure

I used two repositories when working on logilab.common, one for Python2 and one for Python3, because other tools, like astng and pylint, depend on that library. Setting the PYTHONPATH was enough to get astng and pylint to use the right version.

Concrete examples

  • We had to replace "os.path.walk" with "os.walk".

  • The renaming of raw_input to input, __builtin__ to builtins and StringIO to io could easily be resolved by using the improved logilab.common.compat technique: write a python version dependent definition of a variable, function, or class in logilab.common.compat and import it from there.

    For builtins, it is even easier: since 2to3 recognizes direct imports, we can write in compat.py:

import __builtin__ as builtins # 2to3 will transform '__builtin__' to 'builtins'

The most difficult point is the replacement of str/unicode by bytes/str.

In Python3.x, we only use unicode strings called just str (the u'' syntax and unicode disappear), but everything written on disk will have to be converted to bytes, with some explicit encoding. In Python3.x, file descriptors have a defined encoding, and will automatically transform the strings to bytes.

I wrote two functions in logilab.common.compat. One converts str to bytes and the other simply ignores the encoding in case of 3.x where it was expected in 2.x. But there might be a need to write additional tests to make sure the modifications work as expected.
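
Such a pair of helpers could look roughly like the following. This is a sketch written from the description above, not a copy of logilab.common.compat; the names str_to_bytes and str_encode should be read as assumptions.

```python
import sys

if sys.version_info < (3, 0):
    def str_to_bytes(string):
        # 2.x: a str already is a byte string
        return str(string)

    def str_encode(string, encoding):
        # 2.x: unicode objects must be encoded before being written
        if isinstance(string, unicode):
            return string.encode(encoding)
        return str(string)
else:
    def str_to_bytes(string):
        # 3.x: text has to be encoded explicitly (UTF-8 by default)
        return string.encode()

    def str_encode(string, encoding):
        # 3.x: the encoding is ignored, a file opened in text mode
        # will encode the string itself
        return str(string)


if __name__ == "__main__":
    print(repr(str_to_bytes("abc")), repr(str_encode("abc", "utf-8")))
```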

Conclusion

  • After less than a week of work, most of the logilab.common tests pass. The biggest remaining problem is the tests for testlib.py. But we can already start working on the Python3 compatibility for astng and finally pylint.
  • Looking at the lib2to3 library, one can see that 2to3 works with patterns that are matched against a parse tree following the Python grammar. Hence, it cannot do much code investigation or static inference like astng. I think that using astng, we could improve 2to3 without too much effort.
  • For astng the difficulties are quite different: syntax changes become semantic changes, and we will have to add new types of astng nodes.
  • For testing astng and pylint we will probably have to check the different test examples, a lot of them being code snippets which 2to3 will not parse; they will have to be corrected by hand.

As a general conclusion, I found no need for using sa2to3, although it might be a very good tool. I would instead suggest having a small compat module and keeping only one version of the code, as far as possible. The code base would live either on 2.x or on 3.x, and the (possibly customized) 2to3 or 3to2 scripts would be used to publish the two different versions.


Astng 0.20.0 and Pylint 0.20.0 releases

2010/03/24 by Emile Anclin

We are happy to announce the Astng 0.20.0 and Pylint 0.20.0 releases.

Pylint is a static code checker based on Astng, both depending on logilab-common 0.49.

Astng

Astng 0.20.0 is a major refactoring: instead of parsing and modifying the syntax tree generated from python's _ast or compiler.ast modules, the syntax tree is rebuilt. Thus the code becomes much clearer, and all monkey patching will eventually disappear from this module.

Speed improvement is achieved by caching the parsed modules earlier to avoid double parsing, and by avoiding some repeated inferences; a lot of important bugs were fixed along the way.

Pylint

Pylint 0.20.0 uses the new Astng, and fixes a lot of bugs too, adding some new functionality:

  • parameters with leading "_" shouldn't count as "local" variables
  • warn on assert( a, b )
  • warning if return or break inside a finally
  • specific message for NotImplemented exception
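
To see why two of these checks are useful, here is a small illustration of the pitfalls they point at (the function names are made up; recent Python versions may also emit a SyntaxWarning for the first one):

```python
def swallow_error():
    """A return inside finally silently discards the pending exception."""
    try:
        raise ValueError("this exception never reaches the caller")
    finally:
        return "finally wins"


def tuple_is_always_true(condition, message):
    """assert(condition, message) asserts on a two-element tuple, and a
    non-empty tuple is true whatever the condition is."""
    return bool((condition, message))


if __name__ == "__main__":
    print(swallow_error())
    print(tuple_is_always_true(False, "never shown"))
```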

We would like to thank Chmouel Boudjnah, Johnson Fletcher, Daniel Harding, Jonathan Hartley, Colin Moris, Winfried Plapper, Edward K. Ream and Pierre Rouleau for their contributions, and all other people helping the project to progress.


Projman 0.14.0 includes a Graphical User Interface

2009/10/19 by Emile Anclin

Introduction

Projman is a project manager. With projman 0.14.0, the first sketch of a GUI has been updated, and important functionality has been added. You can now easily see and edit task dependencies and test the resulting scheduling. Furthermore, a begin-after-end-previous constraint has been added, which should really simplify editing the schedule.

The GUI can be used in the following two ways:

$ projman-gui
$ projman-gui <path/to/project.xml>

The file <path/to/project.xml> is the well known main file of a projman project. Starting projman-gui with no project.xml specified, or after opening a project, you can open an existing project simply with "File->Open". (For now, you can't create a new project with projman-gui.) You can edit the tasks and then save the modifications to the task file with "File->Save".

http://www.logilab.org/image/18731?vid=download

The Project tab

The Project tab simply shows the four files needed by a projman project: resources, activities, tasks and schedule.

Resources

The Resources tab presents the different resources:

  • human resources
  • resource roles describing the different roles that resources can play
  • Different calendars for different resources with their "offdays"

Activities

For now, the Activities tab is not implemented. It should show the planning of the activities for each resource and the progress of the project.

Tasks

The Tasks tab is for now the most important one; it shows a tree view of the task hierarchy, and for each task:

  • the title of the task,
  • the role for that task,
  • the load (time in days),
  • the scheduling type,
  • the list of the constraints for the scheduling,
  • and the description of the task,

each of which can be edited. You can easily drag and drop tasks inside the task tree and add and delete tasks and constraints.

See the attached screenshot of the projman-gui task panel.

Scheduling

In the Scheduling tab you can simply test your scheduling by clicking "START". If you expect the scheduling to take longer, you can change the maximum time allowed for finding a solution.

Known bugs

  • The begin-after-end-previous constraint does not work for a task having subtasks.
  • Deleting a task doesn't check for depending tasks, so scheduling won't work anymore.

Pylint and Astng support for the _ast module

2009/03/19 by Emile Anclin

Supporting _ast and compiler

Python 2.5 introduces a new module _ast for Abstract Syntax Tree (AST) representation of python code. This module is considerably faster than the compiler.ast representation that logilab-astng (and therefore pylint) used until now, and the compiler module was removed in Python 3.0.

Faster is good, but the representations of python code are quite different in _ast and in compiler: some nodes exist in one AST but not the other and almost all child nodes have different names.

We had to engage in a big refactoring to use the new _ast module, since we wanted to stay compatible with python versions < 2.5, which meant keeping the compiler module support. A lot of work was done to find a common representation for the two different trees. In most cases we used _ast-like representations and names, but in some cases we kept ideas or attribute names of compiler.

Abstract Syntax Trees

Let's look at an example to compare both representations. Here is a seemingly harmless snippet of code:

CODE = """
if cond:
    del delvar
elif next:
    print
"""

Now, compare the respective _ast and compiler representations (nodes are in upper case and their attributes are in lower case).

compiler representation

Module
    node =
    Stmt
        nodes = [
        If
            tests = [
            Name
                name = 'cond'
            Stmt
                nodes = [
                AssName
                    flags = 'OP_DELETE'
                    name = 'delvar'
                ]
            Name
                name = 'next'
            Stmt
                nodes = [
                Printnl
                ]
            ]

_ast representation

Module
    body = [
    If
        test =
        Name
            id = 'cond'
        body = [
        Delete
            targets = [
            Name
                id = 'delvar'
            ]
        ]
        orelse = [
        If
            test =
            Name
                id = 'next'
            body = [
            Print
                nl = True
            ]
        ]
    ]

Can you spot any differences? I would say, they differ quite a lot... For instance, compiler collects the "if" and "elif" branches into a single list called 'tests', whereas _ast treats "elif cond:" as if it were "else: if cond:".
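
The _ast side of this comparison can still be checked with the standard ast module of a current Python3 (the compiler module being gone, the other side cannot). A minimal sketch; under Python3 the bare print is parsed as a plain name, which does not matter for the structure of the If nodes:

```python
import ast

CODE = """
if cond:
    del delvar
elif next:
    print
"""

tree = ast.parse(CODE)
outer_if = tree.body[0]      # the "if cond:" statement
nested = outer_if.orelse[0]  # the "elif" shows up as an If inside orelse

if __name__ == "__main__":
    print(type(outer_if).__name__, outer_if.test.id)
    print(type(outer_if.body[0]).__name__)
    print(type(nested).__name__, nested.test.id)
```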

Tree Rebuilding

We transform these trees by renaming attributes and nodes, or removing or introducing new ones: with compiler, we remove the Stmt node, introduce a Delete node, and recursively build the If nodes coming from an "elif"; and with _ast, we reintroduce the AssName node. This might be only a temporary step towards a full _ast-like representation.

This is done by the TreeRebuilder Visitors, one for each representation, which are respectively in astng._nodes_compiler and astng._ast.

In the simplest case, the TreeRebuilder method looks like this (_nodes_compiler):

def visit_list(self, node):
    node.elts = node.nodes
    del node.nodes

(and nothing to do for _ast).
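
To show how such a method fits into a visitor, here is a self-contained sketch. The Node, List and Const classes are stand-ins and SketchRebuilder is not the astng TreeRebuilder; only the visit_list body is taken from the snippet above.

```python
class Node(object):
    """Stand-in for a compiler-style node; not an astng class."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


class List(Node):
    pass


class Const(Node):
    pass


class SketchRebuilder(object):
    """Dispatch on the node class name and rewrite attributes in place."""

    def visit(self, node):
        name = "visit_" + node.__class__.__name__.lower()
        handler = getattr(self, name, None)
        if handler is not None:
            handler(node)
        # after renaming, children are found under the _ast-like name
        for child in getattr(node, "elts", []):
            self.visit(child)
        return node

    def visit_list(self, node):
        node.elts = node.nodes
        del node.nodes


if __name__ == "__main__":
    tree = List(nodes=[List(nodes=[]), Const(value=1)])
    SketchRebuilder().visit(tree)
    print(sorted(vars(tree)))
```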

So, after doing all this and a lot more, we get the following representation from both input trees:

Module()
    body = [
    If()
        test =
        Name(cond)
        body = [
        Delete()
            targets = [
            DelName(delvar)
            ]
        ]
        orelse = [
        If()
            test =
            Name(next)
            body = [
            Print()
                dest =
                None
                values = [
                ]
            ]
            orelse = [
            ]
        ]
    ]

Faster towards Py3k

Of course, you can imagine these modifications had some API repercussions, and thus required a lot of smaller Pylint modifications. But all was done so that you should see no difference in Pylint's behavior using either python <2.5 or python >=2.5, except that with the _ast module pylint is around two times faster!

Oh, and we fixed small bugs on the way and maybe introduced a few new ones...

Finally, it is a major step towards Pylint Py3k!


Pyreverse : UML Diagrams for Python

2008/12/23 by Emile Anclin

Pyreverse analyses Python code and extracts UML class diagrams and package dependencies. Since September 2008 it has been integrated with Pylint (0.15).

Introduction

Pyreverse builds a diagram representation of the source code with:
  • class attributes, if possible with their type
  • class methods
  • inheritance links between classes
  • association links between classes
  • representation of Exceptions and Interfaces

Generation of UML diagrams with Pyreverse

The command pyreverse generates the diagrams in all formats that graphviz/dot knows, or in VCG.

The following command shows what dot knows:

$ dot -Txxx
Format: "xxx" not recognized. Use one of: canon cmap cmapx cmapx_np dia dot
eps fig gd gd2 gif hpgl imap imap_np ismap jpe jpeg jpg mif mp pcl pdf pic
plain plain-ext png ps ps2 svg svgz tk vml vmlz vrml vtx wbmp xdot xlib

pyreverse creates by default two diagrams:

$ pyreverse -o png -p Pyreverse pylint/pyreverse/
[...]
creating diagram packages_Pyreverse.png
creating diagram classes_Pyreverse.png
  • -o : sets the output format
  • -p name : yields the output files packages_name.png and classes_name.png

Options

One can modify the output with the following options:

-a N, -A    depth of search for ancestors
-s N, -S    depth of search for associated classes
-A, -S      all ancestors, resp. all associated
-m[yn]      add or remove the module name
-f MOD      filter the attributes : PUB_ONLY/SPECIAL/OTHER/ALL
-k          show only the classes (no attributes and methods)
-b          show 'builtin' objects

Examples:

General View of a Module

pyreverse -ASmy -k -o png pyreverse/main.py -p Main
[image : classes_Main.png, class diagram with all dependencies]

full size image

With these options you can get a quick view of the dependencies without being lost in endless lists of methods and attributes.

Detailed View of a Module

pyreverse -c PyreverseCommand -a1 -s1 -f ALL -o png  pyreverse/main.py
[image : PyreverseCommand.png, pyreverse.diagram.ClassDiagram class diagram with one dependency level]

module in full size image

Show all methods and attributes of the class (-f ALL). By default, the class diagram option -c uses the options -A, -S, -my, but here we deactivate them to get a reasonably small image.

Configuration File

You can put some options into the file ".pyreverserc" in your home directory.

Example:

--filter-mode=PUB_ONLY --ignore doc --ignore test
This will exclude documentation and test files in the doc and test directories. Also, we will see only "public" methods.