Python3

The 2to3 script is a very useful tool. We can simply run it over the whole code base and end up with python3 compatible code while keeping a python2 code base. To make our code python3 compatible, we do (or did) two things:

  • small python2 compatible modifications of our source code
  • run 2to3 over our code base to generate a python3 compatible version

However, we do not just want one python3 compatible version; we also want to keep developing our software. Hence, we want to be able to easily test it for both python2 and python3. Furthermore, if we use patches to get nice commits, this starts to get quite messy. Let's consider this in the case of Pylint, where the workflow described above proved unsatisfactory:

  • I have two repositories, one for python2, one for python3. On the python3 side, I run 2to3 and store the modifications in a patch or a commit.

  • Whenever I implement a fix or a feature on either side, I have to test whether it still works on the other side; but as the 2to3 modifications are often quite heavy, directly creating patches on one side and applying them on the other side won't work most of the time.

  • Now say I implement something in my python2 base and hold it in a patch or commit it. I can then pull it into my python3 repo:

    • running 2to3 on all of Pylint is quite slow: around 30 sec for Pylint without the tests, and around 2 min with the tests. (I'd rather not imagine how long it would take for, say, CubicWeb.)

    • even if I have all my 2to3 modifications in a patch, it takes 5-6 sec to "qpush" or "qpop" them all. Committing the 2to3 changes instead and using:

      hg pull -u --rebase
      

      is not much faster. If I don't use --rebase, I will get a merge on each pull. Furthermore, we often get a patch application failure or a merge conflict, or end up with something which is not python3 compatible (like a newly introduced "except Error, exc").

  • So quite often, I will have to fix it with:

    hg revert -r REV <broken_files>
    2to3 -nw <broken_files>
    hg qref # or hg resolve -m; hg rebase -c
    
  • Suppose that the 2to3 transition worked fine, or that we fixed it. I run my tests with python3 and see that they fail, so I modify the patch and it all starts again; and the new patch or the patch modification will create a new head in my python3 repo...

2to3 Fixers

Considering all that, let's investigate 2to3: it comes with a lot of fixers that can be activated or deactivated. Many of them only fix very rare use cases or things that have been deprecated for years. On the other hand, each fixer works by matching a pattern against the parse tree of every file, so the more fixers we remove, the faster 2to3 should be. Depending on the project, most cases will just not appear, and for the others, we should be able to find other means of disabling them. The lists proposed below are only suggestions; which fixers can actually be disabled, and how, will depend on the code base and other overall considerations.
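
As a minimal sketch of what disabling fixers could look like, the helper below builds a 2to3 command line that skips a given list of fixers. The options -n, -w and -x are real 2to3 options; the function name, the "pylint/" path and the choice of fixers are only assumptions made for the illustration.

```python
def build_2to3_command(paths, disabled_fixers=(), write=True):
    """Return the argument list for a 2to3 run (hypothetical helper)."""
    cmd = ["2to3"]
    if write:
        # -w writes the changes back to the files, -n skips the .bak backups
        cmd += ["-n", "-w"]
    for fixer in disabled_fixers:
        # -x excludes one fixer; the option is repeated for each fixer
        cmd += ["-x", fixer]
    cmd += list(paths)
    return cmd


# Illustrative selection only: fixers assumed to have been applied once and for all
ALREADY_APPLIED = ["apply", "has_key", "ne", "repr"]

print(" ".join(build_2to3_command(["pylint/"], ALREADY_APPLIED)))
```

The resulting list can be passed as is to subprocess.call, so the daily conversion only pays for the fixers that are still needed.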

python2 compatible

The following fixers produce 2.x compatible code and should be run once and for all (they can then be disabled for daily conversion usage):

  • apply
  • execfile (?)
  • exitfunc
  • getcwdu
  • has_key
  • idioms
  • ne
  • nonzero
  • paren
  • repr
  • standarderror
  • sys_exc
  • tuple_params
  • ws_comma
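
To illustrate why these fixers only need to run once, here is a small sketch of the spellings some of them produce, with the old form in a comment. All of it is valid on python2 as well; the names (settings, add, norm) are made up for the example.

```python
settings = {"jobs": 4}


def add(a, b):
    return a + b


def norm(point):
    # tuple_params: was "def norm((x, y)):", unpacked by hand here
    x, y = point
    return (x * x + y * y) ** 0.5


args = (1, 2)
total = add(*args)             # apply:   was apply(add, args)
has_jobs = "jobs" in settings  # has_key: was settings.has_key("jobs")
differs = total != 4           # ne:      was total <> 4
text = repr(total)             # repr:    was `total`
```

Once the source is written this way, running those fixers again finds nothing to change, which is why they can be excluded afterwards.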

compat

These can be handled using imports from a "compat" module, like the logilab.common.compat module, which holds convenient compatible objects.

  • callable
  • exec
  • filter (Wraps filter() usage in a list call)
  • input
  • intern
  • itertools_imports
  • itertools
  • map (Wraps map() in a list call)
  • raw_input
  • reduce
  • zip (Wraps zip() usage in a list call)
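
Here is a minimal sketch of what such a compat module could contain. It is not the content of logilab.common.compat; it only shows the general try/except technique for a few of the names above, under the assumption that the code must import on both python2 and python3.

```python
import sys
from functools import reduce  # in functools since 2.6, no longer a builtin on 3.x

try:
    callable = callable  # builtin on 2.x and again from 3.2 on
except NameError:  # python 3.0 and 3.1 have no callable()
    def callable(obj):
        return hasattr(obj, "__call__")

try:
    raw_input = raw_input  # python2
except NameError:
    raw_input = input  # python3: input() returns the raw string

try:
    intern = intern  # python2 builtin
except NameError:
    intern = sys.intern  # moved to the sys module in python3

# For map, filter and zip, asking for a list explicitly works on both versions
squares = list(map(lambda n: n * n, range(4)))
total = reduce(lambda a, b: a + b, squares)
```

Client code would then do something like "from compat import callable, reduce" and the corresponding fixers could be disabled.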

strings and bytes

Maybe they could also be handled by compat:

  • basestring
  • unicode
  • print

For print, for example, we could think of a once-and-for-all custom fixer that would replace it with a convenient echo function (or whatever name you like) defined in compat.
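
Such an echo function could look like the sketch below. The name comes from the paragraph above; the keyword arguments (sep, end, file) are an assumption, chosen to mimic the python3 print() function.

```python
import sys


def echo(*args, **kwargs):
    """Write the arguments to a stream, in the spirit of python3's print()."""
    sep = kwargs.pop("sep", " ")
    end = kwargs.pop("end", "\n")
    stream = kwargs.pop("file", None)
    if stream is None:
        stream = sys.stdout
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s"
                        % ", ".join(sorted(kwargs)))
    stream.write(sep.join([str(arg) for arg in args]) + end)


echo("checked", 3, "modules")
```

Since echo is an ordinary function call, the source is identical on both versions and the print fixer no longer has anything to do.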

manually

The following issues could probably be fixed manually:

  • dict (it fixes dict iterator methods; it should be possible to have code where we can disable this fixer)
  • import (Detects sibling imports; we could convert them to absolute import)
  • imports, imports2 (renamed modules)
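
For the dict fixer, a sketch of code that behaves the same on both versions, so that the fixer could be disabled. The dictionary and its content are made up for the example.

```python
counts = {"warning": 3, "error": 1}

# Iterating over the dict itself yields the keys on 2.x and 3.x alike
names = sorted(counts)

# items() is a list on 2.x and a view on 3.x; plain iteration works on both
total = sum(value for _, value in counts.items())

# Where a real list is needed, build it explicitly instead of relying on 2.x
pairs = sorted(counts.items())
```

The cost on python2 is an occasional extra list copy, which only matters for large dictionaries.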

necessary

These changes seem to be necessary:

  • except
  • long
  • funcattrs
  • future
  • isinstance (Fixes duplicate types in the second argument of isinstance(). For example, isinstance(x, (int, int)) is converted to isinstance(x, int))
  • metaclass
  • methodattrs
  • numliterals
  • next
  • raise

Consider however that a lot of them might never be used in some projects, like long, funcattrs, methodattrs and numliterals or even metaclass. Also, isinstance is probably motivated by long to int and unicode to str conversions and hence might also be somehow avoided.
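
Even for except and raise, there is a well-known spelling that is valid on both sides. The sketch below assumes python versions older than 2.6 still have to be supported; from 2.6 on, "except Error as exc" is accepted by python2 as well. CheckerError and the two functions are hypothetical names.

```python
import sys


class CheckerError(Exception):
    """Hypothetical error type, only here for the example."""


def run_check():
    # raise: the call form replaces 'raise CheckerError, "bad option"'
    raise CheckerError("bad option")


def safe_run():
    try:
        run_check()
    except CheckerError:
        # except: neither "except E, exc" nor "except E as exc" is needed
        exc = sys.exc_info()[1]
        return "failed: %s" % exc
    return "ok"


print(safe_run())
```

This is more verbose than either native syntax, so whether it is worth it depends on how many except clauses actually need the exception object.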

don't know

Can we also fix these with compat?

  • renames
  • throw
  • types
  • urllib
  • xrange
  • xreadlines

2to3 and Pylint

Pylint is a special case since its test suite has a lot of bad and deprecated code which should stay there. However, in order to have a reasonable workflow, it seems that something must be done to reduce the 1 min 30 sec of 2to3 parsing of the tests. Probably nothing could be gained from the above considerations, since most of these cases are supposed to be in the tests, and actually are. Keep in mind that we can expect to support python2 and python3 in parallel for several years.

After a quick look, we see that 90 % of the refactorings of test/input files only concern print statements; moreover, most of them have nothing to do with the tested functionality. Hence a solution might be to avoid running 2to3 on the test/input directory, since we already have a mechanism to select, depending on the python version, whether a test file should be tested or not. To some extent, astng is a similar case, but its test suite and the whole project are much smaller.
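
One way to skip test/input would be to hand 2to3 an explicit list of files instead of the whole directory. A minimal sketch, where the function name and its default argument are assumptions:

```python
import os


def python_files(root, excluded=("test/input",)):
    """Yield the .py files under root, skipping the excluded sub-directories."""
    skipped = [os.path.normpath(os.path.join(root, path)) for path in excluded]
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so that os.walk never descends into excluded directories
        dirnames[:] = [name for name in dirnames
                       if os.path.normpath(os.path.join(dirpath, name)) not in skipped]
        for filename in sorted(filenames):
            if filename.endswith(".py"):
                yield os.path.join(dirpath, filename)
```

The files it yields can then be appended to the 2to3 command line, while the test/input files stay untouched and keep being selected by python version as before.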

blog entry of

Logilab.org - en