Version 3 of Python is incompatible with the 2.x series. To make pylint usable with Python3, I did some work on making the logilab-common library Python3 compatible, since pylint depends on it.
The strategy is to have one source code version, and to use the 2to3 tool for publishing a Python3 compatible version.
The first problem was that we use the pytest runner, which depends on logilab.common.testlib, which in turn extends the unittest module.
Without major modification we could use unittest2 instead of unittest in Python2.6. I thought that the unittest2 module was equivalent to the unittest in Python3, but then realized I was wrong:
- Python3.1/unittest is some strange "forward port" of unittest. Both are a single file, but they must be quite different since 3.1 has 1623 lines compared to 875 from 2.6...
- Python2.x/unittest2 is a python package, backported from the alpha-release of Python3.2/unittest.
I did not investigate whether other versions of unittest and unittest2 correspond to each other.
What we can see is that the 3.1 version of unittest is different from everything else, whereas the 2.6-unittest2 is equivalent to 3.2-unittest. So, after trying to run pytest on Python3.1, and since there is a backport of unittest2 for Python3.1, it became clear that the best approach is to ignore py3.1-unittest and work on Python3.2 and unittest2 directly.
Meanwhile, some work was being done on logilab-common to switch from unittest to unittest2. This was included in logilab.common-0.52.
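One way to keep a single code base while switching to unittest2 is a guarded import. The following is only a minimal sketch under the assumption that unittest2 may or may not be installed; it is not the actual code of logilab.common.testlib, and ExampleTC and run_example are hypothetical names used for illustration.

```python
# Prefer the unittest2 backport when it is installed (Python 2.6, 3.1),
# otherwise fall back to the standard library module (Python 3.2+).
try:
    import unittest2 as unittest
except ImportError:
    import unittest


class ExampleTC(unittest.TestCase):
    """Hypothetical test case, only here to exercise the import."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_example():
    """Run ExampleTC and return the TestResult object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTC)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Code written against the unittest name then behaves the same whichever module was actually imported.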
The -3 option of python2.6 warns about Python3 incompatible stuff.
Since I already knew that pytest would work with unittest2, I wanted to know as fast as possible whether pytest would run on Python3.x. So I ran all logilab.common tests with "python2.6 -3 bin/pytest" and found a couple of problems that I quick-fixed or set aside until I knew the real solution.
The 2to3 script (from the lib2to3 library) does its best to transform Python2.x code into Python3 compatible code, but manual work is often needed to handle some cases. For example, file is not recognized as a deprecated base class, and calls to raw_input(...) are converted but uses of raw_input as an instance attribute are not. At times, 2to3 can also be overzealous and make modifications such as:
- for name, local_node in node.items():
+ for name, local_node in list(node.items()):
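To illustrate why this wrapping is usually unnecessary but not always, here is a small sketch. The node dictionary and the prune function are made up for the example. In Python 3, items() returns a live view, so the list(...) copy only matters when the dictionary is modified inside the loop.

```python
node = {"a": 1, "b": 2}

# Read-only loop: iterating over items() directly behaves the same on
# Python 2 and Python 3, so the extra list(...) is superfluous here.
names = [name for name, local_node in node.items()]


def prune(mapping):
    """Remove the entries whose value is even, in place."""
    # Here the snapshot is needed: on Python 3, deleting keys while
    # iterating over the live items() view raises a RuntimeError.
    for name, value in list(mapping.items()):
        if value % 2 == 0:
            del mapping[name]
    return mapping
```

Since 2to3 cannot tell these two situations apart, it wraps every such call to be safe.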
After a while, I found that the best solution was to adopt the following working procedure:
- run the tests with python2.6 -3 and solve the appearing issues.
- run 2to3 on all that has to be transformed:
2to3-2.6 -n -w *py test/*py ureports/*py
Since we are in a mercurial repository we don't need backups (-n) and we can write the modifications to the files directly (-w).
- create a 223.diff patch that will be applied and removed repeatedly.
Now, we will push and pop this patch (which is much faster than running 2to3), and only regenerate it from time to time to make sure it still works:
- run "python3.2 bin/pytest -x" to find problems and solutions for crashes and tests that do not work. Note that after some quick fixes on logilab.common.testlib, pytest works quite well, and that we can use the "-x" option. Using Python's Whatsnew_3.0 documentation for hints is quite useful.
- hg qpop 223.diff
- write the solution into the 2.x code, convert it into a patch or a commit, and run the tests: some trivial things might not work or not be 2.4 compatible.
- hg qpush 223.diff
- repeat the procedure
I used two repositories when working on logilab.common, one for Python2 and one for Python3, because other tools, like astng and pylint, depend on that library. Setting the PYTHONPATH was enough to get astng and pylint to use the right version.
We had to replace "os.path.walk", which no longer exists in Python3, with "os.walk".
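The two functions have different styles: os.path.walk called a visitor function for each directory, while os.walk is a generator that yields (dirpath, dirnames, filenames) tuples and works on both Python 2 and Python 3. A minimal sketch of the os.walk form, with list_python_files as a hypothetical helper name:

```python
import os
import tempfile  # only used by the usage example below


def list_python_files(top):
    """Return the sorted paths of all *.py files found under top."""
    found = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # replacing the visitor callback that os.path.walk required.
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(".py"):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)


# Usage example on a throwaway directory
top = tempfile.mkdtemp()
open(os.path.join(top, "module.py"), "w").close()
print(list_python_files(top))
```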
The renaming of raw_input to input, __builtin__ to builtins and StringIO to io could easily be resolved by using the improved logilab.common.compat technique: write a Python-version-dependent definition of a variable, function, or class in logilab.common.compat and import it from there.
For __builtin__, it is even easier: since 2to3 recognizes direct imports, we can write in compat.py:
import __builtin__ as builtins # 2to3 will transform '__builtin__' to 'builtins'
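For readers who want to see the general shape of such a module, here is a minimal sketch of a version-dependent compat layer. It is an illustration under my own assumptions, not the actual content of logilab.common.compat. It uses an explicit version check so that it also runs without going through 2to3.

```python
import sys

if sys.version_info < (3, 0):
    # Python 2 names; this branch is never executed on Python 3
    import __builtin__ as builtins
    from StringIO import StringIO
    input = raw_input
else:
    # Python 3 names
    import builtins
    from io import StringIO
    input = builtins.input

# Client code imports the names from the compat module and no longer
# needs to care about the running Python version.
buf = StringIO()
buf.write("hello")
print(buf.getvalue())
```

The other modules then write "from logilab.common.compat import builtins" (or StringIO, or input) instead of importing the version-specific name themselves.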
The most difficult point is the replacement of str/unicode by bytes/str.
In Python3.x, we only use unicode strings, called just str (the u'' syntax and unicode disappear), but everything written to disk has to be converted to bytes, with some explicit encoding. In Python3.x, files opened in text mode have a defined encoding and automatically transform strings to bytes.
I wrote two functions in logilab.common.compat. One converts str to bytes. The other takes the encoding argument that the 2.x code expects, but simply ignores it when running on 3.x. There might still be a need to write additional tests to make sure the modifications work as expected.
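A minimal sketch of what such a pair of helpers could look like follows. The names str_to_bytes and str_encode and the exact behaviour are assumptions made for illustration; check logilab.common.compat for the real definitions.

```python
import sys

if sys.version_info < (3, 0):
    # Python 2: str is already a byte string, nothing to convert
    str_to_bytes = str

    def str_encode(string, encoding):
        # Encode unicode objects, pass byte strings through unchanged
        if isinstance(string, unicode):
            return string.encode(encoding)
        return str(string)
else:
    def str_to_bytes(string):
        # Python 3: encode the text with the default encoding (UTF-8)
        return str.encode(string)

    def str_encode(string, encoding):
        # The encoding is deliberately ignored: text streams expect str
        # and do the encoding themselves when writing
        return str(string)


print(str_to_bytes("abc"))
print(str_encode("abc", "utf-8"))
```

With this pair, code that writes to a text stream calls str_encode(text, encoding) on both versions, and code that really needs bytes calls str_to_bytes.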
- After less than a week of work, most of the logilab.common tests pass. The biggest remaining problem is the set of tests for testlib.py. But we can already start working on the Python3 compatibility for astng and finally pylint.
- Looking at the lib2to3 library, one can see that 2to3 works with patterns, written against the Python grammar, that are matched on the parse tree. Hence, it cannot do much code investigation or static inference like astng. I think that using astng, we could improve 2to3 without too much effort.
- For astng the difficulties are quite different: syntax changes become semantic changes, and we will have to add new types of astng nodes.
- For testing astng and pylint we will probably have to check the different test examples, a lot of them being code snippets which 2to3 will not parse; they will have to be corrected by hand.
As a general conclusion, I found no need for using sa2to3, although it might be a very good tool. I would instead suggest having a small compat module and keeping only one version of the code, as far as possible. The code base can live either on 2.x or on 3.x, with the (possibly customized) 2to3 or 3to2 scripts used to publish the two different versions.