Version 3 of Python is incompatible with the 2.x series.
In order to make pylint usable with Python3, I worked on making the logilab-common library Python3 compatible, since pylint depends on it.
The strategy is to keep a single version of the source code and to use the 2to3 tool to publish a Python3-compatible version.
The first problem was that we use the pytest runner, which depends on logilab.common.testlib, which in turn extends the unittest module.
Without major modifications we could use unittest2 instead of unittest in Python2.6. I thought that the unittest2 module was equivalent to the unittest module in Python3, but then realized I was wrong:
- Python3.1/unittest is some strange "forward port" of unittest.
The 2.6 and 3.1 versions are both a single file, but they must be quite different,
since the 3.1 one has 1623 lines compared to 875 in 2.6...
- Python2.x/unittest2 is a python package, backported from the
alpha-release of Python3.2/unittest.
I did not investigate whether other versions of unittest and
unittest2 correspond to each other.
What we can see is that the 3.1 version of unittest
is different from everything else, whereas 2.6-unittest2 is
equivalent to 3.2-unittest. So, after trying to run pytest on
Python3.1, and since there is a backport of unittest2 for
Python3.1, it became clear that the best option was to ignore
py3.1-unittest and work on Python3.2 and unittest2 directly.
Meanwhile, some work was being done on logilab-common to
switch from unittest to unittest2. This was included in logilab.common-0.52.
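A common way to prefer unittest2 when it is installed, and fall back to the standard unittest otherwise, is a guarded import. The sketch below is illustrative only and is not the code that went into logilab.common-0.52; the test class is a hypothetical example.

```python
# Prefer the unittest2 backport when installed (Python2.x, Python3.1);
# otherwise use the standard library unittest, which on Python3.2+
# already provides the features unittest2 backports.
try:
    import unittest2 as unittest
except ImportError:
    import unittest


class CompatTest(unittest.TestCase):
    """Hypothetical test written against the unittest2 / Python3.2 API."""

    def test_assert_in(self):
        self.assertIn(1, [1, 2, 3])
```

The rest of the test code then only refers to the name unittest, whichever module it is bound to.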
The -3 option of python2.6 warns about Python3
incompatibilities.
Since I already knew that pytest would work with unittest2, I
wanted to know as fast as possible whether pytest would run on
Python3.x. So I ran all logilab.common tests with
"python2.6 -3 bin/pytest" and found a couple of problems that I
quick-fixed or set aside until the real solution was known.
The 2to3 script (from the lib2to3 library) does its best to
transform Python2.x code into Python3-compatible code, but manual
work is often needed to handle some cases. For example, file is
not recognized as a deprecated base class, and calls to raw_input(...)
are handled but not the use of raw_input as an instance attribute,
etc. At times, 2to3 can also be overzealous and make
modifications such as:
- for name, local_node in node.items():
+ for name, local_node in list(node.items()):
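2to3 adds the list() copy because it cannot know whether the loop body modifies the dictionary. In Python3, items() returns a view, and changing a dictionary's size while iterating over one of its views raises RuntimeError; when the body does not mutate the mapping, the copy is merely wasted work. The two functions below are hypothetical illustrations of both cases.

```python
def drop_empty(mapping):
    """Delete keys with empty values while iterating over the live view."""
    for key, value in mapping.items():
        if not value:
            del mapping[key]  # the dict shrinks during iteration


def drop_empty_copy(mapping):
    """Same loop over a list copy of the items, as 2to3 writes it."""
    for key, value in list(mapping.items()):
        if not value:
            del mapping[key]  # safe: we iterate over the copy


try:
    drop_empty({"a": 1, "b": [], "c": 3})
except RuntimeError as exc:
    print("view:", exc)  # view: dictionary changed size during iteration

cleaned = {"a": 1, "b": [], "c": 3}
drop_empty_copy(cleaned)
print(cleaned)  # {'a': 1, 'c': 3}
```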
After a while, I found that the best solution was to adopt
the following working procedure:
- run the tests with python2.6 -3 and fix the issues that appear.
- run 2to3 on everything that has to be transformed:
2to3-2.6 -n -w *py test/*py ureports/*py
Since we are in a Mercurial repository, we don't need backups
(-n) and we can write the modifications to the files directly
(-w).
- create a 223.diff patch that will be applied and removed
repeatedly.
Now we push and pop this patch (which is much faster than
running 2to3), and only regenerate it from time to time to make
sure it still works:
- run "python3.2 bin/pytest -x" to find problems and solutions
for crashes and failing tests. Note that after some
quick fixes to logilab.common.testlib, pytest works quite
well, so we can use the "-x" option. Python's
"What's New In Python 3.0" documentation is quite useful for hints.
- hg qpop
- write the solution into the 2.x code, turn it into a patch or
a commit, and run the tests: some trivial things might not work
or not be 2.4 compatible.
- hg qpush 223.diff
- repeat the procedure
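The loop can be partly scripted. The sketch below only builds the command lines from the steps above and, by default, just prints them (a dry run). The helper names and the 2.x test command (python2.6 bin/pytest) are assumptions, and writing the fix itself stays a manual step.

```python
import subprocess

PATCH = "223.diff"  # mq patch holding the 2to3 output


def before_fix():
    """Run the Python3.2 tests with the 2to3 patch applied, then unapply it."""
    return [
        ["python3.2", "bin/pytest", "-x"],
        # plain qpop pops the topmost patch; `hg qpop NAME` only pops
        # until NAME is on top, so it does nothing if NAME already is
        ["hg", "qpop"],
    ]


def after_fix(patch=PATCH):
    """Check the hand-written 2.x fix, then re-apply the 2to3 patch."""
    return [
        ["python2.6", "bin/pytest"],  # assumed 2.x test command
        ["hg", "qpush", patch],
    ]


def run(commands, dry_run=True):
    """Echo each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$ " + " ".join(argv))
        if not dry_run:
            # check=False: failing tests are expected and must not stop us
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run(before_fix())
    # ... edit the 2.x sources by hand here ...
    run(after_fix())
```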
I used two repositories when working on logilab.common, one for
Python2 and one for Python3, because other tools, like astng and
pylint, depend on that library. Setting the PYTHONPATH was
enough to get astng and pylint to use the right version.
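Selecting a checkout through PYTHONPATH can also be done per command, by passing a modified environment to a child process. In this sketch the checkout is a hypothetical placeholder (a temporary directory), and the child interpreter simply reports its sys.path so we can confirm the checkout is on it.

```python
import os
import subprocess
import sys
import tempfile


def env_with_checkout(checkout):
    """Copy of the environment with `checkout` first on PYTHONPATH."""
    env = os.environ.copy()
    old = env.get("PYTHONPATH")
    env["PYTHONPATH"] = checkout if not old else checkout + os.pathsep + old
    return env


def child_sys_path(checkout):
    """Return sys.path as seen by a child interpreter using `checkout`."""
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env_with_checkout(checkout),
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()


# Hypothetical stand-in for the Python3 checkout of logilab.common
with tempfile.TemporaryDirectory() as checkout:
    print(checkout in child_sys_path(checkout))  # True
```

The same environment can be handed to the command that runs astng or pylint tests, so that each tool picks up the intended logilab.common checkout.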
We had to replace "os.path.walk", which no longer exists in Python3, with "os.walk".
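For callers written against the Python2 os.path.walk(top, func, arg) callback interface, one option is a small adapter on top of os.walk. This is a hypothetical helper, not necessarily what logilab.common did. Like os.path.walk, it calls func(arg, dirname, names) for each directory and lets the callback prune the traversal by removing names from the list in place.

```python
import os


def walk_with_callback(top, func, arg):
    """Emulate Python2's os.path.walk(top, func, arg) using os.walk."""
    for dirpath, dirnames, filenames in os.walk(top):
        names = dirnames + filenames
        func(arg, dirpath, names)
        # top-down os.walk only descends into what is left in dirnames
        dirnames[:] = [d for d in dirnames if d in names]


def collect_py(found, dirname, names):
    """Example visitor: gather .py paths, skipping CVS directories."""
    if "CVS" in names:
        names.remove("CVS")  # pruned in place, as with os.path.walk
    found.extend(os.path.join(dirname, n) for n in names if n.endswith(".py"))
```

Converting each visitor into a plain for loop over os.walk is usually cleaner; the adapter only saves rewriting many callbacks at once.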
The renaming of raw_input to input, __builtin__ to
builtins and StringIO to io could easily be handled by
using the improved logilab.common.compat technique: write a
Python-version-dependent definition of a variable, function, or
class in logilab.common.compat and import it from there.
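A minimal sketch of that technique, assuming a module laid out like logilab.common.compat (the real module's contents may differ): each renamed name gets a single version-dependent definition, and the rest of the code imports it from there.

```python
# compat.py (sketch): one place for Python-version-dependent names
import sys

if sys.version_info >= (3, 0):
    raw_input = input            # raw_input was renamed to input
    from io import StringIO      # StringIO.StringIO now lives in io
else:
    from __builtin__ import raw_input
    from StringIO import StringIO

# elsewhere in the code base:
#     from compat import raw_input, StringIO
```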
For builtins it is even easier: since 2to3 recognizes direct
imports, we can write in compat.py:
import __builtin__ as builtins # 2to3 will transform '__builtin__' to 'builtins'
The most difficult point is the replacement of str/unicode by
bytes/str.
In Python3.x, we only use unicode strings, simply called str (the
u'' syntax and the unicode type disappear), but everything written to
disk has to be converted to bytes with some explicit
encoding. In Python3.x, files opened in text mode have a defined
encoding and automatically transform strings to bytes.
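Both paths to disk can be compared in a few lines: a str written through a text-mode file with an explicit encoding, and the same str encoded by hand and written in binary mode. UTF-8 and the temporary file names are arbitrary choices for the illustration.

```python
import os
import tempfile

text = "caf\u00e9"  # a Python3 str (unicode) containing a non-ASCII character

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.txt")
    bytes_path = os.path.join(tmp, "bytes.txt")

    # Text mode: the file object encodes the str with its own encoding
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: only bytes are accepted, so encode explicitly
    # (f.write(text) here would raise TypeError)
    with open(bytes_path, "wb") as f:
        f.write(text.encode("utf-8"))

    with open(text_path, "rb") as f:
        raw = f.read()
    with open(bytes_path, "rb") as f:
        same = raw == f.read()

print(raw)   # b'caf\xc3\xa9'
print(same)  # True
```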
I wrote two functions in logilab.common.compat: one converts
str to bytes, and the other simply ignores the encoding under
3.x where one was expected in 2.x. Additional tests may still be
needed to make sure these modifications work as
expected.
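A sketch of what such a pair of helpers can look like. The names str_to_bytes and str_encode and their signatures are illustrative assumptions here, not necessarily those of logilab.common.compat.

```python
import io
import sys

if sys.version_info >= (3, 0):
    def str_to_bytes(string, encoding="ascii"):
        """Turn a (unicode) str into bytes, e.g. for a binary file."""
        return string.encode(encoding)

    def str_encode(string, encoding):
        """Python3 text files encode for us: ignore `encoding`, keep a str."""
        return str(string)
else:
    def str_to_bytes(string, encoding="ascii"):
        """In 2.x, str already is a byte string."""
        return str(string)

    def str_encode(string, encoding):
        """In 2.x, encode explicitly before writing to a file."""
        return unicode(string).encode(encoding)


buf = io.StringIO()
buf.write(str_encode("caf\u00e9", "latin-1"))  # a text stream expects str
print(str_to_bytes("abc"))  # b'abc'
```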
- After less than a week of work, most of the logilab.common
tests pass. The biggest remaining problem is the tests for
testlib.py. But we can already start working on Python3
compatibility for astng and, finally, pylint.
- Looking at the lib2to3 library, one can see that 2to3 works by
matching patterns against a syntax tree built from the Python grammar.
Hence, it cannot do much code investigation or static
inference like astng. I think that, using
astng, we could improve 2to3 without too much effort.
- For astng the difficulties are quite different: syntax changes
become semantic changes, so we will have to add new types of astng
nodes.
- For testing astng and pylint, we will probably have to check
the different test examples, many of which are code snippets
that 2to3 will not parse; they will have to be corrected by
hand.
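To find which test inputs need such manual correction, one option is to try compiling each snippet with the running Python3 interpreter and list those that raise SyntaxError. This is a hypothetical helper: the flat directory of .py files is an assumption, and it only checks syntax, not behaviour.

```python
import os


def unparsable_snippets(directory):
    """Return (filename, message) pairs, sorted by name, for .py files
    that do not compile under the running Python version."""
    failures = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".py"):
            continue
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as f:
            source = f.read()
        try:
            compile(source, path, "exec")  # compile only; nothing is executed
        except SyntaxError as exc:
            failures.append((name, exc.msg))
    return failures
```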
As a general conclusion, I found no need to use sa2to3,
although it might be a very good tool. I would instead suggest
having a small compat module and keeping only one version of the
code, as far as possible, with the code base either on 2.x or on
3.x and the (possibly customized) 2to3 or 3to2 scripts used
to publish the two versions.