We are planning a one-day coding sprint on scikits.learn on the 1st of April.
Venues, or remote participation on IRC, are more than welcome!
More information can be found on the wiki:
We're very happy to host the Distutils2 sprint this week in Paris.
The sprint started yesterday with some of Logilab's developers and other contributors. We will sprint for four days, trying to push forward the new Python package manager.
Let's summarize this first day:
Join us on IRC at #distutils on irc.freenode.net!
Recent discussions on the #distutils IRC channel and with my Logilab co-workers led me to the following conclusions:
I would define a software distribution as:
Pypi is a public index where:
Pypi is not a software distribution, it is a software index.
There is a long way from the pure source used by the developer to the software installed on the system of the end user.
First, the source must be extracted from a (D)VCS to make a version tarball, while executing several release-specific actions (eg, changelog generation from a tracker). Second, the version tarball is used to generate a platform-independent build, while executing several build steps (eg, Cython compilation into C files or documentation generation). Third, the platform-independent build is used to generate a platform-dependent build, while executing several platform-dependent build steps (eg, compilation of C extensions). Finally, the platform-dependent build is installed and each file gets dispatched to its proper location during the installation process.
Pieces of software can be distributed as development snapshots taken from the (D)VCS, version tarballs, source packages, platform-independent packages or platform-dependent packages.
Distribution packagers usually have the necessary infrastructure and skills to build packages from version tarballs. Moreover they might have specific needs that require as much control as possible over the various build steps. For example:
Standard users want it to "just work". They prefer simple and quick ways to install stuff. Build steps done on their machine increase the duration of the installation, add potential new dependencies and may trigger errors. Standard users are very disappointed when an installation fails because an error occurred while building the documentation. Users give up when they have to download extra dependencies and set up a complicated compilation environment.
Users want as many build steps as possible to be done by someone else. That's why many users choose a distribution that does the job for them (eg, Ubuntu, Red Hat, Python(x,y)).
But there are several situations where users can't rely on their distribution to install Python software:
When this happens, the user will use Pypi to fetch python packages. To help them, Pypi accepts binary packages of python modules and people have developed dedicated tools that ease installation of packages and their dependencies: pip, easy_install.
Pip + Pypi provides the tools of a distribution without its consistency. This is better than nothing.
Pypi should contain version tarballs of all known Python modules. It is the first purpose of an index. Version tarballs should let distributions and power users perform as many build steps as possible. Pypi will continue to be used as a distribution by people without a better option. Packages provided to these users should require as little as possible to be installed, meaning they either have no build step to perform or have only platform-dependent build steps (that could not be executed by the developer).
If the incoming distutils2 provides a way to differentiate platform-dependent build steps from platform-independent ones, Python developers will be able to upload three different kinds of packages to Pypi.
(Image under creative commons Card File by-nc-nd by Mr. Ducke / Matt, Thomas Fisher Rare Book Library by bookchen, package! by Beck Gusler, Cheese Factory by James Yu)
I am pleased to announce the 0.2 release of lutin77, a framework for running Fortran 77 tests that uses a C compiler as its only dependency. Moreover, this very light framework of 97 lines of C code makes a very good demo of Fortran and C interfacing. The next level could be to write it in GAS (GNU Assembler).
For the over-excited maintainers of legacy code, here comes a screenshot:
$ cat test_error.f
      subroutine success
      end

      subroutine error
      integer fid
      open(fid, status="old", file="nofile.txt")
      write(fid, *) "Ola"
      end

      subroutine checke
      call check(.true.)
      call check(.false.)
      call abort
      end

      program run
      call runtest("error")
      call runtest("success")
      call runtest("absent")
      call runtest("checke")
      call resume
      end
Then you can build the framework by running:
$ gcc -Wall -pedantic -c lutin77.c
And now run your tests:
$ gfortran -o test_error test_error.f lutin77.o -ldl -rdynamic
$ ./test_error
At line 6 of file test_error.f
Fortran runtime error: File 'nofile.txt' does not exist
Error with status 512 for the test "error".
"absent" test not found.
Failure at check statement number 2.
Error for the test "checke".
4 tests run (1 PASSED, 0 FAILED, 3 ERRORS)
See also the list of test frameworks for Fortran.
At Logilab, we have the pleasure to host a distutils2 sprint in January. Sprinters are welcome in our Paris office from 9h on the 27th of January to 19h on the 30th of January. This sprint will focus on polishing distutils2 for the next alpha release and on the install/remove scripts.
Distutils2 is an important project for Python. Every contribution will help to improve the current state of packaging in Python. See the wiki page on python.org for details about participation. If you can't join us in Paris, you can participate on the #distutils channel of the freenode IRC network.
At Logilab, we work a lot with virtual machines for testing and developing code on customers' architectures. We access virtual machines through the network and copy data with the scp command. However, in case of a network failure, there is still a way to access your data: mounting a rescue disk on the virtual machine. The following commands use qemu, but the idea could certainly be adapted to other emulators.
For later mounting of the rescue disk on your system, it is necessary to use the raw image format (the default in qemu):
$ qemu-img create data-rescue.img 10M
Then run your virtual machine with 'data-rescue.img' attached (you need to add a disk storage in virt-manager). Once in your virtual system, you will have to partition and format your new hard disk. As an example with Linux (win32 users will prefer right clicks):
$ fdisk /dev/sdb
$ mke2fs -j /dev/sdb1
Then the new disk can be mounted and used:
$ mount /dev/sdb1 /media/usb
$ cp /home/dede/important-customer-code.tar.bz2 /media/usb
$ umount /media/usb
You can then stop your virtual machine.
You will then have to carry your 'data-rescue.img' to a system where you can mount a file with the 'loop' option. But first we need to find where our partition starts:
$ fdisk -ul data.img
You must set cylinders.
You can do this from the extra functions menu.

Disk data.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x499b18da

    Device Boot      Start         End      Blocks   Id  System
data.img1               63       16064        8001   83  Linux
Now we can mount the partition and get back our code:
$ mkdir /media/rescue
$ mount -o loop,offset=$((63 * 512)) data-rescue.img /media/rescue/
$ ls /media/rescue/
important-customer-code.tar.bz2
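The offset passed to mount is simply the partition's start sector multiplied by the sector size reported by fdisk. A quick Python check of the arithmetic:

```python
# From the fdisk output: sectors of 512 bytes, and the partition
# data.img1 starts at sector 63.
SECTOR_SIZE = 512
start_sector = 63

offset = start_sector * SECTOR_SIZE
print(offset)  # 32256, the value of $((63 * 512)) passed to mount
```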
The 2to3 script is a very useful tool. We can just run it over the whole code base and end up with Python3-compatible code while keeping a Python2 code base. To make our code Python3 compatible, we do (or did) two things:
However, we not only want to have one Python3-compatible version, but also to keep developing our software. Hence, we want to be able to easily test it for both Python2 and Python3. Furthermore, if we use patches to get nice commits, this starts to be quite messy. Let's consider this in the case of Pylint. Indeed, the workflow described before proved to be unsatisfying.
Considering all that, let's investigate 2to3: it comes with a lot of fixers that can be activated or deactivated. Now, a lot of them fix only rare use cases or stuff that has been deprecated for years. On the other hand, the 2to3 fixers work with regular expressions, so the more we remove, the faster 2to3 should be. Depending on the project, most cases will just not appear, and for the others, we should be able to find other means of disabling them. The lists proposed hereafter are just suggestions; which fixers can actually be disabled, and how, will depend on the source base and other overall considerations.
The following fixers are 2.x-compatible and should be run once and for all (they can then be disabled in daily conversion usage):
This can be fixed using imports from a "compat" module like the logilab.common.compat module which holds convenient compatible objects.
Maybe they could also be handled by compat:
For print, for example, we could think of a once-and-for-all custom fixer that would replace it with a convenient echo function (or whatever name you like) defined in compat.
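To make the idea concrete, here is a minimal sketch of such an echo function; the name and signature are just an illustration, not the actual logilab.common.compat API:

```python
import sys

def echo(*args, **kwargs):
    """Print the arguments the same way on Python 2 and Python 3.

    The keyword arguments mimic the Python3 print() function; 'stream'
    replaces 'file' to avoid shadowing the Python2 builtin.
    """
    sep = kwargs.get('sep', ' ')
    end = kwargs.get('end', '\n')
    stream = kwargs.get('stream', sys.stdout)
    stream.write(sep.join(str(arg) for arg in args) + end)

echo('hello', 'world')  # prints "hello world" on both versions
```

A custom fixer (or a one-off search and replace) can then rewrite print statements to calls to this function, after which the print fixer can stay disabled.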
The following issues could probably be fixed manually:
These changes seem to be necessary:
Consider however that a lot of them might never be used in some projects, like long, funcattrs, methodattrs and numliterals or even metaclass. Also, isinstance is probably motivated by long to int and unicode to str conversions and hence might also be somehow avoided.
Can we also fix these with compat?
Pylint is a special case since its test suite has a lot of bad and deprecated code which should stay there. However, in order to have a reasonable workflow, it seems that something must be done to reduce the one and a half minutes 2to3 spends parsing the tests. Probably nothing can be gained from the above considerations, since most cases simply belong in the tests, and actually are there. Realise that we can expect to support Python2 and Python3 in parallel for several years.
After a quick look, we see that 90% of the refactorings of test/input files concern only print statements; moreover, most of them have nothing to do with the tested functionality. Hence a solution might be to avoid running 2to3 on the test/input directory, since we already have a mechanism to select, depending on the Python version, whether a test file should be tested or not. To some extent, astng is a similar case, but its test suite and the whole project are much smaller.
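Such a selection mechanism can be sketched as follows; the filename convention used here is hypothetical, not pylint's actual one:

```python
import sys

def should_collect(filename, version_info=sys.version_info):
    """Decide whether a test input file applies to the running Python version.

    Files suffixed '_py3' are only collected on Python 3, files suffixed
    '_py2' only on Python 2; everything else is version-agnostic.
    """
    if filename.endswith('_py3.py'):
        return version_info[0] >= 3
    if filename.endswith('_py2.py'):
        return version_info[0] < 3
    return True
```

With such a filter in the test collector, the version-specific inputs never need to go through 2to3 at all.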
Version 3 of Python is incompatible with the 2.x series. In order to make pylint usable with Python3, I did some work on making the logilab-common library Python3 compatible, since pylint depends on it.
The strategy is to have one source code version, and to use the 2to3 tool for publishing a Python3 compatible version.
The first problem was that we use the pytest runner, that depends on logilab.common.testlib which extends the unittest module.
Without major modifications, we could use unittest2 instead of unittest with Python2.6. I thought that the unittest2 module was equivalent to unittest in Python3, but then realized I was wrong:
I did not investigate whether there are other corresponding unittest and unittest2 versions.
What we can see is that the 3.1 version of unittest is different from everything else, whereas 2.6-unittest2 is equivalent to 3.2-unittest. So, after trying to run pytest on Python3.1, and since there is a backport of unittest2 for Python3.1, it became clear that the best option was to ignore py3.1-unittest and work with Python3.2 and unittest2 directly.
Meanwhile, some work was being done on logilab-common to switch from unittest to unittest2. This was included in logilab.common-0.52.
The -3 option of python2.6 warns about Python3 incompatible stuff.
Since I already knew that pytest would work with unittest2, I wanted to know as fast as possible whether pytest would run on Python3.x. So I ran all logilab.common tests with "python2.6 -3 bin/pytest" and found a couple of problems that I quick-fixed or discarded, waiting to know the real solution.
The 2to3 script (from the 2to3 library) does its best to transform Python2.x code into Python3-compatible code, but manual work is often needed to handle some cases. For example, file is not considered a deprecated base class, and calls to raw_input(...) are handled but not uses of raw_input as an instance attribute, etc. At times, 2to3 can be overzealous and, for example, make modifications such as:
- for name, local_node in node.items():
+ for name, local_node in list(node.items()):
After a while, I found that the best solution was to adopt the following working procedure:
$ 2to3-2.6 -n -w *py test/*py ureports/*py
Since we are in a mercurial repository we don't need backups (-n) and we can write the modifications to the files directly (-w).
I used two repositories when working on logilab.common, one for Python2 and one for Python3, because other tools, like astng and pylint, depend on that library. Setting the PYTHONPATH was enough to get astng and pylint to use the right version.
import __builtin__ as builtins # 2to3 will transform '__builtin__' into 'builtins'
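Another common way to get the same effect, without relying on 2to3's renaming at all, is a guarded import; a sketch of the pattern:

```python
# Try the Python3 module name first and fall back to the Python2 one,
# so the rest of the code can use the name 'builtins' on both versions.
try:
    import builtins                  # Python 3
except ImportError:
    import __builtin__ as builtins   # Python 2

# The alias behaves the same either way:
print(builtins.len('abc'))  # 3
```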
The most difficult point is the replacement of str/unicode by bytes/str.
In Python3.x, we use only unicode strings, simply called str (the u'' syntax and the unicode type disappear), but everything written to disk has to be converted to bytes with some explicit encoding. In Python3.x, file descriptors have a defined encoding and will automatically transform strings to bytes.
I wrote two functions in logilab.common.compat. One converts str to bytes; the other simply ignores the encoding on 3.x where it was expected in 2.x. But additional tests might need to be written to make sure the modifications work as expected.
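The two helpers might look roughly like this; the names and signatures below are a sketch, not necessarily the actual logilab.common.compat functions:

```python
import sys

def str_to_bytes(string, encoding='utf-8'):
    """Convert a text string to bytes with an explicit encoding."""
    if isinstance(string, bytes):
        return string
    return string.encode(encoding)

def str_encode(string, encoding='utf-8'):
    """On 3.x, ignore the encoding that 2.x code expected and keep str."""
    if sys.version_info[0] >= 3:
        return string
    return string.encode(encoding)

print(str_to_bytes('Ola'))  # b'Ola'
```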
As a general conclusion, I found no need for sa2to3, although it might be a very good tool. I would instead suggest having a small compat module and keeping only one version of the code as far as possible: the code base stays either on 2.x or on 3.x, and the (possibly customized) 2to3 or 3to2 scripts are used to publish the two different versions.
SemWeb.Pro, the first French conference dedicated to the Semantic Web, will take place in Paris on January 17/18 2011.
One day of talks, one day of tutorials.
Want to grok the Web 3.0? Be there.
Something you want to share? Call for papers ends on October 15, 2010.
The logilab-common library contains a lot of utilities which are often unknown. I will write a series of blog entries to explore the nice features of this library.
We will begin with the logilab.common.deprecation module which contains utilities to warn users when:
When a function or a method is deprecated, you can use the deprecated decorator. It will print a message to warn the user that the function is deprecated.
The decorator takes two optional arguments:
We have a class Person defined in the file person.py. The get_surname method is deprecated; we must use the get_lastname method instead. For that, we use the deprecated decorator on the get_surname method.
from logilab.common.deprecation import deprecated

class Person(object):

    def __init__(self, firstname, lastname):
        self._firstname = firstname
        self._lastname = lastname

    def get_firstname(self):
        return self._firstname

    def get_lastname(self):
        return self._lastname

    @deprecated('[1.2] use get_lastname instead')
    def get_surname(self):
        return self.get_lastname()

def create_user(firstname, lastname):
    return Person(firstname, lastname)

if __name__ == '__main__':
    person = create_user('Paul', 'Smith')
    surname = person.get_surname()
When running person.py we have the message below:
Now we have moved the class Person into a new_person.py file. We indicate in the person.py file that the class has been moved:
from logilab.common.deprecation import class_moved
import new_person

Person = class_moved(new_person.Person)

if __name__ == '__main__':
    person = Person('Paul', 'Smith')
When we run the person.py file, we have the following message:
The class_moved function takes one mandatory argument and two optional ones:
The class_renamed function automatically creates a class which fires a DeprecationWarning when instantiated.
The function takes two mandatory arguments and one optional one:
We now rename the Person class to User in the new_person.py file. Here is the new person.py file:
from logilab.common.deprecation import class_renamed
from new_person import User

Person = class_renamed('Person', User)

if __name__ == '__main__':
    person = Person('Paul', 'Smith')
When running person.py, we have the following message:
The moved function is used to tell that a callable has been moved to a new module. It returns a callable wrapper, so that when the wrapper is called, a warning is printed telling where the object can be found. Then the import is done (and not before) and the actual object is called.
The usage is somewhat limited on classes since it will fail if the wrapper is used in a class ancestors list: use the class_moved function instead (which has no lazy import feature though).
The moved function takes two mandatory parameters:
In person.py, we will use the create_user function, which is now defined in the new_person.py file:
from logilab.common.deprecation import moved

create_user = moved('new_person', 'create_user')

if __name__ == '__main__':
    person = create_user('Paul', 'Smith')
When running person.py, we have the following message:
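The lazy-import behaviour described above can be sketched like this; the following is a simplified illustration under assumed behaviour, not the actual logilab.common code:

```python
import warnings

def moved_sketch(modpath, objname):
    """Return a wrapper that warns, then lazily imports and calls the target."""
    def callnew(*args, **kwargs):
        message = "object %s has been moved to module %s" % (objname, modpath)
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        # The import happens only here, at call time, not at definition time:
        module = __import__(modpath, fromlist=[objname])
        return getattr(module, objname)(*args, **kwargs)
    return callnew

# Example with a standard library function standing in for create_user:
sqrt = moved_sketch('math', 'sqrt')
```

Because the wrapper is a plain function, it indeed cannot be used in a class ancestors list, which is why class_moved exists.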