Main Blog (in English)

Discovering logilab-common Part 1 - deprecation module

2010/09/02 by Stéphanie Marcu

The logilab-common library contains many utilities that are often overlooked. I will write a series of blog entries to explore some nice features of this library.

We will begin with the logilab.common.deprecation module which contains utilities to warn users when:

  • a function or a method is deprecated
  • a class has been moved into another module
  • a class has been renamed
  • a callable has been moved to a new module

deprecated

When a function or a method is deprecated, you can use the deprecated decorator. It will print a message to warn the user that the function is deprecated.

The decorator takes two optional arguments:

  • reason: the deprecation message. A good practice is to specify at the beginning of the message, between brackets, the version number from which the function is deprecated. The default message is 'The function "[function name]" is deprecated'.
  • stacklevel: This is the option of the warnings.warn function which is used by the decorator. The default value is 2.
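To make the role of these two arguments concrete, here is a minimal sketch of how such a decorator could be built on top of warnings.warn. It is an illustration only and not the actual logilab.common code; the name deprecated_sketch is hypothetical.

```python
import functools
import warnings


def deprecated_sketch(reason=None, stacklevel=2):
    """Hypothetical stand-in for a deprecation decorator (not logilab's code)."""
    def decorator(func):
        # Fall back to a default message built from the function name.
        message = reason or 'The function "%s" is deprecated' % func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # With stacklevel=2 the warning is attributed to the caller's
            # line rather than to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=stacklevel)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_sketch('[1.2] use get_lastname instead')
def get_surname():
    return 'Smith'
```

Calling get_surname() still returns its normal value; the warning is a side effect, which is why existing callers keep working while they migrate.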

We have a Person class defined in a file person.py. The get_surname method is deprecated and the get_lastname method must be used instead. To signal this, we apply the deprecated decorator to the get_surname method.

from logilab.common.deprecation import deprecated

class Person(object):

    def __init__(self, firstname, lastname):
        self._firstname = firstname
        self._lastname = lastname

    def get_firstname(self):
        return self._firstname

    def get_lastname(self):
        return self._lastname

    @deprecated('[1.2] use get_lastname instead')
    def get_surname(self):
        return self.get_lastname()

def create_user(firstname, lastname):
    return Person(firstname, lastname)

if __name__ == '__main__':
    person = create_user('Paul', 'Smith')
    surname = person.get_surname()

When running person.py we have the message below:

person.py:22: DeprecationWarning: [1.2] use get_lastname instead
surname = person.get_surname()

class_moved

Now suppose we move the Person class to a new_person.py file. In the person.py file, we signal that the class has been moved:

from logilab.common.deprecation import class_moved
import new_person
Person = class_moved(new_person.Person)

if __name__ == '__main__':
    person = Person('Paul', 'Smith')

When we run the person.py file, we have the following message:

person.py:6: DeprecationWarning: class Person is now available as new_person.Person
person = Person('Paul', 'Smith')

The class_moved function takes one mandatory argument and two optional:

  • new_class: this mandatory argument is the new class
  • old_name: this optional argument specifies the old class name. By default, it is the same as the name of the new class. This argument is used in the default printed message.
  • message: with this optional argument, you can specify a custom message

class_renamed

The class_renamed function automatically creates a class which fires a DeprecationWarning when instantiated.

The function takes two mandatory arguments and one optional:

  • old_name: a string which contains the old class name
  • new_class: the new class
  • message: an optional message. The default one is '[old class name] is deprecated, use [new class name]'

We now rename the Person class to User in the new_person.py file. Here is the new person.py file:

from logilab.common.deprecation import class_renamed
from new_person import User

Person = class_renamed('Person', User)

if __name__ == '__main__':
    person = Person('Paul', 'Smith')

When running person.py, we have the following message:

person.py:5: DeprecationWarning: Person is deprecated, use User
person = Person('Paul', 'Smith')
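The behaviour described in this section (a generated class that warns when instantiated) can be sketched as follows. This is a simplified illustration under the assumption that subclassing the new class is acceptable; it is not the library's implementation, and class_renamed_sketch is a hypothetical name.

```python
import warnings


def class_renamed_sketch(old_name, new_class, message=None):
    """Hypothetical sketch: build a subclass that warns on instantiation."""
    if message is None:
        message = '%s is deprecated, use %s' % (old_name, new_class.__name__)

    def __init__(self, *args, **kwargs):
        warnings.warn(message, DeprecationWarning, stacklevel=2)
        new_class.__init__(self, *args, **kwargs)

    # type() creates the class dynamically under the old name.
    return type(old_name, (new_class,), {'__init__': __init__})


class User(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


Person = class_renamed_sketch('Person', User)
```

Because the generated class derives from the new one, instances created through the old name still pass isinstance checks against User.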

moved

The moved function is used to tell that a callable has been moved to a new module. It returns a callable wrapper, so that when the wrapper is called, a warning is printed telling where the object can be found. Then the import is done (and not before) and the actual object is called.

Note

The usage is somewhat limited on classes since it will fail if the wrapper is used in a class ancestors list: use the class_moved function instead (which has no lazy import feature though).

The moved function takes two mandatory parameters:

  • modpath: a string representing the path to the new module
  • objname: the name of the new callable

In person.py, we will use the create_user function, which is now defined in the new_person.py file:

from logilab.common.deprecation import moved

create_user = moved('new_person', 'create_user')

if __name__ == '__main__':
    person = create_user('Paul', 'Smith')

When running person.py, we have the following message:

person.py:4: DeprecationWarning: object create_user has been moved to module new_person
person = create_user('Paul', 'Smith')
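The lazy import described above (warn first, import only when the wrapper is called) can be sketched like this. It is an illustration, not the library's code: moved_sketch is a hypothetical name, and math.sqrt stands in for new_person.create_user so that the example runs on its own.

```python
import importlib
import warnings


def moved_sketch(modpath, objname):
    """Hypothetical sketch of a lazily importing wrapper."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            'object %s has been moved to module %s' % (objname, modpath),
            DeprecationWarning, stacklevel=2)
        # The import happens only now, at call time.
        module = importlib.import_module(modpath)
        return getattr(module, objname)(*args, **kwargs)
    return wrapper


# Standard-library stand-in for the blog's new_person.create_user.
sqrt = moved_sketch('math', 'sqrt')
```

The wrapper is a plain function, which is why it cannot appear in a class ancestors list, as the note above points out.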

pdb.set_trace no longer working: problem solved

2010/08/12 by Alexandre Fayolle

I had a bad case of bug hunting today which took me > 5 hours to track down (with the help of Adrien in the end).

I was trying to start a CubicWeb instance on my computer, and was encountering some strange pyro error at startup. So I edited some source file to add a pdb.set_trace() statement and restarted the instance, waiting for Python's debugger to kick in. But that did not happen. I was baffled. I first checked for standard problems:

  • no pdb.py or pdb.pyc was lying around in my Python sys.path
  • the pdb.set_trace function had not been silently redefined
  • no other thread was bugging me
  • the standard input and output were what they were supposed to be
  • I was not able to reproduce the issue on other machines

After triple checking everything, grepping everywhere, I asked a question on StackOverflow before taking a lunch break (if you go there, you'll see the answer). After lunch, no useful answer had come in, so I asked Adrien for help, because two pairs of eyes are better than one in some cases. We dutifully traced down the pdb module's code to the underlying bdb and cmd modules and learned some interesting things on the way down there. Finally, we found out that the Python code frames which should have been identical were not. This discovery caused further bafflement. We looked at the frames, and saw that the class of one of those frames was a psyco-generated wrapper.

It turned out that CubicWeb can use two implementations of the RQL module: one which uses gecode (a C++ library for constraint based programming) and one which uses logilab.constraint (a pure python library for constraint solving). The former is the default, but it would not load on my computer, because the gecode library had been replaced by a more recent version during an upgrade. The pure python implementation tries to use psyco to speed things up. Installing the correct version of libgecode solved the issue. End of story.

When I checked out StackOverflow, Ned Batchelder had provided an answer. I didn't get the satisfaction of answering the question myself...

Once this was figured out, solving the initial pyro issue took 2 minutes...


EuroSciPy'10

2010/07/13 by Adrien Chauve

The EuroSciPy2010 conference was held in Paris at the Ecole Normale Supérieure from July 8th to 11th and was organized and sponsored by Logilab and other companies.

July, 8-9: Tutorials

The first two days were dedicated to tutorials and I had the chance to talk about SciPy with André Espaze, Gaël Varoquaux and Emmanuelle Gouillart in the introductory track. This was nice, but it was a bit tricky to present SciPy in such a short time while trying to illustrate the material with real and interesting examples. One very nice thing about the introductory track is that all the material was contributed by different speakers and is freely available in a github repository (licensed under CC BY).

July, 10-11: Scientific track

The next two days were dedicated to scientific presentations and why python is such a great tool to develop scientific software and carry out research.

Keynotes

I had a great time listening to the presentations, starting with the two very nice keynotes given by Hans Petter Langtangen and Konrad Hinsen. The latter gave us a very nice summary of what happened in the scientific python world during the past 15 years, what is happening now and of course what could happen during the next 15 years. Using a crystal ball and a very humorous tone, he made it very clear that the challenge in the coming years will be how to use our hundreds, thousands or even more cores in a bug-free and efficient way. Functional programming may be a very good solution to this challenge as it provides a deterministic way of parallelizing our programs. Konrad also provided some hints about future versions of python that could provide deeper and more efficient support for functional programming, and maybe the addition of an 'async' keyword to handle the computation of a function on another core.

In fact, the PEP 3148 entitled "Futures - execute computations asynchronously" was just accepted two days ago. This PEP describes the new package called "futures" designed to facilitate the evaluation of callables using threads and processes in future versions of python. A full implementation is already available.
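As a small illustration of the interface described by this PEP: it ships in the standard library as concurrent.futures from Python 3.2 onwards. The sketch below assumes that module; the function names square and parallel_squares are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def parallel_squares(values, max_workers=4):
    """Evaluate square() on each value using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() preserves input order even if tasks finish out of order.
        return list(executor.map(square, values))
```

For CPU-bound work, ProcessPoolExecutor from the same module offers the same interface on top of processes instead of threads.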

Parallelization

Parallelization was indeed a very popular issue across presentations, and as for resolving embarrassingly parallel problems, several solutions were presented.

  • Playdoh: Distributes computations over computers connected to a secure network (see playdoh presentation).

    Distributing the computation of a function over two machines is as simple as:

    import playdoh
    result1, result2 = playdoh.map(fun, [arg1, arg2], _machines = ['machine1.network.com', 'machine2.network.com'])
    
  • Theano: Lets you define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. In particular, it can use the GPU transparently and generate optimized C code (see theano presentation).

  • joblib: Provides among other things helpers for embarrassingly parallel problems. It's built over the multiprocessing package introduced in python 2.6 and brings more readable code and easier debugging.

Speed

Concerning speed, Francesc Alted showed us interesting tools for memory optimization that are currently used successfully in PyTables 2.2. You can read more details on this kind of optimization in EuroSciPy'09 (part 1/2): The Need For Speed.

SCons

Last but not least, I talked with Christophe Pradal, who is one of the core developers of OpenAlea. He convinced me that SCons is worth using once you have built a nice extension for it: SConsX. I'm looking forward to testing it.


HOWTO install lodgeit pastebin under Debian/Ubuntu

2010/06/24 by Arthur Lutz

LodgeIt is a simple open source pastebin... and it's written in Python!

The installation under debian/ubuntu goes as follows:

sudo apt-get update
sudo apt-get -uVf install python-imaging python-sqlalchemy python-jinja2 python-pybabel python-werkzeug python-simplejson
cd local
hg clone http://dev.pocoo.org/hg/lodgeit-main
cd lodgeit-main
vim manage.py

For debian squeeze you have to downgrade python-werkzeug, so get the old version of python-werkzeug from snapshot.debian.org at http://snapshot.debian.org/package/python-werkzeug/0.5.1-1/

wget http://snapshot.debian.org/archive/debian/20090808T041155Z/pool/main/p/python-werkzeug/python-werkzeug_0.5.1-1_all.deb

In manage.py, modify the dburi and the SECRET_KEY. Then launch the application:

python manage.py runserver

Then go ahead and configure your apache or lighttpd.

An easy (and dirty) way of running it at startup is to add the following command to the www-data crontab:

@reboot cd /tmp/; nohup /usr/bin/python /usr/local/lodgeit-main/manage.py runserver &

This should of course be done in an init script.


Hopefully we'll find some time to package this nice webapp for debian/ubuntu.


EuroSciPy 2010 schedule is out !

2010/06/06 by Nicolas Chauvat

The EuroSciPy 2010 conference will be held in Paris from July 8th to 11th at Ecole Normale Supérieure. Two days of tutorials, two days of conference, two interesting keynotes, a lightning talk session, an open space for collaboration and sprinting, thirty quality talks in the schedule and already 100 delegates registered.

If you are doing science and using Python, you want to be there!


Salomé accepted into Debian unstable

2010/06/03 by Andre Espaze

Salomé is a platform for pre and post-processing of numerical simulation available at http://salome-platform.org/. It is now available as a Debian package http://packages.debian.org/source/sid/salome and should soon appear in Ubuntu https://launchpad.net/ubuntu/+source/salome as well.


A difficult packaging work

A first package of Salomé 3 was made by the courageous Debian developer Adam C. Powell, IV in January 2008. Such packaging is very resource intensive because many modules have to be built. But the most difficult part was to bring Salomé to an unported environment. Even today, Salomé 5 binaries are only provided by upstream as a stand-alone piece of software ready to unpack on Debian Sarge/Etch or Mandriva 2006/2008. This is the first reason why several patches were required to adapt the code to new versions of the dependencies. Version 3 of Salomé was so difficult and time consuming to package that Adam decided to stop for two years.

The packaging of Salomé resumed with version 5.1.3 in January 2010. Thanks to Logilab and the OpenHPC project, I could join him for 14 weeks of work adapting every module to Debian unstable. Porting to the new versions of the dependencies was a first step, but we also had to adapt the code to the Debian packaging philosophy, with binaries, libraries and data shipped to dedicated directories.

A promising future

Now that Salomé has been accepted into Debian unstable, porting it to Ubuntu should follow in the near future. Moreover, the work done to adapt Salomé to a GNU/Linux distribution may help developers on other platforms as well.

That is excellent news for everyone involved in numerical simulation because they will have access to Salomé through their package management tools. It will help spread Salomé to any fresh install and, moreover, keep it up to date.

Join the fun

For mechanical engineers, a derived product called Salomé-Méca has recently been published. The goal is to bring the functionalities of the Code Aster finite element solver to Salomé in order to ease simulation workflows. If you are also interested in Debian packages for those tools, you are invited to come with us and join the fun.

I have submitted a proposal to talk about Salomé at EuroSciPy 2010. I look forward to meeting other interested parties during this conference, which will take place in Paris on July 8th-11th.


Enable and disable encrypted swap - Ubuntu

2010/05/18 by Arthur Lutz

With the release of Ubuntu Lucid Lynx, the use of an encrypted /home is becoming a pretty common and simple to setup thing. This is good news for privacy reasons obviously. The next step which a lot of users are reluctant to accomplish is the use of an encrypted swap. One of the most obvious reasons is that in most cases it breaks the suspend and hibernate functions.

Here is a little HOWTO on how to switch from normal swap to encrypted swap and back. That way, when you need a secure laptop (trip to a conference, or situation with risk of theft) you can activate it, and then deactivate it when you're at home, for example.

Turn it on

That is pretty simple:

sudo ecryptfs-setup-swap

Turn it off


The idea is to turn off swap, remove the ecryptfs layer, reformat your partition with normal swap and enable it. We use sda5 as an example for the swap partition, please use your own (fdisk -l will tell you which swap partition you are using - or in /etc/crypttab)

sudo swapoff -a
sudo cryptsetup remove /dev/mapper/cryptswap1
sudo vim /etc/crypttab
*remove the /dev/sda5 line*
sudo /sbin/mkswap /dev/sda5
sudo swapon /dev/sda5
sudo vim /etc/fstab
*replace /dev/mapper/cryptswap1 with /dev/sda5*

If this is useful, you can probably stick it in a script to turn it on and off... maybe we could get an ecryptfs-unsetup-swap into ecryptfs.


The DEBSIGN_KEYID trick

2010/05/12 by Nicolas Chauvat

I have been wondering for some time why debsign would not use the DEBSIGN_KEYID environment variable that I exported from my bashrc. Debian bug 444641 explains the trick: debsign ignores environment variables and sources ~/.devscripts instead. A simple export DEBSIGN_KEYID=ABCDEFG in ~/.devscripts is enough to get rid of the -k argument once and for all.


pylint bug days #2 report

2010/04/19 by Sylvain Thenault

First of all, I have to say that the pylint bugs day wasn't that successful as a 'community event': I sprinted almost alone. My fellow Logilab developers were tied up with customer projects, and nobody from outside showed up on jabber. Fortunately, Tarek Ziade came to visit us, and that was a nice opportunity to talk about pylint, distribute, etc. Thank you Tarek, you saved my day ;)

As I felt a bit alone, I decided to work on something more fun than bug fixing: refactoring!

First, I've greatly simplified the command line: enable-msg/enable-msg-cat/enable-checker/enable-report and their disable-* counterparts were all merged into single --enable/--disable options.

I've also simplified the "pylint --help" output, providing a --long-help option to get what we had before. Generic support for this lives in logilab.common.configuration, of course.

And last but not least, I refactored pylint so we can have multiple checkers with the same name. The idea behind this is that we can split checkers into smaller chunks, each basically responsible for only one or a few related messages. When pylint runs, it only uses the checkers needed for the activated messages and reports. Once all checkers have been split, this should improve the performance of "pylint --errors-only".

So, I can say I'm finally happy with the results of that pylint bugs day! And hopefully there will be more of us for the next edition...


Virtualenv - Play safely with a Python

2010/03/26 by Alain Leufroy

virtualenv, pip and Distribute are three tools that help developers and packagers. In this short presentation we will see some virtualenv capabilities.

Please keep in mind that everything below was done using Debian Lenny, python 2.5 and virtualenv 1.4.5.

Abstract

virtualenv builds python sandboxes where it is possible to do whatever you want as a simple user without putting your global environment in jeopardy.

virtualenv allows you to safely:

  • install any python packages
  • add debug lines everywhere (not only in your scripts)
  • switch between python versions
  • try your code as if you were an end user
  • and so on ...

Install and usage

Install

Preferred way

Just download the virtualenv python script at http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py and call it using python (e.g. python virtualenv.py).

For convenience, we will refer to this script as virtualenv.

Other ways

For Debian (ubuntu as well) addicts, just do :

$ sudo aptitude install python-virtualenv

Fedora users would do:

$ sudo yum install python-virtualenv

And others can install from PyPI (as superuser):

$ pip install virtualenv

or

$ easy_install pip && pip install virtualenv

You could also get the source here.

Quick Guide

To work in a python sandbox, do as follows:

$ virtualenv my_py_env
$ source my_py_env/bin/activate
(my_py_env)$ python

"That's all Folks !"

Once you have finished just do:

(my_py_env)$ deactivate

or quit the tty.

What does virtualenv actually do ?

At creation time

Let's start again ... more slowly. Consider the following environment:

$ pwd
/home/you/some/where
$ ls

Now create a sandbox called my-sandbox:

$ virtualenv my-sandbox
New python executable in "my-sandbox/bin/python"
Installing setuptools............done.

The output said that you have a new python executable and specific install tools. Your current directory now looks like:

$ ls -Cl
my-sandbox/ README
$ tree -L 3 my-sandbox
my-sandbox/
|-- bin
|   |-- activate
|   |-- activate_this.py
|   |-- easy_install
|   |-- easy_install-2.5
|   |-- pip
|   `-- python
|-- include
|   `-- python2.5 -> /usr/include/python2.5
`-- lib
    `-- python2.5
        |-- ...
        |-- orig-prefix.txt
        |-- os.py -> /usr/lib/python2.5/os.py
        |-- re.py -> /usr/lib/python2.5/re.py
        |-- ...
        |-- site-packages
        |   |-- easy-install.pth
        |   |-- pip-0.6.3-py2.5.egg
        |   |-- setuptools-0.6c11-py2.5.egg
        |   `-- setuptools.pth
        |-- ...

In addition to the new python executable and the install tools, you have a whole new python environment containing libraries, a site-packages/ directory (where your packages will be installed), a bin directory, ...

Note:
virtualenv does not create every file needed to get a whole new python environment. It uses links to global environment files instead, in order to save disk space and speed up the sandbox creation. Therefore, a working python environment must already be installed on your system.

At activation time

At this point you have to activate the sandbox in order to use your custom python. Once activated, python still has access to the global environment but will look at your sandbox first for python's modules:

$ source my-sandbox/bin/activate
(my-sandbox)$ which python
/home/you/some/where/my-sandbox/bin/python
(my-sandbox)$ echo $PATH
/home/you/some/where/my-sandbox/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
(my-sandbox)$ python -c 'import sys;print sys.prefix;'
/home/you/some/where/my-sandbox
(my-sandbox)$ python -c 'import sys;print "\n".join(sys.path)'
/home/you/some/where/my-sandbox/lib/python2.5/site-packages/setuptools-0.6c8-py2.5.egg
[...]
/home/you/some/where/my-sandbox
/home/you/personal/PYTHONPATH
/home/you/some/where/my-sandbox/lib/python2.5/
[...]
/usr/lib/python2.5
[...]
/home/you/some/where/my-sandbox/lib/python2.5/site-packages
[...]
/usr/local/lib/python2.5/site-packages
/usr/lib/python2.5/site-packages
[...]

First of all, a (my-sandbox) message is automatically added to your prompt in order to make it clear that you're using a python sandbox environment.

Secondly, my-sandbox/bin/ is added to your PATH. So, running python calls the specific python executable placed in my-sandbox/bin.
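From inside Python, a script can also check whether it is running in a sandbox by looking at the interpreter prefixes. The helper below is a sketch with a hypothetical name (in_sandbox); it relies on sys.real_prefix, which older virtualenv releases set, and on sys.base_prefix, which the standard-library venv module and recent virtualenv releases use.

```python
import sys


def in_sandbox():
    """Return True when the running interpreter belongs to a sandbox."""
    # Older virtualenv releases record the original prefix as sys.real_prefix.
    if hasattr(sys, 'real_prefix'):
        return True
    # The standard-library venv module and recent virtualenv releases
    # instead make sys.base_prefix differ from sys.prefix.
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix
```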

Note
It is possible to improve the sandbox isolation by ignoring the global paths and your PYTHONPATH (see Improve isolation section).

Installing package

It is possible to install any packages in the sandbox without any superuser privilege. For instance, we will install the pylint development revision in the sandbox.

Suppose that you have the pylint stable version already installed in your global environment:

(my-sandbox)$ deactivate
$ python -c 'from pylint.__pkginfo__ import version;print version'
0.18.0

Once your sandbox is activated, install the development revision of pylint as an update:

$ source /home/you/some/where/my-sandbox/bin/activate
(my-sandbox)$ pip install -U hg+http://www.logilab.org/hg/pylint#egg=pylint-0.19

The new package and its dependencies are only installed in the sandbox:

(my-sandbox)$ python -c 'import pylint.__pkginfo__ as p;print p.version, p.__file__'
0.19.0 /home/you/some/where/my-sandbox/lib/python2.6/site-packages/pylint/__pkginfo__.pyc
(my-sandbox)$ deactivate
$ python -c 'import pylint.__pkginfo__ as p;print p.version, p.__file__'
0.18.0 /usr/lib/pymodules/python2.6/pylint/__pkginfo__.pyc

You can safely make any change in the new pylint code or in other sandboxed packages because your global environment remains unchanged.

Useful options

Improve isolation

As said before, your sandboxed python sys.path still references the global system paths. You can however hide them by:

  • either using the --no-site-packages option, which does not give the sandbox access to the global site-packages directory
  • or changing your PYTHONPATH in my-sandbox/bin/activate in the same way as for PATH (see tips)
$ virtualenv --no-site-packages closedPy
$ sed -i '9i PYTHONPATH="$_OLD_PYTHON_PATH"
      9i export PYTHONPATH
      9i unset _OLD_PYTHON_PATH
      40i _OLD_PYTHON_PATH="$PYTHONPATH"
      40i PYTHONPATH="."
      40i export PYTHONPATH' closedPy/bin/activate
$ source closedPy/bin/activate
(closedPy)$ python -c 'import sys; print "\n".join(sys.path)'
/home/you/some/where/closedPy/lib/python2.5/site-packages/setuptools-0.6c8-py2.5.egg
/home/you/some/where/closedPy
/home/you/some/where/closedPy/lib/python2.5
/home/you/some/where/closedPy/lib/python2.5/plat-linux2
/home/you/some/where/closedPy/lib/python2.5/lib-tk
/home/you/some/where/closedPy/lib/python2.5/lib-dynload
/usr/lib/python2.5
/usr/lib64/python2.5
/usr/lib/python2.5/lib-tk
/home/you/some/where/closedPy/lib/python2.5/site-packages
(closedPy)$ deactivate

This way, you'll get an even more isolated sandbox, just as with a brand new python environment.

Work with different versions of Python

It is possible to dedicate a sandbox to a particular version of python by using the --python=PYTHON_EXE option, which specifies the interpreter to use (the default is the interpreter that virtualenv was installed with, e.g. /usr/bin/python):

$ virtualenv --python=python2.4 pyver24
$ source pyver24/bin/activate
(pyver24)$ python -V
Python 2.4.6
(pyver24)$ deactivate
$ virtualenv --python=python2.5 pyver25
$ source pyver25/bin/activate
(pyver25)$ python -V
Python 2.5.2
(pyver25)$ deactivate

Distribute a sandbox

To distribute your sandbox, you must use the --relocatable option, which makes an existing sandbox relocatable. This fixes up scripts and makes all .pth files relative. This option should be called just before you distribute the sandbox (each time you have changed something in your sandbox).

An important point is that the host system should be similar to your own.

Tips

Speed up sandbox manipulation

Add these functions to your .bashrc to help you use virtualenv and automate the creation and activation processes.

rel2abs() {
#from http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2005-01/0206.html
  [ "$#" -eq 1 ] || return 1
  ls -Ld -- "$1" > /dev/null || return
  dir=$(dirname -- "$1" && echo .) || return
  dir=$(cd -P -- "${dir%??}" && pwd -P && echo .) || return
  dir=${dir%??}
  file=$(basename -- "$1" && echo .) || return
  file=${file%??}
  case $dir in
    /) printf '%s\n' "/$file";;
    /*) printf '%s\n' "$dir/$file";;
    *) return 1;;
  esac
  return 0
}
function activate(){
    if [[ "$1" == "--help" ]]; then
        echo -e "usage: activate PATH\n"
        echo -e "Activate the sandbox where PATH points inside of.\n"
        return
    fi
    if [[ "$1" == '' ]]; then
        local target=$(pwd)
    else
        local target=$(rel2abs "$1")
    fi
    until  [[ "$target" == '/' ]]; do
        if test -e "$target/bin/activate"; then
            source "$target/bin/activate"
            echo "$target sandbox activated"
            return
        fi
        target=$(dirname "$target")
    done
    echo 'no sandbox found'
}
function mksandbox(){
    if [[ "$1" == "--help" ]]; then
        echo -e "usage: mksandbox NAME\n"
        echo -e "Create and activate a highly isolated sandbox named NAME.\n"
        return
    fi
    local name='sandbox'
    if [[ "$1" != "" ]]; then
        name="$1"
    fi
    if [[ -e "$name/bin/activate" ]]; then
        echo "$name is already a sandbox"
        return
    fi
    virtualenv --no-site-packages --clear --distribute "$name"
    sed -i '9i PYTHONPATH="$_OLD_PYTHON_PATH"
            9i export PYTHONPATH
            9i unset _OLD_PYTHON_PATH
           40i _OLD_PYTHON_PATH="$PYTHONPATH"
           40i PYTHONPATH="."
           40i export PYTHONPATH' "$name/bin/activate"
    activate "$name"
}
Note:
The virtualenv-commands and virtualenvwrapper projects add some very interesting features to virtualenv. So, keep an eye on them for more advanced features than the ones above.
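The upward search performed by the activate helper above (walk from a path towards the root until a bin/activate file shows up) can also be expressed in Python, for instance to locate a sandbox from a script. This is a sketch with a hypothetical function name; unlike the shell loop, it also checks the root directory itself before giving up.

```python
import os


def find_sandbox(start):
    """Return the nearest enclosing directory holding bin/activate, or None."""
    target = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(target, 'bin', 'activate')):
            return target
        parent = os.path.dirname(target)
        if parent == target:  # reached the filesystem root
            return None
        target = parent
```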

Conclusion

I found virtualenv to be irreplaceable for testing new configurations or working on projects with different dependencies. Moreover, I use it to learn about other python projects, to see exactly how my project interacts with its dependencies (during debugging) or to test the end user experience.

All of this stuff can be done without virtualenv but not in such an easy and secure way.

I will continue the series by introducing other useful projects to enhance your productivity : pip and Distribute. See you soon.


Astng 0.20.0 and Pylint 0.20.0 releases

2010/03/24 by Emile Anclin

We are happy to announce the Astng 0.20.0 and Pylint 0.20.0 releases.

Pylint is a static code checker based on Astng, both depending on logilab-common 0.49.

Astng

Astng 0.20.0 is a major refactoring: instead of parsing and modifying the syntax tree generated from python's _ast or compiler.ast modules, the syntax tree is rebuilt. Thus the code becomes much clearer, and all monkey patching will eventually disappear from this module.

The speed improvement is achieved by caching the parsed modules earlier to avoid double parsing and by avoiding some repeated inferences, while also fixing a lot of important bugs.

Pylint

Pylint 0.20.0 uses the new Astng, and fixes a lot of bugs too, adding some new functionality:

  • parameters with leading "_" shouldn't count as "local" variables
  • warn on assert( a, b )
  • warning if return or break inside a finally
  • specific message for NotImplemented exception
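To show why three of these checks are useful, here is a short sketch of the pitfalls they presumably target. The function names are made up for the illustration, and the reading of the NotImplemented item (raising the NotImplemented constant instead of NotImplementedError) is an assumption.

```python
def tuple_assert_is_always_true():
    # `assert (a, b)` tests a two-element tuple, which is always truthy,
    # so the intended check on `a` never runs.
    condition, message = False, "never shown"
    return bool((condition, message))


def return_in_finally():
    try:
        raise ValueError("lost")
    finally:
        # Returning here silently discards the ValueError.
        return "exception swallowed"


def raise_not_implemented():
    # NotImplemented is a constant, not an exception class; the intended
    # exception is NotImplementedError.
    try:
        raise NotImplemented
    except TypeError:
        return "TypeError"
```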

We would like to thank Chmouel Boudjnah, Johnson Fletcher, Daniel Harding, Jonathan Hartley, Colin Moris, Winfried Plapper, Edward K. Ream and Pierre Rouleau for their contributions, and all other people helping the project to progress.


pylint bugs day #2 on april 16, 2010

2010/03/22 by Sylvain Thenault

Hey guys,

we'll hold the next pylint bugs day on April 16th, 2010 (Friday). If some of you want to come and work with us in our Paris office, you'll be very welcome.

Otherwise, you can still join us on jabber / irc:

See you then!


PostgreSQL on windows : plpythonu and "specified module could not be found" error

2010/03/22 by Alexandre Fayolle

I recently had to (remotely) debug an issue on windows involving PostgreSQL and PL/Python. Basically, there were two very similar computers, with Python 2.5 installed via python(x,y) and PostgreSQL 8.3.8 installed via the binary installer. On the first machine, create language plpythonu; worked like a charm, while on the other one it failed with C:\Program Files\Postgresql\8.3\plpython.dll: specified module could not be found. This is caused by the dynamic linker not finding some DLL. Using Depends.exe showed that plpython.dll looks for python25.dll (the one it was built against in the 8.3.8 installer), but that the DLL was there.

I'll skip the various things we tried and jump directly to the solution. After much head scratching, it turned out that the first computer had TortoiseHg installed. This caused C:\Program Files\TortoiseHg to be included in the System PATH environment variable, and that directory contains python25.dll. On the other hand, C:\Python25 was in the user's PATH environment variable on both computers. As the database Windows service runs using a dedicated local account (typically with login postgres), it would not have C:\Python25 in its PATH, but if TortoiseHg was there, it would find the DLL in some other directory. So the solution was to add C:\Python25 to the system PATH.
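When chasing this kind of problem, it helps to list which directories of a given PATH value actually contain the DLL. The helper below is a sketch with a hypothetical name (find_on_path); note that PATH is only one part of the Windows DLL search order, so an empty result does not prove the DLL cannot be loaded.

```python
import os


def find_on_path(filename, path_value, sep=os.pathsep):
    """Return every directory listed in path_value that contains filename."""
    hits = []
    for directory in path_value.split(sep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

To be meaningful here, a call such as find_on_path('python25.dll', os.environ['PATH']) has to run under the account used by the service, since that account does not see the user's PATH.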


Launching Python scripts via Condor

2010/02/17 by Alexandre Fayolle
http://farm2.static.flickr.com/1362/1402963775_0185d2e62f.jpg

As part of an ongoing customer project, I've been learning about the Condor queue management system (actually it is more than just a batch queue management system, tackling the High-throughput computing problem, but in my current project, we're not using the full possibilities of Condor, and the choice was dictated by other considerations outside the scope of this note). The documentation is excellent, and the features of the product are really amazing (pity the project runs on Windows, and we cannot use 90% of these...).

To launch a job on a computer participating in the Condor farm, you just have to write a job file which looks like this:

Universe=vanilla
Executable=$path_to_executable
Arguments=$arguments_to_the_executable
InitialDir=$working_directory
Log=$local_logfile_name
Output=$local_file_for_job_stdout
Error=$local_file_for_job_stderr
Queue

and then run condor_submit my_job_file and use condor_q to monitor the status of your job (queued, running...)

My program generates Condor job files and submits them, and I spent hours yesterday trying to understand why they were all failing: the stderr file contained a message from Python complaining that it could not import site and exiting.

A point which was not clear in the documentation I read (but I probably overlooked it) is that the executable mentioned in the job file is supposed to be a local file on the submission host which is copied to the computer running the job. In the jobs generated by my code, I was using sys.executable for the Executable field, and a path to the python script I wanted to run in the Arguments field. This resulted in the Python interpreter being copied to the execution host and not being able to run because it could not find the standard files it needs at startup.

Once I figured this out, the fix was easy: I made my program write a batch script which launched the Python script and changed the job to run that script.

UPDATE : I'm told there is a Transfer_executable=False line I could have put in the script to achieve the same thing.
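The fix described above can be sketched roughly as follows. This is only an illustration, not the code of the actual project: the function names, the file names and the paths are hypothetical, and it assumes that a Python interpreter is already installed at the same location on every execution host.

```python
def make_wrapper_script(python_exe, script_path):
    """Return the text of a Windows batch script that launches the Python
    script with an interpreter already installed on the execution host.
    Both paths are hypothetical examples."""
    # %%* becomes %* after formatting: it forwards all job arguments.
    return '@echo off\r\n"%s" "%s" %%*\r\n' % (python_exe, script_path)


def make_job_file(executable, arguments, working_dir, basename):
    """Return the text of a vanilla-universe job file using the same
    fields as the job file shown earlier in this post."""
    lines = [
        "Universe=vanilla",
        "Executable=%s" % executable,   # the small batch script, not python.exe
        "Arguments=%s" % arguments,
        "InitialDir=%s" % working_dir,
        "Log=%s.log" % basename,
        "Output=%s.out" % basename,
        "Error=%s.err" % basename,
        "Queue",
    ]
    return "\n".join(lines) + "\n"


wrapper = make_wrapper_script(r"C:\Python25\python.exe", r"C:\jobs\compute.py")
job = make_job_file("run_job.bat", "--size 10", r"C:\jobs", "job_001")
print(job)
```

The point of the design is that only the tiny batch script gets copied to the execution host, while the interpreter it calls is the one installed there, with its standard library next to it.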

(photo by gudi&cris licenced under CC-BY-ND)


Why you should get rid of os.system, os.popen, etc. in your code

2010/02/12 by Alexandre Fayolle

I regularly come across code such as:

output = os.popen('diff -u %s %s' % (appl_file, ref_file), 'r')

Code like this might well work on your machine but it is buggy and will fail (preferably during the demo or once shipped).

Where is the bug?

It is in the use of %s, which can inject into your command any string you want and also strings you don't want. The problem is that you probably did not check appl_file and ref_file for weird things (spaces, quotes, semicolons...). Putting quotes around the %s in the string will not solve the issue.

So what should you do? The answer is "use the subprocess module": subprocess.Popen takes a list of arguments as first parameter, which are passed as-is to the new process creation system call of your platform, and not interpreted by the shell:

pipe = subprocess.Popen(['diff', '-u', appl_file, ref_file], stdout=subprocess.PIPE)
output = pipe.stdout

By now, you should have guessed that the shell=True parameter of subprocess.Popen should not be used unless you really really need it (and even then, I encourage you to question that need).
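To make the point concrete, here is a small sketch that passes a deliberately nasty file name as a list item. So that it runs anywhere, the child process is simply the Python interpreter printing the arguments it received (an assumption of this example, standing in for diff); the names show_args and weird_name are made up for the illustration.

```python
import subprocess
import sys


def show_args(*args):
    """Run a child Python process that prints each argument it received
    on its own line, and return those lines."""
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    pipe = subprocess.Popen([sys.executable, '-c', child_code] + list(args),
                            stdout=subprocess.PIPE)
    output, _ = pipe.communicate()
    return output.decode().splitlines()


# Spaces, a semicolon and quotes: everything a shell would choke on.
weird_name = "my file; rm -rf ~ 'oops'.txt"
received = show_args(weird_name, 'ref.txt')
print(received)
```

The child sees exactly two arguments, the first one being the weird name untouched, because no shell ever gets a chance to split or interpret it.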


Apycot for Mercurial

2010/02/11 by Pierre-Yves David
http://www.logilab.org/image/20439?vid=download

What is apycot

apycot is a highly extensible test automation tool used for Continuous Integration. It can:

  • download the project from a version controlled repository (like SVN or Hg);
  • install it from scratch with all dependencies;
  • run various checkers;
  • store the results in a CubicWeb database;
  • post-process the results;
  • display the results in various formats (html, xml, pdf, mail, RSS...);
  • repeat the whole procedure with various configurations;
  • get triggered by new changesets or run periodically.

For an example, take a look at the "test reports" tab of the logilab-common project.

Setting up an apycot for Mercurial

During the mercurial sprint, we set up a proof-of-concept environment running six different checkers:

  • Check syntax of all python files.
  • Check syntax of all documentation files.
  • Run pylint on the mercurial source code with the mercurial pylintrc.
  • Run the check-code.py script included in mercurial, which checks for style and python errors.
  • Run Mercurial's test suite.
  • Run Mercurial's benchmark on a reference repository.

The first three checkers, shipped with apycot, were set up quickly. The last three are mercurial specific and required a few additional tweaks to be integrated into apycot.

The bot was set up to run with all public mercurial repositories. Five checkers immediately proved useful as they pointed out some errors or warnings (on some rarely used contrib files it even found a syntax error).

Perspectives

A public instance is being set up. It will provide features that the community is looking forward to:

  • testing all python versions;
  • running pure python or the C variant;
  • code coverage of the test suite;
  • performance history.

Conclusion

apycot proved to be highly flexible and could quickly be adapted to Mercurial's test suite, even by people new to apycot. The advantages of continuously running different long-running tests are obvious. So apycot seems to be a very valuable tool for improving the software development process.


SCons presentation in 5 minutes

2010/02/09 by Andre Espaze
http://www.scons.org/scons-logo-transparent.png

Building software with SCons requires Python and SCons to be installed.

As SCons is only made of Python modules, the sources may be shipped with your project if your clients cannot install dependencies. All the following examples can be downloaded at the end of this post.

A building tool for every file extension

First, a Fortran 77 program made of two files will be built:

$ cd fortran-project
$ scons -Q
gfortran -o cfib.o -c cfib.f
gfortran -o fib.o -c fib.f
gfortran -o compute-fib cfib.o fib.o
$ ./compute-fib
 First 10 Fibonacci numbers:
  0.  1.  1.  2.  3.  5.  8. 13. 21. 34.

The '-Q' option tells SCons to be less verbose. To clean the project, add the '-c' option:

$ scons -Qc
Removed cfib.o
Removed fib.o
Removed compute-fib

This first example shows that SCons finds the 'gfortran' tool from the file extension. Have a look at the user's manual if you want to set a particular tool.

Describing the construction with Python objects

A second program, written in C, runs its test directly from the SCons file thanks to an added test command:

$ cd c-project
$ scons -Q run-test
gcc -o test.o -c test.c
gcc -o fact.o -c fact.c
ar rc libfact.a fact.o
ranlib libfact.a
gcc -o test-fact test.o libfact.a
run_test(["run-test"], ["test-fact"])
OK

However running scons alone builds only the main program:

$ scons -Q
gcc -o main.o -c main.c
gcc -o compute-fact main.o libfact.a
$ ./compute-fact
Computing factorial for: 5
Result: 120

This second example shows that the construction dependencies are described by passing Python objects. An interesting point is the possibility of adding your own Python functions to the build process.

Hierarchical build with environment

A third program, written in C++, creates a shared library used by two different programs: the main application and a test suite. The main application can be built by:

$ cd cxx-project
$ scons -Q
g++ -o main.o -c -Imbdyn-src main.cxx
g++ -o mbdyn-src/nodes.os -c -fPIC -Imbdyn-src mbdyn-src/nodes.cxx
g++ -o mbdyn-src/solver.os -c -fPIC -Imbdyn-src mbdyn-src/solver.cxx
g++ -o mbdyn-src/libmbdyn.so -shared mbdyn-src/nodes.os mbdyn-src/solver.os
g++ -o mbdyn main.o -Lmbdyn-src -lmbdyn

This shows that SCons takes care of the compilation flags needed to create a shared library with the chosen tool (-fPIC). Moreover, extra environment variables have been given (CPPPATH, LIBPATH, LIBS), which are all translated for the chosen tool. All those variables can be found in the user's manual or in the man page. The test suite is built and run by giving an extra variable:

$ TEST_CMD="LD_LIBRARY_PATH=mbdyn-src ./%s" scons -Q run-tests
g++ -o tests/run_all_tests.o -c -Imbdyn-src tests/run_all_tests.cxx
g++ -o tests/test_solver.o -c -Imbdyn-src tests/test_solver.cxx
g++ -o tests/all-tests tests/run_all_tests.o tests/test_solver.o -Lmbdyn-src -lmbdyn
run_test(["tests/run-tests"], ["tests/all-tests"])
OK

Conclusion

It is rather convenient to build software by manipulating Python objects, and custom actions can be added to the process. SCons also has a configuration mechanism working like autotools macros, which is described in the user's manual.


Extended 256 colors in bash prompt

2010/02/07 by Nicolas Chauvat

The Mercurial 1.5 sprint is taking place in our offices this weekend and pair-programming with Steve made me want a better looking terminal. Have you seen his extravagant zsh prompt? I used to have only 8 colors to decorate my shell prompt, but thanks to some time spent playing around, I now have 256.

Here is what I used to have in my bashrc for 8 colors:

NO_COLOUR="\[\033[0m\]"
LIGHT_WHITE="\[\033[1;37m\]"
WHITE="\[\033[0;37m\]"
GRAY="\[\033[1;30m\]"
BLACK="\[\033[0;30m\]"

RED="\[\033[0;31m\]"
LIGHT_RED="\[\033[1;31m\]"
GREEN="\[\033[0;32m\]"
LIGHT_GREEN="\[\033[1;32m\]"
YELLOW="\[\033[0;33m\]"
LIGHT_YELLOW="\[\033[1;33m\]"
BLUE="\[\033[0;34m\]"
LIGHT_BLUE="\[\033[1;34m\]"
MAGENTA="\[\033[0;35m\]"
LIGHT_MAGENTA="\[\033[1;35m\]"
CYAN="\[\033[0;36m\]"
LIGHT_CYAN="\[\033[1;36m\]"

# set a fancy prompt
export PS1="${RED}[\u@\h \W]\$${NO_COLOUR} "

Just put the following lines in your bashrc to get the 256 colors:

function EXT_COLOR () { echo -ne "\[\033[38;5;$1m\]"; }

# set a fancy prompt
export PS1="`EXT_COLOR 172`[\u@\h \W]\$${NO_COLOUR} "

Yay, I now have an orange prompt! I now need to write a script that will display useful information depending on the context. Displaying the status of the mercurial repository I am in might be my next step.
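To pick a number other than 172, it helps to see all 256 colors at once. The following Python sketch prints them using the same 38;5;N escape sequence as the EXT_COLOR function above. The names ext_color and color_table are made up for this example, and the \[ \] markers are left out because they only make sense inside a bash prompt.

```python
RESET = "\033[0m"


def ext_color(code):
    """Return the escape sequence selecting foreground color `code` (0-255)."""
    if not 0 <= code <= 255:
        raise ValueError("color code must be in 0..255, got %r" % code)
    return "\033[38;5;%dm" % code


def color_table(columns=16):
    """Return a text table where each color number is drawn in its own color."""
    rows = []
    for start in range(0, 256, columns):
        cells = ["%s%3d%s" % (ext_color(n), n, RESET)
                 for n in range(start, start + columns)]
        rows.append(" ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(color_table())
```

Run it in the terminal you actually use: if the terminal does not support 256 colors, the numbers will not all show up in distinct colors.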


We're happy to host the mercurial Sprint

2010/02/02 by Arthur Lutz
http://farm1.static.flickr.com/183/419945378_4ead41a76d_m.jpg

We're very happy to be hosting the next mercurial sprint in our brand new offices in central Paris. It is quite an honor to be chosen when the other contender was Google.

So a bunch of mercurial developers are heading out to our offices this coming Friday to sprint for three days on mercurial. We use mercurial a lot here over at Logilab and we also contribute a tool to visualize and manipulate a mercurial repository: hgview.

To check out the things that we will be working on with the mercurial crew, check out the program of the sprint on their wiki.

What is a sprint? "A sprint (sometimes called a Code Jam or hack-a-thon) is a short time period (three to five days) during which software developers work on a particular chunk of functionality. "The whole idea is to have a focused group of people make progress by the end of the week," explains Jeff Whatcott" [source]. For geographically distributed open source communities, it is also a way of physically meeting and working in the same room for a period of time.

Sprinting is a practice that we encourage at Logilab. With CubicWeb, we organize open sprints as often as possible, which is an opportunity for users and developers to come and code with us. We even use the sprint format for some internal stuff.

photo by Sebastian Mary under creative commons licence.


hgview 1.2.0 released

2010/01/21 by David Douard

Here at last is the release of version 1.2.0 of hgview.

http://www.logilab.org/image/19894?vid=download

In a nutshell, this release includes:

  • a basic support for mq extension,
  • a basic support for hg-bfiles extension,
  • working directory is now displayed as a node of the graph (if there are local modifications of course),
  • it's now possible to display only the subtree from a given revision (a bit like hg log -f)
  • it's also possible to activate an annotate view (which makes navigation slower, however),
  • several improvements in the graph filling and rendering mechanisms,
  • I also added toolbar icons for the search and goto "quickbars" so they are not "hidden" any more from those reluctant to read user manuals,
  • it's now possible to go directly to the common ancestor of 2 revisions,
  • when on a merge node, it's now possible to choose the parent the diff is computed against,
  • make search also search in commit messages (it used to search only in diff contents),
  • and several bugfixes of course.
Notes:
there are packages for debian lenny, squeeze and sid, and for ubuntu hardy, intrepid, jaunty and karmic. However, for lenny and hardy, the provided packages won't work on pure distributions since hgview 1.2 depends on mercurial 1.1. Thus for these 2 distributions, packages will only work if you have installed backported mercurial packages.

New supported repositories for Debian and Ubuntu

2010/01/21 by Arthur Lutz

For the release of hgview 1.2.0 in our Karmic Ubuntu repository, we would like to announce that we are now going to generate packages for the following distributions:

  • Debian Lenny (because it's stable)
  • Debian Sid (because it's the dev branch)
  • Ubuntu Hardy (because it has Long Term Support)
  • Ubuntu Karmic (because it's the current stable)
  • Ubuntu Lucid (because it's the next stable) - no repo yet, but soon...
http://img.generation-nt.com/ubuntulogo_0080000000420571.png

The old packages for the previously supported distributions are still accessible (etch, jaunty, intrepid), but new versions will not be generated for these repositories. Packages will be coming in as versions get released. If you need a package before that, give us a shout and we'll see what we can do.

For instructions on how to use the repositories for Ubuntu or Debian, go to the following page: http://www.logilab.org/card/LogilabDebianRepository


Open Source/Design Hardware

2009/12/13 by Nicolas Chauvat
http://www.logilab.org/image/19338?vid=download

I have been doing free software since I discovered it existed. I bought an OpenMoko some time ago, since I am interested in anything that is open, including artwork like books, music, movies and... hardware.

I just learned about two lists, one at Wikipedia and another one at MakeOnline, but Google has more. Explore and enjoy!


Solution to a common Mercurial task

2009/12/10 by David Douard

An interesting question has just been sent by Greg Ward on the Mercurial devel mailing-list (as a funny coincidence, it happened that I had to solve this problem a few days ago).

Let me quote his message:

here's my problem: imagine a customer is running software built from
changeset A, and we want to upgrade them to a new version, built from
changeset B.  So I need to know what bugs are fixed in B that were not
fixed in A.  I have already implemented a changeset/bug mapping, so I
can trivially lookup the bugs fixed by any changeset.  (It even handles
"ongoing" and "reverted" bugs in addition to "fixed".)

And he gives an example of a situation where a tricky case may be found:

                +--- 75 -- 78 -- 79 ------------+
               /                                 \
              /     +-- 77 -- 80 ---------- 84 -- 85
             /     /                        /
0 -- ... -- 74 -- 76                       /
                   \                      /
                    +-- 81 -- 82 -- 83 --+

So what is the problem?

Imagine the latest distributed stable release is built on rev 81. Now, I need to publish a new bugfix release based on this latest stable version, including every changeset that is a bugfix but that has not yet been applied at revision 81.

So the first problem we need to solve is answering: which revisions are ancestors of revision 85 but not ancestors of revision 81?

Command line solution

Using hg commands, the solution is proposed by Steve Losh:

hg log --template '{rev}\n' --rev 85:0 --follow --prune 81

or better, as suggested by Matt:

hg log -q --template '{rev}\n' --rev 85:0 --follow --prune 81

The second is better since it only reads the index, and thus is much faster. But on big repositories, this command remains quite slow (with Greg's situation, a repo of more than 100000 revisions, the command takes more than 2 minutes).

Python solution

Using Python, one may think about using revlog.nodesbetween(), but it won't work as wanted here, since it does not list revisions 75, 78 and 79.

On the mailing list, Matt gave the simplest and most efficient solution:

cl = repo.changelog
a = set(cl.ancestors(81))
b = set(cl.ancestors(85))
revs = b - a
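To see why the set difference gives the right answer, here is a standalone sketch that does not need Mercurial at all. It rebuilds the graph drawn above as a plain dictionary (revision 74 stands for the whole 0..74 line) and uses a hand-written ancestors function. In this toy version a revision counts as its own ancestor, which is an assumption of the example and may differ from what the changelog method returns.

```python
# Parents of each revision, read off the ASCII graph above.
PARENTS = {
    74: [], 75: [74], 76: [74], 77: [76], 78: [75], 79: [78],
    80: [77], 81: [76], 82: [81], 83: [82], 84: [80, 83], 85: [79, 84],
}


def ancestors(rev, parents=PARENTS):
    """Return the set of all ancestors of `rev`, `rev` itself included."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


missing = sorted(ancestors(85) - ancestors(81))
print(missing)  # [75, 77, 78, 79, 80, 82, 83, 84, 85]
```

Revisions 75, 78 and 79 are in the result, and 74, 76 and 81 are left out since the release built on 81 already contains them.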

Idea for a new extension

Using this simple python code, it should be easy to write a nice Mercurial extension (which could be named missingrevisions) to do this job.

It would then be interesting to also implement some filtering feature. For example, if simple conventions are used in commit messages, e.g. something like "[fix #1245]" or "[close #1245]" when the changeset is a fix for a bug listed in the bugtracker, then we may type commands like:

hg missingrevs REV -f bugfix

or:

hg missingrevs REV -h HEADREV -f bugfix

to find bugfix revisions ancestors of HEADREV that are not ancestors of REV.

Filters (bugfix here) could be made configurable in hgrc using regexps.
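The filtering part could look like the sketch below. It is not an actual extension: the changesets are given as plain (revision, commit message) pairs with invented messages, and the regexp is just one possible way to encode the "[fix #1245]" / "[close #1245]" convention mentioned above.

```python
import re

# Hypothetical "bugfix" filter, the kind of regexp one could put in hgrc.
BUGFIX_RE = re.compile(r"\[(?:fix|close) #(\d+)\]")


def filter_bugfixes(changesets):
    """Keep the changesets whose message mentions a fixed or closed bug.

    `changesets` is an iterable of (rev, description) pairs; the result
    is a list of (rev, [bug numbers]) pairs."""
    selected = []
    for rev, description in changesets:
        bug_ids = [int(number) for number in BUGFIX_RE.findall(description)]
        if bug_ids:
            selected.append((rev, bug_ids))
    return selected


# Invented commit messages for three of the missing revisions.
sample = [
    (78, "[fix #1245] handle empty config file"),
    (79, "refactor the config loader"),
    (84, "merge [close #1300] and [fix #1301]"),
]
print(filter_bugfixes(sample))  # [(78, [1245]), (84, [1300, 1301])]
```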


pylint bug day report

2009/12/04 by Pierre-Yves David
http://farm1.static.flickr.com/85/243306920_6a12bb48c7.jpg

The first pylint bug day took place on Wednesday, November 25th. Four members of the Logilab crew and two other people spent the day working on pylint.

Several patches submitted before the bug day were processed and some tickets were closed.

Charles Hébert added James Lingard's patches for string formatting and is working on several improvements. Vincent Férotin submitted a patch for simple message listings. Sylvain Thenault fixed significant inference bugs in astng (an underlying module of pylint managing the syntax tree). Émile Anclin began a major astng refactoring to take advantage of new python2.6 functionality. For my part, I made several improvements to the test suite. I applied James Lingard's patch for the ++ operator and generalised it to -- too. I also added a new checker for function call arguments, submitted once again by James Lingard. Finally, I improved the message filtering of the --errors-only option.

We thank Maarten ter Huurne and Vincent Férotin for their participation, and of course James Lingard for submitting numerous patches.

Another pylint bug day will be held in a few months.

image under creative commons by smccann


Summary of the first Coccinelle users day

2009/11/30 by Andre Espaze

A matching and transformation tool for systems code

Coccinelle's goal is to ease code maintenance, first by revealing code smells based on design patterns and second by easing an API (Application Programming Interface) change for a heavily used library. Coccinelle can thus be seen as two tools in one: the first matches patterns, the second applies transformations. However, facing such a big problem, the project needed to define boundaries in order to increase its chances of success. The founding motivation was thus to target the Linux kernel. This choice implied a tool working on the C programming language before the preprocessor step. Moreover, the Linux code base adds interesting constraints: it is huge, contains many possible configurations depending on C macros, may contain many bugs and evolves a lot. What was the Coccinelle solution for easing kernel maintenance?

http://farm1.static.flickr.com/151/398536506_57df539ccf_m.jpg

Generating diff files from the semantic patch language

The Linux community reads a lot of diff files to follow the kernel's evolution. As a consequence, the diff file syntax is widespread and commonly understood. However, this syntax describes a particular change between two files; it does not allow matching a generic pattern.

Coccinelle's solution is to build its own language for declaring rules that describe a code pattern and a possible transformation. This language is the Semantic Patch Language (SmPL), based on the declarative approach of the diff file syntax. It allows a change rule to be propagated to many files by generating diff files. Those results can then be applied directly with the patch command, but most of the time they will be reviewed and may be slightly adapted to the programmer's needs.

A Coccinelle rule is made of two parts: a metavariable declaration and a code pattern match followed by a possible transformation. A metavariable stands for a control flow variable; its possible names inside the program do not matter. The code pattern then describes a particular control flow in the program by using the C and SmPL syntaxes to manipulate the metavariables. As a result, Coccinelle is able to generate diff files because it works on the control flow of the C program.

A complete SmPL description will not be given here because it can be found in Coccinelle's documentation. However, a brief introduction to rule declaration follows. The metavariable part looks like this:

@@
expression E;
constant C;
@@

'expression' means a variable or the result of a function, whereas 'constant' means a C constant. To negate the result of an and operation between an expression and a constant, instead of negating the expression first, the transformation part will be:

- !E & C
+ !(E & C)

A file containing several rules like this one is called a semantic patch. It is applied with the Coccinelle 'spatch' command, which generates a change written in the diff file syntax each time the above pattern is matched. The next section illustrates this way of working.
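Why the rule matters can be checked with a little arithmetic. In C, '!' binds tighter than '&', so '!E & C' is read as '(!E) & C'. The sketch below emulates both forms in Python (the helper names are made up for the example) and shows an input where they disagree.

```python
def c_not(value):
    """Emulate C logical negation: 1 if value is zero, otherwise 0."""
    return 0 if value else 1


def before_patch(e, c):
    """What `!E & C` computes in C, i.e. `(!E) & C`."""
    return c_not(e) & c


def after_patch(e, c):
    """What `!(E & C)` computes: the negation of the masked value."""
    return c_not(e & c)


for e, c in [(4, 1), (1, 1), (0, 1)]:
    print(e, c, before_patch(e, c), after_patch(e, c))
# 4 1 0 1   <- the two forms differ: bit 0 of 4 is not set
# 1 1 0 0
# 0 1 1 1
```

With E = 4 and C = 1, the unpatched form tests whether E is zero, while the patched form tests whether the masked bit is unset, which is most likely what the programmer meant.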

http://www.simplehelp.net/wp-images/icons/topic_linux.jpg

A working example on the Linux kernel 2.6.30

You can download and install the Coccinelle 'spatch' command from its website, http://coccinelle.lip6.fr/, if you want to run the following example. Let's first consider the following structure with accessors in the header 'device.h':

struct device {
    void *driver_data;
};

static inline void *dev_get_drvdata(const struct device *dev)
{
    return dev->driver_data;
}

static inline void dev_set_drvdata(struct device *dev, void* data)
{
    dev->driver_data = data;
}

It imitates the 2.6.30 kernel header 'include/linux/device.h'. Let's now consider the following client code that does not make use of the accessors:

#include <stdlib.h>
#include <assert.h>

#include "device.h"

int main()
{
    struct device devs[2], *dev_ptr;
    int data[2] = {3, 7};
    void *a = NULL, *b = NULL;

    devs[0].driver_data = (void*)(&data[0]);
    a = devs[0].driver_data;

    dev_ptr = &devs[1];
    dev_ptr->driver_data = (void*)(&data[1]);
    b = dev_ptr->driver_data;

    assert(*((int*)a) == 3);
    assert(*((int*)b) == 7);
    return 0;
}

Once this code is saved in the file 'fake_device.c', we can check that it compiles and runs with:

$ gcc fake_device.c && ./a.out

We will now create a semantic patch 'device_data.cocci' trying to add the getter accessor with this first rule:

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

The 'spatch' command is then run by:

$ spatch -sp_file device_data.cocci fake_device.c

producing the following change in a diff file:

-    devs[0].driver_data = (void*)(&data[0]);
-    a = devs[0].driver_data;
+    dev_get_drvdata(&devs[0]) = (void*)(&data[0]);
+    a = dev_get_drvdata(&devs[0]);

which illustrates how well Coccinelle works on the program's control flow. However, the transformation has also matched code where the setter accessor should be used. We will thus add a rule above the previous one, and the semantic patch becomes:

@@
struct device dev;
expression data;
@@
- dev.driver_data = data
+ dev_set_drvdata(&dev, data)

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

Running the command again will produce the wanted output:

$ spatch -sp_file device_data.cocci fake_device.c
-    devs[0].driver_data = (void*)(&data[0]);
-    a = devs[0].driver_data;
+    dev_set_drvdata(&devs[0], (void *)(&data[0]));
+    a = dev_get_drvdata(&devs[0]);

It is important to write the setter rule before the getter rule, otherwise the getter rule will be applied first to the whole file.
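The effect of rule ordering can be imitated with a toy rewriter. Be careful: this is not how Coccinelle works, since 'spatch' reasons on the control flow of the C program while the sketch below only uses regular expressions on one line of text. The rule names and patterns are invented for the illustration; only the input and output lines come from the example above.

```python
import re

# Toy text-based stand-ins for the two SmPL rules.
SETTER_RULE = (re.compile(r"(\w+(?:\[\d+\])?)\.driver_data\s*=\s*(.+?);"),
               r"dev_set_drvdata(&\1, \2);")
GETTER_RULE = (re.compile(r"(\w+(?:\[\d+\])?)\.driver_data"),
               r"dev_get_drvdata(&\1)")


def rewrite(line, rules):
    """Apply each (pattern, replacement) rule to the line, in order."""
    for pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line


assignment = "devs[0].driver_data = (void*)(&data[0]);"

# Setter first: the assignment becomes a call to the setter.
print(rewrite(assignment, [SETTER_RULE, GETTER_RULE]))
# Getter first: the left-hand side is rewritten and the setter no longer matches.
print(rewrite(assignment, [GETTER_RULE, SETTER_RULE]))
```

The second call reproduces the broken 'dev_get_drvdata(&devs[0]) = ...' line obtained with the first version of the semantic patch, because the more general rule consumed the text the specific rule was waiting for.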

At this point our semantic patch is still incomplete because it does not work on 'device' structure pointers. By using the same logic, let's add it to the 'device_data.cocci' semantic patch:

@@
struct device dev;
expression data;
@@
- dev.driver_data = data
+ dev_set_drvdata(&dev, data)

@@
struct device * dev;
expression data;
@@
- dev->driver_data = data
+ dev_set_drvdata(dev, data)

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

@@
struct device * dev;
@@
- dev->driver_data
+ dev_get_drvdata(dev)

Running Coccinelle again:

$ spatch -sp_file device_data.cocci fake_device.c

will add the remaining transformations for the 'fake_device.c' file:

-    dev_ptr->driver_data = (void*)(&data[1]);
-    b = dev_ptr->driver_data;
+    dev_set_drvdata(dev_ptr, (void *)(&data[1]));
+    b = dev_get_drvdata(dev_ptr);

but a new problem appears: the 'device.h' header is also modified. We meet here an important point of Coccinelle's philosophy described in the first section. 'spatch' is a tool to ease code maintenance by propagating a code pattern change to many files. However, the resulting diff files are supposed to be reviewed, and in our case the unwanted modification should be removed. Note that it would be possible to avoid modifying the 'device.h' header by using SmPL syntax, but the explanation would be too much for an introductory tutorial. Instead, we will simply cut out the unwanted part:

$ spatch -sp_file device_data.cocci fake_device.c | cut -d $'\n' -f 16-34

This result will now be saved to a diff file, additionally asking 'spatch' to produce it relative to the current working directory:

$ spatch -sp_file device_data.cocci -patch "" fake_device.c | \
cut -d $'\n' -f 16-34 > device_data.patch

It is now time to apply the change to get working C code that uses the accessors:

$ patch -p1 < device_data.patch

The final result for 'fake_device.c' should be:

#include <stdlib.h>
#include <assert.h>

#include "device.h"

int main()
{
    struct device devs[2], *dev_ptr;
    int data[2] = {3, 7};
    void *a = NULL, *b = NULL;

    dev_set_drvdata(&devs[0], (void *)(&data[0]));
    a = dev_get_drvdata(&devs[0]);

    dev_ptr = &devs[1];
    dev_set_drvdata(dev_ptr, (void *)(&data[1]));
    b = dev_get_drvdata(dev_ptr);

    assert(*((int*)a) == 3);
    assert(*((int*)b) == 7);
    return 0;
}

Finally, we can test that the code compiles and runs:

$ gcc fake_device.c && ./a.out

The semantic patch is now ready to be used on the Linux 2.6.30 kernel:

$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.30.tar.bz2
$ tar xjf linux-2.6.30.tar.bz2
$ spatch -sp_file device_data.cocci -dir linux-2.6.30/drivers/net/ \
  > device_drivers_net.patch
$ wc -l device_drivers_net.patch
642

You may also try the 'drivers/ieee1394' directory.

http://coccinelle.lip6.fr/img/lip6.jpg

Conclusion

Coccinelle is made of around 60 thousand lines of Objective Caml. As illustrated by the above example on the Linux kernel, the 'spatch' command succeeds in easing code maintenance. For the Coccinelle team working on the kernel code base, a semantic patch is usually around 100 lines long and will generate diffs for sometimes hundreds of files. Moreover, the processing is rather fast: the average time per file is said to be 0.7s.

Two tools using the 'spatch' engine have already been built: 'spdiff' and 'herodotos'. With the first one you could almost avoid learning the SmPL language, because the idea is to generate a semantic patch by looking at transformations between pairs of files. The second allows defects to be correlated across software versions once the corresponding code smells have been described in SmPL.

One of Coccinelle's problems is that it is not easily extensible to other languages, as the engine was designed for analyzing control flow in C programs. The C++ language may be added, but that would obviously require a lot of work. It would be great to also have such a tool for dynamic languages like Python.

image under creative commons by Rémi Vannier


pylint bug day next wednesday!

2009/11/23 by Sylvain Thenault

Remember that the first pylint bug day will be held on Wednesday, November 25, from around 8am to 8pm in the Paris (France) time zone.

We'll be a few people at Logilab and hopefully a lot of other guys all around the world, trying to make pylint better.

Join us on the #public conference room of conference.jabber.logilab.org, or if you prefer using an IRC client, join #public on irc.logilab.org which is a gateway to the jabber forum. And if you're in Paris, come to work with us in our office.

People willing to help but without knowledge of pylint internals are welcome: it's the perfect occasion to learn a lot about it, and to be able to hack on pylint in the future!


First contact with pupynere

2009/11/06 by Pierre-Yves David

I spent some time this week evaluating Pupynere, the PUre PYthon NEtcdf REader written by Roberto De Almeida. I see several advantages in pupynere.

First it's a pure Python module with no external dependency. It doesn't even depend on the NetCDF lib and it is therefore very easy to deploy.

http://www.unidata.ucar.edu/software/netcdf/netcdf1_sm.png

Second, it offers the same interface as Scientific Python's NetCDF bindings which makes transitioning from one module to another very easy.

Third, pupynere is being integrated into Scipy as the scipy.io.netcdf module. Once integrated, this could ensure a wide adoption by the python community.

Finally it's easy to dig in this clear and small code base of about 600 lines. I have just sent several fixes and bug reports to the author.

http://docs.scipy.org/doc/_static/scipyshiny_small.png

However, pupynere isn't mature yet. First, it seems pupynere has only been used for simple cases so far, and many common cases are broken. Moreover, there is no support for new NetCDF formats such as long-NetCDF and NetCDF4, and important features such as file update are still missing. In addition, the lack of a test suite is a serious issue. In my opinion, various bugs could already have been detected and fixed with simple unit tests. Contributing would be much more comfortable with the safety net offered by a test suite. I am not certain that the fixes and improvements I made this week did not introduce regressions.

To conclude, pupynere seems too young for production use. But I invite people to try it and provide feedback and fixes to the author. I'm looking forward to using this project in production in the future.


First Pylint Bug Day on Nov 25th, 2009 !

2009/10/21 by Sylvain Thenault
http://www.logilab.org/image/18785?vid=download

Since we don't stop being overloaded here at Logilab, and we've got some encouraging feedback after the "Pylint needs you" post, we decided to take some time to introduce more "community" in pylint.

And the easiest thing to do, rather sooner than later, is an irc/jabber synchronized bug day, which will be held on Wednesday, November 25. We're based in France, so the main developers will be there between around 8am and 7pm UTC+1. If a few of you guys are around Paris at this time and wish to come to Logilab to sprint with us, contact us and we'll try to make this possible.

The focus for this bug killing day could be:

  • using logilab.org tracker: getting an account, submitting tickets, triaging existing tickets...
  • using mercurial to develop pylint / astng
  • guide people in the code so they're able to fix simple bugs

We will of course also try to kill a hella-lotta bugs, but the main idea is to help whoever wants to contribute to pylint... and plan for the next bug-killing day!

As we are in the process of moving to another place, we can't organize a sprint yet, but we should have some room available for the next time, so stay tuned :)


Projman 0.14.0 includes a Graphical User Interface

2009/10/19 by Emile Anclin

Introduction

Projman is a project manager. With projman 0.14.0, the first sketch of a GUI has been updated, and important functionalities added. You can now easily see and edit task dependencies and test the resulting schedule. Furthermore, a begin-after-end-previous constraint has been added, which should really simplify editing the schedule.

The GUI can be started in the following two ways:

$ projman-gui
$ projman-gui <path/to/project.xml>

The file <path/to/project.xml> is the well known main file of a projman project. If you start projman-gui without specifying a project.xml, or once a project is already open, you can open an existing project with "File->Open". (For now, you can't create a new project with projman-gui.) You can edit the tasks and then save the modifications to the task file with "File->Save".

http://www.logilab.org/image/18731?vid=download

The Project tab

The Project tab simply shows the four files needed by a projman project: resources, activities, tasks and schedule.

Resources

The Resources tab presents the different resources:

  • human resources
  • resource roles describing the different roles that resources can play
  • Different calendars for different resources with their "offdays"

Activities

For now, the Activities tab is not implemented. It should show the planning of the activities for each resource and the progress of the project.

Tasks

The Tasks tab is for now the most important one; it shows a tree view of the task hierarchy, and for each task:

  • the title of the task,
  • the role for that task,
  • the load (time in days),
  • the scheduling type,
  • the list of the constraints for the scheduling,
  • and the description of the task,

each of which can be edited. You can easily drag and drop tasks inside the task tree, and add or delete tasks and constraints.

See the attached screenshot of the projman-gui task panel.

Scheduling

In the Scheduling tab you can simply test your scheduling by clicking "START". If you expect the scheduling to take a long time, you can modify the maximum time allowed for finding a solution.

Known bugs

  • The begin-after-end-previous constraint does not work for a task having subtasks.
  • Deleting a task doesn't check for depending tasks, so scheduling won't work anymore.

hgview 1.1.0 released

2009/09/25 by David Douard

I am pleased to announce the latest release of hgview 1.1.0.

What is it?

For the ones from the back of the classroom near the radiator, let me remind you that hgview is a very helpful tool for daily work using the excellent DVCS Mercurial (which we heavily use at Logilab). It lets you easily and visually navigate the revision graph of your hg repository. It is written in Python and PyQt.

http://www.logilab.org/image/18210?vid=download

What's new

  • user can now configure colors used in the diff area (and they now default to white on black)
  • indicate current working directory position by a square node
  • add many other configuration options (listed when typing hg help hgview)
  • removed 'hg hgview-options' command in favor of 'hg help hgview'
  • add ability to choose which parent to diff with for merge nodes
  • dramatically improved UI behaviour (shortcuts)
  • improved help and make it accessible from the GUI
  • make it possible not to display the diffstat column of the file list (which can dramatically improve performance on big repositories)
  • standalone application: improved command line options
  • indicate working directory position in the graph
  • add auto-reload feature (when the repo is modified due to a pull, a commit, etc., hgview detects it, reloads the repo and updates the graph)
  • fix many bugs, especially the file log navigator should now display the whole graph

Download and installation

The source code is available as a tarball, or using our public hg repository of course.

To use it from the sources, you just have to add a line in your .hgrc file, in the [extensions] section:

hgext.hgview=/path/to/hgview/hgext/hgview.py

Debian and Ubuntu users can also easily install hgview (and Logilab's other free software tools) using our deb package repositories.


Using tempfile.mkstemp correctly

2009/09/10 by Alexandre Fayolle

The mkstemp function in the tempfile module returns a tuple of 2 values:

  • an OS-level handle to an open file (as would be returned by os.open())
  • the absolute pathname of that file.

I often see code using mkstemp only to get the filename to the temporary file, following a pattern such as:

from tempfile import mkstemp
import os

def need_temp_storage():
    _, temp_path = mkstemp()
    os.system('some_command --output %s' % temp_path)
    file = open(temp_path, 'r')
    data = file.read()
    file.close()
    os.remove(temp_path)
    return data

This seems to be working fine, but there is a bug hiding in there. The bug will show up on Linux if you call this function many times in a long-running process, and on the first call on Windows. We have leaked a file descriptor.

The first element of the tuple returned by mkstemp is typically an integer used to refer to a file by the OS. In Python, not closing a file is usually no big deal because the garbage collector will ultimately close the file for you, but here we are not dealing with file objects, but with OS-level handles. The interpreter sees an integer and has no way of knowing that the integer is connected to a file. On Linux, calling the above function repeatedly will eventually exhaust the available file descriptors. The program will stop with:

IOError: [Errno 24] Too many open files: '/tmp/tmpJ6g4Ke'

On Windows, it is not possible to remove a file which is still open, and you will get:

Windows Error [Error 32]

Fixing the above function requires closing the file descriptor using os.close():

from tempfile import mkstemp
import os

def need_temp_storage():
    fd, temp_path = mkstemp()
    os.system('some_command --output %s' % temp_path)
    file = open(temp_path, 'r')
    data = file.read()
    file.close()
    os.close(fd)
    os.remove(temp_path)
    return data

If you need your process to write directly in the temporary file, you don't need to call os.write(fd, data). The function os.fdopen(fd) will return a Python file object using the same file descriptor. Closing that file object will close the OS-level file descriptor.
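
As an illustration of the os.fdopen() approach, here is a minimal sketch. The helper name write_temp_data and the sample data are made up for this example; only mkstemp, os.fdopen and os.remove come from the standard library.

```python
import os
from tempfile import mkstemp

def write_temp_data(data):
    """Write data to a fresh temporary file and return its path.

    Hypothetical helper, used only to illustrate os.fdopen().
    """
    fd, temp_path = mkstemp()
    # Wrap the OS-level descriptor in a regular Python file object.
    # Leaving the `with` block closes the file object and, with it, fd,
    # so no separate os.close(fd) call is needed.
    with os.fdopen(fd, 'w') as temp_file:
        temp_file.write(data)
    return temp_path

path = write_temp_data('some data')
with open(path) as stored:
    content = stored.read()
# The descriptor is already closed, so removing the file is safe.
os.remove(path)
```

Since the descriptor is owned by the file object, it should be closed only once: calling os.close(fd) after the with block would fail because the descriptor is no longer open.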


You can now register on our sites

2009/09/03 by Arthur Lutz

With the new version of CubicWeb deployed on our "public" sites, we would like to welcome a new (much awaited) functionality: you can now register directly on our websites. Getting an account will give you access to a bunch of functionalities:

http://farm1.static.flickr.com/53/148921611_eadce4f5f5_m.jpg
  • registering to a project's activity will get you automated email reports of what is happening on that project
  • you can directly add tickets on projects instead of talking about it on the mailing lists
  • you can bookmark content
  • tag stuff
  • and much more...

This is also a way of testing out the CubicWeb framework (in this case the forge cube) which you can take home and host yourself (debian recommended). Just click on the "register" link on the top right, or here.

Photo by wa7son under creative commons.


New pylint/astng release, but... pylint needs you !

2009/08/27 by Sylvain Thenault

After several months with no time to fix/enhance pylint besides answering email and filing tickets, I finally tackled some tasks last night to publish bug fix releases ([1] and [2]).

The problem is that we don't have enough free time at Logilab to lower the number of tickets in the pylint tracker page. If you take a look at the ticket tab, you'll see a lot of pending bugs and must-have features (well, and some other less necessary...). You can already easily contribute thanks to the great mercurial dvcs, and some of you do, either by providing patches or by reporting bugs (more tickets, iiirk ! ;) Thank you all btw !!

Now I was wondering what could be done to take pylint further, and the first ideas that came to my mind were:

  • do ~3 days sprint
  • do some 'tickets killing' days, as done in some popular oss projects

But for this to be useful, we need your support, so here are some questions for you:

  • would you come to a sprint at Logilab (in Paris, France), so you can meet us, learn a lot about pylint, and work on tickets you wish to have in pylint?
  • if France is too far away for most people, would you have another location to propose?
  • would you be on jabber for a tickets killing day, providing it's ok with your agenda? if so, what's your knowledge of pylint/astng internals?

You may answer by adding a comment to this blog (please register first by using the link at the top right of this page) or by mail to sylvain.thenault@logilab.fr. If we get enough positive answers, we'll take the time to organize such a thing.


Looking for a Windows Package Manager

2009/07/31 by Nicolas Chauvat
http://www.logilab.org/image/9862?vid=download

As said in a previous article, I am convinced that part of the motivation for making package sub-systems like the Python one, which includes distutils, setuptools, etc, is that Windows users and Mac users never had the chance to use a tool that properly manages the configuration of their computer system. They just do not know what it would be like if they had at least a good package management system and do not miss it in their daily work.

I looked for Windows package managers that claim to provide features similar to Debian's dpkg+apt-get and here is what I found in alphabetical order.

AppSnap

AppSnap is written in Python and uses wxPython, PyCurl and PyYAML. It is packaged using Py2Exe, compressed with UPX and installed using NSIS.

It has not seen activity in the svn or on its blog since the end of 2008.

Appupdater

Appupdater provides functionality similar to apt-get or yum. It automates the process of installing and maintaining up to date versions of programs. It claims to be fully customizable and is licensed under the GPL.

It seems under active development at SourceForge.

QWinApt

QWinApt is a Synaptic clone written in C# that has not evolved since September 2007.

WinAptic

WinAptic is another Synaptic clone written this time in Pascal that has not evolved since the end of 2007.

Win-Get

Win-get is an automated install system and software repository for Microsoft Windows. It is similar to apt-get: it connects to a link repository, finds an application and downloads it before performing the installation routine (silent or standard) and deleting the install file.

It is written in Pascal and is set up as a SourceForge project, but not much has been done lately.

WinLibre

WinLibre is a Windows free software distribution that provides a repository of packages and a tool to automate and simplify their installation.

WinLibre was selected for Google Summer of Code 2009.

ZeroInstall

ZeroInstall started as a "non-admin" package manager for Linux distributions and is now extending its reach to work on Windows.

Conclusion

I have not used any of these tools; the above is just the result of some time spent searching the web.

A more limited approach is to notify the user of the newer versions:

  • App-Get will show you a list of your installed applications. When an update is available for one of them, it will be highlighted and you will be able to update that specific application in seconds.
  • GetIt is not an application-getter/installer. When you want to install a program, you can look it up in GetIt to choose which program to install from a master list of all programs made available by the various apt-get clones.

The appupdater project also compares itself to the programs automating the installation of software on Windows.

Some columnists expect the creation of application stores replicating the iPhone one.

I once read about a project to get the Windows kernel into the Debian distribution, but can not find any trace of it... Remember that Debian is not limited to the Linux kernel, so why not think about a very improbable apt-get install windows-vista ?


The Configuration Management Problem

2009/07/31 by Nicolas Chauvat
http://www.logilab.org/image/9863?vid=download

Today I felt like summing up my opinion on a topic that was discussed this year on the Python mailing lists, at PyCon-FR, at EuroPython and EuroSciPy... packaging software! Let us discuss the two main use cases.

The first use case is to maintain computer systems in production. A trait of production systems is that they cannot afford failures and are often deployed on a large scale. It leaves little room for manually fixing problems. Either the installation process works or the system fails. Reaching that level of quality takes a lot of work.

The second use case is to facilitate the life of software developers and computer users by making it easy for them to give a try to new pieces of software without much work.

The first use case has to be addressed as a configuration management problem. There is no way around it. The best way I know of managing the configuration of a computer system is called Debian. Its package format and its tool chain provide a very extensive and efficient set of features for system development and maintenance. Of course it is not perfect and there are missing bits and open issues that could be tackled, like the dependencies between hardware and software. For example, nothing will prevent you from installing on your Debian system a version of a driver that conflicts with the version of the chip found in your hardware. That problem could be solved, but I do not think the Debian project is there yet and I do not count it as a reason to reject Debian since I have not seen any other competitor at the same level as Debian.

The second use case is kind of a trap, for it concerns most computer users and most of those users are either convinced the first use case has nothing in common with their problem or convinced that the solution is easy and requires little work.

The situation is made more complicated by the fact that most of those users never had the chance to use a system with proper package management tools. They simply do not know the difference and do not feel like they are missing anything when using their system-that-comes-with-a-windowing-system-included.

Since many software developers have never had to maintain computer systems in production (often considered a lower sysadmin job) and never developed packages for computer systems that are maintained in production, they tend to think that the operating system and their software are perfectly decoupled. They have no problem trying to create a new layer on top of existing operating systems and transforming an operating system issue (managing software installation) into a programming language issue (see CPAN, Python eggs and so many others).

Creating a sub-system specific to a language and hosting it on an operating system works well as long as the language boundary is not crossed and there is no competition between the sub-system and the system itself. In the Python world, distutils, setuptools, eggs and the like more or less work with pure Python code. They create a square wheel that was made round years ago by dpkg+apt-get and others, but they help a lot of their users do something they would not know how to do another way.

A wall is quickly hit though, as the approach becomes overly complex as soon as they try to depend on things that do not belong to their Python sub-system. What if your application needs a database? What if your application needs to link to libraries? What if your application needs to reuse data from or provide data to other applications? What if your application needs to work on different architectures?

The software developers who never had to maintain computer systems in production wish these tasks were easy. Unfortunately they are not easy and cannot be. As I said, there is no way around configuration management for the one who wants a stable system. Configuration management requires both project management work and software development work. One can have a system where packaging software is less work, but that comes at the price of reduced stability, functionality and ease of maintenance.

Since neither of the two use cases will disappear any time soon, the only solution to the problem is to share as much data as possible between the different tools and let each one decide how to install software on his computer system.

Some links to continue your readings on the same topic:


EuroSciPy'09 (part 1/2): The Need For Speed

2009/07/29 by Nicolas Chauvat
http://www.logilab.org/image/9852?vid=download

The EuroSciPy2009 conference was held in Leipzig at the end of July and was sponsored by Logilab and other companies. It started with three talks about speed.

Starving CPUs

In his keynote, Francesc Alted talked about starving CPUs. Thirty years back, memory and CPU frequencies were about the same. Memory speed kept up for about ten years with the evolution of CPU speed before falling behind. Nowadays, memory is about a hundred times slower than the cache which is itself about twenty times slower than the CPU. The direct consequence is that CPUs are starving and spend many clock cycles waiting for data to process.

In order to improve the performance of programs, it is now required to know about the multiple layers of computer memory, from disk storage to CPU. The common architecture will soon count six levels: mechanical disk, solid state disk, ram, cache level 3, cache level 2, cache level 1.

Using optimized array operations, taking striding into account, processing data blocks of the right size and using compression to diminish the amount of data that is transferred from one layer to the next are four techniques that go a long way on the road to high performance. Compression algorithms like Blosc increase throughput for they strike the right balance between being fast and providing good compression ratios. Blosc compression will soon be available in PyTables.

Francesc also mentioned the numexpr extension to numpy, and its combination with PyTables named tables.Expr, that nicely and easily accelerates the computation of some expressions involving numpy arrays. In his list of references, Francesc cites Ulrich Drepper's article What every programmer should know about memory.

Using PyPy's JIT for science

Maciej Fijalkowski started his talk with a general presentation of the PyPy framework. One uses PyPy to describe an interpreter in RPython, then generate the actual interpreter code and its JIT.

Since PyPy has become more of a framework to write interpreters than a reimplementation of Python in Python, I suggested changing its misleading name to something like GcGc, the Generic Compiler for Generating Compilers. Maciej answered that there are discussions on the mailing list to split the project in two and make the implementation of the Python interpreter distinct from the GcGc framework.

Maciej then focused his talk on his recent effort to rewrite in RPython the part of numpy that exposes the underlying C library to Python. He says the benefits of using PyPy's JIT to speed up that wrapping layer are already visible. He has details on the PyPy blog. Gaël Varoquaux added that David Cournapeau has started working on making the C/Python split in numpy cleaner, which would further ease the job of rewriting it in RPython.

CrossTwine Linker

Damien Diederen talked about his work on CrossTwine Linker and compared it with the many projects that are actively attacking the problem of speed that dynamic and interpreted languages have been dragging along for years. Parrot tries to be the über virtual machine. Psyco offers very nice acceleration, but currently only on 32-bit systems. PyPy might be what he calls the Right Approach, but still needs a lot of work. Jython and IronPython modify the language a bit but benefit from the qualities of the JVM or the CLR. Unladen Swallow is probably the one that's most similar to CrossTwine.

CrossTwine considers CPython as a library and uses a set of C++ classes to generate efficient interpreters that make calls to CPython's internals. CrossTwine is a tool that helps improve performance by hand-replacing some code paths with very efficient code that does the same operations but bypasses the interpreter and its overhead. An interpreter built with CrossTwine can be viewed as a JIT'ed branch of the official Python interpreter that should be feature-compatible (and bug-compatible) with CPython. Damien calls his approach "punching holes in C substrate to get more speed" and says it could probably be combined with Psyco for even better results.

CrossTwine works on 64-bit systems, but it is not (yet?) free software. It focuses on some use cases to greatly improve speed and is not to be considered a general purpose interpreter able to make any Python code faster.

More readings

Cython is a language that makes writing C extensions for the Python language as easy as Python itself. It replaces the older Pyrex.

The SciPy2008 conference had at least two papers about speeding up Python: Converting Python Functions to Dynamically Compiled C and unPython: Converting Python Numerical Programs into C.

David Beazley gave a very interesting talk in 2009 at a Chicago Python Users group meeting about the effects of the GIL on multicore machines.

I will continue my report on the conference with the second part titled "Applications And Open Questions".


Logilab at OSCON 2009

2009/07/28 by Sandrine Ribeau
http://assets.en.oreilly.com/1/event/27/oscon2009_oscon_11_years.gif

OSCON, Open Source CONvention, takes place every year and promotes Open Source for technology. It is one of the meeting hubs for the growing open source community. This was the occasion for us to learn about new projects and to present CubicWeb during a BAYPIGgies meeting hosted by OSCON.

http://www.openlina.com/templates/rhuk_milkyway/images/header_red_left.png

I had the chance to talk with some of the folks working at OpenLina where they presented LINA. LINA is a thin virtual layer that enables developers to write and compile code using ordinary Linux tools, then package that code into a single executable that runs on a variety of operating systems. LINA runs invisibly in the background, enabling the user to install and run LINAfied Linux applications as if they were native to that user's operating system. They were curious about CubicWeb and took it as a challenge to package it with LINA... maybe soon on LINA's applications list.

Two open source projects caught my attention as potential semantic data publishers. The first one is Family search, which provides a tool to search for family history and genealogy. They are also working to define a standard format to exchange citations with Open Library. The second, Democracy Lab, provides an application to collect votes and build geographic statistics based on political interests. They will at some point publish data semantically so that their application data can be consumed.

It was also the occasion for us to introduce CubicWeb to the BayPIGgies folks, with the same presentation as the one given at EuroPython 2009. I'd like to take the opportunity to answer a question I did not manage to answer at that time. The question was: how different is CubicWeb from Freebase Parallax in terms of interface and view filters? Before answering this question, let's detail what Freebase Parallax is.

Freebase Parallax provides a new way to browse and explore data in Freebase. It lets you browse from one set of data to a related set of data. The interface also supports aggregate visualizations. For instance, given the set of US presidents, different types of views can be applied, such as a timeline view, where the user can choose which start and end dates to use to draw the timeline. So generic views (which apply to any data) are customizable by the user.

http://res.freebase.com/s/f64a2f0cc4534b2b17140fd169cee825a7ed7ddcefe0bf81570301c72a83c0a8/resources/images/freebase-logo.png

The search powered by Parallax is very similar to CubicWeb's faceted search, except that Parallax provides the user with a list of suggested filters to add in addition to the default one; the user can even remove a filter. That is something we could think about for CubicWeb: provide a generated faceted search so that the user can decide which filters to use.

Parallax also provides topics related to the current data set, which eases navigation between sets of data. The main difference I could see between the view filters offered by Parallax and CubicWeb is that Parallax provides the same views for any type of data whereas CubicWeb has specific views depending on the data type and generic views that apply to any type of data. This is a nice Web interface to browse data and it could be a good source of inspiration for CubicWeb.

http://www.zgeek.com/forum/gallery/files/6/3/2/img_228_96x96.jpg

During this talk, I mentioned that CubicWeb now understands SPARQL queries thanks to the fyzz parser.


Quizz WolframAlpha

2009/07/10 by Nicolas Chauvat
http://www.logilab.org/image/9609?vid=download

Wolfram Alpha is a web front-end to a huge database of information covering very different topics ranging from mathematical functions to genetics, geography, astronomy, etc.

When you search for a word, it will try to match it with one of the objects it has in its database and display all the information it has concerning that object. For example it can tell you a lot about the Halley Comet, including where it is at the moment you ask the query. This is the main difference with, say, Wikipedia, which will know a lot about that comet in general, but is not meant to compute its location in the sky at the moment you enter your query.

Searches are not limited to words. One can key in commands like weather in Paris in june 2009 or x^2+sin(x) and get results for those precise queries. The processing of the input query is far from bad, since it returns results to questions like what are the cities of France, but I would not call it state of the art natural language processing since that query returns the largest cities instead of just the cities it knows about and the question what are the smallest cities of France will not return any result. Natural language processing is a very difficult problem, though, especially when done in the open world, as is the case here with an engine available to the general public on the internet.

For more examples, visit the WolframAlpha website, where you will also be able to post feature requests or, if you are a developer, get documentation about the WolframAlpha API and maybe use it as a web service in your application when you need to answer certain types of questions.


EuroPython 2009

2009/07/06 by Nicolas Chauvat
http://www.logilab.org/image/9580?vid=download

Once again Logilab sponsored the EuroPython conference. We would like to thank the organization team (especially John Pinner and Laura Creighton) for their hard work. The Conservatoire is a very central location in Birmingham and walking around the city center and along the canals was nice. The website was helpful when preparing the trip and made it easy to find places to eat and stay. The conference program was full of talks about interesting topics.

I presented CubicWeb and spent a large part of my talk explaining what the semantic web is and what features we need in the tools we will use to be part of that web of data. I insisted on the fact that CubicWeb is made of two parts, the web engine and the data repository, and that the repository can be used without the web engine. I demonstrated this with a TurboGears application that used the CubicWeb repository as its persistence layer. RQL in TurboGears! See my slides and Reinout Van Rees' write-up.

Christian Tismer took over the development of Psyco a few months ago. He said he recently removed some bugs that were show stoppers, including one that was generating way too many recompilations. His new version looks very promising. Performance improved, long numbers are supported, 64bit support may become possible, generators work... and Stackless is about to be rebuilt on top of Psyco! Psyco 2.0 should be out today.

I had a nice chat with Cosmin Basca about the Semantic Web. He suggested using Mako as a templating language for CubicWeb. Cosmin is doing his PhD at DERI and develops SurfRDF which is an Object-RDF mapper that wraps a SPARQL endpoint to provide "discoverable" objects. See his slides and Reinout Van Rees' summary of his talk.

I saw a lightning talk about the Nagare framework which refuses to use templating languages, for the same reason we do not use them in CubicWeb. Is their h.something the right way of doing things? The example reminds me of the C++ concatenation operator. I am not really convinced by the continuation idea since I have been for years a happy user of the reactor model that's implemented in frameworks like Twisted. Read the blog and documentation for more information.

I had a chat with Jasper Op de Coul about Infrae's OAI Server and the work he did to manage RDF data in Subversion and a relational database before publishing it within a web app based on YUI. We commented code that handles books and library catalogs. Part of my CubicWeb demo was about books in DBpedia and cubicweb-book. He gave me a nice link to the WorldCat API.

Souheil Chelfouh showed me his work on Dolmen and Menhir. For several design problems and framework architecture issues, we compared the solutions offered by the Zope Toolkit library with the ones found by CubicWeb. I will have to read more about Martian and Grok to make sure I understand the details of that component architecture.

I had a chat with Martijn Faassen about packaging Python modules. A one sentence summary would be that the Python community should agree on a meta-data format that describes packages and their dependencies, then let everyone use the tool he likes most to manage the installation and removal of software on his system. I hope the work done during the last PyConUS and led by Tarek Ziadé arrived at the same conclusion. Read David Cournapeau's blog entry about Python Packaging for a detailed explanation of why the meta-data format is the way to go. By the way, Martijn is the lead developer of Grok and Martian.

Godefroid Chapelle and I talked a lot about Zope Toolkit (ZTK) and CubicWeb. We compared the way the two frameworks deal with pluggable components. ZTK has adapters and a registry. CubicWeb does not use adapters as ZTK does, but has a view selection mechanism that required a registry with more features than the one used in ZTK. The ZTK registry only has to match a tuple (Interface, Class) when looking for an adapter, whereas CubicWeb's registry has to find the views that can be applied to a result set by checking various properties:

  • interfaces: all items of first column implement the Calendar Interface,
  • dimensions: more than one line, more than two columns,
  • types: items of first column are numbers or dates,
  • form: form contains key XYZ that has a value lower than 10,
  • session: user is authenticated,
  • etc.
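
To make the difference with a plain (Interface, Class) lookup more concrete, here is a minimal sketch of property-based view selection. It is not CubicWeb's or ZTK's actual API: the ViewRegistry class, the selector functions, the view names and the scoring rule (a view applies only if all its selectors match, and the most specific match wins) are all assumptions made for this illustration.

```python
# Hypothetical sketch: none of these names come from CubicWeb or ZTK.

def more_than_one_row(rset, session):
    """Score 1 when the result set has more than one line, else 0."""
    return 1 if len(rset) > 1 else 0

def first_column_is_number(rset, session):
    """Score 1 when every item of the first column is a number, else 0."""
    if rset and all(isinstance(row[0], (int, float)) for row in rset):
        return 1
    return 0

def authenticated(rset, session):
    """Score 1 when the session belongs to an authenticated user, else 0."""
    return 1 if session.get('authenticated') else 0

class ViewRegistry(object):
    """Toy registry mapping view names to tuples of selector functions."""

    def __init__(self):
        self._views = []

    def register(self, name, *selectors):
        self._views.append((name, selectors))

    def select(self, rset, session):
        """Return the name of the best matching view, or None."""
        best_name, best_score = None, 0
        for name, selectors in self._views:
            scores = [selector(rset, session) for selector in selectors]
            # A view applies only if all of its selectors match; among
            # applicable views, the one with the highest total score wins.
            if all(scores) and sum(scores) > best_score:
                best_name, best_score = name, sum(scores)
        return best_name

registry = ViewRegistry()
registry.register('table', more_than_one_row)
registry.register('plot', more_than_one_row, first_column_is_number)
registry.register('admin-plot', more_than_one_row, first_column_is_number,
                  authenticated)

numbers = [(1, 'a'), (2, 'b')]
words = [('x', 'a'), ('y', 'b')]
anonymous = {'authenticated': False}
logged_in = {'authenticated': True}

chosen_for_numbers = registry.select(numbers, anonymous)
chosen_for_words = registry.select(words, anonymous)
chosen_for_admin = registry.select(numbers, logged_in)
```

In this toy version the same result set gets a different view depending on its content and on the session, which a lookup keyed only on an interface and a class could not express.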

As for Grok and Martian, I will have to look into the details to make sure nothing evil is hiding there. I should also find time to compare zope.schema and yams and write about it on this blog.

And if you want more information about the conference:


Semantic web technology conference 2009

2009/06/17 by Sandrine Ribeau
The semantic web technology conference takes place every year in San Jose, California. It is meant to be the world's symposium on the business of semantic technologies. Essentially, the discussion there is about semantic search, how to improve access to data, and how to make sense of structured, but mainly unstructured, content. Some exhibitors were more NLP oriented, with concept extraction (such as SemanticV); others were more focused on providing scalable storage (essentially RDF storage). Most of the solutions include a data aggregator/unifier in order to combine multi-source data into a single store from which ontologies can be defined. On top of that sits an enhanced search engine. They concentrate on internal data within the enterprise and not so much on using the Web as a resource. Those who built a web application on top of the data chose Flex as their framework (Metatomix).

Of all the exhibitors, the ones that caught my attention were the Anzo suite (open source project), ORDI and the Allegrograph RDF store.

Developed in Java by Cambridge Semantics, the Anzo suite, especially Anzo on the web and the Anzo collaboration server, is the closest tool to CubicWeb. It provides a multi-source data server and an AJAX/HTML interface to develop semantic web applications and customize views of the data using a templating language. It is available as open source. The feature I found interesting is an assistant that loads data into the application and then helps users define the data model based on that data. The internal representation of the content is totally transparent to the user; types are inferred by the application, as are relations.

I did not get a demo of ORDI; it was just mentioned to me as an open source equivalent to CubicWeb, which I am not too sure about after looking at their web site. It does data integration into RDF.

The Allegrograph RDF store is a potential candidate for another source type in CubicWeb. It is already supported by the Jena and Sesame frameworks. They developed a Python client API to attract Pythonistas into the Java world.

They all agreed on one thing: SPARQL should be the standard query language. I briefly heard about the Knowledge Interchange Format (KIF), which seems to be an interesting representation of knowledge used for multi-lingual applications. If there was one buzz word to recall from the conference, I would choose ontology :)