Logilab.org - en

News from Logilab and our Free Software projects, as well as on topics dear to our hearts (Python, Debian, Linux, the semantic web, scientific computing...)

Apycot for Mercurial

2010/02/11 by Pierre-Yves David
http://www.logilab.org/image/20439?vid=download

What is apycot

apycot is a highly extensible test automation tool used for Continuous Integration. It can:

  • download the project from a version controlled repository (like SVN or Hg);
  • install it from scratch with all dependencies;
  • run various checkers;
  • store the results in a CubicWeb database;
  • post-process the results;
  • display the results in various formats (html, xml, pdf, mail, RSS...);
  • repeat the whole procedure with various configurations;
  • get triggered by new changesets or run periodically.

For an example, take a look at the "test reports" tab of the logilab-common project.

Setting up an apycot for Mercurial

During the mercurial sprint, we set up a proof-of-concept environment running six different checkers:

  • Check syntax of all python files.
  • Check syntax of all documentation files.
  • Run pylint on the mercurial source code with the mercurial pylintrc.
  • Run the check-code.py script included in Mercurial, which checks for style and Python errors.
  • Run Mercurial's test suite.
  • Run Mercurial's benchmark on a reference repository.

The first three checkers, shipped with apycot, were set up quickly. The last three are Mercurial specific and required a few additional tweaks to be integrated into apycot.

The bot was set up to run with all public Mercurial repositories. Five of the checkers immediately proved useful by pointing out errors or warnings (one even found a syntax error in some rarely used contrib files).

Prospects

A public instance is being set up. It will provide features that the community is looking forward to:

  • testing all python versions;
  • running pure python or the C variant;
  • code coverage of the test suite;
  • performance history.

Conclusion

apycot proved to be highly flexible and could quickly be adapted to Mercurial's test suite, even by people new to apycot. The advantages of continuously running different long-running tests are obvious. So apycot seems to be a very valuable tool for improving the software development process.


SCons presentation in 5 minutes

2010/02/09 by Andre Espaze
http://www.scons.org/scons-logo-transparent.png

Building software with SCons requires Python and SCons to be installed.

As SCons is only made of Python modules, its sources may be shipped with your project if your clients cannot install dependencies. All the following examples can be downloaded at the end of this post.

A building tool for every file extension

First, a Fortran 77 program made of two files will be built:

$ cd fortran-project
$ scons -Q
gfortran -o cfib.o -c cfib.f
gfortran -o fib.o -c fib.f
gfortran -o compute-fib cfib.o fib.o
$ ./compute-fib
 First 10 Fibonacci numbers:
  0.  1.  1.  2.  3.  5.  8. 13. 21. 34.

The '-Q' option tells SCons to be less verbose. To clean the project, add the '-c' option:

$ scons -Qc
Removed cfib.o
Removed fib.o
Removed compute-fib

From this first example, it can be seen that SCons finds the 'gfortran' tool from the file extension. Have a look at the user's manual if you want to set a particular tool.

Describing the construction with Python objects

The second example, a C program, runs its test directly from the SCons file by adding a test command:

$ cd c-project
$ scons -Q run-test
gcc -o test.o -c test.c
gcc -o fact.o -c fact.c
ar rc libfact.a fact.o
ranlib libfact.a
gcc -o test-fact test.o libfact.a
run_test(["run-test"], ["test-fact"])
OK

However, running scons alone builds only the main program:

$ scons -Q
gcc -o main.o -c main.c
gcc -o compute-fact main.o libfact.a
$ ./compute-fact
Computing factorial for: 5
Result: 120

This second example shows that the construction dependency is described by passing Python objects. An interesting point is the possibility to add your own Python functions in the build process.

Hierarchical build with environment

A third C++ program will create a shared library used for two different programs: the main application and a test suite. The main application can be built by:

$ cd cxx-project
$ scons -Q
g++ -o main.o -c -Imbdyn-src main.cxx
g++ -o mbdyn-src/nodes.os -c -fPIC -Imbdyn-src mbdyn-src/nodes.cxx
g++ -o mbdyn-src/solver.os -c -fPIC -Imbdyn-src mbdyn-src/solver.cxx
g++ -o mbdyn-src/libmbdyn.so -shared mbdyn-src/nodes.os mbdyn-src/solver.os
g++ -o mbdyn main.o -Lmbdyn-src -lmbdyn

It shows that SCons handles the compilation flags needed to create a shared library for the chosen tool (-fPIC). Moreover, extra environment variables have been given (CPPPATH, LIBPATH, LIBS), which are all translated for the chosen tool. All those variables can be found in the user's manual or in the man page. The test suite is built and run by giving an extra variable:

$ TEST_CMD="LD_LIBRARY_PATH=mbdyn-src ./%s" scons -Q run-tests
g++ -o tests/run_all_tests.o -c -Imbdyn-src tests/run_all_tests.cxx
g++ -o tests/test_solver.o -c -Imbdyn-src tests/test_solver.cxx
g++ -o tests/all-tests tests/run_all_tests.o tests/test_solver.o -Lmbdyn-src -lmbdyn
run_test(["tests/run-tests"], ["tests/all-tests"])
OK

Conclusion

It is rather convenient to build software by manipulating Python objects, and custom actions can be added to the process. SCons also has a configuration mechanism that works like autotools macros, which can be discovered in the user's manual.


Extended 256 colors in bash prompt

2010/02/07 by Nicolas Chauvat

The Mercurial 1.5 sprint is taking place in our offices this weekend and pair-programming with Steve made me want a better looking terminal. Have you seen his extravagant zsh prompt? I used to have only 8 colors to decorate my shell prompt, but thanks to some time spent playing around, I now have 256.

Here is what I used to have in my bashrc for 8 colors:

NO_COLOUR="\[\033[0m\]"
LIGHT_WHITE="\[\033[1;37m\]"
WHITE="\[\033[0;37m\]"
GRAY="\[\033[1;30m\]"
BLACK="\[\033[0;30m\]"

RED="\[\033[0;31m\]"
LIGHT_RED="\[\033[1;31m\]"
GREEN="\[\033[0;32m\]"
LIGHT_GREEN="\[\033[1;32m\]"
YELLOW="\[\033[0;33m\]"
LIGHT_YELLOW="\[\033[1;33m\]"
BLUE="\[\033[0;34m\]"
LIGHT_BLUE="\[\033[1;34m\]"
MAGENTA="\[\033[0;35m\]"
LIGHT_MAGENTA="\[\033[1;35m\]"
CYAN="\[\033[0;36m\]"
LIGHT_CYAN="\[\033[1;36m\]"

# set a fancy prompt
export PS1="${RED}[\u@\h \W]\$${NO_COLOUR} "

Just put the following lines in your bashrc to get the 256 colors:

function EXT_COLOR () { echo -ne "\[\033[38;5;$1m\]"; }

# set a fancy prompt
export PS1="`EXT_COLOR 172`[\u@\h \W]\$${NO_COLOUR} "

Yay, I now have an orange prompt! I now need to write a script that will display useful information depending on the context. Displaying the status of the mercurial repository I am in might be my next step.
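
To pick a number other than 172, it helps to preview the palette first. Here is a minimal Python sketch, assuming a terminal that understands the same 256-color sequence as above (ESC[38;5;Nm). The helper names ext_color and palette_lines are made up for this illustration:

```python
RESET = "\033[0m"  # same reset code as NO_COLOUR above, minus the \[ \] prompt markers

def ext_color(n):
    """Return the escape sequence selecting foreground color n (0-255)."""
    if not 0 <= n <= 255:
        raise ValueError("color index must be between 0 and 255")
    return "\033[38;5;%dm" % n

def palette_lines(per_row=16):
    """Build one text line per row, each number printed in its own color."""
    lines = []
    for start in range(0, 256, per_row):
        cells = ["%s%3d%s" % (ext_color(n), n, RESET)
                 for n in range(start, start + per_row)]
        lines.append(" ".join(cells))
    return lines

if __name__ == "__main__":
    print("\n".join(palette_lines()))
```

Running it prints the numbers 0 to 255, each in its own color, so the value to pass to EXT_COLOR can be read straight off the screen.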


We're happy to host the mercurial Sprint

2010/02/02 by Arthur Lutz
http://farm1.static.flickr.com/183/419945378_4ead41a76d_m.jpg

We're very happy to be hosting the next mercurial sprint in our brand new offices in central Paris. It is quite an honor to be chosen when the other contender was Google.

So a bunch of mercurial developers are heading out to our offices this coming Friday to sprint for three days on mercurial. We use mercurial a lot here at Logilab and we also contribute a tool to visualize and manipulate a mercurial repository: hgview.

To check out the things that we will be working on with the mercurial crew, check out the program of the sprint on their wiki.

What is a sprint? "A sprint (sometimes called a Code Jam or hack-a-thon) is a short time period (three to five days) during which software developers work on a particular chunk of functionality. "The whole idea is to have a focused group of people make progress by the end of the week," explains Jeff Whatcott" [source]. For geographically distributed open source communities, it is also a way of physically meeting and working in the same room for a period of time.

Sprinting is a practice that we encourage at Logilab. With CubicWeb, we organize open sprints as often as possible, which are an opportunity for users and developers to come and code with us. We even use the sprint format for some internal stuff.

photo by Sebastian Mary under creative commons licence.


hgview 1.2.0 released

2010/01/21 by David Douard

Here is, at last, release 1.2.0 of hgview.

http://www.logilab.org/image/19894?vid=download

In a nutshell, this release includes:

  • a basic support for mq extension,
  • a basic support for hg-bfiles extension,
  • working directory is now displayed as a node of the graph (if there are local modifications of course),
  • it's now possible to display only the subtree from a given revision (a bit like hg log -f)
  • it's also possible to activate an annotate view (which makes navigation slower, however),
  • several improvements in the graph filling and rendering mechanisms,
  • I also added toolbar icons for the search and goto "quickbars" so they are no longer "hidden" from those reluctant to read user manuals,
  • it's now possible to go directly to the common ancestor of 2 revisions,
  • when on a merge node, it's now possible to choose the parent the diff is computed against,
  • make search also search in commit messages (it used to search only in diff contents),
  • and several bugfixes of course.

Notes:
there are packages for Debian lenny, squeeze and sid, and for Ubuntu hardy, intrepid, jaunty and karmic. However, for lenny and hardy, the provided packages won't work on a stock distribution since hgview 1.2 depends on mercurial 1.1. Thus, for these two distributions, the packages will only work if you have installed backported mercurial packages.

New supported repositories for Debian and Ubuntu

2010/01/21 by Arthur Lutz

For the release of hgview 1.2.0 in our Karmic Ubuntu repository, we would like to announce that we are now going to generate packages for the following distributions:

  • Debian Lenny (because it's stable)
  • Debian Sid (because it's the dev branch)
  • Ubuntu Hardy (because it has Long Term Support)
  • Ubuntu Karmic (because it's the current stable)
  • Ubuntu Lucid (because it's the next stable) - no repo yet, but soon...
http://img.generation-nt.com/ubuntulogo_0080000000420571.png

The old packages for the previously supported distributions are still accessible (etch, jaunty, intrepid), but new versions will not be generated for these repositories. Packages will be coming in as versions get released. If you need a package before that, give us a shout and we'll see what we can do.

For instructions on how to use the repositories for Ubuntu or Debian, go to the following page: http://www.logilab.org/card/LogilabDebianRepository


Open Source/Design Hardware

2009/12/13 by Nicolas Chauvat
http://www.logilab.org/image/19338?vid=download

I have been doing free software since I discovered it existed. I bought an OpenMoko some time ago, since I am interested in anything that is open, including artwork like books, music, movies and... hardware.

I just learned about two lists, one at Wikipedia and another one at MakeOnline, but Google has more. Explore and enjoy!


Solution to a common Mercurial task

2009/12/10 by David Douard

An interesting question has just been sent by Greg Ward on the Mercurial devel mailing-list (as a funny coincidence, it happened that I had to solve this problem a few days ago).

Let me quote his message:

here's my problem: imagine a customer is running software built from
changeset A, and we want to upgrade them to a new version, built from
changeset B.  So I need to know what bugs are fixed in B that were not
fixed in A.  I have already implemented a changeset/bug mapping, so I
can trivially lookup the bugs fixed by any changeset.  (It even handles
"ongoing" and "reverted" bugs in addition to "fixed".)

He then gives an example of a situation where a tricky case may be found:

                +--- 75 -- 78 -- 79 ------------+
               /                                 \
              /     +-- 77 -- 80 ---------- 84 -- 85
             /     /                        /
0 -- ... -- 74 -- 76                       /
                   \                      /
                    +-- 81 -- 82 -- 83 --+

So what is the problem?

Imagine the latest distributed stable release is built on rev 81. Now, I need to publish a new bugfix release based on this latest stable version, including every changeset that is a bugfix but has not yet been applied at revision 81.

So the first problem we need to solve is answering: which revisions are ancestors of revision 85 but not ancestors of revision 81?

Command line solution

Using hg commands, the solution is proposed by Steve Losh:

hg log --template '{rev}\n' --rev 85:0 --follow --prune 81

or better, as suggested by Matt:

hg log -q --template '{rev}\n' --rev 85:0 --follow --prune 81

The second is better since it only reads the index, and thus is much faster. But on big repositories, this command remains quite slow (in Greg's situation, a repo of more than 100000 revisions, the command takes more than 2 minutes).

Python solution

Using Python, one may think about using revlog.nodesbetween(), but it won't work as wanted here, not listing revisions 75, 78 and 79.

On the mailing list, Matt gave the simplest and most efficient solution:

cl = repo.changelog
a = set(cl.ancestors(81))
b = set(cl.ancestors(85))
revs = b - a
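
To see the set difference at work without a repository at hand, here is a plain-Python sketch on the graph drawn above. It does not use Mercurial at all: the PARENTS table and the ancestors() helper are written just for this illustration, the linear history 0..73 is left out, and this helper includes the revision itself in its result, which may differ from what cl.ancestors() returns.

```python
# Toy model of the graph drawn above: each revision maps to its parents.
# Assumption: the linear history 0..73 is left out, so 74 acts as the root.
PARENTS = {
    74: [],
    75: [74], 78: [75], 79: [78],
    76: [74],
    77: [76], 80: [77],
    81: [76], 82: [81], 83: [82],
    84: [80, 83],
    85: [79, 84],
}

def ancestors(rev, parents=PARENTS):
    """Return rev and all of its ancestors (the revision itself is included)."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

missing = sorted(ancestors(85) - ancestors(81))
print(missing)
# prints [75, 77, 78, 79, 80, 82, 83, 84, 85]
```

Revisions 75, 78 and 79 are in the result, which is exactly the branch that a "between" style query would miss.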

Idea for a new extension

Using this simple python code, it should be easy to write a nice Mercurial extension (which could be named missingrevisions) to do this job.

Then, it would be interesting to also implement some filtering feature. For example, if simple conventions are used in commit messages, e.g. something like "[fix #1245]" or "[close #1245]" when the changeset is a fix for a bug listed in the bugtracker, then we may type commands like:

hg missingrevs REV -f bugfix

or:

hg missingrevs REV -h HEADREV -f bugfix

to find bugfix revisions ancestors of HEADREV that are not ancestors of REV.

Filters (bugfix here) could be made configurable in hgrc using regexps.
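
As a rough sketch of what such a filter could look like, here is some plain Python using the commit message convention mentioned above. Everything in it is hypothetical: the FILTERS table stands in for whatever would be read from hgrc, filter_revisions is a made-up helper, and the commit messages are invented.

```python
import re

# Hypothetical filter table; in the proposed extension it would come from hgrc.
FILTERS = {
    "bugfix": re.compile(r"\[(?:fix|close)\s+#(\d+)\]", re.IGNORECASE),
}

def filter_revisions(descriptions, filter_name, filters=FILTERS):
    """Return (rev, ticket numbers) for revisions whose message matches."""
    pattern = filters[filter_name]
    result = []
    for rev in sorted(descriptions):
        tickets = pattern.findall(descriptions[rev])
        if tickets:
            result.append((rev, [int(t) for t in tickets]))
    return result

# Made-up commit messages for some of the missing revisions.
messages = {
    75: "refactor the solver",
    78: "[fix #1245] handle empty input",
    83: "update documentation",
    84: "merge; [close #1300] and [fix #1301]",
}
print(filter_revisions(messages, "bugfix"))
# prints [(78, [1245]), (84, [1300, 1301])]
```

Returning the ticket numbers along with the revisions makes it easy to build the list of bugs fixed between two releases.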


pylint bug day report

2009/12/04 by Pierre-Yves David
http://farm1.static.flickr.com/85/243306920_6a12bb48c7.jpg

The first pylint bug day took place on Wednesday 25th. Four members of the Logilab crew and two other people spent the day working on pylint.

Several patches submitted before the bug day were processed and some tickets were closed.

Charles Hébert added James Lingard's patches for string formatting and is working on several improvements. Vincent Férotin submitted a patch for simple message listings. Sylvain Thenault fixed significant inference bugs in astng (an underlying module of pylint managing the syntax tree). Émile Anclin began a major astng refactoring to take advantage of new python2.6 functionality. For my part, I made several improvements to the test suite. I applied James Lingard's patches for the ++ operator and generalised them to -- too. I also added a new checker for function call arguments, submitted once again by James Lingard. Finally, I improved the message filtering of the --errors-only option.

We thank Maarten ter Huurne and Vincent Férotin for their participation and of course James Lingard for submitting numerous patches.

Another pylint bug day will be held in a few months.

image under creative commons by smccann


Summary of the first Coccinelle users day

2009/11/30 by Andre Espaze

A matching and transformation tool for systems code

Coccinelle's goal is to ease code maintenance, first by revealing code smells based on design patterns and second by easing an API (Application Programming Interface) change for a heavily used library. Coccinelle can thus be seen as two tools in one: the first matches patterns, the second applies transformations. However, facing such a big problem, the project needed to define boundaries in order to increase its chances of success. The founding motivation was thus to target the Linux kernel. This choice implied a tool working on the C programming language before the preprocessor step. Moreover, the Linux code base adds interesting constraints, as it is huge, contains many possible configurations depending on C macros, may contain many bugs and evolves a lot. What was Coccinelle's solution for easing kernel maintenance?

http://farm1.static.flickr.com/151/398536506_57df539ccf_m.jpg

Generating diff files from the semantic patch language

The Linux community reads a lot of diff files to follow the kernel's evolution. As a consequence, the diff file syntax is widespread and commonly understood. However, this syntax describes a particular change between two files; it cannot match a generic pattern.

Coccinelle's solution is to build its own language for declaring rules that describe a code pattern and a possible transformation. This language is the Semantic Patch Language (SmPL), based on the declarative approach of the diff file syntax. It propagates a change rule to many files by generating diff files. Those results can then be applied directly with the patch command, but most of the time they will be reviewed and may be slightly adapted to the programmer's needs.

A Coccinelle rule is made of two parts: a metavariable declaration and a code pattern match followed by a possible transformation. A metavariable stands for a variable in the program's control flow; the names it actually takes in the program do not matter. The code pattern then describes a particular control flow in the program, using the C and SmPL syntaxes to manipulate the metavariables. As a result, Coccinelle manages to generate diff files because it works on the control flow of the C program.

A complete SmPL description will not be given here because it can be found in the Coccinelle's documentation. However a brief introduction will be made on a rule declaration. The metavariable part will look like this:

@@
expression E;
constant C;
@@

'expression' stands for a variable or the result of a function, whereas 'constant' stands for a C constant. To negate the result of an and operation between an expression and a constant, instead of negating the expression first, the transformation part will be:

- !E & C
+ !(E & C)
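
Why the parentheses matter can be checked with a few lines of arithmetic. The sketch below emulates the C semantics in Python (Python's own 'not' has a different precedence, so the c_not, before_patch and after_patch helpers are written only for this illustration):

```python
def c_not(value):
    """Emulate C's logical negation: 1 for zero, 0 for anything else."""
    return 0 if value else 1

def before_patch(e, c):
    # C parses `!E & C` as `(!E) & C`, because unary `!` binds tighter than `&`.
    return c_not(e) & c

def after_patch(e, c):
    # The patched form negates the result of the bitwise and.
    return c_not(e & c)

FLAG = 0x4
for e in (0x0, 0x2, 0x4, 0x6):
    print(hex(e), before_patch(e, FLAG), after_patch(e, FLAG))
```

With a mask such as 0x4, the unpatched form is always 0, since !E can only be 0 or 1 and neither has bit 2 set. The patched form tells whether the flag is absent from E.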

A file containing several rules like that is called a semantic patch. It is applied with the Coccinelle 'spatch' command, which generates a change written in the diff file syntax each time the above pattern is matched. The next section illustrates this way of working.

http://www.simplehelp.net/wp-images/icons/topic_linux.jpg

A working example on the Linux kernel 2.6.30

You can download and install the Coccinelle 'spatch' command from its website, http://coccinelle.lip6.fr/, if you want to execute the following example. Let's first consider the following structure with accessors in the header 'device.h':

struct device {
    void *driver_data;
};

static inline void *dev_get_drvdata(const struct device *dev)
{
    return dev->driver_data;
}

static inline void dev_set_drvdata(struct device *dev, void* data)
{
    dev->driver_data = data;
}

It imitates the 2.6.30 kernel header 'include/linux/device.h'. Let's now consider the following client code that does not make use of the accessors:

#include <stdlib.h>
#include <assert.h>

#include "device.h"

int main()
{
    struct device devs[2], *dev_ptr;
    int data[2] = {3, 7};
    void *a = NULL, *b = NULL;

    devs[0].driver_data = (void*)(&data[0]);
    a = devs[0].driver_data;

    dev_ptr = &devs[1];
    dev_ptr->driver_data = (void*)(&data[1]);
    b = dev_ptr->driver_data;

    assert(*((int*)a) == 3);
    assert(*((int*)b) == 7);
    return 0;
}

Once this code is saved in the file 'fake_device.c', we can check that it compiles and runs with:

$ gcc fake_device.c && ./a.out

We will now create a semantic patch 'device_data.cocci' trying to add the getter accessor with this first rule:

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

The 'spatch' command is then run by:

$ spatch -sp_file device_data.cocci fake_device.c

producing the following change in a diff file:

-    devs[0].driver_data = (void*)(&data[0]);
-    a = devs[0].driver_data;
+    dev_get_drvdata(&devs[0]) = (void*)(&data[0]);
+    a = dev_get_drvdata(&devs[0]);

which nicely illustrates how Coccinelle works on the program's control flow. However, the transformation has also matched code where the setter accessor should be used. We will thus add a rule above the previous one, so that the semantic patch becomes:

@@
struct device dev;
expression data;
@@
- dev.driver_data = data
+ dev_set_drvdata(&dev, data)

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

Running the command again will produce the wanted output:

$ spatch -sp_file device_data.cocci fake_device.c
-    devs[0].driver_data = (void*)(&data[0]);
-    a = devs[0].driver_data;
+    dev_set_drvdata(&devs[0], (void *)(&data[0]));
+    a = dev_get_drvdata(&devs[0]);

It is important to write the setter rule before the getter rule, otherwise the getter rule will be applied first to the whole file.
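
The effect of rule ordering can be imitated with a small Python toy. It is only an analogy: the two regular expressions below are crude stand-ins invented for this post and work on text, whereas real SmPL rules match typed expressions on the control flow. The names LVALUE, SETTER, GETTER and apply_rules are all hypothetical.

```python
import re

# Toy stand-ins for the two rules; real SmPL rules match typed expressions
# on the control flow, not raw text.
LVALUE = r"(\w+(?:\[\d+\])?)"
SETTER = (re.compile(LVALUE + r"\.driver_data\s*=\s*([^;]+)"),
          r"dev_set_drvdata(&\1, \2)")
GETTER = (re.compile(LVALUE + r"\.driver_data"),
          r"dev_get_drvdata(&\1)")

def apply_rules(code, rules):
    """Apply each (pattern, replacement) rule in order to the whole text."""
    for pattern, replacement in rules:
        code = pattern.sub(replacement, code)
    return code

SOURCE = "devs[0].driver_data = (void*)(&data[0]);\na = devs[0].driver_data;"

print(apply_rules(SOURCE, [SETTER, GETTER]))  # setter first: the wanted result
print(apply_rules(SOURCE, [GETTER, SETTER]))  # getter first: assignment to a call
```

With the getter rule first, every access has already been rewritten by the time the setter rule runs, so the setter rule finds nothing left to match.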

At this point our semantic patch is still incomplete because it does not work on 'device' structure pointers. By using the same logic, let's add it to the 'device_data.cocci' semantic patch:

@@
struct device dev;
expression data;
@@
- dev.driver_data = data
+ dev_set_drvdata(&dev, data)

@@
struct device * dev;
expression data;
@@
- dev->driver_data = data
+ dev_set_drvdata(dev, data)

@@
struct device dev;
@@
- dev.driver_data
+ dev_get_drvdata(&dev)

@@
struct device * dev;
@@
- dev->driver_data
+ dev_get_drvdata(dev)

Running Coccinelle again:

$ spatch -sp_file device_data.cocci fake_device.c

will add the remaining transformations for the 'fake_device.c' file:

-    dev_ptr->driver_data = (void*)(&data[1]);
-    b = dev_ptr->driver_data;
+    dev_set_drvdata(dev_ptr, (void *)(&data[1]));
+    b = dev_get_drvdata(dev_ptr);

but a new problem appears: the 'device.h' header is also modified. This brings us back to an important point of Coccinelle's philosophy described in the first section: 'spatch' is a tool that eases code maintenance by propagating a code pattern change to many files, but the resulting diff files are meant to be reviewed, and in our case the unwanted modification should be removed. Note that the 'device.h' modification could be avoided by using SmPL syntax, but explaining it would be too much for an introductory tutorial. Instead, we will simply cut the unwanted part:

$ spatch -sp_file device_data.cocci fake_device.c | cut -d $'\n' -f 16-34

This result will now be saved in a diff file, while also asking 'spatch' to produce it relative to the current working directory:

$ spatch -sp_file device_data.cocci -patch "" fake_device.c | \
cut -d $'\n' -f 16-34 > device_data.patch

It is now time to apply the change for getting a working C code using accessors:

$ patch -p1 < device_data.patch

The final result for 'fake_device.c' should be:

#include <stdlib.h>
#include <assert.h>

#include "device.h"

int main()
{
    struct device devs[2], *dev_ptr;
    int data[2] = {3, 7};
    void *a = NULL, *b = NULL;

    dev_set_drvdata(&devs[0], (void *)(&data[0]));
    a = dev_get_drvdata(&devs[0]);

    dev_ptr = &devs[1];
    dev_set_drvdata(dev_ptr, (void *)(&data[1]));
    b = dev_get_drvdata(dev_ptr);

    assert(*((int*)a) == 3);
    assert(*((int*)b) == 7);
    return 0;
}

Finally, we can test that the code compiles and runs:

$ gcc fake_device.c && ./a.out

The semantic patch is now ready to be used on the Linux 2.6.30 kernel:

$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.30.tar.bz2
$ tar xjf linux-2.6.30.tar.bz2
$ spatch -sp_file device_data.cocci -dir linux-2.6.30/drivers/net/ \
  > device_drivers_net.patch
$ wc -l device_drivers_net.patch
642

You may also try the 'drivers/ieee1394' directory.

http://coccinelle.lip6.fr/img/lip6.jpg

Conclusion

Coccinelle is made of around 60 thousand lines of Objective Caml. As illustrated by the above example on the Linux kernel, the 'spatch' command succeeds in easing code maintenance. For the Coccinelle team working on the kernel code base, a semantic patch is usually around 100 lines long and sometimes generates diffs for hundreds of files. Moreover, the processing is rather fast: the average time per file is said to be 0.7s.

Two tools using the 'spatch' engine have already been built: 'spdiff' and 'herodotos'. With the first one you can almost avoid learning the SmPL language, because the idea is to generate a semantic patch by looking at transformations between pairs of files. The second correlates defects across software versions once the corresponding code smells have been described in SmPL.

One of Coccinelle's problems is that it cannot easily be extended to other languages, as the engine was designed for analyzing control flow in C programs. C++ may be added, but that would obviously require a lot of work. It would be great to also have such a tool for dynamic languages like Python.

image under creative commons by Rémi Vannier