
Logilab.org - en

News from Logilab and our Free Software projects, as well as on topics dear to our hearts (Python, Debian, Linux, the semantic web, scientific computing...)

  • Solution to a common Mercurial task

    2009/12/10 by David Douard

    An interesting question was just sent by Greg Ward to the Mercurial devel mailing list (by a funny coincidence, I had to solve this very problem a few days ago).

    Let me quote his message:

    here's my problem: imagine a customer is running software built from
    changeset A, and we want to upgrade them to a new version, built from
    changeset B.  So I need to know what bugs are fixed in B that were not
    fixed in A.  I have already implemented a changeset/bug mapping, so I
    can trivially lookup the bugs fixed by any changeset.  (It even handles
    "ongoing" and "reverted" bugs in addition to "fixed".)
    

    And he gives an example of a situation where a tricky case may arise:

                    +--- 75 -- 78 -- 79 ------------+
                   /                                 \
                  /     +-- 77 -- 80 ---------- 84 -- 85
                 /     /                        /
    0 -- ... -- 74 -- 76                       /
                       \                      /
                        +-- 81 -- 82 -- 83 --+
    

    So what is the problem?

    Imagine the latest distributed stable release was built from revision 81. Now, I need to publish a new bugfix release based on this latest stable version, including every changeset that is a bugfix but has not yet been applied as of revision 81.

    So the first problem we need to solve is: which revisions are ancestors of revision 85 but not ancestors of revision 81?

    Command line solution

    Using hg commands, a solution was proposed by Steve Losh:

    hg log --template '{rev}\n' --rev 85:0 --follow --prune 81
    

    or better, as suggested by Matt:

    hg log -q --template '{rev}\n' --rev 85:0 --follow --prune 81
    

    The second is better since it only reads the index, and is thus much faster. But on big repositories this command remains quite slow (in Greg's situation, a repository of more than 100,000 revisions, the command takes more than 2 minutes).

    Python solution

    Using Python, one might think of using revlog.nodesbetween(), but it does not do what we want here: it would not list revisions 75, 78 and 79.

    On the mailing list, Matt gave the simplest and most efficient solution:

    cl = repo.changelog
    a = set(cl.ancestors(81))
    b = set(cl.ancestors(85))
    revs = b - a
    
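    To see why this set arithmetic gives the right answer, here is a small standalone sketch: a toy reimplementation of ancestor computation over the example graph above, using plain sets rather than Mercurial's API.

```python
# Parent map for the example graph above (74 is used as the root,
# standing in for the whole 0 -- ... -- 74 linear history).
PARENTS = {
    74: [], 75: [74], 76: [74], 77: [76], 78: [75],
    79: [78], 80: [77], 81: [76], 82: [81], 83: [82],
    84: [80, 83], 85: [79, 84],
}

def ancestors(rev, parents=PARENTS):
    """Return the set of ancestors of `rev`, including `rev` itself."""
    seen = set()
    todo = [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen

revs = ancestors(85) - ancestors(81)
print(sorted(revs))  # -> [75, 77, 78, 79, 80, 82, 83, 84, 85]
```

    Note that revisions 75, 78 and 79 are listed, as wanted, while everything already reachable from 81 is excluded.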

    Idea for a new extension

    Using this simple Python code, it should be easy to write a nice Mercurial extension (which could be named missingrevisions) to do this job.

    It would also be interesting to implement some filtering. For example, if simple conventions are used in commit messages, e.g. something like "[fix #1245]" or "[close #1245]" in the commit message when the changeset fixes a bug listed in the bug tracker, then we could type commands like:

    hg missingrevs REV -f bugfix
    

    or:

    hg missingrevs REV -h HEADREV -f bugfix
    

    to find bugfix revisions that are ancestors of HEADREV but not ancestors of REV.

    Filters (bugfix here) could be made configurable in hgrc using regexps.
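    As a sketch of what such filtering might look like (the regexp and helper name here are hypothetical, not part of any existing extension), matching those commit-message conventions is straightforward:

```python
import re

# Hypothetical convention: "[fix #1245]" or "[close #1245]" in the message.
BUGFIX_RE = re.compile(r'\[(?:fix|close) #(\d+)\]')

def is_bugfix(message):
    """Return the bug number if the commit message follows the convention,
    None otherwise."""
    match = BUGFIX_RE.search(message)
    return int(match.group(1)) if match else None

print(is_bugfix("[fix #1245] handle empty revision sets"))  # -> 1245
print(is_bugfix("refactor graph walking"))                  # -> None
```

    The extension would then simply keep the revisions of the ancestor set difference whose commit message matches the configured regexp.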


  • pylint bug day report

    2009/12/04 by Pierre-Yves David
    http://farm1.static.flickr.com/85/243306920_6a12bb48c7.jpg

    The first pylint bug day took place on Wednesday the 25th. Four members of the Logilab crew and two other people spent the day working on pylint.

    Several patches submitted before the bug day were processed and some tickets were closed.

    Charles Hébert added James Lingard's patches for string formatting and is working on several improvements. Vincent Férotin submitted a patch for simple message listings. Sylvain Thenault fixed significant inference bugs in astng (an underlying module of pylint managing the syntax tree). Émile Anclin began a major astng refactoring to take advantage of new Python 2.6 functionality. For my part, I made several improvements to the test suite. I applied James Lingard's patches for the ++ operator and generalised them to -- as well. I also added a new checker for function call arguments, submitted by James Lingard once again. Finally, I improved the message filtering of the --errors-only option.

    We thank Maarten ter Huurne and Vincent Férotin for their participation, and of course James Lingard for submitting numerous patches.

    Another pylint bug day will be held in a few months.

    image under creative commons by smccann


  • Summary of the first Coccinelle users day

    2009/11/30 by Andre Espaze

    A matching and transformation tool for systems code

    Coccinelle's goal is to ease code maintenance by first revealing code smells based on design patterns, and second by easing an API (Application Programming Interface) change for a heavily used library. Coccinelle can thus be seen as two tools in one: the first matches patterns, the second applies transformations. However, facing such a big problem, the project needed to define boundaries in order to increase its chances of success. The driving motivation was thus to target the Linux kernel. This choice implied a tool working on the C programming language before the preprocessor step. Moreover, the Linux code base adds interesting constraints: it is huge, contains many possible configurations depending on C macros, may contain many bugs, and evolves a lot. What was Coccinelle's solution for easing kernel maintenance?

    http://farm1.static.flickr.com/151/398536506_57df539ccf_m.jpg

    Generating diff files from the semantic patch language

    The Linux community reads lots of diff files to follow the kernel's evolution. As a consequence the diff syntax is widely spread and commonly understood. However, this syntax describes one particular change between two files; it does not allow matching a generic pattern.

    Coccinelle's solution is to provide its own language for declaring rules that describe a code pattern and a possible transformation. This language is the Semantic Patch Language (SmPL), based on the declarative approach of the diff syntax. It allows propagating a change rule to many files by generating diff files. These results can then be applied directly with the patch command, but most of the time they will be reviewed and may be slightly adapted to the programmer's needs.

    A Coccinelle rule is made of two parts: a metavariable declaration, and a code pattern match followed by a possible transformation. A metavariable stands for a control flow variable; its possible names inside the program do not matter. The code pattern then describes a particular control flow in the program, using the C and SmPL syntaxes to manipulate the metavariables. As a result, Coccinelle succeeds in generating diff files because it works on the program's control flow.

    A complete SmPL description will not be given here, as it can be found in Coccinelle's documentation; instead, here is a brief introduction to rule declarations. The metavariable part will look like this:

    @@
    expression E;
    constant C;
    @@
    

    'expression' means a variable or the result of a function call, while 'constant' means a C constant. Then, to negate the result of an AND operation between an expression and a constant, instead of negating the expression first, the transformation part will be:

    - !E & C
    + !(E & C)
    

    A file containing several rules like that is called a semantic patch. It is applied using Coccinelle's 'spatch' command, which generates a change written in the diff syntax each time the above pattern is matched. The next section will illustrate this way of working.

    http://www.simplehelp.net/wp-images/icons/topic_linux.jpg

    A working example on the Linux kernel 2.6.30

    If you want to run the following example, you can download and install Coccinelle's 'spatch' command from its website: http://coccinelle.lip6.fr/. Let's first consider the following structure with accessors in the header 'device.h':

    struct device {
        void *driver_data;
    };
    
    static inline void *dev_get_drvdata(const struct device *dev)
    {
        return dev->driver_data;
    }
    
    static inline void dev_set_drvdata(struct device *dev, void* data)
    {
        dev->driver_data = data;
    }
    

    It imitates the 2.6.30 kernel header 'include/linux/device.h'. Let's now consider the following client code that does not make use of the accessors:

    #include <stdlib.h>
    #include <assert.h>
    
    #include "device.h"
    
    int main()
    {
        struct device devs[2], *dev_ptr;
        int data[2] = {3, 7};
        void *a = NULL, *b = NULL;
    
        devs[0].driver_data = (void*)(&data[0]);
        a = devs[0].driver_data;
    
        dev_ptr = &devs[1];
        dev_ptr->driver_data = (void*)(&data[1]);
        b = dev_ptr->driver_data;
    
        assert(*((int*)a) == 3);
        assert(*((int*)b) == 7);
        return 0;
    }
    

    Once this code is saved in the file 'fake_device.c', we can check that it compiles and runs:

    $ gcc fake_device.c && ./a.out
    

    We will now create a semantic patch 'device_data.cocci' trying to add the getter accessor with this first rule:

    @@
    struct device dev;
    @@
    - dev.driver_data
    + dev_get_drvdata(&dev)
    

    The 'spatch' command is then run by:

    $ spatch -sp_file device_data.cocci fake_device.c
    

    producing the following change in a diff file:

    -    devs[0].driver_data = (void*)(&data[0]);
    -    a = devs[0].driver_data;
    +    dev_get_drvdata(&devs[0]) = (void*)(&data[0]);
    +    a = dev_get_drvdata(&devs[0]);
    

    which nicely illustrates how Coccinelle works on the program's control flow. However, the transformation has also matched code where the setter accessor should be used. We will thus add a rule above the previous one; the semantic patch becomes:

    @@
    struct device dev;
    expression data;
    @@
    - dev.driver_data = data
    + dev_set_drvdata(&dev, data)
    
    @@
    struct device dev;
    @@
    - dev.driver_data
    + dev_get_drvdata(&dev)
    

    Running the command again will produce the wanted output:

    $ spatch -sp_file device_data.cocci fake_device.c
    -    devs[0].driver_data = (void*)(&data[0]);
    -    a = devs[0].driver_data;
    +    dev_set_drvdata(&devs[0], (void *)(&data[0]));
    +    a = dev_get_drvdata(&devs[0]);
    

    It is important to write the setter rule before the getter rule, otherwise the getter rule would be applied first to the whole file.

    At this point our semantic patch is still incomplete because it does not work on 'device' structure pointers. By using the same logic, let's add it to the 'device_data.cocci' semantic patch:

    @@
    struct device dev;
    expression data;
    @@
    - dev.driver_data = data
    + dev_set_drvdata(&dev, data)
    
    @@
    struct device * dev;
    expression data;
    @@
    - dev->driver_data = data
    + dev_set_drvdata(dev, data)
    
    @@
    struct device dev;
    @@
    - dev.driver_data
    + dev_get_drvdata(&dev)
    
    @@
    struct device * dev;
    @@
    - dev->driver_data
    + dev_get_drvdata(dev)
    

    Running Coccinelle again:

    $ spatch -sp_file device_data.cocci fake_device.c
    

    will add the remaining transformations for the 'fake_device.c' file:

    -    dev_ptr->driver_data = (void*)(&data[1]);
    -    b = dev_ptr->driver_data;
    +    dev_set_drvdata(dev_ptr, (void *)(&data[1]));
    +    b = dev_get_drvdata(dev_ptr);
    

    but a new problem appears: the 'device.h' header is also modified. Here we meet an important point of Coccinelle's philosophy, described in the first section: 'spatch' is a tool to ease code maintenance by propagating a code pattern change to many files, but the resulting diff files are meant to be reviewed, and in our case the unwanted modification should be removed. Note that it would be possible to avoid modifying the 'device.h' header using the SmPL syntax, but the explanation would be too long for an introductory tutorial. Instead, we will simply cut out the unwanted part:

    $ spatch -sp_file device_data.cocci fake_device.c | cut -d $'\n' -f 16-34
    

    We now keep this result in a diff file, additionally asking 'spatch' to produce it relative to the current working directory:

    $ spatch -sp_file device_data.cocci -patch "" fake_device.c | \
    cut -d $'\n' -f 16-34 > device_data.patch
    

    It is now time to apply the change to get working C code using the accessors:

    $ patch -p1 < device_data.patch
    

    The final result for 'fake_device.c' should be:

    #include <stdlib.h>
    #include <assert.h>
    
    #include "device.h"
    
    int main()
    {
        struct device devs[2], *dev_ptr;
        int data[2] = {3, 7};
        void *a = NULL, *b = NULL;
    
        dev_set_drvdata(&devs[0], (void *)(&data[0]));
        a = dev_get_drvdata(&devs[0]);
    
        dev_ptr = &devs[1];
        dev_set_drvdata(dev_ptr, (void *)(&data[1]));
        b = dev_get_drvdata(dev_ptr);
    
        assert(*((int*)a) == 3);
        assert(*((int*)b) == 7);
        return 0;
    }
    

    Finally, we can test that the code compiles and runs:

    $ gcc fake_device.c && ./a.out

    The semantic patch is now ready to be used on the Linux 2.6.30 kernel:

    $ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.30.tar.bz2
    $ tar xjf linux-2.6.30.tar.bz2
    $ spatch -sp_file device_data.cocci -dir linux-2.6.30/drivers/net/ \
      > device_drivers_net.patch
    $ wc -l device_drivers_net.patch
    642
    

    You may also try the 'drivers/ieee1394' directory.

    http://coccinelle.lip6.fr/img/lip6.jpg

    Conclusion

    Coccinelle is made of around 60 thousand lines of Objective Caml. As illustrated by the above example on the Linux kernel, the 'spatch' command succeeds in easing code maintenance. For the Coccinelle team working on the kernel code base, a semantic patch is usually around 100 lines and will generate diffs touching sometimes hundreds of files. Moreover the processing is rather fast: the average time per file is said to be 0.7s.

    Two tools using the 'spatch' engine have already been built: 'spdiff' and 'herodotos'. With the first, you could almost avoid learning the SmPL language, as the idea is to generate a semantic patch by looking at the transformations between pairs of files. The second allows correlating defects over software versions once the corresponding code smells have been described in SmPL.

    One of Coccinelle's limitations is that it is not easily extendable to other languages, as the engine was designed for analyzing the control flow of C programs. The C++ language may be added, but that would obviously require a lot of work. It would be great to also have such a tool for dynamic languages like Python.

    image under creative commons by Rémi Vannier


  • pylint bug day next wednesday!

    2009/11/23 by Sylvain Thenault

    Remember that the first pylint bug day will be held on Wednesday, November 25, from around 8am to 8pm in the Paris (France) time zone.

    We'll be a few people at Logilab and hopefully a lot of other folks all around the world, trying to make pylint better.

    Join us on the #public conference room of conference.jabber.logilab.org, or if you prefer using an IRC client, join #public on irc.logilab.org which is a gateway to the jabber forum. And if you're in Paris, come to work with us in our office.

    People willing to help but without knowledge of pylint internals are welcome, it's the perfect occasion to learn a lot about it, and to be able to hack on pylint in the future!


  • First contact with pupynere

    2009/11/06 by Pierre-Yves David

    I spent some time this week evaluating Pupynere, the PUre PYthon NEtcdf REader written by Roberto De Almeida. I see several advantages in pupynere.

    First it's a pure Python module with no external dependency. It doesn't even depend on the NetCDF lib and it is therefore very easy to deploy.

    http://www.unidata.ucar.edu/software/netcdf/netcdf1_sm.png

    Second, it offers the same interface as Scientific Python's NetCDF bindings which makes transitioning from one module to another very easy.

    Third, pupynere is being integrated into Scipy as the scipy.io.netcdf module. Once integrated, this could ensure wide adoption by the Python community.

    Finally it's easy to dig in this clear and small code base of about 600 lines. I have just sent several fixes and bug reports to the author.

    http://docs.scipy.org/doc/_static/scipyshiny_small.png

    However pupynere isn't mature yet. First, it seems pupynere has only been used for simple cases so far: many common cases are broken. Moreover there is no support for newer NetCDF formats such as long-NetCDF and NetCDF4, and important features such as file update are still missing. In addition, the lack of a test suite is a serious issue. In my opinion, various bugs could already have been detected and fixed with simple unit tests. Contributions would be much more comfortable with the safety net offered by a test suite. I am not certain that the fixes and improvements I made this week did not introduce regressions.

    To conclude, pupynere seems too young for production use. But I invite people to try it and provide feedback and fixes to the author. I'm looking forward to using this project in production in the future.


  • First Pylint Bug Day on Nov 25th, 2009 !

    2009/10/21 by Sylvain Thenault
    http://www.logilab.org/image/18785?vid=download

    Since we don't stop being overloaded here at Logilab, and we've got some encouraging feedback after the "Pylint needs you" post, we decided to take some time to introduce more "community" in pylint.

    And the easiest thing to do, rather sooner than later, is an IRC/jabber synchronized bug day, which will be held on Wednesday, November 25. We're based in France, so the main developers will be there between around 8am and 7pm UTC+1. If a few of you are around Paris at this time and wish to come to Logilab to sprint with us, contact us and we'll try to make this possible.

    The focus for this bug killing day could be:

    • using the logilab.org tracker: getting an account, submitting tickets, triaging existing tickets...
    • using mercurial to develop pylint / astng
    • guide people in the code so they're able to fix simple bugs

    We will of course also try to kill a hella-lotta bugs, but the main idea is to help whoever wants to contribute to pylint... and plan for the next bug-killing day !

    As we are in the process of moving to another place, we can't organize a sprint yet, but we should have some room available for the next time, so stay tuned :)


  • Projman 0.14.0 includes a Graphical User Interface

    2009/10/19 by Emile Anclin

    Introduction

    Projman is a project manager. With projman 0.14.0, the first sketch of a GUI has been updated and important functionality added. You can now easily see and edit task dependencies and test the resulting scheduling. Furthermore, a begin-after-end-previous constraint has been added, which should really simplify editing the schedule.

    The GUI can be used in the two following ways:

    $ projman-gui
    $ projman-gui <path/to/project.xml>
    

    The file <path/to/project.xml> is the well-known main file of a projman project. If you start projman-gui without specifying a project.xml, you can open an existing project simply with "File->Open". (For now, you can't create a new project with projman-gui.) You can edit the tasks and then save the modifications to the task file with "File->Save".

    http://www.logilab.org/image/18731?vid=download

    The Project tab

    The Project tab simply shows the four files needed by a projman project: resources, activities, tasks and schedule.

    Resources

    The Resources tab presents the different resources:

    • human resources
    • resource roles describing the different roles that resources can play
    • Different calendars for different resources with their "offdays"

    Activities

    For now, the Activities tab is not implemented. It should show the planning of the activities for each resource and the progress of the project.

    Tasks

    The Tasks tab is for now the most important one; it shows a tree view of the task hierarchy, and for each task:

    • the title of the task,
    • the role for that task,
    • the load (time in days),
    • the scheduling type,
    • the list of the constraints for the scheduling,
    • and the description of the task,

    each of which can be edited. You can easily drag and drop tasks inside the task tree, and add and delete tasks and constraints.

    See the attached screenshot of the projman-gui task panel.

    Scheduling

    In the Scheduling tab you can simply test your scheduling by clicking "START". If you expect the scheduling to take longer, you can increase the maximum time allowed to search for a solution.

    Known bugs

    • The begin-after-end-previous constraint does not work for a task having subtasks.
    • Deleting a task doesn't check for dependent tasks, so scheduling may no longer work.

  • hgview 1.1.0 released

    2009/09/25 by David Douard

    I am pleased to announce the latest release of hgview 1.1.0.

    What is it?

    For the ones at the back of the classroom near the radiator, let me remind you that hgview is a very helpful tool for daily work with the excellent DVCS Mercurial (which we use heavily at Logilab). It lets you easily and visually navigate your hg repository's revision graph. It is written in Python and PyQt.

    http://www.logilab.org/image/18210?vid=download

    What's new

    • user can now configure colors used in the diff area (which now default to white on black)
    • indicate current working directory position by a square node
    • add many other configuration options (listed when typing hg help hgview)
    • removed 'hg hgview-options' command in favor of 'hg help hgview'
    • add ability to choose which parent to diff with for merge nodes
    • dramatically improved UI behaviour (shortcuts)
    • improved help and make it accessible from the GUI
    • make it possible not to display the diffstat column of the file list (which can dramatically improve performance on big repositories)
    • standalone application: improved command line options
    • indicate working directory position in the graph
    • add auto-reload feature (when the repo is modified due to a pull, a commit, etc., hgview detects it, reloads the repo and updates the graph)
    • fix many bugs, especially the file log navigator should now display the whole graph

    Download and installation

    The source code is available as a tarball, or using our public hg repository of course.

    To use it from the sources, you just have to add a line in your .hgrc file, in the [extensions] section:

    hgext.hgview=/path/to/hgview/hgext/hgview.py

    Debian and Ubuntu users can also easily install hgview (and Logilab's other free software tools) using our deb package repositories.


  • Using tempfile.mkstemp correctly

    2009/09/10

    The mkstemp function in the tempfile module returns a tuple of 2 values:

    • an OS-level handle to an open file (as would be returned by os.open())
    • the absolute pathname of that file.

    I often see code using mkstemp only to get a filename for a temporary file, following a pattern such as:

    from tempfile import mkstemp
    import os
    
    def need_temp_storage():
        _, temp_path = mkstemp()
        os.system('some_command --output %s' % temp_path)
        file = open(temp_path, 'r')
        data = file.read()
        file.close()
        os.remove(temp_path)
        return data
    

    This seems to work fine, but there is a bug hiding in there. The bug will show up on Linux if you call this function many times in a long-running process, and on the first call on Windows: we have leaked a file descriptor.

    The first element of the tuple returned by mkstemp is typically an integer used to refer to a file by the OS. In Python, not closing a file is usually no big deal because the garbage collector will ultimately close the file for you, but here we are not dealing with file objects, but with OS-level handles. The interpreter sees an integer and has no way of knowing that the integer is connected to a file. On Linux, calling the above function repeatedly will eventually exhaust the available file descriptors. The program will stop with:

    IOError: [Errno 24] Too many open files: '/tmp/tmpJ6g4Ke'
    

    On Windows, it is not possible to remove a file that is still open, and you will get:

    Windows Error [Error 32]
    

    Fixing the above function requires closing the file descriptor using os.close():

    from tempfile import mkstemp
    import os
    
    def need_temp_storage():
        fd, temp_path = mkstemp()
        os.system('some_command --output %s' % temp_path)
        file = open(temp_path, 'r')
        data = file.read()
        file.close()
        os.close(fd)
        os.remove(temp_path)
        return data
    

    If you need your process to write directly to the temporary file, you don't need to call os.write(fd, data). The function os.fdopen(fd) will return a Python file object using the same file descriptor. Closing that file object will close the OS-level file descriptor.
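    For instance, a minimal sketch using only the standard library:

```python
import os
from tempfile import mkstemp

fd, temp_path = mkstemp()
# os.fdopen wraps the OS-level descriptor in a Python file object;
# closing the file object also closes the underlying descriptor.
f = os.fdopen(fd, 'w')
f.write('hello from the temporary file')
f.close()

# The descriptor is closed, so the file can be read back and removed
# safely, on Windows as well as on Linux.
with open(temp_path) as g:
    data = g.read()
os.remove(temp_path)
print(data)  # -> hello from the temporary file
```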


  • You can now register on our sites

    2009/09/03 by Arthur Lutz

    With the new version of CubicWeb deployed on our "public" sites, we would like to welcome a new (much awaited) functionality: you can now register directly on our websites. Getting an account will give you access to a bunch of functionalities:

    http://farm1.static.flickr.com/53/148921611_eadce4f5f5_m.jpg
    • registering to a project's activity will get you automated email reports of what is happening on that project
    • you can directly add tickets on projects instead of talking about it on the mailing lists
    • you can bookmark content
    • tag stuff
    • and much more...

    This is also a way of testing out the CubicWeb framework (in this case the forge cube), which you can take home and host yourself (Debian recommended). Just click on the "register" link at the top right.

    Photo by wa7son under creative commons.

