
Blog entries

  • hgview 1.1.0 released

    2009/09/25 by David Douard

    I am pleased to announce the latest release of hgview 1.1.0.

    What is it?

    For those at the back of the classroom near the radiator, let me remind you that hgview is a very helpful tool for daily work with the excellent DVCS Mercurial (which we use heavily at Logilab). It lets you navigate the revision graph (graphlog) of your hg repository easily and visually. It is written in Python and PyQt.

    http://www.logilab.org/image/18210?vid=download

    What's new

    • user can now configure colors used in the diff area (and they now default to white on black)
    • indicate current working directory position by a square node
    • add many other configuration options (listed when typing hg help hgview)
    • removed 'hg hgview-options' command in favor of 'hg help hgview'
    • add ability to choose which parent to diff with for merge nodes
    • dramatically improved UI behaviour (shortcuts)
    • improved help and make it accessible from the GUI
    • make it possible not to display the diffstat column of the file list (which can dramatically improve performance on big repositories)
    • standalone application: improved command line options
    • indicate working directory position in the graph
    • add auto-reload feature (when the repo is modified due to a pull, a commit, etc., hgview detects it, reloads the repo and updates the graph)
    • fix many bugs, especially the file log navigator should now display the whole graph

    Download and installation

    The source code is available as a tarball, or using our public hg repository of course.

    To use it from the sources, you just have to add a line in your .hgrc file, in the [extensions] section:

    hgext.hgview=/path/to/hgview/hgext/hgview.py

    Debian and Ubuntu users can also easily install hgview (and Logilab's other free software tools) using our deb package repositories.


  • Pylint needs you

    2009/09/17

    After several months at a near standstill, Sylvain was able to publish releases last night that fix a number of bugs in pylint and astng ([1] and [2]).

    The fact remains that at Logilab we lack the time to shrink the pile of open tickets in the pylint tracker. If you take a look at the Tickets tab, you will find a large number of pending bugs and must-have features (some perhaps a little less so than others...). It is already possible to contribute by using mercurial to provide patches, or by reporting bugs (aaaaaaaaaarg! more tickets!), and some of you already do: thank you.

    Now we have been wondering what we could do to move Pylint forward, and our first ideas are:

    • organize a short sprint of about 3 days
    • organize "ticket killing" days, as is done on various OSS projects

    But for this to be useful, we need your help. So here are a few questions:

    • would you take part in a sprint at Logilab (in Paris, France), which would let us meet, teach you a lot about how Pylint works, and work together on tickets that would help you in your work?
    • if France is too far away, where would suit you?
    • would you be willing to join us on Logilab's jabber server or on IRC to take part in a ticket hunt (at a date to be determined)? If so, how well do you know the internals of Pylint and astng?

    You can answer by commenting on this blog (remember to register using the link at the top right of this page) or by writing to sylvain.thenault@logilab.fr. If we get enough positive answers, we will organize something.


  • Using tempfile.mkstemp correctly

    2009/09/10

    The mkstemp function in the tempfile module returns a tuple of 2 values:

    • an OS-level handle to an open file (as would be returned by os.open())
    • the absolute pathname of that file.

    I often see code using mkstemp only to get the filename to the temporary file, following a pattern such as:

    from tempfile import mkstemp
    import os
    
    def need_temp_storage():
        _, temp_path = mkstemp()
        os.system('some_commande --output %s' % temp_path)
        file = open(temp_path, 'r')
        data = file.read()
        file.close()
        os.remove(temp_path)
        return data
    

    This seems to work fine, but there is a bug hiding in there. It will show up on Linux if you call this function many times in a long-running process, and on the first call on Windows: we have leaked a file descriptor.

    The first element of the tuple returned by mkstemp is typically an integer used to refer to a file by the OS. In Python, not closing a file is usually no big deal because the garbage collector will ultimately close the file for you, but here we are not dealing with file objects, but with OS-level handles. The interpreter sees an integer and has no way of knowing that the integer is connected to a file. On Linux, calling the above function repeatedly will eventually exhaust the available file descriptors. The program will stop with:

    IOError: [Errno 24] Too many open files: '/tmp/tmpJ6g4Ke'
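
    The leak can be observed directly. The following is a minimal sketch, not part of the original function: it keeps the descriptor that the pattern above throws away and checks with os.fstat() that the descriptor is still open after the file object has been closed. The helper name leaky_temp_read is made up for the illustration, and a plain write stands in for the external command.

```python
import os
from tempfile import mkstemp


def leaky_temp_read():
    """Same pattern as above, but return the descriptor we would have lost."""
    fd, temp_path = mkstemp()
    with open(temp_path, 'w') as out:   # stands in for the external command
        out.write('some data')
    with open(temp_path, 'r') as file:  # a second, independent file object
        data = file.read()
    return fd, temp_path, data


fd, temp_path, data = leaky_temp_read()
os.fstat(fd)          # succeeds: the OS-level handle is still open
os.close(fd)          # what the buggy version never does
try:
    os.fstat(fd)
    still_open = True
except OSError:       # raised once the descriptor has been closed
    still_open = False
os.remove(temp_path)
```

    Closing the file object returned by open() has no effect on the integer returned by mkstemp; only os.close() releases it.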
    

    On Windows, it is not possible to remove a file which is still opened by another process, and you will get:

    Windows Error [Error 32]
    

    Fixing the above function requires closing the file descriptor using os.close():

    from tempfile import mkstemp
    import os
    
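    def need_temp_storage():
        # Keep the OS-level descriptor returned by mkstemp so it can be closed.
        fd, temp_path = mkstemp()
        os.system('some_commande --output %s' % temp_path)
        file = open(temp_path, 'r')
        data = file.read()
        file.close()
        # Close the descriptor before removing the file (required on Windows).
        os.close(fd)
        os.remove(temp_path)
        return data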
    

    If you need your process to write directly to the temporary file, you don't need to call os.write(fd, data). The function os.fdopen(fd) will return a Python file object using the same file descriptor. Closing that file object will close the OS-level file descriptor.
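
    As a minimal sketch of that approach (the helper name write_temp_data and the sample data are made up for the illustration), the descriptor can be wrapped in a file object and closed through it:

```python
import os
from tempfile import mkstemp


def write_temp_data(data):
    """Write data to a fresh temporary file and return its path."""
    fd, temp_path = mkstemp()
    # os.fdopen() wraps the existing descriptor instead of opening the file
    # a second time; leaving the with block closes the descriptor as well.
    with os.fdopen(fd, 'w') as file:
        file.write(data)
    return temp_path


temp_path = write_temp_data('some data')
with open(temp_path, 'r') as file:
    content = file.read()
os.remove(temp_path)
```

    Since the file object owns the descriptor, no separate call to os.close() is needed here.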


  • You can now register on our sites

    2009/09/03 by Arthur Lutz

    With the new version of CubicWeb deployed on our "public" sites, we would like to welcome a new (much awaited) feature: you can now register directly on our websites. Getting an account will give you access to a bunch of features:

    http://farm1.static.flickr.com/53/148921611_eadce4f5f5_m.jpg
    • registering to a project's activity will get you automated email reports of what is happening on that project
    • you can directly add tickets on projects instead of talking about it on the mailing lists
    • you can bookmark content
    • tag stuff
    • and much more...

    This is also a way of testing out the CubicWeb framework (in this case the forge cube), which you can take home and host yourself (Debian recommended). Just click on the "register" link on the top right, or here.

    Photo by wa7son under creative commons.


  • New pylint/astng release, but... pylint needs you !

    2009/08/27 by Sylvain Thenault

    After several months with no time to fix or enhance pylint besides answering email and filing tickets, I finally tackled some tasks last night and published bug-fix releases ([1] and [2]).

    The problem is that we don't have enough free time at Logilab to lower the number of tickets in the pylint tracker page. If you take a look at the ticket tab, you'll see a lot of pending bugs and must-have features (well, and some other less necessary ones...). You can already easily contribute thanks to the great mercurial dvcs, and some of you do, either by providing patches or by reporting bugs (more tickets, iiirk! ;) Thank you all btw!!

    Now I was wondering what could be done to take pylint further, and the first ideas that came to my mind were:

    • do a ~3-day sprint
    • do some 'ticket killing' days, as done in some popular OSS projects

    But for this to be useful, we need your support, so here are some questions for you:

    • would you come to a sprint at Logilab (in Paris, France), so you can meet us, learn a lot about pylint, and work on tickets you wish to have in pylint?
    • if France is too far away for most people, would you have another location to propose?
    • would you be on jabber for a ticket killing day, provided it fits your schedule? If so, what's your knowledge of pylint/astng internals?

    You may answer by adding a comment to this blog (please register first by using the link at the top right of this page) or by mail to sylvain.thenault@logilab.fr. If we get enough positive answers, we'll take the time to organize such a thing.


  • Looking for a Windows Package Manager

    2009/07/31 by Nicolas Chauvat
    http://www.logilab.org/image/9862?vid=download

    As said in a previous article, I am convinced that part of the motivation for making package sub-systems like the Python one, which includes distutils, setuptools, etc, is that Windows users and Mac users never had the chance to use a tool that properly manages the configuration of their computer system. They just do not know what it would be like if they had at least a good package management system and do not miss it in their daily work.

    I looked for Windows package managers that claim to provide features similar to Debian's dpkg+apt-get and here is what I found in alphabetical order.

    AppSnap

    AppSnap is written in Python and uses wxPython, PyCurl and PyYAML. It is packaged using Py2Exe, compressed with UPX and installed using NSIS.

    It has not seen activity in the svn or on its blog since the end of 2008.

    Appupdater

    Appupdater provides functionality similar to apt-get or yum. It automates the process of installing and maintaining up to date versions of programs. It claims to be fully customizable and is licensed under the GPL.

    It seems under active development at SourceForge.

    QWinApt

    QWinApt is a Synaptic clone written in C# that has not evolved since September 2007.

    WinAptic

    WinAptic is another Synaptic clone, written this time in Pascal, that has not evolved since the end of 2007.

    Win-Get

    Win-get is an automated install system and software repository for Microsoft Windows. It is similar to apt-get: it connects to a link repository, finds an application and downloads it before performing the installation routine (silent or standard) and deleting the install file.

    It is written in Pascal and is set up as a SourceForge project, but not much has been done lately.

    WinLibre

    WinLibre is a Windows free software distribution that provides a repository of packages and a tool to automate and simplify their installation.

    WinLibre was selected for Google Summer of Code 2009.

    ZeroInstall

    ZeroInstall started as a "non-admin" package manager for Linux distributions and is now extending its reach to work on Windows.

    Conclusion

    I have not used any of these tools; the above is just the result of some time spent searching the web.

    A more limited approach is to notify the user of the newer versions:

    • App-Get will show you a list of your installed applications. When an update is available for one of them, it will be highlighted and you will be able to update that application in seconds.
    • GetIt is not an application-getter/installer. When you want to install a program, you can look it up in GetIt to choose which program to install from a master list of all programs made available by the various apt-get clones.

    The appupdater project also compares itself to the programs automating the installation of software on Windows.

    Some columnists expect the creation of application stores replicating the iPhone one.

    I once read about a project to get the Windows kernel into the Debian distribution, but cannot find any trace of it... Remember that Debian is not limited to the Linux kernel, so why not think about a very improbable apt-get install windows-vista?


  • The Configuration Management Problem

    2009/07/31 by Nicolas Chauvat
    http://www.logilab.org/image/9863?vid=download

    Today I felt like summing up my opinion on a topic that was discussed this year on the Python mailing lists, at PyCon-FR, at EuroPython and EuroSciPy... packaging software! Let us discuss the two main use cases.

    The first use case is to maintain computer systems in production. A trait of production systems is that they cannot afford failures and are often deployed on a large scale. That leaves little room for manually fixing problems: either the installation process works or the system fails. Reaching that level of quality takes a lot of work.

    The second use case is to facilitate the life of software developers and computer users by making it easy for them to give a try to new pieces of software without much work.

    The first use case has to be addressed as a configuration management problem. There is no way around it. The best way I know of managing the configuration of a computer system is called Debian. Its package format and its tool chain provide a very extensive and efficient set of features for system development and maintenance. Of course it is not perfect and there are missing bits and open issues that could be tackled, like the dependencies between hardware and software. For example, nothing will prevent you from installing on your Debian system a version of a driver that conflicts with the version of the chip found in your hardware. That problem could be solved, but I do not think the Debian project is there yet, and I do not count it as a reason to reject Debian since I have not seen any competitor at the same level as Debian.

    The second use case is kind of a trap, for it concerns most computer users and most of those users are either convinced the first use case has nothing in common with their problem or convinced that the solution is easy and requires little work.

    The situation is made more complicated by the fact that most of those users never had the chance to use a system with proper package management tools. They simply do not know the difference and do not feel like they are missing anything when using their system-that-comes-with-a-windowing-system-included.

    Since many software developers have never had to maintain computer systems in production (often considered a lower sysadmin job) and never developed packages for computer systems that are maintained in production, they tend to think that the operating system and their software are perfectly decoupled. They have no problem trying to create a new layer on top of existing operating systems and transforming an operating system issue (managing software installation) into a programming language issue (see CPAN, Python eggs and so many others).

    Creating a sub-system specific to a language and hosting it on an operating system works well as long as the language boundary is not crossed and there is no competition between the sub-system and the system itself. In the Python world, distutils, setuptools, eggs and the like more or less work with pure Python code. They create a square wheel that was made round years ago by dpkg+apt-get and others, but they help a lot of their users do something they would not know how to do another way.

    A wall is quickly hit though, as the approach becomes overly complex as soon as they try to depend on things that do not belong to their Python sub-system. What if your application needs a database? What if your application needs to link to libraries? What if your application needs to reuse data from or provide data to other applications? What if your application needs to work on different architectures?

    The software developers that never had to maintain computer systems in production wish these tasks were easy. Unfortunately they are not easy and cannot be. As I said, there is no way around configuration management for the one who wants a stable system. Configuration management requires both project management work and software development work. One can have a system where packaging software is less work, but that comes at the price of reduced stability, functionality and ease of maintenance.

    Since neither of the two use cases will disappear any time soon, the only solution to the problem is to share as much data as possible between the different tools and let everyone decide how to install software on their own computer system.

    Some links to continue your readings on the same topic:


  • EuroSciPy'09 (part 1/2): The Need For Speed

    2009/07/29 by Nicolas Chauvat
    http://www.logilab.org/image/9852?vid=download

    The EuroSciPy2009 conference was held in Leipzig at the end of July and was sponsored by Logilab and other companies. It started with three talks about speed.

    Starving CPUs

    In his keynote, Francesc Alted talked about starving CPUs. Thirty years back, memory and CPU frequencies were about the same. Memory speed kept up for about ten years with the evolution of CPU speed before falling behind. Nowadays, memory is about a hundred times slower than the cache, which is itself about twenty times slower than the CPU. The direct consequence is that CPUs are starving and spend many clock cycles waiting for data to process.

    In order to improve the performance of programs, it is now required to know about the multiple layers of computer memory, from disk storage to CPU. The common architecture will soon count six levels: mechanical disk, solid state disk, ram, cache level 3, cache level 2, cache level 1.

    Using optimized array operations, taking striding into account, processing data blocks of the right size and using compression to reduce the amount of data that is transferred from one layer to the next are four techniques that go a long way on the road to high performance. Compression algorithms like Blosc increase throughput because they strike the right balance between being fast and providing good compression ratios. Blosc compression will soon be available in PyTables.
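
    The idea behind the last of these techniques can be sketched with the standard library. In this illustration zlib stands in for Blosc (an assumption made to keep the example free of dependencies; zlib is not the compressor discussed in the talk and is considerably slower), and the block size and sample data are arbitrary. The sketch compresses data block by block and compares the number of bytes that would have to travel from one memory layer to the next.

```python
import array
import zlib

BLOCK_SIZE = 64 * 1024  # bytes per block; arbitrary value for the sketch

# Regular numerical data (hypothetical sample): 200000 small integers.
values = array.array('i', (i % 100 for i in range(200000)))
raw = values.tobytes()


def compress_blocks(data, block_size=BLOCK_SIZE, level=1):
    """Compress data block by block with a fast (low) compression level."""
    return [zlib.compress(data[start:start + block_size], level)
            for start in range(0, len(data), block_size)]


blocks = compress_blocks(raw)
compressed_size = sum(len(block) for block in blocks)
restored = b''.join(zlib.decompress(block) for block in blocks)

print('raw: %d bytes, compressed: %d bytes' % (len(raw), compressed_size))
```

    Whether this pays off depends on the compressor being fast enough that decompression costs less than the transfer time it saves, which is the balance the talk attributes to Blosc.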

    Francesc also mentioned the numexpr extension to numpy, and its combination with PyTables named tables.Expr, which nicely and easily accelerates the computation of some expressions involving numpy arrays. In his list of references, Francesc cites Ulrich Drepper's article What every programmer should know about memory.

    Using PyPy's JIT for science

    Maciej Fijalkowski started his talk with a general presentation of the PyPy framework. One uses PyPy to describe an interpreter in RPython, then generate the actual interpreter code and its JIT.

    Since PyPy has become more of a framework for writing interpreters than a reimplementation of Python in Python, I suggested changing its misleading name to something like gcgc, the Generic Compiler for Generating Compilers. Maciej answered that there are discussions on the mailing list to split the project in two and make the implementation of the Python interpreter distinct from the GcGc framework.

    Maciej then focused his talk on his recent effort to rewrite in RPython the part of numpy that exposes the underlying C library to Python. He says the benefits of using PyPy's JIT to speedup that wrapping layer are already visible. He has details on the PyPy blog. Gaël Varoquaux added that David Cournapeau has started working on making the C/Python split in numpy cleaner, which would further ease the job of rewriting it in RPython.

    CrossTwine Linker

    Damien Diederen talked about his work on CrossTwine Linker and compared it with the many projects that are actively attacking the problem of speed that dynamic and interpreted languages have been dragging along for years. Parrot tries to be the über virtual machine. Psyco offers very nice acceleration, but currently only on 32-bit systems. PyPy might be what he calls the Right Approach, but still needs a lot of work. Jython and IronPython modify the language a bit but benefit from the qualities of the JVM or the CLR. Unladen Swallow is probably the one that's most similar to CrossTwine.

    CrossTwine considers CPython as a library and uses a set of C++ classes to generate efficient interpreters that make calls to CPython's internals. CrossTwine is a tool that helps improve performance by hand-replacing some code paths with very efficient code that does the same operations but bypasses the interpreter and its overhead. An interpreter built with CrossTwine can be viewed as a JIT'ed branch of the official Python interpreter that should be feature-compatible (and bug-compatible) with CPython. Damien calls his approach "punching holes in C substrate to get more speed" and says it could probably be combined with Psyco for even better results.

    CrossTwine works on 64-bit systems, but it is not (yet?) free software. It focuses on some use cases to greatly improve speed and is not to be considered a general purpose interpreter able to make any Python code faster.

    More readings

    Cython is a language that makes writing C extensions for the Python language as easy as Python itself. It replaces the older Pyrex.

    The SciPy2008 conference had at least two papers talking about speeding Python: Converting Python Functions to Dynamically Compiled C and unPython: Converting Python Numerical Programs into C.

    David Beazley gave a very interesting talk in 2009 at a Chicago Python Users group meeting about the effects of the GIL on multicore machines.

    I will continue my report on the conference with the second part titled "Applications And Open Questions".


  • Logilab at OSCON 2009

    2009/07/27 by Sandrine Ribeau
    http://assets.en.oreilly.com/1/event/27/oscon2009_oscon_11_years.gif

    OSCON, the Open Source CONvention, takes place every year and promotes Open Source for technology. It is one of the meeting hubs for the growing open source community. This was the occasion for us to learn about new projects and to present CubicWeb during a BayPIGgies meeting hosted by OSCON.

    http://www.openlina.com/templates/rhuk_milkyway/images/header_red_left.png

    I had the chance to talk with some of the folks working at OpenLina, where they presented LINA. LINA is a thin virtual layer that enables developers to write and compile code using ordinary Linux tools, then package that code into a single executable that runs on a variety of operating systems. LINA runs invisibly in the background, enabling the user to install and run LINAfied Linux applications as if they were native to that user's operating system. They were curious about CubicWeb and took it as a challenge to package it with LINA... maybe soon on LINA's applications list.

    Two open source projects caught my attention as potential semantic data publishers. The first one is Family search, which provides a tool to search for family history and genealogy. They are also working on defining a standard format to exchange citations with Open Library. The second, Democracy Lab, provides an application to collect votes and build geographic statistics based on political interests. They will at some point publish data semantically so that their application data can be consumed.

    It was also the occasion for us to introduce CubicWeb to the BayPIGgies folks, with the same presentation as the one given at EuroPython 2009. I'd like to take the opportunity to answer a question I did not manage to answer at that time. The question was: how different is CubicWeb from Freebase Parallax in terms of interface and view filters? Before answering this question, let's detail what Freebase Parallax is.

    Freebase Parallax provides a new way to browse and explore data in Freebase. It lets you browse from one set of data to a related set of data, and its interface supports aggregate visualization. For instance, given the set of US presidents, different types of views could be applied, such as a timeline view, where the user could choose which start and end dates to use to draw the timeline. So generic views (which apply to any data) are customizable by the user.

    http://res.freebase.com/s/f64a2f0cc4534b2b17140fd169cee825a7ed7ddcefe0bf81570301c72a83c0a8/resources/images/freebase-logo.png

    The search powered by Parallax is very similar to CubicWeb faceted search, except that Parallax provides the user with a list of suggested filters to add in addition to the default one; the user can even remove a filter. That is something we could think about for CubicWeb: provide a generated faceted search so that the user could decide which filters to choose.

    Parallax also provides topics related to the current data set, which eases navigation between sets of data. The main difference I could see between the view filters offered by Parallax and CubicWeb is that Parallax provides the same views for any type of data, whereas CubicWeb has specific views depending on the data type as well as generic views that apply to any type of data. This is a nice Web interface to browse data and it could be a good source of inspiration for CubicWeb.

    http://www.zgeek.com/forum/gallery/files/6/3/2/img_228_96x96.jpg

    During this talk, I mentioned that CubicWeb now understands SPARQL queries thanks to the fyzz parser.


  • Quizz WolframAlpha

    2009/07/10 by Nicolas Chauvat
    http://www.logilab.org/image/9609?vid=download

    Wolfram Alpha is a web front-end to a huge database of information covering very different topics ranging from mathematical functions to genetics, geography, astronomy, etc.

    When you search for a word, it will try to match it with one of the objects it has in its database and display all the information it has concerning that object. For example, it can tell you a lot about Halley's Comet, including where it is at the moment you submit the query. This is the main difference with, say, Wikipedia, which will know a lot about that comet in general, but is not meant to compute its location in the sky at the moment you enter your query.

    Searches are not limited to words. One can key in commands like weather in Paris in June 2009 or x^2+sin(x) and get results for those precise queries. The processing of the input query is far from bad, since it returns results to questions like what are the cities of France, but I would not call it state-of-the-art natural language processing: that query returns the largest cities instead of just the cities it knows about, and the question what are the smallest cities of France does not return any result. Natural language processing is a very difficult problem, though, especially when done in the open world, as is the case here with an engine available to the general public on the internet.

    For more examples, visit the WolframAlpha website, where you will also be able to post feature requests or, if you are a developer, get documentation about the WolframAlpha API and maybe use it as a web service in your application when you need to answer certain types of questions.

