Blog entries

  • Présentation de Gendb à PyconFR

    2008/05/27 by Andre Espaze

    Ma présentation de 30 minutes est disponible à l'adresse suivante :

    http://fr.pycon.org/presentations_2008/andre-espaze-recherche-de-gene.pdf
    http://www.logilab.org/image/5136?vid=download

    La journée s'est bien passée, j'ai eu quelques questions. Un chercheur en génétique a demandé si le projet serait continué sur la recherche des peptides (car ce sont les gènes qui codent les peptides d'après ce que j'ai compris). J'ai transféré cette demande à Eric Eveno, je n'ai pas de réponse pour l'instant. Normalement, la vidéo devrait être disponible sur :

    http://fr.pycon.org/

    Vous pouvez me contactez si vous avez des questions : andre.espaze@logilab.fr


  • Flying to Google I/O

    2008/05/27 by Arthur Lutz
    http://code.google.com/images/io_logo_sm.gif http://code.google.com/appengine/images/appengine_lowres.jpg

    Three of us from Logilab are going to San Francisco to listen, share and discuss at Google I/O.

    It's a two day developer gathering in San Francisco, with various talks about google technologies : http://code.google.com/events/io/

    We're hoping to show and talk about LAX (http://lax.logilab.org) which uses Google AppEngine


  • We're going to Europython'08

    2008/07/02 by Arthur Lutz
    http://europython.org/euro/img/europython.png

    Hey,

    We've decided to go to Europython this year. We're obviously going to give a talk about the exciting things we're doing with LAX and GoogleAppEngine. We're on wednesday at midday in the alfa room, check out the schedule here. Since we think it's important that these events take place, we're also chipping in and sponsoring the event.

    We hope to see you there. Drop us a note if you want to meet up.


  • Venez nous rendre visite à Solution Linux 2009

    2009/03/31 by Arthur Lutz
    http://www.solutionslinux.fr/images/index_07.jpg

    Nous sommes dès ce matin, pendant 3 jours, présents au salon Solutions Linux 2009 au stand du pôle de compétition System@tic dont nous faisons parti. C'est le stand B4/B8, assez prêt de l'entrée sur la gauche (détails).

    Nous allons présenter CubicWeb à plusieurs reprises sur le stand, ainsi que lors des conférences sur le Web2 ce mardi 31 mars de 14h à 17h30 :

    • Adrien présentera "Simile Widgets, des composants de haut niveau pour IHM web"
    • Sylvain présentera "Cubic 3.0 - une plateforme pour les applications web sémantique"

    pour plus de détails consultez le programme.


  • Semantic web technology conference 2009

    2009/06/17 by Sandrine Ribeau
    The semantic web technology conference is taking place every year in San Jose, California. It is meant to be the world's symposium on the business of semantic technologies. Essentially here we discuss about semantic search, how to improve access to the data and how we make sense of structured, but mainly unstructured content. Some exhibitors were more NLP oriented, concepts extraction (such as SemanticV), others were more focused on providing a scalable storage (essentially RDF storage). Most of the solutions includes a data aggregator/unifier in order to combine multi-sources data into a single storage from which ontologies could be defined. Then on top of that is the enhanced search engine. They concentrate on internal data within the enterprise and not that much about using the Web as a resource. For those who built a web application on top of the data, they choosed Flex as their framework (Metatomix).
    From all the exhibitors, the ones that kept my attention were The Anzo suite (open source project), ORDI and Allegrograph RDF store.
    Developped by Cambridge Semantics, in Java, Anzo suite, especially, Anzo on the web and Anzo collaboration server, is the closest tools to CubicWeb, providing a multi source data server and an AJAX/HTML interface to develop semantic web applications, customize views of the data using a templating language. It is available in open source. The feature that I think was interesting is an assistant to load data into their application that then helps the users define the data model based on that data. The internal representation of the content is totally transparent to the user, types are inferred by the application, as well as relations.
    RDF Resource Description Framework IconI did not get a demo of ORDI, but it was just mentionned to me as an open source equivalent to CubicWeb, which I am not too sure about after looking at their web site. It does data integration into RDF.
    Allegrograph RDF store is a potential candidate for another source type in CubicWeb . It is already supported by Jena and Sesame framework. They developped a Python client API to attract pythonist in the Java world.
    They all agreed on one thing : the use of SPARQL should be the standard query language. I quickly heard about Knowledge Interface Format (KIF) which seems to be an interesting representation of knowledge used for multi-lingual applications. If there was one buzz word to recall from the conference, I would choose ontology :)

  • EuroPython 2009

    2009/07/06 by Nicolas Chauvat
    http://www.logilab.org/file/9580/raw/europython_logo.png

    Once again Logilab sponsored the EuroPython conference. We would like to thank the organization team (especially John Pinner and Laura Creighton) for their hard work. The Conservatoire is a very central location in Birmingham and walking around the city center and along the canals was nice. The website was helpful when preparing the trip and made it easy to find places where to eat and stay. The conference program was full of talks about interesting topics.

    I presented CubicWeb and spent a large part of my talk explaining what is the semantic web and what features we need in the tools we will use to be part of that web of data. I insisted on the fact that CubicWeb is made of two parts, the web engine and the data repository, and that the repository can be used without the web engine. I demonstrated this with a TurboGears application that used the CubicWeb repository as its persistence layer. RQL in TurboGears! See my slides and Reinout Van Rees' write-up.

    Christian Tismer took over the development of Psyco a few months ago. He said he recently removed some bugs that were show stoppers, including one that was generating way too many recompilations. His new version looks very promising. Performance improved, long numbers are supported, 64bit support may become possible, generators work... and Stackless is about to be rebuilt on top of Psyco! Psyco 2.0 should be out today.

    I had a nice chat with Cosmin Basca about the Semantic Web. He suggested using Mako as a templating language for CubicWeb. Cosmin is doing his PhD at DERI and develops SurfRDF which is an Object-RDF mapper that wraps a SPARQL endpoint to provide "discoverable" objects. See his slides and Reinout Van Rees' summary of his talk.

    I saw a lightning talk about the Nagare framework which refuses to use templating languages, for the same reason we do not use them in CubicWeb. Is their h.something the right way of doing things? The example reminds me of the C++ concatenation operator. I am not really convinced with the continuation idea since I have been for years a happy user of the reactor model that's implemented in frameworks liked Twisted. Read the blog and documentation for more information.

    I had a chat with Jasper Op de Coul about Infrae's OAI Server and the work he did to manage RDF data in Subversion and a relational database before publishing it within a web app based on YUI. We commented code that handles books and library catalogs. Part of my CubicWeb demo was about books in DBpedia and cubicweb-book. He gave me a nice link to the WorldCat API.

    Souheil Chelfouh showed me his work on Dolmen and Menhir. For several design problems and framework architecture issues, we compared the solutions offered by the Zope Toolkit library with the ones found by CubicWeb. I will have to read more about Martian and Grok to make sure I understand the details of that component architecture.

    I had a chat with Martijn Faassen about packaging Python modules. A one sentence summary would be that the Python community should agree on a meta-data format that describes packages and their dependencies, then let everyone use the tool he likes most to manage the installation and removal of software on his system. I hope the work done during the last PyConUS and led by Tarek Ziadé arrived at the same conclusion. Read David Cournapeau's blog entry about Python Packaging for a detailed explanation of why the meta-data format is the way to go. By the way, Martijn is the lead developer of Grok and Martian.

    Godefroid Chapelle and I talked a lot about Zope Toolkit (ZTK) and CubicWeb. We compared the way the two frameworks deal with pluggable components. ZTK has adapters and a registry. CubicWeb does not use adapters as ZTK does, but has a view selection mechanism that required a registry with more features than the one used in ZTK. The ZTK registry only has to match a tuple (Interface, Class) when looking for an adapter, whereas CubicWeb's registry has to find the views that can be applied to a result set by checking various properties:

    • interfaces: all items of first column implement the Calendar Interface,
    • dimensions: more than one line, more than two columns,
    • types: items of first column are numbers or dates,
    • form: form contains key XYZ that has a value lower than 10,
    • session: user is authenticated,
    • etc.

    As for Grok and Martian, I will have to look into the details to make sure nothing evil is hinding there. I should also find time to compare zope.schema and yams and write about it on this blog.

    And if you want more information about the conference:


  • Logilab at OSCON 2009

    2009/07/27 by Sandrine Ribeau
    http://assets.en.oreilly.com/1/event/27/oscon2009_oscon_11_years.gif

    OSCON, Open Source CONvention, takes place every year and promotes Open Source for technology. It is one of the meeting hubs for the growing open source community. This was the occasion for us to learn about new projects and to present CubicWeb during a BAYPIGgies meeting hosted by OSCON.

    http://www.openlina.com/templates/rhuk_milkyway/images/header_red_left.png

    I had the chance to talk with some of the folks working at OpenLina where they presented LINA. LINA is a thin virtual layer that enables developers to write and compile code using ordinary Linux tools, then package that code into a single executable that runs on a variety of operating systems. LINA runs invisibly in the background, enabling the user to install and run LINAfied Linux applications as if they were native to that user's operating system. They were curious about CubicWeb and took as a challenge to package it with LINA... maybe soon on LINA's applications list.

    Two open sources projects catched my attention as potential semantic data publishers. The first one is Family search where they provide a tool to search for family history and genealogy. Also they are working to define a standard format to exchange citation with Open Library. Democracy Lab provide an application to collect votes and build geographic statitics based on political interests. They will at some point publish data semantically so that their application data could be consumed.

    It also was for us the occasion of introducing CubicWeb to the BayPIGgies folks. The same presentation as the one held at Europython 2009. I'd like to take the opportunity to answer a question I did not manage to answer at that time. The question was: how different is CubicWeb from Freebase Parallax in terms of interface and views filters? Before answering this question let's detail what Freebase Parallax is.

    Freebase Parallax provides a new way to browse and explore data in Freebase. It allows to browse data from a set of data to a related set of data. This interface enables to aggregate visualization. For instance, given the set of US presidents, different types of views could be applied, such as a timeline view, where the user could set up which start and end date to use to draw the timeline. So generic views (which applies to any data) are customizable by the user.

    http://res.freebase.com/s/f64a2f0cc4534b2b17140fd169cee825a7ed7ddcefe0bf81570301c72a83c0a8/resources/images/freebase-logo.png

    The search powered by Parallax is very similar to CubicWeb faceted search, except that Parallax provides the user with a list of suggested filters to add in addition to the default one, the user can even remove a filter. That is something we could think about for CubicWeb: provide a generated faceted search so that the user could decide which filters to choose.

    Parallax also provides related topics to the current data set which ease navigation between sets of data. The main difference I could see with the view filter offered by Parallax and CubicWeb is that Parallax provides the same views to any type of data whereas CubicWeb has specific views depending on the data type and generic views that applies to any type of data. This is a nice Web interface to browse data and it could be a good source of inspiration for CubicWeb.

    http://www.zgeek.com/forum/gallery/files/6/3/2/img_228_96x96.jpg

    During this talk, I mentionned that CubicWeb now understands SPARQL queries thanks to the fyzz parser.


  • EuroSciPy'09 (part 1/2): The Need For Speed

    2009/07/29 by Nicolas Chauvat
    http://www.logilab.org/image/9852?vid=download

    The EuroSciPy2009 conference was held in Leipzig at the end of July and was sponsored by Logilab and other companies. It started with three talks about speed.

    Starving CPUs

    In his keynote, Fransesc Alted talked about starving CPUs. Thirty years back, memory and CPU frequencies where about the same. Memory speed kept up for about ten years with the evolution of CPU speed before falling behind. Nowadays, memory is about a hundred times slower than the cache which is itself about twenty times slower than the CPU. The direct consequence is that CPUs are starving and spend many clock cycles waiting for data to process.

    In order to improve the performance of programs, it is now required to know about the multiple layers of computer memory, from disk storage to CPU. The common architecture will soon count six levels: mechanical disk, solid state disk, ram, cache level 3, cache level 2, cache level 1.

    Using optimized array operations, taking striding into account, processing data blocks of the right size and using compression to diminish the amount of data that is transfered from one layer to the next are four techniques that go a long way on the road to high performance. Compression algorithms like Blosc increase throughput for they strike the right balance between being fast and providing good compression ratios. Blosc compression will soon be available in PyTables.

    Fransesc also mentions the numexpr extension to numpy, and its combination with PyTables named tables.Expr, that nicely and easily accelerates the computation of some expressions involving numpy arrays. In his list of references, Fransesc cites Ulrich Drepper article What every programmer should know about memory.

    Using PyPy's JIT for science

    Maciej Fijalkowski started his talk with a general presentation of the PyPy framework. One uses PyPy to describe an interpreter in RPython, then generate the actual interpreter code and its JIT.

    Since PyPy is has become more of a framework to write interpreters than a reimplementation of Python in Python, I suggested to change its misleading name to something like gcgc the Generic Compiler for Generating Compilers. Maciej answered that there are discussions on the mailing list to split the project in two and make the implementation of the Python interpreter distinct from the GcGc framework.

    Maciej then focused his talk on his recent effort to rewrite in RPython the part of numpy that exposes the underlying C library to Python. He says the benefits of using PyPy's JIT to speedup that wrapping layer are already visible. He has details on the PyPy blog. Gaël Varoquaux added that David Cournapeau has started working on making the C/Python split in numpy cleaner, which would further ease the job of rewriting it in RPython.

    CrossTwine Linker

    Damien Diederen talked about his work on CrossTwine Linker and compared it with the many projects that are actively attacking the problem of speed that dynamic and interpreted languages have been dragging along for years. Parrot tries to be the über virtual machine. Psyco offers very nice acceleration, but currently only on 32bits system. PyPy might be what he calls the Right Approach, but still needs a lot of work. Jython and IronPython modify the language a bit but benefit from the qualities of the JVM or the CLR. Unladen Swallow is probably the one that's most similar to CrossTwine.

    CrossTwine considers CPython as a library and uses a set of C++ classes to generate efficient interpreters that make calls to CPython's internals. CrossTwine is a tool that helps improving performance by hand-replacing some code paths with very efficient code that does the same operations but bypasses the interpreter and its overhead. An interpreter built with CrossTwine can be viewed as a JIT'ed branch of the official Python interpreter that should be feature-compatible (and bug-compatible) with CPython. Damien calls he approach "punching holes in C substrate to get more speed" and says it could probably be combined with Psyco for even better results.

    CrossTwine works on 64bit systems, but it is not (yet?) free software. It focuses on some use cases to greatly improve speed and is not to be considered a general purpose interpreter able to make any Python code faster.

    More readings

    Cython is a language that makes writing C extensions for the Python language as easy as Python itself. It replaces the older Pyrex.

    The SciPy2008 conference had at least two papers talking about speeding Python: Converting Python Functions to Dynamically Compiled C and unPython: Converting Python Numerical Programs into C.

    David Beazley gave a very interesting talk in 2009 at a Chicago Python Users group meeting about the effects of the GIL on multicore machines.

    I will continue my report on the conference with the second part titled "Applications And Open Questions".


  • EuroSciPy 2010 schedule is out !

    2010/06/06 by Nicolas Chauvat
    https://www.euroscipy.org/data/logo.png

    The EuroSciPy 2010 conference will be held in Paris from july 8th to 11th at Ecole Normale Supérieure. Two days of tutorials, two days of conference, two interesting keynotes, a lightning talk session, an open space for collaboration and sprinting, thirty quality talks in the schedule and already 100 delegates registered.

    If you are doing science and using Python, you want to be there!


  • MiniDebConf Paris 2010

    2010/09/09 by Arthur Lutz
    http://france.debian.net/debian-france.png

    Debian France organise le 30 et 31 octobre prochain une minidebconf à Paris. Le wiki de la conférence est en train de s'étoffer, et pour le moment c'est là qu'il faut s'inscrire. À Logilab nous sommes utilisateurs et contributeurs de Debian, c'est donc naturellement que nous allons essayer d'aller participer à cette conférence. Alexandre Fayolle, développeur Debian ira assister (entre autres) à la présentation de Carl Chenet sur l'état de Python dans Debian.


  • SemWeb.Pro - first french Semantic Web conference, Jan 17/18 2011

    2010/09/20 by Nicolas Chauvat

    SemWeb.Pro, the first french conference dedicated to the Semantic Web will take place in Paris on January 17/18 2011.

    One day of talks, one day of tutorials.

    Want to grok the Web 3.0? Be there.

    Something you want to share? Call for papers ends on October 15, 2010.

    http://www.semweb.pro/semwebpro.png

  • Retour sur paris-web 2010

    2010/10/18 by Adrien Di Mascio
    http://www.paris-web.fr/telechargements/logotype-paris-web.png

    La semaine passée avait lieu Paris-Web 2010. C'était la 5ème édition de cet événement mais je n'avais pas eu l'occasion d'y assister les années précédentes.

    Tout d'abord, félicitations aux organisateurs pour l'organisation, notamment pour les sessions du grand amphithéâtre avec la traduction simultanée pour les conférences en anglais (j'avoue ne pas avoir testé le casque audio), les interprètes en langue des signes ainsi que la vélotypie. Un petit bémol toutefois : il n'y avait pas assez de prises de courant pour que les personnes présentes puissent recharger leur ordinateur !

    Quant aux conférences elles-mêmes, pas mal de choses orientées utilisabilité, design, accessibilité et HTML5. Bien que je n'aie pas eu le sentiment d'apprendre beaucoup de choses techniquement, en particulier sur HTML5 où le contenu des présentations se recoupait trop et n'apportait de mon point de vue pas grand chose de nouveau, les orateurs ont su animer et rendre vivante leur présentation. Je retiendrai en particulier trois présentations :

    • Let’s interface - how to make people as excited about tech as we are de Christian Heilmann que je résumerai (très) rapidement en disant : réutiliser les outils et les données qui sont disponibles, ne pas repartir de zéro et rajouter une petite couche qui offre une réelle valeur ajoutée, ce n'est pas forcément très compliqué. Parmi les exemples cités : http://isithackday.com/hacks/geo/addmap.html
    • Innover de 9 à 5 par Olivier Thereaux : ou comment créer des espaces d'échange pour faire émerger de nouvelles idées puis les transformer en projets. Au final, aucune solution concrète ou miracle évidemment, et il me semble que c'était un des buts ("pas de recette de cuisine"), mais je retiens que d'après son expérience, il faut des gens avec l'esprit hacker, entendre bidouilleur. Ensuite, toutes les idées qui émergent ne sont pas bonnes, toutes les bonnes idées n'aboutissent pas et si celui qui a l'idée n'est pas directement celui qui s'occupe de la concrétiser, il y a peu de chances que ça fonctionne. Enfin, il faut ne pas avoir les yeux plus gros que le ventre et tempérer ses envies en fonction du temps (ou moyens) que l'on pourra y accorder et y aller petit pas par petit pas.
    • HTML5 et ses amis par Paul Rouget : une très belle présentation avec des démonstrations HTML5 (version Firefox4) très bien choisies : utilisation des websockets pour diffuser les slides sur ordinateurs clients, utilisation de FileReader pour la prévisualisation d'images côté client et une belle démonstration des capacités WebGL !

    Je n'ai entendu que des retours positifs sur la macrographie de la page Web à laquelle je n'ai malheureusement pas assisté personnellement mais d'autres logilabiens y étaient. J'ai également noté l'absence du web sémantique, ou alors je n'étais pas dans le bon amphi. En tout cas, tout ça m'a remotivé pour jouer avec HTML5 dans cubicweb. J'ai d'ailleurs commencé à faire une démo websocket dans cubicweb, affaire à suvire...


  • Paris Web 2010 - Le texte et le web

    2010/10/20

    J'ai eu la chance d'assister à l'ensemble des conférences données à Paris Web sur le rôle du texte et de la typographie dans le web d'aujourd'hui.

    La présentation Le texte: parent pauvre du web ? (par Jean-Marc Hardy) rappela les points les plus pertinents sur l'usage des éléments textuels par rapport à l'image.

    Outre l'exemple classique sur les outils de référencement qui ne savent aujourd'hui utiliser que le texte brut d'une page (au grand dam des "flasheurs"), D'autres résultats d'études furent donnés:

    • le taux de suivi des liens publicitaires textuels (ceux de Google par exemple sont 10 fois plus efficaces que les bannières classiques qui ont un taux de suivi de 2‰).
    • des cartes de température montrent que les titres (surtout si ceux-ci sont inférieurs à 11 caractères) restent très structurants pour la lecture et le prise d'informations à la différence des images qui restent floues pour le cerveau pendant les premiers dixièmes de secondes
    • l'usage de phrases explicatives plutôt que des infinitifs vagues dans les boutons de formulaires rassurent l'utilisateur ѕur des étapes cruciales d'enregistrement.

    Un dernier contre-exemple étonnant fut donné au sujet d'une boutique en ligne qui en voulant mettre en valeur la corbeille d'achats par une image très colorée a provoqué l'effet inverse: un sentiment de rejet des utilisateurs qui croyaient voir alors une publicité ;-)

    Jean-Marc Hardy a évoqué brièvement le rôle du texte dans l'accessibilité mais a préféré laisser cette partie à d'autres orateurs de Paris Web (l'accessibilité étant à l'honneur cette année).

    J'aurais bien aimé avoir son avis sur l'esthétique souvent utilisée pour les sites dits Web2.0 qui se rapprochent finalement assez bien de ses recommandations.

    Le deuxième jour, j'ai particulièrement apprécié les sujets autour de la typographie et le rhythme des pages...


  • Paris Web 2010 - Spécial typographie

    2010/10/20

    Suite de la première journée.

    Le lendemain, j'ai pu assister à La typographie comme outil de design (par David Rault) qui me semble être une sensibilisation indispensable à tout développeur web. Une introduction efficace et complète sur les familles de polices (classification VOX-ATypI) et les types d'effets produits sur le lecteur. Il faut voir la typographie comme l'équivalent de l'intonation à l'oral. La police apporte un autre contexte à la compréhension du texte. Pour finir, David Rault a parcouru les "web fonts" les plus connues tout en prenant soin de donner son avis d'expert ainsi que des détails historiques croustillants.

    Les organisateurs de Paris Web avaient ensuite judicieusement programmé La macrotypographie de la page Web (par Anne-Sophie Fradier). Après quelques explications historiques sur l'importance du support sur le format, plusieurs techniques de bases ont été présentées, comme par exemple l'usage des grilles pour la construction des pages. Celles-ci fixent un cadre à la créativité et permettent de respecter plus facilement des pauses visuelles pour retrouver un confort de lecture indispensable. L'interlignage doit être important (140% du corps), le fer à gauche et le drapeau à droite et un corps de texte suffisamment gros pour éviter des changements de taille de police intempestifs (qui risquent de "casser" la mise en page).

    Un des sujets intéressants mais souvent méconnu est le respect de la ligne de base dans la construction du flux vertical du texte dans un document. C'est justement sur ce principe et en se basant sur cet article que plusieurs personnes à Logilab ont commencé à implanter des "règles de rythmes" dans le framework CubicWeb lors d'un sprint en mai dernier. Dernier conseil à retenir d'une typographe, il faut donc toujours essayer de "retomber sur ses pattes" :-)

    Une question pertinente fut posée à la fin de la présentation sur la mode des "design fluides"; c'est-à-dire des mises en page calculées tout en proportion plutôt que fixées en pixels. La réponse donnée ne peut être absolue car ceci dépend essentiellement de la créativité et de l'originalité de l'auteur du site ; même si Anne-Sophie Fradier préconise quand même de garder le contrôle sur la largeur (la hauteur étant souvent imposée par le navigateur).

    Conclusion

    L'usage de WOFF, les nouveautés apportées par CSS3 et les effets rendus possibles par javascript vont permettre de créer un nouvel univers au texte et à sa mise en forme. Nous pouvons espérer que le confort de lecture et la lisibilité des textes vont devenir de véritables critères de qualité. Il me paraît aujourd'hui évident à l'issu de ces présentations que la typographie va petit à petit s'imposer comme une nouvelle compétence du web designer de demain.