Blog entries

  • LMGC90 Sprint at Logilab in March 2013

    2013/03/28 by Vladimir Popescu

    LMGC90 Sprint at Logilab

    At the end of March 2013, Logilab hosted a sprint on the LMGC90 simulation code in Paris.

    LMGC90 is an open-source software developed at the LMGC ("Laboratoire de Mécanique et Génie Civil" -- "Mechanics and Civil Engineering Laboratory") of the CNRS, in Montpellier, France. LMGC90 is devoted to contact mechanics and is, thus, able to model large collections of deformable or undeformable physical objects of various shapes, with numerous interaction laws. LMGC90 also allows for multiphysics coupling.

    Sprint Participants

    https://www.logilab.org/file/143585/raw/logo_LMGC.jpg https://www.logilab.org/file/143749/raw/logo_SNCF.jpg https://www.logilab.org/file/143750/raw/logo_LaMSID.jpg https://www.logilab.org/file/143751/raw/logo_LOGILAB.jpg

    More than ten hackers joined in from:

    • the LMGC, which leads LMCG90 development and aims at constantly improving its architecture and usability;
    • the Innovation and Research Department of the SNCF (the French state-owned railway company), which uses LMGC90 to study railway mechanics, and more specifically, the ballast;
    • the LaMSID ("Laboratoire de Mécanique des Structures Industrielles Durables", "Laboratory for the Mechanics of Ageing Industrial Structures") laboratory of the EDF / CNRS / CEA , which has an strong expertise on Code_ASTER and LMGC90;
    • Logilab, as the developer, for the SNCF, of a CubicWeb-based platform dedicated to the simulation data and knowledge management.

    After a great introduction to LMGC90 by Frédéric Dubois and some preliminary discussions, teams were quickly constituted around the common areas of interest.

    Enhancing LMGC90's Python API to build core objects

    As of the sprint date, LMGC90 is mainly developed in Fortran, but also contains Python code for two purposes:

    • Exposing the Fortran functions and subroutines in the LMGC90 core to Python; this is achieved using Fortran 2003's ISO_C_BINDING module and Swig. These Python bindings are grouped in a module called ChiPy.
    • Making it easy to generate input data (so called "DATBOX" files) using Python. This is done through a module called Pre_LMGC.

    The main drawback of this approach is the double modelling of data that this architecture implies: once in the core and once in Pre_LMGC.

    It was decided to build a unique user-level Python layer on top of ChiPy, that would be able to build the computational problem description and write the DATBOX input files (currently achieved by using Pre_LMGC), as well as to drive the simulation and read the OUTBOX result files (currently by using direct ChiPy calls).

    This task has been met with success, since, in the short time span available (half a day, basically), the team managed to build some object types using ChiPy calls and save them into a DATBOX.

    Using the Python API to feed a computation data store

    This topic involved importing LMGC90 DATBOX data into the numerical platform developed by Logilab for the SNCF.

    This was achieved using ChiPy as a Python API to the Fortran core to get:

    • the bodies involved in the computation, along with their materials, behaviour laws (with their associated parameters), geometries (expressed in terms of zones);
    • the interactions between these bodies, along with their interaction laws (and associated parameters, e.g. friction coefficient) and body pair (each interaction is defined between two bodies);
    • the interaction groups, which contain interactions that have the same interaction law.

    There is still a lot of work to be done (notably regarding the charges applied to the bodies), but this is already a great achievement. This could only have occured in a sprint, were every needed expertise is available:

    • the SNCF experts were there to clarify the import needs and check the overall direction;

    • Logilab implemented a data model based on CubicWeb, and imported the data using the ChiPy bindings developed on-demand by the LMGC core developer team, using the usual-for-them ISO_C_BINDING/ Swig Fortran wrapping dance.

      https://www.logilab.org/file/143753/raw/logo_CubicWeb.jpg
    • Logilab undertook the data import; to this end, it asked the LMGC how the relevant information from LMGC90 can be exposed to Python via the ChiPy API.

    Using HDF5 as a data storage backend for LMGC90

    The main point of this topic was to replace the in-house DATBOX/OUTBOX textual format used by LMGC90 to store input and output data, with an open, standard and efficient format.

    Several formats have been considered, like HDF5, MED and NetCDF4.

    MED has been ruled out for the moment, because it lacks the support for storing body contact information. HDF5 was chosen at last because of the quality of its Python libraries, h5py and pytables, and the ease of use tools like h5fs provide.

    https://www.logilab.org/file/143754/raw/logo_HDF.jpg

    Alain Leufroy from Logilab quickly presented h5py and h5fs usage, and the team started its work, measuring the performance impact of the storage pattern of LMGC90 data. This was quickly achieved, as the LMGC experts made it easy to setup tests of various sizes, and as the Logilab developers managed to understand the concepts and implement the required code in a fast and agile way.

    Debian / Ubuntu Packaging of LMGC90

    This topic turned out to be more difficult than initially assessed, mainly because LMGC90 has dependencies to non-packaged external libraries, which thus had to be packaged first:

    • the Matlib linear algebra library, written in C,
    • the Lapack95 library, which is a Fortran95 interface to the Lapack library.

    Logilab kept working on this after the sprint and produced packages that are currently being tested by the LMGC team. Some changes are expected (for instance, Python modules should be prefixed with a proper namespace) before the packages can be submitted for inclusion into Debian. The expertise of Logilab regarding Debian packaging was of great help for this task. This will hopefully help to spread the use of LMGC90.

    https://www.logilab.org/file/143755/raw/logo_Debian.jpg

    Distributed Version Control System for LMGC90

    As you may know, Logilab is really fond of Mercurial as a DVCS. Our company invested a lot into the development of the great evolve extension, which makes Mercurial a very powerful tool to efficiently manage the team development of software in a clean fashion.

    This is why Logilab presented Mercurial's features and advantages over the current VCS used to manage LMGC90 sources, namely svn, to the other participants of the Sprint. This was appreciated and will hopefully benefit to LMGC90 ease of development and spread among the Open Source community.

    https://www.logilab.org/file/143756/raw/logo_HG.jpg

    Conclusions

    All in all, this two-day sprint on LMGC90, involving participants from several industrial and academic institutions has been a great success. A lot of code has been written but, more importantly, several stepping stones have been laid, such as:

    • the general LMGC90 data access architecture, with the Python layer on top of the LMGC90 core;
    • the data storage format, namely HDF5.

    Colaterally somehow, several other results have also been achieved:

    • partial LMGC90 data import into the SNCF CubicWeb-based numerical platform,
    • Debian / Ubuntu packaging of LMGC90 and dependencies.

    On a final note, one would say that we greatly appreciated the cooperation between the participants, which we found pleasant and efficient. We look forward to finding more occasions to work together.


  • Introduction to thesauri and SKOS

    2016/06/27 by Yann Voté

    Recently, I've faced the problem to import the European Union thesaurus, Eurovoc, into cubicweb using the SKOS cube. Eurovoc doesn't follow the SKOS data model and I'll show here how I managed to adapt Eurovoc to fit in SKOS.

    This article is in two parts:

    • this is the first part where I introduce what a thesaurus is and what SKOS is,
    • the second part will show how to convert Eurovoc to plain SKOS.

    The whole text assumes familiarity with RDF, as describing RDF would require more than a blog entry and is out of scope.

    What is a thesaurus ?

    A common need in our digital lives is to attach keywords to documents, web pages, pictures, and so on, so that search is easier. For example, you may want to add two keywords:

    • lily,
    • lilium

    in a picture's metadata about this flower. If you have a large collection of flower pictures, this will make your life easier when you want to search for a particular species later on.

    free-text keywords on a picture

    In this example, keywords are free: you can choose whatever keyword you want, very general or very specific. For example you may just use the keyword:

    • flower

    if you don't care about species. You are also free to use lowercase or uppercase letters, and to make typos...

    free-text keyword on a picture

    On the other side, sometimes you have to select keywords from a list. Such a constrained list is called a controlled vocabulary. For instance, a very simple controlled vocabulary with only two keywords is the one about a person's gender:

    • male (or man),
    • female (or woman).
    a simple controlled vocabulary

    But there are more complex examples: think about how a library organizes books by themes: there are very general themes (eg. Science), then more and more specific ones (eg. Computer science -> Software -> Operating systems). There may also be synonyms (eg. Computing for Computer science) or referrals (eg. there may be a "see also" link between keywords Algebra and Geometry). Such a controlled vocabulary where keywords are organized in a tree structure, and with relations like synonym and referral, is called a thesaurus.

    an example thesaurus with a tree of keywords

    For the sake of simplicity, in the following we will call thesaurus any controlled vocabulary, even a simple one with two keywords like male/female.

    SKOS

    SKOS, from the World Wide Web Consortium (W3C), is an ontology for the semantic web describing thesauri. To make it simple, it is a common data model for thesauri that can be used on the web. If you have a thesaurus and publish it on the web using SKOS, then anyone can understand how your thesaurus is organized.

    SKOS is very versatile. You can use it to produce very simple thesauri (like male/female) and very complex ones, with a tree of keywords, even in multiple languages.

    To cope with this complexity, SKOS data model splits each keyword into two entities: a concept and its labels. For example, the concept of a male person have multiple labels: male and man in English, homme and masculin in French. The concept of a lily flower also has multiple labels: lily in English, lilium in Latin, lys in French.

    Among all labels for a given concept, some can be preferred, while others are alternative. There may be only one preferred label per language. In the person's gender example, man may be the preferred label in English and male an alternative one, while in French homme would be the preferred label and masculin and alternative one. In the flower example, lily (resp. lys) is the preferred label in English (resp. French), and lilium is an alternative label in Latin (no preferred label in Latin).

    SKOS concepts and labels

    And of course, in SKOS, it is possible to say that a concept is broader than another one (just like topic Science is broader than topic Computer science).

    So to summarize, in SKOS, a thesaurus is a tree of concepts, and each concept have one or more labels, preferred or alternative. A thesaurus is also called a concept scheme in SKOS.

    Also, please note that SKOS data model is slightly more complicated than what we've shown here, but this will be sufficient for our purpose.

    RDF URIs defined by SKOS

    In order to publish a thesaurus in RDF using SKOS ontology, SKOS introduces the "skos:" namespace associated to the following URI: http://www.w3.org/2004/02/skos/core#.

    Within that namespace, SKOS defines some classes and predicates corresponding to what has been described above. For example:

    • the triple (<uri>, rdf:type, skos:ConceptScheme) says that <uri> belongs to class skos:ConceptScheme (that is, is a concept scheme),
    • the triple (<uri>, rdf:type, skos:Concept) says that <uri> belongs to class skos:Concept (that is, is a concept),
    • the triple (<uri>, skos:prefLabel, <literal>) says that <literal> is a preferred label for concept <uri>,
    • the triple (<uri>, skos:altLabel, <literal>) says that <literal> is an alternative label for concept <uri>,
    • the triple (<uri1>, skos:broader, <uri2>) says that concept <uri2> is a broder concept of <uri1>.