subscribe to this blog

Logilab.org - en

News from Logilab and our Free Software projects, as well as on topics dear to our hearts (Python, Debian, Linux, the semantic web, scientific computing...)

show 194 results
  • Introduction to thesauri and SKOS

    2016/06/27 by Yann Voté

    Recently, I've faced the problem to import the European Union thesaurus, Eurovoc, into cubicweb using the SKOS cube. Eurovoc doesn't follow the SKOS data model and I'll show here how I managed to adapt Eurovoc to fit in SKOS.

    This article is in two parts:

    • this is the first part where I introduce what a thesaurus is and what SKOS is,
    • the second part will show how to convert Eurovoc to plain SKOS.

    The whole text assumes familiarity with RDF, as describing RDF would require more than a blog entry and is out of scope.

    What is a thesaurus ?

    A common need in our digital lives is to attach keywords to documents, web pages, pictures, and so on, so that search is easier. For example, you may want to add two keywords:

    • lily,
    • lilium

    in a picture's metadata about this flower. If you have a large collection of flower pictures, this will make your life easier when you want to search for a particular species later on.

    free-text keywords on a picture

    In this example, keywords are free: you can choose whatever keyword you want, very general or very specific. For example you may just use the keyword:

    • flower

    if you don't care about species. You are also free to use lowercase or uppercase letters, and to make typos...

    free-text keyword on a picture

    On the other side, sometimes you have to select keywords from a list. Such a constrained list is called a controlled vocabulary. For instance, a very simple controlled vocabulary with only two keywords is the one about a person's gender:

    • male (or man),
    • female (or woman).
    a simple controlled vocabulary

    But there are more complex examples: think about how a library organizes books by themes: there are very general themes (eg. Science), then more and more specific ones (eg. Computer science -> Software -> Operating systems). There may also be synonyms (eg. Computing for Computer science) or referrals (eg. there may be a "see also" link between keywords Algebra and Geometry). Such a controlled vocabulary where keywords are organized in a tree structure, and with relations like synonym and referral, is called a thesaurus.

    an example thesaurus with a tree of keywords

    For the sake of simplicity, in the following we will call thesaurus any controlled vocabulary, even a simple one with two keywords like male/female.

    SKOS

    SKOS, from the World Wide Web Consortium (W3C), is an ontology for the semantic web describing thesauri. To make it simple, it is a common data model for thesauri that can be used on the web. If you have a thesaurus and publish it on the web using SKOS, then anyone can understand how your thesaurus is organized.

    SKOS is very versatile. You can use it to produce very simple thesauri (like male/female) and very complex ones, with a tree of keywords, even in multiple languages.

    To cope with this complexity, SKOS data model splits each keyword into two entities: a concept and its labels. For example, the concept of a male person have multiple labels: male and man in English, homme and masculin in French. The concept of a lily flower also has multiple labels: lily in English, lilium in Latin, lys in French.

    Among all labels for a given concept, some can be preferred, while others are alternative. There may be only one preferred label per language. In the person's gender example, man may be the preferred label in English and male an alternative one, while in French homme would be the preferred label and masculin and alternative one. In the flower example, lily (resp. lys) is the preferred label in English (resp. French), and lilium is an alternative label in Latin (no preferred label in Latin).

    SKOS concepts and labels

    And of course, in SKOS, it is possible to say that a concept is broader than another one (just like topic Science is broader than topic Computer science).

    So to summarize, in SKOS, a thesaurus is a tree of concepts, and each concept have one or more labels, preferred or alternative. A thesaurus is also called a concept scheme in SKOS.

    Also, please note that SKOS data model is slightly more complicated than what we've shown here, but this will be sufficient for our purpose.

    RDF URIs defined by SKOS

    In order to publish a thesaurus in RDF using SKOS ontology, SKOS introduces the "skos:" namespace associated to the following URI: http://www.w3.org/2004/02/skos/core#.

    Within that namespace, SKOS defines some classes and predicates corresponding to what has been described above. For example:

    • the triple (<uri>, rdf:type, skos:ConceptScheme) says that <uri> belongs to class skos:ConceptScheme (that is, is a concept scheme),
    • the triple (<uri>, rdf:type, skos:Concept) says that <uri> belongs to class skos:Concept (that is, is a concept),
    • the triple (<uri>, skos:prefLabel, <literal>) says that <literal> is a preferred label for concept <uri>,
    • the triple (<uri>, skos:altLabel, <literal>) says that <literal> is an alternative label for concept <uri>,
    • the triple (<uri1>, skos:broader, <uri2>) says that concept <uri2> is a broder concept of <uri1>.

  • One way to convert Eurovoc into plain SKOS

    2016/06/27 by Yann Voté

    This is the second part of an article where I show how to import the Eurovoc thesaurus from the European Union into an application using a plain SKOS data model. I've recently faced the problem of importing Eurovoc into CubicWeb using the SKOS cube, and the solution I've chose is discussed here.

    The first part was an introduction to thesauri and SKOS.

    The whole article assumes familiarity with RDF, as describing RDF would require more than a blog entry and is out of scope.

    Difficulties with Eurovoc and SKOS

    Eurovoc

    Eurovoc is the main thesaurus covering European Union business domains. It is published and maintained by the EU commission. It is quite complex and big, structured as a tree of keywords.

    You can see Eurovoc keywords and browse the tree from the Eurovoc homepage using the link Browse the subject-oriented version.

    For example, when publishing statistics about education in the EU, you can tag the published data with the broadest keyword Education and communications. Or you can be more precise and use the following narrower keywords, in increasing order of preference: Education, Education policy, Education statistics.

    Problem: hierarchy of thesauri

    The EU commission uses SKOS to publish its Eurovoc thesaurus, so it should be straightforward to import Eurovoc into our own application. But things are not that simple...

    For some reasons, Eurovoc uses a hierarchy of concept schemes. For example, Education and communications is a sub-concept scheme of Eurovoc (it is called a domain), and Education is a sub-concept scheme of Education and communications (it is called a micro-thesaurus). Education policy is (a label of) the first concept in this hierarchy.

    But with SKOS this is not possible: a concept scheme cannot be contained into another concept scheme.

    Possible solutions

    So to import Eurovoc into our SKOS application, and not loose data, one solution is to turn sub-concept schemes into concepts. We have two strategies:

    • keep only one concept scheme (Eurovoc) and turn domains and micro-thesauri into concepts,
    • keep domains as concept schemes, drop Eurovoc concept scheme, and only turn micro-thesauri into concepts.

    Here we will discuss the latter solution.

    Lets get to work

    Eurovoc thesaurus can be downloaded at the following URL: http://publications.europa.eu/mdr/resource/thesaurus/eurovoc/skos/eurovoc_skos.zip

    The ZIP archive contains only one XML file named eurovoc_skos.rdf. Put it somewhere where you can find it easily.

    To read this file easily, we will use the RDFLib Python library. This library makes it really convenient to work with RDF data. It has only one drawback: it is very slow. Reading the whole Eurovoc thesaurus with it takes a very long time. Make the process faster is the first thing to consider for later improvements.

    Reading the Eurovoc thesaurus is as simple as creating an empty RDF Graph and parsing the file. As said above, this takes a long long time (from half an hour to two hours).

    import rdflib
    
    eurovoc_graph = rdflib.Graph()
    eurovoc_graph.parse('<path/to/eurovoc_skos.rdf>', format='xml')
    
    <Graph identifier=N52834ca3766d4e71b5e08d50788c5a13 (<class 'rdflib.graph.Graph'>)>
    

    We can see that Eurovoc contains more than 2 million triples.

    len(eurovoc_graph)
    
    2828910
    

    Now, before actually converting Eurovoc to plain SKOS, lets introduce some helper functions:

    • the first one, uriref(), will allow us to build RDFLib URIRef objects from simple prefixed URIs like skos:prefLabel or dcterms:title,
    • the second one, capitalized_eurovoc_domains(), is used to convert Eurovoc domain names, all uppercase (eg. 32 EDUCATION ET COMMUNICATION) to a string where only first letter is uppercase (eg. 32 Education and communication)
    import re
    
    from rdflib import Literal, Namespace, RDF, URIRef
    from rdflib.namespace import DCTERMS, SKOS
    
    eu_ns = Namespace('http://eurovoc.europa.eu/schema#')
    thes_ns = Namespace('http://purl.org/iso25964/skos-thes#')
    
    prefixes = {
        'dcterms': DCTERMS,
        'skos': SKOS,
        'eu': eu_ns,
        'thes': thes_ns,
    }
    
    def uriref(prefixed_uri):
        prefix, value = prefixed_uri.split(':', 1)
        ns = prefixes[prefix]
        return ns[value]
    
    def capitalized_eurovoc_domain(domain):
        """Return the given Eurovoc domain name with only the first letter uppercase."""
        return re.sub(r'^(\d+\s)(.)(.+)$',
                      lambda m: u'{0}{1}{2}'.format(m.group(1), m.group(2).upper(), m.group(3).lower()),
                      domain, re.UNICODE)
    

    Now the actual work. After using variables to reference URIs, the loop will parse each triple in original graph and:

    • discard it if it contains deprecated data,
    • if triple is like (<uri>, rdf:type, eu:Domain), replace it with (<uri>, rdf:type, skos:ConceptScheme),
    • if triple is like (<uri>, rdf:type, eu:MicroThesaurus), replace it with (<uri>, rdf:type, skos:Concept) and add triple (<uri>, skos:inScheme, <domain_uri>),
    • if triple is like (<uri>, rdf:type, eu:ThesaurusConcept), replace it with (<uri>, rdf:type, skos:Concept),
    • if triple is like (<uri>, skos:topConceptOf, <microthes_uri>), replace it with (<uri>, skos:broader, <microthes_uri>),
    • if triple is like (<uri>, skos:inScheme, <microthes_uri>), replace it with (<uri>, skos:inScheme, <domain_uri>),
    • keep triples like (<uri>, skos:prefLabel, <label_uri>), (<uri>, skos:altLabel, <label_uri>), and (<uri>, skos:broader, <concept_uri>),
    • discard all other non-deprecated triples.

    Note that, to replace a micro thesaurus with a domain, we have to build a mapping between each micro thesaurus and its containing domain (microthes2domain dict).

    This loop is also quite long.

    eurovoc_ref = URIRef(u'http://eurovoc.europa.eu/100141')
    deprecated_ref = URIRef(u'http://publications.europa.eu/resource/authority/status/deprecated')
    title_ref = uriref('dcterms:title')
    status_ref = uriref('thes:status')
    class_domain_ref = uriref('eu:Domain')
    rel_domain_ref = uriref('eu:domain')
    microthes_ref = uriref('eu:MicroThesaurus')
    thesconcept_ref = uriref('eu:ThesaurusConcept')
    concept_scheme_ref = uriref('skos:ConceptScheme')
    concept_ref = uriref('skos:Concept')
    pref_label_ref = uriref('skos:prefLabel')
    alt_label_ref = uriref('skos:altLabel')
    in_scheme_ref = uriref('skos:inScheme')
    broader_ref = uriref('skos:broader')
    top_concept_ref = uriref('skos:topConceptOf')
    
    microthes2domain = dict((mt, next(eurovoc_graph.objects(mt, uriref('eu:domain'))))
                            for mt in eurovoc_graph.subjects(RDF.type, uriref('eu:MicroThesaurus')))
    
    new_graph = rdflib.ConjunctiveGraph()
    for subj_ref, pred_ref, obj_ref in eurovoc_graph:
        if deprecated_ref in list(eurovoc_graph.objects(subj_ref, status_ref)):
            continue
        # Convert eu:Domain into a skos:ConceptScheme
        if obj_ref == class_domain_ref:
            new_graph.add((subj_ref, RDF.type, concept_scheme_ref))
            for title in eurovoc_graph.objects(subj_ref, pref_label_ref):
                if title.language == u'en':
                    new_graph.add((subj_ref, title_ref,
                                   Literal(capitalized_eurovoc_domain(title))))
                    break
        # Convert eu:MicroThesaurus into a skos:Concept
        elif obj_ref == microthes_ref:
            new_graph.add((subj_ref, RDF.type, concept_ref))
            scheme_ref = next(eurovoc_graph.objects(subj_ref, rel_domain_ref))
            new_graph.add((subj_ref, in_scheme_ref, scheme_ref))
        # Convert eu:ThesaurusConcept into a skos:Concept
        elif obj_ref == thesconcept_ref:
            new_graph.add((subj_ref, RDF.type, concept_ref))
        # Replace <concept> topConceptOf <MicroThesaurus> by <concept> broader <MicroThesaurus>
        elif pred_ref == top_concept_ref:
            new_graph.add((subj_ref, broader_ref, obj_ref))
        # Replace <concept> skos:inScheme <MicroThes> by <concept> skos:inScheme <Domain>
        elif pred_ref == in_scheme_ref and obj_ref in microthes2domain:
            new_graph.add((subj_ref, in_scheme_ref, microthes2domain[obj_ref]))
        # Keep label triples
        elif (subj_ref != eurovoc_ref and obj_ref != eurovoc_ref
              and pred_ref in (pref_label_ref, alt_label_ref)):
            new_graph.add((subj_ref, pred_ref, obj_ref))
        # Keep existing skos:broader relations and existing concepts
        elif pred_ref == broader_ref or obj_ref == concept_ref:
            new_graph.add((subj_ref, pred_ref, obj_ref))
    

    We can check that we now have far less triples than before.

    len(new_graph)
    
    388582
    

    Now we dump this new graph to disk. We choose the Turtle format as it is far more readable than RDF/XML for humans, and slightly faster to parse for machines. This file will contain plain SKOS data that can be directly imported into any application able to read SKOS.

    with open('eurovoc.n3', 'w') as f:
        new_graph.serialize(f, format='n3')
    

    With CubicWeb using the SKOS cube, it is a one command step:

    cubicweb-ctl skos-import --cw-store=massive <instance_name> eurovoc.n3
    

  • Installing Debian Jessie on a "pure UEFI" system

    2016/06/13 by David Douard

    At the core of the Logilab infrastructure is a highly-available pair of small machines dedicated to our main directory and authentication services: LDAP, DNS, DHCP, Kerberos and Radius.

    The machines are small fanless boxes powered by a 1GHz Via Eden processor, 512Mb of RAM and 2Gb of storage on a CompactFlash module.

    They have served us well for many years, but now is the time for an improvement. We've bought a pair of Lanner FW-7543B that have the same form-factor. They are not fanless, but are much more powerful. They are pretty nice, but have one major drawback: their firmware does not boot on a legacy BIOS-mode device when set up in UEFI. Another hard point is that they do not have a video connector (there is a VGA output on the motherboard, but the connector is optional), so everything must be done via the serial console.

    https://www.logilab.org/file/6679313/raw/FW-7543_front.jpg

    I knew the Debian Jessie installer would provide everything that is required to handle an UEFI-based system, but it took me a few tries to get it to boot.

    First, I tried the standard netboot image, but the firmware did not want to boot from a USB stick, probably because the image requires a MBR-based bootloader.

    Then I tried to boot from the Refind bootable image and it worked! At least I had the proof this little beast could boot in UEFI. But, although it is probably possible, I could not figure out how to tweak the Refind config file to make it boot properly the Debian installer kernel and initrd.

    https://www.logilab.org/file/6679257/raw/uefi_lanner_nope.png

    Finally I gave a try to something I know much better: Grub. Here is what I did to have a working UEFI Debian installer on a USB key.

    Partitionning

    First, in the UEFI world, you need a GPT partition table with a FAT partition typed "EFI System":

    david@laptop:~$ sudo fdisk /dev/sdb
    Welcome to fdisk (util-linux 2.25.2).
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    
    Command (m for help): g
    Created a new GPT disklabel (GUID: 52FFD2F9-45D6-40A5-8E00-B35B28D6C33D).
    
    Command (m for help): n
    Partition number (1-128, default 1): 1
    First sector (2048-3915742, default 2048): 2048
    Last sector, +sectors or +size{K,M,G,T,P} (2048-3915742, default 3915742):  +100M
    
    Created a new partition 1 of type 'Linux filesystem' and of size 100 MiB.
    
    Command (m for help): t
    Selected partition 1
    Partition type (type L to list all types): 1
    Changed type of partition 'Linux filesystem' to 'EFI System'.
    
    Command (m for help): p
    Disk /dev/sdb: 1.9 GiB, 2004877312 bytes, 3915776 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 52FFD2F9-45D6-40A5-8E00-B35B28D6C33D
    
    Device     Start    End Sectors  Size Type
    /dev/sdb1   2048 206847  204800  100M EFI System
    
    Command (m for help): w
    

    Install Grub

    Now we need to install a grub-efi bootloader in this partition:

    david@laptop:~$ pmount sdb1
    david@laptop:~$ sudo grub-install --target x86_64-efi --efi-directory /media/sdb1/ --removable --boot-directory=/media/sdb1/boot
    Installing for x86_64-efi platform.
    Installation finished. No error reported.
    

    Copy the Debian Installer

    Our next step is to copy the Debian's netboot kernel and initrd on the USB key:

    david@laptop:~$ mkdir /media/sdb1/EFI/debian
    david@laptop:~$ wget -O /media/sdb1/EFI/debian/linux http://ftp.fr.debian.org/debian/dists/jessie/main/installer-amd64/current/images/netboot/debian-installer/amd64/linux
    --2016-06-13 18:40:02--  http://ftp.fr.debian.org/debian/dists/jessie/main/installer-amd64/current  /images/netboot/debian-installer/amd64/linux
    Resolving ftp.fr.debian.org (ftp.fr.debian.org)... 212.27.32.66, 2a01:e0c:1:1598::2
    Connecting to ftp.fr.debian.org (ftp.fr.debian.org)|212.27.32.66|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 3120416 (3.0M) [text/plain]
    Saving to: ‘/media/sdb1/EFI/debian/linux’
    
    /media/sdb1/EFI/debian/linux      100%[========================================================>]   2.98M      464KB/s   in 6.6s
    
    2016-06-13 18:40:09 (459 KB/s) - ‘/media/sdb1/EFI/debian/linux’ saved [3120416/3120416]
    
    david@laptop:~$ wget -O /media/sdb1/EFI/debian/initrd.gz http://ftp.fr.debian.org/debian/dists/jessie/main/installer-amd64/current/images/netboot/debian-installer/amd64/initrd.gz
    --2016-06-13 18:41:30--  http://ftp.fr.debian.org/debian/dists/jessie/main/installer-amd64/current/images/netboot/debian-installer/amd64/initrd.gz
    Resolving ftp.fr.debian.org (ftp.fr.debian.org)... 212.27.32.66, 2a01:e0c:1:1598::2
    Connecting to ftp.fr.debian.org (ftp.fr.debian.org)|212.27.32.66|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 15119287 (14M) [application/x-gzip]
    Saving to: ‘/media/sdb1/EFI/debian/initrd.gz’
    
    /media/sdb1/EFI/debian/initrd.g    100%[========================================================>]  14.42M    484KB/s   in 31s
    
    2016-06-13 18:42:02 (471 KB/s) - ‘/media/sdb1/EFI/debian/initrd.gz’ saved [15119287/15119287]
    

    Configure Grub

    Then, we must write a decent grub.cfg file to load these:

    david@laptop:~$ echo >/media/sdb1/boot/grub/grub.cfg <<EOF
    menuentry "Jessie Installer" {
      insmod part_msdos
      insmod ext2
      insmod part_gpt
      insmod fat
      insmod gzio
      echo  'Loading Linux kernel'
      linux /EFI/debian/linux --- console=ttyS0,115200
      echo 'Loading InitRD'
      initrd /EFI/debian/initrd.gz
    }
    EOF
    

    Et voilà, piece of cake!


  • Our work for the OpenDreamKit project during the 77th Sage days

    2016/04/18 by Florent Cayré

    Logilab is part of OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run until 2019 and provides substantial funding to the open source computational mathematics ecosystem.

    https://www.logilab.org/file/5545539/raw

    One of the goals of this project is improve the packaging and documentation of SageMath, the open source alternative to Maple and Mathematica.

    The core developers of SageMath organised the 77th Sage days last week and Logilab has taken part, with David Douard, Julien Cristau and I, Florent Cayre.

    David and Julien have been working on packaging SageMath for Debian. This is a huge task (several man-months of work), split into two sub-tasks for now:

    • building SageMath with Debian-packaged versions of its dependencies, if available;
    • packaging some of the missing dependencies, starting with the most expected ones, like the latest releases of Jupyter and IPython.
    http://ipython.org/_static/IPy_header.png http://jupyter.org/assets/nav_logo.svg https://www.debian.org/Pics/hotlink/swirl-debian.png

    As a first result, the following packages have been pushed into Debian experimental:

    There is still a lot of work to be done, and packaging the notebook is the next task on the list.

    One hiccup along the way was a python crash involving multiple inheritance from Cython extensions classes. Having people nearby who knew the SageMath codebase well (or even wrote the relevant parts) was invaluable for debugging, and allowed us to blame a recent CPython change.

    Julien also gave a hand to Florent Hivert and Robert Lehmann who were trying to understand why building SageMath's documentation needed this much memory.

    As far as I am concerned, I made a prototype of a structured HTML documentation produced with Sphinx and containing Python executable code ran on https://tmpnb.org/ thanks to the Thebe javascript library that interfaces statically delivered HTML pages with a Jupyter notebook server.

    The Sage days have been an excellent opportunity to efficiently work on the technical tasks with skillfull and enthusiastic people. We would like to thank the OpenDreamKit core team for the organization and their hard work. We look forward to the next workshop.


  • 3D Visualization of simulation data with x3dom

    2016/02/16 by Yuanxiang Wang

    X3DOM Plugins

    As part of the Open Dream Kit project, we are working at Logilab on the creation of tools for mesh data visualization and analysis in a web application. Our goal was to create widgets to use in Jupyter notebook (formerly IPython) for 3D visualization and analysis.

    We found two interesting technologies for 3D rendering: ThreeJS and X3DOM. ThreeJS is a large JavaScript 3D library and X3DOM a HTML5 framework for 3D. After working with both, we chose to use X3DOM because of its high level architecture. With X3DOM the 3D model is defined in the DOM in HTML5, so the parameters of the nodes can be changed easily with the setAttribute DOM function. This makes the creation of user interfaces and widgets much easier.

    We worked to create new DOM nodes that integrate nicely in a standard X3DOM tree, namely IsoColor, Threshold and ClipPlane.

    We had two goals in mind:

    1. create an X3DOM plugin API that allows one to create new DOM nodes which extend X3DOM functionality;
    2. keep a simple X3DOM-like interface for the final users.

    Example of the plugins Threshold and IsoColor:

    image0

    The Threshold and IsoColor nodes work like any X3DOM node and react to attribute changes performed with the setAttribute method. This makes it easy to use HTML widgets like sliders / buttons to drive the plugin's parameters.

    X3Dom API

    The goal is to create custom nodes that affect the rendering based on data (positions, pressure, temperature...). The idea is to manipulate the shaders, since it gives low-level manipulation on the 3D rendering. Shaders give more freedom and efficiency compared to reusing other X3DOM nodes. (Reminder : Shaders are parts of GLSL, used to work with the GPU).

    X3DOM has a native node to all users to write shaders : the ComposedShader node. The problem of this node is it overwrites the shaders generated by X3DOM. For example, nodes like ClipPlane are disabled with a ComposedShader node in the DOM. Another example is image texturing, the computation of the color from texture coordinate should be written within the ComposedShader.

    In order to add parts of shader to the generated shader without overwriting it I created a new node: CustomAttributeNode. This node is a generic node to add uniforms, varying and shader parts into X3DOW. The data of the geometry (attributes) are set using the X3DOM node named FloatVertexAttribute.

    Example of CustomAttributeNode to create a threshold node:

    image2

    The CustomAttributeNode is the entry point in x3dom for the javascript API.

    JavaScript API

    The idea of the the API is to create a new node inherited from CustomAttributeNode. We wrote some functions to make the implementation of the node easier.

    Ideas for future improvement

    There are still some points that need improvement

    • Create a tree widget using the grouping nodes in X3DOM
    • Add high level functions to X3DGeometricPropertyNode to set the values. For instance the IsoColor node is only a node that set the values of the TextureCoordinate node from the FloatVertexAttribute node.
    • Add high level function to return the variable name needed to overwrite the basic attributes like positions in a Geometry. With my API, the IsoColor use a varying defined in X3DOM to overwrite the values of the texture coordinate. Because there are no documentation, it is hard for the users to find the varying names. On the other hand there are no specification on the varying names so it might need to be maintained.
    • Maybe the CustomAttributeNode should be a X3DChildNode instead of a X3DGeometricPropertyNode.

    image4

    This structure might allow the "use" attribute in X3DOM. Like that, X3DOM avoid data duplication and writing too much HTML. The following code illustrate what I expect.

    image5


  • We went to cfgmgmtcamp 2016 (after FOSDEM)

    2016/02/09 by Arthur Lutz

    Following a day at FOSDEM (another post about it), we spend two days at cfgmgmtcamp in Gent. At cfgmgmtcamp, we obviously spent some time in the Salt track since it's our tool of choice as you might have noticed. But checking out how some of the other tools and communities are finding solutions to similar problems is also great.

    cfgmgmtcamp logo

    I presented Roll out active Supervision with Salt, Graphite and Grafana (mirrored on slideshare), you can find the code on bitbucket.

    http://image.slidesharecdn.com/cfgmgmtcamp2016activesupervisionwithsalt-160203131954/95/cfgmgmtcamp-2016-roll-out-active-supervision-with-salt-graphite-and-grafana-1-638.jpg?cb=1454505737

    We saw :

    Day 1

    • Mark Shuttleworth from Canonical presenting Juju and its ecosystem, software modelling. MASS (Metal As A Service) was demoed on the nice "OrangeBox". It promises to spin up an OpenStack infrastructure in 15 minutes. One of the interesting things with charms and bundles of charms is the interfaces that need to be established between different service bricks. In the salt community we have salt-formulas but they lack maturity in the sense that there's no possibility to plug in multiple formulas that interact with each other... yet.
    juju deploy of openstack
    • Mitch Michell from Hashicorp presented vault. Vault stores your secrets (certificates, passwords, etc.) and we will probably be trying it out in the near future. A lot of concepts in vault are really well thought out and resonate with some of the things we want to do and automate in our infrastructure. The use of Shamir Secret Sharing technique (also used in the debian infrastructure team) for the N-man challenge to unvault the secrets is quite nice. David is already looking into automating it with Salt and having GSSAPI (kerberos) authentication.
    https://www.vaultproject.io/assets/images/hero-95b4a434.png bikes!

    Day 2

    • Gareth Rushgrove from PuppetLabs talked about the importance of metadata in docker images and docker containers by explaining how these greatly benefit tools like dpkg and rpm and that the container community should be inspired by the amazing skills and experience that has been built by these package management communities (think of all the language-specific package managers that each reinvent the wheel one after the other).
    • Testing Immutable Infrastructure: we found some inspiration from test-kitchen and running the tests inside a docker container instead of vagrant virtual machine. We'll have to take a look at the SaltStack provisioner for test-kitchen. We already do some of that stuff in docker and OpenStack using salt-cloud. But maybe we can take it further with such tools (or testinfra whose author will be joining Logilab next month).
    coreos, rkt, kubernetes
    • How CoreOS is built, modified, and updated: From repo sync to Omaha by Brian "RedBeard" Harrington. Interesting presentation of the CoreOS system. Brian also revealed that CoreOS is now capable of using the TPM to enforce a signed OS, but also signed containers. Official CoreOS images shipped through Omaha are now signed with a root key that can be installed in the TPM of the host (ie. they didn't use a pre-installed Microsoft key), along with a modified TPM-aware version of GRUB. For now, the Omaha platform is not open source, so it may not be that easy to build one's own CoreOS images signed with a personal root key, but it is theoretically possible. Brian also said that he expect their Omaha server implementation to become open source some day.
    • The use of Salt in Foreman was presented and demoed by Stephen Benjamin. We'll have to retry using that tool with the newest features of the smart proxy.
    • Jonathan Boulle from CoreOS presented "rkt and Kubernetes: What’s new with Container Runtimes and Orchestration" In this last talk, Johnathan gave a tour of the rkt project and how it is used to build, coupled with kubernetes, a comprehensive, secure container running infrastructure (which uses saltstack!). He named the result "rktnetes". The idea is to use rkt as the kubelet's (primany node agent) container runtime of a kubernetes cluster powered by CoreOS. Along with the new CoreOS support for TPM-based trust chain, it allows to ensure completely secured executions, from the bootloader to the container! The possibility to run fully secured containers is one of the reasons why CoreOS developed the rkt project.
    coffee!

    We would like to thank the cfgmgmntcamp organisation team, it was a great conference, we highly recommend it. Thanks for the speaker event the night before the conference, and the social event on Monday evening. (and thanks for the chocolate!).


  • We went to FOSDEM 2016 (and cfgmgmtcamp)

    2016/02/09 by Arthur Lutz

    David & I went to FOSDEM and cfgmgmtcamp this year to see some conferences, do two presentations, and discuss with the members of the open source communities we contribute to.

    https://www.logilab.org/file/4253021/raw/16312670359_565eec1e3d_k.jpg

    At FOSDEM, we started early by doing a presentation at 9.00 am in the "Configuration Management devroom", which to our surprise was a large room which was almost full. The presentation was streamed over the Internet and should be available to view shortly.

    I presented "Once you've configured your infrastructure using salt, monitor it by re-using that definition". (mirrored on slideshare. The main part was a demo, the code being published on bitbucket.

    http://image.slidesharecdn.com/fosdem2016describeitmonitorit-160203131836/95/fosdem-2016-after-describing-your-infrastructure-as-code-reuse-that-to-monitor-it-1-638.jpg?cb=1454505792

    The presentation was streamed live (I came across someone that watched it on the Internet to "sleep in"), and should be available to watch when it gets encoded on http://video.fosdem.org/.

    FOSDEM video box

    We then saw the following talks :

    • Unified Framework for Big Data Foreign Data Wrappers (FDW) by Shivram Mani in the Postgresql Track
    • Mainflux Open Source IoT Cloud
    • EzBench, a tool to help you benchmark and bisect the Graphics Stack's performance
    • The RTC components in the debian infrastructure
    • CoreOS: A Linux distribution designed for application containers that scale
    • Using PostgreSQL for Bibliographic Data (since we've worked on http://data.bnf.fr/ with http://cubicweb.org/ and PostgreSQL)
    • The FOSDEM infrastructure review

    Congratulations to all the FOSDEM organisers, volunteers and speakers. We will hopefully be back for more.

    We then took the train to Gent where we spent two days learning and sharing about Configuration Management Systems and all the ecosystem around it (orchestration, containers, clouds, testing, etc.).

    More on our cfmgmtcamp experience in another blog post.

    Photos under creative commons CC-BY, by Ludovic Hirlimann and Deborah Bryant here and here


  • DebConf15 wrap-up

    2015/08/25 by Julien Cristau
    //www.logilab.org/file/856155/raw/heidelberg-panorama-2.jpg

    I just came back from two weeks in Heidelberg for DebCamp15 and DebConf15.

    In the first week, besides helping out DebConf's infrastructure team with network setup, I tried to make some progress on the library transitions triggered by libstdc++6's C++11 changes. At first, I spent many hours going through header files for a bunch of libraries trying to figure out if the public API involved std::string or std::list. It turns out that is time-consuming, error-prone, and pretty efficient at making me lose the will to live. So I ended up stealing a script from Steve Langasek to automatically rename library packages for this transition. This ended in 29 non-maintainer uploads to the NEW queue, quickly processed by the FTP team. Sadly the transition is not quite there yet, as making progress with the initial set of packages reveals more libraries that need renaming.

    Building on some earlier work from Laurent Bigonville, I've also moved the setuid root Xorg wrapper from the xserver-xorg package to xserver-xorg-legacy, which is now in experimental. Hopefully that will make its way to sid and stretch soon (need to figure out what to do with non-KMS drivers first).

    Finally, with the help of the security team, the security tracker was moved to a new VM that will hopefully not eat its root filesystem every week as the old one was doing the last few months. Of course, the evening we chose to do this was the night DebConf15's network was being overhauled, which made things more interesting.

    DebConf itself was the opportunity to meet a lot of people. I was particularly happy to meet Andreas Boll, who has been a member of pkg-xorg for two years now, working on our mesa package, among other things. I didn't get to see a lot of talks (too many other things going on), but did enjoy Enrico's stand up comedy, the CitizenFour screening, and Jake Applebaum's keynote. Thankfully, for the rest the video team has done a great job as usual.

    Note

    Above picture is by Aigars Mahinovs, licensed under CC-BY 2.0


  • Going to DebConf15

    2015/08/11 by Julien Cristau

    On Sunday I travelled to Heidelberg, Germany, to attend the 16th annual Debian developer's conference, DebConf15.

    The conference itself is not until next week, but this week is DebCamp, a hacking session. I've already met a few of my DSA colleagues, who've been working on setting up the network infrastructure. My other plans for this week involve helping the Big Transition of 2015 along, and trying to remove the setuid bit from /usr/bin/X in the default Debian install (bug #748203 in particular).

    As for next week, there's a rich schedule in which I'll need to pick a few things to go see.

    //www.logilab.org/file/524206/raw/Dc15going1.png

  • Experiments on building a Jenkins CI service with Salt

    2015/06/17 by Denis Laxalde

    In this blog post, I'll talk about my recent experiments on building a continuous integration service with Jenkins that is, as much as possible, managed through Salt. We've been relying on a Jenkins platform for quite some time at Logilab (Tolosa team). The service was mostly managed by me with sporadic help from other team-mates but I've never been entirely satisfied about the way it was managed because it involved a lot of boilerplate configuration through Jenkins user interface and this does not scale very well nor does it make long term maintenance easy.

    So recently, I've taken a stance and decided to go through a Salt-based configuration and management of our Jenkins CI platform. There are actually two aspects here. The first concerns the setup of Jenkins itself (this includes installation, security configuration, plugins management amongst other things). The second concerns the management of client projects (or jobs in Jenkins jargon). For this second aspect, one of the design goals was to enable easy configuration of jobs by users not necessarily familiar with Jenkins setup and to make collaborative maintenance easy. To tackle these two aspects I've essentially been using (or developing) two distinct Salt formulas which I'll detail hereafter.

    Jenkins jobs salt

    Core setup: the jenkins formula

    The core setup of Jenkins is based on an existing Salt formula, the jenkins-formula which I extended a bit to support map.jinja and which was further improved to support installation of plugins by Yann and Laura (see 3b524d4).

    With that, deploying a Jenkins server is as simple as adding the following to your states and pillars top.sls files:

    base:
      "jenkins":
        - jenkins
        - jenkins.plugins
    

    Base pillar configuration is used to declare anything that differs from the default Jenkins settings in a jenkins section, e.g.:

    jenkins:
      lookup:
        - home: /opt/jenkins
    

    Plugins configuration is declared in plugins subsection as follows:

    jenkins:
      lookup:
        plugins:
          scm-api:
            url: 'http://updates.jenkins-ci.org/download/plugins/scm-api/0.2/scm-api.hpi'
            hash: 'md5=9574c07bf6bfd02a57b451145c870f0e'
          mercurial:
            url: 'http://updates.jenkins-ci.org/download/plugins/mercurial/1.54/mercurial.hpi'
            hash: 'md5=1b46e2732be31b078001bcc548149fe5'
    

    (Note that plugins dependency is not handled by Jenkins when installing from the command line, neither by this formula. So in the preceding example, just having an entry for the Mercurial plugin would have not been enough because this plugin depends on scm-api.)

    Other aspects (such as security setup) are not handled yet (neither by the original formula, nor by our extension), but I tend to believe that this is acceptable to manage this "by hand" for now.

    Jobs management : the jenkins_jobs formula

    For this task, I leveraged the excellent jenkins-job-builder tool which makes it possible to configure jobs using a declarative YAML syntax. The tool takes care of installing the job and also handles any housekeeping tasks such as checking configuration validity or deleting old configurations. With this tool, my goal was to let end-users of the Jenkins service add their own project by providing a minima a YAML job description file. So for instance, a simple Job description for a CubicWeb job could be:

    - scm:
        name: cubicweb
        scm:
          - hg:
             url: http://hg.logilab.org/review/cubicweb
             clean: true
    
    - job:
        name: cubicweb
        display-name: CubicWeb
        scm:
          - cubicweb
        builders:
          - shell: "find . -name 'tmpdb*' -delete"
          - shell: "tox --hashseed noset"
        publishers:
          - email:
              recipients: cubicweb@lists.cubicweb.org
    

    It consists of two parts:

    • the scm section declares, well, SCM information, here the location of the review Mercurial repository, and,

    • a job section which consists of some metadata (project name), a reference of the SCM section declared above, some builders (here simple shell builders) and a publisher part to send results by email.

    Pretty simple. (Note that most test running configuration is here declared within the source repository, via tox (another story), so that the CI bot holds minimum knowledge and fetches information from the sources repository directly.)

    To automate the deployment of this kind of configurations, I made a jenkins_jobs-formula which takes care of:

    1. installing jenkins-job-builder,
    2. deploying YAML configurations,
    3. running jenkins-jobs update to push jobs into the Jenkins instance.

    In addition to installing the YAML file and triggering a jenkins-jobs update run upon changes of job files, the formula allows for job to list distribution packages that it would require for building.

    Wrapping things up, a pillar declaration of a Jenkins job looks like:

    jenkins_jobs:
      lookup:
        jobs:
          cubicweb:
            file: <path to local cubicweb.yaml>
            pkgs:
              - mercurial
              - python-dev
              - libgecode-dev
    

    where the file section indicates the source of the YAML file to install and pkgs lists build dependencies that are not managed by the job itself (typically non Python package in our case).

    So, as an end user, all is needed to provide is the YAML file and a pillar snippet similar to the above.

    Outlook

    This initial setup appears to be enough to greatly reduce the burden of managing a Jenkins server and to allow individual users to contribute jobs for their project based on simple contribution to a Salt configuration.

    Later on, there is a few things I'd like to extend on jenkins_jobs-formula side. Most notably the handling of distant sources for YAML configuration file (as well as maybe the packages list file). I'd also like to experiment on configuring slaves for the Jenkins server, possibly relying on Docker (taking advantage of another of my experiment...).


show 194 results