Blog entries

Rencontre LLVM à l'IRILL

2012/09/27 by Damien Garaud
IRILL-logoLLVM-logo

L'IRILL , Initiative de Recherche et Innovation sur le Logiciel Libre, organisait une rencontre LLVM mardi 25 septembre à Paris dans les locaux de l'INRIA Place d'Italie dans le 13e arrondissement. Les organisateurs, Arnaud de Grandmaison, Duncan Sands, Sylvestre Ledru et Tobias Grosser ont chaleureusement accueilli environ 8 personnes dont 3 de Logilab.

L'annonce de l'IRILL indiquait une rencontre informelle avec une potentielle Bug Squashing Party sur Clang. Nous n'avons pas écrasé d'insectes mais avons plutôt discuté: discussions techniques, petits trolls, échanges de connaissances, d'outils et d'expériences accompagnés de bières, sodas, gâteaux et pizzas (et une salade solitaire et orpheline qui n'a finie dans le ventre de personne contrairement aux pizzas mexicaines ou quatre fromages).

LLVM (Low Level Virtual Machine) est un projet développé depuis une dizaine d'années qui propose une collection d'outils et de modules facilement réutilisables pour construire des logiciels orientés langages : interpréteurs, compilateurs, optimiseurs, convertisseurs source-to-source...

Depuis l'étage d'entrée du code dans le compilateur (ou frontend), en passant par l'optimiseur indépendant de la plate-forme, jusqu'à la génération de code machine (backend) et ce, pour plusieurs architectures (X86, PowerPC, ARM, etc.), le choix de son design en fait un outil tout à fait intéressant. Parmi ces frontends, il y a le fameux Clang (C/C++, Objective-C), GHC (Haskell). Et le projet dragonegg permet de l'utiliser comme plug-in à GCC et donc de bénéficier des différents frontends GCC (Ada, Fortran, etc.).

De nombreux outils se sont construits autour du framework LLVM : LLDB un débugger, vmkit une implémentation de la machine virtuelle Java et .NET ou encore polly, un projet de recherche dont Tobias est l'un des deux co-fondateurs, qui a pour objectif d'optimiser un programme indépendamment du langage via des polyhedral optimizations.

Les principaux contributeurs à ce jour sont Apple, Google, des contributeurs "individuels" dont Duncan Sands, suivis par des laboratoires de recherche et autres. Voir l'article de Sylvestre sur "Who is in control of LLVM/Clang ?".

La clef de voûte de LLVM est sa représentation intermédiaire (IR pour Intermediate Representation). C'est une structure de données qui représente sous forme SSA (Single Static Assignment) les flux de données et de contrôle. Cette forme est plus "haut-niveau" et plus lisible que du code assembleur, elle comporte certaines informations de type par exemple. Elle constitue le pivot de l'infrastructure LLVM : c'est ce que produisent les frontends comme Clang, qui est ensuite passée aux optimiseurs de LLVM et est ensuite consommée par les backends dont le rôle est de les transformer en code machine natif.

En voici un exemple en C:

int add(int a)
{
    return a + 42;
}

La représentation intermédiaire est donnée via Clang :

$> clang -S -emit-llvm add_function.c -o add_function.ll

L'extension *.ll désigne des "fichiers IR" et le résultat donne:

define i32 @add(i32 %a) nounwind uwtable {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32* %1, align 4
  %3 = add nsw i32 %2, 42
  ret i32 %3
}

Cette représentation peut ensuite être bitcodée, assemblée ou compilée. Voir les différentes commandes LLVM pour assembler, désassembler, optimiser, linker, exécuter, etc.

Une autre utilisation intéressante de cette infrastructure est l'utilisation de la bibliothèque Clang pour parser du code C/C++ afin de parcourir l'Abstract Syntax Tree (AST). Ceci offre notamment de belles possibilités d'introspection. Google a d'ailleurs rédigé un tutoriel sur les plugins Clang et ses possibilités via l'API de Clang, notamment la classe ASTConsumer. Il est possible de parcourir l'ensemble des constructeurs, de connaître la propriété d'une classe (abstraite ou non), parcourir les membres d'une classe, etc. Avant de partir, nous parlions de la possiblité de propager un changement d'API à travers toutes les bibliothèques qui en dépendent, soit la notion de patch sémantique.

Nous avons aussi profité pour parler un peu du langage Julia ou de Numba.

Pythonistes convaincus, nous connaissions déjà un peu Numba, projet de Travis Oliphant, contributeur NumPy et papa de SciPy. Ce projet utilise l'infrastructure LLVM pour compiler du byte-code Python en code machine (Just In Time compilation), en particulier pour NumPy et SciPy. Les exemples montrent comment passer d'une fonction Python qui traite un tableau NumPy à une fonction compilée Just In Time. Extrait issu d'un des exemples fournis par le projet Numba :

from numba import d
from numba.decorators import jit as jit

def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

csum2d = jit(ret_type=d, arg_types=[d[:,:]])(sum2d)

Oui, la fonction sum2d n'est pas optimale et très "naïve". Néanmoins, la fonction compilée va près de 250 fois plus vite ! Numba utilise les bindings LLVM pour Python via llvm-py afin de passer du code Python, à l'Intermediate Representation pour ensuite utiliser les fonctionnalités JIT d'LLVM.

Entre programmeurs de langages à typage statique, nous avons aussi parlé de C et d'Ocaml (dont il existe des bindings pour LLVM) et mentionné de beaux projets tels que PIPS et Coccinelle.

Pour finir, nous savons maintenant prononcer Clang. On dit "klang" et non "cilangue". Nous avons appris que gdb avait son propre parser et n'utilisait pas le parser de GCC. Rappelons-le, l'un des grands avantages de LLVM & Clang face à GCC est sa modularité et la possibilité d'utiliser une des pierres qui forme l'édifice pour construire sa propre maison.

Nous finissons ce billet par une citation. David Beazley disait lors de sa présentation à Eursoscipy 2012 :

The life is too short to write C++.

Certes. Mais qu'on le veuille ou qu'on le doive, autant se servir d'outils biens pensés pour nous faciliter la vie.

Encore merci aux organisateurs pour cette rencontre et à la prochaine.

Quelques références


Emacs turned into a IDE with CEDET

2013/08/29 by Anthony Truchet

Abstract

In this post you will find one way, namely thanks to CEDET, of turning your Emacs into an IDE offering features for semantic browsing and refactoring assistance similar to what you can find in major IDE like Visual Studio or Eclipse.

Introduction

Emacs is a tool of choice for the developer: it is very powerful, highly configurable and has a wealth of so called modes to improve many aspects of daily work, especially when editing code.

The point, as you might have realised in case you have already worked with an IDE like Eclipse or Visual Studio, is that Emacs (code) browsing abilities are quite rudimentary... at least out of the box!

In this post I will walk through one way to configure Emacs + CEDET which works for me. This is by far not the only way to get to it but finding this path required several days of wandering between inconsistent resources, distribution pitfall and the like.

I will try to convey relevant parts of what I have learnt on the way, to warn about some pitfalls and also to indicate some interesting direction I haven't followed (be it by choice or necessity) and encourage you to try. Should you try to push this adventure further, your experience will be very much appreciated... and in any case your feedback on this post is also very welcome.

The first part gives some deemed useful background to understand what's going on. If you want to go straight to the how-to please jump directly to the second part.

Sketch map of the jungle

This all started because I needed a development environment to do work remotely on a big, legacy C++ code base from quite a lightweight machine and a weak network connection.

My former habit of using Eclipse CDT and compiling locally was not an option any longer but I couldn't stick to a bare text editor plus remote compilation either because of the complexity of the code base. So I googled emacs IDE code browser and started this journey to set CEDET + ECB up...

I quickly got lost in a jungle of seemingly inconsistent options and I reckon that some background facts are welcome at this point as to why.

Up to this date - sept. 2013 - most of the world is in-between two major releases of Emacs. Whereas Emacs 23.x is still packaged in many stable Linux distribution, the latest release is Emacs 24.3. In this post we will use Emacs 24.x which brings lots of improvements, two of those are really relevant to us:

  • the introduction of a package manager, which is great and (but) changes initialisation
  • the partial integration of some version of CEDET into Emacs since version 23.2

Emacs 24 initialisation

Very basically, Emacs used to read the user's Emacs config (~/.emacs or ~/.emacs.d/init.el) which was responsible for adapting the load-path and issuing the right (require 'stuff) commands and configuring each library in some appropriate sequence.

Emacs 24 introduces ELPA, a new package system and official packages repository. It can be extended by other packages repositories such as Marmalade or MELPA

By default in Emacs 24, the initialisation order is a bit more complex due to packages loading: the user's config is still read but should NOT require the libraries installed through the package system: those are automatically loaded (the former load-path adjustment and (require 'stuff) steps) after the ~/.emacs or ~/.emacs.d/init.el has finished. This makes configuring the loaded libraries much more error-prone, especially for libraries designed to be configured the old way (as of today most libraries, notably CEDET).

Here is a good analysis of the situation and possible options. And for those interested in the details of the new initialisation process, see following sections of the manual:

I first tried to stick to the new-way, setting up hooks in ~/.emacs.d/init.el to be called after loading the various libraries, each library having its own configuration hook, and praying for the interaction between the package manager load order and my hooks to be ok... in vain. So I ended up forcing the initialisation to the old way (see Emacs 24 below).

What is CEDET ?

CEDET is a Collection of Emacs Development Environment Tools. The major word here is collection, do not expect it to be an integrated environment. The main components of (or coupled with) CEDET are:

Semantic
Extract a common semantic from source code in different languages
(e)ctags / GNU global
Traditional (exhuberant) CTags or GNU global can be used as a source of information for Semantic
SemanticDB
SemanticDB provides for caching the outcome of semantic analysis in some database to reduce analysis overhead across several editing sessions
Emacs Code Browser
This component uses information provided by Semantic to offer a browsing GUI with windows for traversing files, classes, dependencies and the like
EDE
This provides a notion of project analogous to most IDE. Even if the features related to building projects are very Emacs/ Linux/ Autotools-centric (and thus not necessarily very helful depending on your project setup), the main point of EDE is providing scoping of source code for Semantic to analyse and include path customisation at the project level.
AutoComplete
This is not part of CEDET but Semantic can be configured as a source of completions for auto-complete to propose to the user.
and more...
Senator, SRecode, Cogre, Speedbar, EIEIO, EAssist are other components of CEDET I've not looked at yet.

To add some more complexity, CEDET itself is also undergoing heavy changes and is in-between major versions. The last standalone release is 1.1 but it has the old source layout and activation method. The current head of development says it is version 2.0, has new layout and activation method, plus some more features but is not released yet.

Integration of CEDET into Emacs

Since Emacs 23.2, CEDET is built into Emacs. More exactly parts of some version of new CEDET are built into Emacs, but of course this built-in version is older than the current head of new CEDET... As for the notable parts not built into Emacs, ECB is the most prominent! But it is packaged into Marmalade in a recent version following head of development closely which, mitigates the inconvenience.

My first choice was using built-in CEDET with ECB installed from the packages repository: the installation was perfectly smooth but I was not able to configure cleanly enough the whole to get proper operation. Although I tried hard, I could not get Semantic to take into account the include paths I configured using my EDE project for example.

I would strongly encourage you to try this way, as it is supposed to require much less effort to set up and less maintenance. Should you succeed I would greatly appreciate some feedback of you experience!

As for me I got down to install the latest version from the source repositories following as closely as possible Alex Ott's advices and using his own fork of ECB to make it compliant with most recent CEDET:

How to set up CEDET + ECB in Emacs 24

Emacs 24

Install Emacs 24 as you wish, I will not cover the various options here but simply summarise the local install from sources I choose.

  1. Get the source archive from http://ftpmirror.gnu.org/emacs/
  2. Extract it somewhere and run the usual (or see the INSTALL file) - configure --prefix=~/local, - make, - make install

Create your emacs personal directory and configuration file ~/.emacs.d/site-lisp/ and ~/.emacs.d/init.el and put this inside the latter:

;; this is intended for manually installed libraries
(add-to-list 'load-path "~/.emacs.d/site-lisp/")

;; load the package system and add some repositories
(require 'package)
(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))
(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/") t)

;; Install a hook running post-init.el *after* initialization took place
(add-hook 'after-init-hook (lambda () (load "post-init.el")))

;; Do here basic initialization, (require) non-ELPA packages, etc.

;; disable automatic loading of packages after init.el is done
(setq package-enable-at-startup nil)
;; and force it to happen now
(package-initialize)
;; NOW you can (require) your ELPA packages and configure them as normal

Useful Emacs packages

Using the emacs commands M-x package-list-packages interactively or M-x package-install <package name>, you can install many packages easily. For example I installed:

Choose your own! I just recommend against installing ECB or other CEDET since we are going to install those from source.

You can also insert or load your usual Emacs configuration here, simply beware of configuring ELPA, Marmalade et al. packages after (package-initialize).

CEDET

  • Get the source and put it under ~/.emacs.d/site-lisp/cedet-bzr. You can either download a snapshot from http://www.randomsample.de/cedet-snapshots/ or check it out of the bazaar repository with:

    ~/.emacs.d/site-lisp$ bzr checkout --lightweight \
    bzr://cedet.bzr.sourceforge.net/bzrroot/cedet/code/trunk cedet-bzr
    
  • Run make (and optionnaly make install-info) in cedet-bzr or see the INSTALL file for more details.

  • Get Alex Ott's minimal CEDET configuration file to ~/.emacs.d/config/cedet.el for example

  • Adapt it to your system by editing the first lines as follows

    (setq cedet-root-path
        (file-name-as-directory (expand-file-name
            "~/.emacs.d/site-lisp/cedet-bzr/")))
    (add-to-list 'Info-directory-list
            "~/projects/cedet-bzr/doc/info")
    
  • Don't forget to load it from your ~/.emacs.d/init.el:

    ;; this is intended for configuration snippets
    (add-to-list 'load-path "~/.emacs.d/")
    ...
    (load "config/cedet.el")
    
  • restart your emacs to check everything is OK; the --debug-init option is of great help for that purpose.

ECB

  • Get Alex Ott ECB fork into ~/.emacs.d/site-lisp/ecb-alexott:

    ~/.emacs.d/site-lisp$ git clone --depth 1  https://github.com/alexott/ecb/
    
  • Run make in ecb-alexott and see the README file for more details.

  • Don't forget to load it from your ~/.emacs.d/init.el:

    (add-to-list 'load-path (expand-file-name
          "~/.emacs.d/site-lisp/ecb-alexott/"))
    (require 'ecb)
    ;(require 'ecb-autoloads)
    

    Note

    You can theoretically use (require 'ecb-autoloads) instead of (require 'ecb) in order to load ECB by need. I encountered various misbehaviours trying this option and finally dropped it, but I encourage you to try it and comment on your experience.

  • restart your emacs to check everything is OK (you probably want to use the --debug-init option).

  • Create a hello.cpp with you CEDET enable Emacs and say M-x ecb-activate to check that ECB is actually installed.

Tune your configuration

Now, it is time to tune your configuration. There is no good recipe from here onward... But I'll try to propose some snippets below. Some of them are adapted from Alex Ott personal configuration

More Semantic options

You can use the following lines just before (semantic-mode 1) to add to the activated features list:

(add-to-list 'semantic-default-submodes 'global-semantic-decoration-mode)
(add-to-list 'semantic-default-submodes 'global-semantic-idle-local-symbol-highlight-mode)
(add-to-list 'semantic-default-submodes 'global-semantic-idle-scheduler-mode)
(add-to-list 'semantic-default-submodes 'global-semantic-idle-completions-mode)

You can also load additional capabilities with those lines after (semantic-mode 1):

(require 'semantic/ia)
(require 'semantic/bovine/gcc) ; or depending on you compiler
; (require 'semantic/bovine/clang)
Auto-completion

If you want to use auto-complete you can tell it to interface with Semantic by configuring it as follows (where AAAAMMDD.rrrr is the date.revision suffix of the version od auti-complete installed by you package manager):

;; Autocomplete
(require 'auto-complete-config)
(add-to-list 'ac-dictionary-directories (expand-file-name
             "~/.emacs.d/elpa/auto-complete-AAAAMMDD.rrrr/dict"))
(setq ac-comphist-file (expand-file-name
             "~/.emacs.d/ac-comphist.dat"))
(ac-config-default)

and activating it in your cedet hook, for example:

...
;; customisation of modes
(defun alexott/cedet-hook ()
...
    (add-to-list 'ac-sources 'ac-source-semantic)
) ; defun alexott/cedet-hook ()
Support for GNU global a/o (e)ctags
;; if you want to enable support for gnu global
(when (cedet-gnu-global-version-check t)
  (semanticdb-enable-gnu-global-databases 'c-mode)
  (semanticdb-enable-gnu-global-databases 'c++-mode))

;; enable ctags for some languages:
;;  Unix Shell, Perl, Pascal, Tcl, Fortran, Asm
(when (cedet-ectag-version-check)
  (semantic-load-enable-primary-exuberent-ctags-support))

Using CEDET for development

Once CEDET + ECB + EDE is up you can start using it for actual development. How to actually use it is beyond the scope of this already too long post. I can only invite you to have a look at:

Conclusion

CEDET provides an impressive set of features both to allow your emacs environment to "understand" your code and to provide powerful interfaces to this "understanding". It is probably one of the very few solution to work with complex C++ code base in case you can't or don't want to use a heavy-weight IDE like Eclipse CDT.

But its being highly configurable also means, at least for now, some lack of integration, or at least a pretty complex configuration. I hope this post will help you to do your first steps with CEDET and find your way to setup and configure it to you own taste.