
Blog entries

  • Going to EuroScipy2013

    2013/09/04 by Alain Leufroy

    The EuroScipy 2013 conference was held in Brussels, at the Université libre de Bruxelles.

    http://www.logilab.org/file/175984/raw/logo-807286783.png

    As usual, the first two days were dedicated to tutorials, while the last two were dedicated to scientific presentations and general Python-related talks. The meeting was extended by one more day of sprint sessions, during which enthusiasts could help free software projects, namely sage, vispy and scipy.

    Jérôme and I had the great opportunity to represent Logilab during the scientific tracks and the sprint day. We enjoyed many talks about scientific applications using python. We're not going to describe the whole conference. Visit the conference website if you want the complete list of talks. In this article we will try to focus on the ones we found the most interesting.

    First of all, the keynote by Cameron Neylon about Network ready research was very interesting. He presented some graphs about the impact of group work on solving complex problems. They revealed that there is a critical network size at which the effectiveness of solving a problem drastically increases. He pointed out that the "friction" of source code accessibility limits the "getting help" variable. Open sourcing software could be the best way to reduce this "friction", while unit testing and continuous integration are facilitators. And, in general, process reproducibility is very important, not only in computing research: retrieving experimental settings, metadata and process environment is vital. We agree with this, as we experience it every day in our work. That is why we encourage open source licenses and develop Simulagora (in French), a collaborative platform that provides distributed simulation traceability and reproducibility.

    Ian Ozsvald's talk dealt with key points and tips, drawn from his own experience, on growing a business based on open source and Python, as well as mistakes to avoid (e.g. not checking beforehand that there are paying customers interested in what you want to develop). His talk was comprehensive and covered a wide range of situations.

    http://vispy.org/_static/img/logo.png

    We got a very nice presentation of a young but interesting visualization tool: Vispy. It is 6 months old and its first public release came in early August. It is the result of the merge of four separate libraries, oriented toward interactive visualisation (vs. static figure generation for Matplotlib) and using OpenGL on GPUs to avoid CPU overload. A demonstration with large datasets showed vispy displaying millions of points in real time at 40 frames per second. During the talk we got interesting information about OpenGL features such as anti-grain rendering, compared to Matplotlib's Agg backend which uses the CPU.

    We also got to learn about cartopy, an open source Python library originally written for weather and climate science. It provides a useful and simple API to manipulate cartographic mappings.

    Distributed computing was a hot topic, and many talks were related to this theme.

    https://www.openstack.org/themes/openstack/images/openstack-logo-preview-full-color.png

    Gael Varoquaux reminded us of the key problems with "biggish data" and the key points to process it successfully. I think some of his recommendations are generally useful, like "choose simple solutions", "fail gracefully" and "make it easy to debug". For big data processing, when I/O is the limiting constraint, first try to split the problem into random fractions of the data, then run algorithms and aggregate the results to circumvent this limit. He also presented mini-batch, which processes a bunch of observations at a time (a trade-off between memory usage and vectorization), and joblib.parallel, which makes I/O faster using compression (CPUs are faster than disk access).
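
    The split-then-aggregate advice can be sketched in a few lines of Python (a toy illustration of the pattern, not code from the talk; the mean is just a stand-in for any computation whose partial results can be aggregated):

```python
import random

def chunked_mean(data, batch_size=1000):
    # Process the data one mini-batch at a time and keep only small
    # partial aggregates, so memory use is bounded by batch_size.
    total, count = 0.0, 0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        total += sum(batch)
        count += len(batch)
    return total / count

data = [random.random() for _ in range(10000)]
result = chunked_mean(data)  # close to sum(data) / len(data)
```

    In a real I/O-bound setting each batch would be read from disk on demand instead of sliced from a list, but the aggregation logic stays the same.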

    Benoit Da Mota talked about shared memory in parallel computing, and Antonio Messina gave us a quick overview of how to build a computing cluster with Elasticluster, using OpenStack, Slurm and Ansible. He demonstrated starting and stopping a cluster on OpenStack: once all VMs are started, Ansible configures them as hosts of the cluster, and new VMs can be created and added to the cluster on the fly thanks to a command-line interface.

    We also got a keynote by Peter Wang (from Continuum Analytics) about the future of data analysis with Python. As a physics PhD, I loved his metaphor of giving mass to data. He tried to explain the pain that scientists experience when using databases.

    https://scikits.appspot.com/static/images/scipyshiny_small.png

    After the conference we participated in the numpy/scipy sprint, organized by Ralf Gommers and Pauli Virtanen. There were 18 people trying to close issues of various difficulty levels, and we got a quick tutorial on how easy it is to contribute: the easiest way is to fork the project from its github page onto your own github account (you can create one for free), so that your patch submission later becomes a simple "Pull Request" (PR). Clone your scipy fork locally and make a new branch (git checkout -b <newbranch>) to tackle one specific issue. Once your patch is ready, commit it locally, push it to your github repository and, from the github interface, choose "Pull Request". You will be able to add something to your commit message before your PR is sent and looked at by the project's lead developers. For example, using "gh-XXXX" in your commit message will automatically add a link to issue no. XXXX. Here is the list of open issues for scipy; you can filter them, e.g. displaying only the ones considered easy to fix :D

    For more information: Contributing to SciPy.


  • Emacs turned into an IDE with CEDET

    2013/08/29 by Anthony Truchet

    Abstract

    In this post you will find one way, namely thanks to CEDET, of turning your Emacs into an IDE offering features for semantic browsing and refactoring assistance similar to what you can find in major IDEs like Visual Studio or Eclipse.

    Introduction

    Emacs is a tool of choice for the developer: it is very powerful, highly configurable and has a wealth of so-called modes to improve many aspects of daily work, especially when editing code.

    The point, as you might have realised if you have already worked with an IDE like Eclipse or Visual Studio, is that Emacs's code-browsing abilities are quite rudimentary... at least out of the box!

    In this post I will walk through one way of configuring Emacs + CEDET which works for me. This is by no means the only way to get there, but finding this path required several days of wandering between inconsistent resources, distribution pitfalls and the like.

    I will try to convey the relevant parts of what I have learnt on the way, to warn about some pitfalls, and also to point out some interesting directions I haven't followed (be it by choice or necessity) and encourage you to try them. Should you push this adventure further, your experience will be very much appreciated... and in any case your feedback on this post is also very welcome.

    The first part gives some background deemed useful to understand what's going on. If you want to go straight to the how-to, please jump directly to the second part.

    Sketch map of the jungle

    This all started because I needed a development environment to do work remotely on a big, legacy C++ code base from quite a lightweight machine and a weak network connection.

    My former habit of using Eclipse CDT and compiling locally was not an option any longer but I couldn't stick to a bare text editor plus remote compilation either because of the complexity of the code base. So I googled emacs IDE code browser and started this journey to set CEDET + ECB up...

    I quickly got lost in a jungle of seemingly inconsistent options and I reckon that some background facts are welcome at this point as to why.

    As of this writing - September 2013 - most of the world is in-between two major releases of Emacs. Whereas Emacs 23.x is still packaged in many stable Linux distributions, the latest release is Emacs 24.3. In this post we will use Emacs 24.x, which brings lots of improvements, two of which are really relevant to us:

    • the introduction of a package manager, which is great but changes initialisation
    • the partial integration of some version of CEDET into Emacs since version 23.2

    Emacs 24 initialisation

    Very basically, Emacs used to read the user's Emacs config (~/.emacs or ~/.emacs.d/init.el) which was responsible for adapting the load-path and issuing the right (require 'stuff) commands and configuring each library in some appropriate sequence.

    Emacs 24 introduces ELPA, a new package system and official package repository. It can be extended with other package repositories such as Marmalade or MELPA.

    By default in Emacs 24, the initialisation order is a bit more complex due to packages loading: the user's config is still read but should NOT require the libraries installed through the package system: those are automatically loaded (the former load-path adjustment and (require 'stuff) steps) after the ~/.emacs or ~/.emacs.d/init.el has finished. This makes configuring the loaded libraries much more error-prone, especially for libraries designed to be configured the old way (as of today most libraries, notably CEDET).

    Here is a good analysis of the situation and the possible options. For those interested in the details of the new initialisation process, see the corresponding sections of the Emacs manual.

    I first tried to stick to the new way, setting up hooks in ~/.emacs.d/init.el to be called after loading the various libraries, each library having its own configuration hook, and praying for the interaction between the package manager load order and my hooks to be ok... in vain. So I ended up forcing the initialisation to happen the old way (see Emacs 24 below).

    What is CEDET ?

    CEDET is a Collection of Emacs Development Environment Tools. The key word here is collection: do not expect it to be an integrated environment. The main components of (or coupled with) CEDET are:

    Semantic
    Extract a common semantic from source code in different languages
    (e)ctags / GNU global
    Traditional (Exuberant) Ctags or GNU global can be used as a source of information for Semantic
    SemanticDB
    SemanticDB provides for caching the outcome of semantic analysis in some database to reduce analysis overhead across several editing sessions
    Emacs Code Browser
    This component uses information provided by Semantic to offer a browsing GUI with windows for traversing files, classes, dependencies and the like
    EDE
    This provides a notion of project analogous to most IDEs. Even if the features related to building projects are very Emacs/Linux/Autotools-centric (and thus not necessarily very helpful depending on your project setup), the main point of EDE is to provide scoping of source code for Semantic to analyse, and include-path customisation at the project level.
    AutoComplete
    This is not part of CEDET, but Semantic can be configured as a source of completions for auto-complete to propose to the user.
    and more...
    Senator, SRecode, Cogre, Speedbar, EIEIO, EAssist are other components of CEDET I've not looked at yet.

    To add some more complexity, CEDET itself is also undergoing heavy changes and is in-between major versions. The latest standalone release is 1.1, but it has the old source layout and activation method. The current head of development calls itself version 2.0 and has the new layout and activation method, plus some more features, but is not released yet.

    Integration of CEDET into Emacs

    Since Emacs 23.2, CEDET is built into Emacs. More exactly, parts of some version of the new CEDET are built into Emacs, but of course this built-in version is older than the current head of new CEDET... As for the notable parts not built into Emacs, ECB is the most prominent! But it is packaged in Marmalade in a recent version that follows the head of development closely, which mitigates the inconvenience.

    My first choice was using the built-in CEDET with ECB installed from the package repository: the installation was perfectly smooth, but I was not able to configure the whole setup cleanly enough to get proper operation. Although I tried hard, I could not get Semantic to take into account the include paths I configured using my EDE project, for example.

    I would strongly encourage you to try this way, as it is supposed to require much less setup effort and maintenance. Should you succeed, I would greatly appreciate some feedback on your experience!

    As for me, I got down to installing the latest version from the source repositories, following Alex Ott's advice as closely as possible and using his own fork of ECB to make it compliant with the most recent CEDET:

    How to set up CEDET + ECB in Emacs 24

    Emacs 24

    Install Emacs 24 as you wish. I will not cover the various options here but simply summarise the local install from sources that I chose.

    1. Get the source archive from http://ftpmirror.gnu.org/emacs/
    2. Extract it somewhere and run the usual configure --prefix=~/local, make, make install (or see the INSTALL file)

    Create your personal Emacs directory ~/.emacs.d/site-lisp/ and configuration file ~/.emacs.d/init.el, and put this inside the latter:

    ;; this is intended for manually installed libraries
    (add-to-list 'load-path "~/.emacs.d/site-lisp/")
    
    ;; load the package system and add some repositories
    (require 'package)
    (add-to-list 'package-archives
                 '("marmalade" . "http://marmalade-repo.org/packages/"))
    (add-to-list 'package-archives
                 '("melpa" . "http://melpa.milkbox.net/packages/") t)
    
    ;; Install a hook running post-init.el *after* initialization took place
    (add-hook 'after-init-hook (lambda () (load "post-init.el")))
    
    ;; Do here basic initialization, (require) non-ELPA packages, etc.
    
    ;; disable automatic loading of packages after init.el is done
    (setq package-enable-at-startup nil)
    ;; and force it to happen now
    (package-initialize)
    ;; NOW you can (require) your ELPA packages and configure them as normal
    

    Useful Emacs packages

    Using the Emacs commands M-x package-list-packages (interactive) or M-x package-install <package name>, you can install many packages easily.

    Choose your own! I just recommend against installing ECB or other CEDET components, since we are going to install those from source.

    You can also insert or load your usual Emacs configuration here; simply beware of configuring ELPA, Marmalade et al. packages after (package-initialize).

    CEDET

    • Get the source and put it under ~/.emacs.d/site-lisp/cedet-bzr. You can either download a snapshot from http://www.randomsample.de/cedet-snapshots/ or check it out of the bazaar repository with:

      ~/.emacs.d/site-lisp$ bzr checkout --lightweight \
      bzr://cedet.bzr.sourceforge.net/bzrroot/cedet/code/trunk cedet-bzr
      
    • Run make (and optionally make install-info) in cedet-bzr, or see the INSTALL file for more details.

    • Get Alex Ott's minimal CEDET configuration file and save it as, for example, ~/.emacs.d/config/cedet.el

    • Adapt it to your system by editing the first lines as follows

      (setq cedet-root-path
          (file-name-as-directory (expand-file-name
              "~/.emacs.d/site-lisp/cedet-bzr/")))
      (add-to-list 'Info-directory-list
              "~/projects/cedet-bzr/doc/info")
      
    • Don't forget to load it from your ~/.emacs.d/init.el:

      ;; this is intended for configuration snippets
      (add-to-list 'load-path "~/.emacs.d/")
      ...
      (load "config/cedet.el")
      
    • Restart your Emacs to check everything is OK; the --debug-init option is of great help for that purpose.

    ECB

    • Get Alex Ott's ECB fork into ~/.emacs.d/site-lisp/ecb-alexott:

      ~/.emacs.d/site-lisp$ git clone --depth 1  https://github.com/alexott/ecb/
      
    • Run make in ecb-alexott and see the README file for more details.

    • Don't forget to load it from your ~/.emacs.d/init.el:

      (add-to-list 'load-path (expand-file-name
            "~/.emacs.d/site-lisp/ecb-alexott/"))
      (require 'ecb)
      ;(require 'ecb-autoloads)
      

      Note

      You can theoretically use (require 'ecb-autoloads) instead of (require 'ecb) in order to load ECB by need. I encountered various misbehaviours trying this option and finally dropped it, but I encourage you to try it and comment on your experience.

    • Restart your Emacs to check everything is OK (you probably want to use the --debug-init option).

    • Create a hello.cpp file with your CEDET-enabled Emacs and type M-x ecb-activate to check that ECB is actually installed.

    Tune your configuration

    Now it is time to tune your configuration. There is no universal recipe from here onward... but I'll propose some snippets below. Some of them are adapted from Alex Ott's personal configuration.

    More Semantic options

    You can use the following lines just before (semantic-mode 1) to add to the activated features list:

    (add-to-list 'semantic-default-submodes 'global-semantic-decoration-mode)
    (add-to-list 'semantic-default-submodes 'global-semantic-idle-local-symbol-highlight-mode)
    (add-to-list 'semantic-default-submodes 'global-semantic-idle-scheduler-mode)
    (add-to-list 'semantic-default-submodes 'global-semantic-idle-completions-mode)
    

    You can also load additional capabilities with those lines after (semantic-mode 1):

    (require 'semantic/ia)
    (require 'semantic/bovine/gcc) ; or, depending on your compiler
    ; (require 'semantic/bovine/clang)
    
    Auto-completion

    If you want to use auto-complete, you can tell it to interface with Semantic by configuring it as follows (where AAAAMMDD.rrrr is the date.revision suffix of the version of auto-complete installed by your package manager):

    ;; Autocomplete
    (require 'auto-complete-config)
    (add-to-list 'ac-dictionary-directories (expand-file-name
                 "~/.emacs.d/elpa/auto-complete-AAAAMMDD.rrrr/dict"))
    (setq ac-comphist-file (expand-file-name
                 "~/.emacs.d/ac-comphist.dat"))
    (ac-config-default)
    

    and activating it in your cedet hook, for example:

    ...
    ;; customisation of modes
    (defun alexott/cedet-hook ()
    ...
        (add-to-list 'ac-sources 'ac-source-semantic)
    ) ; defun alexott/cedet-hook ()
    
    Support for GNU global and/or (e)ctags

    ;; if you want to enable support for gnu global
    (when (cedet-gnu-global-version-check t)
      (semanticdb-enable-gnu-global-databases 'c-mode)
      (semanticdb-enable-gnu-global-databases 'c++-mode))
    
    ;; enable ctags for some languages:
    ;;  Unix Shell, Perl, Pascal, Tcl, Fortran, Asm
    (when (cedet-ectag-version-check)
      (semantic-load-enable-primary-exuberent-ctags-support))
    

    Using CEDET for development

    Once CEDET + ECB + EDE is up, you can start using it for actual development. How to actually use it is beyond the scope of this already long post; I can only invite you to have a look at the documentation of the various components.

    Conclusion

    CEDET provides an impressive set of features, both to let your Emacs environment "understand" your code and to provide powerful interfaces to this "understanding". It is probably one of the very few solutions for working with a complex C++ code base when you can't or don't want to use a heavyweight IDE like Eclipse CDT.

    But being highly configurable also means, at least for now, some lack of integration, or at least a pretty complex configuration. I hope this post will help you take your first steps with CEDET and find your way to set it up and configure it to your own taste.


  • Pylint 1.0 released!

    2013/08/06 by Sylvain Thenault

    Hi there,

    I'm very pleased to announce, after 10 years of existence, the 1.0 release of Pylint.

    This release has a hell of a long ChangeLog, thanks to many contributions and to the 10th anniversary sprint we hosted during June. More details about the changes below.

    Chances are high that your Pylint score will go down with this new release, which includes a lot of new checks :) Also, there are a lot of improvements on the Python 3 side (notably 3.3 support, which was somewhat broken).

    You may download and install it from PyPI or from Logilab's Debian repositories. Notice that Pylint has been updated to use the new Astroid library (formerly known as logilab-astng) and that the logilab-common 0.60 library includes some fixes necessary for using Pylint with Python 3, as well as long-awaited support for namespace packages.

    For those interested, below is a comprehensive list of what changed:

    Command line and output formatting

    • A new --msg-template option to control output, deprecating "msvc" and "parseable" output formats as well as killing --include-ids and --symbols options.
    • Fix spelling of max-branchs option, now max-branches.
    • Start promoting the usage of symbolic names instead of numerical ids.

    New checks

    • "missing-final-newline" (C0304) for files missing the final newline.
    • "invalid-encoded-data" (W0512) for files that contain data that cannot be decoded with the specified or default encoding.
    • "bad-open-mode" (W1501) for calls to open (or file) that specify invalid open modes (Original implementation by Sasha Issayev).
    • "old-style-class" (C1001) for classes that do not have any base class.
    • "trailing-whitespace" (C0303) that warns about trailing whitespace.
    • "unpacking-in-except" (W0712) about unpacking exceptions in handlers, which is unsupported in Python 3.
    • "old-raise-syntax" (W0121) for the deprecated syntax raise Exception, args.
    • "unbalanced-tuple-unpacking" (W0632) for unbalanced unpacking in assignments (bitbucket #37).
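
    To get a feel for two of the new checks, here are tiny snippets that Pylint 1.0 would flag; both also fail at runtime, which is exactly the kind of bug the checks are meant to catch early (the values and file name are just for illustration):

```python
def unbalanced():
    # W0632 unbalanced-tuple-unpacking: three values, only two names.
    try:
        first, second = (1, 2, 3)
        return first, second
    except ValueError:
        return "unbalanced unpacking caught"

def bad_mode():
    # W1501 bad-open-mode: "rw" is not a valid mode for open().
    try:
        open("somefile.txt", "rw")
    except ValueError:
        return "invalid open mode caught"
```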

    Enhanced behaviours

    • Do not emit [fixme] for every line if the config value 'notes' is empty
    • Emit warnings about lines exceeding the column limit when those lines are inside multiline docstrings.
    • Name check enhancement:
      • simplified message,
      • don't double-check parameter names with the regex for parameters and inline variables,
      • don't check names of derived instance class members,
      • methods that are decorated as properties are now treated as attributes,
      • names in global statements are now checked against the regular expression for constants,
      • for toplevel name assignments, the class name regex will be used if pylint can detect that the value on the right-hand side is a class (like collections.namedtuple()),
      • add new name type 'class_attribute' for attributes defined in class scope. By default, allow both const and variable names.
    • Add a configuration option for missing-docstring to optionally exempt short functions/methods/classes from the check.
    • Add the type of the offending node to missing-docstring and empty-docstring.
    • Do not warn about redefinitions of variables that match the dummy regex.
    • Do not treat all variables starting with "_" as dummy variables, only "_" itself.
    • Make the line-too-long warning configurable by adding a regex for lines for which the length limit should not be enforced.
    • Do not warn about a long line if a pylint disable option brings it above the length limit.
    • Do not flag names in nested with statements as undefined.
    • Remove string module from the default list of deprecated modules (bitbucket #3).
    • Fix incomplete-protocol false positive for read-only containers like tuple (bitbucket #25).

    Other changes

    • Support for pkgutil.extend_path and setuptools pkg_resources (logilab-common #8796).
    • New utility classes for per-checker unittests in testutils.py
    • Added a new base class and interface for checkers that work on the tokens rather than the syntax, and only tokenize the input file once.
    • epylint shouldn't hang anymore when there is a large output on pylint's stderr (bitbucket #15).
    • Put back documentation in source distribution (bitbucket #6).

    Astroid

    • New API to make it smarter by allowing transformation functions on any node, providing a register_transform function on the manager instead of register_transformer, to make it more flexible with respect to node selection.
    • Use this new transformation API to provide support for namedtuple (actually in pylint-brain, logilab-astng #8766)
    • Better description of hashlib
    • Properly recognize methods annotated with abc.abstract{property,method} as abstract.
    • Added the test_utils module for building ASTs and extracting deeply nested nodes for easier testing.
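
    The spirit of the new register_transform API - hooking a rewriting function onto nodes of a chosen type - can be illustrated with the standard library's ast module (an analogy only; astroid's own API and node classes differ):

```python
import ast

class NumberDoubler(ast.NodeTransformer):
    # Rewrite every integer literal in the tree: the same kind of
    # node-level transformation that a registered transform performs.
    def visit_Constant(self, node):
        if isinstance(node.value, int):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

def transform_and_eval(expr):
    # Parse, transform, then compile and evaluate the rewritten tree.
    tree = ast.parse(expr, mode="eval")
    tree = NumberDoubler().visit(tree)
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<ast>", "eval"))
```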

  • Astroid 1.0 released!

    2013/08/02 by Sylvain Thenault

    Astroid is the new name of the former logilab-astng library. It is an AST library, used as the basis of Pylint, and includes a Python 2.5 -> 3.3 compatible tree representation, static type inference and other features useful for advanced Python code analysis, such as an API to provide extra information when static inference can't overcome Python's dynamic nature (see the pylint-brain project for instance).

    It has been renamed and moved to Bitbucket to make clear that this is not a Logilab-specific project but a community project that can benefit anyone manipulating Python code (static analysis tools, IDEs, browsers, etc.).

    Documentation is a bit rough but should quickly improve. A dedicated website is also now online: visit www.astroid.org (or https://bitbucket.org/logilab/astroid for development).

    You may download and install it from PyPI or from Logilab's Debian repositories.


  • Going to DebConf13

    2013/08/01 by Julien Cristau

    The 14th Debian developers conference (DebConf13) will take place between August 11th and August 18th in Vaumarcus, Switzerland.

    Logilab is a DebConf13 sponsor, and I'll attend the conference. There are quite a lot of cloud-related events on the schedule this year, plus the usual impromptu discussions and hallway track. Looking forward to meeting the usual suspects there!

    https://www.logilab.org/file/158611/raw/dc13-btn0-going-bg.png

  • We hosted the Salt Sprint in Paris

    2013/07/30 by Arthur Lutz

    Last Friday, we hosted the French event for the international Great Salt Sprint. Here is a report on what was done and discussed on this occasion.

    http://www.logilab.org/file/228931/raw/saltstack_logo.jpg

    We started off by discussing various points that were of interest to the participants:

    • automatically generate documentation from salt sls files (for Sphinx)
    • add a security layer to salt-mine with restricted access (bugs #5467 and #6437)
    • test the compatibility of salt-cloud with OpenStack
    • fix a bug in the bridge module: traceback on KeyError
    • set up the network in Debian (equivalent of rh_ip)
    • configure an existing monitoring solution through salt (add machines, add checks, etc.) on various backends with a common syntax

    We then split up into pairs to tackle issues in small groups, with some general discussions from time to time.

    6 people participated, 5 from Logilab and 1 from nbs-system. We were expecting more participants, but some couldn't make it at the last minute, or thought the sprint was taking place at some other time.

    Unfortunately we had a major power blackout all afternoon. Some of us switched to battery and 3G tethering to carry on, but that couldn't last all afternoon, so we ended up talking about design and use cases. ERDF (the French electricity distribution company) eventually brought generator trucks for the neighborhood!

    Arthur & Benoit: adding machines or checks to monitoring

    http://www.logilab.org/file/157971/raw/salt-centreon-shinken.png

    Some unfinished draft code for supervision backends was written and pushed to github. We explored how a common "interface" could be built in salt (using a combination of states and __virtual__). The official documentation was often very useful, and reading the code was also always a good resource (and the code is really readable).

    While we were fixing things during the power blackout, Benoit submitted a bug fix.

    David & Alain: generating documentation from salt states & the salt master

    The idea is to couple the SLS description with the current state of the salt master to generate documentation about one's infrastructure using Sphinx. This was submitted to the mailing-list.

    http://www.logilab.org/file/157976/raw/salt-sphinx.png

    Design work was done around which information should be extracted and displayed, and how to configure access control to the salt-master; taking a further look at external_auth and salt-api will probably be the way forward.

    General discussions

    We had general discussions around the concepts of access control to a salt master and how to define this access. One of the things we believe to be missing (but haven't checked thoroughly) is the ability to separate "read-only" operations from "read-write" operations in states and modules; if this were done (through decorators?) we could easily tell salt-api to only give access to data collection. Complex access scenarios were discussed. Having a configuration or external_auth based on ssh public keys (similar to mercurial-server) would be nice, and would provide a "limited" shell to a mercurial server.
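
    The decorator idea floated above could look something like this - a sketch of the concept only, not an existing Salt API (all names here are invented for illustration):

```python
def read_only(func):
    # Tag a module function as a read-only operation; a layer such as
    # salt-api could then expose only the tagged functions.
    func.read_only = True
    return func

@read_only
def list_checks():
    return ["load", "disk", "ping"]

def restart_service(name):
    return "restarted %s" % name

def allowed_for_readonly_access(func):
    # The access-control layer would consult this tag before dispatching.
    return getattr(func, "read_only", False)
```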

    Conclusion

    The power blackout didn't help us get things done, but nevertheless some sharing happened around our use cases for SaltStack and the features that we'd like to get out of it (or from third-party applications). We hope to turn all the discussions into bug reports, further discussion on the mailing-lists and (obviously) code and pull requests. Check out the scoreboard for an overview of how the other cities contributed.



  • The Great Salt Sprint Paris Location is Logilab

    2013/07/12 by Nicolas Chauvat
    http://farm1.static.flickr.com/183/419945378_4ead41a76d_m.jpg

    We're happy to be part of the second Great Salt Sprint, which will be held at the end of July 2013. We will be hosting the French sprinters on Friday the 26th in our offices in the center of Paris.

    The focus of our Logilab team will probably be Test-Driven System Administration with Salt, but the more participants and topics, the merrier the event.

    Please register if you plan on joining us. We will be happy to meet with fellow hackers.

    Photo by Sebastian Mary under a Creative Commons licence.


  • PGDay France 2013 report (Nantes) - part 2/2

    2013/07/01 by Arthur Lutz

    Ora2Pg: High-Speed Migration, by Gilles Darold

    The ora2pg tool makes it possible to migrate from Oracle to PostgreSQL. Although we have no need for this type of migration ourselves, this presentation was an opportunity to discuss various optimisations that can be applied to massive imports such as the ones we perform at Logilab.

    For example:

    • prefer COPY over INSERT,
    • drop indexes/constraints/triggers/sequences (and recreate them at the end of the import),
    • keep work_mem high (for the creation of indexes and constraints),
    • increase checkpoint_segments (>64 for 1 GB).

    Some system settings can also be changed for the duration of the load (and switched back once the server returns to "production"):

    • fsync=off
    • full_page_writes=off
    • synchronous_commit=off
    • WAL (Write Ahead Log) sur des disques SSD
    • kernel : vm.dirty_background_ratio = 1
    • kernel : vm.dirty_ratio = 2

    On the PostgreSQL side, the following parameters should probably be modified (the numbers are only examples; the hardware configuration was quite specific, e.g. 64 GB of RAM - see the slides for details):

    • shared_buffers = 10G
    • maintenance_work_mem = 2G
    • checkpoint_segments = 61
    • checkpoint_completion_target = 0.9
    • wal_level = minimal
    • archive_mode = off
    • wal_buffers = 32MB
    • disable logging
    http://ora2pg.darold.net/ora2pg-logo.png
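    Collected into a postgresql.conf fragment, the loading-time settings above might look like the following sketch (values are the talk's examples for a specific 64 GB machine, not general recommendations):

```ini
# Bulk-load settings from the talk -- example values, adjust to your hardware.
# WARNING: fsync/full_page_writes/synchronous_commit off trade durability for
# speed; re-enable them before the server goes back into production.
fsync = off
full_page_writes = off
synchronous_commit = off
shared_buffers = 10GB
maintenance_work_mem = 2GB
checkpoint_segments = 61
checkpoint_completion_target = 0.9
wal_level = minimal
archive_mode = off
wal_buffers = 32MB
```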

    To dig deeper, see the presentation slides (5).

    Towards the Petabyte with PostgreSQL, by Dimitri Fontaine

    Dimitri Fontaine gave an admirable treatment of the complex question of scaling in terms of data volume. His approach was very didactic: for each concept, he recalled what it actually means. Speaking of MVCC, for instance, he explained that it amounts to several parallel versions of the same row; VACUUM then decides which version should be visible to everyone. By default, AUTOVACUUM kicks in once 20% of the data has changed.
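    The accumulation of those parallel row versions can be observed through the standard statistics views; a minimal sketch (view and columns are standard PostgreSQL):

```sql
-- Dead vs. live row versions per table: a rough indicator of how close
-- each table is to the autovacuum threshold (by default ~20% of rows changed).
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```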

    Faced with the difficulties and drawbacks of storing a whole petabyte on a single server, Dimitri Fontaine covered the possible distributed architectures for spreading that petabyte across several servers. "Bi-Directional Replication" (which will be available in a future release) would allow several SQL databases to each store a subset of the data (cf. EVENT TRIGGERS). A complementary approach is plproxy, which uses stored procedures to distribute queries across several servers.

    http://www.logilab.org/file/150419/raw/Screenshot%20from%202013-07-01%2010%3A25%3A46.png

    Finally, a point we found particularly relevant: he talked about high availability and the technical fuzziness that surrounds the subject. One must clearly distinguish between availability of the data and availability of the service. For example, in WARM STANDBY the data is available but PostgreSQL has to be restarted to provide the service, whereas in HOT STANDBY the data is available and the server can provide the service.

    To dig deeper, see the presentation slides (6).

    Understanding EXPLAIN, by Guillaume Lelarge

    EXPLAIN lets you debug the execution of an SQL query, or optimize it. This quickly touches on PostgreSQL internals, which are relatively well documented. Using many examples, Guillaume Lelarge walked through some of PostgreSQL's rather low-level mechanisms.

    Among other things, note the different scan types, whose names are fairly self-explanatory:

    • sequential scan,
    • index scan,
    • index only scan,
    • bitmap scan.

    In the same vein, the sort types:

    • external sort (on disk),
    • quicksort (in memory).
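    These node names appear verbatim in EXPLAIN output; a minimal sketch of reading a plan (table and query are illustrative):

```sql
-- ANALYZE actually runs the query and reports real timings;
-- BUFFERS adds shared-buffer hit/read counts for each node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tbl_accounts ORDER BY balance;
-- Look for the scan node type (Seq Scan, Index Scan, ...) and, for sorts,
-- the "Sort Method" line: external merge on disk vs. quicksort in memory.
```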

    And do take the time to read through the examples in his slides (7).

    Conclusion

    https://www.pgday.fr/_media/pgfr2.png

    PGDay.fr is a conference we can warmly recommend. It offers a clever mix of the various questions we face when working with databases, whether as a system administrator, a developer, or an SQL user. It goes without saying that the technical level was very high, yet the talks remained accessible. Speakers and organizers were available for discussion, allowing reflection and conversations to continue beyond the talks themselves.

    We are already planning to attend the 2014 edition! See you next year...

    http://www.logilab.org/file/150100/raw/2e1ax_default_entry_postgresql.jpg

  • PGDay France 2013 (Nantes) conference report - part 1/2

    2013/07/01 by Arthur Lutz

    A few people from Logilab attended PGDay 2013 in Nantes. Here are some of the points that stood out for us.

    http://www.cubicweb.org/file/2932005/raw/hdr_left.png

    Managing the memory resources of a PostgreSQL server, by Cédric Villemain

    Cédric Villemain presented several ways to investigate PostgreSQL's memory management.

    One can use Linux tools such as vmstat, pmap, meminfo and numactl, but also PostgreSQL-specific tools such as pg_stat (hit ratio), pg_buffercache (buffer cache audit) and pgfincore (OS cache audit).
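    As an example of the hit ratio mentioned above, it can be computed from the standard pg_stat_database view; a sketch (low ratios should be interpreted with the OS cache in mind, which is what pgfincore audits):

```sql
-- Share of block reads served from shared_buffers, per database.
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3)
         AS hit_ratio
FROM pg_stat_database;
```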

    Probes should be set on critical tables and indexes so as to get a picture of "normal" behaviour, in order to later detect "abnormal" behaviour. At Logilab we use munin a lot; several PostgreSQL plugins are available for it: munin pg plugins and pymunin.

    To dig deeper, see the presentation slides (1).

    What's new in PostgreSQL 9.3, by Damien Clochard

    Damien Clochard gave a very concise overview of the features of the new PostgreSQL 9.3 release. PostgreSQL's release cycle lasts one year, so the beta period is short and the community needs to get involved to test quickly. Damien took the opportunity to sing PostgreSQL's praises: according to him it is the most dynamic DBMS in the world, with one major release per year, 4-5 minor releases per year, and 5 years of support for each major version.

    Currently this means that five major versions are maintained in parallel (notably for security): 8.4, 9.0, 9.1, 9.2 and 9.3beta1. Note therefore that version 9.3, due out soon, will be supported until 2018.

    http://www.logilab.org/file/150442/raw/elephant.png

    The new features are hard to summarize, but let us mention:

    • performance gains,
    • locks on foreign keys,
    • finer-grained memory management,
    • parallel pg_dump (--jobs),
    • additional replication scenarios,
    • "fast failover" in replicated architectures,
    • easier setup of a clone server (automatic generation of the recovery.conf file),
    • materialized views,
    • additional operators for JSON (quote: "MongoDB with peace of mind (ACID)"),
    • LATERAL queries,
    • extensibility through additional worker processes for maintenance, monitoring or optimization tasks,
    • additional backends for the "Foreign Data Wrappers" (introduced in 9.1),
    • the ability to split the configuration file into several sub-files (handy when driving it with SaltStack, for instance).
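    The configuration-splitting feature mentioned last relies on include directives in postgresql.conf; a sketch (file and directory names are illustrative):

```ini
# postgresql.conf -- delegate parts of the configuration to sub-files,
# e.g. fragments generated by SaltStack (include_dir is new in 9.3).
include_dir = 'conf.d'        # read every *.conf file in conf.d/
include = 'memory.conf'       # or include individual files by name
```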

    Damien also mentioned some of the highlights planned for version 9.4:

    • easier scaling,
    • logical replication (replicating a single table, for example),
    • query parallelization (multi-core),
    • internal storage improvements.

    In short: concise and enticing. We can't wait to run this version in production.

    In the meantime, we took the opportunity to install it from the Debian repositories maintained by the PostgreSQL community.

    To dig deeper, see the presentation slides (2).

    "Ma base de données tiendrait-elle la charge ?" par Philippe Beaudouin

    For this analysis, Philippe Beaudoin combined pgbench (for load injection) with pg_stat_statements, which collects execution statistics on queries: it produces a table with the query text, number of calls, time spent, number of rows, and so on.

    The general idea is to build a transactional load profile of the production system, so that it can be compared with pre-production or with a future platform.

    To avoid having to copy production data (slow, confidentiality problems, etc.), he advised using generate_series to fill the database with test data.
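    A minimal sketch of filling a test table with generate_series (table and columns are hypothetical):

```sql
-- Generate a million synthetic rows instead of copying production data.
CREATE TABLE test_accounts (id int PRIMARY KEY, name text, balance numeric);
INSERT INTO test_accounts
SELECT i, 'user_' || i, (random() * 10000)::numeric(10,2)
FROM generate_series(1, 1000000) AS s(i);
```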

    pgbench uses TPC-B scenarios (Transaction Processing Performance Council benchmarks). Each scenario (4 in his example) has a TPS (transactions per second) target. He recommends being careful not to significantly change the size of the database with the scenarios (e.g. too many DELETEs, or too many INSERTs).

    A trick for warming the Linux cache:

    find <files> -exec dd if='{}' of=/dev/null \;
    

    If you don't know which files to load, you can use SELECT pg_relation_filepath(oid) FROM pg_class WHERE relname LIKE 'tbl%' to find out, in SQL, which files contain the data.
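    The same warming pattern, demonstrated end to end on throwaway files (paths are illustrative; in real use the file list would come from the pg_relation_filepath query above):

```shell
# Create a couple of fake data files, then read them with dd so that the
# Linux page cache keeps them warm. 2>/dev/null silences dd's statistics.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/16384" bs=8192 count=4 2>/dev/null
dd if=/dev/zero of="$demo/16385" bs=8192 count=4 2>/dev/null
find "$demo" -type f -exec dd if='{}' of=/dev/null bs=8192 \; 2>/dev/null
count=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "warmed $count files"
rm -r "$demo"
```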

    We asked whether a GOR-like tool (a UDP stream of production traffic towards pre-production or a development server, for HTTP requests) existed for PostgreSQL.

    http://www.logilab.org/file/150448/raw/gor.png

    Answer: Tsung has a proxy mode that can record production traffic and then replay it in pre-production, but not live. It should be possible to combine several existing tools to achieve this (pgShark?). The problem is not simple, especially once the databases diverge (indexes, sequences, etc.).

    To dig deeper, see the presentation slides (3).

    PostGIS 2.x and beyond, by Hugo Mercier

    We found this talk a success. It introduced the concepts and new features of PostGIS 2.x. Having integrated PostGIS features into CubicWeb, and doing some work on WebGL visualization of data stored in CubicWeb+PostgreSQL, we were able to think about ways of delegating processing to the database.

    http://www.logilab.org/file/150441/raw/Screenshot%20from%202013-07-01%2010%3A30%3A00.png

    We also wondered about scaling web applications that display geographic data, so as to avoid sending too large a data volume (geoJSON or otherwise) to browsers. Architectures are conceivable in which the zoom-level work on geographic data would be delegated to PostgreSQL.

    To dig deeper, see the presentation slides.


  • Notes from the 4th OpenStack meetup in Paris

    2013/06/27 by Paul Tonelli

    This 4th meetup took place in Paris on June 24th, 2013.

    http://www.logilab.org/file/149646/raw/openstack-cloud-software-vertical-small.png

    Feedback on running OpenStack in production, by eNovance

    Monitoring

    Among the tools useful for monitoring OpenStack installations, one can mention OpenNMS (SNMP traps...). However, many tools cause problems, in particular around stopping or destroying virtual machines (unwanted error messages). Don't hesitate to look directly into the SQL database: it is often the only way to get at certain information, and it is also a good way to correctly carry out operations that did not complete properly through the interface. Using SuperNova is recommended when running several Nova instances (multi-environment).

    Problems during migrations

    Quote: "the further along in versions you are, the smoother the procedure".

    The worst migration was from Diablo to Essex (fixed IPs turning dynamic, permission problems). Even when everything seems to go fine, rebooting the hardware nodes and the control node(s) is recommended, to check that everything still works after a restart.

    If you want to take down a node that runs virtual machines, a command exists to stop it from receiving new machines (in case of hardware failure, for example).

    For redundancy, keeping the nodes that host machines below 60% usage leaves room to host machines moved over after the failure or disappearance of one of the nova-compute nodes.

    As for migration, what is well documented for KVM is migration with shared storage, or 'live storage' (the file system holding the machine's disk is available on both the source and the destination). A second mode exists, 'block' migration, which makes it possible to migrate disks that are not on a shared file system. This second mechanism was present, then deprecated, but a new version now seems functional.

    About the terminate function

    The terminate function destroys a virtual machine permanently, with no way back. This is not obvious to someone starting out with OpenStack, who often sees this button as "just" shutting the machine down. To avoid serious trouble (killing a customer machine by mistake, irreversibly), there are 2 solutions:

    For admins, there is a setting in an instance's configuration:

    UPDATE ... SET disable_terminate = '1'
    

    For the others (non-admins), a machine can be locked (nova lock) to avoid the problem. (Both use the same mechanism.)

    Tokens

    Tokens are used to authenticate/authorize a transaction across the various OpenStack versions. It is important that the database storing them be indexed, to improve performance. This database grows very fast, especially with Grizzly, where tokens rely on a PKI system. It is useful to make a regular backup of this database before flushing it (notably to let a client recover an old token).

    Storage space

    It is not best practice, but having just one huge /var for storage works fine, especially when you need to store snapshots of Windows machines or of large machines.

    Ceph

    https://www.logilab.org/file/149647/raw/logo.png

    Special quote: "it's free software, the real thing".

    Ceph is a project providing redundant storage across several machines, with three main operations: get, put, delete. The system relies on the client using a map, supplied by the Ceph servers, to easily locate and retrieve a stored object.

    The virtualized storage driver is fully functional and already runs in production. It allows Ceph to be used as a storage device for virtual machines. A replacement (Rados Gateway Daemon) implements the Swift interface for OpenStack, making it possible to use Ceph as a backend. This driver is rated stable.

    The kernel driver (librbd), which exposes Ceph storage as disks, is still under development, but should work more or less correctly with a kernel >= 3.8 (3.10 recommended).

    Performance

    For writes, expect performance to be halved compared with using the disks directly (the N copies must all be stored before the transaction is validated). For reads, Ceph is faster, thanks to the possible aggregation of storage resources.

    Translating OpenStack

    The community is looking for motivated people to translate the text of the various OpenStack tools.

    Neutron (formerly Quantum)

    Quite a lot of work is in progress: the Havana release should scale better and allow high availability (the service keeps working even when one of the machines hosting it crashes) for everything DHCP / L3.

    On organizing an OpenStack community in France

    Various players are currently looking to join forces, but nothing has been decided yet.

