Blog entries september 2013 [3]

DebConf13 report

2013/09/25 by Julien Cristau

As announced before, I spent a week last month in Vaumarcus, Switzerland, attending the 14th Debian conference (DebConf13).

It was great to be at DebConf again, with lots of people I hadn't seen since New York City three years ago, and lots of new faces. Kudos to the organizers for pulling this off. These events are always a great boost for motivation, even if the amount of free time after coming back home is not quite as copious as I might like.

One thing that struck me this year was the number of upstream people, not directly involved in Debian, who showed up. From systemd's Lennart and Kay, to MariaDB's Monty, and people from upstart, dracut, phpmyadmin or munin. That was a rather pleasant surprise for me.

Here's a report on the talks and BoF sessions I attended. It's a bit long, but hey, the conference lasted a week. In addition to those I had quite a few chats with various people, including fellow members of the Debian release team.

http://debconf13.debconf.org/images/logo.png

Day 1 (Aug 11)

Linux kernel : Ben Hutchings made a summary of the features added between 3.2 in wheezy and the current 3.10, and their status in Debian (some still need userspace work).

SPI status : Bdale Garbee and Jimmy Kaplowitz explained what steps SPI is making to deal with its growth, including getting help from a bookkeeper recently to relieve the pressure on the (volunteer) treasurer.

Hardware support in Debian stable : If you buy new hardware today, it's almost certainly not supported by the Debian stable release. Ideas to improve this :

  • backport whole subsystems: probably not feasible, risk of regressions would be too high
  • ship compat-drivers, and have the installer automatically install newer drivers based on PCI ids, seems possible.
  • mesa: have the GL loader pick a different driver based on the hardware, and ship newer DRI drivers for the new hardware, without touching the old ones. Issue: need to update libGL and libglapi too when adding new drivers.
  • X drivers, drm: ? (it's complicated)

Meeting between release team and DPL to figure out next steps for jessie. Decided to schedule a BoF later in the week.

Day 2 (Aug 12)

Munin project lead on new features in 2.0 (shipped in wheezy) and roadmap for 2.2. Improvements on the scalability front (both in terms of number of nodes and number of plugins on a node). Future work includes improving the UI to make it less 1990 and moving some metadata to sql.

jeb on AWS and Debian : Amazon Web Services (AWS) includes compute (ec2), storage (s3), network (virtual private cloud, load balancing, ..) and other services. Used by Debian for package rebuilds. http://cloudfront.debian.net is a CDN frontend for archive mirrors. Official Debian images are on ec2, including on the AWS marketplace front page. build-debian-cloud tool from Anders Ingeman et al. was presented.

openstack in Debian : Packaging work is focused on making things easy for newcomers, basic config with debconf. Advanced users are going to use puppet or similar anyway. Essex is in wheezy, but end-of-life upstream. Grizzly available in sid and in a separate archive for wheezy. This work is sponsored by enovance.

Patents : http://patents.stackexchange.com, looks like the USPTO has used comments made there when rejecting patent applications based on prior art. Patent applications are public, and it's a lot easier to get a patent application rejected than invalidate a patent later on. Should we use that site? Help build momentum around it? Would other patent offices use that kind of research? Issues: looking at patent applications (and publicly commenting) might mean you're liable for treble damages if the patent is eventually granted? Can you comment anonymously?

Why systemd? : Lennart and Kay. Pop corn, upstart trolling, nothing really new.

Day 3 (Aug 13)

dracut : dracut presented by Harald Hoyer, its main developer. Seems worth investigating replacing initramfs-tools and sharing the maintenance load. Different hooks though, so we'll need to coordinate this with various packages.

upstart : More Debian-focused than the systemd talk. Not helped by Canonical's CLA...

dh_busfactor : debhelper is essentially a one-man show from the beginning. Though various packages/people maintain different dh_* tools either in the debhelper package itself or elsewhere. Joey is thinking about creating a debhelper team including those people. Concerns over increased breakage while people get up to speed (joeyh has 10 years of experience and still occasionally breaks stuff).

dri3000 : Keith is trying to fix dri2 issues. While dri2 fixed a number of things that were wrong with dri1, it still has some problems. One of the goals is to improve presentation: we need a way to sync between app and compositor (to avoid displaying incompletely drawn frames), avoid tearing, and let the app choose immediate page flip instead of waiting for next vblank if it missed its target (stutter in games is painful). He described this work on his blog.

security team BoF : explain the workflow, try to improve documentation of the process and what people can do to help. http://security.debian.org/

Day 4 (Aug 14)

day trip, and conference dinner on a boat from Neuchatel to Vaumarcus

Day 5 (Aug 15)

git-dpm : Spent half an hour explaining git, then was rushed to show git-dpm itself. Still, needs looking at. Lets you work with git and export changes as quilt series to build a source package.

Ubuntu daily QA : The goal was to make it possible for canonical devs (not necessarily people working on the distro) to use ubuntu+1 (dev release). They tried syncing from testing for a while, but noticed bug fixes being delayed: not good. In the previous workflow the dev release was unusable/uninstallable for the first few months. Multiarch made things even more problematic because it requires amd64/i386 being in sync.

  • 12.04: a bunch of manpower thrown at ubuntu+1 to keep backlog of technical debt under control.
  • 12.10: prepare infrastructure (mostly launchpad), add APIs, to make non-canonical people able to do stuff that previously required shell access on central machines.
  • 13.04: proposed migration. britney is used to migrate packages from devel-proposed to devel. A few teething problems at first, but good reaction.
  • 13.10 and beyond: autopkgtest runs triggered after upload/build, also for rdeps. Phased updates for stable releases (rolled out to a subset of users and then gradually generalized). Hook into errors.ubuntu.com to match new crashes with package uploads. Generally more continuous integration. Better dashboard. (Some of that is still to be done.)

Lessons learned from debian:

  • unstable's backlog can get bad → proposed is only used for builds and automated tests, no delay
  • transitions can take weeks at best
  • to avoid dividing human attention, devs are focused on devel, not devel-proposed

Lessons debian could learn:

  • keeping testing current is a collective duty/win
  • splitting users between testing and unstable has important costs
  • hooking automated testing into britney really powerful; there's a small but growing number of automated tests

Ideas:

  • cut migration delay in half
  • encourage writing autopkgtests
  • end goal: make sid to testing migration entirely based on automated tests

Debian tests using Jenkins http://jenkins.debian.net

  • https://github.com/h01ger/jenkins-job-builder
  • Only running amd64 right now.
  • Uses jenkins plugins: git, svn, log parser, html publisher, ...
  • Has existing jobs for installer, chroot installs, others
  • Tries to make it easy to reproduce jobs, to allow debugging
  • {c,sh}ould add autopkgtests

Day 6 (Aug 16)

X Strike Force BoF : Too many bugs we can't do anything about: {mass,auto}-close them, asking people to report upstream. Reduce distraction by moving the non-X stuff to separate teams (compiz removed instead, wayland to discuss...). We should keep drivers as close to upstream as possible. A couple of people in the room volunteered to handle the intel, ati and input drivers.

reclass BoF

I had missed the talk about reclass, and Martin kindly offered to give a followup BoF to show what reclass can do.

Reclass provides adaptors for puppet(?), salt, ansible. A yaml file describes each host:

  • can declare applications and parameters
  • host is leaf in a dag/tree of classes

Lets you put the data in reclass instead of the config management tool, keeping generic templates in ansible/salt.

I'm definitely going to try this and see if it makes it easier to organize data we're currently putting directly in salt states.

release BoF : Notes are on http://gobby.debian.org. Basic summary: "Releasing in general is hard. Releasing something as big/diverse/distributed as Debian is even harder." Who knew?

freedombox : status update from Bdale

Keith Packard showed off the free software he uses in his and Bdale's rockets adventures.

This was followed by a birthday party in the evening, as Debian turned 20 years old.

Day 7 (Aug 17)

x2go : Notes are on http://gobby.debian.org. To be solved: issues with nx libs (gpl fork of old x). Seems like a good thing to try as alternative to LTSP which we use at Logilab.

lightning talks

  • coquelicot (lunar) - one-click secure(ish) file upload web app
  • notmuch (bremner) - need to try that again now that I have slightly more disk space
  • fedmsg (laarmen) - GSoC, message passing inside the debian infrastructure

Debconf15 bids :

  • Mechelen/Belgium - Wouter
  • Germany (no city yet) - Marga

Debconf14 presentation : Will be in Portland (Portland State University) next August. Presentation by vorlon, harmoney, keithp. Looking forward to it!

  • Closing ceremony

The videos of most of the talks can be downloaded, thanks to the awesome work of the video team. And if you want to check what I didn't see or talk about, check the complete schedule.


JDEV2013 - Software development conference of CNRS

2013/09/14 by Nicolas Chauvat

I had the pleasure to be invited to lead a tutorial at JDEV2013 titled Learning TDD and Python in Dojo mode.

http://www.logilab.org/file/177427/raw/logo_JDEV2013.png

I quickly introduced the keywords with a single slide to keep it simple:

http://Python.org
+ Test Driven Development (Test, Code, Refactor)
+ Dojo (house of training: Kata / Randori)
= Calculators
  - Reverse Polish Notation
  - Formulas with Roman Numbers
  - Formulas with Numbers in letters

As you can see, I had three types of calculators, hence at least three Kata to practice, but as usual with beginners, it took us the whole tutorial to get done with the first one.

The room was a class room that we set up as our coding dojo with the coder and his copilot working on a laptop, facing the rest of the participants, with the large screen at their back. The pair-programmers could freely discuss with the people facing them, who were following the typing on the large screen.

We switched every ten minutes: the copilot became coder, the coder went back to his seat in the class and someone else stood up to became the copilot.

The session was allocated 3 hours split over two slots of 1h30. It took me less than 10 minutes to open the session with the above slide, 10 minutes as first coder and 10 minutes to close it. Over a time span of 3 hours, that left 150 minutes for coding, hence 15 people. Luckily, the whole group was about that size and almost everyone got a chance to type.

I completely skipped explaining Python, its syntax and the unittest framework and we jumped right into writing our first tests with if and print statements. Since they knew about other programming languages, they picked up the Python langage on the way.

After more than an hour of slowly discovering Python and TDD, someone in the room realized they had been focusing more on handling exception cases and failures than implementing the parsing and computation of the formulas because the specifications where not clearly understood. He then asked me the right question by trying to define Reverse Polish Notation in one sentence and checking that he got it right.

Different algorithms to parse and compute RPN formulas where devised at the blackboard over the pause while part of the group went for a coffee break.

The implementation took about another hour to get right, with me making sure they would not wander too far from the actual goal. Once the stack-based solution was found and implemented, I asked them to delete the files, switch coder and start again. They had forgotten about the Kata definition and were surprised, but quickly enjoyed it when they realized that progress was much faster on the second attempt.

Since it is always better to show that you can walk the talk, I closed the session by praticing the RPN calculator kata myself in a bit less than 10 minutes. The order in which to write the tests is the tricky part, because it can easily appear far-fetched for such a small problem when you already know an algorithm that solves it.

Here it is:

import operator

OPERATORS = {'+': operator.add,
             '*': operator.mul,
             '/': operator.div,
             '-': operator.sub,
             }

def compute(args):
    items = args.split()
    stack = []
    for item in items:
        if item in OPERATORS:
            b,a = stack.pop(), stack.pop()
            stack.append(OPERATORS[item](a,b))
        else:
            stack.append(int(item))
    return stack[0]

with the accompanying tests:

import unittest
from npi import compute

class TestTC(unittest.TestCase):

    def test_unit(self):
        self.assertEqual(compute('1'), 1)

    def test_dual(self):
        self.assertEqual(compute('1 2 +'), 3)

    def test_tri(self):
        self.assertEqual(compute('1 2 3 + +'), 6)
        self.assertEqual(compute('1 2 + 3 +'), 6)

    def test_precedence(self):
        self.assertEqual(compute('1 2 + 3 *'), 9)
        self.assertEqual(compute('1 2 * 3 +'), 5)

    def test_zerodiv(self):
        self.assertRaises(ZeroDivisionError, compute, '10 0 /')

unittest.main()

Apparently, it did not go too bad, for I had positive comments at the end from people that enjoyed discovering in a single session Python, Test Driven Development and the Dojo mode of learning.

I had fun doing this tutorial and thank the organizators for this conference!


Going to EuroScipy2013

2013/09/04 by Alain Leufroy

The EuroScipy2013 conference was held in Bruxelles at the Université libre de Bruxelles.

http://www.logilab.org/file/175984/raw/logo-807286783.png

As usual the first two days were dedicated to tutorials while the last two ones were dedicated to scientific presentations and general python related talks. The meeting was extended by one more day for sprint sessions during which enthusiasts were able to help free software projects, namely sage, vispy and scipy.

Jérôme and I had the great opportunity to represent Logilab during the scientific tracks and the sprint day. We enjoyed many talks about scientific applications using python. We're not going to describe the whole conference. Visit the conference website if you want the complete list of talks. In this article we will try to focus on the ones we found the most interesting.

First of all the keynote by Cameron Neylon about Network ready research was very interesting. He presented some graphs about the impact of a group job on resolving complex problems. They revealed that there is a critical network size for which the effectiveness for solving a problem drastically increase. He pointed that the source code accessibility "friction" limits the "getting help" variable. Open sourcing software could be the best way to reduce this "friction" while unit testing and ongoing integration are facilitators. And, in general, process reproducibility is very important, not only in computing research. Retrieving experimental settings, metadata, and process environment is vital. We agree with this as we are experimenting it everyday in our work. That is why we encourage open source licenses and develop a collaborative platform that provides the distributed simulation traceability and reproducibility platform Simulagora (in french).

Ian Ozsvald's talk dealt with key points and tips from his own experience to grow a business based on open source and python, as well as mistakes to avoid (e.g. not checking beforehand there are paying customers interested by what you want to develop). His talk was comprehensive and mentioned a wide panel of situations.

http://vispy.org/_static/img/logo.png

We got a very nice presentation of a young but interesting visualization tools: Vispy. It is 6 months old and the first public release was early August. It is the result of the merge of 4 separated libraries, oriented toward interactive visualisation (vs. static figure generation for Matplotlib) and using OpenGL on GPUs to avoid CPU overload. A demonstration with large datasets showed vispy displaying millions of points in real time at 40 frames per second. During the talk we got interesting information about OpenGL features like anti-grain compared to Matplotlib Agg using CPU.

We also got to learn about cartopy which is an open source Python library originally written for weather and climate science. It provides useful and simple API to manipulate cartographic mapping.

Distributed computing systems was a hot topic and many talks were related to this theme.

https://www.openstack.org/themes/openstack/images/openstack-logo-preview-full-color.png

Gael Varoquaux reminded us what are the keys problems with "biggish data" and the key points to successfully process them. I think that some of his recommendations are generally useful like "choose simple solutions", "fail gracefully", "make it easy to debug". For big data processing when I/O limit is the constraint, first try to split the problem into random fractions of the data, then run algorithms and aggregate the results to circumvent this limit. He also presented mini-batch that takes a bunch of observations (trade-off memory usage/vectorization) and joblib.parallel that makes I/O faster using compression (CPUs are faster than disk access).

Benoit Da Mota talked about shared memory in parallel computing and Antonio Messina gave us a quick overview on how to build a computing cluster with Elasticluster, using OpenStack/Slurm/ansible. He demonstrated starting and stopping a cluster on OpenStack: once all VMs are started, ansible configures them as hosts to the cluster and new VMs can be created and added to the cluster on the fly thanks to a command line interface.

We also got a keynote by Peter Wang (from Continuum Analytics) about the future of data analysis with Python. As a PhD in physics I loved his metaphor of giving mass to data. He tried to explain the pain that scientists have when using databases.

https://scikits.appspot.com/static/images/scipyshiny_small.png

After the conference we participated to the numpy/scipy sprint. It was organized by Ralph Gommers and Pauli Virtanen. There were 18 people trying to close issues from different difficulty levels and had a quick tutorial on how easy it is to contribute: the easiest is to fork from the github project page on your own github account (you can create one for free), so that later your patch submission will be a simple "Pull Request" (PR). Clone locally your scipy fork repository, and make a new branch (git checkout -b <newbranch>) to tackle one specific issue. Once your patch is ready, commit it locally, push it on your github repository and from the github interface choose "Push request". You will be able to add something to your commit message before your PR is sent and looked at by the project lead developers. For example using "gh-XXXX" in your commit message will automatically add a link to the issue no. XXXX. Here is the list of open issues for scipy; you can filter them, e.g. displaying only the ones considered easy to fix :D

For more information: Contributing to SciPy.