As usual the first two days were dedicated to tutorials while the last two ones were dedicated to scientific presentations and general python related talks. The meeting was extended by one more day for sprint sessions during which enthusiasts were able to help free software projects, namely sage, vispy and scipy.
Jérôme and I had the great opportunity to represent Logilab during the scientific tracks and the sprint day. We enjoyed many talks about scientific applications using python. We're not going to describe the whole conference. Visit the conference website if you want the complete list of talks. In this article we will try to focus on the ones we found the most interesting.
First of all the keynote by Cameron Neylon about Network ready research was very interesting. He presented some graphs about the impact of a group job on resolving complex problems. They revealed that there is a critical network size for which the effectiveness for solving a problem drastically increase. He pointed that the source code accessibility "friction" limits the "getting help" variable. Open sourcing software could be the best way to reduce this "friction" while unit testing and ongoing integration are facilitators. And, in general, process reproducibility is very important, not only in computing research. Retrieving experimental settings, metadata, and process environment is vital. We agree with this as we are experimenting it everyday in our work. That is why we encourage open source licenses and develop a collaborative platform that provides the distributed simulation traceability and reproducibility platform Simulagora (in french).
Ian Ozsvald's talk dealt with key points and tips from his own experience to grow a business based on open source and python, as well as mistakes to avoid (e.g. not checking beforehand there are paying customers interested by what you want to develop). His talk was comprehensive and mentioned a wide panel of situations.
We got a very nice presentation of a young but interesting visualization tools: Vispy. It is 6 months old and the first public release was early August. It is the result of the merge of 4 separated libraries, oriented toward interactive visualisation (vs. static figure generation for Matplotlib) and using OpenGL on GPUs to avoid CPU overload. A demonstration with large datasets showed vispy displaying millions of points in real time at 40 frames per second. During the talk we got interesting information about OpenGL features like anti-grain compared to Matplotlib Agg using CPU.
We also got to learn about cartopy which is an open source Python library originally written for weather and climate science. It provides useful and simple API to manipulate cartographic mapping.
Distributed computing systems was a hot topic and many talks were related to this theme.
Gael Varoquaux reminded us what are the keys problems with "biggish data" and the key points to successfully process them. I think that some of his recommendations are generally useful like "choose simple solutions", "fail gracefully", "make it easy to debug". For big data processing when I/O limit is the constraint, first try to split the problem into random fractions of the data, then run algorithms and aggregate the results to circumvent this limit. He also presented mini-batch that takes a bunch of observations (trade-off memory usage/vectorization) and joblib.parallel that makes I/O faster using compression (CPUs are faster than disk access).
Benoit Da Mota talked about shared memory in parallel computing and Antonio Messina gave us a quick overview on how to build a computing cluster with Elasticluster, using OpenStack/Slurm/ansible. He demonstrated starting and stopping a cluster on OpenStack: once all VMs are started, ansible configures them as hosts to the cluster and new VMs can be created and added to the cluster on the fly thanks to a command line interface.
We also got a keynote by Peter Wang (from Continuum Analytics) about the future of data analysis with Python. As a PhD in physics I loved his metaphor of giving mass to data. He tried to explain the pain that scientists have when using databases.
After the conference we participated to the numpy/scipy sprint. It was organized by Ralph Gommers and Pauli Virtanen. There were 18 people trying to close issues from different difficulty levels and had a quick tutorial on how easy it is to contribute: the easiest is to fork from the github project page on your own github account (you can create one for free), so that later your patch submission will be a simple "Pull Request" (PR). Clone locally your scipy fork repository, and make a new branch (git checkout -b <newbranch>) to tackle one specific issue. Once your patch is ready, commit it locally, push it on your github repository and from the github interface choose "Push request". You will be able to add something to your commit message before your PR is sent and looked at by the project lead developers. For example using "gh-XXXX" in your commit message will automatically add a link to the issue no. XXXX. Here is the list of open issues for scipy; you can filter them, e.g. displaying only the ones considered easy to fix :D
For more information: Contributing to SciPy.