The EuroSciPy2009 conference was held in Leipzig at the end of July and was sponsored by Logilab and other companies. It started with three talks about speed.
In his keynote, Fransesc Alted talked about starving CPUs. Thirty years back, memory and CPU frequencies where about the same. Memory speed kept up for about ten years with the evolution of CPU speed before falling behind. Nowadays, memory is about a hundred times slower than the cache which is itself about twenty times slower than the CPU. The direct consequence is that CPUs are starving and spend many clock cycles waiting for data to process.
In order to improve the performance of programs, it is now required to know about the multiple layers of computer memory, from disk storage to CPU. The common architecture will soon count six levels: mechanical disk, solid state disk, ram, cache level 3, cache level 2, cache level 1.
Using optimized array operations, taking striding into account, processing data blocks of the right size and using compression to diminish the amount of data that is transfered from one layer to the next are four techniques that go a long way on the road to high performance. Compression algorithms like Blosc increase throughput for they strike the right balance between being fast and providing good compression ratios. Blosc compression will soon be available in PyTables.
Fransesc also mentions the numexpr extension to numpy, and its combination with PyTables named tables.Expr, that nicely and easily accelerates the computation of some expressions involving numpy arrays. In his list of references, Fransesc cites Ulrich Drepper article What every programmer should know about memory.
Maciej Fijalkowski started his talk with a general presentation of the PyPy framework. One uses PyPy to describe an interpreter in RPython, then generate the actual interpreter code and its JIT.
Since PyPy is has become more of a framework to write interpreters than a reimplementation of Python in Python, I suggested to change its misleading name to something like gcgc the Generic Compiler for Generating Compilers. Maciej answered that there are discussions on the mailing list to split the project in two and make the implementation of the Python interpreter distinct from the GcGc framework.
Maciej then focused his talk on his recent effort to rewrite in RPython the part of numpy that exposes the underlying C library to Python. He says the benefits of using PyPy's JIT to speedup that wrapping layer are already visible. He has details on the PyPy blog. Gaël Varoquaux added that David Cournapeau has started working on making the C/Python split in numpy cleaner, which would further ease the job of rewriting it in RPython.
The SciPy2008 conference had at least two papers talking about speeding Python: Converting Python Functions to Dynamically Compiled C and unPython: Converting Python Numerical Programs into C.
David Beazley gave a very interesting talk in 2009 at a Chicago Python Users group meeting about the effects of the GIL on multicore machines.
I will continue my report on the conference with the second part titled "Applications And Open Questions".