I spent some time this week evaluating Pupynere, the PUre PYthon NEtcdf REader written by Roberto De Almeida. I see several advantages in pupynere.
First, it is a pure Python module with no external dependencies. It does not even depend on the NetCDF library, so it is very easy to deploy.
Second, it offers the same interface as Scientific Python's NetCDF bindings, which makes switching from one module to the other very easy.
Third, pupynere is being integrated into SciPy as the scipy.io.netcdf module. Once integrated, it could gain wide adoption in the Python community.
Finally, its small and clear code base of about 600 lines is easy to dig into. I have already sent several fixes and bug reports to the author.
However, pupynere is not mature yet. It seems to have been used only for simple cases so far, and many common cases are broken. There is no support for newer NetCDF formats such as long-NetCDF and NetCDF4, and important features such as updating an existing file are still missing. In addition, the lack of a test suite is a serious issue: in my opinion, several of these bugs could already have been detected and fixed with simple unit tests. Contributors would be much more
comfortable with the safety net of a test suite; as it is, I am not certain
that the fixes and improvements I made this week did not introduce regressions.
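As an illustration of the kind of simple unit test that would help, here is a minimal round-trip sketch. It assumes a current SciPy and numpy, where SciPy's NetCDF reader/writer (the scipy.io.netcdf module mentioned above) is exposed as `scipy.io.netcdf_file` with the Scientific Python style `createDimension`/`createVariable`/`variables` interface. The helper `write_and_read` and the `RoundTripTest` class are hypothetical names, not part of pupynere or SciPy. The test writes one integer variable with an attribute, reads the file back and checks both; run it with `python -m unittest`.

```python
import os
import tempfile
import unittest

import numpy as np
from scipy.io import netcdf_file  # assumption: current SciPy, not pupynere itself


def write_and_read(path, values, units):
    """Hypothetical helper: write a 1-D int variable, then read it back."""
    f = netcdf_file(path, "w")
    f.createDimension("time", len(values))
    var = f.createVariable("time", "i", ("time",))
    var[:] = values
    var.units = units
    f.close()

    # mmap=False loads the data into memory, so the file can be closed safely
    f = netcdf_file(path, "r", mmap=False)
    var = f.variables["time"]
    data = var[:].copy()
    read_units = var.units
    f.close()
    return data, read_units


class RoundTripTest(unittest.TestCase):
    def test_integer_variable(self):
        values = np.arange(10, dtype="i4")
        with tempfile.TemporaryDirectory() as tmp:
            data, units = write_and_read(os.path.join(tmp, "simple.nc"), values, "days")
        np.testing.assert_array_equal(data, values)
        # Text attributes come back as bytes on Python 3
        self.assertEqual(units, b"days")
```

Each bug fix can then come with one more small test method of this kind, which is exactly the safety net contributors need.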
To conclude, pupynere seems too young for production use, but I encourage people
to try it and send feedback and fixes to the author. I look forward to using this project in production in the future.
The EuroSciPy 2009 conference was held in Leipzig at the end of July and was
sponsored by Logilab and other companies. It started with three talks about speed.
In his keynote, Francesc Alted talked about starving CPUs. Thirty years ago,
memory and CPU frequencies were about the same. Memory speed kept up with CPU
speed for about ten years before falling behind. Nowadays, memory is about a
hundred times slower than the cache, which is itself about twenty times slower
than the CPU. As a direct consequence, CPUs are starving: they spend many clock
cycles waiting for data to process.
To improve the performance of programs, one now needs to know about the
multiple layers of computer memory, from disk storage to the CPU. The common
architecture will soon count six levels: mechanical disk, solid-state disk,
RAM, level 3 cache, level 2 cache and level 1 cache.
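Knowing these layers also means knowing how arrays sit in memory. A small numpy sketch (assuming numpy is installed) of what "striding" means: strides are the number of bytes to skip to reach the next element along each axis, so walking along the last axis of a C-ordered array reads consecutive bytes, while walking along the first axis jumps a whole row at each step.

```python
import numpy as np

# A 1000x1000 float64 array stored in C (row-major) order
a = np.zeros((1000, 1000))
print(a.strides)    # (8000, 8): next row is 8000 bytes away, next column 8 bytes
print(a.T.strides)  # (8, 8000): transposing only swaps strides, no data is copied

# The transposed view is no longer C-contiguous: scanning its rows means
# jumping 8000 bytes per element, which defeats the cache.
print(a.flags["C_CONTIGUOUS"], a.T.flags["C_CONTIGUOUS"])  # True False

# When an algorithm must scan that layout many times, one copy into the
# matching order can be cheaper than many strided scans.
at = np.ascontiguousarray(a.T)
print(at.flags["C_CONTIGUOUS"], at.strides)  # True (8000, 8)
```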
Using optimized array operations, taking striding into account, processing data
in blocks of the right size and using compression to reduce the amount of data
transferred from one layer to the next are four techniques that go a long
way toward high performance. Compression algorithms like Blosc increase
throughput because they strike the right balance between speed and compression
ratio. Blosc compression will soon be available in PyTables.
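Blosc itself is a C library; the sketch below only imitates two of its ideas with the standard library, under stated assumptions. Data is compressed in blocks small enough to stay in cache (the 32 KiB default is an assumption, roughly a typical L1 data cache), and each block is byte-shuffled first, so that bytes of equal significance sit next to each other, which tends to help compressors on numeric data. zlib at level 1 stands in for Blosc's fast codecs; the pure-Python shuffle is slow and only shows the data layout, not the speed. All function names are hypothetical.

```python
import struct
import zlib


def shuffle(raw, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(data, itemsize):
    """Inverse of shuffle()."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


def compress_blocks(raw, itemsize, block_bytes=32 * 1024, level=1):
    """Shuffle and zlib-compress raw data one cache-sized block at a time."""
    if block_bytes % itemsize or len(raw) % itemsize:
        raise ValueError("sizes must be multiples of itemsize")
    return [zlib.compress(shuffle(raw[start:start + block_bytes], itemsize), level)
            for start in range(0, len(raw), block_bytes)]


def decompress_blocks(blocks, itemsize):
    """Decompress and unshuffle each block, then join them back together."""
    return b"".join(unshuffle(zlib.decompress(b), itemsize) for b in blocks)


# Usage: 10,000 consecutive float64 values (80,000 bytes)
values = [float(i) for i in range(10_000)]
raw = struct.pack(f"<{len(values)}d", *values)
blocks = compress_blocks(raw, itemsize=8)
print(len(raw), sum(len(b) for b in blocks))
assert decompress_blocks(blocks, itemsize=8) == raw
```

Fewer bytes crossing each memory layer is the whole point: the CPU spends some cycles decompressing, which it would otherwise have spent waiting.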
Francesc also mentioned the numexpr extension to numpy, and its combination with
PyTables named tables.Expr, which nicely and easily accelerates the computation
of some expressions involving numpy arrays. In his list of references, Francesc
cites Ulrich Drepper's article "What Every Programmer Should Know About Memory".
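numexpr gets much of its speed by evaluating an expression in chunks that fit in cache, instead of materializing a full-size temporary array for each operation as plain numpy does. With numexpr installed, `numexpr.evaluate("2*a + 3*b**2")` does this for you. The hand-written numpy sketch below only illustrates the blocking idea for that one expression; the function name `blocked_eval` and the 4096-element block size are assumptions, and this is not how numexpr is implemented internally.

```python
import numpy as np


def blocked_eval(a, b, block=4096):
    """Compute 2*a + 3*b**2 for 1-D float arrays, one block at a time.

    Only one block-sized temporary is allocated, so intermediate
    results can stay in cache instead of round-tripping through RAM.
    """
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("expected two 1-D arrays of the same shape")
    out = np.empty_like(a)
    tmp = np.empty(block, dtype=a.dtype)
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        t = tmp[:stop - start]
        np.multiply(b[start:stop], b[start:stop], out=t)    # t = b**2
        t *= 3                                              # t = 3*b**2
        np.multiply(a[start:stop], 2, out=out[start:stop])  # out = 2*a
        out[start:stop] += t                                # out = 2*a + 3*b**2
    return out


a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
assert np.allclose(blocked_eval(a, b), 2 * a + 3 * b**2)
```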
Maciej Fijalkowski started his talk with a general presentation of the PyPy
framework. With PyPy, one describes an interpreter in RPython, then generates
the actual interpreter code and its JIT.
Since PyPy has become more of a framework for writing interpreters than a
reimplementation of Python in Python, I suggested changing its misleading name to
something like GcGc, the Generic Compiler for Generating Compilers. Maciej
answered that there are discussions on the mailing list about splitting the
project in two and making the implementation of the Python interpreter distinct
from the GcGc framework.
Maciej then focused his talk on his recent effort to rewrite in RPython the part
of numpy that exposes the underlying C library to Python. He says the benefits
of using PyPy's JIT to speed up that wrapping layer are already visible; details
are on the PyPy blog. Gaël Varoquaux added that David Cournapeau has
started working on making the C/Python split in numpy cleaner, which would
further ease the job of rewriting it in RPython.
Damien Diederen talked about his work on CrossTwine Linker and compared it
with the many projects that are actively attacking the problem of speed that
dynamic and interpreted languages have been dragging along for years. Parrot
tries to be the über virtual machine. Psyco offers very nice acceleration, but
currently only on 32-bit systems. PyPy might be what he calls the Right
Approach, but still needs a lot of work. Jython and IronPython modify the
language a bit but benefit from the qualities of the JVM or the CLR. Unladen
Swallow is probably the one that's most similar to CrossTwine.
CrossTwine treats CPython as a library and uses a set of C++ classes to
generate efficient interpreters that call into CPython's
internals. CrossTwine helps improve performance by
hand-replacing some code paths with very efficient code that performs the same
operations but bypasses the interpreter and its overhead. An interpreter built
with CrossTwine can be viewed as a JIT'ed branch of the official Python
interpreter that should be feature-compatible (and bug-compatible) with CPython.
Damien calls his approach "punching holes in C substrate to get more speed" and
says it could probably be combined with Psyco for even better results.
CrossTwine works on 64-bit systems, but it is not (yet?) free software. It
focuses on specific use cases to greatly improve speed and is not meant to be a
general purpose interpreter able to make any Python code faster.
We have just released a new project on logilab.org: lutin77, a test framework for Fortran 77.
The goal of this framework is to write unit tests in Fortran 77 with few dependencies: a POSIX environment with C and Fortran 77 compilers. Of course, you can use it for integration or acceptance tests too. Version 0.1 has just been released here: http://www.logilab.org/project/lutin77
If you are new to unit testing, I must admit the project still lacks examples. For an introduction to the techniques involved, you can have a look at Growing Object-Oriented Software, Guided by Tests, even though mocked subroutines will come later. But remember that if you do not like writing tests, you are probably not writing unit tests.
My latest personal project, pygpibtoolkit, includes a simple HPGL plotter that emulates the HP7470A GPIB plotter, using the very nice and cheap Prologix USB-GPIB dongle.
This tool is (for now) called qgpibplotter, since it uses the Qt4 toolkit.
Tonight, I (at last) took the time to make it work nicely. Well, nicely with the only device I own that can plot on the GPIB bus, my HP3562A DSA.
Now, you just have to press the "Plot" button of your test equipment, and bingo! you can see the plot on your computer.
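The pygpibtoolkit code is not shown here, so the following is only a hypothetical sketch of the core of an HPGL plotter emulator: turning the HPGL commands received from the instrument into line segments that a GUI such as qgpibplotter could draw. It handles just IN (initialize), PU/PD (pen up/down, with optional coordinate pairs) and PA/PR (plot absolute/relative), and skips everything else, including SP and text labels (LB), which this simple regex would not parse correctly. The function names are made up for illustration, and the GPIB transport is left out entirely.

```python
import re

# Two-letter mnemonic followed by its (possibly empty) parameter list
_COMMAND = re.compile(r"([A-Z]{2})([^;A-Z]*)")


def parse_hpgl(text):
    """Yield (mnemonic, [numbers]) for each command in an HPGL string."""
    for mnemonic, args in _COMMAND.findall(text.upper()):
        numbers = [float(tok) for tok in re.split(r"[,\s]+", args.strip()) if tok]
        yield mnemonic, numbers


def hpgl_to_segments(text):
    """Return the ((x0, y0), (x1, y1)) segments drawn while the pen is down."""
    x = y = 0.0
    pen_down, absolute = False, True
    segments = []
    for mnemonic, args in parse_hpgl(text):
        if mnemonic == "IN":          # initialize: pen up, origin, absolute mode
            x = y = 0.0
            pen_down, absolute = False, True
        elif mnemonic in ("PU", "PD"):
            pen_down = mnemonic == "PD"
        elif mnemonic in ("PA", "PR"):
            absolute = mnemonic == "PA"
        else:
            continue                  # SP, LB, ... are ignored in this sketch
        # Coordinate pairs move the pen, drawing a segment if it is down
        for nx, ny in zip(args[0::2], args[1::2]):
            if not absolute:
                nx, ny = x + nx, y + ny
            if pen_down:
                segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
    return segments


print(hpgl_to_segments("IN;SP1;PU0,0;PD100,0,100,100;PU;"))
# [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 100.0))]
```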
We have been using many different tools for statistical analysis with Python, including R, SciPy, specific C++ code, etc. It looks like SciPy's growing community is now moving toward dedicated add-on modules for SciPy (let's call them SciKits). See this thread on the SciPy-user mailing list.
The presentation of Python as a tool for applied mathematics was highlighted at the 2008 annual meeting of the Society for Industrial and Applied Mathematics (SIAM). For more information, read this blog post and the slides.
Welcome to SciLab version 5.0 in the world of free software. SciLab 5.0, an open source scientific computing platform released under the CeCILL license, is a credible alternative to Matlab, and is now recognized as such. To ensure the long-term development of Scilab, the Scilab consortium is joining DIGITEO, a world-class research park in the field of information science and
technology in Île-de-France.
How can I test whether a Python float is "not a number" without depending on numpy? Simple: a NaN value is different from any other value, including itself:
def isnan(x):
    # NaN is the only float that compares unequal to itself
    return isinstance(x, float) and x != x
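The same self-inequality trick also works inline, for example to drop NaNs from a list. Since Python 2.6 the standard library also provides `math.isnan`, which gives the same answer on ordinary floats, as this short check shows:

```python
import math

values = [1.5, float("nan"), 3.0, float("inf"), float("nan")]

# NaN is the only float that is not equal to itself, so v == v filters it out
kept = [v for v in values if v == v]
print(kept)  # [1.5, 3.0, inf]

# math.isnan (standard library, Python 2.6+) agrees
assert kept == [v for v in values if not math.isnan(v)]
```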