Blog entries by Alain Leufroy [4]

Profiling tools

2012/09/07 by Alain Leufroy

Python

Run time profiling with cProfile

Python is distributed with profiling modules. They describe the run time operation of a pure python program, providing a variety of statistics.

cProfile is the recommended module. To run your program under the control of cProfile, a simple form is:

$ python -m cProfile -s cumulative mypythonscript.py

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      16    0.055    0.003   15.801    0.988 __init__.py:1(<module>)
       1    0.000    0.000   11.113   11.113 __init__.py:35(extract)
     135    7.351    0.054   11.078    0.082 __init__.py:25(iter_extract)
10350736    3.628    0.000    3.628    0.000 {method 'startswith' of 'str' objects}
       1    0.000    0.000    2.422    2.422 pyplot.py:123(show)
       1    0.000    0.000    2.422    2.422 backend_bases.py:69(__call__)
       ...

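Each column provides timing information about the function calls: ncalls is the number of calls, tottime the time spent in the function itself and cumtime the time including sub-calls. The -s cumulative option orders the result by descending cumulative time.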

Note:

You can also profile a particular Python function, such as main(), from the interpreter:

>>> import profile
>>> profile.run('main()')
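
As a rough sketch of the same idea with cProfile itself, the snippet below profiles a single function and prints the five most expensive entries ordered by cumulative time. It is written for Python 3, and main() is only a placeholder workload, not something taken from a real program.

```python
import cProfile
import io
import pstats


def main():
    # Placeholder workload: replace with the function you want to profile.
    total = 0
    for i in range(10000):
        total += len(str(i))
    return total


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string instead of writing straight to stdout.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by "cumulative" matches the -s cumulative option of the command line form.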

Graphical tools to show profiling results

Although cProfile includes reporting tools, graphical tools can make the results easier to explore. Most of them work with a stat file that cProfile generates with the -o filepath option.
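
The stat file can also be written and read back from Python. The following is a minimal sketch for Python 3, assuming a placeholder function work() and a temporary file path. It uses only the standard cProfile and pstats modules.

```python
import cProfile
import os
import pstats
import tempfile


def work():
    # Placeholder workload standing in for your own code.
    return sorted(str(n) for n in range(5000))


# Hypothetical output location; any writable path works.
stat_path = os.path.join(tempfile.mkdtemp(), "output.pstats")

profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats(stat_path)   # same format as the -o option

# Load the file again, as a graphical tool would do.
stats = pstats.Stats(stat_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(3)
```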

Below are some of the available graphical tools that we tested.

Gprof2Dot

is a Python-based tool that transforms profiling output into a picture of the call graph (using graphviz). A typical profiling session with Python looks like this:

$ python -m cProfile -o output.pstats mypythonscript.py
$ gprof2dot.py -f pstats output.pstats | dot -Tpng -o profiling_results.png
http://wiki.jrfonseca.googlecode.com/git/gprof2dot.png

Each node of the output graph represents a function and has the following layout:

+----------------------------------+
|   function name : module name    |
| total time including sub-calls % |  total time including sub-calls %
|    (self execution time %)       |------------------------------------>
|  total number of self calls      |
+----------------------------------+

Nodes and edges are colored according to the "total time" spent in the functions.

Note: The following small patch makes the node color correspond to the self execution time and the edge color to the "total time":
diff -r da2b31597c5f gprof2dot.py
--- a/gprof2dot.py      Fri Aug 31 16:38:37 2012 +0200
+++ b/gprof2dot.py      Fri Aug 31 16:40:56 2012 +0200
@@ -2628,6 +2628,7 @@
                 weight = function.weight
             else:
                 weight = 0.0
+            weight = function[TIME_RATIO]

             label = '\n'.join(labels)
             self.node(function.id,

PyProf2CallTree

is a script that helps visualize profiling data with the KCacheGrind graphical call-tree analyzer. It is a more interactive solution than Gprof2Dot, but it requires KCacheGrind to be installed. Typical usage:

$ python -m cProfile -o stat.prof mypythonscript.py
$ python pyprof2calltree.py -i stat.prof -k

The pyprof2calltree script converts the profiling data file, and its -k switch automatically opens the result in KCacheGrind.

http://kcachegrind.sourceforge.net/html/pics/KcgShot3Large.gif

There are other tools that are worth testing:

  • RunSnakeRun is an interactive GUI tool which visualizes a profile file using square maps:

    $ python -m cProfile -o stat.prof mypythonscript.py
    $ runsnake stat.prof
    
  • pycallgraph generates PNG images of a call tree with the total number of calls:

    $ pycallgraph mypythonscript.py
    
  • lsprofcalltree also uses KCacheGrind to display profiling data:

    $ python lsprofcalltree.py -o output.log yourprogram.py
    $ kcachegrind output.log
    

C/C++ extension profiling

For optimization purposes, one may have Python extensions written in C/C++. For such modules, cProfile will not dig into the corresponding call tree. Dedicated tools, most of which are not part of Python itself, must be used to profile a C/C++ extension from Python.

Yep

is a Python module dedicated to the profiling of compiled Python extensions. It uses the Google CPU profiler:

$ python -m yep --callgrind mypythonscript.py

Memory Profiler

You may want to monitor the amount of memory used by a Python program. The memory_profiler module fits this need.

You can fetch the memory consumption of a program over time by passing the function, its arguments and its keyword arguments as a single tuple:

>>> from memory_profiler import memory_usage
>>> memory_usage((main, (), {}))

memory_profiler can also spot the lines that consume the most memory, using pdb or IPython.
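
If memory_profiler is not available, the tracemalloc module from the Python 3 standard library gives a comparable first look. This is a different tool, shown only as a sketch: it traces Python-level allocations rather than the memory of the whole process, and build() is a placeholder workload.

```python
import tracemalloc


def build():
    # Placeholder workload that allocates a noticeable amount of memory.
    return [list(range(1000)) for _ in range(100)]


tracemalloc.start()
data = build()
snapshot = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("current: %d bytes, peak: %d bytes" % (current, peak))

# Source lines responsible for the largest allocations.
top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)
```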

General purpose Profiling

The Linux perf tool gives access to a wide variety of performance counter subsystems. Using perf, any execution configuration (pure python programs, compiled extensions, subprocess, etc.) may be profiled.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots.

You can get information about execution times with:

$ perf stat -e cpu-cycles,cpu-clock,task-clock python mypythonscript.py

You can get information about RAM accesses using:

$ perf stat -e cache-misses python mypythonscript.py

Be careful: perf reports the raw values of the hardware counters, so you need to know exactly what you are looking for and how to interpret these values in the context of your program.

Note that you can use Gprof2Dot to get a more user-friendly output:

$ perf record -g python mypythonscript.py
$ perf script | gprof2dot.py -f perf | dot -Tpng -o output.png

Text mode makes it into hgview 1.4.0

2011/10/06 by Alain Leufroy

Here is, at last, the 1.4.0 release of hgview.

http://www.logilab.org/image/77974?vid=download

Small description

Besides the classic bugfixes, this release introduces a new text-based user interface thanks to the urwid library.

Running hgview in a shell, in a terminal, or over an ssh session is now possible! If you are trying not to use X (or to use it less), or have a geek mouse-killer window manager such as wmii/dwm/ion/awesome/..., this is for you!

This TUI (Text User Interface!) adopts the principal features of the Qt4-based GUI, although only the main view has been implemented for now.

In a nutshell, this interface includes the following features:

  • display the revision graph (with working directory as a node, and basic support for the mq extension),
  • display the files affected by a selected changeset (with basic support for the bfiles extension),
  • display diffs (with syntax highlighting thanks to pygments),
  • automatically refresh the displayed revision graph when the repository is being modified (requires pyinotify),
  • easy key-based navigation in revisions' history of a repo (same as the GUI),
  • a command system for special actions (see help).

Installation

There are packages for Debian and Ubuntu in Logilab's Debian repository.

Note: you have to install the hgview-curses package to get the text-based interface.

Or you can simply clone our Mercurial repository:

hg clone http://hg.logilab.org/hgview

(more on the hgview home page)

Running the text based interface

A new --interface option is now available to choose the interface:

hgview --interface curses

Or you can set it in the [hgview] section of your ~/.hgrc:

[hgview]
# possible values: curses, qt or raw
interface = curses

Then run:

hgview

What's next

We'll be working on including other features from the Qt4 interface and making it fully configurable.

We'll also work on bugfixes and new features, so stay tuned! And feel free to file bugs and feature requests.


EuroSciPy'11 - Annual European Conference for Scientists using Python.

2011/08/24 by Alain Leufroy
http://www.logilab.org/image/9852?vid=download

The EuroScipy2011 conference will be held in Paris at the Ecole Normale Supérieure from August 25th to 28th and is co-organized and sponsored by INRIA, Logilab and other companies.

The conference is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research.

August 25th and 26th are dedicated to tutorial tracks -- basic and advanced tutorials. August 27th and 28th are dedicated to talks, poster and demo sessions.

Damien Garaud, Vincent Michel and Alain Leufroy (and others) from Logilab will be there. We will talk about an RSS feed aggregator based on Scikits.learn and CubicWeb and we have a poster about LibAster (a python library for thermomechanical simulation based on Code_Aster).


Distutils2 Sprint at Logilab (first day)

2011/01/28 by Alain Leufroy

We're very happy to host the Distutils2 sprint this week in Paris.

The sprint started yesterday with some of Logilab's developers and other contributors. We'll sprint for 4 days, trying to get the new python package manager off the ground.

Let's summarize this first day:

  • Boris Feld and Pierre-Yves David worked on the new system for detecting and dispatching data-files.
  • Julien Miotte worked on
    • moving qGitFilterBranch from setuptools to distutils2
    • testing distutils2 installation and register (see the tutorial)
    • backward compatibility with distutils in setup.py, using setup.cfg to fill the arguments of setup() and help users switch to distutils2.
  • André Espaze and Alain Leufroy worked on the python script that helps developers build a setup.cfg by recycling their existing setup.py (track).
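
To give an idea of what filling the setup() arguments from setup.cfg can look like, here is a minimal Python 3 sketch based on the standard configparser module. It is not the code written during the sprint: the [metadata] section, the keys in SAMPLE and the helper name setup_kwargs are assumptions chosen for illustration.

```python
import configparser

# Hypothetical setup.cfg content used only for this example.
SAMPLE = """
[metadata]
name = myproject
version = 0.1
"""


def setup_kwargs(cfg_text):
    """Return the [metadata] section as a dict of setup() keyword arguments."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("metadata"):
        return {}
    return dict(parser["metadata"])


kwargs = setup_kwargs(SAMPLE)
print(kwargs)
# A backward-compatible setup.py could then call setup(**kwargs).
```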

Join us on IRC at #distutils on irc.freenode.net!


Virtualenv - Play safely with a Python

2010/03/26 by Alain Leufroy
http://farm5.static.flickr.com/4031/4255910934_80090f65d7.jpg

virtualenv, pip and Distribute are three tools that help developers and packagers. In this short presentation we will see some virtualenv capabilities.

Please keep in mind that everything below was done using Debian Lenny, python 2.5 and virtualenv 1.4.5.

Abstract

virtualenv builds python sandboxes where you can do whatever you want as a simple user without jeopardizing your global environment.

virtualenv allows you to safely:

  • install any python packages
  • add debug lines everywhere (not only in your scripts)
  • switch between python versions
  • try your code as if you were an end user
  • and so on ...

Install and usage

Install

Preferred way

Just download the virtualenv python script at http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py and call it using python (e.g. python virtualenv.py).

For convenience, we will refer to this script as virtualenv.

Other ways

For Debian (and Ubuntu) addicts, just do:

$ sudo aptitude install python-virtualenv

Fedora users would do:

$ sudo yum install python-virtualenv

And others can install from PyPI (as superuser):

$ pip install virtualenv

or

$ easy_install pip && pip install virtualenv

You could also get the source here.

Quick Guide

To work in a python sandbox, do as follows:

$ virtualenv my_py_env
$ source my_py_env/bin/activate
(my_py_env)$ python

"That's all Folks !"

Once you have finished just do:

(my_py_env)$ deactivate

or quit the tty.

What does virtualenv actually do?

At creation time

Let's start again ... more slowly. Consider the following environment:

$ pwd
/home/you/some/where
$ ls

Now create a sandbox called my-sandbox:

$ virtualenv my-sandbox
New python executable in "my-sandbox/bin/python"
Installing setuptools............done.

The output says that you have a new python executable and specific install tools. Your current directory now looks like:

$ ls -Cl
my-sandbox/ README
$ tree -L 3 my-sandbox
my-sandbox/
|-- bin
|   |-- activate
|   |-- activate_this.py
|   |-- easy_install
|   |-- easy_install-2.5
|   |-- pip
|   `-- python
|-- include
|   `-- python2.5 -> /usr/include/python2.5
`-- lib
    `-- python2.5
        |-- ...
        |-- orig-prefix.txt
        |-- os.py -> /usr/lib/python2.5/os.py
        |-- re.py -> /usr/lib/python2.5/re.py
        |-- ...
        |-- site-packages
        |   |-- easy-install.pth
        |   |-- pip-0.6.3-py2.5.egg
        |   |-- setuptools-0.6c11-py2.5.egg
        |   `-- setuptools.pth
        |-- ...

In addition to the new python executable and the install tools, you have a whole new python environment containing libraries, a site-packages/ directory (where your packages will be installed), a bin directory, ...

Note:
virtualenv does not create every file needed to get a whole new python environment. It uses links to global environment files instead, in order to save disk space and speed up the sandbox creation. Therefore, a python environment must already be installed on your system.
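
Recent Python 3 versions ship a venv module in the standard library that builds a comparable sandbox. It is not virtualenv itself and its layout differs slightly, but the sketch below, which creates a sandbox in a temporary directory, is a quick way to inspect what gets created. The directory name my-sandbox simply reuses the name from the example above.

```python
import os
import tempfile
import venv

# Create the sandbox in a throwaway location.
env_dir = os.path.join(tempfile.mkdtemp(), "my-sandbox")

# with_pip=False skips installing pip, which keeps the creation fast.
venv.create(env_dir, with_pip=False)

# Show the top-level content of the new environment.
for entry in sorted(os.listdir(env_dir)):
    print(entry)
```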

At activation time

At this point you have to activate the sandbox in order to use your custom python. Once activated, python still has access to the global environment but will look at your sandbox first for python's modules:

$ source my-sandbox/bin/activate
(my-sandbox)$ which python
/home/you/some/where/my-sandbox/bin/python
(my-sandbox)$ echo $PATH
/home/you/some/where/my-sandbox/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
(my-sandbox)$ python -c 'import sys;print sys.prefix;'
/home/you/some/where/my-sandbox
(my-sandbox)$ python -c 'import sys;print "\n".join(sys.path)'
/home/you/some/where/my-sandbox/lib/python2.5/site-packages/setuptools-0.6c8-py2.5.egg
[...]
/home/you/some/where/my-sandbox
/home/you/personal/PYTHONPATH
/home/you/some/where/my-sandbox/lib/python2.5/
[...]
/usr/lib/python2.5
[...]
/home/you/some/where/my-sandbox/lib/python2.5/site-packages
[...]
/usr/local/lib/python2.5/site-packages
/usr/lib/python2.5/site-packages
[...]

First of all, a (my-sandbox) message is automatically added to your prompt in order to make it clear that you're using a python sandbox environment.

Secondly, my-sandbox/bin/ is added to your PATH. So, running python calls the specific python executable placed in my-sandbox/bin.

Note
It is possible to improve the sandbox isolation by ignoring the global paths and your PYTHONPATH (see Improve isolation section).
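
From inside Python, the effect of the activation can be checked by looking at sys.prefix. The helper below is a sketch for Python 3. The name in_virtualenv is made up for this example. It relies on sys.real_prefix, which older virtualenv releases set, and on sys.base_prefix, which exists in Python 3.3 and later.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a sandbox."""
    # Older virtualenv releases set sys.real_prefix; Python 3.3+ provides
    # sys.base_prefix, which differs from sys.prefix inside a sandbox.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print(sys.prefix)
print(in_virtualenv())
```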

Installing packages

It is possible to install any package in the sandbox without superuser privileges. For instance, we will install the pylint development revision in the sandbox.

Suppose that you have the pylint stable version already installed in your global environment:

(my-sandbox)$ deactivate
$ python -c 'from pylint.__pkginfo__ import version;print version'
0.18.0

Once your sandbox is activated, install the development revision of pylint as an update:

$ source /home/you/some/where/my-sandbox/bin/activate
(my-sandbox)$ pip install -U hg+http://www.logilab.org/hg/pylint#egg=pylint-0.19

The new package and its dependencies are only installed in the sandbox:

(my-sandbox)$ python -c 'import pylint.__pkginfo__ as p;print p.version, p.__file__'
0.19.0 /home/you/some/where/my-sandbox/lib/python2.6/site-packages/pylint/__pkginfo__.pyc
(my-sandbox)$ deactivate
$ python -c 'import pylint.__pkginfo__ as p;print p.version, p.__file__'
0.18.0 /usr/lib/pymodules/python2.6/pylint/__pkginfo__.pyc

You can safely make any change to the new pylint code or to other sandboxed packages because your global environment remains unchanged.
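
The version and location check used above can be wrapped in a small helper. This is a sketch for Python 3.8 or later, where importlib.metadata is available. The function name describe is hypothetical, and the standard json module is used here only because it is always importable.

```python
import importlib
from importlib import metadata


def describe(package):
    """Return (version, file) for an importable package."""
    module = importlib.import_module(package)
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # No installed distribution metadata under that name.
        version = None
    return version, getattr(module, "__file__", None)


version, path = describe("json")
print(version, path)
```

Running it inside and outside the sandbox shows which copy of a package each interpreter picks up.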

Useful options

Improve isolation

As said before, the sys.path of your sandboxed python still references the global system paths. You can however hide them by:

  • either using the --no-site-packages option, which denies the sandbox access to the global site-packages directory,
  • or changing your PYTHONPATH in my-sandbox/bin/activate in the same way as for PATH (see tips).

$ virtualenv --no-site-packages closedPy
$ sed -i '9i PYTHONPATH="$_OLD_PYTHON_PATH"
      9i export PYTHONPATH
      9i unset _OLD_PYTHON_PATH
      40i _OLD_PYTHON_PATH="$PYTHONPATH"
      40i PYTHONPATH="."
      40i export PYTHONPATH' closedPy/bin/activate
$ source closedPy/bin/activate
(closedPy)$ python -c 'import sys; print "\n".join(sys.path)'
/home/you/some/where/closedPy/lib/python2.5/site-packages/setuptools-0.6c8-py2.5.egg
/home/you/some/where/closedPy
/home/you/some/where/closedPy/lib/python2.5
/home/you/some/where/closedPy/lib/python2.5/plat-linux2
/home/you/some/where/closedPy/lib/python2.5/lib-tk
/home/you/some/where/closedPy/lib/python2.5/lib-dynload
/usr/lib/python2.5
/usr/lib64/python2.5
/usr/lib/python2.5/lib-tk
/home/you/some/where/closedPy/lib/python2.5/site-packages
(closedPy)$ deactivate

This way, you'll get an even more isolated sandbox, just as with a brand new python environment.
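
To see which entries still point outside the sandbox, the path list can be filtered against the sandbox prefix. The following Python 3 sketch uses a hypothetical helper named outside_prefix. The sample paths are shortened from the output above.

```python
import os


def outside_prefix(paths, prefix):
    """Return the entries of paths that are not located under prefix."""
    prefix = os.path.abspath(prefix)
    leaked = []
    for entry in paths:
        if not entry:
            continue  # an empty entry stands for the current directory
        full = os.path.abspath(entry)
        if full != prefix and not full.startswith(prefix + os.sep):
            leaked.append(entry)
    return leaked


sample = [
    "/home/you/some/where/closedPy",
    "/home/you/some/where/closedPy/lib/python2.5",
    "/usr/lib/python2.5",
]
leaked = outside_prefix(sample, "/home/you/some/where/closedPy")
print(leaked)
```

In a live session you would pass sys.path and sys.prefix instead of the sample values.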

Work with different versions of Python

It is possible to dedicate a sandbox to a particular version of python by using the --python=PYTHON_EXE option, which specifies the interpreter to use. The default is the interpreter that virtualenv was installed with (e.g. /usr/bin/python):

$ virtualenv --python=python2.4 pyver24
$ source pyver24/bin/activate
(pyver24)$ python -V
Python 2.4.6
(pyver24)$ deactivate
$ virtualenv --python=python2.5 pyver25
$ source pyver25/bin/activate
(pyver25)$ python -V
Python 2.5.2
(pyver25)$ deactivate

Distribute a sandbox

To distribute your sandbox, you must use the --relocatable option, which makes an existing sandbox relocatable. It fixes up scripts and makes all .pth files relative. This option should be called just before you distribute the sandbox (each time you have changed something in your sandbox).

An important point is that the host system should be similar to your own.

Tips

Speed up sandbox manipulation

Add these functions to your .bashrc to make virtualenv easier to use and to automate the creation and activation processes.

rel2abs() {
#from http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2005-01/0206.html
  [ "$#" -eq 1 ] || return 1
  ls -Ld -- "$1" > /dev/null || return
  dir=$(dirname -- "$1" && echo .) || return
  dir=$(cd -P -- "${dir%??}" && pwd -P && echo .) || return
  dir=${dir%??}
  file=$(basename -- "$1" && echo .) || return
  file=${file%??}
  case $dir in
    /) printf '%s\n' "/$file";;
    /*) printf '%s\n' "$dir/$file";;
    *) return 1;;
  esac
  return 0
}
function activate(){
    if [[ "$1" == "--help" ]]; then
        echo -e "usage: activate PATH\n"
        echo -e "Activate the sandbox where PATH points inside of.\n"
        return
    fi
    if [[ "$1" == '' ]]; then
        local target=$(pwd)
    else
        local target=$(rel2abs "$1")
    fi
    until  [[ "$target" == '/' ]]; do
        if test -e "$target/bin/activate"; then
            source "$target/bin/activate"
            echo "$target sandbox activated"
            return
        fi
        target=$(dirname "$target")
    done
    echo 'no sandbox found'
}
function mksandbox(){
    if [[ "$1" == "--help" ]]; then
        echo -e "usage: mksandbox NAME\n"
        echo -e "Create and activate a highly isolated sandbox named NAME.\n"
        return
    fi
    local name='sandbox'
    if [[ "$1" != "" ]]; then
        name="$1"
    fi
    if [[ -e "$name/bin/activate" ]]; then
        echo "$name is already a sandbox"
        return
    fi
    virtualenv --no-site-packages --clear --distribute "$name"
    sed -i '9i PYTHONPATH="$_OLD_PYTHON_PATH"
            9i export PYTHONPATH
            9i unset _OLD_PYTHON_PATH
           40i _OLD_PYTHON_PATH="$PYTHONPATH"
           40i PYTHONPATH="."
           40i export PYTHONPATH' "$name/bin/activate"
    activate "$name"
}

Note:
The virtualenv-commands and virtualenvwrapper projects add some very interesting features to virtualenv. So, keep an eye on them if you need more advanced features than the ones above.
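
The lookup performed by the activate function above (walking up the directory tree until a bin/activate file is found) can also be sketched in Python 3. The function name find_sandbox and the temporary directories are made up for the example. Like the shell version, it stops at the filesystem root without testing the root itself.

```python
import os
import tempfile


def find_sandbox(start):
    """Return the closest parent of start containing bin/activate, or None."""
    target = os.path.abspath(start)
    while True:
        parent = os.path.dirname(target)
        if parent == target:
            return None  # reached the filesystem root
        if os.path.exists(os.path.join(target, "bin", "activate")):
            return target
        target = parent


# Build a fake sandbox with a nested working directory to try it out.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "bin"))
open(os.path.join(root, "bin", "activate"), "w").close()
nested = os.path.join(root, "src", "pkg")
os.makedirs(nested)

found = find_sandbox(nested)
print(found)
```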

Conclusion

I found virtualenv irreplaceable for testing new configurations or working on projects with different dependencies. Moreover, I use it to learn about other python projects, to see exactly how my project interacts with its dependencies (during debugging) or to test the end-user experience.

All of this stuff can be done without virtualenv but not in such an easy and secure way.

I will continue the series by introducing other useful projects to enhance your productivity: pip and Distribute. See you soon.