Recent discussions on the #distutils IRC channel and with my Logilab co-workers led me to the following conclusions:

  • The Python Package Index is not a software distribution
  • There is more than one way to distribute python software
  • Distribution packagers are power users and need super cow-powers
  • Users want it to "just work"
  • The Python Package Index is used by many as a software distribution
  • Pypi has a lot of contributions because requirements are low

The Python Package Index is not a software distribution

I would define a software distribution as:

  • Organised group of people
  • Who apply a Unified Quality process
  • To a finite set of software
  • Which includes all its dependencies
  • With a consistent set of versions that work together
  • For a finite set of platforms
  • Managed and installed by dedicated tools.

Pypi is a public index where:

  • Any python developer
  • Can upload any tarball containing something related
  • To any python package
  • Which might have external dependencies (outside Pypi)
  • The latest version of something is always available disregarding its compatibility with other packages.
  • Binary packages can be provided for any platform but are usually not.
  • There are several tools to install and manage python packages from pypi.

Pypi is not a software distribution, it is a software index.
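The contrast can be made concrete with a small sketch. The package names and version numbers below are purely hypothetical, and neither dictionary reflects how Pypi or any real distribution stores its data; the point is only that an index answers "what is the latest upload?" while a distribution answers "what did we vet to work together?".

```python
# Hypothetical index: every uploaded version of every package is listed.
INDEX = {
    "libfoo": [(1, 0), (1, 1), (2, 0)],
    "appbar": [(0, 9), (1, 0)],
}

# Hypothetical distribution release: exactly one vetted version per package,
# chosen so that the whole set works together.
DISTRIBUTION = {
    "libfoo": (1, 1),
    "appbar": (1, 0),
}


def from_index(name):
    """Return the latest uploaded version, regardless of compatibility."""
    return max(INDEX[name])


def from_distribution(name):
    """Return the single version shipped in the consistent set."""
    return DISTRIBUTION[name]
```

In this toy data the index hands out libfoo 2.0 even if appbar 1.0 was only ever tested against libfoo 1.1, which is the consistency a distribution provides and an index does not.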

There is more than one way to distribute python software

It is a long way from the pure source used by the developer to the software installed on the system of the end user.

First, the source must be extracted from a (D)VCS to make a version tarball, while executing several release-specific actions (e.g. changelog generation from a tracker). Second, the version tarball is used to generate a platform-independent build, while executing several build steps (e.g. Cython compilation into C files or documentation generation). Third, the platform-independent build is used to generate a platform-dependent build, while executing several platform-dependent build steps (e.g. compilation of C extensions). Finally, the platform-dependent build is installed and each file gets dispatched to its proper location during the installation process.

Pieces of software can be distributed as development snapshots taken from the (D)VCS, version tarballs, source packages, platform-independent packages or platform-dependent packages.
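One way to picture this chain is as an ordered list of stages, where the form in which the software is received determines how much work is left for the recipient. The sketch below is only an illustration: the `Stage` names, the `STEPS` labels and the `remaining_steps` helper are made up for this post and do not correspond to any distutils API.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Forms a piece of software takes on its way to the end user."""
    VCS_SNAPSHOT = 0        # development snapshot from the (D)VCS
    VERSION_TARBALL = 1     # release actions done (e.g. changelog)
    INDEPENDENT_BUILD = 2   # e.g. Cython compiled to C, docs generated
    DEPENDENT_BUILD = 3     # e.g. C extensions compiled
    INSTALLED = 4           # files dispatched to their proper locations


# Hypothetical labels for the step that leads out of each stage.
STEPS = {
    Stage.VCS_SNAPSHOT: "release actions",
    Stage.VERSION_TARBALL: "platform-independent build",
    Stage.INDEPENDENT_BUILD: "platform-dependent build",
    Stage.DEPENDENT_BUILD: "installation",
}


def remaining_steps(received: Stage) -> List[str]:
    """Return the steps still to be run on the recipient's machine."""
    return [STEPS[s] for s in Stage if received <= s < Stage.INSTALLED]
```

Someone who receives a version tarball still has three steps to run, while someone who receives a platform-dependent build only has the installation left.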

Distribution packagers are power users and need super cow-powers

Distribution packagers usually have the necessary infrastructure and skills to build packages from version tarballs. Moreover they might have specific needs that require as much control as possible over the various build steps. For example:

  • A specific help system requiring a custom version of Sphinx.
  • Specific security or platform constraints that require a specific version of Cython.

Users want it to "just work"

Standard users want it to "just work". They prefer simple and quick ways to install stuff. Build steps done on their machine increase the duration of the installation, add potential new dependencies and may trigger errors. Standard users are very disappointed when an installation fails because an error occurred while building the documentation. Users give up when they have to download extra dependencies and set up a complicated compilation environment.

Users want as many build steps as possible to be done by someone else. That's why many users choose a distribution that does the job for them (e.g. Ubuntu, Red Hat, Python(x,y)).

The Python Package Index is used by many as a software distribution

But there are several situations where users can't rely on their distribution to install python software:

  • There is no distribution available for the platform (Windows, Mac OS X)
  • They want to install a python package outside of their distribution system (to test or because they do not have the credentials to install it system-wide)
  • The software or version they need is not included in the finite set of software included in their distribution.

When this happens, the user will use Pypi to fetch python packages. To help them, Pypi accepts binary packages of python modules and people have developed dedicated tools that ease installation of packages and their dependencies: pip, easy_install.

Pip + Pypi provides the tools of a distribution without its consistency. This is better than nothing.

Pypi has a lot of contributions because requirements are low

Pypi should contain version tarballs of all known python modules; that is the first purpose of an index. Version tarballs should let distributions and power users perform as many build steps as possible themselves. Pypi will continue to be used as a distribution by people without a better option. Packages provided to these users should require as little as possible to be installed, meaning they either have no build step to perform or have only platform-dependent build steps (which could not be executed by the developer).

If the upcoming distutils2 provides a way to differentiate platform-dependent build steps from platform-independent ones, python developers will be able to upload three different kinds of packages to Pypi:

sdist: Pure source version released by upstream, targeted at packagers and power users.
idist: Platform-independent package with platform-independent build steps done (Cython, docs). If there is no such build step, the package is the same as sdist.
bdist: Platform-dependent package with all build steps performed. For packages with no platform-dependent build step, this package is the same as idist.
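The rules above can be sketched in a few lines. This is only an illustration of the proposal: `idist` is a name suggested in this post and not an existing distutils command, and both helper functions are hypothetical.

```python
from typing import List, Set


def distinct_packages(has_independent_steps: bool,
                      has_dependent_steps: bool) -> List[str]:
    """List the package kinds that actually differ for a given project."""
    kinds = ["sdist"]              # the pure source is always published
    if has_independent_steps:
        kinds.append("idist")      # otherwise idist is the same as sdist
    if has_dependent_steps:
        kinds.append("bdist")      # otherwise bdist is the same as idist
    return kinds


def pick_for_user(available: Set[str]) -> str:
    """Pick the package leaving the fewest build steps to a standard user."""
    for kind in ("bdist", "idist", "sdist"):
        if kind in available:
            return kind
    raise LookupError("no package available")
```

A pure python project without generated documentation would upload a single sdist, while a Cython-based project could upload all three kinds. Packagers and power users would keep taking the sdist; standard users would get whatever leaves them the least to build.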

(Image under creative commons Card File by-nc-nd by Mr. Ducke / Matt, Thomas Fisher Rare Book Library by bookchen, package! by Beck Gusler, Cheese Factory by James Yu)
