Anatomy of a Python Package

2014-01-27 23:00

Over the past several months and years of coding in Python, I’ve created quite a few Python packages, both open source and for private projects. Even though the most important part was always the code, there are numerous additional files that a package needs in order to serve its purpose correctly. Rather than being part of the Python language, they are more closely related to the Python platform.

But if you look for any definite, systematic information about them, you will at best find scattered pieces of knowledge in various unrelated places. At worst, the only guidance will come in the form of a multitude of existing Python package sources, available on GitHub and similar sites. Parroting them is certainly an option, although I believe it’s much more advantageous to acquire a firm understanding of how those different cogs fit together. Without it, following modern Python development best practices, which are all hugely beneficial, is largely impossible.

So, I want to fill this void by outlining the structure of a Python package as completely as possible. You can follow it as a step-by-step guide when creating your next project, or just skim through it to see what you’re missing and whether it’d be worthwhile to address such gaps. Any additional element or file will usually provide some tangible benefit, but of course not every project requires all the bells and whistles.

Without further ado, let’s see what’s necessary for a complete Python software bundle.

0. LICENSE file

Very early, possibly even before you write a single line of code, configuration or documentation, I’d like you to take a moment and decide how you’re going to distribute your work. It’s fine to defer that decision if the project isn’t intended to be public, but for any kind of open source project, establishing a license is of paramount importance. In fact, I wrote about this very topic some time ago.

As for practical matters, generating the actual license text turns out to be extremely easy if you use a small but incredibly useful program called lice. I recommend installing it directly into your global Python environment:

    $ sudo pip install lice

Now you can just navigate to your project’s directory and whip out a license file with one simple command:

    $ lice bsd2 > LICENSE

Variable parts of the license text, such as the project’s and author’s names, are filled in automatically. Refer to lice --help for more details and a list of available licenses.

1. Main Python file

Now, with a clear conscience, you can commence coding. But wait! There is a small piece of boilerplate that you may find tremendously useful to put right at the top of your main Python file. “Main” means, of course, either the sole .py module or the __init__.py of the top-level package.

Also, it’s not really a boilerplate. It’s more like… an introduction:

    """
    dogelchemist :: Turns spam emails into Dogecoins
    """
    __version__ = "0.0.1"
    __author__ = "John Doe"
    __license__ = "WTFPL"

It happens that the version number, the author’s name and the program’s license are often needed in at least a few parts of the code, and beyond. Examples include:

  • setup.py file (see below)
  • usage/--help documentation, for command line programs
  • content of About dialog, for GUI programs
  • text in footer, for web applications
  • User-Agent (e.g. for urllib.urlopen) and other kinds of client identification strings
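To make this concrete, here’s a minimal sketch of two of these uses in a command line program: a --version flag and a User-Agent header, both built from the same metadata. It assumes Python 3 (argparse and urllib.request rather than Python 2’s urllib.urlopen), and the module-level values merely stand in for the ones you’d import from the hypothetical dogelchemist package; build_parser and make_request are illustrative names.

```python
import argparse
import urllib.request

# Stand-ins for metadata you'd normally import from the (hypothetical) package:
#   from dogelchemist import __version__, __author__, __license__
__version__ = "0.0.1"
__author__ = "John Doe"
__license__ = "WTFPL"


def build_parser():
    """Command line parser whose --version output and epilog reuse the metadata."""
    parser = argparse.ArgumentParser(
        prog="dogelchemist",
        description="Turns spam emails into Dogecoins",
        epilog="Written by %s. Licensed under %s." % (__author__, __license__),
    )
    # argparse substitutes %(prog)s, so this prints "dogelchemist 0.0.1".
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    return parser


def make_request(url):
    """HTTP request that identifies this client by name and version."""
    user_agent = "dogelchemist/%s" % __version__
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Bumping __version__ in one place then updates the --version output, the help epilog and the User-Agent string at once.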

Exposing __version__ in code also means that dependent packages can adjust to different releases of your bundle in a somewhat systematic way. Should you deprive them of this possibility, they’ll inevitably resort to much nastier hacks.
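As a minimal sketch of the dependent side, assuming simple dotted numeric versions: a package building on dogelchemist (replaced by a stand-in object below, since the package is hypothetical) could switch code paths depending on the installed release. version_tuple and the 0.1 cutoff are made up for illustration; version strings with pre-release tags call for a proper parser, such as the third-party packaging library.

```python
import types

# Stand-in for `import dogelchemist`; only its __version__ matters here.
dogelchemist = types.SimpleNamespace(__version__="0.0.1")


def version_tuple(version):
    """Turn a purely numeric dotted version like "0.0.1" into (0, 0, 1).

    Tuples compare element by element, so (1, 10) > (1, 9), whereas the
    plain strings compare the other way around ("1.10" < "1.9").
    """
    return tuple(int(part) for part in version.split("."))


if version_tuple(dogelchemist.__version__) >= (0, 1):
    BACKEND = "new"      # hypothetical API introduced in 0.1
else:
    BACKEND = "legacy"   # fallback for 0.0.x releases
```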

2. requirements-test.txt

Most projects rely on existing packages to handle some of their necessary functionality; those are referred to as dependencies or requirements. By no means is this a necessity, though. Since Python’s standard library is rich and powerful, it’s sometimes possible to build valuable packages based entirely upon standard modules.

Tests, however, almost invariably need some assistance. There is little reason not to use a third-party test runner, for example: something like nose or py.test makes running tests a breeze and simplifies writing them, too.
The same goes for the various mocking libraries, not to mention more specialized tools for e.g. benchmarking or load testing. Testing can be a complicated affair sometimes, especially in a dynamic language such as Python, where the lack of compile-time checks makes it absolutely crucial.

Therefore, I recommend putting all dependencies required strictly by tests (and by tests only!) into a separate requirements-test.txt file. Here’s an example:

    -e .

    pytest==2.5.0
    mocktest==0.7

Like the traditional requirements.txt described later, this file is in the standard format understood by Pip. With a properly configured Python package, it is also complete enough to allow a single command:

    $ pip install -r requirements-test.txt

to install everything that’s necessary for the tests to run, and hopefully pass! Being capable of such a level of automation can unlock quite powerful rewards later on, as it’s a prerequisite for any kind of continuous integration.
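For a feel of what those test dependencies are for, here’s a sketch of a test module that py.test would pick up from a file named, say, tests/test_fetch.py (it collects test_* functions and reports failing plain assert statements). The example requirements above list mocktest; this sketch uses the standard library’s unittest.mock instead (Python 3.3+, available for older versions as the mock package), and fetch_status is a hypothetical function under test.

```python
from unittest import mock


def fetch_status(session, url):
    """Hypothetical code under test: return the HTTP status code for url."""
    return session.get(url).status_code


def test_fetch_status_uses_session():
    # A Mock stands in for a real HTTP session (e.g. one from requests),
    # so the test needs no network access.
    session = mock.Mock()
    session.get.return_value.status_code = 200

    assert fetch_status(session, "http://example.com/") == 200
    session.get.assert_called_once_with("http://example.com/")
```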

3. setup.py

Python has no “package description” file per se, akin to package.json from Node.js or *.cabal files for Haskell. Instead, it takes an approach not entirely dissimilar to Ruby’s, where the bundle’s “specification” is also executable code. But Python’s setup.py is not just that: it’s an actual installation script.

Although not strictly necessary for executable programs, having a setup.py is still recommended for all Python packages. For libraries, it is absolutely mandatory. In either case, setup.py is the file which enables a package to be installed into the interpreter, and hence imported by any other Python code it runs.

Most of the content of setup.py is typically an invocation of the setuptools.setup function. The majority of its parameters are in fact fields in the package’s résumé: name, description, author, etc. The rest are more substantial: they describe which Python files the package consists of and what its installation requirements are.

Let’s have a look at a “real world” example:

    #!/usr/bin/env python
    """
    dogelchemist
    ============

    Turns spam emails into Dogecoins
    """

    from setuptools import setup, find_packages

    import dogelchemist


    setup(
        name="dogelchemist",
        version=dogelchemist.__version__,
        description="Turns spam emails into Dogecoins",
        long_description=__doc__,
        author=dogelchemist.__author__,
        url="http://example.com/dogelchemist",
        license=dogelchemist.__license__,

        classifiers=[
            "Intended Audience :: End Users/Desktop",
            "License :: Freely Distributable",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2.6",
            "Programming Language :: Python :: 2.7",
            "Topic :: Office/Business :: Financial",
            "Topic :: Security :: Cryptography",
        ],

        platforms='any',
        install_requires=[
            "requests>=1.0",
            "quantum-gravity>=1.0",
            "unobtanium>=0.4",
        ],

        packages=find_packages(exclude=['tests']),
        entry_points={
            'console_scripts': ['dogelchemist=dogelchemist.main:main'],
        },
    )

While this is far from exhausting the range of setup() parameters, it demonstrates some of the more common arguments. These include classifiers=, the complete list of which can be found on the PyPI website, and entry_points=, which here declares the dogelchemist executable command.

But more importantly, we have the packages= argument, which is almost always used in conjunction with the find_packages function (should you only have loose modules, you’d use py_modules= instead). We typically exclude tests from the packages to be installed, as they are not relevant to end users who just want to use our code.
Finally, install_requires= lists all the external packages we depend on. For short dependency lists it’s fine to enumerate them like that, though the usual practice nowadays is to use a dedicated requirements.txt file.
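Importing dogelchemist in setup.py, as above, works only if the package’s own dependencies are already importable, which is exactly what Ionel warns about in the comments below. Here’s a minimal sketch of the alternative he suggests: parse the metadata out of __init__.py with a regex, locating the file through an absolute path computed from __file__. read_metadata is a hypothetical helper, and the sketch assumes the dunder fields are assigned plain string literals, as in the introduction from section 1.

```python
import io
import os
import re


def read_metadata(field, package="dogelchemist", base_dir=None):
    """Return e.g. __version__ from <package>/__init__.py without importing it."""
    if base_dir is None:
        # Directory of setup.py itself, independent of the current working dir.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    init_py = os.path.join(base_dir, package, "__init__.py")
    with io.open(init_py, encoding="utf-8") as f:
        source = f.read()
    # Matches lines like: __version__ = "0.0.1"  (single or double quotes)
    pattern = r"^__%s__\s*=\s*['\"]([^'\"]*)['\"]" % field
    match = re.search(pattern, source, re.MULTILINE)
    if match is None:
        raise RuntimeError("__%s__ not found in %s" % (field, init_py))
    return match.group(1)
```

In setup.py, version=read_metadata("version") and author=read_metadata("author") would then replace the attribute lookups on the imported module, and the import dogelchemist line could go away.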

4. MANIFEST.in

One of the purposes of setup.py is to tell where the .py files are. It’s not as good at pointing to all the other necessary files, which are basically the ones we’re talking about here.

To maintain tighter control over what goes into the final distribution package, we should use a manifest file called MANIFEST.in. It acts as a fine-grained filter, allowing us to include or exclude directories, wildcard groups, or individual files:

    include LICENSE
    include README.rst
    include requirements-test.txt
    recursive-exclude * *.pyc

The manifest file is critically important, if only for the first line of the above example. It ensures that even when installing from PyPI, the user still receives a copy of the license.
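Ionel’s comment below describes the classic failure: the tests pass, but a file missing from MANIFEST.in never makes it into the uploaded package. A cheap safeguard is to inspect the archive before uploading it. This sketch assumes a tarball produced by python setup.py sdist with everything under a single top-level <name>-<version>/ directory; missing_from_sdist and its default file list are hypothetical, mirroring the manifest above.

```python
import tarfile

REQUIRED = ("LICENSE", "README.rst", "requirements-test.txt")


def missing_from_sdist(sdist_path, required=REQUIRED):
    """Return the required files absent from a source distribution tarball."""
    with tarfile.open(sdist_path) as tar:  # compression is detected automatically
        # "dogelchemist-0.0.1/LICENSE" -> "LICENSE"; the bare top-level
        # directory entry contains no "/" and is skipped.
        names = {m.name.split("/", 1)[1] for m in tar.getmembers() if "/" in m.name}
    return [name for name in required if name not in names]
```

After running python setup.py sdist, missing_from_sdist("dist/dogelchemist-0.0.1.tar.gz") should return an empty list if the manifest did its job.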

5. requirements.txt (optional)

As mentioned previously, it’s a common practice to extract the dependency list into a separate file named requirements.txt. Sometimes it’s even an obligatory, pardon the pun, requirement; that’s how the Heroku cloud platform recognizes Python apps, for example.

The file itself is rather straightforward:

    -e .

    requests>=1.0
    quantum-gravity>=1.0
    unobtanium>=0.4

What’s less obvious is how to tie it back to setup.py, replacing the literal content of the install_requires= argument. For that, I find a small helper function, read_requirements, quite useful, as it makes the setup() call very straightforward:

    setup(
        ...
        install_requires=read_requirements(),
        ...
        tests_require=read_requirements('test'),
    )

Of course, you need to embed the helper verbatim in your setup.py first, which may diminish its appeal somewhat. On the plus side, your requirements files may now contain Pip options (like -e or -r) and comments, which is useful for bigger projects but wouldn’t work with the naive one-liner:

    install_requires=open('requirements.txt').readlines(),
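For illustration, here’s a minimal sketch of how such a read_requirements helper could be written; it’s an assumption, not necessarily the version meant here. It expects the files next to setup.py, named requirements.txt and requirements-<suffix>.txt; it strips comments, follows -r includes relative to the including file, and skips other Pip options such as -e ., which mean nothing inside install_requires. Kos’s comment below suggests pip’s own parse_requirements instead, but pip.req is pip’s internal code and has since been moved.

```python
import os


def read_requirements(suffix=None, base_dir=None):
    """Read requirements.txt, or requirements-<suffix>.txt if a suffix is given.

    Hypothetical helper meant to be pasted into setup.py.
    """
    if base_dir is None:
        # Absolute path of setup.py's directory, not the current working dir.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    name = "requirements-%s.txt" % suffix if suffix else "requirements.txt"
    return _parse_requirements_file(os.path.join(base_dir, name))


def _parse_requirements_file(path):
    requirements = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            if line.startswith("-r"):
                # Nested file, e.g. "-r requirements-test.txt" or "-rfile.txt"
                # (the long form --requirement is not handled in this sketch).
                nested = os.path.join(os.path.dirname(path), line[2:].strip())
                requirements.extend(_parse_requirements_file(nested))
            elif line.startswith("-"):
                continue  # other Pip options like "-e ." don't belong in install_requires
            else:
                requirements.append(line)
    return requirements
```

With it in place, install_requires=read_requirements() and tests_require=read_requirements('test') work as in the setup() call above, and a requirements-dev.txt that merely delegates via -r resolves to the test requirements.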

6. tox.ini (optional)

How many Python versions does your package support? The answer is obvious for standalone applications: the only one it’s currently running on.
But when writing a library, we need to treat the issue with a little more gravity. As of now, there are still multiple versions of the language powering thousands of working, production apps. Not just 2.7, but perhaps even 2.6 is not going anywhere anytime soon. Meanwhile, 3.x is an increasingly viable option. What do we target?

Whatever we do, it’s important to be explicit and honest about it. Explicit means stating it upfront in the README or elsewhere, so that potential users are neither left wondering nor set up for disappointment. Honest, on the other hand, means fulfilling the promises and actually testing against all these different versions (and implementations) of Python.

This is where the wonders of test automation technology come into play. The de facto standard solution for cross-interpreter testing in Python is a tool called tox. Projects that employ it include a tox.ini configuration file, which lists the Python environments they are expected to work in. “Work”, of course, is defined as having the test suite run without any failures.

The simplest tox.ini might look like this:

    [tox]
    minversion=1.4
    envlist=py26,py27,pypy,py33

    [testenv]
    deps=-rrequirements-test.txt
    commands=py.test

and, I believe, is pretty self-explanatory. This is also where it becomes clear why extracting requirements-test.txt is a worthwhile endeavor: tox will use it to install test dependencies into a virtualenv for each Python version we’ve specified. Then it will run the test command (here, py.test) and accumulate the results for all environments.

7. .travis.yml (optional)

Running tox is the practical equivalent of “building” a Python project from the continuous integration point of view. I’m pretty sure, though, that no one would fancy maintaining a CI server just for their little open source pet project. Fortunately, this doesn’t mean you need to forgo the benefits of CI anymore.

If you haven’t heard of Travis, it’s a free, hosted continuous integration platform for open source projects. As long as your package is publicly available on GitHub, you can configure a hook that’ll make Travis build it after every code push and notify you immediately should a failure occur.

There’s some configuration involved, of course, but it’s limited to providing a fairly basic .travis.yml file:

    language: python
    python:
        - "2.6"
        - "2.7"
        - "pypy"
        - "3.3"

    install:
        - pip install -r requirements-test.txt --use-mirrors
    script:
        - py.test

Its content is eerily similar to that of tox.ini, which is no coincidence. In practice, you can treat running `tox` as a local substitute for a Travis CI build. If the former succeeds, you can almost certainly push your code upstream safely, and the latter will build just fine, too.

8. requirements-dev.txt (optional)

Almost all of the other files described here, from the mandatory to the quite optional, are provided for the benefit of some (more or less) automated process. For a change, I suggest we finish off with something that’ll be helpful for humans instead.

The file, which I propose to call requirements-dev.txt, is yet another list of packages in Pip-compatible format. But they are not actual dependencies of any sort: the project shouldn’t need them to work correctly, nor should its test suite require them to pass.
What I suggest putting into requirements-dev.txt are the packages necessary for the development process itself. The goal is to streamline the initial setup for new contributors to our project. Ideally, all they would have to do before starting to code is:

  1. cloning the project’s repository
  2. creating a virtualenv
  3. running $ pip install -r requirements-dev.txt
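Those steps are easy to script, too. Here’s a minimal sketch that automates the last two, assuming Python 3’s built-in venv module in place of the separate virtualenv tool and that it runs from the root of the freshly cloned repository; bootstrap, install_command and venv_python are hypothetical names.

```python
import os
import subprocess
import venv


def venv_python(env_dir):
    """Path of the interpreter inside a virtual environment created by venv."""
    if os.name == "nt":  # Windows puts executables under Scripts\
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def install_command(env_dir, requirements="requirements-dev.txt"):
    # "python -m pip" guarantees we use the environment's own pip.
    return [venv_python(env_dir), "-m", "pip", "install", "-r", requirements]


def bootstrap(env_dir=".venv", requirements="requirements-dev.txt"):
    """Step 2: create the virtualenv (with pip). Step 3: install dev requirements."""
    venv.create(env_dir, with_pip=True)
    subprocess.check_call(install_command(env_dir, requirements))
```

Calling bootstrap() from a tiny script checked into the repository would then leave a new contributor with a ready-to-use .venv directory.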

Not every project is complex enough to require auxiliary tools. For those that aren’t, requirements-dev.txt is reduced to a simple delegation:

    -e .
    -r requirements-test.txt

which hardly justifies its existence. But in reality, software tends to quickly spread its tendrils far and wide, whilst developers are eager to automate any task that appears cumbersome or mundane. Soon enough, those handcrafted helpers start to surround the core project like growth rings.

In essence, requirements-dev.txt is mostly there to support them. The exact packages worth putting there will vary from project to project, but common examples include:

  • configuration and deployment tools, such as Fabric
  • database migration utilities, e.g. Alembic
  • debuggers and other development aids, like IPython and ipdb
  • test environment managers, including tox, presented earlier
  • tools for measuring test coverage
  • log analyzers
  • …and so on

Automation?…

OK, I think I know what you’re thinking now. You didn’t sign up for this! You just wanted to write some Python code. Why do you need to bother with all these asides? And since we spoke of automation, why can’t they be taken care of… automatically?

Alas, there exist only some incomplete attempts to tackle this issue. Picnic.py, for example, is currently making the rounds, but it focuses more on documentation and version control than on Python-specific artifacts. When it comes to procuring the latter, we are largely on our own. I therefore recommend carefully weighing the benefits they may provide against the effort required to create and maintain them.

Author: Xion, posted under Programming


4 comments for post “Anatomy of a Python Package”.
  1. Ionel Cristiam Mărieș:
    January 28th, 2014 at 0:54

    While this is a good introduction covering all the bases, it points users to some grievously bad practices. Or some practices that annoy the hell out of me :-)

    You should never import the package you are packaging in the setup script that does the packaging. See how ridiculous that sounds? Just read the file and parse out the version with a regex. Doing it like that sounds bad, but it is in fact the most reliable way to do it. Things get out of hand especially when your package has dependencies. Some tools, including pip, don’t guarantee that dependencies will be installed.

    Using pkg_resources to pull out the version (in the __init__.py of your packages) from the metadata is very unreliable, much worse. Setuptools and pip have historically been riddled with bugs caused by or around pkg_resources. It’s so bad that the latest release of pip vendors (bundles) a specific version of pkg_resources.

    This is how to properly do it: https://github.com/celery/kombu/blob/master/setup.py

    Never use relative paths in your setup script. You can never rely on the current working directory. While it shouldn’t be an issue with the latest version of pip, you never know what version people are running. It’s always best practice to use absolute paths: just compute them from __file__.

    For tox configuration and requirement files, I found out it is generally a bad idea to use a develop setup. Any form of it (‘pip install -e .’, ‘-e .’ in a requirements file, running `setup.py develop`) short-circuits the proper build and install of your package. You ran the tests, everything was fine, and you do a quick `setup.py sdist upload` thinking users will get the same code. But no, you forgot something in MANIFEST.in and your distribution is useless. Worst of all, you don’t know this, because you don’t test properly.

    This is such a common case that tox handles it by default. You don’t need to put any ‘-e .’ in your requirements file.

    About the Travis configuration: generally, using environment variables is a good idea. Have an env var for deciding whether to run coverage or not. Have an env var for the Python version (so you can run your tests via tox instead of just some shell script); you get both Travis/dev parity and build parallelism in Travis. Have an env var for dependencies, e.g. if you test against many versions of Django or some other dependency. You could do it like this: https://github.com/celery/kombu/blob/master/.travis.yml or

    Also (kinda nitpicking now), --use-mirrors is deprecated. It will be removed in a future version of pip. In fact, PyPI uses a CDN now, so there’s no need to mess with mirrors anymore :-)

  2. Kos:
    February 2nd, 2014 at 16:26

    Instead of hand crafting read_requirements you could hook into parse_requirements from pip.req.

    Sauce: http://stackoverflow.com/a/16624700/399317



© 2014 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.