pyron: Making Python package development DRY to the point of no return

Date: 22 April 2009 Tags: computing, python

I finally snapped last week.

After years of writing verbose and repetitive setup.py files for my Python packages, I am unable to write another. Instead, I have started writing Pyron, a tool that gathers the same information by inspecting a Python package itself. Not only does this mean that I get to stop repeating myself, but that my projects will become much more uniform because package metadata will be represented through common conventions instead of explicit (and repetitive) configuration. Though Pyron is still very primitive, it has already allowed me to reduce simple packages to only a README.txt plus their actual Python source code.

The start of the trouble

What happened is that I wanted to create a simple Python package full of tools for professional authors working with rst documents, so that they could monitor their word count while writing, and convert their rst files into the proprietary formats used by various publications. But just to start a new Python project required me to create four entire files, and almost as many directories:

    ./cursive.tools/setup.py
    ./cursive.tools/cursive/__init__.py
    ./cursive.tools/cursive/tools/README.txt
    ./cursive.tools/cursive/tools/__init__.py

The setup.py file itself repeats the project name over, and over, and over again, reminding me of the old Adventure game's “maze of twisty passages, all alike”:

    from setuptools import setup

    setup(
        name='cursive.tools',
        version='0.1',
        description='Tools for restructured text files',
        author='Brandon Craig Rhodes',
        author_email='brandon@rhodesmill.org',
        packages=['cursive.tools', 'cursive'],
        namespace_packages=['cursive'],
    )

The first __init__.py file shown above of course looks like:

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

Meanwhile, my stub README.txt and __init__.py files down in the bottom directory contained just enough information to get me started, whether I wanted to start by writing documentation and tests or get started by writing actual code:

    ``cursive.tools`` -- Tools for restructured text files
    ------------------------------------------------------

    The routines in this ``cursive.tools`` package are designed for authors.
    They provide command-line tools that can examine Restructured Text files.

    """Command-line routines for Restructured Text authors."""

    __version__ = '0.1'

And, having created these files, I stopped, and stared in horror.

For an entire hour I tried to move on. I tried to start writing actual code and actual documentation. I tried to just ignore the stupidity of what I had just written. Or, in the case of setup.py, what I had just written by cutting and pasting from another project on my hard drive; yes, it's actually become that bad, that we cut-and-paste file contents between Python projects because our boilerplate requires so much repetition while carrying so little information.

But, try though I might, I could not move on to writing code; I was finally defeated. The Python language has done such a wonderful job over the past decade of honing my aesthetics and sharpening my senses that I am now unable to use its own standard packaging techniques! This new package would have to wait until I had resolved the problems that sat staring me in the face. Let us review them, one by one.

After stating so carefully that this package was named cursive.tools, I then had to inform setup() that the project name would also be (who would have guessed?) cursive.tools as well! This is idiotic. Of course I am giving this project the same name as the package it contains; that is a best-practice from which modern Python projects have no excuse to dissent. Who wants to have to remember that you need the ZODB3 package when all you want to do is import persistent? Who wants to remember to depend on pyephem when all you want is to import ephem (a problem that I, myself, created in my own misguided Python youth)? Not me. And not, if they have any sense, my users.

This package is named cursive.tools. Of course I want cursive to be a namespace package! That is so painfully obvious that it should not even require mention; it should be inferred.

Similarly, the mention that cursive is a package in the packages declaration is redundant. Of course if a.b is a package then a is going to be a package as well! There's not even a way to avoid that in the Python language, so far as I know. Why even make me type it?

The entire top-level __init__.py file (the one inside of the cursive directory) is utterly and entirely a boilerplate cut-and-paste. Given that cursive is already stated to be a namespace package, it should not even be necessary to provide the contents of its __init__.py; it's standard and can be copied straight from PEP-382.

The package, you will note, has started out lacking a long_description despite the fact that it has a perfectly serviceable README.txt file. Many packages jump through the hoops of path manipulation just to find their own README.txt so that they can include it as their long description; but why, in the absence of an override, shouldn't its inclusion as the long description be the default?

This raises the larger question of where, exactly, should a project README.txt even go: where on the filesystem, that is, should it be placed? There seems to be no consistency on this between different Python packages. Some people place it directly at the project top-level, next to the setup.py file, which is friendliest to developers checking out the source code from a public repository, but which makes the README.txt invisible to users! Others place it down inside of the package directory itself so that it will be included in their distribution, which is better; and still other Python projects have two separate README.txt files so that they have both bases covered!

The package version is kept in two different places here: in the setup.py and also in the __version__ symbol of the module itself. When the version advances, both places will have to be updated, if the developer remembers! The alternative is for the setup.py to grow more complex by including its own bootstrap code that uses path manipulations to find and introspect the __version__ symbol inside of the module.

The name of the package occurs both at the top of README.txt and inside of setup.py.

The short description is repeated twice: once in the title of the README.txt and once in the setup() stanza of the setup.py.

Finally, the directory structure of this project is ridiculous. If, as the setup.py clearly states, I am writing the cursive.tools module, why should I even include both a cursive and a tools directory? Since the only legitimate activity that I can undertake in constructing this module is to place files inside of cursive.tools, why do directories exist where files could collect outside of this one depository?
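To show how little information the packages and namespace_packages arguments really add, here is a minimal sketch of deriving both from the dotted project name alone. The helper infer_packages is hypothetical (it is not part of setuptools, and it is not taken from Pyron's source), and it assumes that every parent of the named package exists only as a namespace package.

```python
def infer_packages(name):
    """Return (packages, namespace_packages) implied by a dotted name.

    Hypothetical helper for illustration only.
    """
    parts = name.split('.')
    # Every proper prefix of the dotted name is a parent package.
    parents = ['.'.join(parts[:i]) for i in range(1, len(parts))]
    # The named package comes first, followed by its parents, innermost first.
    packages = [name] + parents[::-1]
    # Assumption: each parent is purely a namespace package.
    return packages, parents


packages, namespace_packages = infer_packages('cursive.tools')
print(packages)            # ['cursive.tools', 'cursive']
print(namespace_packages)  # ['cursive']
```

The two printed lists match what I had to type by hand into the setup() call shown earlier.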

Obviously, the above arguments hold only for pure-Python packages; when C extensions and other special effects come into play, then excellent reasons arise for a complicated directory structure, sophisticated metadata, and possibly documentation above and beyond that distributed with binary versions of the package. But for normal packages, I am finished with writing and distributing a setup.py by hand.

Toward perfecting Pyron

My new tool for Python package building, Pyron — which, for those keeping score, is my very first bitbucket-hosted project (and I am very much enjoying these first few weeks of using Mercurial, since Guido made the big decision at the end of PyCon last month) — is not yet mature enough to warrant a first release on PyPI. Please check out the development version if you want to take a first look at Pyron. And, yes, Pyron currently has to include a setup.py of its own, which will not disappear until I release the first version and it can become self-hosting!

Please note that Pyron is only for developers! The sdist archives and the eggs produced for a Pyron-powered project are completely standard; the end users and developers installing a module will not be affected by your choice to use Pyron. It simply keeps your project repository cleaner by inferring package metadata on the fly rather than making you maintain a setup.py in version control along with your Python package.
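As a rough illustration of what inferring metadata on the fly can look like, here is a sketch that reads the name and short description from a README title and the version from an __init__.py, using the two stub files quoted earlier as input. This is not Pyron's actual code: the function names, the regular expression, and the assumption that the title always has the form ``name`` -- description are mine, chosen for the example.

```python
import ast
import re

# Assumed title convention, modeled on the stub README quoted earlier:
#   ``package.name`` -- Short description
TITLE = re.compile(r'^``(?P<name>[\w.]+)``\s+--\s+(?P<description>.+)$')


def parse_readme_title(readme_text):
    """Return (name, description) from the first non-blank README line."""
    for line in readme_text.splitlines():
        if line.strip():
            match = TITLE.match(line.strip())
            if match is None:
                raise ValueError('unrecognized README title: %r' % line)
            return match.group('name'), match.group('description')
    raise ValueError('README is empty')


def read_constants(init_source, names=('__version__',)):
    """Collect top-level literal assignments without importing the module."""
    found = {}
    for node in ast.parse(init_source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in names:
                    # literal_eval raises ValueError for non-literal values.
                    found[target.id] = ast.literal_eval(node.value)
    return found


readme = (
    "``cursive.tools`` -- Tools for restructured text files\n"
    "------------------------------------------------------\n"
)
init = '"""Command-line routines for Restructured Text authors."""\n' \
       "__version__ = '0.1'\n"

name, description = parse_readme_title(readme)
version = read_constants(init)['__version__']
print(name, version)  # cursive.tools 0.1
```

Parsing the source with ast rather than importing the package means the metadata can be read even when the package's own dependencies are not installed.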

A package developed with Pyron only needs two files: README.txt and __init__.py. The two files quoted above will work just fine. These simply need to sit in the same directory, like this:

    ./cursive.tools/README.txt
    ./cursive.tools/__init__.py

See? All of the actual meat of the cursive.tools module remains when the files are stored like this, while the repetition and boilerplate disappears! Check out the Pyron README.txt (or, of course, the same information as formatted in its project page on PyPI) for more details about how it works; here, I will just make three last observations:

Sometimes I had to choose between best practices when deciding how Pyron would operate. Where, for example, should it find the package name? Instead of looking at the title of the README.txt, as it currently does, one could imagine my having written it to look somewhere in __init__.py (but there seems to be no agreed-upon place for a package to name itself), or even at the name of the directory in which the package is sitting (but often the directory will not be named cursive.tools, but something like branches/0.1 or even just trunk). In each case, I have tried to choose the most obvious and easy-to-maintain convention, and the real point is that there be some common idiom for everyone to fall into line with as more and more packages in the future abandon their setup.py files and start using Pyron.

Sometimes no best practice existed, and I had to, frankly, make things up. Where should the author of a package go, without a setup.py file? In a special metadata file that I would have to invent? In some formatted region of the README.txt file? By choosing instead that it go inside an __author__ symbol in __init__.py, I hope that I have at least preserved symmetry with an existing best-practice while, again, making future Python projects as readable as possible should Pyron use become widespread.

Pyron should become more sophisticated in the future, and eliminate even more repetition. It currently needs project dependencies, for example, to be defined as a __requires__ constant in a package's __init__.py file. In the future, Pyron will hopefully gain the ability to inspect a project's import statements and make intelligent guesses about its dependencies that could often eliminate any need for explicit dependency declarations.
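The kind of import inspection imagined in that last observation might be sketched as follows. Pyron does not do this yet, so everything here is an assumption made for illustration: the function name guess_dependencies, its parameters, and the use of sys.stdlib_module_names (available on Python 3.10 and later) to filter out the standard library.

```python
import ast
import sys


def guess_dependencies(source, own_package, ignore=None):
    """Return the sorted top-level module names that `source` imports.

    Relative imports and imports of the project's own top-level
    package are skipped, as are any names listed in `ignore`.
    """
    if ignore is None:
        # On Pythons older than 3.10 this falls back to ignoring nothing.
        ignore = getattr(sys, 'stdlib_module_names', frozenset())
    own_top = own_package.split('.')[0]
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split('.')[0]
            if top != own_top:
                found.add(top)
    return sorted(found - set(ignore))


sample = '''
import docutils.core
from cursive.tools import wordcount
from . import helpers
'''
print(guess_dependencies(sample, 'cursive.tools'))  # ['docutils']
```

A guess like this is only useful when a project is named after the package it provides, which is the convention argued for above; a project such as pyephem, whose import name is ephem, would still need an explicit declaration.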

Thanks to Pyron, I am now happily working away on my cursive packages, and they should soon see their first releases. I can now sleep at night, knowing that boilerplate and repetition have finally vanished from my development code.
