Publishing your own Python package

A practical guide to packaging Python code

Say you have a nice piece of Python code; a couple of small related functions, or perhaps even a medium-sized module with a few hundred lines of code. And say that you end up using that piece of code time and again; maybe you keep copying it into different projects or repositories, or you keep importing it from some dedicated utility-code folder you’ve set up in a specific path.

It’s natural — we all keep accumulating these small personal tools as we code, and Python probably enables and encourages this more than the average programming language — and it feels nice to have those pieces of code.

But wouldn’t it be nice if you could easily import that one small tool you developed without copying it around, and to maintain one updated version of it? To have it available in different environments, machines and contexts without depending on a specific file or path? To perhaps even be able to version it and have code using it reflect that dependency clearly? And wouldn’t it be even nicer if other people could use it too?

Yes. Yes it would. 🐃

The concept is not new, of course. It’s why we have modules, packages and libraries in programming languages generally, and why we have those in Python specifically. It is part of what drives the phenomenal power, usability and popularity of Python; we can all get the amazing power of beautifulSoup’s html parsing or pandas’ dataframe processing with a simple pip install and import .

The cool part is that Python makes it easy for everybody to write and publish their own pieces of code as packages on PyPI — the official Python package index — and make them as easily available as sklearn , requests or delorean (all extremely useful and popular Python packages). And you definitely should, since:

It’s fun to have other people use your code, even if it’s only a handful; it’s something cool to share and showcase at work, community events or on a job interview.

It makes your code better by forcing you to organize and document it, and by exposing it to peer review.

Finally, it benefits the community by contributing some missing functionality. You’ll be surprised to find how many people might find your efficient serialization of HTTP headers to JSON or that decorator you created for easy validation of input MongoDB query documents useful.

Now that I‘ve hopefully convinced you to blow the dust off that cool old Python module you have lying around and make it into a small Python package, let’s get started.

Figure 1: Python dust 🐍

In the following post I’ll try to walk you through the minimal process I have settled on for packaging Python code, building on my experience from publishing a few of small packages (e.g. pdpipe, skift and cachier) to the global PyPI server and a couple dozen more on private PyPI servers of the couple of start-ups I’ve data-scienced in. I’m sure it’s not perfect, but it’s what I know, so it’ll have to do.

Step 1: It’s all in the name

Let’s start by choosing a name for your package. In my opinion, a good name is short enough that it’s easy to type casually after a pip install or an import declaration (although we do have autocompletion these days), but informative enough that people can understand, or at least remember after installing the package, what it’s about.

requests , dealing with HTTP requests, delorean , dealing with date and time, and sklearn , offering a machine learning framework, are all good examples. I tried to follow that example when choosing a name for my pandas pipelines package (pdpipe, which builds on the fact that pandas is often imported under the shorter alias of pd ) and my caching package (cachier).

To be honest, though, that is the exception rather than the rule. Popular Python packages have names like pandas , keras , django , boto , jinja , flask and pytorch , and yet we all remember their names, so you can also work with any short and pronounceable name (for example, even though skift is a nice acronym for “SciKIt-learn wrappers for FastText”, I mainly cared about pronounceability). It is also a good idea to make sure it works well all lowercased.

For the sake of this post, let’s say you went with chocobo . 🐤

Step 2: The basic package structure

Now let’s go through several short steps that will leave you with a common package structure:

Create a Github repository with the exact name of your package. Don’t stylize it or camel-case it. Then, clone it locally. Create a folder, in that repository, with the exact name of your package; that’s the folder that will hold the package’s code. This is the norm, and you just have to remember that the outer chocobo folder (in our case) is the repository’s folder, while the inner chocobo folder is the package’s folder. Put your module, and any other modules it uses, in that inner chocobo folder. Add an __init__.py file if one is missing. Import the important objects, usually functions, that users of the package will call directly, from their respective modules into the __init__.py file. These are the functions, classes and variables that will be available under the package namespace. The API of the package, if you will. Not package-specific, but you really should include a .gitignore file in the repository’s root directory. This is an excellent one.

So we now have a structure into which we can add the different files that make up a package; the inner folder will hold the package’s code, and the outer one will hold auxiliary packaging files and other repository-related files.

So, let’s say that your original module, chocobo.py , look something like this to begin with:

Your new repository folder, then, should look something like this:

chocobo/

chocobo/

__init__.py

chocobo.py

.gitignore

Where __init__.py looks something like this:

That means that after we’re done users of our state-of-the-art chocobo package will be able to use it this way:

You get the gist.

Ha ha! Cause I’ve just attached a bunch of GitHub gists…. Well, moving on!

Step 3: The license

Releasing your code with some common license is a good idea; While we are talking here about releasing your code as open source, you might want to allow reuse of your code while retaining copyright, or you might wish to have people extending your code to have to keep that derivative code open, and a license can help you do just that very easily.

For a pet project, where you don’t really care what other might do with, the MIT license is probably a great idea, but you can head on to choosealicense.com for some excellent advice on the subject by GitHub and the open source community.

Whatever license you choose, it is better than not choosing a license at all. Releasing code without a license is tantamount to not releasing it at all in many cases; if you don’t explicitly wave your rights to the code, most companies will not risk the possible legal implications of using it, and so many of its potential users will look at other options.

After choosing a license, create a LICENSE file in your repository (no file extension is needed), and fill it with the exact text of the chosen license.

Step 4: The setup file

We will now create the basic file Python packaging tools — in our case setuptools — rely on; setup.py . setup.py contains the actual instructions used when building and distributing the package.

Here’s a template you can start with (don’t worry, we’ll go over everything):

So, we start by importing setuptools . It is a very useful package meant to facilitate the easy distribution of Python packages, and even though it is not included in the standard library (unlike its less powerful alternative, distutils ) it is the de facto standard for Python package distribution today and what you should be using. We are only going to use two functions from the setuptools package: setup and find_pacakges .

After importing setuptools and before calling the setup() function, we’ll just read the content of our README.md file into a convenient global variable, README .

Now, all that’s left is to call the setuptools.setup() function with the following arguments:

author : Just provide your name.

: Just provide your name. author_email : Your email.

: Your email. name : The package name. “chocobo”, in this case.

: The package name. “chocobo”, in this case. license : The string “MIT”, in this case, or the name of any other license you chose.

: The string “MIT”, in this case, or the name of any other license you chose. description : A short, one line description of your package. For example, here: “chocobo is a python package for delicious chocobo recipes”.

: A short, one line description of your package. For example, here: “chocobo is a python package for delicious chocobo recipes”. version : A string denoting the current version of the package. I hope to discuss a slightly more elegant way to handle package versioning in a future post, but for now you can just start with a number you manually increase whenever you want to release a new version. It is common practice to prefix the version number with the letter v, so v1 is an appropriate version string for your first version, but I suggest you think of v0.0.1 as an equivalent version string and use this format. Again, more on what this means in a later installment.

: A string denoting the current version of the package. I hope to discuss a slightly more elegant way to handle package versioning in a future post, but for now you can just start with a number you manually increase whenever you want to release a new version. It is common practice to prefix the version number with the letter v, so is an appropriate version string for your first version, but I suggest you think of as an equivalent version string and use this format. Again, more on what this means in a later installment. long_description : The README content goes here. Indeed, it is quite literally a long description of the package. This is the content of the package’s page on PyPI (example).

: The README content goes here. Indeed, it is quite literally a long description of the package. This is the content of the package’s page on PyPI (example). url : A URL leading to the package’s homepage. Assuming you don’t have a dedicated site, your repository’s URL is a good target.

: A URL leading to the package’s homepage. Assuming you don’t have a dedicated site, your repository’s URL is a good target. packages : Aha, here we use setuptools for the second time! This parameter takes an array of the names of all packages to build and distribute / install (depending on the command called). Technically you can just put ["chocobo"] here, but it is better to generalize and use this setuptools function that can handle more complex package and repository structure. It takes two optional parameters, where and exclude , both of which we ignore here. The result is that where is set to the directory the setup file is located in, and that no sub-directory is excluded, which is good enough for most cases.

: Aha, here we use for the second time! This parameter takes an array of the names of all packages to build and distribute / install (depending on the command called). Technically you can just put here, but it is better to generalize and use this function that can handle more complex package and repository structure. It takes two optional parameters, and , both of which we ignore here. The result is that is set to the directory the setup file is located in, and that no sub-directory is excluded, which is good enough for most cases. python_requires : If your package supports all Python versions, you can discard this parameter. Since this is very much likely not the case, you should choose an appropriate value. Technically, I would not advertise support of a version I do not test against, but for now we can just safely assume that:

(1) If you’re working with Python 2 you’re working with Python 2.7 specifically, and (a) you are a rare and wonderful bird (b) you don’t need to support anything but python 2.7, so you can provide this parameter with the ">=2.7" string. Also, please move on to Python 3; the clock is literally ticking.

(2) If you’re working with Python 3 you’re supporting any Python version larger or equal than the version you’re using to develop the package. So ">=3.5" should be fine if you’re working with Python 3.5.

: If your package supports all Python versions, you can discard this parameter. Since this is very much likely not the case, you should choose an appropriate value. Technically, I would not advertise support of a version I do not test against, but for now we can just safely assume that: (1) If you’re working with Python 2 you’re working with Python 2.7 specifically, and (a) you are a rare and wonderful bird (b) you don’t need to support anything but python 2.7, so you can provide this parameter with the string. Also, please move on to Python 3; the clock is literally ticking. (2) If you’re working with Python 3 you’re supporting any Python version larger or equal than the version you’re using to develop the package. So should be fine if you’re working with Python 3.5. install_requires : All non-standard-library package dependencies go here. For example, if our chocobo package depends on both the requests package and the pytz package, we’ll provide this parameter with the following string array: ["requests", "pytz"] .

: All non-standard-library package dependencies go here. For example, if our package depends on both the package and the package, we’ll provide this parameter with the following string array: . classifiers : Your package will soon go up on PyPI, along with tens of thousands of other packages. To help categorize and distinguish then, package authors can provide PyPI with a list of trove classifiers to categorize each release, describing who it’s for, what systems it can run on, and how mature it is. These standardized classifiers can then be used by community members to find projects based on their desired criteria (although I’m not sure who actually does that 🦆). The list of all possible classifiers can be found here; I suggest you start with these:

- “Development Status :: 3 — Alpha”

- “License :: OSI Approved :: MIT License”

- “ Programming Language :: Python”

- “ Programming Language :: Python :: 3.5”

- “ Programming Language :: Python :: 3.6”

- “ Programming Language :: Python :: 3.7”

- “Topic :: Software Development :: Libraries”

- “Topic :: Software Development :: Libraries :: Python Modules”

- “Intended Audience :: Developers”

Alrighty, then! We’re done with that! ☃️

Figure 2: Ace Ventura after choosing his trove classifiers

Step 5: Building the distribution files

Python packages are built into build distribution files, which are then uploaded to a server — usually the global PyPI server — from which they can be downloaded by everybody.

I won’t go into detail about the different possible distribution formats (you can read more about them here and here), but we are going to use the standard approach and build two files: a source distribution file— which is basically an archive of the package’s code — and a wheel build distribution file.

First, make sure you install the latest versions of setuptools and wheel :

python3 -m pip install --user --upgrade setuptools wheel

Now, to build the distribution files simply run the following command in the root folder of the repository, where your setup.py is located:

python setup.py sdist bdist_wheel

What’s happening here is that we tell the Python interpreter to run the setup.py Python script, and send it two arguments, which order it to build both a source distribution file (the sdist argument) and a wheel build distribution file (the bdist_wheel argument).

Three folders will be created in the calling directory when this command is run: build , dist and chocobo.egg-info . All three should be ignored by your .gitignore file. If these directories already exist — e.g. from previous runs of this command — it might be a good idea to delete them by running rm -rf build dist , as any valid package file under dist will be uploaded.

The two files we’re going to upload are located in the dist folder: chocobo-0.0.3-py-none.any.whl (the build distribution; a wheel file) and chocobo-0.0.3.tar.gz (the source distribution; a gzipped tar archive). Make sure they were created successfully, and we’ll move on to uploading your package!

Step 6: Uploading your package

All that remains now is to upload you packages to the global PyPI server! To do that, however, you’ll have to register to the PyPI website. Follow the steps for registration and write down the username and password you have chosen.

If you’ll like to test package upload before uploading them to the global PyPI server, you can additionally register a user at the test PyPI website.

Now, the Python package we’ll use to upload our package will look for the PyPI username and password — to authenticate with the PyPI server — in a text file named .pypirc , commonly placed in your home folder. Create it, and populate it as demonstrated below (the testpypi section is optional):

[distutils]

index-servers =

pypi

testpypi [pypi]

username: teapot48

password: myPYPIpassword

repository:

username: teapot48

password: MYtestPYPIpassword [testpypi]repository: https://test.pypi.org/legacy/ username: teapot48password: MYtestPYPIpassword

We are going to follow the up-to-date and recommended¹ way to upload files to the PyPI server, and use twine , a utility for uploading Python packages, and not the old python setup.py upload method. Simply run:

twine upload dist/*

If you want to start by testing the upload process with the test PyPI server, just run twine upload — repository testpypi dist/* .

In any case, you should see one progress bar as your .whl file is uploaded, and a second when the .tar.gz archive is uploaded, after which the upload will be complete.

Congratulations! You can now go on and have a look at your brand new Python package page on the official PyPI website, for all to see! Here’s an example:

Figure 3: An example for a package page on the PyPI website

That’s it! Your’e done! People everywhere can now run pip install chocobo and celebrate the amazing creation that is your Python package! :) ☃️

Final words

I hope you have found this post useful in understanding how to build and distribute your first Python package. If you use it to package some code, please let me know by commenting here or getting in touch with me!

There’s also an official — and concise — tutorial for packaging Python projects which can be found here. I attempted to improve upon it by adding more details and some personal best practices I picked up while building a few small open source Python projects.

Finally, I hope to touch upon a few other related subjects in some future posts: writing good README files, basic testing for Python packages (I wrote a bit about it here) and package versioning. Cheers!

Footnotes