Today I want to talk a little bit about how to set up a Python package. Hopefully you are familiar with this already, but if not, let me give a quick overview. When building anything in Python that requires more than one file, you should always put it into a package with a directory structure following this pattern:

```
├── setup.py
├── requirements.txt
├── modname
│   ├── __init__.py
│   ├── foo.py
│   └── bar.py
```

This is important because it puts all of your code in the “modname” namespace, which ensures that the code you write in “modname” will not conflict with other Python packages on your system. You can think of the “modname” directory just like any other Python file; the only difference is that its code lives in the __init__.py file, and it can contain submodules. Typically, you expose your module's public API in the __init__.py file. This is done by simply importing anything from your submodules that you want to be public.
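As a rough sketch (the function names here are hypothetical), a `modname/__init__.py` that exposes a public API from its submodules might look like this:

```
# modname/__init__.py -- hypothetical example
# Re-export the names we want users to reach directly, so they can
# write ``from modname import func1`` instead of reaching into the
# submodules themselves.
from modname.foo import func1, func2
from modname.bar import bar_class
```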

With that said, let's move on to the content of this post.

Namespaces and Python’s Import Syntax

One of the reasons why Python is great is that its import system is succinct, yet highly explicit. Unlike C / C++, where including a header blindly dumps all definitions into the file, and it is up to the developer to (a) know where the header is, and (b) know what the header defines, Python imports references to specific modules or the exact variables the developer wants to have access to. The following example illustrates the importance of this. Consider a simple source file that imports two packages and calls some functions:

First in C

```c
#include "package1.h"
#include "package2.h"

int main(void) {
    func1();
    func2();
    return 0;
}
```

And now in Python

```python
import package1
from package2 import func2

def main():
    package1.func1()
    func2()
```

In C it is impossible to determine which package func1 and func2 belong to using only the information in this source file. In Python, however, it is explicit in both cases. We can tell func1 belongs to package1 because it is accessed as an attribute of that namespace, and we can tell that func2 belongs to package2 because it is explicitly imported from it. The purpose of this example isn't to complain about the syntax of C (which has its place). The purpose is to show how Python imports (or more generally, namespaces) are great for writing understandable code.

The Dreaded “Import *”

However, Python's import system is not perfect. There is a lurking interpreter feature that is the bane of linters, code reviewers, and static analysis tools. It is the dreaded from package import *. This feature of Python allows the developer to cast away the benefits and readability provided by namespaces in exchange for marginally fewer keystrokes to access the variables in their current short-term memory.
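A quick, runnable illustration of the readability cost: two standard library modules both define a function named sqrt, and star imports silently decide which one wins.

```python
# Both math and cmath define a function named ``sqrt``.
from math import *   # brings in math.sqrt (floats only)
from cmath import *  # silently replaces sqrt with cmath.sqrt

# A reader of this file cannot tell which sqrt is in scope without
# knowing the full contents of both modules. It is cmath's version:
print(sqrt(-1))  # 1j, whereas math.sqrt(-1) would raise ValueError
```

Swapping the order of the two imports changes the program's behavior without any visible change at the call site, which is exactly the kind of ambiguity explicit imports avoid.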

While most Python developers generally consider using the import * syntax to be bad practice, there are some cases where it saves a lot of keystrokes. Writing __init__.py files is one of those cases. If you have a lot of functions that you want to expose, you might be tempted to use an import * in your __init__.py. I wouldn't blame you if you did, but I'd like to propose an alternative solution: mkinit.

The `mkinit` package is a simple command line tool that auto-generates code roughly equivalent to using import *. It requires roughly the same amount of effort as typing the import * syntax, but the resulting __init__ files will have all of their attributes explicitly listed. This means that anyone reading (or statically analyzing) the file will know exactly what variables are available, and that is a powerful thing.

Autogenerating Explicit Imports

The mkinit module is a pip installable Python package. It is bundled with a command line tool that will autogenerate explicit __init__.py files. It does this by statically parsing all submodules in your package directory, inspecting the names of the public members, and generating the appropriate from package.submodule import func statements.

For instance, say we have a package that looks like this:

```
├── mypackage
│   ├── __init__.py
│   ├── foo.py
│   └── bar.py
```

Let's assume that foo.py contains two functions, func1 and func2, and that bar.py contains two classes, class1 and class2.

In this case if we ran mkinit giving it the path to mypackage as an argument, then it would generate the following text:

```python
from mypackage import foo
from mypackage import bar
from mypackage.foo import (func1, func2,)
from mypackage.bar import (class1, class2,)
__all__ = ['class1', 'class2', 'func1', 'func2', 'foo', 'bar']
```

This is the same as if you had written

```python
from mypackage import foo
from mypackage import bar
from mypackage.foo import *
from mypackage.bar import *
```

except that (a) you don't have to actually write any of it, (b) it generates the module imports as well as the attribute imports, and (c) it creates the standard __all__ variable for you. Note that, like import *, mkinit will never expose “protected” or “private” variables prefixed with an underscore. Furthermore, it will respect any __all__ variable defined in a submodule.
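You can check the underscore convention for plain import * yourself; this is the same rule mkinit follows. A small self-contained sketch that builds a throwaway module and star-imports from it:

```python
import sys
import types

# Build a throwaway module with one public and one "private" function.
demo = types.ModuleType("demo_mod")
exec(
    "def public_func():\n"
    "    return 1\n"
    "def _private_func():\n"
    "    return 2\n",
    demo.__dict__,
)
sys.modules["demo_mod"] = demo

# With no __all__ defined, ``import *`` skips underscore-prefixed names.
ns = {}
exec("from demo_mod import *", ns)
print("public_func" in ns)    # True
print("_private_func" in ns)  # False
```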

If the __init__ file in the target directory does not exist, then mkinit will simply create it and populate it with this text. However, if an __init__ file already exists, then mkinit will try to insert the autogenerated text in an intelligent manner. Sometimes it is desirable to have code in an __init__ file beyond exposing child variables. For instance, perhaps we want to include a __version__ = '1.0.0' in our __init__ file. How do we ensure mkinit does not clobber our code? The safest way is to insert the XML-like comments # <AUTOGEN_INIT> and # </AUTOGEN_INIT>. If these tags are detected, mkinit will insert the autogenerated text between them (at their current indentation level). However, if you don't specify these (rather ugly) tags, all is not lost. The program will never clobber a variable definition. Also, if there are any comments in the code, then it will only consider clobbering code that comes after the final comment. This means that it is very easy to write custom __init__ logic and still autogenerate the code that exposes your module's API.
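For instance, an `__init__.py` using the tags might look like this (the version string and imports are just illustrative):

```
__version__ = '1.0.0'  # hand-written code, never touched by mkinit

# <AUTOGEN_INIT>
from mypackage import foo
from mypackage.foo import (func1, func2,)
# </AUTOGEN_INIT>
```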

Customizing mkinit behavior

The behavior of mkinit is fairly customizable. Passing the --noattrs flag generates only the module import lines; likewise, passing --nomods generates only the from mod import (attr,) lines. The --noall flag prevents the __all__ variable from being generated. Other options include --relative, which makes imports use dots instead of the full package name, and --ignore_all, which causes mkinit to expose all public attributes in subpackages even if they contain an __all__ variable that does not specify those attributes. While we are talking about flags, note that there is also a --dry flag, which dumps the autogenerated text to stdout without overwriting any files.

You can also customize which packages are exposed by defining a __submodules__ attribute in the __init__.py file. You can explicitly set this to a list of submodule names that you want to expose. Any submodule not in this list will be ignored.

Lastly, because mkinit autogenerates text, if you mostly like what it generates, but want to change something, then you can just do it manually. It only generates the text when you run the command, so you can use it to get you most of the way there and then clean it up as you see fit.

Limitations

Hopefully I've shown that mkinit is a powerful tool that can serve as a reasonable alternative in one of the only situations where using an import * might be considered somewhat acceptable. However, the behaviors are not exactly equivalent.

First, because mkinit runs statically, it can't pick up any definitions that are created dynamically at runtime. However, if you are populating the members of a module dynamically like this, you may want to reconsider your design, as it makes reading the code much more challenging for an outside developer.
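For example, a module that builds its members at runtime gives a static parser nothing to find. The color-accessor functions here are a made-up illustration:

```python
# Nothing below is visible to static analysis: the function names
# only exist after this loop runs, so a static tool like mkinit
# cannot list them.
for _name in ['red', 'green', 'blue']:
    def _maker(color=_name):
        return color
    globals()['get_' + _name] = _maker

print(get_red())  # 'red' -- but no static tool knew get_red existed
```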

Second, mkinit will only pick up definitions made at the top level of the code. Functions, classes, and variables defined inside if-else, try-except, or for statements might not be picked up, but mkinit does contain some fancy logic to try to handle these. It tries to detect variables defined in all branches of a conditional statement, and it exposes those. I've found this logic to be fairly robust in most cases, but it's possible there are cases it misses.
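To make the all-branches rule concrete, here is a sketch of the two situations (the variable names are hypothetical): a name defined in every branch of a conditional always exists and is safe to export, while a name defined in only one branch may not exist at runtime.

```python
import sys

# Defined in all branches: a static analyzer can safely conclude
# this name always exists, so it can be exported.
if sys.maxsize > 2**32:
    WORD_SIZE = 64
else:
    WORD_SIZE = 32

# Defined in only one branch: the name may be missing at runtime,
# so a conservative static analyzer should skip it.
if sys.platform.startswith('win32'):
    only_on_windows = True

print(WORD_SIZE)
```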

Lastly, mkinit (currently) has absolutely no logic for handling a del statement. As such, the current version (0.0.4) will export a deleted variable if it is defined anywhere at the top level. Future versions may take steps to mitigate this issue, but because usage of del is somewhat uncommon, it's not high on my priority list.

Note that there is a dynamic backend to mkinit that more closely resembles the import * behavior; however, (1) it's not exposed in the command line tool by default, and (2) using it can lead to invalid __init__ files when a submodule defines a variable name conditionally (e.g. if sys.platform.startswith('win32'): only_on_windows = True). Thus, it is generally safer to use the static backend, because it will only export variables guaranteed to be defined on all systems (assuming no del statements).

Summary

If you are building a package and are either (a) about to spend a whole lot of time manually writing imports for your __init__ file, or (b) about to use import * to save yourself some time, then stop and consider mkinit. Just

```
pip install mkinit
cd /path/to/my/repo/pkgname
mkinit .
```

And let mkinit generate readable, explicit, and statically analyzable code for you. If you don't like what you get, then go back to however you were going to do it. If you do like what you get, then great! That means I've accomplished the goal I set when making this tool: making the lives of a handful of developers just a little bit easier.

The code for mkinit is open source and on GitHub. Contributions are welcome. The link is: https://github.com/Erotemic/mkinit