In a large Python program such as IMVU, startup time is dominated by Python module imports. Take these warm timings:

$ time python -c 'None' real 0m0.096s user 0m0.077s sys 0m0.061s $ time python -c 'import urllib2' real 0m0.314s user 0m0.155s sys 0m0.186s

That's 300ms for a single basic dependency. Importing the entire IMVU client takes 1.5s warm and 20s cold on a typical user's machine.

The IMVU client's loading progress bar imports modules bottom-up; that is, leaf modules are imported before their parents. The root module is imported last.

Implementing a bottom-up import sequence requires generating a graph of dependencies between modules:

def get_dependencies(module_name): """\ Takes a module name as input (e.g. 'xml.dom') and returns a set of (lhs, rhs) tuples where lhs and rhs are module names and lhs imports rhs. """ # module_from_key is a dict from a module key, an arbitrary # object, to a module object. While importing, we discover # dependencies before we have access to the actual module objects. # import_dependencies is a list of (lhs, rhs) tuples where lhs and # rhs are module keys, and module_from_key[lhs] imported # module_from_key[rhs]. root_key = object() module_from_key = {root_key: __main__} import_dependencies = [] stack = [root_key] def import_in_stack(key, name, globals, locals, fromlist, level): stack.append(key) try: return original_import(name, globals, locals, fromlist, level) finally: stack.pop() import __builtin__ original_import = __builtin__.__import__ def my_import(name, globals=globals(), locals=locals(), fromlist=[], level=-1): # fromlist is a whore. Most of the complexity in this # function stems from fromlist's semantics. See # http://docs.python.org/library/functions.html#__import__ # If a module imports 'xml.dom', then the module depends on # both 'xml' and 'xml.dom' modules. dotted = name.split('.') for i in range(1, len(dotted)): my_import('.'.join(dotted[0:i]), globals, locals, [], level) module_key = object() parent_key = stack[-1] def add_dependency_from_parent(key, m): module_from_key[key] = m import_dependencies.append((parent_key, key)) submodule = import_in_stack(module_key, name, globals, locals, ['__name__'], level) add_dependency_from_parent(module_key, submodule) for f in (fromlist or []): from_key = object() module = import_in_stack(from_key, name, globals, locals, [f], level) if f == '*': continue submodule = getattr(module, f) if isinstance(submodule, types.ModuleType): add_dependency_from_parent(from_key, submodule) return original_import(name, globals, locals, fromlist, level) # Import module_name with import hook enabled. original_modules = sys.modules.copy() __builtin__.__import__ = my_import try: my_import(module_name) finally: __builtin__.__import__ = original_import sys.modules.clear() sys.modules.update(original_modules) assert stack == [root_key], stack return sorted(set( (module_from_key[lhs].__name__, module_from_key[rhs].__name__) for lhs, rhs in import_dependencies ))

(You can view all of the code at SourceForge).

First, we install an __import__ hook that discovers import dependencies between modules, and convert those relationships into a directed graph:

Then, we merge cycles. If module A imports B, B imports C, and C imports A, then it doesn't matter which you import first. Importing A is equivalent to importing B or C. After this step, we have a DAG:

Finally, we can postorder traverse the DAG to determine a good import sequence and cost (approximated as the number of modules in the cycle) per import:

1 xml 3 xml.dom 1 copy_reg 1 types 1 copy 1 xml.dom.NodeFilter 1 xml.dom.xmlbuilder 1 xml.dom.minidom 1 __main__

Now let's look at some less-trivial examples. urllib2:

SimpleXMLRPCServer:

imvu.task:

Final notes: Other people have solved this problem with bytecode scanning, but we wanted to know the actual modules imported for an accurate progress bar. A simpler __import__ hook could have calculate the correct import sequence, but I find a visual representation of module dependencies to have additional benefits.