Recently, I have been playing with some ideas about applying static analysis to Python and building a Python editor in JetBrains MPS.

To do any of this, I would first need to build a model of Python code. We have recently seen how to parse Python code; however, we still need to consider all the packages our code uses. Some of them could be built-in or implemented as C extensions, which means we do not have Python source code for them. In this post, I look into retrieving a list of all modules and then inspecting their contents.

My strategy is to write Python scripts that use reflection. I will then invoke those scripts from inside JetBrains MPS (and therefore from Java code), but that is the topic of a future post.

Listing Modules

Listing the top-level modules is relatively easy once you know how to do it. This script prints the name of every top-level module that pkgutil can find on sys.path:

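```python
import pkgutil

# iter_modules() yields one (finder, name, ispkg) tuple per
# top-level module or package found on sys.path
for p in pkgutil.iter_modules():
    print(p[1])
```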
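One caveat: pkgutil.iter_modules() only reports what it finds on sys.path, so modules compiled into the interpreter (such as sys) are missing from that list; CPython exposes their names in sys.builtin_module_names. A minimal Python 3.6+ sketch that merges both sources (list_top_level_modules is a hypothetical helper name, not part of any library):

```python
import pkgutil
import sys


def list_top_level_modules():
    """Return a sorted list of importable top-level module names."""
    # Modules and packages found on sys.path (source files, packages,
    # C extension files, zip archives, ...).
    names = {info.name for info in pkgutil.iter_modules()}
    # Modules compiled into the interpreter itself (e.g. sys) have no
    # file on sys.path, so iter_modules() never reports them.
    names.update(sys.builtin_module_names)
    return sorted(names)


if __name__ == "__main__":
    for name in list_top_level_modules():
        print(name)
```

Using a set also removes duplicates when the same name appears in more than one sys.path entry.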

Next, we need to look inside packages to find their submodules. For performance reasons, I want to do this only when necessary, so the script explores a single package whose name is passed on the command line:

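```python
import pkgutil
import sys


def explore_package(module_name):
    # In Python 2, get_loader returns a pkgutil.ImpLoader whose
    # .filename is the package directory
    loader = pkgutil.get_loader(module_name)
    # iter_modules lists only the direct children of that directory;
    # the recursion below takes care of nested packages
    for _, sub_module_name, is_pkg in pkgutil.iter_modules([loader.filename]):
        qname = module_name + "." + sub_module_name
        print(qname)
        if is_pkg:
            explore_package(qname)


explore_package(sys.argv[1])
```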

For example, for xml I get:

```
xml.dom
xml.dom.NodeFilter
xml.dom.domreg
xml.dom.expatbuilder
xml.dom.minicompat
xml.dom.minidom
xml.dom.pulldom
xml.dom.xmlbuilder
xml.etree
xml.etree.ElementInclude
xml.etree.ElementPath
xml.etree.ElementTree
xml.etree.cElementTree
xml.parsers
xml.parsers.expat
xml.sax
xml.sax._exceptions
xml.sax.expatreader
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader
```
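On Python 3 the loaders returned by pkgutil.get_loader no longer carry the filename attribute used in the package explorer, and get_loader itself is deprecated in recent releases. A hedged Python 3.6+ sketch of the same recursive exploration, assuming importlib.util.find_spec as the replacement: a package's spec lists its directories in submodule_search_locations, while plain modules have None there.

```python
import importlib.util
import pkgutil
import sys


def explore_package(module_name):
    """Return the dotted names of all submodules below module_name."""
    # find_spec() imports the parent packages of a dotted name (but not
    # the module itself). Plain modules have no submodule_search_locations.
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.submodule_search_locations is None:
        return []
    names = []
    # iter_modules() lists only the direct children of the given
    # directories; the recursion handles nested packages.
    for info in pkgutil.iter_modules(spec.submodule_search_locations):
        qname = module_name + "." + info.name
        names.append(qname)
        if info.ispkg:
            names.extend(explore_package(qname))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in explore_package(sys.argv[1]):
        print(name)
```

Returning a list rather than printing makes the result easier to consume from another tool; the exact set of names depends on the Python version installed.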

Examining Module Contents and Recognizing Functions

Now, given a module, I need to list all of its contents. I can import the module by name, iterate over its attributes with dir(), and print information about each element found.

I want to distinguish between classes, submodules (which I will ignore for now), functions, and simple values.
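A minimal Python 3 sketch of just this classification step, returning a dictionary instead of printing. The function name classify_members is hypothetical, and the category labels simply mirror the ones printed by the script in this section:

```python
import importlib
import inspect


def classify_members(module_name):
    """Map each name in a module to 'class', 'module', 'builtin_function',
    'function' or 'value'."""
    module = importlib.import_module(module_name)
    kinds = {}
    for name in dir(module):
        element = getattr(module, name)
        if inspect.isclass(element):
            kinds[name] = "class"
        elif inspect.ismodule(element):
            kinds[name] = "module"
        elif callable(element):
            # isbuiltin() is True for functions implemented in C
            kinds[name] = "builtin_function" if inspect.isbuiltin(element) else "function"
        else:
            kinds[name] = "value"
    return kinds
```

The order of the checks matters: classes are callable too, so they must be recognized before the generic callable() test.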

Built-in functions need to be treated differently: inspect.getargspec cannot handle them, so to get their parameters I have to parse their docstrings. Not cool... not cool at all.
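On Python 3 the situation is somewhat better: many C functions carry a __text_signature__ generated by Argument Clinic, which inspect.signature() can read. A hedged sketch that tries inspect.signature() first and only falls back to docstring parsing; describe_callable and parse_doc_args are hypothetical names, and the fallback assumes docstrings that start with name(args):

```python
import inspect


def parse_doc_args(docstring, name):
    """Fallback: extract 'args' from a docstring starting 'name(args) -> ...'."""
    first_paragraph = docstring.split("\n\n")[0]
    start = first_paragraph.find(name + "(")
    if start == -1:
        return None
    start += len(name) + 1
    end = first_paragraph.find(")", start)
    if end == -1:
        return None
    return first_paragraph[start:end]


def describe_callable(obj):
    """Return the argument list of obj as text, or None if it is unknown."""
    try:
        sig = inspect.signature(obj)
    except (TypeError, ValueError):
        # No introspectable signature: fall back to the docstring.
        return parse_doc_args(obj.__doc__ or "", getattr(obj, "__name__", ""))
    return ", ".join(str(param) for param in sig.parameters.values())
```

For example, describe_callable(len) gives "obj", read from the C function's text signature rather than from its docstring.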

```python
import importlib
import inspect
import sys


def describe_builtin(obj):
    """ Describe a builtin function """
    # Built-in functions cannot be inspected by
    # inspect.getargspec. We have to try and parse
    # the __doc__ attribute of the function.
    docstr = obj.__doc__
    args = ''
    if docstr:
        # The first paragraph usually holds the signature,
        # e.g. "chown(path, uid, gid)"
        func_descr = docstr.split('\n\n')[0]
        s = func_descr.replace(obj.__name__, '')
        idx1 = s.find('(')
        idx2 = s.find(')', idx1)
        if idx1 != -1 and idx2 != -1 and (idx2 > idx1 + 1):
            args = s[idx1 + 1:idx2]
    return args


package_name = sys.argv[1].strip()
mymodule = importlib.import_module(package_name)
for element_name in dir(mymodule):
    element = getattr(mymodule, element_name)
    if inspect.isclass(element):
        print("class %s" % element_name)
    elif inspect.ismodule(element):
        pass  # submodules are ignored for now
    elif callable(element):
        if inspect.isbuiltin(element):
            data = describe_builtin(element)
            # Normalize "fd [, mode='r' [, bufsize]]" to "fd [mode='r' [bufsize]]"
            data = data.replace("[", " [")
            data = data.replace("  [", " [")
            data = data.replace(" [, ", " [")
            data = data.replace(", ", " ").strip()
            # Separate the name from its arguments with a space
            print(("builtin_function %s %s" % (element_name, data)).rstrip())
        else:
            # inspect.getargspec is the Python 2 API (removed in Python 3.11)
            try:
                spec = inspect.getargspec(element)
            except TypeError:
                continue  # callable, but not a plain Python function
            parts = ["function", element_name] + list(spec.args)
            if spec.varargs:
                parts.append("*" + spec.varargs)
            print(" ".join(parts))
    else:
        print("value %s" % element_name)
```

This is what I get for the module os (running under Python 2):

```
value EX_CANTCREAT
value EX_CONFIG
value EX_DATAERR
value EX_IOERR
value EX_NOHOST
value EX_NOINPUT
value EX_NOPERM
value EX_NOUSER
value EX_OK
value EX_OSERR
value EX_OSFILE
value EX_PROTOCOL
value EX_SOFTWARE
value EX_TEMPFAIL
value EX_UNAVAILABLE
value EX_USAGE
value F_OK
value NGROUPS_MAX
value O_APPEND
value O_ASYNC
value O_CREAT
value O_DIRECT
value O_DIRECTORY
value O_DSYNC
value O_EXCL
value O_LARGEFILE
value O_NDELAY
value O_NOATIME
value O_NOCTTY
value O_NOFOLLOW
value O_NONBLOCK
value O_RDONLY
value O_RDWR
value O_RSYNC
value O_SYNC
value O_TRUNC
value O_WRONLY
value P_NOWAIT
value P_NOWAITO
value P_WAIT
value R_OK
value SEEK_CUR
value SEEK_END
value SEEK_SET
value ST_APPEND
value ST_MANDLOCK
value ST_NOATIME
value ST_NODEV
value ST_NODIRATIME
value ST_NOEXEC
value ST_NOSUID
value ST_RDONLY
value ST_RELATIME
value ST_SYNCHRONOUS
value ST_WRITE
value TMP_MAX
value WCONTINUED
builtin_function WCOREDUMP status
builtin_function WEXITSTATUS status
builtin_function WIFCONTINUED status
builtin_function WIFEXITED status
builtin_function WIFSIGNALED status
builtin_function WIFSTOPPED status
value WNOHANG
builtin_function WSTOPSIG status
builtin_function WTERMSIG status
value WUNTRACED
value W_OK
value X_OK
class _Environ
value __all__
value __builtins__
value __doc__
value __file__
value __name__
value __package__
function _execvpe file args env
builtin_function _exit status
function _get_exports_list module
function _make_stat_result tup dict
function _make_statvfs_result tup dict
function _pickle_stat_result sr
function _pickle_statvfs_result sr
function _spawnvef mode file args env func
builtin_function abort
builtin_function access path mode
value altsep
builtin_function chdir path
builtin_function chmod path mode
builtin_function chown path uid gid
builtin_function chroot path
builtin_function close fd
builtin_function closerange fd_low fd_high
builtin_function confstr name
value confstr_names
builtin_function ctermid
value curdir
value defpath
value devnull
builtin_function dup fd
builtin_function dup2 old_fd new_fd
value environ
class error
function execl file *args
function execle file *args
function execlp file *args
function execlpe file *args
builtin_function execv path args
builtin_function execve path args env
function execvp file args
function execvpe file args env
value extsep
builtin_function fchdir fildes
builtin_function fchmod fd mode
builtin_function fchown fd uid gid
builtin_function fdatasync fildes
builtin_function fdopen fd [mode='r' [bufsize]]
builtin_function fork
builtin_function forkpty
builtin_function fpathconf fd name
builtin_function fstat fd
builtin_function fstatvfs fd
builtin_function fsync fildes
builtin_function ftruncate fd length
builtin_function getcwd
builtin_function getcwdu
builtin_function getegid
function getenv key default
builtin_function geteuid
builtin_function getgid
builtin_function getgroups
builtin_function getloadavg
builtin_function getlogin
builtin_function getpgid pid
builtin_function getpgrp
builtin_function getpid
builtin_function getppid
builtin_function getresgid
builtin_function getresuid
builtin_function getsid pid
builtin_function getuid
builtin_function initgroups username gid
builtin_function isatty fd
builtin_function kill pid sig
builtin_function killpg pgid sig
builtin_function lchown path uid gid
value linesep
builtin_function link src dst
builtin_function listdir path
builtin_function lseek fd pos how
builtin_function lstat path
builtin_function major device
builtin_function makedev major minor
function makedirs name mode
builtin_function minor device
builtin_function mkdir path [mode=0777]
builtin_function mkfifo filename [mode=0666]
builtin_function mknod filename [mode=0600 device]
value name
builtin_function nice inc
builtin_function open filename flag [mode=0777]
builtin_function openpty
value pardir
builtin_function pathconf path name
value pathconf_names
value pathsep
builtin_function pipe
builtin_function popen command [mode='r' [bufsize]]
function popen2 cmd mode bufsize
function popen3 cmd mode bufsize
function popen4 cmd mode bufsize
builtin_function putenv key value
builtin_function read fd buffersize
builtin_function readlink path
builtin_function remove path
function removedirs name
builtin_function rename old new
function renames old new
builtin_function rmdir path
value sep
builtin_function setegid gid
builtin_function seteuid uid
builtin_function setgid gid
builtin_function setgroups list
builtin_function setpgid pid pgrp
builtin_function setpgrp
builtin_function setregid rgid egid
builtin_function setresgid rgid egid sgid
builtin_function setresuid ruid euid suid
builtin_function setreuid ruid euid
builtin_function setsid
builtin_function setuid uid
function spawnl mode file *args
function spawnle mode file *args
function spawnlp mode file *args
function spawnlpe mode file *args
function spawnv mode file args
function spawnve mode file args env
function spawnvp mode file args
function spawnvpe mode file args env
builtin_function stat path
builtin_function stat_float_times [newval]
class stat_result
builtin_function statvfs path
class statvfs_result
builtin_function strerror code
builtin_function symlink src dst
builtin_function sysconf name
value sysconf_names
builtin_function system command
builtin_function tcgetpgrp fd
builtin_function tcsetpgrp fd pgid
builtin_function tempnam [dir [prefix]]
builtin_function times
builtin_function tmpfile
builtin_function tmpnam
builtin_function ttyname fd
builtin_function umask new_mask
builtin_function uname
builtin_function unlink path
builtin_function unsetenv key
builtin_function urandom n
builtin_function utime path (atime mtime
builtin_function wait
builtin_function wait3 options
builtin_function wait4 pid options
builtin_function waitpid pid options
function walk top topdown onerror followlinks
builtin_function write fd strin
```

Of course, for functions I want to build a model of their interface: which parameters they take, which ones are optional, which ones are variadic, and so on. The information is already there; it is just a matter of transforming it into a representable form.
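A sketch of that transformation for the line format printed above, under my own assumption that a parameter is optional when it appears inside square brackets or carries a default value. For plain Python functions the script prints only parameter names, so defaults (for example those of walk) are not visible in this format; inspect.signature() would be a better source there. Param, FunctionModel, make_param and parse_line are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    name: str
    default: Optional[str] = None
    optional: bool = False
    variadic: bool = False


@dataclass
class FunctionModel:
    kind: str  # "function" or "builtin_function"
    name: str
    params: List[Param] = field(default_factory=list)


def make_param(token, inside_brackets):
    """Turn a token such as "*args" or "mode='r'" into a Param."""
    variadic = token.startswith("*")
    name = token.lstrip("*")
    default = None
    if "=" in name:
        name, default = name.split("=", 1)
    return Param(name, default, inside_brackets or default is not None, variadic)


def parse_line(line):
    """Parse e.g. "builtin_function fdopen fd [mode='r' [bufsize]]"."""
    kind, name, *rest = line.split(" ", 2)
    arg_text = rest[0] if rest else ""
    params = []
    depth = 0  # how many "[" are currently open
    token = ""
    # A trailing space flushes the last token.
    for ch in arg_text + " ":
        if ch in "[] ":
            if token:
                params.append(make_param(token, depth > 0))
                token = ""
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
        else:
            token += ch
    return FunctionModel(kind, name, params)
```

Only the bracket convention is handled; docstring quirks such as the unbalanced parenthesis in the utime line are kept as part of the token.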

Conclusions

I still need to build a model of the imported classes, but I am starting to have a decent model of the elements that can be imported into my Python code. This would let me easily verify which import statements are valid. Of course, this can be combined with virtualenvs and requirements files: given a list of requirements, I would install them in a virtualenv and build the model of the modules available there. I could then statically verify which imports would work in that context.
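The import check could look roughly like this: a sketch using the standard ast module, assuming the set of known module names has already been collected (for example with the listing scripts above, run inside the target virtualenv). It only checks the module part of each import and skips relative imports; unresolved_imports is a hypothetical name.

```python
import ast


def unresolved_imports(source, known_modules):
    """Return imported module names that are not in known_modules."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b, c  ->  ["a.b", "c"]
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # from a.b import c  ->  ["a.b"] (c may be a submodule or an attribute)
            names = [node.module]
        else:
            continue  # not an import, or a relative import
        missing.extend(name for name in names if name not in known_modules)
    return missing


if __name__ == "__main__":
    known = {"os", "xml", "xml.dom", "xml.dom.minidom"}
    code = "import os\nimport notamodule\nfrom xml.dom import minidom\n"
    print(unresolved_imports(code, known))  # ['notamodule']
```

Checking whether the imported name in a from-import is a submodule or an attribute would need the per-module contents model described in this post.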