In making the v1.0.0 release of git-pandas, I had one primary goal: to simplify and solidify the interface to git-pandas objects (the ProjectDirectory and the Repository). At the end of the day, what makes a project like git-pandas more useful than one-off analysis or rolling your own interface is a consistent, predictable interface to commonly used functions.

So with that in mind, I was interested in the various input parameters for the functions. What I wanted to avoid was something like:

    df = repo.functionA(file_extension='py')
    df = repo.functionB(file_ext='py')

So to quickly get an idea of where things stood, I looked to the inspect module in the Python standard library. With it, we can load git-pandas into memory, find all of the classes in it, and get a dictionary of the arguments to each function. I've left this script in the git-pandas repo if you'd like to use it, but let's dig into it here:

The first step is to extract the objects (classes and functions are really what we are interested in here) from the module. In our case, we are just looking at those objects directly importable via "from gitpandas import foo".

    import inspect

    def extract_objects(m, classes=True, functions=False):
        # collect the classes and/or functions defined at this level of the module
        out = {}
        if classes:
            m_dict = {k: v for k, v in m.__dict__.items() if inspect.isclass(v)}
            out.update(m_dict)
        if functions:
            m_dict = {k: v for k, v in m.__dict__.items() if inspect.isfunction(v)}
            out.update(m_dict)
        return out

Here we use the module's __dict__ attribute to iterate through the objects stored in it, check whether each one is a class or a function, and collect the matches in a dictionary for further analysis.
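To see that filtering in isolation, here is a minimal sketch that runs without git-pandas installed. The demo module and the Repository and helper names in it are hypothetical stand-ins built just for this example:

```python
import inspect
import types

# A throwaway module object (hypothetical), standing in for a real package.
demo = types.ModuleType("demo")

class Repository:
    def blame(self, extensions=None, rev="HEAD"):
        pass

def helper(limit=None):
    pass

# Attach a class, a function, and a plain constant to the module.
demo.Repository = Repository
demo.helper = helper
demo.VERSION = "1.0.0"

# Same idea as extract_objects: filter the module's __dict__ by object type.
classes = {k: v for k, v in demo.__dict__.items() if inspect.isclass(v)}
functions = {k: v for k, v in demo.__dict__.items() if inspect.isfunction(v)}

print(sorted(classes))    # ['Repository']
print(sorted(functions))  # ['helper']
```

The VERSION string is skipped by both filters, which is why constants and other module attributes never reach the later steps.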

Next we need to find the arguments for each function, or function in a class:

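    def get_signatures(m, remove_self=True):
        excludes = ['self'] if remove_self else []
        out = {}
        for key, obj in m.items():
            if inspect.isclass(obj):
                # class: record each method under the name Class.method
                for k, v in obj.__dict__.items():
                    try:
                        # getfullargspec replaces getargspec, which was removed in Python 3.11
                        args = inspect.getfullargspec(v).args
                    except TypeError:
                        # not an inspectable callable (e.g. a docstring or constant)
                        continue
                    out[str(key) + '.' + k] = [x for x in args if x not in excludes]
            else:
                # module-level function
                args = inspect.getfullargspec(obj).args
                out[key] = [x for x in args if x not in excludes]
        return out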

To denote methods of a class, we use class.method notation. Optionally, we can exclude the 'self' parameter, which by convention names the instance in methods.
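As a side note, inspect.getargspec was removed in Python 3.11, and inspect.signature is the currently documented way to do the same lookup. Here is a minimal sketch of reading one method's parameter names that way. The Repository class and its commit_history method are hypothetical stand-ins here, not the real git-pandas definitions:

```python
import inspect

class Repository:  # hypothetical stand-in class
    def commit_history(self, branch="master", limit=None, extensions=None):
        pass

# Look the method up through the class __dict__, as the script does.
func = Repository.__dict__["commit_history"]

# signature().parameters is an ordered mapping of parameter name -> Parameter.
params = [name for name in inspect.signature(func).parameters if name != "self"]

key = "Repository" + "." + "commit_history"  # class.method notation
print(key, params)
# Repository.commit_history ['branch', 'limit', 'extensions']
```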

Finally, we can take this dictionary of functions and arguments, and find the unique set of arguments for the module.

    def get_distinct_params(m):
        out = set()
        for k in m.keys():
            out.update(m[k])
        return out

So pulling these three together, we can just do:

    import gitpandas as module  # the module whose interface we want to inspect

    sigs = get_signatures(extract_objects(module))
    print(get_distinct_params(sigs))

And find out that git-pandas has only a handful of possible arguments:

extensions: a list of file extensions to analyze
by: a categorical option for how to aggregate or pivot a dataframe (e.g.: by author or by project)
branch: the git branch to analyze
limit: a max number of rows to return
verbose: whether or not to log out detailed information
filename: a specific file to analyze
committer: a boolean for whether to perform analysis on the committer (as opposed to author)
working_dir: the directory your repository or repositories are in
ignore_dir: a list of directories to ignore in the analysis
coverage: a boolean for whether or not to include coverage data in a resultset
skip: an integer of rows to skip in the return set (so limit 10, skip 2 would return rows 0, 2, 4, 6, ... and 18)
num_datapoints: a total number of datapoints (evenly spaced across the whole dataset) to return
normalize: boolean for whether to return normalized or absolute values
ignore_repos: which repositories to ignore when assembling a ProjectDirectory
days: the number of days of data to return (since now)
rev: the specific revision to analyze
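The way limit and skip interact can be easy to misread, so here is a small illustration of the behavior described above using plain list slicing. This is only a sketch of the semantics, with a hypothetical helper name, and not the git-pandas implementation:

```python
def apply_skip_and_limit(rows, limit=None, skip=None):
    """Hypothetical helper: keep every skip-th row, then at most limit rows."""
    if skip:
        rows = rows[::skip]   # take every skip-th row, starting from row 0
    if limit:
        rows = rows[:limit]   # cap the number of rows returned
    return rows

rows = list(range(100))  # stand-in for the rows of a result set

print(apply_skip_and_limit(rows, limit=10, skip=2))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```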

Going forward, the aim will continue to be keeping this list short and logical while growing the functionality.
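One way to help keep such a list consistent is to flag near-duplicate names automatically, like the file_extension and file_ext pair from the start of this post. The sketch below uses difflib from the standard library. The near_duplicates helper and the 0.7 cutoff are assumptions made for illustration and are not part of git-pandas:

```python
import difflib
from itertools import combinations

def near_duplicates(params, cutoff=0.7):
    """Hypothetical helper: return pairs of parameter names that look alike."""
    pairs = []
    for a, b in combinations(sorted(params), 2):
        # ratio() is a similarity score between 0.0 and 1.0
        if difflib.SequenceMatcher(None, a, b).ratio() >= cutoff:
            pairs.append((a, b))
    return pairs

# A made-up parameter set containing one inconsistent pair.
params = {"file_extension", "file_ext", "branch", "limit"}
print(near_duplicates(params))
```

The cutoff would need tuning against a real parameter set: too low and unrelated names get flagged, too high and abbreviations slip through.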

So check out the new release on PyPI, or at the source:

https://github.com/wdm0006/git-pandas