The solution is parfit!

Introducing: parfit, a new package for hyper-parameter optimization, which (using parallel processing) allows the user to perform an exhaustive grid search on a model. This package has the following advantages:

Validate: Flexibly specify the validation set to score on

Score: Flexibly choose the scoring metric

Visualize: Optionally plot the scores over the grid of hyper-parameters entered

Optimize: Automatically return the best model, associated hyper-parameters, and score of that model

Easy to use: Do all of this with only one function call
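The idea behind that one function call is conceptually simple: expand the parameter grid into every combination, fit and score one model per combination on a fixed validation set, and keep the best. Here is a minimal pure-Python sketch of that idea (not parfit's actual implementation; the names expand_grid, best_fit, and the toy scoring function are made up for illustration):

```python
from itertools import product

def expand_grid(grid):
    """Yield every combination of parameter values, like sklearn's ParameterGrid."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def best_fit(fit_and_score, grid):
    """Evaluate one model per parameter combination; return the best (params, score).

    `fit_and_score` stands in for "instantiate the model with these parameters,
    fit it on the training set, and score it on the validation set".
    """
    results = [(params, fit_and_score(params)) for params in expand_grid(grid)]
    return max(results, key=lambda pair: pair[1])

# Toy stand-in for a real train/validate step: the score peaks at min_samples_leaf=100.
toy_score = lambda p: -abs(p["min_samples_leaf"] - 100)

params, score = best_fit(toy_score, {"min_samples_leaf": [1, 10, 100, 200],
                                     "max_features": [0.4]})
# params -> {'max_features': 0.4, 'min_samples_leaf': 100}, score -> 0
```

parfit's contribution is doing exactly this loop in parallel across the grid, with scoring and plotting built in.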

Sounds great, how do I use it?

To install the package, enter the following command in your terminal:

pip install parfit

Then, in your Jupyter notebook (or python program) run the following line to import the module:

import parfit.parfit as pf

Now, you are ready to rock and roll! In just a couple of lines of code, you can find the best set of hyper-parameters for your validation set. Here is an example of how to perform an exhaustive search over a parameter grid for RandomForestClassifier. Notice below that I have only specified the RandomForestClassifier class, not instantiated the object with parentheses ().

import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

paramGrid = ParameterGrid({
    'min_samples_leaf': [1, 3, 5, 10, 15, 25, 50, 100, 125, 150, 175, 200],
    'max_features': ['sqrt', 'log2', 0.4, 0.5, 0.6, 0.7],
    'n_estimators': [60],
    'n_jobs': [-1],
    'random_state': [42]
})

best_model, best_score, all_models, all_scores = pf.bestFit(RandomForestClassifier, paramGrid,
    X_train, y_train, X_val, y_val,
    metric=roc_auc_score, bestScore='max', scoreLabel='AUC')

print(best_model)
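The call above assumes X_train, y_train, X_val, and y_val are already defined. The original example does not show that step, but one common way to produce them is a simple hold-out split; the synthetic data set below is purely a stand-in for your own features and labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data purely for illustration;
# substitute your own feature matrix and label vector here.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# Hold out 20% of the rows as the validation set that parfit will score on.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

Because parfit scores on this fixed validation set rather than cross-validating, each parameter combination requires only one fit, which is what makes the parallel exhaustive search fast.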

From inspecting the resulting grid of scores, we can see that on these data the highest-scoring models used max_features = 0.4 or 0.6 and min_samples_leaf = 100. Not only that, we’ve automatically returned the model fit with the single set of parameters that maximized the score on the validation set. Additionally, we’ve returned the score of that best model, so we know how we’re doing.

The package supports varying any number of parameters, but plotting is only available for grids that vary one to three parameters, since a score surface over more dimensions cannot be visualized directly.
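To give a feel for what a one-parameter score plot conveys, here is a hand-rolled matplotlib sketch, not parfit's own plotting code; the AUC values are made-up numbers chosen only to show the shape of such a curve:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical validation scores for a grid varying only min_samples_leaf
# (illustrative numbers, not real results).
leaf_sizes = [1, 5, 10, 25, 50, 100, 200]
val_auc = [0.81, 0.84, 0.86, 0.88, 0.90, 0.91, 0.87]

fig, ax = plt.subplots()
ax.plot(leaf_sizes, val_auc, marker="o")
ax.set_xlabel("min_samples_leaf")
ax.set_ylabel("AUC on validation set")
ax.set_title("Validation score over a 1-D hyper-parameter grid")
fig.savefig("scores.png")
```

With two parameters the same information becomes a heatmap, and with three a panel of heatmaps, which is why plotting stops at three varying parameters.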

The parfit GitHub page has documentation for each function in the package and notes on how to use them. There is also a parfit_ex.ipynb notebook that gives examples of using parfit to vary 1, 2, and 3 parameters on a generated data set.

My colleague and fellow Master’s in Analytics candidate at the University of San Francisco, Vinay Patlolla, wrote an excellent post in which he used parfit to show how to make SGD Classifier perform as well as Logistic Regression. That post also includes a brief section on how to use parfit.