Let’s develop a neural network assembly line that allows us to easily experiment with numerous model configurations.

Assembly Line of Neural Networks (Source: all_is_magic on Shutterstock & Author)

Optimizing machine learning (ML) models is not an exact science. The best model architecture, optimization algorithm, and hyperparameter settings depend on the data you’re working with. Being able to quickly test several model configurations is therefore imperative for maximizing productivity and driving progress in your ML project. In this article, we’ll create an easy-to-use interface that lets you do exactly that: an assembly line for ML models.

Deep learning models are governed by a set of hyperparameters, so we can write functions that take those hyperparameters as input and build models on demand. Here are the primary hyperparameters that govern neural networks:

Number of hidden layers

Number of neurons per layer

Activation function

Optimization algorithm

Learning rate

Regularization technique

Regularization hyperparameters

We can package these into a hash table:

model_info = {}  # hash table describing one model configuration
model_info['Hidden layers'] = [100] * 6  # six hidden layers, 100 neurons each
model_info['Input size'] = og_one_hot.shape[1] - 1  # every column except the label
model_info['Activations'] = ['relu'] * 6  # one activation per hidden layer
model_info['Optimization'] = 'adadelta'
model_info['Learning rate'] = .005
model_info['Batch size'] = 32
model_info['Preprocessing'] = 'Standard'  # standard scaling
model_info['Lambda'] = 0
model_info['Regularization'] = 'l2'
model_info['Reg param'] = 0.0005

Before we begin experimenting with various model architectures, let’s visualize the data to see what we’re working with. Although standard scaling is the de facto preprocessing method, I visualized the data under a variety of preprocessing tactics, using PCA and t-SNE to reduce the dimensionality of the data for each one. Below are the visualizations in which the data appears most separable:

PCA & t-SNE visualizations of the preprocessed data (Source: Author)
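For reference, here’s a minimal sketch of how such projections can be produced. It assumes og_one_hot is a NumPy array whose final column holds the labels, which is an inference from the 'Input size' entry above:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumes the final column of og_one_hot is the label
X, y = og_one_hot[:, :-1], og_one_hot[:, -1]

# Standard scaling before projecting down to two dimensions
X_scaled = StandardScaler().fit_transform(X)

for name, reducer in [('PCA', PCA(n_components=2)),
                      ('t-SNE', TSNE(n_components=2))]:
    X_2d = reducer.fit_transform(X_scaled)
    plt.figure()
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5)
    plt.title(name + ' projection (standard scaling)')
plt.show()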

We can then define a function that will construct & compile a neural network given a hyperparameter hash table:
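Here’s a minimal sketch of such a builder in Keras. The build_model name, the optimizer lookup, and the single sigmoid output (i.e., a binary classification target) are my assumptions, and it covers only the keys used in the hash table above:

from tensorflow import keras
from tensorflow.keras import layers, optimizers, regularizers

def build_model(model_info):
    # Optional weight penalty, driven by the hash table
    reg = None
    if model_info.get('Regularization') == 'l2':
        reg = regularizers.l2(model_info['Reg param'])
    elif model_info.get('Regularization') == 'l1':
        reg = regularizers.l1(model_info['Reg param'])

    # Stack the hidden layers described by 'Hidden layers' / 'Activations'
    model = keras.Sequential()
    model.add(keras.Input(shape=(model_info['Input size'],)))
    for units, activation in zip(model_info['Hidden layers'],
                                 model_info['Activations']):
        model.add(layers.Dense(units, activation=activation,
                               kernel_regularizer=reg))
    model.add(layers.Dense(1, activation='sigmoid'))  # binary target assumed

    # Map the optimizer name to a Keras optimizer at the requested rate
    opt_lookup = {'adam': optimizers.Adam,
                  'adadelta': optimizers.Adadelta,
                  'sgd': optimizers.SGD}
    opt = opt_lookup[model_info['Optimization']](
        learning_rate=model_info['Learning rate'])

    model.compile(optimizer=opt, loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model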

Now that we have a fast, flexible way of constructing and compiling neural networks, we can test a few baseline models and draw quick inferences about which hyperparameters seem to work best:
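As an illustrative sketch, a sweep over a few depth/width combinations could look like the following; the grid, the epoch count, and the reuse of X_scaled and y from the visualization snippet are all my assumptions:

from copy import deepcopy

# Vary depth and width; every other hyperparameter stays fixed
results = {}
for depth, width in [(3, 50), (6, 100), (9, 120)]:
    config = deepcopy(model_info)
    config['Hidden layers'] = [width] * depth
    config['Activations'] = ['relu'] * depth

    model = build_model(config)
    history = model.fit(X_scaled, y, batch_size=config['Batch size'],
                        epochs=30, validation_split=0.2, verbose=0)
    results[(depth, width)] = history.history['val_accuracy'][-1]

print(results)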

After evaluating over a dozen model architectures with 5-fold cross validation using the function above, I found that deeper and wider architectures are necessary to obtain high performance on this data, most likely due to its non-linear structure. The graphs below illustrate diminishing returns: increasing the number of hidden units in each layer from 15 to 120 results in a notable performance improvement on the training data, but virtually no improvement on the test data. This is a sign that the model is overfitting; the performance on the training set is not generalizing to the test data.

Aside: If you’re not familiar with k-fold cross validation, it’s a model evaluation technique that involves divvying up the data into K disjoint partitions. One of those partitions is utilized as the test set and the rest of them as the training set. We then iterate through each fold so that every partition has a turn being the test set. Performing k-fold cross validation allows us to obtain a robust assessment of the model’s performance.
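For completeness, here’s what a minimal 5-fold loop looks like with scikit-learn’s KFold, reusing the builder sketched above (the epoch count is arbitrary):

import numpy as np
from sklearn.model_selection import KFold

# Five disjoint partitions; each takes one turn as the test set
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kfold.split(X_scaled):
    model = build_model(model_info)  # fresh model for every fold
    model.fit(X_scaled[train_idx], y[train_idx],
              batch_size=model_info['Batch size'], epochs=30, verbose=0)
    _, acc = model.evaluate(X_scaled[test_idx], y[test_idx], verbose=0)
    scores.append(acc)

print('5-fold accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))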