Anyone who has practiced machine learning, whether in industry, in academia, or through self-study, has inevitably used hyperparameters. The term can be intimidating if you’re just starting your dive into machine learning. And once you’ve figured out how to pronounce it, the key to understanding hyperparameters is knowing how they differ from parameters.

It took me a bit more than a simple Google search to fill in the details. In this blog post, I’m going to share what I’ve learned, and by the end of the post we should be able to:

Demystify the meaning of hyperparameters and explain how they differ from parameters

Understand the importance of using hyperparameters

Optimize and tune hyperparameters in scikit-learn using Grid Search and Randomized Search

What are Hyperparameters?

Hyperparameters are configuration variables that are external to the model and whose values cannot be estimated from data. That is to say, they can’t be learned directly from the data in standard model training. They’re almost always specified by the machine learning engineer prior to training.

We usually arrive at these values by trial and error, adjusting them until the best prediction score is obtained. Let’s take the simple Support Vector Machine (SVM) example below and use it to explain hyperparameters further. An SVM picks a hyperplane that separates the data while maximizing the margin between the classes. (For more information about SVMs and how they work visit here)

Figure 1. C=2000

Figure 2. C=45

In the two instances visualized above, we can clearly see the impact of different C values on the model. C is the regularization constant: it sets how heavily the model is penalized for points it cannot separate cleanly. A very high C imposes a large penalty on such points, which can often cause overfitting, as seen in figure 1 (top).

In figure 2 (bottom), I had to try different C values by hand before the data points were separated into their respective classes (C=45). Doing this by hand can be an extremely difficult task (we’ll later see how to optimize this in sklearn).
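To make the manual trial and error concrete, here is a minimal sketch of what trying several C values by hand might look like in scikit-learn. The synthetic dataset, the RBF kernel, the extra C values, and the 70/30 split are all assumptions made for illustration; they are not the data or settings behind the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, slightly noisy two-class data (an assumption for illustration,
# not the dataset shown in the figures).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, flip_y=0.1, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Try each candidate C by hand and record training and validation accuracy.
results = {}
for C in [0.01, 1, 45, 2000]:
    model = SVC(kernel="rbf", C=C)  # C is set before training
    model.fit(X_train, y_train)
    results[C] = (model.score(X_train, y_train), model.score(X_val, y_val))
    print(f"C={C}: train={results[C][0]:.3f}, validation={results[C][1]:.3f}")

# Keep the C with the best validation accuracy.
best_C = max(results, key=lambda c: results[c][1])
print("Best C on this split:", best_C)
```

A large gap between training and validation accuracy for a given C is one common sign of overfitting. Which C comes out best depends entirely on the data and the split, so the numbers from this sketch should not be read as general guidance.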

Parameters, on the other hand, can be learned from data and don’t need to be manually set by the ML engineer. They are internal to the model. Examples of parameters include the coefficients of a linear or logistic regression model.
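The distinction can be seen directly in scikit-learn. The sketch below assumes a synthetic dataset and a logistic regression with C=1.0, both chosen only for illustration: the hyperparameter is available as soon as the model is created, while the coefficients only exist after fitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with four features (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameter: C is chosen by us before any training happens.
model = LogisticRegression(C=1.0)
hyperparameters = model.get_params()
print("C before training:", hyperparameters["C"])

# Parameters: the coefficients do not exist until the model has seen data.
has_coef_before = hasattr(model, "coef_")
print("Has coefficients before fit:", has_coef_before)

model.fit(X, y)

# After fitting, the learned parameters are stored on the model.
print("Learned coefficients:", model.coef_)
print("Learned intercept:", model.intercept_)
```

By scikit-learn convention, attributes ending in an underscore, such as coef_ and intercept_, are estimated from the data during fit, whereas the constructor arguments returned by get_params() are the values set up front.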