One-Hot Encoding takes a single integer and produces a vector where a single element is 1 and all other elements are 0, like [0, 1, 0, 0].

For example, imagine we’re working with categorical data, where only a limited number of colors are possible: red, green, or blue. One way we could represent this numerically is by assigning each color a number:

| Color | Value |
| --- | --- |
| Red | 0 |
| Green | 1 |
| Blue | 2 |

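This is known as integer encoding. For Machine Learning, this encoding can be problematic: it implies an ordering and a distance between categories that doesn't actually exist. In this example, we’re essentially saying “green” is the average of “red” and “blue”, which can lead a model to pick up relationships that aren't really in the data.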
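To make the averaging problem concrete, here is a minimal sketch. The `color_to_int` mapping simply mirrors the table above; the name is chosen for illustration and is not part of any library.

```python
# Integer encoding from the table above (illustrative mapping).
color_to_int = {"red": 0, "green": 1, "blue": 2}

# Averaging the codes for red and blue gives exactly the code for green,
# even though green is not "between" red and blue in any meaningful sense.
midpoint = (color_to_int["red"] + color_to_int["blue"]) / 2

print(midpoint)                           # 1.0
print(midpoint == color_to_int["green"])  # True
```

A model that treats these codes as ordinary numbers has no way to know this relationship is an accident of how the numbers were assigned.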

It’s often more useful to use the one-hot encoding instead:

| Color | Integer Encoding | One-Hot Encoding |
| --- | --- | --- |
| Red | 0 | [1, 0, 0] |
| Green | 1 | [0, 1, 0] |
| Blue | 2 | [0, 0, 1] |

This is much more useful to pass into something like a neural network, because every color is now equally far from every other color and no ordering is implied.
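As a minimal sketch of the idea in plain Python, one-hot encoding an integer amounts to building a vector of zeros and setting one position to 1. The `one_hot` helper and the `colors` list are hypothetical names used only for illustration.

```python
def one_hot(index, num_classes):
    """Return a list of length num_classes with a 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is out of range for {num_classes} classes")
    vector = [0] * num_classes
    vector[index] = 1
    return vector


colors = ["red", "green", "blue"]  # position in the list = integer encoding

for i, color in enumerate(colors):
    print(color, one_hot(i, len(colors)))
# red [1, 0, 0]
# green [0, 1, 0]
# blue [0, 0, 1]
```

Unlike the integer codes, averaging the vectors for red and blue gives [0.5, 0, 0.5], which is not the vector for green.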

One-Hot Encoding in Python

Below are several different ways to implement one-hot encoding in Python.

Using scikit-learn’s OneHotEncoder:

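    from sklearn.preprocessing import OneHotEncoder

    # sparse_output=False returns a dense NumPy array instead of a sparse matrix.
    # (This argument was named `sparse` before scikit-learn 1.2.)
    encoder = OneHotEncoder(sparse_output=False)
    print(encoder.fit_transform([['red'], ['green'], ['blue']]))
    '''
    [[0. 0. 1.]
     [0. 1. 0.]
     [1. 0. 0.]]
    '''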
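Note that the columns in the output above are not in the order the colors were passed in: the encoder sorts the categories it finds (here alphabetically: blue, green, red), which is why 'red' ends up in the last column. Here is a small sketch for inspecting that ordering and mapping vectors back to labels, assuming scikit-learn is installed. It calls `.toarray()` on the default sparse result so that it does not depend on the name of the dense-output argument.

```python
from sklearn.preprocessing import OneHotEncoder

colors = [["red"], ["green"], ["blue"]]

encoder = OneHotEncoder()
# fit_transform returns a SciPy sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(colors).toarray()

# categories_ holds one array of learned categories per input column.
print(encoder.categories_[0].tolist())  # ['blue', 'green', 'red']

# inverse_transform maps one-hot rows back to the original labels.
print(encoder.inverse_transform(encoded).ravel().tolist())  # ['red', 'green', 'blue']
```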

Keras

Using Keras’s to_categorical:

    from keras.utils import to_categorical

    print(to_categorical([0, 1, 2]))
    '''
    [[1. 0. 0.]
     [0. 1. 0.]
     [0. 0. 1.]]
    '''

NumPy

Using NumPy: