Implementing a dense neural network for classification with TensorFlow

In part one, we used the diabetes dataset. For consistency, and to make a useful comparison possible, we'll use the same dataset again to predict whether a patient is diabetic or not (binary classification).

First things first. For our model to work correctly, we need our data to be clean and our features to be on a similar scale. Please make sure you have created the diabetes pandas dataframe (Link), because we'll be using it to build the dense neural model.

Apart from normalizing the data, we also need to refactor the feature columns to make them work with our dense neural network classifier.
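Here's a minimal sketch of both steps. The feature names below are placeholders, and I'm assuming the label column is called Class; substitute the actual columns of your diabetes dataframe from part one.

import tensorflow as tf

# Hypothetical numeric feature names -- replace with the columns
# of your own diabetes dataframe.
numericFeatures = ['Glucose', 'BloodPressure', 'Insulin', 'BMI', 'Age']

# Min-max scale each numeric feature into the [0, 1] range.
diabetes[numericFeatures] = diabetes[numericFeatures].apply(
    lambda col: (col - col.min()) / (col.max() - col.min()))

# Wrap each numeric feature in a numeric_column so DNNClassifier
# knows how to read it from the dataframe.
featureColumns = [tf.feature_column.numeric_column(name) for name in numericFeatures]

# Separate the features from the label.
x_data = diabetes.drop('Class', axis=1)
labels = diabetes['Class']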

FYI: I ran into a problem where TensorFlow was not able to find the feature_column attribute. To resolve this, make sure you have the most recent version of TensorFlow installed.
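A quick sanity check (upgrading via pip is one option; use whatever package manager fits your environment):

import tensorflow as tf

# If the version is old or the attribute is missing, upgrade with:
#   pip install --upgrade tensorflow
print(tf.__version__)
print(hasattr(tf, 'feature_column'))  # should print True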

Let's start coding

We start by using sklearn to split our data into training and test sets, so we can later check our predictions against the actual data.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x_data, labels, test_size=0.33, random_state=101)

To train the model, we'll need the data from train_test_split, and we'll also need to create an input function using TensorFlow's pandas input function (pandas specifically, because we're working with a pandas dataframe).

inputFunction = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=100, num_epochs=1000, shuffle=True)

Now use the input function to train the model on the data we've just created.

dnnClassifierModel = tf.estimator.DNNClassifier(
    hidden_units=[512, 256, 128],
    feature_columns=featureColumns,
    n_classes=2,
    activation_fn=tf.nn.tanh,
    optimizer=lambda: tf.train.AdamOptimizer(
        learning_rate=tf.train.exponential_decay(
            learning_rate=0.001,
            global_step=tf.train.get_global_step(),
            decay_steps=1000,
            decay_rate=0.96)))

dnnClassifierModel.train(input_fn=inputFunction, steps=1000)

Our DNN model has three densely connected hidden layers: 512 neurons in the first, 256 in the second, and 128 in the third (896 neurons in total). The number of classes is 2, because we're classifying the result into two categories.

Depending on your machine, training the model might take a bit of time. On my laptop (64-bit processor, 8 GB RAM) it took around 8 seconds, but this can vary.
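Once training finishes, we can check how the model does on the held-out test set. A minimal sketch, reusing the names defined above (note that the evaluation input function runs for a single epoch and must not shuffle):

evalInputFunction = tf.estimator.inputs.pandas_input_fn(x=X_test, y=y_test, batch_size=10, num_epochs=1, shuffle=False)

# evaluate returns a dict of metrics; DNNClassifier reports accuracy among them.
results = dnnClassifierModel.evaluate(input_fn=evalInputFunction)
print(results['accuracy'])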

The Group column is a categorical column with categories [A, B, C, D]. DNNClassifier only accepts dense feature columns, so passing the categorical column in directly raises an error:

ValueError: Items of feature_columns must be a _DenseColumn. You can wrap a categorical column with an embedding_column or indicator_column. Given: _VocabularyListCategoricalColumn(key='Group', vocabulary_list=('A', 'B', 'C', 'D'), dtype=tf.string, default_value=-1, num_oov_buckets=0)

Therefore, to overcome this error we'll convert the assigned_group categorical column into an embedding_column. This requires a small code change:
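A sketch of the fix (the variable names and the embedding dimension of 4 are my choices here, not fixed requirements):

# Recreate the categorical column, as shown in the error message above.
assigned_group = tf.feature_column.categorical_column_with_vocabulary_list('Group', ['A', 'B', 'C', 'D'])

# Wrap it in an embedding_column so it becomes a dense column that DNNClassifier accepts.
embedded_group = tf.feature_column.embedding_column(assigned_group, dimension=4)

# Use embedded_group (not assigned_group) in the feature_columns list.
featureColumns.append(embedded_group)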