When we think about machine learning, the first languages that come to mind are usually Python or R. This is understandable, because they offer a rich ecosystem for implementing these algorithms.

However, I work in C# daily, and my attention was recently drawn to ML.NET, a relatively new machine learning library. In this article, I would like to show how to implement a Naive Bayes classifier in Python using Scikit-learn, and in C# using the aforementioned ML.NET.

Naive Bayes Classifier

The Naive Bayes classifier is a simple probabilistic classifier that assumes conditional independence between the features. It is based on Bayes' theorem, which is expressed mathematically as follows:

P(A|B) = P(B|A) · P(A) / P(B)

where P(A|B) is the posterior probability of A given B, P(B|A) is the likelihood, P(A) is the prior, and P(B) is the evidence.
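As a quick illustration of the theorem with hypothetical numbers (unrelated to the wine dataset), consider estimating the probability that a message is spam given that it contains a certain word:

```python
# Toy Bayes' theorem example; all probabilities are made up for illustration
p_spam = 0.2             # prior P(A): fraction of messages that are spam
p_word_given_spam = 0.7  # likelihood P(B|A): word appears in spam
p_word_given_ham = 0.1   # P(B|not A): word appears in non-spam

# Evidence P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # → 0.636
```

The naive classifier applies exactly this computation per class, multiplying the per-feature likelihoods together thanks to the independence assumption.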

Dataset

I used the wine quality dataset from the UCI Machine Learning Repository for the experiment. The analyzed dataset has 11 features and 11 classes; the classes determine the quality of the wine on a numerical scale of 0–10.

ML.NET

The first step is to create a console application project. Then you need to add the ML.NET library from NuGet. Now you can create classes whose properties correspond to the attributes in the dataset.
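The original listing is not reproduced here, but based on the column names used later in the pipeline, such a class could look roughly like this (a sketch; the `[LoadColumn]` indices assume the column order of the winequality-red.csv file):

```csharp
using Microsoft.ML.Data;

public class Features
{
    [LoadColumn(0)] public float FixedAcidity { get; set; }
    [LoadColumn(1)] public float VolatileAcidity { get; set; }
    [LoadColumn(2)] public float CitricAcid { get; set; }
    [LoadColumn(3)] public float ResidualSugar { get; set; }
    [LoadColumn(4)] public float Chlorides { get; set; }
    [LoadColumn(5)] public float FreeSulfurDioxide { get; set; }
    [LoadColumn(6)] public float TotalSulfurDioxide { get; set; }
    [LoadColumn(7)] public float Density { get; set; }
    [LoadColumn(8)] public float Ph { get; set; }
    [LoadColumn(9)] public float Sulphates { get; set; }
    [LoadColumn(10)] public float Alcohol { get; set; }
    [LoadColumn(11)] public float Quality { get; set; }
}
```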

Then you can load the dataset and divide it into a training set and a testing set. I adopted a standard split here, i.e. 80% of the data forms the training set, while the remaining 20% forms the testing set.

var dataPath = "../../../winequality-red.csv";
var ml = new MLContext();
var DataView = ml.Data.LoadFromTextFile<Features>(dataPath, hasHeader: true, separatorChar: ';');

Now it is necessary to adapt the model structure to the conventions of the ML.NET library: the property specifying the class must be called Label, and the remaining attributes must be concatenated under the name Features.

var partitions = ml.Data.TrainTestSplit(DataView, testFraction: 0.2);
var pipeline = ml.Transforms.Conversion.MapValueToKey(
        inputColumnName: "Quality",
        outputColumnName: "Label")
    .Append(ml.Transforms.Concatenate("Features",
        "FixedAcidity", "VolatileAcidity", "CitricAcid", "ResidualSugar",
        "Chlorides", "FreeSulfurDioxide", "TotalSulfurDioxide",
        "Density", "Ph", "Sulphates", "Alcohol"))
    .AppendCacheCheckpoint(ml);

Once you have completed the previous steps, you can move on to creating the training pipeline. Here you choose the Naive Bayes classifier and pass it the names of the label and features columns as parameters. You also indicate the property that holds the predicted label.

var trainingPipeline = pipeline
    .Append(ml.MulticlassClassification.Trainers.NaiveBayes("Label", "Features"))
    .Append(ml.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

Finally, you can move on to training and testing the model. Everything fits in two lines of code.

var trainedModel = trainingPipeline.Fit(partitions.TrainSet);
var testMetrics = ml.MulticlassClassification.Evaluate(trainedModel.Transform(partitions.TestSet));
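The returned testMetrics object is a MulticlassClassificationMetrics instance, so you can, for example, print a few of its summary statistics (a minimal sketch):

```csharp
// A few of the metrics exposed by MulticlassClassificationMetrics
Console.WriteLine($"Micro accuracy: {testMetrics.MicroAccuracy:P2}");
Console.WriteLine($"Macro accuracy: {testMetrics.MacroAccuracy:P2}");
Console.WriteLine($"Log-loss: {testMetrics.LogLoss:F3}");
```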

Scikit-learn

In the case of the Python implementation, we also start by handling the dataset files, using the numpy and pandas libraries. The listing shows the code used to read the data from the file and create an ndarray from it, which is then fed to the algorithm.

import numpy as np

from common.import_data import ImportData
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    data_set = ImportData()
    x = data_set.import_all_data()
    y = data_set.import_columns(np.array(['quality']))

The next step is to create the training and test sets. In this case, we also use 20% of the data for the test set and 80% for the training set. I used the train_test_split function, which comes from the sklearn library.

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
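As a sanity check of the 80/20 split, here is a self-contained example on synthetic data (the arrays are made up to mimic the shape of the wine data; the article's own call works the same way):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 synthetic samples with 11 features, like the wine dataset
x_demo = np.random.rand(100, 11)
y_demo = np.random.randint(0, 11, size=(100, 1))

X_tr, X_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)  # → (80, 11) (20, 11)
```

Passing random_state makes the split reproducible; the article's call omits it, so each run produces a different split (and, potentially, a slightly different accuracy).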

Now you can move on to the Naive Bayes classifier itself. Here, too, training and testing take only a few lines of code.

NB = GaussianNB()

NB.fit(X_train, y_train.ravel())

predictions = NB.predict(X_test)

print('Accuracy: ', NB.score(X_test, y_test))
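Note that NB.score(X_test, y_test) simply returns the mean accuracy on the given data, equivalent to sklearn.metrics.accuracy_score. A self-contained sketch on synthetic, well-separated data (all numbers made up) illustrates this:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
# Two well-separated Gaussian blobs acting as stand-in classes
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)
acc = nb.score(X, y)
assert acc == accuracy_score(y, nb.predict(X))  # score == mean accuracy
```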

Results and summary

The accuracy of the Naive Bayes classifier was 56.5% for the Scikit-learn implementation and 41.5% for ML.NET. The difference may be due to the different ways the algorithm is implemented in the two libraries, but based on accuracy alone we cannot say which is better. However, we can say that a promising alternative for implementing machine learning algorithms is emerging in the form of C# and ML.NET.