Prerequisites

Create a console application

Open Visual Studio 2017. Select File > New > Project from the menu bar. In the New Project* dialog, select the Visual C# node followed by the .NET Core node. Then select the Console App (.NET Core) project template. In the Name text box, type "BbcNewsClassifier" and then select the OK button. Extract dataset from the zip file into the project folder Install the Microsoft.ML NuGet Package: In Solution Explorer, right-click on your project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML, select that package in the list and select the Install button. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed.

Create classes and define paths

Add the following additional using statements to the top of the Program.cs file:

using System; using Microsoft.ML.Models; using Microsoft.ML.Runtime; using Microsoft.ML.Trainers; using Microsoft.ML.Transforms; using System.Collections.Generic; using System.Linq; using Microsoft.ML; using System.IO;

Main

static string _testSet = "news-test.txt"; static string _trainingSet = "news-train.txt"; static List<string> _categories = new List<string> { "business", "entertainment", "politics", "sport", "tech" };

using Microsoft.ML.Runtime.Api; public class NewsData { [Column(ordinal: "0")] public string Text; [Column(ordinal: "1", name: "Label")] public string Label; } public class NewsPrediction { [ColumnName("Score")] public float[] Score; }

NewsData

string

Label

Column

Label

NewsPrediction

Score

ColumnName

Label

Score

Data preparation

Console.WriteLine("Hello World!")

Main

PrepareData();

PrepareData

Read all news articles by categories

Get the first two paragraphs from each article (news headline and a summary)

Cleanup new lines and tabs

Separate articles in each category between training and test sets (20% - test set and 80% - training)

Save train and test datasets in separate files

PrepareData

Main

private static void PrepareData() { File.Delete(_trainingSet); File.Delete(_testSet); var basePath = "bbc/"; var training = new List<NewsData>(); var test = new List<NewsData>(); for(var i = 0; i < _categories.Count(); i++) { var category = _categories[i]; var path = basePath + category + "/"; var files = Directory.GetFiles(path); var texts = new List<string>(); foreach (var file in files) { var text = File.ReadAllText(file); var textParts = text.Split("

").ToList(); textParts.RemoveAll(s => string.IsNullOrEmpty(s)); text = textParts[0] + " " + textParts[1]; text = text.Replace(Environment.NewLine, " "); text = text.Replace("

", " "); text = text.Replace("\r", " "); text = text.Replace(" ", " "); texts.Add(text); } texts = texts.OrderBy(s => _random.Next()).ToList(); var trainingTextsCount = (texts.Count / 100) * 80; var trainingTexts = texts.GetRange(0, trainingTextsCount); training.AddRange(trainingTexts.Select(s => new NewsData { Text = s, Label = category }).ToList()); var testTexts = texts.GetRange(trainingTextsCount, texts.Count - trainingTextsCount); test.AddRange(testTexts.Select(s => new NewsData { Text = s, Label = category }).ToList()); } File.AppendAllLines(_testSet, test.Select(s => $"{s.Text}\t{s.Label}")); File.AppendAllLines(_trainingSet, training.Select(s => $"{s.Text}\t{s.Label}")); }

Training model

Main

PrepareData

var model = Train();

Train

Load training data

Preprocess and featurize the data

Train the model

private static PredictionModel<NewsData, NewsPrediction> Train() { var pipeline = new LearningPipeline(); pipeline.Add(new TextLoader<NewsData>(_trainingSet, useHeader: false, separator: "tab")); pipeline.Add(new TextFeaturizer("Features", "Text") { KeepDiacritics = false, KeepPunctuations = false, TextCase = TextNormalizerTransformCaseNormalizationMode.Lower, OutputTokens = true, Language = TextTransformLanguage.English, StopWordsRemover = new PredefinedStopWordsRemover(), VectorNormalizer = TextTransformTextNormKind.L2, CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false }, WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = true } }); pipeline.Add(new Dictionarizer("Label")); pipeline.Add(new StochasticDualCoordinateAscentClassifier()); return pipeline.Train<NewsData, NewsPrediction>(); }

StochasticDualCoordinateAscentClassifier

Evaluate the model

Evaluate

Train

Evaluate

Train

public static void Evaluate(PredictionModel<NewsData, NewsPrediction> model) { var testData = new TextLoader<NewsData>(_trainingSet, useHeader: false, separator: "tab"); var evaluator = new ClassificationEvaluator(); var metrics = evaluator.Evaluate(model, testData); Console.WriteLine(); Console.WriteLine("PredictionModel quality metrics evaluation"); Console.WriteLine("------------------------------------------"); Console.WriteLine($"AccuracyMacro: {metrics.AccuracyMacro:P2}"); Console.WriteLine($"AccuracyMicro: {metrics.AccuracyMicro:P2}"); Console.WriteLine($"LogLoss: {metrics.LogLoss:P2}"); }

Main

Train

Evaluate(model);

Results

PredictionModel quality metrics evaluation ------------------------------------------ AccuracyMacro: 99,71% AccuracyMicro: 99,70% LogLoss: 13,33%

P.S.

model.WriteAsync(_modelPath); model = PredictionModel.ReadAsync (_modelPath);

_modelPath

This sample tutorial illustrates using ML.NET to create a multiclass classifier via a .NET Core console application using C# in Visual Studio 2017.The sample is the console app that uses ML.NET to train a model that classifies and predicts the category of the news headlines. It also evaluates the model with a second dataset for quality analysis. The news articles datasets are from the BBC News You need to create two global variables to hold the path to the training and test sets.Add the following code to the line right above themethod to specify required global variables:You need to create some classes for your input data and predictions:is the input dataset class and has a) that has a value for news category, and a string for the news headline (Text). Both fields haveattributes attached to them. This attribute describes the order of each field in the data file, and which is thefield.is the class used for prediction after the model has been trained. It has a float[] array (Score) and aattribute. Theis used to create and train the model, and it's also used with a second dataset to evaluate the model. Theis used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used.BBC News dataset consists of 5 folders (one for each category: business, entertainment, politics, sport, tech). Each folder has files with news articles. We need to pre-process this data before we can continue.In thefile, replace theline with the following code in themethod:Themethod executes the following tasks:Create themethod, just after themethod, using the following code:In themethod aftermethod call add following lines:Themethod executes the following tasks:Here we usedas a classification method, but you can also try other available methods in ML.NET and see how this will affect the results.Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In themethod, the model created inis passed in to be evaluated. Create themethod, just after, as in the following code:Add a call to the new method from themethod, right under themethod call, using the following code:Your results should be similar to the following. As the pipeline processes, it displays messages. You may see warnings, or processing messages. These have been removed from the following results for clarity.Congratulations! You've now successfully built a machine learning model for classifying news headlines. You can find the source code for this tutorial at GitHub repository.In some cases, you don't want to train the model every time you start your application. And it makes sense to save the model somewhere and then load it on the application start. To do this in ML.NET you can make next:Here we write the model in the file in, but ML.NET also support read/write from streams.