Introducing ML.NET: Cross-platform, Proven and Open Source Machine Learning Framework

Cesar

May 7th, 2018

Today at //Build 2018, we are excited to announce the preview of ML.NET, a cross-platform, open source machine learning framework. ML.NET will allow .NET developers to develop their own models and infuse custom ML into their applications without prior expertise in developing or tuning machine learning models.

ML.NET was originally developed in Microsoft Research and evolved into a significant framework over the last decade; it is used across many product groups in Microsoft, such as Windows, Bing, Azure, and more.

With this first preview release, ML.NET enables ML tasks like classification (e.g. text categorization and sentiment analysis) and regression (e.g. forecasting and price prediction). Along with these ML capabilities, this first release of ML.NET also brings the first draft of .NET APIs for training models, using models for predictions, as well as the core components of this framework, such as learning algorithms, transforms, and core ML data structures.

ML.NET is first and foremost a framework, which means that it can be extended to add popular ML Libraries like TensorFlow, Accord.NET, and CNTK. We are committed to bringing the full experience of ML.NET’s internal capabilities to ML.NET in open source.

To sum it all up, ML.NET is our commitment to make ML great in .NET.

Please come and join us over on GitHub and help shape the future of ML in .NET:

https://github.com/dotnet/machinelearning

Over time, ML.NET will enable other ML scenarios such as recommendation systems and anomaly detection, as well as other approaches such as deep learning, by leveraging popular deep learning libraries like TensorFlow, Caffe2, and CNTK, and general machine learning libraries like Accord.NET.

ML.NET also complements the experience that Azure Machine Learning and Cognitive Services provide by allowing a code-first approach, supporting app-local deployment, and giving you the ability to build your own models.

The rest of this blog post provides more details about ML.NET; feel free to jump to the section that interests you the most.

This blog post is co-authored by Gal Oshri, Niklas Gustafsson, Markus Weimer & Nagesh Pabbisetty

ML.NET Core Components

ML.NET is being launched as a part of the .NET Foundation and the repo today contains the .NET C# API(s) for both model training and consumption, along with a variety of transforms and learners required for many popular ML tasks like regression and classification.

ML.NET is aimed at providing the E2E workflow for infusing ML into .NET apps across pre-processing, feature engineering, modeling, evaluation, and operationalization.

ML.NET comes with support for the types and runtime needed for all aspects of machine learning, including core data types, extensible pipelines, high performance math, data structures for heterogeneous data, tooling support, and more.

The table below describes the entire list of components that are being released as a part of ML.NET 0.1.

We aim to make ML.NET’s APIs generic, such that other frameworks like CNTK, Accord.NET, TensorFlow and other libraries can become usable through one shared API.

Getting Started

Installation

To get started with ML.NET, install the ML.NET NuGet package from the .NET CLI using:

dotnet add package Microsoft.ML

Or from the NuGet Package Manager Console:

Install-Package Microsoft.ML

You can also build the framework from source at https://github.com/dotnet/machinelearning.

Sentiment Classification with ML.NET

Train your own model

Here is a simple snippet to train a model for sentiment classification (full snippet of code can be found here).

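using Microsoft.ML;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

// Build the pipeline: load the data, featurize the text, then add the learner.
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
pipeline.Add(new FastTreeBinaryClassifier());
// Map predicted labels back to their original values.
pipeline.Add(new PredictedLabelColumnOriginalValueConverter { PredictedLabelColumn = "PredictedLabel" });
var model = pipeline.Train<SentimentData, SentimentPrediction>();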

Let’s go through this in a bit more detail. We create a LearningPipeline that encapsulates the data loading, the data processing/featurization, and the learning algorithm. These are the steps required to train a machine learning model that takes input data and outputs a prediction.

The first part of the pipeline is the TextLoader, which loads the data from our training file into our pipeline. We then apply a TextFeaturizer to convert the SentimentText column into a numeric vector called Features which can be used by the machine learning algorithm (as it cannot take text input). This is our preprocessing/featurization step.
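The training snippet refers to SentimentData and SentimentPrediction without defining them. As a rough sketch, the two classes might look like the following, assuming the attribute-based column mapping from the Microsoft.ML.Runtime.Api namespace in this preview. The column ordinals, the position of the label column, and the field types are assumptions about the training file, not something this post specifies; the type of the predicted label in particular may differ depending on how labels are encoded.

```csharp
using Microsoft.ML.Runtime.Api;

// Input schema: one row of the training file.
// Assumption: column 0 holds the text and column 1 holds the sentiment label.
public class SentimentData
{
    [Column(ordinal: "0")]
    public string SentimentText;

    // "Label" is the column name the learner trains against.
    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
}

// Output schema: the object returned by model.Predict().
// Assumption: the predicted label is exposed as a bool.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}
```

With classes along these lines, TextLoader<SentimentData> can map each column of the file to a field, and TextFeaturizer can find the SentimentText column by name.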

FastTreeBinaryClassifier is a boosted decision tree learner that we use in this pipeline. As with the featurization step, trying different learners available in ML.NET and changing their parameters may produce better results. PredictedLabelColumnOriginalValueConverter converts the model’s predicted labels back to their original value/format.
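To make "changing their parameters" concrete, here is a minimal sketch of a learner configured through an object initializer. The helper class and method names are hypothetical, and the property names (NumLeaves, NumTrees, MinDocumentsInLeafs) and the values chosen are assumptions based on the 0.1 preview API; they are illustrative, not recommended settings.

```csharp
using Microsoft.ML.Trainers;

// Hypothetical helper, not part of ML.NET.
public static class LearnerConfig
{
    // Returns a FastTree learner with a small, explicitly configured ensemble.
    // Assumption: these property names match the 0.1 preview of Microsoft.ML.
    public static FastTreeBinaryClassifier CreateSmallFastTree()
    {
        return new FastTreeBinaryClassifier
        {
            NumLeaves = 5,           // maximum leaves per tree
            NumTrees = 5,            // number of trees in the ensemble
            MinDocumentsInLeafs = 2  // minimum training examples per leaf
        };
    }
}
```

In the pipeline, pipeline.Add(LearnerConfig.CreateSmallFastTree()) would then take the place of pipeline.Add(new FastTreeBinaryClassifier()). Small values like these tend to suit small training sets; larger datasets usually call for more trees and leaves.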

pipeline.Train<SentimentData, SentimentPrediction>() trains the pipeline (loads the data, trains the featurizer and learner). Nothing in the pipeline is executed until Train is called.

Use the trained model for predictions

// Score a single new example with the trained model.
SentimentData data = new SentimentData { SentimentText = "Today is a great day!" };
SentimentPrediction prediction = model.Predict(data);
Console.WriteLine("prediction: " + prediction.Sentiment);

To get a prediction, we use model.Predict() on new data. Note that the input data is a string and the model includes the featurization, so our pipeline stays in sync during both training and prediction. We didn’t have to write preprocessing/featurization code specifically for predictions.

For more getting-started scenarios, please refer to the documentation walkthroughs, which go over sentiment analysis and taxi fare prediction in more detail.

The Road Ahead

There are many capabilities we aspire to add to ML.NET, but we would love to understand what will best fit your needs. The current areas we are exploring are:

Additional ML Tasks and Scenarios

Deep Learning with TensorFlow & CNTK

ONNX support

Scale-out on Azure

Better GUI to simplify ML tasks

Integration with VS Tools for AI

Language Innovation for .NET

Help shape ML.NET for your needs

Take it for a spin, build something with it, and tell us what ML.NET should be better at. File a couple of issues and suggestions on GitHub and help shape ML.NET for your needs.

https://github.com/dotnet/machinelearning

If you prefer reaching out to us directly, you can do so by providing your details through this very short survey.

Thanks, ML.NET Team