# Rusty-machine

## James Lucas

Note: Disclaimer: I'm a mathematician by training, so things may get heavy. I'll do my best to explain, but please interrupt me if I'm not making sense.

## This talk

- What is machine learning?
- How does rusty-machine work?
- Why is rusty-machine great?

## What is rusty-machine?

Rusty-machine is a machine learning library written entirely in Rust. It focuses on the following:

- Works out-of-the-box without relying on external dependencies.
- Simple and easy to understand API.
- Extensible and easy to configure.

Note: Installing machine learning libraries can be a pain when we also need to install BLAS, LAPACK, CUDA, and more, especially for new users. We try to keep things modular and reuse the API across all models. Some examples of this later.

## Another machine learning library?

Note:
- Machine learning is already in every other language, multiple times each. Are we just rewriting stuff?
- Rusty-machine is more than deep learning.
- Rust is a good choice: it seemed like it would be rewarding to explore.

## Machine Learning

> "Field of study that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel

Note: We'll walk through some basic concepts in machine learning that help us to understand why rusty-machine is built as it is.

## How do machines learn?

With data.

## Some examples

- Predicting rent increase
- Predicting whether an image contains a cat or a dog
- Understanding handwritten digits



The data set might be:

- rent prices and other facts about the residence.
- labelled pictures of cats and dogs.
- many examples of handwritten digits.



Note: Define the problem first, then the data, then how machine learning could solve it. For the rent problem, imagine you want to predict what your rent will be when you renew your lease. You have data from Craigslist on the rent listings in your neighbourhood, and data from some ordinance service with facts like the number of windows on each apartment, the number of chimneys, square footage, etc. You want to use this data to predict what your rent will be.

## Some terminology

- Model: An object that transforms _inputs_ into _outputs_ based on information in data.
- Train/Fit: Teaching a model how it should transform _inputs_ using data.
- Predict: Feeding _inputs_ into a model to receive _outputs_.



To predict rent increases we may use a Linear Regression Model. We'd train the model on some rent prices and facts about the residence. Then we'd predict the rent of unlisted places.

Note: There is a _lot_ of terminology in ML. This is just a handful of things I'll use going forward. In the example I've used the terminology to illustrate a little more clearly what each term means. We've now got a very basic idea of what machine learning is - so let's start talking about rusty-machine!

## Why is machine learning hard?

- There are many, many models to choose from.
- There are many, many ways to use each model.

Note: Machine learning is inherently difficult - those described here certainly aren't the only challenges. Rusty-machine doesn't so much try to solve these problems. Instead it aims to make it easy to navigate the solutions yourself.

## Back to rusty-machine

## The foundation of rusty-machine

```rust
pub trait Model<T, U> {
    fn train(&mut self, inputs: &T, targets: &U);

    fn predict(&self, inputs: &T) -> U;
}
```

Note: In Rust a trait defines an interface - a set of functions which the implementor should define. This trait is used to represent a model. It is simplified a little from the actual traits used.
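To make the trait a little more concrete, here is a minimal sketch of a model implementing the simplified `Model` trait from this slide. `SimpleLinReg` is a hypothetical name invented for this illustration (it is not a rusty-machine type), and it assumes a single input feature, e.g. square footage, with plain `Vec<f64>` standing in for the library's matrix and vector types.

```rust
/// The simplified trait from the slide.
pub trait Model<T, U> {
    fn train(&mut self, inputs: &T, targets: &U);

    fn predict(&self, inputs: &T) -> U;
}

/// Hypothetical one-feature linear model: target = slope * input + intercept.
/// (Illustration only, not a rusty-machine type.)
#[derive(Debug, Default)]
pub struct SimpleLinReg {
    slope: f64,
    intercept: f64,
}

impl Model<Vec<f64>, Vec<f64>> for SimpleLinReg {
    /// Fit slope and intercept by ordinary least squares.
    fn train(&mut self, inputs: &Vec<f64>, targets: &Vec<f64>) {
        assert_eq!(inputs.len(), targets.len());
        assert!(!inputs.is_empty());

        let n = inputs.len() as f64;
        let mean_x = inputs.iter().sum::<f64>() / n;
        let mean_y = targets.iter().sum::<f64>() / n;

        let cov: f64 = inputs
            .iter()
            .zip(targets)
            .map(|(x, y)| (x - mean_x) * (y - mean_y))
            .sum();
        let var: f64 = inputs.iter().map(|x| (x - mean_x).powi(2)).sum();

        // With no spread in the inputs we fall back to predicting the mean.
        self.slope = if var == 0.0 { 0.0 } else { cov / var };
        self.intercept = mean_y - self.slope * mean_x;
    }

    /// Apply the fitted line to each input.
    fn predict(&self, inputs: &Vec<f64>) -> Vec<f64> {
        inputs
            .iter()
            .map(|x| self.slope * x + self.intercept)
            .collect()
    }
}
```

Training on square footage and rent, then calling `predict` on unseen square footage, follows the same train-then-predict pattern as the rent example earlier.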

## An example

Before we go any further we should see an example.

Note: The example will show how we use these functions from the Model trait.

## K-Means

A model for clustering.

Note: Clustering is essentially grouping together similar items, where similar may mean close together in space, sharing similar features, etc.

## Using a K-Means Model

```
// ... Get the data samples

// Create a new model with 2 clusters
let mut model = KMeansClassifier::new(2);

// Train the model
model.train(&samples);

// Predict which cluster each point belongs to
let clusters: Vector<usize> = model.predict(&samples);
```

_You can run the full example in the [rusty-machine repo](https://github.com/AtheMathmo/rusty-machine/tree/master/examples)._

## Under the hood

K-Means works in roughly the following way:

1. Get some initial guesses for the centroids (cluster centers).
2. Assign each point to the centroid it is closest to.
3. Update each centroid by taking the average of all points assigned to it.
4. Repeat 2 and 3 until convergence.
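The four steps above can be sketched in plain Rust without rusty-machine. This is only an illustration under simplifying assumptions: points are 2-D, the initial centroids are simply the first `k` points, and the names `k_means`, `closest` and `sq_dist` are made up for this sketch. It is not how rusty-machine implements K-Means.

```rust
type Point = [f64; 2];

/// Squared Euclidean distance between two points.
fn sq_dist(a: &Point, b: &Point) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Index of the centroid closest to `p`.
fn closest(p: &Point, centroids: &[Point]) -> usize {
    let mut best = 0;
    for (i, c) in centroids.iter().enumerate() {
        if sq_dist(p, c) < sq_dist(p, &centroids[best]) {
            best = i;
        }
    }
    best
}

/// Returns the final centroids and the cluster index of each point.
pub fn k_means(points: &[Point], k: usize, max_iters: usize) -> (Vec<Point>, Vec<usize>) {
    assert!(k > 0 && points.len() >= k);

    // Step 1: initial guesses (assumption: just take the first k points).
    let mut centroids: Vec<Point> = points[..k].to_vec();
    let mut labels: Vec<usize> = Vec::new();

    for _ in 0..max_iters {
        // Step 2: assign each point to its closest centroid.
        let new_labels: Vec<usize> = points.iter().map(|p| closest(p, &centroids)).collect();

        // Step 3: move each centroid to the average of its assigned points.
        let mut sums = vec![[0.0f64; 2]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in points.iter().zip(&new_labels) {
            sums[l][0] += p[0];
            sums[l][1] += p[1];
            counts[l] += 1;
        }
        for j in 0..k {
            if counts[j] > 0 {
                centroids[j] = [sums[j][0] / counts[j] as f64, sums[j][1] / counts[j] as f64];
            }
        }

        // Step 4: stop once the assignments no longer change.
        let converged = new_labels == labels;
        labels = new_labels;
        if converged {
            break;
        }
    }

    (centroids, labels)
}
```

Real implementations usually pick the initial centroids more carefully, since a poor starting guess can lead to a poor clustering.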

K-Means Classification

## Simple but complicated

The API for other models aims to be as simple as that one. However...

Machine learning is complicated. Rusty-machine aims for ease of use.

Note: There are lots of different ways to train models, and on top of that many ways to configure and adapt them.

## How does rusty-machine (try to) keep things simple?

## Using traits

- A clean, simple model API
- Extensibility at the user level
- Reusable components within the library

Note: As seen before, rusty-machine uses the `Model` trait as its foundation. This is the primary way we keep things clean and simple. We use traits to try and _hide_ as much of the machine learning complexity as possible, while keeping it in reach for users who need it.

## Extensibility

We use traits to define parts of the models.

While rusty-machine provides common defaults - users can write their own implementations and plug them in.

## Extensibility Example

#### Support Vector Machine

```rust
/// A Support Vector Machine
pub struct SVM<K: Kernel> {
    ker: K,
    // Some other fields
    /* ... */
}

pub trait Kernel {
    /// The kernel function.
    ///
    /// Takes two equal length slices and returns a scalar.
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64;
}
```

Note: An SVM is a model which is generally used for classification. The behaviour of the SVM is governed by a kernel. A kernel is essentially a function which obeys some properties (which I won't go into here, there are good resources online). Here we allow the kernel to be generic while providing some sensible defaults. This is possible in other languages, but Rust helps us enforce it with the compiler.

## Combining kernels

$K_1(x_1, x_2) + K_2(x_1, x_2) = K(x_1, x_2)$

```rust
pub struct KernelSum<T, U>
    where T: Kernel,
          U: Kernel
{
    k1: T,
    k2: U,
}

/// Computes the sum of the two associated kernels.
impl<T, U> Kernel for KernelSum<T, U>
    where T: Kernel,
          U: Kernel
{
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64 {
        self.k1.kernel(x1, x2) + self.k2.kernel(x1, x2)
    }
}
```

```rust
let poly_ker = kernel::Polynomial::new(...);
let hypert_ker = kernel::HyperTan::new(...);

let sum_kernel = poly_ker + hypert_ker;

let mut model = SVM::new(sum_kernel);
```

Note: One property of kernels is that the sum of two kernels is also a kernel, i.e. K on the right also has all the properties of a kernel itself. We can implement the `Add` trait for our kernels to allow complex combinations of kernels.
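As a rough, self-contained sketch of how `+` on kernels could be wired up with `std::ops::Add`: the `Linear` and `SquaredExp` kernels below are hypothetical stand-ins written for this example, not rusty-machine's kernel types. One detail worth noting is that Rust's coherence rules reject a blanket `impl<T: Kernel, U: Kernel> Add<U> for T`, so this sketch writes one `Add` impl per kernel type (a macro could generate these).

```rust
use std::ops::Add;

pub trait Kernel {
    /// Takes two equal length slices and returns a scalar.
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64;
}

/// Hypothetical linear kernel: k(x1, x2) = x1 . x2 + c
pub struct Linear {
    pub c: f64,
}

impl Kernel for Linear {
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64 {
        x1.iter().zip(x2).map(|(a, b)| a * b).sum::<f64>() + self.c
    }
}

/// Hypothetical squared exponential kernel: exp(-|x1 - x2|^2 / (2 * length^2))
pub struct SquaredExp {
    pub length: f64,
}

impl Kernel for SquaredExp {
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64 {
        let sq_dist: f64 = x1.iter().zip(x2).map(|(a, b)| (a - b).powi(2)).sum();
        (-sq_dist / (2.0 * self.length * self.length)).exp()
    }
}

/// The sum of two kernels, as on the slide.
pub struct KernelSum<T: Kernel, U: Kernel> {
    k1: T,
    k2: U,
}

impl<T: Kernel, U: Kernel> Kernel for KernelSum<T, U> {
    fn kernel(&self, x1: &[f64], x2: &[f64]) -> f64 {
        self.k1.kernel(x1, x2) + self.k2.kernel(x1, x2)
    }
}

// One `Add` impl per concrete kernel type; each builds a `KernelSum`.
impl<U: Kernel> Add<U> for Linear {
    type Output = KernelSum<Linear, U>;

    fn add(self, rhs: U) -> Self::Output {
        KernelSum { k1: self, k2: rhs }
    }
}

impl<U: Kernel> Add<U> for SquaredExp {
    type Output = KernelSum<SquaredExp, U>;

    fn add(self, rhs: U) -> Self::Output {
        KernelSum { k1: self, k2: rhs }
    }
}

// Sums can themselves be added to, which allows longer chains.
impl<T: Kernel, U: Kernel, V: Kernel> Add<V> for KernelSum<T, U> {
    type Output = KernelSum<KernelSum<T, U>, V>;

    fn add(self, rhs: V) -> Self::Output {
        KernelSum { k1: self, k2: rhs }
    }
}
```

With these impls, `Linear { c: 1.0 } + SquaredExp { length: 1.0 }` is itself a `Kernel`. Because `kernel` only borrows its slices, the compiler guarantees that evaluating a kernel cannot consume the input data.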

## Reusability

We use traits to define common components, e.g. Kernels.

- These components can be swapped in and out of models.
- New models can easily make use of these common components.

Note: Similar to Extensibility - but by this I mean we can move common components across different models. Of course this is possible with other languages and frameworks, but Rust helps us do this while enforcing the requirements with the compiler. For example - in other languages how can we be sure that the kernel function won't consume the input data?

## Reusability Example

#### Gradient Descent Solvers

We use Gradient Descent to minimize a cost function. All Gradient Descent Solvers implement this trait.

```rust
/// Trait for gradient descent algorithms. (Some things omitted)
pub trait OptimAlgorithm<M: Optimizable> {
    /// Return the optimized parameters using gradient optimization.
    fn optimize(&self, model: &M, ...) -> Vec<f64>;
}
```

The Optimizable trait is implemented by a model which is differentiable.

Note: Our models have a cost function - e.g. for predicting rent our cost might be the squared distance between our model's estimate and the actual value. When our cost function is differentiable we can use gradient descent. The idea is that by taking steps down the steepest slope we get closer to the minimum cost. The OptimAlgorithm trait specifies how we shall do this downward stepping towards the minimum. The Optimizable trait specifies how the derivative of the cost function will be computed.

## Creating a new model

#### With gradient descent optimization

1. Define the model.

```rust
/// Cost function is: f(x) = (x-c)^2
struct XSqModel {
    c: f64,
}
```

Note: You can think of this model as learning the value c. The bulk of the work will be in step 2 - which is where we compute the gradient of the model.

## Creating a new model

#### With gradient descent optimization

2. Implement Optimizable for the model.

```rust
impl Optimizable for XSqModel {
    /// 'params' here is 'x'
    fn compute_grad(&self, params: &[f64], ...) -> Vec<f64> {
        vec![2f64 * (params[0] - self.c)]
    }
}
```

## Creating a new model

#### With gradient descent optimization

3. Use an OptimAlgorithm to compute the optimized parameters.

```rust
let x_sq = XSqModel { c: 1.0 };
let x_start = vec![30.0];
let gd = GradientDesc::default();
let optimal = gd.optimize(&x_sq, &x_start, ...);
```

The optimal value should be close to 1.0.
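The slides elide some arguments, so here is a self-contained sketch of the same three steps that compiles on its own. The traits below are cut-down stand-ins for `Optimizable` and `OptimAlgorithm` (only the gradient and a starting point are passed around), and `FixedStepGd` with its `step_size` and `iters` fields is a hypothetical solver written for this example, not rusty-machine's `GradientDesc`. Each iteration applies the update $x \leftarrow x - \eta \cdot 2(x - c)$, where $\eta$ is the step size.

```rust
/// Cut-down stand-in for `Optimizable`: only the gradient is needed here.
pub trait Optimizable {
    fn compute_grad(&self, params: &[f64]) -> Vec<f64>;
}

/// Cut-down stand-in for `OptimAlgorithm`.
pub trait OptimAlgorithm<M: Optimizable> {
    /// Return the optimized parameters, starting from `start`.
    fn optimize(&self, model: &M, start: &[f64]) -> Vec<f64>;
}

/// Cost function is: f(x) = (x-c)^2
pub struct XSqModel {
    pub c: f64,
}

impl Optimizable for XSqModel {
    /// 'params' here is 'x'; the gradient of (x-c)^2 is 2(x-c).
    fn compute_grad(&self, params: &[f64]) -> Vec<f64> {
        vec![2.0 * (params[0] - self.c)]
    }
}

/// Hypothetical solver: plain gradient descent with a fixed step size.
pub struct FixedStepGd {
    pub step_size: f64,
    pub iters: usize,
}

impl<M: Optimizable> OptimAlgorithm<M> for FixedStepGd {
    fn optimize(&self, model: &M, start: &[f64]) -> Vec<f64> {
        let mut params = start.to_vec();
        for _ in 0..self.iters {
            let grad = model.compute_grad(&params);
            // Step against the gradient, i.e. down the slope.
            for (p, g) in params.iter_mut().zip(&grad) {
                *p -= self.step_size * g;
            }
        }
        params
    }
}
```

Calling `FixedStepGd { step_size: 0.1, iters: 100 }.optimize(&XSqModel { c: 1.0 }, &[30.0])` starts at 30.0 and walks towards the minimum at `c`. The solver never needs to know what the model is, only how to ask it for a gradient, which is what makes it reusable across models.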

## What can rusty-machine do?

- K-Means Clustering
- DBSCAN Clustering
- Linear Regression
- Logistic Regression
- Generalized Linear Models
- Neural Networks
- Gaussian Process Regression
- Support Vector Machines
- Gaussian Mixture Models
- Naive Bayes Classifiers

## Linear Algebra

- [Rulinalg](https://github.com/AtheMathmo/rulinalg)

Rusty-machine works without any external dependencies. Rulinalg provides linear algebra implemented entirely in Rust.

## Why Rulinalg?

- Ease of use

Note: Some history behind why this exists - when I started development it was unclear whether any other options would be a good fit. And of course Rust is a great choice for implementing linear algebra.

## A quick note on error handling

Rust's error handling is fantastic.

```rust
impl<T> Matrix<T> {
    pub fn inverse(&self) -> Result<Matrix<T>, Error> {
        // Fun stuff goes here
    }
}
```

Note: Using Results to communicate that a method may fail provides more freedom whilst being more explicit. I could certainly use the error handling more frequently - especially within rusty-machine (rulinalg is pretty good).
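To show what returning a `Result` buys the caller, here is a small sketch with a hypothetical fixed-size `Matrix2` type and `MatrixError` enum. Both are invented for this example and are much simpler than rulinalg's `Matrix<T>` and its error type.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum MatrixError {
    /// The matrix has (numerically) zero determinant.
    Singular,
}

impl fmt::Display for MatrixError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MatrixError::Singular => write!(f, "matrix is singular and cannot be inverted"),
        }
    }
}

impl std::error::Error for MatrixError {}

/// Hypothetical 2x2 matrix stored as rows.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix2 {
    pub data: [[f64; 2]; 2],
}

impl Matrix2 {
    /// Returns the inverse, or an error if the determinant is (nearly) zero.
    pub fn inverse(&self) -> Result<Matrix2, MatrixError> {
        let [[a, b], [c, d]] = self.data;
        let det = a * d - b * c;

        if det.abs() < f64::EPSILON {
            return Err(MatrixError::Singular);
        }

        Ok(Matrix2 {
            data: [[d / det, -b / det], [-c / det, a / det]],
        })
    }
}
```

The caller has to decide what to do with the failure case, for example by matching on the result or propagating it with `?`, rather than discovering a panic at runtime.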

## What does Rulinalg do?

- Data structures (`Matrix`, `Vector`)
- Basic operators (with in-place allocation where possible)
- Decompositions (Inverse, Eigendecomp, SVD, etc.)
- And more...

## Why is Rust a good choice?

- Trait system is amazing.
- Error handling is amazing.
- Performance focused code*.

\* Rusty-machine needs some work, but the future looks bright!

Note: Historically we prototype in high level languages and then rewrite performance critical parts.
- Traits - Clean, extensible, homogeneous API.
- Performance - A bold claim right now... But the potential is there for us to prototype and achieve high performance code in the same environment.
- Insights - More from a developer's point of view; it is useful to have to think about how the model should be structured. What data does it need to own, which parts can be made modular without adding unneeded complexity, etc.

## Why is Rust a good choice?

Most importantly for me - safe control over memory.

Note: Specifically with the ownership/lifetimes mechanic. We choose when a model needs ownership and when to allocate new memory for operations. These are things that are much harder to achieve in other languages as pleasant-to-use as Rust.

## When would you use rusty-machine?

At the moment - experimentation, non-performance critical applications.

In the future - quick, safe and powerful modeling.

Note: For now it would be unwise to use this for anything serious, except maybe if the benefits of Rust outweigh performance and accuracy. In the future, rusty-machine will try to enable rapid prototyping that can be easily extended into a finished product.

## Rust and ML in general

Note: Rust is well poised to make an impact in the machine learning space. Its excellent tooling and modern design are valuable for ML - and the benefit of performance with minimal effort (once you're past wrestling with the borrow checker) is huge. 'Exploratory analysis' is somewhat harder in Rust than in, say, Python. But I think in the future Rust could definitely hold its own.

## What's next?

- Optimizing and stabilizing existing models.
- Providing optional use of BLAS/LAPACK/CUDA/etc.
- Addressing lack of tooling.

Note: By lack of tooling I mean for data handling mostly.

## What would I like to see from Rust?

- Specialization
- Growth of Float/Complex generics
- Continued effort from community

Note: I really like the direction of the language so far and look forward to what will follow. The community is great, as I'm sure most would confirm. That drive and enthusiasm will create great things.

## Summary

- Machine learning (done quickly)
- Rusty-machine
- Rulinalg

## Contributors

| | | |
| --- | --- | --- |
| [zackmdavis](https://github.com/zackmdavis) | [DarkDrek](https://github.com/DarkDrek) | [tafia](https://github.com/tafia) |
| [ic](https://github.com/ic) | [rrichardson](https://github.com/rrichardson) | [vishalsodani](https://github.com/vishalsodani) |
| [raulsi](https://github.com/raulsi) | [danlrobertson](https://github.com/danlrobertson) | [brendan-rius](https://github.com/brendan-rius) |
| [andrewcsmith](https://github.com/andrewcsmith) | | |

## Thanks!

#### Some Links

- [Rusty-machine](https://github.com/AtheMathmo/rusty-machine)
- [My Blog](http://athemathmo.github.io/)