Machine learning in the era of autonomy

Machine learning lies at the heart of the AI revolution. It is the technology behind image recognition, virtual assistants, and autonomous vehicles. Yet creating machine learning models is a labor-intensive, human-driven activity. Here we look at how unified AI platforms like Sonasoft NuGene are challenging this status quo.

Machine learning is the technology behind almost all artificial intelligence. Even the most advanced deep learning systems, such as those built by Google DeepMind, have their roots in machine learning. It powers automatic facial recognition, underpins the technologies that make virtual assistants work, and is used by banks for credit scoring. Social networks even use it to decide which stories you should be shown in your newsfeed. So, what exactly is machine learning?

Machine learning basics

Machine learning involves teaching a computer to identify patterns in data. Then, when the computer sees new data, it is able to draw inferences from this. Typically, we divide machine learning into three categories.

Supervised learning

Here, you take a set of data that has been processed to extract the important features. These features become the inputs to the machine learning model. The aim is to create a model that predicts the labels you are interested in. To do this, you start by training the model with a set of labeled data. The model itself is an algorithm that takes the features as its inputs and outputs labels. Once the model has been trained, you take a second, smaller set of labeled data and validate the model. Finally, you evaluate the accuracy of the model on a held-out test dataset: the model sees only the features, and you compare its predictions against the known labels.
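The train, validate, and test sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the model is a simple k-nearest-neighbors classifier chosen for brevity, and the names make_dataset, knn_predict, and accuracy are hypothetical helpers, not part of any platform discussed here.

```python
import random
from collections import Counter

def make_dataset(n, seed):
    """Synthetic labeled data: two features, label 1 when their sum exceeds 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        rows.append(((x, y), int(x + y > 1.0)))
    return rows

def knn_predict(train, features, k):
    """Label a point by majority vote among its k nearest training points."""
    def squared_distance(row):
        (x, y), _ = row
        return (x - features[0]) ** 2 + (y - features[1]) ** 2
    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of rows whose predicted label matches the known label."""
    hits = sum(knn_predict(train, features, k) == label for features, label in rows)
    return hits / len(rows)

data = make_dataset(300, seed=0)
train, validation, test = data[:200], data[200:250], data[250:]

# Use the validation set to choose k, then score once on the held-out test set.
best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train, validation, k))
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Note that the test rows still carry labels. The model never sees them during training or tuning, and they are only used at the end to score its predictions.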

Unsupervised learning

The primary difference between supervised and unsupervised learning is that you don’t need labeled data for training. This means unsupervised learning works where you have data whose structure you don’t yet know. Typically, you use it to identify patterns of interest in previously unseen data. This is what the well-known k-means clustering algorithm does: it groups similar data points together without being told what the groups are. The benefit here is you don’t need a large existing set of labeled data. Of course, one drawback is that unsupervised learning may well find patterns that are irrelevant or even spurious.
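To make the k-means idea concrete, here is a minimal sketch of the standard algorithm (often called Lloyd’s algorithm) on two-dimensional points. The data is synthetic and the function name kmeans is an illustrative choice. Real projects would normally use a library implementation.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Synthetic, unlabeled data: two blobs, one near (0, 0) and one near (5, 5).
rng = random.Random(1)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]

centroids, assignments = kmeans(points, k=2)
print(sorted(centroids))
```

No labels are supplied anywhere. The algorithm only ever sees the coordinates, and you still have to decide afterwards whether the clusters it finds mean anything.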

Reinforcement learning

Reinforcement learning is slightly different. Here, the algorithm changes and evolves in response to reactions from its environment. Effectively, you are teaching the computer by trial and error. Imagine training a dog to “stay”. You tell the dog to stay. The dog’s brain has no idea what this means, so he randomly stays or moves. Each time he correctly stays you reward him with a treat. If he moves he doesn’t get a treat. Over time, he learns that if he hears you say “stay” and doesn’t move, then he will get rewarded.
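The dog example maps onto a very small trial-and-error learner. The sketch below assumes an epsilon-greedy strategy with a simple running value estimate per action. The function name train_dog and the parameter values are illustrative, and real reinforcement learning problems involve many states and delayed rewards.

```python
import random

def train_dog(trials=200, epsilon=0.1, learning_rate=0.2, seed=0):
    """Learn by trial and error which action earns a treat after hearing "stay"."""
    rng = random.Random(seed)
    value = {"stay": 0.0, "move": 0.0}  # estimated reward for each action
    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: occasionally try an action at random.
            action = rng.choice(["stay", "move"])
        else:
            # Exploit: pick the action that looks best so far, ties broken at random.
            best = max(value.values())
            action = rng.choice([a for a, v in value.items() if v == best])
        reward = 1.0 if action == "stay" else 0.0  # a treat only for staying
        # Nudge the estimate for the chosen action toward the observed reward.
        value[action] += learning_rate * (reward - value[action])
    return value

value = train_dog()
print(value)
```

After enough trials the estimate for "stay" climbs toward 1 while "move" remains at 0, so the learner almost always stays.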

Creating usable machine learning models

Many practical applications of machine learning combine more than one approach. For instance, semi-supervised learning addresses one of the key issues in supervised learning: what to do if you only have a small amount of labeled data and cannot create an accurate model from it alone. In that case, you can use the labeled data to guide and improve a model that learns mostly from unlabeled data. You can also combine supervised learning with reinforcement learning to improve the overall performance of the model.
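One common way to combine a little labeled data with a lot of unlabeled data is self-training, where the model labels the unlabeled points it is most confident about and then retrains on them. The sketch below assumes that technique, a one-dimensional feature, and a nearest-centroid model. The names class_means and self_train and the margin value are hypothetical.

```python
import random

def class_means(labeled):
    """Mean feature value per label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def self_train(labeled, unlabeled, rounds=5, margin=1.0):
    """Repeatedly pseudo-label the unlabeled points the model is most sure about."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        means = class_means(labeled)
        still_unsure = []
        for x in pool:
            ranked = sorted(means, key=lambda label: abs(x - means[label]))
            nearest, runner_up = ranked[0], ranked[1]
            # Confident only if clearly closer to one class mean than to the other.
            if abs(x - means[runner_up]) - abs(x - means[nearest]) >= margin:
                labeled.append((x, nearest))
            else:
                still_unsure.append(x)
        pool = still_unsure
    return class_means(labeled), pool

# Synthetic data: only two labeled examples, plus 200 unlabeled ones.
rng = random.Random(0)
unlabeled = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(6, 1) for _ in range(100)]
labeled = [(0.5, "a"), (5.0, "b")]

means, leftover = self_train(labeled, unlabeled)
print(means, len(leftover))
```

Starting from just two labeled points, the class means drift toward the centers of the two unlabeled groups. The risk is that a wrong early pseudo-label gets reinforced, which is why the margin check only accepts confident cases.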

The human element

Creating machine learning models is very human-intensive. The following is a typical timeline for creating a fully validated supervised learning model.

1. Understand the problem. Define the exact problem you need to solve and check if you have suitable data to work with.
2. Obtain the data. Get the raw data and convert it into a suitable form for analysis.
3. Move the data to the cloud. Although it is possible to run AI models locally on GPU-enabled laptops, this is inefficient. So, you really need to migrate the data to the cloud where you have access to (almost) unlimited compute power.
4. Pre-process the data. This includes cleaning, filtering, and manually labeling the data for supervised learning. This is one of the most labor-intensive stages of the process.
5. Choose an ML model. Your data scientist needs to use her experience to select a suitable ML model from the thousands that are now available.
6. Train and verify the model. You can now go through the process of training and verifying your model as described above.
7. Validate the model. Finally, you have a trained model and can test whether it is suitable for the job needed.

Typically, you won’t choose the perfect model the first time around. So, it is common to have to repeat steps 5 to 7 several times. Overall, this process takes even an experienced data scientist months to complete. Moreover, you can’t easily parallelize most steps.

Labeling data

One aspect that can be parallelized is labeling the data. Indeed, you can even crowd-source your labeling efforts. For years, users have been unwittingly helping Google’s AI efforts through the use of reCAPTCHA. All those pictures of signs whose text you had to type helped teach Google Maps to read road signs in Street View. All those images you clicked that show traffic lights or pedestrian crossings help to improve self-driving cars.

Moving beyond the human era

Unified AI platforms take things beyond the human era. They make the creation of machine learning models a much less manual process. The simpler platforms just automate some of the steps for you, such as the choosing, creation and validation of the models. But fortunately, there are now more advanced platforms available that are largely autonomous.

Autonomous | ȯ-ˈtä-nə-məs | adjective

undertaken or carried on without outside control (Merriam-Webster definition)

Sonasoft NuGene is one of the most advanced unified AI platforms available on the market. NuGene takes over the entire process of creating machine learning models from ingesting raw data to creating a fully validated and tested model. There are a few features that make NuGene unique.

Time

NuGene understands the concept of time. Most machine learning models view data as essentially Cartesian, with a set of features on the X-axis and a set of labels on the Y-axis. This means that the models are very poor at handling time-series data. For instance, imagine a model that predicts loan defaults based on historical data. If the model only sees the data with no time context, it will be unaware that in 2008 there was a massive economic crash that triggered a spike in defaults.

Data

Ideally, NuGene takes in raw data. It can accept data in almost any format, including video and audio files. This allows it to create extremely rich models. These models are also free from the unintended bias that is often added in the data pre-processing stage.

Causality

NuGene looks for interesting patterns and correlations in your data. It then generates hypotheses to try to explain them and, uniquely, automatically tests those hypotheses for causality.

Modeling

Once NuGene has found a strong correlation, it has access to a library of thousands of possible ML models. It will create many different models until it finds the one that performs best. It is even able to use boosting and other techniques to combine models.
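The general idea of trying several candidate models and keeping the one that scores best on validation data can be sketched as follows. This is a generic illustration, not NuGene’s implementation: the three toy models, the synthetic data, and every name in the code are assumptions, and it shows selection only, not boosting or other ways of combining models.

```python
import random

def fit_mean(train):
    """Baseline: always predict the average target value."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least-squares straight line through one-dimensional data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in train)
    variance = sum((x - mean_x) ** 2 for x, _ in train)
    slope = covariance / variance
    return lambda x: mean_y + slope * (x - mean_x)

def fit_nearest(train):
    """Predict the target of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def mse(model, rows):
    """Mean squared error of a fitted model on (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data: a noisy straight line.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(120)]
data = [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in xs]
train, validation = data[:80], data[80:]

# Fit every candidate on the training set and score it on the validation set.
candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
scores = {name: mse(fit(train), validation) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
print(scores, "best:", best_name)
```

An automated platform would run the same kind of loop over a far larger library of models, which is what turns repeated manual model selection into a search problem.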

Autonomous ML

The upshot of all this is a completely autonomous AI platform. NuGene can create intelligent agents without any human intervention. These agents can perform a whole variety of tasks for you. You can use them to forecast demand so you can make your supply chain more efficient. Or you might want to reduce the costs of preventive maintenance by correctly predicting machine failures. NuGene can also create chatbots to help streamline your customer support systems.