Over the last five years, machine learning has become the go-to technology for solving a wide range of tasks. Machine learning algorithms are used in applications such as spam filtering, network intrusion detection, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task.

The classical ML approach works as shown in the picture: given some amount of data, a data scientist performs feature engineering (generating new features from the data and selecting the most important ones), then trains an ML algorithm on a training set and evaluates it on a test set. This work is repeated until we get good enough accuracy for the task.
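The manual loop described above can be sketched with scikit-learn (the dataset here is a stand-in; in a real project the feature engineering step is hand-crafted by the data scientist):

```python
# A minimal sketch of the classical workflow:
# split the data, train a model, evaluate on the held-out test set,
# and repeat with better features until accuracy is acceptable.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Manual "feature engineering" would happen here, on X.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")
# Not accurate enough? Go back, engineer better features, and retrain.
```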

The AutoML approach is different: it does not need a human in the loop, because the program performs feature engineering and trains the model automatically. AutoML is also domain-agnostic: we can run the same algorithm on any type of data, such as credit scoring, sales and stock forecasting, text classification, and others.

AutoML saves money and human resources because the solution scales to many tasks: you can apply a single algorithm to any of them. Of course, a human expert can achieve better accuracy, but such a solution is less scalable, takes more time, and requires a specialist analyst.

AutoML in production

Many people believe that AutoML is just an area of scientific research and experiments that has nothing in common with production. But that is not true. AutoML is an evolving technology area, and many companies are creating their own AutoML solutions and tools that you can use in your projects right now.

Let’s look at some of them.

H2O AutoML

H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.

Productionalization is the main difference between H2O and other frameworks: you can develop your model and features in this framework and then easily integrate it into a production environment such as Kafka, Spark, or Storm. This way you can deploy the model on clusters to process a big data flow.
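One mechanism behind this is H2O's MOJO export (Model Object, Optimized): a trained model is saved as a self-contained scoring artifact that JVM-based systems such as Spark jobs or Kafka consumers can embed without a running H2O cluster. A sketch, assuming `model` is any trained H2O model:

```python
def export_for_production(model, out_dir: str = "."):
    """Sketch: export a trained H2O model as a MOJO artifact.

    `model` is assumed to be a trained H2O model (for example,
    the leader from an AutoML run). The returned path points to
    a .zip that JVM applications can load for scoring, with no
    live H2O cluster required.
    """
    mojo_path = model.download_mojo(path=out_dir, get_genmodel_jar=True)
    return mojo_path
```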

H2O provides API bindings for different languages, including Java, Scala, R, and Python.

The REST API deserves special attention: it means you can run H2O as a web service for ML (for example, in Docker) and make HTTP requests to it from other microservices.
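For instance, a sketch of such a request with the `requests` library, assuming an H2O instance listening on its default port 54321 (the `/3/Cloud` endpoint reports cluster status as JSON):

```python
import requests


def cluster_status(base_url: str = "http://localhost:54321"):
    """Query a running H2O instance over its REST API.

    Assumes H2O is listening on the default port; /3/Cloud
    returns cluster health information as JSON.
    """
    resp = requests.get(f"{base_url}/3/Cloud")
    resp.raise_for_status()
    return resp.json()
```

Any microservice that can speak HTTP can call endpoints like this, regardless of what language it is written in.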

H2O integrates with different storage platforms, so you can easily connect a SQL database, the Hadoop Distributed File System (HDFS), or S3 storage to your ML pipeline.