Create a predictive model with the h2o package.

H2o is a fantastic open source machine learning platform with many different algorithms. There is Graphical user interface, a Python interface and an R interface. Suppose you want to create a predictive model, and you are lazy then just run automl.

Lets say, we have both train and test data sets, and the first column is the target and the columns 2 until 50 are input features. Then we can use the following code in R

out = h2o.automl( x = 2:50, y = 1, training_frame = TrainData, validation_frame = TestData, max_runtime_secs = 1800 )

According the help documentation: The current version of automl trains and cross-validates a Random Forest, an Extremely-Randomized Forest, a random grid of Gradient Boosting Machines (GBMs), a random grid of Deep Neural Nets, and then trains a Stacked Ensemble using all of the models.

After a time period that you can set, automl will terminate and has literally tried hundreds of models. You can get the top models, ranked by a certain performance metric. If you go for the champion just use:

championModel = out@leaderchampionModel

Now the championModel object can be used to score / predict new data with the simple call:

predict(championModel, newdata)

The champion model now lives on my laptop, it’s hard if not impossible to use this model in production. Someone or some process that wants a model score should not depend on you and your laptop!

Instead, the model should be available on a server where model prediction requests can be handled 24/7. This is where plumber comes in handy. First save the champion model to disk so that we can use it later.

h2o.saveModel(championModel, path = "myChampionmodel")

The plumber package, bring your model in production.

In a very simple and concise way, the plumber package allows users to expose existing R code as an API service available to others on the web or intranet. Now suppose you have a saved h2o model with three input features, how can we create an API from it? You decorate your R scoring code with special comments, as shown in the example code below.

# This is a Plumber API. In RStudio 1.2 or newer you can run the API by # clicking the 'Run API' button above. library(plumber)library(h2o) h2o.init() mymodel = h2o.loadModel("mySavedModel") #* @apiTitle My model API engine #### expose my model ################################# #* Return the prediction given three input features #* @param input1 description for inp1 #* @param input2 description for inp2 #* @param input3 description for inp3 #* @post /mypredictivemodel function( input1, input2, input3){ scoredata = as.h2o( data.frame(input1, input2, input3 ) ) as.data.frame(predict(mymodel, scoredata)) }

To create and start the API service, you have to put the above code in a file and call the following code in R.

rapi <- plumber::plumb("plumber.R") # Where 'plumber.R' is the location of the code file shown above rapi$run(port=8000)

The output you see looks like:

Starting server to listen on port 8000

Running the swagger UI at http://127.0.0.1:8000/__swagger__/

And if you go to the swagger UI you can test the API in a web interface where you can enter values for the three input parameters.

What about Performance?

At first glance I thought there might be quit some overhead, calling the h2o library, loading the h2o predictive model and then using the h2o predict function to calculate a prediction given the input features.

But I think it is workable. I installed R, h2o and plumber on a small n1-standard-2 Linux server on the Google Cloud Platform. One API call via plumber to calculate a prediction with a h2o random forest model with 100 trees took around 0.3 seconds to finish.

There is much more that plumber has to offer, see the full documentation.

Cheers, Longhow.