Predicting the Price of Used Cars using Azure SQL ML.



Date created: April 20, 2019.



In this article, we will use Azure SQL Database Machine Learning Services to predict the price of used motor cars

For the purpose of this article, I used the "Automobile price data (Raw)" sample data, although I added some calculated columns to use them to train the model and later make predictions. 80 percent of the sample data was separated on a table named MotorCar_TrainingTable and will be used on this article to train the model, the rest of the sample data was separated on a table named MotorCar_TestingTable to test the model.



For Azure SQL Database Machine Learning we don't need to enable scripts via system stored procedure sp_configure because once you are signed to enjoy the preview, machine learning is enabled for your Azure SQL database.

Since ML is already enable on the database, we just have to proceed to train the model. As you can see on below image we want to predict the price of used cars based on the number of doors they have, number of cylinders, their city mileage, their highway mileage, the engine size and the horsepower. We are using the training table as input, we store the resultant model on a table named MotorCarModels, and we use the rxLinMod to compute the linear regression model.









Below we are using the rxPredict function to predict the price (predicted_price) of used cars data stored on the MotorCar_TestingTable table. This function takes the rxLinMod object we stored on the MotorCarModels table in the previous step, the second argument is the input data set which in this case is the cars data from the MotorCar_TestingTable, and an output data set as its third argument.

The predicted_price column shows the predicted price for each car (row) contained on the test table.













As you can see on below image, you don't necessarily need to create an R script to make price predictions, as we can also use the Transact-SQL Predict function to predict motor car prices. The Predict function will need as parameters the object we stored on the MotorCarModels and the cars data stored on MotorCar_TestingTable as input data set. The WITH clause is used to specify the schema of the output returned by the PREDICT function.









As explained before, the predicted_price column shows the predicted price for each car (row) contained on the test table.