Market basket analysis using machine learning By Pablo Martin, Artelnics. In this article, we see how to perform a market basket analysis using R and Neural Designer. R is a free programming language for statistical computing and graphics widely used among the data science community for performing data analysis. Neural Designer is a software tool for data analytics based on neural networks, one of the main areas of artificial intelligence research. The next picture shows the structure of our example. Our objective is to analyze a dataset from a grocery store to create a recommendation system. This system will be capable of generating accurate recommendations about products that the user may have an interest in. The type of learning that we use for this example is called "unsupervised learning". It is a type of machine learning technique used to find patterns in data when there are no target values. In our example, the output values will be the percentages that the products have to be in the same shopping basket.

Data The dataset selected for our example consists of 9835 transactions of a grocery store. Each transaction can be a single product or several products. From a first look at the database, we cannot know the number of items in the store. Also, the format of the database is difficult to analyze. That is why we need to make a preliminary treatment of the information we have collected before analyzing it.

Data preparation using R As we have said before, it is necessary to pre-process the information that we have in our database, and for that purpose, we use R. Using R, we convert the database into a binary matrix, which is the format we need to be able to perform the advanced data analysis with Neural Designer. The following script is the one we used to make this change. # Package necessary for transaction analysis install.packages("arules") require(arules) # Load data Shopping_Cart <- read.transactions("groceries.csv", sep=",") # Database checking summary(Shopping_Cart) itemFrequencyPlot(Shopping_Cart, topN=20) # Convert data to a numeric matrix as(Shopping_Cart, "matrix") as(Shopping_Cart, "matrix")*1 # Save results to file write.csv((as(Shopping_Cart, "matrix")*1), file = "Shopping_Cart.csv", row.names=FALSE) The first step of our script is to load the package needed for R to read transactions, the "arules" package. Once we have loaded "arules", we execute it and perform the first operation: loading the data. The next step is to check the data to see if they are correctly loaded, for which we use the "summary" command. We also paint a bar graph with the 20 products that have been bought the most. The following image shows the graph with these top 20 products. As we can see in the chart above, the product that is bought the most is whole milk. Once the model has been tested, we have to export it to CSV so that we can analyze it with Neural Designer. For that purpose, we convert the data into a binary matrix and use the command "write", we export it to CSV. The result is a data set that contains a variable for each product and prints 1 if the product has been purchased or 0 if not. This data set is ready to start its analysis with Neural Designer.

Advanced Analytics using Neural Designer When data has already been pre-processed, it is time to add it to Neural Designer to make our recommendation system. Neural Designer provides an easy way of analyzing and deploying advanced analytics models. The next picture shows the data set tab in Neural Designer. Now that the data is loaded into the software, we can check the basic statistics of each of the variables. The table below shows the minimums, maximums, means, and standard deviations of the top 20 variables in this data set. To develop our recommendation system, we use neural networks, the machine learning technique that Neural Designer implements. The neural network defines the predictive model as a multidimensional function containing adjustable parameters. The first step to create our recommendation system is to choose a neural network architecture that represents the classification function. Because the neuronal network of this problem is very complex, 169 inputs 25 neurons in the hidden layer and 169 outputs, the following image represents what would be a neural network if the study was with the 20 most influence variables. The next step to carry out is to train the neural network mentioned above. For this purpose, we apply the Quasi-Newton method to obtain a good model that will recommend the shopping basket more suitable for the customer. To know the types of algorithms that can be used to train a neural network, you can read the article 5 algorithms to train a neural network.

Deploy the recommendation system Once our model is trained and ready to use, we have to export it from Neural Designer to R. We used the task "Export to R" of Neural Designer to obtain our model as a formula of R. The following image shows the "Export to R" task in the Neural Designer task manager. Now that we have obtained our recommendation system as an R script, we run R studio to check it. The next picture shows our script into R studio. If you want to download the R script to try it yourself, click here. Finally, we analyze an example of a shopping cart to check what recommendations our system would make. For instance, we analyze the shopping basket of a customer that is made up of citrus fruit, frozen meat, newspaper, other vegetables, and whole milk. Applying our R model for that customer, it recommends the following products: Instant food products, yogurt, buttermilk, frozen fish, red/blush wine, pip fruit, and butter.