Artificial intelligence (AI), machine learning (ML) and data mining are hot topics in industry news, with many companies and universities striving to improve both our work and personal lives through these technologies. We thought it would be wise to spend the next 3 weeks exploring the different terms we hear thrown around and diving into their meanings a bit more.

First, let’s start by looking at the difference between data mining and machine learning. Although the two are closely related and overlap considerably, they are distinct and have different applications.

Data Mining:

The goal of data mining is to discover previously unseen patterns and relationships in large datasets and derive business value from them. It focuses on uncovering relationships between two or more variables in your dataset and extracting insights. These insights include mapping the data into information that is directly relevant to a particular use case, such as predicting outcomes from incoming events and prescribing actions.

Data is reviewed for patterns, and criteria are then applied to determine the most frequent and important relationships. Several analysis techniques can be used to accomplish this goal, such as clustering, classification, and sequence analysis. Data mining typically uses batched information to reveal a new insight at a particular point in time rather than on an ongoing basis. For example, it can be used to identify a sales trend or buying pattern, improve a production process, or predict the adoption of a new product.
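To make the idea of mining a batch for frequent relationships concrete, here is a minimal sketch in Python. The basket data, the `frequent_pairs` function and the `min_support` cut-off are all hypothetical, chosen only for illustration; real data mining tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical batch of purchase records; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs bought together in at least `min_support` of baskets.

    The result maps each qualifying pair to the fraction of baskets
    that contain both items.
    """
    counts = Counter()
    for basket in baskets:
        # Sort the items so a pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# One pass over the whole batch yields a point-in-time insight.
patterns = frequent_pairs(transactions, min_support=0.5)
for pair, support in patterns.items():
    print(pair, support)
```

The analysis runs once over a fixed batch, which matches the point-in-time nature of data mining described above. To refresh the insight you would rerun it on a new batch.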

Machine Learning:

Machine learning and data mining use many of the same key algorithms to discover patterns in the data. However, their processes, and consequently their utility, differ. Unlike data mining, machine learning requires the machine to learn the parameters of its models automatically from the data. Machine learning uses self-learning algorithms to improve its performance at a task with experience over time. It can be used to reveal insights and provide feedback in near real-time.

Generally speaking, the larger the datasets, the better the accuracy and performance. Learning can be by batch wherein the models are trained once, or continuous wherein the models evolve as more data is ingested with time. In the latter mode, based on the new data and feedback received, the machine constantly improves itself and the results increase in accuracy with time. The machine does this by determining relationships within the data, and computing parameters for analytical models which apply those relationships to the use case at hand.
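The difference between batch and continuous learning can be sketched with a deliberately tiny model. The example below assumes a one-parameter model, y = w * x, fitted by least squares. The names `batch_slope` and `OnlineSlope` and the data points are invented for illustration only.

```python
def batch_slope(points):
    """Batch learning: compute w once from a fixed dataset (least squares for y = w * x)."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / sum_xx


class OnlineSlope:
    """Continuous learning: the same parameter, updated as each observation arrives."""

    def __init__(self):
        self.sum_xy = 0.0
        self.sum_xx = 0.0

    def update(self, x, y):
        # Fold the new observation into the running totals.
        self.sum_xy += x * y
        self.sum_xx += x * x

    @property
    def w(self):
        # Until any data has been seen, fall back to 0.0.
        return self.sum_xy / self.sum_xx if self.sum_xx else 0.0


# Hypothetical observations: an initial history, then data that arrives later.
history = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
new_data = [(4.0, 8.1), (5.0, 9.8)]

w_batch = batch_slope(history)  # trained once; ignores anything that arrives later

model = OnlineSlope()
for x, y in history + new_data:
    model.update(x, y)  # the parameter evolves with every new data point

print(w_batch, model.w)
```

In this sketch the continuously updated model ends up with the same parameter a batch fit over all the data would produce, without ever retraining from scratch.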

Machine learning, for example, can be used to continuously monitor the performance of equipment and events and automatically determine what the norm is and when failures are likely to occur. When new datasets are introduced or trends change, machine learning incorporates that information to determine the new norm without people needing to go back in and reprogram baselines or key performance indicators. This ability to learn and adapt makes it well suited to improving ongoing processes, marketing campaigns and customer service.
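As a rough illustration of a baseline that adapts on its own, the sketch below keeps a sliding window of recent readings and flags values that fall far from the window's mean. The `AdaptiveBaseline` class, its window size and its three-standard-deviation threshold are assumptions made for this example, not a description of any particular product.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveBaseline:
    """Learns what "normal" looks like from the most recent readings."""

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.readings = deque(maxlen=window)  # old readings drop off automatically
        self.threshold = threshold
        self.min_history = min_history

    def observe(self, value):
        """Return True if `value` is anomalous, then learn from it either way."""
        is_anomaly = False
        if len(self.readings) >= self.min_history:
            mu = mean(self.readings)
            sigma = pstdev(self.readings)
            # Anomalous if more than `threshold` standard deviations from the norm.
            is_anomaly = sigma > 0 and abs(value - mu) > self.threshold * sigma
        self.readings.append(value)
        return is_anomaly


# Hypothetical sensor readings: steady around 10, then the norm shifts to about 25.
monitor = AdaptiveBaseline(window=5)
for reading in [10.0, 10.2, 9.8, 10.1, 9.9, 25.0, 25.1, 24.9, 25.0, 25.2, 24.8, 25.1]:
    if monitor.observe(reading):
        print("anomaly:", reading)
```

Once enough readings at the new level have filled the window, values near 25 stop being flagged. Nobody had to reset the baseline by hand.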

The Importance of Your Data:

One of the most important things in machine learning is the data on which you train your machine. Without statistically representative training data, machine learning algorithms are very limited and will not give you the accurate results desired. When analyzing multiple, disparate sources of data, this becomes even trickier. The data needs to be brought into the machine in its native format and then normalized into a standard format that the machine can use and understand. The datasets also need to have the volume necessary for the algorithms to automatically determine what is normal and what is an anomaly. If the dataset does not have sufficient instances of each event, the relative frequencies of events will not be determinable and you will not get a true picture of what is normal and what is not.
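The two steps above, normalizing disparate sources into one format and then checking that each event occurs often enough, might look like the following sketch. Both source formats, the field names and the `min_count` value are hypothetical.

```python
from collections import Counter


def normalize_csv_row(row):
    """Source A (hypothetical): text rows of the form 'timestamp, EVENT_NAME'."""
    timestamp, event = row.split(",")
    return {"time": timestamp.strip(), "event": event.strip().lower()}


def normalize_record(record):
    """Source B (hypothetical): dicts with 'ts' and 'type' keys."""
    return {"time": record["ts"], "event": record["type"].strip().lower()}


def underrepresented_events(records, min_count):
    """List event types with fewer than `min_count` instances in the data."""
    counts = Counter(r["event"] for r in records)
    return sorted(event for event, n in counts.items() if n < min_count)


# Two made-up sources that describe the same kind of events differently.
source_a = ["2024-01-01T00:00, OK", "2024-01-01T00:05, OK", "2024-01-01T00:10, FAULT"]
source_b = [
    {"ts": "2024-01-01T00:15", "type": "ok"},
    {"ts": "2024-01-01T00:20", "type": "Overheat"},
]

# Bring everything into one standard format before any learning happens.
records = [normalize_csv_row(r) for r in source_a]
records += [normalize_record(r) for r in source_b]

# Events seen too rarely to estimate how frequent they normally are.
too_rare = underrepresented_events(records, min_count=2)
print(too_rare)
```

A check like this does not fix a thin dataset, but it does show which events need more examples before the machine's picture of normal can be trusted.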

Illustration of Data Mining vs. Machine Learning: