You might think the history of data mining started very recently, since it is commonly associated with new technology. However, data mining is a discipline with a long history. It starts with early methods such as Bayes' theorem (1700s) and regression analysis (1800s), which were mostly used to identify patterns in data. In this article we won't start with "Once upon a time…"; instead we will focus on recent history and studies. You can, however, briefly see the major milestones of data mining history in the chronological table below:

http://visual.ly/

Data mining is the process of analyzing large data sets (Big Data) from different perspectives and uncovering correlations and patterns in order to summarize them into useful information. Nowadays it draws on many fields, such as artificial intelligence, statistics, data science, database theory and machine learning.

Recent history

The increasing power of technology and the growing complexity of data sets have led data mining to evolve from static data delivery to more dynamic and proactive information delivery; from tapes and disks to advanced algorithms and massive databases (see the table below). In the late 1980s the term "data mining" began to be known and used within the research community by statisticians, data analysts, and the management information systems (MIS) community.

Source: http://www.thearling.com/text/dmwhite/dmwhite.htm

By the early 1990s, data mining was recognized as a sub-process, or a step, within a larger process called Knowledge Discovery in Databases (KDD), which is what actually made it "the popular guy". The most commonly used definition of KDD is "The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, 1996).

The sub-processes that form part of the KDD process are:

1. Understanding the application and identifying the goal of the KDD process
2. Creating a target data set
3. Data cleaning and pre-processing
4. Matching the goals of the KDD process (step 1) to a particular data mining method
5. Research analysis and hypothesis selection
6. Data mining: searching for patterns of interest in a particular form, including classification rules, regression, and clustering
7. Interpreting mined patterns
8. Acting on the discovered analysis
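To make the KDD steps a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the field names and the helper functions are hypothetical, and a simple least-squares regression stands in for the "data mining" step. It is not taken from Fayyad or from any particular tool.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, visits_per_month, monthly_spend).
# Assumed goal: understand how visits relate to spend.
raw_records = [
    ("c1", 2, 40.0),
    ("c2", 4, 80.0),
    ("c3", None, 55.0),  # missing value, removed during cleaning
    ("c4", 6, 120.0),
    ("c5", 8, 160.0),
]


def create_target_set(records):
    """Creating a target data set: keep only the fields relevant to the goal."""
    return [(visits, spend) for _, visits, spend in records]


def clean(rows):
    """Data cleaning and pre-processing: drop rows with missing values."""
    return [row for row in rows if None not in row]


def mine_linear_pattern(rows):
    """Data mining: fit spend = slope * visits + intercept by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def interpret(slope):
    """Interpreting mined patterns: turn the fitted slope into a statement."""
    return f"Each extra visit is associated with about {slope:.2f} more in monthly spend."


target = create_target_set(raw_records)
cleaned = clean(target)
slope, intercept = mine_linear_pattern(cleaned)
print(interpret(slope))
```

The last KDD step, acting on the discovered analysis, is deliberately left out: it is a business decision rather than something a script does.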

The popularity of data mining escalated notably in the 1990s, with the help of dedicated conferences and the fast growth in technology, data storage capabilities and computer processing speeds. It also became possible for organizations to keep data in computer-readable form, and processing large volumes of data on desktop machines was no longer far from reality.

By the end of the 1990s, data mining was already a well-known technique used by organizations, helped along by the introduction of customer loyalty cards. These opened a big door, allowing organizations to record customer purchases and data; the resulting data could then be mined to identify customer purchasing patterns. The popularity of data mining has continued to grow rapidly over the last decade.
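As a rough illustration of what "identifying customer purchasing patterns" can mean, the sketch below counts which pairs of items are bought together often. The baskets, the item names and the `frequent_pairs` helper are hypothetical, and simple pair counting is used here as a stand-in; real loyalty-card analyses typically use more elaborate methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: one set of items per shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Raising `min_support` keeps only the strongest co-purchases; lowering it surfaces rarer combinations at the cost of more noise.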

The evolution of data mining applications

The main focus of data mining used to be tabular data; however, with evolving technology and changing needs, new kinds of sources came to be mined!

Text Mining: Still a popular data mining activity, it categorizes or clusters large document collections such as news articles or web pages. Another application is opinion mining, where the techniques are applied to obtain useful information from questionnaire-style data.

Image Mining: In image mining, mining techniques are applied to images (2D and 3D).

Graph Mining: It grew out of frequent pattern mining and focuses on frequently occurring sub-graphs. A popular extension of graph mining is social network mining.

Data mining has become very popular over the last two decades as a discipline in its own right. Data mining applications are used in business, government, and science, to name just a few fields. Starting from text mining, it has evolved a lot, and it will be very interesting to watch how it develops as different kinds of data (e.g. spatial data and various sources of multimedia data) come into use in the future.