Introduction to data mining techniques:

Data mining techniques are a set of algorithms intended to find hidden knowledge in data. Which technique to use depends entirely on the problem we are trying to solve. Some of the popular data mining techniques are classification algorithms, prediction analysis algorithms, and clustering techniques. In this introductory post, we build a basic understanding of the term data mining through a toy example. You can learn more in the data mining beginners guide.

Data Mining History:

In the 1960s, statisticians used the terms “Data Fishing” or “Data Dredging” to refer to what they considered the bad practice of analyzing data without a prior hypothesis. The term “Data Mining” appeared around 1990 in the database community.

Data mining in Technical words:

Technically, data mining is the process of extracting specific information from data and presenting it in a relevant and usable form that can be used to solve problems. It covers different kinds of mining, such as text mining, web mining, audio and video mining, pictorial data mining, and social network data mining.

Why is data mining a hot topic for this generation?

Data mining is a young and promising field for the present generation because of its wide range of applications. Generally speaking, it has attracted a great deal of attention in the information industry and in society, due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. This is why data mining is also called knowledge discovery from data.

Data Mining Techniques:

Classification Technique: To predict the outcome of the target class (will purchase or not).

Clustering Technique: Grouping or clustering the dataset (news article clustering).

Association Rule Technique: Finding frequently occurring items (frequently purchased items).

Data Visualization: Visualizing the data to understand the hidden insights.
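To make the "frequently purchased items" idea a little more concrete, here is a minimal Python sketch that counts which pairs of items show up together in shopping baskets. The baskets, the function name frequent_pairs, and the min_support threshold are all assumptions invented for illustration; real association rule algorithms (such as Apriori) do considerably more than this simple counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

With these made-up baskets, only the pairs bought together at least twice survive the threshold, which is the basic intuition behind "frequently purchased items".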

Data Mining Applications:

Weather forecasting.

E-commerce.

Self-driving cars.

Hazards of new medicine.

Space research.

Fraud detection.

Stock trade analysis.

Business forecasting.

Social networks.

Customers likelihood.

Understanding data mining with an apple-buying example:

Before explaining data mining with fresh apples, let me share some interesting facts about apples.

Nutrition: According to the United States Department of Agriculture, a typical apple serving weighs 242 grams and contains 126 calories with significant dietary fiber and modest vitamin C content, with otherwise a generally low content of essential nutrients.

Toxicity of apple seeds: The seeds of apples contain small amounts of amygdalin, a sugar and cyanide compound known as a cyanogenic glycoside. Ingesting small amounts of apple seeds will cause no ill effects, but extremely large doses can cause adverse reactions. There is only one known case of fatal cyanide poisoning from apple seeds; in this case, the individual chewed and swallowed one cup of seeds. It may take several hours before the poison takes effect, as cyanogenic glycosides must be hydrolyzed before the cyanide ion is released.

Now let's step into an example to build a basic understanding of a data mining model:

Suppose your family members want to visit someone who is suffering from pancreatic cancer. It has been reported that eating apples could help reduce the risk of pancreatic cancer by up to 23 percent. So your father asks you to bring apples from a nearby shop. He also teaches you how to buy apples by giving you a set of rules.

Rules for buying apples:

Big apples are less tasty than small apples.

Dark red apples are not fresh ones.

Light red apples are fresh ones.

Green apples are good for health.

On close observation of the rules listed above, you can pick the apples you want to buy. Your family members want to give these apples to an unhealthy person, so you will naturally pick green apples. When you go shopping, you will therefore pick small apples that are green in color. That ends the story of selecting apples that are good for health.

Non-data mining algorithm:

    if (selected_apple == small (in size)) {
        if (selected_apple == green (in color)) {
            select apple
        } else {
            don't select apple
        }
    }
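The hand-written rule above can be sketched as runnable Python. This is only one possible translation: the function name should_buy, the string values used for size and color, and the sample apples are assumptions made for illustration.

```python
def should_buy(size, color):
    """Hand-written rule from the story: buy only small, green apples."""
    return size == "small" and color == "green"


# Hypothetical apples on the shelf, described as (size, color) pairs.
apples = [
    ("small", "green"),
    ("big", "green"),
    ("small", "dark red"),
]

# Keep only the apples that pass the fixed rule.
chosen = [apple for apple in apples if should_buy(*apple)]
print(chosen)
```

Notice that nothing here is learned from data: the rule is fixed by whoever wrote it, and it has to be rewritten by hand whenever the criteria change.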

Comparing with data mining:

You randomly select apples from the shop (training data).

You make a table of all the physical characteristics of each apple, like color and size (features).

You record which apples are tasty and which are good for health (output variables).

You go to a new shop to buy apples (test data).

Whatever you have done so far is called model building in data mining terminology. Once you have the model you have built (here, the proper rules for buying apples), you can buy apples with great confidence, without worrying about the details of how to choose the best apples. What's more, you can refine your algorithm over time: the model's performance improves as you train it on more data, and it adjusts itself when it makes a wrong prediction (learning from feedback, an idea that reinforcement learning takes much further). But the best part is that you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries, and watermelons, and keep all your loved ones happy.

This type of learning is called supervised learning in data mining. In the next post, you can get a clear understanding of the difference between supervised learning and unsupervised learning with real-life examples.
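As a rough illustration of model building, the sketch below learns the apple-buying rule from labelled examples instead of hard-coding it. Everything here is an assumption made for the example: the training rows are invented, and the "model" is just a lookup table of label counts per (size, color) combination, a deliberately simple stand-in for a real classification algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (size, color) features with a "healthy" label.
training_data = [
    (("small", "green"), True),
    (("small", "green"), True),
    (("big", "green"), False),
    (("small", "dark red"), False),
    (("big", "light red"), False),
]


def train(examples):
    """Count labels per feature combination (a simple lookup-table model)."""
    model = defaultdict(Counter)
    for features, label in examples:
        model[features][label] += 1
    return model


def predict(model, features, default=False):
    """Return the most common label seen for these features, else default."""
    if features not in model:
        return default
    return model[features].most_common(1)[0][0]


def update(model, features, true_label):
    """Add one more labelled example, for instance after a wrong prediction."""
    model[features][true_label] += 1


apple_model = train(training_data)
print(predict(apple_model, ("small", "green")))   # seen in training
print(predict(apple_model, ("big", "dark red")))  # never seen, falls back to default

# At a new shop, two big green apples turn out to be healthy, so record them.
before = predict(apple_model, ("big", "green"))
update(apple_model, ("big", "green"), True)
update(apple_model, ("big", "green"), True)
after = predict(apple_model, ("big", "green"))
print(before, after)
```

The same train function could be called on a table of oranges or bananas to get a separate model for each fruit, which is the point the paragraph above makes about reusing one algorithm.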

I hope you like this post. If you have any questions then feel free to comment below. If you want me to write on one specific topic then do tell it to me in the comments below.

Related Courses:

Title of the course: Pattern Discovery in Data Mining

Course link: Pattern Discovery in Data Mining

What you will learn: The basic concepts of data mining and its real-world applications. Data-driven methods and some interesting applications of pattern discovery. Practice with scalable pattern discovery methods on massive transaction data.

Title of the course: Introduction to machine learning

Course link: Machine Learning

What you will learn: The basic concepts of machine learning, data mining, and pattern recognition. The differences between supervised and unsupervised learning algorithms in detail. Many more case studies and machine learning applications.

Title of the course: Data Mining with Python

Course link: Data Mining with Python: Classification and Regression

What you will learn: The key concepts in data mining and how to apply them to solve real-world problems. Hands-on experience with the Python programming language. Hands-on experience with the numpy, pandas, and matplotlib Python libraries.