Data Cleaning in Python


Data cleaning, also called data cleansing, is essential to machine learning. The phrase 'garbage in, garbage out' captures this: unless we understand what the data contains and put it into a usable form, the results will not be reliable, no matter how good the predictive model is. Beginners in machine learning usually start with publicly available datasets that have already been thoroughly checked for data quality issues and are therefore ready to be used for training models and getting good results. Real-world data is rarely like that. Raw datasets that still carry these issues cannot be put to use without knowing the data cleaning and preprocessing steps.

Such issues include missing values, noisy values or univariate outliers, multivariate outliers, and duplicated records. Preprocessing also covers improving the quality of the data through standardization and normalization, and dealing with categorical features. Visualization is an important tool as well, since it lets us spot issues in the data by eye.
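To make some of these steps concrete, here is a minimal sketch that handles duplicates, a missing value, a univariate outlier, normalization, and a categorical feature on a tiny made-up dataset. The records, the field names (age, city), and the choice of median imputation and the 1.5 x IQR rule are assumptions made for illustration, not the course's own material. It uses only the Python standard library so that each step stays visible; in practice a library such as pandas would usually do this work.

```python
from statistics import median, quantiles

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 26, "city": "Paris"},
    {"age": 30, "city": "Rome"},
    {"age": None, "city": "Paris"},
    {"age": 30, "city": "Rome"},    # exact duplicate of the second row
    {"age": 28, "city": "Oslo"},
    {"age": 120, "city": "Rome"},   # suspiciously large value
]

# 1. Data duplication: keep only the first occurrence of each record.
seen = set()
deduped = []
for r in rows:
    key = (r["age"], r["city"])
    if key not in seen:
        seen.add(key)
        deduped.append(dict(r))

# 2. Missing values: impute with the median of the observed ages.
observed = [r["age"] for r in deduped if r["age"] is not None]
age_median = median(observed)
for r in deduped:
    if r["age"] is None:
        r["age"] = age_median

# 3. Univariate outliers: flag ages outside 1.5 * IQR of the quartiles.
q1, _, q3 = quantiles([r["age"] for r in deduped], n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in deduped if not lower <= r["age"] <= upper]
cleaned = [r for r in deduped if lower <= r["age"] <= upper]

# 4. Normalization: min-max scale age into the range [0, 1].
lo = min(r["age"] for r in cleaned)
hi = max(r["age"] for r in cleaned)
for r in cleaned:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

# 5. Categorical features: one-hot encode the city column.
categories = sorted({r["city"] for r in cleaned})
for r in cleaned:
    for c in categories:
        r["city_" + c] = 1 if r["city"] == c else 0

for r in cleaned:
    print(r)
```

The order of the steps is a design choice in this sketch: duplicates are removed first so they do not bias the median, and outliers are dropped before scaling so that a single extreme value does not compress the remaining ages toward zero.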

In this course, we discuss the issues found in data coming from different sources and how to resolve them effectively. Each concept has three components: theoretical explanation, mathematical evaluation, and code. Lectures numbered *.1.* cover the theory and mathematical evaluation of a concept, while lectures numbered *.2.* cover its practical code. All code is written in Python using Jupyter Notebook.