Iris Dataset

Description

This is perhaps the best-known dataset in the pattern recognition literature. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Each row of the table represents an iris flower, including its species and the dimensions of its botanical parts, sepal and petal, in centimeters.

The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length, and Petal Width.

Figure: illustration of the petal and sepal of an iris flower.

Import and Explore Data

We load the iris dataset into a pandas DataFrame using scikit-learn.
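A minimal sketch of this step, assuming scikit-learn's built-in copy of the dataset (the original notebook code is not shown here):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the built-in iris dataset and wrap it in a pandas DataFrame.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())      # first five rows
print(df.describe())  # summary statistics for each measurement
```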

Relationship between features:
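One common way to visualize pairwise feature relationships is a seaborn pair plot; the snippet below is an assumption about what the original figure showed:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot for every pair of measurements, colored by species;
# the diagonal shows each feature's distribution.
sns.pairplot(df, hue="species")
plt.show()
```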

Split the dataset:

Before training, we need to split the dataset into training and test sets.
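A typical split with scikit-learn's train_test_split; the 80/20 ratio and the random seed below are assumptions, not values from the original:

```python
from sklearn.model_selection import train_test_split

X = df[iris.feature_names]  # the four measurements
y = df["species"]           # the class label

# Hold out 20% of the records for testing; stratify so every species
# keeps the same proportion in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```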

Task

The task on this dataset is to train a decision tree classifier that predicts the iris species from the given features: the sepal and petal measurements.

Decision Tree

Definition

A decision tree classifier organizes a series of decision rules in a tree structure. It is one of the most practical methods for non-parametric supervised learning.

A decision tree is made up of three types of nodes:

Decision Nodes: internal nodes with two or more branches, each testing a decision rule

Leaf Nodes: the lowest nodes, each representing a final decision (a class prediction)

Root Node: the topmost decision node

Here is a step-by-step guide to building a decision tree.

Step 1: Determine the Root Node of the Tree

To determine the root node, we need to compute the information gain for each candidate decision rule.

For example, consider the following rule: “whether sepal_length is larger than 5”.

Note: In order to handle continuous attributes, the algorithm (C4.5) creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.

Calculate Information Gain:

Given a decision rule, in this case sepal_length > 5, a parent group (here we consider a group containing two classes) is split into two child groups. The information gain of the split is the impurity of the parent group minus the weighted average impurity of the child groups.
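In symbols (a standard formulation of the definition above, not copied from the original), with N records in the parent group, N_L going to the left child, N_R to the right, and I(·) denoting impurity:

$$
\mathrm{IG} = I(\text{parent}) - \frac{N_L}{N}\, I(\text{left}) - \frac{N_R}{N}\, I(\text{right})
$$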

Impurity Metrics

Here is the definition of the impurity metric we use.

A group (node) is pure when all of its records belong to the same class. We use Gini impurity as the metric: 1 minus the sum of the squared class proportions. When the group is pure, Gini impurity = 0; when the group is a half-half mix of two classes, Gini impurity = 0.5.
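As a quick sketch (the helper names are hypothetical), here is how Gini impurity and information gain can be computed; the two prints check the pure and half-half cases mentioned above:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Impurity of the parent minus the weighted impurity of the children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

print(gini(["a", "a", "a"]))       # pure group      -> 0.0
print(gini(["a", "a", "b", "b"]))  # half-half mixed -> 0.5
```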

Knowing how to calculate the information gain for each decision rule, we can now iterate through all candidate decision rules, calculate the information gain of each split, and select the rule with the largest information gain to split the root node. In this case, we are comparing the rules petal_length > 5 and sepal_length > 5.
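Using the helpers sketched above, scanning the two candidate rules could look like this (the column names follow scikit-learn's feature naming; the threshold 5 comes from the rules in the text):

```python
candidates = [("petal length (cm)", 5.0), ("sepal length (cm)", 5.0)]

best_rule, best_gain = None, -1.0
for feature, threshold in candidates:
    mask = df[feature] > threshold
    gain = information_gain(df["species"], df["species"][mask], df["species"][~mask])
    print(f"{feature} > {threshold}: information gain = {gain:.3f}")
    if gain > best_gain:
        best_rule, best_gain = (feature, threshold), gain

print("rule chosen for the root node:", best_rule)
```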

Step 2: Recursive Binary Splitting

Split the child nodes (groups) recursively, considering only decision rules never selected before in the current branch, as in the sketch below. Stop splitting when there are no decision rules left or the group impurity = 0. Pruning and early-stopping conditions are often used to prevent an excessive number of splits and overfitting.
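A compact recursive sketch of this procedure, reusing the gini and information_gain helpers above (simplified: real implementations such as CART re-derive thresholds at every node and add pruning):

```python
def class_distribution(y):
    """Map each class to its proportion of the records in the group."""
    values, counts = np.unique(y, return_counts=True)
    return dict(zip(values, counts / counts.sum()))

def build_tree(X, y, rules):
    """Recursively split until the group is pure or no usable rule remains."""
    if gini(y) == 0.0 or not rules:
        return {"leaf": class_distribution(y)}

    def gain_of(rule):
        feature, threshold = rule
        mask = X[feature] > threshold
        if mask.all() or not mask.any():
            return -1.0  # one child would be empty; unusable split
        return information_gain(y, y[mask], y[~mask])

    best = max(rules, key=gain_of)
    if gain_of(best) <= 0.0:
        return {"leaf": class_distribution(y)}  # early stop: no rule helps

    feature, threshold = best
    mask = X[feature] > threshold
    remaining = [r for r in rules if r != best]  # never reuse a rule in this branch
    return {
        "rule": best,
        "true": build_tree(X[mask], y[mask], remaining),
        "false": build_tree(X[~mask], y[~mask], remaining),
    }
```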

Step 3: Predict

A group (node) with no further split is called a leaf node. The classifier predicts the probability of each class as that class's proportion of the records in the leaf. In this case, for a record with petal_length > 5 and sepal_length > 5, there is a 50% chance that the record's class is circle.

Example Code for Training and Evaluation
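The original code is not reproduced here; a minimal equivalent with scikit-learn's defaults would be the following (the random seed is an assumption, and the exact score depends on the split):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Fit a decision tree with the default configuration.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

# Class probabilities come from the class proportions in the leaf that
# each record falls into (see Step 3 above).
print(clf.predict_proba(X_test[:5]))
```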

From the result, you can see that with the default decision tree configuration from scikit-learn, you get a multi-class classification model with a macro-average F1 score of 0.97.

The full notebook can be downloaded here.
