
The parameter test_size can also be manipulated based on need. The parameter random_state can be randomly set to any value, but the same needs to be maintained in order to produce reproducible splits. In line 19, we implement the train_test_split() function. In line 16, we import the train_test_split function. In line 13, we extract the target, i.e., the labels in variable y. In line 10, we extract all of the attributes in variable X. Since the sklearn library contains the IRIS dataset by default, you do not need to upload it again. In line 7, we store the IRIS dataset in the variable data. In lines 1 to 4, we import the necessary libraries to read and analyze the dataset. Our aim is to predict the class of the IRIS plant based on the given attributes. The dataset contains information for three classes of the IRIS plant, namely IRIS Setosa, IRIS Versicolour, and IRIS Virginica, with the following attributes: sepal length, sepal width, petal length, and petal width. We will be using the IRIS dataset to build a decision tree classifier.

Use the test dataset to make a prediction and check the accuracy score of the model. Import the required Python libraries and build a data frame.Ĭreate the model in Python (we will use decision trees). You can follow the steps below to create a feasible and useful decision tree: Let’s use a real-world dataset to apply decision tree algorithms in Python. Variance: This is normally used in the Regression model, which is a measure of the variation of each data point from the mean. Gini impurity: Measures the impurity in a node.Įntropy: Measures the randomness of the system. The main criteria based on which decision trees split are:

Decision trees follow a tree-like structure, where the nodes of a tree are split using the features based on defined criteria. For example, classifying if the temperature of a day will be high or low, or predicting if a team will win the match or not.ĭecision trees work in a step-wise manner, meaning that they perform a step-by-step process instead of following a continuous process. For example, predicting rainfall in a region or predicting the revenue that a company might generate in the future.Ĭlassification tree: These are used to classify discrete variables. Regression tree: These are used to predict continuous variables. You can use decision trees in Regression and Classification problems.

In Machine Learning, we have two types of models:
