# Decision Tree

# Introduction

A **decision tree** is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal, and they are also a popular tool in machine learning. In the machine learning setting, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter (differentiator) among the input variables.
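One common way to make "most significant splitter" concrete for a categorical target is an impurity measure such as the Gini index. The text above does not name a criterion, so this choice is an assumption made for illustration. For a node whose observations fall into classes $k = 1, \dots, K$ with proportions $p_k$:

$$G = 1 - \sum_{k=1}^{K} p_k^2$$

A candidate split of a parent node with $n$ observations into children with $n_L$ and $n_R$ observations can then be scored by the impurity decrease

$$\Delta G = G_{\text{parent}} - \frac{n_L}{n}\, G_L - \frac{n_R}{n}\, G_R$$

and the split with the largest $\Delta G$ gives the most homogeneous sub-populations under this measure.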

# Types of Decision Trees

The type of decision tree depends on the type of target variable. There are two types:

**Regression Trees:** Decision trees with a continuous target variable are termed regression trees.

We are all familiar with the idea of linear regression as a way of making quantitative predictions. In simple linear regression, a real-valued dependent variable Y is modeled as a linear function of a real-valued independent variable X plus noise. In multiple regression, we allow several independent variables X1, X2, ..., Xp and frame the model in the same way, as a linear function of these variables plus noise.

This works well as long as the variables are independent and each has a strictly additive effect on Y. Even if the variables are not independent, it is possible to incorporate some interactions. However, as the number of variables grows, this gets harder and harder. Moreover, the relationship may no longer be linear. Hence the need for regression trees.
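As a rough illustration of the idea, the sketch below fits a regression tree with `rpart`. The built-in `mtcars` data and the choice of `mpg`, `wt` and `hp` are assumptions made only for this example; they are not part of the analysis in this article.

```r
library(rpart)  # recommended package that ships with R

# Fit a regression tree: method = "anova" is used for a continuous target.
# mtcars (built into R) is a stand-in data set; mpg is the response.
fit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# Each leaf predicts the mean response of the training rows that fall in it,
# so the fitted values take at most as many distinct values as there are leaves.
fitted_values <- predict(fit)
n_leaves <- sum(fit$frame$var == "<leaf>")

print(fit)
print(sort(unique(fitted_values)))
```

No linear form or additivity is assumed here: the tree simply partitions the predictor space and predicts a constant in each region.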

**Classification Trees:** A classification tree is very similar to a regression tree, except that it is used to predict a qualitative response rather than a quantitative one. With a classification tree, we predict that each observation belongs to the most commonly occurring class of training observations in the region to which it belongs. In interpreting the results of a classification tree, we are often interested not only in the class prediction for a particular terminal node region, but also in the class proportions among the training observations that fall in that region.
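The two quantities mentioned above, the predicted class and the class proportions in a terminal node, can both be read off a fitted tree. The sketch below assumes the built-in `iris` data as a stand-in and uses `rpart` with `predict`.

```r
library(rpart)

# iris (built into R) is a stand-in data set; Species is the qualitative target.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Most commonly occurring class in the leaf each observation falls into
pred_class <- predict(fit, type = "class")

# Class proportions among the training observations in that leaf
pred_prob <- predict(fit, type = "prob")

print(head(pred_prob))
print(table(predicted = pred_class, actual = iris$Species))
```

Looking at `pred_prob` alongside `pred_class` shows how confident a leaf is: a leaf with proportions close to 0.5 is a much weaker prediction than one close to 1.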

# Advantages

**1. Easy to Explain:** Decision trees are easy to understand, even for people from a non-analytical background. Reading and interpreting them does not require any statistical knowledge. In fact, they can be even easier to interpret than linear regression!

**2. Useful in Data Exploration:** A decision tree is one of the fastest ways to identify the most significant variables and the relationships between two or more variables. With the help of decision trees, we can create new variables (features) that have better power to predict the target variable.

**3. Less data cleaning required:** To a fair degree, decision trees are not influenced by outliers and missing values.

**4. Non-Parametric Method:** A decision tree is considered a non-parametric method. This means that decision trees make no assumptions about the underlying (parent) distribution of the data or about the form of the classifier.

# Disadvantages

**1. Overfitting:** Overfitting is one of the most common practical difficulties for decision tree models. It can be mitigated by setting constraints on model parameters and by pruning.
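A minimal sketch of both remedies with `rpart` is shown below. The `kyphosis` data (shipped with `rpart`), the particular control values, and the rule of picking the `cp` with the lowest cross-validated error are all assumptions chosen for illustration, not recommendations.

```r
library(rpart)

set.seed(1)  # the cross-validated error (xerror) depends on random folds

# kyphosis ships with rpart and is a stand-in data set.
# Constraints on growth: minimum node size to attempt a split, maximum depth,
# and a small cp so the tree is grown large before pruning.
full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              method = "class",
              control = rpart.control(minsplit = 10, maxdepth = 6, cp = 0.001))

# Pick the complexity parameter with the lowest cross-validated error ...
cp_table <- full$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# ... and cut the tree back to that size.
pruned <- prune(full, cp = best_cp)

count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("leaves before pruning:", count_leaves(full), "\n")
cat("leaves after pruning: ", count_leaves(pruned), "\n")
```

The constraints limit how far the tree can grow in the first place, while pruning removes splits afterwards that do not pay for their added complexity.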

**2. Lack of predictive accuracy:** A single decision tree is often less accurate than regression models and other approaches when predictive performance is compared, for example by cross-validation.

**3. Non-Robust:** Decision trees are non-robust, meaning that a small change in the data can cause a large change in the final estimated tree.

# DECISION TREE IN R:

For this, we will use the `Carseats` data set from the ISLR package, which contains data on the sales of child car seats at 400 different stores.

It is a data frame with 400 observations on the following 11 variables:

**Sales:** Unit sales (in thousands) at each location

**CompPrice:** Price charged by competitor at each location

**Income:** Community income level (in thousands of dollars)

**Advertising:** Local advertising budget for company at each location (in thousands of dollars)

**Population:** Population size in region (in thousands)

**Price:** Price company charges for car seats at each site

**ShelveLoc:** A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site

**Age:** Average age of the local population

**Education:** Education level at each location

**Urban:** A factor with levels No and Yes to indicate whether the store is in an urban or rural location

**US:** A factor with levels No and Yes to indicate whether the store is in the US or not

# R-CODE:

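```
library(ISLR)   # provides the Carseats data set
# Label a store "Yes" when it sold at least 8 (thousand) units, else "No"
high=ifelse(Carseats$Sales<8,"No","Yes")
# Keep a copy of the data with the new label attached
Car=cbind(Carseats,high)
library(rpart)  # recursive partitioning trees
```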


```
tree=rpart(high~.-Sales,Carseats,method="class")
tree
```

```
## n= 400
##
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
##
##   1) root 400 164 No (0.59000000 0.41000000)
##     2) ShelveLoc=Bad,Medium 315 98 No (0.68888889 0.31111111)
##       4) Price>=92.5 269 66 No (0.75464684 0.24535316)
##         8) Advertising< 13.5 224 41 No (0.81696429 0.18303571)
##          16) CompPrice< 124.5 96 6 No (0.93750000 0.06250000) *
##          17) CompPrice>=124.5 128 35 No (0.72656250 0.27343750)
##            34) Price>=109.5 107 20 No (0.81308411 0.18691589)
##              68) Price>=126.5 65 6 No (0.90769231 0.09230769) *
##              69) Price< 126.5 42 14 No (0.66666667 0.33333333)
##               138) Age>=49.5 22 2 No (0.90909091 0.09090909) *
##               139) Age< 49.5 20 8 Yes (0.40000000 0.60000000) *
##            35) Price< 109.5 21 6 Yes (0.28571429 0.71428571) *
##         9) Advertising>=13.5 45 20 Yes (0.44444444 0.55555556)
##          18) Age>=54.5 20 5 No (0.75000000 0.25000000) *
##          19) Age< 54.5 25 5 Yes (0.20000000 0.80000000) *
##       5) Price< 92.5 46 14 Yes (0.30434783 0.69565217)
##        10) Income< 57 10 3 No (0.70000000 0.30000000) *
##        11) Income>=57 36 7 Yes (0.19444444 0.80555556) *
##     3) ShelveLoc=Good 85 19 Yes (0.22352941 0.77647059)
##       6) Price>=142.5 12 3 No (0.75000000 0.25000000) *
##       7) Price< 142.5 73 10 Yes (0.13698630 0.86301370) *
```
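As a small worked check on the printed tree: terminal nodes are marked with `*`, and the `loss` field of each line counts the training observations in that node that are not in the predicted class. Summing the losses over the 11 terminal nodes, in the order they are printed, gives the training misclassification count:

$$6 + 6 + 2 + 8 + 6 + 5 + 5 + 3 + 7 + 3 + 10 = 61, \qquad \frac{61}{400} = 0.1525$$

So the tree misclassifies about 15% of the training stores, compared with $164/400 = 0.41$ for always predicting "No". This is an in-sample figure and will generally understate the error on new data.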

`summary(tree)`

```
## Call:
## rpart(formula = high ~ . - Sales, data = Carseats, method = "class")
## n= 400
##
##           CP nsplit rel error    xerror       xstd
## 1 0.28658537      0 1.0000000 1.0000000 0.05997967
## 2 0.10975610      1 0.7134146 0.7134146 0.05547692
## 3 0.04573171      2 0.6036585 0.6158537 0.05298128
## 4 0.03658537      4 0.5121951 0.6097561 0.05280643
## 5 0.02743902      5 0.4756098 0.5975610 0.05244966
## 6 0.02439024      7 0.4207317 0.5853659 0.05208331
## 7 0.01219512      8 0.3963415 0.5975610 0.05244966
## 8 0.01000000     10 0.3719512 0.5609756 0.05132104
##
## Variable importance
##       Price   ShelveLoc         Age Advertising   CompPrice      Income
##          34          25          11          11           9           5
##  Population   Education
##           3           1
##
## Node number 1: 400 observations, complexity param=0.2865854
## predicted class=No expected loss=0.41 P(node) =1
## class counts: 236 164
## probabilities: 0.590 0.410
## left son=2 (315 obs) right son=3 (85 obs)
## Primary splits:
## ShelveLoc splits as LRL, improve=28.991900, (0 missing)
## Price < 92.5 to the right, improve=19.463880, (0 missing)
## Advertising < 6.5 to the left, improve=17.277980, (0 missing)
## Age < 61.5 to the right, improve= 9.264442, (0 missing)
## Income < 60.5 to the left, improve= 7.249032, (0 missing)
##
## Node number 2: 315 observations, complexity param=0.1097561
## predicted class=No expected loss=0.3111111 P(node) =0.7875
## class counts: 217 98
## probabilities: 0.689 0.311
## left son=4 (269 obs) right son=5 (46 obs)
## Primary splits:
## Price < 92.5 to the right, improve=15.930580, (0 missing)
## Advertising < 7.5 to the left, improve=11.432570, (0 missing)
## ShelveLoc splits as L-R, improve= 7.543912, (0 missing)
## Age < 50.5 to the right, improve= 6.369905, (0 missing)
## Income < 60.5 to the left, improve= 5.984509, (0 missing)
## Surrogate splits:
## CompPrice < 95.5 to the right, agree=0.873, adj=0.13, (0 split)
##
## Node number 3: 85 observations, complexity param=0.03658537
## predicted class=Yes expected loss=0.2235294 P(node) =0.2125
## class counts: 19 66
## probabilities: 0.224 0.776
## left son=6 (12 obs) right son=7 (73 obs)
## Primary splits:
## Price < 142.5 to the right, improve=7.745608, (0 missing)
## US splits as LR, improve=5.112440, (0 missing)
## Income < 35 to the left, improve=4.529433, (0 missing)
## Advertising < 6 to the left, improve=3.739996, (0 missing)
## Education < 15.5 to the left, improve=2.565856, (0 missing)
## Surrogate splits:
## CompPrice < 154.5 to the right, agree=0.882, adj=0.167, (0 split)
##
## Node number 4: 269 observations, complexity param=0.04573171
## predicted class=No expected loss=0.2453532 P(node) =0.6725
## class counts: 203 66
## probabilities: 0.755 0.245
## left son=8 (224 obs) right son=9 (45 obs)
## Primary splits:
## Advertising < 13.5 to the left, improve=10.400090, (0 missing)
## Age < 49.5 to the right, improve= 8.083998, (0 missing)
## ShelveLoc splits as L-R, improve= 7.023150, (0 missing)
## CompPrice < 124.5 to the left, improve= 6.749986, (0 missing)
## Price < 126.5 to the right, improve= 5.646063, (0 missing)
##
## Node number 5: 46 observations, complexity param=0.02439024
## predicted class=Yes expected loss=0.3043478 P(node) =0.115
## class counts: 14 32
## probabilities: 0.304 0.696
## left son=10 (10 obs) right son=11 (36 obs)
## Primary splits:
## Income < 57 to the left, improve=4.000483, (0 missing)
## ShelveLoc splits as L-R, improve=3.189762, (0 missing)
## Advertising < 9.5 to the left, improve=1.388592, (0 missing)
## Price < 80.5 to the right, improve=1.388592, (0 missing)
## Age < 64.5 to the right, improve=1.172885, (0 missing)
##
## Node number 6: 12 observations
## predicted class=No expected loss=0.25 P(node) =0.03
## class counts: 9 3
## probabilities: 0.750 0.250
##
## Node number 7: 73 observations
## predicted class=Yes expected loss=0.1369863 P(node) =0.1825
## class counts: 10 63
## probabilities: 0.137 0.863
##
## Node number 8: 224 observations, complexity param=0.02743902
## predicted class=No expected loss=0.1830357 P(node) =0.56
## class counts: 183 41
## probabilities: 0.817 0.183
## left son=16 (96 obs) right son=17 (128 obs)
## Primary splits:
## CompPrice < 124.5 to the left, improve=4.881696, (0 missing)
## Age < 49.5 to the right, improve=3.960418, (0 missing)
## ShelveLoc splits as L-R, improve=3.654633, (0 missing)
## Price < 126.5 to the right, improve=3.234428, (0 missing)
## Advertising < 6.5 to the left, improve=2.371276, (0 missing)
## Surrogate splits:
## Price < 115.5 to the left, agree=0.741, adj=0.396, (0 split)
## Age < 50.5 to the right, agree=0.634, adj=0.146, (0 split)
## Population < 405 to the right, agree=0.629, adj=0.135, (0 split)
## Education < 11.5 to the left, agree=0.585, adj=0.031, (0 split)
## Income < 22.5 to the left, agree=0.580, adj=0.021, (0 split)
##
## Node number 9: 45 observations, complexity param=0.04573171
## predicted class=Yes expected loss=0.4444444 P(node) =0.1125
## class counts: 20 25
## probabilities: 0.444 0.556
## left son=18 (20 obs) right son=19 (25 obs)
## Primary splits:
## Age < 54.5 to the right, improve=6.722222, (0 missing)
## CompPrice < 121.5 to the left, improve=4.629630, (0 missing)
## ShelveLoc splits as L-R, improve=3.250794, (0 missing)
## Income < 99.5 to the left, improve=3.050794, (0 missing)
## Price < 127 to the right, improve=2.933429, (0 missing)
## Surrogate splits:
## Population < 363.5 to the left, agree=0.667, adj=0.25, (0 split)
## Income < 39 to the left, agree=0.644, adj=0.20, (0 split)
## Advertising < 17.5 to the left, agree=0.644, adj=0.20, (0 split)
## CompPrice < 106.5 to the left, agree=0.622, adj=0.15, (0 split)
## Price < 135.5 to the right, agree=0.622, adj=0.15, (0 split)
##
## Node number 10: 10 observations
## predicted class=No expected loss=0.3 P(node) =0.025
## class counts: 7 3
## probabilities: 0.700 0.300
##
## Node number 11: 36 observations
## predicted class=Yes expected loss=0.1944444 P(node) =0.09
## class counts: 7 29
## probabilities: 0.194 0.806
##
## Node number 16: 96 observations
## predicted class=No expected loss=0.0625 P(node) =0.24
## class counts: 90 6
## probabilities: 0.938 0.062
##
## Node number 17: 128 observations, complexity param=0.02743902
## predicted class=No expected loss=0.2734375 P(node) =0.32
## class counts: 93 35
## probabilities: 0.727 0.273
## left son=34 (107 obs) right son=35 (21 obs)
## Primary splits:
## Price < 109.5 to the right, improve=9.764582, (0 missing)
## ShelveLoc splits as L-R, improve=6.320022, (0 missing)
## Age < 49.5 to the right, improve=2.575061, (0 missing)
## Income < 108.5 to the right, improve=1.799546, (0 missing)
## CompPrice < 143.5 to the left, improve=1.741982, (0 missing)
##
## Node number 18: 20 observations
## predicted class=No expected loss=0.25 P(node) =0.05
## class counts: 15 5
## probabilities: 0.750 0.250
##
## Node number 19: 25 observations
## predicted class=Yes expected loss=0.2 P(node) =0.0625
## class counts: 5 20
## probabilities: 0.200 0.800
##
## Node number 34: 107 observations, complexity param=0.01219512
## predicted class=No expected loss=0.1869159 P(node) =0.2675
## class counts: 87 20
## probabilities: 0.813 0.187
## left son=68 (65 obs) right son=69 (42 obs)
## Primary splits:
## Price < 126.5 to the right, improve=2.9643900, (0 missing)
## CompPrice < 147.5 to the left, improve=2.2337090, (0 missing)
## ShelveLoc splits as L-R, improve=2.2125310, (0 missing)
## Age < 49.5 to the right, improve=2.1458210, (0 missing)
## Income < 60.5 to the left, improve=0.8025853, (0 missing)
## Surrogate splits:
## CompPrice < 129.5 to the right, agree=0.664, adj=0.143, (0 split)
## Advertising < 3.5 to the right, agree=0.664, adj=0.143, (0 split)
## Population < 53.5 to the right, agree=0.645, adj=0.095, (0 split)
## Age < 77.5 to the left, agree=0.636, adj=0.071, (0 split)
## US splits as RL, agree=0.626, adj=0.048, (0 split)
##
## Node number 35: 21 observations
## predicted class=Yes expected loss=0.2857143 P(node) =0.0525
## class counts: 6 15
## probabilities: 0.286 0.714
##
## Node number 68: 65 observations
## predicted class=No expected loss=0.09230769 P(node) =0.1625
## class counts: 59 6
## probabilities: 0.908 0.092
##
## Node number 69: 42 observations, complexity param=0.01219512
## predicted class=No expected loss=0.3333333 P(node) =0.105
## class counts: 28 14
## probabilities: 0.667 0.333
## left son=138 (22 obs) right son=139 (20 obs)
## Primary splits:
## Age < 49.5 to the right, improve=5.4303030, (0 missing)
## CompPrice < 137.5 to the left, improve=2.1000000, (0 missing)
## Advertising < 5.5 to the left, improve=1.8666670, (0 missing)
## ShelveLoc splits as L-R, improve=1.4291670, (0 missing)
## Population < 382 to the right, improve=0.8578431, (0 missing)
## Surrogate splits:
## Income < 46.5 to the left, agree=0.595, adj=0.15, (0 split)
## Education < 12.5 to the left, agree=0.595, adj=0.15, (0 split)
## CompPrice < 131.5 to the right, agree=0.571, adj=0.10, (0 split)
## Advertising < 5.5 to the left, agree=0.571, adj=0.10, (0 split)
## Population < 221.5 to the left, agree=0.571, adj=0.10, (0 split)
##
## Node number 138: 22 observations
## predicted class=No expected loss=0.09090909 P(node) =0.055
## class counts: 20 2
## probabilities: 0.909 0.091
##
## Node number 139: 20 observations
## predicted class=Yes expected loss=0.4 P(node) =0.05
## class counts: 8 12
## probabilities: 0.400 0.600
```
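The `improve` value reported for the first split (ShelveLoc, 28.991900) can be reproduced by hand from the class counts in the summary. The sketch below assumes the Gini index is the splitting criterion and that `improve` is the size-weighted decrease in impurity; the helper `gini` is a name introduced here for illustration, and the arithmetic agrees with the printed value.

```r
# Gini impurity of a node, computed from its class counts
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Class counts (No, Yes) copied from the summary output
root  <- c(236, 164)   # node 1
left  <- c(217,  98)   # node 2: ShelveLoc = Bad or Medium
right <- c( 19,  66)   # node 3: ShelveLoc = Good

# Decrease in impurity, with each node weighted by its number of observations
improve <- sum(root) * gini(root) -
  sum(left) * gini(left) -
  sum(right) * gini(right)

print(round(improve, 4))  # 28.9919
```

The same calculation can be repeated for any other node in the summary by substituting its class counts and those of its two children.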

```
plot(tree)   # draw the branches of the fitted tree
text(tree)   # label each split and terminal node
```