Churn Modelling for Mobile Telecommunications

What is Churn Modelling?

  • Churn represents the loss of an existing customer to a competitor
  • A prevalent problem in retail:
    • Mobile phone services
    • Home mortgage refinance
    • Credit card
  • Churn is a problem for any provider of a subscription service or recurring purchase
    • Costs of customer acquisition and win-back can be high
    • Much cheaper to invest in customer retention
    • Difficult to recoup costs of customer acquisition unless customer is retained for a minimum length of time
  • Churn is especially important to mobile phone service providers
    • Easy for a subscriber to switch services
    • Phone number portability will remove the last important obstacle

Predicting Churn: Key to a Proactive Strategy

  • Predictive modelling can assist churn management
    • By tagging customers most likely to churn
  • High-risk customers should first be sorted by profitability
    • Campaign targeted to the most profitable at-risk customers
    • Typical retention campaigns include
      • Incentives such as price breaks
      • Special services available only to select customers
  • To be cost-effective, retention campaigns must target the right customers (a short sketch follows this list)
    • Customers who would probably leave without the incentive
    • Costly to offer incentives to those who would stay regardless
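
A minimal sketch of this targeting logic, using a hypothetical data frame of customers with a fitted churn probability and an estimated profit per customer (the column names and numbers are assumptions for illustration only):

customers <- data.frame(id      = 1:6,
                        p.churn = c(0.82, 0.10, 0.65, 0.90, 0.05, 0.70),  # hypothetical churn scores
                        profit  = c(120, 300, 450, 60, 500, 380))         # hypothetical annual profit

# Keep the high-risk customers, then rank them by profitability.
at.risk  <- subset(customers, p.churn > 0.6)
campaign <- at.risk[order(-at.risk$profit), ]
campaign   # the retention campaign targets the top of this list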

 

Here we have a sample telecom data set on which we will run churn modelling in R.

library(rattle)        # normVarNames() and other data preparation helpers.
library(randomForest)  # Random forests; also na.roughfix() to impute missing values.
library(rpart)         # Decision trees.
library(tidyr)         # Tidy the data set.
library(ggplot2)       # Visualize data.
library(dplyr)         # Data preparation and pipes %>%.
library(lubridate)     # Handle dates.
library(corrgram)      # Correlation grams.

Loading the data directly from the web

nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names",
               skip=4, colClasses=c("character", "NULL"),
               header=FALSE, sep=":")[[1]]


dat  <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"))
nobs <- nrow(dat)   # number of observations

colnames(dat)

dsname <- "dat"
ds     <- get(dsname)

dim(ds)

(vars <- names(ds))
target <- "Churn"

ds$phone.number <- NULL               # An identifier, not a predictor.
ds$churn <- as.numeric(ds$Churn) - 1  # Recode the target factor as 0/1.
ds$Churn <- NULL
ds$state <- NULL                      # Drop the high-cardinality state factor.
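
Before modelling, it is worth checking the class balance; churners are typically a small minority, which matters when judging the accuracy figures later. A quick check:

table(ds$churn)   # counts of retained (0) versus churned (1) customers
mean(ds$churn)    # the overall churn rate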




## Split ds into train and test 
## 75% of the sample size
smp_size <- floor(0.75 * nrow(ds))


## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ds)), size = smp_size)
train <- ds[train_ind, ]
test  <- ds[-train_ind, ]
dim(train)
dim(test)
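
Since the split is random, the churn rate should come out similar in both partitions; a quick sanity check:

mean(train$churn)   # churn rate in the training partition
mean(test$churn)    # should be close to the training rate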

corrgram(train, lower.panel=panel.ellipse, upper.panel=panel.pie)

Figure: corrgram of the telecom training data.

Fitting a Model

lm.fit <- lm(churn ~ ., data=train)   # Linear model on all remaining predictors.

# Multiple R-squared:  0.1784,    Adjusted R-squared:  0.1724

How does the linear model perform?

pred.lm.fit <- predict(lm.fit, test)
RMSE.lm.fit <- sqrt(mean((pred.lm.fit - test$churn)^2))   # RMSE against the test labels.
RMSE.lm.fit   # 0.3232695
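
The same root-mean-square error calculation recurs for every model below, so a small helper (not part of the original script) keeps it consistent:

rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))
rmse(pred.lm.fit, test$churn)   # same value as RMSE.lm.fit above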




# Building a simpler model with a similar R-squared.
lm.fit.step <- lm(churn ~ international.plan + voice.mail.plan + total.day.charge +
                  total.eve.minutes + total.night.charge + total.intl.calls +
                  total.intl.charge + number.customer.service.calls, data=train)


# Multiple R-squared: 0.1767, Adjusted R-squared: 0.174

pred.lm.fit.step <- predict(lm.fit.step, test)
RMSE.lm.fit.step <- sqrt(mean((pred.lm.fit.step - test$churn)^2))
RMSE.lm.fit.step   # 0.3227848 <- simpler, and a slightly better RMSE
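
One plausible way to arrive at such a reduced formula is backward elimination by AIC with R's built-in step(); this is a sketch, not necessarily how the variables above were chosen:

lm.fit.back <- step(lm.fit, direction = "backward", trace = 0)
formula(lm.fit.back)   # inspect which predictors survive elimination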

 

How does the Logistic Regression perform?

# Logistic regression using a generalized linear model.
glm.step <- glm(churn ~ international.plan + voice.mail.plan + total.day.charge +
                total.eve.minutes + total.night.charge + total.intl.calls +
                total.intl.charge + number.customer.service.calls,
                family = binomial, data = train)
pred.glm.step <- predict.glm(glm.step, newdata = test, type = "response")


RMSE.glm.step <- sqrt(mean((pred.glm.step - test$churn)^2))
RMSE.glm.step   # 0.3179586 <- better than the linear model
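
For a classification accuracy comparable to the tree-based models below, the predicted probabilities can be thresholded; the 0.5 cutoff here is an assumption, and a lower cutoff may suit the imbalanced classes better:

pred.glm.class <- ifelse(pred.glm.step > 0.5, 1, 0)
sum(pred.glm.class == test$churn) / nrow(test)   # fraction of test cases classified correctly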

 

How does the Decision Tree perform?

# Build a decision tree on the selected variables.
rpart.fit.step <- rpart(churn ~ international.plan + voice.mail.plan + total.day.charge +
                        total.eve.minutes + total.night.charge + total.intl.calls +
                        total.intl.charge + number.customer.service.calls,
                        data=train, method="class")

pred.rpart.step <- predict(rpart.fit.step, test)   # Wrong: returns a matrix of class probabilities; see the correction below.
RMSE.rpart.step <- sqrt(mean((pred.rpart.step - test$churn)^2))
RMSE.rpart.step   # 0.6742183 <- apparently much worse than the linear model




# The call above forgot type="class", so probabilities were compared against 0/1 labels.
pred.rpart.step <- as.numeric(predict(rpart.fit.step, test, type="class")) - 1
RMSE.rpart.step <- sqrt(mean((pred.rpart.step - test$churn)^2))
RMSE.rpart.step   # 0.2423902 <- better than the linear model




sum(pred.rpart.step == test$churn) / nrow(test)
# 0.941247: 94% of test cases are correctly classified
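
A confusion matrix shows where the remaining errors fall; missed churners (predicted 0, actual 1) are the costly cases for a retention campaign:

table(predicted = pred.rpart.step, actual = test$churn)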

 

How does the Random Forest perform?

# Build a random forest ensemble.
set.seed(415)
rf.fit.step <- randomForest(as.factor(churn) ~ international.plan + voice.mail.plan + total.day.charge +
                            total.eve.minutes + total.night.charge + total.intl.calls +
                            total.intl.charge + number.customer.service.calls,
                            data=train, importance=TRUE, ntree=2000)
varImpPlot(rf.fit.step)   # Which variables matter most to the forest.

pred.rf.fit.step <- as.numeric(predict(rf.fit.step, test)) - 1
RMSE.rf.fit.step <- sqrt(mean((pred.rf.fit.step - test$churn)^2))
RMSE.rf.fit.step   # 0.2217221 <- an improvement over the linear model, so a non-linear, tree-based approach is better

sum(pred.rf.fit.step == test$churn) / nrow(test)
# 0.9508393: 95% of test cases are correctly classified
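
randomForest also reports an out-of-bag error estimate and confusion matrix computed during training, which gives a second opinion on generalization without touching the test set:

print(rf.fit.step)   # includes the OOB error estimate and confusion matrix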

 

Conclusion:

Algorithm                               RMSE        Comment
Linear Model                            0.3232695
Simpler Linear Model                    0.3227848   simpler model, nearly identical fit
Logistic Regression                     0.3179586   better than the linear model
Decision Tree (missing type="class")    0.6742183   apparently much worse, but an artifact of comparing class probabilities to 0/1 labels
Decision Tree (with type="class")       0.2423902   better than the linear model
Random Forest                           0.2217221   best overall, so a non-linear, tree-based approach is better
