A DUAL-STEP MULTI-ALGORITHM APPROACH FOR CHURN PREDICTION IN PRE-PAID TELECOMMUNICATIONS SERVICE PROVIDERS

Abstract: Customer churn has become a main concern for companies across industries. Among the industries affected, telecommunications ranks near the top of the list, with an approximate annual churn rate of 30%. Various approaches address this problem by developing predictive models for customer churn, but because the pre-paid mobile telephony market is not contract-based, customer churn is not easily traceable or definable, which makes constructing a predictive model highly complex. To handle this issue, we developed a dual-step model building approach consisting of a clustering phase and a classification phase. First, the customer base was divided into four clusters based on RFM-related features, with the aim of extracting a logical definition of churn. Second, based on the churn definitions extracted in the first step, different algorithms were used to construct predictive models for churn in each cluster. Evaluating and comparing the performance of the employed algorithms based on the "gain measure", we concluded that employing a multi-algorithm approach in the model construction step, instead of a single-algorithm one, brings the maximum gain among the tested algorithms.


Introduction
Customer churn has become a main concern of firms in all industries (Neslin, Gupta, Kamakura, Lu, & Mason, 2006), and companies, regardless of the industry in which they operate, are dealing with this issue. Customer churn can harm a company by decreasing its profit level, losing a great deal of price premium, and losing referrals from continuing service customers (Reichheld & Sasser, 1990).
Among the industries affected, telecommunications is one of the main targets of this hazard: its churn rate ranges from 20% to 40% annually (Berson, Smith, & Therling, 1999; Madden, Savage, & Coble-Neal, 1999). Customer churn in mobile telecommunications refers to "the movement of subscribers from one provider to another" (Wei & Chiu, 2002).
Adopting a targeted and proactive approach to managing customer churn, companies try to identify customers who are likely to churn in the near future and then target them with special programs or incentives to prevent them from churning. Targeted proactive programs have the potential advantage of lower incentive costs, because the incentive may not have to be as high as when the customer has to be "bribed" not to leave at the last minute. However, this approach is wasteful if churn prediction is inaccurate, because companies then spend incentive money on customers who would have stayed anyway. This threat underlines the need for an accurate churn prediction model (Coussement & Van den Poel, 2008b; Neslin, Gupta, Kamakura, Lu, & Mason, 2006). Such a model has to recognize the customers who tend to churn in the near future. The first step is therefore to define "churn" and "churner", but because the pre-paid mobile telephony market is not contract-based, customer churn is not easily traceable or definable, which makes building a predictive model highly complex.
The present study aims to develop a predictive model for customer churn in pre-paid mobile telephony companies. The first step is to give a sensible definition of churn in such companies, and the second is to construct the predictive model based on the extracted definition.

Literature Review
Model building for churn prediction depends strongly on data mining techniques, because machine learning techniques perform better than statistical techniques on non-parametric datasets (Baesens, Viaene, Van den Poel, Vanthienen, & Dedene, 2002; Bhattacharyya & Pendharkar, 1998).
Data mining is "the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules" (Berry & Linoff, 2004). It involves selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns, and ultimately comprehensible information, from large databases (Shaw, Subramaniam, Tan, & Welge, 2001).
Data mining tools take data and construct a model as a representation of reality. The resulting model describes patterns and relationships present in the data (Rygielski, Wang, & Yen, 2002).
The application of data mining tools in CRM is an emerging trend in the global economy. Since most companies try to analyze and understand their customers' behavior and characteristics in order to develop a competitive CRM strategy, data mining tools have become highly popular (Ngai, Xiu, & Chau, 2009). Among all types of models that can be built via data mining, classification is the most frequent one, especially in CRM, and it is capable of predicting the effectiveness or profitability of a CRM strategy through prediction of customers' behavior (Ahmad, 2004; Carrier & Povel, 2003; Ngai, Xiu, & Chau, 2009). Classification can be defined as the process of finding a model that describes and distinguishes data classes, so that the model can be used to predict the class of objects whose class label is unknown (Han & Kamber, 2006).
Among existing classification techniques, Neural Networks and Decision Trees are the most frequently used, in that order (Ngai, Xiu, & Chau, 2009; Wei & Chiu, 2002).
Neural networks are a class of powerful, general-purpose tools readily applied to prediction, classification, and clustering. Neural networks have the ability to learn by example in much the same way that human experts gain from experience (Berry & Linoff, 2004).
A Decision Tree, on the other hand, is a tree-shaped structure that represents sets of decisions and can generate rules for the classification of a data set (Lee & Siau, 2001). Its algorithms include CART (Breiman, Friedman, Olshen, & Stone, 1984), CHAID (Kass, 1980), and C5.0 (Rulequest Research, 2008).
Wei & Chiu (2002) developed a model for subscriber churn prediction in the telecommunications industry. They noted that subscriber churn is not an instantaneous occurrence that leaves no trace, and they believed that before an existing subscriber churns, his/her call pattern might change.
Based on this assumption, they randomly considered a set of Observation, Retention, and Prediction time periods and, by dividing the observation period into "S" sub-periods, defined the following features:
1. Minutes of Use (MOU) of a subscriber in the first sub-period.
2. Frequency of Use (FOU) of a subscriber in the first sub-period.
3. Sphere of Influence (SOI) of a subscriber in the first sub-period.
4. The change in MOU of a subscriber between sub-periods s-1 and s (for s = 2, 3, …, n).
5. The change in FOU of a subscriber between sub-periods s-1 and s (for s = 2, …, n).
6. The change in SOI of a subscriber between sub-periods s-1 and s (for s = 2, …, n).
In their model building phase, they tested the effect of different variables, such as the desired class ratio, the number of sub-periods in the observation period, and the length of the retention period, on the model's accuracy. Their analysis revealed that a desired class ratio of 1:2 (churner : non-churner), a retention period of 7 days, and 2 sub-periods bring the model accuracy to its optimum level.

Methodology and Results
The raw data used to construct the churn predictive model were the call records (i.e., date of call, time of call, duration of call, type of call, and cost) of 34,504 users of a mobile telecommunications service provider in Iran, covering the period from 1 November 2007 to 30 April 2008.
The first hurdle we faced in the initial steps of the model building phase was the "churn definition" problem. In almost all previous studies, the customers of the service provider were subscribers who had a contract with the company. Consequently, "churn" in such conditions could be defined as the customer terminating the contract or not renewing it after its expiry date. The circumstances are different for pre-paid telecommunications service providers. In such companies there is no contract between the company and its clients to expire or be terminated. In other words, churn in such cases happens with no tracking point, such as terminating or not renewing a contract, and recognizing it becomes complicated.
To shed light on the issue, imagine a customer database containing customers with different calling behavior: some use their cell phones every day, while others use them every 2, 3, …, or 20 days. If we define a churner as "a person who has not used his/her cell phone for 7 days", a considerable share of the customers who use their cell phones only occasionally (i.e., every 8, 9, …, 20 days) would mistakenly be considered churners. On the other hand, if we take a longer time span and define a churner as "a person who hasn't used his/her cell phone for 25 days", our model may be unable to recognize the real churners.
The wrong signals discussed above would increase the number of False Negatives (FN) and False Positives (FP) in our predictive model and consequently lower the model's accuracy.
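The effect of a single fixed inactivity threshold can be illustrated with a minimal sketch. The customers, their inactivity gaps, and the function name `flag_churners` are hypothetical and chosen only to mirror the 7-day and 25-day examples in the text; they are not taken from the study's data.

```python
def flag_churners(days_since_last_call, threshold_days):
    """Flag a customer as a churner once inactivity reaches the threshold."""
    return {
        customer: gap >= threshold_days
        for customer, gap in days_since_last_call.items()
    }


# Hypothetical customers: days since the last call at evaluation time.
days_since_last_call = {
    "daily_user": 1,
    "occasional_user": 12,  # calls every 8 to 20 days and is still active
    "real_churner": 24,     # has actually left, 24 days ago
}

short_rule = flag_churners(days_since_last_call, threshold_days=7)
long_rule = flag_churners(days_since_last_call, threshold_days=25)

# The 7-day rule flags the occasional user (a false positive), while the
# 25-day rule still misses the real churner (a false negative).
print(short_rule)
print(long_rule)
```

No single threshold handles both customers correctly, which is the motivation for deriving a separate churn definition per group of customers with similar calling behavior.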
To tackle this problem, our model building phase was broken into two parts: (1) a clustering phase for defining churn and (2) a classification phase for predicting churn.
With the aim of extracting a logical definition of churn, the raw data were used to construct relevant features in accordance with prior studies in this realm (Ansari, Kohavi, Mason, & Zheng, 2000; Hung, Yen, & Wang, 2006) and 5 individual interviews with telecom experts.
The outcome is the following features, which were constructed for use in the clustering phase:
1. Call Ratio: the proportion of a customer's calls that were made more than one day after the previous call, relative to his/her total number of calls.
2. Average Call Distance: the average time distance between a customer's calls.
3. Max Date: the last date in the observed time period on which a specific customer made a call.
4. Min Date: the first date in the observed time period on which a specific customer made a call.
5. Life: the period of time within the observed time span during which a customer was active.
6. Max-Distance: the maximum time distance between two calls of a specific customer in the observed period.
7. No-of-days: the number of days on which a specific customer made or received a call.
8. Total-no-in: the total number of incoming calls for each client in the observed period.
9. Total-no-out: the total number of outgoing calls for each client in the observed period.
10. Total-cost: the total amount each customer was charged for using the services in the period under study.
11. Total-duration-in: the total duration of incoming calls for a specific customer in the observed time span.
12. Total-duration-out: the total duration of outgoing calls for a specific customer in the observed time span.
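Several of the features listed above can be derived from nothing more than the dates of a customer's calls. The sketch below is illustrative only: the function name, the day-level granularity, and the inclusive day count used for Life are assumptions, since the text does not specify how the features were computed.

```python
from datetime import date


def clustering_features(call_dates):
    """Compute some of the listed features from one customer's call dates.

    Assumes a non-empty list of call dates and measures distances in days.
    """
    ordered = sorted(call_dates)
    active_days = sorted(set(ordered))
    # Time distance, in days, between each call and the previous one.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return {
        "min_date": active_days[0],
        "max_date": active_days[-1],
        # Assumption: Life counts days inclusively from Min Date to Max Date.
        "life": (active_days[-1] - active_days[0]).days + 1,
        "no_of_days": len(active_days),
        "max_distance": max(gaps) if gaps else 0,
        "average_call_distance": sum(gaps) / len(gaps) if gaps else 0.0,
        # Calls made more than one day after the previous call, over all calls.
        "call_ratio": sum(1 for gap in gaps if gap > 1) / len(ordered),
    }


# Hypothetical customer with five calls in November 2007.
calls = [
    date(2007, 11, 1),
    date(2007, 11, 1),
    date(2007, 11, 2),
    date(2007, 11, 5),
    date(2007, 11, 15),
]
features = clustering_features(calls)
print(features)
```

For this hypothetical customer the gaps are 0, 1, 3, and 10 days, so Max-Distance is 10, Average Call Distance is 3.5, and Call Ratio is 2 out of 5 calls.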
For the classification phase, the following features were constructed:
1. Incoming MOU (IMOU) of a subscriber in the first sub-period.
2. Incoming FOU (IFOU) of a subscriber in the first sub-period.
3. Outgoing MOU (OMOU) of a subscriber in the first sub-period.
4. Outgoing FOU (OFOU) of a subscriber in the first sub-period.
5. The change in IMOU of a subscriber between sub-periods 1 and 2.
6. The change in IFOU of a subscriber between sub-periods 1 and 2.
7. The change in OMOU of a subscriber between sub-periods 1 and 2.
8. The change in OFOU of a subscriber between sub-periods 1 and 2.
9. Churn: a binary churn label for each client according to his/her churn status in the prediction period. A churner is a person who has no call record in the prediction period; otherwise he/she is considered a non-churner.
Utilizing the abovementioned features and considering the class ratio of 1:2 (churner : non-churner), we constructed different predictive models for each cluster by employing the Neural Networks technique and different algorithms of the Decision Tree technique. In this regard, the CART, C5.0, and CHAID Decision Tree algorithms were applied, and their performance was compared with that of the Neural Network model, a three-layer network with 8 neurons in the input layer, 3 neurons in the hidden layer, and one neuron in the output layer. For validation, single-split model validation was adopted, which has been proven to be an accurate validation method (Burez & Van den Poel, 2008b; Montgomery, Li, Srinivasan, & Liechty, 2004; Swait & Andrews, 2003).
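The 1:2 (churner : non-churner) ratio and the single-split validation can be sketched as follows. This is a minimal illustration under stated assumptions: random undersampling of non-churners is one common way to reach such a ratio, and the 70/30 split, the fixed seeds, and both function names are hypothetical, since the text gives neither the sampling procedure nor the split proportions.

```python
import random


def undersample_to_ratio(records, non_churners_per_churner=2, seed=0):
    """Keep all churners and a random sample of non-churners (1:2 by default)."""
    rng = random.Random(seed)
    churners = [r for r in records if r["churn"] == 1]
    non_churners = [r for r in records if r["churn"] == 0]
    keep = min(len(non_churners), non_churners_per_churner * len(churners))
    sample = churners + rng.sample(non_churners, keep)
    rng.shuffle(sample)
    return sample


def single_split(records, train_fraction=0.7, seed=0):
    """Split the records once into a training set and a held-out test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical cluster: 10 churners and 90 non-churners.
records = [{"id": i, "churn": 1 if i < 10 else 0} for i in range(100)]
balanced = undersample_to_ratio(records)
train, test = single_split(balanced)
print(len(balanced), len(train), len(test))
```

The balanced set contains all 10 churners and 20 sampled non-churners, and the model would be fitted on `train` and evaluated once on `test`.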
Tables 2 to 5 present the performance of our constructed models with different algorithms on our clusters, based on the gain measure for the top 10% and 20% of clients in each cluster.
While the gain factor of random sampling is 20% for the top 20% of the customer base in all clusters, Table 7 shows that the developed model achieves gain factors of 80.4%, 66.7%, 30.6%, and 80% for the top 20% of the customer base of our four clusters, respectively. This implies that, with the developed multi-algorithm predictive model, a sample of only 20% of each cluster's customer base is enough to identify 80.4%, 66.7%, 30.6%, and 80% of the total number of churners in each of our four clusters, respectively. As the gain charts (Figures 1 to 4) show, the developed predictive model on each cluster performs considerably better than random sampling (the diagonal line).
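The gain measure used here can be made concrete with a short sketch: customers are ranked by predicted churn score, and the gain at the top x% is the share of all churners found in that top slice. The scores and labels below are hypothetical and do not reproduce the study's results; the function name is likewise an assumption.

```python
def gain_at(scores, labels, top_fraction):
    """Share of all churners captured in the top-scored fraction of customers."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = round(len(ranked) * top_fraction)
    captured = sum(label for _, label in ranked[:n_top])
    total_churners = sum(labels)
    return captured / total_churners if total_churners else 0.0


# Hypothetical cluster of 10 customers, 4 of whom are churners (label 1).
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gain_top_10 = gain_at(scores, labels, 0.10)
gain_top_20 = gain_at(scores, labels, 0.20)
print(gain_top_10, gain_top_20)  # 0.25 0.5
```

In this toy case, contacting the top 20% of customers reaches 50% of the churners, compared with the 20% expected from random sampling.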

Conclusion
In this study we aimed to construct a predictive model for customer churn in pre-paid mobile telephony companies. First, the customer base was divided into four clusters based on RFM-related features, with the intention of extracting a logical definition of churn. Second, based on the churn definitions extracted in the first step, we carried out the model building phase and tested the performance of different model building algorithms, including Neural Networks, Decision Tree (C5.0), Decision Tree (CART), and Decision Tree (CHAID).
Evaluating and comparing the performance of the employed algorithms, we concluded that Decision Tree algorithms outperform Neural Networks in all clusters, based on the "gain measure" for the top 10% and 20% of our customer base.
Furthermore, we noticed that a given Decision Tree algorithm does not perform equally well in all clusters. We therefore concluded that adopting a combination of Decision Tree algorithms for model building in our clusters brings the maximum model performance. Comparing the performance of this multi-algorithm approach (see Table 7) with that of the tested single-algorithm approaches (see Tables 2 to 5), one can see that the multi-algorithm approach yields considerably superior performance for our predictive model.

Figures 1 to 4 illustrate the gain chart of the constructed model on each cluster.

Figure 1: Gain Chart of Decision Tree C5.0 Algorithm for Cluster 1

Utilizing the abovementioned set of 12 RFM-related features, the customer base was divided into 4 individual clusters using the TwoStep (SPSS Inc, 2007) cluster technique, with different "Max-Distance"s. By considering the "prediction period" length as twice the "Max-Distance", and based on the findings of Wei & Chiu (2002), we obtained four sets of observation, retention, and prediction periods for our four extracted clusters (see Table 1).
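The rule that the prediction period is twice a cluster's Max-Distance, combined with the churn label defined earlier (no call record in the prediction period), can be sketched as below. The start date, the 10-day Max-Distance, and the function names are hypothetical; the actual period boundaries per cluster are those reported in Table 1.

```python
from datetime import date, timedelta


def prediction_window(prediction_start, cluster_max_distance_days):
    """Return the inclusive (start, end) dates of the prediction period."""
    length_days = 2 * cluster_max_distance_days  # twice the Max-Distance
    return prediction_start, prediction_start + timedelta(days=length_days - 1)


def is_churner(call_dates, window):
    """A churner has no call record inside the prediction period."""
    start, end = window
    return not any(start <= call_date <= end for call_date in call_dates)


# Hypothetical cluster whose Max-Distance is 10 days.
window = prediction_window(date(2008, 4, 1), cluster_max_distance_days=10)

silent_customer = [date(2008, 3, 20), date(2008, 3, 30)]
active_customer = [date(2008, 3, 28), date(2008, 4, 15)]

print(window)
print(is_churner(silent_customer, window), is_churner(active_customer, window))
```

Tying the window length to each cluster's own Max-Distance means that occasional users are given a longer period in which to show activity before being labeled churners.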

Table 5: Performance of Neural Networks Predictive Models Based on Gain Measure

Comparing the performance of our developed models based on the gain measure, one can see that Decision Tree algorithms outperform the Neural Networks algorithm. Furthermore, examining the gain brought by each of the Decision Tree based models, we can conclude that maximum performance is achieved by using different Decision Tree algorithms in different clusters. Table 6 shows the most appropriate algorithm, among those tested, for model building in each cluster.
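Selecting the most appropriate algorithm per cluster amounts to taking, for each cluster, the algorithm with the highest gain. The sketch below uses placeholder gain values and cluster names that are purely hypothetical; they are not the figures from Tables 2 to 6.

```python
def best_algorithm_per_cluster(gains_by_cluster):
    """For each cluster, pick the algorithm with the highest gain."""
    return {
        cluster: max(gains, key=gains.get)
        for cluster, gains in gains_by_cluster.items()
    }


# Hypothetical top-20% gains; NOT the values reported in the study.
hypothetical_gains = {
    "cluster_A": {"C5.0": 0.60, "CART": 0.55, "CHAID": 0.50, "Neural Network": 0.40},
    "cluster_B": {"C5.0": 0.45, "CART": 0.52, "CHAID": 0.48, "Neural Network": 0.30},
}

selection = best_algorithm_per_cluster(hypothetical_gains)
print(selection)
```

Because the winner can differ from one cluster to the next, the combined model is at least as good in every cluster as any single algorithm applied across all clusters, when judged on the same gain values.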

Table 6: The Appropriate Algorithm for Model Building in Each Cluster

Consequently, by applying this multi-algorithm approach, the gain factor for each cluster would be in accordance with Table 7.