Will they stay — or will they go: Using churn analysis in a competitive market to keep your customers
Intuitively, companies understand that it’s more expensive to find and acquire new customers than to sell to existing ones. This is especially true in industries such as financial services, insurance and telecommunications, as their markets are highly penetrated and there is little differentiation in their product portfolio. Developing strategies to keep digitally savvy customers from leaving becomes important to maintain and grow market share.
But to really execute on such a strategy, you need some idea of who is likely to leave — and when. That’s the problem. Identifying which customers are poised to churn requires a different analytic approach than other questions. This paper explores methodologies companies can use to quantify customer churn and provides the insights you need to deliver more value to these customers through personalized products, offers and promotions.
Changing landscape
A rise in customer churn is common in industries that serve maturing markets. Technology makes it easier for customers to compare products and services. Banking services and insurance products are becoming more transparent. Regulatory changes and comparison shopping sites are helping consumers to become more aware of their options and make more informed choices. Online self-service portals make it easier to switch, too. Retail banking in the United States, for example, is experiencing an annual customer churn rate of approximately 15 percent.1
Banks aren’t alone. The telecommunications industry is also at a crossroads. The average revenue per user (ARPU) in the telecom industry is falling in virtually every region. Even the growing Asia-Pacific region shows a 1 percent fall in ARPU from 2011 to 2016.2 Instant messaging and VoIP services from over-the-top players are increasingly replacing the need for traditional communication services. All this manifests in falling ARPUs and high subscriber churn rates.
Bharti Airtel, India’s No. 1 telco, sees monthly customer churn in the range of 3.5 to 4 percent.3 AT&T, the world’s largest telecommunications company, reports a monthly postpaid churn of 1.00 percent for 2016.4
Importance of predicting customer churn
In competitive and mature markets such as these, new-customer acquisition can cost five to 10 times more than retention. That makes an offensive marketing strategy a zero-sum game. Rather than acquiring a competitor’s customers, it’s more important and profitable to reduce customer exits by building brand equity.
Developing strategies to keep the customers you already have makes predicting customer churn very important. This is not just a question of who will churn but also when the churn event is likely to happen. To make the most of its marketing resources and create the most effective response for keeping the customer, a company must also be aware of the customer’s value to the business and what’s driving the customer to leave.
Churn and rare events
From an analytical standpoint, customer churn is considered a rare event — something that happens with a low frequency but is potentially very significant in the nature and scope of its impact. A rare event’s impact varies based on the type of organization — individual, business, industry or government. While rare events from nature can have a widespread impact, a rare event such as a machine that fails unexpectedly on a factory shop floor may affect only the business.
Predicting a rare-event occurrence is important so businesses can prepare for the eventuality. Examples include:
- Societal: Wars, coups, epidemics
- Macroeconomic: Large economic depressions, economic shocks or market crashes
- Natural disasters: Tsunamis, hurricanes, earthquakes or asteroid impacts
- Industrial: Machine failure or catastrophic nuclear power plant failure
- Marketing: Response to mass mailing campaigns, customer churn
- Financial: Fraudulent card transactions, borrower bankruptcy and others
Classifying customer churn as a rare event is important because it determines which predictive algorithm is best suited to the task. When one class is much larger than the other, most algorithms tend to favor the larger class at the expense of the smaller one.
Handling class imbalance
Many practical classification problems involve imbalances: exceptions that occur within a large class of otherwise regular events, such as a network intrusion, an instance of credit card fraud or insider trading. In the case of customer churn, the number of active customers significantly outweighs the number of customers likely to leave, so one class in the dataset is much larger than the other.
There are two common approaches to tackling the problem of highly imbalanced data. One is based on cost-sensitive learning, where a high cost is assigned to misclassification of the minority class. The other approach is to use a sampling technique, where either the majority class is downsampled (undersampled) or the minority class is upsampled (oversampled).
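The cost-sensitive idea can be illustrated with a minimal Python sketch. The weighting rule used here (total samples divided by the number of classes times the class count) is one common heuristic and is an assumption, not a method named in this paper; the function names and the 95/5 split are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This heuristic is an assumption for illustration; rarer classes
    receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each error costs the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total

# Hypothetical data: 95 active customers (0) and 5 churners (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# A naive model that predicts "active" for every customer.
always_active = [0] * 100
plain_error = sum(t != p for t, p in zip(labels, always_active)) / len(labels)
costly_error = weighted_error(labels, always_active, weights)
```

Under these assumptions the naive model looks accurate on the plain error rate but is penalized heavily once missed churners carry a higher cost, which is the behavior cost-sensitive learning is meant to produce.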
Figure 1 presents this sampling approach visually. Class imbalance is shown as a scatter of a majority class (gray) and a minority class (yellow) on a two-dimensional plot of two hypothetical variables, Variable A and Variable B. When the majority class is downsampled, or the minority class is upsampled, the frequency of the gray and yellow observations is made roughly equal. Hybrid sampling combines the two approaches.
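A rough sketch of the two sampling directions, using only the Python standard library, might look as follows. The row layout (customer id, churn flag), the function names and the seed are hypothetical choices made for illustration.

```python
import random

def _group_by_class(rows, label_of):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    return by_class

def downsample_majority(rows, label_of, seed=0):
    """Randomly drop rows so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))  # without replacement
    rng.shuffle(balanced)
    return balanced

def upsample_minority(rows, label_of, seed=0):
    """Randomly repeat rows so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    largest = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=largest - len(group)))  # with replacement
    rng.shuffle(balanced)
    return balanced

# Hypothetical rows: (customer_id, churned) with 95 active and 5 churned.
rows = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]
down = downsample_majority(rows, label_of=lambda r: r[1])
up = upsample_minority(rows, label_of=lambda r: r[1])
```

Downsampling discards information from the majority class, while upsampling repeats minority observations; a hybrid would apply a milder version of each.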
One of the methodologies used for customer churn is based on the random forest algorithm5 after the majority class has been downsampled. Decision trees are then grown on a more balanced dataset. A majority vote is taken to make a “yes” or “no” prediction as to whether a customer stays or leaves.
Other methodologies for prediction
The balanced random forest approach is a variation of the popular random forest approach to classification problems that is suited to imbalanced datasets. As with random forests, the approach:
- Combines the predictive power across several decision trees
- Builds each tree on a bootstrapped sample of the original data
- Reports, as the final prediction, the class that receives the highest vote across all trees
The only difference between balanced random forest and random forest is the choice of samples. In the balanced method, the bootstrapped sample is stratified by the response class so that the majority class is undersampled.
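The two ingredients that distinguish this method, a class-stratified bootstrap and a vote across trees, can be sketched as below. This is a simplified illustration and not a full forest: tree growing is omitted, and the assumption that each class contributes as many draws as the minority class has rows is ours, as are the function names.

```python
import random
from collections import Counter

def balanced_bootstrap(rows, label_of, rng):
    """Draw, with replacement, the same number of rows from every class.

    The per-class draw size equals the size of the smallest class, so the
    majority class is undersampled in each bootstrap (an assumed design).
    """
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        sample.extend(rng.choices(group, k=n_min))
    return sample

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical rows: (customer_id, churned) with 95 active and 5 churned.
rows = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]
rng = random.Random(0)

# One balanced sample per tree; each tree would be trained on its own sample.
samples = [balanced_bootstrap(rows, lambda r: r[1], rng) for _ in range(3)]

# Hypothetical votes from three trees for a single customer.
prediction = majority_vote(["leave", "stay", "leave"])
```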
Another prediction method is bias-adjusted logistic regression. This variation of the popular logistic regression approach for classification problems is particularly suited to imbalanced or rare event datasets. This approach, while combining the independent variables linearly, reports its result as a probability score — e.g., a customer is 73 percent likely to leave. The output — a vector of probability scores — is then translated to classes based on a choice of threshold value, selected in such a way that the classification objective is maximized.
This approach is favored over standard logistic regression because it doesn’t estimate coefficients using the maximum likelihood estimator, which suffers from small-sample bias. This makes it a particularly useful estimator of coefficients for rare-event data. Some experts make the case that this approach should be used whenever logistic regression is considered.
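The step of translating probability scores into classes can be sketched as a simple threshold search. The choice of the F1 score as the classification objective is an assumption made for illustration (the paper does not name an objective), and the scores and labels are invented.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when customers with score >= threshold are predicted to churn."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels):
    """Try each observed score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))

# Hypothetical probability scores and true outcomes (1 = churned).
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.73, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1]
threshold = best_threshold(scores, labels)
predicted = [1 if s >= threshold else 0 for s in scores]
```

In practice the threshold would be chosen on validation data rather than on the data used to fit the model, and a business-specific objective could replace F1.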
Feature engineering
Feature engineering is the process of constructing features in the dataset before algorithms such as those described above are applied. Temporal abstraction is one such approach to feature engineering. Input from a domain expert is critical in shaping these features. For example, when a wireless customer regularly hits a certain data usage threshold for 3 months in a row, the customer is more likely to consider switching service providers. This approach is described separately for churned and active customers.
For churned customers:
- Identify a reference date: The date of churn for each such customer.
- Identify a variety of lag periods: Days leading up to the reference date (e.g., 3, 5 or 7 days) to provide a response window for the service provider to take necessary preventive action.
- Identify two-period combinations: Split the remaining time into a variety of two-period combinations, called P1 and P2. P2 is shorter in duration than P1. The hypothesis is that the P2 period will be more reflective of the changing behavior leading up to churn than the P1 period. On the appropriate split between the two periods, a variety of possibilities can be considered.
- Calculate average daily transaction metrics for both periods (across all possibilities): The percentage change from P1 to P2 could be the new derived variable.
For active customers, adopt a similar procedure but choose a random reference date, given the absence of a churn date.
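The procedure above can be sketched for a single customer and a single choice of lag and period lengths. The function name, the dictionary layout of daily usage, and the specific window lengths (7-day lag, 7-day P2, 28-day P1) are hypothetical; a real pipeline would loop over several combinations.

```python
from datetime import date, timedelta

def pct_change_feature(daily_usage, reference_date, lag_days, p2_days, p1_days):
    """Percentage change in the average daily metric from P1 to P2.

    daily_usage maps a date to a transaction metric; missing days count as 0.
    The lag window immediately before the reference date is excluded.
    """
    p2_end = reference_date - timedelta(days=lag_days)  # exclusive end of P2
    p2_start = p2_end - timedelta(days=p2_days)
    p1_start = p2_start - timedelta(days=p1_days)

    def average(start, end):
        n_days = (end - start).days
        total = sum(daily_usage.get(start + timedelta(days=i), 0.0)
                    for i in range(n_days))
        return total / n_days

    p1_avg = average(p1_start, p2_start)
    p2_avg = average(p2_start, p2_end)
    if p1_avg == 0:
        return None  # percentage change is undefined without P1 activity
    return 100.0 * (p2_avg - p1_avg) / p1_avg

# Hypothetical churned customer: usage halves in the weeks before churn.
churn_date = date(2017, 3, 31)
p1_first_day = date(2017, 2, 17)
usage = {p1_first_day + timedelta(days=i): 10.0 for i in range(28)}
usage.update({date(2017, 3, 17) + timedelta(days=i): 5.0 for i in range(7)})

feature = pct_change_feature(usage, churn_date, lag_days=7, p2_days=7, p1_days=28)
```

For an active customer the same function would be called with a randomly chosen reference date in place of the churn date.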
Validation
To check whether the models developed will generalize well, a k-fold cross-validation approach is often recommended, in which:
- The data is randomly partitioned into “k” equal-sized subsamples.
- Of the “k” subsamples, a single subsample is retained as validation data.
- The remaining “k−1” subsamples are used as training data.
- The cross-validation process is then repeated “k” times (the folds), with each of the “k” subsamples used exactly once as the validation data.
- The results from the folds are then combined to produce a single estimate of error (or accuracy).
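The steps above can be sketched in plain Python. The function names are hypothetical, the folds are only approximately equal when the sample count is not divisible by k, and `fit_and_score` stands in for whatever model training and scoring routine is used.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k near-equal subsamples
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average the score returned by fit_and_score over all k folds."""
    scores = [fit_and_score(train, val)
              for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)

# Each index appears in exactly one validation fold.
splits = list(k_fold_indices(n_samples=10, k=5))
```

With a rare event such as churn, it may also be worth stratifying the folds by class so that each validation fold contains some churners; that refinement is not shown here.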
In our experience, the balanced random forest approach performs significantly better in predicting churn than does the bias-adjusted logistic regression approach.
Other approaches
In addition to the above approaches, many other ways to predict subscriber churn are described in the literature.
The authors of Customer Churn Prediction in Telecommunication reviewed much of the literature on telecommunication churn prediction published from 2002 to 2013. The articles they found were identified through keyword searches in online databases such as Elsevier, IEEE Xplore, Springer Link, ScienceDirect, ACM Digital Library and Microsoft Academic Search. These articles were then classified by technique, year and journal. This review found that decision trees and logistic regression were among the top three techniques used, which is consistent with our experience with balanced random forest and bias-adjusted logistic regression.
Operationalizing churn analytics
Because customer churn is such a fundamental problem for businesses in mature markets, it is important to integrate prediction-generating workflows into business processes. The prediction that a customer is poised to leave is an insight that needs to be readily available for downstream outbound marketing efforts and channel intelligence.
A further challenge in developing such predictions is the availability of data scientists who can build the workflows. Beyond availability, the workflows must be reusable so that predictions can be returned later, and quickly enough, for the business processes that depend on them.
Reusable workflows such as outlier detection, imputation, feature engineering, model execution and generation of final predictions can be developed on a “canvas” found in tools such as Microsoft Azure and IBM SPSS Modeler.
These workflows can be built with conditional subflows that are triggered when certain conditions are met. They can be scheduled to run periodically (e.g., weekly or monthly), scoring the probability of churn for each customer and recording these probabilities in a database.
Operationalizing these insights also means pairing the probability of churn with a measure of the customer’s value to the business. This is an important factor to add before the data is used to drive customer retention campaigns and promotions.
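One simple way to combine the two quantities is to multiply them and rank customers by the result. This "value at risk" measure, the field names and the figures below are assumptions for illustration; the paper does not prescribe a specific formula.

```python
def rank_by_value_at_risk(customers):
    """Sort customers by churn probability times value, highest first.

    Each customer is a dict with hypothetical keys "id",
    "churn_probability" (0 to 1) and "annual_value" (currency units).
    """
    scored = [
        dict(c, value_at_risk=c["churn_probability"] * c["annual_value"])
        for c in customers
    ]
    return sorted(scored, key=lambda c: c["value_at_risk"], reverse=True)

# Hypothetical scored customers.
customers = [
    {"id": "A", "churn_probability": 0.9, "annual_value": 100.0},
    {"id": "B", "churn_probability": 0.3, "annual_value": 1000.0},
    {"id": "C", "churn_probability": 0.5, "annual_value": 200.0},
]
ranked = rank_by_value_at_risk(customers)
priority_ids = [c["id"] for c in ranked]
```

In this example the customer most likely to leave is not the first priority, because a less likely churner represents more value to the business.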
Knowing which customers are about to leave, when they’re most likely to go and what they’re worth to the business is truly valuable. This is when predictions of customer churn move from being interesting data points to becoming insights that are fundamental to the business.
About the author: Bharathan Shamasundar is a senior data scientist with a balanced mix of business and technology expertise. He has strong experience in delivering and managing projects that require applying advanced analytics to problem-solving. His 12 years of experience span delivery and consulting in manufacturing, direct marketing, and finance and accounting across the financial services, technology, e-commerce and telecommunication industries.