
Histogram for Relationship between Acc_tenure and Churn

Histogram for acc_balance_change_ratio

Curve graph for adviser_change_recency, dealer_change_recency, call_recency, and login_recency

Heatmap of Correlation for each attribute

Churn Analysis in Subscription-Based Businesses from Prediction to Inference
CONCEPT

This is the capstone project from my undergraduate degree: a customer churn analysis for a superannuation fund company in Australia. In this project, we build a robust churn propensity model that scores each customer by their probability of churning in the next six months, then classify the high-risk customers into high- and low-profitability members.

We also use causal Bayesian networks to estimate the probabilities of the causes that lead to customer churn, and to uncover the latent variables and unknown parameters behind member churn through causal analysis and feature selection. Finally, we give the client measures and recommendations based on our findings.

BUSINESS PROBLEM

In Australia, more than 1.1 million people have self-managed super funds (Drury, 2021). Superannuation savings begin when an employee starts working and the employer starts paying a percentage of the employee's salary into the employee's retirement account. The superannuation fund company invests and manages this money on the employee's behalf until retirement (Australian Government, 2021). For most people, a super fund is a long-term investment.

Our client, a local superannuation fund company in Australia, raised the following business problems:

  1. Both the cost of acquiring new customers and the rate of customer churn are increasing rapidly

  2. The average churn rate is high

  3. The high churn rate reduces the company's profitability

  4. Very little is known about the members' data

OBJECTIVE
  1. Identify and visualize which factors lead to customer churn

  2. Build a prediction model to classify whether a customer is going to churn

  3. Score each member by their probability of churning over the next financial year, and divide the members into high-profit and low-profit groups

  4. Discover the latent variables and unknown parameters that cause churn

  5. Provide solutions or recommendations
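
As a rough illustration of objectives 2 and 3, the sketch below scores members with a churn probability and then splits the high-risk members by profitability. It is only a sketch under stated assumptions: the data is synthetic, logistic regression stands in for whichever model the project actually used, and the two features, the 0.5 risk cut-off, the profit figures and the median profit threshold are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_members = 500

# Synthetic, already-normalized features (hypothetical): tenure and login recency.
X = rng.random((n_members, 2))
# Synthetic labels: churn is made more likely for short tenure and long recency.
logits = -2.0 * X[:, 0] + 2.0 * X[:, 1]
y = (rng.random(n_members) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
# Hypothetical yearly profit per member.
profit = rng.random(n_members) * 1000.0

X_train, X_test, y_train, y_test, profit_train, profit_test = train_test_split(
    X, y, profit, test_size=0.3, random_state=0, stratify=y
)

# Stand-in propensity model: any classifier exposing predict_proba would do.
model = LogisticRegression().fit(X_train, y_train)
churn_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1 (churn)

# Assumed cut-offs: 0.5 for "high risk", training-set median for "high profit".
high_risk = churn_prob >= 0.5
high_profit = profit_test >= np.median(profit_train)

segments = np.where(
    ~high_risk,
    "low_risk",
    np.where(high_profit, "high_risk_high_profit", "high_risk_low_profit"),
)
```

Each entry of `segments` labels one held-out member, so retention effort can be directed first at the `high_risk_high_profit` group.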

DATA PREPARATION

Our data is based on CFS member information records from 2015 to 2016, including age, account term, savings plan, billing information and service record information. The dataset is relatively large: a single dataset contains nearly 270,000 member records and 88 behavior attributes. Cleaning and preprocessing the dataset carefully is therefore very important.

To analyze the dataset more effectively, we carried out the following data processing steps:

  1. Combine datasets - To reduce data storage space and improve write performance, we aggregated the membership data from June and December into a single observation dataset. We then de-duplicated the combined data so that only unique records remain.

  2. Binary transformation (One-Hot Encoding) on categorical data - Converting each category into its own 0/1 column gives the models numeric input and makes the training data easier to use and more expressive.

  3. Data normalization - Normalization rescales each attribute to a common, fixed range. Analyzing normalized data removes the effect of differing units and scales and simplifies the model, which speeds up computation.

  4. Apply two main inclusion criteria - First, we retained only customers whose accounts were more than six months old. Second, we removed accounts with balances below $1,500 to improve the forecasts, since predicting churn probabilities for inactive accounts is of low value to superannuation funds.
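
The four steps above can be sketched with pandas as follows. This is a minimal illustration, not our actual pipeline: the column names (`member_id`, `age`, `acc_tenure_months`, `acc_balance`, `savings_plan`) and the tiny example tables are hypothetical, min-max scaling is assumed as the normalization method, and the inclusion criteria are applied before scaling so that the thresholds stay in their original units.

```python
import pandas as pd

NUMERIC_COLS = ["age", "acc_tenure_months", "acc_balance"]  # hypothetical names
CATEGORICAL_COLS = ["savings_plan"]                         # hypothetical name


def preprocess(june: pd.DataFrame, december: pd.DataFrame) -> pd.DataFrame:
    # Step 1: combine the two snapshots and drop exact duplicate rows.
    members = pd.concat([june, december], ignore_index=True).drop_duplicates()

    # Step 4: inclusion criteria (tenure above six months, balance of at least $1,500).
    keep = (members["acc_tenure_months"] > 6) & (members["acc_balance"] >= 1500)
    members = members[keep]

    # Step 2: one-hot encode the categorical columns into 0/1 indicator columns.
    members = pd.get_dummies(members, columns=CATEGORICAL_COLS, dtype=int)

    # Step 3: min-max normalization of each numeric column to the range [0, 1].
    for col in NUMERIC_COLS:
        low, high = members[col].min(), members[col].max()
        members[col] = (members[col] - low) / (high - low) if high > low else 0.0

    return members.reset_index(drop=True)


# Tiny made-up example: member 1 appears in both snapshots, member 2 is too new,
# and member 3 has a balance below the threshold.
june = pd.DataFrame({
    "member_id": [1, 2, 3],
    "age": [30, 45, 60],
    "acc_tenure_months": [12, 3, 48],
    "acc_balance": [5000.0, 20000.0, 900.0],
    "savings_plan": ["growth", "balanced", "growth"],
})
december = pd.DataFrame({
    "member_id": [1, 4],
    "age": [30, 52],
    "acc_tenure_months": [12, 30],
    "acc_balance": [5000.0, 15000.0],
    "savings_plan": ["growth", "balanced"],
})

clean = preprocess(june, december)
```

In this example only members 1 and 4 survive the filters, and `clean` contains the scaled numeric columns plus one indicator column per savings plan.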

PROCESS
  • Understand business problems, identify project objectives and develop a project proposal to present available approaches

  • Data Understanding & Data Preprocessing

  • Data visualization and preliminary exploration

  • Build a churn propensity model and evaluate the results

  • Mid-project update report & presentation

  • Experimental analysis: Churn Prediction Results & Causal Inference Analysis

  • Provide final solutions and recommendations based on project findings

  • Final report & presentation

Tools: Python (seaborn, matplotlib, sklearn, dowhy), Excel

Age distribution of the higher profitability customers

Thank you for your time and for reading.
