Customer Churn Analysis-new

Histogram for Relationship between Acc_tenure and Churn

Histogram for acc_balance_change_ratio

Curve graph for adviser_change_recency, dealer_change_recency, call_recency, and login_recency

Heatmap of Correlation for each attribute

Churn Analysis in Subscription-Based Businesses from Prediction to Inference

CONCEPT

This was our undergraduate capstone project: a customer churn analysis for a superannuation fund company in Australia. In this project, we build a robust churn propensity model that scores each customer by their probability of churning in the next six months and classifies high-risk members as high- or low-profitability.

We also propose causal Bayesian networks to estimate the probabilities of the causes that lead to customer churn, and we use causal analysis and feature selection to uncover the latent variables and unknown parameters behind member churn. Finally, we provide the client with measures and recommendations based on our findings.
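The difference between a raw association and a causal estimate can be illustrated with a much simpler technique than the Bayesian networks used in the project. The sketch below applies a backdoor adjustment by stratification to a toy dataset with one binary confounder. The column names (older_member, adviser_changed, churned) and all counts are made up for illustration and are not taken from the client data.

```python
import pandas as pd

# Hypothetical toy counts: (older_member, adviser_changed, members, churners).
# Older members change adviser more often AND churn more often, so age
# confounds the raw relationship between adviser changes and churn.
cells = [
    (0, 1, 100, 20),
    (0, 0, 400, 40),
    (1, 1, 400, 160),
    (1, 0, 100, 30),
]

rows = []
for older, changed, members, churners in cells:
    rows += [{"older_member": older, "adviser_changed": changed, "churned": 1}] * churners
    rows += [{"older_member": older, "adviser_changed": changed, "churned": 0}] * (members - churners)
df = pd.DataFrame(rows)

# Naive estimate: difference in churn rate between the two groups.
rates = df.groupby("adviser_changed")["churned"].mean()
naive_effect = rates.loc[1] - rates.loc[0]

# Backdoor adjustment: compute the difference within each age stratum,
# then average the differences weighted by the size of each stratum.
stratum_rates = (
    df.groupby(["older_member", "adviser_changed"])["churned"].mean().unstack("adviser_changed")
)
stratum_effect = stratum_rates[1] - stratum_rates[0]
stratum_weight = df["older_member"].value_counts(normalize=True)
adjusted_effect = (stratum_effect * stratum_weight).sum()

print(f"naive difference:    {naive_effect:.2f}")
print(f"adjusted difference: {adjusted_effect:.2f}")
```

In this toy data the naive difference overstates the effect because age drives both adviser changes and churn. The adjustment is only valid under the assumption that the confounder is observed and correctly identified.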

BUSINESS PROBLEM

More than 1.1 million Australians have self-managed super funds (Drury, 2021). Retirement savings begin to accumulate when an employee starts working and the employer starts paying a percentage of the employee's salary into the employee's retirement account. The superannuation fund company invests or manages the money for the employee until retirement (Australian Government, 2021). For most people, a super fund is a long-term investment.

Our client, a local superannuation fund company in Australia, raised the following business problems:

The cost of acquiring new customers and the rate of customer churn are both increasing rapidly
The average churn rate is high
The high churn rate is reducing the company's profitability
Very little is known about the members' data

OBJECTIVE

Identify and visualize which factors lead to customer churn
Build a prediction model to classify whether a customer will churn
Score each member on their probability of churning over the next financial year, and split members into high-profit and low-profit groups
Discover the latent variables and unknown parameters that cause churn
Provide solutions or recommendations
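The scoring and segmentation objectives can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the model built in the project. The choice of logistic regression, the 0.5 risk cutoff, the median profit cutoff, and the annual_profit column are all assumptions. The feature names are borrowed from the figure captions, but their values here are randomly generated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_members = 500

# Synthetic, already-normalized features (assumed, not real client data).
login_recency = rng.uniform(0.0, 1.0, n_members)
acc_balance_change_ratio = rng.normal(0.0, 0.2, n_members)
annual_profit = rng.gamma(2.0, 200.0, n_members)  # hypothetical profit per member

# Synthetic labels: stale logins and falling balances raise the churn odds.
logit = 3.0 * (login_recency - 0.5) - 3.0 * acc_balance_change_ratio
churned = (rng.uniform(size=n_members) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Fit a propensity model and score every member with a churn probability.
X = np.column_stack([login_recency, acc_balance_change_ratio])
model = LogisticRegression(max_iter=1000).fit(X, churned)
churn_probability = model.predict_proba(X)[:, 1]


def segment_members(churn_probability, profit, risk_cutoff=0.5):
    """Label high-risk members as high- or low-profit; others are low-risk."""
    churn_probability = np.asarray(churn_probability)
    profit = np.asarray(profit)
    profit_cutoff = np.median(profit)  # assumed split point between profit groups
    labels = np.full(len(churn_probability), "low-risk", dtype=object)
    high_risk = churn_probability >= risk_cutoff
    labels[high_risk & (profit >= profit_cutoff)] = "high-risk / high-profit"
    labels[high_risk & (profit < profit_cutoff)] = "high-risk / low-profit"
    return labels


segments = segment_members(churn_probability, annual_profit)
```

For brevity the sketch scores the same rows it was trained on. A real evaluation would score held-out members and choose both cutoffs with the client.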

DATA PREPARATION

Our data comes from CFS member information records from 2015 to 2016, covering age, account term, savings plan, billing information and service records. The dataset is relatively large: a single dataset contains nearly 270,000 member records and 88 behavior attributes, so cleaning and preprocessing it carefully is very important.

To analyze the dataset more effectively, we performed the following data processing steps:

Combine datasets - To reduce data storage space and improve write performance, we aggregated membership data from June and December into a single observation dataset. We also de-duplicated the combined data, keeping only unique records.
Binary transformation (One-Hot Encoding) on categorical data - Binary transformation converts each category into its own 0/1 column, which makes our training data easier to use and more expressive.
Data normalization - Normalization rescales each attribute into a specific range. Analyzing normalized data removes differences in units and scale between attributes and simplifies the model, which speeds up computation.
Apply two main inclusion criteria - First, we retained only customers who had been members for more than six months. Second, we removed accounts with balances below $1,500 to improve forecasts, since predicting churn probabilities for inactive accounts is of low value to superannuation funds.
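The four steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline. The column names (member_id, acc_tenure_months, acc_balance, savings_plan) and the sample rows are hypothetical, min-max scaling is an assumed choice of normalization, and the inclusion criteria are applied before scaling so that the thresholds stay in their original units.

```python
import pandas as pd


def prepare_members(june: pd.DataFrame, december: pd.DataFrame) -> pd.DataFrame:
    # 1. Combine the two snapshots and keep only unique records.
    members = pd.concat([june, december], ignore_index=True).drop_duplicates()

    # 2. Inclusion criteria: tenure above six months, balance of at least $1,500.
    keep = (members["acc_tenure_months"] > 6) & (members["acc_balance"] >= 1500)
    members = members[keep]

    # 3. One-hot encode the categorical column into 0/1 indicator columns.
    members = pd.get_dummies(members, columns=["savings_plan"], dtype=int)

    # 4. Min-max normalization of numeric columns into the range [0, 1].
    for col in ["acc_tenure_months", "acc_balance"]:
        low, high = members[col].min(), members[col].max()
        members[col] = (members[col] - low) / (high - low) if high > low else 0.0

    return members.reset_index(drop=True)


# Hypothetical sample snapshots, for illustration only.
june = pd.DataFrame({
    "member_id": [1, 2, 3],
    "acc_tenure_months": [12, 3, 48],
    "acc_balance": [20000.0, 5000.0, 900.0],
    "savings_plan": ["growth", "balanced", "growth"],
})
december = pd.DataFrame({
    "member_id": [1, 4],
    "acc_tenure_months": [12, 30],
    "acc_balance": [20000.0, 8000.0],
    "savings_plan": ["growth", "balanced"],
})

prepared = prepare_members(june, december)
print(prepared)
```

In the sample data, member 1 appears identically in both snapshots and is kept once, member 2 is dropped for short tenure, and member 3 is dropped for a low balance.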

PROCESS

Understand business problems, identify project objectives and develop a project proposal to present available approaches
Data Understanding & Data Preprocessing
Data visualization and preliminary exploration
Build a churn propensity model and evaluate the results
Mid-project update report & presentation
Experiment analysis: Churn Prediction Results & Causal Inference Analysis
Provide final solutions and recommendations based on project findings
Final report & presentation

Tools: Python (seaborn, matplotlib, sklearn, dowhy), Excel

Age distribution of the higher profitability customers

Yangyang Jin

Junior Data Analyst
Bachelor of Science in IT, UTS

yangyangjin11290@gmail.com

Project Proposal

Mid-project Update

Mid-project Presentation

Final Presentation

Final Report