This project focuses on predictive customer churn analysis using machine learning. Customer churn, the loss of customers to competitors, is a critical metric for subscription-based businesses such as telecom or streaming services.
The goal is to build an interpretable ML pipeline that:
- Predicts whether a customer is likely to churn
- Explains why a prediction was made using Explainable AI (XAI) techniques
- Deploys as a web application for real-time predictions
IBM Telco Customer Churn Dataset (WA_Fn-UseC_-Telco-Customer-Churn.csv)
- Size: 7,043 customer records
- Features: 21 columns in total (including the `customerID` identifier and the `Churn` target), covering tenure, monthly charges, contract type, services used, payment method, etc.
- Target: Binary classification on `Churn` (Yes/No)
| Feature | Description |
|---|---|
| `tenure` | Number of months the customer has stayed |
| `MonthlyCharges` | Monthly billing amount |
| `TotalCharges` | Total amount charged |
| `Contract` | Contract type: Month-to-Month, One year, Two year |
| `InternetService` | DSL / Fiber optic / No |
| `PaymentMethod` | Bank transfer / Credit card / Electronic check / Mailed check |
| `SeniorCitizen` | Whether the customer is a senior citizen |
- Dropped the irrelevant `customerID` column
- Handled missing/inconsistent values in `TotalCharges`
- Binary encoding for boolean features (Partner, Dependents, PhoneService, etc.)
- One-hot encoding for categorical features (InternetService, Contract, PaymentMethod)
- Identified key patterns and anomalies in the data
- Analyzed churn distribution (class imbalance)
- Feature correlation analysis
- Engineered new features and transformed existing variables
- Applied SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance
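The cleaning and encoding steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the four-row sample frame is made up, the `preprocess` helper is a hypothetical name, and filling blank `TotalCharges` values with 0 is an assumption (the notebook may drop or impute them differently).

```python
import pandas as pd

# Tiny made-up sample using the Telco column names; the real project
# reads the CSV under data/ instead.
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "Partner": ["Yes", "No", "No", "Yes"],
    "tenure": [1, 34, 0, 45],
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "MonthlyCharges": [29.85, 56.95, 52.55, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # blank string = missing
    "Churn": ["Yes", "No", "No", "No"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the listed preprocessing steps."""
    # Drop the identifier, which carries no predictive signal.
    out = df.drop(columns=["customerID"])

    # TotalCharges arrives as text; unparseable entries become NaN.
    # Assumption: a blank value is treated as 0 charged so far.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce").fillna(0.0)

    # Binary encoding for Yes/No columns (only two are present in this sample).
    for col in ["Partner", "Churn"]:
        out[col] = out[col].map({"Yes": 1, "No": 0})

    # One-hot encoding for multi-valued categorical columns.
    return pd.get_dummies(out, columns=["Contract"], dtype=int)


clean = preprocess(raw)
print(clean.head())
```

On the full dataset the same pattern would extend to the remaining Yes/No columns and to `InternetService` and `PaymentMethod`.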
| Model | Description |
|---|---|
| Logistic Regression | Baseline linear classifier |
| Decision Tree | Rule-based classifier |
| Random Forest | Ensemble model - best performing (used in deployment) |
| XGBoost | Gradient boosting classifier |
| MLP (PyTorch) | Custom multi-layer perceptron neural network |
- Evaluated using: Accuracy, Precision, Recall, F1 Score, ROC-AUC
- Compared all models to identify the best-performing algorithm
- Final model: Random Forest Classifier
- SHAP (SHapley Additive exPlanations):
  - Global feature importance via summary plots
  - Local per-prediction explanations via force plots
- LIME (Local Interpretable Model-agnostic Explanations):
  - Local explanations for individual customer predictions
minor-project-churn/
├── data/
│ └── WA_Fn-UseC_-Telco-Customer-Churn.csv # IBM Telco Dataset
├── models/
│ ├── model_rfc.pkl # Trained Random Forest model
│ ├── mlp_model.pkl # Trained MLP model
│ └── explainer_rfc.bz2 # Pre-computed SHAP explainer
├── templates/
│ └── index.html # Flask web app template
├── notebooks/
│ └── main.ipynb # Main analysis notebook
├── app.py # Flask web application
├── requirements.txt # Python dependencies
├── Pipfile # Pipenv config
└── README.md # Project documentation
- Python 3.9+
- pip or pipenv
# Clone the repository
git clone <your-repo-url>
cd minor-project-churn
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the web app
python app.py
Open http://127.0.0.1:5000 in your browser.
# Launch the analysis notebook
jupyter notebook notebooks/main.ipynb
- Best Model: Random Forest Classifier
- Key Churn Predictors: Contract type, Tenure, Monthly Charges, Internet Service type
- XAI Integration: SHAP force plots provide per-customer explanations; LIME provides local model transparency
- Deployment: Flask web app with a real-time churn probability gauge and SHAP explanation
| Category | Tools |
|---|---|
| Language | Python 3.9+ |
| ML Libraries | scikit-learn, XGBoost, imbalanced-learn (SMOTE) |
| Deep Learning | PyTorch |
| XAI | SHAP, LIME |
| Web Framework | Flask |
| Data Processing | pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Notebook | Jupyter |
- IBM Telco Customer Churn Dataset - Kaggle
- SHAP: Lundberg & Lee (2017) - "A Unified Approach to Interpreting Model Predictions"
- LIME: Ribeiro et al. (2016) - "Why Should I Trust You?"
- Scikit-learn documentation - https://scikit-learn.org