Predicting churn of pay TV clients

Predicting which clients are most likely to cancel their TV subscription and identifying the top factors that contribute to cancellation

A pay TV provider has been losing more customers in recent months but doesn't know exactly why. They want to use their customers' data to see whether any patterns explain why people are leaving.

Can we predict this behavior and see which characteristics make someone more likely to cancel their service?


Using anonymized customer data, I built a dataset containing client information (gender, age, zip code, ...), account information (payment method, contract age, ...), package information (package tier, pay-per-view add-ons, ...) and some engineered features (number of calls to the help center, periods in debt, ...). The target column indicates whether a client left in a given month.

The data was split into training and test sets. A few models were tested before a Random Forest classifier was chosen to identify churning clients. Because the classes are imbalanced (far fewer clients churn than stay), an oversampling technique was also applied to the training set in some tests. A grid search was performed to select the best-performing hyperparameters based on test accuracy, precision, recall and F1-score.

Using the Random Forest's feature importances, we also identified the top factors contributing to client churn.
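Ranking features by importance might look like the sketch below. The feature names are hypothetical placeholders loosely based on the columns described earlier, and the data is synthetic, so the resulting ranking says nothing about the real project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names (assumption), used only to label the output.
feature_names = [
    "contract_age", "help_center_calls", "periods_in_debt",
    "package_tier", "age",
]

# Synthetic stand-in data: 3 informative features and 2 noise features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3,
    n_redundant=0, random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds the impurity-based importances, which sum to 1.
importances = (
    pd.Series(model.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances tend to favor features with many distinct values, so scikit-learn's permutation_importance can be a useful cross-check.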


SQL was used for the initial queries, and Python was used for exploratory analysis, data cleaning, feature engineering and model training. Modeling was done with Scikit-learn, with Imblearn providing SMOTE oversampling.

Get in touch

If you'd like to learn more about my projects or work together, feel free to reach out! You can also connect with me on LinkedIn.