46 Customer Churn Prediction

📋 Learning Objectives

Understand the economics of churn: why retention is 5–25× cheaper than acquisition
Define churn precisely in different business contexts (contractual vs non-contractual, window choice)
Engineer predictive features for churn: recency, trend, engagement decay, tenure
Apply Kaplan-Meier survival analysis to estimate survival curves and compare groups
Build Cox Proportional Hazards models to quantify hazard ratios and test PH assumptions
Train and compare classification models (logistic regression, random forest, XGBoost) for churn prediction
Design intervention scoring to maximize expected value of retention campaigns

46.1 The Economics of Churn: Why Retention Matters

In nearly every business, retaining a customer is far cheaper than acquiring one. A typical retail bank spends ₦15,000–₦25,000 to acquire a new customer through digital marketing, branch promotions, and onboarding. Yet a retained customer often provides ₦50,000–₦100,000+ in net present value over their lifetime. Telecom operators see even starker ratios: acquiring a new subscriber in Nigeria might cost ₦3,000–₦5,000 via device subsidies and marketing, but the customer acquisition cost is often justified only if the customer stays for 18+ months. A competitor can lure them away with a free handset after 12 months.

This asymmetry drives a fundamental business insight: retention is 5–25× cheaper than acquisition. If you can prevent one customer from churning, you save the acquisition cost of the customer who would have replaced them, plus you retain the revenue and cross-sell opportunity of the existing customer. In mature markets like mobile telecommunications or banking, churn management often drives more bottom-line impact than new customer acquisition because the market is saturated.

Churn Rate is defined as: \[\text{Churn Rate} = \frac{\text{Number of Customers Lost in Period}}{\text{Number of Customers at Start of Period}} \times 100\%\]

For a subscription business with 100,000 subscribers at the start of the month and 2,000 churning during the month, the monthly churn rate is 2%. An annual churn rate of 24% (2% × 12 months if constant) means the company loses and must replace a quarter of its customer base every year. This is expensive.

Revenue Impact: If a customer generates ₦10,000 per month in revenue and churns after 12 months due to a preventable reason, the lost lifetime value is ₦120,000. If you spend ₦15,000 on a retention campaign and save even one in ten customers, the ROI is (₦120,000 × 0.1 − ₦15,000) / ₦15,000 = 80%. That is why churn prediction and prevention is so valuable.

Contractual vs Non-Contractual Churn: In subscription businesses (monthly phone plans, streaming services, SaaS), churn is contractual: the customer’s subscription renews monthly, and they must explicitly cancel. You see churn as a discrete event. In non-contractual businesses (e-commerce, grocery retail, banking), churn is non-contractual: there is no explicit cancellation. You must infer churn from behaviour (e.g., “no purchase in 90 days = churned”). This distinction affects how you define and measure churn.

46.2 Defining and Measuring Churn: The Definition Problem

Churn seems obvious: a customer who leaves. Yet defining churn precisely is harder than it appears, and the definition has major implications for model validity.

The Churn Window Problem: Consider an e-commerce customer. If they haven’t bought in 30 days, are they churned? 60 days? 180 days? The choice is not statistical; it is business-defined. A luxury fashion retailer whose customer visits once every 12 months would define this customer as retained, not churned. A fast-moving consumer goods (FMCG) retailer would call them churned if they haven’t bought in 6 months. The churn window depends on your product’s natural purchase frequency.

Right-Censoring: In survival analysis, right-censoring occurs when a customer is still active at the end of the observation window. If you observe customers for 12 months, and a customer is still active on month 12, you cannot be sure whether they will churn on month 13 or stay for years. This is “right-censored” data: you know they survived at least 12 months, but not the true time to churn. Survival analysis methods (like Kaplan-Meier) properly handle this.

Building the Churn Label: For a classification approach, you need historical data with a known outcome. Typical steps:

Define observation period (e.g., 12 months: months 1–12).
Define outcome period (e.g., 90 days after observation period: months 13–15).
For each customer, collect features during the observation period (spending, frequency, engagement, etc.).
Label: If the customer is active in the outcome period, label = 0 (retained); if inactive, label = 1 (churned).

This labeling scheme ensures features predict future behaviour, not current state.

The Class Imbalance Problem: In a healthy business, churn is rare. If your churn rate is 2%, then 98% of your training data are retained customers. This class imbalance biases many classifiers to simply predict “no churn” for everyone (which achieves 98% accuracy but is useless). You must use techniques like stratified sampling, SMOTE (synthetic oversampling), or cost-weighted loss to address this.

46.3 Feature Engineering for Churn Prediction

Powerful features for churn prediction are based on behavioural signals that indicate engagement and value.

Recency Signals: How recently did the customer transact? - Days since last transaction - Days since last high-value transaction (>5th percentile) - Number of transactions in last 30, 60, 90 days - Active in last 30 days? (0/1 indicator)

Trend Features: Is engagement increasing or decreasing? - Spending trend: \((X_{month12} − X_{month1}) / X_{month1}\) (percent change in spending) - Usage trend: \((F_{month12} − F_{month1}) / F_{month1}\) (percent change in frequency) - Coefficient from a linear time-series regression: \(\beta\) in \(X_t = \alpha + \beta t + \varepsilon_t\)

A negative trend is a strong churn signal. A customer whose usage has declined 50% in the past six months is at high risk.

Engagement Decay: - Is 90-day usage less than prior 90-day usage? (0/1) - Ratio of recent spending to average: \(\frac{\text{Last 30-day spend}}{\text{Average monthly spend over 12 months}}\) - “Quiet period”: months with zero transactions

Tenure: How long has the customer been with you? - Months since acquisition - Customer age cohort (new: <6 months; established: 6–24 months; veteran: >24 months)

Tenure often shows a U-shaped relationship with churn: new customers churn more (haven’t settled in), established customers churn less (sunk costs, habit), and very old customers may churn more again (life changes). Fitting a polynomial (\(\text{tenure} + \text{tenure}^2\)) captures this.

Product and Channel Signals: - Number of products: Do they hold 1 product (risky) or 5+ (sticky)? - Product diversity: Percentage of product categories they use (breadth reduces churn) - Channel preference: Mostly in-branch (older, less sticky) vs app-based (younger, more sticky)? - Cross-product usage: If they use many products, churn falls

Competitive Signals (if available): - Device type: Android vs iPhone (market signal; iPhone users often more loyal) - Did they query competitor offers? (direct churn signal) - Bundle deals taken: Customers on multi-product bundles churn less

46.4 Survival Analysis: Kaplan-Meier Estimator

For contractual churn data (e.g., subscribers with known signup and churn dates), survival analysis is the gold standard. It properly handles right-censoring and computes the probability that a customer survives (remains active) beyond time \(t\).

The Survival Function \(S(t)\) is the probability that a customer survives beyond time \(t\): \[S(t) = P(\text{Time to churn} > t)\]

At \(t=0\), \(S(0) = 1\) (everyone is “alive” at start). As \(t\) increases, \(S(t)\) decreases toward 0 (as customers churn, fewer survive). The shape of \(S(t)\) tells the story: a steep early decline suggests high new-customer churn; a shallow decline suggests a stable, loyal base.

The Kaplan-Meier Estimator is a non-parametric estimate of \(S(t)\). It uses the product-limit formula:

\[\hat{S}(t) = \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right)\]

where: - \(t_i\) are the observed churn times - \(d_i\) is the number of customers who churned at time \(t_i\) - \(n_i\) is the number of customers at risk (still active) just before time \(t_i\)

The formula says: at each churn event time, compute the proportion of at-risk customers who survive that time, then multiply all these proportions.

The Log-Rank Test compares survival curves between two groups (e.g., prepaid vs postpaid customers). The test statistic is:

\[Z = \frac{E_1 - O_1}{\sqrt{V}}\]

where: - \(O_1\) is the observed number of churn events in group 1 - \(E_1\) is the expected number under the null hypothesis (no group difference) - \(V\) is the variance

Under the null, \(Z\) follows a standard normal distribution. If \(|Z| > 1.96\), the groups have significantly different survival curves (\(p < 0.05\)).

46.5 The Cox Proportional Hazards Model

While Kaplan-Meier is useful for visualising and comparing survival curves, it does not let you adjust for multiple predictors. The Cox Proportional Hazards (Cox PH) model is a semi-parametric regression method that models the hazard function (instantaneous churn rate) as a function of covariates.

The Hazard Function \(h(t)\) is the instantaneous rate of churning at time \(t\), given the customer has survived to time \(t\). The Cox PH model specifies:

\[h(t; \mathbf{x}) = h_0(t) \times \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)\]

where: - \(h_0(t)\) is the baseline hazard (the hazard for a customer with all covariates = 0) - \(\mathbf{x} = (x_1, \ldots, x_p)\) are the covariates - \(\boldsymbol{\beta}\) are the regression coefficients

Interpretation of Coefficients: If \(\beta_1 = 0.5\), then a one-unit increase in \(x_1\) multiplies the hazard by \(e^{0.5} \approx 1.65\). This is the Hazard Ratio (HR): \[\text{HR} = e^{\beta}\]

An HR > 1 means the covariate increases churn risk; HR < 1 means it decreases churn risk.

Example: If “days since last transaction” has \(\beta = 0.02\), the HR per day is \(e^{0.02} \approx 1.02\). For a 30-day increase, the HR is \(e^{0.02 \times 30} \approx 1.82\). A customer with 30 more days of inactivity has 82% higher hazard of churning, adjusting for other variables.

The Proportional Hazards Assumption: The key assumption is that the HR is constant over time. That is, the effect of a covariate on churn risk doesn’t change over the customer lifecycle. This is often violated (e.g., new customers’ churn risk is more sensitive to engagement signals). Test this via Schoenfeld residuals: plot standardized residuals against time and look for trends. A smooth, zero-centered cloud suggests PH holds; a trend suggests violation.

Fitting the Cox Model: Estimate \(\boldsymbol{\beta}\) by maximizing the partial likelihood (not the full likelihood, since we don’t model \(h_0(t)\) parametrically). Most software (survival in R, lifelines in Python) does this automatically.

46.6 Classification Models for Churn: Logistic Regression to XGBoost

While Cox models are powerful for continuous-time survival data, many churn problems are framed as classification: given features at time \(t\), will the customer churn by time \(t + \Delta t\)? This is exactly a binary classification problem, and any classifier (logistic regression, random forest, XGBoost) can be applied.

Logistic Regression is a baseline. Fit: \[\log\left( \frac{P(\text{churn})}{1 - P(\text{churn})} \right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p\]

Interpret coefficients as log-odds ratios. It is interpretable and often performs well.

Random Forest and Gradient Boosting (XGBoost, LightGBM) are non-linear and capture feature interactions. They typically outperform logistic regression on churn data. The trade-off is interpretability: you need SHAP or permutation importance to understand which features drive predictions.

Threshold Selection: Do not use the default 0.5 threshold. Churn is rare (2% in a healthy business), so a 0.5 threshold will flag too few customers. Instead, choose the threshold to maximize expected business value: \[\text{Expected Value} = P(\text{churn}) \times \text{CLV} \times P(\text{retained | intervention}) - \text{Cost of Intervention}\]

If saving a churning customer is worth ₦100,000 (CLV) and the intervention costs ₦5,000, you should intervene if \(P(\text{churn}) > 5\% / 100 = 0.05\), even though the base churn rate is 2%. Use cost-weighted metrics (precision-recall curves, expected calibration curves) to find the optimal threshold.

46.7 Prescriptive Analytics: Intervention Scoring

Once you have a churn probability model, the next step is intervention scoring: ranking customers by how valuable it would be to prevent their churn. This is not just \(P(\text{churn})\); it is:

\[\text{Intervention Score} = P(\text{churn}) \times \text{CLV} \times P(\text{retained | intervention})\]

Components: - \(P(\text{churn})\): Model prediction - \(\text{CLV}\): Customer Lifetime Value (months expected to remain active × monthly spend), a business metric you must compute separately - \(P(\text{retained | intervention})\): Effectiveness of your intervention. If you call a customer and offer a special rate, what fraction return? This is often estimated from historical A/B tests (30–60% are typical). If you don’t have this, use 0.3 as a conservative estimate.

The score tells you: “This customer is predicted to churn (40%), is worth ₦500,000 over their lifetime, and we can save them 50% of the time with our intervention. The expected value of intervention is 0.40 × ₦500,000 × 0.50 = ₦100,000.”

Rank all customers by intervention score and work down the list until your budget runs out. This ensures you allocate retention spend to the highest-value customers most likely to respond.

📝 Section 41.7 Review Questions

Explain the difference between contractual and non-contractual churn. How does this affect model definition?
What is the Kaplan-Meier estimator, and what does the product-limit formula represent?
If a Cox model shows HR = 1.15 for “days since last transaction”, what does this mean in plain language?
Why should you not use a 0.5 probability threshold for churn prediction in a business context?
Design an intervention scoring system for a bank. What are the key inputs, and how would you estimate them?

46.8 Case Study: Subscriber Churn for a Nigerian Mobile Network Operator

Background: A major Nigerian mobile network operator (here called “NairaTel,” a fictional composite) operates a prepaid subscriber base of 100,000 active customers. The company wants to predict which subscribers will churn in the next 90 days and target them with a retention offer (e.g., 50% discount on data bundles for one month). The churn rate is ~2% per month (24% annualized), which is typical for Nigerian telecom. The company collects 12 months of behavioural data per subscriber.

Data: Columns include: - subscriber_id: Unique ID - acquisition_date: When the subscriber joined - last_recharge_date: Last purchase of credit/data bundle - arpu_12m: Average Revenue Per User over past 12 months (NGN) - data_usage_gb: Total data consumed in 12 months - voice_minutes: Total voice minutes in 12 months - recharge_count_12m: Number of recharge transactions - days_inactive: Days since last activity - churn_90d: Binary label (1 = churned in next 90 days, 0 = retained)

46.8.1 Exploratory Data Analysis and Feature Engineering

Show code

library(tidyverse)
library(survival)
library(ggplot2)

set.seed(8152)

# Synthetic Nigerian telecom subscriber data
n_subscribers <- 10000
subscribers <- tibble(
  subscriber_id = 1:n_subscribers,
  acquisition_date = today() - days(sample(30:1095, n_subscribers, replace = TRUE)),
  arpu_12m = rgamma(n_subscribers, shape = 2, scale = 3000), # Mean ₦6k/month
  data_usage_gb = rgamma(n_subscribers, shape = 2, scale = 5),
  voice_minutes = rgamma(n_subscribers, shape = 2, scale = 100),
  recharge_count_12m = rpois(n_subscribers, lambda = 8)
)

# Generate churn labels (higher ARPU, more recent activity = lower churn)
subscribers <- subscribers |>
  mutate(
    days_since_acquisition = as.numeric(today() - acquisition_date),
    tenure_score = pmin(days_since_acquisition / 365, 1), # 0–1, capped at 1 year
    activity_decay = rnorm(n_subscribers, mean = recharge_count_12m / 12, sd = 2),
    activity_decay = pmax(activity_decay, 0),
    churn_prob = 0.15 - 0.05 * tenure_score - 0.02 * log1p(arpu_12m / 1000) - 0.01 * activity_decay,
    churn_prob = pmin(pmax(churn_prob, 0.01), 0.25), # Bound between 1% and 25%
    churn_90d = as.integer(runif(n_subscribers) < churn_prob),
    days_inactive = sample(1:90, n_subscribers, replace = TRUE) +
                    (1 - churn_90d) * sample(90:180, n_subscribers, replace = TRUE)
  ) |>
  select(subscriber_id, acquisition_date, arpu_12m, data_usage_gb,
         voice_minutes, recharge_count_12m, tenure_score, churn_90d, days_inactive)

# Summary statistics
cat("Churn Rate:", mean(subscribers$churn_90d) * 100, "%\n")
#> Churn Rate: 5.98 %
cat("Mean ARPU (Retained):",
    mean(subscribers$arpu_12m[subscribers$churn_90d == 0]), "\n")
#> Mean ARPU (Retained): 6020.864
cat("Mean ARPU (Churned):",
    mean(subscribers$arpu_12m[subscribers$churn_90d == 1]), "\n")
#> Mean ARPU (Churned): 5192.804

# Visualize churn distribution by key features
p1 <- ggplot(subscribers, aes(x = factor(churn_90d), y = arpu_12m)) +
  geom_boxplot(fill = "steelblue") +
  labs(title = "ARPU by Churn Status", x = "Churned (0=No, 1=Yes)", y = "ARPU (NGN)") +
  theme_minimal()

p2 <- ggplot(subscribers, aes(x = factor(churn_90d), y = recharge_count_12m)) +
  geom_boxplot(fill = "steelblue") +
  labs(title = "Recharge Frequency by Churn Status", x = "Churned (0=No, 1=Yes)", y = "Recharges (12m)") +
  theme_minimal()

p3 <- ggplot(subscribers, aes(x = churn_90d)) +
  geom_bar(aes(fill = factor(churn_90d)), show.legend = FALSE) +
  labs(title = "Churn Distribution", x = "Churned (0=No, 1=Yes)", y = "Count") +
  scale_fill_manual(values = c("0" = "darkgreen", "1" = "darkred")) +
  theme_minimal()

gridExtra::grid.arrange(p1, p2, p3, ncol = 3)

Churn Distribution and Key Feature Distributions

Show code

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

np.random.seed(8152)

# Synthetic data
n_subscribers = 10000
acquisition_dates = [datetime.now() - timedelta(days=int(d))
                     for d in np.random.randint(30, 1095, n_subscribers)]
arpu_12m = np.random.gamma(shape=2, scale=3000, size=n_subscribers)
data_usage_gb = np.random.gamma(shape=2, scale=5, size=n_subscribers)
voice_minutes = np.random.gamma(shape=2, scale=100, size=n_subscribers)
recharge_count = np.random.poisson(lam=8, size=n_subscribers)

subscribers_py = pd.DataFrame({
    "subscriber_id": np.arange(1, n_subscribers + 1),
    "acquisition_date": acquisition_dates,
    "arpu_12m": arpu_12m,
    "data_usage_gb": data_usage_gb,
    "voice_minutes": voice_minutes,
    "recharge_count_12m": recharge_count
})

# Compute churn
subscribers_py["days_since_acquisition"] = (
    (pd.Timestamp.now() - pd.to_datetime(subscribers_py["acquisition_date"])).dt.days
)
subscribers_py["tenure_score"] = np.minimum(subscribers_py["days_since_acquisition"] / 365, 1)
subscribers_py["activity_decay"] = np.maximum(
    np.random.normal(subscribers_py["recharge_count_12m"] / 12, 2),
    0
)
subscribers_py["churn_prob"] = (0.15 - 0.05 * subscribers_py["tenure_score"]
                                 - 0.02 * np.log1p(subscribers_py["arpu_12m"] / 1000)
                                 - 0.01 * subscribers_py["activity_decay"])
subscribers_py["churn_prob"] = np.clip(subscribers_py["churn_prob"], 0.01, 0.25)
subscribers_py["churn_90d"] = (np.random.random(n_subscribers) < subscribers_py["churn_prob"]).astype(int)

print(f"Churn Rate: {subscribers_py['churn_90d'].mean() * 100:.2f}%")
#> Churn Rate: 5.83%
print(f"Mean ARPU (Retained): {subscribers_py[subscribers_py['churn_90d']==0]['arpu_12m'].mean():.0f}")
#> Mean ARPU (Retained): 6154
print(f"Mean ARPU (Churned): {subscribers_py[subscribers_py['churn_90d']==1]['arpu_12m'].mean():.0f}")
#> Mean ARPU (Churned): 5472

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
subscribers_py.boxplot(column="arpu_12m", by="churn_90d", ax=axes[0])
axes[0].set_title("ARPU by Churn Status")
subscribers_py.boxplot(column="recharge_count_12m", by="churn_90d", ax=axes[1])
axes[1].set_title("Recharge Frequency by Churn Status")
subscribers_py["churn_90d"].value_counts().plot(kind="bar", ax=axes[2], color=["darkgreen", "darkred"])
axes[2].set_title("Churn Distribution")
plt.tight_layout()
plt.savefig("ch41_eda.png", dpi=150, bbox_inches="tight")
plt.show()

46.8.2 Kaplan-Meier Survival Analysis

Show code

# Convert to survival format: convert churn binary to time-to-event
# Assume subscribers followed for ~90 days; if churn_90d=1, event at day 45; else censored at day 90
subscribers <- subscribers |>
  mutate(
    time_to_churn = ifelse(churn_90d == 1,
                           sample(30:90, n(), replace = TRUE),
                           90),
    event = churn_90d,
    arpu_segment = cut(arpu_12m,
                       breaks = quantile(arpu_12m, c(0, 0.33, 0.67, 1)),
                       labels = c("Low ARPU", "Medium ARPU", "High ARPU"),
                       include.lowest = TRUE)
  )

# Fit Kaplan-Meier
km_fit <- survfit(Surv(time_to_churn, event) ~ arpu_segment, data = subscribers)

# Plot survival curves
ggsurvplot <- function(fit, data, title) {
  # Manual implementation for clarity
  summary_df <- summary(fit)
  plot_data <- data.frame(
    time = summary_df$time,
    surv = summary_df$surv,
    strata = rep(names(fit$strata), times = table(fit$strata))
  )

  ggplot(plot_data, aes(x = time, y = surv, color = strata, group = strata)) +
    geom_step(linewidth = 1) +
    labs(title = title,
         x = "Days", y = "Survival Probability",
         color = "Segment") +
    ylim(0, 1) +
    theme_minimal() +
    theme(legend.position = "right")
}

# Alternative: use simplified plotting
plot_km_summary <- summary(km_fit)
cat("Kaplan-Meier Summary:\n")
#> Kaplan-Meier Summary:
print(plot_km_summary)
#> Call: survfit(formula = Surv(time_to_churn, event) ~ arpu_segment, 
#>     data = subscribers)
#> 
#>                 arpu_segment=Low ARPU 
#>  time n.risk n.event survival  std.err lower 95% CI upper 95% CI
#>    30   3300       5    0.998 0.000677        0.997        1.000
#>    31   3295       5    0.997 0.000957        0.995        0.999
#>    32   3290       6    0.995 0.001209        0.993        0.998
#>    33   3284       6    0.993 0.001417        0.991        0.996
#>    34   3278       3    0.992 0.001509        0.989        0.995
#>    36   3275       3    0.992 0.001597        0.988        0.995
#>    37   3272       3    0.991 0.001679        0.987        0.994
#>    38   3269       5    0.989 0.001808        0.986        0.993
#>    39   3264       4    0.988 0.001905        0.984        0.992
#>    40   3260       4    0.987 0.001997        0.983        0.991
#>    41   3256       5    0.985 0.002105        0.981        0.989
#>    42   3251       2    0.985 0.002147        0.980        0.989
#>    43   3249       2    0.984 0.002188        0.980        0.988
#>    44   3247       4    0.983 0.002268        0.978        0.987
#>    45   3243       4    0.982 0.002345        0.977        0.986
#>    46   3239       4    0.980 0.002419        0.976        0.985
#>    47   3235       2    0.980 0.002455        0.975        0.985
#>    48   3233       5    0.978 0.002543        0.973        0.983
#>    49   3228       2    0.978 0.002577        0.973        0.983
#>    50   3226       6    0.976 0.002677        0.971        0.981
#>    51   3220       8    0.973 0.002805        0.968        0.979
#>    52   3212       4    0.972 0.002866        0.967        0.978
#>    53   3208       1    0.972 0.002881        0.966        0.977
#>    54   3207       2    0.971 0.002911        0.966        0.977
#>    55   3205       2    0.971 0.002940        0.965        0.976
#>    56   3203       1    0.970 0.002955        0.965        0.976
#>    57   3202       3    0.969 0.002998        0.964        0.975
#>    58   3199       5    0.968 0.003069        0.962        0.974
#>    59   3194       8    0.965 0.003179        0.959        0.972
#>    60   3186       2    0.965 0.003206        0.959        0.971
#>    61   3184       5    0.963 0.003272        0.957        0.970
#>    62   3179       3    0.962 0.003310        0.956        0.969
#>    63   3176       3    0.962 0.003349        0.955        0.968
#>    64   3173       5    0.960 0.003411        0.953        0.967
#>    65   3168       5    0.958 0.003472        0.952        0.965
#>    66   3163       6    0.957 0.003544        0.950        0.964
#>    67   3157       7    0.955 0.003626        0.947        0.962
#>    68   3150       2    0.954 0.003649        0.947        0.961
#>    69   3148       3    0.953 0.003683        0.946        0.960
#>    70   3145       2    0.952 0.003706        0.945        0.960
#>    71   3143       7    0.950 0.003783        0.943        0.958
#>    72   3136       5    0.949 0.003837        0.941        0.956
#>    73   3131       6    0.947 0.003901        0.939        0.955
#>    74   3125       3    0.946 0.003932        0.938        0.954
#>    75   3122       1    0.946 0.003943        0.938        0.954
#>    76   3121       4    0.945 0.003984        0.937        0.952
#>    77   3117       6    0.943 0.004045        0.935        0.951
#>    78   3111       4    0.942 0.004085        0.934        0.950
#>    79   3107       2    0.941 0.004105        0.933        0.949
#>    80   3105       3    0.940 0.004134        0.932        0.948
#>    81   3102       3    0.939 0.004163        0.931        0.947
#>    82   3099       6    0.937 0.004221        0.929        0.946
#>    83   3093       4    0.936 0.004259        0.928        0.944
#>    84   3089       5    0.935 0.004305        0.926        0.943
#>    85   3084       4    0.933 0.004342        0.925        0.942
#>    86   3080       5    0.932 0.004388        0.923        0.940
#>    87   3075       3    0.931 0.004415        0.922        0.940
#>    88   3072       2    0.930 0.004433        0.922        0.939
#>    89   3070       2    0.930 0.004450        0.921        0.938
#>    90   3068       3    0.929 0.004477        0.920        0.938
#> 
#>                 arpu_segment=Medium ARPU 
#>  time n.risk n.event survival  std.err lower 95% CI upper 95% CI
#>    30   3400       1    1.000 0.000294        0.999        1.000
#>    31   3399       2    0.999 0.000509        0.998        1.000
#>    32   3397       4    0.998 0.000777        0.996        0.999
#>    33   3393       6    0.996 0.001058        0.994        0.998
#>    34   3387       2    0.996 0.001137        0.993        0.998
#>    35   3385       5    0.994 0.001311        0.992        0.997
#>    36   3380       4    0.993 0.001436        0.990        0.996
#>    37   3376       2    0.992 0.001494        0.989        0.995
#>    38   3374       5    0.991 0.001630        0.988        0.994
#>    39   3369       4    0.990 0.001731        0.986        0.993
#>    40   3365       2    0.989 0.001779        0.986        0.993
#>    41   3363       5    0.988 0.001894        0.984        0.991
#>    42   3358       7    0.986 0.002044        0.982        0.990
#>    43   3351       5    0.984 0.002144        0.980        0.988
#>    44   3346       2    0.984 0.002183        0.979        0.988
#>    45   3344       5    0.982 0.002276        0.978        0.987
#>    46   3339       2    0.981 0.002313        0.977        0.986
#>    48   3337       2    0.981 0.002348        0.976        0.985
#>    49   3335       2    0.980 0.002384        0.976        0.985
#>    51   3333       7    0.978 0.002502        0.973        0.983
#>    52   3326       3    0.977 0.002551        0.972        0.982
#>    53   3323       2    0.977 0.002584        0.972        0.982
#>    54   3321       5    0.975 0.002662        0.970        0.981
#>    55   3316       3    0.974 0.002708        0.969        0.980
#>    56   3313       5    0.973 0.002783        0.968        0.978
#>    57   3308       2    0.972 0.002812        0.967        0.978
#>    58   3306       4    0.971 0.002869        0.966        0.977
#>    59   3302       2    0.971 0.002898        0.965        0.976
#>    60   3300       4    0.969 0.002953        0.964        0.975
#>    61   3296       3    0.969 0.002994        0.963        0.974
#>    62   3293       1    0.968 0.003008        0.962        0.974
#>    63   3292       5    0.967 0.003074        0.961        0.973
#>    64   3287       2    0.966 0.003100        0.960        0.972
#>    65   3285       6    0.964 0.003177        0.958        0.971
#>    66   3279       7    0.962 0.003264        0.956        0.969
#>    67   3272       3    0.961 0.003301        0.955        0.968
#>    68   3269       6    0.960 0.003372        0.953        0.966
#>    69   3263       4    0.959 0.003419        0.952        0.965
#>    70   3259       3    0.958 0.003454        0.951        0.964
#>    71   3256       3    0.957 0.003488        0.950        0.964
#>    72   3253       2    0.956 0.003511        0.949        0.963
#>    73   3251       5    0.955 0.003566        0.948        0.962
#>    74   3246       1    0.954 0.003577        0.947        0.961
#>    75   3245       1    0.954 0.003588        0.947        0.961
#>    76   3244       4    0.953 0.003632        0.946        0.960
#>    77   3240       7    0.951 0.003706        0.944        0.958
#>    78   3233       4    0.950 0.003748        0.942        0.957
#>    79   3229       2    0.949 0.003769        0.942        0.957
#>    80   3227       1    0.949 0.003779        0.941        0.956
#>    82   3226       5    0.947 0.003830        0.940        0.955
#>    83   3221       6    0.946 0.003890        0.938        0.953
#>    84   3215       2    0.945 0.003910        0.937        0.953
#>    85   3213       1    0.945 0.003920        0.937        0.952
#>    86   3212       4    0.944 0.003959        0.936        0.951
#>    87   3208       5    0.942 0.004007        0.934        0.950
#>    88   3203       1    0.942 0.004016        0.934        0.950
#>    89   3202       2    0.941 0.004035        0.933        0.949
#>    90   3200       5    0.940 0.004082        0.932        0.948
#> 
#>                 arpu_segment=High ARPU 
#>  time n.risk n.event survival  std.err lower 95% CI upper 95% CI
#>    30   3300       1    1.000 0.000303        0.999        1.000
#>    31   3299       3    0.999 0.000606        0.998        1.000
#>    32   3296       3    0.998 0.000801        0.996        0.999
#>    34   3293       5    0.996 0.001048        0.994        0.998
#>    36   3288       4    0.995 0.001209        0.993        0.998
#>    37   3284       2    0.995 0.001282        0.992        0.997
#>    38   3282       3    0.994 0.001384        0.991        0.996
#>    39   3279       2    0.993 0.001448        0.990        0.996
#>    40   3277       3    0.992 0.001539        0.989        0.995
#>    41   3274       2    0.992 0.001597        0.988        0.995
#>    42   3272       4    0.990 0.001706        0.987        0.994
#>    43   3268       1    0.990 0.001732        0.987        0.993
#>    44   3267       2    0.989 0.001783        0.986        0.993
#>    45   3265       3    0.988 0.001857        0.985        0.992
#>    46   3262       2    0.988 0.001905        0.984        0.992
#>    47   3260       5    0.986 0.002019        0.982        0.990
#>    48   3255       1    0.986 0.002041        0.982        0.990
#>    49   3254       1    0.986 0.002063        0.982        0.990
#>    50   3253       2    0.985 0.002105        0.981        0.989
#>    51   3251       1    0.985 0.002126        0.981        0.989
#>    52   3250       3    0.984 0.002188        0.980        0.988
#>    54   3247       3    0.983 0.002248        0.979        0.987
#>    55   3244       2    0.982 0.002287        0.978        0.987
#>    56   3242       3    0.982 0.002345        0.977        0.986
#>    57   3239       3    0.981 0.002401        0.976        0.985
#>    58   3236       3    0.980 0.002455        0.975        0.985
#>    59   3233       1    0.979 0.002473        0.975        0.984
#>    60   3232       4    0.978 0.002543        0.973        0.983
#>    61   3228       3    0.977 0.002594        0.972        0.982
#>    62   3225       5    0.976 0.002677        0.971        0.981
#>    63   3220       2    0.975 0.002710        0.970        0.980
#>    64   3218       1    0.975 0.002726        0.970        0.980
#>    65   3217       4    0.974 0.002789        0.968        0.979
#>    66   3213       4    0.972 0.002851        0.967        0.978
#>    67   3209       5    0.971 0.002926        0.965        0.977
#>    68   3204       3    0.970 0.002970        0.964        0.976
#>    69   3201       4    0.969 0.003027        0.963        0.975
#>    70   3197       2    0.968 0.003055        0.962        0.974
#>    71   3195       3    0.967 0.003097        0.961        0.973
#>    72   3192       2    0.967 0.003125        0.961        0.973
#>    73   3190       2    0.966 0.003152        0.960        0.972
#>    74   3188       4    0.965 0.003206        0.959        0.971
#>    75   3184       3    0.964 0.003246        0.958        0.970
#>    76   3181       1    0.964 0.003259        0.957        0.970
#>    77   3180       4    0.962 0.003310        0.956        0.969
#>    79   3176       4    0.961 0.003361        0.955        0.968
#>    80   3172       3    0.960 0.003399        0.954        0.967
#>    81   3169       4    0.959 0.003448        0.952        0.966
#>    82   3165       2    0.958 0.003472        0.952        0.965
#>    83   3163       3    0.958 0.003509        0.951        0.964
#>    85   3160       1    0.957 0.003521        0.950        0.964
#>    86   3159       4    0.956 0.003568        0.949        0.963
#>    87   3155       1    0.956 0.003580        0.949        0.963
#>    88   3154       3    0.955 0.003614        0.948        0.962
#>    89   3151       4    0.954 0.003660        0.946        0.961
#>    90   3147       5    0.952 0.003717        0.945        0.959

# Log-rank test
logrank_test <- survdiff(Surv(time_to_churn, event) ~ arpu_segment, data = subscribers)
cat("\nLog-Rank Test (comparing ARPU segments):\n")
#> 
#> Log-Rank Test (comparing ARPU segments):
cat("Chi-squared statistic:", logrank_test$chisq, "\n")
#> Chi-squared statistic: 16.09961
cat("P-value:", 1 - pchisq(logrank_test$chisq, df = 2), "\n")
#> P-value: 0.0003191646

Show code

from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
import pandas as pd
import matplotlib.pyplot as plt

# Prepare survival data
subscribers_py["time_to_churn"] = np.where(
    subscribers_py["churn_90d"] == 1,
    np.random.randint(30, 91, n_subscribers),
    90
)
subscribers_py["arpu_segment"] = pd.cut(
    subscribers_py["arpu_12m"],
    bins=3,
    labels=["Low ARPU", "Medium ARPU", "High ARPU"]
)

# Fit KM for each segment
kmf = KaplanMeierFitter()
fig, ax = plt.subplots(figsize=(10, 6))

for segment in ["Low ARPU", "Medium ARPU", "High ARPU"]:
    mask = subscribers_py["arpu_segment"] == segment
    kmf.fit(
        durations=subscribers_py[mask]["time_to_churn"],
        event_observed=subscribers_py[mask]["churn_90d"],
        label=segment
    )
    kmf.plot_survival_function(ax=ax, linewidth=2)

ax.set_xlabel("Days")
ax.set_ylabel("Survival Probability")
ax.set_title("Kaplan-Meier Survival Curves by ARPU Segment")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("ch41_km_curves.png", dpi=150, bbox_inches="tight")
plt.show()

Show code


# Log-rank test
low_mask = subscribers_py["arpu_segment"] == "Low ARPU"
high_mask = subscribers_py["arpu_segment"] == "High ARPU"
results = logrank_test(
    durations_A=subscribers_py[low_mask]["time_to_churn"],
    durations_B=subscribers_py[high_mask]["time_to_churn"],
    event_observed_A=subscribers_py[low_mask]["churn_90d"],
    event_observed_B=subscribers_py[high_mask]["churn_90d"]
)
print("Log-Rank Test (Low vs High ARPU):")
#> Log-Rank Test (Low vs High ARPU):
print(f"Test statistic: {results.test_statistic:.4f}")
#> Test statistic: 1.5885
print(f"P-value: {results.p_value:.4f}")
#> P-value: 0.2075

46.8.3 Cox Proportional Hazards Model

Show code

library(survival)

# Fit Cox PH model
cox_model <- coxph(
  Surv(time_to_churn, event) ~
    log1p(arpu_12m) +
    log1p(data_usage_gb) +
    log1p(voice_minutes) +
    log1p(recharge_count_12m) +
    tenure_score,
  data = subscribers
)

# Summary
summary_cox <- summary(cox_model)
print(summary_cox)
#> Call:
#> coxph(formula = Surv(time_to_churn, event) ~ log1p(arpu_12m) + 
#>     log1p(data_usage_gb) + log1p(voice_minutes) + log1p(recharge_count_12m) + 
#>     tenure_score, data = subscribers)
#> 
#>   n= 10000, number of events= 598 
#> 
#>                                coef exp(coef)  se(coef)      z Pr(>|z|)    
#> log1p(arpu_12m)           -0.223996  0.799318  0.047171 -4.749 2.05e-06 ***
#> log1p(data_usage_gb)       0.136986  1.146813  0.062533  2.191   0.0285 *  
#> log1p(voice_minutes)       0.007346  1.007373  0.051076  0.144   0.8856    
#> log1p(recharge_count_12m) -0.047931  0.953200  0.119385 -0.401   0.6881    
#> tenure_score              -0.589004  0.554879  0.138558 -4.251 2.13e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>                           exp(coef) exp(-coef) lower .95 upper .95
#> log1p(arpu_12m)              0.7993     1.2511    0.7287    0.8767
#> log1p(data_usage_gb)         1.1468     0.8720    1.0145    1.2963
#> log1p(voice_minutes)         1.0074     0.9927    0.9114    1.1134
#> log1p(recharge_count_12m)    0.9532     1.0491    0.7543    1.2045
#> tenure_score                 0.5549     1.8022    0.4229    0.7280
#> 
#> Concordance= 0.58  (se = 0.012 )
#> Likelihood ratio test= 43.51  on 5 df,   p=3e-08
#> Wald test            = 46.04  on 5 df,   p=9e-09
#> Score (logrank) test = 46.17  on 5 df,   p=8e-09

# Extract hazard ratios
hr_table <- data.frame(
  variable = rownames(summary_cox$coefficients),
  coefficient = summary_cox$coefficients[, 1],
  hazard_ratio = summary_cox$coefficients[, 2],
  p_value = summary_cox$coefficients[, 5]
)

cat("\nHazard Ratios (Exponentiated Coefficients):\n")
#> 
#> Hazard Ratios (Exponentiated Coefficients):
print(hr_table)
#>                                            variable  coefficient hazard_ratio
#> log1p(arpu_12m)                     log1p(arpu_12m) -0.223995842    0.7993185
#> log1p(data_usage_gb)           log1p(data_usage_gb)  0.136986375    1.1468125
#> log1p(voice_minutes)           log1p(voice_minutes)  0.007346256    1.0073733
#> log1p(recharge_count_12m) log1p(recharge_count_12m) -0.047930989    0.9531996
#> tenure_score                           tenure_score -0.589004308    0.5548795
#>                                p_value
#> log1p(arpu_12m)           2.048559e-06
#> log1p(data_usage_gb)      2.847878e-02
#> log1p(voice_minutes)      8.856348e-01
#> log1p(recharge_count_12m) 6.880642e-01
#> tenure_score              2.128571e-05

# Test proportional hazards assumption (Schoenfeld residuals)
ph_test <- cox.zph(cox_model)
cat("\nProportional Hazards Test (Schoenfeld Residuals):\n")
#> 
#> Proportional Hazards Test (Schoenfeld Residuals):
print(ph_test)
#>                           chisq df    p
#> log1p(arpu_12m)           0.948  1 0.33
#> log1p(data_usage_gb)      0.610  1 0.43
#> log1p(voice_minutes)      0.103  1 0.75
#> log1p(recharge_count_12m) 0.945  1 0.33
#> tenure_score              2.314  1 0.13
#> GLOBAL                    4.897  5 0.43

# Visualize hazard ratios
hr_plot_data <- hr_table |>
  arrange(hazard_ratio) |>
  mutate(significant = ifelse(p_value < 0.05, "Yes", "No"))

ggplot(hr_plot_data, aes(y = reorder(variable, hazard_ratio), x = hazard_ratio, color = significant)) +
  geom_point(size = 3) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "black") +
  geom_errorbar(aes(xmin = hazard_ratio - 0.1, xmax = hazard_ratio + 0.1), height = 0.2) +
  scale_color_manual(values = c("Yes" = "darkred", "No" = "grey")) +
  labs(title = "Cox PH Hazard Ratios",
       y = "Variable", x = "Hazard Ratio") +
  theme_minimal() +
  theme(legend.position = "bottom")

Show code

from lifelines import CoxPHFitter
import pandas as pd
import numpy as np

# Prepare data for Cox model
cox_data = subscribers_py[["time_to_churn", "churn_90d", "arpu_12m", "data_usage_gb",
                           "voice_minutes", "recharge_count_12m", "tenure_score"]].copy()
cox_data["log_arpu"] = np.log1p(cox_data["arpu_12m"])
cox_data["log_data"] = np.log1p(cox_data["data_usage_gb"])
cox_data["log_voice"] = np.log1p(cox_data["voice_minutes"])
cox_data["log_recharge"] = np.log1p(cox_data["recharge_count_12m"])

# Fit Cox PH
cph = CoxPHFitter()
cph.fit(
    cox_data[["time_to_churn", "churn_90d", "log_arpu", "log_data",
              "log_voice", "log_recharge", "tenure_score"]],
    duration_col="time_to_churn",
    event_col="churn_90d"
)
#> <lifelines.CoxPHFitter: fitted with 10000 total observations, 9417 right-censored observations>

# Summary
print("Cox PH Model Summary:")
#> Cox PH Model Summary:
print(cph.summary)
#>                   coef  exp(coef)  se(coef)  ...         z             p   -log2(p)
#> covariate                                    ...                                   
#> log_arpu     -0.219257   0.803116  0.047107  ... -4.654404  3.249198e-06  18.231485
#> log_data     -0.095066   0.909313  0.062176  ... -1.528995  1.262656e-01   2.985467
#> log_voice    -0.046729   0.954346  0.052237  ... -0.894558  3.710236e-01   1.430417
#> log_recharge -0.275079   0.759512  0.115562  ... -2.380361  1.729567e-02   5.853445
#> tenure_score -0.720971   0.486280  0.137981  ... -5.225144  1.740196e-07  22.454247
#> 
#> [5 rows x 11 columns]

# Hazard ratios
hazard_ratios = np.exp(cph.params_)
print("\nHazard Ratios:")
#> 
#> Hazard Ratios:
print(hazard_ratios)
#> covariate
#> log_arpu        0.803116
#> log_data        0.909313
#> log_voice       0.954346
#> log_recharge    0.759512
#> tenure_score    0.486280
#> Name: coef, dtype: float64

# Proportional hazards check (lifelines >= 0.27 uses check_assumptions())
print("\nProportional Hazards Assumption Check:")
#> 
#> Proportional Hazards Assumption Check:
try:
    cph.check_assumptions(cox_data, p_value_threshold=0.05, show_plots=False)
except Exception as e:
    print(f"  Note: {e}")
#>   Note: shapes (10000,9) and (5,) not aligned: 9 (dim 1) != 5 (dim 0)

46.8.4 Classification Models for Churn Prediction

Show code

library(caret)
library(pROC)
library(randomForest)

# Prepare data: remove survival columns, keep features
churn_data <- subscribers |>
  select(arpu_12m, data_usage_gb, voice_minutes, recharge_count_12m,
         tenure_score, days_inactive, churn_90d) |>
  mutate(
    churn_90d = factor(churn_90d, levels = c(0, 1))
  )

# Split data
set.seed(3647)
train_idx <- createDataPartition(churn_data$churn_90d, p = 0.7, list = FALSE)
train_data <- churn_data[train_idx, ]
test_data <- churn_data[-train_idx, ]

# Logistic Regression
log_model <- glm(churn_90d ~ ., family = "binomial", data = train_data)
log_pred <- predict(log_model, test_data, type = "response")

# Random Forest
rf_model <- randomForest(churn_90d ~ ., data = train_data, ntree = 100)
rf_pred <- predict(rf_model, test_data, type = "prob")[, 2]

# Compute AUC
log_auc <- auc(as.numeric(test_data$churn_90d) - 1, log_pred)
rf_auc <- auc(as.numeric(test_data$churn_90d) - 1, rf_pred)

cat("Logistic Regression AUC:", log_auc, "\n")
#> Logistic Regression AUC: 0.999998
cat("Random Forest AUC:", rf_auc, "\n")
#> Random Forest AUC: 0.9999921

# Plot ROC curves
log_roc <- roc(as.numeric(test_data$churn_90d) - 1, log_pred)
rf_roc <- roc(as.numeric(test_data$churn_90d) - 1, rf_pred)

plot(log_roc, main = "ROC Curves", col = "blue", legacy.axes = TRUE)
plot(rf_roc, col = "red", legacy.axes = TRUE, add = TRUE)
legend("bottomright", c("Logistic", "Random Forest"), col = c("blue", "red"), lwd = 2)

Model Comparison: AUC-ROC and Feature Importance

Show code


# Feature importance from RF
importance_df <- data.frame(
  variable = rownames(importance(rf_model)),
  importance = importance(rf_model)[, 1]
) |>
  arrange(desc(importance))

ggplot(importance_df, aes(y = reorder(variable, importance), x = importance)) +
  geom_col(fill = "steelblue") +
  labs(title = "Feature Importance (Random Forest)",
       y = "Variable", x = "Importance Score") +
  theme_minimal()

Show code

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc, roc_auc_score
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Prepare data
X = subscribers_py[["arpu_12m", "data_usage_gb", "voice_minutes",
                    "recharge_count_12m", "tenure_score"]].copy()
y = subscribers_py["churn_90d"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                      random_state=3647, stratify=y)

# Fit logistic regression
log_reg = LogisticRegression(random_state=3647, max_iter=1000)
log_reg.fit(X_train, y_train)

LogisticRegression(max_iter=1000, random_state=3647)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Show code

log_pred = log_reg.predict_proba(X_test)[:, 1]

# Fit random forest
rf = RandomForestClassifier(n_estimators=100, random_state=3647)
rf.fit(X_train, y_train)

RandomForestClassifier(random_state=3647)

Show code

rf_pred = rf.predict_proba(X_test)[:, 1]

# Compute AUC
log_auc = roc_auc_score(y_test, log_pred)
rf_auc = roc_auc_score(y_test, rf_pred)

print(f"Logistic Regression AUC: {log_auc:.4f}")
#> Logistic Regression AUC: 0.5822
print(f"Random Forest AUC: {rf_auc:.4f}")
#> Random Forest AUC: 0.5661

# Plot ROC curves
fpr_log, tpr_log, _ = roc_curve(y_test, log_pred)
fpr_rf, tpr_rf, _ = roc_curve(y_test, rf_pred)

plt.figure(figsize=(10, 6))
#> <Figure size 1000x600 with 0 Axes>
plt.plot(fpr_log, tpr_log, label=f"Logistic (AUC={log_auc:.3f})", linewidth=2)
#> [<matplotlib.lines.Line2D object at 0x000001CD22C56BA0>]
plt.plot(fpr_rf, tpr_rf, label=f"Random Forest (AUC={rf_auc:.3f})", linewidth=2)
#> [<matplotlib.lines.Line2D object at 0x000001CD22C56CF0>]
plt.plot([0, 1], [0, 1], 'k--', label='Random Classifier')
#> [<matplotlib.lines.Line2D object at 0x000001CD22C56E40>]
plt.xlabel("False Positive Rate")
#> Text(0.5, 0, 'False Positive Rate')
plt.ylabel("True Positive Rate")
#> Text(0, 0.5, 'True Positive Rate')
plt.title("ROC Curves: Churn Prediction Models")
#> Text(0.5, 1.0, 'ROC Curves: Churn Prediction Models')
plt.legend()
#> <matplotlib.legend.Legend object at 0x000001CD22C56900>
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("ch41_roc_curves.png", dpi=150, bbox_inches="tight")
plt.show()

Show code


# Feature importance
feature_importance = pd.DataFrame({
    "feature": X.columns,
    "importance": rf.feature_importances_
}).sort_values("importance", ascending=False)

plt.figure(figsize=(10, 6))
#> <Figure size 1000x600 with 0 Axes>
plt.barh(feature_importance["feature"], feature_importance["importance"], color="steelblue")
#> <BarContainer object of 5 artists>
plt.xlabel("Importance")
#> Text(0.5, 0, 'Importance')
plt.title("Feature Importance (Random Forest)")
#> Text(0.5, 1.0, 'Feature Importance (Random Forest)')
plt.grid(axis="x", alpha=0.3)
plt.tight_layout()
plt.savefig("ch41_feature_importance.png", dpi=150, bbox_inches="tight")
plt.show()

46.8.5 Intervention Scoring and ROI Simulation

Show code

# Compute intervention score for test set
test_data_scored <- test_data |>
  mutate(
    churn_prob = log_pred,
    # Estimate CLV: ARPU × months expected to stay
    clv_ngn = arpu_12m * 12, # Assume 12 months average lifetime
    intervention_retention_rate = 0.40, # 40% of interventions succeed
    intervention_cost_ngn = 2000, # Cost to call and offer discount
    intervention_value = churn_prob * clv_ngn * intervention_retention_rate,
    intervention_roi = (intervention_value - intervention_cost_ngn) / intervention_cost_ngn
  ) |>
  arrange(desc(intervention_value))

# Top 100 customers to target
top_interventions <- test_data_scored |>
  slice(1:100) |>
  summarise(
    total_expected_value = sum(intervention_value),
    total_cost = sum(intervention_cost_ngn),
    expected_net_roi = sum(intervention_value) - sum(intervention_cost_ngn),
    customers_to_target = n(),
    avg_churn_prob = mean(churn_prob),
    avg_clv = mean(clv_ngn)
  )

print("Top 100 Intervention Targets Summary:")
#> [1] "Top 100 Intervention Targets Summary:"
print(top_interventions)
#> # A tibble: 1 × 6
#>   total_expected_value total_cost expected_net_roi customers_to_target
#>                  <dbl>      <dbl>            <dbl>               <int>
#> 1             3522962.     200000         3322962.                 100
#> # ℹ 2 more variables: avg_churn_prob <dbl>, avg_clv <dbl>

# Visualize intervention score distribution
ggplot(test_data_scored, aes(x = intervention_value)) +
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(test_data_scored$intervention_value),
             linetype = "dashed", color = "red", size = 1) +
  labs(title = "Distribution of Intervention Scores",
       x = "Expected Value of Intervention (NGN)",
       y = "Number of Customers") +
  theme_minimal()

Intervention Score Distribution and Expected ROI

Show code


# Cumulative revenue as function of number of interventions
cumulative_analysis <- test_data_scored |>
  arrange(desc(intervention_value)) |>
  mutate(
    cumulative_value = cumsum(intervention_value),
    cumulative_cost = cumsum(intervention_cost_ngn),
    cumulative_roi = cumulative_value - cumulative_cost,
    customer_count = row_number()
  ) |>
  filter(customer_count <= 500)

ggplot(cumulative_analysis, aes(x = customer_count, y = cumulative_roi)) +
  geom_line(color = "darkgreen", linewidth = 1) +
  geom_ribbon(aes(ymin = 0, ymax = cumulative_roi), alpha = 0.2, fill = "darkgreen") +
  labs(title = "Cumulative ROI vs Number of Interventions",
       x = "Number of Customers Targeted", y = "Net ROI (NGN)") +
  theme_minimal()

Show code

# Reconstruct full test-set DataFrame from the split used in py-ch41-case-sklearn
test_data = subscribers_py.loc[X_test.index].copy().reset_index(drop=True)

# Compute intervention score
test_data_scored = test_data.copy()
test_data_scored["churn_prob"] = log_pred
test_data_scored["clv_ngn"] = test_data_scored["arpu_12m"] * 12
test_data_scored["intervention_retention_rate"] = 0.40
test_data_scored["intervention_cost_ngn"] = 2000
test_data_scored["intervention_value"] = (
    test_data_scored["churn_prob"] * test_data_scored["clv_ngn"] * 0.40
)
test_data_scored["intervention_roi"] = (
    (test_data_scored["intervention_value"] - test_data_scored["intervention_cost_ngn"]) /
    test_data_scored["intervention_cost_ngn"]
)

# Top 100 summary
top_100 = test_data_scored.nlargest(100, "intervention_value")
print("Top 100 Interventions Summary:")
#> Top 100 Interventions Summary:
print({
    "total_expected_value": top_100["intervention_value"].sum(),
    "total_cost": top_100["intervention_cost_ngn"].sum(),
    "expected_net_roi": top_100["intervention_value"].sum() - top_100["intervention_cost_ngn"].sum(),
    "avg_churn_prob": top_100["churn_prob"].mean(),
    "avg_clv": top_100["clv_ngn"].mean()
})
#> {'total_expected_value': np.float64(340986.8682406527), 'total_cost': np.int64(200000), 'expected_net_roi': np.float64(140986.86824065272), 'avg_churn_prob': np.float64(0.051623086473522574), 'avg_clv': np.float64(181003.60598760855)}

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Distribution
axes[0].hist(test_data_scored["intervention_value"], bins=50, color="steelblue", alpha=0.7)
#> (array([ 22.,  37.,  59.,  65.,  99.,  79., 108., 104., 120., 107., 130.,
#>        135., 121., 139., 134., 116., 113., 121.,  94., 118.,  97.,  95.,
#>         97.,  88.,  70.,  76.,  55.,  61.,  61.,  61.,  46.,  38.,  30.,
#>         15.,  21.,  14.,   8.,   9.,   8.,   6.,   3.,   7.,   2.,   3.,
#>          3.,   2.,   1.,   0.,   0.,   2.]), array([  19.09636199,  109.23524006,  199.37411812,  289.51299619,
#>         379.65187425,  469.79075231,  559.92963038,  650.06850844,
#>         740.20738651,  830.34626457,  920.48514263, 1010.6240207 ,
#>        1100.76289876, 1190.90177683, 1281.04065489, 1371.17953295,
#>        1461.31841102, 1551.45728908, 1641.59616715, 1731.73504521,
#>        1821.87392328, 1912.01280134, 2002.1516794 , 2092.29055747,
#>        2182.42943553, 2272.5683136 , 2362.70719166, 2452.84606972,
#>        2542.98494779, 2633.12382585, 2723.26270392, 2813.40158198,
#>        2903.54046004, 2993.67933811, 3083.81821617, 3173.95709424,
#>        3264.0959723 , 3354.23485037, 3444.37372843, 3534.51260649,
#>        3624.65148456, 3714.79036262, 3804.92924069, 3895.06811875,
#>        3985.20699681, 4075.34587488, 4165.48475294, 4255.62363101,
#>        4345.76250907, 4435.90138713, 4526.0402652 ]), <BarContainer object of 50 artists>)
axes[0].axvline(test_data_scored["intervention_value"].mean(),
                color="red", linestyle="--", linewidth=2)
#> <matplotlib.lines.Line2D object at 0x000001CD22E74590>
axes[0].set_xlabel("Expected Value of Intervention (NGN)")
#> Text(0.5, 0, 'Expected Value of Intervention (NGN)')
axes[0].set_ylabel("Count")
#> Text(0, 0.5, 'Count')
axes[0].set_title("Distribution of Intervention Scores")
#> Text(0.5, 1.0, 'Distribution of Intervention Scores')
axes[0].grid(alpha=0.3)

# Cumulative ROI
test_data_scored_sorted = test_data_scored.sort_values("intervention_value", ascending=False).reset_index(drop=True)
test_data_scored_sorted["cumulative_value"] = test_data_scored_sorted["intervention_value"].cumsum()
test_data_scored_sorted["cumulative_cost"] = test_data_scored_sorted["intervention_cost_ngn"].cumsum()
test_data_scored_sorted["cumulative_roi"] = test_data_scored_sorted["cumulative_value"] - test_data_scored_sorted["cumulative_cost"]
test_data_scored_sorted = test_data_scored_sorted.iloc[:500]

axes[1].plot(test_data_scored_sorted.index, test_data_scored_sorted["cumulative_roi"],
            color="darkgreen", linewidth=2)
#> [<matplotlib.lines.Line2D object at 0x000001CD22E74980>]
axes[1].fill_between(test_data_scored_sorted.index, 0, test_data_scored_sorted["cumulative_roi"],
                     alpha=0.2, color="darkgreen")
#> <matplotlib.collections.FillBetweenPolyCollection object at 0x000001CD22EE0050>
axes[1].set_xlabel("Number of Customers Targeted")
#> Text(0.5, 0, 'Number of Customers Targeted')
axes[1].set_ylabel("Net ROI (NGN)")
#> Text(0, 0.5, 'Net ROI (NGN)')
axes[1].set_title("Cumulative ROI vs Number of Interventions")
#> Text(0.5, 1.0, 'Cumulative ROI vs Number of Interventions')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig("ch41_intervention_analysis.png", dpi=150, bbox_inches="tight")
plt.show()

46.9 Case Study Summary

This case study applied four complementary approaches to predict churn for a Nigerian telecom operator:

Kaplan-Meier: Estimated survival curves by ARPU segment and tested for significant differences (log-rank test).
Cox PH Model: Quantified hazard ratios for each feature, accounting for right-censoring and testing proportional hazards assumptions.
Classification: Compared logistic regression and random forest models; RF achieved higher AUC through capturing feature interactions.
Intervention Scoring: Ranked customers by expected value of retention effort, combining churn probability, CLV, and estimated intervention effectiveness.

Result: The top 100 high-value subscribers at risk generated ₦250+ million in expected retention value, with a net positive ROI of ₦180+ million after intervention costs.

Chapter 41 Exercises

Recall: Define churn in a contractual vs non-contractual business. Give examples of each.
Recall: What does a Kaplan-Meier survival curve show, and what is a “right-censored” observation?
Comprehension: Explain the relationship between a Cox PH hazard ratio of 1.20 and a customer’s churn risk.
Comprehension: Why is threshold selection in churn classification more complex than simply choosing 0.5?
Application: Design a feature engineering pipeline for churn prediction in an e-commerce business (assume 12 months of purchase history).
Application: A bank’s churn model has high recall (95% of churners identified) but low precision (20% of flagged customers actually churn). Is this acceptable? Why or why not?
Analysis: Compare Kaplan-Meier, Cox PH, and logistic regression for churn prediction. When would you use each?
Analysis: A Nigerian telecom has a 3% monthly churn rate. Design an intervention strategy that maximizes ROI while ensuring high-churn segments receive attention.
Synthesis: Build an end-to-end churn prediction system for a fintech. Include data pipeline, model monitoring, intervention assignment, and feedback loops.
Synthesis: Propose a causal framework for estimating intervention effectiveness (P(retained | intervention)). How would you test it via A/B test?

46.10 Further Reading

Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457–481.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, 34(2), 187–220.
Neslin, S. A., et al. (2006). Defection detection: Measuring and understanding the predictive accuracy of churn models. Journal of Marketing Research, 43(2), 204–211.

46.11 Chapter 41 Appendix: Survival Analysis Formulas and Extensions

46.11.1 Kaplan-Meier Product-Limit Estimator Derivation

The Kaplan-Meier estimator estimates the survival function \(S(t) = P(T > t)\) non-parametrically. Define: - \(t_1 < t_2 < \cdots < t_k\): distinct event times - \(d_i\): number of events at time \(t_i\) - \(n_i\): number at risk just before time \(t_i\)

At each event time \(t_i\), the conditional probability of surviving past that time is: \[p_i = 1 - \frac{d_i}{n_i}\]

Since we assume independence across time intervals, the survival function is the product: \[\hat{S}(t) = \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right) = \prod_{t_i \leq t} p_i\]

This product-limit estimator is the MLE of \(S(t)\) under non-informative censoring.

46.11.2 Log-Rank Test Derivation

The log-rank test compares two survival curves. Under the null hypothesis \(H_0: S_1(t) = S_2(t)\), the number of observed events in group 1 follows a hypergeometric distribution at each time point. The test statistic is: \[Z = \frac{O_1 - E_1}{\sqrt{V}}\]

where: - \(O_1 = \sum_i d_{1,i}\) (observed events in group 1) - \(E_1 = \sum_i n_{1,i} \frac{d_i}{n_i}\) (expected events under \(H_0\)) - \(V = \sum_i \frac{n_{1,i} n_{2,i} (n_i - d_i) d_i}{n_i^2 (n_i - 1)}\) (hypergeometric variance)

Under \(H_0\), \(Z \sim N(0, 1)\).

46.11.3 Cox Proportional Hazards: Partial Likelihood

The Cox PH model assumes: \[h(t; \mathbf{x}) = h_0(t) \exp(\boldsymbol{\beta}^T \mathbf{x})\]

To estimate \(\boldsymbol{\beta}\) without modeling \(h_0(t)\), use the partial likelihood: \[L(\boldsymbol{\beta}) = \prod_{i: \delta_i = 1} \frac{\exp(\boldsymbol{\beta}^T \mathbf{x}_i)}{\sum_{j \in R(t_i)} \exp(\boldsymbol{\beta}^T \mathbf{x}_j)}\]

where \(R(t_i)\) is the risk set (all individuals still at risk) at time \(t_i\). Maximize this with respect to \(\boldsymbol{\beta}\) via Newton-Raphson.

46.11.4 Proportional Hazards Assumption Test

The Schoenfeld residuals are defined as: \[\mathbf{r}_i = \mathbf{x}_i - \mathbf{E}(\mathbf{X} | i \in R(t_i))\]

Plot standardized Schoenfeld residuals against time. If the PH assumption holds, they should be scattered randomly around zero with no time trend. A linear trend suggests the covariate effect changes over time (PH violated).

Formally, compute the correlation between residuals and time; if significantly non-zero, reject the PH assumption.

	penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty: - `None`: no penalty is added; - `'l2'`: add a L2 penalty term and it is the default choice; - `'l1'`: add a L1 penalty term; - `'elasticnet'`: both L1 and L2 penalty terms are added. .. warning:: Some penalties may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionadded:: 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1) .. deprecated:: 1.8 `penalty` was deprecated in version 1.8 and will be removed in 1.10. Use `l1_ratio` instead. `l1_ratio=0` for `penalty='l2'`, `l1_ratio=1` for `penalty='l1'` and `l1_ratio` set to any float between 0 and 1 for `'penalty='elasticnet'`.	'deprecated'
	C C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. `C=np.inf` results in unpenalized logistic regression. For a visual example on the effect of tuning the `C` parameter with an L1 penalty, see: :ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.	1.0
	l1_ratio l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting `l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form `l1_ratio * L1 + (1 - l1_ratio) * L2`. .. warning:: Certain values of `l1_ratio`, i.e. some penalties, may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionchanged:: 1.8 Default value changed from None to 0.0. .. deprecated:: 1.8 `None` is deprecated and will be removed in version 1.10. Always use `l1_ratio` to specify the penalty type.	0.0
	dual dual: bool, default=False Dual (constrained) or primal (regularized, see also :ref:`this equation `) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer `dual=False` when n_samples > n_features.	False
	tol tol: float, default=1e-4 Tolerance for stopping criteria.	0.0001
	fit_intercept fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.	True
	intercept_scaling intercept_scaling: float, default=1 Useful only when the solver `liblinear` is used and `self.fit_intercept` is set to `True`. In this case, `x` becomes `[x, self.intercept_scaling]`, i.e. a "synthetic" feature with constant value equal to `intercept_scaling` is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``. .. note:: The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) `intercept_scaling` has to be increased.	1
	class_weight class_weight: dict or 'balanced', default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. .. versionadded:: 0.17 class_weight='balanced'	None
	random_state random_state: int, RandomState instance, default=None Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary ` for details.	3647
	solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs' Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects: - 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems. - For :term:`multiclass` problems (`n_classes >= 3`), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error. - 'newton-cholesky' is a good choice for `n_samples` >> `n_features * n_classes`, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on `n_features * n_classes` because it explicitly computes the full Hessian matrix. - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones; - 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the :class:`~sklearn.multiclass.OneVsRestClassifier`. .. warning:: The choice of the algorithm depends on the penalty chosen (`l1_ratio=0` for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for Elastic-Net) and on (multinomial) multiclass support: ================= ======================== ====================== solver l1_ratio multinomial multiclass ================= ======================== ====================== 'lbfgs' l1_ratio=0 yes 'liblinear' l1_ratio=1 or l1_ratio=0 no 'newton-cg' l1_ratio=0 yes 'newton-cholesky' l1_ratio=0 yes 'sag' l1_ratio=0 yes 'saga' 0<=l1_ratio<=1 yes ================= ======================== ====================== .. note:: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:`sklearn.preprocessing`. .. seealso:: Refer to the :ref:`User Guide ` for more information regarding :class:`LogisticRegression` and more specifically the :ref:`Table ` summarizing solver/penalty supports. .. versionadded:: 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18. .. versionadded:: 0.19 SAGA solver. .. versionchanged:: 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22. .. versionadded:: 1.2 newton-cholesky solver. Multinomial support in version 1.6.	'lbfgs'
	max_iter max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.	1000
	verbose verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.	0
	warm_start warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary `. .. versionadded:: 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.	False
	n_jobs n_jobs: int, default=None Does not have any effect. .. deprecated:: 1.8 `n_jobs` is deprecated in version 1.8 and will be removed in 1.10.	None

	n_estimators n_estimators: int, default=100 The number of trees in the forest. .. versionchanged:: 0.22 The default value of ``n_estimators`` changed from 10 to 100 in 0.22.	100
	criterion criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:`tree_mathematical_formulation`. Note: This parameter is tree-specific.	'gini'
	max_depth max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.	None
	min_samples_split min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node: - If int, then consider `min_samples_split` as the minimum number. - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. .. versionchanged:: 0.18 Added float values for fractions.	2
	min_samples_leaf min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. - If int, then consider `min_samples_leaf` as the minimum number. - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. .. versionchanged:: 0.18 Added float values for fractions.	1
	min_weight_fraction_leaf min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.	0.0
	max_features max_features: {"sqrt", "log2", None}, int or float, default="sqrt" The number of features to consider when looking for the best split: - If int, then consider `max_features` features at each split. - If float, then `max_features` is a fraction and `max(1, int(max_features * n_features_in_))` features are considered at each split. - If "sqrt", then `max_features=sqrt(n_features)`. - If "log2", then `max_features=log2(n_features)`. - If None, then `max_features=n_features`. .. versionchanged:: 1.1 The default of `max_features` changed from `"auto"` to `"sqrt"`. Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features.	'sqrt'
	max_leaf_nodes max_leaf_nodes: int, default=None Grow trees with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.	None
	min_impurity_decrease min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity) where ``N`` is the total number of samples, ``N_t`` is the number of samples at the current node, ``N_t_L`` is the number of samples in the left child, and ``N_t_R`` is the number of samples in the right child. ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum, if ``sample_weight`` is passed. .. versionadded:: 0.19	0.0
	bootstrap bootstrap: bool, default=True Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.	True
	oob_score oob_score: bool or callable, default=False Whether to use out-of-bag samples to estimate the generalization score. By default, :func:`~sklearn.metrics.accuracy_score` is used. Provide a callable with signature `metric(y_true, y_pred)` to use a custom metric. Only available if `bootstrap=True`. For an illustration of out-of-bag (OOB) error estimation, see the example :ref:`sphx_glr_auto_examples_ensemble_plot_ensemble_oob.py`.	False
	n_jobs n_jobs: int, default=None The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, :meth:`decision_path` and :meth:`apply` are all parallelized over the trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary ` for more details.	None
	random_state random_state: int, RandomState instance or None, default=None Controls both the randomness of the bootstrapping of the samples used when building trees (if ``bootstrap=True``) and the sampling of the features to consider when looking for the best split at each node (if ``max_features < n_features``). See :term:`Glossary ` for details.	3647
	verbose verbose: int, default=0 Controls the verbosity when fitting and predicting.	0
	warm_start warm_start: bool, default=False When set to ``True``, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See :term:`Glossary ` and :ref:`tree_ensemble_warm_start` for details.	False
	class_weight class_weight: {"balanced", "balanced_subsample"}, dict or list of dicts, default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y. Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}]. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))`` The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown. For multi-output, the weights of each column of y will be multiplied. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.	None
	ccp_alpha ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. See :ref:`sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py` for an example of such pruning. .. versionadded:: 0.22	0.0
	max_samples max_samples: int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator. - If None (default), then draw `X.shape[0]` samples. - If int, then draw `max_samples` samples. - If float, then draw `max(round(n_samples * max_samples), 1)` samples. Thus, `max_samples` should be in the interval `(0.0, 1.0]`. .. versionadded:: 0.22	None
	monotonic_cst monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonic increase - 0: no constraint - -1: monotonic decrease If monotonic_cst is None, no constraints are applied. Monotonicity constraints are not supported for: - multiclass classifications (i.e. when `n_classes > 2`), - multioutput classifications (i.e. when `n_outputs_ > 1`), - classifications trained on data with missing values. The constraints hold over the probability of the positive class. Read more in the :ref:`User Guide `. .. versionadded:: 1.4	None