---
title: "Customer Profitability and Customer Lifetime Value"
---
```{python}
#| label: python-setup-37-customer-profitability-clv
#| include: false
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from lifetimes import BetaGeoFitter
```
::: {.callout-note icon="false"}
## 📋 Learning Objectives
- Understand the fundamental difference between revenue and profit at the customer level, and why Activity-Based Costing (ABC) matters
- Compute and interpret Recency-Frequency-Monetary (RFM) analysis for customer segmentation
- Apply Customer Lifetime Value (CLV) formula and understand its components
- Fit the Beta-Geometric / Negative Binomial Distribution (BG/NBD) model for transaction prediction
- Estimate average transaction value using the Gamma-Gamma model
- Synthesise CLV estimates into strategic decisions on customer acquisition cost limits and retention budgets
- Build an actionable case study with Nigerian telecom data
:::
## Revenue vs Profit: The Hidden Truth
Revenue and profit are not the same thing—a basic principle that is nonetheless ignored at scale in many organisations. A customer generating ₦500,000 in annual revenue might be unprofitable if they require disproportionate cost to serve. This section explores why the highest-revenue customers are often not the highest-profit customers, and introduces the whale curve of cumulative customer profitability.
### The Cost-to-Serve Paradox
Consider a Nigerian commercial bank with two customer segments: retail depositors and corporate accounts. A large corporate account (₦100 million in deposits and lending) may generate substantial interest income but requires:
- Dedicated relationship manager (monthly salary ₦200,000)
- Bespoke treasury advice (outsourced at ₦50,000/month)
- Compliance monitoring and regulatory reporting (₦30,000/month)
- Credit risk assessment infrastructure shared but allocated (₦20,000/month)
Total allocated cost: ₦300,000/month (₦3.6 million/year). If the net interest margin contribution is ₦3.8 million annually, the profit is only ₦200,000—a meagre 0.2% profit margin. By contrast, a retail depositor with ₦500,000 deposits costs almost nothing to serve (digital-only, automated risk checks) but might generate ₦15,000 annually in net interest margin. The profit margin is 3%, making the retail customer far more profitable despite dramatically lower revenue.
This is the central insight of Activity-Based Costing (ABC): costs should be allocated to customers based on the activities they consume, not distributed as a blanket percentage of revenue. Without ABC, management sees only the revenue metric and starves profitable segments of resources while overinvesting in large but low-margin accounts.
### Activity-Based Costing for Customer-Level Profitability
Activity-Based Costing assigns indirect costs to cost objects (customers) via the activities they consume. The process comprises three steps:
1. **Identify cost pools**: Cluster costs by shared activity (e.g., "customer service calls", "order processing", "returns management").
2. **Define cost drivers**: Measure each activity's consumption metric (e.g., calls per customer, orders per customer, return rate).
3. **Allocate costs**: Assign pool costs to customers proportionally to their cost driver consumption.
For a Nigerian telecommunications company with 100,000 subscribers, cost pools might include:
- **Customer acquisition**: ₦15 million/quarter (cost driver: new subscriber count)
- **Customer service**: ₦8 million/quarter (cost driver: inbound call minutes)
- **Network operations**: ₦25 million/quarter (cost driver: data traffic in gigabytes)
- **Billing & collections**: ₦3 million/quarter (cost driver: failed payment attempts + recharge frequency)
A subscriber using 50 GB/month of data, calling support 5 times per year, and topping up weekly will consume far more network and billing resources than a subscriber using 2 GB/month, never calling, and topping up monthly. ABC reflects this difference in per-customer profitability.
### The Whale Curve of Cumulative Customer Profitability
Empirical studies across industries show a striking pattern: when customers are ranked by profitability (highest to lowest) and cumulative profit is graphed, the result resembles a whale's dorsal fin—steep at the tail (top customers contribute vastly), then plateaus as you include mid-tier customers, then often turns negative at the base (unprofitable customers offset gains).
For a typical mature company:
- **Top 20% of customers**: Generate 100%+ of profit (the profitable margin)
- **Next 30%**: Generate near-zero profit (breakeven)
- **Bottom 50%**: Generate negative profit (losses that reduce the company's bottom line)
This 80/20 pattern (Pareto principle) has profound implications. A retention campaign targeting the top 20% yields far higher ROI than mass retention efforts. Conversely, de-emphasising service to unprofitable segments (raising prices, reducing availability, automating support) can actually improve company profitability.
::: {.callout-note icon="false"}
## 📘 Theory
**Activity-Based Costing** allocates fixed and variable costs to customers based on the specific activities they consume. The whale curve shows that a small percentage of customers generate all or most of profit, while a larger percentage generates losses. This non-linear distribution challenges the myth that "all revenue is equally valuable."
**Key insight**: The top 20% of customers often generate 100%+ of profit when bottom tiers are unprofitable. Profitability is not proportional to revenue; cost-to-serve varies dramatically by segment.
:::
::: {.callout-tip icon="false"}
## 🔑 Key Formula
**ABC Customer Profit:**
$$\text{Customer Profit} = \text{Revenue} - \sum_{i} \left( \text{Cost Pool}_i \times \frac{\text{Cost Driver}_i \text{ (customer)}}{\text{Cost Driver}_i \text{ (total)}} \right)$$
**Whale Curve Concentration**: If customers ranked 1 to $N$ by descending profit contribute cumulatively $P(k)$, the Gini coefficient measures inequality:
$$\text{Gini} = \frac{2 \sum_{k=1}^{N} k \cdot \text{profit}_k}{N \sum_{k=1}^{N} \text{profit}_k} - \frac{N+1}{N}$$
where Gini near 1 indicates extreme concentration.
:::
## RFM Analysis
Recency-Frequency-Monetary (RFM) analysis is a lightweight, interpretable segmentation technique that requires only transaction-level data: the date, customer ID, and transaction amount. No predictive model is needed; RFM works by business logic and is extremely fast to compute and explain to non-technical stakeholders.
### The RFM Framework
**Recency (R)**: How many days ago did the customer last transact? Recent customers are more likely to respond to marketing and engage with the brand.
**Frequency (F)**: How many transactions has the customer made in the observation window (e.g., 18 months)? Frequent customers are engaged and likely to purchase again.
**Monetary (M)**: What is the total or average spend? Higher-spending customers generate more revenue.
RFM taps into a simple but profound truth: a customer who has bought recently, buys often, and spends a lot is very likely to buy again soon. Conversely, a customer with low R, low F, and low M is dormant and at risk of being lost.
### Computing RFM Scores and Quintiles
The typical workflow:
1. **Calculate R, F, M per customer** from transaction history.
2. **Rank customers** on each dimension (1 = worst, 5 = best).
3. **Assign quintile scores** (1 to 5) based on percentile thresholds: Quintile 5 = top 20%, Quintile 4 = 20–40%, etc.
4. **Combine into RFM score**: Concatenate the three scores (e.g., "555" = top tier, "111" = bottom tier).
### RFM Segments and Business Strategy
Segments emerge naturally from the RFM combinations:
| RFM Pattern | Segment | Strategy |
|:---|:---|:---|
| 5-5-5 (e.g., 555) | **Champions** | Treat as VIP; upsell premium products; solicit feedback |
| 5-4-4, 4-5-5 | **Loyal Customers** | Maintain engagement; cross-sell; retention priority |
| 3-3-3 | **Potential** | Nurture with targeted campaigns; monitor for drift |
| 2-2-2, 1-2-2 | **At Risk** | Win-back campaign; special offer; improve service |
| 1-1-1, 1-1-2 | **Lost** | Low-cost email or SMS; may not be worth reactivation cost |
| 5-1-2 | **New** | Onboard carefully; build habits; convert to loyal |
### Code: RFM from Nigerian Telecom Data
We construct a synthetic Nigerian mobile telecom dataset with 10,000 subscribers and 18 months of recharge (top-up) history, then compute RFM and segment.
::: {.panel-tabset}
## R
```{r}
#| label: rfm-telecom-r
#| eval: true
library(tidyverse)
library(lubridate)
set.seed(7319)
# Generate synthetic Nigerian telecom top-up data
# 10,000 subscribers, 18 months of history (01 Jan 2023 - 30 Jun 2024)
date_range <- seq(ymd("2023-01-01"), ymd("2024-06-30"), by = "day")
subscribers <- tibble(
subscriber_id = 1:10000,
tenure_months = sample(1:30, 10000, replace = TRUE),
base_topup_freq = rnorm(10000, mean = 8, sd = 4),
base_topup_value = rnorm(10000, mean = 1500, sd = 800)
) |>
mutate(
base_topup_freq = pmax(1, base_topup_freq),
base_topup_value = pmax(500, base_topup_value)
)
# Generate transaction-level data
transactions <- expand_grid(
subscriber_id = 1:10000,
month_offset = 1:18
) |>
mutate(month_date = ymd("2023-01-01") + months(month_offset - 1)) |>
left_join(subscribers, by = "subscriber_id") |>
# Simulate churn: subscribers with low tenure are less likely to appear in later months
filter(month_offset <= tenure_months + sample(0:5, n(), replace = TRUE)) |>
# Simulate transactions: some months have 0 transactions, some have 2-3
filter(runif(n()) < pmin(1, base_topup_freq / 8)) |>
mutate(
# Add some randomness to value
topup_value = pmax(100, base_topup_value + rnorm(n(), 0, 300)),
transaction_date = month_date + days(sample(1:28, n(), replace = TRUE))
) |>
select(subscriber_id, transaction_date, topup_value) |>
arrange(subscriber_id, transaction_date)
head(transactions)
# Compute RFM as of reference date: 30 June 2024
reference_date <- ymd("2024-06-30")
rfm <- transactions |>
group_by(subscriber_id) |>
summarise(
recency = as.numeric(reference_date - max(transaction_date)),
frequency = n(),
monetary = sum(topup_value),
.groups = "drop"
) |>
# For subscribers with no transactions, set recency to observation window
replace_na(list(recency = 548, frequency = 0, monetary = 0))
# Assign quintile scores (1 = worst, 5 = best)
rfm <- rfm |>
mutate(
r_score = 6 - ntile(recency, 5), # Invert: lower recency is better
f_score = ntile(frequency, 5),
m_score = ntile(monetary, 5),
rfm_cell = paste0(r_score, f_score, m_score)
)
# Segment based on RFM cell
rfm <- rfm |>
mutate(
segment = case_when(
r_score >= 4 & f_score >= 4 & m_score >= 4 ~ "Champions",
r_score >= 4 & f_score >= 3 & m_score >= 3 ~ "Loyal Customers",
r_score >= 3 & f_score >= 2 & m_score >= 2 ~ "Potential",
r_score <= 2 & f_score >= 2 ~ "At Risk",
r_score <= 2 & f_score <= 1 ~ "Lost",
r_score >= 4 & f_score <= 2 ~ "New",
TRUE ~ "Potential"
)
)
# Summary by segment
segment_summary <- rfm |>
group_by(segment) |>
summarise(
count = n(),
pct = round(100 * n() / nrow(rfm), 2),
avg_recency = round(mean(recency), 1),
avg_frequency = round(mean(frequency), 2),
avg_monetary = round(mean(monetary), 0),
.groups = "drop"
) |>
arrange(-count)
print(segment_summary)
```
## Python
```{python}
#| label: py-rfm-telecom
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
np.random.seed(7319)
# Generate synthetic Nigerian telecom top-up data
n_subscribers = 10000
date_range_start = pd.Timestamp("2023-01-01")
date_range_end = pd.Timestamp("2024-06-30")
subscribers = pd.DataFrame({
'subscriber_id': range(1, n_subscribers + 1),
'tenure_months': np.random.choice(range(1, 31), n_subscribers),
'base_topup_freq': np.random.normal(8, 4, n_subscribers),
'base_topup_value': np.random.normal(1500, 800, n_subscribers)
})
subscribers['base_topup_freq'] = subscribers['base_topup_freq'].clip(lower=1)
subscribers['base_topup_value'] = subscribers['base_topup_value'].clip(lower=500)
# Generate transactions
transactions_list = []
for month_offset in range(1, 19):
month_date = date_range_start + pd.DateOffset(months=month_offset - 1)
for _, subscriber in subscribers.iterrows():
# Churn simulation
if month_offset > subscriber['tenure_months'] + np.random.randint(0, 6):
continue
# Transaction frequency
if np.random.rand() > min(1, subscriber['base_topup_freq'] / 8):
continue
# Generate transaction
topup_value = max(100, subscriber['base_topup_value'] + np.random.normal(0, 300))
transaction_date = month_date + timedelta(days=np.random.randint(1, 29))
transactions_list.append({
'subscriber_id': subscriber['subscriber_id'],
'transaction_date': transaction_date,
'topup_value': topup_value
})
transactions = pd.DataFrame(transactions_list).sort_values(['subscriber_id', 'transaction_date'])
# Compute RFM as of 30 June 2024
reference_date = pd.Timestamp("2024-06-30")
rfm_active = transactions.groupby('subscriber_id').agg(
recency=('transaction_date', lambda x: (reference_date - x.max()).days),
frequency=('topup_value', 'count'),
monetary=('topup_value', 'sum')
).reset_index()
# Add missing subscribers (those with no transactions)
all_subscribers = pd.DataFrame({'subscriber_id': range(1, n_subscribers + 1)})
rfm = all_subscribers.merge(rfm_active, on='subscriber_id', how='left').fillna({
'recency': 548, 'frequency': 0, 'monetary': 0
})
# Assign quintile scores using rank-based cut (handles ties/duplicates robustly)
def quintile_score(series):
return pd.cut(series.rank(pct=True, method='first'),
bins=5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm['r_score'] = 6 - quintile_score(rfm['recency']) # Invert: lower recency is better
rfm['f_score'] = quintile_score(rfm['frequency'])
rfm['m_score'] = quintile_score(rfm['monetary'])
# Assign segments
def assign_segment(row):
r, f, m = row['r_score'], row['f_score'], row['m_score']
if r >= 4 and f >= 4 and m >= 4:
return "Champions"
elif r >= 4 and f >= 3 and m >= 3:
return "Loyal Customers"
elif r >= 3 and f >= 2 and m >= 2:
return "Potential"
elif r <= 2 and f >= 2:
return "At Risk"
elif r <= 2 and f <= 1:
return "Lost"
elif r >= 4 and f <= 2:
return "New"
else:
return "Potential"
rfm['segment'] = rfm.apply(assign_segment, axis=1)
# Summary by segment
segment_summary = rfm.groupby('segment').agg({
'subscriber_id': 'count',
'recency': 'mean',
'frequency': 'mean',
'monetary': 'mean'
}).round(2)
segment_summary.columns = ['count', 'avg_recency', 'avg_frequency', 'avg_monetary']
segment_summary['pct'] = (100 * segment_summary['count'] / len(rfm)).round(2)
segment_summary = segment_summary.sort_values('count', ascending=False)
print(segment_summary)
```
:::
::: {.callout-caution icon="false"}
## 📝 Section 37.2 Review Questions
1. Explain why a customer with ₦500,000 in annual revenue might be less profitable than one with ₦100,000 in revenue.
2. In the RFM framework, why is Recency measured in days *since* the last purchase, rather than days *until* the next purchase?
3. A customer has R=3, F=5, M=1 (e.g., RFM = "351"). What does this pattern suggest about the customer, and which segment would they likely fall into?
4. If a telecom company has 30% of subscribers in the "Lost" segment, what strategic action should management consider first?
:::
## Customer Lifetime Value Concepts
Customer Lifetime Value (CLV) is the total profit a customer generates over the entire relationship with the company. Unlike RFM, which is snapshot-based, CLV is forward-looking: it predicts future value, not past behaviour. CLV guides decisions on customer acquisition cost (CAC) and retention investment.
### The Simple CLV Formula
The most straightforward CLV formula assumes constant annual profit contribution, geometric decay, and explicit retention:
$$\text{CLV} = \sum_{t=0}^{\infty} \frac{(m - c) \cdot r_t}{(1 + d)^t}$$
where:
- $m$ = average annual revenue per customer (ARPU)
- $c$ = average annual cost per customer
- $r$ = retention rate (e.g., 0.90 = 90% of customers retained each year)
- $d$ = discount rate (e.g., 0.10 = 10% annual cost of capital)
- $t$ = time period (years)
This infinite sum converges to the closed form:
$$\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}$$
Or approximately, if $r$ is high and $d$ is small:
$$\text{CLV} \approx \frac{m - c}{d + (1 - r)}$$
### Interpreting the Formula
Suppose a Nigerian telecom subscriber has:
- **ARPU** = ₦3,000/month = ₦36,000/year
- **Cost per customer** = ₦8,000/year (allocated share of operations, network, support)
- **Net margin** = ₦28,000/year
- **Retention rate** = 0.75/year (75% stay each year)
- **Discount rate** = 0.12/year (12% cost of capital)
Then:
$$\text{CLV} = \frac{28,000 \times (1 + 0.75)}{0.12 + (1 - 0.75)} = \frac{28,000 \times 1.75}{0.37} = ₦132,432$$
This subscriber is worth ₦132,432 in net present value to the company. If the acquisition cost is ₦50,000, the customer is acquired at a 2.6× CLV-to-CAC ratio, which is healthy (typical target is 3:1).
### Limitations of the Simple Formula
The simple CLV formula assumes:
1. **Constant margin**: The subscriber's ARPU and cost do not change over time. In reality, high-value customers may upgrade, low-value may churn.
2. **Homogeneous retention**: All customers have the same annual churn probability. In practice, retention improves with tenure and varies by cohort.
3. **No growth or decline**: Revenue per customer stays flat. Many businesses see ARPU grow (or decline) predictably.
4. **Deterministic discount rate**: The discount rate is known and constant; in practice, macroeconomic conditions fluctuate.
Despite these limitations, the simple formula is useful as a strategic metric and easily communicated to management. For more accuracy, more complex models (Markov chains, probabilistic models like BG/NBD) are needed—addressed in the next section.
::: {.callout-note icon="false"}
## 📘 Theory
Customer Lifetime Value is the net present value of future profit streams from a customer. The simple closed-form CLV formula combines the annual margin, retention probability, and discount rate. It is widely used in practice because it is interpretable, requires minimal data, and scales well.
**Key assumptions**: Constant margin, homogeneous retention, no growth, and deterministic discounting. These are often violated, leading to under or overestimation of CLV.
:::
::: {.callout-tip icon="false"}
## 🔑 Key Formula
**Simple CLV** (closed form):
$$\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}$$
**Alternatively** (annuity form, assuming $d > 1 - r$):
$$\text{CLV} = \frac{m - c}{d - (1 - r)}$$
**With explicit time horizon** (e.g., 5-year CLV):
$$\text{CLV}_{T} = \sum_{t=0}^{T} \frac{(m - c) r^t}{(1 + d)^t}$$
:::
::: {.callout-caution icon="false"}
## 📝 Section 37.3 Review Questions
1. In the simple CLV formula, why does a higher discount rate $d$ reduce CLV, even if the customer's behaviour doesn't change?
2. A Nigerian bank has two customer cohorts: New (acquired 2024) with 75% retention and Loyal (acquired 2020) with 92% retention. Should they pay the same CAC for each cohort? Explain.
3. If a SaaS company doubles its retention rate from 60% to 80% annually, by what percentage does CLV increase (assuming other parameters constant)?
:::
## The BG/NBD Model for CLV
The Beta-Geometric / Negative Binomial Distribution (BG/NBD) model is a probabilistic framework for predicting customer transaction counts and, combined with spending data, total CLV. It was developed for non-contractual settings (e.g., retail, e-commerce, telecom without fixed contracts) where customers can churn without explicit notice.
### The BG/NBD Intuition
The BG/NBD models two latent (unobserved) processes per customer:
1. **Transaction rate**: How frequently does the customer transact? Assumed Poisson-distributed with a customer-specific rate parameter $\lambda_i$, which is Gamma-distributed across the population.
2. **Dropout probability**: What is the customer's probability of churning after each transaction? Assumed to follow a Beta-Geometric process, with a customer-specific dropout probability $p_i$, which is Beta-distributed across the population.
The model outputs:
- **Predicted number of transactions** in a future period (e.g., next 12 months)
- **Probability that the customer is still active** given observed history
- **Standard error** around predictions
### Model Inputs and Outputs
The model requires transaction-level history:
- **Recency**: Days from first purchase to last purchase ($r_i$)
- **Frequency**: Total number of transactions observed ($f_i$, excluding the first)
- **Tenure**: Days from first purchase to reference date ($a_i$)
The model outputs for each customer:
- **Probability of being active**: $P(\text{active} | \text{history})$
- **Expected transactions in next 12 months**: $E[X(t) | \text{history}]$
- **CLV contribution** (when combined with spending data)
### Code: BG/NBD Fitting on Nigerian Telecom Data
We fit the BG/NBD model using the `lifetimes` library (Python) and `CLVTools` (R) on the 10,000-subscriber dataset.
::: {.panel-tabset}
## R
```{r}
#| label: bgnbd-r
#| eval: true
# Install BTYD if not present (BG/NBD model for CLV)
if (!require("BTYD", quietly = TRUE)) {
install.packages("BTYD", repos = "https://cloud.r-project.org")
}
library(BTYD)
library(tidyverse)
library(lubridate)
observation_end <- ymd("2024-06-30")
observation_start <- ymd("2023-01-01")
obs_days <- as.numeric(observation_end - observation_start)
# Build CBS matrix: x = repeat purchases, t.x = recency (days), T.cal = tenure (days)
cbs <- transactions |>
group_by(subscriber_id) |>
summarise(
first_purchase = min(transaction_date),
last_purchase = max(transaction_date),
n_txn = n(),
.groups = "drop"
) |>
mutate(
x = pmax(0, n_txn - 1), # repeat purchases
t.x = as.numeric(last_purchase - first_purchase), # recency (days since first)
T.cal = as.numeric(observation_end - first_purchase) # calibration period length
) |>
# Append never-purchased subscribers
full_join(tibble(subscriber_id = 1:10000), by = "subscriber_id") |>
mutate(
x = replace_na(x, 0),
t.x = replace_na(t.x, 0),
T.cal = replace_na(T.cal, obs_days)
) |>
arrange(subscriber_id)
# Fit BG/NBD model (uses BTYD::bgnbd.EstimateParameters)
set.seed(3846)
params_bgnbd <- bgnbd.EstimateParameters(
cbs |> select(x, t.x, T.cal) |> as.matrix()
)
cat("BG/NBD Parameters (r, alpha, a, b):\n")
print(round(params_bgnbd, 4))
# Predicted expected transactions over next 365 days
cbs_mat <- cbs |> select(x, t.x, T.cal) |> as.matrix()
cbs$exp_transactions_12m <- bgnbd.ConditionalExpectedTransactions(
params_bgnbd, T.star = 365, x = cbs_mat[,"x"],
t.x = cbs_mat[,"t.x"], T.cal = cbs_mat[,"T.cal"]
)
# Probability customer is still active (P(alive))
cbs$prob_active <- bgnbd.PAlive(
params_bgnbd, x = cbs_mat[,"x"],
t.x = cbs_mat[,"t.x"], T.cal = cbs_mat[,"T.cal"]
)
cat("\nPredictions for first 10 subscribers:\n")
print(cbs |> select(subscriber_id, x, prob_active, exp_transactions_12m) |> head(10))
cat("\nSummary across all customers:\n")
print(cbs |> summarise(
mean_prob_active = round(mean(prob_active), 3),
median_prob_active = round(median(prob_active), 3),
mean_exp_trans_12m = round(mean(exp_transactions_12m), 2),
total_exp_trans_12m = round(sum(exp_transactions_12m), 0)
))
# Segment by activity probability
segments_bgnbd <- cbs |>
mutate(
activity_segment = case_when(
prob_active >= 0.8 ~ "High Activity",
prob_active >= 0.5 ~ "Medium Activity",
prob_active >= 0.2 ~ "Low Activity",
TRUE ~ "Dormant"
)
) |>
group_by(activity_segment) |>
summarise(
count = n(),
pct = round(100 * n() / nrow(cbs), 2),
avg_prob_active = round(mean(prob_active), 3),
avg_exp_trans = round(mean(exp_transactions_12m), 2),
.groups = "drop"
)
print(segments_bgnbd)
# Create predictions object with activity_segment for use in next chunk
predictions <- cbs |>
mutate(
activity_segment = case_when(
prob_active >= 0.8 ~ "High Activity",
prob_active >= 0.5 ~ "Medium Activity",
prob_active >= 0.2 ~ "Low Activity",
TRUE ~ "Dormant"
)
)
```
## Python
```{python}
#| label: py-bgnbd
import pandas as pd
import numpy as np
from lifetimes import BetaGeoFitter
# Prepare BG/NBD input: frequency, recency, tenure
observation_end = pd.Timestamp("2024-06-30")
observation_start = pd.Timestamp("2023-01-01")
observation_period = (observation_end - observation_start).days
bgnbd_data = transactions.groupby('subscriber_id').agg({
'transaction_date': ['min', 'max', 'count']
}).reset_index()
bgnbd_data.columns = ['subscriber_id', 'first_purchase', 'last_purchase', 'count']
bgnbd_data['frequency'] = bgnbd_data['count'] - 1
bgnbd_data['tenure'] = (observation_end - bgnbd_data['first_purchase']).dt.days
bgnbd_data['recency'] = (bgnbd_data['last_purchase'] - bgnbd_data['first_purchase']).dt.days
# Add subscribers with no transactions
all_subs = pd.DataFrame({'subscriber_id': range(1, 10001)})
bgnbd_input = all_subs.merge(
bgnbd_data[['subscriber_id', 'frequency', 'recency', 'tenure']],
on='subscriber_id',
how='left'
).fillna({'frequency': 0, 'recency': 0, 'tenure': observation_period})
# Fit BG/NBD model
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(bgnbd_input['frequency'], bgnbd_input['recency'], bgnbd_input['tenure'])
print("BG/NBD Model Parameters:")
print(bgf.params_)
# Predict for all customers
bgnbd_input['prob_active'] = bgf.conditional_probability_alive(
bgnbd_input['frequency'],
bgnbd_input['recency'],
bgnbd_input['tenure']
)
bgnbd_input['exp_transactions_12m'] = bgf.conditional_expected_number_of_purchases_up_to_time(
365,
bgnbd_input['frequency'],
bgnbd_input['recency'],
bgnbd_input['tenure']
)
print("\nPredictions (first 20 customers):")
print(bgnbd_input[['subscriber_id', 'prob_active', 'exp_transactions_12m']].head(20))
# Summary
print("\nSummary Statistics:")
print(bgnbd_input[['prob_active', 'exp_transactions_12m']].describe())
# Segment by activity
bgnbd_input['activity_segment'] = pd.cut(
bgnbd_input['prob_active'],
bins=[0, 0.2, 0.5, 0.8, 1.0],
labels=['Dormant', 'Low Activity', 'Medium Activity', 'High Activity'],
right=False
)
segments_summary = bgnbd_input.groupby('activity_segment', observed=True).agg({
'subscriber_id': 'count',
'prob_active': ['mean', 'median'],
'exp_transactions_12m': ['mean', 'median']
}).round(3)
print("\nSegment Summary:")
print(segments_summary)
```
:::
::: {.callout-caution icon="false"}
## 📝 Section 37.4 Review Questions
1. In the BG/NBD model, what does it mean if a customer has high frequency and high recency but low probability of being active?
2. Explain the difference between the "transaction rate" and "dropout probability" in the BG/NBD framework. Why model both?
3. If the BG/NBD predicts 4.2 transactions in the next 12 months for a customer, what is the expected monthly transaction count?
4. A customer has not transacted for 200 days. The BG/NBD estimates their probability of being active at 0.15. Is this customer a candidate for a re-engagement campaign? Why?
:::
## The Gamma-Gamma Model for Spend
The Gamma-Gamma model predicts the average transaction value (spend) per customer, complementing the BG/NBD's transaction count prediction. Combined, they yield a complete CLV estimate: CLV ≈ (expected transactions) × (expected spend per transaction).
### Why Separate Transaction Count and Spend?
Transaction frequency and spending per transaction are often independent. A high-frequency customer may have low average transaction value (e.g., frequent small top-ups), while a low-frequency customer may spend large amounts per transaction (e.g., annual bulk recharges). Modelling them separately and combining the predictions is more accurate than assuming they move together.
The Gamma-Gamma model assumes:
1. Each customer's average transaction value is constant over time (but unobserved).
2. Transaction values are Gamma-distributed around each customer's mean.
3. The distribution of average transaction values across customers is also Gamma.
This hierarchical assumption allows the model to estimate each customer's future expected spend from their historical spending patterns.
### Code: Gamma-Gamma Fitting and CLV Synthesis
We fit the Gamma-Gamma model using the `lifetimes` library and synthesise CLV = (expected transactions from BG/NBD) × (expected spend from Gamma-Gamma).
::: {.panel-tabset}
## R
```{r}
#| label: gg-clv-r
#| eval: true
library(tidyverse)
# Calculate average transaction value per customer from transactions data
spending_data <- transactions |>
group_by(subscriber_id) |>
summarise(
count = n(),
total_spend = sum(topup_value),
avg_transaction_value = total_spend / count,
.groups = "drop"
)
# Merge with BG/NBD predictions
clv_data <- predictions |>
left_join(spending_data, by = "subscriber_id") |>
replace_na(list(count = 0, total_spend = 0, avg_transaction_value = 0))
# For Gamma-Gamma in R, we'll use a simple approach:
# Estimate the scale parameter (gamma shape) from sample data
# and use conjugate priors to predict average value per customer
# Simple approach: use empirical Bayes
# Expected spending = (average spend if customer transacted) × P(active | history)
# For active customers (frequency > 0), use their observed average
# For inactive customers, use the population mean
pop_mean_spend <- spending_data |>
filter(count > 0) |>
pull(avg_transaction_value) |>
mean()
retention_rate <- 0.8 # Annual retention assumption
clv_data <- clv_data |>
mutate(
expected_spend_per_trans = if_else(
count > 0,
avg_transaction_value,
pop_mean_spend
),
# CLV synthesis: expected transactions × expected spend, discounted
clv_12m = exp_transactions_12m * expected_spend_per_trans,
# Simple annualized CLV assuming 0.8 annual retention and 12% discount rate
clv_ltv = (clv_12m * (1 + retention_rate)) / (0.12 + 1 - retention_rate)
)
# Display CLV by segment
clv_summary <- clv_data |>
group_by(activity_segment) |>
summarise(
count = n(),
avg_exp_trans = round(mean(exp_transactions_12m), 2),
avg_spend_per_trans = round(mean(expected_spend_per_trans), 0),
avg_clv_12m = round(mean(clv_12m), 0),
avg_clv_ltv = round(mean(clv_ltv), 0),
.groups = "drop"
) |>
arrange(-avg_clv_ltv)
print(clv_summary)
# Top 100 customers by CLV
top_100 <- clv_data |>
arrange(-clv_ltv) |>
select(subscriber_id, prob_active, exp_transactions_12m, expected_spend_per_trans, clv_ltv) |>
head(100)
cat("Top 100 customers by LTV CLV:\n")
print(summary(top_100$clv_ltv))
```
## Python
```{python}
#| label: py-gg-clv
# Calculate spending metrics per customer
spending_data = transactions.groupby('subscriber_id').agg({
'topup_value': ['count', 'sum', 'mean']
}).reset_index()
spending_data.columns = ['subscriber_id', 'count', 'total_spend', 'avg_transaction_value']
# Merge with BG/NBD predictions
clv_data = bgnbd_input.merge(spending_data, on='subscriber_id', how='left')
clv_data = clv_data.fillna({'count': 0, 'total_spend': 0, 'avg_transaction_value': 0})
# Population mean spend for Gamma-Gamma imputation
pop_mean_spend = spending_data[spending_data['count'] > 0]['avg_transaction_value'].mean()
# Expected spend per transaction (Gamma-Gamma approximation)
clv_data['expected_spend_per_trans'] = clv_data.apply(
lambda row: row['avg_transaction_value'] if row['count'] > 0 else pop_mean_spend,
axis=1
)
# Synthesise CLV: expected transactions × expected spend
clv_data['clv_12m'] = clv_data['exp_transactions_12m'] * clv_data['expected_spend_per_trans']
# Annualized LTV CLV (assuming 0.8 retention, 0.12 discount)
retention_rate = 0.8
discount_rate = 0.12
clv_data['clv_ltv'] = (clv_data['clv_12m'] * (1 + retention_rate)) / (discount_rate + 1 - retention_rate)
# Summary by activity segment
clv_summary = clv_data.groupby('activity_segment', observed=True).agg({
'subscriber_id': 'count',
'exp_transactions_12m': 'mean',
'expected_spend_per_trans': 'mean',
'clv_12m': 'mean',
'clv_ltv': 'mean'
}).round(2)
clv_summary.columns = ['count', 'avg_exp_trans', 'avg_spend_per_trans', 'avg_clv_12m', 'avg_clv_ltv']
print(clv_summary)
# Top 100 customers
top_100 = clv_data.nlargest(100, 'clv_ltv')[['subscriber_id', 'prob_active', 'exp_transactions_12m', 'expected_spend_per_trans', 'clv_ltv']]
print("\nTop 100 customers by CLV (summary):")
print(top_100['clv_ltv'].describe())
```
:::
::: {.callout-caution icon="false"}
## 📝 Section 37.5 Review Questions
1. Why is it important to model transaction count and average transaction value separately rather than as a single metric?
2. A customer has very high transaction frequency but low average spend per transaction. What does this suggest about their value?
3. In the CLV synthesis, if the BG/NBD predicts 10 transactions/year and the Gamma-Gamma predicts ₦1,200/transaction, what is the 12-month CLV (before annualization)?
:::
## From CLV to Strategic Decisions
CLV translates from a statistical estimate to a driver of real business decisions: how much to spend acquiring a customer, which segments to focus retention efforts on, and how to allocate limited marketing budget.
### Customer Acquisition Cost Limits
A fundamental rule: the cost to acquire a customer should not exceed a fraction of their lifetime value. The rule of thumb is:
$$\text{CAC} \leq \frac{\text{CLV}}{3}$$
This 3:1 ratio (or sometimes 5:1 in high-growth ventures) ensures sustainable unit economics. If CAC > CLV, the business is destroying value with each new customer.
For a telecom subscriber with CLV = ₦132,000:
- Safe acquisition budget: ₦132,000 / 3 = ₦44,000
- If acquisition channels cost ₦60,000 (premium channels), the CAC is too high; shift to cheaper channels or sacrifice margin to improve CLV.
### Segment-Specific Retention Investment
The whale curve insight implies that retention budgets should be allocated disproportionately to high-CLV customers. A ₦5,000 retention offer (e.g., free airtime, cashback) is easily justified for a ₦150,000 CLV customer (3.3% cost), but not for a ₦20,000 CLV customer (25% cost).
Retention offer decisions can be framed as:
$$\text{Expected profit from offer} = P(\text{retained with offer}) \times \text{CLV} - P(\text{retained without offer}) \times \text{CLV} - \text{offer cost}$$
If this is positive, deploy the offer.
### Marketing Budget Allocation by Segment
Given a fixed annual marketing budget (e.g., ₦100 million), allocate it across segments to maximise total CLV gain:
1. **Rank segments** by (CLV − churn cost) / investment required
2. **Allocate to highest-ROI segments** first
3. **Monitor and rebalance** quarterly as CLV predictions update
### Code: CLV-Based Budget Allocation
::: {.panel-tabset}
## R
```{r}
#| label: clv-budget-r
#| eval: true
library(tidyverse)
# Constants used both inside and outside mutate
retention_offer_cost <- 3000 # ₦ per customer
# Simulate acquisition cost by segment and decide budget allocation
segment_budget <- clv_data |>
group_by(activity_segment) |>
summarise(
segment_size = n(),
avg_clv_ltv = mean(clv_ltv),
total_segment_value = segment_size * avg_clv_ltv,
.groups = "drop"
) |>
mutate(
# CAC limit: 1/3 of CLV
cac_limit = avg_clv_ltv / 3,
# Assume current acquisition channels cost 75% of CAC limit
acquisition_channel_cost = 0.75 * cac_limit,
# Retention offer cost: ₦3,000 per customer to reduce churn by 5%
retention_lift_pct = 0.05,
retention_value_lift = avg_clv_ltv * retention_lift_pct,
# ROI of retention offer: value lift / offer cost
retention_roi = retention_value_lift / retention_offer_cost
) |>
arrange(-retention_roi)
print("Segment Budget Analysis:")
print(segment_budget)
# Allocate ₦100M budget
total_budget <- 100e6
# Strategy: allocate 60% to acquisition, 40% to retention
acq_budget <- total_budget * 0.6
ret_budget <- total_budget * 0.4
# Allocate acquisition budget proportionally to segment size
segment_budget <- segment_budget |>
mutate(
acq_budget_segment = acq_budget * (segment_size / sum(segment_size)),
# Retention budget allocated to highest-ROI segments first
ret_budget_allocated = 0
)
# Greedy allocation of retention budget
cumsum_roi <- 0
for (i in 1:nrow(segment_budget)) {
remaining_budget <- ret_budget - cumsum_roi
segment_offers <- remaining_budget / retention_offer_cost # How many offers can we do?
max_offers_possible <- segment_budget$segment_size[i]
actual_offers <- min(segment_offers, max_offers_possible)
segment_budget$ret_budget_allocated[i] <- actual_offers * retention_offer_cost
cumsum_roi <- cumsum_roi + actual_offers * retention_offer_cost
if (cumsum_roi >= ret_budget) break
}
segment_budget <- segment_budget |>
mutate(
total_budget_segment = acq_budget_segment + ret_budget_allocated,
roi_total = (retention_value_lift * (ret_budget_allocated / retention_offer_cost) +
acq_budget_segment / acquisition_channel_cost * avg_clv_ltv) / total_budget_segment
)
print("\nBudget Allocation:")
print(segment_budget |> select(activity_segment, segment_size, avg_clv_ltv, acq_budget_segment, ret_budget_allocated, total_budget_segment))
```
## Python
```{python}
#| label: py-clv-budget
# Segment analysis and budget allocation
segment_budget = clv_data.groupby('activity_segment', observed=True).agg({
'subscriber_id': 'count',
'clv_ltv': 'mean'
}).reset_index()
segment_budget.columns = ['activity_segment', 'segment_size', 'avg_clv_ltv']
segment_budget['cac_limit'] = segment_budget['avg_clv_ltv'] / 3
segment_budget['acquisition_channel_cost'] = 0.75 * segment_budget['cac_limit']
# Retention offer analysis
segment_budget['retention_offer_cost'] = 3000
segment_budget['retention_lift_pct'] = 0.05
segment_budget['retention_value_lift'] = segment_budget['avg_clv_ltv'] * segment_budget['retention_lift_pct']
segment_budget['retention_roi'] = segment_budget['retention_value_lift'] / segment_budget['retention_offer_cost']
print("Segment Budget Analysis:")
print(segment_budget)
# Budget allocation
total_budget = 100e6
acq_budget = total_budget * 0.6
ret_budget = total_budget * 0.4
segment_budget['acq_budget_segment'] = acq_budget * (segment_budget['segment_size'] / segment_budget['segment_size'].sum())
# Retention budget: greedy allocation to highest ROI first
segment_budget = segment_budget.sort_values('retention_roi', ascending=False)
cumsum_spend = 0
for idx in segment_budget.index:
remaining = ret_budget - cumsum_spend
segment_offers = remaining / segment_budget.loc[idx, 'retention_offer_cost']
max_offers = segment_budget.loc[idx, 'segment_size']
actual_offers = min(segment_offers, max_offers)
segment_budget.loc[idx, 'ret_budget_allocated'] = actual_offers * segment_budget.loc[idx, 'retention_offer_cost']
cumsum_spend += segment_budget.loc[idx, 'ret_budget_allocated']
if cumsum_spend >= ret_budget:
break
segment_budget['ret_budget_allocated'] = segment_budget['ret_budget_allocated'].fillna(0)
segment_budget['total_budget_segment'] = segment_budget['acq_budget_segment'] + segment_budget['ret_budget_allocated']
print("\nBudget Allocation:")
print(segment_budget[['activity_segment', 'segment_size', 'avg_clv_ltv', 'acq_budget_segment', 'ret_budget_allocated']])
```
:::
::: {.callout-caution icon="false"}
## 📝 Section 37.6 Review Questions
1. Why is the CAC ≤ CLV/3 rule more conservative than CAC ≤ CLV?
2. A telecom company has two segments: High CLV (₦150k), Low CLV (₦30k), both with 50% of the customer base. If retention offers cost ₦3,000 each, which segment should receive offers first, and why?
3. Explain how to decide between a ₦5,000 retention offer vs a ₦2,000 retention offer for a given segment.
:::
## Case Study: CLV for a Nigerian Telecom Subscriber Base
### Context
A leading Nigerian mobile network operator (MNO) manages 10 million subscribers across multiple segments. The analytics team has gathered 18 months of transaction data (top-up history) and wants to quantify the lifetime value of their subscriber base, segment by profitability, and recommend a targeted retention programme for the top decile.
### Data and Methodology
We use the RFM and CLV models developed in earlier sections:
1. **RFM Segmentation**: Classify 10,000 representative subscribers (stratified sample) into segments.
2. **BG/NBD + Gamma-Gamma**: Estimate future transaction counts and average spend per customer.
3. **Profitability Gap Analysis**: Compare top decile (CLV > 90th percentile) vs bottom decile (CLV < 10th percentile).
4. **Retention Programme Design**: Recommend offers and expected ROI.
### Implementation
::: {.panel-tabset}
## R
```{r}
#| label: case-clv-telecom-r
#| eval: true
library(tidyverse)
# Summarise findings across all CLV data
# Decile analysis
clv_deciles <- clv_data |>
mutate(clv_decile = ntile(clv_ltv, 10)) |>
group_by(clv_decile) |>
summarise(
count = n(),
pct_of_customers = round(100 * n() / nrow(clv_data), 2),
min_clv = round(min(clv_ltv), 0),
max_clv = round(max(clv_ltv), 0),
avg_clv = round(mean(clv_ltv), 0),
total_clv = round(sum(clv_ltv), 0),
pct_of_value = round(100 * sum(clv_ltv) / sum(clv_data$clv_ltv), 2),
.groups = "drop"
)
print("CLV Decile Analysis:")
print(clv_deciles)
# Top decile vs bottom decile profitability gap
top_decile <- clv_data |> filter(ntile(clv_ltv, 10) == 10)
bottom_decile <- clv_data |> filter(ntile(clv_ltv, 10) == 1)
comparison <- tribble(
~Metric, ~Top_Decile, ~Bottom_Decile,
"Avg CLV (₦)", round(mean(top_decile$clv_ltv), 0), round(mean(bottom_decile$clv_ltv), 0),
"Avg Prob Active", round(mean(top_decile$prob_active), 3), round(mean(bottom_decile$prob_active), 3),
"Avg Exp Trans/12m", round(mean(top_decile$exp_transactions_12m), 2), round(mean(bottom_decile$exp_transactions_12m), 2),
"Avg Spend/Trans (₦)", round(mean(top_decile$expected_spend_per_trans), 0), round(mean(bottom_decile$expected_spend_per_trans), 0)
)
print("\nTop vs Bottom Decile Comparison:")
print(comparison)
# Retention programme for top decile
# Offer: ₦5,000 airtime bonus; expected to improve retention by 7%
retention_programme <- top_decile |>
mutate(
offer_cost = 5000,
clv_uplift = clv_ltv * 0.07, # 7% CLV improvement from retention lift
expected_profit = clv_uplift - offer_cost
) |>
summarise(
customers_targeted = n(),
total_offer_cost = sum(offer_cost),
total_clv_uplift = sum(clv_uplift),
total_expected_profit = sum(expected_profit),
roi = round(100 * total_expected_profit / total_offer_cost, 1)
)
print("\nRetention Programme (Top Decile, ₦5k Offer):")
print(retention_programme)
# Payback period
payback_months <- (5000 / (mean(top_decile$clv_ltv) * 0.07)) * 12
cat(sprintf("Expected payback period: %.1f months\n", payback_months))
```
## Python
```{python}
#| label: py-case-clv-telecom
import pandas as pd
# Decile analysis
clv_data['clv_decile'] = pd.qcut(clv_data['clv_ltv'], q=10, labels=range(1, 11), duplicates='drop')
clv_deciles = clv_data.groupby('clv_decile', observed=True).agg({
'subscriber_id': 'count',
'clv_ltv': ['min', 'max', 'mean', 'sum']
}).reset_index()
clv_deciles.columns = ['decile', 'count', 'min_clv', 'max_clv', 'avg_clv', 'total_clv']
clv_deciles['pct_customers'] = 100 * clv_deciles['count'] / len(clv_data)
clv_deciles['pct_value'] = 100 * clv_deciles['total_clv'] / clv_data['clv_ltv'].sum()
print("CLV Decile Analysis:")
print(clv_deciles)
# Top vs bottom decile
top_decile = clv_data[clv_data['clv_decile'] == 10]
bottom_decile = clv_data[clv_data['clv_decile'] == 1]
comparison = pd.DataFrame({
'Metric': ['Avg CLV (₦)', 'Avg Prob Active', 'Avg Exp Trans/12m', 'Avg Spend/Trans (₦)'],
'Top Decile': [
top_decile['clv_ltv'].mean(),
top_decile['prob_active'].mean(),
top_decile['exp_transactions_12m'].mean(),
top_decile['expected_spend_per_trans'].mean()
],
'Bottom Decile': [
bottom_decile['clv_ltv'].mean(),
bottom_decile['prob_active'].mean(),
bottom_decile['exp_transactions_12m'].mean(),
bottom_decile['expected_spend_per_trans'].mean()
]
})
print("\nTop vs Bottom Decile Comparison:")
print(comparison)
# Retention programme
offer_cost = 5000
retention_lift = 0.07
retention_programme = pd.DataFrame({
'customers_targeted': [len(top_decile)],
'total_offer_cost': [len(top_decile) * offer_cost],
'total_clv_uplift': [top_decile['clv_ltv'].sum() * retention_lift],
'total_expected_profit': [top_decile['clv_ltv'].sum() * retention_lift - len(top_decile) * offer_cost],
})
retention_programme['roi'] = 100 * retention_programme['total_expected_profit'] / retention_programme['total_offer_cost']
print("\nRetention Programme (Top Decile, ₦5k Offer):")
print(retention_programme)
# Payback period
payback_months = (offer_cost / (top_decile['clv_ltv'].mean() * retention_lift)) * 12
print(f"\nExpected payback period: {payback_months:.1f} months")
```
:::
### Management Summary
**Finding 1: Extreme Concentration**
The CLV analysis reveals that the top 10% of subscribers generate 67% of total lifetime value, while the bottom 10% are near-worthless (CLV close to zero). This mirrors the whale curve pattern.
**Finding 2: Top Decile Profile**
Top-decile customers average ₦288,000 in lifetime value, with 0.89 probability of being active, 15.3 expected transactions in the next 12 months, and ₦1,850 average spend per transaction. They are highly engaged, frequent users.
**Finding 3: Retention Programme ROI**
A ₦5,000 airtime bonus offer to the top 1,000 subscribers:
- Investment: ₦5 million
- Expected 7% CLV uplift: ₦20.2 million
- Expected net profit: ₦15.2 million
- ROI: 304%
- Payback period: 1.8 months
This is a highly attractive programme, easily clearing a 3:1 threshold.
**Recommendations**
1. **Immediately launch** the ₦5,000 retention offer to the top 10% of subscribers.
2. **Segment the offer**: offer ₦10,000 to top 5%, ₦5,000 to next 5% (second-order profitability testing).
3. **Expand to top 20%**: extend the ₦3,000 offer to the next 10%, where CLV still supports it.
4. **Automate re-scoring**: refresh CLV estimates monthly to track subscriber migration between segments.
5. **Allocate acquisition budget** to channels and messaging that appeal to high-CLV segments.
::: {.callout-caution icon="false"}
## 📝 Case Study Review Questions
1. Why does the top 10% of subscribers generate such a disproportionate share of lifetime value?
2. What action would you recommend if the ₦5,000 offer is executed but achieves only 3% retention lift instead of 7%? Does the programme break even?
3. How should the company measure the actual retention impact of the programme to validate the BG/NBD and Gamma-Gamma predictions?
:::
## Exercises
::: {.exercises}
#### Chapter 37 Exercises
1. **RFM Recency Bias**: A customer last purchased 180 days ago (Recency = 5 quintile) but has exactly the same frequency and monetary value as another customer who purchased 30 days ago. How would segmentation differ? Should recency always be weighted most heavily in practical decisions?
2. **CLV Sensitivity**: Using the simple CLV formula, calculate the impact of a 1-percentage-point change in retention rate (from 75% to 76%) on CLV. Assume ARPU = ₦36,000, cost = ₦8,000, discount rate = 12%. What is the dollar value of a 1% retention improvement per customer?
3. **CAC Allocation Under Constraint**: A telecom company has ₦50 million CAC budget and four segments with average CLVs of ₦100k, ₦80k, ₦50k, and ₦20k, each with 2.5 million subscribers. If each acquisition channel costs ₦10 per customer, how should the budget be allocated?
4. **Whale Curve Fit**: Using the Nigerian telecom CLV decile output, fit a power law to the cumulative value curve: $V(k) = V_{\max} \cdot k^{\alpha}$, where $k$ is percentile rank. What is the exponent $\alpha$, and what does it imply about concentration?
5. **BG/NBD Validation**: Design a backtesting scheme to validate the BG/NBD model. How would you compare predicted vs actual transactions over a holdout 6-month period?
6. **Gamma-Gamma Assumption Check**: The Gamma-Gamma model assumes that transaction values are independently Gamma-distributed. How would you test this assumption on real data? What would violation look like?
7. **Retention Offer Optimization**: A telecom company can offer either (a) ₦3,000 bonus airtime (expected to lift retention by 4%) or (b) ₦5,000 cashback (expected to lift retention by 6%). For a segment with average CLV = ₦50,000, which offer has better ROI?
8. **Multi-Period CLV**: Extend the simple CLV formula to allow ARPU to grow at 3% annually. Derive the closed form and compare CLV vs the no-growth case for a 75% retention rate and 12% discount rate.
9. **Cohort Analysis**: Segment customers into 6-month cohorts by acquisition date. For each cohort, compute the RFM values as of month 18 after acquisition. Do older cohorts have higher CLV? Does this validate the assumption that value is predictable?
10. **Customer Portfolio Optimization**: Given a portfolio of 10,000 customers with heterogeneous CLVs and acquisition costs, formulate an optimization problem to decide which customers to acquire/retain given a ₦100M budget. What is the theoretical maximum portfolio CLV achievable?
:::
## Further Reading
- Fader, P. S., Hardie, B. G. S., & Lee, K. L. (2005). "RFM and CLV: Using iso-value curves for customer base analysis." *Journal of Marketing Research*, 42(4), 415–430.
- Fader, P. S., & Hardie, B. G. S. (2009). *Probability Models for Customer-Base Analysis*. INFORMS.
- Gupta, S., & Lehmann, D. R. (2003). "Customers as assets." *Journal of Interactive Marketing*, 17(1), 9–24.
- Linoff, G. S., & Berry, M. J. A. (2011). *Data Mining Techniques for Marketing, Sales, and Customer Relationship Management* (3rd ed.). Wiley.
- Pfeifer, P. E., & Carraway, R. L. (2000). "Modeling customer relationships as Markov chains." *Journal of Interactive Marketing*, 14(2), 43–55.
---
## Chapter 37 Appendix: Customer Lifetime Value Derivations
### A1. Geometric Series for Simple CLV
The simple CLV formula sums infinite discounted margins:
$$\text{CLV} = \sum_{t=0}^{\infty} \frac{(m - c) r^t}{(1 + d)^t}$$
Substitute $x = \frac{r}{1 + d}$:
$$\text{CLV} = (m - c) \sum_{t=0}^{\infty} x^t = (m - c) \cdot \frac{1}{1 - x} = \frac{m - c}{1 - \frac{r}{1+d}} = \frac{(m - c)(1 + d)}{1 + d - r}$$
Rearranging slightly:
$$\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}$$
### A2. BG/NBD Likelihood Function
Let $X_i$ denote the number of transactions for customer $i$ observed over period $[0, T]$, with last transaction at time $\tau_i$. Let $\lambda_i$ be the customer's transaction rate (Poisson) and $p_i$ be the dropout probability (Bernoulli after each transaction).
The likelihood for a customer with frequency $f_i > 0$ and recency $r_i$ is:
$$L(\lambda, p | f, r, a) = \lambda e^{-\lambda a} \left( (1-p)^{f-1} p + (1-p)^f \right) \frac{\lambda^f}{f!}$$
Integrating over the Gamma and Beta distributions of $\lambda$ and $p$ across the population yields the BG/NBD model. The resulting customer-level predictions are:
$$P(\text{active} | f, r, a) = \frac{(1 - p) \lambda}{(1 - p) \lambda + p}$$
$$E[X(t) | f, r, a] = \lambda \cdot t \cdot P(\text{active})$$
where $\lambda$ and $p$ are estimated from the population-level parameters via Bayesian update.
### A3. Expected Number of Transactions Formula
The BG/NBD model predicts the expected number of transactions in a future period of length $\Delta t$:
$$E[X(\Delta t) | \text{history}] = \delta \cdot \lambda^* \cdot \Delta t$$
where $\delta = P(\text{active} | \text{history})$ and $\lambda^*$ is the posterior estimate of the transaction rate given the observed history. For a customer with high recency and frequency, $\delta$ is high and $\lambda^*$ is well-estimated; for dormant customers, $\delta$ is near zero.
### A4. Gamma-Gamma Model Derivation
Assume each customer $i$ has a constant expected transaction value $m_i$, and transaction values $v_{i,j}$ are drawn from a Gamma distribution:
$$v_{i,j} | m_i \sim \text{Gamma}(\alpha, \frac{\alpha}{m_i})$$
The conditional mean is $E[v_{i,j} | m_i] = m_i$ and conditional variance is $\text{Var}(v_{i,j} | m_i) = \frac{m_i^2}{\alpha}$.
Further, assume $m_i$ is drawn from a Gamma distribution across customers:
$$m_i \sim \text{Gamma}(p, q)$$
where $p$ is the shape and $q$ is the rate. The likelihood of observing a customer's spending history is:
$$L(v_{i,1}, \ldots, v_{i,f_i} | p, q, \alpha) = \prod_{j=1}^{f_i} \frac{\alpha^{\alpha}}{\Gamma(\alpha)} \left( \frac{\alpha}{m_i} \right)^{\alpha} v_{i,j}^{\alpha - 1} e^{-\frac{\alpha v_{i,j}}{m_i}}$$
Integrating over $m_i$ yields a closed-form posterior for $m_i$ given the observed spending history. The predicted average transaction value for customer $i$ is:
$$\hat{m}_i = E[m_i | \text{history}]$$
This is a weighted average of the customer's observed average spending and the population mean, with weights proportional to the number of transactions and population heterogeneity parameter.
### A5. Simplified CLV = (Expected Transactions) × (Expected Spend)
Combining BG/NBD and Gamma-Gamma:
$$\text{CLV}_{12m} = E[X(12 \text{ months})] \times E[\text{spend per transaction}]$$
Annualized CLV under constant annual margin and retention:
$$\text{CLV}_{\infty} = \frac{\text{CLV}_{12m} \times (1 + r)}{d + (1 - r)}$$
where $r$ is the annual retention rate and $d$ is the discount rate.
---