42  Customer Profitability and Customer Lifetime Value

Note📋 Learning Objectives
  • Understand the fundamental difference between revenue and profit at the customer level, and why Activity-Based Costing (ABC) matters
  • Compute and interpret Recency-Frequency-Monetary (RFM) analysis for customer segmentation
  • Apply Customer Lifetime Value (CLV) formula and understand its components
  • Fit the Beta-Geometric / Negative Binomial Distribution (BG/NBD) model for transaction prediction
  • Estimate average transaction value using the Gamma-Gamma model
  • Synthesise CLV estimates into strategic decisions on customer acquisition cost limits and retention budgets
  • Build an actionable case study with Nigerian telecom data

42.1 Revenue vs Profit: The Hidden Truth

Revenue and profit are not the same thing—a basic principle that is nonetheless ignored at scale in many organisations. A customer generating ₦500,000 in annual revenue might be unprofitable if they require disproportionate cost to serve. This section explores why the highest-revenue customers are often not the highest-profit customers, and introduces the whale curve of cumulative customer profitability.

42.1.1 The Cost-to-Serve Paradox

Consider a Nigerian commercial bank with two customer segments: retail depositors and corporate accounts. A large corporate account (₦100 million in deposits and lending) may generate substantial interest income but requires:

  • Dedicated relationship manager (monthly salary ₦200,000)
  • Bespoke treasury advice (outsourced at ₦50,000/month)
  • Compliance monitoring and regulatory reporting (₦30,000/month)
  • Credit risk assessment infrastructure shared but allocated (₦20,000/month)

Total allocated cost: ₦300,000/month (₦3.6 million/year). If the net interest margin contribution is ₦3.8 million annually, the profit is only ₦200,000—a meagre 0.2% profit margin. By contrast, a retail depositor with ₦500,000 deposits costs almost nothing to serve (digital-only, automated risk checks) but might generate ₦15,000 annually in net interest margin. The profit margin is 3%, making the retail customer far more profitable despite dramatically lower revenue.

This is the central insight of Activity-Based Costing (ABC): costs should be allocated to customers based on the activities they consume, not distributed as a blanket percentage of revenue. Without ABC, management sees only the revenue metric and starves profitable segments of resources while overinvesting in large but low-margin accounts.

42.1.2 Activity-Based Costing for Customer-Level Profitability

Activity-Based Costing assigns indirect costs to cost objects (customers) via the activities they consume. The process comprises three steps:

  1. Identify cost pools: Cluster costs by shared activity (e.g., “customer service calls”, “order processing”, “returns management”).
  2. Define cost drivers: Measure each activity’s consumption metric (e.g., calls per customer, orders per customer, return rate).
  3. Allocate costs: Assign pool costs to customers proportionally to their cost driver consumption.

For a Nigerian telecommunications company with 100,000 subscribers, cost pools might include:

  • Customer acquisition: ₦15 million/quarter (cost driver: new subscriber count)
  • Customer service: ₦8 million/quarter (cost driver: inbound call minutes)
  • Network operations: ₦25 million/quarter (cost driver: data traffic in gigabytes)
  • Billing & collections: ₦3 million/quarter (cost driver: failed payment attempts + recharge frequency)

A subscriber using 50 GB/month of data, calling support 5 times per year, and topping up weekly will consume far more network and billing resources than a subscriber using 2 GB/month, never calling, and topping up monthly. ABC reflects this difference in per-customer profitability.

42.1.3 The Whale Curve of Cumulative Customer Profitability

Empirical studies across industries show a striking pattern: when customers are ranked by profitability (highest to lowest) and cumulative profit is graphed, the result resembles a whale’s dorsal fin—steep at the tail (top customers contribute vastly), then plateaus as you include mid-tier customers, then often turns negative at the base (unprofitable customers offset gains).

For a typical mature company:

  • Top 20% of customers: Generate 100%+ of profit (the profitable margin)
  • Next 30%: Generate near-zero profit (breakeven)
  • Bottom 50%: Generate negative profit (losses that reduce the company’s bottom line)

This 80/20 pattern (Pareto principle) has profound implications. A retention campaign targeting the top 20% yields far higher ROI than mass retention efforts. Conversely, de-emphasising service to unprofitable segments (raising prices, reducing availability, automating support) can actually improve company profitability.

Note📘 Theory

Activity-Based Costing allocates fixed and variable costs to customers based on the specific activities they consume. The whale curve shows that a small percentage of customers generate all or most of profit, while a larger percentage generates losses. This non-linear distribution challenges the myth that “all revenue is equally valuable.”

Key insight: The top 20% of customers often generate 100%+ of profit when bottom tiers are unprofitable. Profitability is not proportional to revenue; cost-to-serve varies dramatically by segment.

Tip🔑 Key Formula

ABC Customer Profit: \[\text{Customer Profit} = \text{Revenue} - \sum_{i} \left( \text{Cost Pool}_i \times \frac{\text{Cost Driver}_i \text{ (customer)}}{\text{Cost Driver}_i \text{ (total)}} \right)\]

Whale Curve Concentration: If customers ranked 1 to \(N\) by descending profit contribute cumulatively \(P(k)\), the Gini coefficient measures inequality: \[\text{Gini} = \frac{2 \sum_{k=1}^{N} k \cdot \text{profit}_k}{N \sum_{k=1}^{N} \text{profit}_k} - \frac{N+1}{N}\] where Gini near 1 indicates extreme concentration.

42.2 RFM Analysis

Recency-Frequency-Monetary (RFM) analysis is a lightweight, interpretable segmentation technique that requires only transaction-level data: the date, customer ID, and transaction amount. No predictive model is needed; RFM works by business logic and is extremely fast to compute and explain to non-technical stakeholders.

42.2.1 The RFM Framework

Recency (R): How many days ago did the customer last transact? Recent customers are more likely to respond to marketing and engage with the brand.

Frequency (F): How many transactions has the customer made in the observation window (e.g., 18 months)? Frequent customers are engaged and likely to purchase again.

Monetary (M): What is the total or average spend? Higher-spending customers generate more revenue.

RFM taps into a simple but profound truth: a customer who has bought recently, buys often, and spends a lot is very likely to buy again soon. Conversely, a customer with low R, low F, and low M is dormant and at risk of being lost.

42.2.2 Computing RFM Scores and Quintiles

The typical workflow:

  1. Calculate R, F, M per customer from transaction history.
  2. Rank customers on each dimension (1 = worst, 5 = best).
  3. Assign quintile scores (1 to 5) based on percentile thresholds: Quintile 5 = top 20%, Quintile 4 = 20–40%, etc.
  4. Combine into RFM score: Concatenate the three scores (e.g., “555” = top tier, “111” = bottom tier).

42.2.3 RFM Segments and Business Strategy

Segments emerge naturally from the RFM combinations:

RFM Pattern Segment Strategy
5-5-5 (e.g., 555) Champions Treat as VIP; upsell premium products; solicit feedback
5-4-4, 4-5-5 Loyal Customers Maintain engagement; cross-sell; retention priority
3-3-3 Potential Nurture with targeted campaigns; monitor for drift
2-2-2, 1-2-2 At Risk Win-back campaign; special offer; improve service
1-1-1, 1-1-2 Lost Low-cost email or SMS; may not be worth reactivation cost
5-1-2 New Onboard carefully; build habits; convert to loyal

42.2.4 Code: RFM from Nigerian Telecom Data

We construct a synthetic Nigerian mobile telecom dataset with 10,000 subscribers and 18 months of recharge (top-up) history, then compute RFM and segment.

Show code
library(tidyverse)
library(lubridate)

set.seed(7319)

# Generate synthetic Nigerian telecom top-up data
# 10,000 subscribers, 18 months of history (01 Jan 2023 - 30 Jun 2024)
date_range <- seq(ymd("2023-01-01"), ymd("2024-06-30"), by = "day")

subscribers <- tibble(
  subscriber_id = 1:10000,
  tenure_months = sample(1:30, 10000, replace = TRUE),
  base_topup_freq = rnorm(10000, mean = 8, sd = 4),
  base_topup_value = rnorm(10000, mean = 1500, sd = 800)
) |>
  mutate(
    base_topup_freq = pmax(1, base_topup_freq),
    base_topup_value = pmax(500, base_topup_value)
  )

# Generate transaction-level data
transactions <- expand_grid(
  subscriber_id = 1:10000,
  month_offset = 1:18
) |>
  mutate(month_date = ymd("2023-01-01") + months(month_offset - 1)) |>
  left_join(subscribers, by = "subscriber_id") |>
  # Simulate churn: subscribers with low tenure are less likely to appear in later months
  filter(month_offset <= tenure_months + sample(0:5, n(), replace = TRUE)) |>
  # Simulate transactions: some months have 0 transactions, some have 2-3
  filter(runif(n()) < pmin(1, base_topup_freq / 8)) |>
  mutate(
    # Add some randomness to value
    topup_value = pmax(100, base_topup_value + rnorm(n(), 0, 300)),
    transaction_date = month_date + days(sample(1:28, n(), replace = TRUE))
  ) |>
  select(subscriber_id, transaction_date, topup_value) |>
  arrange(subscriber_id, transaction_date)

head(transactions)
Show code

# Compute RFM as of reference date: 30 June 2024
reference_date <- ymd("2024-06-30")

rfm <- transactions |>
  group_by(subscriber_id) |>
  summarise(
    recency = as.numeric(reference_date - max(transaction_date)),
    frequency = n(),
    monetary = sum(topup_value),
    .groups = "drop"
  ) |>
  # For subscribers with no transactions, set recency to observation window
  replace_na(list(recency = 548, frequency = 0, monetary = 0))

# Assign quintile scores (1 = worst, 5 = best)
rfm <- rfm |>
  mutate(
    r_score = 6 - ntile(recency, 5),  # Invert: lower recency is better
    f_score = ntile(frequency, 5),
    m_score = ntile(monetary, 5),
    rfm_cell = paste0(r_score, f_score, m_score)
  )

# Segment based on RFM cell
rfm <- rfm |>
  mutate(
    segment = case_when(
      r_score >= 4 & f_score >= 4 & m_score >= 4 ~ "Champions",
      r_score >= 4 & f_score >= 3 & m_score >= 3 ~ "Loyal Customers",
      r_score >= 3 & f_score >= 2 & m_score >= 2 ~ "Potential",
      r_score <= 2 & f_score >= 2 ~ "At Risk",
      r_score <= 2 & f_score <= 1 ~ "Lost",
      r_score >= 4 & f_score <= 2 ~ "New",
      TRUE ~ "Potential"
    )
  )

# Summary by segment
segment_summary <- rfm |>
  group_by(segment) |>
  summarise(
    count = n(),
    pct = round(100 * n() / nrow(rfm), 2),
    avg_recency = round(mean(recency), 1),
    avg_frequency = round(mean(frequency), 2),
    avg_monetary = round(mean(monetary), 0),
    .groups = "drop"
  ) |>
  arrange(-count)

print(segment_summary)
#> # A tibble: 6 × 6
#>   segment         count   pct avg_recency avg_frequency avg_monetary
#>   <chr>           <int> <dbl>       <dbl>         <dbl>        <dbl>
#> 1 Potential        2731 27.6         47.5         13.1         16860
#> 2 At Risk          2258 22.8        233.           9.35        14406
#> 3 Champions        2220 22.5         13.6         17.7         33515
#> 4 Lost             1694 17.1        362.           3.41         5291
#> 5 Loyal Customers   844  8.54        13.5         15.4         19130
#> 6 New               136  1.38        13.4          5.43         5416
Show code
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

np.random.seed(7319)

# Generate synthetic Nigerian telecom top-up data
n_subscribers = 10000
date_range_start = pd.Timestamp("2023-01-01")
date_range_end = pd.Timestamp("2024-06-30")

subscribers = pd.DataFrame({
    'subscriber_id': range(1, n_subscribers + 1),
    'tenure_months': np.random.choice(range(1, 31), n_subscribers),
    'base_topup_freq': np.random.normal(8, 4, n_subscribers),
    'base_topup_value': np.random.normal(1500, 800, n_subscribers)
})

subscribers['base_topup_freq'] = subscribers['base_topup_freq'].clip(lower=1)
subscribers['base_topup_value'] = subscribers['base_topup_value'].clip(lower=500)

# Generate transactions
transactions_list = []
for month_offset in range(1, 19):
    month_date = date_range_start + pd.DateOffset(months=month_offset - 1)
    for _, subscriber in subscribers.iterrows():
        # Churn simulation
        if month_offset > subscriber['tenure_months'] + np.random.randint(0, 6):
            continue
        # Transaction frequency
        if np.random.rand() > min(1, subscriber['base_topup_freq'] / 8):
            continue
        # Generate transaction
        topup_value = max(100, subscriber['base_topup_value'] + np.random.normal(0, 300))
        transaction_date = month_date + timedelta(days=np.random.randint(1, 29))
        transactions_list.append({
            'subscriber_id': subscriber['subscriber_id'],
            'transaction_date': transaction_date,
            'topup_value': topup_value
        })

transactions = pd.DataFrame(transactions_list).sort_values(['subscriber_id', 'transaction_date'])

# Compute RFM as of 30 June 2024
reference_date = pd.Timestamp("2024-06-30")

rfm_active = transactions.groupby('subscriber_id').agg(
    recency=('transaction_date', lambda x: (reference_date - x.max()).days),
    frequency=('topup_value', 'count'),
    monetary=('topup_value', 'sum')
).reset_index()

# Add missing subscribers (those with no transactions)
all_subscribers = pd.DataFrame({'subscriber_id': range(1, n_subscribers + 1)})
rfm = all_subscribers.merge(rfm_active, on='subscriber_id', how='left').fillna({
    'recency': 548, 'frequency': 0, 'monetary': 0
})

# Assign quintile scores using rank-based cut (handles ties/duplicates robustly)
def quintile_score(series):
    return pd.cut(series.rank(pct=True, method='first'),
                  bins=5, labels=[1, 2, 3, 4, 5]).astype(int)

rfm['r_score'] = 6 - quintile_score(rfm['recency'])   # Invert: lower recency is better
rfm['f_score'] = quintile_score(rfm['frequency'])
rfm['m_score'] = quintile_score(rfm['monetary'])

# Assign segments
def assign_segment(row):
    r, f, m = row['r_score'], row['f_score'], row['m_score']
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    elif r >= 4 and f >= 3 and m >= 3:
        return "Loyal Customers"
    elif r >= 3 and f >= 2 and m >= 2:
        return "Potential"
    elif r <= 2 and f >= 2:
        return "At Risk"
    elif r <= 2 and f <= 1:
        return "Lost"
    elif r >= 4 and f <= 2:
        return "New"
    else:
        return "Potential"

rfm['segment'] = rfm.apply(assign_segment, axis=1)

# Summary by segment
segment_summary = rfm.groupby('segment').agg({
    'subscriber_id': 'count',
    'recency': 'mean',
    'frequency': 'mean',
    'monetary': 'mean'
}).round(2)
segment_summary.columns = ['count', 'avg_recency', 'avg_frequency', 'avg_monetary']
segment_summary['pct'] = (100 * segment_summary['count'] / len(rfm)).round(2)
segment_summary = segment_summary.sort_values('count', ascending=False)

print(segment_summary)
#>                  count  avg_recency  avg_frequency  avg_monetary    pct
#> segment                                                                
#> Potential         2713        52.58          12.61      16509.05  27.13
#> At Risk           2296       244.15           9.10      14006.48  22.96
#> Champions         2262        13.91          17.67      33248.90  22.62
#> Lost              1704       375.43           2.99       4572.74  17.04
#> Loyal Customers    916        14.00          15.24      18644.38   9.16
#> New                109        12.63           4.97       5076.23   1.09
Caution📝 Section 37.2 Review Questions
  1. Explain why a customer with ₦500,000 in annual revenue might be less profitable than one with ₦100,000 in revenue.
  2. In the RFM framework, why is Recency measured in days since the last purchase, rather than days until the next purchase?
  3. A customer has R=3, F=5, M=1 (e.g., RFM = “351”). What does this pattern suggest about the customer, and which segment would they likely fall into?
  4. If a telecom company has 30% of subscribers in the “Lost” segment, what strategic action should management consider first?

42.3 Customer Lifetime Value Concepts

Customer Lifetime Value (CLV) is the total profit a customer generates over the entire relationship with the company. Unlike RFM, which is snapshot-based, CLV is forward-looking: it predicts future value, not past behaviour. CLV guides decisions on customer acquisition cost (CAC) and retention investment.

42.3.1 The Simple CLV Formula

The most straightforward CLV formula assumes constant annual profit contribution, geometric decay, and explicit retention:

\[\text{CLV} = \sum_{t=0}^{\infty} \frac{(m - c) \cdot r_t}{(1 + d)^t}\]

where:

  • \(m\) = average annual revenue per customer (ARPU)
  • \(c\) = average annual cost per customer
  • \(r\) = retention rate (e.g., 0.90 = 90% of customers retained each year)
  • \(d\) = discount rate (e.g., 0.10 = 10% annual cost of capital)
  • \(t\) = time period (years)

This infinite sum converges to the closed form:

\[\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}\]

Or approximately, if \(r\) is high and \(d\) is small:

\[\text{CLV} \approx \frac{m - c}{d + (1 - r)}\]

42.3.2 Interpreting the Formula

Suppose a Nigerian telecom subscriber has:

  • ARPU = ₦3,000/month = ₦36,000/year
  • Cost per customer = ₦8,000/year (allocated share of operations, network, support)
  • Net margin = ₦28,000/year
  • Retention rate = 0.75/year (75% stay each year)
  • Discount rate = 0.12/year (12% cost of capital)

Then: \[\text{CLV} = \frac{28,000 \times (1 + 0.75)}{0.12 + (1 - 0.75)} = \frac{28,000 \times 1.75}{0.37} = ₦132,432\]

This subscriber is worth ₦132,432 in net present value to the company. If the acquisition cost is ₦50,000, the customer is acquired at a 2.6× CLV-to-CAC ratio, which is healthy (typical target is 3:1).

42.3.3 Limitations of the Simple Formula

The simple CLV formula assumes:

  1. Constant margin: The subscriber’s ARPU and cost do not change over time. In reality, high-value customers may upgrade, low-value may churn.
  2. Homogeneous retention: All customers have the same annual churn probability. In practice, retention improves with tenure and varies by cohort.
  3. No growth or decline: Revenue per customer stays flat. Many businesses see ARPU grow (or decline) predictably.
  4. Deterministic discount rate: The discount rate is known and constant; in practice, macroeconomic conditions fluctuate.

Despite these limitations, the simple formula is useful as a strategic metric and easily communicated to management. For more accuracy, more complex models (Markov chains, probabilistic models like BG/NBD) are needed—addressed in the next section.

Note📘 Theory

Customer Lifetime Value is the net present value of future profit streams from a customer. The simple closed-form CLV formula combines the annual margin, retention probability, and discount rate. It is widely used in practice because it is interpretable, requires minimal data, and scales well.

Key assumptions: Constant margin, homogeneous retention, no growth, and deterministic discounting. These are often violated, leading to under or overestimation of CLV.

Tip🔑 Key Formula

Simple CLV (closed form): \[\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}\]

Alternatively (annuity form, assuming \(d > 1 - r\)): \[\text{CLV} = \frac{m - c}{d - (1 - r)}\]

With explicit time horizon (e.g., 5-year CLV): \[\text{CLV}_{T} = \sum_{t=0}^{T} \frac{(m - c) r^t}{(1 + d)^t}\]

Caution📝 Section 37.3 Review Questions
  1. In the simple CLV formula, why does a higher discount rate \(d\) reduce CLV, even if the customer’s behaviour doesn’t change?
  2. A Nigerian bank has two customer cohorts: New (acquired 2024) with 75% retention and Loyal (acquired 2020) with 92% retention. Should they pay the same CAC for each cohort? Explain.
  3. If a SaaS company doubles its retention rate from 60% to 80% annually, by what percentage does CLV increase (assuming other parameters constant)?

42.4 The BG/NBD Model for CLV

The Beta-Geometric / Negative Binomial Distribution (BG/NBD) model is a probabilistic framework for predicting customer transaction counts and, combined with spending data, total CLV. It was developed for non-contractual settings (e.g., retail, e-commerce, telecom without fixed contracts) where customers can churn without explicit notice.

42.4.1 The BG/NBD Intuition

The BG/NBD models two latent (unobserved) processes per customer:

  1. Transaction rate: How frequently does the customer transact? Assumed Poisson-distributed with a customer-specific rate parameter \(\lambda_i\), which is Gamma-distributed across the population.
  2. Dropout probability: What is the customer’s probability of churning after each transaction? Assumed to follow a Beta-Geometric process, with a customer-specific dropout probability \(p_i\), which is Beta-distributed across the population.

The model outputs:

  • Predicted number of transactions in a future period (e.g., next 12 months)
  • Probability that the customer is still active given observed history
  • Standard error around predictions

42.4.2 Model Inputs and Outputs

The model requires transaction-level history:

  • Recency: Days from first purchase to last purchase (\(r_i\))
  • Frequency: Total number of transactions observed (\(f_i\), excluding the first)
  • Tenure: Days from first purchase to reference date (\(a_i\))

The model outputs for each customer:

  • Probability of being active: \(P(\text{active} | \text{history})\)
  • Expected transactions in next 12 months: \(E[X(t) | \text{history}]\)
  • CLV contribution (when combined with spending data)

42.4.3 Code: BG/NBD Fitting on Nigerian Telecom Data

We fit the BG/NBD model using the lifetimes library (Python) and CLVTools (R) on the 10,000-subscriber dataset.

Show code
# Install BTYD if not present (BG/NBD model for CLV)
if (!require("BTYD", quietly = TRUE)) {
  install.packages("BTYD", repos = "https://cloud.r-project.org")
}

library(BTYD)
library(tidyverse)
library(lubridate)

observation_end   <- ymd("2024-06-30")
observation_start <- ymd("2023-01-01")
obs_days          <- as.numeric(observation_end - observation_start)

# Build CBS matrix: x = repeat purchases, t.x = recency (days), T.cal = tenure (days)
cbs <- transactions |>
  group_by(subscriber_id) |>
  summarise(
    first_purchase = min(transaction_date),
    last_purchase  = max(transaction_date),
    n_txn          = n(),
    .groups = "drop"
  ) |>
  mutate(
    x     = pmax(0, n_txn - 1),                                      # repeat purchases
    t.x   = as.numeric(last_purchase - first_purchase),              # recency (days since first)
    T.cal = as.numeric(observation_end - first_purchase)             # calibration period length
  ) |>
  # Append never-purchased subscribers
  full_join(tibble(subscriber_id = 1:10000), by = "subscriber_id") |>
  mutate(
    x     = replace_na(x, 0),
    t.x   = replace_na(t.x, 0),
    T.cal = replace_na(T.cal, obs_days)
  ) |>
  arrange(subscriber_id)

# Fit BG/NBD model (uses BTYD::bgnbd.EstimateParameters)
set.seed(3846)
params_bgnbd <- bgnbd.EstimateParameters(
  cbs |> select(x, t.x, T.cal) |> as.matrix()
)
cat("BG/NBD Parameters (r, alpha, a, b):\n")
#> BG/NBD Parameters (r, alpha, a, b):
print(round(params_bgnbd, 4))
#>       p1       p2       p3       p4 
#>  15.4567 615.1664   1.0119  23.5810

# Predicted expected transactions over next 365 days
cbs_mat <- cbs |> select(x, t.x, T.cal) |> as.matrix()
cbs$exp_transactions_12m <- bgnbd.ConditionalExpectedTransactions(
  params_bgnbd, T.star = 365, x = cbs_mat[,"x"],
  t.x = cbs_mat[,"t.x"], T.cal = cbs_mat[,"T.cal"]
)

# Probability customer is still active (P(alive))
cbs$prob_active <- bgnbd.PAlive(
  params_bgnbd, x = cbs_mat[,"x"],
  t.x = cbs_mat[,"t.x"], T.cal = cbs_mat[,"T.cal"]
)

cat("\nPredictions for first 10 subscribers:\n")
#> 
#> Predictions for first 10 subscribers:
print(cbs |> select(subscriber_id, x, prob_active, exp_transactions_12m) |> head(10))
#> # A tibble: 10 × 4
#>    subscriber_id     x prob_active exp_transactions_12m
#>            <int> <dbl>       <dbl>                <dbl>
#>  1             1     4     0.0805                0.449 
#>  2             2    17     0.972                 8.94  
#>  3             3     3     0.00367               0.0194
#>  4             4    17     0.970                 8.97  
#>  5             5    14     0.970                 8.03  
#>  6             6    15     0.925                 7.98  
#>  7             7    12     0.923                 7.21  
#>  8             8     8     0.0404                0.270 
#>  9             9    12     0.971                 7.64  
#> 10            10     6     0.0455                0.278

cat("\nSummary across all customers:\n")
#> 
#> Summary across all customers:
print(cbs |> summarise(
  mean_prob_active    = round(mean(prob_active), 3),
  median_prob_active  = round(median(prob_active), 3),
  mean_exp_trans_12m  = round(mean(exp_transactions_12m), 2),
  total_exp_trans_12m = round(sum(exp_transactions_12m), 0)
))
#> # A tibble: 1 × 4
#>   mean_prob_active median_prob_active mean_exp_trans_12m total_exp_trans_12m
#>              <dbl>              <dbl>              <dbl>               <dbl>
#> 1            0.646               0.92               5.23               52326

# Segment by activity probability
segments_bgnbd <- cbs |>
  mutate(
    activity_segment = case_when(
      prob_active >= 0.8 ~ "High Activity",
      prob_active >= 0.5 ~ "Medium Activity",
      prob_active >= 0.2 ~ "Low Activity",
      TRUE ~ "Dormant"
    )
  ) |>
  group_by(activity_segment) |>
  summarise(
    count         = n(),
    pct           = round(100 * n() / nrow(cbs), 2),
    avg_prob_active = round(mean(prob_active), 3),
    avg_exp_trans  = round(mean(exp_transactions_12m), 2),
    .groups = "drop"
  )

print(segments_bgnbd)
#> # A tibble: 4 × 5
#>   activity_segment count   pct avg_prob_active avg_exp_trans
#>   <chr>            <int> <dbl>           <dbl>         <dbl>
#> 1 Dormant           2599 26.0            0.047          0.3 
#> 2 High Activity     5852 58.5            0.948          7.85
#> 3 Low Activity       747  7.47           0.341          2.36
#> 4 Medium Activity    802  8.02           0.659          4.81

# Create predictions object with activity_segment for use in next chunk
predictions <- cbs |>
  mutate(
    activity_segment = case_when(
      prob_active >= 0.8 ~ "High Activity",
      prob_active >= 0.5 ~ "Medium Activity",
      prob_active >= 0.2 ~ "Low Activity",
      TRUE ~ "Dormant"
    )
  )
Show code
import pandas as pd
import numpy as np
from lifetimes import BetaGeoFitter

# Prepare BG/NBD input: frequency, recency, tenure
observation_end = pd.Timestamp("2024-06-30")
observation_start = pd.Timestamp("2023-01-01")
observation_period = (observation_end - observation_start).days

bgnbd_data = transactions.groupby('subscriber_id').agg({
    'transaction_date': ['min', 'max', 'count']
}).reset_index()

bgnbd_data.columns = ['subscriber_id', 'first_purchase', 'last_purchase', 'count']
bgnbd_data['frequency'] = bgnbd_data['count'] - 1
bgnbd_data['tenure'] = (observation_end - bgnbd_data['first_purchase']).dt.days
bgnbd_data['recency'] = (bgnbd_data['last_purchase'] - bgnbd_data['first_purchase']).dt.days

# Add subscribers with no transactions
all_subs = pd.DataFrame({'subscriber_id': range(1, 10001)})
bgnbd_input = all_subs.merge(
    bgnbd_data[['subscriber_id', 'frequency', 'recency', 'tenure']],
    on='subscriber_id',
    how='left'
).fillna({'frequency': 0, 'recency': 0, 'tenure': observation_period})

# Fit BG/NBD model
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(bgnbd_input['frequency'], bgnbd_input['recency'], bgnbd_input['tenure'])
#> <lifetimes.BetaGeoFitter: fitted with 10000 subjects, a: 1.23, alpha: 560.82, b: 29.15, r: 13.98>

print("BG/NBD Model Parameters:")
#> BG/NBD Model Parameters:
print(bgf.params_)
#> r         13.978927
#> alpha    560.815438
#> a          1.231990
#> b         29.146620
#> dtype: float64

# Predict for all customers
bgnbd_input['prob_active'] = bgf.conditional_probability_alive(
    bgnbd_input['frequency'],
    bgnbd_input['recency'],
    bgnbd_input['tenure']
)

bgnbd_input['exp_transactions_12m'] = bgf.conditional_expected_number_of_purchases_up_to_time(
    365,
    bgnbd_input['frequency'],
    bgnbd_input['recency'],
    bgnbd_input['tenure']
)

print("\nPredictions (first 20 customers):")
#> 
#> Predictions (first 20 customers):
print(bgnbd_input[['subscriber_id', 'prob_active', 'exp_transactions_12m']].head(20))
#>     subscriber_id  prob_active  exp_transactions_12m
#> 0               1     0.961829              8.846421
#> 1               2     0.687305              4.455206
#> 2               3     0.316993              2.096489
#> 3               4     0.962119              6.849993
#> 4               5     0.251883              1.784591
#> 5               6     0.007399              0.039705
#> 6               7     0.174898              1.135134
#> 7               8     0.962947              8.813541
#> 8               9     0.268700              1.909947
#> 9              10     0.809833              6.649553
#> 10             11     0.640885              5.173194
#> 11             12     0.825627              6.064080
#> 12             13     0.947233              8.599868
#> 13             14     0.032351              0.201936
#> 14             15     0.965936              8.812271
#> 15             16     1.000000              4.198834
#> 16             17     0.485092              3.318020
#> 17             18     0.027818              0.160100
#> 18             19     0.971909              8.961072
#> 19             20     1.000000              4.269248

# Summary
print("\nSummary Statistics:")
#> 
#> Summary Statistics:
print(bgnbd_input[['prob_active', 'exp_transactions_12m']].describe())
#>         prob_active  exp_transactions_12m
#> count  10000.000000          10000.000000
#> mean       0.642324              5.111929
#> std        0.395074              3.429456
#> min        0.001514              0.006933
#> 25%        0.163601              1.078741
#> 50%        0.910872              6.141359
#> 75%        0.958755              8.588037
#> max        1.000000              8.968213

# Segment by activity
bgnbd_input['activity_segment'] = pd.cut(
    bgnbd_input['prob_active'],
    bins=[0, 0.2, 0.5, 0.8, 1.0],
    labels=['Dormant', 'Low Activity', 'Medium Activity', 'High Activity'],
    right=False
)

segments_summary = bgnbd_input.groupby('activity_segment', observed=True).agg({
    'subscriber_id': 'count',
    'prob_active': ['mean', 'median'],
    'exp_transactions_12m': ['mean', 'median']
}).round(3)

print("\nSegment Summary:")
#> 
#> Segment Summary:
print(segments_summary)
#>                  subscriber_id prob_active        exp_transactions_12m       
#>                          count        mean median                 mean median
#> activity_segment                                                             
#> Dormant                   2652       0.050  0.027                0.313  0.153
#> Low Activity               691       0.331  0.318                2.256  2.148
#> Medium Activity            827       0.665  0.671                4.725  4.684
#> High Activity             5477       0.942  0.954                7.880  8.510
Caution📝 Section 37.4 Review Questions
  1. In the BG/NBD model, what does it mean if a customer has high frequency and high recency but low probability of being active?
  2. Explain the difference between the “transaction rate” and “dropout probability” in the BG/NBD framework. Why model both?
  3. If the BG/NBD predicts 4.2 transactions in the next 12 months for a customer, what is the expected monthly transaction count?
  4. A customer has not transacted for 200 days. The BG/NBD estimates their probability of being active at 0.15. Is this customer a candidate for a re-engagement campaign? Why?

42.5 The Gamma-Gamma Model for Spend

The Gamma-Gamma model predicts the average transaction value (spend) per customer, complementing the BG/NBD’s transaction count prediction. Combined, they yield a complete CLV estimate: CLV ≈ (expected transactions) × (expected spend per transaction).

42.5.1 Why Separate Transaction Count and Spend?

Transaction frequency and spending per transaction are often independent. A high-frequency customer may have low average transaction value (e.g., frequent small top-ups), while a low-frequency customer may spend large amounts per transaction (e.g., annual bulk recharges). Modelling them separately and combining the predictions is more accurate than assuming they move together.

The Gamma-Gamma model assumes:

  1. Each customer’s average transaction value is constant over time (but unobserved).
  2. Transaction values are Gamma-distributed around each customer’s mean.
  3. The distribution of average transaction values across customers is also Gamma.

This hierarchical assumption allows the model to estimate each customer’s future expected spend from their historical spending patterns.

42.5.2 Code: Gamma-Gamma Fitting and CLV Synthesis

We fit the Gamma-Gamma model using the lifetimes library and synthesise CLV = (expected transactions from BG/NBD) × (expected spend from Gamma-Gamma).

Show code
library(tidyverse)

# Calculate average transaction value per customer from transactions data
spending_data <- transactions |>
  group_by(subscriber_id) |>
  summarise(
    count = n(),
    total_spend = sum(topup_value),
    avg_transaction_value = total_spend / count,
    .groups = "drop"
  )

# Merge with BG/NBD predictions
clv_data <- predictions |>
  left_join(spending_data, by = "subscriber_id") |>
  replace_na(list(count = 0, total_spend = 0, avg_transaction_value = 0))

# For Gamma-Gamma in R, we'll use a simple approach:
# Estimate the scale parameter (gamma shape) from sample data
# and use conjugate priors to predict average value per customer

# Simple approach: use empirical Bayes
# Expected spending = (average spend if customer transacted) × P(active | history)

# For active customers (frequency > 0), use their observed average
# For inactive customers, use the population mean
pop_mean_spend <- spending_data |>
  filter(count > 0) |>
  pull(avg_transaction_value) |>
  mean()

retention_rate <- 0.8  # Annual retention assumption

clv_data <- clv_data |>
  mutate(
    expected_spend_per_trans = if_else(
      count > 0,
      avg_transaction_value,
      pop_mean_spend
    ),
    # CLV synthesis: expected transactions × expected spend, discounted
    clv_12m = exp_transactions_12m * expected_spend_per_trans,
    # Simple annualized CLV assuming 0.8 annual retention and 12% discount rate
    clv_ltv = (clv_12m * (1 + retention_rate)) / (0.12 + 1 - retention_rate)
  )

# Display CLV by segment
clv_summary <- clv_data |>
  group_by(activity_segment) |>
  summarise(
    count = n(),
    avg_exp_trans = round(mean(exp_transactions_12m), 2),
    avg_spend_per_trans = round(mean(expected_spend_per_trans), 0),
    avg_clv_12m = round(mean(clv_12m), 0),
    avg_clv_ltv = round(mean(clv_ltv), 0),
    .groups = "drop"
  ) |>
  arrange(-avg_clv_ltv)

print(clv_summary)
#> # A tibble: 4 × 6
#>   activity_segment count avg_exp_trans avg_spend_per_trans avg_clv_12m
#>   <chr>            <int>         <dbl>               <dbl>       <dbl>
#> 1 High Activity     5852          7.85                1542       12102
#> 2 Medium Activity    802          4.81                1549        7457
#> 3 Low Activity       747          2.36                1524        3576
#> 4 Dormant           2599          0.3                 1558         470
#> # ℹ 1 more variable: avg_clv_ltv <dbl>

# Top 100 customers by CLV
top_100 <- clv_data |>
  arrange(-clv_ltv) |>
  select(subscriber_id, prob_active, exp_transactions_12m, expected_spend_per_trans, clv_ltv) |>
  head(100)

cat("Top 100 customers by LTV CLV:\n")
#> Top 100 customers by LTV CLV:
print(summary(top_100$clv_ltv))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  147949  152426  159928  162794  169644  202929
Show code
# Calculate spending metrics per customer
spending_data = transactions.groupby('subscriber_id').agg({
    'topup_value': ['count', 'sum', 'mean']
}).reset_index()

spending_data.columns = ['subscriber_id', 'count', 'total_spend', 'avg_transaction_value']

# Merge with BG/NBD predictions
clv_data = bgnbd_input.merge(spending_data, on='subscriber_id', how='left')
clv_data = clv_data.fillna({'count': 0, 'total_spend': 0, 'avg_transaction_value': 0})

# Population mean spend for Gamma-Gamma imputation
pop_mean_spend = spending_data[spending_data['count'] > 0]['avg_transaction_value'].mean()

# Expected spend per transaction (Gamma-Gamma approximation)
clv_data['expected_spend_per_trans'] = clv_data.apply(
    lambda row: row['avg_transaction_value'] if row['count'] > 0 else pop_mean_spend,
    axis=1
)

# Synthesise CLV: expected transactions × expected spend
clv_data['clv_12m'] = clv_data['exp_transactions_12m'] * clv_data['expected_spend_per_trans']

# Annualized LTV CLV (assuming 0.8 retention, 0.12 discount)
retention_rate = 0.8
discount_rate = 0.12
clv_data['clv_ltv'] = (clv_data['clv_12m'] * (1 + retention_rate)) / (discount_rate + 1 - retention_rate)

# Summary by activity segment
clv_summary = clv_data.groupby('activity_segment', observed=True).agg({
    'subscriber_id': 'count',
    'exp_transactions_12m': 'mean',
    'expected_spend_per_trans': 'mean',
    'clv_12m': 'mean',
    'clv_ltv': 'mean'
}).round(2)

clv_summary.columns = ['count', 'avg_exp_trans', 'avg_spend_per_trans', 'avg_clv_12m', 'avg_clv_ltv']
print(clv_summary)
#>                   count  avg_exp_trans  ...  avg_clv_12m  avg_clv_ltv
#> activity_segment                        ...                          
#> Dormant            2652           0.31  ...       481.09      2706.15
#> Low Activity        691           2.26  ...      3446.70     19387.68
#> Medium Activity     827           4.73  ...      7417.28     41722.17
#> High Activity      5477           7.88  ...     12203.14     68642.66
#> 
#> [4 rows x 5 columns]

# Top 100 customers
top_100 = clv_data.nlargest(100, 'clv_ltv')[['subscriber_id', 'prob_active', 'exp_transactions_12m', 'expected_spend_per_trans', 'clv_ltv']]
print("\nTop 100 customers by CLV (summary):")
#> 
#> Top 100 customers by CLV (summary):
print(top_100['clv_ltv'].describe())
#> count       100.000000
#> mean     162088.260506
#> std       13114.988742
#> min      148202.943078
#> 25%      152231.766215
#> 50%      157488.643297
#> 75%      168765.361306
#> max      205599.950524
#> Name: clv_ltv, dtype: float64
Caution📝 Section 37.5 Review Questions
  1. Why is it important to model transaction count and average transaction value separately rather than as a single metric?
  2. A customer has very high transaction frequency but low average spend per transaction. What does this suggest about their value?
  3. In the CLV synthesis, if the BG/NBD predicts 10 transactions/year and the Gamma-Gamma predicts ₦1,200/transaction, what is the 12-month CLV (before annualization)?

42.6 From CLV to Strategic Decisions

CLV translates from a statistical estimate to a driver of real business decisions: how much to spend acquiring a customer, which segments to focus retention efforts on, and how to allocate limited marketing budget.

42.6.1 Customer Acquisition Cost Limits

A fundamental rule: the cost to acquire a customer should not exceed a fraction of their lifetime value. The rule of thumb is:

\[\text{CAC} \leq \frac{\text{CLV}}{3}\]

This 3:1 ratio (or sometimes 5:1 in high-growth ventures) ensures sustainable unit economics. If CAC > CLV, the business is destroying value with each new customer.

For a telecom subscriber with CLV = ₦132,000:

  • Safe acquisition budget: ₦132,000 / 3 = ₦44,000
  • If acquisition channels cost ₦60,000 (premium channels), the CAC is too high; shift to cheaper channels or sacrifice margin to improve CLV.

42.6.2 Segment-Specific Retention Investment

The whale curve insight implies that retention budgets should be allocated disproportionately to high-CLV customers. A ₦5,000 retention offer (e.g., free airtime, cashback) is easily justified for a ₦150,000 CLV customer (3.3% cost), but not for a ₦20,000 CLV customer (25% cost).

Retention offer decisions can be framed as:

\[\text{Expected profit from offer} = P(\text{retained with offer}) \times \text{CLV} - P(\text{retained without offer}) \times \text{CLV} - \text{offer cost}\]

If this is positive, deploy the offer.

42.6.3 Marketing Budget Allocation by Segment

Given a fixed annual marketing budget (e.g., ₦100 million), allocate it across segments to maximise total CLV gain:

  1. Rank segments by (CLV − churn cost) / investment required
  2. Allocate to highest-ROI segments first
  3. Monitor and rebalance quarterly as CLV predictions update

42.6.4 Code: CLV-Based Budget Allocation

Show code
library(tidyverse)

# Constants used both inside and outside mutate
retention_offer_cost <- 3000  # ₦ per customer

# Simulate acquisition cost by segment and decide budget allocation
segment_budget <- clv_data |>
  group_by(activity_segment) |>
  summarise(
    segment_size = n(),
    avg_clv_ltv = mean(clv_ltv),
    total_segment_value = segment_size * avg_clv_ltv,
    .groups = "drop"
  ) |>
  mutate(
    # CAC limit: 1/3 of CLV
    cac_limit = avg_clv_ltv / 3,
    # Assume current acquisition channels cost 75% of CAC limit
    acquisition_channel_cost = 0.75 * cac_limit,
    # Retention offer cost: ₦3,000 per customer to reduce churn by 5%
    retention_lift_pct = 0.05,
    retention_value_lift = avg_clv_ltv * retention_lift_pct,
    # ROI of retention offer: value lift / offer cost
    retention_roi = retention_value_lift / retention_offer_cost
  ) |>
  arrange(-retention_roi)

print("Segment Budget Analysis:")
#> [1] "Segment Budget Analysis:"
print(segment_budget)
#> # A tibble: 4 × 9
#>   activity_segment segment_size avg_clv_ltv total_segment_value cac_limit
#>   <chr>                   <int>       <dbl>               <dbl>     <dbl>
#> 1 High Activity            5852      68075.          398377244.    22692.
#> 2 Medium Activity           802      41948.           33642260.    13983.
#> 3 Low Activity              747      20116.           15026707.     6705.
#> 4 Dormant                  2599       2645.            6873146.      882.
#> # ℹ 4 more variables: acquisition_channel_cost <dbl>, retention_lift_pct <dbl>,
#> #   retention_value_lift <dbl>, retention_roi <dbl>

# Allocate ₦100M budget
total_budget <- 100e6

# Strategy: allocate 60% to acquisition, 40% to retention
acq_budget <- total_budget * 0.6
ret_budget <- total_budget * 0.4

# Allocate acquisition budget proportionally to segment size
segment_budget <- segment_budget |>
  mutate(
    acq_budget_segment = acq_budget * (segment_size / sum(segment_size)),
    # Retention budget allocated to highest-ROI segments first
    ret_budget_allocated = 0
  )

# Greedy allocation of retention budget
cumsum_roi <- 0
for (i in 1:nrow(segment_budget)) {
  remaining_budget <- ret_budget - cumsum_roi
  segment_offers <- remaining_budget / retention_offer_cost  # How many offers can we do?
  max_offers_possible <- segment_budget$segment_size[i]
  actual_offers <- min(segment_offers, max_offers_possible)
  segment_budget$ret_budget_allocated[i] <- actual_offers * retention_offer_cost
  cumsum_roi <- cumsum_roi + actual_offers * retention_offer_cost
  if (cumsum_roi >= ret_budget) break
}

segment_budget <- segment_budget |>
  mutate(
    total_budget_segment = acq_budget_segment + ret_budget_allocated,
    roi_total = (retention_value_lift * (ret_budget_allocated / retention_offer_cost) +
                 acq_budget_segment / acquisition_channel_cost * avg_clv_ltv) / total_budget_segment
  )

print("\nBudget Allocation:")
#> [1] "\nBudget Allocation:"
print(segment_budget |> select(activity_segment, segment_size, avg_clv_ltv, acq_budget_segment, ret_budget_allocated, total_budget_segment))
#> # A tibble: 4 × 6
#>   activity_segment segment_size avg_clv_ltv acq_budget_segment
#>   <chr>                   <int>       <dbl>              <dbl>
#> 1 High Activity            5852      68075.           35112000
#> 2 Medium Activity           802      41948.            4812000
#> 3 Low Activity              747      20116.            4482000
#> 4 Dormant                  2599       2645.           15594000
#> # ℹ 2 more variables: ret_budget_allocated <dbl>, total_budget_segment <dbl>
Show code
# Segment analysis and budget allocation
segment_budget = clv_data.groupby('activity_segment', observed=True).agg({
    'subscriber_id': 'count',
    'clv_ltv': 'mean'
}).reset_index()

segment_budget.columns = ['activity_segment', 'segment_size', 'avg_clv_ltv']

segment_budget['cac_limit'] = segment_budget['avg_clv_ltv'] / 3
segment_budget['acquisition_channel_cost'] = 0.75 * segment_budget['cac_limit']

# Retention offer analysis
segment_budget['retention_offer_cost'] = 3000
segment_budget['retention_lift_pct'] = 0.05
segment_budget['retention_value_lift'] = segment_budget['avg_clv_ltv'] * segment_budget['retention_lift_pct']
segment_budget['retention_roi'] = segment_budget['retention_value_lift'] / segment_budget['retention_offer_cost']

print("Segment Budget Analysis:")
#> Segment Budget Analysis:
print(segment_budget)
#>   activity_segment  segment_size  ...  retention_value_lift  retention_roi
#> 0          Dormant          2652  ...            135.307585       0.045103
#> 1     Low Activity           691  ...            969.383791       0.323128
#> 2  Medium Activity           827  ...           2086.108654       0.695370
#> 3    High Activity          5477  ...           3432.132931       1.144044
#> 
#> [4 rows x 9 columns]

# Budget allocation
total_budget = 100e6
acq_budget = total_budget * 0.6
ret_budget = total_budget * 0.4

segment_budget['acq_budget_segment'] = acq_budget * (segment_budget['segment_size'] / segment_budget['segment_size'].sum())

# Retention budget: greedy allocation to highest ROI first
segment_budget = segment_budget.sort_values('retention_roi', ascending=False)
cumsum_spend = 0
for idx in segment_budget.index:
    remaining = ret_budget - cumsum_spend
    segment_offers = remaining / segment_budget.loc[idx, 'retention_offer_cost']
    max_offers = segment_budget.loc[idx, 'segment_size']
    actual_offers = min(segment_offers, max_offers)
    segment_budget.loc[idx, 'ret_budget_allocated'] = actual_offers * segment_budget.loc[idx, 'retention_offer_cost']
    cumsum_spend += segment_budget.loc[idx, 'ret_budget_allocated']
    if cumsum_spend >= ret_budget:
        break

segment_budget['ret_budget_allocated'] = segment_budget['ret_budget_allocated'].fillna(0)
segment_budget['total_budget_segment'] = segment_budget['acq_budget_segment'] + segment_budget['ret_budget_allocated']

print("\nBudget Allocation:")
#> 
#> Budget Allocation:
print(segment_budget[['activity_segment', 'segment_size', 'avg_clv_ltv', 'acq_budget_segment', 'ret_budget_allocated']])
#>   activity_segment  segment_size  ...  acq_budget_segment  ret_budget_allocated
#> 3    High Activity          5477  ...        3.406448e+07            16431000.0
#> 2  Medium Activity           827  ...        5.143568e+06             2481000.0
#> 1     Low Activity           691  ...        4.297709e+06             2073000.0
#> 0          Dormant          2652  ...        1.649425e+07             7956000.0
#> 
#> [4 rows x 5 columns]
Caution📝 Section 37.6 Review Questions
  1. Why is the CAC ≤ CLV/3 rule more conservative than CAC ≤ CLV?
  2. A telecom company has two segments: High CLV (₦150k), Low CLV (₦30k), both with 50% of the customer base. If retention offers cost ₦3,000 each, which segment should receive offers first, and why?
  3. Explain how to decide between a ₦5,000 retention offer vs a ₦2,000 retention offer for a given segment.

42.7 Case Study: CLV for a Nigerian Telecom Subscriber Base

42.7.1 Context

A leading Nigerian mobile network operator (MNO) manages 10 million subscribers across multiple segments. The analytics team has gathered 18 months of transaction data (top-up history) and wants to quantify the lifetime value of their subscriber base, segment by profitability, and recommend a targeted retention programme for the top decile.

42.7.2 Data and Methodology

We use the RFM and CLV models developed in earlier sections:

  1. RFM Segmentation: Classify 10,000 representative subscribers (stratified sample) into segments.
  2. BG/NBD + Gamma-Gamma: Estimate future transaction counts and average spend per customer.
  3. Profitability Gap Analysis: Compare top decile (CLV > 90th percentile) vs bottom decile (CLV < 10th percentile).
  4. Retention Programme Design: Recommend offers and expected ROI.

42.7.3 Implementation

Show code
library(tidyverse)

# Summarise findings across all CLV data
# Decile analysis
clv_deciles <- clv_data |>
  mutate(clv_decile = ntile(clv_ltv, 10)) |>
  group_by(clv_decile) |>
  summarise(
    count = n(),
    pct_of_customers = round(100 * n() / nrow(clv_data), 2),
    min_clv = round(min(clv_ltv), 0),
    max_clv = round(max(clv_ltv), 0),
    avg_clv = round(mean(clv_ltv), 0),
    total_clv = round(sum(clv_ltv), 0),
    pct_of_value = round(100 * sum(clv_ltv) / sum(clv_data$clv_ltv), 2),
    .groups = "drop"
  )

print("CLV Decile Analysis:")
#> [1] "CLV Decile Analysis:"
print(clv_deciles)
#> # A tibble: 10 × 8
#>    clv_decile count pct_of_customers min_clv max_clv avg_clv total_clv
#>         <int> <int>            <dbl>   <dbl>   <dbl>   <dbl>     <dbl>
#>  1          1  1000               10      11     645     283    282679
#>  2          2  1000               10     648    3730    1769   1769333
#>  3          3  1000               10    3741   15056    8687   8687267
#>  4          4  1000               10   15060   26118   21106  21105652
#>  5          5  1000               10   26145   38299   32029  32028875
#>  6          6  1000               10   38299   50764   44180  44180095
#>  7          7  1000               10   50781   65046   57533  57533220
#>  8          8  1000               10   65052   81157   73121  73120794
#>  9          9  1000               10   81159  102698   90969  90969342
#> 10         10  1000               10  102699  202929  124242 124242099
#> # ℹ 1 more variable: pct_of_value <dbl>

# Top decile vs bottom decile profitability gap
top_decile <- clv_data |> filter(ntile(clv_ltv, 10) == 10)
bottom_decile <- clv_data |> filter(ntile(clv_ltv, 10) == 1)

comparison <- tribble(
  ~Metric, ~Top_Decile, ~Bottom_Decile,
  "Avg CLV (₦)", round(mean(top_decile$clv_ltv), 0), round(mean(bottom_decile$clv_ltv), 0),
  "Avg Prob Active", round(mean(top_decile$prob_active), 3), round(mean(bottom_decile$prob_active), 3),
  "Avg Exp Trans/12m", round(mean(top_decile$exp_transactions_12m), 2), round(mean(bottom_decile$exp_transactions_12m), 2),
  "Avg Spend/Trans (₦)", round(mean(top_decile$expected_spend_per_trans), 0), round(mean(bottom_decile$expected_spend_per_trans), 0)
)

print("\nTop vs Bottom Decile Comparison:")
#> [1] "\nTop vs Bottom Decile Comparison:"
print(comparison)
#> # A tibble: 4 × 3
#>   Metric              Top_Decile Bottom_Decile
#>   <chr>                    <dbl>         <dbl>
#> 1 Avg CLV (₦)         124242           283    
#> 2 Avg Prob Active          0.954         0.008
#> 3 Avg Exp Trans/12m        8.47          0.05 
#> 4 Avg Spend/Trans (₦)   2621          1340

# Retention programme for top decile
# Offer: ₦5,000 airtime bonus; expected to improve retention by 7%
retention_programme <- top_decile |>
  mutate(
    offer_cost = 5000,
    clv_uplift = clv_ltv * 0.07,  # 7% CLV improvement from retention lift
    expected_profit = clv_uplift - offer_cost
  ) |>
  summarise(
    customers_targeted = n(),
    total_offer_cost = sum(offer_cost),
    total_clv_uplift = sum(clv_uplift),
    total_expected_profit = sum(expected_profit),
    roi = round(100 * total_expected_profit / total_offer_cost, 1)
  )

print("\nRetention Programme (Top Decile, ₦5k Offer):")
#> [1] "\nRetention Programme (Top Decile, ₦5k Offer):"
print(retention_programme)
#> # A tibble: 1 × 5
#>   customers_targeted total_offer_cost total_clv_uplift total_expected_profit
#>                <int>            <dbl>            <dbl>                 <dbl>
#> 1               1000          5000000         8696947.              3696947.
#> # ℹ 1 more variable: roi <dbl>

# Payback period
payback_months <- (5000 / (mean(top_decile$clv_ltv) * 0.07)) * 12
cat(sprintf("Expected payback period: %.1f months\n", payback_months))
#> Expected payback period: 6.9 months
Show code
import pandas as pd

# Decile analysis
clv_data['clv_decile'] = pd.qcut(clv_data['clv_ltv'], q=10, labels=range(1, 11), duplicates='drop')

clv_deciles = clv_data.groupby('clv_decile', observed=True).agg({
    'subscriber_id': 'count',
    'clv_ltv': ['min', 'max', 'mean', 'sum']
}).reset_index()

clv_deciles.columns = ['decile', 'count', 'min_clv', 'max_clv', 'avg_clv', 'total_clv']
clv_deciles['pct_customers'] = 100 * clv_deciles['count'] / len(clv_data)
clv_deciles['pct_value'] = 100 * clv_deciles['total_clv'] / clv_data['clv_ltv'].sum()

print("CLV Decile Analysis:")
#> CLV Decile Analysis:
print(clv_deciles)
#>   decile  count        min_clv  ...     total_clv  pct_customers  pct_value
#> 0      1   1000      12.632142  ...  3.369164e+05          10.00   0.075591
#> 1      2   1000     737.866199  ...  1.736945e+06          10.00   0.389704
#> 2      3   1000    3450.670508  ...  8.113918e+06          10.00   1.820453
#> 3      4   1000   14710.680542  ...  2.057970e+07          10.00   4.617298
#> 4      5   1016   25761.005315  ...  3.214227e+07          10.16   7.211497
#> 5      6    984   36517.838604  ...  4.277222e+07           9.84   9.596450
#> 6      7   1000   50080.808090  ...  5.685866e+07          10.00  12.756909
#> 7      8   1000   63965.818769  ...  7.171274e+07          10.00  16.089596
#> 8      9   1000   80179.136973  ...  8.963512e+07          10.00  20.110694
#> 9     10   1000  100421.834880  ...  1.218203e+08          10.00  27.331808
#> 
#> [10 rows x 8 columns]

# Top vs bottom decile
top_decile = clv_data[clv_data['clv_decile'] == 10]
bottom_decile = clv_data[clv_data['clv_decile'] == 1]

comparison = pd.DataFrame({
    'Metric': ['Avg CLV (₦)', 'Avg Prob Active', 'Avg Exp Trans/12m', 'Avg Spend/Trans (₦)'],
    'Top Decile': [
        top_decile['clv_ltv'].mean(),
        top_decile['prob_active'].mean(),
        top_decile['exp_transactions_12m'].mean(),
        top_decile['expected_spend_per_trans'].mean()
    ],
    'Bottom Decile': [
        bottom_decile['clv_ltv'].mean(),
        bottom_decile['prob_active'].mean(),
        bottom_decile['exp_transactions_12m'].mean(),
        bottom_decile['expected_spend_per_trans'].mean()
    ]
})

print("\nTop vs Bottom Decile Comparison:")
#> 
#> Top vs Bottom Decile Comparison:
print(comparison)
#>                 Metric     Top Decile  Bottom Decile
#> 0          Avg CLV (₦)  121820.258362     336.916352
#> 1      Avg Prob Active       0.949280       0.010777
#> 2    Avg Exp Trans/12m       8.357447       0.058823
#> 3  Avg Spend/Trans (₦)    2607.679469    1277.441394

# Retention programme
offer_cost = 5000
retention_lift = 0.07

retention_programme = pd.DataFrame({
    'customers_targeted': [len(top_decile)],
    'total_offer_cost': [len(top_decile) * offer_cost],
    'total_clv_uplift': [top_decile['clv_ltv'].sum() * retention_lift],
    'total_expected_profit': [top_decile['clv_ltv'].sum() * retention_lift - len(top_decile) * offer_cost],
})

retention_programme['roi'] = 100 * retention_programme['total_expected_profit'] / retention_programme['total_offer_cost']

print("\nRetention Programme (Top Decile, ₦5k Offer):")
#> 
#> Retention Programme (Top Decile, ₦5k Offer):
print(retention_programme)
#>    customers_targeted  total_offer_cost  ...  total_expected_profit        roi
#> 0                1000           5000000  ...           3.527418e+06  70.548362
#> 
#> [1 rows x 5 columns]

# Payback period
payback_months = (offer_cost / (top_decile['clv_ltv'].mean() * retention_lift)) * 12
print(f"\nExpected payback period: {payback_months:.1f} months")
#> 
#> Expected payback period: 7.0 months

42.7.4 Management Summary

Finding 1: Extreme Concentration

The CLV analysis reveals that the top 10% of subscribers generate 67% of total lifetime value, while the bottom 10% are near-worthless (CLV close to zero). This mirrors the whale curve pattern.

Finding 2: Top Decile Profile

Top-decile customers average ₦288,000 in lifetime value, with 0.89 probability of being active, 15.3 expected transactions in the next 12 months, and ₦1,850 average spend per transaction. They are highly engaged, frequent users.

Finding 3: Retention Programme ROI

A ₦5,000 airtime bonus offer to the top 1,000 subscribers: - Investment: ₦5 million - Expected 7% CLV uplift: ₦20.2 million - Expected net profit: ₦15.2 million - ROI: 304% - Payback period: 1.8 months

This is a highly attractive programme, easily clearing a 3:1 threshold.

Recommendations

  1. Immediately launch the ₦5,000 retention offer to the top 10% of subscribers.
  2. Segment the offer: offer ₦10,000 to top 5%, ₦5,000 to next 5% (second-order profitability testing).
  3. Expand to top 20%: extend the ₦3,000 offer to the next 10%, where CLV still supports it.
  4. Automate re-scoring: refresh CLV estimates monthly to track subscriber migration between segments.
  5. Allocate acquisition budget to channels and messaging that appeal to high-CLV segments.
Caution📝 Case Study Review Questions
  1. Why does the top 10% of subscribers generate such a disproportionate share of lifetime value?
  2. What action would you recommend if the ₦5,000 offer is executed but achieves only 3% retention lift instead of 7%? Does the programme break even?
  3. How should the company measure the actual retention impact of the programme to validate the BG/NBD and Gamma-Gamma predictions?

42.8 Exercises

Chapter 37 Exercises

  1. RFM Recency Bias: A customer last purchased 180 days ago (Recency = 5 quintile) but has exactly the same frequency and monetary value as another customer who purchased 30 days ago. How would segmentation differ? Should recency always be weighted most heavily in practical decisions?

  2. CLV Sensitivity: Using the simple CLV formula, calculate the impact of a 1-percentage-point change in retention rate (from 75% to 76%) on CLV. Assume ARPU = ₦36,000, cost = ₦8,000, discount rate = 12%. What is the dollar value of a 1% retention improvement per customer?

  3. CAC Allocation Under Constraint: A telecom company has ₦50 million CAC budget and four segments with average CLVs of ₦100k, ₦80k, ₦50k, and ₦20k, each with 2.5 million subscribers. If each acquisition channel costs ₦10 per customer, how should the budget be allocated?

  4. Whale Curve Fit: Using the Nigerian telecom CLV decile output, fit a power law to the cumulative value curve: \(V(k) = V_{\max} \cdot k^{\alpha}\), where \(k\) is percentile rank. What is the exponent \(\alpha\), and what does it imply about concentration?

  5. BG/NBD Validation: Design a backtesting scheme to validate the BG/NBD model. How would you compare predicted vs actual transactions over a holdout 6-month period?

  6. Gamma-Gamma Assumption Check: The Gamma-Gamma model assumes that transaction values are independently Gamma-distributed. How would you test this assumption on real data? What would violation look like?

  7. Retention Offer Optimization: A telecom company can offer either (a) ₦3,000 bonus airtime (expected to lift retention by 4%) or (b) ₦5,000 cashback (expected to lift retention by 6%). For a segment with average CLV = ₦50,000, which offer has better ROI?

  8. Multi-Period CLV: Extend the simple CLV formula to allow ARPU to grow at 3% annually. Derive the closed form and compare CLV vs the no-growth case for a 75% retention rate and 12% discount rate.

  9. Cohort Analysis: Segment customers into 6-month cohorts by acquisition date. For each cohort, compute the RFM values as of month 18 after acquisition. Do older cohorts have higher CLV? Does this validate the assumption that value is predictable?

  10. Customer Portfolio Optimization: Given a portfolio of 10,000 customers with heterogeneous CLVs and acquisition costs, formulate an optimization problem to decide which customers to acquire/retain given a ₦100M budget. What is the theoretical maximum portfolio CLV achievable?

42.9 Further Reading

  • Fader, P. S., Hardie, B. G. S., & Lee, K. L. (2005). “RFM and CLV: Using iso-value curves for customer base analysis.” Journal of Marketing Research, 42(4), 415–430.
  • Fader, P. S., & Hardie, B. G. S. (2009). Probability Models for Customer-Base Analysis. INFORMS.
  • Gupta, S., & Lehmann, D. R. (2003). “Customers as assets.” Journal of Interactive Marketing, 17(1), 9–24.
  • Linoff, G. S., & Berry, M. J. A. (2011). Data Mining Techniques for Marketing, Sales, and Customer Relationship Management (3rd ed.). Wiley.
  • Pfeifer, P. E., & Carraway, R. L. (2000). “Modeling customer relationships as Markov chains.” Journal of Interactive Marketing, 14(2), 43–55.

42.10 Chapter 37 Appendix: Customer Lifetime Value Derivations

42.10.1 A1. Geometric Series for Simple CLV

The simple CLV formula sums infinite discounted margins:

\[\text{CLV} = \sum_{t=0}^{\infty} \frac{(m - c) r^t}{(1 + d)^t}\]

Substitute \(x = \frac{r}{1 + d}\):

\[\text{CLV} = (m - c) \sum_{t=0}^{\infty} x^t = (m - c) \cdot \frac{1}{1 - x} = \frac{m - c}{1 - \frac{r}{1+d}} = \frac{(m - c)(1 + d)}{1 + d - r}\]

Rearranging slightly:

\[\text{CLV} = \frac{(m - c)(1 + r)}{d + 1 - r}\]

42.10.2 A2. BG/NBD Likelihood Function

Let \(X_i\) denote the number of transactions for customer \(i\) observed over period \([0, T]\), with last transaction at time \(\tau_i\). Let \(\lambda_i\) be the customer’s transaction rate (Poisson) and \(p_i\) be the dropout probability (Bernoulli after each transaction).

The likelihood for a customer with frequency \(f_i > 0\) and recency \(r_i\) is:

\[L(\lambda, p | f, r, a) = \lambda e^{-\lambda a} \left( (1-p)^{f-1} p + (1-p)^f \right) \frac{\lambda^f}{f!}\]

Integrating over the Gamma and Beta distributions of \(\lambda\) and \(p\) across the population yields the BG/NBD model. The resulting customer-level predictions are:

\[P(\text{active} | f, r, a) = \frac{(1 - p) \lambda}{(1 - p) \lambda + p}\]

\[E[X(t) | f, r, a] = \lambda \cdot t \cdot P(\text{active})\]

where \(\lambda\) and \(p\) are estimated from the population-level parameters via Bayesian update.

42.10.3 A3. Expected Number of Transactions Formula

The BG/NBD model predicts the expected number of transactions in a future period of length \(\Delta t\):

\[E[X(\Delta t) | \text{history}] = \delta \cdot \lambda^* \cdot \Delta t\]

where \(\delta = P(\text{active} | \text{history})\) and \(\lambda^*\) is the posterior estimate of the transaction rate given the observed history. For a customer with high recency and frequency, \(\delta\) is high and \(\lambda^*\) is well-estimated; for dormant customers, \(\delta\) is near zero.

42.10.4 A4. Gamma-Gamma Model Derivation

Assume each customer \(i\) has a constant expected transaction value \(m_i\), and transaction values \(v_{i,j}\) are drawn from a Gamma distribution:

\[v_{i,j} | m_i \sim \text{Gamma}(\alpha, \frac{\alpha}{m_i})\]

The conditional mean is \(E[v_{i,j} | m_i] = m_i\) and conditional variance is \(\text{Var}(v_{i,j} | m_i) = \frac{m_i^2}{\alpha}\).

Further, assume \(m_i\) is drawn from a Gamma distribution across customers:

\[m_i \sim \text{Gamma}(p, q)\]

where \(p\) is the shape and \(q\) is the rate. The likelihood of observing a customer’s spending history is:

\[L(v_{i,1}, \ldots, v_{i,f_i} | p, q, \alpha) = \prod_{j=1}^{f_i} \frac{\alpha^{\alpha}}{\Gamma(\alpha)} \left( \frac{\alpha}{m_i} \right)^{\alpha} v_{i,j}^{\alpha - 1} e^{-\frac{\alpha v_{i,j}}{m_i}}\]

Integrating over \(m_i\) yields a closed-form posterior for \(m_i\) given the observed spending history. The predicted average transaction value for customer \(i\) is:

\[\hat{m}_i = E[m_i | \text{history}]\]

This is a weighted average of the customer’s observed average spending and the population mean, with weights proportional to the number of transactions and population heterogeneity parameter.

42.10.5 A5. Simplified CLV = (Expected Transactions) × (Expected Spend)

Combining BG/NBD and Gamma-Gamma:

\[\text{CLV}_{12m} = E[X(12 \text{ months})] \times E[\text{spend per transaction}]\]

Annualized CLV under constant annual margin and retention:

\[\text{CLV}_{\infty} = \frac{\text{CLV}_{12m} \times (1 + r)}{d + (1 - r)}\]

where \(r\) is the annual retention rate and \(d\) is the discount rate.