Note📋 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand the business problem solved by lead scoring
  • Distinguish demographic, firmographic, and behavioral signals
  • Engineer features for lead scoring from raw CRM data
  • Build calibrated probability models for conversion prediction
  • Select optimal thresholds to maximize business value
  • Create lead tiers aligned to sales team capacity
  • Deploy scoring models and monitor drift with Population Stability Index
  • Integrate scores with CRM and sales workflows
  • Measure the ROI impact of lead scoring systems

22.1 What Is Lead Scoring and Why Does It Matter?

Imagine you run a sales team of eight people at a Nigerian technology company that sells enterprise software to businesses. Every week, your team receives 300 new “leads” — companies and individuals who have expressed some interest in your product: they filled in a contact form, downloaded a brochure, attended a webinar, or were referred by an existing customer.

Your eight salespeople can realistically have deep conversations with about 80 of those leads per week. So every Monday, someone has to decide: which 80 do we call? Most companies solve this problem with gut instinct and seniority. The most experienced salesperson calls the leads that “look right.” The newest recruit gets the leftovers. Weeks pass. The team discovers that some of the leads they called first were never going to buy, while a few of the “low priority” ones turned out to be very interested but were never contacted in time. A competitor called them first.

Lead scoring replaces gut instinct with a statistical model. Instead of guessing which leads are worth pursuing, you train a machine learning model on historical data: leads from the past three years, their characteristics (company size, industry, which pages they visited on your website, whether they opened your emails), and what happened — did they buy, or not? The model learns which combinations of characteristics predict a purchase. For each new lead, it produces a probability score between 0 and 100. Score 89: this company is very likely to buy. Score 12: unlikely. Score 45: worth a follow-up, but not a priority.

The sales team is now prioritised by probability. The eight reps spend their limited hours on the 80 leads most likely to convert — not the 80 that look biggest on paper, or happened to come in first, or whose name a manager recognised. The result: close rates improve, sales cycles shorten, and revenue per rep increases. In competitive B2B markets — whether you are selling ERP software to Lagos manufacturers, banking solutions to Nigerian cooperatives, or agricultural inputs to commercial farms — this kind of systematic prioritisation can be the difference between hitting and missing revenue targets.

Lead scoring is a predictive model that ranks prospective customers by their probability of converting (purchasing, signing a contract, or taking any desired action). It is the application of the classification techniques from Chapters 13–16 to one of the most commercially valuable questions in business: “Which of our prospects is most worth pursuing right now?”

Note📘 Theory: The Business Logic of Lead Scoring

Why not just call everyone? Sales resources are limited. A rep can make 20–30 meaningful calls per day. With 300 incoming leads per week, an 8-person team has time for only about 25% of them. Choosing which 25% requires a prioritisation framework.

Why not use simple rules? Traditional approaches score leads by adding points: “Company size > 500 employees = +10 points; Downloaded whitepaper = +5 points; Job title = Director = +15 points.” These point systems are subjective, difficult to calibrate, and ignore interactions between signals. A Director at a 50-person company may be a better prospect than a junior analyst at a 5,000-person company in some industries but not others. Statistical models capture these nuanced relationships automatically.

The lead scoring pipeline: 1. Collect historical lead data with known outcomes (converted or not converted) 2. Engineer features from CRM data, website analytics, email engagement 3. Train a classification model (logistic regression, XGBoost, or Random Forest) 4. Produce a probability score (0–100) for each new lead 5. Rank all leads by score; sales team works top-down 6. Monitor model performance monthly; retrain quarterly or when market changes

Expected business impact: - Close rate increases 20–30% (effort focuses on warm leads) - Sales cycle shortens 15–25% (qualified leads are closer to a purchase decision) - Revenue per rep increases 30–40% - Customer acquisition cost falls (fewer wasted calls)

Caution📝 Section 17.1 Review Questions
  1. Why can’t a sales team pursue all leads equally?
  2. What is the difference between B2B and B2C lead scoring?
  3. How would you measure the ROI of a lead scoring system?
  4. What signals predict B2B SaaS conversion?

22.2 Data for Lead Scoring: Demographics, Firmographics, Behaviors

Note📘 Theory: Types of Lead Signals

Demographic signals (about the person): - Job title, seniority level, department. - Company location (region bias in purchasing). - Years in role (proxy for influence).

Firmographic signals (about the company): - Industry, sector, subsector. - Company size (employees, revenue). - Funding stage, growth rate. - Location (geographic expansion target). - Technologies used (fit for product).

Behavioral signals (what they’ve done): - Website visits: pages, time spent, repeat visits. - Content engagement: whitepaper downloads, blog views. - Email opens/clicks on marketing campaigns. - Webinar attendance. - Demo requests. - Trial signup and trial usage (page views, features used). - Recency: how recent was their last activity? - Frequency: how often do they engage?

Signal decay and recency: Older signals decay. A demo request 6 months ago is weaker than one 2 weeks ago. Use exponential decay:

\[Signal_{recency} = Signal_{raw} \cdot \exp(-\lambda \cdot DaysSinceActivity)\]

Common λ values: 0.01 (long half-life) to 0.1 (short half-life). For SaaS, 0.05 is typical (half-life ≈ 14 days).

Caution📝 Section 17.2 Review Questions
  1. List five firmographic signals valuable for B2B SaaS lead scoring.
  2. Why should behavioral signals decay over time?
  3. How would you obtain demographic data for leads (e.g., job title, seniority)?
  4. If a lead fills out a demo request, which signal type is that?

22.3 Feature Engineering for Lead Scoring

Show code
library(tidyverse)
library(lubridate)

# Synthetic B2B SaaS lead dataset (Nigerian/Pan-African fintech context)
set.seed(2026)

n_leads <- 1000

leads_raw <- tibble(
  lead_id = paste0("LEAD_", 1:n_leads),
  company_name = sample(c("Company_A", "Company_B", "StartUp_X"), n_leads, replace = TRUE),
  company_size = sample(c("1-10", "11-50", "51-200", "200+"), n_leads, replace = TRUE,
                        prob = c(0.3, 0.35, 0.25, 0.1)),
  sector = sample(c("Fintech", "E-commerce", "Logistics", "Healthcare", "Agriculture"),
                  n_leads, replace = TRUE),
  country = sample(c("Nigeria", "Kenya", "Ghana", "South Africa"), n_leads, replace = TRUE),
  job_title = sample(c("CEO", "CFO", "Operations", "IT Manager", "Accountant"),
                     n_leads, replace = TRUE),

  # Behavioral signals (raw)
  website_visits_30d = rpois(n_leads, lambda = 5),
  emails_opened = sample(0:10, n_leads, replace = TRUE),
  demo_requested = sample(0:1, n_leads, replace = TRUE, prob = c(0.7, 0.3)),
  trial_started = sample(0:1, n_leads, replace = TRUE, prob = c(0.8, 0.2)),

  days_since_signup = sample(1:365, n_leads, replace = TRUE),
  last_engagement_days_ago = sample(1:90, n_leads, replace = TRUE),

  # Target
  converted = sample(0:1, n_leads, replace = TRUE, prob = c(0.85, 0.15))
)

cat("Raw lead dataset (n =", nrow(leads_raw), ")\n")
#> Raw lead dataset (n = 1000 )
print(head(leads_raw, 3))
#> # A tibble: 3 × 13
#>   lead_id company_name company_size sector  country job_title website_visits_30d
#>   <chr>   <chr>        <chr>        <chr>   <chr>   <chr>                  <int>
#> 1 LEAD_1  Company_A    11-50        Agricu… Ghana   CEO                       10
#> 2 LEAD_2  Company_A    11-50        Health… Ghana   CFO                        5
#> 3 LEAD_3  Company_A    11-50        Agricu… Nigeria IT Manag…                  3
#> # ℹ 6 more variables: emails_opened <int>, demo_requested <int>,
#> #   trial_started <int>, days_since_signup <int>,
#> #   last_engagement_days_ago <int>, converted <int>

# Feature engineering
leads_engineered <- leads_raw |>
  mutate(
    # 1. Categorical encoding
    company_size_small = as.integer(company_size %in% c("1-10", "11-50")),
    sector_fintech = as.integer(sector == "Fintech"),
    country_nigeria = as.integer(country == "Nigeria"),
    job_decision_maker = as.integer(job_title %in% c("CEO", "CFO")),

    # 2. Engagement composite
    total_engagements = website_visits_30d + emails_opened + demo_requested + trial_started,

    # 3. Recency decay
    signup_decay = exp(-0.05 * days_since_signup),
    activity_decay = exp(-0.05 * last_engagement_days_ago),

    # 4. Interaction features
    demo_and_trial = demo_requested * trial_started,
    high_visits_and_demo = as.integer((website_visits_30d > 10) & (demo_requested == 1)),

    # 5. Scaled features (for interpretability)
    visits_scaled = scale(website_visits_30d)[,1],
    emails_scaled = scale(emails_opened)[,1]
  ) |>
  select(-c(company_name, company_size, sector, country, job_title))  # Drop raw categoricals

cat("\nEngineered features (first 3 leads):\n")
#> 
#> Engineered features (first 3 leads):
print(leads_engineered |> select(1:15) |> head(3))
#> # A tibble: 3 × 15
#>   lead_id website_visits_30d emails_opened demo_requested trial_started
#>   <chr>                <int>         <int>          <int>         <int>
#> 1 LEAD_1                  10             1              0             0
#> 2 LEAD_2                   5             5              0             1
#> 3 LEAD_3                   3             2              1             1
#> # ℹ 10 more variables: days_since_signup <int>, last_engagement_days_ago <int>,
#> #   converted <int>, company_size_small <int>, sector_fintech <int>,
#> #   country_nigeria <int>, job_decision_maker <int>, total_engagements <int>,
#> #   signup_decay <dbl>, activity_decay <dbl>

# Feature summary
feature_cols <- setdiff(names(leads_engineered), c("lead_id", "converted"))
cat("\n## Feature Summary ##\n")
#> 
#> ## Feature Summary ##
summary_stats <- leads_engineered |>
  select(all_of(feature_cols)) |>
  summarise(across(everything(), list(mean = mean, sd = sd, min = min, max = max)))

cat("Mean values:\n")
#> Mean values:
print(leads_engineered |> select(all_of(feature_cols)) |> colMeans())
#>       website_visits_30d            emails_opened           demo_requested 
#>             5.133000e+00             5.040000e+00             2.900000e-01 
#>            trial_started        days_since_signup last_engagement_days_ago 
#>             2.060000e-01             1.814150e+02             4.493300e+01 
#>       company_size_small           sector_fintech          country_nigeria 
#>             6.590000e-01             2.120000e-01             2.510000e-01 
#>       job_decision_maker        total_engagements             signup_decay 
#>             4.270000e-01             1.066900e+01             5.038766e-02 
#>           activity_decay           demo_and_trial     high_visits_and_demo 
#>             2.183837e-01             6.300000e-02             6.000000e-03 
#>            visits_scaled            emails_scaled 
#>             1.067896e-17            -1.353431e-17

# Conversion rate by key segment
cat("\n## Conversion Rates by Segment ##\n")
#> 
#> ## Conversion Rates by Segment ##
by_demo <- leads_engineered |>
  group_by(demo_requested) |>
  summarise(conversion_rate = mean(converted), n = n())
cat("By Demo Request:\n")
#> By Demo Request:
print(by_demo)
#> # A tibble: 2 × 3
#>   demo_requested conversion_rate     n
#>            <int>           <dbl> <int>
#> 1              0           0.138   710
#> 2              1           0.183   290

by_trial <- leads_engineered |>
  group_by(trial_started) |>
  summarise(conversion_rate = mean(converted), n = n())
cat("By Trial Start:\n")
#> By Trial Start:
print(by_trial)
#> # A tibble: 2 × 3
#>   trial_started conversion_rate     n
#>           <int>           <dbl> <int>
#> 1             0           0.149   794
#> 2             1           0.160   206
Show code
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from datetime import datetime, timedelta

# Synthetic B2B SaaS lead dataset
np.random.seed(2026)

n_leads = 1000

leads_raw = pd.DataFrame({
    'lead_id': [f'LEAD_{i}' for i in range(1, n_leads + 1)],
    'company_size': np.random.choice(['1-10', '11-50', '51-200', '200+'], n_leads, p=[0.3, 0.35, 0.25, 0.1]),
    'sector': np.random.choice(['Fintech', 'E-commerce', 'Logistics', 'Healthcare', 'Agriculture'], n_leads),
    'country': np.random.choice(['Nigeria', 'Kenya', 'Ghana', 'South Africa'], n_leads),
    'job_title': np.random.choice(['CEO', 'CFO', 'Operations', 'IT Manager', 'Accountant'], n_leads),
    'website_visits_30d': np.random.poisson(5, n_leads),
    'emails_opened': np.random.choice(range(11), n_leads),
    'demo_requested': np.random.choice([0, 1], n_leads, p=[0.7, 0.3]),
    'trial_started': np.random.choice([0, 1], n_leads, p=[0.8, 0.2]),
    'days_since_signup': np.random.randint(1, 366, n_leads),
    'last_engagement_days_ago': np.random.randint(1, 91, n_leads),
    'converted': np.random.choice([0, 1], n_leads, p=[0.85, 0.15])
})

print(f"Raw lead dataset (n = {len(leads_raw)})")
#> Raw lead dataset (n = 1000)
print(leads_raw.head(3))
#>   lead_id company_size  ... last_engagement_days_ago converted
#> 0  LEAD_1         1-10  ...                       50         0
#> 1  LEAD_2        11-50  ...                       10         0
#> 2  LEAD_3         200+  ...                       75         0
#> 
#> [3 rows x 12 columns]

# Feature engineering
leads_engineered = leads_raw.copy()

# 1. Categorical encoding
leads_engineered['company_size_small'] = (leads_raw['company_size'].isin(['1-10', '11-50'])).astype(int)
leads_engineered['sector_fintech'] = (leads_raw['sector'] == 'Fintech').astype(int)
leads_engineered['country_nigeria'] = (leads_raw['country'] == 'Nigeria').astype(int)
leads_engineered['job_decision_maker'] = (leads_raw['job_title'].isin(['CEO', 'CFO'])).astype(int)

# 2. Engagement composite
leads_engineered['total_engagements'] = (
    leads_raw['website_visits_30d'] +
    leads_raw['emails_opened'] +
    leads_raw['demo_requested'] +
    leads_raw['trial_started']
)

# 3. Recency decay
leads_engineered['signup_decay'] = np.exp(-0.05 * leads_raw['days_since_signup'])
leads_engineered['activity_decay'] = np.exp(-0.05 * leads_raw['last_engagement_days_ago'])

# 4. Interaction features
leads_engineered['demo_and_trial'] = leads_raw['demo_requested'] * leads_raw['trial_started']
leads_engineered['high_visits_and_demo'] = (
    ((leads_raw['website_visits_30d'] > 10) & (leads_raw['demo_requested'] == 1)).astype(int)
)

# 5. Scaled features
scaler = StandardScaler()
leads_engineered['visits_scaled'] = scaler.fit_transform(leads_raw[['website_visits_30d']])
leads_engineered['emails_scaled'] = scaler.fit_transform(leads_raw[['emails_opened']])

# Drop high-cardinality categorical
leads_engineered = leads_engineered.drop(['company_size', 'sector', 'country', 'job_title'], axis=1)

print("\nEngineered features (first 3 leads):")
#> 
#> Engineered features (first 3 leads):
print(leads_engineered.iloc[:3, :15])
#>   lead_id  website_visits_30d  ...  signup_decay  activity_decay
#> 0  LEAD_1                   6  ...      0.000319        0.082085
#> 1  LEAD_2                   4  ...      0.060810        0.606531
#> 2  LEAD_3                   4  ...      0.012907        0.023518
#> 
#> [3 rows x 15 columns]

# Feature statistics
feature_cols = [col for col in leads_engineered.columns if col not in ['lead_id', 'converted']]
print("\n## Feature Summary ##")
#> 
#> ## Feature Summary ##
print(leads_engineered[feature_cols].describe())
#>        website_visits_30d  emails_opened  ...  visits_scaled  emails_scaled
#> count         1000.000000    1000.000000  ...   1.000000e+03   1.000000e+03
#> mean             4.962000       4.954000  ...   1.101341e-16   6.217249e-17
#> std              2.238653       3.155922  ...   1.000500e+00   1.000500e+00
#> min              0.000000       0.000000  ...  -2.217620e+00  -1.570533e+00
#> 25%              3.000000       2.000000  ...  -8.768584e-01  -9.364864e-01
#> 50%              5.000000       5.000000  ...   1.698299e-02   1.458307e-02
#> 75%              6.000000       8.000000  ...   4.639037e-01   9.656525e-01
#> max             17.000000      10.000000  ...   5.380031e+00   1.599699e+00
#> 
#> [8 rows x 17 columns]

# Conversion by segment
print("\n## Conversion Rates by Segment ##")
#> 
#> ## Conversion Rates by Segment ##
print("By Demo Request:")
#> By Demo Request:
print(leads_engineered.groupby('demo_requested')['converted'].agg(['mean', 'count']))
#>                     mean  count
#> demo_requested                 
#> 0               0.172857    700
#> 1               0.146667    300

print("\nBy Trial Start:")
#> 
#> By Trial Start:
print(leads_engineered.groupby('trial_started')['converted'].agg(['mean', 'count']))
#>                    mean  count
#> trial_started                 
#> 0              0.157431    794
#> 1              0.194175    206
Caution📝 Section 17.3 Review Questions
  1. Why encode company_size as a binary “small” feature instead of using the raw categories?
  2. How would you handle missing values in engagement features (e.g., a lead never opened an email)?
  3. In the decay formula exp(-λ × days), what is the interpretation of λ = 0.05?
  4. Why create interaction features like “high_visits_and_demo”?

22.4 Building the Scoring Model: Logistic Regression with Calibration

Logistic regression is preferred for lead scoring (over black-box models) because: - Coefficients are interpretable (sales team can understand decision drivers). - Probabilities are naturally calibrated (range 0–1, can be trusted). - Fast inference (real-time scoring).

Show code
library(caret)
library(pROC)

# Prepare training data
set.seed(2026)

X <- leads_engineered |>
  select(-c(lead_id, converted))

y <- leads_engineered$converted

# Train-test split
train_idx <- createDataPartition(y, p = 0.7, list = FALSE)[, 1]
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_test <- X[-train_idx, ]
y_test <- y[-train_idx]

# Fit logistic regression
lr_model <- glm(
  y_train ~ .,
  data = cbind(y_train, X_train),
  family = "binomial"
)

cat("Logistic Regression Summary:\n")
#> Logistic Regression Summary:
print(summary(lr_model))
#> 
#> Call:
#> glm(formula = y_train ~ ., family = "binomial", data = cbind(y_train, 
#>     X_train))
#> 
#> Coefficients: (3 not defined because of singularities)
#>                            Estimate Std. Error z value Pr(>|z|)
#> (Intercept)              -0.9832803  0.7491338  -1.313    0.189
#> website_visits_30d       -0.0002245  0.0464735  -0.005    0.996
#> emails_opened             0.0099712  0.0337840   0.295    0.768
#> demo_requested            0.2955299  0.2571299   1.149    0.250
#> trial_started            -0.3793661  0.3829546  -0.991    0.322
#> days_since_signup        -0.0003838  0.0012054  -0.318    0.750
#> last_engagement_days_ago -0.0078371  0.0090160  -0.869    0.385
#> company_size_small       -0.1620355  0.2225909  -0.728    0.467
#> sector_fintech           -0.0048754  0.2660172  -0.018    0.985
#> country_nigeria          -0.1288782  0.2477831  -0.520    0.603
#> job_decision_maker       -0.0664834  0.2187345  -0.304    0.761
#> total_engagements                NA         NA      NA       NA
#> signup_decay              0.0225534  0.8174623   0.028    0.978
#> activity_decay           -1.4843283  0.9971940  -1.489    0.137
#> demo_and_trial            0.8348795  0.5593666   1.493    0.136
#> high_visits_and_demo      0.2559815  1.2455829   0.206    0.837
#> visits_scaled                    NA         NA      NA       NA
#> emails_scaled                    NA         NA      NA       NA
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 591.79  on 699  degrees of freedom
#> Residual deviance: 580.30  on 685  degrees of freedom
#> AIC: 610.3
#> 
#> Number of Fisher Scoring iterations: 4

# Predictions
pred_train <- predict(lr_model, cbind(y_train = 0, X_train), type = "response")
pred_test <- predict(lr_model, cbind(y_test = 0, X_test), type = "response")

# Evaluation
auc_train <- auc(y_train, pred_train)
auc_test <- auc(y_test, pred_test)

cat("\n## Model Performance ##\n")
#> 
#> ## Model Performance ##
cat("Train AUC:", round(as.numeric(auc_train), 4), "\n")
#> Train AUC: 0.5918
cat("Test AUC:", round(as.numeric(auc_test), 4), "\n")
#> Test AUC: 0.5188

# Calibration curve
cal_df <- data.frame(
  pred = pred_test,
  actual = y_test
) |>
  mutate(
    bin = cut(pred, breaks = seq(0, 1, 0.1), include.lowest = TRUE)
  ) |>
  group_by(bin) |>
  summarise(
    mean_pred = mean(pred),
    actual_rate = mean(actual),
    n = n()
  )

ggplot(cal_df, aes(x = mean_pred, y = actual_rate)) +
  geom_point(aes(size = n), alpha = 0.6) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Calibration Curve",
       x = "Predicted Probability", y = "Actual Conversion Rate",
       size = "# Leads") +
  xlim(0, 1) + ylim(0, 1) +
  theme_minimal()

Show code
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve, auc, brier_score_loss

# Prepare data
feature_cols = [col for col in leads_engineered.columns if col not in ['lead_id', 'converted']]
X = leads_engineered[feature_cols]
y = leads_engineered['converted']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2026)

# Fit logistic regression
lr_model = LogisticRegression(max_iter=1000, random_state=2026)
lr_model.fit(X_train, y_train)
LogisticRegression(max_iter=1000, random_state=2026)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Show code

# Predictions
pred_train = lr_model.predict_proba(X_train)[:, 1]
pred_test = lr_model.predict_proba(X_test)[:, 1]

# Evaluation
auc_train = roc_auc_score(y_train, pred_train)
auc_test = roc_auc_score(y_test, pred_test)
brier_test = brier_score_loss(y_test, pred_test)

print("## Logistic Regression Model ##")
#> ## Logistic Regression Model ##
print(f"Train AUC: {auc_train:.4f}")
#> Train AUC: 0.6073
print(f"Test AUC: {auc_test:.4f}")
#> Test AUC: 0.4785
print(f"Brier Score (Test): {brier_test:.4f}")
#> Brier Score (Test): 0.1365
print()

# Feature importance (coefficients)
coef_df = pd.DataFrame({
    'Feature': feature_cols,
    'Coefficient': lr_model.coef_[0],
    'Odds Ratio': np.exp(lr_model.coef_[0])
}).sort_values('Coefficient', key=abs, ascending=False)

print("Top Feature Coefficients:")
#> Top Feature Coefficients:
print(coef_df.head(10))
#>                  Feature  Coefficient  Odds Ratio
#> 13        demo_and_trial     0.601301    1.824490
#> 7         sector_fintech     0.456052    1.577832
#> 11          signup_decay    -0.342743    0.709820
#> 14  high_visits_and_demo    -0.309695    0.733671
#> 2         demo_requested    -0.257534    0.772956
#> 6     company_size_small    -0.234161    0.791235
#> 12        activity_decay     0.227099    1.254954
#> 9     job_decision_maker     0.173603    1.189584
#> 15         visits_scaled     0.141755    1.152294
#> 16         emails_scaled     0.110262    1.116571
print()

# Calibration curve
from sklearn.calibration import calibration_curve

prob_true, prob_pred = calibration_curve(y_test, pred_test, n_bins=10)

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot([0, 1], [0, 1], 'k--', label='Perfect Calibration')
ax.plot(prob_pred, prob_true, 'o-', linewidth=2, markersize=8, label='Logistic Regression')
ax.set_xlabel('Predicted Probability')
ax.set_ylabel('Actual Conversion Rate')
ax.set_title('Calibration Curve: Lead Scoring Model')
ax.legend()
ax.grid(alpha=0.3)
ax.set_xlim([0, 1])
#> (0.0, 1.0)
ax.set_ylim([0, 1])
#> (0.0, 1.0)
plt.tight_layout()
plt.savefig('calibration_curve.png', dpi=100, bbox_inches='tight')
plt.show()

Caution📝 Section 17.4 Review Questions
  1. What does a calibration curve show? Why is it important?
  2. If predicted probability is 0.4 but actual conversion rate is 0.6, what does this indicate?
  3. How would you use feature coefficients to explain to a sales leader why a lead is prioritized?
  4. Why is logistic regression preferred over XGBoost for lead scoring?

22.5 Threshold and Tier Selection: Converting Probability to Action

A predicted probability (0.3, 0.5, 0.7) is not directly actionable. We must convert it to a tier (Hot/Warm/Cold) with a corresponding action (sales call, nurture email, etc.).

Note📘 Theory: Threshold Selection for Business Value

Cost-benefit framework: - True Positive (Sales pursues, lead converts): +\(X revenue. - False Positive (Sales pursues, lead doesn't convert): -\)Y cost (sales time). - False Negative (Sales doesn’t pursue, lead converts): -$Z opportunity cost. - True Negative (Sales doesn’t pursue, lead doesn’t convert): $0 (no cost, no benefit).

Expected value per prediction at threshold t: \[EV(t) = P(\text{TP}|t) \cdot X + P(\text{FP}|t) \cdot (-Y) + P(\text{FN}|t) \cdot (-Z)\]

Choose t to maximize expected value.

In practice: 1. Estimate costs from business (what’s the cost of a sales call? typical deal value?). 2. Compute expected value across thresholds using confusion matrices. 3. Select threshold that maximizes EV. 4. Define tiers using evenly-spaced probabilities or cost-based thresholds.

Precision-recall trade-off: - High threshold: Few leads flagged as Hot (high precision, low recall). Sales team doesn’t get false alarms but misses opportunities. - Low threshold: Many leads flagged as Hot (low precision, high recall). Sales pursues more but wastes time on unlikely converts.

Tip🔑 Key Formula: Expected Value Optimization

\[\text{Optimal Threshold} = \arg\max_t \left[ P(\text{TP}|t) \cdot Value - P(\text{FP}|t) \cdot Cost \right]\]

22.5.1 Threshold Selection and Tier Assignment

Show code
# Threshold selection based on ROI

# Assume costs/values
deal_value <- 100000      # ₦100,000 average deal
sales_cost_per_hour <- 500  # ₦500/hr
hours_per_lead <- 5       # 5 hours to qualify a lead
total_sales_cost <- sales_cost_per_hour * hours_per_lead  # ₦2,500

# Compute expected value across thresholds
thresholds <- seq(0.1, 0.9, by = 0.05)
ev_results <- data.frame()

get_cm <- function(cm, row, col) {
  if (row %in% rownames(cm) && col %in% colnames(cm)) cm[row, col] else 0L
}

for (t in thresholds) {
  pred_class <- as.integer(pred_test >= t)

  cm <- table(Actual = y_test, Predicted = pred_class)
  tn <- get_cm(cm, "0", "0")
  fp <- get_cm(cm, "0", "1")
  fn <- get_cm(cm, "1", "0")
  tp <- get_cm(cm, "1", "1")

  # Expected value
  ev <- (tp * deal_value) - (fp * total_sales_cost) - (fn * deal_value * 0.1)  # 10% opportunity cost

  ev_results <- rbind(ev_results, data.frame(
    threshold = t,
    tp = tp,
    fp = fp,
    fn = fn,
    precision = tp / (tp + fp + 0.001),
    recall = tp / (tp + fn + 0.001),
    ev = ev
  ))
}

cat("Threshold Optimization Results:\n")
#> Threshold Optimization Results:
print(ev_results |> arrange(desc(ev)) |> head(5))
#>   threshold tp  fp fn precision     recall      ev
#> 1      0.10 41 221  5 0.1564880 0.89128497 3497500
#> 2      0.15 18 103 28 0.1487591 0.39129584 1262500
#> 3      0.20  4  33 42 0.1081052 0.08695463 -102500
#> 4      0.25  2  13 44 0.1333244 0.04347732 -272500
#> 5      0.35  0   0 46 0.0000000 0.00000000 -460000

optimal_threshold <- ev_results$threshold[which.max(ev_results$ev)]
cat("\nOptimal threshold:", optimal_threshold, "\n")
#> 
#> Optimal threshold: 0.1

# Define tiers based on threshold
leads_with_scores <- data.frame(
  lead_id = leads_engineered$lead_id[-train_idx],
  predicted_prob = pred_test,
  actual = y_test
) |>
  mutate(
    tier = case_when(
      predicted_prob >= optimal_threshold ~ "Hot",
      predicted_prob >= optimal_threshold * 0.6 ~ "Warm",
      TRUE ~ "Cold"
    )
  )

# Tier statistics
cat("\n## Lead Tier Distribution and Performance ##\n")
#> 
#> ## Lead Tier Distribution and Performance ##
tier_stats <- leads_with_scores |>
  group_by(tier) |>
  summarise(
    n_leads = n(),
    pct_total = n() / nrow(leads_with_scores) * 100,
    conversion_rate = mean(actual),
    avg_score = mean(predicted_prob)
  )

print(tier_stats)
#> # A tibble: 3 × 5
#>   tier  n_leads pct_total conversion_rate avg_score
#>   <chr>   <int>     <dbl>           <dbl>     <dbl>
#> 1 Cold        3       1             0        0.0545
#> 2 Hot       262      87.3           0.156    0.158 
#> 3 Warm       35      11.7           0.143    0.0828

# Expected revenue
cat("\n## Expected Revenue by Tier ##\n")
#> 
#> ## Expected Revenue by Tier ##
revenue_by_tier <- tier_stats |>
  mutate(
    expected_conversions = n_leads * conversion_rate,
    expected_revenue = expected_conversions * deal_value,
    investigation_cost = n_leads * total_sales_cost,
    net_revenue = expected_revenue - investigation_cost
  )

print(revenue_by_tier |> select(tier, expected_conversions, expected_revenue, investigation_cost, net_revenue))
#> # A tibble: 3 × 5
#>   tier  expected_conversions expected_revenue investigation_cost net_revenue
#>   <chr>                <dbl>            <dbl>              <dbl>       <dbl>
#> 1 Cold                     0                0               7500       -7500
#> 2 Hot                     41          4100000             655000     3445000
#> 3 Warm                     5           500000              87500      412500
Show code
# Threshold selection based on cost-benefit

from sklearn.metrics import confusion_matrix

# Business parameters
deal_value = 100000  # ₦100,000
sales_cost_per_hour = 500
hours_per_lead = 5
total_sales_cost = sales_cost_per_hour * hours_per_lead  # ₦2,500

# Compute EV across thresholds
thresholds = np.arange(0.1, 0.95, 0.05)
ev_results = []

for t in thresholds:
    pred_class = (pred_test >= t).astype(int)

    cm = confusion_matrix(y_test, pred_class, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()

    # Expected value
    ev = (tp * deal_value) - (fp * total_sales_cost) - (fn * deal_value * 0.1)

    precision = tp / (tp + fp + 0.001)
    recall = tp / (tp + fn + 0.001)

    ev_results.append({
        'threshold': t,
        'tp': tp,
        'fp': fp,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'ev': ev
    })

ev_df = pd.DataFrame(ev_results)
print("Threshold Optimization Results (Top 5):")
#> Threshold Optimization Results (Top 5):
print(ev_df.nlargest(5, 'ev')[['threshold', 'tp', 'fp', 'ev']])
#>    threshold  tp   fp         ev
#> 0       0.10  45  240  3870000.0
#> 1       0.15  23  153  1667500.0
#> 2       0.20  11   49   607500.0
#> 3       0.25   5   19    22500.0
#> 5       0.35   0    0  -480000.0

optimal_threshold = ev_df.loc[ev_df['ev'].idxmax(), 'threshold']
print(f"\nOptimal Threshold: {optimal_threshold:.2f}")
#> 
#> Optimal Threshold: 0.10

# Assign tiers
leads_with_scores = pd.DataFrame({
    'lead_id': leads_engineered.iloc[-len(y_test):]['lead_id'].values,
    'predicted_prob': pred_test,
    'actual': y_test.values
})

leads_with_scores['tier'] = pd.cut(
    leads_with_scores['predicted_prob'],
    bins=[0, optimal_threshold * 0.6, optimal_threshold, 1.0],
    labels=['Cold', 'Warm', 'Hot'],
    include_lowest=True
)

print("\n## Lead Tier Distribution and Performance ##")
#> 
#> ## Lead Tier Distribution and Performance ##
tier_stats = leads_with_scores.groupby('tier').agg({
    'predicted_prob': 'count',
    'actual': 'mean'
}).rename(columns={'predicted_prob': 'n_leads', 'actual': 'conversion_rate'})

tier_stats['pct_total'] = tier_stats['n_leads'] / len(leads_with_scores) * 100
tier_stats = tier_stats.reset_index()

print(tier_stats)
#>    tier  n_leads  conversion_rate  pct_total
#> 0  Cold        0              NaN        0.0
#> 1  Warm       15         0.200000        5.0
#> 2   Hot      285         0.157895       95.0

# Revenue impact
print("\n## Expected Revenue by Tier ##")
#> 
#> ## Expected Revenue by Tier ##
revenue_df = tier_stats.copy()
revenue_df['expected_conversions'] = revenue_df['n_leads'] * revenue_df['conversion_rate']
revenue_df['expected_revenue'] = revenue_df['expected_conversions'] * deal_value
revenue_df['investigation_cost'] = revenue_df['n_leads'] * total_sales_cost
revenue_df['net_revenue'] = revenue_df['expected_revenue'] - revenue_df['investigation_cost']

print(revenue_df[['tier', 'expected_conversions', 'expected_revenue', 'investigation_cost', 'net_revenue']])
#>    tier  expected_conversions  ...  investigation_cost  net_revenue
#> 0  Cold                   NaN  ...                   0          NaN
#> 1  Warm                   3.0  ...               37500     262500.0
#> 2   Hot                  45.0  ...              712500    3787500.0
#> 
#> [3 rows x 5 columns]
Caution📝 Section 17.5 Review Questions
  1. In the cost-benefit framework, what is “opportunity cost”? How would you estimate it?
  2. If sales cost per lead is ₦2,500 but deal value is ₦100,000, why pursue any low-probability leads?
  3. How would you adjust tier thresholds if the sales team’s capacity doubles?
  4. What is the “precision-recall trade-off” in lead scoring?

22.6 Deploying and Monitoring Lead Scores

Once in production, scores drift as market conditions change. Population Stability Index (PSI) detects drift.

Note📘 Theory: Population Stability Index (PSI)

Motivation: A model trained on 2026 leads may not predict 2027 leads if buyer behavior shifts. We monitor this with PSI.

Definition: \[PSI = \sum_{i=1}^{n} (P_i^{actual} - P_i^{expected}) \log\left(\frac{P_i^{actual}}{P_i^{expected}}\right)\]

where: - Divide predictions into n bins (e.g., deciles). - \(P_i^{expected}\) is the frequency of bin i in training data. - \(P_i^{actual}\) is the frequency of bin i in recent data (e.g., last month).

Interpretation: - PSI < 0.1: No significant shift (all is well). - PSI 0.1–0.25: Small shift (monitor). - PSI > 0.25: Large shift (retrain model).

Deployment workflow: 1. Train model on historical data. 2. Score all new leads daily/weekly. 3. Compute PSI monthly. 4. If PSI > 0.25, flag for retraining. 5. Retrain on latest data (last 6–12 months).

22.6.1 Monitoring Script

Show code
# Population Stability Index

compute_psi <- function(expected, actual, bins = 10) {
  # Binning
  breaks <- quantile(expected, probs = seq(0, 1, 1/bins))
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf

  expected_bin <- cut(expected, breaks = breaks)
  actual_bin <- cut(actual, breaks = breaks)

  # Frequencies
  expected_freq <- table(expected_bin) / length(expected)
  actual_freq <- table(actual_bin) / length(actual)

  # Ensure same bins
  all_bins <- union(names(expected_freq), names(actual_freq))
  expected_freq <- expected_freq[all_bins]
  actual_freq <- actual_freq[all_bins]
  expected_freq[is.na(expected_freq) | expected_freq == 0] <- 0.0001
  actual_freq[is.na(actual_freq)     | actual_freq   == 0] <- 0.0001

  # PSI
  psi <- sum((actual_freq - expected_freq) * log(actual_freq / expected_freq))
  return(psi)
}

# Simulate model drift: 2026 baseline vs. 2027 data
set.seed(2026)
baseline_scores <- pred_test  # 2026 data

# Simulate 2027 data with drift (leads are easier/harder to convert)
drift_scores <- baseline_scores + rnorm(length(baseline_scores), mean = 0.1, sd = 0.05)
drift_scores <- pmax(0, pmin(1, drift_scores))  # Bound to [0, 1]

# Compute PSI
psi_value <- compute_psi(baseline_scores, drift_scores)

cat("=== MONITORING: POPULATION STABILITY INDEX ===\n\n")
#> === MONITORING: POPULATION STABILITY INDEX ===
cat("Baseline (2026) mean score:", round(mean(baseline_scores), 4), "\n")
#> Baseline (2026) mean score: 0.1479
cat("Recent (2027) mean score:", round(mean(drift_scores), 4), "\n")
#> Recent (2027) mean score: 0.25
cat("PSI:", round(psi_value, 4), "\n\n")
#> PSI: 3.1917

if (psi_value < 0.1) {
  status <- "✓ No drift (PSI < 0.1)"
} else if (psi_value < 0.25) {
  status <- "⚠ Minor drift (0.1 ≤ PSI < 0.25): Monitor"
} else {
  status <- "✗ Major drift (PSI ≥ 0.25): Retrain recommended"
}

cat("Status:", status, "\n")
#> Status: ✗ Major drift (PSI ≥ 0.25): Retrain recommended
Show code
def compute_psi(expected, actual, bins=10):
    """Compute Population Stability Index."""
    # Binning
    breaks = np.percentile(expected, np.linspace(0, 100, bins + 1))
    breaks[0] = -np.inf
    breaks[-1] = np.inf

    expected_bin = np.digitize(expected, breaks)
    actual_bin = np.digitize(actual, breaks)

    # Frequencies
    expected_freq = np.bincount(expected_bin) / len(expected)
    actual_freq = np.bincount(actual_bin) / len(actual)

    # Ensure same length
    max_len = max(len(expected_freq), len(actual_freq))
    expected_freq = np.pad(expected_freq, (0, max_len - len(expected_freq)), constant_values=0.0001)
    actual_freq = np.pad(actual_freq, (0, max_len - len(actual_freq)), constant_values=0.0001)

    # Replace zeros with small epsilon to avoid log(0)
    expected_freq = np.where(expected_freq == 0, 0.0001, expected_freq)
    actual_freq   = np.where(actual_freq   == 0, 0.0001, actual_freq)

    # PSI
    psi = np.sum((actual_freq - expected_freq) * np.log(actual_freq / expected_freq))
    return psi

# Simulate drift
np.random.seed(2026)
baseline_scores = pred_test  # 2026 baseline

# 2027 data with drift
drift_scores = baseline_scores + np.random.normal(0.1, 0.05, len(baseline_scores))
drift_scores = np.clip(drift_scores, 0, 1)

# Compute PSI
psi_value = compute_psi(baseline_scores, drift_scores)

print("=== MONITORING: POPULATION STABILITY INDEX ===\n")
#> === MONITORING: POPULATION STABILITY INDEX ===
print(f"Baseline (2026) mean score: {baseline_scores.mean():.4f}")
#> Baseline (2026) mean score: 0.1663
print(f"Recent (2027) mean score: {drift_scores.mean():.4f}")
#> Recent (2027) mean score: 0.2695
print(f"PSI: {psi_value:.4f}\n")
#> PSI: 2.3309

if psi_value < 0.1:
    status = "✓ No drift (PSI < 0.1)"
elif psi_value < 0.25:
    status = "⚠ Minor drift (0.1 ≤ PSI < 0.25): Monitor"
else:
    status = "✗ Major drift (PSI ≥ 0.25): Retrain recommended"

print(f"Status: {status}")
#> Status: ✗ Major drift (PSI ≥ 0.25): Retrain recommended
Caution📝 Section 17.6 Review Questions
  1. What are typical causes of score drift in lead scoring?
  2. If PSI = 0.15, what action would you take?
  3. How frequently should you monitor PSI in production?
  4. What data should you use to retrain after drift detection?

22.7 Case Study: B2B Lead Scoring for a Pan-African Fintech

Complete pipeline: data preprocessing → feature engineering → model training → threshold tuning → business recommendations.

Show code
cat("=== CASE STUDY: B2B LEAD SCORING FOR PAN-AFRICAN FINTECH ===\n\n")
#> === CASE STUDY: B2B LEAD SCORING FOR PAN-AFRICAN FINTECH ===

cat("## Step 1: Data Summary ##\n")
#> ## Step 1: Data Summary ##
cat("Total leads:", nrow(leads_engineered), "\n")
#> Total leads: 1000
cat("Conversions:", sum(leads_engineered$converted), "\n")
#> Conversions: 151
cat("Base conversion rate:", round(mean(leads_engineered$converted), 4), "\n\n")
#> Base conversion rate: 0.151

cat("## Step 2: Model Performance ##\n")
#> ## Step 2: Model Performance ##
cat("Test AUC:", round(as.numeric(auc_test), 4), "\n")
#> Test AUC: 0.5188
cat("Brier Score:", round(mean((pred_test - y_test)^2), 4), "\n\n")
#> Brier Score: 0.1329

# Top features
lr_coefs <- coef(lr_model)[-1]  # Exclude intercept
top_features <- names(sort(abs(lr_coefs), decreasing = TRUE)[1:5])

cat("## Step 3: Top 5 Most Predictive Features ##\n")
#> ## Step 3: Top 5 Most Predictive Features ##
for (feat in top_features) {
  cat(sprintf("%s: %+.4f (Odds Ratio: %.2f)\n",
              feat,
              lr_coefs[feat],
              exp(lr_coefs[feat])))
}
#> activity_decay: -1.4843 (Odds Ratio: 0.23)
#> demo_and_trial: +0.8349 (Odds Ratio: 2.30)
#> trial_started: -0.3794 (Odds Ratio: 0.68)
#> demo_requested: +0.2955 (Odds Ratio: 1.34)
#> high_visits_and_demo: +0.2560 (Odds Ratio: 1.29)

cat("\n## Step 4: Lead Tier Analysis ##\n")
#> 
#> ## Step 4: Lead Tier Analysis ##
print(tier_stats)
#> # A tibble: 3 × 5
#>   tier  n_leads pct_total conversion_rate avg_score
#>   <chr>   <int>     <dbl>           <dbl>     <dbl>
#> 1 Cold        3       1             0        0.0545
#> 2 Hot       262      87.3           0.156    0.158 
#> 3 Warm       35      11.7           0.143    0.0828

cat("\n## Step 5: Business Recommendations ##\n")
#> 
#> ## Step 5: Business Recommendations ##

# Top Hot leads for immediate pursuit
top_hot_leads <- leads_with_scores |>
  filter(tier == "Hot") |>
  arrange(desc(predicted_prob)) |>
  head(10)

cat("\nTop 10 Hot leads (ready for sales outreach):\n")
#> 
#> Top 10 Hot leads (ready for sales outreach):
print(top_hot_leads |> select(lead_id, predicted_prob, tier))
#>      lead_id predicted_prob tier
#> 269 LEAD_917      0.3174784  Hot
#> 169 LEAD_618      0.2939429  Hot
#> 267 LEAD_914      0.2867694  Hot
#> 162 LEAD_576      0.2810086  Hot
#> 144 LEAD_491      0.2794254  Hot
#> 148 LEAD_507      0.2773991  Hot
#> 189 LEAD_678      0.2735761  Hot
#> 48  LEAD_156      0.2716951  Hot
#> 258 LEAD_889      0.2674437  Hot
#> 174 LEAD_637      0.2651086  Hot

# Revenue opportunity
total_hot_revenue <- revenue_by_tier |>
  filter(tier == "Hot") |>
  pull(expected_revenue)

total_warm_revenue <- revenue_by_tier |>
  filter(tier == "Warm") |>
  pull(expected_revenue)

cat(sprintf("\nRevenue Opportunity:\n"))
#> 
#> Revenue Opportunity:
cat(sprintf("  Hot tier (immediate pursuit): ₦%.0f\n", total_hot_revenue))
#>   Hot tier (immediate pursuit): ₦4100000
cat(sprintf("  Warm tier (nurture track): ₦%.0f\n", total_warm_revenue))
#>   Warm tier (nurture track): ₦500000
cat(sprintf("  Total potential: ₦%.0f\n\n", total_hot_revenue + total_warm_revenue))
#>   Total potential: ₦4600000

# Sales allocation
hot_leads_count <- tier_stats |> filter(tier == "Hot") |> pull(n_leads)
cat(sprintf("Recommended Sales Allocation:\n"))
#> Recommended Sales Allocation:
cat(sprintf("  Assign %d sales reps to %d Hot leads (%.1f leads/rep)\n",
            ceiling(hot_leads_count / 20), hot_leads_count, hot_leads_count / ceiling(hot_leads_count / 20)))
#>   Assign 14 sales reps to 262 Hot leads (18.7 leads/rep)

cat("\n## Step 6: Deployment Checklist ##\n")
#> 
#> ## Step 6: Deployment Checklist ##
cat("[ ] Model integrated into CRM\n")
#> [ ] Model integrated into CRM
cat("[ ] Scores refresh daily\n")
#> [ ] Scores refresh daily
cat("[ ] PSI monitoring active (threshold: 0.25)\n")
#> [ ] PSI monitoring active (threshold: 0.25)
cat("[ ] Sales team trained on tier definitions\n")
#> [ ] Sales team trained on tier definitions
cat("[ ] Feedback loop: track conversion vs. prediction\n")
#> [ ] Feedback loop: track conversion vs. prediction
cat("[ ] Retrain schedule: quarterly (or when PSI > 0.25)\n")
#> [ ] Retrain schedule: quarterly (or when PSI > 0.25)
Show code
print("=" * 70)
#> ======================================================================
print("CASE STUDY: B2B LEAD SCORING FOR PAN-AFRICAN FINTECH")
#> CASE STUDY: B2B LEAD SCORING FOR PAN-AFRICAN FINTECH
print("=" * 70)
#> ======================================================================
print()

print("## Step 1: Data Summary ##")
#> ## Step 1: Data Summary ##
print(f"Total leads: {len(leads_engineered)}")
#> Total leads: 1000
print(f"Conversions: {leads_engineered['converted'].sum()}")
#> Conversions: 165
print(f"Base conversion rate: {leads_engineered['converted'].mean():.4f}\n")
#> Base conversion rate: 0.1650

print("## Step 2: Model Performance ##")
#> ## Step 2: Model Performance ##
print(f"Test AUC: {auc_test:.4f}")
#> Test AUC: 0.4785
print(f"Brier Score: {brier_test:.4f}\n")
#> Brier Score: 0.1365

print("## Step 3: Top 5 Most Predictive Features ##")
#> ## Step 3: Top 5 Most Predictive Features ##
top_coef = coef_df.head(5)
for _, row in top_coef.iterrows():
    print(f"{row['Feature']}: {row['Coefficient']:+.4f} (Odds Ratio: {row['Odds Ratio']:.2f})")
#> demo_and_trial: +0.6013 (Odds Ratio: 1.82)
#> sector_fintech: +0.4561 (Odds Ratio: 1.58)
#> signup_decay: -0.3427 (Odds Ratio: 0.71)
#> high_visits_and_demo: -0.3097 (Odds Ratio: 0.73)
#> demo_requested: -0.2575 (Odds Ratio: 0.77)

print("\n## Step 4: Lead Tier Analysis ##")
#> 
#> ## Step 4: Lead Tier Analysis ##
print(tier_stats.to_string(index=False))
#> tier  n_leads  conversion_rate  pct_total
#> Cold        0              NaN        0.0
#> Warm       15         0.200000        5.0
#>  Hot      285         0.157895       95.0

print("\n## Step 5: Business Recommendations ##")
#> 
#> ## Step 5: Business Recommendations ##

top_hot = leads_with_scores[leads_with_scores['tier'] == 'Hot'].nlargest(10, 'predicted_prob')
print("\nTop 10 Hot leads (ready for sales outreach):")
#> 
#> Top 10 Hot leads (ready for sales outreach):
print(top_hot[['lead_id', 'predicted_prob', 'tier']].to_string(index=False))
#>   lead_id  predicted_prob tier
#>  LEAD_830        0.324333  Hot
#>  LEAD_759        0.313935  Hot
#> LEAD_1000        0.305582  Hot
#>  LEAD_989        0.299978  Hot
#>  LEAD_972        0.298514  Hot
#>  LEAD_712        0.294243  Hot
#>  LEAD_950        0.291669  Hot
#>  LEAD_808        0.288975  Hot
#>  LEAD_736        0.288905  Hot
#>  LEAD_838        0.286801  Hot

hot_revenue = revenue_df[revenue_df['tier'] == 'Hot']['expected_revenue'].sum()
warm_revenue = revenue_df[revenue_df['tier'] == 'Warm']['expected_revenue'].sum()

print(f"\nRevenue Opportunity:")
#> 
#> Revenue Opportunity:
print(f"  Hot tier (immediate pursuit): ₦{hot_revenue:,.0f}")
#>   Hot tier (immediate pursuit): ₦4,500,000
print(f"  Warm tier (nurture track): ₦{warm_revenue:,.0f}")
#>   Warm tier (nurture track): ₦300,000
print(f"  Total potential: ₦{hot_revenue + warm_revenue:,.0f}\n")
#>   Total potential: ₦4,800,000

hot_count = (leads_with_scores['tier'] == 'Hot').sum()
reps_needed = max(1, hot_count // 20)
print(f"Recommended Sales Allocation:")
#> Recommended Sales Allocation:
print(f"  Assign {reps_needed} sales reps to {hot_count} Hot leads ({hot_count / reps_needed:.1f} leads/rep)")
#>   Assign 14 sales reps to 285 Hot leads (20.4 leads/rep)

print("\n## Step 6: Deployment Checklist ##")
#> 
#> ## Step 6: Deployment Checklist ##
print("[ ] Model integrated into CRM")
#> [ ] Model integrated into CRM
print("[ ] Scores refresh daily")
#> [ ] Scores refresh daily
print("[ ] PSI monitoring active (threshold: 0.25)")
#> [ ] PSI monitoring active (threshold: 0.25)
print("[ ] Sales team trained on tier definitions")
#> [ ] Sales team trained on tier definitions
print("[ ] Feedback loop: track conversion vs. prediction")
#> [ ] Feedback loop: track conversion vs. prediction
print("[ ] Retrain schedule: quarterly (or when PSI > 0.25)")
#> [ ] Retrain schedule: quarterly (or when PSI > 0.25)
Caution📝 Section 17.7 Review Questions
  1. In the case study, what is the expected revenue impact of the lead scoring system?
  2. How would you adjust tier thresholds if sales team feedback suggests too many false positives?
  3. What would you do if 6 months later, PSI = 0.28?

22.8 Chapter 17 Exercises

  1. Feature engineering: Create 10 new features for lead scoring from raw CRM data.

  2. Threshold optimization: Calculate optimal threshold using precision-recall trade-off (F1 score).

  3. Tiering strategy: Define 4 tiers (Platinum, Gold, Silver, Bronze) with different nurture actions.

  4. ROI calculation: Estimate the revenue lift from lead scoring (conversion rate improvement × deal volume).

  5. Calibration analysis: Plot calibration curves for different feature sets; which is best calibrated?

  6. Segment analysis: Build separate models for different sectors (fintech vs. e-commerce). Do they differ?

  7. Champion-challenger test: A/B test the new lead scoring model against the old approach for 1 month.

  8. CRM integration: Design an API contract for scoring leads in real-time.

  9. Fairness audit: Check if score distributions are similar across geographies (bias check).

  10. Long-term tracking: Score leads now, follow up in 6 months to measure conversion rate vs. prediction.

22.9 Further Reading

  1. Blattberg, R. C., Kim, B. D., & Neslin, S. A. (2008). Database Marketing: Analyzing and Managing Customers. Springer. — Classic reference on customer lifetime value and scoring.

  2. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. — Chapter on logistic regression and classification.

  3. Naeem Siddiqi. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons. — Detailed guide to scorecard building.

  4. Verbeke, W., Martens, D., & Baesens, B. (2011). “Social network analysis for customer churn prediction.” Journal of the Operational Research Society, 62(1), 28–46. — Network effects in churn prediction.

  5. Kaggle Lead Scoring Competitions. [https://www.kaggle.com] — Real-world lead scoring datasets and solutions.

22.10 Chapter 17 Appendix

22.10.1 A. Platt Scaling for Probability Calibration

If a model’s predicted probabilities are poorly calibrated (e.g., always too high), we can apply Platt scaling: fit a logistic regression on the model’s outputs.

Before calibration: \[\hat{p} = f_{model}(x)\]

Platt scaling: \[\hat{p}_{calibrated} = \frac{1}{1 + e^{-(a \cdot \hat{p} + b)}}\]

Estimate a, b on a hold-out validation set by fitting logistic regression with \(\hat{p}\) as input and true y as output.

Advantages: - Simple, interpretable. - Preserves AUC (only shifts probabilities).

Disadvantage: - Requires additional hold-out data.

22.10.2 B. Isotonic Regression for Calibration

For non-monotonic miscalibration, isotonic regression is more flexible:

\[y_{cal} = f_{iso}(\hat{p})\]

where \(f_{iso}\) is a monotonically increasing function fit to minimize residuals. This requires more data but is less restrictive than Platt scaling.

22.10.3 C. Population Stability Index: Alternative Interpretation

PSI can be decomposed into two components:

\[PSI = \underbrace{\sum (P_i^{actual} - P_i^{expected}) \log(P_i^{actual})}_{\text{Distribution shift}} + \underbrace{\sum (P_i^{expected} - P_i^{actual}) \log(P_i^{expected})}_{\text{Reference shift}}\]

This separates changes in the distribution from changes in the reference baseline, helping diagnose whether drift is due to new lead characteristics or model miscalibration.


End of Chapter 17