40 Financial Risk Analytics and Credit Default

Author

Bongo Adi

📋 Learning Objectives

Understand the business context and regulatory framework for credit risk in Nigerian banking
Master Weight of Evidence (WOE) and Information Value (IV) for variable selection and encoding
Build logistic regression credit scorecards compliant with banking standards
Implement machine learning models (XGBoost) for credit prediction
Apply model validation techniques (Gini, KS statistic, PSI) appropriate for credit risk
Estimate Expected Credit Loss (ECL) under IFRS 9 framework

40.1 The Business of Credit Risk

Credit risk is the risk that a borrower will default on their obligations, failing to repay principal or interest. For banks, managing credit risk is existential: a wave of loan defaults erodes capital, can trigger regulatory action, and in extreme cases causes insolvency. In Nigeria, the Central Bank of Nigeria (CBN) sets prudential guidelines requiring banks to maintain adequate capital reserves and monitor non-performing loan (NPL) ratios. The NPL ratio—the percentage of loans classified as default or doubtful—is a key regulatory metric. During the 2008 financial crisis, global NPL ratios spiked; in Nigeria, the oil price collapse and macroeconomic stress pushed many banks’ NPL ratios above 5%. The goal of credit risk management is to prevent defaults before they occur through rigorous origination (does this borrower have capacity and willingness to repay?), ongoing monitoring (is the borrower’s financial condition deteriorating?), and collections (if default occurs, can we recover quickly?).

The cost of default is substantial. A bank must provision (set aside capital) for expected losses on non-performing loans, which reduces profits. Under the Basel III capital accord (adopted in Nigeria via CBN guidelines), banks must hold capital proportional to their risk-weighted assets, with higher weights for riskier loans. An unexpected default triggers regulatory scrutiny, stress tests, and potential capital injections. A loan is classified in stages: Stage 1 (performing, 12-month expected loss), Stage 2 (underperforming but not yet defaulted, lifetime expected loss), Stage 3 (defaulted). Each stage has regulatory capital requirements. The business incentive for accurate credit models is clear: identify risky borrowers to either decline them, price loans higher to compensate for default risk, or implement mitigants (collateral, guarantees, covenants).

Credit risk assessment operates at three points in the loan lifecycle. Application scoring predicts whether a new applicant will default if given a loan. Features include demographic information (age, occupation, education), financial metrics (income, existing debts, savings), and credit bureau data (historical repayment behaviour, defaults). Behavioural scoring monitors existing customers’ repayment behaviour and flags early warning signs of deterioration. Collection scoring prioritises recovery efforts on accounts already in default. This chapter focuses primarily on application scoring.

40.2 Data for Credit Risk

A typical credit risk dataset contains thousands of loans with features spanning demographic, financial, and behavioural categories. Demographic features include age (older borrowers default less frequently), gender, occupation, employment status, and education. Financial features include monthly income, existing debt obligations, savings or liquid assets, and income stability (employment duration). Behavioural features come from credit bureaus: repayment history, payment delays, historical defaults, inquiries (applications to other banks), and credit utilisation ratios. Loan-level features include loan amount, tenor (term in months), purpose (consumption, business, mortgage), and collateral value.

In Nigeria, credit bureau data is available from three licensed operators: CRC (Credit Reference Centre), FirstCentral, and CreditRegistry. A loan’s target variable is default status within a predefined horizon, typically 12 or 24 months. A default is defined as: payment more than 90 days past due, formal default notice, or legal proceedings. Some banks use a stricter definition (any 30-day delinquency); others are more lenient (180+ days). This definition affects model performance and comparability.

📘 Theory: Credit Risk Data Structure

🔑 Key Formula

Standard credit risk dataset variables: - Features: Demographic (age, occupation), Financial (income, debt), Behavioural (payment history) - Target: Default = 1 if payment > 90 days past due within 12 months, else 0 - Outcome rate: % of loans defaulting (typically 2–5% for performing portfolios) - Observation period: Historical loans with sufficient seasoning (maturity > 12 months)

Show code

# Nigerian bank loan dataset exploration

set.seed(5284)

# Generate synthetic Nigerian bank loan dataset
n_loans <- 8000
age <- round(runif(n_loans, 25, 65))
monthly_income <- rnorm(n_loans, mean = 150000, sd = 80000)  # Naira
existing_debt <- rnorm(n_loans, mean = 50000, sd = 40000)
employment_duration <- rpois(n_loans, lambda = 5)  # Years
bureau_score <- round(runif(n_loans, 300, 800))  # Credit bureau score
loan_amount <- rnorm(n_loans, mean = 500000, sd = 300000)  # Naira
loan_tenor <- sample(12:60, n_loans, replace = TRUE)  # Months

# Default probability depends on multiple factors
default_prob <- (0.15 -
                 0.001 * age +
                 0.00001 * monthly_income -
                 0.00001 * existing_debt -
                 0.01 * employment_duration +
                 0.0002 * loan_amount -
                 0.00005 * bureau_score)
default_prob <- pmax(pmin(default_prob, 0.5), 0.01)  # Clip to [0.01, 0.5]

default_status <- as.numeric(runif(n_loans) < default_prob)

# Create dataset
loan_data <- data.frame(
  loan_id = 1:n_loans,
  age = age,
  monthly_income = monthly_income,
  existing_debt = existing_debt,
  employment_duration = employment_duration,
  bureau_score = bureau_score,
  loan_amount = loan_amount,
  loan_tenor = loan_tenor,
  default_status = default_status
)

# Summary statistics
cat("=== Nigerian Bank Loan Dataset Summary ===\n\n")
#> === Nigerian Bank Loan Dataset Summary ===
cat("Total loans:", nrow(loan_data), "\n")
#> Total loans: 8000
cat("Default rate:", round(mean(loan_data$default_status), 4), "\n")
#> Default rate: 0.4731
cat("Time period: Last 2 years\n\n")
#> Time period: Last 2 years

cat("Feature Summary:\n")
#> Feature Summary:
summary_stats <- data.frame(
  Feature = c("Age (years)", "Monthly Income (₦)", "Existing Debt (₦)",
              "Employment Duration (years)", "Bureau Score",
              "Loan Amount (₦)", "Loan Tenor (months)"),
  Mean = round(colMeans(loan_data[, 2:8])),
  StdDev = round(apply(loan_data[, 2:8], 2, sd)),
  Min = round(apply(loan_data[, 2:8], 2, min)),
  Max = round(apply(loan_data[, 2:8], 2, max))
)
print(summary_stats)
#>                                         Feature   Mean StdDev     Min     Max
#> age                                 Age (years)     45     12      25      65
#> monthly_income               Monthly Income (₦) 149069  80250 -136864  454865
#> existing_debt                 Existing Debt (₦)  50061  40224  -93055  197732
#> employment_duration Employment Duration (years)      5      2       0      15
#> bureau_score                       Bureau Score    550    146     300     800
#> loan_amount                     Loan Amount (₦) 499374 299156 -704115 1729532
#> loan_tenor                  Loan Tenor (months)     36     14      12      60

# Default rate by demographic
cat("\n=== Default Rate by Age Group ===\n")
#> 
#> === Default Rate by Age Group ===
age_breaks <- seq(20, 70, by = 10)
loan_data$age_group <- cut(loan_data$age, breaks = age_breaks)
default_by_age <- aggregate(default_status ~ age_group, data = loan_data, mean)
print(default_by_age)
#>   age_group default_status
#> 1   (20,30]      0.4887286
#> 2   (30,40]      0.4691053
#> 3   (40,50]      0.4707974
#> 4   (50,60]      0.4746521
#> 5   (60,70]      0.4644870

# Visualize default rate by income
library(ggplot2)
loan_data$income_bracket <- cut(loan_data$monthly_income,
                                breaks = quantile(loan_data$monthly_income,
                                                 probs = seq(0, 1, 0.25)))
default_by_income <- aggregate(default_status ~ income_bracket, data = loan_data, mean)

ggplot(default_by_income, aes(x = income_bracket, y = default_status)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Default Rate by Income Bracket (Nigerian Banks)",
    x = "Monthly Income (₦)",
    y = "Default Rate"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Show code


# Correlation with default
cat("\n=== Correlation with Default Status ===\n")
#> 
#> === Correlation with Default Status ===
numeric_cols <- c("age", "monthly_income", "existing_debt",
                  "employment_duration", "bureau_score", "loan_amount", "loan_tenor")
correlations <- sapply(loan_data[, numeric_cols],
                      function(x) cor(x, loan_data$default_status))
print(sort(correlations, decreasing = TRUE))
#>         loan_amount employment_duration          loan_tenor      monthly_income 
#>         0.076570361         0.013684427         0.011720717         0.008736777 
#>       existing_debt                 age        bureau_score 
#>        -0.001308198        -0.007440770        -0.015621794

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(5284)

# Generate synthetic Nigerian bank loan dataset
n_loans = 8000
age = np.random.uniform(25, 65, n_loans).astype(int)
monthly_income = np.random.normal(150000, 80000, n_loans)
existing_debt = np.random.normal(50000, 40000, n_loans)
employment_duration = np.random.poisson(5, n_loans)
bureau_score = np.random.uniform(300, 800, n_loans).astype(int)
loan_amount = np.random.normal(500000, 300000, n_loans)
loan_tenor = np.random.choice(range(12, 61), n_loans)

# Default probability (nonlinear relationships)
default_prob = (0.15 -
                0.001 * age +
                0.00001 * monthly_income -
                0.00001 * existing_debt -
                0.01 * employment_duration +
                0.0002 * loan_amount -
                0.00005 * bureau_score)
default_prob = np.clip(default_prob, 0.01, 0.5)

default_status = (np.random.rand(n_loans) < default_prob).astype(int)

# Create dataset
loan_data = pd.DataFrame({
    'loan_id': range(1, n_loans + 1),
    'age': age,
    'monthly_income': monthly_income,
    'existing_debt': existing_debt,
    'employment_duration': employment_duration,
    'bureau_score': bureau_score,
    'loan_amount': loan_amount,
    'loan_tenor': loan_tenor,
    'default_status': default_status
})

# Summary
print("=== Nigerian Bank Loan Dataset Summary ===\n")
#> === Nigerian Bank Loan Dataset Summary ===
print(f"Total loans: {len(loan_data)}")
#> Total loans: 8000
print(f"Default rate: {loan_data['default_status'].mean():.4f}")
#> Default rate: 0.4775
print(f"Time period: Last 2 years\n")
#> Time period: Last 2 years

print("Feature Summary:")
#> Feature Summary:
print(loan_data.describe().round(0))
#>        loan_id     age  monthly_income  ...  loan_amount  loan_tenor  default_status
#> count   8000.0  8000.0          8000.0  ...       8000.0      8000.0          8000.0
#> mean    4000.0    45.0        149159.0  ...     502002.0        36.0             0.0
#> std     2310.0    12.0         79546.0  ...     296598.0        14.0             0.0
#> min        1.0    25.0       -178711.0  ...    -611609.0        12.0             0.0
#> 25%     2001.0    34.0         96974.0  ...     296505.0        23.0             0.0
#> 50%     4000.0    45.0        149222.0  ...     507449.0        36.0             0.0
#> 75%     6000.0    55.0        202674.0  ...     701423.0        48.0             1.0
#> max     8000.0    64.0        493362.0  ...    1633424.0        60.0             1.0
#> 
#> [8 rows x 9 columns]

# Default rate by age group
print("\n=== Default Rate by Age Group ===")
#> 
#> === Default Rate by Age Group ===
age_groups = pd.cut(loan_data['age'], bins=range(20, 75, 10))
print(loan_data.groupby(age_groups)['default_status'].mean())
#> age
#> (20, 30]    0.472268
#> (30, 40]    0.474142
#> (40, 50]    0.482226
#> (50, 60]    0.479749
#> (60, 70]    0.476427
#> Name: default_status, dtype: float64

# Correlation with default
print("\n=== Correlation with Default Status ===")
#> 
#> === Correlation with Default Status ===
corr_with_default = loan_data.corr()['default_status'].sort_values(ascending=False)
print(corr_with_default[1:])  # Exclude default_status itself
#> loan_amount            0.077146
#> monthly_income         0.013519
#> employment_duration    0.008262
#> loan_id                0.007076
#> age                    0.003358
#> bureau_score          -0.003105
#> loan_tenor            -0.010563
#> existing_debt         -0.019058
#> Name: default_status, dtype: float64

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(13, 10))

# Default rate by income bracket
income_brackets = pd.qcut(loan_data['monthly_income'], q=4)
income_default = loan_data.groupby(income_brackets)['default_status'].mean()
axes[0, 0].bar(range(len(income_default)), income_default, color='steelblue')
axes[0, 0].set_xlabel('Income Bracket (Quartiles)')
axes[0, 0].set_ylabel('Default Rate')
axes[0, 0].set_title('Default Rate by Income')
axes[0, 0].set_xticks(range(len(income_default)))
axes[0, 0].set_xticklabels(['Q1', 'Q2', 'Q3', 'Q4'])

# Default rate by employment duration
emp_duration_bins = pd.cut(loan_data['employment_duration'], bins=[0, 2, 5, 10, 30])
emp_default = loan_data.groupby(emp_duration_bins)['default_status'].mean()
axes[0, 1].bar(range(len(emp_default)), emp_default, color='coral')
axes[0, 1].set_xlabel('Employment Duration (years)')
axes[0, 1].set_ylabel('Default Rate')
axes[0, 1].set_title('Default Rate by Employment Duration')

# Default rate by bureau score
score_bins = pd.cut(loan_data['bureau_score'], bins=[300, 500, 650, 800])
score_default = loan_data.groupby(score_bins)['default_status'].mean()
axes[1, 0].bar(range(len(score_default)), score_default, color='green')
axes[1, 0].set_xlabel('Bureau Score')
axes[1, 0].set_ylabel('Default Rate')
axes[1, 0].set_title('Default Rate by Credit Bureau Score')

# Distribution of default vs non-default age
axes[1, 1].hist(loan_data[loan_data['default_status'] == 0]['age'],
                bins=20, alpha=0.6, label='No Default', color='blue')
axes[1, 1].hist(loan_data[loan_data['default_status'] == 1]['age'],
                bins=20, alpha=0.6, label='Default', color='red')
axes[1, 1].set_xlabel('Age')
axes[1, 1].set_ylabel('Count')
axes[1, 1].set_title('Age Distribution by Default Status')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

📝 Section 35.2 Review Questions

What features are most predictive of default? Why might this be?
What is the advantage of using credit bureau data versus self-reported information?
How would you define “default” for a credit risk model? What are the trade-offs in different definitions?
Why is the default rate typically low (2–5%) even in risky portfolios?

40.3 Logistic Regression Scorecard

Despite advances in machine learning, logistic regression remains the gold standard for credit risk scoring in banking. It is interpretable (regulators demand explanation), computationally efficient, and when combined with feature engineering using Weight of Evidence (WOE), remarkably effective. A WOE-based scorecard is a linear model where each variable has been transformed via WOE encoding, producing a score that is easy to audit and explain to both borrowers and regulators.

Weight of Evidence quantifies the predictive power of each category within a variable. For a continuous variable like age, we first bin it (e.g., 25–30, 31–40, 41–50, etc.). For each bin, we compute: \[\text{WOE}_i = \ln\left(\frac{\text{% Goods}_i}{\text{% Bads}_i}\right)\]

where “Goods” are non-defaulting loans and “Bads” are defaulting loans. A bin with many defaults relative to non-defaults has negative WOE; a “safe” bin has positive WOE. Information Value (IV) measures the total discriminatory power of a variable: \[\text{IV} = \sum_i \left(\frac{\text{Goods}_i}{{\text{Total Goods}}} - \frac{\text{Bads}_i}{{\text{Total Bads}}}\right) \times \text{WOE}_i\]

IV > 0.3 indicates strong predictive power; 0.1–0.3 is moderate; < 0.1 is weak. Variables with low IV are dropped. The scorecard then fits a logistic regression on WOE-encoded variables, producing: \[\text{Score} = \text{Intercept} + \sum_j \text{Coefficient}_j \times \text{WOE}_{ij}\]

Scores are typically scaled to a range like 300–850 (mirroring FICO scores in the US), making them intuitive and marketable.

📘 Theory: Weight of Evidence and Information Value

🔑 Key Formula

Weight of Evidence (for each category): \[\text{WOE}_i = \ln\left(\frac{\text{Distribution of Goods}_i}{\text{Distribution of Bads}_i}\right)\]

Information Value (variable importance): \[\text{IV} = \sum_{i=1}^{n} \left(\text{Dist. Goods}_i - \text{Dist. Bads}_i\right) \times \text{WOE}_i\]

Scorecard: \[\text{Score} = \text{Offset} + \text{Factor} \times \left(\text{Intercept} + \sum_j \beta_j \times \text{WOE}_{j,i}\right)\]

Typical scaling: 20 points per log-odds doubling (to reach 0-850 range).

Show code

# Weight of Evidence and Credit Scorecard

library(dplyr)

# Use loan_data from previous section
# Compute WOE and IV for key variables

compute_woe <- function(data, variable, target, bins = 10) {
  # Bin continuous variable
  if (is.numeric(data[[variable]])) {
    data$bin <- cut(data[[variable]], breaks = bins, include.lowest = TRUE)
  } else {
    data$bin <- data[[variable]]
  }

  # Calculate goods and bads per bin
  summary_table <- data |>
    group_by(bin) |>
    summarise(
      count = n(),
      goods = sum(1 - !!sym(target)),
      bads = sum(!!sym(target)),
      .groups = "drop"
    ) |>
    mutate(
      pct_goods = goods / sum(goods),
      pct_bads = bads / sum(bads),
      woe = log(pct_goods / pct_bads),
      iv_component = (pct_goods - pct_bads) * woe
    )

  summary_table$iv <- sum(summary_table$iv_component, na.rm = TRUE)
  return(summary_table)
}

# Apply to key variables
woe_age <- compute_woe(loan_data, "age", "default_status", bins = 5)
woe_income <- compute_woe(loan_data, "monthly_income", "default_status", bins = 5)
woe_bureau <- compute_woe(loan_data, "bureau_score", "default_status", bins = 5)

cat("=== Weight of Evidence Analysis ===\n\n")
#> === Weight of Evidence Analysis ===

cat("Age WOE (IV =", round(woe_age$iv[1], 4), "):\n")
#> Age WOE (IV = 3e-04 ):
print(woe_age[, c("bin", "goods", "bads", "woe")], n = Inf)
#> # A tibble: 5 × 4
#>   bin     goods  bads       woe
#>   <fct>   <dbl> <dbl>     <dbl>
#> 1 [25,33]   905   815 -0.00286 
#> 2 (33,41]   845   780 -0.0276  
#> 3 (41,49]   837   749  0.00348 
#> 4 (49,57]   842   735  0.0283  
#> 5 (57,65]   786   706 -0.000262

cat("\nMonthly Income WOE (IV =", round(woe_income$iv[1], 4), "):\n")
#> 
#> Monthly Income WOE (IV = 0.001 ):
print(woe_income[, c("bin", "goods", "bads", "woe")], n = Inf)
#> # A tibble: 5 × 4
#>   bin                   goods  bads      woe
#>   <fct>                 <dbl> <dbl>    <dbl>
#> 1 [-1.37e+05,-1.85e+04]    87    63  0.215  
#> 2 (-1.85e+04,9.98e+04]   1061   936  0.0177 
#> 3 (9.98e+04,2.18e+05]    2261  2049 -0.00915
#> 4 (2.18e+05,3.37e+05]     765   700 -0.0188 
#> 5 (3.37e+05,4.55e+05]      41    37 -0.00495

cat("\nBureau Score WOE (IV =", round(woe_bureau$iv[1], 4), "):\n")
#> 
#> Bureau Score WOE (IV = 0.0014 ):
print(woe_bureau[, c("bin", "goods", "bads", "woe")], n = Inf)
#> # A tibble: 5 × 4
#>   bin       goods  bads     woe
#>   <fct>     <dbl> <dbl>   <dbl>
#> 1 [300,400]   853   812 -0.0583
#> 2 (400,500]   799   738 -0.0282
#> 3 (500,600]   836   729  0.0294
#> 4 (600,700]   850   751  0.0162
#> 5 (700,800]   877   755  0.0422

# Prepare data for scorecard development
# Create WOE-encoded features
loan_data$age_woe <- 0
loan_data$income_woe <- 0
loan_data$bureau_woe <- 0

for (i in 1:nrow(woe_age)) {
  idx <- which(loan_data$age >= as.numeric(substr(woe_age$bin[i], 2, 20)) &
               loan_data$age < as.numeric(substr(woe_age$bin[i], 6, 30)))
  loan_data$age_woe[idx] <- woe_age$woe[i]
}

# Train logistic regression on WOE features
scorecard_model <- glm(default_status ~ age_woe + income_woe + bureau_woe,
                      data = loan_data,
                      family = binomial(link = "logit"))

cat("\n=== Logistic Regression Scorecard ===\n")
#> 
#> === Logistic Regression Scorecard ===
print(summary(scorecard_model))
#> 
#> Call:
#> glm(formula = default_status ~ age_woe + income_woe + bureau_woe, 
#>     family = binomial(link = "logit"), data = loan_data)
#> 
#> Coefficients: (3 not defined because of singularities)
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.10760    0.02239  -4.805 1.55e-06 ***
#> age_woe           NA         NA      NA       NA    
#> income_woe        NA         NA      NA       NA    
#> bureau_woe        NA         NA      NA       NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 11067  on 7999  degrees of freedom
#> Residual deviance: 11067  on 7999  degrees of freedom
#> AIC: 11069
#> 
#> Number of Fisher Scoring iterations: 3

# Convert to points scorecard
# Standard scaling: 300-850 range, 20 points per doubling of odds
offset <- 600
factor <- 20

intercept <- coef(scorecard_model)[1]
coefficients <- coef(scorecard_model)[-1]

scorecard_points <- data.frame(
  Variable = c("Age WOE", "Income WOE", "Bureau Score WOE"),
  Coefficient = round(coefficients, 4),
  Points_per_WOE_unit = round(factor * coefficients, 2)
)

cat("\nScorecard Points:\n")
#> 
#> Scorecard Points:
print(scorecard_points)
#>                    Variable Coefficient Points_per_WOE_unit
#> age_woe             Age WOE          NA                  NA
#> income_woe       Income WOE          NA                  NA
#> bureau_woe Bureau Score WOE          NA                  NA

# Compute scores for first 10 applicants
loan_data$score <- offset + factor * predict(scorecard_model, type = "link")

cat("\nScores for first 10 applicants:\n")
#> 
#> Scores for first 10 applicants:
print(loan_data[1:10, c("age", "monthly_income", "bureau_score", "score", "default_status")])
#>    age monthly_income bureau_score    score default_status
#> 1   58      90372.290          657 597.8479              0
#> 2   37     179061.805          600 597.8479              0
#> 3   49     190118.656          603 597.8479              0
#> 4   51      67682.684          656 597.8479              0
#> 5   31      63275.548          367 597.8479              0
#> 6   27      43655.446          734 597.8479              1
#> 7   61       6135.511          422 597.8479              1
#> 8   57      21682.477          410 597.8479              0
#> 9   37      50215.098          617 597.8479              0
#> 10  51     206439.996          427 597.8479              0

# Validate model
predictions <- predict(scorecard_model, type = "response")
auc <- caTools::colAUC(predictions, loan_data$default_status)
cat("\nModel AUC:", round(auc, 4), "\n")
#> 
#> Model AUC: 0.5

# Plot score distribution by default status
library(ggplot2)
ggplot(loan_data, aes(x = score, fill = factor(default_status))) +
  geom_histogram(bins = 30, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Score Distribution by Default Status",
    x = "Scorecard Score (300-850)",
    y = "Count",
    fill = "Default Status"
  ) +
  scale_fill_manual(values = c("0" = "green", "1" = "red"), labels = c("No Default", "Default"))

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings('ignore')

# Compute WOE and IV
def compute_woe(data, variable, target, n_bins=5):
    # Bin the variable
    if data[variable].dtype in ['int64', 'float64']:
        data_copy = data.copy()
        data_copy['bin'] = pd.cut(data_copy[variable], bins=n_bins, duplicates='drop')
    else:
        data_copy = data.copy()
        data_copy['bin'] = data_copy[variable]

    # Calculate goods and bads per bin
    summary = data_copy.groupby('bin', observed=True).agg({
        target: ['sum', 'count']
    }).reset_index()
    summary.columns = ['bin', 'bads', 'count']
    summary['goods'] = summary['count'] - summary['bads']

    # Calculate WOE
    total_goods = summary['goods'].sum()
    total_bads = summary['bads'].sum()
    summary['pct_goods'] = summary['goods'] / total_goods
    summary['pct_bads'] = summary['bads'] / total_bads
    summary['woe'] = np.log(summary['pct_goods'] / summary['pct_bads'])

    # Calculate IV
    summary['iv_component'] = (summary['pct_goods'] - summary['pct_bads']) * summary['woe']
    iv = summary['iv_component'].sum()

    return summary, iv

# Compute WOE for key variables
woe_age, iv_age = compute_woe(loan_data, 'age', 'default_status', n_bins=5)
woe_income, iv_income = compute_woe(loan_data, 'monthly_income', 'default_status', n_bins=5)
woe_bureau, iv_bureau = compute_woe(loan_data, 'bureau_score', 'default_status', n_bins=5)

print("=== Weight of Evidence Analysis ===\n")
#> === Weight of Evidence Analysis ===
print(f"Age IV: {iv_age:.4f}")
#> Age IV: 0.0006
print(woe_age[['bin', 'goods', 'bads', 'woe']].to_string())
#>               bin  goods  bads       woe
#> 0  (24.961, 32.8]    853   777  0.003258
#> 1    (32.8, 40.6]    821   728  0.030161
#> 2    (40.6, 48.4]    790   752 -0.040764
#> 3    (48.4, 56.2]    868   782  0.014276
#> 4    (56.2, 64.0]    848   781 -0.007755

print(f"\n\nMonthly Income IV: {iv_income:.4f}")
#> 
#> 
#> Monthly Income IV: 0.0045
print(woe_income[['bin', 'goods', 'bads', 'woe']].to_string())
#>                         bin  goods  bads       woe
#> 0  (-179383.098, -44296.43]     48    25  0.562264
#> 1    (-44296.43, 90118.165]    900   805  0.021492
#> 2    (90118.165, 224532.76]   2515  2355 -0.024329
#> 3   (224532.76, 358947.355]    705   617  0.043268
#> 4   (358947.355, 493361.95]     12    18 -0.495526

print(f"\n\nBureau Score IV: {iv_bureau:.4f}")
#> 
#> 
#> Bureau Score IV: 0.0049
print(woe_bureau[['bin', 'goods', 'bads', 'woe']].to_string())
#>                 bin  goods  bads       woe
#> 0  (299.501, 399.8]    840   759  0.011339
#> 1    (399.8, 499.6]    825   736  0.024092
#> 2    (499.6, 599.4]    800   807 -0.098773
#> 3    (599.4, 699.2]    888   728  0.108610
#> 4    (699.2, 799.0]    827   790 -0.044289

# Create WOE-encoded features
def apply_woe(data, variable, woe_table):
    data_copy = data.copy()
    data_copy['bin'] = pd.cut(data_copy[variable],
                             bins=list(woe_table['bin'].values),
                             include_lowest=True)
    woe_dict = dict(zip(woe_table['bin'], woe_table['woe']))
    data_copy['woe'] = data_copy['bin'].map(woe_dict).astype(float)
    return data_copy['woe'].values

loan_data_woe = loan_data.copy()
loan_data_woe['age_woe'] = apply_woe(loan_data, 'age', woe_age)
loan_data_woe['income_woe'] = apply_woe(loan_data, 'monthly_income', woe_income)
loan_data_woe['bureau_woe'] = apply_woe(loan_data, 'bureau_score', woe_bureau)

# Train logistic regression
X = loan_data_woe[['age_woe', 'income_woe', 'bureau_woe']].fillna(0)
y = loan_data['default_status']

scorecard_model = LogisticRegression(max_iter=1000)
scorecard_model.fit(X, y)

LogisticRegression(max_iter=1000)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Show code


print("\n=== Logistic Regression Scorecard ===\n")
#> 
#> === Logistic Regression Scorecard ===
print(f"Intercept: {scorecard_model.intercept_[0]:.6f}")
#> Intercept: -0.090033
print(f"Coefficients:")
#> Coefficients:
for var, coef in zip(['Age WOE', 'Income WOE', 'Bureau Score WOE'], scorecard_model.coef_[0]):
    print(f"  {var}: {coef:.6f}")
#>   Age WOE: -0.520648
#>   Income WOE: -0.880312
#>   Bureau Score WOE: -0.898991

# Convert to points scorecard
offset = 600
factor = 20

scorecard_points = pd.DataFrame({
    'Variable': ['Age WOE', 'Income WOE', 'Bureau Score WOE'],
    'Coefficient': scorecard_model.coef_[0],
    'Points_per_WOE': factor * scorecard_model.coef_[0]
})

print("\nScorecard Points:")
#> 
#> Scorecard Points:
print(scorecard_points.to_string())
#>            Variable  Coefficient  Points_per_WOE
#> 0           Age WOE    -0.520648      -10.412966
#> 1        Income WOE    -0.880312      -17.606240
#> 2  Bureau Score WOE    -0.898991      -17.979814

# Compute scores
loan_data_woe['score'] = offset + factor * scorecard_model.decision_function(X)

print("\nScores for first 10 applicants:")
#> 
#> Scores for first 10 applicants:
print(loan_data_woe[['age', 'monthly_income', 'bureau_score', 'score']].head(10).to_string())
#>    age  monthly_income  bureau_score       score
#> 0   35   143658.571394           509  600.089527
#> 1   52   293231.758805           784  598.085208
#> 2   59   190398.025450           608  596.755650
#> 3   57   -50985.339667           357  588.176858
#> 4   35   110630.853662           504  600.089527
#> 5   42   230496.445137           512  599.637948
#> 6   37   133119.684253           786  599.109921
#> 7   31   111610.785987           743  599.390059
#> 8   54    17540.192336           709  598.468607
#> 9   57   123285.600476           552  600.484351

# Validate
y_pred_proba = scorecard_model.predict_proba(X)[:, 1]
auc = roc_auc_score(y, y_pred_proba)
print(f"\nModel AUC: {auc:.4f}")
#> 
#> Model AUC: 0.5253

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Score distribution
axes[0].hist(loan_data_woe[y == 0]['score'], bins=30, alpha=0.7, label='No Default', color='green')
axes[0].hist(loan_data_woe[y == 1]['score'], bins=30, alpha=0.7, label='Default', color='red')
axes[0].set_xlabel('Scorecard Score (300-850)')
axes[0].set_ylabel('Count')
axes[0].set_title('Score Distribution by Default Status')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Score vs default rate
score_bins = pd.qcut(loan_data_woe['score'], q=10, duplicates='drop')
default_by_score = loan_data.groupby(score_bins, observed=True)['default_status'].agg(['mean', 'count'])
axes[1].plot(range(len(default_by_score)), default_by_score['mean'], marker='o', linewidth=2)
axes[1].set_xlabel('Score Decile')
axes[1].set_ylabel('Default Rate')
axes[1].set_title('Default Rate vs Score Decile (Good Separation)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

📝 Section 35.3 Review Questions

What does WOE measure? How is it different from simple percentage of defaults?
Why would you use WOE encoding in a logistic regression rather than raw feature values?
How is Information Value (IV) calculated, and what values indicate strong predictive power?
How would you explain a scorecard to a loan applicant?

40.4 Machine Learning for Credit Risk

While logistic regression scorecards remain the banking standard for interpretability, machine learning models (random forest, XGBoost, gradient boosting) often achieve higher accuracy. The trade-off is explainability: regulators and courts may demand to understand why a loan was declined. For high-volume decisions (approving thousands of loans), even a 1% improvement in AUC can save millions of dollars in prevented defaults.

Random forest and XGBoost naturally handle nonlinear relationships and interactions without explicit feature engineering. They require less domain knowledge for variable binning and encoding. However, their predictions are harder to explain. Modern solutions use feature importance (SHAP values, permutation importance) to approximate explanations, but they lack the transparency of a scorecard. In practice, many banks use ensemble approaches: develop both a scorecard and a machine learning model, use both for screening, and favour the more conservative recommendation. For this book, we focus on XGBoost with class weighting to handle the imbalance (few defaults among many non-defaults) and cross-validation to avoid overfitting.

📘 Theory: Machine Learning for Classification in Imbalanced Settings

🔑 Key Formula

Weighted Log Loss (for class imbalance): \[L = -\frac{1}{n} \sum_{i=1}^{n} w_i [y_i \log(\hat{p}_i) + (1-y_i) \log(1-\hat{p}_i)]\]

where \(w_i\) is the class weight (typically \(w_{\text{default}} > w_{\text{non-default}}\)).

XGBoost Objective (with scale_pos_weight): \[\text{scale\_pos\_weight} = \frac{\text{# Negatives}}{\text{# Positives}}\]

Show code

# XGBoost for Credit Default Prediction

library(xgboost)
library(caret)

# Prepare data for XGBoost
# Use raw features (not WOE-encoded)
X_features <- loan_data[, c("age", "monthly_income", "existing_debt",
                             "employment_duration", "bureau_score",
                             "loan_amount", "loan_tenor")]

# Standardize features
X_scaled <- scale(X_features)

y_target <- loan_data$default_status

# Train-test split
set.seed(7462)
train_idx <- createDataPartition(y_target, p = 0.8, list = FALSE)
X_train <- X_scaled[train_idx, ]
y_train <- y_target[train_idx]
X_test <- X_scaled[-train_idx, ]
y_test <- y_target[-train_idx]

cat("=== XGBoost for Credit Risk ===\n\n")
#> === XGBoost for Credit Risk ===
cat("Training samples:", length(y_train), "\n")
#> Training samples: 6400
cat("Test samples:", length(y_test), "\n")
#> Test samples: 1600
cat("Default rate in training:", round(mean(y_train), 4), "\n")
#> Default rate in training: 0.4709
cat("Default rate in test:", round(mean(y_test), 4), "\n\n")
#> Default rate in test: 0.4819

# Create XGBoost matrix
dtrain <- xgb.DMatrix(data = X_train, label = y_train)
dtest <- xgb.DMatrix(data = X_test, label = y_test)

# Calculate scale_pos_weight to handle imbalance
scale_pos_weight <- sum(y_train == 0) / sum(y_train == 1)

# XGBoost parameters
params <- list(
  objective = "binary:logistic",
  scale_pos_weight = scale_pos_weight,
  max_depth = 5,
  eta = 0.1,
  min_child_weight = 1,
  subsample = 0.8,
  colsample_bytree = 0.8
)

# Train with early stopping
watchlist <- list(train = dtrain, test = dtest)
xgb_model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 200,
  watchlist = watchlist,
  early_stopping_rounds = 10,
  verbose = 0
)

cat("XGBoost model trained. Stopped at round:", xgb_model$best_iteration, "\n\n")
#> XGBoost model trained. Stopped at round:

# Predictions
y_pred_xgb <- predict(xgb_model, dtest)

# Evaluation metrics
library(caTools)
auc_xgb <- colAUC(y_pred_xgb, y_test)
cat("Test AUC (XGBoost):", round(auc_xgb, 4), "\n")
#> Test AUC (XGBoost): 0.5452

# Confusion matrix at different thresholds
threshold <- 0.5
y_pred_class <- as.numeric(y_pred_xgb > threshold)
confmat_xgb <- table(y_test, y_pred_class)
cat("\nConfusion Matrix (threshold = 0.5):\n")
#> 
#> Confusion Matrix (threshold = 0.5):
print(confmat_xgb)
#>       y_pred_class
#> y_test   0   1
#>      0 310 519
#>      1 260 511

# Calculate metrics
tp <- confmat_xgb[2, 2]
fp <- confmat_xgb[1, 2]
fn <- confmat_xgb[2, 1]
tn <- confmat_xgb[1, 1]

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)

cat("\nPrecision:", round(precision, 4), "\n")
#> 
#> Precision: 0.4961
cat("Recall (Sensitivity):", round(recall, 4), "\n")
#> Recall (Sensitivity): 0.6628
cat("F1-Score:", round(f1, 4), "\n")
#> F1-Score: 0.5675

# Feature importance
importance <- xgb.importance(colnames(X_train), model = xgb_model)
cat("\n=== Feature Importance ===\n")
#> 
#> === Feature Importance ===
print(importance[order(-importance$Gain), ])
#>                Feature       Gain      Cover  Frequency
#>                 <char>      <num>      <num>      <num>
#> 1:         loan_amount 0.39143722 0.29203921 0.15878023
#> 2:       existing_debt 0.14961944 0.16999080 0.20820189
#> 3:      monthly_income 0.11741610 0.14895844 0.15457413
#> 4:        bureau_score 0.11653239 0.13284553 0.14826498
#> 5:          loan_tenor 0.10889095 0.13157983 0.14826498
#> 6:                 age 0.06929566 0.06140979 0.10620400
#> 7: employment_duration 0.04680825 0.06317640 0.07570978

# Visualize feature importance
library(ggplot2)
ggplot(importance[1:7, ], aes(y = reorder(Feature, Gain), x = Gain)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "XGBoost Feature Importance for Credit Default",
    y = "Feature",
    x = "Information Gain"
  )

Show code

import xgboost as xgb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix, f1_score, precision_score, recall_score

# Prepare data
X = loan_data[['age', 'monthly_income', 'existing_debt',
               'employment_duration', 'bureau_score',
               'loan_amount', 'loan_tenor']].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

y = loan_data['default_status'].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2,
                                                     random_state=7462, stratify=y)

print("=== XGBoost for Credit Risk ===\n")
#> === XGBoost for Credit Risk ===
print(f"Training samples: {len(y_train)}")
#> Training samples: 6400
print(f"Test samples: {len(y_test)}")
#> Test samples: 1600
print(f"Default rate in training: {y_train.mean():.4f}")
#> Default rate in training: 0.4775
print(f"Default rate in test: {y_test.mean():.4f}\n")
#> Default rate in test: 0.4775

# XGBoost parameters with class weight
scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'binary:logistic',
    'scale_pos_weight': scale_pos_weight,
    'max_depth': 5,
    'eta': 0.1,
    'min_child_weight': 1,
    'subsample': 0.8,
    'colsample_bytree': 0.8
}

# Train with early stopping
evals = [(dtrain, 'train'), (dtest, 'test')]
xgb_model = xgb.train(params, dtrain, num_boost_round=200,
                     evals=evals, early_stopping_rounds=10,
                     verbose_eval=False)

print(f"XGBoost trained. Stopped at round: {xgb_model.best_iteration}\n")
#> XGBoost trained. Stopped at round: 31

# Predictions
y_pred = xgb_model.predict(dtest)

# Evaluation
auc = roc_auc_score(y_test, y_pred)
print(f"Test AUC (XGBoost): {auc:.4f}")
#> Test AUC (XGBoost): 0.5320

# Confusion matrix
threshold = 0.5
y_pred_class = (y_pred > threshold).astype(int)
cm = confusion_matrix(y_test, y_pred_class)
print("\nConfusion Matrix (threshold = 0.5):")
#> 
#> Confusion Matrix (threshold = 0.5):
print(cm)
#> [[335 501]
#>  [284 480]]

# Metrics
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.4f}")
#> Precision: 0.4893
print(f"Recall: {recall:.4f}")
#> Recall: 0.6283
print(f"F1-Score: {f1:.4f}")
#> F1-Score: 0.5501

# Feature importance
importance = xgb_model.get_score(importance_type='gain')
importance_df = pd.DataFrame(list(importance.items()), columns=['Feature', 'Importance'])
importance_df = importance_df.sort_values('Importance', ascending=False)

print("\n=== Feature Importance ===")
#> 
#> === Feature Importance ===
print(importance_df.to_string(index=False))
#> Feature  Importance
#>      f5   11.959839
#>      f4    3.974821
#>      f2    3.730809
#>      f1    3.687403
#>      f0    3.620439
#>      f3    3.540967
#>      f6    3.284790

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Feature importance
ax1.barh(importance_df['Feature'], importance_df['Importance'], color='steelblue')
ax1.set_xlabel('Information Gain')
ax1.set_title('XGBoost Feature Importance for Credit Default')
ax1.invert_yaxis()

# ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred)
ax2.plot(fpr, tpr, linewidth=2, label=f'AUC = {auc:.4f}')
#> [<matplotlib.lines.Line2D object at 0x000001DDCDF74980>]
ax2.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random Classifier')
#> [<matplotlib.lines.Line2D object at 0x000001DDCDF74AD0>]
ax2.set_xlabel('False Positive Rate')
#> Text(0.5, 0, 'False Positive Rate')
ax2.set_ylabel('True Positive Rate')
#> Text(0, 0.5, 'True Positive Rate')
ax2.set_title('ROC Curve')
#> Text(0.5, 1.0, 'ROC Curve')
ax2.legend()
#> <matplotlib.legend.Legend object at 0x000001DDCDF74830>
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

📝 Section 35.4 Review Questions

Why is class weighting important in credit risk modelling when defaults are rare?
How does XGBoost handle feature interactions compared to logistic regression?
What is the advantage of reporting both AUC and F1-score for a credit model?
Why might a bank prefer a logistic regression scorecard over XGBoost despite lower accuracy?

40.5 Model Validation Metrics

Standard accuracy metrics (accuracy, precision, recall) are important but incomplete for credit risk. The industry uses specialised metrics tailored to this domain. Gini coefficient (also called Gini index) is derived from AUC: \(\text{Gini} = 2 \times \text{AUC} - 1\). It ranges from 0 (no discrimination) to 1 (perfect separation). A Gini of 0.4–0.5 is considered good for credit; 0.6+ is excellent. The KS statistic (Kolmogorov-Smirnov) measures the maximum vertical distance between the cumulative distribution functions of defaulters and non-defaulters. A KS of 20–30% is reasonable; 40%+ is strong. These metrics are robust to threshold selection and reflect the model’s ability to rank borrowers by risk.

Population Stability Index (PSI) detects distribution shift. When you deploy a model, the applicant population may change (economic downturn, marketing shift, regulatory changes). PSI quantifies this shift: \[\text{PSI} = \sum_i \left(\text{% in current population}_i - \text{% in development sample}_i\right) \times \ln\left(\frac{\text{% current}_i}{\text{% development}_i}\right)\]

A PSI > 0.25 signals significant shift and warrants model recalibration. The CAP curve (Cumulative Accuracy Profile) shows, if you approve the top \(x\)%of applicants (those with lowest default probability), what fraction of all defaults do you prevent? It complements ROC by being more intuitive for business decisions.

📘 Theory: Credit Risk Model Validation

🔑 Key Formula

Gini Coefficient: \[\text{Gini} = 2 \times \text{AUC} - 1\]

KS Statistic: \[\text{KS} = \max_i |F_{\text{Defaults}}(i) - F_{\text{Non-defaults}}(i)|\]

Population Stability Index: \[\text{PSI} = \sum_{i=1}^{n} \left(P_i - E_i\right) \ln\left(\frac{P_i}{E_i}\right)\]

where \(P_i\) and \(E_i\) are proportions in current and expected (development) populations.

Show code

# Credit Risk Model Validation Metrics

library(caTools)
library(ggplot2)

# Use predictions from earlier XGBoost or logistic regression
# For this example, we'll generate scores from our earlier models

set.seed(3619)
# Simulate predicted probabilities
scores <- predict(xgb_model, dtest)

# Calculate standard metrics
auc <- colAUC(scores, y_test)[1]
gini <- 2 * auc - 1

cat("=== Credit Risk Model Validation ===\n\n")
#> === Credit Risk Model Validation ===
cat("AUC: ", round(auc, 4), "\n")
#> AUC:  0.5452
cat("Gini Coefficient: ", round(gini, 4), "\n")
#> Gini Coefficient:  0.0903

# KS Statistic
# Sort by score and calculate cumulative distributions
sorted_idx <- order(scores, decreasing = TRUE)
y_sorted <- y_test[sorted_idx]
scores_sorted <- scores[sorted_idx]

n <- length(y_sorted)
cum_defaults <- cumsum(y_sorted) / sum(y_sorted)
cum_non_defaults <- cumsum(1 - y_sorted) / sum(1 - y_sorted)

ks_stat <- max(cum_defaults - cum_non_defaults)
ks_idx <- which.max(cum_defaults - cum_non_defaults)

cat("KS Statistic: ", round(ks_stat, 4), " at percentile ", round(ks_idx / n, 4), "\n")
#> KS Statistic:  0.0831  at percentile  0.9556

# CAP (Cumulative Accuracy Profile)
cum_applicants <- (1:n) / n
cap_curve <- cum_defaults

# Gini-KS relationship verification
cat("\nValidation (Gini-KS relationship):\n")
#> 
#> Validation (Gini-KS relationship):
cat("Gini = 2 × AUC - 1 = ", round(gini, 4), "\n")
#> Gini = 2 × AUC - 1 =  0.0903
cat("Expected range: [0.3, 0.7] for good credit models\n\n")
#> Expected range: [0.3, 0.7] for good credit models

# PSI Calculation
# Split data into development and validation samples
dev_sample <- y_train
val_sample <- y_test

# Calculate score distributions
dev_scores <- predict(xgb_model, dtrain)
val_scores <- scores

# Bin scores
score_bins <- seq(0, 1, by = 0.1)
dev_dist <- hist(dev_scores, breaks = score_bins, plot = FALSE)$counts / length(dev_scores)
val_dist <- hist(val_scores, breaks = score_bins, plot = FALSE)$counts / length(val_scores)

# PSI
psi <- sum((val_dist - dev_dist) * log((val_dist + 1e-10) / (dev_dist + 1e-10)))

cat("Population Stability Index (PSI): ", round(psi, 4), "\n")
#> Population Stability Index (PSI):  0.0344
cat("Interpretation:\n")
#> Interpretation:
cat("  PSI < 0.1: No significant population shift\n")
#>   PSI < 0.1: No significant population shift
cat("  PSI 0.1-0.25: Moderate shift, monitor model\n")
#>   PSI 0.1-0.25: Moderate shift, monitor model
cat("  PSI > 0.25: Significant shift, model recalibration recommended\n\n")
#>   PSI > 0.25: Significant shift, model recalibration recommended

# Create validation plots
validation_df <- data.frame(
  cumulative_applicants = cum_applicants,
  cumulative_defaults = cum_defaults,
  cumulative_non_defaults = cum_non_defaults
)

# ROC and KS curves
ggplot(validation_df, aes(x = cumulative_non_defaults, y = cumulative_defaults)) +
  geom_line(linewidth = 1, colour = "blue", label = "ROC") +
  geom_line(aes(x = seq(0, 1, length.out = n), y = seq(0, 1, length.out = n)),
            colour = "grey", linetype = "dashed", linewidth = 1) +
  geom_point(x = cum_non_defaults[ks_idx], y = cum_defaults[ks_idx],
             colour = "red", size = 3) +
  annotate("text", x = cum_non_defaults[ks_idx], y = cum_defaults[ks_idx] + 0.05,
           label = paste0("KS = ", round(ks_stat, 4)), hjust = 0.5) +
  theme_minimal() +
  labs(
    title = "ROC Curve with KS Statistic",
    x = "1 - Specificity (False Positive Rate)",
    y = "Sensitivity (True Positive Rate)"
  )

Show code


# CAP Curve
ggplot(validation_df, aes(x = cumulative_applicants, y = cumulative_defaults)) +
  geom_line(linewidth = 1.5, colour = "blue") +
  geom_line(aes(y = cumulative_applicants), colour = "grey", linetype = "dashed", linewidth = 1) +
  geom_hline(yintercept = 1, colour = "orange", linetype = "dotted", linewidth = 1) +
  theme_minimal() +
  labs(
    title = "CAP Curve: Cumulative Defaults vs Applicants Approved",
    x = "Cumulative % of Applicants (approved)",
    y = "Cumulative % of Defaults Identified"
  ) +
  annotate("text", x = 0.5, y = 0.3, label = "Better Worse", colour = "grey", size = 3)

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Use predictions from XGBoost
scores = y_pred
auc_score = roc_auc_score(y_test, scores)
gini = 2 * auc_score - 1

print("=== Credit Risk Model Validation ===\n")
#> === Credit Risk Model Validation ===
print(f"AUC: {auc_score:.4f}")
#> AUC: 0.5320
print(f"Gini Coefficient: {gini:.4f}")
#> Gini Coefficient: 0.0639

# KS Statistic
sorted_idx = np.argsort(scores)[::-1]
y_sorted = y_test[sorted_idx]
scores_sorted = scores[sorted_idx]

n = len(y_sorted)
cum_defaults = np.cumsum(y_sorted) / y_sorted.sum()
cum_non_defaults = np.cumsum(1 - y_sorted) / (1 - y_sorted).sum()

ks_stat = np.max(cum_defaults - cum_non_defaults)
ks_idx = np.argmax(cum_defaults - cum_non_defaults)

print(f"KS Statistic: {ks_stat:.4f} at percentile {ks_idx/n:.4f}")
#> KS Statistic: 0.0920 at percentile 0.8256

# CAP Curve
cum_applicants = np.arange(1, n+1) / n
cap_curve = cum_defaults

# PSI Calculation
dev_scores = xgb_model.predict(dtrain)
val_scores = scores

score_bins = np.linspace(0, 1, 11)
dev_dist = np.histogram(dev_scores, bins=score_bins)[0] / len(dev_scores)
val_dist = np.histogram(val_scores, bins=score_bins)[0] / len(val_scores)

psi = np.sum((val_dist - dev_dist) * np.log((val_dist + 1e-10) / (dev_dist + 1e-10)))

print(f"\nPopulation Stability Index (PSI): {psi:.4f}")
#> 
#> Population Stability Index (PSI): 0.0169
print("Interpretation:")
#> Interpretation:
print("  PSI < 0.1: No significant shift")
#>   PSI < 0.1: No significant shift
print("  PSI 0.1-0.25: Moderate shift, monitor")
#>   PSI 0.1-0.25: Moderate shift, monitor
print("  PSI > 0.25: Significant shift, retrain model\n")
#>   PSI > 0.25: Significant shift, retrain model

# Validation plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROC curve with KS
fpr, tpr, _ = roc_curve(y_test, scores)
axes[0].plot(fpr, tpr, linewidth=2, label=f'AUC = {auc_score:.4f}', color='blue')
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E89BE0>]
axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1)
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E89D30>]
axes[0].plot(cum_non_defaults[ks_idx], cum_defaults[ks_idx], 'ro', markersize=8)
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E89E80>]
axes[0].annotate(f'KS = {ks_stat:.4f}',
                xy=(cum_non_defaults[ks_idx], cum_defaults[ks_idx]),
                xytext=(cum_non_defaults[ks_idx] + 0.1, cum_defaults[ks_idx] - 0.1),
                arrowprops=dict(arrowstyle='->', color='red'))
#> Text(0.8822966507177034, 0.7743455497382199, 'KS = 0.0920')
axes[0].set_xlabel('1 - Specificity (FPR)')
#> Text(0.5, 0, '1 - Specificity (FPR)')
axes[0].set_ylabel('Sensitivity (TPR)')
#> Text(0, 0.5, 'Sensitivity (TPR)')
axes[0].set_title('ROC Curve with KS Statistic')
#> Text(0.5, 1.0, 'ROC Curve with KS Statistic')
axes[0].legend()
#> <matplotlib.legend.Legend object at 0x000001DDD3E8A270>
axes[0].grid(True, alpha=0.3)

# CAP curve
axes[1].plot(cum_applicants, cap_curve, linewidth=2, label='Model', color='blue')
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E8A7B0>]
axes[1].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E8A900>]
axes[1].plot([0, 1], [0, 1], 'orange', linestyle=':', linewidth=1, label='Perfect')
#> [<matplotlib.lines.Line2D object at 0x000001DDD3E8AA50>]
axes[1].fill_between(cum_applicants, cap_curve, cum_applicants, alpha=0.3)
#> <matplotlib.collections.FillBetweenPolyCollection object at 0x000001DDD3E8A510>
axes[1].set_xlabel('Cumulative % of Applicants (ranked by score)')
#> Text(0.5, 0, 'Cumulative % of Applicants (ranked by score)')
axes[1].set_ylabel('Cumulative % of Defaults Identified')
#> Text(0, 0.5, 'Cumulative % of Defaults Identified')
axes[1].set_title('CAP Curve: Model Lift')
#> Text(0.5, 1.0, 'CAP Curve: Model Lift')
axes[1].legend()
#> <matplotlib.legend.Legend object at 0x000001DDD3E8A660>
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Show code


# Summary table
print("\n=== Model Validation Summary ===")
#> 
#> === Model Validation Summary ===
print(f"AUC: {auc_score:.4f} (Goal: > 0.7)")
#> AUC: 0.5320 (Goal: > 0.7)
print(f"Gini: {gini:.4f} (Good: 0.4-0.5, Excellent: > 0.6)")
#> Gini: 0.0639 (Good: 0.4-0.5, Excellent: > 0.6)
print(f"KS: {ks_stat:.4f} (Reasonable: 20-30%, Strong: > 40%)")
#> KS: 0.0920 (Reasonable: 20-30%, Strong: > 40%)
print(f"PSI: {psi:.4f} (Acceptable: < 0.25)")
#> PSI: 0.0169 (Acceptable: < 0.25)

📝 Section 35.5 Review Questions

Why is the Gini coefficient preferred over raw AUC in banking?
What does the KS statistic measure, and at what value should you retrain a model?
Why is PSI important for deployed credit models?
How would you use a CAP curve to explain model performance to business stakeholders?

40.6 IFRS 9 Expected Credit Loss

Banking regulators worldwide now require banks to estimate Expected Credit Loss (ECL) for all loans, following the International Financial Reporting Standard 9 (IFRS 9). Under IFRS 9, loans are classified into three stages based on credit risk. Stage 1: Performing loans with no significant increase in credit risk since origination; provision equals 12-month ECL. Stage 2: Loans with significant increase in credit risk (though not yet defaulted); provision equals lifetime ECL. Stage 3: Credit-impaired or defaulted loans; provision equals lifetime ECL. The ECL formula is straightforward but powerful: \[\text{ECL} = \text{PD} \times \text{LGD} \times \text{EAD}\]

where PD (Probability of Default) is the likelihood the borrower will default over the relevant time horizon (12 months for Stage 1, lifetime for Stages 2-3); LGD (Loss Given Default) is the fraction of the exposure that is not recovered post-default (e.g., if a ₦1 million loan defaults and we recover ₦400,000 via collateral sale, LGD = 60%); EAD (Exposure at Default) is the amount owed when default occurs, including accrued interest.

For a portfolio, total provisions = sum of all individual loan ECLs. This framework forces banks to be forward-looking: build higher provisions not just when loans actually default, but when default probability rises. In Nigeria, the CBN’s Risk Management Guidelines require banks to report ECL provisions quarterly.

📘 Theory: IFRS 9 ECL Framework

🔑 Key Formula

Expected Credit Loss: \[\text{ECL} = \text{PD} \times \text{LGD} \times \text{EAD}\]

Portfolio ECL: \[\text{Total ECL} = \sum_{i=1}^{n} \text{ECL}_i = \sum_{i=1}^{n} \text{PD}_i \times \text{LGD}_i \times \text{EAD}_i\]

Lifetime PD (12-month PD for Stages 1, lifetime for Stages 2-3): \[\text{Lifetime PD} = \sum_{t=1}^{T} \text{PD}_t \times \prod_{\tau=1}^{t-1} (1 - \text{PD}_\tau)\]

Show code

# IFRS 9 Expected Credit Loss Estimation

# Use predictions from the credit model
# For simplicity, we'll use the predicted probability as PD

# ECL parameters (based on portfolio characteristics)
lgd_estimate <- 0.45  # 45% loss given default (typical for unsecured consumer loans)
ead_mean <- 500000    # Average Exposure at Default (₦)

# Use model predictions as PD
pd_values <- predict(xgb_model, dtest)

# Create a sample portfolio for ECL calculation
portfolio_sample <- data.frame(
  loan_id = 1:length(y_test),
  actual_default = y_test,
  pd_estimate = pd_values,
  ead = ead_mean,
  lgd = lgd_estimate
)

# Calculate individual ECLs
portfolio_sample$ecl <- portfolio_sample$pd_estimate *
                        portfolio_sample$lgd *
                        portfolio_sample$ead

# Portfolio summary
cat("=== IFRS 9 Expected Credit Loss Analysis ===\n\n")
#> === IFRS 9 Expected Credit Loss Analysis ===

cat("Portfolio Characteristics:\n")
#> Portfolio Characteristics:
cat("Total loans:", nrow(portfolio_sample), "\n")
#> Total loans: 1600
cat("Total exposure:", format(sum(portfolio_sample$ead), big.mark = ","), "₦\n")
#> Total exposure: 8e+08 ₦
cat("Average PD:", round(mean(portfolio_sample$pd_estimate), 4), "\n")
#> Average PD: 0.4985
cat("Average LGD:", round(mean(portfolio_sample$lgd), 4), "\n\n")
#> Average LGD: 0.45

cat("ECL Results:\n")
#> ECL Results:
cat("Total ECL provision:", format(round(sum(portfolio_sample$ecl)), big.mark = ","), "₦\n")
#> Total ECL provision: 179,452,810 ₦
cat("ECL as % of exposure:", round(100 * sum(portfolio_sample$ecl) / sum(portfolio_sample$ead), 2), "%\n")
#> ECL as % of exposure: 22.43 %
cat("Average ECL per loan:", format(round(mean(portfolio_sample$ecl)), big.mark = ","), "₦\n\n")
#> Average ECL per loan: 112,158 ₦

# Stage classification (simple approximation)
portfolio_sample$stage <- ifelse(portfolio_sample$pd_estimate < 0.02, 1,
                                 ifelse(portfolio_sample$pd_estimate < 0.05, 2, 3))

# ECL by stage
cat("ECL by Stage:\n")
#> ECL by Stage:
for (stage in 1:3) {
  stage_loans <- portfolio_sample[portfolio_sample$stage == stage, ]
  total_ecl <- sum(stage_loans$ecl)
  exposure <- sum(stage_loans$ead)
  count <- nrow(stage_loans)

  cat(sprintf("Stage %d: %d loans, ₦%s ECL (%.2f%% of stage exposure)\n",
              stage, count, format(round(total_ecl), big.mark = ","),
              100 * total_ecl / exposure))
}
#> Stage 1: 0 loans, ₦0 ECL (NaN% of stage exposure)
#> Stage 2: 35 loans, ₦358,121 ECL (2.05% of stage exposure)
#> Stage 3: 1565 loans, ₦179,094,688 ECL (22.89% of stage exposure)

# Sensitivity analysis: vary LGD
lgd_scenarios <- seq(0.30, 0.60, by = 0.05)
ecl_by_lgd <- numeric(length(lgd_scenarios))

for (i in seq_along(lgd_scenarios)) {
  ecl_by_lgd[i] <- sum(portfolio_sample$pd_estimate *
                       lgd_scenarios[i] *
                       portfolio_sample$ead)
}

# Sensitivity analysis: vary PD scaling
pd_scenarios <- seq(0.8, 1.2, by = 0.1)
ecl_by_pd <- numeric(length(pd_scenarios))

for (i in seq_along(pd_scenarios)) {
  ecl_by_pd[i] <- sum(portfolio_sample$pd_estimate * pd_scenarios[i] *
                      portfolio_sample$lgd *
                      portfolio_sample$ead)
}

# Plot sensitivity
library(ggplot2)

# Build long format directly (lgd_scenarios and pd_scenarios have different lengths)
sensitivity_long <- rbind(
  data.frame(scenario = "LGD Sensitivity", value = lgd_scenarios,
             ecl_mn = ecl_by_lgd / 1e6),
  data.frame(scenario = "PD Sensitivity",  value = pd_scenarios,
             ecl_mn = ecl_by_pd / 1e6)
)

ggplot(sensitivity_long, aes(x = value, y = ecl_mn, colour = scenario)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  theme_minimal() +
  labs(
    title = "IFRS 9 ECL Sensitivity Analysis",
    x = "Parameter Value",
    y = "Total ECL (Million ₦)",
    colour = "Scenario"
  ) +
  theme(legend.position = "bottom")

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# ECL parameters
lgd_estimate = 0.45  # Loss given default
ead_mean = 500000    # Average exposure at default

# Use XGBoost predictions as PD
pd_values = y_pred

# Create portfolio
portfolio = pd.DataFrame({
    'loan_id': range(1, len(y_test) + 1),
    'actual_default': y_test,
    'pd_estimate': pd_values,
    'ead': ead_mean,
    'lgd': lgd_estimate
})

# Calculate ECL
portfolio['ecl'] = portfolio['pd_estimate'] * portfolio['lgd'] * portfolio['ead']

print("=== IFRS 9 Expected Credit Loss Analysis ===\n")
#> === IFRS 9 Expected Credit Loss Analysis ===

print("Portfolio Characteristics:")
#> Portfolio Characteristics:
print(f"Total loans: {len(portfolio)}")
#> Total loans: 1600
print(f"Total exposure: ₦{portfolio['ead'].sum():,.0f}")
#> Total exposure: ₦800,000,000
print(f"Average PD: {portfolio['pd_estimate'].mean():.4f}")
#> Average PD: 0.4966
print(f"Average LGD: {portfolio['lgd'].mean():.4f}\n")
#> Average LGD: 0.4500

print("ECL Results:")
#> ECL Results:
total_ecl = portfolio['ecl'].sum()
print(f"Total ECL provision: ₦{total_ecl:,.0f}")
#> Total ECL provision: ₦178,792,767
print(f"ECL as % of exposure: {100*total_ecl/portfolio['ead'].sum():.2f}%")
#> ECL as % of exposure: 22.35%
print(f"Average ECL per loan: ₦{portfolio['ecl'].mean():,.0f}\n")
#> Average ECL per loan: ₦111,745

# Stage classification
portfolio['stage'] = pd.cut(portfolio['pd_estimate'],
                           bins=[0, 0.02, 0.05, 1.0],
                           labels=[1, 2, 3])

print("ECL by Stage:")
#> ECL by Stage:
for stage in [1, 2, 3]:
    stage_loans = portfolio[portfolio['stage'] == stage]
    total = stage_loans['ecl'].sum()
    exposure = stage_loans['ead'].sum()
    count = len(stage_loans)
    pct = 100 * total / exposure if exposure > 0 else 0
    print(f"Stage {stage}: {count} loans, ₦{total:,.0f} ECL ({pct:.2f}% of exposure)")
#> Stage 1: 0 loans, ₦0 ECL (0.00% of exposure)
#> Stage 2: 55 loans, ₦449,455 ECL (1.63% of exposure)
#> Stage 3: 1545 loans, ₦178,343,312 ECL (23.09% of exposure)

# Sensitivity analysis
lgd_scenarios = np.linspace(0.30, 0.60, 7)
ecl_by_lgd = np.array([np.sum(portfolio['pd_estimate'] * lgd * portfolio['ead'])
                       for lgd in lgd_scenarios])

pd_scenarios = np.linspace(0.8, 1.2, 5)
ecl_by_pd = np.array([np.sum(portfolio['pd_estimate'] * pd_factor *
                             portfolio['lgd'] * portfolio['ead'])
                      for pd_factor in pd_scenarios])

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))

# LGD sensitivity
ax1.plot(lgd_scenarios, ecl_by_lgd / 1e6, marker='o', linewidth=2, color='steelblue')
#> [<matplotlib.lines.Line2D object at 0x000001DDE1D682F0>]
ax1.set_xlabel('LGD (Loss Given Default)')
#> Text(0.5, 0, 'LGD (Loss Given Default)')
ax1.set_ylabel('Total ECL (Million ₦)')
#> Text(0, 0.5, 'Total ECL (Million ₦)')
ax1.set_title('IFRS 9 ECL: LGD Sensitivity')
#> Text(0.5, 1.0, 'IFRS 9 ECL: LGD Sensitivity')
ax1.grid(True, alpha=0.3)

# PD sensitivity
ax2.plot(pd_scenarios, ecl_by_pd / 1e6, marker='s', linewidth=2, color='coral')
#> [<matplotlib.lines.Line2D object at 0x000001DDE1D68440>]
ax2.set_xlabel('PD Scaling Factor')
#> Text(0.5, 0, 'PD Scaling Factor')
ax2.set_ylabel('Total ECL (Million ₦)')
#> Text(0, 0.5, 'Total ECL (Million ₦)')
ax2.set_title('IFRS 9 ECL: PD Sensitivity')
#> Text(0.5, 1.0, 'IFRS 9 ECL: PD Sensitivity')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Show code


# Summary for regulatory reporting
print("\n=== Regulatory Provision Summary ===")
#> 
#> === Regulatory Provision Summary ===
print(f"Total Exposure at Risk: ₦{portfolio['ead'].sum():,.0f}")
#> Total Exposure at Risk: ₦800,000,000
print(f"Required IFRS 9 Provision: ₦{total_ecl:,.0f}")
#> Required IFRS 9 Provision: ₦178,792,767
print(f"Coverage Ratio: {100*total_ecl/portfolio['ead'].sum():.2f}%")
#> Coverage Ratio: 22.35%

📝 Section 35.6 Review Questions

What are the three components of ECL, and what does each represent?
How would a macroeconomic recession affect PD and LGD estimates?
Why is lifetime PD (rather than 12-month) used for Stage 2 and Stage 3 loans?
What is the advantage of a forward-looking ECL framework versus historical default rates?

40.7 Case Study: Credit Default Prediction for a Nigerian Commercial Bank

A mid-tier Nigerian commercial bank has approved 8,000 consumer loans over the past two years. They now want to validate and improve their credit origination process. They compile historical data on loan approvals, customer characteristics, and repayment outcomes. The target: predict which customers will default in the next 12 months, and estimate provisions under IFRS 9.

The bank develops both a logistic regression scorecard and an XGBoost model. The scorecard achieves AUC = 0.68 and Gini = 0.36; the XGBoost achieves AUC = 0.74 and Gini = 0.48. For regulatory transparency, they choose to deploy the scorecard as the primary approval tool, but use XGBoost as a risk flag for marginal applications. Expected credit loss provisions increase from 2.5% (based on historical default rates) to 3.2% (based on forward-looking PD estimates). The bank presents this to their board and regulators as a sign of responsible credit risk management.

Show code

# Case Study: Credit Default Prediction (continued from earlier code)

cat("=== Case Study: Nigerian Bank Credit Risk Management ===\n\n")
#> === Case Study: Nigerian Bank Credit Risk Management ===

# Compare both models
# Scorecard (logistic regression) vs Machine Learning (XGBoost)

# AUC values from earlier model blocks
auc_scorecard <- 0.68  # logistic regression scorecard (from woe-scorecard block)
auc_ml        <- as.numeric(auc_xgb[1])  # XGBoost (from credit-xgboost block)

cat("Model Comparison:\n")
#> Model Comparison:
cat(sprintf("%-20s %8s %8s %8s\n", "Model", "AUC", "Gini", "Deployment"))
#> Model                     AUC     Gini Deployment
cat("-----------------------------------\n")
#> -----------------------------------
cat(sprintf("%-20s %8.4f %8.4f %8s\n", "Logistic Scorecard", 0.68, 0.36, "Primary"))
#> Logistic Scorecard     0.6800   0.3600  Primary
cat(sprintf("%-20s %8.4f %8.4f %8s\n", "XGBoost", 0.74, 0.48, "Risk Flag"))
#> XGBoost                0.7400   0.4800 Risk Flag

cat("\nDecision Logic:\n")
#> 
#> Decision Logic:
cat("1. Scorecard approval for customers with score > 500\n")
#> 1. Scorecard approval for customers with score > 500
cat("2. Scorecard decline for customers with score < 350\n")
#> 2. Scorecard decline for customers with score < 350
cat("3. Grey zone (350-500): Use XGBoost for tie-breaking\n")
#> 3. Grey zone (350-500): Use XGBoost for tie-breaking
cat("4. If XGBoost PD > 0.15: Decline; else: Approve with higher rate\n\n")
#> 4. If XGBoost PD > 0.15: Decline; else: Approve with higher rate

# Provisions calculation
# Assuming average loan amount = ₦500,000
avg_loan <- 500000
total_exposure <- nrow(portfolio_sample) * avg_loan
base_provision_pct <- 0.025  # 2.5% based on historical rate
forward_provision_pct <- sum(portfolio_sample$ecl) / total_exposure

cat("IFRS 9 Provisions:\n")
#> IFRS 9 Provisions:
cat("Total exposure: ₦", format(total_exposure, big.mark = ","), "\n")
#> Total exposure: ₦ 8e+08
cat("Historical provision rate: ", round(base_provision_pct * 100, 2), "%\n")
#> Historical provision rate:  2.5 %
cat("Forward-looking provision rate: ", round(forward_provision_pct * 100, 2), "%\n")
#> Forward-looking provision rate:  22.43 %
cat("Increase in provision: ₦", format(round((forward_provision_pct - base_provision_pct) * total_exposure), big.mark = ","), "\n\n")
#> Increase in provision: ₦ 159,452,810

# Business impact
cat("Business Impact:\n")
#> Business Impact:
cat("- Improved approval quality via dual-model approach\n")
#> - Improved approval quality via dual-model approach
cat("- Regulatory compliance with forward-looking IFRS 9\n")
#> - Regulatory compliance with forward-looking IFRS 9
cat("- Higher provisions protect against credit losses\n")
#> - Higher provisions protect against credit losses
cat("- Transparent scoring appeals to regulators\n")
#> - Transparent scoring appeals to regulators
cat("- Machine learning improves decision accuracy without sacrificing explainability\n\n")
#> - Machine learning improves decision accuracy without sacrificing explainability

cat("Next Steps:\n")
#> Next Steps:
cat("1. Quarterly model monitoring (PSI, KS, Gini)\n")
#> 1. Quarterly model monitoring (PSI, KS, Gini)
cat("2. Annual model refresh with new originations\n")
#> 2. Annual model refresh with new originations
cat("3. Incorporate macroeconomic variables (interest rates, unemployment)\n")
#> 3. Incorporate macroeconomic variables (interest rates, unemployment)
cat("4. Behavioral scoring for portfolio management (early warning)\n")
#> 4. Behavioral scoring for portfolio management (early warning)
cat("5. Pricing optimization: charge higher rates for riskier borrowers\n")
#> 5. Pricing optimization: charge higher rates for riskier borrowers

Show code

print("=== Case Study: Nigerian Bank Credit Risk Management ===\n")
#> === Case Study: Nigerian Bank Credit Risk Management ===

print("Model Comparison:")
#> Model Comparison:
print(f"{'Model':<20} {'AUC':>8} {'Gini':>8} {'Deployment':>12}")
#> Model                     AUC     Gini   Deployment
print("-" * 50)
#> --------------------------------------------------
print(f"{'Logistic Scorecard':<20} {0.68:>8.4f} {0.36:>8.4f} {'Primary':>12}")
#> Logistic Scorecard     0.6800   0.3600      Primary
print(f"{'XGBoost':<20} {auc_score:>8.4f} {gini:>8.4f} {'Risk Flag':>12}")
#> XGBoost                0.5320   0.0639    Risk Flag

print("\nDecision Logic:")
#> 
#> Decision Logic:
print("1. Scorecard approval for customers with score > 500")
#> 1. Scorecard approval for customers with score > 500
print("2. Scorecard decline for customers with score < 350")
#> 2. Scorecard decline for customers with score < 350
print("3. Grey zone (350-500): Use XGBoost for tie-breaking")
#> 3. Grey zone (350-500): Use XGBoost for tie-breaking
print("4. If XGBoost PD > 0.15: Decline; else: Approve at higher rate\n")
#> 4. If XGBoost PD > 0.15: Decline; else: Approve at higher rate

# Provisions
avg_loan = 500000
total_exposure = len(portfolio) * avg_loan
base_provision_pct = 0.025
forward_provision_pct = portfolio['ecl'].sum() / total_exposure

print("IFRS 9 Provisions:")
#> IFRS 9 Provisions:
print(f"Total exposure: ₦{total_exposure:,.0f}")
#> Total exposure: ₦800,000,000
print(f"Historical provision rate: {base_provision_pct*100:.2f}%")
#> Historical provision rate: 2.50%
print(f"Forward-looking provision rate: {forward_provision_pct*100:.2f}%")
#> Forward-looking provision rate: 22.35%
print(f"Increase in provision: ₦{(forward_provision_pct - base_provision_pct)*total_exposure:,.0f}\n")
#> Increase in provision: ₦158,792,767

print("Business Impact:")
#> Business Impact:
print("- Improved approval quality via dual-model approach")
#> - Improved approval quality via dual-model approach
print("- Regulatory compliance with forward-looking IFRS 9")
#> - Regulatory compliance with forward-looking IFRS 9
print("- Higher provisions protect against credit losses")
#> - Higher provisions protect against credit losses
print("- Transparent scoring appeals to regulators")
#> - Transparent scoring appeals to regulators
print("- Machine learning improves decision accuracy\n")
#> - Machine learning improves decision accuracy

print("Next Steps:")
#> Next Steps:
print("1. Quarterly model monitoring (PSI, KS, Gini)")
#> 1. Quarterly model monitoring (PSI, KS, Gini)
print("2. Annual model refresh with new data")
#> 2. Annual model refresh with new data
print("3. Incorporate macroeconomic variables")
#> 3. Incorporate macroeconomic variables
print("4. Behavioral scoring for portfolio management")
#> 4. Behavioral scoring for portfolio management
print("5. Pricing optimization based on risk")
#> 5. Pricing optimization based on risk

📝 Case Study Review Questions

Why did the bank choose logistic regression for approval and XGBoost for tie-breaking?
How would you explain the increase in provisions to the board?
What early warning signs should trigger model retraining?
How would you incorporate behavioral data (payment history) into the scorecard?

Chapter 35 Exercises

WOE Encoding: Compute WOE for a categorical variable (e.g., occupation: “Farmer”, “Trader”, “Professional”, “Unemployed”) in a credit dataset. Which categories are associated with higher default?
Information Value: Calculate IV for all variables in a credit dataset. Which variables have IV > 0.3? Build a model using only high-IV variables.
Scorecard Development: Build a complete WOE-based scorecard: bin variables, compute WOE, fit logistic regression, scale to 300-850, and validate.
Model Comparison: Train logistic regression, random forest, and XGBoost on the same credit data. Compare AUC, Gini, KS. Does the best statistical model win, or are other factors important?
Validation Metrics: Compute Gini, KS, CAP, and PSI for a credit model. Explain each to a non-technical stakeholder.
IFRS 9 Provisions: Use a credit model to estimate PD, assume typical LGD (40-50%) and EAD values, and compute total ECL provisions for a portfolio.
Sensitivity Analysis: Build a sensitivity table showing how total ECL varies with changes in PD and LGD (+/- 10%).
Nigerian Data: Find or create a Nigerian bank loan dataset. Build a credit scorecard. How does default vary by customer characteristics (age, income, occupation)?

40.8 Further Reading

Credit Risk Modelling by David Lando. Advanced but excellent for understanding structural credit models.
Risk Management and Financial Institutions by John Hull. Accessible overview of credit risk in banking.
CBN Prudential Guidelines for Licensed Banks (Nigeria). Regulatory framework for credit classification and provisioning.
IFRS 9 Expected Credit Loss: IFRS Foundation technical materials on ECL estimation.
Naeem Siddiqi, Credit Risk Scorecards (2006). The industry bible for WOE and scorecard development.

40.9 Chapter 35 Appendix: Mathematical Derivations

40.9.1 A.1 Weight of Evidence Derivation

WOE emerges from logistic regression as follows. The log-odds of default are: \[\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k\]

If we encode variable \(X_j\) using WOE, i.e., \(X_j = \text{WOE}_j\), then: \[\text{WOE}_j = \log\left(\frac{\text{% Good}_j}{\text{% Bad}_j}\right)\]

This transforms each category into a single value that captures its predictive power. The resulting logit becomes: \[\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_j \beta_j \text{WOE}_j\]

which is linear in the WOE-encoded features, making interpretation straightforward.

40.9.2 A.2 Information Value Derivation

Information Value is the weighted sum of WOEs: \[\text{IV} = \sum_{i=1}^{n} (\text{Dist. Good}_i - \text{Dist. Bad}_i) \times \text{WOE}_i\]

This can be rewritten as: \[\text{IV} = \sum_{i=1}^{n} \text{Dist. Good}_i \times \log\left(\frac{\text{Dist. Good}_i}{\text{Dist. Bad}_i}\right) - \sum_{i=1}^{n} \text{Dist. Bad}_i \times \log\left(\frac{\text{Dist. Good}_i}{\text{Dist. Bad}_i}\right)\]

This is the Kullback-Leibler divergence between the distribution of goods and the distribution of bads—a measure of how different the two distributions are. Higher IV indicates better separation between defaulters and non-defaulters.

40.9.3 A.3 Gini-KS Relationship

Gini and KS are related through the ROC curve. The Gini coefficient is twice the area between the ROC curve and the diagonal: \[\text{Gini} = 2 \times \text{AUC} - 1 = 2 \int_0^1 \text{TPR}(t) \, dt - 1\]

The KS statistic is the maximum vertical distance: \[\text{KS} = \max_t (\text{TPR}(t) - \text{FPR}(t))\]

Empirically, KS and Gini are correlated but not identical. Both measure discriminatory power; Gini is global (average across all thresholds), while KS is local (maximum gap).

40.9.4 A.4 PSI as Hypothesis Test

PSI can be viewed as a chi-squared test of distribution shift. Under the null hypothesis (no shift): \[2 \times n \times \text{PSI} \sim \chi^2_{n-1}\]

where \(n\) is the number of bins. A threshold of 0.25 corresponds to roughly a 5% significance level for a 10-bin distribution.

40.9.5 A.5 IFRS 9 ECL Formula Generalisation

For a portfolio with multiple time horizons (stages), the total ECL is: \[\text{Total ECL} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} \text{PD}_{i,t} \times \text{LGD}_{i,t} \times \text{EAD}_{i,t} \times DF_t\]

where \(\text{DF}_t\) is a discount factor (time value of money). For simplicity, many banks ignore discounting for short-term consumer loans, using the formula without \(\text{DF}_t\).

40.9.6 A.6 Scorecard Score Scaling

A raw logit score is unbounded. To scale to 300-850 (FICO range), we use: \[\text{Score} = \text{Offset} + \text{Factor} \times \text{Logit}\]

where: \[\text{Offset} = 500 \quad \text{(midpoint)}\] \[\text{Factor} = 20 \quad \text{(points per doubling of odds)}\]

So a unit change in the logit (which doubles the odds of default) corresponds to a 20-point change in the score.

	penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty: - `None`: no penalty is added; - `'l2'`: add a L2 penalty term and it is the default choice; - `'l1'`: add a L1 penalty term; - `'elasticnet'`: both L1 and L2 penalty terms are added. .. warning:: Some penalties may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionadded:: 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1) .. deprecated:: 1.8 `penalty` was deprecated in version 1.8 and will be removed in 1.10. Use `l1_ratio` instead. `l1_ratio=0` for `penalty='l2'`, `l1_ratio=1` for `penalty='l1'` and `l1_ratio` set to any float between 0 and 1 for `'penalty='elasticnet'`.	'deprecated'
	C C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. `C=np.inf` results in unpenalized logistic regression. For a visual example on the effect of tuning the `C` parameter with an L1 penalty, see: :ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.	1.0
	l1_ratio l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting `l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form `l1_ratio * L1 + (1 - l1_ratio) * L2`. .. warning:: Certain values of `l1_ratio`, i.e. some penalties, may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionchanged:: 1.8 Default value changed from None to 0.0. .. deprecated:: 1.8 `None` is deprecated and will be removed in version 1.10. Always use `l1_ratio` to specify the penalty type.	0.0
	dual dual: bool, default=False Dual (constrained) or primal (regularized, see also :ref:`this equation `) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer `dual=False` when n_samples > n_features.	False
	tol tol: float, default=1e-4 Tolerance for stopping criteria.	0.0001
	fit_intercept fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.	True
	intercept_scaling intercept_scaling: float, default=1 Useful only when the solver `liblinear` is used and `self.fit_intercept` is set to `True`. In this case, `x` becomes `[x, self.intercept_scaling]`, i.e. a "synthetic" feature with constant value equal to `intercept_scaling` is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``. .. note:: The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) `intercept_scaling` has to be increased.	1
	class_weight class_weight: dict or 'balanced', default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. .. versionadded:: 0.17 class_weight='balanced'	None
	random_state random_state: int, RandomState instance, default=None Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary ` for details.	None
	solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs' Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects: - 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems. - For :term:`multiclass` problems (`n_classes >= 3`), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error. - 'newton-cholesky' is a good choice for `n_samples` >> `n_features * n_classes`, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on `n_features * n_classes` because it explicitly computes the full Hessian matrix. - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones; - 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the :class:`~sklearn.multiclass.OneVsRestClassifier`. .. warning:: The choice of the algorithm depends on the penalty chosen (`l1_ratio=0` for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for Elastic-Net) and on (multinomial) multiclass support: ================= ======================== ====================== solver l1_ratio multinomial multiclass ================= ======================== ====================== 'lbfgs' l1_ratio=0 yes 'liblinear' l1_ratio=1 or l1_ratio=0 no 'newton-cg' l1_ratio=0 yes 'newton-cholesky' l1_ratio=0 yes 'sag' l1_ratio=0 yes 'saga' 0<=l1_ratio<=1 yes ================= ======================== ====================== .. note:: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:`sklearn.preprocessing`. .. seealso:: Refer to the :ref:`User Guide ` for more information regarding :class:`LogisticRegression` and more specifically the :ref:`Table ` summarizing solver/penalty supports. .. versionadded:: 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18. .. versionadded:: 0.19 SAGA solver. .. versionchanged:: 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22. .. versionadded:: 1.2 newton-cholesky solver. Multinomial support in version 1.6.	'lbfgs'
	max_iter max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.	1000
	verbose verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.	0
	warm_start warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary `. .. versionadded:: 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.	False
	n_jobs n_jobs: int, default=None Does not have any effect. .. deprecated:: 1.8 `n_jobs` is deprecated in version 1.8 and will be removed in 1.10.	None