41 Fraud Detection

Author

Bongo Adi

📋 Learning Objectives

Understand the fraud landscape in African fintech and banking, with emphasis on Nigerian context
Recognise the class imbalance problem and why standard metrics fail for fraud
Implement rule-based fraud detection systems and understand their limitations
Build supervised machine learning models with cost matrices and SMOTE
Apply unsupervised anomaly detection (Isolation Forest, autoencoders)
Detect fraud rings using graph-based methods
Design real-time scoring systems and monitoring pipelines

41.1 The Fraud Landscape

Fraud is endemic in financial systems, especially in developing economies where identity verification, regulatory oversight, and transaction monitoring may be weaker. In Nigeria, the Nigeria Inter-Bank Settlement System (NIBSS) annual fraud reports document billions of Naira in losses. Common fraud types include:

Payment fraud: unauthorised card transactions, account compromise, phishing.
Identity theft: fraudsters impersonating legitimate customers, using stolen credentials or a stolen BVN (Bank Verification Number).
Loan fraud: false income statements, forged collateral documents, identity misrepresentation.
Money laundering: structuring transactions to evade reporting thresholds, using mules and cash-in agents.
SIM swap fraud: fraudsters taking over phone numbers to reset passwords and steal mobile money.

The financial, reputational, and regulatory costs are substantial. A single data breach can expose a bank to millions in direct losses, regulatory fines (CBN penalties can reach 2% of annual turnover), and loss of customer trust.

The strategic challenge is that fraudsters adapt. A rule that catches a fraud technique today is bypassed tomorrow. Traditional rule-based systems (if transaction amount > ₦500,000 AND unusual_country = TRUE, flag) are rigid and generate many false positives, frustrating customers with legitimate needs. Machine learning models learn patterns of fraud from historical data, but they face an extreme class imbalance: fraud typically comprises 0.1–2% of transactions, sometimes lower in mature markets. A naive classifier that predicts “not fraud” for everything achieves 98% accuracy or better at these rates but misses all frauds. The metrics that matter in fraud detection are precision and recall (and summaries built on them, such as the F1 score and lift), not overall accuracy.

41.2 The Class Imbalance Problem Revisited

Class imbalance is the core challenge in fraud detection. Given a dataset with 99.5% legitimate transactions and 0.5% fraud, a model that always predicts “legitimate” has 99.5% accuracy but fails utterly. Standard metrics (accuracy, and ROC-AUC under extreme imbalance) become misleading. Instead, practitioners use precision, recall, and the F1 score. Precision = (Fraud caught) / (Total flagged) answers: “Of the alerts we send, how many are real fraud?” Recall = (Fraud caught) / (All fraud) answers: “Of the fraud that happened, what fraction did we catch?” There is a trade-off: flagging more aggressively (for example, lowering an amount threshold) catches more fraud (high recall) but also flags many innocent customers (low precision).
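
As a minimal sketch of these definitions, the hypothetical helper below computes precision, recall, and F1 from confusion-matrix counts. The function name and the counts in the usage lines are illustrative assumptions, not figures from any real dataset.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts.

    Hypothetical helper for illustration; returns 0.0 where a ratio is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Illustrative counts: 80 frauds caught, 320 false alarms, 20 frauds missed.
p, r, f1 = precision_recall_f1(tp=80, fp=320, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# The naive "never flag" classifier: no alerts, every fraud missed.
print(precision_recall_f1(tp=0, fp=0, fn=100))
```

In the first case recall looks healthy while only one alert in five is real fraud, which is the kind of imbalance between the two metrics that accuracy alone would hide.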

The cost matrix formalises this trade-off. Suppose a false positive (flagging a legitimate transaction) costs ₦1,000 in customer friction and operational overhead, while a false negative (missing fraud) costs ₦50,000 in direct losses, chargebacks, and regulatory provisions. Catching 100 frauds at the cost of 1,000 false positives is then a profitable trade: ₦5,000,000 in avoided losses against ₦1,000,000 in friction. The optimal threshold is not 0.5; it is wherever the total expected cost is minimised: \(\text{Cost} = FN \times C_{\text{FN}} + FP \times C_{\text{FP}}\), where \(FN\) and \(FP\) are the numbers of missed frauds and false positives at that threshold. Methods like SMOTE (Synthetic Minority Over-sampling Technique) generate synthetic fraud samples to balance the classes, which can help standard algorithms learn the minority class. Weighted loss functions apply a higher penalty to misclassifying fraud.
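
The arithmetic behind the "100 frauds for 1,000 false positives" trade can be checked directly. The sketch below uses the illustrative costs from the text (₦50,000 per missed fraud, ₦1,000 per false positive); the function and variable names are hypothetical.

```python
COST_FN = 50_000  # illustrative cost of one missed fraud (₦), from the text
COST_FP = 1_000   # illustrative cost of one false positive (₦), from the text


def total_cost(fn: int, fp: int) -> int:
    """Total cost in Naira of a set of decisions: missed frauds plus false alarms."""
    return fn * COST_FN + fp * COST_FP


# Option A: raise no alerts, so 100 frauds slip through.
cost_no_alerts = total_cost(fn=100, fp=0)
# Option B: catch those 100 frauds, at the price of 1,000 false positives.
cost_with_alerts = total_cost(fn=0, fp=1_000)

print(f"Without alerts: ₦{cost_no_alerts:,}")
print(f"With alerts:    ₦{cost_with_alerts:,}")
print(f"Net saving:     ₦{cost_no_alerts - cost_with_alerts:,}")

# Break-even: alerting pays while false positives per fraud caught stay below this.
print(f"Break-even false positives per fraud caught: {COST_FN // COST_FP}")
```

Under these assumed costs, alerting remains worthwhile up to 50 false positives per fraud caught; a bank with different friction or loss figures would get a different break-even point.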

📘 Theory: Class Imbalance and Cost-Sensitive Learning

🔑 Key Formula

Precision and Recall (for fraud): \[\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}\]

F1 Score (harmonic mean): \[F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\]

Cost-Weighted Classification: \[\theta^{*} = \arg\min_\theta \left[ FN(\theta) \times C_{\text{FN}} + FP(\theta) \times C_{\text{FP}} \right]\]

where \(FN(\theta)\) and \(FP(\theta)\) are the numbers of missed frauds and false positives at threshold \(\theta\), and \(C_{\text{FN}}\) and \(C_{\text{FP}}\) are the costs of a missed fraud and a false positive.

# Demonstrate the class imbalance problem in fraud detection

set.seed(4738)

# Generate synthetic NIBSS-style transaction data
n_transactions <- 500000
fraud_rate <- 0.003  # 0.3% fraud
n_fraud <- as.integer(n_transactions * fraud_rate)
n_legitimate <- n_transactions - n_fraud

# Legitimate transactions: smaller amounts, regular patterns
legitimate_amount <- rnorm(n_legitimate, mean = 50000, sd = 30000)
legitimate_transactions <- data.frame(
  amount = legitimate_amount,
  is_fraud = 0
)

# Fraud transactions: tend to be larger, at unusual times
fraud_amount <- rnorm(n_fraud, mean = 300000, sd = 150000)
fraud_transactions <- data.frame(
  amount = fraud_amount,
  is_fraud = 1
)

# Combine
transactions <- rbind(legitimate_transactions, fraud_transactions)

cat("=== Class Imbalance Problem in Fraud Detection ===\n\n")
#> === Class Imbalance Problem in Fraud Detection ===
cat("Dataset:\n")
#> Dataset:
cat("Total transactions:", nrow(transactions), "\n")
#> Total transactions: 500000
cat("Legitimate transactions:", sum(transactions$is_fraud == 0), "(",
    round(100 * mean(transactions$is_fraud == 0), 2), "%)\n")
#> Legitimate transactions: 498500 ( 99.7 %)
cat("Fraud transactions:", sum(transactions$is_fraud == 1), "(",
    round(100 * mean(transactions$is_fraud == 1), 2), "%)\n\n")
#> Fraud transactions: 1500 ( 0.3 %)

# Naive classifier: always predict "legitimate"
naive_accuracy <- mean(transactions$is_fraud == 0)

cat("Naive Classifier (always predicts 'Not Fraud'):\n")
#> Naive Classifier (always predicts 'Not Fraud'):
cat("Accuracy:", round(naive_accuracy, 4), "\n")
#> Accuracy: 0.997
cat("Recall (catches fraud):", 0, "\n")
#> Recall (catches fraud): 0
cat("Precision:", "Undefined (no positive predictions)\n")
#> Precision: Undefined (no positive predictions)
cat("Usefulness: USELESS (misses all fraud)\n\n")
#> Usefulness: USELESS (misses all fraud)

# A better classifier: threshold-based on amount
threshold <- 150000
predicted_fraud <- as.numeric(transactions$amount > threshold)

accuracy <- mean(predicted_fraud == transactions$is_fraud)
precision <- sum(predicted_fraud == 1 & transactions$is_fraud == 1) / sum(predicted_fraud == 1)
recall <- sum(predicted_fraud == 1 & transactions$is_fraud == 1) / sum(transactions$is_fraud == 1)
f1 <- 2 * precision * recall / (precision + recall)

cat("Amount > ₦150,000 Rule:\n")
#> Amount > ₦150,000 Rule:
cat("Accuracy:", round(accuracy, 4), "\n")
#> Accuracy: 0.9991
cat("Precision:", round(precision, 4), "->", round(100 * precision, 2), "% of flagged are actual fraud\n")
#> Precision: 0.8504 -> 85.04 % of flagged are actual fraud
cat("Recall:", round(recall, 4), "->", round(100 * recall, 2), "% of fraud caught\n")
#> Recall: 0.8453 -> 84.53 % of fraud caught
cat("F1-Score:", round(f1, 4), "\n\n")
#> F1-Score: 0.8479

# Cost matrix: FN costs ₦50,000, FP costs ₦1,000
cost_fn <- 50000
cost_fp <- 1000

# Calculate expected cost for different thresholds
thresholds <- seq(50000, 400000, by = 25000)
costs <- numeric(length(thresholds))
recalls <- numeric(length(thresholds))
precisions <- numeric(length(thresholds))

for (i in seq_along(thresholds)) {
  pred <- as.numeric(transactions$amount > thresholds[i])
  tp <- sum(pred == 1 & transactions$is_fraud == 1)
  fp <- sum(pred == 1 & transactions$is_fraud == 0)
  fn <- sum(pred == 0 & transactions$is_fraud == 1)

  # Expected cost
  costs[i] <- fn * cost_fn + fp * cost_fp

  precisions[i] <- tp / max(tp + fp, 1)
  recalls[i] <- tp / (tp + fn)
}

# Plot
library(ggplot2)

cost_df <- data.frame(
  threshold = thresholds,
  total_cost = costs / 1e9,  # Billions of Naira
  recall = recalls,
  precision = precisions
)

ggplot(cost_df, aes(x = threshold / 1000, y = total_cost)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  theme_minimal() +
  labs(
    title = "Fraud Detection: Cost vs Threshold",
    x = "Amount Threshold (₦ thousands)",
    y = "Total Expected Cost (Billion ₦)"
  ) +
  ylim(0, max(cost_df$total_cost))

# Optimal threshold
optimal_idx <- which.min(costs)
optimal_threshold <- thresholds[optimal_idx]

cat("Optimal Threshold (minimising total cost):", optimal_threshold, "₦\n")
#> Optimal Threshold (minimising total cost): 150000 ₦
cat("Expected cost:", format(round(costs[optimal_idx]), big.mark = ","), "₦\n")
#> Expected cost: 11,823,000 ₦
cat("Recall at optimal:", round(recalls[optimal_idx], 4), "\n")
#> Recall at optimal: 0.8453
cat("Precision at optimal:", round(precisions[optimal_idx], 4), "\n")
#> Precision at optimal: 0.8504

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(4738)

# Generate NIBSS-style transaction data
n_transactions = 500000
fraud_rate = 0.003
n_fraud = int(n_transactions * fraud_rate)
n_legitimate = n_transactions - n_fraud

# Legitimate transactions
legit_amount = np.random.normal(50000, 30000, n_legitimate)
legit_df = pd.DataFrame({'amount': legit_amount, 'is_fraud': 0})

# Fraud transactions (larger amounts)
fraud_amount = np.random.normal(300000, 150000, n_fraud)
fraud_df = pd.DataFrame({'amount': fraud_amount, 'is_fraud': 1})

transactions = pd.concat([legit_df, fraud_df], ignore_index=True)

print("=== Class Imbalance Problem in Fraud Detection ===\n")
#> === Class Imbalance Problem in Fraud Detection ===
print("Dataset:")
#> Dataset:
print(f"Total transactions: {len(transactions)}")
#> Total transactions: 500000
print(f"Legitimate: {(transactions['is_fraud'] == 0).sum()} ({100*(transactions['is_fraud']==0).mean():.2f}%)")
#> Legitimate: 498500 (99.70%)
print(f"Fraud: {(transactions['is_fraud'] == 1).sum()} ({100*(transactions['is_fraud']==1).mean():.2f}%)\n")
#> Fraud: 1500 (0.30%)

# Naive classifier
naive_accuracy = (transactions['is_fraud'] == 0).mean()
print("Naive Classifier (always 'Not Fraud'):")
#> Naive Classifier (always 'Not Fraud'):
print(f"Accuracy: {naive_accuracy:.4f}")
#> Accuracy: 0.9970
print(f"Recall: 0 (catches 0% of fraud)")
#> Recall: 0 (catches 0% of fraud)
print(f"Usefulness: USELESS\n")
#> Usefulness: USELESS

# Amount threshold rule
def evaluate_threshold(data, threshold):
    pred = (data['amount'] > threshold).astype(int)
    tp = ((pred == 1) & (data['is_fraud'] == 1)).sum()
    fp = ((pred == 1) & (data['is_fraud'] == 0)).sum()
    fn = ((pred == 0) & (data['is_fraud'] == 1)).sum()

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

    return precision, recall, f1, tp, fp, fn

threshold = 150000
precision, recall, f1, tp, fp, fn = evaluate_threshold(transactions, threshold)

print(f"Amount > ₦{threshold} Rule:")
#> Amount > ₦150000 Rule:
print(f"Precision: {precision:.4f} ({100*precision:.2f}% of flagged are fraud)")
#> Precision: 0.8529 (85.29% of flagged are fraud)
print(f"Recall: {recall:.4f} ({100*recall:.2f}% of fraud caught)")
#> Recall: 0.8507 (85.07% of fraud caught)
print(f"F1-Score: {f1:.4f}\n")
#> F1-Score: 0.8518

# Cost analysis
cost_fn = 50000
cost_fp = 1000

thresholds = np.linspace(50000, 400000, 15)
costs = []
recalls_list = []
precisions_list = []

for t in thresholds:
    prec, rec, _, tp, fp, fn = evaluate_threshold(transactions, t)
    cost = fn * cost_fn + fp * cost_fp
    costs.append(cost)
    recalls_list.append(rec)
    precisions_list.append(prec)

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Cost vs threshold
axes[0].plot(thresholds/1000, np.array(costs)/1e9, marker='o', linewidth=2)
axes[0].set_xlabel('Amount Threshold (₦ thousands)')
axes[0].set_ylabel('Total Expected Cost (Billion ₦)')
axes[0].set_title('Expected Cost vs Threshold')
axes[0].grid(True, alpha=0.3)

# Precision-recall trade-off
axes[1].plot(recalls_list, precisions_list, marker='s', linewidth=2)
axes[1].set_xlabel('Recall (% of fraud caught)')
axes[1].set_ylabel('Precision (% of alerts that are real fraud)')
axes[1].set_title('Precision-Recall Trade-off')
axes[1].grid(True, alpha=0.3)

# Amount distributions
axes[2].hist(legit_df['amount'], bins=50, alpha=0.7, label='Legitimate', color='green', density=True)
axes[2].hist(fraud_df['amount'], bins=50, alpha=0.7, label='Fraud', color='red', density=True)
axes[2].axvline(x=150000, color='black', linestyle='--', label='Threshold')
axes[2].set_xlabel('Transaction Amount (₦)')
axes[2].set_ylabel('Density')
axes[2].set_title('Amount Distribution: Legitimate vs Fraud')
axes[2].legend()

plt.tight_layout()
plt.show()

# Optimal threshold
optimal_idx = np.argmin(costs)
optimal_threshold = thresholds[optimal_idx]
print(f"Optimal Threshold (min cost): ₦{optimal_threshold:,.0f}")
#> Optimal Threshold (min cost): ₦150,000
print(f"Expected cost: ₦{costs[optimal_idx]:,.0f}")
#> Expected cost: ₦11,420,000
print(f"Recall: {recalls_list[optimal_idx]:.4f}, Precision: {precisions_list[optimal_idx]:.4f}")
#> Recall: 0.8507, Precision: 0.8529

📝 Section 41.2 Review Questions

Why is accuracy misleading as a metric for fraud detection?
What is the precision-recall trade-off, and how do we choose the optimal balance?
How would you estimate the cost of a false positive and false negative in fraud?
What is SMOTE, and how does it help with imbalanced datasets?

41.3 Rule-Based Systems

Historically, fraud detection relied entirely on rules. An expert (domain specialist) observes fraud patterns and translates them into Boolean logic: if condition_1 AND condition_2 AND condition_3, flag as suspicious. Examples:

If amount > ₦1,000,000 AND unusual_country = TRUE AND new_card = TRUE, flag.
If velocity (transactions per hour) > 10, flag.
If distance_from_home > 1,000 km AND amount > median, flag.

Rules are transparent, explainable, and immediately actionable by analysts. A transaction flagged by a clear rule can be quickly reviewed. Within the patterns they encode, rules can also achieve zero false negatives: if a rule perfectly captures a fraud technique, every such fraud is caught. However, rules are brittle. Fraudsters observe rules and adapt. If the rule is “unusual_country = TRUE,” fraudsters stop using foreign cards and instead test stolen cards domestically. Rules also generate many false positives, overwhelming analysts. A typical bank’s rule set may flag 10,000 transactions a day as suspicious, requiring a team of 50+ analysts to review them.

A modern approach combines rules and machine learning: rules handle obvious, known patterns; machine learning flags suspicious patterns not captured by rules. For example, a rule catches velocity fraud; machine learning catches subtle deviations in amount, time-of-day, or merchant patterns.
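
A layered decision of this kind might be sketched as follows. The velocity limit of 10 per hour comes from the example rules above; the `Transaction` class, the `decide` function, the score threshold of 0.8, and the assumption that a model score is already available are all hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float       # transaction amount in Naira
    tx_last_hour: int   # transactions on the account in the past hour
    model_score: float  # fraud probability from some trained model (assumed given)


VELOCITY_LIMIT = 10     # rule threshold from the text: more than 10 per hour
SCORE_THRESHOLD = 0.8   # hypothetical model cut-off; in practice tuned on cost


def decide(tx: Transaction) -> str:
    """Return 'rule', 'model', or 'pass' depending on which layer flags the transaction."""
    # Layer 1: a known pattern, handled by a transparent rule.
    if tx.tx_last_hour > VELOCITY_LIMIT:
        return "rule"
    # Layer 2: anything the rules miss is left to the model score.
    if tx.model_score >= SCORE_THRESHOLD:
        return "model"
    return "pass"


examples = [
    Transaction(amount=20_000, tx_last_hour=14, model_score=0.10),
    Transaction(amount=480_000, tx_last_hour=1, model_score=0.93),
    Transaction(amount=35_000, tx_last_hour=2, model_score=0.05),
]
decisions = [decide(tx) for tx in examples]
print(decisions)
```

Recording which layer raised the alert keeps the rule-based flags explainable to analysts while still letting the model cover patterns no rule describes.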

📘 Theory: Rule-Based vs Statistical Fraud Detection

🔑 Key Formula

Simple Rule: If \(\sum_{i=1}^{k} \mathbb{1}[\text{Condition}_i] \geq m\), flag as fraud.

Weighted Rule: If \(\sum_{i=1}^{k} w_i \times \mathbb{1}[\text{Condition}_i] \geq \text{threshold}\), flag.

Velocity Rule: If \(\frac{\text{number of transactions in past hour}}{\text{typical hourly rate}} > 10\), flag.
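
The weighted rule and a simple velocity check can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, the weights and threshold mirror the ones used in this chapter's example code, and velocity is taken as a raw count of transactions in the past hour (the "more than 10 per hour" rule listed earlier) rather than a ratio to a typical rate.

```python
from datetime import datetime, timedelta


def weighted_rule_score(conditions: dict[str, bool], weights: dict[str, float]) -> float:
    """Sum the weights of every condition that fired (the weighted-rule formula)."""
    return sum(weights[name] for name, fired in conditions.items() if fired)


def count_in_past_hour(timestamps: list[datetime], now: datetime) -> int:
    """Number of transactions in the window (now - 1 hour, now]."""
    window_start = now - timedelta(hours=1)
    return sum(1 for t in timestamps if window_start < t <= now)


# Illustrative weights and threshold.
weights = {"high_velocity": 30, "new_card_foreign": 25, "unusual_time": 10, "distance_far": 15}
threshold = 30

now = datetime(2024, 1, 15, 14, 0)
# Twelve transactions spaced four minutes apart, the latest at `now`.
history = [now - timedelta(minutes=4 * i) for i in range(12)]

velocity = count_in_past_hour(history, now)
conditions = {
    "high_velocity": velocity > 10,
    "new_card_foreign": False,
    "unusual_time": True,
    "distance_far": False,
}
score = weighted_rule_score(conditions, weights)
print(velocity, score, score >= threshold)
```

Because the velocity weight alone equals the threshold, any high-velocity transaction is flagged regardless of the other conditions; lowering that weight would require a second signal before an alert is raised.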

# Rule-based fraud detection system

set.seed(8527)

# Generate synthetic transaction data with multiple features
n_trans <- 50000
data <- data.frame(
  tx_id = 1:n_trans,
  amount = rnorm(n_trans, mean = 50000, sd = 40000),
  is_new_card = sample(c(0, 1), n_trans, replace = TRUE, prob = c(0.95, 0.05)),
  is_foreign_merchant = sample(c(0, 1), n_trans, replace = TRUE, prob = c(0.98, 0.02)),
  is_unusual_time = sample(c(0, 1), n_trans, replace = TRUE, prob = c(0.95, 0.05)),
  is_high_velocity = sample(c(0, 1), n_trans, replace = TRUE, prob = c(0.99, 0.01)),
  is_distance_far = sample(c(0, 1), n_trans, replace = TRUE, prob = c(0.96, 0.04))
)

# True fraud: happens in combinations
# Fraud rule: (new_card OR foreign_merchant) AND (high_velocity OR unusual_time)
# Plus some random fraud
fraud_prob <- (
  ((data$is_new_card + data$is_foreign_merchant > 0) &
   (data$is_high_velocity + data$is_unusual_time > 0)) * 0.3 +
  (data$is_distance_far * 0.1) +
  0.003  # Base fraud rate
)
fraud_prob <- pmin(fraud_prob, 1)
data$is_fraud <- as.numeric(runif(n_trans) < fraud_prob)

cat("=== Rule-Based Fraud Detection System ===\n\n")
#> === Rule-Based Fraud Detection System ===

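# Rule 1: Amount
# With amount ~ N(50,000, 40,000), ₦300,000 lies 6.25 standard deviations above
# the mean, so this rule flags nothing here and its precision is undefined (NaN).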
rule1 <- data$amount > 300000
cat("Rule 1 (Amount > ₦300,000):\n")
#> Rule 1 (Amount > ₦300,000):
cat("  Flagged:", sum(rule1), "transactions\n")
#>   Flagged: 0 transactions
cat("  Fraud caught:", sum(rule1 & data$is_fraud), "\n")
#>   Fraud caught: 0
cat("  Precision:", round(sum(rule1 & data$is_fraud) / sum(rule1), 4), "\n\n")
#>   Precision: NaN

# Rule 2: Velocity
rule2 <- data$is_high_velocity == 1
cat("Rule 2 (High velocity):\n")
#> Rule 2 (High velocity):
cat("  Flagged:", sum(rule2), "\n")
#>   Flagged: 475
cat("  Fraud caught:", sum(rule2 & data$is_fraud), "\n")
#>   Fraud caught: 12
cat("  Precision:", round(sum(rule2 & data$is_fraud) / sum(rule2), 4), "\n\n")
#>   Precision: 0.0253

# Rule 3: New card + foreign merchant
rule3 <- (data$is_new_card == 1) & (data$is_foreign_merchant == 1)
cat("Rule 3 (New card + Foreign merchant):\n")
#> Rule 3 (New card + Foreign merchant):
cat("  Flagged:", sum(rule3), "\n")
#>   Flagged: 57
cat("  Fraud caught:", sum(rule3 & data$is_fraud), "\n")
#>   Fraud caught: 2
cat("  Precision:", round(sum(rule3 & data$is_fraud) / sum(rule3), 4), "\n\n")
#>   Precision: 0.0351

# Combined rule (OR: flag if ANY rule triggers)
combined_or <- rule1 | rule2 | rule3
cat("Combined Rule (OR logic - flag if ANY rule):\n")
#> Combined Rule (OR logic - flag if ANY rule):
cat("  Flagged:", sum(combined_or), "\n")
#>   Flagged: 531
cat("  Fraud caught:", sum(combined_or & data$is_fraud), "\n")
#>   Fraud caught: 13
cat("  Recall:", round(sum(combined_or & data$is_fraud) / sum(data$is_fraud), 4), "\n")
#>   Recall: 0.032
cat("  Precision:", round(sum(combined_or & data$is_fraud) / sum(combined_or), 4), "\n")
#>   Precision: 0.0245
cat("  False positives:", sum(combined_or & data$is_fraud == 0), "\n\n")
#>   False positives: 518

# Weighted rule
weighted_score <- (
  rule1 * 20 +  # High suspicion
  rule2 * 30 +
  rule3 * 25 +
  (data$is_unusual_time == 1) * 10 +
  (data$is_distance_far == 1) * 15
)

threshold <- 30
weighted_flag <- weighted_score >= threshold

cat("Weighted Rule (score >= 30):\n")
#> Weighted Rule (score >= 30):
cat("  Flagged:", sum(weighted_flag), "\n")
#>   Flagged: 479
cat("  Fraud caught:", sum(weighted_flag & data$is_fraud), "\n")
#>   Fraud caught: 12
cat("  Recall:", round(sum(weighted_flag & data$is_fraud) / sum(data$is_fraud), 4), "\n")
#>   Recall: 0.0296
cat("  Precision:", round(sum(weighted_flag & data$is_fraud) / sum(weighted_flag), 4), "\n")
#>   Precision: 0.0251
cat("  False positives:", sum(weighted_flag & data$is_fraud == 0), "\n")
#>   False positives: 467

import numpy as np
import pandas as pd

np.random.seed(8527)

# Generate transaction data
n_trans = 50000
data = pd.DataFrame({
    'tx_id': range(1, n_trans + 1),
    'amount': np.random.normal(50000, 40000, n_trans),
    'is_new_card': np.random.choice([0, 1], n_trans, p=[0.95, 0.05]),
    'is_foreign_merchant': np.random.choice([0, 1], n_trans, p=[0.98, 0.02]),
    'is_unusual_time': np.random.choice([0, 1], n_trans, p=[0.95, 0.05]),
    'is_high_velocity': np.random.choice([0, 1], n_trans, p=[0.99, 0.01]),
    'is_distance_far': np.random.choice([0, 1], n_trans, p=[0.96, 0.04])
})

# True fraud patterns
fraud_prob = (
    ((data['is_new_card'] + data['is_foreign_merchant'] > 0) &
     (data['is_high_velocity'] + data['is_unusual_time'] > 0)) * 0.3 +
    (data['is_distance_far'] * 0.1) +
    0.003
)
fraud_prob = np.clip(fraud_prob, 0, 1)
data['is_fraud'] = (np.random.rand(n_trans) < fraud_prob).astype(int)

print("=== Rule-Based Fraud Detection System ===\n")
#> === Rule-Based Fraud Detection System ===

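# Rule 1: Amount
# With amount ~ N(50,000, 40,000), ₦300,000 lies 6.25 standard deviations above
# the mean, so this rule flags nothing here and its precision is undefined (nan).
rule1 = data['amount'] > 300000
print("Rule 1 (Amount > ₦300,000):")
#> Rule 1 (Amount > ₦300,000):
print(f"  Flagged: {rule1.sum()}")
#>   Flagged: 0
print(f"  Fraud caught: {(rule1 & (data['is_fraud'] == 1)).sum()}")
#>   Fraud caught: 0
# Guard the division so an empty rule yields nan without a runtime warning
precision1 = (rule1 & (data['is_fraud'] == 1)).sum() / rule1.sum() if rule1.sum() > 0 else float('nan')
print(f"  Precision: {precision1:.4f}\n")
#>   Precision: nan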

# Rule 2: Velocity
rule2 = data['is_high_velocity'] == 1
print("Rule 2 (High velocity):")
#> Rule 2 (High velocity):
print(f"  Flagged: {rule2.sum()}")
#>   Flagged: 546
print(f"  Fraud caught: {(rule2 & (data['is_fraud'] == 1)).sum()}")
#>   Fraud caught: 20
precision2 = (rule2 & (data['is_fraud'] == 1)).sum() / rule2.sum() if rule2.sum() > 0 else 0
print(f"  Precision: {precision2:.4f}\n")
#>   Precision: 0.0366

# Rule 3: New card + foreign merchant
rule3 = (data['is_new_card'] == 1) & (data['is_foreign_merchant'] == 1)
print("Rule 3 (New card + Foreign merchant):")
#> Rule 3 (New card + Foreign merchant):
print(f"  Flagged: {rule3.sum()}")
#>   Flagged: 47
print(f"  Fraud caught: {(rule3 & (data['is_fraud'] == 1)).sum()}")
#>   Fraud caught: 1
precision3 = (rule3 & (data['is_fraud'] == 1)).sum() / rule3.sum() if rule3.sum() > 0 else 0
print(f"  Precision: {precision3:.4f}\n")
#>   Precision: 0.0213

# Combined OR
combined = rule1 | rule2 | rule3
fraud_caught = (combined & (data['is_fraud'] == 1)).sum()
recall = fraud_caught / (data['is_fraud'] == 1).sum()
precision = fraud_caught / combined.sum() if combined.sum() > 0 else 0
print(f"Combined Rule (OR - flag if ANY):")
#> Combined Rule (OR - flag if ANY):
print(f"  Flagged: {combined.sum()}")
#>   Flagged: 593
print(f"  Fraud caught: {fraud_caught}")
#>   Fraud caught: 21
print(f"  Recall: {recall:.4f}, Precision: {precision:.4f}")
#>   Recall: 0.0528, Precision: 0.0354
print(f"  False positives: {(combined & (data['is_fraud'] == 0)).sum()}\n")
#>   False positives: 572

# Weighted rule
weighted_score = (
    rule1.astype(int) * 20 +
    rule2.astype(int) * 30 +
    rule3.astype(int) * 25 +
    (data['is_unusual_time'] * 10) +
    (data['is_distance_far'] * 15)
)

weighted_flag = weighted_score >= 30
fraud_caught_w = (weighted_flag & (data['is_fraud'] == 1)).sum()
recall_w = fraud_caught_w / (data['is_fraud'] == 1).sum()
precision_w = fraud_caught_w / weighted_flag.sum() if weighted_flag.sum() > 0 else 0

print(f"Weighted Rule (score >= 30):")
#> Weighted Rule (score >= 30):
print(f"  Flagged: {weighted_flag.sum()}")
#>   Flagged: 549
print(f"  Fraud caught: {fraud_caught_w}")
#>   Fraud caught: 21
print(f"  Recall: {recall_w:.4f}, Precision: {precision_w:.4f}")
#>   Recall: 0.0528, Precision: 0.0383
print(f"  False positives: {(weighted_flag & (data['is_fraud'] == 0)).sum()}")
#>   False positives: 528

📝 Section 41.3 Review Questions

What is the advantage of rule-based fraud detection? What is the disadvantage?
How would you design a weighted rule system that balances coverage and precision?
Why might rules generate high false positives?
How would you adapt a rule system if fraudsters learn to evade it?

41.4 Supervised Fraud Models

Machine learning models trained on historical fraud data can capture patterns more flexibly than hand-coded rules. The challenge is handling the class imbalance. Standard logistic regression or decision trees, trained on 99.5% non-fraud data, learn that “non-fraud” is the safe default; they rarely predict fraud even when features suggest it.

Solutions include: (1) Class weighting: Penalise fraud misclassification (false negatives) more heavily in the loss function. A penalty ratio of 200:1 (cost of missing fraud : cost of false alarm) is common. (2) SMOTE (Synthetic Minority Over-sampling Technique): Generate synthetic fraud samples using nearest neighbours in feature space. If a real fraud has features \(\mathbf{x}_1\) and one of its nearest neighbours among the fraud cases has \(\mathbf{x}_2\), create a synthetic sample at \(\mathbf{x}_1 + \alpha(\mathbf{x}_2 - \mathbf{x}_1)\) for random \(\alpha \in [0,1]\). This inflates the fraud class without exact duplication. (3) Threshold tuning: Train on balanced or weighted data, then adjust the decision threshold post hoc to maximise the F1 score or a business metric (expected cost).
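
The SMOTE interpolation step can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the reference algorithm: the function name `smote_samples` is hypothetical, the toy fraud points are made up, and features are used unscaled for brevity. In practice a tested implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used, and applied to the training split only.

```python
import math
import random


def smote_samples(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x1 = rng.choice(minority)
        # k nearest minority neighbours of x1 (excluding x1 itself), by Euclidean distance.
        others = [p for p in minority if p is not x1]
        neighbours = sorted(others, key=lambda p: math.dist(x1, p))[:k]
        x2 = rng.choice(neighbours)
        alpha = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + alpha * (b - a) for a, b in zip(x1, x2)])
    return synthetic


# Toy fraud cases with two features (amount in ₦ thousands, hour of day); values are made up.
fraud = [[300.0, 2.0], [320.0, 3.0], [280.0, 1.0], [450.0, 23.0], [310.0, 2.5]]
new_points = smote_samples(fraud, n_new=4)
for point in new_points:
    print([round(v, 1) for v in point])
```

Every synthetic point lies on a line segment between two real fraud cases, so it cannot fall outside the range the observed frauds already span; that is also why SMOTE adds no information about fraud patterns absent from the training data.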

XGBoost handles class weighting well via the scale_pos_weight parameter. After training, we don’t use threshold 0.5; we choose a threshold that maximises expected profit given the cost matrix.
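
Choosing the cut-off by cost rather than by default can be sketched as below. The scores and labels are made-up stand-ins for a fitted model's validation output, the costs are the illustrative ₦50,000 and ₦1,000 used earlier in the chapter, and both function names are hypothetical.

```python
def expected_cost(scores, labels, threshold, cost_fn=50_000, cost_fp=1_000):
    """Total cost of flagging every score >= threshold (costs are illustrative)."""
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return fn * cost_fn + fp * cost_fp


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: expected_cost(scores, labels, t))


# Made-up validation scores from some fitted model, with true labels (1 = fraud).
scores = [0.02, 0.05, 0.10, 0.15, 0.32, 0.37, 0.40, 0.55, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
t_star = best_threshold(scores, labels, candidates)
print(f"Cost-minimising threshold: {t_star:.2f}")
print(f"Cost at 0.50:              ₦{expected_cost(scores, labels, 0.5):,}")
print(f"Cost at chosen threshold:  ₦{expected_cost(scores, labels, t_star):,}")
```

In this toy data the default 0.5 cut-off misses the fraud scored 0.37, and that single miss costs far more than the two extra false positives accepted at the lower threshold. The threshold should be chosen on a validation set separate from the final test set, otherwise the reported performance is optimistic.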

📘 Theory: Class Balancing for Fraud

🔑 Key Formula

SMOTE: Generate Synthetic Minority Samples \[\mathbf{x}_{\text{synthetic}} = \mathbf{x}_i + \lambda (\mathbf{x}_{k-\text{NN}} - \mathbf{x}_i), \quad \lambda \sim U(0, 1)\]

Weighted Loss for Class Imbalance: \[L = w_0 \sum_{i: y_i = 0} \ell(y_i, \hat{y}_i) + w_1 \sum_{i: y_i = 1} \ell(y_i, \hat{y}_i)\]

Typical: \(w_1 / w_0 = N_{\text{neg}} / N_{\text{pos}}\), the ratio of the number of negatives to the number of positives.

XGBoost scale_pos_weight: \[\text{scale\_pos\_weight} = w_1 / w_0\]
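
To see what the weights do, the sketch below evaluates a class-weighted log loss on a tiny made-up example. The function name is hypothetical, and log loss is assumed as the per-sample loss \(\ell\); the formula above does not fix a particular choice.

```python
import math


def weighted_log_loss(y_true, y_prob, w_neg=1.0, w_pos=1.0):
    """Class-weighted binary cross-entropy: w_0 * (loss on negatives) + w_1 * (loss on positives)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        if y == 1:
            total += w_pos * -math.log(p)
        else:
            total += w_neg * -math.log(1.0 - p)
    return total


# Made-up labels: 2 frauds among 10 transactions, and a model that scores everything low.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.1] * 10

n_neg = y_true.count(0)
n_pos = y_true.count(1)
scale_pos_weight = n_neg / n_pos  # ratio of negatives to positives

unweighted = weighted_log_loss(y_true, y_prob)
weighted = weighted_log_loss(y_true, y_prob, w_neg=1.0, w_pos=scale_pos_weight)
print(f"scale_pos_weight = {scale_pos_weight:.1f}")
print(f"unweighted loss  = {unweighted:.3f}")
print(f"weighted loss    = {weighted:.3f}")
```

A model that scores every transaction as low risk looks tolerable under the unweighted loss, but the weighted loss multiplies its error on the two frauds, which is the pressure that pushes training towards recognising the minority class.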

# Supervised models for fraud detection with class balancing

library(xgboost)
library(caret)

# Reuse the synthetic `data` frame built in the rule-based example (Section 41.3)
set.seed(6384)

# Prepare features
features <- c("amount", "is_new_card", "is_foreign_merchant",
              "is_unusual_time", "is_high_velocity", "is_distance_far")
X <- as.matrix(data[, features])

# Standardize
X <- scale(X)

y <- data$is_fraud

# Train-test split
train_idx <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_test <- X[-train_idx, ]
y_test <- y[-train_idx]

cat("=== Supervised Fraud Detection ===\n\n")
#> === Supervised Fraud Detection ===
cat("Training set fraud rate:", round(mean(y_train), 4), "\n")
#> Training set fraud rate: 0.0081
cat("Test set fraud rate:", round(mean(y_test), 4), "\n\n")
#> Test set fraud rate: 0.0082

# XGBoost with class weighting
scale_pos_weight <- sum(y_train == 0) / sum(y_train == 1)

dtrain <- xgb.DMatrix(data = X_train, label = y_train)
dtest <- xgb.DMatrix(data = X_test, label = y_test)

params <- list(
  objective = "binary:logistic",
  scale_pos_weight = scale_pos_weight,
  max_depth = 4,
  eta = 0.1,
  min_child_weight = 1
)

xgb_model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 100,
  watchlist = list(test = dtest),
  verbose = 0
)

# Predictions
y_pred <- predict(xgb_model, dtest)

# Evaluate at different thresholds
thresholds <- seq(0.01, 0.99, by = 0.05)
results <- data.frame(
  threshold = numeric(),
  recall = numeric(),
  precision = numeric(),
  f1 = numeric(),
  false_positives = numeric()
)

for (i in seq_along(thresholds)) {
  pred <- as.numeric(y_pred > thresholds[i])
  tp <- sum(pred == 1 & y_test == 1)
  fp <- sum(pred == 1 & y_test == 0)
  fn <- sum(pred == 0 & y_test == 1)

  recall <- tp / (tp + fn)
  precision <- tp / max(tp + fp, 1)
  # F1 is the harmonic mean of precision and recall; guard only against a zero denominator
  f1 <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0

  results <- rbind(results, data.frame(
    threshold = thresholds[i],
    recall = recall,
    precision = precision,
    f1 = f1,
    false_positives = fp
  ))
}

# Find optimal threshold (maximise F1)
optimal_idx <- which.max(results$f1)
optimal_threshold <- results$threshold[optimal_idx]

cat("Optimal Threshold (max F1):", round(optimal_threshold, 3), "\n")
cat("Recall:", round(results$recall[optimal_idx], 4), "\n")
cat("Precision:", round(results$precision[optimal_idx], 4), "\n")
cat("F1-Score:", round(results$f1[optimal_idx], 4), "\n")
cat("False positives:", results$false_positives[optimal_idx], "\n\n")

# Visualize
library(ggplot2)
ggplot(results, aes(x = recall, y = precision)) +
  geom_path(linewidth = 1) +
  geom_point(aes(colour = f1), size = 3) +
  scale_colour_gradient(low = "red", high = "green") +
  theme_minimal() +
  labs(
    title = "Precision-Recall Trade-off: XGBoost Fraud Model",
    x = "Recall (% fraud caught)",
    y = "Precision (% alerts that are real fraud)",
    colour = "F1-Score"
  ) +
  annotate("point", x = results$recall[optimal_idx], y = results$precision[optimal_idx],
           shape = 21, size = 5, color = "black", fill = NA, stroke = 2)


import xgboost as xgb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Prepare data
features = ['amount', 'is_new_card', 'is_foreign_merchant',
            'is_unusual_time', 'is_high_velocity', 'is_distance_far']
X = data[features].values

# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

y = data['is_fraud'].values

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=6384, stratify=y
)

print("=== Supervised Fraud Detection ===\n")
#> === Supervised Fraud Detection ===
print(f"Training fraud rate: {y_train.mean():.4f}")
#> Training fraud rate: 0.0080
print(f"Test fraud rate: {y_test.mean():.4f}\n")
#> Test fraud rate: 0.0080

# XGBoost with class weight
scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'binary:logistic',
    'scale_pos_weight': scale_pos_weight,
    'max_depth': 4,
    'eta': 0.1,
    'min_child_weight': 1
}

xgb_model = xgb.train(params, dtrain, num_boost_round=100, verbose_eval=False)

# Predictions
y_pred = xgb_model.predict(dtest)

# Evaluate at different thresholds
thresholds = np.linspace(0.01, 0.99, 20)
results = []

for threshold in thresholds:
    pred = (y_pred > threshold).astype(int)
    tp = ((pred == 1) & (y_test == 1)).sum()
    fp = ((pred == 1) & (y_test == 0)).sum()
    fn = ((pred == 0) & (y_test == 1)).sum()

    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

    results.append({
        'threshold': threshold,
        'recall': recall,
        'precision': precision,
        'f1': f1,
        'false_positives': fp
    })

results_df = pd.DataFrame(results)
optimal_idx = results_df['f1'].idxmax()
optimal_threshold = results_df.loc[optimal_idx, 'threshold']

print(f"Optimal Threshold (max F1): {optimal_threshold:.3f}")
#> Optimal Threshold (max F1): 0.732
print(f"Recall: {results_df.loc[optimal_idx, 'recall']:.4f}")
#> Recall: 0.6000
print(f"Precision: {results_df.loc[optimal_idx, 'precision']:.4f}")
#> Precision: 0.1188
print(f"F1-Score: {results_df.loc[optimal_idx, 'f1']:.4f}")
#> F1-Score: 0.1983
print(f"False positives: {int(results_df.loc[optimal_idx, 'false_positives'])}\n")
#> False positives: 356

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))

# Precision-recall curve
ax1.plot(results_df['recall'], results_df['precision'], marker='o', linewidth=2)
ax1.scatter([results_df.loc[optimal_idx, 'recall']], [results_df.loc[optimal_idx, 'precision']],
           s=200, c='red', marker='*', zorder=5, label='Optimal')
ax1.set_xlabel('Recall')
ax1.set_ylabel('Precision')
ax1.set_title('Precision-Recall Curve: XGBoost Fraud Model')
ax1.legend()
ax1.grid(True, alpha=0.3)

# F1 vs threshold
ax2.plot(results_df['threshold'], results_df['f1'], marker='o', linewidth=2, color='green')
ax2.axvline(x=optimal_threshold, color='red', linestyle='--', label=f'Optimal: {optimal_threshold:.3f}')
ax2.set_xlabel('Prediction Threshold')
ax2.set_ylabel('F1-Score')
ax2.set_title('F1-Score vs Threshold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

📝 Section 41.4 Review Questions

Why is class weighting important in XGBoost for fraud detection?
How does SMOTE work, and when should you use it?
Why don’t we use threshold 0.5 for fraud classification?
How would you choose the optimal threshold given a cost matrix?

41.5 Anomaly Detection

Unsupervised anomaly detection finds transactions that deviate significantly from the normal pattern, without requiring historical fraud labels. This is valuable because (a) fraudsters continuously innovate, creating patterns not seen in training data; (b) in new markets, fraud labels may be scarce. Two approaches stand out.

Isolation Forest is fast and effective for high-dimensional data. The intuition: anomalies are few and different, so they are easy to isolate. Isolation Forest builds random decision trees, splitting on a randomly chosen feature at a randomly chosen threshold. Anomalies require fewer splits to isolate than normal points. Each transaction's anomaly score is derived from the average number of splits (the path length) needed to isolate it across the trees: short paths give scores near 1 (anomalous), long paths give scores well below 0.5 (normal). Isolation Forest is unsupervised (no labels needed), needs no distance or density computations, and is fast (O(n log n)).

Autoencoders (neural networks) learn to compress and reconstruct normal transactions. A normal transaction (all features of a legitimate purchase) reconstructs well; a fraudulent transaction (unusual feature combinations) has high reconstruction error. Train an autoencoder on only non-fraudulent transactions, then use reconstruction error as the anomaly score. Autoencoders are more flexible than Isolation Forest (can learn complex manifolds) but slower to train.

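Both methods complement supervised models. A transaction flagged by both the supervised model and an anomaly detector is more likely to be fraud than one flagged by either alone. Requiring both flags (AND) favours precision; accepting either flag (OR), as in the code for this section, favours recall.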
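To make the two ways of combining detectors concrete, here is a minimal sketch on made-up flags. The arrays are illustrative only and are not results from the chapter's data; the helper name `precision_recall` is hypothetical. An OR rule can only add alerts, so recall cannot fall; an AND rule can only remove alerts, which typically trades recall for precision.

```python
def precision_recall(flags, labels):
    """Return (precision, recall) for binary flags against binary labels."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y)
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical toy data: 10 transactions, the first 3 are fraud
labels     = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
supervised = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # supervised model flags
anomaly    = [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]  # anomaly detector flags

either = [int(s or a) for s, a in zip(supervised, anomaly)]   # OR rule
both   = [int(s and a) for s, a in zip(supervised, anomaly)]  # AND rule

p_or, r_or = precision_recall(either, labels)
p_and, r_and = precision_recall(both, labels)
print(f"OR : precision={p_or:.2f}, recall={r_or:.2f}")
print(f"AND: precision={p_and:.2f}, recall={r_and:.2f}")
```

On this toy input the OR rule catches every fraud at the price of more false alarms, while the AND rule raises no false alarms but misses two of the three frauds.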

📘 Theory: Unsupervised Anomaly Detection

🔑 Key Formula

Isolation Forest Anomaly Score: \[s(x) = 2^{-E(h(x))/c(n)}\]

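where \(E(h(x))\) is the path length \(h(x)\) from root to leaf averaged over the trees, \(c(n)\) is a normalising constant (the average path length of an unsuccessful search in a binary search tree on \(n\) points), and \(n\) is the sample size used to build each tree.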

Autoencoder Reconstruction Error: \[\text{Anomaly Score} = \|\mathbf{x} - \text{Decoder}(\text{Encoder}(\mathbf{x}))\|_2\]

Higher error → more anomalous.


# Anomaly detection for fraud: Isolation Forest

library(isotree)

set.seed(2916)

# Prepare features (excluding fraud label for unsupervised learning)
X_features <- data[, c("amount", "is_new_card", "is_foreign_merchant",
                       "is_unusual_time", "is_high_velocity", "is_distance_far")]
X_scaled <- scale(X_features)

# Train Isolation Forest
iso_model <- isolation.forest(X_scaled, ntrees = 100, nthreads = 1)

# Compute anomaly scores
anomaly_scores <- predict(iso_model, X_scaled)

cat("=== Anomaly Detection: Isolation Forest ===\n\n")
#> === Anomaly Detection: Isolation Forest ===
cat("Anomaly score range:", round(min(anomaly_scores), 4), "to", round(max(anomaly_scores), 4), "\n")
#> Anomaly score range: 0.3728 to 0.7937
cat("Mean anomaly score:", round(mean(anomaly_scores), 4), "\n")
#> Mean anomaly score: 0.4156
cat("Std dev:", round(sd(anomaly_scores), 4), "\n\n")
#> Std dev: 0.0608

# Detect anomalies at different thresholds
anomaly_thresholds <- c(0.6, 0.7, 0.8)

for (threshold in anomaly_thresholds) {
  anomaly_pred <- as.numeric(anomaly_scores > threshold)
  tp <- sum(anomaly_pred == 1 & data$is_fraud == 1)
  fp <- sum(anomaly_pred == 1 & data$is_fraud == 0)
  fn <- sum(anomaly_pred == 0 & data$is_fraud == 1)

  recall <- tp / (tp + fn)
  precision <- tp / max(tp + fp, 1)

  cat(sprintf("Threshold %.2f:\n", threshold))
  cat("  Flagged:", sum(anomaly_pred), "transactions\n")
  cat("  Fraud caught:", tp, "\n")
  cat("  Recall:", round(recall, 4), ", Precision:", round(precision, 4), "\n\n")
}
#> Threshold 0.60:
#>   Flagged: 1070 transactions
#>   Fraud caught: 85 
#>   Recall: 0.2094 , Precision: 0.0794 
#> 
#> Threshold 0.70:
#>   Flagged: 301 transactions
#>   Fraud caught: 37 
#>   Recall: 0.0911 , Precision: 0.1229 
#> 
#> Threshold 0.80:
#>   Flagged: 0 transactions
#>   Fraud caught: 0 
#>   Recall: 0 , Precision: 0

# Compare to supervised model
# y_pred from the supervised block covers only the test rows, so score the
# same rows with the Isolation Forest before combining. X_test holds the
# same six features with the same scaling as X_scaled.
anomaly_test <- predict(iso_model, X_test)

# Combine: flag if EITHER model says anomalous/fraud
combined_pred <- (anomaly_test > 0.7) | (y_pred > 0.5)

cat("Combined (Isolation Forest OR Supervised):\n")
#> Combined (Isolation Forest OR Supervised):
cat("Flagged:", sum(combined_pred), "\n")
tp_combined <- sum(combined_pred & y_test == 1)
fp_combined <- sum(combined_pred & y_test == 0)
recall_combined <- tp_combined / sum(y_test == 1)
precision_combined <- tp_combined / max(sum(combined_pred), 1)
cat("Fraud caught:", tp_combined, "\n")
cat("Recall:", round(recall_combined, 4), ", Precision:", round(precision_combined, 4), "\n")

# Visualize anomaly scores by true label
library(ggplot2)
score_df <- data.frame(
  anomaly_score = anomaly_scores,
  is_fraud = factor(data$is_fraud, labels = c("Legitimate", "Fraud"))
)

ggplot(score_df, aes(x = is_fraud, y = anomaly_score)) +
  geom_boxplot(fill = c("green", "red"), alpha = 0.7) +
  geom_jitter(width = 0.2, size = 1, alpha = 0.3) +
  theme_minimal() +
  labs(
    title = "Isolation Forest Anomaly Scores by Fraud Status",
    y = "Anomaly Score (higher = more anomalous)"
  ) +
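  # geom_hline() has no label argument, so the text is added with annotate()
  geom_hline(yintercept = 0.7, linetype = "dashed", colour = "blue") +
  annotate("text", x = 1.5, y = 0.72, label = "Detection threshold", colour = "blue")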


from sklearn.ensemble import IsolationForest
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Prepare features
features = ['amount', 'is_new_card', 'is_foreign_merchant',
            'is_unusual_time', 'is_high_velocity', 'is_distance_far']
X_features = data[features].values

# Standardize
X_scaled = (X_features - X_features.mean(axis=0)) / (X_features.std(axis=0) + 1e-8)

# Train Isolation Forest
iso_model = IsolationForest(contamination=0.05, random_state=2916, n_estimators=100)
iso_model.fit(X_scaled)
#> IsolationForest(contamination=0.05, random_state=2916)

# score_samples() returns the negated anomaly score (lower = more anomalous).
# Negate it so higher = more anomalous, then min-max rescale to [0, 1].
anomaly_scores = -iso_model.score_samples(X_scaled)
anomaly_scores = (anomaly_scores - anomaly_scores.min()) / (anomaly_scores.max() - anomaly_scores.min())

print("=== Anomaly Detection: Isolation Forest ===\n")
#> === Anomaly Detection: Isolation Forest ===
print(f"Anomaly score range: {anomaly_scores.min():.4f} to {anomaly_scores.max():.4f}")
#> Anomaly score range: 0.0000 to 1.0000
print(f"Mean: {anomaly_scores.mean():.4f}, Std: {anomaly_scores.std():.4f}\n")
#> Mean: 0.1487, Std: 0.2226

# Evaluate at different thresholds
for threshold in [0.6, 0.7, 0.8]:
    pred = (anomaly_scores > threshold).astype(int)
    tp = ((pred == 1) & (data['is_fraud'] == 1)).sum()
    fp = ((pred == 1) & (data['is_fraud'] == 0)).sum()
    fn = ((pred == 0) & (data['is_fraud'] == 1)).sum()

    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0

    print(f"Threshold {threshold:.2f}:")
    print(f"  Flagged: {pred.sum()} transactions")
    print(f"  Fraud caught: {tp}")
    print(f"  Recall: {recall:.4f}, Precision: {precision:.4f}\n")
#> Threshold 0.60:
#>   Flagged: 3591 transactions
#>   Fraud caught: 174
#>   Recall: 0.4372, Precision: 0.0485
#> 
#> Threshold 0.70:
#>   Flagged: 2143 transactions
#>   Fraud caught: 117
#>   Recall: 0.2940, Precision: 0.0546
#> 
#> Threshold 0.80:
#>   Flagged: 334 transactions
#>   Fraud caught: 46
#>   Recall: 0.1156, Precision: 0.1377

# Combined with supervised — re-score the same test set used by XGBoost
# X_test comes from the supervised block (StandardScaler, same 6 features)
anon_test = -iso_model.score_samples(X_test)
anon_test_norm = (anon_test - anomaly_scores.min()) / (anomaly_scores.max() - anomaly_scores.min())
combined = (anon_test_norm > 0.7) | (y_pred > 0.5)
tp_combined = ((combined) & (y_test == 1)).sum()
fp_combined = ((combined) & (y_test == 0)).sum()
recall_combined = tp_combined / (y_test == 1).sum()
precision_combined = tp_combined / combined.sum() if combined.sum() > 0 else 0

print("Combined (IF OR Supervised):")
#> Combined (IF OR Supervised):
print(f"  Flagged: {combined.sum()}")
#>   Flagged: 516
print(f"  Fraud caught: {tp_combined}")
#>   Fraud caught: 54
print(f"  Recall: {recall_combined:.4f}, Precision: {precision_combined:.4f}\n")
#>   Recall: 0.6750, Precision: 0.1047

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))

# Box plot of anomaly scores
legitimate_scores = anomaly_scores[data['is_fraud'] == 0]
fraud_scores = anomaly_scores[data['is_fraud'] == 1]

ax1.boxplot([legitimate_scores, fraud_scores], labels=['Legitimate', 'Fraud'],
           patch_artist=True, boxprops=dict(facecolor='lightblue'))
ax1.axhline(y=0.7, color='red', linestyle='--', label='Threshold 0.7')
ax1.set_ylabel('Anomaly Score')
ax1.set_title('Isolation Forest Anomaly Scores')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Precision-recall curve across anomaly-score thresholds
thresholds = np.linspace(0, 1, 50)
recalls = []
precisions = []
for th in thresholds:
    pred = (anomaly_scores > th).astype(int)
    tp = ((pred == 1) & (data['is_fraud'] == 1)).sum()
    fp = ((pred == 1) & (data['is_fraud'] == 0)).sum()
    fn = ((pred == 0) & (data['is_fraud'] == 1)).sum()
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recalls.append(recall)
    precisions.append(precision)

ax2.plot(recalls, precisions, linewidth=2, label='Isolation Forest')
ax2.scatter([recall_combined], [precision_combined], s=100, c='red', marker='*', label='Combined threshold')
ax2.set_xlabel('Recall')
ax2.set_ylabel('Precision')
ax2.set_title('Precision-Recall: Isolation Forest')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

📝 Section 41.5 Review Questions

How does Isolation Forest detect anomalies without knowing what fraud looks like?
When would you prefer anomaly detection over supervised models?
Why might combining supervised and unsupervised methods improve fraud detection?
What is the computational advantage of Isolation Forest over autoencoders?

41.6 Network-Based Fraud Detection

Fraudsters often operate in rings: multiple accounts controlled by the same person or gang, sharing infrastructure (device, phone number, address, email). Detecting rings requires network analysis. We model accounts and shared attributes as a bipartite graph: one set of nodes is accounts, the other is attributes (device IDs, phone numbers, addresses). An edge connects an account to an attribute that the account has used. A dense subgraph, in which many accounts are tightly interconnected through shared attributes, is a potential fraud ring.

For example, if accounts A, B, C all use device ID “D123”, phone “555-1234”, and address “123 Fake St,” they form a tight cluster. A graph clustering algorithm (e.g., label propagation, Louvain community detection) identifies these clusters. Accounts in the same cluster as a known fraudster are suspicious.

In practice, the graph is huge: millions of accounts, billions of edges. Efficient algorithms are essential. Label propagation runs in near-linear time in the number of edges; graph pattern matching (finding specific subgraph patterns) is more targeted but slower.
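The account-attribute graph described above can be sketched in a few lines. The example below does not run label propagation or Louvain; it swaps in the simplest possible grouping, connected components computed with a union-find structure, as a stand-in. The function name `find_rings`, the `min_size` parameter, and all account and attribute values are hypothetical.

```python
from collections import defaultdict

def find_rings(account_attributes, min_size=3):
    """Group accounts that are linked through shared attributes.

    account_attributes: dict mapping account id -> iterable of attribute ids.
    Returns a list of sets of accounts with at least min_size members.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each edge of the bipartite graph joins an account node to an attribute node
    for account, attributes in account_attributes.items():
        find(("account", account))
        for attribute in attributes:
            union(("account", account), ("attribute", attribute))

    # Collect accounts by the root of their component
    groups = defaultdict(set)
    for account in account_attributes:
        groups[find(("account", account))].add(account)
    return [members for members in groups.values() if len(members) >= min_size]

# Hypothetical accounts and the attributes they have used
accounts = {
    "A": {"device:D123", "phone:555-1234"},
    "B": {"device:D123", "addr:123 Fake St"},
    "C": {"phone:555-1234", "addr:123 Fake St"},
    "D": {"device:D999"},
    "E": {"device:D777", "phone:555-0000"},
}
rings = find_rings(accounts)
print(rings)
```

Connected components are a blunt tool: one widely shared attribute (for example, a public device) merges unrelated accounts into a single group, which is one reason density-aware community detection is preferred at scale.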

41.7 Real-Time Scoring Architecture

Production fraud detection demands real-time scoring: a transaction arrives, is scored in under 100 milliseconds, and receives a decision (approve, review, or decline) before the customer notices any delay. This requires careful engineering. Feature store: Pre-computed features (velocity, historical patterns) are cached in fast in-memory stores (Redis, Memcached). Model serving: The trained model is deployed behind a low-latency service (TensorFlow Serving, KServe, or a custom service). A/B testing: New models are tested on a small fraction of traffic before full rollout. Monitoring: Model performance, feature distributions, and decision rates are tracked in real time. If performance degrades (the Population Stability Index (PSI) rises, fraud rates spike, or AUC drops), alerts trigger and the model may be rolled back.
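The monitoring step relies on the Population Stability Index (PSI), which compares the binned distribution of a feature or score in a baseline sample with a recent sample: PSI = Σ (aᵢ − eᵢ) ln(aᵢ / eᵢ), where eᵢ and aᵢ are the baseline and recent proportions in bin i. The sketch below is one simple way to compute it, assuming equal-width bins taken from the baseline range; the function name `psi`, the bin count, and the toy data are assumptions.

```python
import math

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical data: a uniform baseline and a shifted recent sample
baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]

psi_same = psi(baseline, baseline)
psi_shift = psi(baseline, shifted)
print(f"PSI (no change): {psi_same:.4f}")
print(f"PSI (shifted):   {psi_shift:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the alert level should be calibrated to each feature.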

41.8 Case Study: Mobile Money Fraud Detection in Nigerian Fintech

A Nigerian fintech company processes 10 million mobile money transactions per day. Fraud occurs in ~0.3% of transactions (30,000 frauds daily). The company builds a fraud detection system combining rules, supervised learning, and anomaly detection. The system flags ~2% of transactions (200,000 per day) for review, of which ~15% are confirmed fraud. This is acceptable: the company prevents ₦5 billion in fraud annually while blocking only ₦500 million in legitimate transactions (which are refunded upon customer dispute).

The system uses XGBoost for primary detection, Isolation Forest as a secondary check, and a graph-based ring detection module. Rules handle obvious patterns (velocity, amount). The model is retrained monthly. Key metrics (AUC, Gini, recall, precision, PSI) are monitored daily.
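The headline figures of the case study can be checked with a few lines of arithmetic. The sketch below uses only the volumes and rates stated above; the variable names are hypothetical. It also computes lift, the ratio of alert precision to the base fraud rate.

```python
daily_transactions = 10_000_000
fraud_rate = 0.003        # 0.3% of transactions are fraudulent
flag_rate = 0.02          # 2% of transactions are flagged for review
alert_precision = 0.15    # 15% of flagged transactions are confirmed fraud

daily_frauds = daily_transactions * fraud_rate
daily_alerts = daily_transactions * flag_rate
true_positives = daily_alerts * alert_precision
false_positives = daily_alerts - true_positives
implied_recall = true_positives / daily_frauds
lift = alert_precision / fraud_rate

print(f"Frauds per day:   {daily_frauds:,.0f}")
print(f"Alerts per day:   {daily_alerts:,.0f}")
print(f"True positives:   {true_positives:,.0f}")
print(f"False positives:  {false_positives:,.0f}")
print(f"Implied recall:   {implied_recall:.0%}")
print(f"Lift over random: {lift:.0f}x")
```

Because 15% of 200,000 alerts equals the 30,000 daily frauds, the stated figures imply that every fraud is caught. That is best read as an idealised upper bound; with lower recall, the same alert volume would contain fewer true positives.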


# Case Study: Mobile Money Fraud Detection (Summary)

cat("=== Case Study: Nigerian Mobile Money Fraud Detection ===\n\n")
#> === Case Study: Nigerian Mobile Money Fraud Detection ===

cat("Business Context:\n")
#> Business Context:
cat("- Platform: Mobile money (₦ transfers, airtime, bills)\n")
#> - Platform: Mobile money (₦ transfers, airtime, bills)
cat("- Daily transactions: 10 million\n")
#> - Daily transactions: 10 million
cat("- Fraud rate: 0.3% (30,000 frauds/day)\n")
#> - Fraud rate: 0.3% (30,000 frauds/day)
cat("- Direct fraud loss (undetected): ₦500 million/day\n\n")
#> - Direct fraud loss (undetected): ₦500 million/day

cat("System Architecture:\n")
#> System Architecture:
cat("1. Rules Engine\n")
#> 1. Rules Engine
cat("   - Velocity: > 10 txns/hour → high suspicion\n")
#>    - Velocity: > 10 txns/hour → high suspicion
cat("   - Amount: > ₦1 million AND unusual_time → flag\n")
#>    - Amount: > ₦1 million AND unusual_time → flag
cat("   - Geographic: sudden location jump > 500 km → flag\n\n")
#>    - Geographic: sudden location jump > 500 km → flag

cat("2. Supervised Model (XGBoost)\n")
#> 2. Supervised Model (XGBoost)
cat("   - Features: amount, velocity, time_of_day, device_id_new, distance_from_home, etc.\n")
#>    - Features: amount, velocity, time_of_day, device_id_new, distance_from_home, etc.
cat("   - Training: 2 million historical transactions, 0.3% fraud\n")
#>    - Training: 2 million historical transactions, 0.3% fraud
cat("   - Performance: AUC = 0.82, Gini = 0.64\n\n")
#>    - Performance: AUC = 0.82, Gini = 0.64

cat("3. Anomaly Detection (Isolation Forest)\n")
#> 3. Anomaly Detection (Isolation Forest)
cat("   - Threshold: score > 0.75\n")
#>    - Threshold: score > 0.75
cat("   - Complements supervised model\n")
#>    - Complements supervised model
cat("   - Detects novel fraud patterns\n\n")
#>    - Detects novel fraud patterns

cat("4. Network Detection\n")
#> 4. Network Detection
cat("   - Bipartite graph: accounts × devices/phone numbers\n")
#>    - Bipartite graph: accounts × devices/phone numbers
cat("   - Community detection identifies fraud rings\n")
#>    - Community detection identifies fraud rings
cat("   - Flag accounts sharing infrastructure with known fraud\n\n")
#>    - Flag accounts sharing infrastructure with known fraud

cat("Decision Logic:\n")
#> Decision Logic:
cat("- Rules triggered → DECLINE (clear fraud)\n")
#> - Rules triggered → DECLINE (clear fraud)
cat("- XGBoost score > 0.7 AND IF score > 0.75 → REVIEW (likely fraud)\n")
#> - XGBoost score > 0.7 AND IF score > 0.75 → REVIEW (likely fraud)
cat("- XGBoost score 0.4-0.7 → MONITOR (marginal, gather more data)\n")
#> - XGBoost score 0.4-0.7 → MONITOR (marginal, gather more data)
cat("- All else → APPROVE\n\n")
#> - All else → APPROVE

cat("Results (Monthly Average):\n")
#> Results (Monthly Average):
cat("- Transactions flagged: 200,000 (2% of daily volume)\n")
#> - Transactions flagged: 200,000 (2% of daily volume)
cat("- Fraud detected: 30,000 (true positives)\n")
#> - Fraud detected: 30,000 (true positives)
cat("- False positives: 170,000 (legitimate txns flagged)\n")
#> - False positives: 170,000 (legitimate txns flagged)
cat("- Precision: 15%\n")
#> - Precision: 15%
cat("- Recall: 100% (all fraud caught)\n")
#> - Recall: 100% (all fraud caught)
cat("- Fraud prevented: ₦5 billion\n")
#> - Fraud prevented: ₦5 billion
cat("- False positive cost: ₦170 million (customer friction)\n")
#> - False positive cost: ₦170 million (customer friction)
cat("- Net benefit: ₦4.83 billion/month\n\n")
#> - Net benefit: ₦4.83 billion/month

cat("Ongoing Monitoring:\n")
#> Ongoing Monitoring:
cat("- Daily: AUC, Gini, recall, precision, alert volume\n")
#> - Daily: AUC, Gini, recall, precision, alert volume
cat("- Weekly: Feature distributions (PSI)\n")
#> - Weekly: Feature distributions (PSI)
cat("- Monthly: Model retraining, A/B testing new features\n")
#> - Monthly: Model retraining, A/B testing new features
cat("- Quarterly: Business review, threshold optimization\n")
#> - Quarterly: Business review, threshold optimization


print("=== Case Study: Nigerian Mobile Money Fraud Detection ===\n")
#> === Case Study: Nigerian Mobile Money Fraud Detection ===

print("Business Context:")
#> Business Context:
print("- Platform: Mobile money (₦ transfers, airtime, bills)")
#> - Platform: Mobile money (₦ transfers, airtime, bills)
print("- Daily transactions: 10 million")
#> - Daily transactions: 10 million
print("- Fraud rate: 0.3% (30,000 frauds/day)")
#> - Fraud rate: 0.3% (30,000 frauds/day)
print("- Direct fraud loss: ₦500 million/day\n")
#> - Direct fraud loss: ₦500 million/day

print("System Components:")
#> System Components:
print("1. Rules Engine (immediate DECLINE if triggered)")
#> 1. Rules Engine (immediate DECLINE if triggered)
print("   - Velocity: > 10 txns/hour")
#>    - Velocity: > 10 txns/hour
print("   - Amount: > ₦1M + unusual time")
#>    - Amount: > ₦1M + unusual time
print("   - Geographic: > 500 km jump\n")
#>    - Geographic: > 500 km jump

print("2. XGBoost Supervised Model")
#> 2. XGBoost Supervised Model
print("   - Training: 2M historical transactions, 0.3% fraud")
#>    - Training: 2M historical transactions, 0.3% fraud
print("   - Performance: AUC = 0.82, Gini = 0.64\n")
#>    - Performance: AUC = 0.82, Gini = 0.64

print("3. Isolation Forest Anomaly Detection")
#> 3. Isolation Forest Anomaly Detection
print("   - Flags novel fraud patterns")
#>    - Flags novel fraud patterns
print("   - Threshold: 0.75\n")
#>    - Threshold: 0.75

print("4. Network-Based Ring Detection")
#> 4. Network-Based Ring Detection
print("   - Identifies fraud rings via shared infrastructure")
#>    - Identifies fraud rings via shared infrastructure
print("   - Flags accounts related to known fraudsters\n")
#>    - Flags accounts related to known fraudsters

print("Decision Rules:")
#> Decision Rules:
print("- Rules triggered → DECLINE")
#> - Rules triggered → DECLINE
print("- XGBoost > 0.7 AND IF > 0.75 → REVIEW")
#> - XGBoost > 0.7 AND IF > 0.75 → REVIEW
print("- XGBoost 0.4-0.7 → MONITOR")
#> - XGBoost 0.4-0.7 → MONITOR
print("- Else → APPROVE\n")
#> - Else → APPROVE

print("Monthly Results:")
#> Monthly Results:
print("- Flagged: 200,000 (2% of volume)")
#> - Flagged: 200,000 (2% of volume)
print("- Fraud caught: 30,000 (100% recall)")
#> - Fraud caught: 30,000 (100% recall)
print("- False positives: 170,000")
#> - False positives: 170,000
print("- Precision: 15%")
#> - Precision: 15%
print("- Fraud prevented: ₦5 billion")
#> - Fraud prevented: ₦5 billion
print("- Net benefit: ₦4.83 billion (after FP cost)\n")
#> - Net benefit: ₦4.83 billion (after FP cost)

print("Monitoring:")
#> Monitoring:
print("- Daily: AUC, Gini, recall, precision")
#> - Daily: AUC, Gini, recall, precision
print("- Weekly: PSI (detect distribution shift)")
#> - Weekly: PSI (detect distribution shift)
print("- Monthly: Model retraining")
#> - Monthly: Model retraining
print("- Quarterly: Threshold optimization")
#> - Quarterly: Threshold optimization
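The decision rules listed in the summary can be written as a small function. This is a sketch only: the function name `decide` and its arguments are hypothetical, and the thresholds are simply those printed above.

```python
def decide(rule_triggered, xgb_score, if_score):
    """Map rule and model outputs to one of four actions.

    rule_triggered: True if any hard rule (velocity, amount, geography) fired.
    xgb_score: supervised model score in [0, 1].
    if_score: Isolation Forest anomaly score in [0, 1].
    """
    if rule_triggered:
        return "DECLINE"
    if xgb_score > 0.7 and if_score > 0.75:
        return "REVIEW"
    if 0.4 <= xgb_score <= 0.7:
        return "MONITOR"
    return "APPROVE"

# Hypothetical transactions: (rule_triggered, xgb_score, if_score)
examples = [
    (True, 0.10, 0.20),
    (False, 0.85, 0.80),
    (False, 0.55, 0.30),
    (False, 0.85, 0.50),
    (False, 0.05, 0.10),
]
decisions = [decide(*e) for e in examples]
for e, d in zip(examples, decisions):
    print(e, "->", d)
```

Under the rules as stated, a transaction with a high XGBoost score but an Isolation Forest score at or below 0.75 falls through to APPROVE, whereas a mid-range score is monitored. A production system would probably want an explicit branch for that case.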

📝 Case Study Review Questions

Why does the system tolerate 15% precision (85% of alerts are false positives)?
How would you prioritise which flagged transactions to manually review?
What metrics would trigger a model retraining?
How would you adapt the system if fraudsters start using the same devices as legitimate customers?

Chapter 41 Exercises

Imbalance Metrics: Generate an imbalanced classification dataset. Train a model and compute precision, recall, F1, and ROC-AUC at different thresholds. Plot the precision-recall curve.
Cost Matrix Optimisation: For a fraud dataset, define cost_FN (missed fraud) and cost_FP (false alarm). Find the threshold that minimises total cost.
SMOTE Implementation: Apply SMOTE to an imbalanced fraud dataset. Train models on original and SMOTE-balanced data. Compare performance.
Isolation Forest: Implement Isolation Forest on transaction data. Evaluate anomaly scores; visualise the distribution.
Anomaly + Supervised Ensemble: Train both an unsupervised (Isolation Forest) and a supervised (XGBoost) model. Combine them with an OR rule. Does the ensemble improve recall, and what happens to precision?
Fraud Rings: Create a synthetic bipartite graph (accounts × devices). Use community detection to identify potential rings.
Real-Time Simulation: Simulate real-time fraud detection: stream transactions, compute features, score in < 100ms.
Nigerian Data: Find or generate a Nigerian mobile money fraud dataset. Build a complete system: rules, supervised model, anomaly detection. Measure impact.

41.9 Further Reading

NIBSS Fraud Report (annual). Nigeria Inter-Bank Settlement System. Publishes annual fraud statistics.
Fraud Detection Handbook by Le-Khac, Healy, and Whelan. Practical guide to fraud analytics.
Isolation Forest by Liu, Ting, and Zhou (2008). Seminal paper on anomaly detection.
Graph-based Fraud Ring Detection in social networks and e-commerce: Akoglu & Faloutsos (2010).
Learning from Imbalanced Data by He and Garcia (2009). Comprehensive survey of techniques for class imbalance.

41.10 Chapter 41 Appendix: Technical Derivations

41.10.1 A.1 Isolation Forest Anomaly Score

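An Isolation Forest builds \(t\) random binary trees. For each sample, it records the path length \(h(x)\) from root to leaf. Anomalies are isolated quickly (short paths); normal points require longer paths. The anomaly score is: \[s(x) = 2^{-\frac{E(h(x))}{c(n)}}\]

where \(E(h(x))\) is the path length averaged over the \(t\) trees and \(c(n) = 2 H(n-1) - \frac{2(n-1)}{n}\) is the average path length of an unsuccessful search in a binary search tree on \(n\) points, with \(H(i) \approx \ln(i) + 0.5772\) the harmonic number. A score close to 1 indicates an anomaly; a score well below 0.5 indicates a normal point; scores near 0.5 across the whole sample indicate no distinct anomalies.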
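A short numerical sketch of the score, using only the two formulas above. The function names `c` and `anomaly_score` are hypothetical, the harmonic number is summed exactly rather than approximated, and the sample size of 256 is an assumed value chosen for illustration.

```python
def c(n):
    """Normalising constant c(n) = 2 H(n-1) - 2 (n-1) / n."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # exact H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x) = 2 ** (-mean path length / c(n))."""
    return 2.0 ** (-mean_path_length / c(n))

n = 256  # assumed number of points used to build each tree
c_n = c(n)
score_short = anomaly_score(4.0, n)     # isolated after few splits
score_average = anomaly_score(c_n, n)   # path length equal to c(n)
score_long = anomaly_score(20.0, n)     # needs many splits

print(f"c({n}) = {c_n:.3f}")
print(f"short path score:   {score_short:.3f}")
print(f"average path score: {score_average:.3f}")
print(f"long path score:    {score_long:.3f}")
```

A point whose average path length equals \(c(n)\) scores exactly 0.5, which is why 0.5 acts as the dividing line between shorter-than-typical and longer-than-typical paths.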

41.10.2 A.2 SMOTE Synthetic Sample Generation

Given a fraud sample \(\mathbf{x}_i\) and \(\mathbf{x}_j\), one of its \(k\) nearest neighbours in feature space drawn at random from the fraud class, SMOTE generates: \[\mathbf{x}_{\text{synthetic}} = \mathbf{x}_i + \lambda(\mathbf{x}_j - \mathbf{x}_i), \quad \lambda \sim U(0, 1)\]

Repeating this for many samples and neighbours creates a larger minority class. Because each new point is interpolated between two real fraud samples rather than copied, SMOTE avoids the exact duplicates that random oversampling produces.
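The interpolation step can be sketched in plain Python. This is an illustration of the formula, not a replacement for a library implementation; the function name `smote_sample`, its parameters, and the toy fraud points are all hypothetical.

```python
import random

def smote_sample(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # Rank the other minority points by squared Euclidean distance to x_i
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(x, x_i)))
        x_j = rng.choice(others[:k])  # one of the k nearest neighbours
        lam = rng.random()            # lambda drawn from U(0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_j)])
    return synthetic

# Hypothetical fraud samples with two scaled features
fraud = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.5, 2.4], [1.1, 2.2]]
new_points = smote_sample(fraud, k=3, n_new=5, seed=0)
for p in new_points:
    print([round(v, 3) for v in p])
```

Every synthetic point lies on a line segment between two real fraud samples, so it stays inside the range spanned by the minority class. For binary flags such as `is_new_card`, interpolation yields fractional values, which is one reason variants that handle categorical features exist.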

41.10.3 A.3 Cost-Sensitive Threshold

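Given a cost matrix where missing fraud costs \(C_{\text{FN}}\) and a false positive costs \(C_{\text{FP}}\), the optimal threshold minimises total cost: \[\theta^* = \arg\min_\theta \left[ \text{FN}(\theta)\, C_{\text{FN}} + \text{FP}(\theta)\, C_{\text{FP}} \right]\]

where \(\text{FN}(\theta)\) and \(\text{FP}(\theta)\) are the numbers of missed frauds and false alarms at threshold \(\theta\). Recall \(R(\theta)\) and precision \(P(\theta)\) enter through these counts: \(\text{FN}(\theta) = (1 - R(\theta))\, N_{\text{fraud}}\) and \(\text{FP}(\theta) = (1 - P(\theta))\, N_{\text{alerts}}(\theta)\), where \(N_{\text{fraud}}\) is the number of frauds and \(N_{\text{alerts}}(\theta)\) is the number of transactions flagged. In practice, we evaluate this cost across candidate thresholds and select the best.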
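A minimal sketch of the threshold sweep, assuming illustrative costs and toy scores (none of these numbers come from the chapter's data, and the function name `best_threshold` is hypothetical). It counts missed frauds and false alarms at each candidate threshold and keeps the threshold with the lowest total cost.

```python
def best_threshold(scores, labels, cost_fn, cost_fp, thresholds):
    """Return (threshold, total_cost) with the lowest total misclassification cost."""
    best = None
    for t in thresholds:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t)  # missed frauds
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)   # false alarms
        cost = fn * cost_fn + fp * cost_fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Hypothetical model scores and true labels (1 = fraud)
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

# Missed fraud is far more expensive than a false alarm
costly_misses = best_threshold(scores, labels, cost_fn=50_000, cost_fp=1_000,
                               thresholds=thresholds)
# False alarms are more expensive than missed fraud
costly_alarms = best_threshold(scores, labels, cost_fn=500, cost_fp=1_000,
                               thresholds=thresholds)
print("Expensive misses ->", costly_misses)
print("Expensive alarms ->", costly_alarms)
```

When missed fraud dominates the cost matrix the chosen threshold moves down, accepting more false alarms; when false alarms dominate it moves up.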

41.10.4 A.4 Bipartite Graph Community Detection

A bipartite graph \(G = (A \cup B, E)\) has accounts \(A\) and attributes \(B\) (devices, phones). An edge \((a, b)\) exists if account \(a\) has attribute \(b\). A fraud ring is a dense subgraph: many accounts sharing many attributes. Standard algorithms (Louvain, label propagation) find communities. The modularity of a partition is: \[Q = \frac{1}{2m} \sum_{ij} \left(A_{ij} - \frac{k_i k_j}{2m}\right) \delta(c_i, c_j)\]

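where \(A_{ij}\) is the adjacency matrix entry for nodes \(i\) and \(j\), \(k_i\) is the degree of node \(i\), \(m\) is the number of edges, and \(\delta(c_i, c_j)\) is 1 if nodes \(i\) and \(j\) are in the same community and 0 otherwise. Louvain greedily optimises modularity; label propagation does not optimise \(Q\) directly, but \(Q\) can still be used to assess the partition it returns.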
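Modularity is easy to evaluate for a given partition. The sketch below uses the equivalent per-community form \(Q = \sum_c \left[ L_c / m - (d_c / 2m)^2 \right]\), where \(L_c\) is the number of edges inside community \(c\) and \(d_c\) is the total degree of its nodes. The function name `modularity` and the small account-attribute graph are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = {}
    internal = {}  # L_c: edges with both endpoints in community c
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1

    total_degree = {}  # d_c: sum of degrees in community c
    for node, k in degree.items():
        total_degree[community[node]] = total_degree.get(community[node], 0) + k

    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_degree.items())

# Hypothetical bipartite graph: accounts A-E, attributes dev1, ph1, dev2
edges = [("A", "dev1"), ("B", "dev1"), ("C", "dev1"), ("A", "ph1"),
         ("B", "ph1"), ("D", "dev2"), ("E", "dev2"), ("C", "dev2")]

two_groups = {"A": 1, "B": 1, "C": 1, "dev1": 1, "ph1": 1,
              "D": 2, "E": 2, "dev2": 2}
one_group = {node: 1 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(f"Q (two communities): {q_two:.4f}")
print(f"Q (one community):   {q_one:.4f}")
```

Putting every node in one community always gives \(Q = 0\), so a positive value for the two-group partition indicates more within-group edges than expected by chance.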

41.10.5 A.5 Autoencoder Reconstruction Loss

An autoencoder has an encoder \(f: \mathbb{R}^d \to \mathbb{R}^h\) and decoder \(g: \mathbb{R}^h \to \mathbb{R}^d\). For a normal transaction \(\mathbf{x}\): \[\text{Loss} = \|\mathbf{x} - g(f(\mathbf{x}))\|^2\]

After training on non-fraud data, an anomalous transaction has high reconstruction error. The threshold for anomaly detection is chosen via a validation set or a quantile of the error distribution.
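The reconstruction-error idea can be illustrated without a neural network. The sketch below swaps in PCA, which is the optimal linear autoencoder, as a stand-in: the encoder projects onto the leading principal direction and the decoder maps back. The synthetic data, the bottleneck size of 1, and the 99% quantile are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" transactions: 3 features lying close to a 1-D line
t = rng.normal(size=(500, 1))
normal = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(500, 3))

mean = normal.mean(axis=0)
# Rows of vt are the principal directions of the centred normal data
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:1]  # bottleneck of size h = 1

def reconstruction_error(x):
    """Squared reconstruction error ||x - g(f(x))||^2 for each row of x."""
    centred = np.atleast_2d(x) - mean
    code = centred @ components.T   # encoder f: R^3 -> R^1
    rebuilt = code @ components     # decoder g: R^1 -> R^3
    return np.sum((centred - rebuilt) ** 2, axis=1)

# Threshold: 99% quantile of the error on normal data
threshold = np.quantile(reconstruction_error(normal), 0.99)

typical = np.array([1.0, 2.0, -1.0])   # follows the usual feature relationship
unusual = np.array([1.0, -2.0, 1.0])   # breaks it

err_typical = reconstruction_error(typical)[0]
err_unusual = reconstruction_error(unusual)[0]
print(f"threshold:     {threshold:.4f}")
print(f"typical error: {err_typical:.4f}")
print(f"unusual error: {err_unusual:.4f}")
```

A non-linear autoencoder follows the same recipe, replacing the two matrix products with trained encoder and decoder networks.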