53 Supply Chain and Logistics Analytics

Author

Bongo Adi

📋 Learning Objectives

Understand supply chain networks and the bullwhip effect
Calculate key supply chain metrics: OTIF, on-time delivery, fill rate, perfect order rate
Perform exploratory analysis to identify late delivery hotspots
Build predictive models for late deliveries using machine learning
Analyse supply chain risk: concentration risk and network resilience
Solve vehicle routing problems (VRP) using heuristics
Implement supply chain analytics with Nigerian FMCG case study

53.1 Why Supply Chains Break

Supply chains are networks under uncertainty. A brand manufactures products, distributes through warehouses, and delivers to retail points and end consumers. At each step, variability enters: demand fluctuates (weather, promotions, holidays), lead times vary (port delays, production downtime), quality issues arise (defective units, damage in transit). If these uncertainties are managed—inventory buffers, safety stock, multiple sourcing—the chain hums. But when assumptions break (a supplier fails, demand spikes unexpectedly), the chain fractures, and products don’t reach customers on time.

The bullwhip effect is a well-documented phenomenon where small demand fluctuations at retail amplify upstream. If end-market demand increases by 10%, a retailer increases orders to wholesalers by 15% (to build safety stock). The wholesaler, seeing a 15% spike, increases orders to manufacturers by 25% (anticipating further growth). The manufacturer sees 25% swings and increases raw material orders by 40%. A 10% consumer demand bump becomes a 40% swing in raw material demand upstream. This amplification destabilises production planning, inflates inventory costs, and increases stockout risk when demand reverses.

In Nigeria, supply chain fragility is acute. Apapa port congestion (Lagos) creates unpredictable import delays (declared 10 days, actual 40 days). Fuel scarcity stalls logistics vehicles, delaying shipments from refineries to distributors, cascading failures downstream. Road infrastructure (Lagos-Ibadan expressway often congested) stretches last-mile delivery windows. Inventory shrinkage from theft (especially in northern Nigeria) requires larger safety stock buffers. Customs clearance delays (ports, land borders) create batch-level uncertainty. A pharmaceutical distributor aiming for 98% on-time delivery must account for these endemic risks.

Why analytics matters: visibility (real-time tracking), prediction (identifying which shipments will be late before they leave the warehouse), and optimisation (routing trucks to minimise delays and costs). A late shipment hurts: retail stockouts, customer complaints, contractual penalties. Reducing late delivery by 5 percentage points (97% → 92%) saves customer acquisition costs (fewer lost customers) and improves EBITDA (lower expedited shipping costs).

📘 Theory: Supply Chain Risk and Resilience

Supply chain resilience is the ability to absorb shocks (demand spikes, supplier failures) without severe degradation. Metrics: recovery time (how fast to normal operations), loss magnitude (revenue/inventory impact during disruption). Lean supply chains (minimized inventory, single-sourcing) are efficient but fragile. Robust chains accept higher steady-state costs (dual suppliers, safety stock) to absorb shocks. Nigerian supply chains need robustness: endemic disruptions (port delays, fuel scarcity, security) are not exceptions but norms. Analytics quantifies the cost-benefit tradeoff.

🔑 Key Metric: Bullwhip Effect Quantification

Order Variance Amplification: \[\text{Bullwhip Ratio} = \frac{\text{Variance of Orders Upstream}}{\text{Variance of Demand Downstream}}\]

A ratio > 1 indicates amplification; ratio = 1.3 means a 10% demand swings becomes 13% order swings upstream.

53.2 Supply Chain Key Metrics: How Do You Know If the Chain Is Working?

You cannot improve what you cannot measure. Supply chain management has developed a set of standard metrics — numerical scorecards — that tell you, at a glance, how well goods are flowing from production to customers. Understanding these metrics is the first step to diagnosing problems and designing improvements.

On-Time In-Full (OTIF) is the headline number. It measures the percentage of orders that were delivered both on time AND with the exact quantity the customer ordered. Notice that both conditions must be met simultaneously. If a retailer in Kano orders 100 bags of rice and you deliver 100 bags one day late, you have failed OTIF. If you deliver 85 bags on time, you have also failed OTIF. Only a delivery of exactly 100 bags by the agreed date counts as a pass.

\[\text{OTIF} = \frac{\text{Orders delivered on time AND complete}}{\text{Total orders}} \times 100\%\]

A 98% OTIF means 2% of orders are either late, short, or both. This sounds small, but for a company shipping 10,000 orders per month, that is 200 failures per month — 200 frustrated customers, 200 potential chargebacks, 200 opportunities for a competitor to step in.

Why does OTIF matter so much? Because from the customer’s perspective, a delivery is either right or wrong. A supermarket chain cannot stock its shelves with 85 bags of rice when they ordered 100. A hospital cannot safely substitute one drug for another just because the right one arrived three days late. Major retailers like Shoprite, PZ Cussons, and FMCG distributors across Nigeria now enforce OTIF clauses in supplier contracts — suppliers who consistently underperform face financial penalties or lose their listing.

Other key metrics that together paint a complete picture:

On-Time Delivery Rate (OTD): Measures only whether the delivery arrived by the promised date, regardless of whether it was complete. A shipment of 85 bags (short) delivered on time would pass OTD but fail OTIF. Tracking OTD separately from OTIF helps identify whether problems are transport-related (timing) or warehouse-related (picking and packing errors).
Fill Rate: The percentage of units in customer orders that were fulfilled from available stock without needing to back-order or delay. If a customer orders 500 units and you can ship only 430 because 70 were out of stock, your fill rate for that order is 86%. Fill rate reveals stockout problems at the warehouse level, before goods even reach the carrier.
Perfect Order Rate: The most demanding metric — the percentage of orders that are correct on every single dimension: right product, right quantity, right delivery date, right documentation, and undamaged on arrival. Even a small labelling error or a cracked package counts as a failure. This is the metric pharmaceutical companies must track for regulatory compliance.
Cash-to-Cash Cycle Time: This measures how long it takes for money invested in raw materials and inventory to return to the company as cash from customer payments. Shorter cycles mean less working capital tied up in the supply chain. If you pay your supplier on day 0, the goods sit in your warehouse for 30 days, you ship them on day 30, and your customer pays on day 60, your cash-to-cash cycle is 60 days — meaning you need to finance 60 days of inventory from your own resources or a credit line.
Inventory Turnover: Annual sales divided by average inventory value. A turnover of 6 means the business sells and replaces its entire inventory six times per year — once every two months. Higher turnover generally means less capital tied up in stock and fresher products. But extremely high turnover can lead to stockouts if replenishment cannot keep up with demand.

In Nigeria specifically, OTIF varies dramatically by geography and carrier. Lagos-to-Abuja routes with reliable logistics partners might achieve 94–96% OTIF on a good week. Deliveries to rural markets in Kebbi or Adamawa, over poor roads with fuel scarcity, might achieve only 65–70%. A single metric across the entire country would mask these differences. Supply chain analytics must always be broken down by route, region, carrier, and product category to identify where interventions will have the greatest impact.

📘 Theory: OTIF and Service Level Economics

Service level (percentage of demand met from stock without delay) is chosen to balance inventory cost and stockout cost. Higher service levels require higher safety stock, increasing carrying costs. Lower service levels risk customer dissatisfaction and lost sales. The optimal service level is where marginal cost of extra inventory equals marginal benefit of avoided stockouts. Most B2B firms target 95–99% service level; retail tolerates 90–95%.

Show code

library(tidyverse)
library(ggplot2)
library(maps)

# Synthetic Nigerian FMCG distribution data: 50,000 order lines
set.seed(7038)

n_orders <- 50000

supply_data <- tibble(
  Order_ID = rep(1:5000, each = 10),  # 5,000 orders × 10 line items
  Order_Date = rep(as.Date("2024-01-01") + sample(0:89, 5000, replace = TRUE), each = 10),
  Origin_Warehouse = rep(sample(c("Lagos", "Abuja", "Kano"), 5000, replace = TRUE), each = 10),
  Destination_State = rep(sample(c("Lagos", "Ogun", "Abeokuta", "Abuja", "Ibadan",
                                    "Kano", "Katsina", "Kebbi"), 5000, replace = TRUE), each = 10),
  Distance_km = NA,
  Carrier = rep(sample(c("Carrier_A", "Carrier_B", "Carrier_C"), 5000, replace = TRUE), each = 10),
  Order_Day_of_Week = rep(lubridate::wday(as.Date("2024-01-01") + sample(0:89, 5000, replace = TRUE)), each = 10),
  SKU_ID = rep(sample(paste0("SKU_", 1:200), 5000, replace = TRUE), each = 10),
  Quantity_Ordered = rep(sample(10:200, 5000, replace = TRUE), each = 10),
  Quantity_Delivered = NA,
  Expected_Delivery_Date = NA,
  Actual_Delivery_Date = NA,
  Delivered_On_Time = NA,
  Product_Category = NA
) |>
  mutate(
    # Distance by route
    Distance_km = case_when(
      Origin_Warehouse == "Lagos" & Destination_State == "Lagos" ~ 20 + rnorm(n(), 0, 5),
      Origin_Warehouse == "Lagos" & Destination_State %in% c("Ogun", "Abeokuta") ~ 80 + rnorm(n(), 0, 10),
      Origin_Warehouse == "Lagos" & Destination_State == "Ibadan" ~ 120 + rnorm(n(), 0, 10),
      Origin_Warehouse == "Abuja" & Destination_State == "Abuja" ~ 15 + rnorm(n(), 0, 5),
      Origin_Warehouse == "Abuja" & Destination_State %in% c("Kano", "Katsina") ~ 450 + rnorm(n(), 0, 30),
      Origin_Warehouse == "Kano" & Destination_State %in% c("Kano", "Katsina", "Kebbi") ~ 100 + rnorm(n(), 0, 15),
      TRUE ~ 200 + rnorm(n(), 0, 30)
    ) |> pmax(10),
    # Expected delivery: vary by distance and carrier
    Lead_Time_Days = case_when(
      Carrier == "Carrier_A" ~ ceiling(Distance_km / 120),  # 120 km/day
      Carrier == "Carrier_B" ~ ceiling(Distance_km / 100),  # 100 km/day (slower)
      Carrier == "Carrier_C" ~ ceiling(Distance_km / 110)   # 110 km/day
    ),
    Expected_Delivery_Date = Order_Date + Lead_Time_Days,
    # Product category affects stock availability
    Product_Category = sample(c("Beverages", "Snacks", "Toiletries", "Medicines"), n(), replace = TRUE),
    # Stock availability (fill rate)
    Stock_Availability = case_when(
      Product_Category == "Beverages" ~ rbinom(n(), 1, 0.92),
      Product_Category == "Snacks" ~ rbinom(n(), 1, 0.94),
      Product_Category == "Toiletries" ~ rbinom(n(), 1, 0.88),
      Product_Category == "Medicines" ~ rbinom(n(), 1, 0.96)
    ),
    Quantity_Delivered = ifelse(Stock_Availability == 1, Quantity_Ordered,
                               round(Quantity_Ordered * runif(n(), 0.6, 0.95))),
    # Actual delivery date (carrier delays)
    Delay_Days = case_when(
      Carrier == "Carrier_A" ~ pmax(0, rnorm(n(), 0, 1)),
      Carrier == "Carrier_B" ~ pmax(0, rnorm(n(), 2, 2)),
      Carrier == "Carrier_C" ~ pmax(0, rnorm(n(), 0.5, 1))
    ) |> round(),
    # Port/border delays for cross-state shipments
    Border_Delay = ifelse(substr(Origin_Warehouse, 1, 1) != substr(Destination_State, 1, 1),
                         pmax(0, rnorm(n(), 3, 3)), 0) |> round(),
    Actual_Delivery_Date = Expected_Delivery_Date + Delay_Days + Border_Delay,
    Delivered_On_Time = ifelse(Actual_Delivery_Date <= Expected_Delivery_Date, 1, 0)
  ) |>
  select(-Stock_Availability)

cat("=== Supply Chain Dataset Summary ===\n")
#> === Supply Chain Dataset Summary ===
cat("Total Order Lines:", nrow(supply_data), "\n")
#> Total Order Lines: 50000
cat("Unique Orders:", n_distinct(supply_data$Order_ID), "\n")
#> Unique Orders: 5000
cat("Date Range:", min(supply_data$Order_Date), "to", max(supply_data$Order_Date), "\n")
#> Date Range: 19723 to 19812
cat("Warehouses:", n_distinct(supply_data$Origin_Warehouse), "\n")
#> Warehouses: 3
cat("Carriers:", n_distinct(supply_data$Carrier), "\n\n")
#> Carriers: 3

# Calculate OTIF and related metrics at order level
order_metrics <- supply_data |>
  group_by(Order_ID) |>
  summarise(
    Order_Date = first(Order_Date),
    Origin_Warehouse = first(Origin_Warehouse),
    Destination_State = first(Destination_State),
    Carrier = first(Carrier),
    Expected_Delivery_Date = first(Expected_Delivery_Date),
    All_On_Time = min(Delivered_On_Time),  # 1 = all lines on time
    All_Complete = ifelse(sum(Quantity_Delivered) == sum(Quantity_Ordered), 1, 0),  # 1 = full order fulfilled
    OTIF_Order = All_On_Time * All_Complete,  # 1 = on time AND in full
    OTD_Order = All_On_Time,
    Fill_Rate = sum(Quantity_Delivered) / sum(Quantity_Ordered),
    Perfect_Order = All_On_Time * All_Complete * (Fill_Rate == 1),
    .groups = "drop"
  )

# Calculate key metrics
otif_overall <- mean(order_metrics$OTIF_Order) * 100
otd_overall <- mean(order_metrics$OTD_Order) * 100
fill_rate_overall <- mean(order_metrics$Fill_Rate) * 100
perfect_order_overall <- mean(order_metrics$Perfect_Order) * 100

cat("=== Key Supply Chain Metrics (Overall) ===\n")
#> === Key Supply Chain Metrics (Overall) ===
cat("On-Time In-Full (OTIF):", round(otif_overall, 1), "%\n")
#> On-Time In-Full (OTIF): 0.1 %
cat("On-Time Delivery (OTD):", round(otd_overall, 1), "%\n")
#> On-Time Delivery (OTD): 0.2 %
cat("Fill Rate:", round(fill_rate_overall, 1), "%\n")
#> Fill Rate: 98.3 %
cat("Perfect Order Rate:", round(perfect_order_overall, 1), "%\n\n")
#> Perfect Order Rate: 0.1 %

# Metrics by state
metrics_by_state <- order_metrics |>
  group_by(Destination_State) |>
  summarise(
    Orders = n(),
    OTIF_Pct = mean(OTIF_Order) * 100,
    OTD_Pct = mean(OTD_Order) * 100,
    Fill_Rate_Pct = mean(Fill_Rate) * 100,
    Avg_Days_Late = mean(ifelse(All_On_Time == 0,
                               supply_data$Actual_Delivery_Date - supply_data$Expected_Delivery_Date, NA),
                         na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(OTIF_Pct))

cat("=== OTIF by Destination State ===\n")
#> === OTIF by Destination State ===
print(metrics_by_state |> select(Destination_State, Orders, OTIF_Pct, OTD_Pct))
#> # A tibble: 8 × 4
#>   Destination_State Orders OTIF_Pct OTD_Pct
#>   <chr>              <int>    <dbl>   <dbl>
#> 1 Abeokuta             628    0.478   0.637
#> 2 Abuja                646    0.310   0.310
#> 3 Ibadan               623    0       0    
#> 4 Kano                 596    0       0.168
#> 5 Katsina              606    0       0    
#> 6 Kebbi                600    0       0.167
#> 7 Lagos                591    0       0.338
#> 8 Ogun                 710    0       0

# Metrics by carrier
metrics_by_carrier <- order_metrics |>
  group_by(Carrier) |>
  summarise(
    Orders = n(),
    OTIF_Pct = mean(OTIF_Order) * 100,
    OTD_Pct = mean(OTD_Order) * 100,
    .groups = "drop"
  )

cat("\n=== OTIF by Carrier ===\n")
#> 
#> === OTIF by Carrier ===
print(metrics_by_carrier)
#> # A tibble: 3 × 4
#>   Carrier   Orders OTIF_Pct OTD_Pct
#>   <chr>      <int>    <dbl>   <dbl>
#> 1 Carrier_A   1646    0.304   0.608
#> 2 Carrier_B   1680    0       0    
#> 3 Carrier_C   1674    0       0

# Visualisation: OTIF by state
p_otif_state <- ggplot(metrics_by_state, aes(x = reorder(Destination_State, -OTIF_Pct),
                                             y = OTIF_Pct)) +
  geom_col(fill = "#2ca02c", colour = "black", linewidth = 0.5) +
  geom_text(aes(label = paste0(round(OTIF_Pct, 1), "%")), vjust = -0.5, fontsize = 3) +
  geom_hline(yintercept = 95, linetype = "dashed", colour = "red", linewidth = 1) +
  labs(
    title = "On-Time In-Full (OTIF) by State",
    x = "Destination State",
    y = "OTIF %",
    caption = "Red line = 95% target"
  ) +
  ylim(0, 105) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_otif_state)

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(7038)

# Synthetic distribution data
n_orders = 5000
n_lines = 10
n_total = n_orders * n_lines

order_ids = np.repeat(np.arange(1, n_orders + 1), n_lines)
order_dates = np.repeat(pd.to_datetime('2024-01-01') +
                       pd.to_timedelta(np.random.randint(0, 90, n_orders), unit='D'), n_lines)
origin_warehouse = np.repeat(np.random.choice(['Lagos', 'Abuja', 'Kano'], n_orders), n_lines)
destination_state = np.repeat(np.random.choice(['Lagos', 'Ogun', 'Abeokuta', 'Abuja',
                                               'Ibadan', 'Kano', 'Katsina', 'Kebbi'], n_orders), n_lines)
carrier = np.repeat(np.random.choice(['Carrier_A', 'Carrier_B', 'Carrier_C'], n_orders), n_lines)

supply_data = pd.DataFrame({
    'Order_ID': order_ids,
    'Order_Date': order_dates,
    'Origin_Warehouse': origin_warehouse,
    'Destination_State': destination_state,
    'Carrier': carrier,
    'Quantity_Ordered': np.repeat(np.random.randint(10, 201, n_orders), n_lines)
})

# Distance calculation
def get_distance(origin, dest):
    routes = {
        ('Lagos', 'Lagos'): 20,
        ('Lagos', 'Ogun'): 80,
        ('Lagos', 'Ibadan'): 120,
        ('Abuja', 'Abuja'): 15,
        ('Abuja', 'Kano'): 450,
        ('Kano', 'Kano'): 10,
    }
    return routes.get((origin, dest), 200) + np.random.normal(0, 10)

supply_data['Distance_km'] = supply_data.apply(
    lambda row: get_distance(row['Origin_Warehouse'], row['Destination_State']), axis=1
).clip(lower=10)

# Lead time by carrier speed
carrier_speed = {'Carrier_A': 120, 'Carrier_B': 100, 'Carrier_C': 110}
supply_data['Lead_Time_Days'] = supply_data.apply(
    lambda row: int(np.ceil(row['Distance_km'] / carrier_speed[row['Carrier']])), axis=1
)
supply_data['Expected_Delivery_Date'] = supply_data['Order_Date'] + \
                                       pd.to_timedelta(supply_data['Lead_Time_Days'], unit='D')

# Stock availability and delivery quantity
product_fill_rates = {'Beverages': 0.92, 'Snacks': 0.94, 'Toiletries': 0.88, 'Medicines': 0.96}
supply_data['Product_Category'] = np.repeat(np.random.choice(list(product_fill_rates.keys()), n_orders), n_lines)
supply_data['Stock_Available'] = supply_data['Product_Category'].apply(
    lambda x: np.random.binomial(1, product_fill_rates[x])
)
supply_data['Quantity_Delivered'] = supply_data.apply(
    lambda row: row['Quantity_Ordered'] if row['Stock_Available'] else
               int(row['Quantity_Ordered'] * np.random.uniform(0.6, 0.95)), axis=1
)

# Carrier delays
carrier_delays = {'Carrier_A': (0, 1), 'Carrier_B': (2, 2), 'Carrier_C': (0.5, 1)}
supply_data['Delay_Days'] = supply_data['Carrier'].apply(
    lambda x: int(max(0, np.random.normal(carrier_delays[x][0], carrier_delays[x][1])))
)
supply_data['Actual_Delivery_Date'] = supply_data['Expected_Delivery_Date'] + \
                                     pd.to_timedelta(supply_data['Delay_Days'], unit='D')
supply_data['Delivered_On_Time'] = (supply_data['Actual_Delivery_Date'] <=
                                   supply_data['Expected_Delivery_Date']).astype(int)

print("=== Supply Chain Dataset Summary ===")
#> === Supply Chain Dataset Summary ===
print(f"Total Order Lines: {len(supply_data)}")
#> Total Order Lines: 50000
print(f"Unique Orders: {supply_data['Order_ID'].nunique()}")
#> Unique Orders: 5000
print(f"Warehouses: {supply_data['Origin_Warehouse'].nunique()}")
#> Warehouses: 3
print(f"Carriers: {supply_data['Carrier'].nunique()}\n")
#> Carriers: 3

# Order-level metrics
order_metrics = supply_data.groupby('Order_ID').agg({
    'Order_Date': 'first',
    'Destination_State': 'first',
    'Carrier': 'first',
    'Delivered_On_Time': 'min',  # 1 = all lines on time
    'Quantity_Ordered': 'sum',
    'Quantity_Delivered': 'sum'
}).reset_index()

order_metrics['All_Complete'] = (order_metrics['Quantity_Delivered'] ==
                                order_metrics['Quantity_Ordered']).astype(int)
order_metrics['OTIF'] = order_metrics['Delivered_On_Time'] * order_metrics['All_Complete']
order_metrics['Fill_Rate'] = order_metrics['Quantity_Delivered'] / order_metrics['Quantity_Ordered']

# Overall metrics
otif_pct = order_metrics['OTIF'].mean() * 100
otd_pct = order_metrics['Delivered_On_Time'].mean() * 100
fill_rate_pct = order_metrics['Fill_Rate'].mean() * 100

print("=== Key Supply Chain Metrics (Overall) ===")
#> === Key Supply Chain Metrics (Overall) ===
print(f"On-Time In-Full (OTIF): {otif_pct:.1f}%")
#> On-Time In-Full (OTIF): 3.5%
print(f"On-Time Delivery (OTD): {otd_pct:.1f}%")
#> On-Time Delivery (OTD): 6.4%
print(f"Fill Rate: {fill_rate_pct:.1f}%\n")
#> Fill Rate: 98.3%

# Metrics by state
metrics_by_state = order_metrics.groupby('Destination_State').agg({
    'Order_ID': 'count',
    'OTIF': lambda x: (x.mean() * 100),
    'Delivered_On_Time': lambda x: (x.mean() * 100),
    'Fill_Rate': lambda x: (x.mean() * 100)
}).round(1)
metrics_by_state.columns = ['Orders', 'OTIF_Pct', 'OTD_Pct', 'Fill_Rate_Pct']
metrics_by_state = metrics_by_state.sort_values('OTIF_Pct', ascending=False)

print("=== OTIF by Destination State ===")
#> === OTIF by Destination State ===
print(metrics_by_state.to_string())
#>                    Orders  OTIF_Pct  OTD_Pct  Fill_Rate_Pct
#> Destination_State                                          
#> Abuja                 630       4.0      8.3           98.1
#> Lagos                 623       4.0      6.7           98.5
#> Ogun                  653       3.8      7.5           98.3
#> Abeokuta              627       3.5      6.7           98.3
#> Kano                  576       3.3      5.2           98.3
#> Ibadan                660       3.2      5.6           98.4
#> Kebbi                 616       3.1      5.2           98.2
#> Katsina               615       2.8      5.5           98.3

# Metrics by carrier
metrics_by_carrier = order_metrics.groupby('Carrier').agg({
    'Order_ID': 'count',
    'OTIF': lambda x: (x.mean() * 100),
    'Delivered_On_Time': lambda x: (x.mean() * 100)
}).round(1)
metrics_by_carrier.columns = ['Orders', 'OTIF_Pct', 'OTD_Pct']

print("\n=== OTIF by Carrier ===")
#> 
#> === OTIF by Carrier ===
print(metrics_by_carrier.to_string())
#>            Orders  OTIF_Pct  OTD_Pct
#> Carrier                             
#> Carrier_A    1663       9.0     17.2
#> Carrier_B    1716       0.0      0.0
#> Carrier_C    1621       1.5      2.0

# Visualisation
fig, ax = plt.subplots(figsize=(11, 6))
states_sorted = metrics_by_state.sort_values('OTIF_Pct', ascending=True)
ax.barh(states_sorted.index, states_sorted['OTIF_Pct'], color='#2ca02c', edgecolor='black', linewidth=0.5)
ax.axvline(95, linestyle='--', color='red', linewidth=2, label='95% Target')
ax.set_xlabel('OTIF %', fontsize=11)
ax.set_ylabel('Destination State', fontsize=11)
ax.set_title('On-Time In-Full (OTIF) by State', fontsize=12, fontweight='bold')
ax.legend()
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

53.3 Exploratory Analysis: Late Delivery Patterns

Late deliveries concentrate in specific routes, carriers, and conditions. An EDA identifies these hotspots. Questions:

Which states have the highest late delivery rates? (Geographic bottlenecks)
Which carriers are most unreliable? (Carrier comparison)
What product categories are most delayed? (Handling or demand issues)
Do delays vary by time of week or season? (Operational patterns)
How much do distance and route affect on-time delivery?

A Pareto analysis often reveals 80/20 patterns: “80% of late deliveries occur on 20% of routes.” Targeting these high-impact routes yields disproportionate benefits. A heatmap of late delivery rate by origin warehouse and destination state visualises geographic problem areas.

Show code

library(tidyverse)

# Use supply_data and order_metrics from previous section

# 1. Pareto: which routes account for most late deliveries?
route_late <- supply_data |>
  filter(Delivered_On_Time == 0) |>
  group_by(Origin_Warehouse, Destination_State) |>
  summarise(late_count = n(), .groups = "drop") |>
  arrange(desc(late_count)) |>
  mutate(
    route     = paste(Origin_Warehouse, "→", Destination_State),
    cum_pct   = cumsum(late_count) / sum(late_count) * 100
  )

cat("=== Top 10 Late Delivery Routes (Pareto) ===\n")
#> === Top 10 Late Delivery Routes (Pareto) ===
print(head(route_late |> select(route, late_count, cum_pct), 10))
#> # A tibble: 10 × 3
#>    route            late_count cum_pct
#>    <chr>                 <int>   <dbl>
#>  1 Abuja → Ogun           2249    5.51
#>  2 Lagos → Ogun           2128   10.7 
#>  3 Lagos → Kebbi          2099   15.9 
#>  4 Kano → Abuja           2049   20.9 
#>  5 Kano → Ogun            2024   25.9 
#>  6 Lagos → Abuja          1910   30.5 
#>  7 Abuja → Ibadan         1894   35.2 
#>  8 Abuja → Katsina        1893   39.8 
#>  9 Lagos → Abeokuta       1883   44.4 
#> 10 Kano → Abeokuta        1882   49.1

# 2. Delay distribution by carrier
carrier_delay <- supply_data |>
  mutate(
    days_late = pmax(0, as.numeric(Actual_Delivery_Date - Expected_Delivery_Date))
  ) |>
  group_by(Carrier) |>
  summarise(
    total_shipments = n(),
    pct_late        = mean(Delivered_On_Time == 0) * 100,
    avg_days_late   = mean(days_late[days_late > 0], na.rm = TRUE),
    max_days_late   = max(days_late),
    .groups         = "drop"
  )

cat("\n=== Delay Profile by Carrier ===\n")
#> 
#> === Delay Profile by Carrier ===
print(carrier_delay)
#> # A tibble: 3 × 5
#>   Carrier   total_shipments pct_late avg_days_late max_days_late
#>   <chr>               <int>    <dbl>         <dbl>         <dbl>
#> 1 Carrier_A           16460     73.0          3.91            16
#> 2 Carrier_B           16800     90.7          5.03            19
#> 3 Carrier_C           16740     80.8          3.97            16

# 3. Late rate by day of week
dow_late <- supply_data |>
  mutate(dow = lubridate::wday(Order_Date, label = TRUE)) |>
  group_by(dow) |>
  summarise(late_rate = mean(Delivered_On_Time == 0) * 100, .groups = "drop")

cat("\n=== Late Rate by Day of Week (Order Date) ===\n")
#> 
#> === Late Rate by Day of Week (Order Date) ===
print(dow_late)
#> # A tibble: 7 × 2
#>   dow   late_rate
#>   <ord>     <dbl>
#> 1 Sun        80.4
#> 2 Mon        81.7
#> 3 Tue        81.9
#> 4 Wed        81.8
#> 5 Thu        80.8
#> 6 Fri        82.5
#> 7 Sat        81.9

# 4. Visualise: late rate by origin warehouse & carrier
heatmap_oc <- order_metrics |>
  group_by(Origin_Warehouse, Carrier) |>
  summarise(
    late_rate   = (1 - mean(OTIF_Order)) * 100,
    .groups     = "drop"
  )

ggplot(heatmap_oc, aes(x = Carrier, y = Origin_Warehouse, fill = late_rate)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = paste0(round(late_rate, 1), "%")), size = 4) +
  scale_fill_gradient2(low = "#2ca02c", mid = "lightyellow", high = "#d62728",
                       midpoint = 10) +
  labs(
    title = "Late Delivery Rate by Warehouse × Carrier",
    x = "Carrier", y = "Origin Warehouse", fill = "Late %"
  ) +
  theme_minimal()

Show code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Use supply_data from previous section

# 1. Pareto: routes with most late deliveries
late_df = supply_data[supply_data['Delivered_On_Time'] == 0].copy()
route_late = (
    late_df
    .groupby(['Origin_Warehouse', 'Destination_State'])
    .size().reset_index(name='late_count')
    .sort_values('late_count', ascending=False)
)
route_late['route'] = route_late['Origin_Warehouse'] + ' → ' + route_late['Destination_State']
route_late['cum_pct'] = route_late['late_count'].cumsum() / route_late['late_count'].sum() * 100

print("=== Top 10 Late Delivery Routes (Pareto) ===")
#> === Top 10 Late Delivery Routes (Pareto) ===
print(route_late[['route', 'late_count', 'cum_pct']].head(10).to_string(index=False))
#>            route  late_count   cum_pct
#>    Lagos → Lagos         954  4.893813
#>     Kano → Abuja         909  9.556787
#>   Lagos → Ibadan         898 14.163332
#>    Lagos → Kebbi         884 18.698061
#> Lagos → Abeokuta         882 23.222530
#>     Abuja → Ogun         878 27.726480
#>      Kano → Ogun         869 32.184262
#>   Kano → Katsina         860 36.595876
#>  Lagos → Katsina         849 40.951062
#>     Kano → Kebbi         839 45.254950

# 2. Delay days by carrier
supply_data['days_late'] = (
    (supply_data['Actual_Delivery_Date'] - supply_data['Expected_Delivery_Date'])
    .dt.days.clip(lower=0)
)
carrier_delay = supply_data.groupby('Carrier').agg(
    total_shipments = ('Delivered_On_Time', 'count'),
    pct_late        = ('Delivered_On_Time', lambda x: (1 - x.mean()) * 100),
    avg_days_late   = ('days_late', lambda x: x[x > 0].mean()),
    max_days_late   = ('days_late', 'max')
).round(2)

print("\n=== Delay Profile by Carrier ===")
#> 
#> === Delay Profile by Carrier ===
print(carrier_delay.to_string())
#>            total_shipments  pct_late  avg_days_late  max_days_late
#> Carrier                                                           
#> Carrier_A            16630     15.87           1.15              3
#> Carrier_B            17160     69.69           2.54              9
#> Carrier_C            16210     30.20           1.24              4

# 3. Visualise: late rate heatmap (use supply_data which has Origin_Warehouse)
pivot = supply_data.groupby(['Carrier', 'Origin_Warehouse'])['Delivered_On_Time'].mean().unstack()
pivot = 1 - pivot  # Convert on-time rate to late rate

fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(pivot * 100, annot=True, fmt='.1f', cmap='RdYlGn_r',
            linewidths=0.5, cbar_kws={'label': 'Late %'}, ax=ax)
ax.set_title('Late Delivery Rate: Carrier × Warehouse', fontweight='bold')
ax.set_xlabel('Origin Warehouse')
ax.set_ylabel('Carrier')
plt.tight_layout()
plt.show()

53.4 Late Delivery Prediction

Once patterns are identified, predictive models anticipate which shipments will be late before departure. This enables proactive intervention: expediting, rerouting, or customer communication. Classification models (logistic regression, random forest, XGBoost) predict binary outcome: Will this shipment be delivered late (1) or on time (0)?

Features: - Carrier: Carrier_A vs B vs C have different reliability - Route: Origin warehouse + destination state encode distance and condition - Product category: Certain categories (medicines) require special handling, risking delays - Order characteristics: Large orders may saturate local handling capacity - Temporal: Orders placed on Fridays may defer movement until Monday - Carrier load: If carrier is heavily loaded, delays cascade

Model evaluation uses classification metrics (precision, recall, F1) and ROI analysis: cost of false positives (expediting a shipment that would arrive on time) vs false negatives (failing to flag a late shipment, incurring customer penalty).

Show code

library(tidyverse)
library(pROC)

set.seed(6042)

# Build feature set at order level
model_data <- order_metrics |>
  mutate(
    carrier_B    = as.integer(Carrier == "Carrier_B"),
    carrier_C    = as.integer(Carrier == "Carrier_C"),
    dest_remote  = as.integer(Destination_State %in% c("Kano", "Katsina", "Kebbi")),
    dest_mid     = as.integer(Destination_State %in% c("Ibadan", "Abeokuta", "Ogun")),
    is_friday    = as.integer(weekdays(Order_Date) == "Friday"),
    y            = 1L - All_On_Time
  )

# Train/test split
train_idx <- sample(1:nrow(model_data), 0.75 * nrow(model_data))
train     <- model_data[train_idx, ]
test      <- model_data[-train_idx, ]

# Logistic regression
late_model <- glm(
  y ~ carrier_B + carrier_C + dest_remote + dest_mid + is_friday,
  data = train, family = binomial
)

cat("=== Late Delivery Prediction Model ===\n")
#> === Late Delivery Prediction Model ===
coef_tbl <- summary(late_model)$coefficients |>
  as.data.frame() |>
  mutate(
    OddsRatio  = round(exp(Estimate), 3),
    Estimate   = round(Estimate, 3),
    `Pr(>|z|)` = round(`Pr(>|z|)`, 4)
  )
print(coef_tbl[, c("Estimate", "OddsRatio", "Pr(>|z|)")])
#>             Estimate    OddsRatio Pr(>|z|)
#> (Intercept)    4.917 1.366250e+02   0.0000
#> carrier_B     18.533 1.118325e+08   0.9933
#> carrier_C     18.538 1.124173e+08   0.9934
#> dest_remote    0.455 1.576000e+00   0.6504
#> dest_mid      -0.118 8.890000e-01   0.8924
#> is_friday      0.121 1.129000e+00   0.9099

# Model performance
test <- test |>
  mutate(pred_late_prob = predict(late_model, newdata = test, type = "response"))

roc_obj   <- roc(test$y, test$pred_late_prob, quiet = TRUE)
cat("\nAUC:", round(auc(roc_obj), 3), "\n")
#> 
#> AUC: 0.843

# Business value: how many late shipments would we flag if we expedite
# those with pred_late_prob > 0.40?
threshold <- 0.40
test <- test |>
  mutate(
    flagged      = pred_late_prob > threshold,
    true_late    = y == 1
  )

tp <- sum(test$flagged &  test$true_late)
fp <- sum(test$flagged & !test$true_late)
fn <- sum(!test$flagged & test$true_late)

cat("\n=== Business Value at Threshold", threshold, "===\n")
#> 
#> === Business Value at Threshold 0.4 ===
cat("Late shipments caught early (TP):", tp, "\n")
#> Late shipments caught early (TP): 1248
cat("On-time shipments unnecessarily flagged (FP, expediting cost):", fp, "\n")
#> On-time shipments unnecessarily flagged (FP, expediting cost): 2
cat("Late shipments missed (FN):", fn, "\n")
#> Late shipments missed (FN): 0
cat("Recall (late shipments captured):", round(tp / (tp + fn), 3), "\n")
#> Recall (late shipments captured): 1
cat("Precision (flagged shipments that were genuinely late):", round(tp / (tp + fp), 3), "\n")
#> Precision (flagged shipments that were genuinely late): 0.998

Show code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

np.random.seed(6042)

# Feature engineering
md = order_metrics.copy()
md['carrier_B']   = (md['Carrier'] == 'Carrier_B').astype(int)
md['carrier_C']   = (md['Carrier'] == 'Carrier_C').astype(int)
md['dest_remote'] = md['Destination_State'].isin(['Kano', 'Katsina', 'Kebbi']).astype(int)
md['dest_mid']    = md['Destination_State'].isin(['Ibadan', 'Abeokuta', 'Ogun']).astype(int)
md['is_friday']   = (md['Order_Date'].dt.dayofweek == 4).astype(int)
md['y']           = 1 - md['Delivered_On_Time']

features = ['carrier_B', 'carrier_C', 'dest_remote', 'dest_mid', 'is_friday']
X = md[features]
y = md['y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=6042)

clf = LogisticRegression(max_iter=500)
clf.fit(X_train, y_train)

LogisticRegression(max_iter=500)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Show code


pred_prob = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, pred_prob)

print("=== Late Delivery Prediction Model ===")
#> === Late Delivery Prediction Model ===
coef_df = pd.DataFrame({
    'Feature':    features,
    'Coef':       clf.coef_[0].round(3),
    'OddsRatio':  np.exp(clf.coef_[0]).round(3)
})
print(coef_df.to_string(index=False))
#>     Feature  Coef  OddsRatio
#>   carrier_B 4.147     63.257
#>   carrier_C 2.274      9.719
#> dest_remote 0.412      1.510
#>    dest_mid 0.200      1.221
#>   is_friday 0.074      1.077
print(f"\nAUC: {auc:.3f}")
#> 
#> AUC: 0.834

# Business value at threshold 0.40
threshold = 0.40
pred_class = (pred_prob >= threshold).astype(int)
precision  = precision_score(y_test, pred_class, zero_division=0)
recall     = recall_score(y_test, pred_class, zero_division=0)

tp = int(((pred_class == 1) & (y_test == 1)).sum())
fp = int(((pred_class == 1) & (y_test == 0)).sum())
fn = int(((pred_class == 0) & (y_test == 1)).sum())

print(f"\n=== Business Value at Threshold {threshold} ===")
#> 
#> === Business Value at Threshold 0.4 ===
print(f"Late shipments caught early (TP): {tp}")
#> Late shipments caught early (TP): 1165
print(f"On-time shipments flagged (FP):   {fp}")
#> On-time shipments flagged (FP):   85
print(f"Late shipments missed (FN):       {fn}")
#> Late shipments missed (FN):       0
print(f"Recall (late captured):    {recall:.3f}")
#> Recall (late captured):    1.000
print(f"Precision (flags correct): {precision:.3f}")
#> Precision (flags correct): 0.932

# Plot predicted probability distribution
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(pred_prob[y_test == 0], bins=20, alpha=0.6, label='On-Time',
        color='steelblue', edgecolor='white')
ax.hist(pred_prob[y_test == 1], bins=20, alpha=0.6, label='Late',
        color='#d62728', edgecolor='white')
ax.axvline(threshold, color='black', linestyle='--', linewidth=1.5,
           label=f'Decision threshold = {threshold}')
ax.set_xlabel('Predicted Late Probability', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title('Late Delivery Model: Score Distribution', fontweight='bold')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

53.5 Supply Chain Risk Analytics

Resilience depends on supplier diversity and network structure. Concentration risk measures whether too much supply comes from one vendor. The Herfindahl-Hirschman Index (HHI) quantifies concentration:

\[\text{HHI} = \sum_i (\text{Market Share}_i)^2\]

HHI = 10,000 = complete monopoly (one supplier 100%). HHI = 2,500 (four equal suppliers, each 25%). HHI > 2,500 signals high concentration. In Nigerian pharma, if 60% of supply comes from a single importer/wholesaler, and that importer faces port delays, the entire supply chain stalls.

Network resilience also depends on single points of failure (nodes whose failure disconnects the network). A warehouse is a single point of failure if it’s the only path from manufacturer to a region. Betweenness centrality (from network analysis, Ch. 30) quantifies this: nodes with high betweenness are critical chokepoints.

53.6 Route Optimisation Basics

The Vehicle Routing Problem (VRP) is: given a set of locations and vehicles, find routes minimizing travel distance/time while satisfying constraints (vehicle capacity, delivery time windows). VRP is NP-hard (no polynomial-time algorithm), so heuristics are standard.

Nearest Neighbour Heuristic: Start at depot, visit nearest unvisited location, repeat until all visited, return to depot. Fast (O(n²)) but suboptimal.

Clarke-Wright Savings Algorithm: Iteratively merge routes that save the most distance. More sophisticated, often yields 85–95% of optimal.

For a last-mile example: 30 retail points in Lagos need daily delivery from a central depot. Nearest neighbour builds routes by greedily visiting nearest stops. Clarke-Wright savings (e.g., merging route (depot-A-depot) and (depot-B-depot) into (depot-A-B-depot)) reduces total distance.

53.7 Case Study: Supply Chain Analytics for a West African FMCG Distributor

Background: A fictional fast-moving consumer goods distributor (beverages, snacks, toiletries) operates three warehouses (Lagos, Abuja, Kano) and services 8 states through 3 carriers. OTIF currently runs at 88%, below the industry benchmark of 95%. Stockouts cost the company approximately ₦180 million per year. Management wants to: (a) identify the primary causes of missed OTIF, (b) predict which future shipments will be late, and (c) design an improvement program.

Analytical workflow (using the 50,000 order-line dataset generated earlier):

Show code

library(tidyverse)

# Step 1: Root-cause ranking
cat("=== Step 1: Root-Cause Analysis ===\n")
#> === Step 1: Root-Cause Analysis ===

cause_decomp <- order_metrics |>
  summarise(
    pct_late_delivery   = (1 - mean(All_On_Time)) * 100,
    pct_incomplete_fill = (1 - mean(All_Complete)) * 100,
    pct_otif_failure    = (1 - mean(OTIF_Order)) * 100
  )

cat("Failure decomposition:\n")
#> Failure decomposition:
cat(" - Orders late to destination:          ", round(cause_decomp$pct_late_delivery, 1),   "%\n")
#>  - Orders late to destination:           99.8 %
cat(" - Orders with incomplete fill:         ", round(cause_decomp$pct_incomplete_fill, 1), "%\n")
#>  - Orders with incomplete fill:          53.5 %
cat(" - OTIF failures (either late or short):", round(cause_decomp$pct_otif_failure, 1),    "%\n")
#>  - OTIF failures (either late or short): 99.9 %

# Step 2: Carrier scorecard
cat("\n=== Step 2: Carrier Performance Scorecard ===\n")
#> 
#> === Step 2: Carrier Performance Scorecard ===
carrier_scorecard <- order_metrics |>
  group_by(Carrier) |>
  summarise(
    orders      = n(),
    otif_pct    = mean(OTIF_Order) * 100,
    otd_pct     = mean(All_On_Time) * 100,
    fill_pct    = mean(All_Complete) * 100,
    .groups     = "drop"
  ) |>
  arrange(otif_pct)

print(carrier_scorecard |>
      mutate(across(c(otif_pct, otd_pct, fill_pct), ~ round(., 1))))
#> # A tibble: 3 × 5
#>   Carrier   orders otif_pct otd_pct fill_pct
#>   <chr>      <int>    <dbl>   <dbl>    <dbl>
#> 1 Carrier_B   1680      0       0       46.3
#> 2 Carrier_C   1674      0       0       45  
#> 3 Carrier_A   1646      0.3     0.6     48.2

# Step 3: State risk tier
cat("\n=== Step 3: State Risk Tiers ===\n")
#> 
#> === Step 3: State Risk Tiers ===
state_tier <- order_metrics |>
  group_by(Destination_State) |>
  summarise(otif = mean(OTIF_Order) * 100, .groups = "drop") |>
  mutate(
    risk_tier = case_when(
      otif >= 95 ~ "Green (On Target)",
      otif >= 88 ~ "Amber (At Risk)",
      TRUE       ~ "Red (Critical)"
    )
  )

print(state_tier)
#> # A tibble: 8 × 3
#>   Destination_State  otif risk_tier     
#>   <chr>             <dbl> <chr>         
#> 1 Abeokuta          0.478 Red (Critical)
#> 2 Abuja             0.310 Red (Critical)
#> 3 Ibadan            0     Red (Critical)
#> 4 Kano              0     Red (Critical)
#> 5 Katsina           0     Red (Critical)
#> 6 Kebbi             0     Red (Critical)
#> 7 Lagos             0     Red (Critical)
#> 8 Ogun              0     Red (Critical)

# Step 4: Improvement scenario
cat("\n=== Step 4: Improvement Scenario ===\n")
#> 
#> === Step 4: Improvement Scenario ===
cat("Current OTIF:", round(mean(order_metrics$OTIF_Order) * 100, 1), "%\n")
#> Current OTIF: 0.1 %
cat("If Carrier_B delays improved to Carrier_A standard:\n")
#> If Carrier_B delays improved to Carrier_A standard:
# Carrier_B's contribution to total failures
carrier_b_failures <- order_metrics |>
  filter(Carrier == "Carrier_B", OTIF_Order == 0) |> nrow()
total_orders <- nrow(order_metrics)
improvement  <- carrier_b_failures * 0.50 / total_orders * 100

cat("  → Estimated OTIF improvement:", round(improvement, 1), "percentage points\n")
#>   → Estimated OTIF improvement: 16.8 percentage points
cat("  → New OTIF target:", round(mean(order_metrics$OTIF_Order) * 100 + improvement, 1), "%\n")
#>   → New OTIF target: 16.9 %

cat("\nStrategic recommendations:\n")
#> 
#> Strategic recommendations:
cat("1. Review Carrier_B contract: SLA enforcement for delay penalties\n")
#> 1. Review Carrier_B contract: SLA enforcement for delay penalties
cat("2. Route high-value orders through Carrier_A for remote states\n")
#> 2. Route high-value orders through Carrier_A for remote states
cat("3. Increase safety stock for Medicines (lowest fill rate product category)\n")
#> 3. Increase safety stock for Medicines (lowest fill rate product category)
cat("4. Deploy late-delivery prediction model to trigger pre-emptive re-routing\n")
#> 4. Deploy late-delivery prediction model to trigger pre-emptive re-routing

# Visualise: OTIF over time (simulated monthly trend)
monthly_otif <- supply_data |>
  mutate(month = lubridate::floor_date(Order_Date, "month")) |>
  group_by(month) |>
  summarise(otif_rate = mean(Delivered_On_Time) * 100, .groups = "drop")

ggplot(monthly_otif, aes(x = month, y = otif_rate)) +
  geom_line(colour = "steelblue", linewidth = 1.2) +
  geom_point(colour = "steelblue", size = 2.5) +
  geom_hline(yintercept = 95, linetype = "dashed", colour = "red",
             linewidth = 0.8) +
  labs(
    title    = "Monthly OTIF Trend",
    x        = NULL, y = "OTIF (%)",
    subtitle = "Red dashed line = 95% industry benchmark"
  ) +
  ylim(50, 100) +
  theme_minimal()

Show code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Root-cause analysis
print("=== Step 1: Root-Cause Analysis ===")
#> === Step 1: Root-Cause Analysis ===
pct_late = (1 - order_metrics['Delivered_On_Time'].mean()) * 100
pct_inc  = (1 - order_metrics['All_Complete'].mean()) * 100
pct_otif = (1 - order_metrics['OTIF'].mean()) * 100
print(f" Late delivery failures:         {pct_late:.1f}%")
#>  Late delivery failures:         93.6%
print(f" Incomplete fill failures:       {pct_inc:.1f}%")
#>  Incomplete fill failures:       50.6%
print(f" OTIF failures (any):            {pct_otif:.1f}%")
#>  OTIF failures (any):            96.5%

# Carrier scorecard
print("\n=== Step 2: Carrier Scorecard ===")
#> 
#> === Step 2: Carrier Scorecard ===
scorecard = order_metrics.groupby('Carrier').agg(
    orders      = ('OTIF', 'count'),
    otif_pct    = ('OTIF', lambda x: x.mean() * 100),
    otd_pct     = ('Delivered_On_Time', lambda x: x.mean() * 100),
    fill_pct    = ('All_Complete', lambda x: x.mean() * 100)
).round(1).sort_values('otif_pct')
print(scorecard.to_string())
#>            orders  otif_pct  otd_pct  fill_pct
#> Carrier                                       
#> Carrier_B    1716       0.0      0.0      48.4
#> Carrier_C    1621       1.5      2.0      50.3
#> Carrier_A    1663       9.0     17.2      49.6

# State risk tiers
print("\n=== Step 3: State Risk Tiers ===")
#> 
#> === Step 3: State Risk Tiers ===
state_tier = order_metrics.groupby('Destination_State').agg(
    otif = ('OTIF', lambda x: x.mean() * 100)
).round(1)
state_tier['risk_tier'] = np.where(
    state_tier['otif'] >= 95, 'Green (On Target)',
    np.where(state_tier['otif'] >= 88, 'Amber (At Risk)', 'Red (Critical)')
)
print(state_tier.sort_values('otif').to_string())
#>                    otif       risk_tier
#> Destination_State                      
#> Katsina             2.8  Red (Critical)
#> Kebbi               3.1  Red (Critical)
#> Ibadan              3.2  Red (Critical)
#> Kano                3.3  Red (Critical)
#> Abeokuta            3.5  Red (Critical)
#> Ogun                3.8  Red (Critical)
#> Abuja               4.0  Red (Critical)
#> Lagos               4.0  Red (Critical)

# Monthly OTIF trend
supply_data['month'] = supply_data['Order_Date'].dt.to_period('M')
monthly = supply_data.groupby('month')['Delivered_On_Time'].mean() * 100

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(monthly.index.astype(str), monthly.values, marker='o',
        linewidth=1.5, markersize=5, color='steelblue')
ax.axhline(95, linestyle='--', color='red', linewidth=1, label='95% benchmark')
ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('OTIF (%)', fontsize=11)
ax.set_title('Monthly OTIF Trend', fontweight='bold')
ax.legend()
ax.tick_params(axis='x', rotation=45)
ax.set_ylim(50, 100)
#> (50.0, 100.0)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

Chapter 48 Exercises

OTIF Decomposition: Using the supply chain dataset, calculate the joint probability of on-time AND in-full delivery. Does \(P(\text{OTIF}) = P(\text{On-Time}) \times P(\text{In-Full})\)? What does a deviation tell you?
Carrier Comparison: Perform a chi-square test comparing on-time delivery rates across the three carriers. Report the test statistic and p-value. Which pairwise differences are statistically significant?
Predictive Model Extension: Add a “high value order” indicator (total order value above the 75th percentile) to the late-delivery prediction model. Does it improve AUC? Interpret the odds ratio.
Safety Stock Calculation: For a product with weekly demand \(\mu = 500\) units and standard deviation \(\sigma = 80\) units, and a lead time of 7 days, calculate the safety stock needed for a 95% service level. (Hint: safety stock = \(z \times \sigma \times \sqrt{L}\), where \(z = 1.645\) for 95%.)
Route Optimisation: Simulate the nearest-neighbour VRP heuristic for 15 retail delivery points in a city. Generate random (x, y) coordinates and compute the total tour distance. Compare to the optimal tour found by brute force (feasible for ≤ 10 points).
Network Resilience: Build a supply network graph with 5 warehouses and 12 routes. Compute betweenness centrality for each warehouse. Remove the highest-centrality node and measure how many routes become disconnected. What does this imply for supply chain design?

53.8 Further Reading

Bowersox, Closs & Cooper (2013). “Supply Chain Logistics Management”
OpenRouteService: Free VRP solver at https://openrouteservice.org/
PySpark: Distributed logistics modelling for large networks

53.9 Chapter 48 Appendix: Mathematical Derivations

53.9.1 A48.1 Bullwhip Effect Mathematical Model

Let \(D_t\) = demand in period \(t\), \(O_t\) = orders placed. Assume linear demand forecast: \(F_t = D_t + \alpha(D_t - D_{t-1})\), where \(\alpha\) is forecast adjustment coefficient. Orders: \(O_t = F_t + S_t\) (forecast + safety stock adjustment).

Substituting: \[O_t = D_t + \alpha(D_t - D_{t-1}) + S_t\]

If \(D_t\) follows an AR(1) process \(D_t = \rho D_{t-1} + \epsilon_t\), then variance of \(O_t\) is:

\[\text{Var}(O_t) = \left[1 + \alpha(\alpha + 2)\right] \text{Var}(D_t) + (1 + \alpha)^2 \text{Var}(S_t)\]

The multiplier \([1 + \alpha(\alpha + 2)]\) > 1 (for \(\alpha > 0\)) shows order variance amplification.

53.9.2 A48.2 OTIF Decomposition

OTIF combines timeliness and completeness:

\[\text{OTIF} = P(\text{On Time}) \times P(\text{In Full | On Time})\]

If on-time and in-full are independent:

\[\text{OTIF} \approx P(\text{On Time}) \times P(\text{In Full})\]

Typically, \(P(\text{On Time}) = 0.96\) and \(P(\text{In Full}) = 0.94\), yielding OTIF ≈ 0.90 (90%).

53.9.3 A48.3 HHI for Supplier Concentration

\[\text{HHI} = \sum_{i=1}^{n} S_i^2\]

where \(S_i\) is supplier \(i\)’s share of total supply volume. Maximum = 10,000 (monopoly), minimum = 10,000/n (n equal suppliers).

53.9.4 A48.4 Clarke-Wright Savings Algorithm

For each pair of nodes \(i\) and \(j\), the savings from merging two routes is:

\[s_{ij} = d_{0i} + d_{0j} - d_{ij}\]

where \(d_{0i}\) is distance from depot to node \(i\). Sort savings in descending order and greedily merge routes, respecting vehicle capacity.

53.9.5 A48.5 Betweenness Centrality in Supply Networks

\[C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}\]

where \(\sigma_{st}(v)\) is the number of shortest paths from \(s\) to \(t\) that pass through \(v\), and \(\sigma_{st}\) is the total number of shortest paths. High \(C_B(v)\) indicates \(v\) is a critical intermediary.

End of Chapter 48

	penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty: - `None`: no penalty is added; - `'l2'`: add a L2 penalty term and it is the default choice; - `'l1'`: add a L1 penalty term; - `'elasticnet'`: both L1 and L2 penalty terms are added. .. warning:: Some penalties may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionadded:: 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1) .. deprecated:: 1.8 `penalty` was deprecated in version 1.8 and will be removed in 1.10. Use `l1_ratio` instead. `l1_ratio=0` for `penalty='l2'`, `l1_ratio=1` for `penalty='l1'` and `l1_ratio` set to any float between 0 and 1 for `'penalty='elasticnet'`.	'deprecated'
	C C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. `C=np.inf` results in unpenalized logistic regression. For a visual example on the effect of tuning the `C` parameter with an L1 penalty, see: :ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.	1.0
	l1_ratio l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting `l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form `l1_ratio * L1 + (1 - l1_ratio) * L2`. .. warning:: Certain values of `l1_ratio`, i.e. some penalties, may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionchanged:: 1.8 Default value changed from None to 0.0. .. deprecated:: 1.8 `None` is deprecated and will be removed in version 1.10. Always use `l1_ratio` to specify the penalty type.	0.0
	dual dual: bool, default=False Dual (constrained) or primal (regularized, see also :ref:`this equation `) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer `dual=False` when n_samples > n_features.	False
	tol tol: float, default=1e-4 Tolerance for stopping criteria.	0.0001
	fit_intercept fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.	True
	intercept_scaling intercept_scaling: float, default=1 Useful only when the solver `liblinear` is used and `self.fit_intercept` is set to `True`. In this case, `x` becomes `[x, self.intercept_scaling]`, i.e. a "synthetic" feature with constant value equal to `intercept_scaling` is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``. .. note:: The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) `intercept_scaling` has to be increased.	1
	class_weight class_weight: dict or 'balanced', default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. .. versionadded:: 0.17 class_weight='balanced'	None
	random_state random_state: int, RandomState instance, default=None Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary ` for details.	None
	solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs' Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects: - 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems. - For :term:`multiclass` problems (`n_classes >= 3`), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error. - 'newton-cholesky' is a good choice for `n_samples` >> `n_features * n_classes`, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on `n_features * n_classes` because it explicitly computes the full Hessian matrix. - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones; - 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the :class:`~sklearn.multiclass.OneVsRestClassifier`. .. warning:: The choice of the algorithm depends on the penalty chosen (`l1_ratio=0` for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for Elastic-Net) and on (multinomial) multiclass support: ================= ======================== ====================== solver l1_ratio multinomial multiclass ================= ======================== ====================== 'lbfgs' l1_ratio=0 yes 'liblinear' l1_ratio=1 or l1_ratio=0 no 'newton-cg' l1_ratio=0 yes 'newton-cholesky' l1_ratio=0 yes 'sag' l1_ratio=0 yes 'saga' 0<=l1_ratio<=1 yes ================= ======================== ====================== .. note:: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:`sklearn.preprocessing`. .. seealso:: Refer to the :ref:`User Guide ` for more information regarding :class:`LogisticRegression` and more specifically the :ref:`Table ` summarizing solver/penalty supports. .. versionadded:: 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18. .. versionadded:: 0.19 SAGA solver. .. versionchanged:: 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22. .. versionadded:: 1.2 newton-cholesky solver. Multinomial support in version 1.6.	'lbfgs'
	max_iter max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.	500
	verbose verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.	0
	warm_start warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary `. .. versionadded:: 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.	False
	n_jobs n_jobs: int, default=None Does not have any effect. .. deprecated:: 1.8 `n_jobs` is deprecated in version 1.8 and will be removed in 1.10.	None