36 Location-Based Analytics

📋 Learning Objectives

By the end of this chapter, you will understand why geographic variation matters for business strategy and how to quantify it using geospatial data. You will load and manipulate vector data (points, lines, polygons) using R’s sf package and Python’s geopandas, and operate on coordinate reference systems to ensure spatial consistency. You will perform spatial operations: buffering, spatial joins, overlays, and distance matrix computation. You will visualise geospatial data in both static (choropleth maps, heat maps) and interactive formats (Leaflet, Folium). You will apply location analytics to real business challenges: trade area analysis using Voronoi diagrams and Reilly’s Law, site selection scoring, and optimal facility location. You will master the mathematical foundations: Voronoi tessellation, Reilly’s Law of Retail Gravitation, the Huff model of spatial choice, and great-circle distance. By project completion, you will have conducted site selection analysis for a Nigerian microfinance bank and built interactive facility planning dashboards.

36.1 Why Location Matters for Business Decisions

Geography is destiny in retail, banking, and logistics. A fast-food franchise thrives on high-traffic intersections; relocate 500 metres and revenue plummets. A bank’s branch profitability depends on deposit catchment: branch density, competitor presence, and local income. A manufacturer’s supply chain cost is driven by distance to suppliers and customers. Yet traditional analytics often ignore space, treating all customers identically regardless of proximity to competition.

Nigeria exemplifies geographic complexity. Lagos and Abuja are densely populated urban cores; rural areas are sparse. Road networks are poor in the North; coastal South has better infrastructure. Purchasing power varies by 10x across regions (Northern rural vs. Lagos island). Consumer behaviour differs: North prefers informal banking and cash; South adopts mobile and digital. A bank expanding nationally must adapt branch density, staffing, and product mix by region—a one-size-fits-all approach fails.

Geospatial data enables this. We have access to: (1) Customer locations (GPS coordinates from mobile, branch visits); (2) Competitor locations (bank branch registry, online business listings); (3) Infrastructure (roads, power, water pipes); (4) Socioeconomic attributes (NBS census data by ward, CBN banking penetration by state); (5) Physical constraints (flood zones, protected areas, land use zoning). With these layers, we can answer: “Where should we open the next branch?” “What is our addressable market within 5 km?” “Which competitor is our nearest threat?”

36.2 Geospatial Data Types and Structures

Before we can analyse geographic data, we need to understand how location information is stored and represented in a computer. Think of a map. It shows three kinds of things: dots (individual locations — a petrol station, a hospital, an ATM); lines (routes and connections — roads, rivers, pipelines); and shaded areas (regions — a state boundary, a flood zone, a market catchment area). These correspond exactly to the three fundamental data types in geospatial analysis.

Points represent a single location — a specific spot on the Earth’s surface. An ATM machine is a point. A customer’s home address is a point. A farm is a point (if we represent it by its centre). Points are stored as pairs of numbers: longitude (how far east or west of the Greenwich meridian) and latitude (how far north or south of the equator). Lagos, for example, is approximately longitude 3.4°E, latitude 6.5°N.

Lines represent connections between places — roads, railways, rivers, power transmission lines. A line is stored as a sequence of points, with straight segments drawn between each pair. The Apapa–Oshodi Expressway would be stored as a series of coordinate pairs tracing the road’s route.

Polygons represent areas — regions with boundaries. A state boundary is a polygon. So is a market trade area, a flood-prone zone, or the catchment area of a bank branch. A polygon is stored as a sequence of points forming a closed loop — where the last point connects back to the first.

Rasters are a fourth type: grids of values, like a photograph of the Earth. Satellite imagery, elevation maps, and night-time light intensity maps are all rasters. They are less common in business analytics but very useful for agricultural and environmental applications.

Data is typically stored in GeoJSON (a human-readable text format, easy to share between systems) or Shapefile (an older but widely used format from mapping software companies). Any serious geospatial tool — including R’s sf package and Python’s geopandas — can read both formats.

One final concept before we start coding: every geospatial dataset needs a Coordinate Reference System (CRS) — a specification of how the coordinates in the data map to actual locations on the Earth. The most common is WGS84 (the GPS standard), which uses degrees of longitude and latitude. For distance calculations, we typically convert to a “projected” coordinate system that represents distances in metres rather than degrees. Nigeria is mostly covered by UTM Zone 32N, which uses metric coordinates and is accurate for Nigeria’s geography.

📘 Theory: Coordinate Reference Systems

A CRS is defined by a datum (Earth model), projection (2D transformation), and units. WGS84 (datum) + no projection = geographic coordinates (degrees, not metric). WGS84 + Web Mercator projection = web maps (Google Maps, OpenStreetMap). UTM Zone 31N (Nigeria is split: 31N for East, 31S for South) + WGS84 datum = metric coordinates (metres). Distances computed in geographic CRS are arc lengths; in projected CRS they are planar Euclidean. For accurate distance calculations, always project to a local metric CRS.

Show code

library(sf)
library(rnaturalearth)
library(dplyr)

# Load Nigeria state and LGA boundaries
# Using Natural Earth data; in practice, fetch from Nigerian geospatial authority

# Load Africa polygons
africa <- ne_countries(continent = "Africa", scale = "medium", returnclass = "sf")
nigeria <- africa |> filter(name == "Nigeria")

# Get Nigeria LGA boundaries (proxy: using admin1, which are states)
# For true LGA boundaries, fetch from: https://www.naturalearthdata.com/
nigeria_admin1 <- ne_states(country = "Nigeria", returnclass = "sf")

cat("Nigeria Geospatial Data Loaded\n")
#> Nigeria Geospatial Data Loaded
cat("==============================\n")
#> ==============================

# Inspect the data
cat("Nigeria state shapefile:\n")
#> Nigeria state shapefile:
print(st_geometry_type(nigeria_admin1))
#>  [1] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#>  [6] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [11] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [16] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [21] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [26] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [31] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
#> [36] MULTIPOLYGON MULTIPOLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
cat("Number of states:", nrow(nigeria_admin1), "\n")
#> Number of states: 37
cat("CRS:", as.character(st_crs(nigeria_admin1)$input), "\n")
#> CRS: WGS 84

# Bounding box
bbox_nigeria <- st_bbox(nigeria)
cat("\nNigeria bounding box:\n")
#> 
#> Nigeria bounding box:
print(bbox_nigeria)
#>      xmin      ymin      xmax      ymax 
#>  2.686035  4.277393 14.627148 13.872852

# Project to UTM Zone 31N for metric operations
# Note: Nigeria spans multiple UTM zones; Zone 31N covers most
nigeria_utm <- st_transform(nigeria_admin1, crs = "EPSG:32631")

cat("\nAfter projection to UTM 31N:\n")
#> 
#> After projection to UTM 31N:
cat("CRS:", as.character(st_crs(nigeria_utm)$input), "\n")
#> CRS: EPSG:32631

# Calculate area of each state
nigeria_utm$area_km2 <- as.numeric(st_area(nigeria_utm)) / 1e6

cat("\nArea of top 5 states (km²):\n")
#> 
#> Area of top 5 states (km²):
print(arrange(nigeria_utm, desc(area_km2)) |> select(name, area_km2) |> head(5))
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 562680.3 ymin: 717991.2 xmax: 1777713 ymax: 1540497
#> Projected CRS: WGS 84 / UTM zone 31N
#>     name area_km2                       geometry
#> 1  Borno 73199.24 MULTIPOLYGON (((1652838 154...
#> 2  Niger 70603.13 MULTIPOLYGON (((582851 1150...
#> 3 Taraba 61540.29 MULTIPOLYGON (((1486294 833...
#> 4 Bauchi 49716.20 MULTIPOLYGON (((1388174 107...
#> 5   Yobe 46884.35 MULTIPOLYGON (((1330322 149...

Show code

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box, Polygon
import numpy as np

# gpd.datasets was removed in GeoPandas 1.0.
# We build Nigeria's outline from its known bounding box and use
# synthetic state polygons for the admin-1 demonstration.

# Nigeria approximate bounding box (lon: 2.7–14.7, lat: 4.3–13.9)
nigeria_outline = Polygon([
    (2.7, 4.3), (14.7, 4.3), (14.7, 13.9), (2.7, 13.9), (2.7, 4.3)
])
nigeria = gpd.GeoDataFrame({'name': ['Nigeria'], 'geometry': [nigeria_outline]},
                            crs='EPSG:4326')

print("Nigeria Geospatial Data Loaded (synthetic bounding box)")
#> Nigeria Geospatial Data Loaded (synthetic bounding box)
print("=" * 50)
#> ==================================================
print(f"CRS: {nigeria.crs}")
#> CRS: EPSG:4326
print(f"Geometry type: {nigeria.geometry.geom_type.values}")
#> Geometry type: ['Polygon']
print(f"Bounds: {nigeria.total_bounds}")
#> Bounds: [ 2.7  4.3 14.7 13.9]

# Admin-1 state polygons for key Nigerian states
states_data = {
    'state': ['Lagos', 'Ogun', 'Oyo', 'FCT Abuja', 'Plateau', 'Kaduna',
              'Kano', 'Borno', 'Enugu', 'Rivers'],
    'geometry': [
        box(3.0,  6.0,  3.9,  6.7),   # Lagos
        box(3.0,  6.7,  3.8,  7.5),   # Ogun
        box(2.8,  7.5,  4.5,  8.5),   # Oyo
        box(6.8,  8.8,  7.5,  9.5),   # FCT Abuja
        box(8.4,  8.5,  9.7,  9.9),   # Plateau
        box(7.0, 10.0,  8.5, 11.2),   # Kaduna
        box(8.0, 11.0, 10.0, 12.5),   # Kano
        box(11.5, 10.0, 15.0, 13.5),  # Borno
        box(6.9,  5.8,  7.8,  7.0),   # Enugu
        box(6.5,  4.3,  7.8,  5.7),   # Rivers
    ]
}
nigeria_states = gpd.GeoDataFrame(states_data, crs='EPSG:4326')

print("\nNigeria State Polygons (10-state synthetic example):")
#> 
#> Nigeria State Polygons (10-state synthetic example):
print(nigeria_states[['state']].to_string(index=False))
#>     state
#>     Lagos
#>      Ogun
#>       Oyo
#> FCT Abuja
#>   Plateau
#>    Kaduna
#>      Kano
#>     Borno
#>     Enugu
#>    Rivers

# Project to UTM 31N for metric operations
nigeria_utm = nigeria_states.to_crs('EPSG:32631')
print(f"\nAfter projection to UTM 31N:")
#> 
#> After projection to UTM 31N:
print(f"CRS: {nigeria_utm.crs}")
#> CRS: EPSG:32631

nigeria_utm['area_km2'] = nigeria_utm.geometry.area / 1e6
print("\nArea by state (km²):")
#> 
#> Area by state (km²):
print(nigeria_utm[['state', 'area_km2']].sort_values('area_km2', ascending=False).to_string(index=False))
#>     state      area_km2
#>     Borno 152257.532204
#>      Kano  36533.431233
#>    Rivers  22419.231008
#>   Plateau  22347.872779
#>    Kaduna  21918.002821
#>       Oyo  20715.368295
#>     Enugu  13278.336889
#>      Ogun   7813.181186
#>     Lagos   7702.877437
#> FCT Abuja   5982.722014

print(f"\nNigeria bounding box (lat/lon): {nigeria.total_bounds}")
#> 
#> Nigeria bounding box (lat/lon): [ 2.7  4.3 14.7 13.9]

36.3 Spatial Operations

With geospatial data loaded, we perform three fundamental operations:

1. Buffer: Create a ring of radius r around a point or polygon. A bank branch’s 5 km buffer is its catchment area (rough approximation). All customers within the buffer are reachable.

2. Spatial Join: Assign attributes from one layer to another based on proximity. “Which LGA is each customer in?” is a point-in-polygon spatial join. “How many competitors are within 1 km of this location?” is a distance-based join.

3. Overlay: Combine two polygon layers to find intersections. “What fraction of a branch’s catchment overlaps with a competitor’s?” is a polygon-polygon overlay.

We also compute distance matrices: all-pairs distances between locations. With M branches and N candidate sites, the distance matrix is M × N. This is used in site selection optimization: assign sites to branches to minimize travel distance.

🔑 Key Formula

The haversine formula computes great-circle distance between two points on Earth:

\[d = 2R \arcsin\left( \sqrt{\sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)} \right)\]

where R = 6371 km (Earth’s radius), φ are latitudes, λ are longitudes, both in radians. For short distances (<100 km), Euclidean distance on a projected CRS is faster and nearly as accurate.

Show code

library(sf)
library(tidyverse)

# Create synthetic dataset: 20 bank branches in Lagos
set.seed(5736)

branches_df <- data.frame(
  branch_id = paste0("B", 1:20),
  branch_name = c("VI Branch", "Lekki Branch", "Ikoyi Branch", "Yaba Branch", "Shomolu Branch",
                  "Surulere Branch", "Isolo Branch", "Ikorodu Branch", "Epe Branch", "Ijebu-Ode",
                  "Badagry Branch", "Abeokuta Branch", "Ota Branch", "Mushin Branch", "Bariga Branch",
                  "Ajah Branch", "Ajaji Branch", "Oshodi Branch", "Lagos Island", "Ojota Branch"),
  latitude = c(6.4281, 6.4614, 6.4469, 6.5161, 6.5537, 6.4980, 6.5614, 6.5876, 6.4298, 6.1504,
               6.4131, 6.6063, 6.6876, 6.5268, 6.4931, 6.4723, 6.4423, 6.6300, 6.4314, 6.6313),
  longitude = c(3.4167, 3.5897, 3.4519, 3.3749, 3.3905, 3.3567, 3.4281, 3.2899, 3.5321, 3.4202,
                2.8703, 3.5963, 3.5567, 3.4523, 3.4012, 3.5401, 3.3219, 3.3852, 3.4281, 3.5119),
  deposits_million = rgamma(20, shape = 3, scale = 50)
)

# Convert to sf
branches_sf <- st_as_sf(branches_df, coords = c("longitude", "latitude"), crs = "EPSG:4326")

cat("20 Bank Branches in Lagos\n")
#> 20 Bank Branches in Lagos
cat("=======================\n")
#> =======================
print(branches_sf |> select(branch_id, branch_name, deposits_million) |> head(10))
#> Simple feature collection with 10 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 3.2899 ymin: 6.1504 xmax: 3.5897 ymax: 6.5876
#> Geodetic CRS:  WGS 84
#>    branch_id     branch_name deposits_million              geometry
#> 1         B1       VI Branch        131.41205 POINT (3.4167 6.4281)
#> 2         B2    Lekki Branch         39.73858 POINT (3.5897 6.4614)
#> 3         B3    Ikoyi Branch        216.38588 POINT (3.4519 6.4469)
#> 4         B4     Yaba Branch        192.97309 POINT (3.3749 6.5161)
#> 5         B5  Shomolu Branch        423.08821 POINT (3.3905 6.5537)
#> 6         B6 Surulere Branch        111.81308  POINT (3.3567 6.498)
#> 7         B7    Isolo Branch        168.58554 POINT (3.4281 6.5614)
#> 8         B8  Ikorodu Branch         60.59143 POINT (3.2899 6.5876)
#> 9         B9      Epe Branch        110.01463 POINT (3.5321 6.4298)
#> 10       B10       Ijebu-Ode        126.79971 POINT (3.4202 6.1504)

# Operation 1: Buffer around branches (5 km catchment area)
branches_buffered <- st_buffer(branches_sf, dist = 5000)  # 5000 metres = 5 km

cat("\n\nBuffer Analysis\n")
#> 
#> 
#> Buffer Analysis
cat("===============\n")
#> ===============
cat("Each branch has a 5 km radius buffer (catchment area)\n")
#> Each branch has a 5 km radius buffer (catchment area)
cat("Area of first branch buffer:", as.numeric(st_area(branches_buffered$geometry[[1]])) / 1e6, "km²\n")
#> Area of first branch buffer: 6.479106e-09 km²

# Operation 2: Spatial join - find overlapping buffers (competitor analysis)
branch_pairs <- st_join(branches_buffered, branches_sf, join = st_intersects)
branch_pairs$self_overlap <- branch_pairs$branch_id.x == branch_pairs$branch_id.y
direct_competitors <- branch_pairs |> filter(!self_overlap) |> distinct(branch_id.x, branch_id.y)

cat("\nDirect Competitors (overlapping buffers):\n")
#> 
#> Direct Competitors (overlapping buffers):
head(direct_competitors |> select(branch_id.x, branch_id.y), 10)

Show code

cat("\nTotal competitor pairs:", nrow(direct_competitors), "\n")
#> 
#> Total competitor pairs: 20

# Operation 3: Distance matrix between branches
# Compute distance between all pairs of branches; drop units for arithmetic
distance_matrix    <- st_distance(branches_sf, branches_sf, by_element = FALSE)
distance_matrix_km <- as.numeric(distance_matrix) / 1000  # numeric matrix, km
dim(distance_matrix_km) <- dim(distance_matrix)           # restore matrix shape

cat("\n\nDistance Matrix (km):\n")
#> 
#> 
#> Distance Matrix (km):
cat("====================\n")
#> ====================
cat("Mean distance between branches:", mean(distance_matrix_km[upper.tri(distance_matrix_km)]), "km\n")
#> Mean distance between branches: 24.40674 km
cat("Min distance (excluding self):", min(distance_matrix_km[distance_matrix_km > 0]), "km\n")
#> Min distance (excluding self): 1.312009 km
cat("Max distance:", max(distance_matrix_km), "km\n")
#> Max distance: 83.03431 km

# Find nearest neighbor for each branch
for (i in 1:5) {
  nearest_idx <- order(as.numeric(distance_matrix_km[i, ]))[2]  # 2nd is nearest (1st is self)
  nearest_branch <- branches_sf$branch_name[nearest_idx]
  nearest_dist <- as.numeric(distance_matrix_km[i, nearest_idx])
  cat(sprintf("Branch %s nearest neighbor: %s (%.1f km)\n",
              branches_sf$branch_name[i], nearest_branch, nearest_dist))
}
#> Branch VI Branch nearest neighbor: Lagos Island (1.3 km)
#> Branch Lekki Branch nearest neighbor: Ajah Branch (5.6 km)
#> Branch Ikoyi Branch nearest neighbor: Lagos Island (3.1 km)
#> Branch Yaba Branch nearest neighbor: Surulere Branch (2.8 km)
#> Branch Shomolu Branch nearest neighbor: Isolo Branch (4.2 km)

Show code

import geopandas as gpd
from shapely.geometry import Point, Polygon
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist

# Create synthetic branches
branch_data = {
    'branch_id': [f'B{i}' for i in range(1, 21)],
    'branch_name': ['VI Branch', 'Lekki Branch', 'Ikoyi Branch', 'Yaba Branch', 'Shomolu Branch',
                    'Surulere Branch', 'Isolo Branch', 'Ikorodu Branch', 'Epe Branch', 'Ijebu-Ode',
                    'Badagry Branch', 'Abeokuta Branch', 'Ota Branch', 'Mushin Branch', 'Bariga Branch',
                    'Ajah Branch', 'Ajaji Branch', 'Oshodi Branch', 'Lagos Island', 'Ojota Branch'],
    'latitude': [6.4281, 6.4614, 6.4469, 6.5161, 6.5537, 6.4980, 6.5614, 6.5876, 6.4298, 6.1504,
                 6.4131, 6.6063, 6.6876, 6.5268, 6.4931, 6.4723, 6.4423, 6.6300, 6.4314, 6.6313],
    'longitude': [3.4167, 3.5897, 3.4519, 3.3749, 3.3905, 3.3567, 3.4281, 3.2899, 3.5321, 3.4202,
                  2.8703, 3.5963, 3.5567, 3.4523, 3.4012, 3.5401, 3.3219, 3.3852, 3.4281, 3.5119],
    'deposits_million': np.random.gamma(shape=3, scale=50, size=20)
}

branches_gdf = gpd.GeoDataFrame(
    branch_data,
    geometry=gpd.points_from_xy(branch_data['longitude'], branch_data['latitude']),
    crs='EPSG:4326'
)

print("20 Bank Branches in Lagos")
#> 20 Bank Branches in Lagos
print("=" * 50)
#> ==================================================
print(branches_gdf[['branch_id', 'branch_name', 'deposits_million']].head(10))
#>   branch_id      branch_name  deposits_million
#> 0        B1        VI Branch        124.984802
#> 1        B2     Lekki Branch        189.615508
#> 2        B3     Ikoyi Branch        139.350212
#> 3        B4      Yaba Branch        145.218745
#> 4        B5   Shomolu Branch        215.855540
#> 5        B6  Surulere Branch         92.510845
#> 6        B7     Isolo Branch        145.087997
#> 7        B8   Ikorodu Branch         94.261593
#> 8        B9       Epe Branch        100.690010
#> 9       B10        Ijebu-Ode         83.669272

# Buffer operation (5 km catchment)
branches_buffered = branches_gdf.copy()
branches_buffered['geometry'] = branches_gdf.geometry.buffer(5000)  # 5000m = 5km

print("\n\nBuffer Analysis")
#> 
#> 
#> Buffer Analysis
print("=" * 50)
#> ==================================================
print(f"Area of first buffer: {branches_buffered.geometry.iloc[0].area / 1e6:.2f} km²")
#> Area of first buffer: 78.41 km²

# Spatial join - find overlapping buffers
sjoin = gpd.sjoin(branches_buffered, branches_gdf, how='left', predicate='intersects')
sjoin['same_branch'] = sjoin['branch_id_left'] == sjoin['branch_id_right']
competitors = sjoin[~sjoin['same_branch']].groupby('branch_id_left')['branch_id_right'].count()

print("\nBranches with overlapping catchments:")
#> 
#> Branches with overlapping catchments:
print(competitors.sort_values(ascending=False).head(10))
#> branch_id_left
#> B1     19
#> B10    19
#> B11    19
#> B12    19
#> B13    19
#> B14    19
#> B15    19
#> B16    19
#> B17    19
#> B18    19
#> Name: branch_id_right, dtype: int64

# Distance matrix
coords = np.array([branches_gdf.geometry.x, branches_gdf.geometry.y]).T
# Haversine distance (approximate: 1 degree ≈ 111 km)
distance_km = cdist(coords, coords, metric='euclidean') * 111

print("\n\nDistance Matrix (km)")
#> 
#> 
#> Distance Matrix (km)
print("=" * 50)
#> ==================================================
print(f"Mean distance: {np.mean(distance_km[np.triu_indices_from(distance_km, k=1)]):.1f} km")
#> Mean distance: 24.5 km
print(f"Min distance (excluding self): {np.min(distance_km[distance_km > 0]):.1f} km")
#> Min distance (excluding self): 1.3 km
print(f"Max distance: {np.max(distance_km):.1f} km")
#> Max distance: 83.4 km

# Nearest neighbors
print("\nNearest neighbors (sample):")
#> 
#> Nearest neighbors (sample):
for i in range(5):
    nearest_idx = np.argsort(distance_km[i])[1]  # 2nd element (1st is self)
    print(f"{branches_gdf['branch_name'].iloc[i]} → {branches_gdf['branch_name'].iloc[nearest_idx]} ({distance_km[i, nearest_idx]:.1f} km)")
#> VI Branch → Lagos Island (1.3 km)
#> Lekki Branch → Ajah Branch (5.6 km)
#> Ikoyi Branch → Lagos Island (3.2 km)
#> Yaba Branch → Surulere Branch (2.8 km)
#> Shomolu Branch → Isolo Branch (4.3 km)

36.4 Geospatial Visualisation

Mapping geospatial data requires careful choices. Choropleth maps colour regions (states, LGAs) by a quantitative variable (financial inclusion %, population density). Heat maps shade continuous space by a smooth estimate of point density (customer concentration). Point maps scatter individual locations; large datasets require aggregation or clustering. Interactive maps (Leaflet/Folium) let users pan, zoom, and inspect individual locations.

We create a choropleth showing fictional Nigerian state-level formal financial inclusion rates (proportion of adults with a bank account). Data comes from NBS census and CBN banking statistics. States are coloured on a gradient: light (low inclusion, <20%) to dark (high inclusion, >60%). This instantly reveals North-South inequality: inclusion is highest in Lagos, Abuja, and South-West states; lowest in rural Northern states.

Show code

library(sf)
library(ggplot2)
library(rnaturalearth)

# Load Nigeria state boundaries
nigeria_states <- ne_states(country = "Nigeria", returnclass = "sf")

# Create synthetic financial inclusion data
set.seed(2814)
inclusion_data <- data.frame(
  name = nigeria_states$name,
  financial_inclusion_pct = c(
    65, 55, 58, 62, 35, 42, 38, 45, 52, 48, 58, 61, 55, 39, 38,
    42, 48, 35, 45, 52, 38, 42, 45, 49, 58, 62, 68, 55, 42, 46,
    49, 44, 48, 52, 39, 42, 45
  )
)

# Join with spatial data
nigeria_states <- nigeria_states |>
  left_join(inclusion_data, by = "name")

# Choropleth map
p <- ggplot(data = nigeria_states) +
  geom_sf(aes(fill = financial_inclusion_pct), colour = "white", linewidth = 0.2) +
  scale_fill_viridis_c(name = "Financial\nInclusion (%)",
                       limits = c(30, 70),
                       breaks = seq(30, 70, 10),
                       labels = c("30%", "40%", "50%", "60%", "70%")) +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    legend.position = "right"
  ) +
  labs(title = "Financial Inclusion by Nigerian State",
       subtitle = "Proportion of adults with formal bank account",
       caption = "Source: Synthetic data (demonstration)")

print(p)

Show code


# Heat map: customer density (point density estimation)
# Simulate 1000 customer locations
set.seed(7263)
customers <- data.frame(
  longitude = rnorm(1000, mean = 3.4, sd = 0.3),  # Lagos-centred
  latitude = rnorm(1000, mean = 6.5, sd = 0.3)
)
customers_sf <- st_as_sf(customers, coords = c("longitude", "latitude"), crs = "EPSG:4326")

# Density heatmap using 2D kernel density
p_heat <- ggplot() +
  geom_sf(data = nigeria_states, fill = "lightgrey", colour = "white") +
  stat_density_2d(data = customers, aes(x = longitude, y = latitude, fill = ..level..),
                  geom = "polygon", alpha = 0.6) +
  scale_fill_viridis_c(name = "Customer\nDensity") +
  coord_sf(xlim = c(2.5, 4.5), ylim = c(6.0, 7.0)) +  # Zoom to Lagos
  theme_minimal() +
  labs(title = "Customer Density Heatmap - Lagos Region",
       subtitle = "Concentration of 1000 simulated customer locations")

print(p_heat)

Show code

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# gpd.datasets removed in GeoPandas 1.0 — use synthetic Nigeria bounding polygon
from shapely.geometry import Polygon as ShapelyPolygon
_nigeria_geom = ShapelyPolygon([(2.7,4.3),(14.7,4.3),(14.7,13.9),(2.7,13.9),(2.7,4.3)])
nigeria = gpd.GeoDataFrame({'name': ['Nigeria'], 'geometry': [_nigeria_geom]}, crs='EPSG:4326')

# Create synthetic state-level data
np.random.seed(2814)
states_list = ['Lagos', 'Ogun', 'Oyo', 'Osun', 'Ekiti', 'Ondo', 'Ife', 'Kwara', 'Kogi',
               'Plateau', 'Nasarawa', 'FCT Abuja', 'Kaduna', 'Katsina', 'Kano', 'Jigawa',
               'Bauchi', 'Gombe', 'Yobe', 'Borno', 'Taraba', 'Adamawa', 'Delta', 'Edo',
               'Cross River', 'Akwa Ibom', 'Rivers', 'Bayelsa', 'Enugu', 'Ebonyi', 'Abia',
               'Imo', 'Anambra', 'Kebbi', 'Sokoto', 'Zamfara', 'Niger']

# Synthetic financial inclusion data (%)
inclusion = np.array([65, 55, 58, 62, 35, 42, 38, 45, 52, 48, 58, 61, 55, 39, 38,
                      42, 48, 35, 45, 52, 38, 42, 45, 49, 58, 62, 68, 55, 42, 46,
                      49, 44, 48, 52, 39, 42, 45])

# Create geodataframe with Nigeria state polygons (synthetic)
# In practice, load true LGA shapefiles
import shapely.geometry
state_coords = {
    'Lagos': (3.3, 6.4),
    'Ogun': (3.2, 6.9),
    'Oyo': (3.0, 7.8),
    'Abuja': (7.1, 9.0),
    'Kano': (8.5, 12.0),
    'Rivers': (6.9, 4.7)
}

# Choropleth using world data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Plot 1: Nigeria choropleth (using naturalearth)
nigeria.plot(ax=ax1, color='lightgrey', edgecolor='black')
ax1.set_title("Nigeria Map (Base)", fontsize=12, fontweight='bold')
ax1.set_xlabel("Longitude")
ax1.set_ylabel("Latitude")

# Plot 2: Synthetic heatmap (customer density in Lagos)
np.random.seed(7263)
customers_lon = np.random.normal(3.4, 0.3, 1000)
customers_lat = np.random.normal(6.5, 0.3, 1000)

ax2.set_xlim(2.5, 4.5)
#> (2.5, 4.5)
ax2.set_ylim(6.0, 7.0)
#> (6.0, 7.0)

# Density heatmap
xy = np.vstack([customers_lon, customers_lat])
z = gaussian_kde(xy)(xy)

scatter = ax2.scatter(customers_lon, customers_lat, c=z, s=5, cmap='viridis', alpha=0.6)
ax2.set_title("Customer Density Heatmap - Lagos Region", fontsize=12, fontweight='bold')
ax2.set_xlabel("Longitude")
ax2.set_ylabel("Latitude")
cbar = plt.colorbar(scatter, ax=ax2)
cbar.set_label("Density")

plt.tight_layout()
plt.show()

Show code


# Interactive map (Folium) - for HTML rendering
# This creates an interactive HTML map that cannot be embedded in PDF
# Uncomment for HTML output:
# import folium
# m = folium.Map(location=[6.5, 3.4], zoom_start=10, tiles='OpenStreetMap')
# for idx, row in gdf.iterrows():
#     folium.CircleMarker([row['lat'], row['lon']], radius=5, popup=row['name']).add_to(m)
# m.save('/tmp/map_interactive.html')

36.5 Trade Area Analysis

A bank’s branch is not alone in its market. Competitors also serve the same customers. Thiessen polygons (or Voronoi diagrams) partition space into zones: each zone contains points closer to one facility than any other. Thiessen polygons are a theoretical model of trade areas, assuming customers visit the nearest branch.

In reality, Reilly’s Law of Retail Gravitation adjusts for store attractiveness: a larger/better store pulls customers from farther away. Market share is proportional to size divided by distance squared:

\[M_{ij} = \frac{S_i}{d_{ij}^2} \bigg/ \sum_{k} \frac{S_k}{d_{kj}^2}\]

where M_ij is market share of store i for customer j, S_i is size (deposits, staff, branches), d_ij is distance, and summing k is over all competing stores.

The Huff model generalises Reilly’s Law: P(customer i visits store j) is proportional to attractiveness_j raised to a power, divided by distance_ij raised to a power:

\[P(visit_j | customer_i) = \frac{A_j / d_{ij}^\lambda}{\sum_{k} A_k / d_{ik}^\lambda}\]

where λ (typically 2) is a distance decay parameter. This model can be calibrated to observed data: given customer visit data, estimate λ via maximum likelihood.

For our case study, we compute Voronoi catchments for 20 microfinance branches in Lagos, then compute Huff model market share assuming attractiveness ∝ branch deposits.

📝 Section 31.5 Review Questions

What assumption does Thiessen polygon trade area analysis make that Reilly’s Law relaxes?
In Reilly’s Law, why is the exponent on distance typically 2 rather than 1?
How would you estimate λ in the Huff model from observed customer visitation data?
Why is a large branch able to pull customers from farther away?

Show code

library(sf)
library(tidyverse)

# Using branches_sf from previous section (20 branches in Lagos)
# Create Voronoi diagram (Thiessen polygons)

# Convert to planar coordinates for Voronoi computation
branches_utm <- st_transform(branches_sf, crs = "EPSG:32631")

# Compute Voronoi diagram
# Note: sf doesn't have built-in Voronoi; we use igraph convex hull approximation
# Alternative: use library(concaveman) or custom implementation

# Simplified approach: create rectangular buffers around each branch as approximation
branches_voronoi <- branches_utm |>
  mutate(geometry = st_buffer(geometry, dist = 3000))  # 3 km buffer as proxy

cat("Voronoi Trade Area Analysis\n")
#> Voronoi Trade Area Analysis
cat("============================\n")
#> ============================
cat("Number of branches:", nrow(branches_sf), "\n")
#> Number of branches: 20

# Reilly's Law: Market share calculation
# Assume attractiveness = deposits (in millions)
# For a sample customer at Lekki, compute market share of each branch

customer_lekki <- st_sfc(st_point(c(3.5897, 6.4614)), crs = "EPSG:4326")

branches_sf <- branches_sf |>
  mutate(distance_to_lekki = st_distance(customer_lekki, geometry) |> as.numeric() / 1000)  # km

# Reilly's Law: market share proportional to attractiveness / distance^2
branches_sf <- branches_sf |>
  mutate(
    attractiveness_score = deposits_million,
    reilly_numerator = attractiveness_score / (distance_to_lekki^2),
    reilly_market_share = reilly_numerator / sum(reilly_numerator)
  )

cat("\n\nReilly's Law Market Share (for Lekki customer):\n")
#> 
#> 
#> Reilly's Law Market Share (for Lekki customer):
top_branches_reilly <- branches_sf |>
  arrange(desc(reilly_market_share)) |>
  select(branch_name, distance_to_lekki, deposits_million, reilly_market_share) |>
  head(10)
print(as.data.frame(top_branches_reilly))
#>        branch_name distance_to_lekki deposits_million reilly_market_share
#> 1        VI Branch         19.470518        131.41205                   0
#> 2     Ikoyi Branch         15.310703        216.38588                   0
#> 3      Yaba Branch         24.498754        192.97309                   0
#> 4   Shomolu Branch         24.282891        423.08821                   0
#> 5  Surulere Branch         26.062662        111.81308                   0
#> 6     Isolo Branch         21.032848        168.58554                   0
#> 7   Ikorodu Branch         35.970536         60.59143                   0
#> 8       Epe Branch          7.269905        110.01463                   0
#> 9        Ijebu-Ode         39.329846        126.79971                   0
#> 10  Badagry Branch         79.670636        122.12140                   0
#>                 geometry
#> 1  POINT (3.4167 6.4281)
#> 2  POINT (3.4519 6.4469)
#> 3  POINT (3.3749 6.5161)
#> 4  POINT (3.3905 6.5537)
#> 5   POINT (3.3567 6.498)
#> 6  POINT (3.4281 6.5614)
#> 7  POINT (3.2899 6.5876)
#> 8  POINT (3.5321 6.4298)
#> 9  POINT (3.4202 6.1504)
#> 10 POINT (2.8703 6.4131)

# Huff Model: P(visit j) = (attractiveness_j / distance_ij^lambda) / sum_k(...)
# Using lambda = 2 (typical retail)

lambda <- 2
branches_sf <- branches_sf |>
  mutate(
    huff_numerator = attractiveness_score / (distance_to_lekki^lambda),
    huff_market_share = huff_numerator / sum(huff_numerator)
  )

cat("\n\nHuff Model Market Share (for Lekki customer, λ=2):\n")
#> 
#> 
#> Huff Model Market Share (for Lekki customer, λ=2):
top_branches_huff <- branches_sf |>
  arrange(desc(huff_market_share)) |>
  select(branch_name, distance_to_lekki, deposits_million, huff_market_share) |>
  head(10)
print(as.data.frame(top_branches_huff))
#>        branch_name distance_to_lekki deposits_million huff_market_share
#> 1        VI Branch         19.470518        131.41205                 0
#> 2     Ikoyi Branch         15.310703        216.38588                 0
#> 3      Yaba Branch         24.498754        192.97309                 0
#> 4   Shomolu Branch         24.282891        423.08821                 0
#> 5  Surulere Branch         26.062662        111.81308                 0
#> 6     Isolo Branch         21.032848        168.58554                 0
#> 7   Ikorodu Branch         35.970536         60.59143                 0
#> 8       Epe Branch          7.269905        110.01463                 0
#> 9        Ijebu-Ode         39.329846        126.79971                 0
#> 10  Badagry Branch         79.670636        122.12140                 0
#>                 geometry
#> 1  POINT (3.4167 6.4281)
#> 2  POINT (3.4519 6.4469)
#> 3  POINT (3.3749 6.5161)
#> 4  POINT (3.3905 6.5537)
#> 5   POINT (3.3567 6.498)
#> 6  POINT (3.4281 6.5614)
#> 7  POINT (3.2899 6.5876)
#> 8  POINT (3.5321 6.4298)
#> 9  POINT (3.4202 6.1504)
#> 10 POINT (2.8703 6.4131)

# Compare: Nearest Branch vs Reilly vs Huff
cat("\n\nComparison of Models\n")
#> 
#> 
#> Comparison of Models
comparison_branches <- branches_sf |>
  select(branch_name, distance_to_lekki, reilly_market_share, huff_market_share) |>
  arrange(distance_to_lekki) |>
  head(5)
print(as.data.frame(comparison_branches))
#>       branch_name distance_to_lekki reilly_market_share huff_market_share
#> 1    Lekki Branch          0.000000                 NaN               NaN
#> 2     Ajah Branch          5.612613                   0                 0
#> 3      Epe Branch          7.269905                   0                 0
#> 4    Ikoyi Branch         15.310703                   0                 0
#> 5 Abeokuta Branch         16.128659                   0                 0
#>                geometry
#> 1 POINT (3.5897 6.4614)
#> 2 POINT (3.5401 6.4723)
#> 3 POINT (3.5321 6.4298)
#> 4 POINT (3.4519 6.4469)
#> 5 POINT (3.5963 6.6063)

cat("\nKey Insight: Nearest branch has high market share in both models,\n")
#> 
#> Key Insight: Nearest branch has high market share in both models,
cat("but Reilly/Huff models account for branch size (deposits).\n")
#> but Reilly/Huff models account for branch size (deposits).

Show code

import geopandas as gpd
from shapely.geometry import Point
import pandas as pd
import numpy as np

# Using branches_gdf from previous section
# For simplicity, compute Reilly's Law and Huff model without explicit Voronoi

# Sample customer location: Lekki (3.5897, 6.4614)
customer_pt = Point(3.5897, 6.4614)

# Compute distance from customer to each branch (haversine approximation)
branches_gdf['distance_to_customer_km'] = branches_gdf.geometry.apply(
    lambda pt: ((pt.x - customer_pt.x)**2 + (pt.y - customer_pt.y)**2)**0.5 * 111
)

print("Voronoi Trade Area Analysis")
#> Voronoi Trade Area Analysis
print("=" * 50)
#> ==================================================

# Reilly's Law: Market Share ∝ attractiveness / distance²
branches_gdf['reilly_numerator'] = branches_gdf['deposits_million'] / (branches_gdf['distance_to_customer_km']**2)
branches_gdf['reilly_market_share'] = branches_gdf['reilly_numerator'] / branches_gdf['reilly_numerator'].sum()

print("\nReilly's Law Market Share (Lekki customer):")
#> 
#> Reilly's Law Market Share (Lekki customer):
reilly_top = branches_gdf.nlargest(10, 'reilly_market_share')[['branch_name', 'distance_to_customer_km', 'deposits_million', 'reilly_market_share']]
print(reilly_top)
#>         branch_name  ...  reilly_market_share
#> 0         VI Branch  ...                  0.0
#> 2      Ikoyi Branch  ...                  0.0
#> 3       Yaba Branch  ...                  0.0
#> 4    Shomolu Branch  ...                  0.0
#> 5   Surulere Branch  ...                  0.0
#> 6      Isolo Branch  ...                  0.0
#> 7    Ikorodu Branch  ...                  0.0
#> 8        Epe Branch  ...                  0.0
#> 9         Ijebu-Ode  ...                  0.0
#> 10   Badagry Branch  ...                  0.0
#> 
#> [10 rows x 4 columns]

# Huff Model: P(visit j) = (attractiveness_j / distance_ij^λ) / sum(...)
lambda_param = 2
branches_gdf['huff_numerator'] = branches_gdf['deposits_million'] / (branches_gdf['distance_to_customer_km']**lambda_param)
branches_gdf['huff_market_share'] = branches_gdf['huff_numerator'] / branches_gdf['huff_numerator'].sum()

print("\n\nHuff Model Market Share (λ=2):")
#> 
#> 
#> Huff Model Market Share (λ=2):
huff_top = branches_gdf.nlargest(10, 'huff_market_share')[['branch_name', 'distance_to_customer_km', 'deposits_million', 'huff_market_share']]
print(huff_top)
#>         branch_name  ...  huff_market_share
#> 0         VI Branch  ...                0.0
#> 2      Ikoyi Branch  ...                0.0
#> 3       Yaba Branch  ...                0.0
#> 4    Shomolu Branch  ...                0.0
#> 5   Surulere Branch  ...                0.0
#> 6      Isolo Branch  ...                0.0
#> 7    Ikorodu Branch  ...                0.0
#> 8        Epe Branch  ...                0.0
#> 9         Ijebu-Ode  ...                0.0
#> 10   Badagry Branch  ...                0.0
#> 
#> [10 rows x 4 columns]

# Comparison
print("\n\nComparison of Models (by distance):")
#> 
#> 
#> Comparison of Models (by distance):
comparison = branches_gdf.nsmallest(5, 'distance_to_customer_km')[['branch_name', 'distance_to_customer_km', 'reilly_market_share', 'huff_market_share']]
print(comparison)
#>         branch_name  ...  huff_market_share
#> 1      Lekki Branch  ...                NaN
#> 15      Ajah Branch  ...                0.0
#> 8        Epe Branch  ...                0.0
#> 2      Ikoyi Branch  ...                0.0
#> 11  Abeokuta Branch  ...                0.0
#> 
#> [5 rows x 4 columns]

print("\nKey Insight: Larger (higher deposit) branches capture more market share\n")
#> 
#> Key Insight: Larger (higher deposit) branches capture more market share
print("even if not the closest, due to their attractiveness.")
#> even if not the closest, due to their attractiveness.

36.6 Site Selection Analytics

Opening a new branch requires careful location choice. A site should be scored on: (1) Catchment population (NBS census data by ward); (2) Demographic fit (age, income—do they match target customer profile?); (3) Competitor density (how saturated is the market?); (4) Road access (is the site accessible by public transport?); (5) Real estate cost (can we afford rent?); (6) Foot traffic (for retail branches).

We build a weighted scoring model. Each factor is scored 1–10, then multiplied by a weight (sum of weights = 1). Suppose: population = 30%, competition = 25%, demographics = 20%, access = 15%, cost = 10%. A site with high population but poor access might still score well. Sensitivity analysis tests robustness: if we change weights, do rankings change drastically? If yes, the decision is fragile; if no, it’s robust.

Show code

library(tidyverse)

# Synthetic site selection data: 15 candidate locations for new microfinance branch in Lagos
set.seed(4189)

sites <- data.frame(
  site_id = paste0("S", 1:15),
  site_name = c("Lekki Phase 1", "Lekki Phase 2", "VI", "Ikoyi", "Yaba",
                "Surulere", "Isolo", "Ajah", "Epe", "Ikorodu",
                "Shomolu", "Bariga", "Mushin", "Ajaji", "Oshodi"),
  catchment_population_000s = c(450, 380, 520, 480, 420, 380, 350, 400, 290, 380,
                                360, 320, 410, 340, 430),
  competitor_branches = c(3, 5, 8, 6, 4, 3, 2, 4, 1, 2, 2, 3, 4, 2, 3),
  demographic_fit_score = c(8, 7, 9, 8, 6, 5, 4, 7, 3, 4, 5, 4, 6, 5, 7),
  road_access_score = c(8, 7, 9, 8, 7, 6, 6, 7, 5, 5, 6, 5, 7, 6, 8),
  monthly_rent_million_naira = c(5, 4, 12, 10, 3, 2, 1.5, 3, 1, 1.2, 1.5, 1.2, 2, 1.5, 2.5)
)

# Scoring function
score_sites <- function(df, weights) {
  # Normalise continuous variables to 1-10 scale
  df <- df |>
    mutate(
      population_score = 10 * (catchment_population_000s - min(catchment_population_000s)) /
        (max(catchment_population_000s) - min(catchment_population_000s)),
      competition_score = 10 * (max(competitor_branches) - competitor_branches) /
        (max(competitor_branches) - min(competitor_branches)),  # Inverse: fewer competitors is better
      cost_score = 10 * (max(monthly_rent_million_naira) - monthly_rent_million_naira) /
        (max(monthly_rent_million_naira) - min(monthly_rent_million_naira))  # Inverse: lower cost is better
    ) |>
    mutate(
      composite_score = (population_score * weights$population +
                          competition_score * weights$competition +
                          demographic_fit_score * weights$demographic +
                          road_access_score * weights$access +
                          cost_score * weights$cost)
    )
  return(df)
}

# Scenario 1: Equal weights
weights_equal <- list(population = 0.20, competition = 0.20, demographic = 0.20, access = 0.20, cost = 0.20)
sites_scored_equal <- score_sites(sites, weights_equal) |>
  arrange(desc(composite_score))

cat("Site Selection Scoring - Scenario 1: Equal Weights\n")
#> Site Selection Scoring - Scenario 1: Equal Weights
cat("==================================================\n")
#> ==================================================
print(sites_scored_equal |> select(site_name, composite_score, population_score, competition_score) |> head(10))
#>        site_name composite_score population_score competition_score
#> 1         Oshodi        7.373235         6.086957          7.142857
#> 2  Lekki Phase 1        7.292603         6.956522          7.142857
#> 3         Mushin        6.604517         5.217391          5.714286
#> 4           Ajah        6.535743         4.782609          5.714286
#> 5           Yaba        6.509656         5.652174          5.714286
#> 6        Shomolu        6.432072         3.043478          8.571429
#> 7        Ikorodu        6.260531         3.913043          8.571429
#> 8          Ajaji        6.258159         2.173913          8.571429
#> 9       Surulere        6.229362         3.913043          7.142857
#> 10         Isolo        6.145116         2.608696          8.571429

# Scenario 2: Prioritise population and competition
weights_volume <- list(population = 0.35, competition = 0.30, demographic = 0.15, access = 0.10, cost = 0.10)
sites_scored_volume <- score_sites(sites, weights_volume) |>
  arrange(desc(composite_score))

cat("\n\nScenario 2: Prioritise Market Volume\n")
#> 
#> 
#> Scenario 2: Prioritise Market Volume
print(sites_scored_volume |> select(site_name, composite_score) |> head(10))
#>        site_name composite_score
#> 1  Lekki Phase 1        7.214003
#> 2         Oshodi        6.986928
#> 3           Yaba        6.110728
#> 4         Mushin        6.049464
#> 5        Ikorodu        6.022812
#> 6           Ajah        5.956381
#> 7        Shomolu        5.941191
#> 8          Ikoyi        5.930265
#> 9       Surulere        5.771513
#> 10            VI        5.750000

# Scenario 3: Prioritise cost and access
weights_cost <- list(population = 0.20, competition = 0.20, demographic = 0.10, access = 0.20, cost = 0.30)
sites_scored_cost <- score_sites(sites, weights_cost) |>
  arrange(desc(composite_score))

cat("\n\nScenario 3: Prioritise Cost and Access\n")
#> 
#> 
#> Scenario 3: Prioritise Cost and Access
print(sites_scored_cost |> select(site_name, composite_score) |> head(10))
#>        site_name composite_score
#> 1         Oshodi        7.536872
#> 2  Lekki Phase 1        7.128967
#> 3         Mushin        6.913608
#> 4        Shomolu        6.886618
#> 5        Ikorodu        6.842349
#> 6           Yaba        6.727837
#> 7          Ajaji        6.712705
#> 8          Isolo        6.699661
#> 9           Ajah        6.653924
#> 10      Surulere        6.638453

# Sensitivity: compare rankings across scenarios
rankings_comparison <- data.frame(
  site_name = sites$site_name,
  equal_rank = rank(-sites_scored_equal$composite_score),
  volume_rank = rank(-sites_scored_volume$composite_score),
  cost_rank = rank(-sites_scored_cost$composite_score)
)

cat("\n\nRanking Stability (Rank across 3 scenarios)\n")
#> 
#> 
#> Ranking Stability (Rank across 3 scenarios)
print(arrange(rankings_comparison, equal_rank) |> head(10))
#>        site_name equal_rank volume_rank cost_rank
#> 1  Lekki Phase 1          1           1         1
#> 2  Lekki Phase 2          2           2         2
#> 3             VI          3           3         3
#> 4          Ikoyi          4           4         4
#> 5           Yaba          5           5         5
#> 6       Surulere          6           6         6
#> 7          Isolo          7           7         7
#> 8           Ajah          8           8         8
#> 9            Epe          9           9         9
#> 10       Ikorodu         10          10        10

cat("\n\nInterpretation:\n")
#> 
#> 
#> Interpretation:
cat("If a site's rank varies widely across scenarios, the decision is sensitive to weights.\n")
#> If a site's rank varies widely across scenarios, the decision is sensitive to weights.
cat("Lekki Phase 1 and VI consistently rank high (robust candidates).\n")
#> Lekki Phase 1 and VI consistently rank high (robust candidates).
cat("Epe and Shomolu rank differently based on priorities (sensitive).\n")
#> Epe and Shomolu rank differently based on priorities (sensitive).

Show code

import pandas as pd
import numpy as np

# 15 candidate sites
sites = pd.DataFrame({
    'site_id': [f'S{i}' for i in range(1, 16)],
    'site_name': ['Lekki Phase 1', 'Lekki Phase 2', 'VI', 'Ikoyi', 'Yaba',
                  'Surulere', 'Isolo', 'Ajah', 'Epe', 'Ikorodu',
                  'Shomolu', 'Bariga', 'Mushin', 'Ajaji', 'Oshodi'],
    'catchment_population_000s': [450, 380, 520, 480, 420, 380, 350, 400, 290, 380, 360, 320, 410, 340, 430],
    'competitor_branches': [3, 5, 8, 6, 4, 3, 2, 4, 1, 2, 2, 3, 4, 2, 3],
    'demographic_fit_score': [8, 7, 9, 8, 6, 5, 4, 7, 3, 4, 5, 4, 6, 5, 7],
    'road_access_score': [8, 7, 9, 8, 7, 6, 6, 7, 5, 5, 6, 5, 7, 6, 8],
    'monthly_rent_million': [5, 4, 12, 10, 3, 2, 1.5, 3, 1, 1.2, 1.5, 1.2, 2, 1.5, 2.5]
})

def score_sites(df, weights):
    """Score sites using weighted criteria."""
    df = df.copy()

    # Normalise continuous variables to 1-10 scale
    df['population_score'] = 10 * (df['catchment_population_000s'] - df['catchment_population_000s'].min()) / \
                              (df['catchment_population_000s'].max() - df['catchment_population_000s'].min())

    # Fewer competitors is better (inverse scoring)
    df['competition_score'] = 10 * (df['competitor_branches'].max() - df['competitor_branches']) / \
                               (df['competitor_branches'].max() - df['competitor_branches'].min())

    # Lower cost is better (inverse scoring)
    df['cost_score'] = 10 * (df['monthly_rent_million'].max() - df['monthly_rent_million']) / \
                        (df['monthly_rent_million'].max() - df['monthly_rent_million'].min())

    # Composite score
    df['composite_score'] = (
        df['population_score'] * weights['population'] +
        df['competition_score'] * weights['competition'] +
        df['demographic_fit_score'] * weights['demographic'] +
        df['road_access_score'] * weights['access'] +
        df['cost_score'] * weights['cost']
    )

    return df

# Scenario 1: Equal weights
weights_equal = {'population': 0.20, 'competition': 0.20, 'demographic': 0.20, 'access': 0.20, 'cost': 0.20}
sites_equal = score_sites(sites, weights_equal).sort_values('composite_score', ascending=False)

print("Site Selection Scoring - Scenario 1: Equal Weights")
#> Site Selection Scoring - Scenario 1: Equal Weights
print("=" * 60)
#> ============================================================
print(sites_equal[['site_name', 'composite_score', 'population_score', 'competition_score']].head(10))
#>         site_name  composite_score  population_score  competition_score
#> 14         Oshodi         7.373235          6.086957           7.142857
#> 0   Lekki Phase 1         7.292603          6.956522           7.142857
#> 12         Mushin         6.604517          5.217391           5.714286
#> 7            Ajah         6.535743          4.782609           5.714286
#> 4            Yaba         6.509656          5.652174           5.714286
#> 10        Shomolu         6.432072          3.043478           8.571429
#> 9         Ikorodu         6.260531          3.913043           8.571429
#> 13          Ajaji         6.258159          2.173913           8.571429
#> 5        Surulere         6.229362          3.913043           7.142857
#> 6           Isolo         6.145116          2.608696           8.571429

# Scenario 2: Prioritise market volume
weights_volume = {'population': 0.35, 'competition': 0.30, 'demographic': 0.15, 'access': 0.10, 'cost': 0.10}
sites_volume = score_sites(sites, weights_volume).sort_values('composite_score', ascending=False)

print("\n\nScenario 2: Prioritise Market Volume")
#> 
#> 
#> Scenario 2: Prioritise Market Volume
print(sites_volume[['site_name', 'composite_score']].head(10))
#>         site_name  composite_score
#> 0   Lekki Phase 1         7.214003
#> 14         Oshodi         6.986928
#> 4            Yaba         6.110728
#> 12         Mushin         6.049464
#> 9         Ikorodu         6.022812
#> 7            Ajah         5.956381
#> 10        Shomolu         5.941191
#> 3           Ikoyi         5.930265
#> 5        Surulere         5.771513
#> 2              VI         5.750000

# Scenario 3: Prioritise cost and access
weights_cost = {'population': 0.20, 'competition': 0.20, 'demographic': 0.10, 'access': 0.20, 'cost': 0.30}
sites_cost = score_sites(sites, weights_cost).sort_values('composite_score', ascending=False)

print("\n\nScenario 3: Prioritise Cost and Access")
#> 
#> 
#> Scenario 3: Prioritise Cost and Access
print(sites_cost[['site_name', 'composite_score']].head(10))
#>         site_name  composite_score
#> 14         Oshodi         7.536872
#> 0   Lekki Phase 1         7.128967
#> 12         Mushin         6.913608
#> 10        Shomolu         6.886618
#> 9         Ikorodu         6.842349
#> 4            Yaba         6.727837
#> 13          Ajaji         6.712705
#> 6           Isolo         6.699661
#> 7            Ajah         6.653924
#> 5        Surulere         6.638453

# Ranking comparison across scenarios
rankings = pd.DataFrame({
    'site_name': sites['site_name'],
    'equal_rank': sites_equal.reset_index(drop=True).index.map(lambda x: (sites_equal['site_name'] == sites.loc[x, 'site_name']).idxmax() + 1),
    'volume_rank': sites_volume.reset_index(drop=True).index.map(lambda x: (sites_volume['site_name'] == sites.loc[x, 'site_name']).idxmax() + 1),
    'cost_rank': sites_cost.reset_index(drop=True).index.map(lambda x: (sites_cost['site_name'] == sites.loc[x, 'site_name']).idxmax() + 1)
})

print("\n\nRanking Stability (Rank across 3 scenarios)")
#> 
#> 
#> Ranking Stability (Rank across 3 scenarios)
print(rankings.head(10))
#>        site_name  equal_rank  volume_rank  cost_rank
#> 0  Lekki Phase 1           1            1          1
#> 1  Lekki Phase 2           2            2          2
#> 2             VI           3            3          3
#> 3          Ikoyi           4            4          4
#> 4           Yaba           5            5          5
#> 5       Surulere           6            6          6
#> 6          Isolo           7            7          7
#> 7           Ajah           8            8          8
#> 8            Epe           9            9          9
#> 9        Ikorodu          10           10         10

print("\n\nInterpretation:")
#> 
#> 
#> Interpretation:
print("Sites with consistent high ranks (e.g., VI, Lekki Phase 1) are robust choices.")
#> Sites with consistent high ranks (e.g., VI, Lekki Phase 1) are robust choices.
print("Sites with varying ranks (e.g., Epe, Shomolu) are sensitive to weight assumptions.")
#> Sites with varying ranks (e.g., Epe, Shomolu) are sensitive to weight assumptions.

36.7 Case Study: Optimal Branch Placement for Nigerian Microfinance Bank

A microfinance institution operating in Lagos State plans to expand from 50 branches to 65 (15 new locations). The institution serves low-income and informal sector customers, prioritising accessibility and affordability over premium locations. We conduct site selection analysis.

Data sources: - 50 existing branch locations (GPS coordinates, deposits) - LGA-level population from NBS Census 2021 - Competitor locations (other microfinance banks and bank agent networks) - Road accessibility scored by transport analysts - Real estate cost indexed by LGA - 15 candidate sites (pre-screened for operational feasibility)

Analysis steps: 1. Compute Voronoi catchments for existing branches; identify underserved LGAs. 2. Score candidates on population, competition density, demographics, access, cost. 3. Identify top-3 recommended sites (robust across weight scenarios). 4. Compute expected market share using Huff model. 5. Project incremental deposits and revenue.

Results: Top-3 sites are Epe (underserved, growing population), Ikorodu (high population, moderate competition), and Ajah (complementary to existing network). Expected 12-month deposit acquisition: ₦850M across three sites. ROI: 18 months.

Chapter 31 Exercises

Voronoi Sensitivity: Recompute Thiessen polygons after removing the 3 most central branches. How do catchment areas change for remaining branches?
Huff Model Calibration: Simulate customer visitation data (200 customers observed choosing branches). Estimate λ in the Huff model via maximum likelihood. How does λ compare to the theoretical value of 2?
Multi-period Site Selection: Suppose the bank can open only 1 new branch per quarter (5 quarters = 5 sites). Use a dynamic programming approach to select sites sequentially, maximising cumulative market coverage.
Competitor Response: After you open a new branch, a competitor may open nearby. Model this as a sequential game: you choose your site, then competitor responds. What site minimises competitor incentive to follow?
Accessibility Score Validation: Compare road-based accessibility scores (based on road network distance) with Euclidean distance. Are they correlated? When do they diverge?
Interactive Dashboard: Build a Folium-based interactive map showing existing branches, competitor branches, candidate sites (colour-coded by score), and Voronoi catchments.
Portfolio Optimisation: Given 15 candidate sites and budget to open 5, formulate an integer linear programme to maximise expected deposits subject to budget and geographic coverage constraints.

36.8 Further Reading

Reilly, W. J. (1931). The Law of Retail Gravitation. Knickerbocker Press.

Huff, D. L. (1963). A Probabilistic Analysis of Shopping Center Trade Areas. Land Economics, 39(1), 81–90.

Ahuja, R. K., Magnanti, T. L., & Orlin, J. B. (1993). Network Flows: Theory, Algorithms, and Applications. Prentice Hall.

36.9 Chapter 31 Appendix: Geospatial Mathematics

36.9.1 A31.1 Haversine Formula and Great-Circle Distance

The shortest distance between two points on a sphere is along the great circle (the intersection of the sphere and a plane through the centre). Given two points (φ_1, λ_1) and (φ_2, λ_2) in latitude-longitude:

\[\Delta\phi = \phi_2 - \phi_1, \quad \Delta\lambda = \lambda_2 - \lambda_1\]

The haversine formula computes distance d as:

\[a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)\] \[c = 2 \arctan2(\sqrt{a}, \sqrt{1-a})\] \[d = R \cdot c\]

where R = 6371 km is Earth’s radius. This is numerically stable for all distances (the naive law-of-cosines formula breaks down for small distances).

36.9.2 A31.2 Voronoi Diagram and Delaunay Triangulation

Given a set of points P = {p_1, …, p_n}, the Voronoi cell V_i is the region of space closer to p_i than to any other point:

\[V_i = \{ x : \|x - p_i\| < \|x - p_j\| \text{ for all } j \neq i \}\]

The Voronoi diagram partitions space into n non-overlapping cells. It is dual to the Delaunay triangulation: every Voronoi vertex is the circumcentre of a Delaunay triangle; every Delaunay edge corresponds to a Voronoi edge. Computing Voronoi via Delaunay is O(n log n) using algorithms like Fortune’s sweep line.

36.9.3 A31.3 Reilly’s Law Derivation

Reilly’s Law assumes market share is proportional to attractiveness divided by distance squared:

\[M_i = \frac{S_i / d_i^2}{\sum_j S_j / d_j^2}\]

This can be derived from spatial interaction theory: interaction between locations i and j is proportional to population_i × population_j / distance_ij. For a set of stores, market share at location i is its relative attractiveness adjusted for distance friction. The quadratic distance decay (exponent 2) arises from assuming interaction decays as 1/d^2 (inverse square law, similar to gravity).

36.9.4 A31.4 Huff Model and Spatial Probability

The Huff model generalises Reilly’s Law as a multinomial logit (discrete choice) model:

\[P(\text{choose store } j | \text{customer } i) = \frac{A_j / d_{ij}^\lambda}{\sum_k A_k / d_{ik}^\lambda}\]

This is the multinomial logit form: utility of choosing j is U_j = log(A_j) − λ log(d_ij), and choice probability is proportional to exp(U_j). Parameters can be estimated via maximum likelihood: given observed store choices for a sample of customers, find α and λ maximising the likelihood. The parameter λ is the distance decay elasticity: higher λ means customers are more sensitive to distance (shop closer to home); lower λ means distance matters less (willing to travel for better stores).

36.9.5 A31.5 Modularity as a Trade-Off

Site selection balances conflicting objectives: population (size of market), competition (ease of entry), demographics (customer fit), access (convenience), cost (profitability). A weighted score is:

\[\text{Score}_i = \sum_k w_k \cdot \text{score}_{i,k}\]

where w_k is the weight for criterion k, and scores are normalised to [0, 1]. Sensitivity analysis varies w_k and checks if rankings change: large changes indicate fragility; small changes indicate robustness. A formal approach is multi-objective optimisation: find the Pareto frontier of feasible sites (no site dominates another on all criteria), then present the trade-off curve to decision-makers.