67 Appendix B — Datasets Used in This Book

This appendix documents every dataset used in “AI-Powered Business Analytics.” Each entry lists the chapters where the dataset appears, its source, and its contents. The R and Python scripts in Section 67.2 generate datasets 1 to 3.

67.1 Dataset Index and Sourcing Guide

67.1.1 Part I: Foundations (Chapters 1-11)

1. Nigerian Bank Loan Portfolio

| Property | Details |
|---|---|
| Chapters | 5 (Statistical Foundations), 6 (ML Basics), 15 (Causal Inference) |
| Dataset Name | nigerian_bank_loans.csv |
| Rows | 8,000 loan accounts |
| Columns | 28 variables |
| Source | Synthetic, generated from CBN credit statistics and typical bank risk models |
| Description | Loan portfolio from a hypothetical Nigerian commercial bank. Includes borrower demographics (age, income, education level, employment tenure), loan characteristics (amount, term, purpose, security type), repayment history (months on books, number of defaults, current status), and credit bureau scores. |
| Key Variables | loan_id, borrower_age, annual_income_naira, loan_amount_naira, loan_term_months, interest_rate, employment_sector, credit_score, default_flag, months_delinquent, collateral_type, collateral_value_naira |
| Nigerian/African Relevance | Reflects credit structures common to Nigerian retail banks; employment sectors include agriculture, oil & gas, manufacturing, services; income ranges typical of Nigerian formal employment. |
| License | CC0 1.0 (Public Domain) |
| Generation Script | See R and Python code at end of this appendix |

2. Nigerian Household Expenditure Survey (NBS)

| Property | Details |
|---|---|
| Chapters | 3 (Business Metrics), 5 (Statistical Tests) |
| Dataset Name | nbs_household_survey.csv |
| Rows | 3,000 households |
| Columns | 35 variables |
| Source | Inspired by Nigeria’s National Bureau of Statistics Household Expenditure Survey (anonymised subset) |
| Description | Survey data from 3,000 Nigerian households across 6 geopolitical zones. Captures household income sources (formal employment, informal business, agriculture, remittances), expenditure categories (food, housing, utilities, transport, health, education, entertainment), household demographics (composition, size, head education), and housing characteristics (ownership, type, utilities access). |
| Key Variables | household_id, zone, urban_rural, household_size, head_age, head_education, monthly_income_naira, food_spending, housing_spending, utilities_spending, education_spending, health_spending |
| Nigerian/African Relevance | Based on patterns observed in NBS surveys; includes rural-urban disparities, regional variations, and informal economy activity. |
| License | CC BY 4.0 (with source attribution) |
| Access | Full dataset available at nigerianstat.gov.ng; an anonymised version is used for teaching |

3. Mobile Money Transactions (Airtime & Data)

| Property | Details |
|---|---|
| Chapters | 4 (Data Collection), 9 (Time Series), 22 (Network Analytics) |
| Dataset Name | mobile_money_transactions.csv |
| Rows | 50,000 transactions |
| Columns | 18 variables |
| Source | Synthetic, modelled on transaction patterns from Nigerian telcos (MTN, Airtel, Glo, 9mobile) |
| Description | 12 months of daily mobile money transactions from 5,000 unique customers across Nigeria. Each record represents a transaction (airtime purchase, data bundle, mobile banking transfer). Includes timestamps, customer ID, transaction type, amount, channel (USSD, app, agent, retailer), operator, region. |
| Key Variables | transaction_id, customer_id, transaction_date, transaction_type, amount_naira, channel, operator, region, state, customer_tenure_months, transaction_hour, day_of_week |
| Nigerian/African Relevance | USSD and agent-based channels reflect Nigeria’s financial inclusion reality; operators and states are authentic; seasonal patterns match Nigerian holiday calendar. |
| License | CC0 1.0 |
| Generation Script | See R and Python code at end of this appendix |

4. Nigerian Retail Store Data (Omnichannel)

| Property | Details |
|---|---|
| Chapters | 11 (Network & Location Analytics), 22 (Social Networks) |
| Dataset Name | retail_store_network.csv, store_transactions.csv |
| Rows | Store network: 50 stores; Transactions: 2,500 |
| Columns | Network: 8; Transactions: 12 |
| Source | Synthetic, based on typical multi-branch retail chains in Nigeria |
| Description | Network graph of 50 retail store locations across Nigeria with geographic coordinates, transaction volume, sales, and operational metrics. A linked transaction dataset shows customer purchases across stores. |
| Key Variables | Network: store_id, store_name, latitude, longitude, state, region, monthly_sales_naira, monthly_customers, store_type. Transactions: transaction_id, store_id, product_category, quantity, revenue_naira, transaction_date |
| Nigerian/African Relevance | Store locations in major Nigerian cities (Lagos, Abuja, Kano, Port Harcourt, Ibadan); product categories reflect FMCG retail. |
| License | CC0 1.0 |
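
The latitude and longitude columns make simple distance features possible, for example the distance between two stores. The following is a minimal sketch, assuming coordinates are stored in decimal degrees; `haversine_km` is a hypothetical helper, and the two coordinate pairs are approximate city centres chosen for illustration rather than rows from retail_store_network.csv.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates (approximate city centres), not dataset rows
lagos = (6.5244, 3.3792)
abuja = (9.0765, 7.3986)
distance = haversine_km(*lagos, *abuja)
print(f"Lagos to Abuja: {distance:.0f} km")
```

The haversine formula treats the Earth as a sphere, which is usually accurate enough for store-to-store distances; road distance will be longer.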

67.1.2 Part II: Advanced Methods (Chapters 12-25)

5. E-commerce Click-stream (Jumia-like Marketplace)

| Property | Details |
|---|---|
| Chapters | 17 (Data Mining), 20 (Image Analytics), 38 (Web Analytics) |
| Dataset Name | ecommerce_clickstream.csv |
| Rows | 100,000 sessions |
| Columns | 22 variables |
| Source | Synthetic, inspired by African e-commerce platforms (Jumia, Konga, Jiji) |
| Description | Session-level clickstream data from online shopping. Each row represents a user session with browsing behaviour (pages visited, time spent, search terms, filters applied), products viewed, cart additions, checkout status, purchase outcome, and device/browser info. |
| Key Variables | session_id, user_id, session_date, pages_visited, time_on_site_minutes, search_term, product_category, product_price_naira, added_to_cart, checkout_initiated, purchase_completed, device_type, browser, traffic_source |
| Nigerian/African Relevance | Products sold via Nigerian e-commerce; payment methods include card, bank transfer, cash-on-delivery; traffic sources include local social media. |
| License | CC0 1.0 |

6. Customer Survey (FMCG Sector)

| Property | Details |
|---|---|
| Chapters | 19 (Text Analytics), 35 (Customer Satisfaction), 36 (Brand Analytics) |
| Dataset Name | customer_survey_responses.csv |
| Rows | 5,000 respondents |
| Columns | 28 variables |
| Source | Synthetic, based on FMCG brand tracking studies in Nigeria |
| Description | Customer satisfaction survey with Likert-scale questions on brand perception, product quality, price, availability, customer service, likelihood to recommend (NPS), and open-ended feedback on product improvements and purchase drivers. |
| Key Variables | respondent_id, brand, product_category, age_group, income_level, region, q1_quality_rating, q2_value_rating, q3_availability_rating, q4_service_rating, nps_score, open_feedback_text, purchase_frequency, spend_monthly_naira |
| Nigerian/African Relevance | Brands and product categories typical of Nigerian FMCG; responses in English reflecting Nigerian market norms. |
| License | CC0 1.0 |
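
The nps_score column can be rolled up into a Net Promoter Score per brand. The following is a minimal sketch, assuming responses use the standard 0 to 10 likelihood-to-recommend scale (the scale is not stated above); `net_promoter_score` and the toy responses are illustrative only.

```python
import pandas as pd


def net_promoter_score(scores: pd.Series) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    promoters = (scores >= 9).mean()
    detractors = (scores <= 6).mean()
    return 100 * (promoters - detractors)


# Toy responses, not rows from customer_survey_responses.csv
survey = pd.DataFrame({
    "brand": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "nps_score": [10, 9, 7, 3, 9, 8, 6, 5],
})
nps_by_brand = survey.groupby("brand")["nps_score"].apply(net_promoter_score)
print(nps_by_brand)
```

Passives (scores of 7 and 8) count in the denominator but in neither group, so the score ranges from -100 to 100.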

7. Call Centre Data (Telecom)

| Property | Details |
|---|---|
| Chapters | 20 (Speech Analytics), 39 (Social Media & Sentiment), 46 (Employee Performance) |
| Dataset Name | call_centre_logs.csv |
| Rows | 10,000 calls |
| Columns | 24 variables |
| Source | Synthetic, modelled on Nigerian telecom customer service operations |
| Description | Call centre transaction log including call metadata (date, time, duration, agent, customer), issue category (billing, technical, complaints, sales), customer demographics, agent performance, call sentiment/tonality classification, resolution status, repeat call flag. |
| Key Variables | call_id, call_date, call_duration_minutes, agent_id, customer_id, issue_category, issue_subcategory, customer_satisfaction_rating, call_sentiment, call_resolution_achieved, repeat_contact_within_7days, agent_tenure_months, call_handle_time_minutes, hold_time_minutes |
| Nigerian/African Relevance | Issue categories reflect telecom customer concerns in Nigeria; sentiment analysis includes Nigerian English variations. |
| License | CC0 1.0 |

8. Agricultural Commodity Prices (Nigeria & East Africa)

| Property | Details |
|---|---|
| Chapters | 9 (Time Series), 31 (Demand Forecasting) |
| Dataset Name | agric_commodity_prices.csv |
| Rows | 408 monthly observations (2010–2024 for 12 commodities) |
| Columns | 15 variables |
| Source | Publicly available from Nigeria’s National Bureau of Statistics (NBS), FAO, and World Bank |
| Description | Monthly prices of key agricultural commodities traded in Nigeria and East Africa: maize, rice, beans, cassava, palm oil, groundnuts, sorghum, cocoa, livestock feed, fertiliser. Includes wholesale prices, farmer prices, international parity prices, and volume traded. |
| Key Variables | commodity_id, commodity_name, country, market_location, year_month, price_naira_per_kg, volume_traded_tons, wholesale_price, farmer_price, international_parity_price, seasonality_index |
| Nigerian/African Relevance | Commodities and markets reflect West and East African agricultural economies; seasonality patterns authentic. |
| License | Open Data Commons (PDDL), public domain |
| Access | NBS data portal, FAO GIEWS |

9. Energy Consumption (PHCN/Distribution Companies)

| Property | Details |
|---|---|
| Chapters | 9 (Time Series), 32 (Inventory Analytics), 44 (Supply Chain Optimization) |
| Dataset Name | energy_consumption_daily.csv |
| Rows | 2,555 daily readings (7 years); 2,000+ meters |
| Columns | 14 variables |
| Source | Synthetic, modelled on Nigerian distribution company (DISCO) network |
| Description | Daily electricity consumption readings from 2,000+ meters across a Nigerian distribution network. Includes consumption volume (kWh), billing period, customer type (residential, commercial, industrial), meter status, loss estimates, weather conditions (temperature, cloud cover), and technical characteristics. |
| Key Variables | meter_id, reading_date, consumption_kwh, billing_period, customer_type, meter_status, technical_losses_percent, commercial_losses_percent, temperature_celsius, peak_demand_hour, region, voltage_class |
| Nigerian/African Relevance | Based on PHCN/DISCO operational structures; customer types and loss factors reflect Nigerian grid realities. |
| License | CC0 1.0 |

67.1.3 Part III: Predictive Analytics (Chapters 26-36)

10. Insurance Claims Dataset

| Property | Details |
|---|---|
| Chapters | 29 (Churn & Fraud), 36 (Churn Prediction), 41 (Fraud Detection) |
| Dataset Name | insurance_claims.csv |
| Rows | 12,000 claims |
| Columns | 32 variables |
| Source | Synthetic, based on typical Nigerian insurance claims patterns |
| Description | Insurance claims database covering auto, health, and property claims. Includes policyholder demographics, risk factors, claim characteristics (date, type, amount, outcome), claim processing (days to settlement, fraud flags, investigator notes), policy history, previous claims. |
| Key Variables | claim_id, policy_id, policyholder_age, policy_type, premium_naira, claim_date, claim_type, claim_amount_naira, days_to_settlement, fraud_flag, fraud_probability_score, claim_outcome, investigator_flagged, claim_history_count |
| Nigerian/African Relevance | Policy types and claim amounts reflect Nigerian insurance market; fraud patterns based on local risk profiles. |
| License | CC0 1.0 |

11. Manufacturing Defects (Quality Control)

| Property | Details |
|---|---|
| Chapters | 41 (Quality Analytics & Six Sigma) |
| Dataset Name | manufacturing_defects.csv |
| Rows | 36 months × multiple plants; 5,000+ records |
| Columns | 18 variables |
| Source | Synthetic, inspired by Nigerian and African manufacturing operations |
| Description | Manufacturing quality control data from multiple plants over 36 months, tracking defect rates, types (dimensional, material, surface finish, assembly), root causes, corrective actions, production volume, and process parameters (temperature, humidity, equipment age). |
| Key Variables | plant_id, production_month, product_line, total_units_produced, defects_found, defect_rate_percent, defect_type, defect_cause, corrective_action, days_to_resolve, equipment_age_months, process_capability_index |
| Nigerian/African Relevance | Plant locations in Nigeria; products typical of African manufacturing. |
| License | CC0 1.0 |
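
When summarising defect_rate_percent across months, a pooled rate (total defects over total units) generally differs from the simple average of monthly rates. The following is a minimal sketch, assuming defect_rate_percent is defined as 100 × defects_found / total_units_produced (the definition is not stated above); the toy numbers are illustrative only.

```python
import pandas as pd

# Toy records, not rows from manufacturing_defects.csv
records = pd.DataFrame({
    "plant_id": ["P01", "P01", "P02", "P02"],
    "total_units_produced": [10_000, 30_000, 5_000, 5_000],
    "defects_found": [100, 150, 50, 150],
})

# Row-level rate, under the assumed definition
records["defect_rate_percent"] = (
    100 * records["defects_found"] / records["total_units_produced"]
)

# Pooled rate per plant: sum the counts first, then divide
totals = records.groupby("plant_id")[["defects_found", "total_units_produced"]].sum()
pooled_rate = 100 * totals["defects_found"] / totals["total_units_produced"]

# Simple mean of the row-level rates, for comparison
mean_rate = records.groupby("plant_id")["defect_rate_percent"].mean()
print(pd.DataFrame({"pooled": pooled_rate, "mean_of_rates": mean_rate}))
```

The two summaries agree only when every month has the same production volume, as for plant P02 here; the pooled rate weights high-volume months more heavily.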

12. Sales Performance (Multi-tier Distribution)

| Property | Details |
|---|---|
| Chapters | 27 (Lead Scoring), 33 (Predictive Sales Analytics), 46 (Employee Performance) |
| Dataset Name | sales_team_performance.csv |
| Rows | 100 salespeople × 24 months; 2,400 records |
| Columns | 28 variables |
| Source | Synthetic, based on multi-tier sales distribution in Nigeria |
| Description | Monthly sales performance data for 100 salespeople across territory tiers (A, B, C) tracking: quota, sales, units, customer acquisitions, pipeline value, close rate, average deal size, activity metrics (calls, meetings, proposals), territory characteristics, and compensation realised. |
| Key Variables | salesperson_id, territory, manager_id, month_year, quota_naira, sales_naira, units_sold, customer_acquisitions, pipeline_naira, close_rate_percent, average_deal_size_naira, calls_made, meetings_held, proposals_sent, comp_naira |
| Nigerian/African Relevance | Territory structures and sales compensation reflect Nigerian sales organisations. |
| License | CC0 1.0 |

13. Credit Bureau Data (Anonymised)

| Property | Details |
|---|---|
| Chapters | 29 (Churn & Fraud), 48 (Financial Risk) |
| Dataset Name | credit_bureau_sample.csv |
| Rows | 4,000 borrowers |
| Columns | 26 variables |
| Source | Anonymised sample inspired by CBN credit reporting framework |
| Description | Borrower credit history including: active credit facilities (loans, credit cards, lines), payment performance (payments on time, months in arrears, defaults), credit utilisation, inquiry frequency, credit mix, and personal identifiers (type, credit score). |
| Key Variables | borrower_id, num_active_accounts, total_credit_limit_naira, total_outstanding_naira, credit_utilisation_percent, months_in_arrears, num_defaults, payment_performance_score, inquiry_count_6months, oldest_account_age_months, account_age_range, credit_score |
| Nigerian/African Relevance | Based on CBN credit reporting standards and Nigerian lending practices. |
| License | Restricted: public version with anonymisation |

14. Hospitality Guest Reviews

| Property | Details |
|---|---|
| Chapters | 19 (Text Analytics), 35 (Customer Satisfaction) |
| Dataset Name | hotel_guest_reviews.csv |
| Rows | 8,000 reviews |
| Columns | 14 variables |
| Source | Synthetic, modelled on reviews from Nigerian hospitality sector |
| Description | Guest reviews from Nigerian hotels and hospitality establishments. Each review includes: guest demographics (nationality, repeat status), stay details (duration, room type, rate paid), detailed ratings (cleanliness, staff, food, value, WiFi, safety), likelihood to recommend, and open-ended review text. |
| Key Variables | review_id, hotel_id, guest_nationality, repeat_guest_flag, stay_duration_nights, room_type, rating_overall, rating_cleanliness, rating_staff, rating_food, rating_value, nps_score, review_text, review_sentiment |
| Nigerian/African Relevance | Hotels in major Nigerian cities; review sentiment includes African English patterns. |
| License | CC0 1.0 |

67.1.4 Part IV: Optimisation & Applications (Chapters 37-50)

15. Digital Ad Spend & Attribution

| Property | Details |
|---|---|
| Chapters | 37 (Marketing Mix Modelling) |
| Dataset Name | digital_marketing_attribution.csv |
| Rows | 24 months × multiple channels; 100+ records |
| Columns | 24 variables |
| Source | Synthetic, based on Nigerian digital advertising ecosystem |
| Description | Monthly advertising spend and response data across channels: Google Search, Facebook, Instagram, TikTok, YouTube, Email. Includes spend amount, impressions, clicks, conversions, CPC, CPL, ROAS, channel mix, brand lift measurements, and sales outcome. |
| Key Variables | month_year, channel, spend_naira, impressions, clicks, ctr_percent, conversions, cpl_naira, roas_ratio, brand_lift_percent, awareness_lift, total_sales_naira, attributed_sales_naira |
| Nigerian/African Relevance | Channels reflect Nigerian digital advertising landscape; spend amounts typical of local markets. |
| License | CC0 1.0 |
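
The ratio columns (ctr_percent, cpl_naira, roas_ratio) can be recomputed from the raw counts as a consistency check. The following is a minimal sketch using the conventional definitions, assuming that conversions are counted as leads for cost per lead and that ROAS is based on attributed_sales_naira; the appendix does not state which definitions the dataset uses, and the toy rows are illustrative only.

```python
import pandas as pd

# Toy rows, not taken from digital_marketing_attribution.csv
ads = pd.DataFrame({
    "channel": ["Google Search", "Facebook"],
    "spend_naira": [500_000.0, 200_000.0],
    "impressions": [100_000, 80_000],
    "clicks": [4_000, 1_600],
    "conversions": [200, 50],
    "attributed_sales_naira": [2_000_000.0, 500_000.0],
})

# Click-through rate: clicks as a percentage of impressions
ads["ctr_percent"] = 100 * ads["clicks"] / ads["impressions"]
# Cost per lead: spend divided by conversions (assumed to be leads)
ads["cpl_naira"] = ads["spend_naira"] / ads["conversions"]
# Return on ad spend: attributed sales per naira spent
ads["roas_ratio"] = ads["attributed_sales_naira"] / ads["spend_naira"]

print(ads[["channel", "ctr_percent", "cpl_naira", "roas_ratio"]])
```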

16. Inventory at Regional Warehouses

| Property | Details |
|---|---|
| Chapters | 32 (Inventory Analytics), 45 (Demand-Supply Planning) |
| Dataset Name | warehouse_inventory_daily.csv |
| Rows | 5 warehouses × 365 days; 1,825 records |
| Columns | 20 variables |
| Source | Synthetic, modelled on Nigerian FMCG distribution centres |
| Description | Daily inventory levels for 5 regional distribution centres (North, South, East, West, Lagos) tracking: stock on hand by SKU, inbound/outbound volumes, stockouts, safety stock levels, warehouse utilisation, shrinkage, and demand forecast accuracy. |
| Key Variables | warehouse_id, sku_id, date, opening_stock_units, inbound_units, outbound_units, closing_stock_units, reorder_point_units, safety_stock_units, days_supply_on_hand, stockout_flag, shrinkage_units, warehouse_utilisation_percent |
| Nigerian/African Relevance | Regional warehouse structure typical of Nigerian FMCG distribution. |
| License | CC0 1.0 |
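
A useful first check on daily inventory data is whether the stock columns balance. The following is a minimal sketch, assuming the identity closing = opening + inbound − outbound − shrinkage holds for each row (an assumption about how the columns relate, not something stated above); the toy rows are illustrative, and the last one is deliberately inconsistent.

```python
import pandas as pd

# Toy rows, not taken from warehouse_inventory_daily.csv
stock = pd.DataFrame({
    "opening_stock_units": [1000, 900, 200],
    "inbound_units": [0, 100, 0],
    "outbound_units": [95, 790, 200],
    "shrinkage_units": [5, 10, 0],
    "closing_stock_units": [900, 200, 5],  # last row deliberately inconsistent
})

# Closing stock implied by the assumed balance identity
expected_closing = (
    stock["opening_stock_units"]
    + stock["inbound_units"]
    - stock["outbound_units"]
    - stock["shrinkage_units"]
)
stock["balance_ok"] = expected_closing == stock["closing_stock_units"]

# Rows that fail the check are candidates for data-quality review
print(stock.loc[~stock["balance_ok"]])
```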

17. Employee Records (HR)

| Property | Details |
|---|---|
| Chapters | 46 (Employee Analytics), 47 (Attrition Prediction) |
| Dataset Name | employee_records.csv |
| Rows | 1,200 employees |
| Columns | 36 variables |
| Source | Synthetic, modelled on Nigerian corporate HR systems |
| Description | Employee master data including: demographics, hire/separation dates, departments, positions, compensation (salary, bonuses, benefits), performance ratings (5 years), training records, promotion history, engagement survey responses, exit interview data for separated employees. |
| Key Variables | employee_id, hire_date, separation_date, department, job_title, manager_id, salary_naira, bonus_naira, benefits_value_naira, base_location, years_tenure, performance_rating_avg, promotion_count, training_hours_annual, engagement_score, separation_reason, regrettable_separation_flag |
| Nigerian/African Relevance | Department and role structures typical of Nigerian organisations; compensation norms reflect local market. |
| License | Restricted: anonymised version only |

18. Supply Chain Routing & Optimisation

| Property | Details |
|---|---|
| Chapters | 44 (Supply Chain Optimization) |
| Dataset Name | supply_chain_routes.csv |
| Rows | 500 delivery zones; 5,000 historical routes |
| Columns | 22 variables |
| Source | Synthetic, based on Nigerian last-mile logistics |
| Description | Distribution route planning data including: origin/destination locations, distance, time, vehicle type, fuel cost, toll costs, urban/rural classification, congestion patterns, actual vs. planned time, on-time delivery flag, and customer satisfaction. |
| Key Variables | route_id, origin_location, destination_location, distance_km, planned_duration_hours, actual_duration_hours, vehicle_type, fuel_cost_naira, toll_cost_naira, traffic_condition, on_time_delivery_flag, urban_rural_classification, delivery_date |
| Nigerian/African Relevance | Routes across major Nigerian cities; traffic and fuel costs authentic to Nigerian logistics. |
| License | CC0 1.0 |

19. Stock Returns (Nigerian Stock Exchange)

| Property | Details |
|---|---|
| Chapters | 48 (Financial Risk Analytics) |
| Dataset Name | nse_stock_returns.csv |
| Rows | 504 daily observations (2 years) for select equities |
| Columns | 12 variables |
| Source | Public data from Nigerian Stock Exchange (NSE) |
| Description | Daily stock price data for selected NSE-listed companies covering: opening/closing prices, high/low, volume traded, returns, volatility, and market-wide indices (All-Share Index, sector indices). |
| Key Variables | date, symbol, open_price_naira, close_price_naira, high_price_naira, low_price_naira, volume_traded, daily_return_percent, log_return, market_cap_naira, pe_ratio, sector |
| Nigerian/African Relevance | Real NSE equity data; sector classification based on Nigerian market. |
| License | Open Data, publicly available |
| Access | NSE Market Data |
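
The daily_return_percent and log_return columns can be derived from closing prices. The following is a minimal sketch, assuming simple returns are computed close-to-close within each symbol and ignoring dividends and corporate actions; the tickers and prices are hypothetical, not NSE data.

```python
import numpy as np
import pandas as pd

# Hypothetical tickers and prices for illustration only
prices = pd.DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"] * 2),
    "close_price_naira": [100.0, 110.0, 99.0, 50.0, 50.0, 55.0],
})

# Sort so that each symbol's rows are in date order before lagging
prices = prices.sort_values(["symbol", "date"]).reset_index(drop=True)
prev_close = prices.groupby("symbol")["close_price_naira"].shift(1)

# Simple return in percent and continuously compounded (log) return
prices["daily_return_percent"] = 100 * (prices["close_price_naira"] / prev_close - 1)
prices["log_return"] = np.log(prices["close_price_naira"] / prev_close)

print(prices)
```

The first observation of each symbol has no previous close, so both return columns are missing there. Log returns add across days, which is why they are often preferred for multi-period risk calculations.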

67.2 Comprehensive Data Generation Scripts

67.2.1 R: Generate All Synthetic Datasets

Save as generate_data.R:

library(tidyverse)
library(lubridate)

# Set random seed for reproducibility
set.seed(42)

# ===== 1. NIGERIAN BANK LOAN PORTFOLIO =====
generate_loan_data <- function(n = 8000) {
  tibble(
    loan_id = 1:n,
    borrower_age = sample(25:65, n, replace = TRUE),
    annual_income_naira = rnorm(n, mean = 3e6, sd = 1.5e6) |> pmax(500000),
    loan_amount_naira = rnorm(n, mean = 5e6, sd = 2e6) |> pmax(100000),
    loan_term_months = sample(c(12, 24, 36, 48, 60), n, replace = TRUE),
    interest_rate = rnorm(n, mean = 12, sd = 3) |> pmax(5),
    employment_sector = sample(
      c("Manufacturing", "Services", "Agriculture", "Oil & Gas", "Finance"),
      n, replace = TRUE
    ),
    credit_score = rnorm(n, mean = 650, sd = 80) |> pmax(300) |> pmin(850),
    default_flag = rbinom(n, 1, 0.05),
    months_delinquent = rpois(n, lambda = ifelse(rbinom(n, 1, 0.1) == 1, 3, 0)),
    collateral_type = sample(
      c("Real Estate", "Vehicle", "Securities", "Personal Guarantee", "None"),
      n, replace = TRUE
    ),
    collateral_value_naira = rnorm(n, mean = 4e6, sd = 2e6) |> pmax(0)
  )
}

# ===== 2. NIGERIAN HOUSEHOLD EXPENDITURE SURVEY =====
generate_household_survey <- function(n = 3000) {
  tibble(
    household_id = 1:n,
    zone = sample(
      c("North Central", "North East", "North West", "South East", "South South", "South West"),
      n, replace = TRUE
    ),
    urban_rural = sample(c("Urban", "Rural"), n, replace = TRUE, prob = c(0.6, 0.4)),
    household_size = rpois(n, lambda = 5) |> pmax(1),
    head_age = sample(25:75, n, replace = TRUE),
    head_education = sample(
      c("No Formal", "Primary", "Secondary", "Tertiary"),
      n, replace = TRUE, prob = c(0.15, 0.25, 0.35, 0.25)
    ),
    monthly_income_naira = rnorm(n, mean = 150000, sd = 80000) |> pmax(10000),
    food_spending = rnorm(n, mean = 50000, sd = 20000) |> pmax(0),
    housing_spending = rnorm(n, mean = 20000, sd = 15000) |> pmax(0),
    utilities_spending = rnorm(n, mean = 10000, sd = 5000) |> pmax(0),
    education_spending = rnorm(n, mean = 15000, sd = 12000) |> pmax(0),
    health_spending = rnorm(n, mean = 8000, sd = 5000) |> pmax(0),
    transport_spending = rnorm(n, mean = 12000, sd = 8000) |> pmax(0),
    entertainment_spending = rnorm(n, mean = 5000, sd = 4000) |> pmax(0)
  )
}

# ===== 3. MOBILE MONEY TRANSACTIONS =====
generate_mobile_money <- function(n = 50000) {
  tibble(
    transaction_id = 1:n,
    customer_id = sample(1:5000, n, replace = TRUE),
    transaction_date = sample(seq(ymd("2023-01-01"), ymd("2023-12-31"), by = "day"), n, replace = TRUE),
    transaction_type = sample(c("Airtime", "Data", "Transfer", "Utility Payment"), n, replace = TRUE),
    amount_naira = rnorm(n, mean = 2500, sd = 2000) |> pmax(100),
    channel = sample(c("USSD", "Mobile App", "Agent", "Retailer"), n, replace = TRUE),
    operator = sample(c("MTN", "Airtel", "Glo", "9mobile"), n, replace = TRUE),
    region = sample(
      c("North Central", "North East", "North West", "South East", "South South", "South West"),
      n, replace = TRUE
    ),
    customer_tenure_months = sample(1:60, n, replace = TRUE),
    transaction_hour = sample(0:23, n, replace = TRUE),
    day_of_week = wday(transaction_date, label = TRUE, abbr = FALSE)  # full day names, matching the Python script
  )
}

# ===== GENERATE AND SAVE ALL DATASETS =====
loans <- generate_loan_data()
households <- generate_household_survey()
mobile_money <- generate_mobile_money()

write_csv(loans, "nigerian_bank_loans.csv")
write_csv(households, "nbs_household_survey.csv")
write_csv(mobile_money, "mobile_money_transactions.csv")

cat("All datasets generated successfully!\n")
cat("  - nigerian_bank_loans.csv (", nrow(loans), " rows)\n", sep = "")
cat("  - nbs_household_survey.csv (", nrow(households), " rows)\n", sep = "")
cat("  - mobile_money_transactions.csv (", nrow(mobile_money), " rows)\n", sep = "")

67.2.2 Python: Generate All Synthetic Datasets

Save as generate_data.py:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Set random seed for reproducibility
np.random.seed(42)

# ===== 1. NIGERIAN BANK LOAN PORTFOLIO =====
def generate_loan_data(n=8000):
    return pd.DataFrame({
        'loan_id': range(1, n+1),
        'borrower_age': np.random.randint(25, 66, n),  # upper bound is exclusive; gives ages 25-65 as in the R script
        'annual_income_naira': np.random.normal(3e6, 1.5e6, n).clip(500000),
        'loan_amount_naira': np.random.normal(5e6, 2e6, n).clip(100000),
        'loan_term_months': np.random.choice([12, 24, 36, 48, 60], n),
        'interest_rate': np.random.normal(12, 3, n).clip(5),
        'employment_sector': np.random.choice(
            ['Manufacturing', 'Services', 'Agriculture', 'Oil & Gas', 'Finance'], n
        ),
        'credit_score': np.random.normal(650, 80, n).clip(300, 850).astype(int),
        'default_flag': np.random.binomial(1, 0.05, n),
        # About 10% of loans get a Poisson(3) delinquency count; the rest are 0
        'months_delinquent': np.where(np.random.rand(n) < 0.1, np.random.poisson(3, n), 0),
        'collateral_type': np.random.choice(
            ['Real Estate', 'Vehicle', 'Securities', 'Personal Guarantee', 'None'], n
        ),
        'collateral_value_naira': np.random.normal(4e6, 2e6, n).clip(0)
    })

# ===== 2. NIGERIAN HOUSEHOLD EXPENDITURE SURVEY =====
def generate_household_survey(n=3000):
    return pd.DataFrame({
        'household_id': range(1, n+1),
        'zone': np.random.choice(
            ['North Central', 'North East', 'North West', 'South East', 'South South', 'South West'], n
        ),
        'urban_rural': np.random.choice(['Urban', 'Rural'], n, p=[0.6, 0.4]),
        'household_size': np.random.poisson(5, n).clip(1),
        'head_age': np.random.randint(25, 76, n),  # upper bound is exclusive; gives ages 25-75 as in the R script
        'head_education': np.random.choice(
            ['No Formal', 'Primary', 'Secondary', 'Tertiary'], n, p=[0.15, 0.25, 0.35, 0.25]
        ),
        'monthly_income_naira': np.random.normal(150000, 80000, n).clip(10000),
        'food_spending': np.random.normal(50000, 20000, n).clip(0),
        'housing_spending': np.random.normal(20000, 15000, n).clip(0),
        'utilities_spending': np.random.normal(10000, 5000, n).clip(0),
        'education_spending': np.random.normal(15000, 12000, n).clip(0),
        'health_spending': np.random.normal(8000, 5000, n).clip(0),
        'transport_spending': np.random.normal(12000, 8000, n).clip(0),
        'entertainment_spending': np.random.normal(5000, 4000, n).clip(0)
    })

# ===== 3. MOBILE MONEY TRANSACTIONS =====
def generate_mobile_money(n=50000):
    start_date = datetime(2023, 1, 1)
    dates = [start_date + timedelta(days=np.random.randint(0, 365)) for _ in range(n)]

    return pd.DataFrame({
        'transaction_id': range(1, n+1),
        'customer_id': np.random.randint(1, 5001, n),
        'transaction_date': dates,
        'transaction_type': np.random.choice(['Airtime', 'Data', 'Transfer', 'Utility Payment'], n),
        'amount_naira': np.random.normal(2500, 2000, n).clip(100),
        'channel': np.random.choice(['USSD', 'Mobile App', 'Agent', 'Retailer'], n),
        'operator': np.random.choice(['MTN', 'Airtel', 'Glo', '9mobile'], n),
        'region': np.random.choice(
            ['North Central', 'North East', 'North West', 'South East', 'South South', 'South West'], n
        ),
        'customer_tenure_months': np.random.randint(1, 61, n),
        'transaction_hour': np.random.randint(0, 24, n),
        'day_of_week': [d.strftime('%A') for d in dates]
    })

# ===== GENERATE AND SAVE ALL DATASETS =====
loans = generate_loan_data()
households = generate_household_survey()
mobile_money = generate_mobile_money()

loans.to_csv('nigerian_bank_loans.csv', index=False)
households.to_csv('nbs_household_survey.csv', index=False)
mobile_money.to_csv('mobile_money_transactions.csv', index=False)

print("All datasets generated successfully!")
print(f"  - nigerian_bank_loans.csv ({len(loans)} rows)")
print(f"  - nbs_household_survey.csv ({len(households)} rows)")
print(f"  - mobile_money_transactions.csv ({len(mobile_money)} rows)")
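
After generating the files, it can help to confirm that the values fall inside the ranges the generator is meant to produce. The following is a minimal sketch of such a check for the loan data; `check_loans` is a hypothetical helper, the bounds are read off the generation script above, and the two small frames are hand-made for illustration.

```python
import pandas as pd


def check_loans(df):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not df["loan_id"].is_unique:
        problems.append("loan_id is not unique")
    if not df["borrower_age"].between(25, 65).all():
        problems.append("borrower_age outside 25-65")
    if not df["credit_score"].between(300, 850).all():
        problems.append("credit_score outside 300-850")
    if not df["default_flag"].isin([0, 1]).all():
        problems.append("default_flag is not binary")
    if (df["loan_amount_naira"] < 100_000).any():
        problems.append("loan_amount_naira below the 100,000 floor")
    return problems


# Tiny hand-made frames for illustration; in practice pass the generated data
good = pd.DataFrame({
    "loan_id": [1, 2, 3],
    "borrower_age": [25, 40, 65],
    "credit_score": [300, 650, 850],
    "default_flag": [0, 1, 0],
    "loan_amount_naira": [100_000.0, 5e6, 7.5e6],
})
bad = good.assign(loan_id=[1, 1, 3], credit_score=[299, 650, 850])

print(check_loans(good))
print(check_loans(bad))
```

Calling `check_loans(loans)` on the frame produced by `generate_loan_data()` should return an empty list; a non-empty result points to an edit that changed the generator's bounds.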

67.3 How to Access Datasets

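  1. Synthetic Datasets: Run the R or Python generation scripts above, which cover datasets 1 to 3
  2. Public Datasets: Download from the sources listed in the Access row of each entry: nigerianstat.gov.ng (NBS household survey), the NBS data portal and FAO GIEWS (commodity prices), and NSE Market Data (stock returns)
  3. Book Repository: All datasets are available in the book’s GitHub repository under /data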

All datasets are provided as-is for educational purposes. See individual dataset entries for licensing and attribution requirements.