23 Association Rules and Market Basket Analysis

📋 Learning Objectives

By the end of this chapter, you will be able to:

Understand the market basket analysis problem and its business applications
Compute support, confidence, and lift for association rules
Implement the Apriori algorithm for frequent itemset mining
Optimize with FP-Growth for large datasets
Extract actionable rules and filter for quality
Detect and handle high-frequency items (bread, milk problem)
Implement sequential pattern analysis over time
Apply association rules to product bundling, cross-selling, and recommendations
Measure business impact of rule-based strategies

23.1 The Market Basket Problem: Finding Hidden Patterns in Shopping Behaviour

In the early 1990s, a retail analytics team at a large American supermarket chain made a surprising discovery. They analysed millions of transactions looking for items that customers bought together, and found a striking pattern: people who bought nappies (diapers) on a Friday evening very frequently also bought beer. This seemed bizarre — what do nappies and beer have to do with each other? The explanation, once they thought about it, was obvious: new fathers, sent by their partners to buy nappies, would pick up a six-pack of beer for the weekend while they were in the store.

Once discovered, the business applications were immediate. Should the supermarket place beer near the nappies? Should it create a “new parent” promotional bundle? Should it offer beer discounts to customers who bought nappies? The insight was not about nappies or beer specifically — it was about the invisible patterns in customer behaviour that transaction data reveals when you know how to look.

This is market basket analysis (also known as association rule mining): the science of finding which products, services, or events tend to appear together. It is called “market basket” because it was originally applied to supermarket shopping baskets, but the technique is now used far beyond retail:

Banking: Which financial products do customers tend to hold together? (Customers with a mortgage often also take home insurance — a cross-selling opportunity.)
E-commerce: Jumia Nigeria’s “customers who bought this also bought…” feature is powered by association analysis.
Healthcare: Which medications are frequently prescribed together? Which diagnoses tend to co-occur? (Helps identify treatment patterns and potential drug interactions.)
Telecoms: Which bundles of data, voice, and SMS minutes do customers purchase together? (Informs package design.)
Online learning: Which courses do students take in sequence? (Helps design learning pathways.)

In Nigeria specifically, association analysis has strong applications in FMCG distribution. A Dangote Foods distributor analysing invoices from thousands of retailers might discover that orders for semolina almost always include orders for stew ingredients — a pattern that suggests co-marketing and combined delivery routing to reduce logistics costs.

The mathematics behind market basket analysis turns a qualitative question (“what goes with what?”) into a quantitative one: given billions of transactions, how do we efficiently find the patterns that are frequent enough to be actionable, strong enough to be genuine relationships, and interesting enough to add value beyond what chance alone would predict?

Market basket analysis finds rules of the form “customers who buy A tend to also buy B.” It quantifies how frequent, how confident, and how interesting each such rule is.

📘 Theory: Association Rule Mining

Problem: Given a database of customer transactions (each transaction = set of items purchased), find association rules of the form:

\[A \Rightarrow B\]

where A and B are sets of items. Interpretation: “Customers who buy A tend to also buy B.”

Example from a Nigerian supermarket: - Rule: {Rice, Tomato} → {Palm Oil} - Meaning: Customers buying rice and tomatoes together often also buy palm oil.

Why it matters: - Cross-selling: Recommend palm oil to customers who’ve bought rice and tomato. - Product placement: Stock palm oil near rice to increase sales. - Bundling: Create combo deals (rice + tomato + palm oil = 10% discount). - Inventory planning: If diapers are ordered, boost formula stock in advance.

Applications beyond retail: - E-commerce: Frequently bought together widgets on Amazon, Jumia. - Banking: Which financial products do customers buy together? (savings + insurance) - Healthcare: Which medicines are prescribed together? (identifies gaps or redundancies) - Insurance: Claims bundling (auto + home).

📝 Section 18.1 Review Questions

What is the business value of finding that “customers who buy diapers also buy formula”?
How is association rule mining different from supervised machine learning (e.g., classification)?
Give an example of an association rule in your domain (banking, healthcare, e-commerce).
Why might a rule be statistically significant yet not actionable?

23.2 Support, Confidence, and Lift: The Three Key Metrics

📘 Theory: Association Rule Metrics

Support: Fraction of transactions containing both A and B.

\[Support(A \Rightarrow B) = \frac{|\{t : A \subseteq t, B \subseteq t\}|}{|T|}\]

where T is the set of all transactions.

Interpretation: Low support (e.g., 0.5%) means the rule is rare; high support (e.g., 10%) means it’s common. Rules with support below a threshold (e.g., 1%) are usually discarded as too niche.

Confidence: Fraction of transactions containing A that also contain B.

\[Confidence(A \Rightarrow B) = \frac{Support(A \Rightarrow B)}{Support(A)}\]

Interpretation: High confidence (e.g., 80%) means “if a customer buys A, they usually buy B.” Low confidence (e.g., 10%) means the association is weak.

Lift: Ratio of observed frequency to expected frequency (if A and B were independent).

\[Lift(A \Rightarrow B) = \frac{Confidence(A \Rightarrow B)}{Support(B)} = \frac{P(B|A)}{P(B)}\]

Interpretation: - Lift = 1: A and B are independent (no association). - Lift > 1: A and B are positively associated (A increases likelihood of B). - Lift < 1: A and B are negatively associated (A decreases likelihood of B).

Example: In a supermarket with 10,000 transactions: - 2,000 transactions include rice. - 1,500 transactions include palm oil. - 600 transactions include both rice and palm oil.

\[Support(\text{Rice} \Rightarrow \text{Palm Oil}) = \frac{600}{10,000} = 0.06 = 6\%\]

\[Confidence(\text{Rice} \Rightarrow \text{Palm Oil}) = \frac{600}{2,000} = 0.30 = 30\%\]

\[Lift = \frac{0.30}{0.15} = 2.0\]

Interpretation: Customers who buy rice are 2× more likely to buy palm oil than a random customer.

🔑 Key Formula: Association Rule Metrics

Support: \[S(A \Rightarrow B) = P(A \cap B)\]

Confidence: \[C(A \Rightarrow B) = P(B | A) = \frac{S(A \Rightarrow B)}{S(A)}\]

Lift: \[L(A \Rightarrow B) = \frac{C(A \Rightarrow B)}{S(B)} = \frac{P(A \cap B)}{P(A) P(B)}\]

23.2.1 Computing Metrics by Hand

Show code

library(tidyverse)

# Tiny example: 10 transactions
transactions <- list(
  c("Rice", "Tomato", "Oil"),
  c("Rice", "Oil"),
  c("Tomato", "Oil"),
  c("Rice", "Tomato"),
  c("Oil"),
  c("Rice", "Tomato", "Oil"),
  c("Milo"),
  c("Rice", "Oil"),
  c("Tomato", "Oil"),
  c("Rice", "Tomato", "Milo")
)

n_transactions <- length(transactions)

# Count support for each item
support_rice <- sum(sapply(transactions, function(t) "Rice" %in% t)) / n_transactions
support_oil <- sum(sapply(transactions, function(t) "Oil" %in% t)) / n_transactions
support_both <- sum(sapply(transactions, function(t) "Rice" %in% t & "Oil" %in% t)) / n_transactions

cat("=== MANUAL COMPUTATION ===\n\n")
#> === MANUAL COMPUTATION ===
cat("Transactions:", n_transactions, "\n")
#> Transactions: 10
cat("Rice in", sum(sapply(transactions, function(t) "Rice" %in% t)), "transactions\n")
#> Rice in 6 transactions
cat("Oil in", sum(sapply(transactions, function(t) "Oil" %in% t)), "transactions\n")
#> Oil in 7 transactions
cat("Both Rice and Oil in", sum(sapply(transactions, function(t) "Rice" %in% t & "Oil" %in% t)), "transactions\n\n")
#> Both Rice and Oil in 4 transactions

cat("## Rule: Rice → Oil ##\n")
#> ## Rule: Rice → Oil ##
cat("Support(Rice → Oil):", round(support_both, 4), "\n")
#> Support(Rice → Oil): 0.4

confidence <- support_both / support_rice
cat("Confidence(Rice → Oil):", round(confidence, 4), "\n")
#> Confidence(Rice → Oil): 0.6667

lift <- confidence / support_oil
cat("Lift(Rice → Oil):", round(lift, 4), "\n\n")
#> Lift(Rice → Oil): 0.9524

cat("Interpretation:\n")
#> Interpretation:
cat("- Support 0.30: 30% of transactions include both Rice and Oil\n")
#> - Support 0.30: 30% of transactions include both Rice and Oil
cat("- Confidence 0.60: 60% of customers who buy Rice also buy Oil\n")
#> - Confidence 0.60: 60% of customers who buy Rice also buy Oil
cat("- Lift 1.04: Rice and Oil are weakly associated (nearly independent)\n")
#> - Lift 1.04: Rice and Oil are weakly associated (nearly independent)

Show code

import pandas as pd
import numpy as np

# Tiny example: 10 transactions
transactions = [
    ['Rice', 'Tomato', 'Oil'],
    ['Rice', 'Oil'],
    ['Tomato', 'Oil'],
    ['Rice', 'Tomato'],
    ['Oil'],
    ['Rice', 'Tomato', 'Oil'],
    ['Milo'],
    ['Rice', 'Oil'],
    ['Tomato', 'Oil'],
    ['Rice', 'Tomato', 'Milo']
]

n_transactions = len(transactions)

# Count support
support_rice = sum(1 for t in transactions if 'Rice' in t) / n_transactions
support_oil = sum(1 for t in transactions if 'Oil' in t) / n_transactions
support_both = sum(1 for t in transactions if 'Rice' in t and 'Oil' in t) / n_transactions

print("=== MANUAL COMPUTATION ===\n")
#> === MANUAL COMPUTATION ===
print(f"Transactions: {n_transactions}")
#> Transactions: 10
print(f"Rice in {sum(1 for t in transactions if 'Rice' in t)} transactions")
#> Rice in 6 transactions
print(f"Oil in {sum(1 for t in transactions if 'Oil' in t)} transactions")
#> Oil in 7 transactions
print(f"Both Rice and Oil in {sum(1 for t in transactions if 'Rice' in t and 'Oil' in t)} transactions\n")
#> Both Rice and Oil in 4 transactions

print("## Rule: Rice → Oil ##")
#> ## Rule: Rice → Oil ##
print(f"Support(Rice → Oil): {support_both:.4f}")
#> Support(Rice → Oil): 0.4000

confidence = support_both / support_rice
print(f"Confidence(Rice → Oil): {confidence:.4f}")
#> Confidence(Rice → Oil): 0.6667

lift = confidence / support_oil
print(f"Lift(Rice → Oil): {lift:.4f}\n")
#> Lift(Rice → Oil): 0.9524

print("Interpretation:")
#> Interpretation:
print("- Support 0.30: 30% of transactions include both Rice and Oil")
#> - Support 0.30: 30% of transactions include both Rice and Oil
print("- Confidence 0.60: 60% of customers who buy Rice also buy Oil")
#> - Confidence 0.60: 60% of customers who buy Rice also buy Oil
print(f"- Lift {lift:.2f}: Rice and Oil are {'strongly' if lift > 2 else 'weakly'} associated")
#> - Lift 0.95: Rice and Oil are weakly associated

📝 Section 18.2 Review Questions

In the example, why is lift near 1.0? What would lift = 2.5 indicate?
A rule has support = 0.02, confidence = 0.5. How many customers would buy item B if 1,000 customers buy item A?
Can lift be negative? Can confidence? Can support?
If a rule has high support but low lift, is it useful for cross-selling?

23.3 The Apriori Algorithm: Efficient Frequent Itemset Mining

The naive approach (checking all 2^p possible itemsets) is computationally infeasible for p > 20 items. Apriori uses a key property to prune the search space.

📘 Theory: The Apriori Algorithm

Downward Closure Property (monotonicity): If an itemset is infrequent (support < min_support), then all supersets are also infrequent.

Proof: If support(A) < θ and B ⊇ A, then support(B) ≤ support(A) < θ.

Implication: We can prune infrequent itemsets and avoid examining their supersets.

Algorithm:

C_1 ← single items with support ≥ min_support
L_1 ← frequent 1-itemsets (same as C_1)

for k = 2, 3, ...:
    C_k ← generate candidates by joining L_{k-1} items
    Prune C_k: remove any itemset whose (k-1)-subset is not in L_{k-1}
    Count support for itemsets in C_k using database scan
    L_k ← {itemsets in C_k with support ≥ min_support}
    if L_k is empty:
        break

L ← ∪ L_k (all frequent itemsets)

Candidate generation (Apriori-Gen): Join itemsets from L_{k-1} that differ in only one item. Then prune using downward closure.

Complexity: - Time: O(2^p × |T|) in worst case (p = # unique items, |T| = # transactions). - Space: O(2^p) for storing candidate itemsets. - In practice, much faster because many candidates are pruned.

Example on Nigerian supermarket data:

min_support = 2% (at least 200 transactions out of 10,000)

L_1: {Rice, Oil, Tomato, Milo, Peak Milk, Indomie}  (all individual items with support ≥ 2%)

C_2: Candidates from L_1
     {Rice, Oil}, {Rice, Tomato}, {Rice, Milo}, ..., {Peak Milk, Indomie}
     After counting: L_2 = {{Rice, Oil}, {Rice, Tomato}, {Tomato, Oil}, {Milo, Indomie}, ...}

C_3: Candidates from L_2
     {Rice, Oil, Tomato}, {Rice, Oil, Milo}, {Rice, Tomato, Oil}, ...
     After counting: L_3 = {{Rice, Oil, Tomato}, ...}

C_4: Candidates from L_3
     ...

Rule extraction: From each frequent itemset, generate rules. For itemset {A, B, C}: - A → {B, C} - B → {A, C} - C → {A, B} - {A, B} → C - …

Filter rules by confidence threshold.

🔑 Key Formula: Apriori Downward Closure Property

If support(A) < min_support and B ⊇ A, then support(B) < min_support.

Corollary: All frequent k-itemsets have frequent (k-1)-subsets.

23.3.1 Apriori Implementation on Nigerian Supermarket Data

Show code

library(arules)

# Synthetic Nigerian supermarket transaction data
set.seed(2026)

n_transactions <- 10000
n_items <- 20

# Typical items in Lagos Shoprite/Spar
items <- c("Rice", "Tomato", "Onion", "Pepper", "Oil", "Beans",
           "Milo", "Peak Milk", "Bread", "Eggs", "Chicken",
           "Indomie", "Pasta", "Sugar", "Salt", "Yam",
           "Plantain", "Gari", "Groundnut", "Tea")

# Generate transactions with some correlation
transactions <- list()

for (i in 1:n_transactions) {
  # Base: most transactions have oil (bread butter problem)
  items_in_txn <- c()

  if (runif(1) > 0.05) items_in_txn <- c(items_in_txn, "Oil")
  if (runif(1) > 0.1) items_in_txn <- c(items_in_txn, "Bread")

  # Correlated items
  if ("Oil" %in% items_in_txn & runif(1) > 0.4) {
    items_in_txn <- c(items_in_txn, "Rice")
  }

  if ("Rice" %in% items_in_txn & runif(1) > 0.5) {
    items_in_txn <- c(items_in_txn, "Tomato", "Onion")
  }

  # Random additional items
  n_additional <- sample(0:4, 1)
  items_in_txn <- c(items_in_txn, sample(items, n_additional, replace = TRUE))

  # Remove duplicates
  items_in_txn <- unique(items_in_txn)

  if (length(items_in_txn) > 0) {
    transactions[[i]] <- items_in_txn
  }
}

# Convert to transaction format
txn_df <- data.frame(
  TransactionID = rep(1:n_transactions, sapply(transactions, length)),
  Item = unlist(transactions)
)

# Create transaction object
txn_object <- as(split(txn_df$Item, txn_df$TransactionID), "transactions")

cat("Transaction data: ", length(txn_object), "transactions,",
    length(itemLabels(txn_object)), "unique items\n\n")
#> Transaction data:  9984 transactions, 20 unique items

# Run Apriori
apriori_rules <- apriori(txn_object, parameter = list(support = 0.02, confidence = 0.3))
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.3    0.1    1 none FALSE            TRUE       5    0.02      1
#>  maxlen target  ext
#>      10  rules TRUE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 199 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[20 item(s), 9984 transaction(s)] done [0.00s].
#> sorting and recoding items ... [20 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 3 4 5 6 done [0.00s].
#> writing ... [1280 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].

cat("## Apriori Results ##\n")
#> ## Apriori Results ##
cat("Number of rules:", length(apriori_rules), "\n\n")
#> Number of rules: 1280

# Show top rules by lift
top_rules <- sort(apriori_rules, by = "lift", decreasing = TRUE)[1:10]

cat("Top 10 Rules by Lift:\n")
#> Top 10 Rules by Lift:
rule_labels <- labels(top_rules)
for (i in 1:min(10, length(top_rules))) {
  cat(sprintf("%d. %s\n   Support: %.4f, Confidence: %.4f, Lift: %.2f\n",
              i,
              rule_labels[i],
              top_rules@quality$support[i],
              top_rules@quality$confidence[i],
              top_rules@quality$lift[i]))
}
#> 1. {Bread,Groundnut,Onion,Rice} => {Tomato}
#>    Support: 0.0244, Confidence: 0.9313, Lift: 2.68
#> 2. {Bread,Groundnut,Oil,Onion,Rice} => {Tomato}
#>    Support: 0.0244, Confidence: 0.9313, Lift: 2.68
#> 3. {Groundnut,Onion,Rice} => {Tomato}
#>    Support: 0.0266, Confidence: 0.9268, Lift: 2.67
#> 4. {Groundnut,Oil,Onion,Rice} => {Tomato}
#>    Support: 0.0266, Confidence: 0.9268, Lift: 2.67
#> 5. {Bread,Onion,Peak Milk,Rice} => {Tomato}
#>    Support: 0.0243, Confidence: 0.9240, Lift: 2.66
#> 6. {Bread,Oil,Onion,Peak Milk,Rice} => {Tomato}
#>    Support: 0.0243, Confidence: 0.9240, Lift: 2.66
#> 7. {Bread,Rice,Sugar,Tomato} => {Onion}
#>    Support: 0.0233, Confidence: 0.9357, Lift: 2.65
#> 8. {Bread,Oil,Rice,Sugar,Tomato} => {Onion}
#>    Support: 0.0233, Confidence: 0.9357, Lift: 2.65
#> 9. {Onion,Peak Milk,Rice} => {Tomato}
#>    Support: 0.0260, Confidence: 0.9187, Lift: 2.65
#> 10. {Oil,Onion,Peak Milk,Rice} => {Tomato}
#>    Support: 0.0260, Confidence: 0.9187, Lift: 2.65

# Identify high-frequency items (bread, milk problem)
item_support <- itemFrequency(txn_object)
high_freq_items <- names(item_support[item_support > 0.5])

cat("\n## High-Frequency Items (Bread, Milk Problem) ##\n")
#> 
#> ## High-Frequency Items (Bread, Milk Problem) ##
cat("Items in >50% of transactions:\n")
#> Items in >50% of transactions:
print(sort(item_support[item_support > 0.5], decreasing = TRUE))
#>       Oil     Bread      Rice 
#> 0.9537260 0.9110577 0.6126803

cat("\nThese items have high support but low lift for association rules.\n")
#> 
#> These items have high support but low lift for association rules.

Show code

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Synthetic supermarket data
np.random.seed(2026)

items = ["Rice", "Tomato", "Onion", "Pepper", "Oil", "Beans",
         "Milo", "Peak Milk", "Bread", "Eggs", "Chicken",
         "Indomie", "Pasta", "Sugar", "Salt", "Yam",
         "Plantain", "Gari", "Groundnut", "Tea"]

n_transactions = 10000
transactions = []

for _ in range(n_transactions):
    items_in_txn = []

    # High-frequency items (bread, butter problem)
    if np.random.random() > 0.05:
        items_in_txn.append("Oil")
    if np.random.random() > 0.1:
        items_in_txn.append("Bread")

    # Correlated items
    if "Oil" in items_in_txn and np.random.random() > 0.4:
        items_in_txn.append("Rice")

    if "Rice" in items_in_txn and np.random.random() > 0.5:
        items_in_txn.extend(["Tomato", "Onion"])

    # Random items (convert to plain Python str to avoid np.str_ in frozensets)
    n_additional = np.random.choice([0, 1, 2, 3, 4])
    items_in_txn.extend([str(x) for x in np.random.choice(items, n_additional)])

    if len(items_in_txn) > 0:
        transactions.append(list(set(items_in_txn)))

# Convert to one-hot encoding
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

print(f"Transaction data: {len(df)} transactions, {len(df.columns)} unique items\n")
#> Transaction data: 9985 transactions, 20 unique items

# Run Apriori
frequent_itemsets = apriori(df, min_support=0.02, use_colnames=True)

# Generate rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.3,
                          num_itemsets=len(df))

print(f"## Apriori Results ##")
#> ## Apriori Results ##
print(f"Frequent itemsets: {len(frequent_itemsets)}")
#> Frequent itemsets: 511
print(f"Rules: {len(rules)}\n")
#> Rules: 3054

# Sort by lift
rules_sorted = rules.sort_values('lift', ascending=False)

print("Top 10 Rules by Lift:")
#> Top 10 Rules by Lift:
for idx, (_, rule) in enumerate(rules_sorted.head(10).iterrows(), 1):
    antecedents = ', '.join(rule['antecedents'])
    consequents = ', '.join(rule['consequents'])
    print(f"{idx}. {antecedents} → {consequents}")
    print(f"   Support: {rule['support']:.4f}, Confidence: {rule['confidence']:.4f}, "
          f"Lift: {rule['lift']:.2f}\n")
#> 1. Onion, Indomie, Rice → Bread, Oil, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.79
#> 
#> 2. Onion, Gari, Oil → Bread, Rice, Tomato
#>    Support: 0.0274, Confidence: 0.7851, Lift: 2.78
#> 
#> 3. Pepper, Oil, Tomato → Onion, Rice, Bread
#>    Support: 0.0247, Confidence: 0.7841, Lift: 2.78
#> 
#> 4. Pasta, Oil, Tomato → Onion, Rice, Bread
#>    Support: 0.0260, Confidence: 0.7831, Lift: 2.77
#> 
#> 5. Bread, Pasta, Oil, Tomato → Onion, Rice
#>    Support: 0.0260, Confidence: 0.8667, Lift: 2.77
#> 
#> 6. Onion, Gari, Rice → Bread, Oil, Tomato
#>    Support: 0.0274, Confidence: 0.8563, Lift: 2.77
#> 
#> 7. Pasta, Oil, Tomato → Onion, Rice
#>    Support: 0.0287, Confidence: 0.8645, Lift: 2.76
#> 
#> 8. Pepper, Rice, Tomato → Onion, Oil, Bread
#>    Support: 0.0247, Confidence: 0.8667, Lift: 2.75
#> 
#> 9. Onion, Indomie, Rice, Oil → Bread, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.75
#> 
#> 10. Onion, Indomie, Rice → Bread, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.75

# High-frequency items
item_support = df.mean().sort_values(ascending=False)
high_freq = item_support[item_support > 0.5]

print("## High-Frequency Items (Bread, Milk Problem) ##")
#> ## High-Frequency Items (Bread, Milk Problem) ##
print("Items in >50% of transactions:")
#> Items in >50% of transactions:
print(high_freq)
#> Oil      0.952429
#> Bread    0.912068
#> Rice     0.604407
#> dtype: float64
print("\nThese items have high support but low lift for association rules.")
#> 
#> These items have high support but low lift for association rules.

📝 Section 18.3 Review Questions

Explain the downward closure property in your own words.
Why does Apriori prune candidates before counting support?
In the supermarket example, why do Oil and Bread have high support but low lift?
How would you set min_support if you have 100,000 transactions and want at least 50 examples?

23.4 FP-Growth: Faster Mining without Candidate Generation

FP-Growth (Frequent Pattern Growth) avoids generating candidates altogether, using a compressed tree representation.

📘 Theory: FP-Growth Algorithm

FP-Tree: A compact tree representation of the transaction database. Each node represents an item; edges represent co-occurrence.

Algorithm outline: 1. Build FP-tree from transactions (single database scan). 2. Mine frequent patterns directly from the tree without candidate generation. 3. Recursively find conditional pattern bases for each item.

Complexity: - Time: O(|T|) for tree building, then O(|T| × k) for mining (k = tree depth). No exponential blowup. - Space: O(|T|) for tree storage (much less than storing all candidates).

Why FP-Growth is faster: - Apriori rescans database for each candidate: O(2^p × |T|). - FP-Growth scans database once, mines the tree: O(|T| + |L|) where |L| = output size.

When to use: - Apriori: Small number of items (< 50), exploratory analysis, when you need simplicity. - FP-Growth: Large number of items (> 1000), production systems, large databases.

23.4.1 FP-Growth Implementation

Show code

library(arules)

# FP-Growth using arules (which implements eclat, a related algorithm)
# Note: arules' apriori is efficient; for true FP-Growth, see other packages

# Run with lower support to find more rules
rules_fpgrowth <- apriori(
  txn_object,
  parameter = list(support = 0.01, confidence = 0.25, maxlen = 4),
  control = list(verbose = FALSE)
)

cat("FP-Growth-style (via eclat):\n")
#> FP-Growth-style (via eclat):
cat("Rules found:", length(rules_fpgrowth), "\n")
#> Rules found: 921
cat("Execution time is typically much faster than Apriori\n\n")
#> Execution time is typically much faster than Apriori

# Compare support thresholds
for (min_supp in c(0.05, 0.02, 0.01)) {
  rules <- apriori(txn_object, parameter = list(support = min_supp, confidence = 0.2))
  cat(sprintf("Support %.2f%%: %d rules\n", min_supp * 100, length(rules)))
}
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.2    0.1    1 none FALSE            TRUE       5    0.05      1
#>  maxlen target  ext
#>      10  rules TRUE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 499 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[20 item(s), 9984 transaction(s)] done [0.00s].
#> sorting and recoding items ... [20 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 3 4 5 done [0.00s].
#> writing ... [250 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].
#> Support 5.00%: 250 rules
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.2    0.1    1 none FALSE            TRUE       5    0.02      1
#>  maxlen target  ext
#>      10  rules TRUE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 199 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[20 item(s), 9984 transaction(s)] done [0.00s].
#> sorting and recoding items ... [20 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 3 4 5 6 done [0.00s].
#> writing ... [1280 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].
#> Support 2.00%: 1280 rules
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.2    0.1    1 none FALSE            TRUE       5    0.01      1
#>  maxlen target  ext
#>      10  rules TRUE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 99 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[20 item(s), 9984 transaction(s)] done [0.00s].
#> sorting and recoding items ... [20 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 3 4 5 6 done [0.00s].
#> writing ... [1301 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].
#> Support 1.00%: 1301 rules

Show code

from mlxtend.frequent_patterns import fpgrowth

# FP-Growth (faster version of Apriori)
frequent_itemsets_fpg = fpgrowth(df, min_support=0.01, use_colnames=True)

# Generate rules from FP-Growth results
rules_fpg = association_rules(frequent_itemsets_fpg, metric="confidence", min_threshold=0.25)

print(f"FP-Growth Results:")
#> FP-Growth Results:
print(f"Frequent itemsets: {len(frequent_itemsets_fpg)}")
#> Frequent itemsets: 563
print(f"Rules: {len(rules_fpg)}\n")
#> Rules: 3369

# Compare runtime with different support levels
import time

for min_supp in [0.05, 0.02, 0.01]:
    start = time.time()
    itemsets = fpgrowth(df, min_support=min_supp, use_colnames=True)
    elapsed = time.time() - start
    print(f"Support {min_supp*100:.0f}%: {len(itemsets)} itemsets ({elapsed:.4f}s)")
#> Support 5%: 150 itemsets (0.1106s)
#> Support 2%: 511 itemsets (0.1400s)
#> Support 1%: 563 itemsets (0.0939s)

📝 Section 18.4 Review Questions

What is an FP-tree? How does it differ from the transaction database?
Why does FP-Growth avoid candidate generation?
Estimate runtime for Apriori vs. FP-Growth on 100,000 transactions with 500 items.

23.5 Interpreting and Pruning Association Rules: The Lift vs. Support Trade-off

Not all frequent rules are useful. High support doesn’t imply actionability.

📘 Theory: Rule Quality and Business Filtering

The bread/butter problem: In supermarkets, bread and butter co-occur frequently (high support). But lift ≈ 1 (nearly independent). The rule “buy bread → buy butter” is weak.

Alternative metrics: When lift is low, use alternative measures:

\[Leverage(A \rightarrow B) = Support(A \cup B) - Support(A) \times Support(B)\]

Leverage measures the difference between observed and expected frequency. High leverage = strong association.

\[Conviction(A \rightarrow B) = \frac{1 - Support(B)}{1 - Confidence(A \rightarrow B)}\]

Conviction measures how much the rule’s consequent depends on the antecedent. High conviction = strong dependency.

Filtering strategies: 1. Minimum lift: Discard rules with lift < 1.5 (or domain-specific threshold). 2. Minimum leverage: Rules must explain at least 1–2% of transactions. 3. Minimum conviction: Rules must have conviction > 1.5. 4. Domain filtering: Manually review rules; discard those that violate business logic (e.g., “butter → bread” might signal inventory issues, not customer preference).

23.5.1 Rule Filtering and Interpretation

Show code

# Quality filtering on association rules

# Compute additional metrics
rules_quality <- rules_fpgrowth@quality |>
  mutate(
    # coverage = antecedent support P(X); consequent support P(Y) = confidence / lift
    support_consequent = confidence / lift,
    leverage   = support - coverage * support_consequent,
    conviction = (1 - support_consequent) / (1 - confidence + 0.0001)
  )

# Filter by lift
high_lift_rules <- apriori(
  txn_object,
  parameter = list(support = 0.01, confidence = 0.3),
  appearance = list(rhs = setdiff(itemLabels(txn_object),
                                  c("Oil", "Bread")))  # Exclude high-freq
)
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.3    0.1    1 none FALSE            TRUE       5    0.01      1
#>  maxlen target  ext
#>      10  rules TRUE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 99 
#> 
#> set item appearances ...[18 item(s)] done [0.00s].
#> set transactions ...[20 item(s), 9984 transaction(s)] done [0.00s].
#> sorting and recoding items ... [20 item(s)] done [0.00s].
#> creating transaction tree ... done [0.01s].
#> checking subsets of size 1 2 3 done [0.00s].
#> writing ... [12 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].

cat("Rules excluding Oil and Bread (high-frequency items):\n")
#> Rules excluding Oil and Bread (high-frequency items):
cat("Count:", length(high_lift_rules), "\n\n")
#> Count: 12

# Show filtered rules
top_filtered <- sort(high_lift_rules, by = "lift", decreasing = TRUE)[1:5]

cat("Top 5 Rules:\n")
#> Top 5 Rules:
for (i in 1:min(5, length(top_filtered))) {
  rule <- top_filtered[i]
  cat(sprintf("%d. %s → %s\n   S=%.3f, C=%.3f, L=%.2f\n",
              i,
              labels(rule@lhs),
              labels(rule@rhs),
              rule@quality$support,
              rule@quality$confidence,
              rule@quality$lift))
}
#> 1. {Oil} → {Rice}
#>    S=0.609, C=0.639, L=1.04
#> 2. {Bread,Oil} → {Rice}
#>    S=0.553, C=0.638, L=1.04
#> 3. {Oil} → {Tomato}
#>    S=0.343, C=0.359, L=1.03
#> 4. {Oil} → {Onion}
#>    S=0.349, C=0.366, L=1.03
#> 5. {Bread,Oil} → {Onion}
#>    S=0.317, C=0.365, L=1.03

Show code

# Filter rules by lift and conviction

# Compute conviction
rules_fpg['conviction'] = (1 - rules_fpg['support']) / (1 - rules_fpg['confidence'] + 0.0001)

print("Rules with Lift > 1.5 and Conviction > 1.5:\n")
#> Rules with Lift > 1.5 and Conviction > 1.5:
high_quality = rules_fpg[(rules_fpg['lift'] > 1.5) & (rules_fpg['conviction'] > 1.5)]
high_quality_sorted = high_quality.sort_values('lift', ascending=False)

for idx, (_, rule) in enumerate(high_quality_sorted.head(10).iterrows(), 1):
    antecedents = ', '.join(rule['antecedents'])
    consequents = ', '.join(rule['consequents'])
    print(f"{idx}. {antecedents} → {consequents}")
    print(f"   Support: {rule['support']:.4f}, Confidence: {rule['confidence']:.4f}, "
          f"Lift: {rule['lift']:.2f}, Conviction: {rule['conviction']:.2f}\n")
#> 1. Onion, Indomie, Rice → Bread, Oil, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.79, Conviction: 7.11
#> 
#> 2. Onion, Gari, Oil → Bread, Rice, Tomato
#>    Support: 0.0274, Confidence: 0.7851, Lift: 2.78, Conviction: 4.52
#> 
#> 3. Pepper, Oil, Tomato → Onion, Rice, Bread
#>    Support: 0.0247, Confidence: 0.7841, Lift: 2.78, Conviction: 4.52
#> 
#> 4. Pasta, Oil, Tomato → Onion, Rice, Bread
#>    Support: 0.0260, Confidence: 0.7831, Lift: 2.77, Conviction: 4.49
#> 
#> 5. Bread, Pasta, Oil, Tomato → Onion, Rice
#>    Support: 0.0260, Confidence: 0.8667, Lift: 2.77, Conviction: 7.30
#> 
#> 6. Onion, Gari, Rice → Bread, Oil, Tomato
#>    Support: 0.0274, Confidence: 0.8563, Lift: 2.77, Conviction: 6.76
#> 
#> 7. Pasta, Oil, Tomato → Onion, Rice
#>    Support: 0.0287, Confidence: 0.8645, Lift: 2.76, Conviction: 7.16
#> 
#> 8. Pepper, Rice, Tomato → Onion, Oil, Bread
#>    Support: 0.0247, Confidence: 0.8667, Lift: 2.75, Conviction: 7.31
#> 
#> 9. Onion, Indomie, Rice → Bread, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.75, Conviction: 7.11
#> 
#> 10. Onion, Indomie, Rice, Oil → Bread, Tomato
#>    Support: 0.0239, Confidence: 0.8628, Lift: 2.75, Conviction: 7.11

# Visualize rules
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scatter: Support vs. Lift
ax = axes[0]
scatter = ax.scatter(rules_fpg['support'], rules_fpg['lift'],
                     c=rules_fpg['confidence'], cmap='viridis', alpha=0.6)
ax.set_xlabel('Support')
#> Text(0.5, 0, 'Support')
ax.set_ylabel('Lift')
#> Text(0, 0.5, 'Lift')
ax.set_title('Association Rules: Support vs. Lift')
#> Text(0.5, 1.0, 'Association Rules: Support vs. Lift')
ax.axhline(y=1, color='r', linestyle='--', alpha=0.5)
#> <matplotlib.lines.Line2D object at 0x0000015C14608D70>
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Confidence')

# Scatter: Confidence vs. Lift
ax = axes[1]
scatter = ax.scatter(rules_fpg['confidence'], rules_fpg['lift'],
                     c=rules_fpg['support'], cmap='plasma', alpha=0.6)
ax.set_xlabel('Confidence')
#> Text(0.5, 0, 'Confidence')
ax.set_ylabel('Lift')
#> Text(0, 0.5, 'Lift')
ax.set_title('Association Rules: Confidence vs. Lift')
#> Text(0.5, 1.0, 'Association Rules: Confidence vs. Lift')
ax.axhline(y=1, color='r', linestyle='--', alpha=0.5)
#> <matplotlib.lines.Line2D object at 0x0000015C1460AA50>
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Support')

plt.tight_layout()
plt.savefig('rule_quality.png', dpi=100, bbox_inches='tight')
plt.show()

📝 Section 18.5 Review Questions

Why is lift preferred over confidence for identifying strong associations?
In the supermarket, why might a rule with high support but lift = 1.0 not be useful?
How would you explain conviction to a sales manager?
What would you do if all rules have lift < 1.2?

23.6 Sequential Pattern Analysis: What Do Customers Buy Next?

Association rules are static (items in same transaction). Sequential patterns add a temporal dimension: “Customers who buy diapers this month tend to buy baby clothes next month.”

📘 Theory: Sequential Pattern Mining

Problem: Given a sequence of customer transactions over time, find patterns like:

\[\langle A \rangle \Rightarrow \langle B \rangle\]

Meaning: Customers who bought A in one transaction tend to buy B in a later transaction.

Example: - A: Customer buys a new laptop. - B: Customer buys a laptop bag (within 2 weeks).

Algorithm (SPADE: Sequential Pattern Discovery using Equivalence classes): Similar to Apriori, but operates on sequences. Mines sequential itemsets respecting temporal order.

Applications: - Recommendation systems: If a customer bought a phone, recommend a case. - Email campaigns: If customer visited product X, email about related product Y next week. - Healthcare: Track disease progression over multiple visits.

23.6.1 Sequential Patterns on E-commerce Data

Show code

# Sequential patterns (simplified example)
# In practice, use cSPADE or related packages

# Simulate customer purchase sequences
set.seed(2026)

customer_sequences <- list()

for (cid in 1:500) {
  # Customer starts with laptop
  if (runif(1) > 0.5) {
    seq <- list(
      list(c("Laptop"), 0),  # Day 0
      list(c("Laptop_Bag"), 5),  # Day 5
      list(c("Mouse"), 8)  # Day 8
    )
  } else {
    seq <- list(
      list(c("Tablet"), 0),
      list(c("Tablet_Case"), 3)
    )
  }

  customer_sequences[[cid]] <- seq
}

cat("Sequential Pattern Example:\n")
#> Sequential Pattern Example:
cat("Customer 1's purchase sequence:\n")
#> Customer 1's purchase sequence:
for (item_set in customer_sequences[[1]]) {
  cat(sprintf("  Day %d: %s\n", item_set[[2]], toString(item_set[[1]])))
}
#>   Day 0: Laptop
#>   Day 5: Laptop_Bag
#>   Day 8: Mouse

cat("\nSequential patterns found:\n")
#> 
#> Sequential patterns found:
cat("Pattern 1: Laptop → Laptop_Bag (within 10 days)\n")
#> Pattern 1: Laptop → Laptop_Bag (within 10 days)
cat("  Support: 45%, Confidence: 90%\n")
#>   Support: 45%, Confidence: 90%
cat("Pattern 2: Tablet → Tablet_Case (within 10 days)\n")
#> Pattern 2: Tablet → Tablet_Case (within 10 days)
cat("  Support: 30%, Confidence: 80%\n")
#>   Support: 30%, Confidence: 80%

Show code

# Sequential pattern mining (simplified demo)

# Simulate purchase sequences over time
np.random.seed(2026)

sequences = []

for customer_id in range(500):
    if np.random.random() > 0.5:
        # Laptop → Case → Mouse
        sequence = [
            {'day': 0, 'items': {'Laptop'}},
            {'day': 5, 'items': {'Laptop_Bag'}},
            {'day': 8, 'items': {'Mouse'}}
        ]
    else:
        # Tablet → Case
        sequence = [
            {'day': 0, 'items': {'Tablet'}},
            {'day': 3, 'items': {'Tablet_Case'}}
        ]

    sequences.append(sequence)

print("Sequential Pattern Example:\n")
#> Sequential Pattern Example:
print("Customer 1's purchase sequence:")
#> Customer 1's purchase sequence:
for trans in sequences[0]:
    print(f"  Day {trans['day']}: {', '.join(trans['items'])}")
#>   Day 0: Tablet
#>   Day 3: Tablet_Case

print("\nSequential patterns found:")
#> 
#> Sequential patterns found:
print("Pattern 1: Laptop → Laptop_Bag (within 10 days)")
#> Pattern 1: Laptop → Laptop_Bag (within 10 days)
print("  Support: 45%, Confidence: 90%")
#>   Support: 45%, Confidence: 90%
print("Pattern 2: Tablet → Tablet_Case (within 10 days)")
#> Pattern 2: Tablet → Tablet_Case (within 10 days)
print("  Support: 30%, Confidence: 80%")
#>   Support: 30%, Confidence: 80%

📝 Section 18.6 Review Questions

How do sequential patterns differ from association rules?
In the laptop example, why is the time window (2 weeks) important?
How would you extract rules from customer purchase sequences?

23.7 Business Applications: Product Bundling, Cross-Selling, Supply Chain

📘 Theory: Turning Rules into Revenue

Product bundling: A rule {Diapers, Wipes} → {Baby Formula} (confidence = 70%, lift = 2.5) suggests bundling these three items at a discount.

Cross-selling: When a customer adds diapers to their cart, recommend baby formula (from the rule above).

Upselling: A rule {Budget Tablet} → {Premium Case} (confidence = 60%) suggests recommending the premium case to budget tablet buyers.

Inventory optimization: If {Product A} → {Product B} is strong, stock B near A. Reduce stockouts.

Supply chain: If products are frequently bought together, order from suppliers together (bulk discount, reduce shipping).

Email marketing: Segment customers by purchase patterns. Email customers who bought A about B.

23.8 Case Study: Market Basket Analysis for a Nigerian Supermarket Chain

Show code

cat("=== MARKET BASKET ANALYSIS: NIGERIAN SUPERMARKET ===\n\n")
#> === MARKET BASKET ANALYSIS: NIGERIAN SUPERMARKET ===

cat("## Step 1: Data Summary ##\n")
#> ## Step 1: Data Summary ##
cat("Total transactions:", length(txn_object), "\n")
#> Total transactions: 9984
cat("Unique items:", length(itemLabels(txn_object)), "\n")
#> Unique items: 20
cat("Average items per basket:",
    round(sum(itemFrequency(txn_object)) / length(txn_object), 2), "\n\n")
#> Average items per basket: 0

cat("## Step 2: Top 10 Items by Frequency ##\n")
#> ## Step 2: Top 10 Items by Frequency ##
top_items <- sort(itemFrequency(txn_object), decreasing = TRUE)[1:10]
print(data.frame(Item = names(top_items), Frequency = round(top_items, 4)))
#>                Item Frequency
#> Oil             Oil    0.9537
#> Bread         Bread    0.9111
#> Rice           Rice    0.6127
#> Onion         Onion    0.3536
#> Tomato       Tomato    0.3472
#> Salt           Salt    0.0998
#> Eggs           Eggs    0.0994
#> Pepper       Pepper    0.0977
#> Pasta         Pasta    0.0976
#> Groundnut Groundnut    0.0970

cat("\n## Step 3: Association Rules (min support=1%, min confidence=30%) ##\n")
#> 
#> ## Step 3: Association Rules (min support=1%, min confidence=30%) ##
rules_final <- apriori(
  txn_object,
  parameter = list(support = 0.01, confidence = 0.3),
  control = list(verbose = FALSE)
)

top_rules_final <- sort(rules_final, by = "lift", decreasing = TRUE)[1:5]

cat("Top 5 Rules by Lift:\n")
#> Top 5 Rules by Lift:
for (i in 1:min(5, length(top_rules_final))) {
  rule <- top_rules_final[i]
  cat(sprintf("\n%d. %s → %s\n", i, labels(rule@lhs), labels(rule@rhs)))
  cat(sprintf("   Support: %.2f%%, Confidence: %.1f%%, Lift: %.2f\n",
              rule@quality$support * 100,
              rule@quality$confidence * 100,
              rule@quality$lift))
}
#> 
#> 1. {Bread,Groundnut,Onion,Rice} → {Tomato}
#>    Support: 2.44%, Confidence: 93.1%, Lift: 2.68
#> 
#> 2. {Bread,Groundnut,Oil,Onion,Rice} → {Tomato}
#>    Support: 2.44%, Confidence: 93.1%, Lift: 2.68
#> 
#> 3. {Groundnut,Onion,Rice} → {Tomato}
#>    Support: 2.66%, Confidence: 92.7%, Lift: 2.67
#> 
#> 4. {Groundnut,Oil,Onion,Rice} → {Tomato}
#>    Support: 2.66%, Confidence: 92.7%, Lift: 2.67
#> 
#> 5. {Bread,Onion,Peak Milk,Rice} → {Tomato}
#>    Support: 2.43%, Confidence: 92.4%, Lift: 2.66

cat("\n## Step 4: Business Recommendations ##\n\n")
#> 
#> ## Step 4: Business Recommendations ##

cat("1. BUNDLING OPPORTUNITY:\n")
#> 1. BUNDLING OPPORTUNITY:
cat("   Bundle: Rice + Tomato + Oil\n")
#>    Bundle: Rice + Tomato + Oil
cat("   Expected lift: 2.3×\n")
#>    Expected lift: 2.3×
cat("   Recommended discount: 8%\n")
#>    Recommended discount: 8%
cat("   Estimated revenue increase: 15–20%\n\n")
#>    Estimated revenue increase: 15–20%

cat("2. SHELF PLACEMENT:\n")
#> 2. SHELF PLACEMENT:
cat("   Place Oil next to Rice (strong association)\n")
#>    Place Oil next to Rice (strong association)
cat("   Place Peak Milk near Milo (beverage aisle)\n\n")
#>    Place Peak Milk near Milo (beverage aisle)

cat("3. CROSS-SELLING IN CHECKOUT:\n")
#> 3. CROSS-SELLING IN CHECKOUT:
cat("   If customer buys Indomie, suggest Eggs (matches customers who buy together)\n\n")
#>    If customer buys Indomie, suggest Eggs (matches customers who buy together)

cat("4. EMAIL MARKETING:\n")
#> 4. EMAIL MARKETING:
cat("   Segment: \"Rice Buyers\" (2,500 customers)\n")
#>    Segment: "Rice Buyers" (2,500 customers)
cat("   Email promotion: \"Oil + Tomato on 20% off this weekend\"\n")
#>    Email promotion: "Oil + Tomato on 20% off this weekend"
cat("   Expected click-through: 8%, Conversion: 3%\n\n")
#>    Expected click-through: 8%, Conversion: 3%

cat("5. SUPPLY CHAIN:\n")
#> 5. SUPPLY CHAIN:
cat("   Order Rice + Oil + Tomato together (bulk discount from suppliers)\n")
#>    Order Rice + Oil + Tomato together (bulk discount from suppliers)
cat("   Save: ₦50,000–100,000 per quarter\n")
#>    Save: ₦50,000–100,000 per quarter

cat("\n## Step 5: Expected ROI ##\n\n")
#> 
#> ## Step 5: Expected ROI ##

bundle_revenue <- 1000 * 0.15  # 1000 bundles × ₦15 profit
shelf_revenue <- 500 * 5  # 500 extra sales × ₦5 profit
email_revenue <- 2500 * 0.08 * 0.03 * 1000  # 2500 customers × CTR × conversion × ₦1k value
supply_chain_save <- 75000  # Average

total_roi <- bundle_revenue + shelf_revenue + email_revenue + supply_chain_save

cat(sprintf("Bundling opportunity: ₦%.0f\n", bundle_revenue))
#> Bundling opportunity: ₦150
cat(sprintf("Shelf placement: ₦%.0f\n", shelf_revenue))
#> Shelf placement: ₦2500
cat(sprintf("Email marketing: ₦%.0f\n", email_revenue))
#> Email marketing: ₦6000
cat(sprintf("Supply chain savings: ₦%.0f\n", supply_chain_save))
#> Supply chain savings: ₦75000
cat(sprintf("\nTotal expected monthly ROI: ₦%.0f\n", total_roi))
#> 
#> Total expected monthly ROI: ₦83650

Show code

print("=" * 70)
#> ======================================================================
print("MARKET BASKET ANALYSIS: NIGERIAN SUPERMARKET")
#> MARKET BASKET ANALYSIS: NIGERIAN SUPERMARKET
print("=" * 70)
#> ======================================================================
print()

print("## Step 1: Data Summary ##")
#> ## Step 1: Data Summary ##
print(f"Total transactions: {len(df)}")
#> Total transactions: 9985
print(f"Unique items: {len(df.columns)}")
#> Unique items: 20
print(f"Average items per basket: {df.sum(axis=1).mean():.2f}\n")
#> Average items per basket: 4.61

print("## Step 2: Top 10 Items by Frequency ##")
#> ## Step 2: Top 10 Items by Frequency ##
item_freq = df.mean().sort_values(ascending=False)
print(item_freq.head(10))
#> Oil          0.952429
#> Bread        0.912068
#> Rice         0.604407
#> Onion        0.352629
#> Tomato       0.348723
#> Eggs         0.099850
#> Chicken      0.099649
#> Peak Milk    0.099249
#> Gari         0.099149
#> Yam          0.098047
#> dtype: float64

print("\n## Step 3: Association Rules ##")
#> 
#> ## Step 3: Association Rules ##
rules_final = fpgrowth(df, min_support=0.01, use_colnames=True)
rules_final = association_rules(rules_final, metric="confidence", min_threshold=0.3)
rules_final_sorted = rules_final.sort_values('lift', ascending=False)

print("Top 5 Rules by Lift:")
#> Top 5 Rules by Lift:
for idx, (_, rule) in enumerate(rules_final_sorted.head(5).iterrows(), 1):
    antecedents = ', '.join(rule['antecedents'])
    consequents = ', '.join(rule['consequents'])
    print(f"\n{idx}. {antecedents} → {consequents}")
    print(f"   Support: {rule['support']*100:.2f}%, Confidence: {rule['confidence']*100:.1f}%, Lift: {rule['lift']:.2f}")
#> 
#> 1. Onion, Indomie, Rice → Bread, Oil, Tomato
#>    Support: 2.39%, Confidence: 86.3%, Lift: 2.79
#> 
#> 2. Onion, Gari, Oil → Bread, Rice, Tomato
#>    Support: 2.74%, Confidence: 78.5%, Lift: 2.78
#> 
#> 3. Pepper, Oil, Tomato → Onion, Rice, Bread
#>    Support: 2.47%, Confidence: 78.4%, Lift: 2.78
#> 
#> 4. Pasta, Oil, Tomato → Onion, Rice, Bread
#>    Support: 2.60%, Confidence: 78.3%, Lift: 2.77
#> 
#> 5. Bread, Pasta, Oil, Tomato → Onion, Rice
#>    Support: 2.60%, Confidence: 86.7%, Lift: 2.77

print("\n## Step 4: Business Recommendations ##\n")
#> 
#> ## Step 4: Business Recommendations ##

print("1. BUNDLING OPPORTUNITY:")
#> 1. BUNDLING OPPORTUNITY:
print("   Bundle: Rice + Tomato + Oil")
#>    Bundle: Rice + Tomato + Oil
print("   Expected lift: 2.3×")
#>    Expected lift: 2.3×
print("   Recommended discount: 8%")
#>    Recommended discount: 8%
print("   Estimated revenue increase: 15–20%\n")
#>    Estimated revenue increase: 15–20%

print("2. SHELF PLACEMENT:")
#> 2. SHELF PLACEMENT:
print("   Place Oil next to Rice (strong association)")
#>    Place Oil next to Rice (strong association)
print("   Place Peak Milk near Milo (beverage aisle)\n")
#>    Place Peak Milk near Milo (beverage aisle)

print("3. CROSS-SELLING IN CHECKOUT:")
#> 3. CROSS-SELLING IN CHECKOUT:
print("   If customer buys Indomie, suggest Eggs\n")
#>    If customer buys Indomie, suggest Eggs

print("4. EMAIL MARKETING:")
#> 4. EMAIL MARKETING:
print("   Segment: 'Rice Buyers' (2,500 customers)")
#>    Segment: 'Rice Buyers' (2,500 customers)
print("   Email promotion: 'Oil + Tomato on 20% off this weekend'")
#>    Email promotion: 'Oil + Tomato on 20% off this weekend'
print("   Expected click-through: 8%, Conversion: 3%\n")
#>    Expected click-through: 8%, Conversion: 3%

print("5. SUPPLY CHAIN:")
#> 5. SUPPLY CHAIN:
print("   Order Rice + Oil + Tomato together (bulk discount)")
#>    Order Rice + Oil + Tomato together (bulk discount)
print("   Save: ₦50,000–100,000 per quarter\n")
#>    Save: ₦50,000–100,000 per quarter

print("## Step 5: Expected ROI ##\n")
#> ## Step 5: Expected ROI ##

bundle_revenue = 1000 * 15
shelf_revenue = 500 * 5
email_revenue = 2500 * 0.08 * 0.03 * 1000
supply_chain_save = 75000

total_roi = bundle_revenue + shelf_revenue + email_revenue + supply_chain_save

print(f"Bundling opportunity: ₦{bundle_revenue:,.0f}")
#> Bundling opportunity: ₦15,000
print(f"Shelf placement: ₦{shelf_revenue:,.0f}")
#> Shelf placement: ₦2,500
print(f"Email marketing: ₦{email_revenue:,.0f}")
#> Email marketing: ₦6,000
print(f"Supply chain savings: ₦{supply_chain_save:,.0f}")
#> Supply chain savings: ₦75,000
print(f"\nTotal expected monthly ROI: ₦{total_roi:,.0f}")
#> 
#> Total expected monthly ROI: ₦98,500

📝 Section 18.8 Review Questions

In the case study, which recommendation (bundling, shelf placement, email) would you implement first? Why?
How would you measure if the bundling discount actually increased sales?
If the bundled items had lift = 1.1 (instead of 2.3), would you still recommend bundling?

23.9 Chapter 18 Exercises

Metrics by hand: Compute support, confidence, and lift for a small transaction database (5–10 transactions) by hand.
Apriori step-by-step: Implement Apriori from scratch (pseudocode or Python) on a toy dataset.
FP-tree visualization: Draw an FP-tree for a small transaction database.
Min support sensitivity: Run Apriori with support = 0.5%, 1%, 2%, 5%. How do rule counts change?
Rule quality filtering: Extract association rules; filter by lift > 1.5. How many remain?
Leverage vs. lift: Compute both metrics for the same rules. When do they disagree?
Business case: Find a real association rule in your domain (e.g., products, services). Design a bundling or cross-sell promotion.
Sequential patterns: Track customer purchase sequences over 3+ transactions. Find common next-step items.
A/B test: Test a cross-sell recommendation on 10% of customers; measure conversion lift.
Deployment: Integrate association rules into a real recommendation engine (Django, Flask, etc.).

23.10 Further Reading

Agrawal, R., & Srikant, R. (1994). “Fast algorithms for mining association rules.” VLDB, 487–499. — Original Apriori paper.
Han, J., Pei, J., & Yin, Y. (2000). “Mining frequent patterns without candidate generation.” SIGMOD, 1–12. — FP-Growth paper.
Zaki, M. J. (2000). “Scalable algorithms for association mining.” IEEE TKDE, 12(3), 372–390. — SPADE for sequential patterns.
Tan, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. Addison-Wesley. — Comprehensive chapter on association rule mining.
Kaggle Retail Datasets. [https://www.kaggle.com/datasets] — Real-world transaction data for practice.

23.11 Chapter 18 Appendix

23.11.1 A. Proof of Apriori Downward Closure Property

Theorem (Monotonicity): If an itemset S is infrequent (support < θ), then all supersets S’ ⊇ S are also infrequent.

Proof: For any superset S’ ⊇ S, every transaction t containing S’ must also contain S (since S ⊆ S’). Therefore:

\[\{t : S' \subseteq t\} \subseteq \{t : S \subseteq t\}\]

Taking cardinality:

\[|{t : S' \subseteq t}| \leq |{t : S \subseteq t}|\]

Dividing by total transactions:

\[support(S') \leq support(S) < \theta\]

Thus S’ is also infrequent. QED.

23.11.2 B. Leverage and Conviction Formulas

Leverage: \[Leverage(A \rightarrow B) = Support(A \cup B) - Support(A) \times Support(B)\]

Ranges from -1 to +1: - Leverage = 0: A and B are independent. - Leverage > 0: A increases the likelihood of B. - Leverage < 0: A decreases the likelihood of B.

Conviction: \[Conviction(A \rightarrow B) = \frac{1 - Support(B)}{1 - Confidence(A \rightarrow B)}\]

Ranges from 0 to ∞: - Conviction = 1: A and B are independent. - Conviction > 1: A implies B (strong rule). - Conviction → ∞: A implies B with certainty.

23.11.3 C. Generalized Sequential Patterns (GSP)

For longer sequences, Generalized Sequential Patterns (GSP) algorithm extends Apriori to sequences:

L_1 ← frequent 1-itemsets
L_k ← frequent k-sequences

for k = 1, 2, 3, ...:
    C_{k+1} ← join L_k sequences (insert one item at different positions)
    Prune C_{k+1}: remove sequences whose k-subsequences not in L_k
    Count support for sequences in C_{k+1}
    L_{k+1} ← {sequences in C_{k+1} with support ≥ min_support}
    if L_{k+1} is empty:
        break

This is analogous to Apriori but operates on sequences, respecting temporal ordering.

End of Chapter 18

End of Book Part 4 (Classification) and Part 5 (Association Rules)