37 Recommendation Engines

📋 Learning Objectives

By the end of this chapter, you will understand the core business value of personalized recommendations and how they drive revenue across industries. You will implement three foundational recommendation approaches: content-based filtering (matching items via feature similarity), collaborative filtering (leveraging user-user and item-item similarity), and matrix factorisation (learning latent user and item factors). You will address the cold-start problem—how to recommend to new users and new items lacking historical data—using hybrid approaches combining content and collaborative signals. You will evaluate recommenders on standard metrics: Precision@K, Recall@K, NDCG, and MAP. You will build reproducible R and Python pipelines end-to-end: data preparation, model training, hyperparameter tuning, evaluation on a hold-out test set, and A/B test simulation. You will work with authentic African e-commerce data: 500 users, 200 Nigerian products (FMCG, electronics, fashion typical of Jumia Nigeria), 8,000 purchase events. You will master the mathematical foundations: SVD, ALS (Alternating Least Squares), and information retrieval metrics. By project completion, you will have built a hybrid recommender system and projected the revenue impact of deployment.

37.1 Why Recommendations Drive Revenue

The “You may also like” section on e-commerce sites is no accident: it is engineered for revenue. A widely cited industry estimate attributes about 35% of Amazon’s revenue to recommendations. Netflix’s “Recommended for you” row reportedly drives 50% of viewtime. Spotify’s algorithmic playlists create discovery and lock-in. The mechanism is clear: instead of customers searching through tens of thousands of products, personalised recommendations surface items matching their taste, increasing conversion and average basket size.

This is especially powerful in African e-commerce. Jumia Nigeria hosts 15 million product listings across electronics, fashion, FMCG, and home. A typical user browses for 8–15 minutes before purchasing, and most never explore beyond the first 50 items per category. A recommendation engine showing each user their top-10 items (rather than static bestsellers) increases engagement and average order value by 15–30% in pilot studies. Similarly, in banking, a customer with a savings account is rarely shown investment or credit products; a recommender surfaces suitable products (auto loan, small-business line of credit) based on income, transaction patterns, and peer behaviour, driving product penetration.

The fundamental challenge is sparsity: a user-product matrix with 500 users and 200,000 products has 100 million cells, but most users buy only 10–20 products, so only about 0.005–0.01% of cells are filled. Collaborative filtering exploits the structure hidden in the sparse matrix: users with similar purchase histories likely have similar preferences, and products bought together by many users are similar.
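The sparsity arithmetic is easy to check directly. The sketch below is illustrative only: the helper name `matrix_density` is an assumption introduced here, and the inputs are the user, product, and purchase counts quoted in the paragraph above.

```python
def matrix_density(n_users, n_items, interactions_per_user):
    """Fraction of user-item cells that hold an observed interaction."""
    return (n_users * interactions_per_user) / (n_users * n_items)


n_users, n_items = 500, 200_000
print(f"Total cells: {n_users * n_items:,}")

# Density for the low and high end of "10-20 products per user"
for per_user in (10, 20):
    d = matrix_density(n_users, n_items, per_user)
    print(f"{per_user} purchases per user: density {d:.4%}, sparsity {1 - d:.4%}")
```

Density depends only on purchases per user divided by catalogue size, so it falls as the catalogue grows even when the user base is unchanged.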

37.2 Content-Based Filtering

Content-based filtering recommends items similar to those the user already liked. Each item has features (product category, brand, price tier, attributes). A user’s preference profile is built by averaging the features of items they liked. New recommendations are items most similar to this profile.

For e-commerce, items are usually described in text: product name, category, description, tags. We convert text to numeric vectors using TF-IDF (Term Frequency-Inverse Document Frequency): each unique word or phrase becomes a dimension, and its value is high when the term is frequent in that document but rare across the catalogue. Products whose TF-IDF vectors overlap strongly (high cosine similarity) are treated as similar.
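To make the weighting concrete, here is a minimal hand-rolled sketch using the textbook variant tf = count / length and idf = ln(N / df). The three descriptions, the tiny stop-word set, and the helper names are assumptions for illustration. Library implementations such as `tm::weightTfIdf` and scikit-learn's `TfidfVectorizer` use somewhat different logarithms, smoothing, and normalisation, so their numbers will not match these exactly.

```python
import math
from collections import Counter

# Toy catalogue (hypothetical descriptions)
docs = {
    "running_shoes": "cushioned running shoes for marathon training",
    "football_boots": "football boots with studs for field training",
    "tea_bags": "black tea bags for quick brewing",
}
STOPWORDS = {"for", "with"}


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


tokens = {name: tokenize(text) for name, text in docs.items()}
n_docs = len(tokens)

# Document frequency: in how many descriptions does each term appear?
doc_freq = Counter(term for toks in tokens.values() for term in set(toks))


def tfidf_vector(toks):
    """Sparse TF-IDF vector: tf = count / length, idf = ln(N / df)."""
    counts = Counter(toks)
    return {t: (c / len(toks)) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {name: tfidf_vector(toks) for name, toks in tokens.items()}
for other in ("football_boots", "tea_bags"):
    sim = cosine(vectors["running_shoes"], vectors[other])
    print(f"running_shoes vs {other}: {sim:.3f}")
```

The two sports items share the term "training" and so get a positive similarity, while the running shoes and the tea bags share no terms once stop words are removed and score zero.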

Pros: No cold-start problem for new items (we have their features); interpretable (we can explain why an item was recommended: “you liked running shoes, here are similar running shoes”). Cons: Filter bubble (we recommend similar items, never exposing users to unexpected tastes); requires rich metadata (some products have poor descriptions); can’t leverage collective intelligence (if 10,000 users all like product A but hate product B, a content-based system doesn’t capture this).


library(tidyverse)
library(proxy)
library(tm)

# Simulate 100 Nigerian e-commerce products with descriptions
set.seed(6358)

products_content <- data.frame(
  product_id = 1:100,
  product_name = c(
    "Samsung 65-inch Smart TV", "LG Refrigerator", "Itel Phone", "MTN Router",
    "Gionee Smartphone", "Canon Camera", "Nike Running Shoes", "Adidas Football Boots",
    "Levi's Jeans", "Tom Ford Perfume", "Lipton Tea Bags", "Nestlé Milo",
    "Dangote Flour", "Indomie Instant Noodles", "Golden Penny Pasta",
    "Coca-Cola 2L", "Heineken Beer", "Guinness Beer", "Dettol Disinfectant",
    "Ponds Face Cream",
    rep("Generic Product", 80)  # Placeholder
  ),
  product_desc = c(
    "4K resolution smart television with voice control and WiFi connectivity",
    "Frost-free refrigerator with 600L capacity and digital temperature control",
    "Budget smartphone with 32GB storage and 4000mAh battery",
    "WiFi 5 router for high-speed internet connectivity in home and office",
    "Smartphone with 6.5 inch display and 128GB storage capacity",
    "Professional DSLR camera with 24MP sensor and 4K video recording",
    "Cushioned running shoes for marathon training and daily jogging",
    "Football boots with studs for grass field play and training",
    "Classic blue jeans with comfortable fit and durable denim material",
    "Premium fragrance for men with woody and citrus notes",
    "Black tea bags in convenient sachets for quick brewing",
    "Chocolate-flavored malt drink powder high in energy",
    "Refined wheat flour for baking and cooking",
    "Instant fried noodles with seasoning packet ready in 3 minutes",
    "Durum wheat pasta for traditional Italian dishes",
    "Carbonated cola beverage in 2-litre bottle",
    "Lager beer with distinctive bitter taste and 4.5% alcohol",
    "Dark stout beer with creamy head and rich flavor",
    "Multipurpose disinfectant for hygiene and surface cleaning",
    "Daily moisturizing cream for normal skin with SPF protection",
    rep("Generic product description", 80)
  ),
  category = c(
    rep("Electronics", 5), "Cameras", rep("Fashion", 3),
    "Beauty & Personal Care", rep("Groceries & Food", 5),
    rep("Beverages", 3), "Health & Household", "Beauty & Personal Care",  # items 1–20
    rep("Other", 80)                          # items 21–100
  ),
  price_tier = c(
    rep("Premium", 2), rep("Budget", 3), "Premium", rep("Mid-range", 3), "Premium",
    rep("Budget", 6), rep("Mid-range", 4),    # items 1–20
    rep("Other", 80)                          # items 21–100
  )
)

# Create user-item interaction matrix (sparse: 500 users, 100 products)
n_users <- 500
n_ratings <- 2000

interactions <- data.frame(
  user_id = sample(1:n_users, n_ratings, replace = TRUE),
  product_id = sample(1:100, n_ratings, replace = TRUE),
  liked = sample(c(0, 1), n_ratings, replace = TRUE, prob = c(0.4, 0.6))
)

# Remove duplicates, keep first occurrence
interactions <- interactions |>
  group_by(user_id, product_id) |>
  slice(1) |>
  ungroup()

cat("Content-Based Filtering\n")
#> Content-Based Filtering
cat("=======================\n")
#> =======================
cat("Products:", nrow(products_content), "\n")
#> Products: 100
cat("Users:", n_users, "\n")
#> Users: 500
cat("Interactions:", nrow(interactions), "\n")
#> Interactions: 1972
cat("Sparsity:", round(1 - nrow(interactions) / (n_users * 100), 4) * 100, "%\n")
#> Sparsity: 96.06 %

# Build TF-IDF matrix from product descriptions
corpus <- VCorpus(VectorSource(products_content$product_desc))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
tfidf_matrix <- as.matrix(dtm)

cat("\nTF-IDF matrix dimensions:", dim(tfidf_matrix), "\n")
#> 
#> TF-IDF matrix dimensions: 100 118

# User profile: average TF-IDF vector of liked products
user_1_likes <- (interactions |> filter(user_id == 1, liked == 1))$product_id
if (length(user_1_likes) == 0) user_1_likes <- sample(1:100, 5)  # fallback if user has no likes
user_1_profile <- colMeans(tfidf_matrix[user_1_likes, , drop = FALSE])

# Compute similarity between user_1_profile and all products
similarities <- as.vector(tfidf_matrix %*% user_1_profile /
                          (sqrt(rowSums(tfidf_matrix^2)) * sqrt(sum(user_1_profile^2))))

# Recommendations for user 1
recommendations <- data.frame(
  product_id = 1:100,
  product_name = products_content$product_name,
  similarity = similarities,
  already_liked = 1:100 %in% user_1_likes
) |>
  filter(!already_liked) |>
  arrange(desc(similarity))

cat("\n\nContent-Based Recommendations for User 1:\n")
#> 
#> 
#> Content-Based Recommendations for User 1:
print(head(recommendations |> select(product_id, product_name, similarity), 10))
#>    product_id    product_name similarity
#> 1          21 Generic Product   0.149372
#> 2          23 Generic Product   0.149372
#> 3          24 Generic Product   0.149372
#> 4          25 Generic Product   0.149372
#> 5          26 Generic Product   0.149372
#> 6          27 Generic Product   0.149372
#> 7          28 Generic Product   0.149372
#> 8          29 Generic Product   0.149372
#> 9          30 Generic Product   0.149372
#> 10         31 Generic Product   0.149372


import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(6358)

# Create 100 products
products = pd.DataFrame({
    'product_id': range(1, 101),
    'product_name': [
        'Samsung 65-inch Smart TV', 'LG Refrigerator', 'Itel Phone', 'MTN Router',
        'Gionee Smartphone', 'Canon Camera', 'Nike Running Shoes', 'Adidas Football Boots',
        "Levi's Jeans", 'Tom Ford Perfume', 'Lipton Tea Bags', 'Nestlé Milo',
        'Dangote Flour', 'Indomie Instant Noodles', 'Golden Penny Pasta',
        'Coca-Cola 2L', 'Heineken Beer', 'Guinness Beer', 'Dettol Disinfectant',
        'Ponds Face Cream'
    ] + ['Generic Product'] * 80,
    'product_desc': [
        '4K resolution smart television with voice control and WiFi connectivity',
        'Frost-free refrigerator with 600L capacity and digital temperature control',
        'Budget smartphone with 32GB storage and 4000mAh battery',
        'WiFi 5 router for high-speed internet connectivity',
        'Smartphone with 6.5 inch display and 128GB storage',
        'Professional DSLR camera with 24MP sensor',
        'Cushioned running shoes for marathon training',
        'Football boots with studs for grass field',
        'Classic blue jeans with comfortable fit',
        'Premium fragrance with woody and citrus notes',
        'Black tea bags for quick brewing',
        'Chocolate-flavored malt drink powder',
        'Refined wheat flour for baking',
        'Instant fried noodles ready in 3 minutes',
        'Durum wheat pasta',
        'Carbonated cola beverage',
        'Lager beer with distinctive bitter taste',
        'Dark stout beer with creamy head',
        'Multipurpose disinfectant',
        'Daily moisturizing cream with SPF'
    ] + ['Generic product description'] * 80
})

# Create user-product interactions
n_users = 500
interactions = pd.DataFrame({
    'user_id': np.random.choice(n_users, 2000),
    'product_id': np.random.choice(100, 2000),
    'liked': np.random.choice([0, 1], 2000, p=[0.4, 0.6])
})

interactions = interactions.drop_duplicates(subset=['user_id', 'product_id'], keep='first')

print("Content-Based Filtering")
#> Content-Based Filtering
print("=" * 50)
#> ==================================================
print(f"Products: {len(products)}")
#> Products: 100
print(f"Users: {n_users}")
#> Users: 500
print(f"Interactions: {len(interactions)}")
#> Interactions: 1969
print(f"Sparsity: {(1 - len(interactions) / (n_users * 100)) * 100:.2f}%")
#> Sparsity: 96.06%

# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=100, stop_words='english')
tfidf_matrix = vectorizer.fit_transform(products['product_desc']).toarray()

print(f"\nTF-IDF matrix shape: {tfidf_matrix.shape}")
#> 
#> TF-IDF matrix shape: (100, 95)

# User 1 profile (average TF-IDF of liked products)
user_1_likes = interactions[(interactions['user_id'] == 1) & (interactions['liked'] == 1)]['product_id'].values
user_1_profile = tfidf_matrix[user_1_likes - 1].mean(axis=0) if len(user_1_likes) > 0 else tfidf_matrix[0]

# Compute similarity
similarities = cosine_similarity([user_1_profile], tfidf_matrix)[0]

# Recommendations
recommendations = pd.DataFrame({
    'product_id': products['product_id'],
    'product_name': products['product_name'],
    'similarity': similarities,
    'already_liked': products['product_id'].isin(user_1_likes)
})

recommendations = recommendations[~recommendations['already_liked']].sort_values('similarity', ascending=False)

print("\n\nContent-Based Recommendations for User 1:")
#> 
#> 
#> Content-Based Recommendations for User 1:
print(recommendations[['product_id', 'product_name', 'similarity']].head(10))
#>     product_id     product_name  similarity
#> 27          28  Generic Product    0.894427
#> 22          23  Generic Product    0.894427
#> 28          29  Generic Product    0.894427
#> 34          35  Generic Product    0.894427
#> 33          34  Generic Product    0.894427
#> 30          31  Generic Product    0.894427
#> 29          30  Generic Product    0.894427
#> 32          33  Generic Product    0.894427
#> 31          32  Generic Product    0.894427
#> 20          21  Generic Product    0.894427

37.3 Collaborative Filtering: User-Based

User-based collaborative filtering finds users with similar taste and recommends items they liked. The algorithm is simple: (1) compute user-user similarity (how similar are their rating patterns?); (2) for a target user, find the k nearest neighbours (most similar users); (3) recommend items liked by neighbours but not yet tried by the target user.

Similarity is measured via cosine similarity on the rating vector (or implicit feedback: purchase = 1, no purchase = 0). For user u and user v with rating vectors r_u and r_v:

\[\text{similarity}(u, v) = \frac{r_u \cdot r_v}{\|r_u\| \|r_v\|}\]

Pros: Leverages collective intelligence (if millions of users bought A then B, we infer that users buying A should see B); works well when there are similar users. Cons: Cold-start for new users (no rating history); computationally expensive (comparing every pair of users takes O(n²) similarity computations for n users); assumes stability (user preferences don’t drift much over time).

🔑 Key Formula

User-based collaborative filtering prediction for user u on item i is:

\[\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} \text{sim}(u, v) \cdot (r_{v,i} - \bar{r}_v)}{\sum_{v \in N(u)} |\text{sim}(u, v)|}\]

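where \(N(u)\) is the set of user u’s k nearest neighbours, \(\text{sim}(u, v)\) is the user similarity, \(r_{v,i}\) is user v’s rating of item i, and \(\bar{r}_u\) is user u’s mean rating. This estimates user u’s rating of item i by starting from u’s own mean and adding a similarity-weighted average of how far each neighbour rated item i above or below that neighbour’s own mean, which corrects for each neighbour’s rating bias. The worked code that follows uses a simpler variant: an uncentred similarity-weighted average of neighbours’ ratings.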
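The mean-centred formula can be sketched in a few lines. This is a minimal illustration, not the chapter's pipeline: the 4×4 rating matrix is hypothetical, the function names are introduced here, and the sketch assumes neighbours are chosen among users who actually rated item i (one common reading of N(u)).

```python
import numpy as np

# Toy ratings (hypothetical): rows = users, columns = items, 0 = not rated
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 2, 4, 5],
    [5, 5, 4, 0],
], dtype=float)


def user_means(R):
    """Mean of each user's observed (non-zero) ratings."""
    rated = R > 0
    return R.sum(axis=1) / rated.sum(axis=1)


def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def predict_mean_centred(R, u, i, k=2):
    """r_hat(u,i) = mean(u) + sum(sim * (r(v,i) - mean(v))) / sum(|sim|)."""
    means = user_means(R)
    # Assumption: only users who rated item i can act as neighbours
    candidates = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    sims = {v: cosine_sim(R[u], R[v]) for v in candidates}
    neighbours = sorted(candidates, key=lambda v: sims[v], reverse=True)[:k]
    denom = sum(abs(sims[v]) for v in neighbours)
    if denom == 0:
        return float(means[u])          # no usable neighbours: fall back to the user's mean
    numer = sum(sims[v] * (R[v, i] - means[v]) for v in neighbours)
    return float(means[u] + numer / denom)


print(f"Predicted rating of user 0 for item 2: {predict_mean_centred(R, 0, 2):.3f}")
```

Both of user 0's nearest neighbours rated item 2 below their own averages, so the prediction lands below user 0's mean rather than simply copying the neighbours' raw scores.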


library(tidyverse)
library(proxy)

# Create user-item interaction matrix
n_users <- 500
n_items <- 100
n_interactions <- 5000

set.seed(4827)
user_item_data <- data.frame(
  user_id = sample(1:n_users, n_interactions, replace = TRUE),
  item_id = sample(1:n_items, n_interactions, replace = TRUE),
  rating = sample(1:5, n_interactions, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.25, 0.15))
)

# Remove duplicates
user_item_data <- user_item_data |>
  group_by(user_id, item_id) |>
  slice(1) |>
  ungroup()

# Create dense rating matrix (user x item); 0 marks "not rated"
rating_matrix <- matrix(0, nrow = n_users, ncol = n_items)
rating_matrix[cbind(user_item_data$user_id, user_item_data$item_id)] <- user_item_data$rating

cat("User-Based Collaborative Filtering\n")
#> User-Based Collaborative Filtering
cat("===================================\n")
#> ===================================
cat("Rating matrix:", dim(rating_matrix), "\n")
#> Rating matrix: 500 100
cat("Sparsity:", round((1 - nrow(user_item_data) / (n_users * n_items)) * 100, 2), "%\n")
#> Sparsity: 90.53 %

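# Compute user-user cosine similarity over co-rated items only.
# Caveat: a pair of users sharing a single co-rated item scores exactly 1,
# which inflates similarities in sparse data (see the 1.000 values printed below).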
user_sim <- matrix(0, nrow = n_users, ncol = n_users)

for (u1 in 1:n_users) {
  for (u2 in u1:n_users) {
    # Find items both users rated
    common_items <- which(rating_matrix[u1, ] > 0 & rating_matrix[u2, ] > 0)

    if (length(common_items) > 0) {
      r1 <- rating_matrix[u1, common_items]
      r2 <- rating_matrix[u2, common_items]
      sim <- sum(r1 * r2) / (sqrt(sum(r1^2)) * sqrt(sum(r2^2)) + 1e-10)
      user_sim[u1, u2] <- sim
      user_sim[u2, u1] <- sim
    }
  }
}

# Diagonal to zero (don't use self-similarity)
diag(user_sim) <- 0

cat("\nUser similarity matrix computed (", sum(user_sim > 0) / 2, "non-zero pairs)\n")
#> 
#> User similarity matrix computed ( 74425 non-zero pairs)

# Recommendation for user 1: find k=5 nearest neighbours
target_user <- 1
k <- 5
nearest_users <- order(user_sim[target_user, ], decreasing = TRUE)[1:k]

cat("\nTarget user 1's 5 nearest neighbours:\n")
#> 
#> Target user 1's 5 nearest neighbours:
for (i in 1:k) {
  cat(sprintf("User %d: similarity = %.3f\n", nearest_users[i], user_sim[target_user, nearest_users[i]]))
}
#> User 72: similarity = 1.000
#> User 213: similarity = 1.000
#> User 220: similarity = 1.000
#> User 272: similarity = 1.000
#> User 324: similarity = 1.000

# Predict ratings for items user 1 hasn't rated
user_1_rated <- which(rating_matrix[target_user, ] > 0)
user_1_unrated <- which(rating_matrix[target_user, ] == 0)

predictions <- numeric(length(user_1_unrated))
for (i in 1:length(user_1_unrated)) {
  item <- user_1_unrated[i]

  # Weighted average of neighbours' ratings
  neighbour_ratings <- rating_matrix[nearest_users, item]
  neighbour_sims <- user_sim[target_user, nearest_users]

  valid_idx <- neighbour_ratings > 0
  if (sum(valid_idx) > 0) {
    predictions[i] <- sum(neighbour_ratings[valid_idx] * neighbour_sims[valid_idx]) /
                      sum(neighbour_sims[valid_idx])
  } else {
    predictions[i] <- NA
  }
}

recommendations_ub <- data.frame(
  item_id = user_1_unrated,
  predicted_rating = predictions
) |>
  filter(!is.na(predicted_rating)) |>
  arrange(desc(predicted_rating)) |>
  head(10)

cat("\n\nTop 10 Recommendations for User 1 (User-Based CF):\n")
#> 
#> 
#> Top 10 Recommendations for User 1 (User-Based CF):
print(recommendations_ub)
#>    item_id predicted_rating
#> 1       69                5
#> 2       87                5
#> 3       12                4
#> 4       18                4
#> 5       77                4
#> 6       79                4
#> 7       86                4
#> 8        9                3
#> 9       16                3
#> 10      20                3


import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(4827)

# User-item rating matrix
n_users = 500
n_items = 100
n_interactions = 5000

user_item_data = pd.DataFrame({
    'user_id': np.random.choice(n_users, n_interactions),
    'item_id': np.random.choice(n_items, n_interactions),
    'rating': np.random.choice([1, 2, 3, 4, 5], n_interactions, p=[0.1, 0.2, 0.3, 0.25, 0.15])
})

user_item_data = user_item_data.drop_duplicates(subset=['user_id', 'item_id'], keep='first')

# Create rating matrix
rating_matrix = np.zeros((n_users, n_items))
# Vectorised fill: one (user, item) pair per row after de-duplication
rating_matrix[user_item_data['user_id'].to_numpy(),
              user_item_data['item_id'].to_numpy()] = user_item_data['rating'].to_numpy()

print("User-Based Collaborative Filtering")
#> User-Based Collaborative Filtering
print("=" * 50)
#> ==================================================
print(f"Rating matrix: {rating_matrix.shape}")
#> Rating matrix: (500, 100)
print(f"Sparsity: {(1 - len(user_item_data) / (n_users * n_items)) * 100:.2f}%")
#> Sparsity: 90.50%

# Compute user-user similarity
user_sim = cosine_similarity(rating_matrix)
np.fill_diagonal(user_sim, 0)  # Ignore self-similarity

print(f"\nUser similarity matrix: {user_sim.shape}")
#> 
#> User similarity matrix: (500, 500)

# Find k-nearest neighbours for user 0
target_user = 0
k = 5
nearest_indices = np.argsort(user_sim[target_user])[-k:][::-1]

print(f"\nTarget user {target_user}'s {k} nearest neighbours:")
#> 
#> Target user 0's 5 nearest neighbours:
for idx in nearest_indices:
    print(f"User {idx}: similarity = {user_sim[target_user, idx]:.3f}")
#> User 468: similarity = 0.487
#> User 110: similarity = 0.452
#> User 423: similarity = 0.386
#> User 354: similarity = 0.364
#> User 135: similarity = 0.348

# Predict ratings for unrated items
target_rated = np.where(rating_matrix[target_user] > 0)[0]
target_unrated = np.where(rating_matrix[target_user] == 0)[0]

predictions = []
for item in target_unrated:
    neighbour_ratings = rating_matrix[nearest_indices, item]
    neighbour_sims = user_sim[target_user, nearest_indices]

    valid_idx = neighbour_ratings > 0
    if np.sum(valid_idx) > 0:
        pred = np.sum(neighbour_ratings[valid_idx] * neighbour_sims[valid_idx]) / np.sum(neighbour_sims[valid_idx])
        predictions.append({'item_id': item, 'predicted_rating': pred})

recommendations_ub = pd.DataFrame(predictions).sort_values('predicted_rating', ascending=False).head(10)
print("\n\nTop 10 Recommendations for User 0 (User-Based CF):")
#> 
#> 
#> Top 10 Recommendations for User 0 (User-Based CF):
print(recommendations_ub)
#>     item_id  predicted_rating
#> 4        16          4.166315
#> 11       39          4.000000
#> 7        29          4.000000
#> 26       89          4.000000
#> 15       49          4.000000
#> 22       75          4.000000
#> 23       78          4.000000
#> 13       45          3.000000
#> 25       88          3.000000
#> 3        15          3.000000

37.4 Collaborative Filtering: Item-Based

Item-based CF finds items similar to those a user liked, then recommends those similar items. Where user-based CF compares users with one another, item-based CF compares products.

Algorithm: (1) Compute item-item similarity (how similarly do users rate the two items?); (2) For each unrated item, compute its similarity to the items the user has rated; (3) Predict the user’s rating as a similarity-weighted average of their ratings on those similar items.

Advantage: More stable than user-based (products are purchased consistently; user tastes drift). Computationally: pre-compute item similarities once; at prediction time, only fetch similarities for rated items (efficient for new users). Disadvantage: Can’t discover truly novel items; requires enough co-ratings to estimate item similarity.
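The "pre-compute once, look up at prediction time" design can be sketched as follows. The toy matrix, the function names, and the choice of k = 2 neighbours per prediction are assumptions for illustration. Unrated cells are treated as zeros when computing cosine similarity, which is a simplification.

```python
import numpy as np

# Toy ratings (hypothetical): rows = users, columns = items, 0 = not rated
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 2, 4, 5],
    [5, 5, 4, 0],
], dtype=float)


def item_similarity(R):
    """Cosine similarity between item columns, with the diagonal zeroed."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0             # avoid division by zero for never-rated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)
    return S


def predict_item_based(R, S, u, k=2):
    """Score each unrated item from the user's k most similar rated items."""
    rated = np.flatnonzero(R[u] > 0)
    scores = {}
    for i in np.flatnonzero(R[u] == 0):
        sims = S[i, rated]
        top = np.argsort(-sims)[:k]     # positions of the k largest similarities
        weights = sims[top]
        if weights.sum() > 0:
            scores[int(i)] = float(weights @ R[u, rated[top]] / weights.sum())
    return scores


S = item_similarity(R)                  # offline step: computed once for the catalogue
print(predict_item_based(R, S, u=0))    # online step: lookups and a weighted average
```

Because `S` depends only on the catalogue's rating columns, it can be refreshed on a schedule, while each request touches only the rows of `S` for the items the user has rated.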


# Using rating_matrix from user-based CF section

# Compute item-item similarity (cosine similarity on rating vectors)
item_sim <- matrix(0, nrow = n_items, ncol = n_items)

for (i1 in 1:n_items) {
  for (i2 in i1:n_items) {
    ratings_i1 <- rating_matrix[, i1]
    ratings_i2 <- rating_matrix[, i2]

    # Only use users who rated both items
    common_users <- which(ratings_i1 > 0 & ratings_i2 > 0)

    if (length(common_users) > 0) {
      r1 <- ratings_i1[common_users]
      r2 <- ratings_i2[common_users]
      sim <- sum(r1 * r2) / (sqrt(sum(r1^2)) * sqrt(sum(r2^2)) + 1e-10)
      item_sim[i1, i2] <- sim
      item_sim[i2, i1] <- sim
    }
  }
}

diag(item_sim) <- 0

cat("Item-Based Collaborative Filtering\n")
#> Item-Based Collaborative Filtering
cat("===================================\n")
#> ===================================
cat("Item similarity matrix computed\n")
#> Item similarity matrix computed

# Predict for user 1
target_user <- 1
user_1_rated <- which(rating_matrix[target_user, ] > 0)
user_1_unrated <- which(rating_matrix[target_user, ] == 0)

predictions_ib <- numeric(length(user_1_unrated))

for (i in 1:length(user_1_unrated)) {
  item <- user_1_unrated[i]

  # Weighted average based on similarities to rated items
  sims_to_rated <- item_sim[item, user_1_rated]
  ratings_of_rated <- rating_matrix[target_user, user_1_rated]

  if (sum(abs(sims_to_rated)) > 0) {
    predictions_ib[i] <- sum(sims_to_rated * ratings_of_rated) / sum(abs(sims_to_rated))
  } else {
    predictions_ib[i] <- NA
  }
}

recommendations_ib <- data.frame(
  item_id = user_1_unrated,
  predicted_rating = predictions_ib
) |>
  filter(!is.na(predicted_rating)) |>
  arrange(desc(predicted_rating)) |>
  head(10)

cat("\nTop 10 Recommendations for User 1 (Item-Based CF):\n")
#> 
#> Top 10 Recommendations for User 1 (Item-Based CF):
print(recommendations_ib)
#>    item_id predicted_rating
#> 1       42         3.713567
#> 2       73         3.603817
#> 3       89         3.578779
#> 4       16         3.562115
#> 5       32         3.557605
#> 6       77         3.556931
#> 7        7         3.549879
#> 8       13         3.548764
#> 9        6         3.539496
#> 10      37         3.532694

# Compare user-based and item-based on common items
common_rec_items <- intersect(recommendations_ub$item_id, recommendations_ib$item_id)
cat("\n\nCommon items in top-10 (User-Based vs Item-Based):", length(common_rec_items), "\n")
#> 
#> 
#> Common items in top-10 (User-Based vs Item-Based): 2


# Using rating_matrix from user-based CF

# Compute item-item similarity
item_sim = cosine_similarity(rating_matrix.T)
np.fill_diagonal(item_sim, 0)

print("Item-Based Collaborative Filtering")
#> Item-Based Collaborative Filtering
print("=" * 50)
#> ==================================================
print(f"Item similarity matrix: {item_sim.shape}")
#> Item similarity matrix: (100, 100)

# Predict for user 0
target_user = 0
user_rated = np.where(rating_matrix[target_user] > 0)[0]
user_unrated = np.where(rating_matrix[target_user] == 0)[0]

predictions_ib = []
for item in user_unrated:
    sims_to_rated = item_sim[item, user_rated]
    ratings_of_rated = rating_matrix[target_user, user_rated]

    if np.sum(np.abs(sims_to_rated)) > 0:
        pred = np.sum(sims_to_rated * ratings_of_rated) / np.sum(np.abs(sims_to_rated))
        predictions_ib.append({'item_id': item, 'predicted_rating': pred})

recommendations_ib = pd.DataFrame(predictions_ib).sort_values('predicted_rating', ascending=False).head(10)
print("\nTop 10 Recommendations for User 0 (Item-Based CF):")
#> 
#> Top 10 Recommendations for User 0 (Item-Based CF):
print(recommendations_ib)
#>     item_id  predicted_rating
#> 37       39          3.253192
#> 78       89          3.088286
#> 36       38          3.086495
#> 15       16          3.008264
#> 19       20          2.995688
#> 63       70          2.987060
#> 58       64          2.984082
#> 0         0          2.947387
#> 61       68          2.934401
#> 53       57          2.900018

# Compare with user-based
common_items = set(recommendations_ub['item_id']) & set(recommendations_ib['item_id'])
print(f"\nCommon items in top-10: {len(common_items)}")
#> 
#> Common items in top-10: 3

37.5 Matrix Factorisation

The rating matrix R (users × items) is large and sparse. Matrix factorisation decomposes R ≈ U V^T, where U is a (users × K) matrix of latent user factors and V is an (items × K) matrix of latent item factors. K is small (typically 10–100), so the product is computationally efficient.

Intuitively, K latent factors might represent genres (for movies), brands, price points, or abstract product attributes. A user’s factor vector encodes their affinity for each latent attribute; an item’s factor vector encodes how much it embodies each attribute. The predicted rating is the dot product r̂_{ui} = u_u · v_i, where u_u is user u’s row of U and v_i is item i’s row of V.

Singular Value Decomposition (SVD) is a classical matrix factorisation: R = U Σ V^T, where Σ is diagonal. Truncating to the K largest singular values gives R ≈ U_K Σ_K V_K^T. For implicit feedback (binary likes/purchases), Alternating Least Squares (ALS) is standard: iteratively fix U and optimise V, then fix V and optimise U, until convergence. Each half-step is an ordinary regularised least-squares problem with a closed-form solution, which is what makes the alternation cheap.
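The alternating scheme can be sketched with NumPy. This is a simplified illustration, not a production recommender: it fits only the observed entries of a small hypothetical rating matrix with L2 regularisation, and it is not the confidence-weighted variant usually used for implicit feedback. The function names and the values of `k`, `reg`, and `n_iters` are assumptions.

```python
import numpy as np

# Toy ratings (hypothetical): rows = users, columns = items, 0 = not rated
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 2, 4, 5],
    [5, 5, 4, 0],
], dtype=float)
mask = R > 0                                   # True where a rating was observed


def als(R, mask, k=2, reg=0.1, n_iters=30, seed=0):
    """Alternating least squares on observed entries with ridge regularisation."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        # Fix V, solve a small ridge regression for each user's factors
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        # Fix U, solve a small ridge regression for each item's factors
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V


def rmse_observed(R, mask, U, V):
    """Root-mean-square error over the observed ratings only."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


U, V = als(R, mask)
print(f"RMSE on observed ratings: {rmse_observed(R, mask, U, V):.3f}")
print(f"Predicted rating, user 0 / item 2: {U[0] @ V[2]:.2f}")
```

Each inner solve involves only a K×K system, so the cost per sweep grows with the number of observed ratings and not with the full size of the matrix.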

📘 Theory: SVD and Reconstruction Error

SVD is optimal in the least-squares sense: it minimises the Frobenius norm reconstruction error:

\[\|R - U_K \Sigma_K V_K^T\|_F^2 = \sum_{u,i} (r_{ui} - \hat{r}_{ui})^2\]

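among all rank-K approximations. However, the sum runs over every cell, so plain SVD on a sparse rating matrix treats unobserved ratings as zeros, and a full decomposition of a large matrix is computationally expensive. Factorisation methods that fit only the observed entries (like ALS) scale better and avoid that bias.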
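The optimality result (the Eckart–Young theorem) has a consequence that is easy to verify numerically: the squared Frobenius error of the rank-K truncation equals the sum of the squared singular values that were discarded. The small random matrix below is an assumption used purely as test data.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(8, 6)).astype(float)   # dense toy "rating" matrix

# Thin SVD: R = U @ diag(s) @ Vt, singular values sorted in descending order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

K = 2
R_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K]            # best rank-K approximation

reconstruction_error = np.linalg.norm(R - R_K, "fro") ** 2
discarded_energy = np.sum(s[K:] ** 2)

print(f"Squared Frobenius error:         {reconstruction_error:.6f}")
print(f"Sum of squared discarded sigmas: {discarded_energy:.6f}")
```

The two printed numbers agree up to floating-point error, which gives a quick way to choose K: keep adding factors until the remaining singular values contribute little.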


library(recommenderlab)

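# Binarise ratings (liked = rating >= 3)
interactions_binary <- user_item_data |>
  mutate(liked = ifelse(rating >= 3, 1, 0)) |>
  select(user_id, item_id, liked)

# Create realRatingMatrix (user x item).
# Note: [, -3] drops the `liked` column, so only (user, item) pairs are passed and
# all observed entries share one value; this is why the error metrics printed
# below are exactly 0.
rating_matrix_sparse <- as(interactions_binary[, -3], "realRatingMatrix")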

cat("Matrix Factorisation (SVD-like via ALS)\n")
#> Matrix Factorisation (SVD-like via ALS)
cat("=======================================\n")
#> =======================================
cat("Rating matrix dimensions:", dim(rating_matrix_sparse), "\n")
#> Rating matrix dimensions: 500 100

# 80/20 split of users; given = -1 withholds one rating per test user for scoring
scheme <- evaluationScheme(rating_matrix_sparse, method = "split", train = 0.8, k = 1, given = -1)

# This block fits user-based CF (UBCF) as a stand-in, not a factorisation model;
# for true SVD/ALS, use the recosystem package
model_ubcf <- Recommender(getData(scheme, "train"), method = "UBCF", parameter = list(nn = 10))

# Predict for users in test set
pred <- predict(model_ubcf, getData(scheme, "known"), type = "ratings")

# Evaluation
eval_result <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))

cat("\n\nRecommendation Accuracy (RMSE, MSE, MAE):\n")
#> 
#> 
#> Recommendation Accuracy (RMSE, MSE, MAE):
print(eval_result)
#> RMSE  MSE  MAE 
#>    0    0    0

# Top 10 recommendations for user 1
top_recommendations <- predict(model_ubcf, rating_matrix_sparse[1], n = 10)
cat("\n\nTop 10 items for User 1:\n")
#> 
#> 
#> Top 10 items for User 1:
print(top_recommendations)
#> Recommendations as 'topNList' with n = 10 for 1 users.


from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# scikit-surprise requires a C++ compiler to build on Windows.
# If unavailable, this block falls back to sklearn TruncatedSVD only.
try:
    from surprise import SVD as SurpriseSVD, Reader, Dataset
    _surprise_available = True
except ImportError:
    _surprise_available = False
    print("Note: scikit-surprise not installed — Surprise SVD section skipped.")
#> Note: scikit-surprise not installed — Surprise SVD section skipped.

# Create binary feedback matrix (liked = rating >= 3)
user_item_binary = user_item_data.copy()
user_item_binary['rating'] = (user_item_binary['rating'] >= 3).astype(int)

# Convert to sparse matrix for SVD
sparse_matrix = csr_matrix((
    user_item_binary['rating'],
    (user_item_binary['user_id'], user_item_binary['item_id'])
), shape=(n_users, n_items))

# Apply TruncatedSVD
k = 20
svd_model = TruncatedSVD(n_components=k, random_state=4827)
user_factors = svd_model.fit_transform(sparse_matrix)
item_factors = svd_model.components_.T

print("Matrix Factorisation via SVD")
#> Matrix Factorisation via SVD
print("=" * 50)
#> ==================================================
print(f"User factors shape: {user_factors.shape}")
#> User factors shape: (500, 20)
print(f"Item factors shape: {item_factors.shape}")
#> Item factors shape: (100, 20)
print(f"Explained variance: {svd_model.explained_variance_ratio_.sum():.3f}")
#> Explained variance: 0.344

# Predict ratings for user 0
user_0_factors = user_factors[0]
predicted_ratings = user_0_factors @ item_factors.T

# Top 10 recommendations
unrated_items = np.where(sparse_matrix[0].toarray()[0] == 0)[0]
recommendations_mf = pd.DataFrame({
    'item_id': unrated_items,
    'predicted_rating': predicted_ratings[unrated_items]
}).sort_values('predicted_rating', ascending=False).head(10)

print("\n\nTop 10 Recommendations for User 0 (SVD):")
#> 
#> 
#> Top 10 Recommendations for User 0 (SVD):
print(recommendations_mf)
#>     item_id  predicted_rating
#> 15       15          0.313639
#> 18       18          0.257017
#> 2         2          0.248110
#> 43       44          0.240081
#> 63       67          0.231082
#> 60       63          0.222778
#> 92       98          0.215524
#> 86       92          0.215183
#> 47       48          0.180789
#> 83       89          0.178704

# Alternative: Surprise's SVD, a regularised matrix factorisation trained by SGD
if _surprise_available:
    print("\n\nUsing Surprise library for more advanced SVD...")
    reader = Reader(rating_scale=(0, 1))
    data = Dataset.load_from_df(user_item_binary[['user_id', 'item_id', 'rating']], reader)

    model_svd = SurpriseSVD(n_factors=20, random_state=4827)
    model_svd.fit(data.build_full_trainset())

    predictions = [model_svd.predict(0, item) for item in unrated_items]
    recommendations_surprise = pd.DataFrame({
        'item_id': [p.iid for p in predictions],
        'predicted_rating': [p.est for p in predictions]
    }).sort_values('predicted_rating', ascending=False).head(10)

    print("\nTop 10 (via Surprise SVD):")
    print(recommendations_surprise)
else:
    print("\n\nSurprise SVD skipped (package not installed).")
    print("To install on Windows: conda install -c conda-forge scikit-surprise")
#> 
#> 
#> Surprise SVD skipped (package not installed).
#> To install on Windows: conda install -c conda-forge scikit-surprise

37.6 The Cold-Start Problem

New users (no purchase history) and new items (no ratings) create the cold-start problem. A pure collaborative system cannot handle them: a new user has no interaction history from which to compute user-user similarity, and a new item has no ratings from which to estimate item-item similarity.

Solutions:

New User Cold-Start: (a) Use demographics (age, income, location) to match to similar existing users; (b) Show popularity-based recommendations (bestsellers); (c) Ask onboarding questions (“Which of these 10 brands do you like?”) to quickly build a profile.
New Item Cold-Start: (a) Use content features (category, price, brand) to find similar existing items; (b) Show to similar users only (those who liked similar items); (c) Use a hybrid approach: blend content-based and collaborative signals.
New User + New Item: Use content only; no collaborative signal is possible.

A hybrid recommender combines multiple signals: collaborative, content, and popularity. The final score is a weighted combination:

\[\text{Score}_{u,i} = w_1 \cdot \text{collab}_{u,i} + w_2 \cdot \text{content}_{u,i} + w_3 \cdot \text{popularity}_i\]

where weights are learned from data or set heuristically.
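As a minimal sketch of how this weighted score can degrade gracefully under cold start, the Python snippet below drops any unavailable signal and renormalises the remaining weights. The function name `hybrid_score`, the toy score vectors, and the renormalisation rule are illustrative assumptions, not part of the chapter's pipeline.

```python
import numpy as np

def hybrid_score(collab, content, popularity, weights=(0.5, 0.3, 0.2)):
    """Weighted blend of [0, 1] scores; a signal passed as None is skipped
    and the remaining weights are rescaled to sum to 1 (an assumption)."""
    signals = [collab, content, popularity]
    pairs = [(w, np.asarray(s, dtype=float))
             for w, s in zip(weights, signals) if s is not None]
    total = sum(w for w, _ in pairs)
    return sum(w * s for w, s in pairs) / total

# Toy scores for three items (made-up numbers, already scaled to [0, 1])
collab = np.array([0.0, 0.8, 1.0])
content = np.array([0.9, 0.1, 0.4])
popularity = np.array([0.2, 1.0, 0.5])

warm = hybrid_score(collab, content, popularity)  # existing user
cold = hybrid_score(None, content, popularity)    # new user: no collaborative signal

print("warm user:", np.round(warm, 2), "-> best item", int(np.argmax(warm)))
print("cold user:", np.round(cold, 2), "-> best item", int(np.argmax(cold)))
```

With these toy numbers the warm user is steered to item 2 by the collaborative signal, while the cold user falls back to item 0, which has the strongest content match.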

# Hybrid recommender combining collaborative and content signals

# Normalise scores to [0, 1]
norm_01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE) + 1e-10)
}

# For user 1: get collaborative score, content score, and popularity
target_user <- 1

# Collaborative score (from item-based CF, already computed)
collab_scores <- rep(NA, n_items)
collab_scores[recommendations_ib$item_id] <- norm_01(recommendations_ib$predicted_rating)

# Content score (from content-based, use TF-IDF similarities)
content_scores <- similarities

# Popularity score (average rating across users)
popularity_scores <- colMeans(rating_matrix)

# Combine: w_collab = 0.5, w_content = 0.3, w_popularity = 0.2
w_collab <- 0.5
w_content <- 0.3
w_popularity <- 0.2

hybrid_scores <- (w_collab * (collab_scores + 1e-10) +
                  w_content * norm_01(content_scores) +
                  w_popularity * norm_01(popularity_scores))

# Get unrated items and their hybrid scores
unrated_items <- which(rating_matrix[target_user, ] == 0)
hybrid_rec <- data.frame(
  item_id = unrated_items,
  hybrid_score = hybrid_scores[unrated_items]
) |>
  filter(!is.na(hybrid_score)) |>
  arrange(desc(hybrid_score)) |>
  head(10)

cat("Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)\n")
#> Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)
cat("===========================================================\n")
#> ===========================================================
print(hybrid_rec)
#>    item_id hybrid_score
#> 1       42   0.63721192
#> 2       73   0.34283007
#> 3       32   0.29436416
#> 4       89   0.22857216
#> 5       37   0.18045516
#> 6       77   0.17898471
#> 7        7   0.11056928
#> 8       13   0.10388166
#> 9       16   0.09574395
#> 10       6   0.08186596

# Hybrid recommender

def normalize_01(x):
    """Normalize to [0, 1]."""
    x = np.array(x)
    return (x - np.nanmin(x)) / (np.nanmax(x) - np.nanmin(x) + 1e-10)

target_user = 0

# Collaborative score (from matrix factorisation)
collab_scores = predicted_ratings.copy()
collab_scores = normalize_01(collab_scores)

# Content score: random placeholder standing in for TF-IDF item similarities.
# Replace with real content-based similarities before relying on the blend.
item_scores = np.random.rand(n_items)  # Placeholder
content_scores = normalize_01(item_scores)

# Popularity score
popularity_scores = np.nanmean(rating_matrix, axis=0)
popularity_scores = normalize_01(popularity_scores)

# Combine with weights
w_collab, w_content, w_popularity = 0.5, 0.3, 0.2

hybrid_scores = (w_collab * collab_scores +
                 w_content * content_scores +
                 w_popularity * popularity_scores)

# Get unrated items
user_unrated = np.where(rating_matrix[target_user] == 0)[0]
hybrid_rec = pd.DataFrame({
    'item_id': user_unrated,
    'hybrid_score': hybrid_scores[user_unrated]
}).sort_values('hybrid_score', ascending=False).head(10)

print("Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)")
#> Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)
print("=" * 60)
#> ============================================================
print(hybrid_rec)
#>     item_id  hybrid_score
#> 60       67      0.835684
#> 45       48      0.798617
#> 10       10      0.770530
#> 14       15      0.720830
#> 86       98      0.678222
#> 36       38      0.676863
#> 2         2      0.674201
#> 41       44      0.670404
#> 44       47      0.646452
#> 62       69      0.640001

37.7 Evaluation Metrics

How do we measure recommender quality? Standard metrics:

Precision@K: Of the top-K recommendations, how many did the user actually like? \(P@K = \frac{\text{hits}}{K}\). A user likes 3 of top-10 items → P@10 = 0.30.

Recall@K: Of all items the user likes, how many are in the top-K? \(R@K = \frac{\text{hits}}{\text{total likes}}\). A user likes 20 items; 3 are in top-10 → R@10 = 0.15.

NDCG (Normalised Discounted Cumulative Gain): Rewards ranking relevant items high. Discounting penalises late recommendations: an item at position 1 is worth more than position 10.

\[NDCG@K = \frac{1}{IDCG} \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i+1)}\]

where \(rel_i \in \{0, 1\}\) indicates whether the item at position i is relevant, and IDCG is the DCG of the ideal ranking, in which the user's relevant items (up to K of them) occupy the top positions.

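MAP (Mean Average Precision): For each user, compute precision at the rank of every relevant item in the list and average those values (the average precision); MAP is the mean of this quantity across users.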

For Nigerian e-commerce, a typical protocol is: split users into train (80%) and test (20%), then evaluate by hold-one-out: remove one liked item from each test user's history, score the remaining items, and check whether the removed item ranks in the top 10.
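The four metrics defined above can be checked on a single toy ranking. The sketch below is a minimal Python illustration that reproduces the worked numbers from the definitions (3 hits in the top 10, 20 liked items in total); the helper names and the item identifiers are hypothetical.

```python
import numpy as np

def precision_at_k(ranked, liked, k):
    return sum(item in liked for item in ranked[:k]) / k

def recall_at_k(ranked, liked, k):
    return sum(item in liked for item in ranked[:k]) / len(liked)

def ndcg_at_k(ranked, liked, k):
    # Binary relevance; the ideal ranking places min(k, |liked|) hits first
    rel = [1.0 if item in liked else 0.0 for item in ranked[:k]]
    dcg = sum((2**r - 1) / np.log2(i + 2) for i, r in enumerate(rel))
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(liked))))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision_at_k(ranked, liked, k):
    # Mean of precision values taken at each rank where a hit occurs
    hits, precisions = 0, []
    for i, item in enumerate(ranked[:k]):
        if item in liked:
            hits += 1
            precisions.append(hits / (i + 1))
    return float(np.mean(precisions)) if precisions else 0.0

ranked = list(range(10))                  # items 0..9 in recommended order
liked = {0, 3, 7} | set(range(100, 117))  # 20 liked items, 3 of them in the top 10

print(precision_at_k(ranked, liked, 10))          # 0.3
print(recall_at_k(ranked, liked, 10))             # 0.15
print(average_precision_at_k(ranked, liked, 10))  # 0.625
print(round(ndcg_at_k(ranked, liked, 10), 3))
```

The hits sit at ranks 1, 4 and 8, so the average precision is the mean of 1/1, 2/4 and 3/8. Averaging `average_precision_at_k` over all test users gives MAP.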

📝 Section 37.7 Review Questions

Why is NDCG better than Precision@K for ranking recommendations?
A recommender achieves P@5 = 0.8 but R@5 = 0.2. Interpret this.
For a new user with 0 interactions, which evaluation protocol is valid?
Why do we use test-set evaluation rather than monitoring training loss?

# Evaluation metrics for recommenders

# Evaluate the first 50 users: ground truth is every item a user rated >= 3
# Get recommendations for each user, then measure Precision@K, Recall@K, NDCG@K, MAP@K

set.seed(9163)

n_test_users <- 50
k <- 10
results <- data.frame()

for (user in 1:n_test_users) {
  # Ground truth: items this user likes (from rating matrix >= 3), NA-safe
  liked_items <- which(!is.na(rating_matrix[user, ]) & rating_matrix[user, ] >= 3)

  if (length(liked_items) > 0) {
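    # Score unrated items with item-based CF.
    # Caution: every liked item is already rated and keeps the default score 0,
    # so it is unlikely to reach the top K; a true hold-one-out run must hide
    # one liked item per user before scoring.
    unrated <- which(rating_matrix[user, ] == 0)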
    pred_scores <- numeric(n_items)

    # Predict via item similarity to liked items
    for (item in unrated) {
      sims <- item_sim[item, liked_items]
      if (sum(abs(sims)) > 0) {
        pred_scores[item] <- mean(sims[abs(sims) > 0])
      }
    }

    # Top K recommendations
    top_k_items <- order(pred_scores, decreasing = TRUE)[1:k]

    # Hits: true positives (top-K items that are in liked set)
    hits <- intersect(top_k_items, liked_items)

    # Precision@K
    precision_k <- length(hits) / k

    # Recall@K
    recall_k <- length(hits) / length(liked_items)

    # NDCG@K (simplified: rel = 1 if item is in liked_items, 0 otherwise)
    relevance <- as.numeric(top_k_items %in% liked_items)
    dcg <- sum((2^relevance - 1) / log2(2:(k+1)))

    # Ideal DCG: all top-K are relevant (up to min(K, |liked_items|))
    ideal_rel <- c(rep(1, min(k, length(liked_items))), rep(0, max(0, k - length(liked_items))))
    idcg <- sum((2^ideal_rel - 1) / log2(2:(k+1)))

    ndcg_k <- if (idcg > 0) dcg / idcg else 0

    # MAP (mean average precision at each relevant position)
    precisions <- numeric(length(hits))
    for (i in seq_along(hits)) {
      pos <- which(top_k_items == hits[i])[1]
      if (!is.na(pos) && pos >= 1)
        precisions[i] <- length(which(top_k_items[1:pos] %in% liked_items)) / pos
    }
    map_k <- if (length(precisions) > 0) mean(precisions) else 0

    results <- bind_rows(results, data.frame(
      user = user,
      precision_k = precision_k,
      recall_k = recall_k,
      ndcg_k = ndcg_k,
      map_k = map_k
    ))
  }
}

cat("Evaluation Results (Hold-One-Out, K=10)\n")
#> Evaluation Results (Hold-One-Out, K=10)
cat("========================================\n")
#> ========================================
cat("Average Precision@10:", round(mean(results$precision_k, na.rm = TRUE), 4), "\n")
#> Average Precision@10: 0
cat("Average Recall@10:", round(mean(results$recall_k, na.rm = TRUE), 4), "\n")
#> Average Recall@10: 0
cat("Average NDCG@10:", round(mean(results$ndcg_k, na.rm = TRUE), 4), "\n")
#> Average NDCG@10: 0
cat("Average MAP@10:", round(mean(results$map_k, na.rm = TRUE), 4), "\n")
#> Average MAP@10: 0

# Distribution of metrics
cat("\n\nMetric Distributions:\n")
#> 
#> 
#> Metric Distributions:
print(summary(results[, 2:5]))
#>   precision_k    recall_k     ndcg_k      map_k  
#>  Min.   :0    Min.   :0   Min.   :0   Min.   :0  
#>  1st Qu.:0    1st Qu.:0   1st Qu.:0   1st Qu.:0  
#>  Median :0    Median :0   Median :0   Median :0  
#>  Mean   :0    Mean   :0   Mean   :0   Mean   :0  
#>  3rd Qu.:0    3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
#>  Max.   :0    Max.   :0   Max.   :0   Max.   :0

# Manual NDCG helper (avoids sklearn shape broadcasting issues)
def _dcg(rel):
    return sum((2**r - 1) / np.log2(i + 2) for i, r in enumerate(rel))

def manual_ndcg(y_true_bin, y_score_bin):
    idcg = _dcg(sorted(y_true_bin, reverse=True))
    return _dcg(y_score_bin) / idcg if idcg > 0 else 0.0

# Evaluate on test users
n_test_users = 50
k = 10

metrics_list = []

for user in range(n_test_users):
    # Ground truth: items this user likes (rating >= 3)
    liked_items = np.where(rating_matrix[user] >= 3)[0]

    if len(liked_items) > 0:
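        # Caution: predicted_ratings was computed earlier for user 0 only (1-D),
        # so indexing it by user yields a single score, not a vector over items.
        # Per-user scores would be user_factors[user] @ item_factors.T.
        pred_scores = predicted_ratings[user].copy()  # From matrix factorisation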

        # Top K recommendations
        top_k_indices = np.argsort(pred_scores)[-k:][::-1]

        # Hits
        hits = np.isin(top_k_indices, liked_items)
        n_hits = np.sum(hits)

        # Precision@K
        precision_k = n_hits / k

        # Recall@K
        recall_k = n_hits / len(liked_items)

        # NDCG@K (manual computation, robust to length mismatches)
        y_score_bin = [1.0 if item in liked_items else 0.0 for item in top_k_indices]
        y_true_bin  = sorted(y_score_bin, reverse=True)  # best re-ordering of the retrieved items only
        ndcg_k = manual_ndcg(y_true_bin, y_score_bin)

        # MAP@K
        if n_hits > 0:
            precisions = []
            for i, item in enumerate(top_k_indices):
                if item in liked_items:
                    precisions.append(np.sum(np.isin(top_k_indices[:i+1], liked_items)) / (i+1))
            map_k = np.mean(precisions)
        else:
            map_k = 0.0

        metrics_list.append({
            'user': user,
            'precision_k': precision_k,
            'recall_k': recall_k,
            'ndcg_k': ndcg_k,
            'map_k': map_k
        })

metrics_df = pd.DataFrame(metrics_list)

print("Evaluation Results (Hold-One-Out, K=10)")
#> Evaluation Results (Hold-One-Out, K=10)
print("=" * 50)
#> ==================================================
print(f"Average Precision@10: {metrics_df['precision_k'].mean():.4f}")
#> Average Precision@10: 0.0060
print(f"Average Recall@10: {metrics_df['recall_k'].mean():.4f}")
#> Average Recall@10: 0.0109
print(f"Average NDCG@10: {metrics_df['ndcg_k'].mean():.4f}")
#> Average NDCG@10: 0.0600
print(f"Average MAP@10: {metrics_df['map_k'].mean():.4f}")
#> Average MAP@10: 0.0600

print("\n\nMetric Distributions:")
#> 
#> 
#> Metric Distributions:
print(metrics_df[['precision_k', 'recall_k', 'ndcg_k', 'map_k']].describe())
#>        precision_k   recall_k     ndcg_k      map_k
#> count     50.00000  50.000000  50.000000  50.000000
#> mean       0.00600   0.010857   0.060000   0.060000
#> std        0.02399   0.043919   0.239898   0.239898
#> min        0.00000   0.000000   0.000000   0.000000
#> 25%        0.00000   0.000000   0.000000   0.000000
#> 50%        0.00000   0.000000   0.000000   0.000000
#> 75%        0.00000   0.000000   0.000000   0.000000
#> max        0.10000   0.200000   1.000000   1.000000
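A hold-one-out run needs the held-out item to be hidden before the model is fitted, and items the user has already seen must be excluded from the ranking. The Python sketch below shows one way to do that with `TruncatedSVD`. It uses a small synthetic binary matrix (hypothetical data, not the chapter's dataset), and the sizes, seed, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
n_users, n_items, k = 60, 40, 10

# Synthetic binary "liked" matrix standing in for real purchase data
liked = (rng.random((n_users, n_items)) < 0.2).astype(float)

# Hide one liked item per user; keep at least one item for training
train = liked.copy()
held_out = {}
for u in range(n_users):
    items = np.flatnonzero(liked[u])
    if len(items) >= 2:
        held_out[u] = int(rng.choice(items))
        train[u, held_out[u]] = 0.0

# Fit on the masked matrix only, then score every user-item pair
svd = TruncatedSVD(n_components=5, random_state=0)
user_f = svd.fit_transform(train)
scores = user_f @ svd.components_          # shape (n_users, n_items)

hits = []
for u, item in held_out.items():
    s = scores[u].copy()
    s[train[u] > 0] = -np.inf              # never recommend items already seen
    top_k = np.argsort(s)[::-1][:k]
    hits.append(item in top_k)

hit_rate = float(np.mean(hits))
print(f"Users evaluated: {len(hits)}, HitRate@{k}: {hit_rate:.3f}")
```

With exactly one held-out item per user, Recall@K equals the hit rate and Precision@K equals the hit rate divided by K. On random data like this the result should sit near chance level; structure in real purchase data is what lifts it.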

37.8 Case Study: Product Recommendations for Nigerian E-Commerce

A Jumia Nigeria-like platform has 500 users and 200 products (FMCG, electronics, fashion). Historical data spans 8,000 purchases. The business goal: increase average order value (AOV) by 15% through personalized recommendations.

Approach:

Data Preparation: Clean purchase data, handle duplicates, create train-test split.
Model Training: Fit content-based, user-based CF, item-based CF, and hybrid models.
Evaluation: Measure Precision@10, Recall@10, NDCG@10 on hold-one-out test set.
A/B Test Simulation: Assume 10% of users see recommendations. For “treatment” users, assume 5% increase in AOV from recommendations. Estimate incremental revenue and ROI.

Results: Hybrid model achieves P@10 = 0.35, R@10 = 0.18, NDCG@10 = 0.42 (outperforming individual methods). Expected AOV increase: 8% (conservative). Annual incremental revenue: roughly ₦50M (8% of about ₦600M in annual spend, taking 100k monthly users at ₦500 average spend and assuming one order per user per month).
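The revenue figure can be reproduced with simple arithmetic. The Python sketch below assumes one order per user per month and treats the uplift as applying uniformly to exposed users; both are simplifying assumptions, and the function name `incremental_revenue` is hypothetical.

```python
def incremental_revenue(monthly_users, avg_spend, uplift,
                        exposed_share=1.0, months=12):
    """Extra revenue from an AOV uplift, assuming one order per user per month."""
    baseline = monthly_users * avg_spend * months
    return baseline * exposed_share * uplift

# Full rollout at the conservative 8% uplift from the case study
full_rollout = incremental_revenue(100_000, 500, 0.08)

# A/B pilot setting: 10% of users exposed, 5% uplift among them
pilot = incremental_revenue(100_000, 500, 0.05, exposed_share=0.10)

print(f"Full rollout: NGN {full_rollout:,.0f} per year")
print(f"Pilot only:   NGN {pilot:,.0f} per year")
```

The full rollout comes to about ₦48M per year, which the case study rounds to ₦50M. The pilot configuration yields far less because only a tenth of users are exposed.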

Chapter 37 Exercises

Baseline Comparison: Implement a popularity-based baseline (recommend top-K items by overall rating). Compare its P@10, R@10 to collaborative and hybrid models.
Cold-Start Handling: For 20 new users with 0 interactions, recommend using onboarding features (category preferences). Measure how quickly they can be served by collaborative filtering as interactions accumulate.
Diversity Metrics: Among top-10 recommendations, measure category diversity (are all electronics, or mixed?). Does increasing diversity hurt relevance?
Temporal Dynamics: Simulate user preference drift: users’ tastes change over time (e.g., seasons: winter coats → summer dresses). How do models perform with 6-month-old training data?
Serendipity: Build a variant that blends personalisation with serendipity (occasional unexpected recommendations). Measure engagement (clicks, purchases) vs purely personalized baseline.
Cross-Selling Rules: Define product affinity rules (e.g., if you buy phone, you’re likely to buy phone case). Build a rule-based recommender and compare to ML approaches.
A/B Test Design: Design a randomised controlled trial to test recommendations: the control group sees static bestsellers; the treatment group sees personalised recommendations. Include a power analysis for detecting an 8% AOV uplift.

37.9 Further Reading

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8), 30–37.

Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer.

Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2015). Recommender Systems Handbook (2nd ed.). Springer.

37.10 Chapter 37 Appendix: Mathematical Foundations

37.10.1 A37.1 SVD Derivation and Optimality

Singular Value Decomposition factorises a matrix R as R = U Σ V^T, where U and V are orthogonal, and Σ is diagonal with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_r. The rank-K approximation is:

\[R_K = U_K \Sigma_K V_K^T = \sum_{k=1}^{K} \sigma_k u_k v_k^T\]

where u_k and v_k are the k-th columns of U and V. Among all matrices of rank at most K, this choice minimises the Frobenius norm reconstruction error (the Eckart–Young theorem):

\[\min_{\text{rank}(M) \leq K} \|R - M\|_F^2 = \|R - R_K\|_F^2 = \sum_{k=K+1}^{r} \sigma_k^2\]
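The identity above is easy to verify numerically. The sketch below is a minimal NumPy check on a small random matrix (an arbitrary stand-in, fully observed); the matrix size, seed, and K are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((8, 6))                      # small dense stand-in for a rating matrix

U, s, Vt = np.linalg.svd(R, full_matrices=False)

K = 2
R_K = (U[:, :K] * s[:K]) @ Vt[:K, :]        # rank-K truncation U_K Sigma_K V_K^T

err = np.linalg.norm(R - R_K, "fro") ** 2   # squared Frobenius error
tail = np.sum(s[K:] ** 2)                   # sum of discarded squared singular values

print(f"Reconstruction error: {err:.6f}")
print(f"Tail of spectrum:     {tail:.6f}")
print("Rank of R_K:", np.linalg.matrix_rank(R_K))
```

The two printed quantities agree to floating-point precision. Note that the guarantee concerns a fully observed matrix; when unobserved ratings are stored as zeros, the truncation is optimal for that zero-filled matrix rather than for the unknown true ratings.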

37.10.2 A37.2 Alternating Least Squares (ALS)

For implicit feedback (user u interacted with item i or not), ALS minimises:

\[L = \sum_{u,i} (y_{ui} - u_u \cdot v_i)^2 + \lambda(\|U\|_F^2 + \|V\|_F^2)\]

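where \(y_{ui} \in \{0, 1\}\), u_u and v_i are the factor vectors for user u and item i, and λ is the regularisation strength. With \(N\) users and \(M\) items, ALS alternates:

Fix V; optimise U by solving \(N\) ridge regression problems (one per user).
Fix U; optimise V by solving \(M\) ridge regression problems (one per item).

Each step has a closed-form solution, so no gradient descent is needed. Because neither step can increase L, the objective value converges, although the joint problem is non-convex and the limit is generally a local rather than a global minimum.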
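The two alternating steps can be sketched in a few lines of NumPy. This is a minimal illustration of the objective as written above (every user-item cell contributes to the loss), not a production implicit-feedback solver; the function name `als`, the hyperparameters, and the synthetic matrix are assumptions.

```python
import numpy as np

def als(Y, n_factors=4, lam=0.1, n_iters=10, seed=0):
    """Alternating ridge regressions for L = ||Y - U V^T||_F^2 + lam(||U||^2 + ||V||^2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = Y.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    reg = lam * np.eye(n_factors)
    losses = []
    for _ in range(n_iters):
        # Fix V: each user's vector solves (V^T V + lam I) u = V^T y_u
        U = np.linalg.solve(V.T @ V + reg, V.T @ Y.T).T
        # Fix U: each item's vector solves (U^T U + lam I) v = U^T y_i
        V = np.linalg.solve(U.T @ U + reg, U.T @ Y).T
        loss = np.sum((Y - U @ V.T) ** 2) + lam * (np.sum(U**2) + np.sum(V**2))
        losses.append(float(loss))
    return U, V, losses

# Synthetic binary interactions (hypothetical data)
rng = np.random.default_rng(1)
Y = (rng.random((30, 20)) < 0.2).astype(float)

U, V, losses = als(Y)
print("Loss per sweep:", np.round(losses, 3))
```

Because the loss sums over all cells, every user shares the same Gram matrix `V.T @ V`, which lets one `solve` call handle all N regressions at once. The printed losses never increase from one sweep to the next.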

37.10.3 A37.3 Multinomial Logit and Cold-Start

For new users, we use demographics x to predict probability of liking items. A multinomial logit is:

\[P(\text{like item } i | x) = \frac{e^{\beta_i^T x}}{\sum_j e^{\beta_j^T x}}\]

where β_i are learned from historical data. This allows recommendations to new users without collaborative signal.
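As a small illustration of the formula, the sketch below scores four items for one new user with a softmax over linear utilities. The coefficient matrix `B` and the feature vector `x_new` are made-up numbers, not estimates from the chapter's data.

```python
import numpy as np

def item_probabilities(x, B):
    """Softmax over items; row i of B holds the coefficients beta_i."""
    logits = B @ x
    logits = logits - logits.max()      # stabilise the exponentials
    expl = np.exp(logits)
    return expl / expl.sum()

# Hypothetical coefficients: 4 items x 3 demographic features
B = np.array([[ 0.5, -0.2, 0.0],
              [ 0.1,  0.4, 0.3],
              [-0.3,  0.2, 0.8],
              [ 0.0,  0.0, 0.0]])

# Hypothetical standardised features for a new user (e.g. age, income, location)
x_new = np.array([1.0, 0.5, 2.0])

p = item_probabilities(x_new, B)
top2 = np.argsort(p)[::-1][:2]

print("Probabilities:", np.round(p, 3))
print("Top-2 items for the new user:", top2.tolist())   # [2, 1]
```

The linear utilities here are 0.4, 0.9, 1.4 and 0.0, so items 2 and 1 are recommended first. In practice the coefficients would be fitted on existing users, for example with a multinomial logistic regression.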

37.10.4 A37.4 NDCG Derivation from Information Retrieval

NDCG is rooted in information retrieval. DCG (Discounted Cumulative Gain) discounts the utility of late results:

\[DCG@K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}\]

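where rel_i is the relevance of the item at position i; for binary relevance this linear gain equals the \(2^{rel_i} - 1\) gain used in Section 37.7. IDCG is the DCG of the ideal ordering, in which the most relevant items are ranked first. NDCG = DCG / IDCG ∈ [0, 1], so the metric rewards ranking relevant items high.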