---
title: "Recommendation Engines"
---
```{python}
#| label: python-setup-32-recommendation-engines
#| include: false
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import ndcg_score, precision_score, recall_score
```
::: {.callout-note icon="false"}
## 📋 Learning Objectives
By the end of this chapter, you will understand the core business value of personalised recommendations and how they drive revenue across industries. You will implement three foundational recommendation approaches: content-based filtering (matching items via feature similarity), collaborative filtering (leveraging user-user and item-item similarity), and matrix factorisation (learning latent user and item factors). You will address the cold-start problem (how to recommend to new users and new items that lack historical data) using hybrid approaches that combine content and collaborative signals. You will evaluate recommenders on standard metrics: Precision@K, Recall@K, NDCG, and MAP. You will build reproducible R and Python pipelines end-to-end: data preparation, model training, hyperparameter tuning, evaluation on a hold-out test set, and A/B test simulation. You will work with simulated African e-commerce data modelled on Jumia Nigeria: 500 users, 200 Nigerian products (FMCG, electronics, and fashion), and 8,000 purchase events. You will learn the mathematical foundations: SVD, ALS (Alternating Least Squares), and information retrieval metrics. By the end of the project, you will have built a hybrid recommender system and projected the revenue impact of deploying it.
:::
## Why Recommendations Drive Revenue
The "You may also like" section on e-commerce sites is no accident: it is engineered for revenue. A widely cited McKinsey estimate attributes 35% of Amazon's revenue to recommendations, and Netflix has reported that the large majority of hours streamed on its platform come from recommendations. Spotify's algorithmic playlists create discovery and lock-in. The mechanism is clear: instead of making customers search through tens of thousands of products, personalised recommendations surface items that match their taste, increasing conversion and average basket size.
This is especially powerful in African e-commerce. Jumia Nigeria hosts 15 million product listings across electronics, fashion, FMCG, and home goods. Typical browsing time before a purchase is 8–15 minutes, and most users never look beyond the first 50 items in a category. In pilot studies, showing each user a personalised top-10 (rather than static bestsellers) increased engagement and average order value by 15–30%. Similarly, in banking, a customer with only a savings account is rarely shown investment or credit products; a recommender can surface suitable products (an auto loan, a small-business line of credit) based on income, transaction patterns, and peer behaviour, driving product penetration.
The fundamental challenge is sparsity. A user-product matrix with 500 users and 200,000 products has 100 million cells, but most users buy only 10–20 products, so each user's row is only about 0.005–0.01% filled. Collaborative filtering exploits the structure hidden in this sparse matrix: users with similar purchase histories are likely to have similar preferences, and products bought together by many users are likely to be similar.
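The density arithmetic can be checked directly, and the same calculation shows why recommenders store interactions in sparse formats. A minimal sketch, assuming 15 purchases per user (the midpoint of the 10–20 range) and randomly generated purchases; the storage comparison uses SciPy's CSR format:

```python
import numpy as np
from scipy.sparse import csr_matrix

n_users, n_items = 500, 200_000
purchases_per_user = 15          # assumption: midpoint of the 10-20 range above

cells = n_users * n_items
density = purchases_per_user / n_items
print(f"cells = {cells:,}; density = {density:.4%}")

# Illustrative random purchases: each user buys 15 distinct products
rng = np.random.default_rng(0)
rows = np.repeat(np.arange(n_users), purchases_per_user)
cols = np.concatenate([rng.choice(n_items, purchases_per_user, replace=False)
                       for _ in range(n_users)])
R = csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n_users, n_items))

dense_mb = cells * 8 / 1e6       # float64 dense storage
csr_kb = (R.data.nbytes + R.indices.nbytes + R.indptr.nbytes) / 1e3
print(f"stored entries = {R.nnz:,}; dense = {dense_mb:.0f} MB; CSR = {csr_kb:.0f} KB")
```

A dense float matrix of this size needs hundreds of megabytes, while the sparse version stores only the non-zero purchases plus index arrays.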
## Content-Based Filtering
Content-based filtering recommends items similar to those the user already liked. Each item has features (product category, brand, price tier, attributes). A user's preference profile is built by averaging the features of items they liked. New recommendations are items most similar to this profile.
For e-commerce, items are usually described in text: product name, category, description, tags. We convert text to numeric vectors using TF-IDF (Term Frequency-Inverse Document Frequency): each unique word or phrase becomes a dimension, and the value reflects how important that term is to a document, rising with how often the term appears in that document and falling with how many documents contain it. Products whose TF-IDF vectors have high cosine similarity share distinctive terms and are treated as similar.
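To make the weighting concrete, the sketch below recomputes scikit-learn's default TF-IDF by hand (raw counts, smoothed idf $\ln\frac{1+n}{1+\text{df}} + 1$, L2-normalised rows) on three toy product descriptions and then compares the products with cosine similarity. The descriptions are illustrative, and stop words are kept here, whereas the chapter's code removes them:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "cushioned running shoes for marathon training",
    "football boots with studs for training",
    "instant noodles with seasoning packet",
]

counts = CountVectorizer().fit_transform(docs).toarray().astype(float)  # raw term frequencies
n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)                     # documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1         # scikit-learn's default smoothed idf
tfidf_manual = counts * idf
tfidf_manual /= np.linalg.norm(tfidf_manual, axis=1, keepdims=True)  # L2-normalise each row

tfidf_sklearn = TfidfVectorizer().fit_transform(docs).toarray()
print("matches TfidfVectorizer:", np.allclose(tfidf_manual, tfidf_sklearn))

sim = cosine_similarity(tfidf_sklearn)                  # product-product similarity
print(np.round(sim, 2))
```

The running shoes and football boots share "for" and "training", so they score higher than the boots and noodles (which share only "with"); the shoes and noodles share no terms and score 0.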
**Pros**: No cold-start problem for new items (we have their features); interpretable (we can explain why an item was recommended: "you liked running shoes, here are similar running shoes"). **Cons**: Filter bubble (we recommend similar items, never exposing users to unexpected tastes); requires rich metadata (some products have poor descriptions); can't leverage collective intelligence (if 10,000 users all like product A but hate product B, a content-based system doesn't capture this).
::: {.panel-tabset}
## R
```{r}
#| label: content-based-filtering-r
library(tidyverse)
library(proxy)
library(tm)
# Simulate 100 Nigerian e-commerce products with descriptions
set.seed(6358)
products_content <- data.frame(
product_id = 1:100,
product_name = c(
"Samsung 65-inch Smart TV", "LG Refrigerator", "Itel Phone", "MTN Router",
"Gionee Smartphone", "Canon Camera", "Nike Running Shoes", "Adidas Football Boots",
"Levi's Jeans", "Tom Ford Perfume", "Lipton Tea Bags", "Nestlé Milo",
"Dangote Flour", "Indomie Instant Noodles", "Golden Penny Pasta",
"Coca-Cola 2L", "Heineken Beer", "Guinness Beer", "Dettol Disinfectant",
"Ponds Face Cream",
rep("Generic Product", 80) # Placeholder
),
product_desc = c(
"4K resolution smart television with voice control and WiFi connectivity",
"Frost-free refrigerator with 600L capacity and digital temperature control",
"Budget smartphone with 32GB storage and 4000mAh battery",
"WiFi 5 router for high-speed internet connectivity in home and office",
"Smartphone with 6.5 inch display and 128GB storage capacity",
"Professional DSLR camera with 24MP sensor and 4K video recording",
"Cushioned running shoes for marathon training and daily jogging",
"Football boots with studs for grass field play and training",
"Classic blue jeans with comfortable fit and durable denim material",
"Premium fragrance for men with woody and citrus notes",
"Black tea bags in convenient sachets for quick brewing",
"Chocolate-flavored malt drink powder high in energy",
"Refined wheat flour for baking and cooking",
"Instant fried noodles with seasoning packet ready in 3 minutes",
"Durum wheat pasta for traditional Italian dishes",
"Carbonated cola beverage in 2-litre bottle",
"Lager beer with distinctive bitter taste and 4.5% alcohol",
"Dark stout beer with creamy head and rich flavor",
"Multipurpose disinfectant for hygiene and surface cleaning",
"Daily moisturizing cream for normal skin with SPF protection",
rep("Generic product description", 80)
),
category = c(
rep("Electronics", 5), "Cameras", rep("Fashion", 3),  # items 1–9
"Beauty & Personal Care", rep("Beverages", 2),         # items 10–12 (perfume, tea, Milo)
rep("Groceries & Food", 3), rep("Beverages", 3),       # items 13–18
"Health & Household", "Beauty & Personal Care",        # items 19–20
rep("Other", 80) # items 21–100
),
price_tier = c(
rep("Premium", 2), rep("Budget", 3), "Premium", rep("Mid-range", 3), "Premium",
rep("Budget", 6), rep("Mid-range", 4), # items 1–20
rep("Other", 80) # items 21–100
)
)
# Create user-item interaction matrix (sparse: 500 users, 100 products)
n_users <- 500
n_ratings <- 2000
interactions <- data.frame(
user_id = sample(1:n_users, n_ratings, replace = TRUE),
product_id = sample(1:100, n_ratings, replace = TRUE),
liked = sample(c(0, 1), n_ratings, replace = TRUE, prob = c(0.4, 0.6))
)
# Remove duplicates, keep first occurrence
interactions <- interactions |>
group_by(user_id, product_id) |>
slice(1) |>
ungroup()
cat("Content-Based Filtering\n")
cat("=======================\n")
cat("Products:", nrow(products_content), "\n")
cat("Users:", n_users, "\n")
cat("Interactions:", nrow(interactions), "\n")
cat("Sparsity:", round(1 - nrow(interactions) / (n_users * 100), 4) * 100, "%\n")
# Build TF-IDF matrix from product descriptions
corpus <- VCorpus(VectorSource(products_content$product_desc))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
tfidf_matrix <- as.matrix(dtm)
cat("\nTF-IDF matrix dimensions:", dim(tfidf_matrix), "\n")
# User profile: average TF-IDF vector of liked products
user_1_likes <- (interactions |> filter(user_id == 1, liked == 1))$product_id
if (length(user_1_likes) == 0) user_1_likes <- sample(1:100, 5) # fallback if user has no likes
user_1_profile <- colMeans(tfidf_matrix[user_1_likes, , drop = FALSE])
# Compute similarity between user_1_profile and all products
similarities <- as.vector(tfidf_matrix %*% user_1_profile /
(sqrt(rowSums(tfidf_matrix^2)) * sqrt(sum(user_1_profile^2))))
# Recommendations for user 1
recommendations <- data.frame(
product_id = 1:100,
product_name = products_content$product_name,
similarity = similarities,
already_liked = 1:100 %in% user_1_likes
) |>
filter(!already_liked) |>
arrange(desc(similarity))
cat("\n\nContent-Based Recommendations for User 1:\n")
print(head(recommendations |> select(product_id, product_name, similarity), 10))
```
## Python
```{python}
#| label: content-based-filtering-py
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
np.random.seed(6358)
# Create 100 products
products = pd.DataFrame({
'product_id': range(1, 101),
'product_name': [
'Samsung 65-inch Smart TV', 'LG Refrigerator', 'Itel Phone', 'MTN Router',
'Gionee Smartphone', 'Canon Camera', 'Nike Running Shoes', 'Adidas Football Boots',
"Levi's Jeans", 'Tom Ford Perfume', 'Lipton Tea Bags', 'Nestlé Milo',
'Dangote Flour', 'Indomie Instant Noodles', 'Golden Penny Pasta',
'Coca-Cola 2L', 'Heineken Beer', 'Guinness Beer', 'Dettol Disinfectant',
'Ponds Face Cream'
] + ['Generic Product'] * 80,
'product_desc': [
'4K resolution smart television with voice control and WiFi connectivity',
'Frost-free refrigerator with 600L capacity and digital temperature control',
'Budget smartphone with 32GB storage and 4000mAh battery',
'WiFi 5 router for high-speed internet connectivity',
'Smartphone with 6.5 inch display and 128GB storage',
'Professional DSLR camera with 24MP sensor',
'Cushioned running shoes for marathon training',
'Football boots with studs for grass field',
'Classic blue jeans with comfortable fit',
'Premium fragrance with woody and citrus notes',
'Black tea bags for quick brewing',
'Chocolate-flavored malt drink powder',
'Refined wheat flour for baking',
'Instant fried noodles ready in 3 minutes',
'Durum wheat pasta',
'Carbonated cola beverage',
'Lager beer with distinctive bitter taste',
'Dark stout beer with creamy head',
'Multipurpose disinfectant',
'Daily moisturizing cream with SPF'
] + ['Generic product description'] * 80
})
# Create user-product interactions
n_users = 500
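interactions = pd.DataFrame({
    'user_id': np.random.choice(np.arange(1, n_users + 1), 2000),             # user ids 1..500, as in the R tab
    'product_id': np.random.choice(products['product_id'].to_numpy(), 2000),  # ids 1..100 match `products`
    'liked': np.random.choice([0, 1], 2000, p=[0.4, 0.6])
})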
interactions = interactions.drop_duplicates(subset=['user_id', 'product_id'], keep='first')
print("Content-Based Filtering")
print("=" * 50)
print(f"Products: {len(products)}")
print(f"Users: {n_users}")
print(f"Interactions: {len(interactions)}")
print(f"Sparsity: {(1 - len(interactions) / (n_users * 100)) * 100:.2f}%")
# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=100, stop_words='english')
tfidf_matrix = vectorizer.fit_transform(products['product_desc']).toarray()
print(f"\nTF-IDF matrix shape: {tfidf_matrix.shape}")
# User 1 profile (average TF-IDF of liked products)
user_1_likes = interactions[(interactions['user_id'] == 1) & (interactions['liked'] == 1)]['product_id'].values
user_1_profile = tfidf_matrix[user_1_likes - 1].mean(axis=0) if len(user_1_likes) > 0 else tfidf_matrix[0]
# Compute similarity
similarities = cosine_similarity([user_1_profile], tfidf_matrix)[0]
# Recommendations
recommendations = pd.DataFrame({
'product_id': products['product_id'],
'product_name': products['product_name'],
'similarity': similarities,
'already_liked': products['product_id'].isin(user_1_likes)
})
recommendations = recommendations[~recommendations['already_liked']].sort_values('similarity', ascending=False)
print("\n\nContent-Based Recommendations for User 1:")
print(recommendations[['product_id', 'product_name', 'similarity']].head(10))
```
:::
## Collaborative Filtering: User-Based
User-based collaborative filtering finds users with similar taste and recommends items they liked. The algorithm is simple: (1) compute user-user similarity (how similar are their rating patterns?); (2) for a target user, find the k nearest neighbours (the most similar users); (3) recommend items liked by those neighbours but not yet tried by the target user.
Similarity is measured via cosine similarity on the rating vector (or implicit feedback: purchase = 1, no purchase = 0). For user u and user v with rating vectors r_u and r_v:
$$\text{similarity}(u, v) = \frac{r_u \cdot r_v}{\|r_u\| \|r_v\|}$$
**Pros**: Leverages collective intelligence (if millions of users bought A and then B, we infer that users buying A should see B); works well when there are enough similar users. **Cons**: Cold-start for new users (no rating history); computationally expensive (O(n²) user pairs for n users, each compared across all items); assumes stability (user preferences don't drift much over time).
::: {.callout-tip icon="false"}
## 🔑 Key Formula
User-based collaborative filtering prediction for user u on item i is:
$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} \text{sim}(u, v) \cdot (r_{v,i} - \bar{r}_v)}{\sum_{v \in N(u)} |\text{sim}(u, v)|}$$
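where $N(u)$ is the set of the k users most similar to u, $\text{sim}(u, v)$ is the similarity between users u and v, $r_{v,i}$ is user v's rating of item i, and $\bar{r}_u$ and $\bar{r}_v$ are the mean ratings of users u and v. This estimates user u's rating of item i as u's own mean plus a similarity-weighted average of how far each neighbour's rating of i deviates from that neighbour's mean, which corrects for users who rate systematically high or low.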
:::
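A minimal NumPy sketch of the key formula on a toy 4×4 rating matrix (0 = unrated). Two details are assumptions rather than part of the formula: neighbours are chosen among users who have rated item i, and the prediction falls back to the user's own mean when no neighbour has. The chapter's R and Python code below uses a simpler, non-mean-centred weighted average.

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Mean-centred user-based CF prediction for user u on item i (0 = unrated)."""
    rated = R > 0
    # Each user's mean over the items they actually rated
    means = np.array([R[v, rated[v]].mean() if rated[v].any() else 0.0
                      for v in range(R.shape[0])])
    # Cosine similarity between u and every user on full rating vectors
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[u]) / (norms * norms[u] + 1e-10)
    # Assumption: N(u) = the k most similar users among those who rated item i
    candidates = np.where(rated[:, i] & (np.arange(R.shape[0]) != u))[0]
    if candidates.size == 0:
        return means[u]                           # nobody rated i: fall back to u's mean
    neighbours = candidates[np.argsort(-sims[candidates])[:k]]
    w = sims[neighbours]
    if np.abs(w).sum() == 0:
        return means[u]
    offsets = R[neighbours, i] - means[neighbours]  # deviation from each neighbour's own mean
    return means[u] + np.sum(w * offsets) / np.sum(np.abs(w))

R = np.array([[5, 4, 0, 1],
              [4, 5, 3, 1],
              [5, 4, 4, 2],
              [1, 1, 5, 5]], dtype=float)
pred = predict_user_based(R, u=0, i=2, k=2)
print(round(pred, 3))
```

Here user 0's two nearest neighbours rated item 2 slightly below and slightly above their own means, so the prediction lands close to user 0's mean of about 3.33.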
::: {.panel-tabset}
## R
```{r}
#| label: collaborative-filtering-user-r
library(tidyverse)
library(proxy)
# Create user-item interaction matrix
n_users <- 500
n_items <- 100
n_interactions <- 5000
set.seed(4827)
user_item_data <- data.frame(
user_id = sample(1:n_users, n_interactions, replace = TRUE),
item_id = sample(1:n_items, n_interactions, replace = TRUE),
rating = sample(1:5, n_interactions, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.25, 0.15))
)
# Remove duplicates
user_item_data <- user_item_data |>
group_by(user_id, item_id) |>
slice(1) |>
ungroup()
# Create sparse matrix (user x item)
rating_matrix <- matrix(0, nrow = n_users, ncol = n_items)
for (i in 1:nrow(user_item_data)) {
rating_matrix[user_item_data$user_id[i], user_item_data$item_id[i]] <- user_item_data$rating[i]
}
cat("User-Based Collaborative Filtering\n")
cat("===================================\n")
cat("Rating matrix:", dim(rating_matrix), "\n")
cat("Sparsity:", round((1 - nrow(user_item_data) / (n_users * n_items)) * 100, 2), "%\n")
# Compute user-user cosine similarity on full rating vectors (unrated = 0),
# as in the formula above and sklearn's cosine_similarity in the Python tab.
# (Restricting to co-rated items would give similarity 1 to any pair sharing
# a single item, making the neighbour ranking almost meaningless.)
user_norms <- sqrt(rowSums(rating_matrix^2))
user_sim <- tcrossprod(rating_matrix) / (outer(user_norms, user_norms) + 1e-10)
# Diagonal to zero (don't use self-similarity)
diag(user_sim) <- 0
cat("\nUser similarity matrix computed (", sum(user_sim > 0) / 2, "non-zero pairs)\n")
# Recommendation for user 1: find k=5 nearest neighbours
target_user <- 1
k <- 5
nearest_users <- order(user_sim[target_user, ], decreasing = TRUE)[1:k]
cat("\nTarget user 1's 5 nearest neighbours:\n")
for (i in 1:k) {
cat(sprintf("User %d: similarity = %.3f\n", nearest_users[i], user_sim[target_user, nearest_users[i]]))
}
# Predict ratings for items user 1 hasn't rated
user_1_rated <- which(rating_matrix[target_user, ] > 0)
user_1_unrated <- which(rating_matrix[target_user, ] == 0)
predictions <- numeric(length(user_1_unrated))
for (i in 1:length(user_1_unrated)) {
item <- user_1_unrated[i]
# Weighted average of neighbours' ratings
neighbour_ratings <- rating_matrix[nearest_users, item]
neighbour_sims <- user_sim[target_user, nearest_users]
valid_idx <- neighbour_ratings > 0
if (sum(valid_idx) > 0) {
predictions[i] <- sum(neighbour_ratings[valid_idx] * neighbour_sims[valid_idx]) /
sum(neighbour_sims[valid_idx])
} else {
predictions[i] <- NA
}
}
recommendations_ub <- data.frame(
item_id = user_1_unrated,
predicted_rating = predictions
) |>
filter(!is.na(predicted_rating)) |>
arrange(desc(predicted_rating)) |>
head(10)
cat("\n\nTop 10 Recommendations for User 1 (User-Based CF):\n")
print(recommendations_ub)
```
## Python
```{python}
#| label: collaborative-filtering-user-py
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
np.random.seed(4827)
# User-item rating matrix
n_users = 500
n_items = 100
n_interactions = 5000
user_item_data = pd.DataFrame({
'user_id': np.random.choice(n_users, n_interactions),
'item_id': np.random.choice(n_items, n_interactions),
'rating': np.random.choice([1, 2, 3, 4, 5], n_interactions, p=[0.1, 0.2, 0.3, 0.25, 0.15])
})
user_item_data = user_item_data.drop_duplicates(subset=['user_id', 'item_id'], keep='first')
# Create rating matrix (vectorised fill: ids are already 0-based row/column indices)
rating_matrix = np.zeros((n_users, n_items))
rating_matrix[user_item_data['user_id'].to_numpy(),
              user_item_data['item_id'].to_numpy()] = user_item_data['rating'].to_numpy()
print("User-Based Collaborative Filtering")
print("=" * 50)
print(f"Rating matrix: {rating_matrix.shape}")
print(f"Sparsity: {(1 - len(user_item_data) / (n_users * n_items)) * 100:.2f}%")
# Compute user-user similarity
user_sim = cosine_similarity(rating_matrix)
np.fill_diagonal(user_sim, 0) # Ignore self-similarity
print(f"\nUser similarity matrix: {user_sim.shape}")
# Find k-nearest neighbours for user 0
target_user = 0
k = 5
nearest_indices = np.argsort(user_sim[target_user])[-k:][::-1]
print(f"\nTarget user {target_user}'s {k} nearest neighbours:")
for idx in nearest_indices:
    print(f"User {idx}: similarity = {user_sim[target_user, idx]:.3f}")
# Predict ratings for unrated items
target_rated = np.where(rating_matrix[target_user] > 0)[0]
target_unrated = np.where(rating_matrix[target_user] == 0)[0]
predictions = []
for item in target_unrated:
    neighbour_ratings = rating_matrix[nearest_indices, item]
    neighbour_sims = user_sim[target_user, nearest_indices]
    valid_idx = neighbour_ratings > 0
    if np.sum(valid_idx) > 0:
        # Similarity-weighted average of the neighbours who rated this item
        pred = np.sum(neighbour_ratings[valid_idx] * neighbour_sims[valid_idx]) / np.sum(neighbour_sims[valid_idx])
        predictions.append({'item_id': item, 'predicted_rating': pred})
recommendations_ub = pd.DataFrame(predictions).sort_values('predicted_rating', ascending=False).head(10)
print("\n\nTop 10 Recommendations for User 0 (User-Based CF):")
print(recommendations_ub)
```
:::
## Collaborative Filtering: Item-Based
Item-based CF finds items similar to those a user liked, then recommends those similar items. Where user-based CF compares users, item-based CF compares products.
**Algorithm**: (1) Compute item-item similarity from the item columns of the rating matrix (items rated alike by the same users are similar); (2) For each unrated item, compute its similarity to the items the user has rated; (3) Predict the user's rating as a similarity-weighted average of their ratings of those items.
**Advantage**: More stable than user-based (item relationships change slowly, while user tastes drift). Computationally: pre-compute item similarities once; at prediction time, only fetch the similarities for items the user has rated (cheap even for users with short histories). **Disadvantage**: Can't discover truly novel items; requires enough co-ratings to estimate item similarity.
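The "pre-compute once, look up at prediction time" idea can be sketched as two functions: an offline step that keeps only the k most similar items per item, and an online step that scores a user's unrated items from their ratings of those stored neighbours. The function names and toy matrix are illustrative, not from any library:

```python
import numpy as np

def top_k_item_neighbours(R, k):
    """Offline: cosine similarity between item columns; keep the k most similar items per item."""
    norms = np.linalg.norm(R, axis=0)
    S = (R.T @ R) / (np.outer(norms, norms) + 1e-10)
    np.fill_diagonal(S, 0.0)                       # an item is not its own neighbour
    idx = np.argsort(-S, axis=1)[:, :k]            # neighbour ids, most similar first
    return idx, np.take_along_axis(S, idx, axis=1)

def score_unrated(user_ratings, idx, sims):
    """Online: score each unrated item from the user's ratings of its stored neighbours."""
    scores = np.full(len(user_ratings), np.nan)
    for item in np.where(user_ratings == 0)[0]:
        r = user_ratings[idx[item]]
        mask = r > 0                               # neighbours this user has rated
        w = sims[item][mask]
        if np.abs(w).sum() > 0:
            scores[item] = np.sum(w * r[mask]) / np.sum(np.abs(w))
    return scores

R = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [0, 0, 4, 5]], dtype=float)
idx, sims = top_k_item_neighbours(R, k=2)
scores = score_unrated(np.array([5, 0, 0, 1], dtype=float), idx, sims)
print(scores)
```

The new user loved item 0 and disliked item 3, so item 1 (most similar to item 0) scores high and item 2 (most similar to item 3) scores low. Only the k stored neighbours per item are touched at prediction time, which is what keeps item-based CF cheap online.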
::: {.panel-tabset}
## R
```{r}
#| label: collaborative-filtering-item-r
# Using rating_matrix from user-based CF section
# Compute item-item cosine similarity on full rating columns (unrated = 0),
# matching cosine_similarity(rating_matrix.T) in the Python tab
item_norms <- sqrt(colSums(rating_matrix^2))
item_sim <- crossprod(rating_matrix) / (outer(item_norms, item_norms) + 1e-10)
diag(item_sim) <- 0
cat("Item-Based Collaborative Filtering\n")
cat("===================================\n")
cat("Item similarity matrix computed\n")
# Predict for user 1
target_user <- 1
user_1_rated <- which(rating_matrix[target_user, ] > 0)
user_1_unrated <- which(rating_matrix[target_user, ] == 0)
predictions_ib <- numeric(length(user_1_unrated))
for (i in 1:length(user_1_unrated)) {
item <- user_1_unrated[i]
# Weighted average based on similarities to rated items
sims_to_rated <- item_sim[item, user_1_rated]
ratings_of_rated <- rating_matrix[target_user, user_1_rated]
if (sum(abs(sims_to_rated)) > 0) {
predictions_ib[i] <- sum(sims_to_rated * ratings_of_rated) / sum(abs(sims_to_rated))
} else {
predictions_ib[i] <- NA
}
}
recommendations_ib <- data.frame(
item_id = user_1_unrated,
predicted_rating = predictions_ib
) |>
filter(!is.na(predicted_rating)) |>
arrange(desc(predicted_rating)) |>
head(10)
cat("\nTop 10 Recommendations for User 1 (Item-Based CF):\n")
print(recommendations_ib)
# Compare user-based and item-based on common items
common_rec_items <- intersect(recommendations_ub$item_id, recommendations_ib$item_id)
cat("\n\nCommon items in top-10 (User-Based vs Item-Based):", length(common_rec_items), "\n")
```
## Python
```{python}
#| label: collaborative-filtering-item-py
# Using rating_matrix from user-based CF
# Compute item-item similarity
item_sim = cosine_similarity(rating_matrix.T)
np.fill_diagonal(item_sim, 0)
print("Item-Based Collaborative Filtering")
print("=" * 50)
print(f"Item similarity matrix: {item_sim.shape}")
# Predict for user 0
target_user = 0
user_rated = np.where(rating_matrix[target_user] > 0)[0]
user_unrated = np.where(rating_matrix[target_user] == 0)[0]
predictions_ib = []
for item in user_unrated:
    sims_to_rated = item_sim[item, user_rated]
    ratings_of_rated = rating_matrix[target_user, user_rated]
    if np.sum(np.abs(sims_to_rated)) > 0:
        # Similarity-weighted average of the user's own ratings of similar items
        pred = np.sum(sims_to_rated * ratings_of_rated) / np.sum(np.abs(sims_to_rated))
        predictions_ib.append({'item_id': item, 'predicted_rating': pred})
recommendations_ib = pd.DataFrame(predictions_ib).sort_values('predicted_rating', ascending=False).head(10)
print("\nTop 10 Recommendations for User 0 (Item-Based CF):")
print(recommendations_ib)
# Compare with user-based
common_items = set(recommendations_ub['item_id']) & set(recommendations_ib['item_id'])
print(f"\nCommon items in top-10: {len(common_items)}")
```
:::
## Matrix Factorisation
The rating matrix $R$ (users × items) is large and sparse. Matrix factorisation approximates it as $R \approx U V^\top$, where $U$ is a (users × K) matrix of latent user factors and $V$ is an (items × K) matrix of latent item factors. K is small (typically 10–100), so the factors hold only (users + items) × K numbers instead of users × items, and each prediction is a cheap K-dimensional dot product.
Intuitively, the K latent factors might capture genres (for movies), brands, price points, or more abstract product attributes. A user's factor vector encodes their affinity for each latent attribute; an item's factor vector encodes how strongly it embodies each attribute. The predicted rating is their dot product: $\hat{r}_{ui} = \mathbf{u}_u^\top \mathbf{v}_i$, where $\mathbf{u}_u$ is row $u$ of $U$ and $\mathbf{v}_i$ is row $i$ of $V$.
**Singular Value Decomposition (SVD)** is a classical matrix factorisation: $R = U \Sigma V^\top$, where $\Sigma$ is diagonal and holds the singular values in decreasing order. Truncating to the K largest singular values gives $R \approx U_K \Sigma_K V_K^\top$. For implicit feedback (binary likes/purchases), Alternating Least Squares (ALS) is standard: fix $U$ and solve for $V$ (each item's vector is a small least-squares problem), then fix $V$ and solve for $U$, and repeat until convergence.
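A minimal ALS sketch in NumPy. With one factor matrix fixed, the other is the solution of independent ridge regressions, one per user (or item), fitted on that row's observed entries only. This is the explicit-feedback variant with an assumed L2 penalty `reg`; the implicit-feedback version usually applied to purchases additionally weights every cell by a confidence score. The toy matrix treats 0 as unobserved.

```python
import numpy as np

def als(R, k=2, reg=0.1, n_iters=50, seed=0):
    """Explicit-feedback ALS: fit R ~ U @ V.T on the non-zero (observed) entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    observed = R > 0
    ridge = reg * np.eye(k)                      # assumed L2 penalty on each factor vector
    for _ in range(n_iters):
        for u in range(n_users):                 # V fixed: one ridge regression per user
            Vu = V[observed[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, observed[u]])
        for i in range(n_items):                 # U fixed: one ridge regression per item
            Ui = U[observed[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[observed[:, i], i])
    return U, V

R = np.array([[5, 4, 0, 1],
              [4, 5, 3, 1],
              [5, 4, 4, 2],
              [1, 1, 5, 5]], dtype=float)
U, V = als(R)
pred = U @ V.T
mask = R > 0
rmse = np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))
print(f"RMSE on observed entries: {rmse:.3f}")
print(f"predicted rating for user 0, unrated item 2: {pred[0, 2]:.2f}")
```

Each inner solve is only K × K, which is why ALS parallelises well: all user updates are independent of one another, as are all item updates.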
::: {.callout-note icon="false"}
## 📘 Theory: SVD and Reconstruction Error
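SVD is optimal in the least-squares sense (the Eckart–Young theorem): the truncated SVD minimises the squared Frobenius-norm reconstruction error
$$\|R - U_K \Sigma_K V_K^T\|_F^2 = \sum_{u,i} (r_{ui} - \hat{r}_{ui})^2$$
among all rank-K approximations. The catch is that classical SVD needs a fully observed matrix, so missing ratings must first be imputed (often as 0, which conflates "not rated" with "disliked"), and decomposing a large dense matrix is expensive. Methods such as ALS fit only the observed entries and scale better on sparse data.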
:::
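The optimality claim is easy to check numerically: the squared Frobenius error of the rank-K truncated SVD equals the sum of the squared discarded singular values, and no other rank-K matrix does better. A sketch on a random toy matrix, with missing ratings imputed as 0:

```python
import numpy as np

rng = np.random.default_rng(42)
R = rng.integers(0, 6, size=(8, 6)).astype(float)   # toy ratings; 0 = missing, imputed as 0

U, s, Vt = np.linalg.svd(R, full_matrices=False)     # s is sorted in decreasing order
K = 2
R_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]          # rank-K truncated reconstruction

err = np.sum((R - R_K) ** 2)                         # squared Frobenius norm of the residual
print(f"rank-{K} SVD error:     {err:.3f}")
print(f"sum of discarded s^2:   {np.sum(s[K:] ** 2):.3f}")

# Any other rank-K approximation is at least as bad, e.g. a random one
A, B = rng.standard_normal((8, K)), rng.standard_normal((K, 6))
print(f"random rank-{K} error:  {np.sum((R - A @ B) ** 2):.3f}")
```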
::: {.panel-tabset}
## R
```{r}
#| label: matrix-factorization-r
library(recommenderlab)
set.seed(4827)
# recommenderlab coerces a data.frame with (user, item, rating) columns;
# the 1-5 ratings are kept so that rating-prediction error is meaningful
rating_df <- as.data.frame(user_item_data[, c("user_id", "item_id", "rating")])
rating_matrix_sparse <- as(rating_df, "realRatingMatrix")
cat("Matrix Factorisation (truncated SVD via recommenderlab)\n")
cat("=======================================================\n")
cat("Rating matrix dimensions:", dim(rating_matrix_sparse), "\n")
# Keep users with >= 5 ratings so every test user keeps a history after hiding one rating
eval_data <- rating_matrix_sparse[rowCounts(rating_matrix_sparse) >= 5, ]
# 80/20 user split; test users reveal all but one rating (given = -1)
scheme <- evaluationScheme(eval_data, method = "split", train = 0.8, k = 1, given = -1)
# recommenderlab's "SVD" method: rank-k SVD approximation with column-mean imputation
model_svd <- Recommender(getData(scheme, "train"), method = "SVD", parameter = list(k = 20))
# Predict the held-out ratings of test users and measure the error
pred <- predict(model_svd, getData(scheme, "known"), type = "ratings")
eval_result <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))
cat("\n\nPrediction accuracy on held-out ratings (RMSE, MSE, MAE):\n")
print(eval_result)
# Top 10 recommendations for the first user row (user 1)
top_recommendations <- predict(model_svd, rating_matrix_sparse[1, ], n = 10)
cat("\n\nTop 10 items for User 1:\n")
print(as(top_recommendations, "list"))
```
## Python
```{python}
#| label: matrix-factorization-py
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
# scikit-surprise requires a C++ compiler to build on Windows.
# If unavailable, this block falls back to sklearn TruncatedSVD only.
try:
    from surprise import SVD as SurpriseSVD, Reader, Dataset
    _surprise_available = True
except ImportError:
    _surprise_available = False
    print("Note: scikit-surprise not installed; Surprise SVD section skipped.")
# Create binary feedback matrix (liked = rating >= 3)
user_item_binary = user_item_data.copy()
user_item_binary['rating'] = (user_item_binary['rating'] >= 3).astype(int)
# Convert to sparse matrix for SVD
sparse_matrix = csr_matrix((
user_item_binary['rating'],
(user_item_binary['user_id'], user_item_binary['item_id'])
), shape=(n_users, n_items))
# Apply TruncatedSVD
k = 20
svd_model = TruncatedSVD(n_components=k, random_state=4827)
user_factors = svd_model.fit_transform(sparse_matrix)
item_factors = svd_model.components_.T
print("Matrix Factorisation via SVD")
print("=" * 50)
print(f"User factors shape: {user_factors.shape}")
print(f"Item factors shape: {item_factors.shape}")
print(f"Explained variance: {svd_model.explained_variance_ratio_.sum():.3f}")
# Predict ratings for user 0
user_0_factors = user_factors[0]
predicted_ratings = user_0_factors @ item_factors.T
# Top 10 recommendations
# Use the original ratings: in the binary matrix, disliked items (rating < 3) also look like 0
unrated_items = np.where(rating_matrix[0] == 0)[0]
recommendations_mf = pd.DataFrame({
'item_id': unrated_items,
'predicted_rating': predicted_ratings[unrated_items]
}).sort_values('predicted_rating', ascending=False).head(10)
print("\n\nTop 10 Recommendations for User 0 (SVD):")
print(recommendations_mf)
# Alternative: Use Surprise library for more sophisticated SVD/ALS
if _surprise_available:
    print("\n\nUsing Surprise library for more advanced SVD...")
    reader = Reader(rating_scale=(0, 1))
    data = Dataset.load_from_df(user_item_binary[['user_id', 'item_id', 'rating']], reader)
    model_svd = SurpriseSVD(n_factors=20, random_state=4827)
    model_svd.fit(data.build_full_trainset())
    predictions = [model_svd.predict(0, item) for item in unrated_items]
    recommendations_surprise = pd.DataFrame({
        'item_id': [p.iid for p in predictions],          # Prediction namedtuple fields
        'predicted_rating': [p.est for p in predictions]
    }).sort_values('predicted_rating', ascending=False).head(10)
    print("\nTop 10 (via Surprise SVD):")
    print(recommendations_surprise)
else:
    print("\n\nSurprise SVD skipped (package not installed).")
    print("To install on Windows: conda install -c conda-forge scikit-surprise")
```
:::
## The Cold-Start Problem
New users (no purchase history) and new items (no ratings) create the cold-start problem. A pure collaborative system can't handle them: there's no user-user similarity to leverage, and no item-item similarity to estimate.
**Solutions**:
1. **New User Cold-Start**: (a) Use demographics (age, income, location) to match to similar existing users; (b) Show popularity-based recommendations (bestsellers); (c) Ask onboarding questions ("Which of these 10 brands do you like?") to quickly build a profile.
2. **New Item Cold-Start**: (a) Use content features (category, price, brand) to find similar existing items; (b) Show to similar users only (those who liked similar items); (c) Use a hybrid approach: blend content-based and collaborative signals.
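3. **New User + New Item**: No collaborative signal is possible, so rely on content: match the new item's features against the profile built from the new user's onboarding answers or demographics.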
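Option 1(b), a bestseller-style fallback, takes only a few lines. A sketch that ranks items by how many users rated them at least `min_rating` (an assumed threshold of 3, matching the "liked" cut-off used elsewhere in this chapter) and skips items the user already has; the function name and toy data are illustrative:

```python
import numpy as np

def popular_items(R, n=5, exclude=(), min_rating=3):
    """Top-n items by number of users who rated them >= min_rating (0 = unrated)."""
    counts = (R >= min_rating).sum(axis=0).astype(float)
    if exclude:
        counts[list(exclude)] = -np.inf            # never re-recommend items the user has
    return np.argsort(-counts, kind="stable")[:n]  # ties keep the lower item id first

R = np.array([[5, 0, 4, 1],
              [4, 0, 5, 0],
              [0, 3, 4, 0],
              [1, 5, 0, 0]], dtype=float)
print(popular_items(R, n=2))                # brand-new user: pure popularity
print(popular_items(R, n=2, exclude={2}))   # user who already owns item 2
```

Because it ignores the user entirely, this list is the same for every newcomer; it is a starting point until onboarding answers or the first few purchases allow content or collaborative signals to take over.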
A **hybrid recommender** combines multiple signals: collaborative, content, and popularity. The final score is a weighted combination:
$$\text{Score}_{u,i} = w_1 \cdot \text{collab}_{u,i} + w_2 \cdot \text{content}_{u,i} + w_3 \cdot \text{popularity}_i$$
where weights are learned from data or set heuristically.
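A sketch of the weighted hybrid score, assuming min-max normalisation of each signal plus two heuristics that are not part of the formula itself: an item with no collaborative score contributes 0 for that term, and for a cold-start user with no collaborative scores at all, $w_1$ is redistributed proportionally to $w_2$ and $w_3$.

```python
import numpy as np

def min_max(x):
    """Rescale to [0, 1], ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmin(x)) / (np.nanmax(x) - np.nanmin(x) + 1e-10)

def hybrid_scores(collab, content, popularity, w=(0.5, 0.3, 0.2)):
    """Score = w1*collab + w2*content + w3*popularity on normalised signals."""
    w1, w2, w3 = w
    collab = np.asarray(collab, dtype=float)
    if np.all(np.isnan(collab)):
        # Cold-start user: no collaborative signal, so share w1 between the other two
        w1, w2, w3 = 0.0, w2 / (w2 + w3), w3 / (w2 + w3)
        c = np.zeros_like(collab)
    else:
        c = np.nan_to_num(min_max(collab), nan=0.0)   # items without a CF score get 0
    return w1 * c + w2 * min_max(content) + w3 * min_max(popularity)

content = [0.1, 0.9, 0.5]      # e.g. TF-IDF similarity to the user's profile
popularity = [10, 5, 0]        # e.g. number of purchases
print(hybrid_scores([4.0, np.nan, 2.0], content, popularity))
print(hybrid_scores([np.nan] * 3, content, popularity))
```

Normalising before weighting matters: raw CF predictions (1–5), cosine similarities (0–1), and purchase counts live on different scales, and without rescaling the largest-scale signal would dominate regardless of the weights.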
::: {.panel-tabset}
## R
```{r}
#| label: hybrid-recommender-r
# Hybrid recommender combining collaborative and content signals
# Normalise scores to [0, 1]
norm_01 <- function(x) {
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE) + 1e-10)
}
# For user 1: collaborative, content and popularity scores for every item
target_user <- 1
# Collaborative score: item-based CF predictions for all of user 1's unrated items
# (user_1_unrated and predictions_ib come from the item-based CF chunk)
collab_scores <- rep(NA_real_, n_items)
collab_scores[user_1_unrated] <- predictions_ib
collab_scores <- norm_01(collab_scores)
collab_scores[is.na(collab_scores)] <- 0  # no CF signal -> contributes 0
# Content score: TF-IDF profile similarities from the content-based chunk
# (the simulated data sets are independent, so this only illustrates the mechanics)
content_scores <- norm_01(similarities)
# Popularity score: mean rating per item, counting unrated cells as 0
popularity_scores <- norm_01(colMeans(rating_matrix))
# Combine: w_collab = 0.5, w_content = 0.3, w_popularity = 0.2
w_collab <- 0.5
w_content <- 0.3
w_popularity <- 0.2
hybrid_scores <- w_collab * collab_scores +
  w_content * content_scores +
  w_popularity * popularity_scores
# Get unrated items and their hybrid scores
unrated_items <- which(rating_matrix[target_user, ] == 0)
hybrid_rec <- data.frame(
item_id = unrated_items,
hybrid_score = hybrid_scores[unrated_items]
) |>
filter(!is.na(hybrid_score)) |>
arrange(desc(hybrid_score)) |>
head(10)
cat("Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)\n")
cat("===========================================================\n")
print(hybrid_rec)
```
## Python
```{python}
#| label: hybrid-recommender-py
# Hybrid recommender
def normalize_01(x):
    """Min-max normalise to [0, 1], ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmin(x)) / (np.nanmax(x) - np.nanmin(x) + 1e-10)
target_user = 0
# Collaborative score (from matrix factorisation: predicted_ratings for user 0)
collab_scores = normalize_01(predicted_ratings)
# Content score: TF-IDF profile similarities from the content-based section
# (the simulated data sets are independent, so this only illustrates the mechanics)
content_scores = normalize_01(similarities)
# Popularity score: mean rating per item, counting unrated cells as 0
popularity_scores = normalize_01(rating_matrix.mean(axis=0))
# Combine with weights
w_collab, w_content, w_popularity = 0.5, 0.3, 0.2
hybrid_scores = (w_collab * collab_scores +
w_content * content_scores +
w_popularity * popularity_scores)
# Get unrated items
user_unrated = np.where(rating_matrix[target_user] == 0)[0]
hybrid_rec = pd.DataFrame({
'item_id': user_unrated,
'hybrid_score': hybrid_scores[user_unrated]
}).sort_values('hybrid_score', ascending=False).head(10)
print("Hybrid Recommender (50% Collab, 30% Content, 20% Popularity)")
print("=" * 60)
print(hybrid_rec)
```
:::
## Evaluation Metrics
How do we measure recommender quality? Standard metrics:
**Precision@K**: Of the top-K recommendations, how many did the user actually like? $P@K = \frac{\text{hits}}{K}$. A user likes 3 of top-10 items → P@10 = 0.30.
**Recall@K**: Of all items the user likes, how many are in the top-K? $R@K = \frac{\text{hits}}{\text{total likes}}$. A user likes 20 items; 3 are in top-10 → R@10 = 0.15.
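Both worked examples can be checked with a few lines of Python. This is a minimal sketch; the function name `precision_recall_at_k` and the item ids are hypothetical, chosen so that 3 of the top-10 recommendations fall inside a set of 20 liked items.

```python
def precision_recall_at_k(ranked_items, liked_items, k):
    """Precision@K and Recall@K for one user.

    ranked_items: item ids ordered from most to least recommended.
    liked_items: set of item ids the user actually likes (ground truth).
    """
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in liked_items)
    return hits / k, hits / len(liked_items)

# The example from the text: 20 liked items, 3 of which appear in the top 10
liked = set(range(100, 120))                          # hypothetical ids of the 20 liked items
ranking = [100, 1, 2, 101, 3, 4, 5, 102, 6, 7, 8, 9]  # hits at ranks 1, 4 and 8
p10, r10 = precision_recall_at_k(ranking, liked, k=10)
print(p10, r10)  # 0.3 0.15
```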
**NDCG (Normalised Discounted Cumulative Gain)**: Rewards placing relevant items near the top of the list. Each hit is discounted by its rank, so a relevant item at position 1 contributes more than the same item at position 10.
$$NDCG@K = \frac{1}{IDCG} \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$
where $rel_i \in \{0, 1\}$ indicates whether the item at position $i$ is relevant, and IDCG is the DCG of the ideal ranking, which places $\min(K, \text{total likes})$ relevant items in the top positions. NDCG therefore lies in $[0, 1]$.
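To see the discount at work, the sketch below computes NDCG@K for binary relevance with the same gain and discount as the formula, and compares the same three hits ranked early versus late. The helpers `dcg_at_k` and `ndcg_at_k` are illustrative names, not part of any library.

```python
import numpy as np

def dcg_at_k(relevance, k):
    """DCG@K with gain 2^rel - 1 and discount log2(position + 1)."""
    rel = np.asarray(relevance, dtype=float)[:k]
    positions = np.arange(1, len(rel) + 1)
    return np.sum((2 ** rel - 1) / np.log2(positions + 1))

def ndcg_at_k(relevance, n_relevant, k):
    """NDCG@K for binary relevance.

    relevance: 0/1 flags for the recommended list, in ranked order.
    n_relevant: total relevant items for this user; the ideal list
    places min(K, n_relevant) of them at the top.
    """
    ideal = np.zeros(k)
    ideal[:min(k, n_relevant)] = 1
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevance, k) / idcg if idcg > 0 else 0.0

# Same three hits (user has 3 relevant items), ranked early vs late
early = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(round(ndcg_at_k(early, 3, 10), 3))  # 1.0
print(round(ndcg_at_k(late, 3, 10), 3))   # 0.425
```

Precision@10 is 0.3 for both lists; only NDCG distinguishes them.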
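**MAP (Mean Average Precision)**: For one user, average precision (AP@K) is the mean of the precision values at each rank where a relevant item appears, usually normalised by $\min(K, \text{total likes})$ so that relevant items missing from the list count as zero. MAP@K averages AP@K across users.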
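A short sketch of AP@K and MAP@K using the normalisation by $\min(K, \text{total likes})$; some references instead divide by the number of hits in the list, which gives higher scores. The function names and the two toy users are hypothetical.

```python
def average_precision_at_k(relevance, n_relevant, k):
    """AP@K: sum of precision@i at each rank i holding a relevant item,
    divided by min(K, n_relevant) so that missed items count as zero."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    denom = min(k, n_relevant)
    return precision_sum / denom if denom > 0 else 0.0

def mean_average_precision(per_user, k):
    """MAP@K: mean of AP@K over users; per_user holds (relevance, n_relevant) pairs."""
    return sum(average_precision_at_k(r, n, k) for r, n in per_user) / len(per_user)

users = [
    ([1, 0, 1, 0, 0], 2),  # hits at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    ([0, 1, 0, 0, 0], 3),  # one of three relevant items, at rank 2: AP = (1/2) / 3
]
print(round(mean_average_precision(users, k=5), 4))  # 0.5
```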
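For the Nigerian e-commerce data, a typical offline protocol is: split users into train (80%) and test (20%); for each test user, hide one or more known likes, train on the remaining data, and check whether the hidden items appear in that user's top-10. In strict hold-one-out (one hidden item per test user), P@10 can be at most 0.1, so Recall@10 (the hit rate) is the more informative number.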
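A minimal sketch of the hold-one-out protocol on a toy matrix, assuming ratings are stored as a users × items NumPy array with 0 for unrated items. The helpers `hold_out_split` and `hit_rate_at_k` are hypothetical names, and the popularity scorer only stands in for whichever recommender is being evaluated.

```python
import numpy as np

def hold_out_split(ratings, test_users, like_threshold=3, rng=None):
    """Hide one liked item per test user (hold-one-out).

    Returns the training matrix and a dict {user: held-out item}.
    """
    rng = np.random.default_rng(rng)
    train = ratings.copy()
    held_out = {}
    for u in test_users:
        liked = np.flatnonzero(ratings[u] >= like_threshold)
        if len(liked) >= 2:              # keep at least one like for training
            item = int(rng.choice(liked))
            train[u, item] = 0
            held_out[u] = item
    return train, held_out

def hit_rate_at_k(scores, train, held_out, k=10):
    """Share of test users whose held-out item ranks in their top-K unseen items."""
    hits = 0
    for u, item in held_out.items():
        s = scores[u].astype(float)
        s[train[u] > 0] = -np.inf        # exclude items already seen in training
        top_k = np.argsort(s)[::-1][:k]
        hits += item in top_k
    return hits / len(held_out) if held_out else 0.0

# Toy usage: 80/20 user split on random ratings, scored by item popularity
rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(50, 30)) * (rng.random((50, 30)) < 0.3)
test_users = rng.permutation(50)[int(0.8 * 50):]
train, held_out = hold_out_split(ratings, test_users, rng=1)
popularity = np.tile(train.mean(axis=0), (50, 1))
hr = hit_rate_at_k(popularity, train, held_out, k=10)
print(len(held_out), hr)
```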
::: {.callout-caution icon="false"}
## 📝 Section 32.7 Review Questions
1. Why is NDCG better than Precision@K for ranking recommendations?
2. A recommender achieves P@5 = 0.8 but R@5 = 0.2. Interpret this.
3. For a new user with 0 interactions, which evaluation protocol is valid?
4. Why do we use test-set evaluation rather than monitoring training loss?
:::
::: {.panel-tabset}
## R
```{r}
#| label: evaluation-metrics-r
# Evaluate item-based CF on held-out likes.
# For each of the first 50 users, hide ~20% of their liked items (rating >= 3;
# at least one hidden, at least one kept), recompute item similarities on the
# remaining ratings, then check whether the hidden items appear among the
# top-K recommendations drawn from the user's unrated items.
set.seed(9163)
n_test_users <- 50
k <- 10
ratings <- as.matrix(rating_matrix)
ratings[is.na(ratings)] <- 0                # treat missing ratings as unrated
train <- ratings
held_out <- vector("list", n_test_users)
for (user in 1:n_test_users) {
  liked <- which(ratings[user, ] >= 3)
  if (length(liked) >= 2) {
    n_hide <- max(1, round(0.2 * length(liked)))
    hide <- liked[sample.int(length(liked), n_hide)]
    held_out[[user]] <- hide
    train[user, hide] <- 0                  # hidden from the model
  }
}
# Cosine item-item similarity on the training ratings only (no leakage)
norms <- sqrt(colSums(train^2))
item_sim_train <- crossprod(train) / (outer(norms, norms) + 1e-10)
diag(item_sim_train) <- 0
discounts <- log2(2:(k + 1))
results <- data.frame()
for (user in 1:n_test_users) {
  relevant <- held_out[[user]]
  if (length(relevant) == 0) next
  train_liked <- which(train[user, ] >= 3)
  candidates <- which(train[user, ] == 0)   # never re-recommend seen items
  # Score each candidate by its mean similarity to the user's training likes
  pred_scores <- rep(-Inf, ncol(ratings))
  pred_scores[candidates] <- rowMeans(item_sim_train[candidates, train_liked, drop = FALSE])
  top_k_items <- order(pred_scores, decreasing = TRUE)[1:k]
  hits <- top_k_items %in% relevant         # logical flag per rank
  # Precision@K and Recall@K
  precision_k <- sum(hits) / k
  recall_k <- sum(hits) / length(relevant)
  # NDCG@K: the ideal list puts min(K, |relevant|) relevant items first
  dcg <- sum((2^hits - 1) / discounts)
  n_ideal <- min(k, length(relevant))
  idcg <- sum((2^c(rep(1, n_ideal), rep(0, k - n_ideal)) - 1) / discounts)
  ndcg_k <- dcg / idcg
  # AP@K: precision at each hit rank, normalised by min(K, |relevant|)
  map_k <- sum(cumsum(hits)[hits] / which(hits)) / n_ideal
  results <- bind_rows(results, data.frame(
    user = user,
    precision_k = precision_k,
    recall_k = recall_k,
    ndcg_k = ndcg_k,
    map_k = map_k
  ))
}
cat("Evaluation Results (Held-Out Likes, K=10)\n")
cat("=========================================\n")
cat("Average Precision@10:", round(mean(results$precision_k, na.rm = TRUE), 4), "\n")
cat("Average Recall@10:", round(mean(results$recall_k, na.rm = TRUE), 4), "\n")
cat("Average NDCG@10:", round(mean(results$ndcg_k, na.rm = TRUE), 4), "\n")
cat("Average MAP@10:", round(mean(results$map_k, na.rm = TRUE), 4), "\n")
# Distribution of metrics
cat("\n\nMetric Distributions:\n")
print(summary(results[, 2:5]))
```
## Python
```{python}
#| label: evaluation-metrics-py
# Evaluate a matrix-factorisation recommender on held-out likes.
# For each of the first 50 users, hide ~20% of their liked items (rating >= 3;
# at least one hidden, at least one kept), refit the factorisation on the
# remaining ratings, and rank only items the user has not rated in training.
rng = np.random.default_rng(9163)
n_test_users, k = 50, 10
ratings = np.nan_to_num(np.asarray(rating_matrix, dtype=float))  # NaN -> 0 (unrated)
train = ratings.copy()
held_out = {}
for user in range(n_test_users):
    liked = np.flatnonzero(ratings[user] >= 3)
    if len(liked) >= 2:
        n_hide = max(1, round(0.2 * len(liked)))
        hide = rng.choice(liked, size=n_hide, replace=False)
        held_out[user] = set(hide.tolist())
        train[user, hide] = 0                      # hidden from the model
# Refit on the training matrix so held-out ratings stay unseen.
# Assumption: TruncatedSVD with rank 10 stands in for the chapter's
# factorisation; match the rank chosen in the matrix factorisation section.
svd = TruncatedSVD(n_components=10, random_state=9163)
pred_train = svd.fit_transform(train) @ svd.components_
def dcg(rel):
    """DCG with gain 2^rel - 1 and discount log2(rank + 1)."""
    rel = np.asarray(rel, dtype=float)
    return np.sum((2 ** rel - 1) / np.log2(np.arange(2, len(rel) + 2)))
metrics_list = []
for user, relevant in held_out.items():
    scores = pred_train[user].copy()
    scores[train[user] > 0] = -np.inf              # never re-recommend seen items
    top_k_indices = np.argsort(scores)[::-1][:k]
    hits = np.array([item in relevant for item in top_k_indices], dtype=float)
    n_ideal = min(k, len(relevant))                # ideal list: relevant items first
    ideal = np.zeros(k)
    ideal[:n_ideal] = 1
    hit_ranks = np.flatnonzero(hits) + 1           # 1-based ranks of the hits
    metrics_list.append({
        'user': user,
        'precision_k': hits.sum() / k,
        'recall_k': hits.sum() / len(relevant),
        'ndcg_k': dcg(hits) / dcg(ideal),
        # AP@K: precision at each hit rank, normalised by min(K, |relevant|)
        'map_k': np.sum(np.arange(1, len(hit_ranks) + 1) / hit_ranks) / n_ideal,
    })
metrics_df = pd.DataFrame(metrics_list)
print("Evaluation Results (Held-Out Likes, K=10)")
print("=" * 50)
print(f"Average Precision@10: {metrics_df['precision_k'].mean():.4f}")
print(f"Average Recall@10: {metrics_df['recall_k'].mean():.4f}")
print(f"Average NDCG@10: {metrics_df['ndcg_k'].mean():.4f}")
print(f"Average MAP@10: {metrics_df['map_k'].mean():.4f}")
print("\n\nMetric Distributions:")
print(metrics_df[['precision_k', 'recall_k', 'ndcg_k', 'map_k']].describe())
```
:::
## Case Study: Product Recommendations for Nigerian E-Commerce
A Jumia Nigeria-like platform has 500 users and 200 products (FMCG, electronics, fashion). Historical data spans 8,000 purchases. The business goal: increase average order value (AOV) by 15% through personalized recommendations.
**Approach**:
1. **Data Preparation**: Clean purchase data, handle duplicates, create train-test split.
2. **Model Training**: Fit content-based, user-based CF, item-based CF, and hybrid models.
3. **Evaluation**: Measure Precision@10, Recall@10, NDCG@10 on a held-out test set.
4. **A/B Test Simulation**: Assume 10% of users see recommendations. For "treatment" users, assume 5% increase in AOV from recommendations. Estimate incremental revenue and ROI.
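Step 4 can be prototyped with a small simulation. Every number here is an assumption for illustration: 100,000 users each placing one order, a log-normal order-value distribution with mean ₦500, a 10% treatment share and a true 5% uplift. Because treatment is assigned at random, the estimated uplift should land near the assumed 5%.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_users = 100_000
base_aov = 500.0      # ₦, hypothetical mean order value
treat_share = 0.10    # 10% of users see recommendations
uplift = 0.05         # assumed 5% AOV increase for treated users

# Hypothetical AOV distribution: log-normal with mean exactly base_aov
sigma = 0.5
mu = np.log(base_aov) - sigma ** 2 / 2
aov = rng.lognormal(mean=mu, sigma=sigma, size=n_users)

# Random assignment, then apply the assumed uplift to treated users
treated = rng.random(n_users) < treat_share
aov[treated] *= 1 + uplift

control_mean = aov[~treated].mean()
treat_mean = aov[treated].mean()
est_uplift = treat_mean / control_mean - 1
incremental = (treat_mean - control_mean) * treated.sum()  # one order per user
print(f"Treated users: {int(treated.sum()):,}")
print(f"Estimated AOV uplift: {est_uplift:.1%}")
print(f"Incremental revenue during the test: ₦{incremental:,.0f}")
```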
**Results**: The hybrid model achieves P@10 = 0.35, R@10 = 0.18, NDCG@10 = 0.42, outperforming each individual method. Expected AOV increase: 8% (conservative). Annual incremental revenue: about ₦50M at full rollout (100k monthly users, treating ₦500 as each user's average monthly spend).
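The revenue projection is simple arithmetic. The sketch below assumes ₦500 is each user's average monthly spend and applies the 8% uplift to twelve months of spend; it also shows the much smaller figure if only the 10% treatment group from the A/B step received recommendations.

```python
monthly_users = 100_000
avg_monthly_spend = 500   # ₦ per user per month (assumed interpretation)
aov_uplift = 0.08         # the "conservative" 8% uplift

baseline_annual = monthly_users * avg_monthly_spend * 12
incremental_full = baseline_annual * aov_uplift
incremental_test = incremental_full * 0.10   # only the 10% treatment group

print(f"Baseline annual revenue:     ₦{baseline_annual:,.0f}")   # ₦600,000,000
print(f"Incremental (full rollout):  ₦{incremental_full:,.0f}")  # ₦48,000,000
print(f"Incremental (10% treatment): ₦{incremental_test:,.0f}")
```

The full-rollout figure of roughly ₦48M is what rounds to the ₦50M headline.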
::: {.exercises}
#### Chapter 32 Exercises
1. **Baseline Comparison**: Implement a popularity-based baseline (recommend top-K items by overall rating). Compare its P@10, R@10 to collaborative and hybrid models.
2. **Cold-Start Handling**: For 20 new users with 0 interactions, recommend using onboarding features (category preferences). Measure how many interactions they need before collaborative filtering outperforms the onboarding-based recommendations for them.
3. **Diversity Metrics**: Among top-10 recommendations, measure category diversity (are all electronics, or mixed?). Does increasing diversity hurt relevance?
4. **Temporal Dynamics**: Simulate user preference drift: users' tastes change over time (e.g., seasons: winter coats → summer dresses). How do models perform with 6-month-old training data?
5. **Serendipity**: Build a variant that blends personalisation with serendipity (occasional unexpected recommendations). Measure engagement (clicks, purchases) against a purely personalised baseline.
6. **Cross-Selling Rules**: Define product affinity rules (e.g., if you buy a phone, you are likely to buy a phone case). Build a rule-based recommender and compare it to the ML approaches.
7. **A/B Test Design**: Design a randomised controlled trial to test recommendations: the control group sees static bestsellers; the treatment group sees personalised recommendations. Run a power analysis to find the sample size needed to detect an 8% AOV uplift.
:::
## Further Reading
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. *IEEE Computer*, 42(8), 30–37.
Aggarwal, C. C. (2016). *Recommender Systems: The Textbook*. Springer.
Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2015). *Recommender Systems Handbook* (2nd ed.). Springer.
## Chapter 32 Appendix: Mathematical Foundations
### A32.1 SVD Derivation and Optimality
Singular Value Decomposition factorises a matrix $R$ as $R = U \Sigma V^\top$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$. The rank-$K$ approximation is:
$$R_K = U_K \Sigma_K V_K^\top = \sum_{k=1}^{K} \sigma_k u_k v_k^\top$$
where $u_k$ and $v_k$ are the $k$-th columns of $U$ and $V$. By the Eckart–Young theorem, $R_K$ minimises the squared Frobenius reconstruction error over all matrices of rank at most $K$:
$$\min_{\text{rank}(M) \le K} \|R - M\|_F^2 = \|R - R_K\|_F^2 = \sum_{k=K+1}^{r} \sigma_k^2$$
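A quick numerical check of the Eckart–Young result on a random toy matrix: the squared Frobenius error of the rank-$K$ truncation equals the sum of the discarded squared singular values, and keeping a different set of $K$ components does worse. The matrix is random and only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((8, 6))                        # toy ratings-like matrix
U, s, Vt = np.linalg.svd(R, full_matrices=False)

K = 2
R_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]   # sum of sigma_k u_k v_k^T for k = 1..K
err = np.linalg.norm(R - R_K, "fro") ** 2
tail = np.sum(s[K:] ** 2)
print(np.isclose(err, tail))                  # True

# Keeping components 2 and 3 instead of 1 and 2 gives a larger error
R_alt = U[:, 1:3] @ np.diag(s[1:3]) @ Vt[1:3, :]
err_alt = np.linalg.norm(R - R_alt, "fro") ** 2
print(err_alt > err)                          # True
```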
### A32.2 Alternating Least Squares (ALS)
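For implicit feedback (user $u$ interacted with item $i$ or not), a simple unweighted form of the ALS objective is:
$$L = \sum_{u,i} (y_{ui} - \mathbf{u}_u^\top \mathbf{v}_i)^2 + \lambda(\|U\|_F^2 + \|V\|_F^2)$$
where $y_{ui} \in \{0, 1\}$, $\mathbf{u}_u$ (row $u$ of $U$) and $\mathbf{v}_i$ (row $i$ of $V$) are $K$-dimensional factor vectors, and $\lambda$ is the regularisation strength; weighted variants attach a confidence to each $(u, i)$ term. With $N$ users and $M$ items, and $\mathbf{y}_u$, $\mathbf{y}_i$ denoting row $u$ and column $i$ of $Y$, ALS alternates:
1. Fix $V$; optimise $U$ by solving $N$ ridge regression problems (one per user): $\mathbf{u}_u = (V^\top V + \lambda I)^{-1} V^\top \mathbf{y}_u$.
2. Fix $U$; optimise $V$ by solving $M$ ridge regression problems (one per item): $\mathbf{v}_i = (U^\top U + \lambda I)^{-1} U^\top \mathbf{y}_i$.
Each step has a closed-form solution, avoiding gradient descent. Because each step exactly minimises $L$ over one block of factors, $L$ never increases and, being bounded below by zero, converges; the result need not be the global minimum.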
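A minimal NumPy sketch of the two alternating ridge updates for the unweighted loss above, run on a random toy 0/1 matrix. It illustrates the closed-form steps only; it is not a production implicit-feedback ALS, which would add confidence weights and sparse solvers. The recorded loss should never increase from one sweep to the next.

```python
import numpy as np

def als(Y, K=5, lam=0.1, n_iters=10, seed=0):
    """Unweighted ALS for L = ||Y - U V^T||_F^2 + lam (||U||_F^2 + ||V||_F^2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = Y.shape
    U = rng.normal(scale=0.1, size=(n_users, K))
    V = rng.normal(scale=0.1, size=(n_items, K))
    I = np.eye(K)
    losses = []
    for _ in range(n_iters):
        # Fix V: all N user problems share (V^T V + lam I); solve them in one call
        U = np.linalg.solve(V.T @ V + lam * I, V.T @ Y.T).T
        # Fix U: the same closed form for all M items
        V = np.linalg.solve(U.T @ U + lam * I, U.T @ Y).T
        loss = np.sum((Y - U @ V.T) ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))
        losses.append(loss)
    return U, V, losses

rng = np.random.default_rng(1)
Y = (rng.random((40, 25)) < 0.2).astype(float)  # toy implicit-feedback matrix
U, V, losses = als(Y, K=4, lam=0.5, n_iters=8)
print([round(l, 2) for l in losses])
```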
### A32.3 Multinomial Logit and Cold-Start
For new users, we use features $x$ (demographics or onboarding answers) to predict which item they are likely to choose. A multinomial logit gives:
$$P(\text{choose item } i \mid x) = \frac{e^{\beta_i^\top x}}{\sum_j e^{\beta_j^\top x}}$$
where the item-specific coefficients $\beta_i$ are learned from historical data on existing users. Ranking items by this probability yields recommendations for new users without any collaborative signal.
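A sketch of the cold-start scoring step, assuming the coefficients have already been estimated (for example, by fitting a multinomial logistic regression on existing users' first purchases). The three items, the onboarding features and the β values are all hypothetical.

```python
import numpy as np

def choice_probabilities(x, B):
    """Multinomial logit: P(item i | x) = exp(beta_i^T x) / sum_j exp(beta_j^T x).

    x: feature vector of a new user (length d).
    B: items x d matrix whose rows are the beta_i.
    """
    logits = B @ x
    logits = logits - logits.max()   # subtract the max for numerical stability
    expo = np.exp(logits)
    return expo / expo.sum()

# Hypothetical features: [likes electronics, likes fashion, intercept]
B = np.array([
    [2.0, -1.0, 0.1],   # phone
    [-1.0, 2.0, 0.1],   # dress
    [0.2, 0.2, 0.5],    # rice (FMCG)
])
new_user = np.array([1.0, 0.0, 1.0])  # ticked "electronics" at onboarding
p = choice_probabilities(new_user, B)
print(np.round(p, 3))
print(np.argsort(p)[::-1])  # [0 2 1]: phone, then rice, then dress
```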
### A32.4 NDCG Derivation from Information Retrieval
NDCG is rooted in information retrieval. DCG (Discounted Cumulative Gain) discounts the utility of late results:
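$$DCG@K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}$$
where $rel_i$ is the relevance of the item at position $i$. This linear-gain form agrees with the $2^{rel_i} - 1$ gain used in the evaluation section whenever relevance is binary, since $2^1 - 1 = 1$ and $2^0 - 1 = 0$; with graded relevance the exponential gain rewards highly relevant items more. IDCG is the DCG of the ideal ranking (items sorted by relevance). NDCG = DCG / IDCG $\in [0, 1]$, so the metric rewards ranking relevant items high.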
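The sketch below contrasts the linear gain with the exponential gain $2^{rel_i} - 1$: the two NDCG values coincide for binary relevance and diverge for graded relevance (e.g. grades 0 to 3). The relevance lists are made up for illustration.

```python
import numpy as np

def dcg(rel, gain="exponential"):
    """DCG over a ranked list; discount log2(i + 1) for positions i = 1..K."""
    rel = np.asarray(rel, dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))
    gains = 2 ** rel - 1 if gain == "exponential" else rel
    return np.sum(gains / discounts)

def ndcg_at_k(ranked_rel, all_rel, k, gain="exponential"):
    """ranked_rel: grades of the recommended list; all_rel: grades of every candidate."""
    ideal = np.sort(np.asarray(all_rel, dtype=float))[::-1][:k]
    return dcg(ranked_rel[:k], gain) / dcg(ideal, gain)

# Binary relevance: both gains give the same NDCG
binary_exp = ndcg_at_k([0, 1, 1, 0], [1, 1, 0, 0, 0], k=4)
binary_lin = ndcg_at_k([0, 1, 1, 0], [1, 1, 0, 0, 0], k=4, gain="linear")
# Graded relevance: the exponential gain penalises misplacing the grade-3 item more
graded_exp = ndcg_at_k([1, 3, 0, 2], [3, 2, 1, 0], k=4)
graded_lin = ndcg_at_k([1, 3, 0, 2], [3, 2, 1, 0], k=4, gain="linear")
print(round(binary_exp, 3), round(binary_lin, 3))
print(round(graded_exp, 3), round(graded_lin, 3))
```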
---