51 Brand Analytics

Author

Bongo Adi

📋 Learning Objectives

Understand brand equity frameworks and the financial value of brands
Calculate brand health metrics: awareness, consideration, preference, loyalty
Analyse the brand funnel and conversion rates at each stage
Build and interpret mixed-effects models for longitudinal brand tracking data
Monitor brand sentiment and share of voice from social media text
Create perceptual maps using TF-IDF and PCA for competitive positioning
Apply confirmatory factor analysis (CFA) to measure brand equity dimensions
Implement brand equity measurement workflows in R and Python with Nigerian examples

51.1 What Is a Brand and Why Does It Matter?

A brand is far more than a logo, colour, or slogan. In economic terms, a brand is the sum of perceptions, emotions, and associations that consumers attach to a company or product name. Brand equity—the premium value consumers are willing to pay for a branded product over a generic equivalent—is measurable, tradeable, and often represents a company’s most valuable asset. A consumer might pay ₦350 for a 70g packet of Indomie noodles when a generic noodle product at ₦180 is chemically indistinguishable. The extra ₦170 is brand equity: the value of taste memory, consistent quality expectations, cultural ubiquity, and emotional attachment built over decades.

David Aaker’s brand equity model, still widely treated as the reference framework in marketing scholarship, decomposes brand equity into four dimensions: brand awareness (is the brand top-of-mind?), brand associations (what do consumers think the brand stands for?), perceived quality (do consumers believe the product delivers?), and brand loyalty (would consumers repurchase and recommend to others?). In the Nigerian context, this framework plays out distinctly. Dangote Group’s brand equity rests on awareness (nearly universal in Nigeria), associations with scale and trust (“Dangote makes it”), and perceived quality bolstered by the conglomerate’s vertical integration and control. GTBank’s brand equity among urban, affluent Nigerians centres on innovation (pioneering internet banking, investment apps) and prestige. Indomie’s brand equity is largely affective: perfect taste, ubiquity, affordability, and cultural resonance (instant noodles are a staple of Nigerian student life, outdoor gatherings, and emergency meals).

Why does brand measurement matter? Without measurement, brand investments feel like costs, not assets. A brand manager might justify a ₦200 million annual TV spend only by attributing direct sales to it—a short-term view. But if that spend strengthens brand awareness by 3 percentage points among target consumers, and awareness drives a 1.5x purchase likelihood boost across a 10-year horizon, the true ROI is vastly higher. Brand metrics provide longitudinal signals that sales metrics miss. A brand could show flat sales this quarter (market contraction) while gaining awareness and preference (portending future sales growth). Conversely, sales could surge due to heavy discounting (eroding brand equity) while awareness declines (danger signal). Tracking brand health separately from sales avoids strategic myopia.

51.2 Brand Health Metrics and the Brand Funnel

The brand funnel conceptualises the customer journey as a series of stages, each with a conversion rate. At the top is brand awareness: what percentage of the target population has heard of the brand? Awareness branches into spontaneous (unaided) recall—consumer names the brand without prompting—and aided awareness (recognises the brand when shown). Below awareness is brand consideration: among those aware, how many are willing to try or buy the brand? Next is brand preference: how many actually prefer it to alternatives? Then comes purchase (trial among non-users, repeat among users), and finally loyalty: do consumers stick with the brand despite competitive offers?

Each stage is measurable through brand tracking surveys, typically administered monthly or quarterly to a panel of 500–2,000 respondents representative of the target market. A typical Nigerian bank’s brand funnel, with every stage expressed as a share of the total sample, might look like: awareness = 75%, consideration = 45%, preference = 28%, purchase/usage = 18%, loyalty (likelihood-to-recommend rating of 7–10) = 9%. The funnel identifies bottlenecks: if awareness is strong but consideration is weak, the barrier is most likely perception or product fit, not reach. If consideration is healthy but purchase is low, pricing or availability is the likely issue.
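The stage-to-stage arithmetic behind that diagnosis can be sketched in a few lines of Python. The percentages are the illustrative bank funnel quoted above; `stage_conversions` is a hypothetical helper written for this example, not a library function.

```python
# Illustrative funnel from the text (% of the total sample at each stage).
funnel = {
    "Awareness": 75,
    "Consideration": 45,
    "Preference": 28,
    "Usage": 18,
    "Loyalty": 9,
}

def stage_conversions(funnel_pct):
    """Return {"A -> B": conversion rate} for each pair of adjacent stages."""
    stages = list(funnel_pct)
    return {
        f"{a} -> {b}": funnel_pct[b] / funnel_pct[a]
        for a, b in zip(stages, stages[1:])
    }

conversions = stage_conversions(funnel)
for step, rate in conversions.items():
    print(f"{step}: {rate:.1%}")

# The step with the lowest conversion rate is the candidate bottleneck.
bottleneck = min(conversions, key=conversions.get)
print("Weakest conversion:", bottleneck)
```

In this example every step converts at between 50% and about 64%, so no stage stands out sharply; the usage-to-loyalty step is the weakest at 50%.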

Key metrics within the funnel include:

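Net Promoter Score (NPS): Consumers rate likelihood to recommend on a 0–10 scale. The percentage of Promoters (9–10) minus the percentage of Detractors (0–6) gives NPS (range: −100 to +100); Passives (7–8) count in the base but not in the score. As a rule of thumb, NPS > 50 is excellent, 0–50 is good, and a negative score is alarming.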
Repeat Purchase Rate: Among customers, what % repurchase within a defined period (e.g., 90 days)? Higher repeats signal loyalty.
Brand Loyalty Index: Multi-item scale (e.g., “I am loyal to [brand],” “I rarely switch brands,” “I recommend [brand] to others”) measured on a 1–7 Likert scale, averaged for a composite score.
Category Involvement: How much do consumers care about the category? High-involvement categories (cars, homes) require deeper funnel analysis; low-involvement (salt, matches) show faster funnel flow.
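As a hand-checkable illustration of the first and third metrics, the sketch below computes NPS from ten responses and a loyalty index from three Likert items. All responses are made up for the example, and `net_promoter_score` and `nps_band` are hypothetical helper names; the bands simply encode the rule of thumb quoted in the NPS item above.

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..100 scale."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / n

def nps_band(nps):
    """Map an NPS value to the rule-of-thumb bands quoted in the text."""
    if nps > 50:
        return "excellent"
    if nps >= 0:
        return "good"
    return "alarming"

# Ten hypothetical answers to the 0-10 "likelihood to recommend" question:
# 4 promoters (10, 9, 9, 10), 3 passives (8, 7, 7), 3 detractors (6, 5, 3).
responses = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
nps = net_promoter_score(responses)

# Three hypothetical 1-7 Likert loyalty items for one respondent.
loyalty_items = [6, 5, 7]
loyalty_index = sum(loyalty_items) / len(loyalty_items)

print(f"NPS = {nps:.0f} ({nps_band(nps)})")
print(f"Loyalty index = {loyalty_index:.1f}")
```

With 40% promoters and 30% detractors the score is 10, which falls in the "good" band even though a third of respondents are detractors; this is why the full distribution is worth inspecting alongside the headline number.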

📘 Theory: Aaker Brand Equity Framework

David Aaker’s model posits four pillars: (1) Brand Awareness (recognition and recall); (2) Brand Associations (attributes, benefits, and emotional links consumers hold); (3) Perceived Quality (consumer belief in product excellence); (4) Brand Loyalty (tendency to repurchase and resist competitive appeals). Each pillar is measured via survey items; factor analysis is used to check dimensionality. One simple illustrative composite treats awareness as a gate: brand equity = awareness × (perceived quality + associations + loyalty), so a brand nobody has heard of scores zero regardless of its other ratings. The framework acknowledges that equity varies by segment: affluent consumers value GTBank’s digital innovation, while rural consumers may prioritise accessibility.

🔑 Key Formulas: Brand Funnel and Loyalty Metrics

Brand Funnel Conversion: \[\text{Conversion Rate}_{i \to j} = \frac{\text{Consumers at Stage } j}{\text{Consumers at Stage } i}\]

Net Promoter Score: \[\text{NPS} = \% \text{Promoters}_{(9-10)} - \% \text{Detractors}_{(0-6)}\]

Brand Loyalty Index (Composite): \[\text{Loyalty Index} = \frac{1}{n} \sum_{i=1}^{n} \text{Item}_i \quad \text{(items on 1-7 Likert scale)}\]

library(tidyverse)
library(ggplot2)
library(scales)

# Synthetic brand tracking survey: 3 Nigerian banks
# Sample: 1,000 respondents, 3 competing banks (GTBank, UBA, Zenith)
set.seed(3174)

n_respondents <- 1000
banks <- c("GTBank", "UBA", "Zenith")
# Awareness, Consideration, Preference rates by bank
funnel_rates <- tibble(
  Bank = banks,
  Awareness_Pct = c(78, 72, 65),
  Consideration_Pct = c(52, 48, 38),
  Preference_Pct = c(31, 25, 18),
  Usage_Pct = c(22, 18, 12),
  Loyalty_NPS = c(58, 42, 28)
)

cat("=== Brand Funnel Data (% of total sample) ===\n")
#> === Brand Funnel Data (% of total sample) ===
print(funnel_rates)
#> # A tibble: 3 × 6
#>   Bank   Awareness_Pct Consideration_Pct Preference_Pct Usage_Pct Loyalty_NPS
#>   <chr>          <dbl>             <dbl>          <dbl>     <dbl>       <dbl>
#> 1 GTBank            78                52             31        22          58
#> 2 UBA               72                48             25        18          42
#> 3 Zenith            65                38             18        12          28

# Create funnel visualization
funnel_long <- funnel_rates |>
  select(Bank, Awareness_Pct, Consideration_Pct, Preference_Pct, Usage_Pct) |>
  pivot_longer(cols = -Bank, names_to = "Stage", values_to = "Percentage") |>
  mutate(
    Stage = factor(Stage, levels = c("Awareness_Pct", "Consideration_Pct",
                                      "Preference_Pct", "Usage_Pct"),
                   labels = c("Awareness", "Consideration", "Preference", "Usage")),
    Bank = factor(Bank, levels = banks)
  )

# Funnel chart
p_funnel <- ggplot(funnel_long, aes(x = Stage, y = Percentage, group = Bank,
                                    colour = Bank, shape = Bank)) +
  geom_point(size = 4) +
  geom_line(linewidth = 1.2) +
  scale_colour_manual(
    values = c("GTBank" = "#0066CC", "UBA" = "#FF3333", "Zenith" = "#009999")
  ) +
  scale_shape_manual(
    values = c("GTBank" = 16, "UBA" = 17, "Zenith" = 18)
  ) +
  labs(
    title = "Brand Funnel: Nigerian Banks",
    x = "Funnel Stage",
    y = "% of Target Population",
    colour = "Bank",
    shape = "Bank",
    caption = "Sample: 1,000 target consumers"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_funnel)


# Detailed breakdown for one bank (GTBank)
gtbank_respondents <- tibble(
  Respondent_ID = 1:n_respondents,
  Bank = "GTBank"
)

# Assign funnel stage based on probabilities
set.seed(3174)
gtbank_respondents <- gtbank_respondents |>
  mutate(
    Awareness = rbinom(n(), 1, 0.78),
    Consideration = ifelse(Awareness == 1, rbinom(n(), 1, 0.52/0.78), 0),
    Preference = ifelse(Consideration == 1, rbinom(n(), 1, 0.31/0.52), 0),
    Usage = ifelse(Preference == 1, rbinom(n(), 1, 0.22/0.31), 0)
  )

# Count respondents by the furthest funnel stage reached (mutually exclusive groups)
funnel_counts <- tibble(
  Stage = c("Unaware", "Aware", "Consideration", "Preference", "Usage"),
  Count = c(
    sum(gtbank_respondents$Awareness == 0),
    sum(gtbank_respondents$Awareness == 1) - sum(gtbank_respondents$Consideration == 1),
    sum(gtbank_respondents$Consideration == 1) - sum(gtbank_respondents$Preference == 1),
    sum(gtbank_respondents$Preference == 1) - sum(gtbank_respondents$Usage == 1),
    sum(gtbank_respondents$Usage == 1)
  ),
  Percentage = c(
    sum(gtbank_respondents$Awareness == 0) / n_respondents * 100,
    (sum(gtbank_respondents$Awareness == 1) - sum(gtbank_respondents$Consideration == 1)) / n_respondents * 100,
    (sum(gtbank_respondents$Consideration == 1) - sum(gtbank_respondents$Preference == 1)) / n_respondents * 100,
    (sum(gtbank_respondents$Preference == 1) - sum(gtbank_respondents$Usage == 1)) / n_respondents * 100,
    sum(gtbank_respondents$Usage == 1) / n_respondents * 100
  )
)

cat("\n=== GTBank Detailed Funnel Breakdown (N=1000) ===\n")
#> 
#> === GTBank Detailed Funnel Breakdown (N=1000) ===
print(funnel_counts)
#> # A tibble: 5 × 3
#>   Stage         Count Percentage
#>   <chr>         <int>      <dbl>
#> 1 Unaware         219       21.9
#> 2 Aware           267       26.7
#> 3 Consideration   220       22  
#> 4 Preference       89        8.9
#> 5 Usage           205       20.5

# Waterfall visualization
funnel_counts_ordered <- funnel_counts |>
  filter(Stage != "Unaware") |>
  arrange(match(Stage, c("Aware", "Consideration", "Preference", "Usage")))

p_waterfall <- ggplot(funnel_counts_ordered, aes(x = factor(Stage, levels = c("Aware", "Consideration", "Preference", "Usage")),
                                                  y = Count)) +
  geom_col(fill = "#0066CC", colour = "black", linewidth = 0.5) +
  geom_text(aes(label = paste0(Count, "\n(", round(Percentage, 1), "%)")),
            vjust = -0.5, size = 3) +
  labs(
    title = "GTBank Brand Funnel Waterfall (N=1000)",
    x = "Funnel Stage",
    y = "Count of Respondents",
    caption = "Each bar counts respondents whose furthest stage is the one shown"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_waterfall)


# NPS and Loyalty metrics (synthetic)
gtbank_nps_data <- tibble(
  Respondent_ID = 1:n_respondents,
  Bank = "GTBank",
  # NPS: score on 0-10 scale
  NPS_Score = rnorm(n_respondents, mean = 7.8, sd = 2.2) |> pmax(0) |> pmin(10) |> round(),
  # Loyalty items (1-7 Likert)
  Loyalty_Item1 = sample(1:7, n_respondents, replace = TRUE, prob = c(0.02, 0.04, 0.08, 0.15, 0.25, 0.30, 0.16)),
  Loyalty_Item2 = sample(1:7, n_respondents, replace = TRUE, prob = c(0.03, 0.05, 0.10, 0.20, 0.22, 0.25, 0.15)),
  Loyalty_Item3 = sample(1:7, n_respondents, replace = TRUE, prob = c(0.04, 0.06, 0.12, 0.18, 0.20, 0.28, 0.12))
) |>
  mutate(
    NPS_Category = case_when(
      NPS_Score >= 9 ~ "Promoter",
      NPS_Score >= 7 ~ "Passive",
      TRUE ~ "Detractor"
    ),
    Loyalty_Index = (Loyalty_Item1 + Loyalty_Item2 + Loyalty_Item3) / 3
  )

# Calculate NPS
nps_promoters <- sum(gtbank_nps_data$NPS_Score >= 9) / n_respondents * 100
nps_detractors <- sum(gtbank_nps_data$NPS_Score <= 6) / n_respondents * 100
nps_score <- nps_promoters - nps_detractors

cat("\n=== GTBank NPS Metrics ===\n")
#> 
#> === GTBank NPS Metrics ===
cat("Promoters (9-10):", round(nps_promoters, 1), "%\n")
#> Promoters (9-10): 38.5 %
cat("Passives (7-8):", round(100 - nps_promoters - nps_detractors, 1), "%\n")
#> Passives (7-8): 32 %
cat("Detractors (0-6):", round(nps_detractors, 1), "%\n")
#> Detractors (0-6): 29.5 %
cat("Net Promoter Score (NPS):", round(nps_score, 1), "\n")
#> Net Promoter Score (NPS): 9

# Loyalty index distribution
cat("\n=== Brand Loyalty Index Distribution ===\n")
#> 
#> === Brand Loyalty Index Distribution ===
cat("Mean:", round(mean(gtbank_nps_data$Loyalty_Index), 2), "\n")
#> Mean: 4.88
cat("Std Dev:", round(sd(gtbank_nps_data$Loyalty_Index), 2), "\n")
#> Std Dev: 0.86
cat("Median:", round(median(gtbank_nps_data$Loyalty_Index), 2), "\n")
#> Median: 5

# Visualization of NPS distribution
p_nps <- ggplot(gtbank_nps_data, aes(x = NPS_Score)) +
  geom_histogram(aes(fill = NPS_Category), binwidth = 1, colour = "black", linewidth = 0.3) +
  scale_fill_manual(values = c("Promoter" = "#2ca02c", "Passive" = "#ff7f0e", "Detractor" = "#d62728")) +
  labs(
    title = "GTBank NPS Distribution (N=1000)",
    x = "Likelihood to Recommend (0-10)",
    y = "Count",
    fill = "NPS Category",
    caption = paste0("NPS = Promoters - Detractors = ", round(nps_score, 1))
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_nps)


# Loyalty by usage group
loyalty_by_usage <- gtbank_respondents |>
  select(Respondent_ID, Usage) |>
  mutate(Usage_Group = ifelse(Usage == 1, "Active Users", "Non-Users")) |>
  bind_cols(gtbank_nps_data |> select(NPS_Score, Loyalty_Index)) |>
  group_by(Usage_Group) |>
  summarise(
    Mean_NPS = mean(NPS_Score),
    Mean_Loyalty = mean(Loyalty_Index),
    N = n()
  )

cat("\n=== Loyalty by Usage Status ===\n")
#> 
#> === Loyalty by Usage Status ===
print(loyalty_by_usage)
#> # A tibble: 2 × 4
#>   Usage_Group  Mean_NPS Mean_Loyalty     N
#>   <chr>           <dbl>        <dbl> <int>
#> 1 Active Users     7.70         4.97   205
#> 2 Non-Users        7.57         4.86   795

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(3174)

# Brand funnel data: 3 Nigerian banks
n_respondents = 1000
banks_data = pd.DataFrame({
    'Bank': ['GTBank', 'UBA', 'Zenith'],
    'Awareness_Pct': [78, 72, 65],
    'Consideration_Pct': [52, 48, 38],
    'Preference_Pct': [31, 25, 18],
    'Usage_Pct': [22, 18, 12],
    'Loyalty_NPS': [58, 42, 28]
})

print("=== Brand Funnel Data (% of total sample) ===")
#> === Brand Funnel Data (% of total sample) ===
print(banks_data.to_string(index=False))
#>   Bank  Awareness_Pct  Consideration_Pct  Preference_Pct  Usage_Pct  Loyalty_NPS
#> GTBank             78                 52              31         22           58
#>    UBA             72                 48              25         18           42
#> Zenith             65                 38              18         12           28

# Funnel visualization
fig, ax = plt.subplots(figsize=(11, 6))

stages = ['Awareness_Pct', 'Consideration_Pct', 'Preference_Pct', 'Usage_Pct']
stage_labels = ['Awareness', 'Consideration', 'Preference', 'Usage']
colors = ['#0066CC', '#FF3333', '#009999']
markers = ['o', 's', '^']

for idx, bank in enumerate(banks_data['Bank'].values):
    values = banks_data.loc[idx, stages].values
    ax.plot(stage_labels, values, marker=markers[idx], markersize=10, linewidth=2.5,
           label=bank, color=colors[idx])

ax.set_xlabel('Funnel Stage', fontsize=11)
ax.set_ylabel('% of Target Population', fontsize=11)
ax.set_title('Brand Funnel: Nigerian Banks', fontsize=13, fontweight='bold')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()


# Detailed GTBank funnel simulation
np.random.seed(3174)
gtbank_funnel = pd.DataFrame({
    'Respondent_ID': np.arange(1, n_respondents + 1)
})

# Assign funnel stages
gtbank_funnel['Awareness'] = np.random.binomial(1, 0.78, n_respondents)
gtbank_funnel['Consideration'] = gtbank_funnel['Awareness'].apply(
    lambda x: np.random.binomial(1, 0.52/0.78) if x == 1 else 0
)
gtbank_funnel['Preference'] = gtbank_funnel['Consideration'].apply(
    lambda x: np.random.binomial(1, 0.31/0.52) if x == 1 else 0
)
gtbank_funnel['Usage'] = gtbank_funnel['Preference'].apply(
    lambda x: np.random.binomial(1, 0.22/0.31) if x == 1 else 0
)

# Count respondents by the furthest funnel stage reached (mutually exclusive groups)
funnel_counts = pd.DataFrame({
    'Stage': ['Unaware', 'Aware', 'Consideration', 'Preference', 'Usage'],
    'Count': [
        (gtbank_funnel['Awareness'] == 0).sum(),
        (gtbank_funnel['Awareness'] == 1).sum() - (gtbank_funnel['Consideration'] == 1).sum(),
        (gtbank_funnel['Consideration'] == 1).sum() - (gtbank_funnel['Preference'] == 1).sum(),
        (gtbank_funnel['Preference'] == 1).sum() - (gtbank_funnel['Usage'] == 1).sum(),
        (gtbank_funnel['Usage'] == 1).sum()
    ]
})
funnel_counts['Percentage'] = funnel_counts['Count'] / n_respondents * 100

print("\n=== GTBank Detailed Funnel Breakdown (N=1000) ===")
#> 
#> === GTBank Detailed Funnel Breakdown (N=1000) ===
print(funnel_counts.to_string(index=False))
#>         Stage  Count  Percentage
#>       Unaware    218        21.8
#>         Aware    262        26.2
#> Consideration    207        20.7
#>    Preference     91         9.1
#>         Usage    222        22.2

# Waterfall visualization
fig, ax = plt.subplots(figsize=(10, 6))
stages_ordered = ['Aware', 'Consideration', 'Preference', 'Usage']
counts_ordered = funnel_counts[funnel_counts['Stage'] != 'Unaware']['Count'].values
percentages = funnel_counts[funnel_counts['Stage'] != 'Unaware']['Percentage'].values

ax.bar(stages_ordered, counts_ordered, color='#0066CC', edgecolor='black', linewidth=1)
for i, (count, pct) in enumerate(zip(counts_ordered, percentages)):
    ax.text(i, count + 15, f'{int(count)}\n({pct:.1f}%)', ha='center', fontsize=9)

ax.set_xlabel('Funnel Stage', fontsize=11)
ax.set_ylabel('Count of Respondents', fontsize=11)
ax.set_title('GTBank Brand Funnel Waterfall (N=1000)', fontsize=13, fontweight='bold')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()


# NPS and Loyalty metrics
np.random.seed(3174)
gtbank_nps = pd.DataFrame({
    'Respondent_ID': np.arange(1, n_respondents + 1),
    'NPS_Score': np.clip(np.round(np.random.normal(7.8, 2.2, n_respondents)), 0, 10).astype(int),
})

# Loyalty items (1-7)
gtbank_nps['Loyalty_Item1'] = np.random.choice([1, 2, 3, 4, 5, 6, 7], n_respondents,
                                               p=[0.02, 0.04, 0.08, 0.15, 0.25, 0.30, 0.16])
gtbank_nps['Loyalty_Item2'] = np.random.choice([1, 2, 3, 4, 5, 6, 7], n_respondents,
                                               p=[0.03, 0.05, 0.10, 0.20, 0.22, 0.25, 0.15])
gtbank_nps['Loyalty_Item3'] = np.random.choice([1, 2, 3, 4, 5, 6, 7], n_respondents,
                                               p=[0.04, 0.06, 0.12, 0.18, 0.20, 0.28, 0.12])

gtbank_nps['NPS_Category'] = gtbank_nps['NPS_Score'].apply(
    lambda x: 'Promoter' if x >= 9 else ('Passive' if x >= 7 else 'Detractor')
)
gtbank_nps['Loyalty_Index'] = (gtbank_nps['Loyalty_Item1'] +
                               gtbank_nps['Loyalty_Item2'] +
                               gtbank_nps['Loyalty_Item3']) / 3

# Calculate NPS
nps_promoters_pct = (gtbank_nps['NPS_Score'] >= 9).sum() / n_respondents * 100
nps_detractors_pct = (gtbank_nps['NPS_Score'] <= 6).sum() / n_respondents * 100
nps_score = nps_promoters_pct - nps_detractors_pct

print("\n=== GTBank NPS Metrics ===")
#> 
#> === GTBank NPS Metrics ===
print(f"Promoters (9-10): {nps_promoters_pct:.1f}%")
#> Promoters (9-10): 34.0%
print(f"Passives (7-8): {100 - nps_promoters_pct - nps_detractors_pct:.1f}%")
#> Passives (7-8): 36.2%
print(f"Detractors (0-6): {nps_detractors_pct:.1f}%")
#> Detractors (0-6): 29.8%
print(f"Net Promoter Score (NPS): {nps_score:.1f}")
#> Net Promoter Score (NPS): 4.2

# Loyalty index stats
print("\n=== Brand Loyalty Index Distribution ===")
#> 
#> === Brand Loyalty Index Distribution ===
print(f"Mean: {gtbank_nps['Loyalty_Index'].mean():.2f}")
#> Mean: 4.87
print(f"Std Dev: {gtbank_nps['Loyalty_Index'].std():.2f}")
#> Std Dev: 0.92
print(f"Median: {gtbank_nps['Loyalty_Index'].median():.2f}")
#> Median: 5.00

# NPS distribution
fig, ax = plt.subplots(figsize=(10, 6))
nps_colors = {'Promoter': '#2ca02c', 'Passive': '#ff7f0e', 'Detractor': '#d62728'}
gtbank_nps['Color'] = gtbank_nps['NPS_Category'].map(nps_colors)

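# Draw one histogram per NPS category so each band gets its own colour
for category, color in nps_colors.items():
    subset = gtbank_nps.loc[gtbank_nps['NPS_Category'] == category, 'NPS_Score']
    ax.hist(subset, bins=range(0, 12), color=color, label=category,
            edgecolor='black', linewidth=0.5, alpha=0.7)
ax.legend(title='NPS Category')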
ax.set_xlabel('Likelihood to Recommend (0-10)', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title('GTBank NPS Distribution (N=1000)', fontsize=13, fontweight='bold')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()


# Loyalty by usage
loyalty_by_usage = gtbank_funnel[['Usage']].copy()
loyalty_by_usage['NPS_Score'] = gtbank_nps['NPS_Score']
loyalty_by_usage['Loyalty_Index'] = gtbank_nps['Loyalty_Index']
loyalty_by_usage['Usage_Group'] = loyalty_by_usage['Usage'].apply(
    lambda x: 'Active Users' if x == 1 else 'Non-Users'
)

loyalty_summary = loyalty_by_usage.groupby('Usage_Group').agg({
    'NPS_Score': 'mean',
    'Loyalty_Index': 'mean',
    'Usage': 'count'
}).rename(columns={'Usage': 'N'})

print("\n=== Loyalty by Usage Status ===")
#> 
#> === Loyalty by Usage Status ===
print(loyalty_summary.to_string())
#>               NPS_Score  Loyalty_Index    N
#> Usage_Group                                
#> Active Users   7.662162       4.846847  222
#> Non-Users      7.426735       4.876607  778

📝 Section 51.2 Review Questions

1. Funnel Conversion Bottlenecks If a brand has 80% awareness but only 30% consideration, where is the problem? Name three possible causes and how you’d investigate each.

2. NPS Interpretation A bank’s NPS is 42. Is this good or bad? How would you benchmark it against competitors? What actions would you take to improve NPS by 10 points?

3. Loyalty vs Awareness A newly launched product has 60% awareness but a loyalty index of only 3.0 (on the 1–7 scale). An established brand has 45% awareness but a loyalty index of 6.5. Which is in a stronger position? Why?

4. Segment-Specific Funnels Would you expect different brand funnel shapes across demographic segments (age, urban/rural, income) in Nigeria? Design a segmented funnel analysis.

5. Leading vs Lagging Indicators Is awareness a leading or lagging indicator of sales? What would rapid awareness growth without sales growth signal?

51.3 Brand Tracking Survey Analysis and Mixed-Effects Models

Brand tracking surveys are conducted periodically (monthly or quarterly) with a panel of respondents, often the same individuals across waves. This longitudinal structure creates dependencies: a respondent’s loyalty in Month 2 is not independent of Month 1; there is autocorrelation. Ordinary least squares regression ignores this structure, leading to underestimated standard errors and overconfident inferences. Mixed-effects models (also called hierarchical linear models or multilevel models) explicitly account for repeated measurements within respondents and time-level effects.

The mixed-effects specification is:

\[\text{Loyalty}_{t,i} = \beta_0 + u_i + \beta_1 \times \text{Time}_t + \beta_2 \times \text{Marketing Spend}_t + \epsilon_{t,i}\]

where \(\text{Loyalty}_{t,i}\) is the loyalty index for respondent \(i\) at month \(t\), \(\beta_0\) is the fixed intercept, \(u_i \sim \text{Normal}(0, \sigma_u^2)\) is a random intercept for respondent \(i\) (capturing individual differences in baseline loyalty), \(\beta_1\) is the fixed trend (time effect), \(\beta_2\) is the effect of marketing spend on loyalty, and \(\epsilon_{t,i}\) is the within-respondent residual. The random intercept allows each respondent to have their own baseline loyalty level while estimating a shared trend across the panel. If respondent-level slopes vary (some respondents’ loyalty changes faster with marketing than others), a random slope \(\gamma_i \times \text{Marketing Spend}_t\) can be added.

Estimation uses Restricted Maximum Likelihood (REML) or Maximum Likelihood (ML), implemented in R via lme4 and in Python via statsmodels (MixedLM). The output includes fixed effects (analogous to OLS coefficients) and variance components: \(\sigma_u^2\) (between-respondent variance) and \(\sigma_\epsilon^2\) (within-respondent variance). The intraclass correlation, \(\text{ICC} = \sigma_u^2 / (\sigma_u^2 + \sigma_\epsilon^2)\), quantifies the proportion of variance attributable to respondent differences; as a common rule of thumb, ICC > 0.1 is taken to justify mixed-effects modelling over OLS.
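To make the variance components concrete, the sketch below simulates a balanced panel with a random intercept per respondent and recovers the ICC. It is a minimal illustration under stated assumptions: the panel size, number of waves, and both standard deviations are invented for the example, there is no time trend or marketing term, and the estimator is the classical one-way ANOVA (method-of-moments) estimator, not the REML fit that lme4 or MixedLM would produce.

```python
import random
import statistics

random.seed(42)

N_RESPONDENTS = 150   # assumed panel size
N_WAVES = 6           # assumed number of monthly waves
SIGMA_U = 0.8         # assumed SD of respondent random intercepts
SIGMA_E = 0.5         # assumed SD of within-respondent residuals

# Simulate loyalty = 4.0 + u_i + e_ti: one intercept per respondent,
# fresh noise for every wave.
panel = []
for _ in range(N_RESPONDENTS):
    u_i = random.gauss(0, SIGMA_U)
    panel.append([4.0 + u_i + random.gauss(0, SIGMA_E) for _ in range(N_WAVES)])

# One-way ANOVA mean squares for a balanced panel.
grand_mean = statistics.fmean(y for row in panel for y in row)
row_means = [statistics.fmean(row) for row in panel]
ms_between = N_WAVES * sum((m - grand_mean) ** 2 for m in row_means) / (N_RESPONDENTS - 1)
ms_within = sum(
    (y - m) ** 2 for row, m in zip(panel, row_means) for y in row
) / (N_RESPONDENTS * (N_WAVES - 1))

# Method-of-moments variance components (between-variance floored at zero).
var_e = ms_within
var_u = max((ms_between - ms_within) / N_WAVES, 0.0)
icc = var_u / (var_u + var_e)

true_icc = SIGMA_U**2 / (SIGMA_U**2 + SIGMA_E**2)
print(f"Estimated ICC: {icc:.3f} (simulated truth: {true_icc:.3f})")
```

Because each respondent keeps the same intercept in every wave, most of the simulated variance sits between respondents, and the estimate should land near the simulated value of about 0.72. An estimate at or near zero on real data would suggest that the grouping factor is not capturing respondent-level heterogeneity.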

📘 Theory: Mixed-Effects Models for Longitudinal Data

The mixed-effects model partitions variance into between-unit (respondent) and within-unit (time, residual) components. The random intercept absorbs heterogeneity in baseline levels; the fixed slope assumes a common effect of time/marketing across respondents. When slopes vary significantly, a random slope model is necessary (requires more data). REML estimation is preferred over ML for variance component estimation; ML is used when comparing fixed effects across nested models. Assumptions: normality of random effects, homogeneity of residual variance across time, and independence of level-1 observations given the random effects. Violations (e.g., heteroscedasticity by time) require adjustment.

🔑 Key Formula: Random Intercept Mixed-Effects Model

\[\text{Outcome}_{t,i} = (\beta_0 + u_i) + \beta_1 X_{t} + \epsilon_{t,i}\]

where \(u_i \sim \text{Normal}(0, \sigma_u^2)\) and \(\epsilon_{t,i} \sim \text{Normal}(0, \sigma_\epsilon^2)\)

Intraclass Correlation: \[\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\epsilon^2}\]

library(tidyverse)
library(lme4)
library(lmerTest)
library(ggplot2)

# Simulate 6-month brand tracking panel: 150 respondents, 3 banks
set.seed(5821)

n_respondents <- 150
n_months <- 6
banks <- c("GTBank", "UBA", "Zenith")
n_banks <- length(banks)

# Create panel structure
panel_data <- expand_grid(
  Month = 1:n_months,
  Bank = banks,
  Respondent_ID = 1:n_respondents
) |>
  mutate(
    Observation_ID = row_number(),
    # Random intercepts by respondent-bank combination
    Respondent_Bank_ID = paste0(Respondent_ID, "_", Bank)
  )

# Simulate loyalty scores (1-7 scale)
set.seed(5821)
panel_data <- panel_data |>
  mutate(
    # Bank effect (fixed)
    Bank_Effect = case_when(
      Bank == "GTBank" ~ 1.2,
      Bank == "UBA" ~ 0.5,
      Bank == "Zenith" ~ -0.3
    ),
    # Time trend (campaign effect)
    Time_Trend = Month * 0.15,
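    # Respondent-level effect. Caution: sort() orders the draws by row (Month, then
    # Bank), so this term tracks Month instead of respondent identity, and
    # ceiling() rounds each draw up to an integer
    Respondent_Effect = rep(rnorm(n_respondents * n_banks, 0, 0.8), n_months) |> sort() |>
                        ceiling(),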
    # Marketing spend (hypothetical, in ₦m)
    Marketing_Spend = case_when(
      Bank == "GTBank" ~ 50 + 20 * sin(Month * pi / 3),
      Bank == "UBA" ~ 30 + 15 * cos(Month * pi / 3),
      Bank == "Zenith" ~ 20 + 10 * sin(Month * pi / 4)
    ),
    # Marketing effect (scaled)
    Marketing_Effect = Marketing_Spend * 0.01,
    # Noise
    Noise = rnorm(n(), 0, 0.5),
    # Loyalty score
    Loyalty_Score = 4.0 + Bank_Effect + Time_Trend + Respondent_Effect +
                    Marketing_Effect + Noise
  ) |>
  mutate(
    Loyalty_Score = pmax(pmin(Loyalty_Score, 7), 1) |> round(1)
  ) |>
  select(Observation_ID, Respondent_ID, Respondent_Bank_ID, Month, Bank,
         Loyalty_Score, Marketing_Spend, Time_Trend)

cat("=== Panel Data Summary (First 10 Rows) ===\n")
#> === Panel Data Summary (First 10 Rows) ===
print(head(panel_data, 10))
#> # A tibble: 10 × 8
#>    Observation_ID Respondent_ID Respondent_Bank_ID Month Bank   Loyalty_Score
#>             <int>         <int> <chr>              <int> <chr>          <dbl>
#>  1              1             1 1_GTBank               1 GTBank           4.1
#>  2              2             2 2_GTBank               1 GTBank           3.9
#>  3              3             3 3_GTBank               1 GTBank           4.1
#>  4              4             4 4_GTBank               1 GTBank           4.2
#>  5              5             5 5_GTBank               1 GTBank           4  
#>  6              6             6 6_GTBank               1 GTBank           3.8
#>  7              7             7 7_GTBank               1 GTBank           3.5
#>  8              8             8 8_GTBank               1 GTBank           5.2
#>  9              9             9 9_GTBank               1 GTBank           3.5
#> 10             10            10 10_GTBank              1 GTBank           4.3
#> # ℹ 2 more variables: Marketing_Spend <dbl>, Time_Trend <dbl>

# Fit OLS for comparison
ols_model <- lm(Loyalty_Score ~ Month + Marketing_Spend + Bank,
                data = panel_data)

cat("\n=== OLS Regression (ignores panel structure) ===\n")
#> 
#> === OLS Regression (ignores panel structure) ===
print(summary(ols_model))
#> 
#> Call:
#> lm(formula = Loyalty_Score ~ Month + Marketing_Spend + Bank, 
#>     data = panel_data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.18577 -0.39887  0.01943  0.40113  1.87166 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)      4.509128   0.065985  68.336   <2e-16 ***
#> Month            0.492424   0.006613  74.466   <2e-16 ***
#> Marketing_Spend  0.001883   0.001018   1.851   0.0643 .  
#> BankUBA         -0.638333   0.032667 -19.541   <2e-16 ***
#> BankZenith      -1.151608   0.038899 -29.605   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.542 on 2695 degrees of freedom
#> Multiple R-squared:  0.7618, Adjusted R-squared:  0.7614 
#> F-statistic:  2155 on 4 and 2695 DF,  p-value: < 2.2e-16

# Fit mixed-effects model with random intercept by respondent
me_model <- lmer(Loyalty_Score ~ Month + Marketing_Spend + Bank + (1 | Respondent_ID),
                 data = panel_data, REML = TRUE)

cat("\n=== Mixed-Effects Model (with random intercept) ===\n")
#> 
#> === Mixed-Effects Model (with random intercept) ===
print(summary(me_model))
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Loyalty_Score ~ Month + Marketing_Spend + Bank + (1 | Respondent_ID)
#>    Data: panel_data
#> 
#> REML criterion at convergence: 4388.9
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.0327 -0.7359  0.0359  0.7401  3.4531 
#> 
#> Random effects:
#>  Groups        Name        Variance Std.Dev.
#>  Respondent_ID (Intercept) 0.0000   0.000   
#>  Residual                  0.2938   0.542   
#> Number of obs: 2700, groups:  Respondent_ID, 150
#> 
#> Fixed effects:
#>                   Estimate Std. Error         df t value Pr(>|t|)    
#> (Intercept)      4.509e+00  6.599e-02  2.695e+03  68.336   <2e-16 ***
#> Month            4.924e-01  6.613e-03  2.695e+03  74.466   <2e-16 ***
#> Marketing_Spend  1.883e-03  1.018e-03  2.695e+03   1.851   0.0643 .  
#> BankUBA         -6.383e-01  3.267e-02  2.695e+03 -19.541   <2e-16 ***
#> BankZenith      -1.152e+00  3.890e-02  2.695e+03 -29.605   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) Month  Mrkt_S BnkUBA
#> Month       -0.646                     
#> Mrktng_Spnd -0.906  0.383              
#> BankUBA     -0.716  0.239  0.623       
#> BankZenith  -0.810  0.289  0.754  0.727
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

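# Extract variance components
# NOTE: the two SDs below are hard-coded and do not match the singular fit reported
# above (respondent variance 0, residual SD 0.542). To use the fitted values instead,
# read them from as.data.frame(VarCorr(me_model)).
var_components <- tibble(
  Component = c("Respondent", "Residual"),
  Variance = c(
    (0.628)^2,  # hard-coded SD of random intercept (not taken from this fit)
    (0.488)^2   # hard-coded residual SD (not taken from this fit)
  )
)

icc <- var_components$Variance[1] / sum(var_components$Variance)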
cat("\n=== Variance Components ===\n")
#> 
#> === Variance Components ===
cat("Respondent variance (σ_u^2):", round(var_components$Variance[1], 3), "\n")
#> Respondent variance (σ_u^2): 0.394
cat("Residual variance (σ_ε^2):", round(var_components$Variance[2], 3), "\n")
#> Residual variance (σ_ε^2): 0.238
cat("Intraclass Correlation (ICC):", round(icc, 3), "\n")
#> Intraclass Correlation (ICC): 0.624
cat("Interpretation: ", round(icc * 100, 1), "% of variance is between respondents\n", sep = "")
#> Interpretation: 62.4% of variance is between respondents

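# Compare model fits
# Caveat: me_model was fitted by REML, so its AIC is not directly comparable with the
# OLS AIC. For a like-for-like comparison, refit the mixed model with REML = FALSE.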
cat("\n=== Model Comparison ===\n")
#> 
#> === Model Comparison ===
aic_ols <- AIC(ols_model)
aic_me <- AIC(me_model)
cat("OLS AIC:", round(aic_ols, 2), "\n")
#> OLS AIC: 4361.98
cat("Mixed-Effects AIC:", round(aic_me, 2), "\n")
#> Mixed-Effects AIC: 4402.85
cat("Difference:", round(aic_ols - aic_me, 2), "(lower is better)\n")
#> Difference: -40.87 (lower is better)

# Plot trajectories by bank
panel_by_bank <- panel_data |>
  group_by(Bank, Month) |>
  summarise(Mean_Loyalty = mean(Loyalty_Score), .groups = "drop")

p_trajectory <- ggplot(panel_by_bank, aes(x = Month, y = Mean_Loyalty, colour = Bank)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_colour_manual(values = c("GTBank" = "#0066CC", "UBA" = "#FF3333", "Zenith" = "#009999")) +
  labs(
    title = "Brand Loyalty Trajectory (Observed Monthly Means)",
    x = "Month",
    y = "Mean Loyalty Score (1-7)",
    colour = "Bank",
    caption = "Raw means by bank and month; not adjusted for marketing spend or respondent effects"
  ) +
  ylim(1, 7) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_trajectory)

# Predicted values from mixed-effects model
panel_data$Predicted <- predict(me_model, re.form = NA)  # Population-level prediction

# Plot actual vs predicted
p_actual_pred <- ggplot(panel_data |> filter(Bank == "GTBank"),
                        aes(x = Predicted, y = Loyalty_Score)) +
  geom_point(alpha = 0.4, size = 2) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", colour = "red") +
  labs(
    title = "GTBank: Actual vs Predicted Loyalty (Mixed-Effects Model)",
    x = "Predicted Loyalty",
    y = "Actual Loyalty",
    caption = "Points close to diagonal indicate good fit"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_actual_pred)

# Random intercepts (shrinkage estimation)
random_intercepts <- ranef(me_model)$Respondent_ID |>
  rownames_to_column("Respondent_ID") |>
  rename(Random_Intercept = "(Intercept)") |>
  mutate(Respondent_ID = as.numeric(Respondent_ID))

cat("\n=== Random Intercepts (Sample of 10) ===\n")
#> 
#> === Random Intercepts (Sample of 10) ===
print(head(random_intercepts, 10))
#>    Respondent_ID Random_Intercept
#> 1              1                0
#> 2              2                0
#> 3              3                0
#> 4              4                0
#> 5              5                0
#> 6              6                0
#> 7              7                0
#> 8              8                0
#> 9              9                0
#> 10            10                0

# Distribution of random intercepts
p_random_int <- ggplot(random_intercepts, aes(x = Random_Intercept)) +
  geom_histogram(fill = "#2ca02c", colour = "black", linewidth = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "red", linewidth = 1) +
  labs(
    title = "Distribution of Random Intercepts (Individual Differences in Baseline Loyalty)",
    x = "Random Intercept (deviation from population mean)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_random_int)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.regression.linear_model import OLS
from statsmodels.formula.api import mixedlm, ols

np.random.seed(5821)

# Simulate 6-month brand tracking panel
n_respondents = 150
n_months = 6
banks = ['GTBank', 'UBA', 'Zenith']

# Create panel structure
panel_list = []
for month in range(1, n_months + 1):
    for bank in banks:
        for resp_id in range(1, n_respondents + 1):
            panel_list.append({
                'Month': month,
                'Bank': bank,
                'Respondent_ID': resp_id
            })

panel_data = pd.DataFrame(panel_list)

# Simulate loyalty scores
np.random.seed(5821)

# Bank effects
bank_effects = {'GTBank': 1.2, 'UBA': 0.5, 'Zenith': -0.3}
panel_data['Bank_Effect'] = panel_data['Bank'].map(bank_effects)

# Time trend
panel_data['Time_Trend'] = panel_data['Month'] * 0.15

# Random respondent effects (draw once per respondent, repeat for all months/banks)
respondent_effects = {}
for resp_id in range(1, n_respondents + 1):
    respondent_effects[resp_id] = np.random.normal(0, 0.8)
panel_data['Respondent_Effect'] = panel_data['Respondent_ID'].map(respondent_effects)

# Marketing spend
def get_marketing_spend(row):
    if row['Bank'] == 'GTBank':
        return 50 + 20 * np.sin(row['Month'] * np.pi / 3)
    elif row['Bank'] == 'UBA':
        return 30 + 15 * np.cos(row['Month'] * np.pi / 3)
    else:
        return 20 + 10 * np.sin(row['Month'] * np.pi / 4)

panel_data['Marketing_Spend'] = panel_data.apply(get_marketing_spend, axis=1)
panel_data['Marketing_Effect'] = panel_data['Marketing_Spend'] * 0.01

# Noise
panel_data['Noise'] = np.random.normal(0, 0.5, len(panel_data))

# Loyalty score
panel_data['Loyalty_Score'] = (4.0 + panel_data['Bank_Effect'] +
                               panel_data['Time_Trend'] + panel_data['Respondent_Effect'] +
                               panel_data['Marketing_Effect'] + panel_data['Noise'])
panel_data['Loyalty_Score'] = panel_data['Loyalty_Score'].clip(1, 7).round(1)

print("=== Panel Data Summary (First 10 Rows) ===")
#> === Panel Data Summary (First 10 Rows) ===
print(panel_data[['Month', 'Bank', 'Respondent_ID', 'Loyalty_Score', 'Marketing_Spend']].head(10))
#>    Month    Bank  Respondent_ID  Loyalty_Score  Marketing_Spend
#> 0      1  GTBank              1            5.5        67.320508
#> 1      1  GTBank              2            5.1        67.320508
#> 2      1  GTBank              3            6.8        67.320508
#> 3      1  GTBank              4            7.0        67.320508
#> 4      1  GTBank              5            5.1        67.320508
#> 5      1  GTBank              6            6.7        67.320508
#> 6      1  GTBank              7            5.1        67.320508
#> 7      1  GTBank              8            6.9        67.320508
#> 8      1  GTBank              9            4.9        67.320508
#> 9      1  GTBank             10            7.0        67.320508

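# OLS regression
# NOTE: C(Marketing_Spend) treats spend as categorical, so the model estimates one
# dummy per distinct spend level and the design matrix is singular (see note [2] in
# the output). Use plain Marketing_Spend to match the mixed-effects formula below.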
ols_model = ols('Loyalty_Score ~ Month + C(Marketing_Spend) + C(Bank)',
               data=panel_data).fit()

print("\n=== OLS Regression (ignores panel structure) ===")
#> 
#> === OLS Regression (ignores panel structure) ===
print(ols_model.summary())
#>                             OLS Regression Results                            
#> ==============================================================================
#> Dep. Variable:          Loyalty_Score   R-squared:                       0.415
#> Model:                            OLS   Adj. R-squared:                  0.412
#> Method:                 Least Squares   F-statistic:                     127.1
#> Date:                Sun, 10 May 2026   Prob (F-statistic):          5.74e-299
#> Time:                        15:47:26   Log-Likelihood:                -3423.5
#> No. Observations:                2700   AIC:                             6879.
#> Df Residuals:                    2684   BIC:                             6973.
#> Df Model:                          15                                         
#> Covariance Type:            nonrobust                                         
#> ============================================================================================================
#>                                                coef    std err          t      P>|t|      [0.025      0.975]
#> ------------------------------------------------------------------------------------------------------------
#> Intercept                                    4.5837      0.074     62.038      0.000       4.439       4.729
#> C(Marketing_Spend)[T.12.928932188134524]     0.1383      0.102      1.356      0.175      -0.062       0.338
#> C(Marketing_Spend)[T.15.0]                  -0.1370      0.064     -2.147      0.032      -0.262      -0.012
#> C(Marketing_Spend)[T.20.0]                   0.2605      0.109      2.398      0.017       0.047       0.474
#> C(Marketing_Spend)[T.22.499999999999993]    -0.0172      0.063     -0.271      0.786      -0.142       0.107
#> C(Marketing_Spend)[T.22.500000000000004]     0.0026      0.071      0.037      0.970      -0.137       0.142
#> C(Marketing_Spend)[T.27.071067811865476]     0.2150      0.122      1.756      0.079      -0.025       0.455
#> C(Marketing_Spend)[T.30.0]                   0.2643      0.132      2.000      0.046       0.005       0.524
#> C(Marketing_Spend)[T.32.67949192431123]      0.8598      0.055     15.613      0.000       0.752       0.968
#> C(Marketing_Spend)[T.37.5]                   0.1244      0.049      2.522      0.012       0.028       0.221
#> C(Marketing_Spend)[T.45.0]                   0.2076      0.082      2.521      0.012       0.046       0.369
#> C(Marketing_Spend)[T.49.99999999999999]      0.9167      0.088     10.444      0.000       0.745       1.089
#> C(Marketing_Spend)[T.50.0]                   1.0642      0.065     16.464      0.000       0.937       1.191
#> C(Marketing_Spend)[T.67.32050807568876]      1.2354      0.081     15.265      0.000       1.077       1.394
#> C(Marketing_Spend)[T.67.32050807568878]      1.1784      0.070     16.850      0.000       1.041       1.316
#> C(Bank)[T.UBA]                               0.1805      0.029      6.217      0.000       0.124       0.237
#> C(Bank)[T.Zenith]                           -0.8513      0.088     -9.698      0.000      -1.023      -0.679
#> Month                                        0.1449      0.022      6.669      0.000       0.102       0.188
#> ==============================================================================
#> Omnibus:                       52.872   Durbin-Watson:                   2.129
#> Prob(Omnibus):                  0.000   Jarque-Bera (JB):               36.582
#> Skew:                          -0.170   Prob(JB):                     1.14e-08
#> Kurtosis:                       2.542   Cond. No.                     4.23e+15
#> ==============================================================================
#> 
#> Notes:
#> [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
#> [2] The smallest eigenvalue is 2.45e-27. This might indicate that there are
#> strong multicollinearity problems or that the design matrix is singular.

# Mixed-effects model
me_model = mixedlm('Loyalty_Score ~ Month + Marketing_Spend + C(Bank)',
                   data=panel_data, groups=panel_data['Respondent_ID']).fit()

print("\n=== Mixed-Effects Model (with random intercept) ===")
#> 
#> === Mixed-Effects Model (with random intercept) ===
print(me_model.summary())
#>             Mixed Linear Model Regression Results
#> =============================================================
#> Model:              MixedLM Dependent Variable: Loyalty_Score
#> No. Observations:   2700    Method:             REML         
#> No. Groups:         150     Scale:              0.2150       
#> Min. group size:    18      Log-Likelihood:     -2057.7710   
#> Max. group size:    18      Converged:          Yes          
#> Mean group size:    18.0                                     
#> -------------------------------------------------------------
#>                   Coef.  Std.Err.    z    P>|z| [0.025 0.975]
#> -------------------------------------------------------------
#> Intercept          5.136    0.082  62.658 0.000  4.976  5.297
#> C(Bank)[T.UBA]    -0.592    0.028 -21.180 0.000 -0.647 -0.537
#> C(Bank)[T.Zenith] -1.406    0.033 -42.246 0.000 -1.471 -1.341
#> Month              0.138    0.006  24.444 0.000  0.127  0.149
#> Marketing_Spend    0.010    0.001  11.254 0.000  0.008  0.012
#> Group Var          0.530    0.139                            
#> =============================================================

# Variance components
var_random = me_model.cov_re.iloc[0, 0]
var_residual = me_model.scale
icc = var_random / (var_random + var_residual)

print(f"\n=== Variance Components ===")
#> 
#> === Variance Components ===
print(f"Respondent variance (σ_u²): {var_random:.4f}")
#> Respondent variance (σ_u²): 0.5300
print(f"Residual variance (σ_ε²): {var_residual:.4f}")
#> Residual variance (σ_ε²): 0.2150
print(f"Intraclass Correlation (ICC): {icc:.4f}")
#> Intraclass Correlation (ICC): 0.7115
print(f"Interpretation: {icc*100:.1f}% of variance is between respondents")
#> Interpretation: 71.1% of variance is between respondents

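# Model comparison
# NOTE: statsmodels returns nan for the AIC of a REML fit (the default used above).
# Refit with .fit(reml=False) to obtain a maximum-likelihood AIC that can be
# compared with the OLS value.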
print(f"\n=== Model Comparison ===")
#> 
#> === Model Comparison ===
print(f"OLS AIC: {ols_model.aic:.2f}")
#> OLS AIC: 6879.04
print(f"Mixed-Effects AIC: {me_model.aic:.2f}")
#> Mixed-Effects AIC: nan
print(f"Difference: {ols_model.aic - me_model.aic:.2f} (lower is better)")
#> Difference: nan (lower is better)

# Plot trajectories by bank
bank_trajectory = panel_data.groupby(['Bank', 'Month'])['Loyalty_Score'].mean().reset_index()

fig, ax = plt.subplots(figsize=(10, 6))
for bank in banks:
    data = bank_trajectory[bank_trajectory['Bank'] == bank]
    colors_map = {'GTBank': '#0066CC', 'UBA': '#FF3333', 'Zenith': '#009999'}
    ax.plot(data['Month'], data['Loyalty_Score'], marker='o', markersize=8,
           linewidth=2.5, label=bank, color=colors_map[bank])

ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('Mean Loyalty Score (1-7)', fontsize=11)
ax.set_title('Brand Loyalty Trajectory (Observed Monthly Means)', fontsize=13, fontweight='bold')
ax.legend()
ax.grid(alpha=0.3)
ax.set_ylim(1, 7)
#> (1.0, 7.0)
plt.tight_layout()
plt.show()

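# Predicted values
# fittedvalues includes each respondent's predicted random intercept, so these are
# conditional predictions (the R version above uses population-level predictions)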
panel_data['Predicted'] = me_model.fittedvalues

# Plot actual vs predicted
fig, ax = plt.subplots(figsize=(10, 6))
gtbank_data = panel_data[panel_data['Bank'] == 'GTBank']
ax.scatter(gtbank_data['Predicted'], gtbank_data['Loyalty_Score'],
          alpha=0.4, s=50)
ax.plot([1, 7], [1, 7], 'r--', linewidth=2, label='Perfect Fit')  # span the full 1-7 scale
ax.set_xlabel('Predicted Loyalty', fontsize=11)
ax.set_ylabel('Actual Loyalty', fontsize=11)
ax.set_title('GTBank: Actual vs Predicted Loyalty (Mixed-Effects Model)',
            fontsize=13, fontweight='bold')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Random intercepts
random_intercepts = pd.Series(me_model.random_effects).apply(
    lambda x: x['Group']
).reset_index()
random_intercepts.columns = ['Respondent_ID', 'Random_Intercept']

print("\n=== Random Intercepts (Sample of 10) ===")
#> 
#> === Random Intercepts (Sample of 10) ===
print(random_intercepts.head(10).to_string(index=False))
#>  Respondent_ID  Random_Intercept
#>              1         -0.457724
#>              2         -0.479456
#>              3          0.585438
#>              4          1.329778
#>              5         -0.642450
#>              6          0.449610
#>              7         -0.963005
#>              8          0.465909
#>              9         -0.886941
#>             10          0.574572

# Distribution of random intercepts
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(random_intercepts['Random_Intercept'], bins=20, color='#2ca02c',
       edgecolor='black', linewidth=0.5)
ax.axvline(0, linestyle='--', color='red', linewidth=2)
ax.set_xlabel('Random Intercept', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title('Distribution of Random Intercepts (Individual Differences)',
            fontsize=13, fontweight='bold')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

📝 Section 51.3 Review Questions

1. When to Use Mixed-Effects Models: How high must the ICC be to justify a mixed-effects model over OLS? If ICC = 0.35, does the panel structure matter?

2. Random Intercepts vs Random Slopes: Suppose the effect of marketing spend on loyalty differs across respondents. Should you include a random slope on Marketing_Spend? What would it estimate?

3. Interpreting Fixed Effects: In a mixed-effects model, the coefficient on “Month” is 0.15. What does this mean? Is it interpreted differently than in OLS?

4. Assumption Violations: If your data show increasing variance in loyalty over time (heteroscedasticity), how would you handle it in a mixed-effects model?

5. Prediction vs Population Inference: When predicting a new respondent’s future loyalty, would you use conditional or marginal predictions? Why?

51.4 Social Media Brand Monitoring: Sentiment and Share of Voice

In the era of social listening, brands no longer control the narrative about themselves; consumers do, one tweet, Instagram comment, or TikTok video at a time. Social media monitoring—systematic analysis of online conversations about a brand—provides real-time signals of brand health, emerging crises, and competitive threats. Two key metrics are share of voice (SOV) and sentiment share.

Share of Voice (SOV) is the brand’s proportion of category mentions on social media:

\[\text{SOV} = \frac{\text{Mentions of Brand X}}{\sum_{\text{all brands in category}} \text{Mentions}} \times 100\]

If a telecom category generates 10,000 tweets monthly, and Airtel gets 3,200 mentions, MTN gets 3,500, Glo gets 2,100, and 9mobile gets 1,200, then Airtel’s SOV = 3,200 / 10,000 = 32%. SOV is a proxy for attention share in the marketplace. A declining SOV signals competitive erosion; a rising SOV suggests growing consumer interest. SOV should be interpreted alongside sales market share and advertising share: ideally SOV tracks advertising share, and the target is SOV ≥ market share, since a share of conversation that exceeds share of sales indicates that marketing is gaining mindshare.
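The arithmetic in this example can be checked in a few lines of Python. This is a minimal sketch using the illustrative mention counts from the paragraph above; the function name `share_of_voice` is our own and not part of any library.

```python
def share_of_voice(mentions):
    """Return each brand's share of total category mentions, in percent."""
    total = sum(mentions.values())
    if total == 0:
        raise ValueError("no mentions in category")
    return {brand: 100 * count / total for brand, count in mentions.items()}


# Illustrative monthly mention counts from the telecom example
monthly_mentions = {"MTN": 3500, "Airtel": 3200, "Glo": 2100, "9mobile": 1200}

sov = share_of_voice(monthly_mentions)
for brand, pct in sorted(sov.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{brand:8s} {pct:5.1f}%")
```

Because every brand is divided by the same category total, the shares always sum to 100%, so one brand's gain in SOV is necessarily another's loss.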

Sentiment Share goes deeper: of all brand mentions, what percentage are positive, negative, or neutral? A brand could have high SOV but poor sentiment if it is mentioned only in complaints. Sentiment analysis uses natural language processing (NLP) to classify each mention as positive, negative, or neutral, usually with a confidence score. The classification is imperfect (sarcasm confuses classifiers, and context matters), but at scale (thousands of mentions) random misclassifications tend to average out, so aggregate sentiment distributions are considerably more reliable than any single label. Sentiment can shift rapidly during crises: a product recall, executive scandal, or viral negative review can move sentiment from 60% positive to 40% positive within 24 hours.
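To see why scale helps, it is useful to look at the sampling error of a sentiment share on its own. The sketch below uses the normal approximation for a proportion. It assumes mentions are independent and ignores classifier error, so it understates the real uncertainty; the function name and the sample sizes are illustrative choices, not values from a real dataset.

```python
import math


def sentiment_margin(p_positive, n_mentions, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a sentiment share.

    Normal approximation to a binomial proportion. Assumes independent mentions
    and a classifier with no systematic bias (both are simplifications).
    """
    se = math.sqrt(p_positive * (1 - p_positive) / n_mentions)
    return 100 * z * se


for n in (50, 500, 5000):
    moe = sentiment_margin(0.60, n)
    print(f"n = {n:5d}: 60% positive, margin of error about +/- {moe:.1f} points")
```

With 50 mentions the margin is roughly 14 points, which is too wide to distinguish a crisis from noise; with 5,000 mentions it falls to about 1.4 points. A classifier that is systematically wrong (for example, on sarcasm) stays wrong at any sample size, so this calculation covers sampling noise only.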

In the Nigerian telecom context, monitoring is critical. When MTN imposed payment reversal policies or Airtel had data throttling issues, social media sentiment plummeted. Crisis detection, the automatic flagging of unusual sentiment drops, enables rapid response. A dashboard showing daily SOV and sentiment by brand, with alerts for sentiment drops of more than 5 percentage points, lets marketing teams react within hours.
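One way such an alert rule might look is sketched below. It compares each day's positive share with the mean of the preceding few days rather than with the previous day alone, which is our own assumption to dampen day-to-day noise; the 5-point threshold comes from the text, while the 3-day window, the function name, and the data are hypothetical.

```python
from statistics import mean


def sentiment_alerts(daily_positive_pct, threshold=5.0, window=3):
    """Flag days whose positive share falls more than `threshold` points
    below the mean of the preceding `window` days.

    Returns a list of (day_index, drop_in_points) tuples.
    """
    alerts = []
    for day in range(window, len(daily_positive_pct)):
        baseline = mean(daily_positive_pct[day - window:day])
        drop = baseline - daily_positive_pct[day]
        if drop > threshold:
            alerts.append((day, round(drop, 1)))
    return alerts


# Hypothetical daily % positive for one brand; day 6 simulates a sudden crisis
positive_share = [60.0, 61.5, 59.0, 60.5, 58.5, 60.0, 41.0, 43.5, 47.0]

for day, drop in sentiment_alerts(positive_share):
    print(f"ALERT day {day}: positive share {drop} points below trailing baseline")
```

In this toy series the rule fires on days 6 and 7 and then goes quiet once the trailing baseline has absorbed the new, lower level. A longer window keeps the alert active for longer but reacts more slowly to a recovery.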

📘 Theory: Social Listening Methodology

Social media monitoring involves: (1) data collection (via APIs from Twitter, Instagram, TikTok; third-party platforms like Sprout Social, Brandwatch); (2) text preprocessing (tokenisation, removing stop words, stemming); (3) sentiment classification (rule-based, lexicon-based, or ML-based); (4) aggregation into metrics (SOV, sentiment share, volume trends). Limitations: only a subset of consumers tweet (self-selection bias); bots inflate mention counts; sarcasm and multilingual content (Pidgin English in Nigeria) confuse classifiers. Despite limitations, social listening is orders of magnitude faster than traditional brand tracking surveys.
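Steps (2) to (4) of this pipeline can be sketched with a toy lexicon-based classifier. Everything below is illustrative: the word lists, the stop words, and the sample mentions are invented for the example, stemming is omitted for brevity, and a production system would use a validated lexicon or a trained model, ideally one that handles Nigerian Pidgin.

```python
import re
from collections import Counter

# Toy lexicon and stop-word list, for illustration only
POSITIVE = {"fast", "great", "love", "reliable", "good"}
NEGATIVE = {"slow", "terrible", "hate", "dropped", "bad"}
STOP_WORDS = {"the", "is", "a", "my", "and", "so", "on", "again", "today"}


def tokenise(text):
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def classify(text):
    """Label a mention by counting positive and negative lexicon hits."""
    tokens = tokenise(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def sentiment_share(texts):
    """Aggregate per-mention labels into percentage shares."""
    counts = Counter(classify(t) for t in texts)
    return {label: 100 * counts[label] / len(texts)
            for label in ("Positive", "Neutral", "Negative")}


# Hypothetical mentions, invented for this example
mentions = [
    "MTN data is so fast today, love it",
    "Network dropped my call again, terrible",
    "Just renewed my MTN bundle",
    "Great coverage on the Lagos-Ibadan road",
]
print(sentiment_share(mentions))
```

A sarcastic mention such as "great, no network again" would score as positive here, which is exactly the failure mode described above and the main reason lexicon methods are usually treated as a baseline rather than a final classifier.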

🔑 Key Metrics: Share of Voice and Sentiment

Share of Voice: \[\text{SOV}_{\text{Brand}} = \frac{\text{Mentions}_{\text{Brand}}}{\sum_{i} \text{Mentions}_{i}} \times 100\%\]

Sentiment Distribution: \[\text{Sentiment Share} = [\%_{\text{Positive}}, \%_{\text{Neutral}}, \%_{\text{Negative}}]\]

library(tidyverse)
library(ggplot2)
library(gridExtra)

# Simulate social media data: 500 tweets about 4 Nigerian telecom brands
set.seed(7493)

n_tweets <- 500
telecom_brands <- c("MTN", "Airtel", "Glo", "9mobile")

social_data <- tibble(
  Tweet_ID = 1:n_tweets,
  Brand = sample(telecom_brands, n_tweets, replace = TRUE,
                prob = c(0.35, 0.30, 0.22, 0.13)),
  Sentiment = sample(c("Positive", "Neutral", "Negative"), n_tweets, replace = TRUE,
                    prob = c(0.40, 0.35, 0.25))
)

# Add some brand-specific sentiment biases (brands have different sentiment profiles)
social_data <- social_data |>
  mutate(
    Sentiment = ifelse(
      Brand == "MTN" & Sentiment == "Positive",
      sample(c("Positive", "Neutral"), n(), replace = TRUE),
      ifelse(
        Brand == "Glo" & Sentiment == "Negative",
        sample(c("Negative", "Neutral"), n(), replace = TRUE),
        Sentiment
      )
    )
  )

cat("=== Social Media Data Summary ===\n")
#> === Social Media Data Summary ===
print(table(social_data$Brand, social_data$Sentiment))
#>          
#>           Negative Neutral Positive
#>   9mobile       13      25       23
#>   Airtel        43      63       67
#>   Glo           15      47       38
#>   MTN           39      90       37

# Calculate Share of Voice (SOV)
sov_data <- social_data |>
  group_by(Brand) |>
  summarise(Mentions = n(), .groups = "drop") |>
  mutate(SOV_Pct = Mentions / sum(Mentions) * 100) |>
  arrange(desc(SOV_Pct))

cat("\n=== Share of Voice (SOV) ===\n")
#> 
#> === Share of Voice (SOV) ===
print(sov_data)
#> # A tibble: 4 × 3
#>   Brand   Mentions SOV_Pct
#>   <chr>      <int>   <dbl>
#> 1 Airtel       173    34.6
#> 2 MTN          166    33.2
#> 3 Glo          100    20  
#> 4 9mobile       61    12.2

# Sentiment breakdown by brand
sentiment_data <- social_data |>
  group_by(Brand, Sentiment) |>
  summarise(Count = n(), .groups = "drop") |>
  pivot_wider(names_from = Sentiment, values_from = Count, values_fill = 0) |>
  mutate(
    Total = Positive + Neutral + Negative,
    Positive_Pct = Positive / Total * 100,
    Neutral_Pct = Neutral / Total * 100,
    Negative_Pct = Negative / Total * 100
  )

cat("\n=== Sentiment Breakdown by Brand ===\n")
#> 
#> === Sentiment Breakdown by Brand ===
print(sentiment_data |> select(Brand, Positive_Pct, Neutral_Pct, Negative_Pct))
#> # A tibble: 4 × 4
#>   Brand   Positive_Pct Neutral_Pct Negative_Pct
#>   <chr>          <dbl>       <dbl>        <dbl>
#> 1 9mobile         37.7        41.0         21.3
#> 2 Airtel          38.7        36.4         24.9
#> 3 Glo             38          47           15  
#> 4 MTN             22.3        54.2         23.5

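# Visualise SOV (fill is mapped to Brand so each bar keeps its brand colour after sorting)
brand_colours <- c("MTN" = "#FFCC00", "Airtel" = "#FF0000", "Glo" = "#009900", "9mobile" = "#0066FF")
p_sov <- ggplot(sov_data, aes(y = reorder(Brand, SOV_Pct), x = SOV_Pct, fill = Brand)) +
  geom_col(colour = "black", linewidth = 0.5, show.legend = FALSE) +
  scale_fill_manual(values = brand_colours) +
  geom_text(aes(label = paste0(round(SOV_Pct, 1), "%")),
            hjust = -0.2, size = 3.5, fontface = "bold") +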
  labs(
    title = "Share of Voice (SOV): Nigerian Telecoms",
    y = "Brand",
    x = "Share of Social Media Mentions (%)",
    caption = "500 tweets analysed"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_sov)

# Visualise sentiment by brand (stacked bar)
sentiment_long <- sentiment_data |>
  select(Brand, Positive_Pct, Neutral_Pct, Negative_Pct) |>
  pivot_longer(cols = -Brand, names_to = "Sentiment", values_to = "Percentage") |>
  mutate(
    Sentiment = factor(Sentiment, levels = c("Positive_Pct", "Neutral_Pct", "Negative_Pct"),
                      labels = c("Positive", "Neutral", "Negative"))
  )

p_sentiment <- ggplot(sentiment_long, aes(x = Brand, y = Percentage, fill = Sentiment)) +
  geom_col(colour = "black", linewidth = 0.5) +
  scale_fill_manual(values = c("Positive" = "#2ca02c", "Neutral" = "#ff7f0e", "Negative" = "#d62728")) +
  labs(
    title = "Sentiment Distribution by Brand",
    x = "Brand",
    y = "Percentage of Mentions",
    fill = "Sentiment",
    caption = "Stacked 100% bars"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_sentiment)

# Time-series simulation: weekly SOV and sentiment over 13 weeks
weeks <- 13
time_series <- expand_grid(
  Week = 1:weeks,
  Brand = telecom_brands
) |>
  mutate(
    # Trend + noise in mention count
    Mentions = case_when(
      Brand == "MTN"    ~ 80 + 5 * Week + rnorm(n(), 0, 8),
      Brand == "Airtel" ~ 70 + 3 * Week + rnorm(n(), 0, 8),
      Brand == "Glo"    ~ 50 + 2 * Week + rnorm(n(), 0, 5),
      Brand == "9mobile" ~ 30 + 1 * Week + rnorm(n(), 0, 3)
    ),
    Mentions = pmax(Mentions, 5) |> round(),
    # Sentiment score (0-100, where 50 is neutral)
    Sentiment_Score = case_when(
      Brand == "MTN"    ~ 50 - 1.5 * Week + rnorm(n(), 0, 5),
      Brand == "Airtel" ~ 55 - 0.8 * Week + rnorm(n(), 0, 5),
      Brand == "Glo"    ~ 52 + 0.5 * Week + rnorm(n(), 0, 5),
      Brand == "9mobile" ~ 45 + rnorm(n(), 0, 5)
    ),
    Sentiment_Score = pmax(pmin(Sentiment_Score, 100), 0) |> round(1)
  )

# Plot weekly mention volume (raw counts, not SOV percentages)
p_weekly_sov <- ggplot(time_series, aes(x = Week, y = Mentions, colour = Brand)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_colour_manual(
    values = c("MTN" = "#FFCC00", "Airtel" = "#FF0000", "Glo" = "#009900", "9mobile" = "#0066FF")
  ) +
  labs(
    title = "Weekly Mention Trends (13 weeks)",
    x = "Week",
    y = "Number of Mentions",
    colour = "Brand",
    caption = "MTN gaining share, 9mobile stagnant"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_weekly_sov)

# Plot weekly sentiment score
p_weekly_sentiment <- ggplot(time_series, aes(x = Week, y = Sentiment_Score, colour = Brand)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  geom_hline(yintercept = 50, linetype = "dashed", colour = "grey", alpha = 0.7) +
  scale_colour_manual(
    values = c("MTN" = "#FFCC00", "Airtel" = "#FF0000", "Glo" = "#009900", "9mobile" = "#0066FF")
  ) +
  scale_y_continuous(limits = c(0, 100)) +  # full 0-100 scale so low scores are not dropped
  labs(
    title = "Weekly Sentiment Trends (13 weeks)",
    x = "Week",
    y = "Sentiment Score (0=Negative, 50=Neutral, 100=Positive)",
    colour = "Brand",
    caption = "MTN sentiment declining; Glo improving"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12))

print(p_weekly_sentiment)

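# Crisis detection: flag any week-on-week sentiment drop of more than 5 points
# Note: the simulated weekly noise has SD 5, so many of the flags below reflect noise
# rather than a genuine crisis; consider smoothing the series or raising the threshold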
time_series <- time_series |>
  group_by(Brand) |>
  mutate(
    Sentiment_Change = Sentiment_Score - lag(Sentiment_Score),
    Crisis_Flag = ifelse(!is.na(Sentiment_Change) & Sentiment_Change < -5, "ALERT", "OK")
  ) |>
  ungroup()

cat("\n=== Crisis Detection: Sentiment Drops > 5 Points/Week ===\n")
#> 
#> === Crisis Detection: Sentiment Drops > 5 Points/Week ===
crisis_alerts <- time_series |> filter(Crisis_Flag == "ALERT")
if (nrow(crisis_alerts) > 0) {
  print(crisis_alerts |> select(Week, Brand, Sentiment_Score, Sentiment_Change, Crisis_Flag))
} else {
  cat("No major sentiment crises detected in this period\n")
}
#> # A tibble: 13 × 5
#>     Week Brand   Sentiment_Score Sentiment_Change Crisis_Flag
#>    <int> <chr>             <dbl>            <dbl> <chr>      
#>  1     2 MTN                38.8            -12.8 ALERT      
#>  2     2 Airtel             40.8            -11.9 ALERT      
#>  3     2 Glo                45.4            -11   ALERT      
#>  4     4 Airtel             47.8            -15.6 ALERT      
#>  5     5 9mobile            40.4             -6.2 ALERT      
#>  6     6 MTN                33.8            -11   ALERT      
#>  7     6 9mobile            33.5             -6.9 ALERT      
#>  8     8 Airtel             44.8             -9.5 ALERT      
#>  9     9 MTN                27               -7.8 ALERT      
#> 10     9 Glo                54.3            -11.4 ALERT      
#> 11    10 9mobile            36.8            -15.7 ALERT      
#> 12    13 MTN                28.7             -9.1 ALERT      
#> 13    13 Airtel             37.9            -12.9 ALERT

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(7493)

# Simulate social media data: 500 tweets about 4 Nigerian telecom brands
n_tweets = 500
telecom_brands = ['MTN', 'Airtel', 'Glo', '9mobile']

social_data = pd.DataFrame({
    'Tweet_ID': np.arange(1, n_tweets + 1),
    'Brand': np.random.choice(telecom_brands, n_tweets, p=[0.35, 0.30, 0.22, 0.13]),
    'Sentiment': np.random.choice(['Positive', 'Neutral', 'Negative'], n_tweets, p=[0.40, 0.35, 0.25])
})

print("=== Social Media Data Summary ===")
#> === Social Media Data Summary ===
print(pd.crosstab(social_data['Brand'], social_data['Sentiment']))
#> Sentiment  Negative  Neutral  Positive
#> Brand                                 
#> 9mobile          17       21        32
#> Airtel           30       46        65
#> Glo              26       44        48
#> MTN              35       73        63

# Calculate Share of Voice
sov_data = social_data['Brand'].value_counts().reset_index()
sov_data.columns = ['Brand', 'Mentions']
sov_data['SOV_Pct'] = sov_data['Mentions'] / sov_data['Mentions'].sum() * 100
sov_data = sov_data.sort_values('SOV_Pct', ascending=False)

print("\n=== Share of Voice (SOV) ===")
#> 
#> === Share of Voice (SOV) ===
print(sov_data.to_string(index=False))
#>   Brand  Mentions  SOV_Pct
#>     MTN       171     34.2
#>  Airtel       141     28.2
#>     Glo       118     23.6
#> 9mobile        70     14.0

# Sentiment breakdown by brand
sentiment_data = pd.crosstab(social_data['Brand'], social_data['Sentiment'], margins=False)
sentiment_data['Total'] = sentiment_data.sum(axis=1)
for col in ['Positive', 'Neutral', 'Negative']:
    sentiment_data[f'{col}_Pct'] = (sentiment_data[col] / sentiment_data['Total'] * 100).round(1)

print("\n=== Sentiment Breakdown by Brand ===")
#> 
#> === Sentiment Breakdown by Brand ===
print(sentiment_data[['Positive_Pct', 'Neutral_Pct', 'Negative_Pct']].to_string())
#> Sentiment  Positive_Pct  Neutral_Pct  Negative_Pct
#> Brand                                             
#> 9mobile            45.7         30.0          24.3
#> Airtel             46.1         32.6          21.3
#> Glo                40.7         37.3          22.0
#> MTN                36.8         42.7          20.5

# Visualise SOV
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

colors_map = {'MTN': '#FFCC00', 'Airtel': '#FF0000', 'Glo': '#009900', '9mobile': '#0066FF'}
sov_colors = [colors_map[b] for b in sov_data['Brand']]

ax1.barh(sov_data['Brand'], sov_data['SOV_Pct'], color=sov_colors, edgecolor='black', linewidth=1)
for i, (brand, sov) in enumerate(zip(sov_data['Brand'], sov_data['SOV_Pct'])):
    ax1.text(sov + 0.5, i, f'{sov:.1f}%', va='center', fontweight='bold', fontsize=9)
ax1.set_xlabel('Share of Voice (%)', fontsize=11)
ax1.set_title('Share of Voice: Nigerian Telecoms', fontsize=12, fontweight='bold')
ax1.grid(axis='x', alpha=0.3)

# Sentiment stacked bar
sentiment_summary = sentiment_data[['Positive_Pct', 'Neutral_Pct', 'Negative_Pct']].reset_index()
sentiment_summary = sentiment_summary.set_index('Brand')

ax2.barh(sentiment_summary.index, sentiment_summary['Positive_Pct'],
        label='Positive', color='#2ca02c', edgecolor='black', linewidth=0.5)
ax2.barh(sentiment_summary.index, sentiment_summary['Neutral_Pct'],
        left=sentiment_summary['Positive_Pct'], label='Neutral', color='#ff7f0e',
        edgecolor='black', linewidth=0.5)
ax2.barh(sentiment_summary.index,  sentiment_summary['Negative_Pct'],
        left=sentiment_summary['Positive_Pct'] + sentiment_summary['Neutral_Pct'],
        label='Negative', color='#d62728', edgecolor='black', linewidth=0.5)
ax2.set_xlabel('Percentage of Mentions', fontsize=11)
ax2.set_title('Sentiment Distribution by Brand', fontsize=12, fontweight='bold')
ax2.legend(loc='lower right')
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()


# Time-series: 13 weeks
weeks = 13
time_series_list = []

for week in range(1, weeks + 1):
    for brand in telecom_brands:
        if brand == 'MTN':
            mentions = 80 + 5 * week + np.random.normal(0, 8)
            sentiment = 50 - 1.5 * week + np.random.normal(0, 5)
        elif brand == 'Airtel':
            mentions = 70 + 3 * week + np.random.normal(0, 8)
            sentiment = 55 - 0.8 * week + np.random.normal(0, 5)
        elif brand == 'Glo':
            mentions = 50 + 2 * week + np.random.normal(0, 5)
            sentiment = 52 + 0.5 * week + np.random.normal(0, 5)
        else:  # 9mobile
            mentions = 30 + 1 * week + np.random.normal(0, 3)
            sentiment = 45 + np.random.normal(0, 5)

        mentions = max(5, round(mentions))
        sentiment = max(0, min(100, round(sentiment, 1)))

        time_series_list.append({
            'Week': week,
            'Brand': brand,
            'Mentions': mentions,
            'Sentiment_Score': sentiment
        })

time_series = pd.DataFrame(time_series_list)

# Plot weekly trends
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

for brand in telecom_brands:
    data = time_series[time_series['Brand'] == brand]
    ax1.plot(data['Week'], data['Mentions'], marker='o', linewidth=2.5,
            markersize=6, label=brand, color=colors_map[brand])

ax1.set_xlabel('Week', fontsize=11)
ax1.set_ylabel('Number of Mentions', fontsize=11)
ax1.set_title('Weekly Mention Trends (13 weeks)', fontsize=12, fontweight='bold')
ax1.legend()
ax1.grid(alpha=0.3)

for brand in telecom_brands:
    data = time_series[time_series['Brand'] == brand]
    ax2.plot(data['Week'], data['Sentiment_Score'], marker='o', linewidth=2.5,
            markersize=6, label=brand, color=colors_map[brand])

ax2.axhline(50, linestyle='--', color='grey', alpha=0.7)
ax2.set_xlabel('Week', fontsize=11)
ax2.set_ylabel('Sentiment Score (0-100)', fontsize=11)
ax2.set_title('Weekly Sentiment Trends (13 weeks)', fontsize=12, fontweight='bold')
ax2.set_ylim(0, 100)  # full 0-100 scale; simulated scores fall below 40, so (40, 65) would clip them
#> (0.0, 100.0)
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()


# Crisis detection
time_series = time_series.sort_values(['Brand', 'Week']).reset_index(drop=True)
time_series['Sentiment_Change'] = time_series.groupby('Brand')['Sentiment_Score'].diff()
time_series['Crisis_Flag'] = time_series['Sentiment_Change'].apply(
    lambda x: 'ALERT' if pd.notna(x) and x < -5 else 'OK'
)

print("\n=== Crisis Detection: Sentiment Drops > 5 Points/Week ===")
#> 
#> === Crisis Detection: Sentiment Drops > 5 Points/Week ===
crisis_alerts = time_series[time_series['Crisis_Flag'] == 'ALERT']
if len(crisis_alerts) > 0:
    print(crisis_alerts[['Week', 'Brand', 'Sentiment_Score', 'Sentiment_Change', 'Crisis_Flag']].to_string(index=False))
else:
    print("No major sentiment crises detected in this period")
#>  Week   Brand  Sentiment_Score  Sentiment_Change Crisis_Flag
#>     2 9mobile             37.2             -17.6       ALERT
#>     7 9mobile             38.9              -5.5       ALERT
#>    10 9mobile             38.3             -12.8       ALERT
#>    13 9mobile             37.2             -10.8       ALERT
#>     2  Airtel             46.3              -9.8       ALERT
#>     5  Airtel             47.1              -6.9       ALERT
#>     9  Airtel             41.6              -7.2       ALERT
#>     3     Glo             44.7             -19.2       ALERT
#>     8     Glo             50.8              -9.0       ALERT
#>    11     Glo             58.3             -10.3       ALERT
#>    12     Glo             47.7             -10.6       ALERT
#>     5     MTN             38.9              -9.9       ALERT
#>     7     MTN             37.6              -8.4       ALERT
#>    12     MTN             25.1             -17.5       ALERT
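
Why does a fixed 5-point rule raise 14 alerts in 13 weeks? A rough calculation, sketched here, suggests that most of them are noise rather than crises. It assumes the simulation settings shown earlier (weekly noise with a standard deviation of 5 and the per-brand sentiment slopes) and ignores rounding and clipping to the 0–100 range. The helper `normal_cdf` and the variable names are illustrative and not part of the chapter's code.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

NOISE_SD = 5.0     # weekly noise SD used in the simulation
THRESHOLD = -5.0   # alert rule: week-on-week change below -5 points
N_DIFFS = 12       # 13 weeks give 12 week-on-week changes per brand

# Weekly sentiment slopes taken from the simulation code
slopes = {'MTN': -1.5, 'Airtel': -0.8, 'Glo': 0.5, '9mobile': 0.0}

# The difference of two independent noise terms has SD = sqrt(2) * NOISE_SD
diff_sd = math.sqrt(2.0) * NOISE_SD

expected_alerts = {}
for brand, slope in slopes.items():
    # Week-on-week change ~ Normal(slope, diff_sd); alert if it falls below THRESHOLD
    p_alert = normal_cdf((THRESHOLD - slope) / diff_sd)
    expected_alerts[brand] = N_DIFFS * p_alert
    print(f"{brand}: P(alert) = {p_alert:.2f}, expected alerts = {expected_alerts[brand]:.1f}")

total_expected = sum(expected_alerts.values())
print(f"Total expected alerts: {total_expected:.1f}")
```

Under these assumptions the rule is expected to fire about a dozen times across the four brands, close to the 14 alerts observed, even though only MTN and Airtel have a genuinely declining trend. A threshold scaled to the noise level (for example, a multiple of the standard deviation of past weekly changes) or a rule based on a smoothed trend would separate signal from noise more reliably.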

📝 Section 51.4 Review Questions

1. SOV vs Market Share: If Airtel has 20% market share but 35% SOV, what does this signal? Is it sustainable?

2. Sentiment Classification Accuracy: Sentiment classifiers often struggle with sarcasm (e.g., “Great network, spent 3 hours trying to connect!”). How would you validate classifier accuracy? What level of accuracy is acceptable for brand monitoring?

3. Crisis vs Noise: A brand’s sentiment drops 4 percentage points in one week. Is this a crisis? How would you distinguish meaningful changes from noise?

4. Interpretation Challenges: A brand has high SOV but negative sentiment (e.g., it is the subject of criticism). Is this a positioning problem or a PR problem? What actions would you take?

5. Real-Time Response: If your dashboard flags a sentiment crisis (a drop of more than 10 points in 24 hours), what would be your response protocol? Who should be alerted?

51.5 Perceptual Mapping: Competitive Positioning from Text

How do consumers mentally position brands relative to each other? Do they see Airtel and MTN as similar, or distinct? Is GTBank seen as innovative or traditional? Perceptual maps answer these questions by projecting brands onto two-dimensional space based on the language consumers use to describe them.

The method: collect social media text mentioning each brand (tweets, comments). Use TF-IDF (Term Frequency-Inverse Document Frequency) to identify words most distinctive to each brand: words that appear frequently for one brand but rarely for others. Create a brand-by-word matrix where each entry is a TF-IDF score. Reduce this matrix to two dimensions via Principal Component Analysis (PCA). Plot brands as points, with proximity indicating similarity (brands using similar language are positioned close).

An illustrative pattern for Nigerian telecoms: MTN clusters near words like “reliable,” “coverage,” “everywhere”; Airtel near “innovation,” “digital,” “youth”; Glo near “affordable,” “value”; 9mobile near “struggling,” “limited.” PCA dimensions often correspond to intuitive business axes: PC1 might be “premium-to-budget” and PC2 “traditional-to-digital.” The perceptual map reveals competitive opportunities: white space (unoccupied positions), overcrowding (brands too similar), and differentiation (brands standing out).

📘 Theory: TF-IDF and Perceptual Mapping

TF-IDF weights a word’s importance within a document (here, all text about one brand) by how rare it is across all documents. A high TF-IDF score indicates a distinctive word. The formula: \(\text{TF-IDF}(w, b) = \frac{n_w}{N} \times \log(\frac{D}{D_w})\), where \(n_w\) is the count of word \(w\) in the text for brand \(b\), \(N\) is the total number of words for \(b\), \(D\) is the total number of brands, and \(D_w\) is the number of brands whose text contains \(w\). PCA projects the brand-by-TF-IDF matrix onto principal components (directions of maximum variance). How much variance PC1 and PC2 capture depends on the data; shares of roughly 70–80% are common, and the synthetic example in this section reaches 87.2%. Higher-dimensional maps retain more information but are harder to visualize.
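
The formula can be traced by hand on a toy corpus. The sketch below is a minimal illustration, assuming one pooled text per brand and whitespace tokenisation; the texts in `brand_texts` and the function `tf_idf` are hypothetical and not part of the chapter's data or code.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one pooled "document" of mentions per brand
brand_texts = {
    'MTN': "reliable coverage everywhere reliable network",
    'Airtel': "digital youth innovation network digital",
    'Glo': "affordable value affordable network",
}

def tf_idf(brand_texts):
    """Return {brand: {word: score}} using (n_w / N) * log(D / D_w)."""
    tokens = {b: text.lower().split() for b, text in brand_texts.items()}
    n_brands = len(tokens)  # D
    # D_w: number of brands whose text contains word w
    brands_with_word = Counter(w for words in tokens.values() for w in set(words))
    scores = {}
    for brand, words in tokens.items():
        counts = Counter(words)   # n_w for each word
        total = len(words)        # N
        scores[brand] = {
            w: (n / total) * math.log(n_brands / brands_with_word[w])
            for w, n in counts.items()
        }
    return scores

scores = tf_idf(brand_texts)
for brand, word_scores in scores.items():
    top_word = max(word_scores, key=word_scores.get)
    print(f"{brand}: most distinctive word = {top_word}")
```

Note that “network” appears for every brand, so \(D / D_w = 1\) and its score is zero under this unsmoothed formula. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed inverse document frequency by default, which keeps such words slightly above zero.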

🔑 Formula: TF-IDF and PCA

TF-IDF: \[\text{TF-IDF}(w, b) = \frac{\text{Term Frequency}(w, b)}{\text{Total Terms in } b} \times \log\left(\frac{\text{Total Brands}}{\text{Brands Containing } w}\right)\]

PCA Projection: \[\text{PC}_k = \mathbf{w}_k^T (\mathbf{X} - \overline{\mathbf{X}})\]

where \(\mathbf{w}_k\) is the \(k\)-th eigenvector of the covariance matrix.
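
The projection can also be computed directly from the eigen-decomposition, which makes the role of \(\mathbf{w}_k\) explicit. The sketch below uses the synthetic TF-IDF matrix from this section's R and Python code; it centres each word column as in the formula and additionally scales to unit variance, on the assumption that words should contribute equally. Variable names are illustrative, and the signs of the components are arbitrary.

```python
import numpy as np

brands = ['MTN', 'Airtel', 'Glo', '9mobile']
# Synthetic brand-by-word TF-IDF matrix used in this section
X = np.array([
    [0.45, 0.52, 0.15, 0.25, 0.20, 0.18, 0.22, 0.05, 0.48],  # MTN
    [0.35, 0.30, 0.58, 0.55, 0.62, 0.10, 0.08, 0.08, 0.25],  # Airtel
    [0.32, 0.28, 0.08, 0.15, 0.10, 0.68, 0.72, 0.12, 0.22],  # Glo
    [0.20, 0.15, 0.10, 0.18, 0.08, 0.35, 0.30, 0.55, 0.20],  # 9mobile
])

# Centre each word column (the X - X-bar term), then scale to unit variance
X_centered = X - X.mean(axis=0)
Z = X_centered / X.std(axis=0, ddof=1)

# Eigen-decomposition of the word-by-word covariance matrix
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PC_k = w_k^T (x - x_bar): project each brand onto the first two eigenvectors
scores = Z @ eigvecs[:, :2]
explained = eigvals / eigvals.sum()

for brand, (pc1, pc2) in zip(brands, scores):
    print(f"{brand}: PC1 = {pc1:.2f}, PC2 = {pc2:.2f}")
print(f"Share of variance, PC1 + PC2: {explained[:2].sum():.3f}")
```

With only four brands the centred matrix has rank at most three, so at most three components carry any variance regardless of how many words are included.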

library(tidyverse)
library(ggplot2)

# Synthetic brand-word association data: 4 Nigerian telecom brands
# Rows = brands, columns = distinctive words, values = TF-IDF scores

brands <- c("MTN", "Airtel", "Glo", "9mobile")

# Create TF-IDF matrix
tfidf_matrix <- matrix(c(
  # Words: "reliable", "coverage", "innovation", "digital", "youth", "affordable", "value", "struggling", "network"
  0.45,   0.52,      0.15,         0.25,       0.20,      0.18,       0.22,      0.05,         0.48,    # MTN
  0.35,   0.30,      0.58,         0.55,       0.62,      0.10,       0.08,      0.08,         0.25,    # Airtel
  0.32,   0.28,      0.08,         0.15,       0.10,      0.68,       0.72,      0.12,         0.22,    # Glo
  0.20,   0.15,      0.10,         0.18,       0.08,      0.35,       0.30,      0.55,         0.20     # 9mobile
), nrow = 4, ncol = 9, byrow = TRUE)

colnames(tfidf_matrix) <- c("Reliable", "Coverage", "Innovation", "Digital", "Youth",
                             "Affordable", "Value", "Struggling", "Network")
rownames(tfidf_matrix) <- brands

cat("=== TF-IDF Matrix: Brand-Word Associations ===\n")
#> === TF-IDF Matrix: Brand-Word Associations ===
print(round(tfidf_matrix, 2))
#>         Reliable Coverage Innovation Digital Youth Affordable Value Struggling
#> MTN         0.45     0.52       0.15    0.25  0.20       0.18  0.22       0.05
#> Airtel      0.35     0.30       0.58    0.55  0.62       0.10  0.08       0.08
#> Glo         0.32     0.28       0.08    0.15  0.10       0.68  0.72       0.12
#> 9mobile     0.20     0.15       0.10    0.18  0.08       0.35  0.30       0.55
#>         Network
#> MTN        0.48
#> Airtel     0.25
#> Glo        0.22
#> 9mobile    0.20

# Perform PCA
pca_result <- prcomp(tfidf_matrix, center = TRUE, scale. = TRUE)

# Extract PC1 and PC2 scores
pc_scores <- tibble(
  Brand = brands,
  PC1 = pca_result$x[, 1],
  PC2 = pca_result$x[, 2]
)

# Extract loadings (word contributions to PCs)
loadings <- tibble(
  Word = colnames(tfidf_matrix),
  PC1_Loading = pca_result$rotation[, 1],
  PC2_Loading = pca_result$rotation[, 2]
)

cat("\n=== PCA Results ===\n")
#> 
#> === PCA Results ===
cat("Explained Variance by PC1:", round(summary(pca_result)$importance[2, 1] * 100, 1), "%\n")
#> Explained Variance by PC1: 55.3 %
cat("Explained Variance by PC2:", round(summary(pca_result)$importance[2, 2] * 100, 1), "%\n")
#> Explained Variance by PC2: 31.9 %
cat("Cumulative Variance (PC1+PC2):",
    round((summary(pca_result)$importance[2, 1] + summary(pca_result)$importance[2, 2]) * 100, 1), "%\n")
#> Cumulative Variance (PC1+PC2): 87.2 %

# Perceptual map
p_perceptual <- ggplot(pc_scores, aes(x = PC1, y = PC2, label = Brand)) +
  geom_point(size = 8, alpha = 0.7, colour = c("#FFCC00", "#FF0000", "#009900", "#0066FF")) +
  geom_text(fontface = "bold", size = 4, vjust = -1.5) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey", alpha = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey", alpha = 0.5) +
  labs(
    title = "Perceptual Map: Nigerian Telecom Brands",
    x = paste0("PC1 (", round(summary(pca_result)$importance[2, 1] * 100, 1), "%)"),
    y = paste0("PC2 (", round(summary(pca_result)$importance[2, 2] * 100, 1), "%)"),
    caption = "Based on TF-IDF word associations from social media text"
  ) +
  xlim(-2.5, 2.5) +
  ylim(-2.5, 2.5) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12),
        aspect.ratio = 1)

print(p_perceptual)


# Word vectors (loadings) to show what drives each PC
p_loadings <- ggplot(loadings, aes(x = PC1_Loading, y = PC2_Loading, label = Word)) +
  geom_point(size = 3, alpha = 0.6, colour = "#2ca02c") +
  geom_text(size = 3, vjust = -0.5, hjust = -0.5) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey") +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey") +
  labs(
    title = "Word Loadings on Principal Components",
    x = "PC1 Loading",
    y = "PC2 Loading",
    caption = "Words far from the origin contribute most to each component; nearby words load similarly"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 12),
        aspect.ratio = 1)

print(p_loadings)


# Interpretation of axes (correlations with words)
cat("\n=== Interpretation of Principal Components ===\n")
#> 
#> === Interpretation of Principal Components ===
cat("PC1 (", round(summary(pca_result)$importance[2, 1] * 100, 1), "% variance):\n", sep = "")
#> PC1 (55.3% variance):
pc1_top <- loadings |> arrange(desc(abs(PC1_Loading))) |> head(3)
for (i in 1:nrow(pc1_top)) {
  cat("  ", pc1_top$Word[i], ": ", round(pc1_top$PC1_Loading[i], 3), "\n", sep = "")
}
#>   Digital: -0.382
#>   Youth: -0.38
#>   Affordable: 0.377

cat("\nPC2 (", round(summary(pca_result)$importance[2, 2] * 100, 1), "% variance):\n", sep = "")
#> 
#> PC2 (31.9% variance):
pc2_top <- loadings |> arrange(desc(abs(PC2_Loading))) |> head(3)
for (i in 1:nrow(pc2_top)) {
  cat("  ", pc2_top$Word[i], ": ", round(pc2_top$PC2_Loading[i], 3), "\n", sep = "")
}
#>   Coverage: 0.463
#>   Network: 0.445
#>   Reliable: 0.41

# Strategic positioning summary
cat("\n=== Competitive Positioning Summary ===\n")
#> 
#> === Competitive Positioning Summary ===
cat("MTN: Positioned as reliable, network-focused (traditional strength)\n")
#> MTN: Positioned as reliable, network-focused (traditional strength)
cat("Airtel: Positioned as innovative, youth-oriented (digital native)\n")
#> Airtel: Positioned as innovative, youth-oriented (digital native)
cat("Glo: Positioned as affordable, value-focused (price leader)\n")
#> Glo: Positioned as affordable, value-focused (price leader)
cat("9mobile: Positioned as struggling (needs repositioning)\n\n")
#> 9mobile: Positioned as struggling (needs repositioning)
cat("White Space: 'premium + innovative' quadrant appears unoccupied\n")
#> White Space: 'premium + innovative' quadrant appears unoccupied
cat("Recommendation: Explore premium positioning for Airtel or MTN\n")
#> Recommendation: Explore premium positioning for Airtel or MTN

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# TF-IDF matrix: brands x words
brands = ['MTN', 'Airtel', 'Glo', '9mobile']
words = ['Reliable', 'Coverage', 'Innovation', 'Digital', 'Youth',
        'Affordable', 'Value', 'Struggling', 'Network']

tfidf_matrix = np.array([
    [0.45, 0.52, 0.15, 0.25, 0.20, 0.18, 0.22, 0.05, 0.48],  # MTN
    [0.35, 0.30, 0.58, 0.55, 0.62, 0.10, 0.08, 0.08, 0.25],  # Airtel
    [0.32, 0.28, 0.08, 0.15, 0.10, 0.68, 0.72, 0.12, 0.22],  # Glo
    [0.20, 0.15, 0.10, 0.18, 0.08, 0.35, 0.30, 0.55, 0.20]   # 9mobile
])

print("=== TF-IDF Matrix: Brand-Word Associations ===")
#> === TF-IDF Matrix: Brand-Word Associations ===
df_tfidf = pd.DataFrame(tfidf_matrix, columns=words, index=brands)
print(df_tfidf.round(2))
#>          Reliable  Coverage  Innovation  ...  Value  Struggling  Network
#> MTN          0.45      0.52        0.15  ...   0.22        0.05     0.48
#> Airtel       0.35      0.30        0.58  ...   0.08        0.08     0.25
#> Glo          0.32      0.28        0.08  ...   0.72        0.12     0.22
#> 9mobile      0.20      0.15        0.10  ...   0.30        0.55     0.20
#> 
#> [4 rows x 9 columns]

# Standardize and apply PCA
scaler = StandardScaler()
tfidf_scaled = scaler.fit_transform(tfidf_matrix)

pca = PCA()
pca.fit(tfidf_scaled)

pc_scores = pca.transform(tfidf_scaled)

pc_df = pd.DataFrame({
    'Brand': brands,
    'PC1': pc_scores[:, 0],
    'PC2': pc_scores[:, 1]
})

print("\n=== PCA Results ===")
#> 
#> === PCA Results ===
print(f"Explained Variance by PC1: {pca.explained_variance_ratio_[0] * 100:.1f}%")
#> Explained Variance by PC1: 55.3%
print(f"Explained Variance by PC2: {pca.explained_variance_ratio_[1] * 100:.1f}%")
#> Explained Variance by PC2: 31.9%
print(f"Cumulative Variance (PC1+PC2): {(pca.explained_variance_ratio_[0] + pca.explained_variance_ratio_[1]) * 100:.1f}%")
#> Cumulative Variance (PC1+PC2): 87.2%

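# Loadings (word contributions), scaled by the square root of each component's variance.
# R's pca_result$rotation reports unit-length eigenvectors instead, so magnitudes differ
# between the two outputs; the sign of each component is arbitrary.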
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
loadings_df = pd.DataFrame(
    loadings[:, :2],
    columns=['PC1_Loading', 'PC2_Loading'],
    index=words
)

# Perceptual map
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Brand positions
colors_map = {'MTN': '#FFCC00', 'Airtel': '#FF0000', 'Glo': '#009900', '9mobile': '#0066FF'}
brand_colors = [colors_map[b] for b in brands]

ax1.scatter(pc_df['PC1'], pc_df['PC2'], s=300, alpha=0.7, c=brand_colors, edgecolors='black', linewidth=1.5)
for idx, brand in enumerate(brands):
    ax1.annotate(brand, (pc_df.loc[idx, 'PC1'], pc_df.loc[idx, 'PC2']),
                fontsize=10, fontweight='bold', ha='center', va='bottom', xytext=(0, 10),
                textcoords='offset points')

ax1.axhline(0, linestyle='--', color='grey', alpha=0.5)
ax1.axvline(0, linestyle='--', color='grey', alpha=0.5)
ax1.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)', fontsize=11)
ax1.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)', fontsize=11)
ax1.set_title('Perceptual Map: Nigerian Telecom Brands', fontsize=12, fontweight='bold')
ax1.set_xlim(-2.5, 2.5)
#> (-2.5, 2.5)
ax1.set_ylim(-2.5, 2.5)
#> (-2.5, 2.5)
ax1.grid(alpha=0.3)
ax1.set_aspect('equal')

# Word loadings
ax2.scatter(loadings_df['PC1_Loading'], loadings_df['PC2_Loading'],
           s=100, alpha=0.6, color='#2ca02c', edgecolors='black', linewidth=1)
for idx, word in enumerate(words):
    ax2.annotate(word, (loadings_df.iloc[idx, 0], loadings_df.iloc[idx, 1]),
                fontsize=9, ha='center', va='bottom', xytext=(5, 5),
                textcoords='offset points')

ax2.axhline(0, linestyle='--', color='grey', alpha=0.5)
ax2.axvline(0, linestyle='--', color='grey', alpha=0.5)
ax2.set_xlabel('PC1 Loading', fontsize=11)
ax2.set_ylabel('PC2 Loading', fontsize=11)
ax2.set_title('Word Loadings on Principal Components', fontsize=12, fontweight='bold')
ax2.grid(alpha=0.3)
ax2.set_aspect('equal')

plt.tight_layout()
plt.show()


# Component interpretation
print("\n=== Interpretation of Principal Components ===")
#> 
#> === Interpretation of Principal Components ===
pc1_top = loadings_df['PC1_Loading'].abs().nlargest(3)
print(f"PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance) - Top words:")
#> PC1 (55.3% variance) - Top words:
for word in pc1_top.index:
    print(f"  {word}: {loadings_df.loc[word, 'PC1_Loading']:.3f}")
#>   Digital: 0.984
#>   Youth: 0.979
#>   Affordable: -0.971

pc2_top = loadings_df['PC2_Loading'].abs().nlargest(3)
print(f"\nPC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance) - Top words:")
#> 
#> PC2 (31.9% variance) - Top words:
for word in pc2_top.index:
    print(f"  {word}: {loadings_df.loc[word, 'PC2_Loading']:.3f}")
#>   Coverage: 0.906
#>   Network: 0.871
#>   Reliable: 0.804

print("\n=== Competitive Positioning Summary ===")
#> 
#> === Competitive Positioning Summary ===
print("MTN: Positioned as reliable, network-focused (traditional strength)")
#> MTN: Positioned as reliable, network-focused (traditional strength)
print("Airtel: Positioned as innovative, youth-oriented (digital native)")
#> Airtel: Positioned as innovative, youth-oriented (digital native)
print("Glo: Positioned as affordable, value-focused (price leader)")
#> Glo: Positioned as affordable, value-focused (price leader)
print("9mobile: Positioned as struggling (needs repositioning)")
#> 9mobile: Positioned as struggling (needs repositioning)
print("\nWhite Space: 'premium + innovative' quadrant appears unoccupied")
#> 
#> White Space: 'premium + innovative' quadrant appears unoccupied
print("Recommendation: Explore premium positioning for Airtel or MTN")
#> Recommendation: Explore premium positioning for Airtel or MTN

📝 Section 51.5 Review Questions

1. TF-IDF Interpretation: If the word “network” has high TF-IDF for MTN but low for Glo, what does this mean? Why might it be important to MTN’s positioning?

2. PCA Dimensionality: You run PCA and find PC1+PC2 explain 72% of variance. Is this sufficient? What would you do if only PC1 explained 45%?

3. White Space Strategy: Your perceptual map shows white space in the “premium + digital” quadrant. How would you test if repositioning a brand into this space is viable?

4. Competitive Response: After you map positioning, a competitor moves into “your” space. How would you respond? What brand assets or messaging would you emphasize?

5. Statistical Concerns: Your perceptual map is based on social media text (self-selected sample). Would you trust it as much as a representative consumer survey? Why or why not?

51.6 Confirmatory Factor Analysis for Brand Equity Measurement

Brand equity is multidimensional: awareness, associations, perceived quality, and loyalty are distinct yet related constructs. Confirmatory Factor Analysis (CFA) formalizes this structure. Instead of averaging all items into one score, CFA estimates separate latent factors and checks that the survey items (observed variables) genuinely measure each intended construct. This is validity testing: do the items measure what they’re supposed to?

A typical brand equity CFA model specifies four latent factors (awareness, associations, perceived quality, loyalty), each measured by 3–4 survey items. For example:

- Awareness: “I am aware of [brand],” “I know [brand] well,” “I recognize [brand] quickly”
- Perceived Quality: “[Brand] offers high quality,” “[Brand] is reliable,” “[Brand] is superior to alternatives”
- Loyalty: “I am loyal to [brand],” “I would recommend [brand],” “I prefer [brand] to alternatives”

CFA estimates factor loadings (in standardised form, the correlation between each item and its latent factor), latent factor variances, and covariances between factors. Model fit indices (CFI, RMSEA, SRMR) assess whether the data confirm the hypothesised structure. Good fit (CFI > 0.90, RMSEA < 0.08, SRMR < 0.06) suggests the factor structure is valid. Poor fit signals either weak items (high measurement error) or a misspecified structure (items that load on more than one factor, correlated residuals, or additional factors).

📘 Theory: Confirmatory Factor Analysis

CFA is a structural equation model (SEM) where observed variables load onto latent factors. The model: \(\mathbf{y}_i = \boldsymbol{\Lambda} \boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i\), where \(\mathbf{y}_i\) is the vector of observed items for respondent \(i\), \(\boldsymbol{\eta}_i\) is the latent factor vector, \(\boldsymbol{\Lambda}\) is the loading matrix, and \(\boldsymbol{\epsilon}_i\) is measurement error. Estimation uses maximum likelihood; fit is assessed via chi-square test, CFI (Comparative Fit Index), RMSEA (Root Mean Square Error of Approximation), and SRMR (Standardized Root Mean Square Residual). Modification indices suggest freeing constrained parameters if fit is poor.
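
The incremental and approximate fit indices are simple functions of the model and baseline chi-square statistics. As a worked check, the sketch below recomputes CFI, TLI, and RMSEA from the values reported in the lavaan summary in this section (model \(\chi^2 = 44.662\) on 48 df; baseline \(\chi^2 = 2720.445\) on 66 df; \(n = 300\)). It uses the textbook formulas; the RMSEA denominator is an assumption, since some software divides by \(n\) rather than \(n - 1\), although the choice does not matter when the chi-square is below its degrees of freedom.

```python
import math

# Values copied from the lavaan summary in this section
chisq_model, df_model = 44.662, 48
chisq_base, df_base = 2720.445, 66
n_obs = 300

# Estimated non-centrality (misfit beyond chance), floored at zero
ncp_model = max(chisq_model - df_model, 0.0)
ncp_base = max(chisq_base - df_base, 0.0)

# CFI: proportional reduction in misfit relative to the baseline model
cfi = 1.0 - ncp_model / max(ncp_base, ncp_model)

# TLI: compares chi-square/df ratios; unlike CFI it can exceed 1
ratio_base = chisq_base / df_base
ratio_model = chisq_model / df_model
tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

# RMSEA: misfit per degree of freedom, adjusted for sample size
rmsea = math.sqrt(ncp_model / (df_model * (n_obs - 1)))

print(f"CFI = {cfi:.3f}, TLI = {tli:.3f}, RMSEA = {rmsea:.3f}")
# CFI = 1.000, TLI = 1.002, RMSEA = 0.000
```

Because the model chi-square is smaller than its degrees of freedom, the non-centrality estimate is floored at zero, which is why CFI is exactly 1 and RMSEA exactly 0 while TLI slightly exceeds 1.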

🔑 Formula: CFA Model

\[\mathbf{y}_i = \boldsymbol{\Lambda} \boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i\]

where \(\boldsymbol{\eta}_i \sim \text{Normal}(0, \boldsymbol{\Psi})\) (latent factors), and \(\boldsymbol{\epsilon}_i \sim \text{Normal}(0, \boldsymbol{\Theta})\) (measurement error).

Cronbach’s Alpha (internal consistency): \[\alpha = \frac{k}{k-1} \times \frac{\text{Var}(T) - \sum_{i=1}^k \text{Var}(Y_i)}{\text{Var}(T)}\]

where \(k\) is the number of items and \(T\) is the total score (the sum of the \(k\) items).
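
The alpha formula is easy to apply by hand. The sketch below is a minimal illustration using only the first 10 respondents' answers to the three awareness items, copied from the data preview printed in this section, so the result is not the full-sample reliability. The function name `cronbach_alpha` is illustrative.

```python
from statistics import variance

# Aware_1, Aware_2, Aware_3 for the first 10 respondents in the data preview
awareness_items = [
    (5, 6, 6), (4, 5, 5), (4, 4, 4), (5, 4, 5), (4, 5, 4),
    (5, 5, 4), (5, 5, 5), (3, 4, 4), (4, 5, 5), (4, 4, 5),
]

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a tuple of k item scores."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]   # Var(Y_i), sample variance
    total_var = variance([sum(row) for row in rows])    # Var(T), variance of sum scores
    return (k / (k - 1)) * (total_var - sum(item_vars)) / total_var

alpha = cronbach_alpha(awareness_items)
print(f"Cronbach's alpha (awareness, n = 10): {alpha:.3f}")
```

For these ten rows alpha is about 0.73. Alpha assumes all items load equally on a single factor; when loadings differ, a CFA-based reliability coefficient is usually preferred.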

library(tidyverse)
library(lavaan)
library(semPlot)

# Synthetic brand equity data: 300 respondents, 4 latent factors, 12 items
set.seed(9156)

n_respondents <- 300

# True latent factors (unit variance; quality and loyalty have slightly positive means)
awareness <- rnorm(n_respondents, 0, 1)
associations <- rnorm(n_respondents, 0, 1)
perceived_quality <- rnorm(n_respondents, 0.2, 1)
loyalty <- rnorm(n_respondents, 0.1, 1)

# Generate observed items with loadings
data_equity <- tibble(
  # Awareness items (loadings ≈ 0.85, 0.80, 0.78)
  Aware_1 = 0.85 * awareness + rnorm(n_respondents, 0, 0.3),
  Aware_2 = 0.80 * awareness + rnorm(n_respondents, 0, 0.35),
  Aware_3 = 0.78 * awareness + rnorm(n_respondents, 0, 0.35),
  # Association items (loadings ≈ 0.82, 0.79, 0.75)
  Assoc_1 = 0.82 * associations + rnorm(n_respondents, 0, 0.35),
  Assoc_2 = 0.79 * associations + rnorm(n_respondents, 0, 0.38),
  Assoc_3 = 0.75 * associations + rnorm(n_respondents, 0, 0.40),
  # Perceived quality items (loadings ≈ 0.88, 0.84, 0.81)
  Quality_1 = 0.88 * perceived_quality + rnorm(n_respondents, 0, 0.25),
  Quality_2 = 0.84 * perceived_quality + rnorm(n_respondents, 0, 0.3),
  Quality_3 = 0.81 * perceived_quality + rnorm(n_respondents, 0, 0.32),
  # Loyalty items (loadings ≈ 0.86, 0.83, 0.80)
  Loyalty_1 = 0.86 * loyalty + rnorm(n_respondents, 0, 0.28),
  Loyalty_2 = 0.83 * loyalty + rnorm(n_respondents, 0, 0.31),
  Loyalty_3 = 0.80 * loyalty + rnorm(n_respondents, 0, 0.33)
) |>
  # Scale to 1-7 Likert
  mutate(across(everything(), function(x) pmin(pmax(round(4 + x), 1), 7)))

cat("=== Brand Equity Survey Data (First 10 Respondents) ===\n")
#> === Brand Equity Survey Data (First 10 Respondents) ===
print(head(data_equity, 10))
#> # A tibble: 10 × 12
#>    Aware_1 Aware_2 Aware_3 Assoc_1 Assoc_2 Assoc_3 Quality_1 Quality_2 Quality_3
#>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
#>  1       5       6       6       4       3       3         4         3         3
#>  2       4       5       5       5       5       5         5         5         4
#>  3       4       4       4       3       4       3         3         3         3
#>  4       5       4       5       3       3       3         3         3         3
#>  5       4       5       4       5       5       4         4         5         4
#>  6       5       5       4       4       4       4         5         5         5
#>  7       5       5       5       3       2       3         4         4         4
#>  8       3       4       4       4       4       4         5         5         5
#>  9       4       5       5       3       2       3         4         4         4
#> 10       4       4       5       3       4       3         4         4         4
#> # ℹ 3 more variables: Loyalty_1 <dbl>, Loyalty_2 <dbl>, Loyalty_3 <dbl>

# Fit CFA model
cfa_model <- '
  # Latent factors
  Awareness =~ Aware_1 + Aware_2 + Aware_3
  Associations =~ Assoc_1 + Assoc_2 + Assoc_3
  Quality =~ Quality_1 + Quality_2 + Quality_3
  Loyalty =~ Loyalty_1 + Loyalty_2 + Loyalty_3
'

cfa_fit <- cfa(cfa_model, data = data_equity)

cat("\n=== CFA Model Summary ===\n")
#> 
#> === CFA Model Summary ===
print(summary(cfa_fit, fit.measures = TRUE, standardized = TRUE))
#> lavaan 0.6-21 ended normally after 34 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        30
#> 
#>   Number of observations                           300
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                44.662
#>   Degrees of freedom                                48
#>   P-value (Chi-square)                           0.610
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                              2720.445
#>   Degrees of freedom                                66
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    1.000
#>   Tucker-Lewis Index (TLI)                       1.002
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -3596.264
#>   Loglikelihood unrestricted model (H1)      -3573.933
#>                                                       
#>   Akaike (AIC)                                7252.528
#>   Bayesian (BIC)                              7363.642
#>   Sample-size adjusted Bayesian (SABIC)       7268.500
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.000
#>   90 Percent confidence interval - lower         0.000
#>   90 Percent confidence interval - upper         0.033
#>   P-value H_0: RMSEA <= 0.050                    0.999
#>   P-value H_0: RMSEA >= 0.080                    0.000
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.022
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   Awareness =~                                                          
#>     Aware_1           1.000                               0.890    0.909
#>     Aware_2           0.977    0.047   20.945    0.000    0.869    0.874
#>     Aware_3           0.896    0.043   20.862    0.000    0.797    0.872
#>   Associations =~                                                       
#>     Assoc_1           1.000                               0.847    0.896
#>     Assoc_2           1.042    0.055   19.025    0.000    0.883    0.860
#>     Assoc_3           0.904    0.048   18.718    0.000    0.766    0.848
#>   Quality =~                                                            
#>     Quality_1         1.000                               0.865    0.913
#>     Quality_2         0.921    0.040   22.999    0.000    0.796    0.905
#>     Quality_3         0.864    0.041   21.193    0.000    0.747    0.862
#>   Loyalty =~                                                            
#>     Loyalty_1         1.000                               0.953    0.927
#>     Loyalty_2         0.979    0.037   26.636    0.000    0.933    0.926
#>     Loyalty_3         0.888    0.037   24.210    0.000    0.847    0.886
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   Awareness ~~                                                          
#>     Associations      0.102    0.048    2.108    0.035    0.135    0.135
#>     Quality           0.028    0.048    0.578    0.563    0.036    0.036
#>     Loyalty           0.049    0.053    0.931    0.352    0.058    0.058
#>   Associations ~~                                                       
#>     Quality           0.025    0.046    0.550    0.582    0.035    0.035
#>     Loyalty           0.072    0.051    1.410    0.159    0.089    0.089
#>   Quality ~~                                                            
#>     Loyalty          -0.017    0.051   -0.342    0.733   -0.021   -0.021
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .Aware_1           0.167    0.026    6.481    0.000    0.167    0.174
#>    .Aware_2           0.234    0.028    8.268    0.000    0.234    0.236
#>    .Aware_3           0.201    0.024    8.367    0.000    0.201    0.240
#>    .Assoc_1           0.176    0.027    6.429    0.000    0.176    0.197
#>    .Assoc_2           0.275    0.034    8.125    0.000    0.275    0.261
#>    .Assoc_3           0.228    0.027    8.557    0.000    0.228    0.280
#>    .Quality_1         0.149    0.022    6.663    0.000    0.149    0.167
#>    .Quality_2         0.140    0.020    7.137    0.000    0.140    0.181
#>    .Quality_3         0.194    0.021    9.154    0.000    0.194    0.258
#>    .Loyalty_1         0.150    0.022    6.868    0.000    0.150    0.141
#>    .Loyalty_2         0.146    0.021    6.945    0.000    0.146    0.143
#>    .Loyalty_3         0.196    0.021    9.156    0.000    0.196    0.215
#>     Awareness         0.792    0.080    9.883    0.000    1.000    1.000
#>     Associations      0.718    0.075    9.538    0.000    1.000    1.000
#>     Quality           0.748    0.075   10.017    0.000    1.000    1.000
#>     Loyalty           0.909    0.087   10.394    0.000    1.000    1.000

# Extract parameter estimates
params <- parameterEstimates(cfa_fit, standardized = TRUE) |>
  filter(op == "=~") |>  # Factor loadings only
  select(lhs, rhs, est, std.all, pvalue)

cat("\n=== Factor Loadings (Standardised) ===\n")
#> 
#> === Factor Loadings (Standardised) ===
print(params |> arrange(lhs, rhs) |>
      rename(Factor = lhs, Item = rhs, Loading = std.all))
#>          Factor      Item   est Loading pvalue
#> 1  Associations   Assoc_1 1.000   0.896     NA
#> 2  Associations   Assoc_2 1.042   0.860      0
#> 3  Associations   Assoc_3 0.904   0.848      0
#> 4     Awareness   Aware_1 1.000   0.909     NA
#> 5     Awareness   Aware_2 0.977   0.874      0
#> 6     Awareness   Aware_3 0.896   0.872      0
#> 7       Loyalty Loyalty_1 1.000   0.927     NA
#> 8       Loyalty Loyalty_2 0.979   0.926      0
#> 9       Loyalty Loyalty_3 0.888   0.886      0
#> 10      Quality Quality_1 1.000   0.913     NA
#> 11      Quality Quality_2 0.921   0.905      0
#> 12      Quality Quality_3 0.864   0.862      0

# Model fit indices
fit_indices <- fitMeasures(cfa_fit, c("cfi", "tli", "rmsea", "srmr", "chisq", "df", "pvalue"))

cat("\n=== Model Fit Indices ===\n")
#> 
#> === Model Fit Indices ===
cat("CFI (Comparative Fit Index):", round(fit_indices["cfi"], 3),
    "(target > 0.90)\n")
#> CFI (Comparative Fit Index): 1 (target > 0.90)
cat("TLI (Tucker-Lewis Index):", round(fit_indices["tli"], 3),
    "(target > 0.90)\n")
#> TLI (Tucker-Lewis Index): 1.002 (target > 0.90)
cat("RMSEA (Root Mean Square Error of Approximation):", round(fit_indices["rmsea"], 3),
    "(target < 0.08)\n")
#> RMSEA (Root Mean Square Error of Approximation): 0 (target < 0.08)
cat("SRMR (Standardised Root Mean Square Residual):", round(fit_indices["srmr"], 3),
    "(target < 0.06)\n")
#> SRMR (Standardised Root Mean Square Residual): 0.022 (target < 0.06)
cat("Chi-square:", round(fit_indices["chisq"], 2), "df =", fit_indices["df"],
    "p =", round(fit_indices["pvalue"], 3), "\n")
#> Chi-square: 44.66 df = 48 p = 0.61

# Reliability (Cronbach's alpha for each scale)
# Computed from the raw item responses with psych::alpha()
cat("\n=== Reliability Analysis ===\n")
#> 
#> === Reliability Analysis ===

# Awareness scale
aware_items <- data_equity |> select(starts_with("Aware"))
alpha_aware <- psych::alpha(aware_items, check.keys = FALSE)$total$raw_alpha

# Associations scale
assoc_items <- data_equity |> select(starts_with("Assoc"))
alpha_assoc <- psych::alpha(assoc_items, check.keys = FALSE)$total$raw_alpha

# Quality scale
quality_items <- data_equity |> select(starts_with("Quality"))
alpha_quality <- psych::alpha(quality_items, check.keys = FALSE)$total$raw_alpha

# Loyalty scale
loyalty_items <- data_equity |> select(starts_with("Loyalty"))
alpha_loyalty <- psych::alpha(loyalty_items, check.keys = FALSE)$total$raw_alpha

reliability_df <- tibble(
  Factor = c("Awareness", "Associations", "Quality", "Loyalty"),
  Cronbach_Alpha = c(alpha_aware, alpha_assoc, alpha_quality, alpha_loyalty),
  Interpretation = case_when(
    Cronbach_Alpha > 0.80 ~ "Excellent",
    Cronbach_Alpha > 0.70 ~ "Good",
    Cronbach_Alpha > 0.60 ~ "Acceptable",
    TRUE ~ "Poor"
  )
)

print(reliability_df)
#> # A tibble: 4 × 3
#>   Factor       Cronbach_Alpha Interpretation
#>   <chr>                 <dbl> <chr>         
#> 1 Awareness             0.915 Excellent     
#> 2 Associations          0.900 Excellent     
#> 3 Quality               0.921 Excellent     
#> 4 Loyalty               0.937 Excellent

# Factor correlations
cat("\n=== Factor Correlations ===\n")
#> 
#> === Factor Correlations ===
latent_cors <- lavInspect(cfa_fit, "cor.lv")
print(round(latent_cors, 3))
#>              Awrnss Assctn Qualty Loylty
#> Awareness     1.000                     
#> Associations  0.135  1.000              
#> Quality       0.036  0.035  1.000       
#> Loyalty       0.058  0.089 -0.021  1.000

# Compute latent factor scores for each respondent
factor_scores <- predict(cfa_fit)

cat("\n=== Latent Factor Scores (First 10 Respondents) ===\n")
#> 
#> === Latent Factor Scores (First 10 Respondents) ===
print(round(factor_scores[1:10, ], 2))
#>       Awareness Associations Quality Loyalty
#>  [1,]      1.48        -0.56   -0.81   -0.62
#>  [2,]      0.56         0.89    0.55    0.36
#>  [3,]     -0.03        -0.70   -1.19    0.60
#>  [4,]      0.64        -0.96   -1.18   -0.98
#>  [5,]      0.27         0.62    0.17    1.95
#>  [6,]      0.65        -0.03    0.79   -0.01
#>  [7,]      0.93        -1.19   -0.20    1.69
#>  [8,]     -0.40        -0.04    0.79    0.96
#>  [9,]      0.53        -1.21   -0.20   -0.01
#> [10,]      0.27        -0.69   -0.20    1.21


import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Simulate brand equity data with 4 latent factors
np.random.seed(9156)

n_respondents = 300

# True latent factors
awareness = np.random.normal(0, 1, n_respondents)
associations = np.random.normal(0, 1, n_respondents)
perceived_quality = np.random.normal(0.2, 1, n_respondents)
loyalty = np.random.normal(0.1, 1, n_respondents)

# Generate observed items
data_equity = pd.DataFrame({
    # Awareness (loadings: 0.85, 0.80, 0.78)
    'Aware_1': np.clip(np.round(4 + 0.85 * awareness + np.random.normal(0, 0.3, n_respondents)), 1, 7),
    'Aware_2': np.clip(np.round(4 + 0.80 * awareness + np.random.normal(0, 0.35, n_respondents)), 1, 7),
    'Aware_3': np.clip(np.round(4 + 0.78 * awareness + np.random.normal(0, 0.35, n_respondents)), 1, 7),
    # Associations (loadings: 0.82, 0.79, 0.75)
    'Assoc_1': np.clip(np.round(4 + 0.82 * associations + np.random.normal(0, 0.35, n_respondents)), 1, 7),
    'Assoc_2': np.clip(np.round(4 + 0.79 * associations + np.random.normal(0, 0.38, n_respondents)), 1, 7),
    'Assoc_3': np.clip(np.round(4 + 0.75 * associations + np.random.normal(0, 0.40, n_respondents)), 1, 7),
    # Quality (loadings: 0.88, 0.84, 0.81)
    'Quality_1': np.clip(np.round(4 + 0.88 * perceived_quality + np.random.normal(0, 0.25, n_respondents)), 1, 7),
    'Quality_2': np.clip(np.round(4 + 0.84 * perceived_quality + np.random.normal(0, 0.3, n_respondents)), 1, 7),
    'Quality_3': np.clip(np.round(4 + 0.81 * perceived_quality + np.random.normal(0, 0.32, n_respondents)), 1, 7),
    # Loyalty (loadings: 0.86, 0.83, 0.80)
    'Loyalty_1': np.clip(np.round(4 + 0.86 * loyalty + np.random.normal(0, 0.28, n_respondents)), 1, 7),
    'Loyalty_2': np.clip(np.round(4 + 0.83 * loyalty + np.random.normal(0, 0.31, n_respondents)), 1, 7),
    'Loyalty_3': np.clip(np.round(4 + 0.80 * loyalty + np.random.normal(0, 0.33, n_respondents)), 1, 7)
})

print("=== Brand Equity Survey Data (First 10 Respondents) ===")
#> === Brand Equity Survey Data (First 10 Respondents) ===
print(data_equity.head(10))
#>    Aware_1  Aware_2  Aware_3  ...  Loyalty_1  Loyalty_2  Loyalty_3
#> 0      6.0      6.0      5.0  ...        3.0        4.0        4.0
#> 1      3.0      2.0      3.0  ...        6.0        5.0        6.0
#> 2      5.0      4.0      3.0  ...        4.0        4.0        5.0
#> 3      4.0      5.0      4.0  ...        5.0        4.0        4.0
#> 4      5.0      4.0      4.0  ...        3.0        3.0        2.0
#> 5      4.0      4.0      4.0  ...        4.0        3.0        4.0
#> 6      3.0      2.0      3.0  ...        4.0        3.0        3.0
#> 7      5.0      4.0      5.0  ...        4.0        4.0        4.0
#> 8      4.0      4.0      4.0  ...        7.0        6.0        6.0
#> 9      3.0      3.0      3.0  ...        4.0        5.0        5.0
#> 
#> [10 rows x 12 columns]

# Cronbach's Alpha calculation
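def cronbachs_alpha(data):
    """Cronbach's alpha for a DataFrame whose columns are the items of one scale."""
    k = data.shape[1]                      # number of items in the scale
    var_total = data.sum(axis=1).var()     # variance of respondents' summed scores
    var_items = data.var(axis=0).sum()     # sum of the individual item variances
    alpha = (k / (k - 1)) * (1 - var_items / var_total)
    return alpha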

alpha_aware = cronbachs_alpha(data_equity[['Aware_1', 'Aware_2', 'Aware_3']])
alpha_assoc = cronbachs_alpha(data_equity[['Assoc_1', 'Assoc_2', 'Assoc_3']])
alpha_quality = cronbachs_alpha(data_equity[['Quality_1', 'Quality_2', 'Quality_3']])
alpha_loyalty = cronbachs_alpha(data_equity[['Loyalty_1', 'Loyalty_2', 'Loyalty_3']])

print("\n=== Reliability Analysis (Cronbach's Alpha) ===")
#> 
#> === Reliability Analysis (Cronbach's Alpha) ===
reliability = pd.DataFrame({
    'Factor': ['Awareness', 'Associations', 'Quality', 'Loyalty'],
    'Cronbach_Alpha': [alpha_aware, alpha_assoc, alpha_quality, alpha_loyalty],
})
reliability['Interpretation'] = reliability['Cronbach_Alpha'].apply(
    lambda x: 'Excellent' if x > 0.80 else ('Good' if x > 0.70 else ('Acceptable' if x > 0.60 else 'Poor'))
)
print(reliability.to_string(index=False))
#>       Factor  Cronbach_Alpha Interpretation
#>    Awareness        0.892922      Excellent
#> Associations        0.879903      Excellent
#>      Quality        0.931980      Excellent
#>      Loyalty        0.921225      Excellent

# Compute latent factor scores (simple method: average of items)
factor_scores = pd.DataFrame({
    'Awareness': data_equity[['Aware_1', 'Aware_2', 'Aware_3']].mean(axis=1),
    'Associations': data_equity[['Assoc_1', 'Assoc_2', 'Assoc_3']].mean(axis=1),
    'Quality': data_equity[['Quality_1', 'Quality_2', 'Quality_3']].mean(axis=1),
    'Loyalty': data_equity[['Loyalty_1', 'Loyalty_2', 'Loyalty_3']].mean(axis=1)
})

print("\n=== Latent Factor Scores (First 10 Respondents) ===")
#> 
#> === Latent Factor Scores (First 10 Respondents) ===
print(factor_scores.head(10).round(2))
#>    Awareness  Associations  Quality  Loyalty
#> 0       5.67          4.33     4.67     3.67
#> 1       2.67          4.67     4.67     5.67
#> 2       4.00          4.67     3.33     4.33
#> 3       4.33          2.67     5.00     4.33
#> 4       4.33          4.00     4.00     2.67
#> 5       4.00          2.00     4.00     3.67
#> 6       2.67          3.33     3.00     3.33
#> 7       4.67          5.00     4.00     4.00
#> 8       4.00          2.67     3.67     6.33
#> 9       3.00          4.00     5.00     4.67

# Factor correlations
print("\n=== Factor Correlations ===")
#> 
#> === Factor Correlations ===
factor_cors = factor_scores.corr()
print(factor_cors.round(3))
#>               Awareness  Associations  Quality  Loyalty
#> Awareness         1.000        -0.027    0.018    0.044
#> Associations     -0.027         1.000   -0.014   -0.019
#> Quality           0.018        -0.014    1.000    0.006
#> Loyalty           0.044        -0.019    0.006    1.000

# Visualise factor structure
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for idx, (factor, items) in enumerate([
    ('Awareness', ['Aware_1', 'Aware_2', 'Aware_3']),
    ('Associations', ['Assoc_1', 'Assoc_2', 'Assoc_3']),
    ('Quality', ['Quality_1', 'Quality_2', 'Quality_3']),
    ('Loyalty', ['Loyalty_1', 'Loyalty_2', 'Loyalty_3'])
]):
    row, col = idx // 2, idx % 2
    ax = axes[row, col]

    # Plot item distributions with one bin per integer score (1-7)
    for item in items:
        ax.hist(data_equity[item], bins=np.arange(0.5, 8.5, 1), alpha=0.5, label=item)

    ax.set_xlabel('Item Score (1-7)', fontsize=10)
    ax.set_ylabel('Frequency', fontsize=10)
    ax.set_title(f'{factor} Items Distribution', fontsize=11, fontweight='bold')
    ax.legend()
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()



# Correlation heatmap for factor scores
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(factor_cors.values, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_xticks(np.arange(len(factor_cors.columns)))
ax.set_yticks(np.arange(len(factor_cors.columns)))
ax.set_xticklabels(factor_cors.columns)
ax.set_yticklabels(factor_cors.columns)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

for i in range(len(factor_cors.columns)):
    for j in range(len(factor_cors.columns)):
        text = ax.text(j, i, f'{factor_cors.iloc[i, j]:.2f}',
                      ha="center", va="center", color="black", fontsize=10)

ax.set_title('Brand Equity Factor Correlations', fontsize=12, fontweight='bold')
plt.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
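The standardised loadings in the lavaan output above can also be turned into composite reliability (CR) and average variance extracted (AVE), which complement Cronbach's alpha. The sketch below is illustrative only: it copies the loadings and latent correlations printed earlier, assumes uncorrelated residuals, and applies the Fornell-Larcker comparison (square root of AVE against the largest factor correlation) as one common check of discriminant validity. The function names are hypothetical.

```python
import numpy as np

# Standardised loadings copied from the lavaan output above.
loadings = {
    "Awareness":    [0.909, 0.874, 0.872],
    "Associations": [0.896, 0.860, 0.848],
    "Quality":      [0.913, 0.905, 0.862],
    "Loyalty":      [0.927, 0.926, 0.886],
}

# Largest absolute latent correlation involving each factor (from cor.lv above).
max_abs_cor = {"Awareness": 0.135, "Associations": 0.135,
               "Quality": 0.036, "Loyalty": 0.089}

def composite_reliability(lam):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(lam, dtype=float)
    num = lam.sum() ** 2
    return float(num / (num + (1.0 - lam ** 2).sum()))

def average_variance_extracted(lam):
    """AVE = mean of the squared standardised loadings."""
    lam = np.asarray(lam, dtype=float)
    return float(np.mean(lam ** 2))

validity = {}
for factor, lam in loadings.items():
    cr = composite_reliability(lam)
    ave = average_variance_extracted(lam)
    # Fornell-Larcker: sqrt(AVE) should exceed the factor's correlations with others.
    passes = bool(np.sqrt(ave) > max_abs_cor[factor])
    validity[factor] = {"CR": cr, "AVE": ave, "fornell_larcker_ok": passes}
    print(f"{factor:<13} CR={cr:.3f}  AVE={ave:.3f}  Fornell-Larcker OK: {passes}")
```

Conventional rules of thumb treat CR above 0.70 and AVE above 0.50 as adequate; these cut-offs are guidelines rather than strict tests.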

📝 Section 46.6 Review Questions

1. CFA vs EFA: When would you use Exploratory Factor Analysis (EFA) vs CFA? Which is appropriate for validating a pre-existing brand equity framework?

2. Loadings Interpretation: A survey item has a standardised loading of 0.65 on its intended factor. Is this acceptable? What would you do if loadings are below 0.50?

3. Discriminant Validity: Two factors (perceived quality and loyalty) have a correlation of 0.92. Does this indicate a problem? How would you investigate?

4. Model Fit vs Theoretical Fit: Your CFA model fits the data well (CFI = 0.95, RMSEA = 0.04). Should you accept it? What if the model doesn’t make theoretical sense?

5. Invariance Testing: You want to compare brand equity across demographic groups (urban/rural, age groups). How would you test whether the CFA structure holds across groups?

51.7 Case Study: Brand Equity Tracking for Nigerian Banks

Objective: A Nigerian financial services firm wants to benchmark its brand positioning against competitors (GTBank, UBA, Zenith) over six months, and recommend repositioning if needed.

Data Collection:

Monthly brand tracking survey: 800 respondents per month (4,800 total across 6 months), representative of Nigeria’s urban banked population (ages 18–65, income ₦500k+)
Measures: brand funnel (awareness, consideration, preference, usage), NPS, loyalty index, perceived quality, associations, and social media sentiment (from concurrent social listening)

Analysis Framework:

Brand Funnel Evolution (Mixed-Effects Model)

Fit a random-intercept mixed-effects model with month as fixed effect and respondent as random effect. Results:

GTBank: Awareness stable at 78%, Consideration rising from 48% → 54%, Preference from 28% → 32% (trend = +0.8 pp/month, p < 0.01)
UBA: Awareness stable at 72%, Consideration flat at 45%, Preference declining from 26% → 22% (trend = −0.8 pp/month, p < 0.01)
Zenith: Awareness declining from 65% → 62%, Consideration declining, Preference flat at 16%

Interpretation: GTBank is gaining mindshare; UBA is stagnating; Zenith is weakening. GTBank’s month-on-month gain correlates with a ₦80m digital campaign (cross-reference with MMM).

Social Media Sentiment & SOV

Over 6 months, GTBank’s social media SOV rises from 30% → 36%, Airtel flat at 32%, Glo declining from 22% → 18%, 9mobile at 10%. Sentiment scores: GTBank positive 62% (rising from 58%), Glo positive 51% (falling from 56%), Zenith positive 48% (stable). Association analysis (TF-IDF + PCA) shows GTBank increasingly linked to “digital,” “fintech,” “innovation”; Zenith still linked to “traditional,” “conservative.”

Perceptual Map Evolution

Dimension 1 (PC1): “Digital-to-Traditional” (75% variance explained)
Dimension 2 (PC2): “Premium-to-Accessible” (15% variance explained)

Positioning at Month 1: GTBank (0.8, 0.5), UBA (0.4, 0.3), Zenith (−0.6, 0.1), 9mobile (−0.2, −0.5)
Positioning at Month 6: GTBank (1.2, 0.6), UBA (0.5, 0.2), Zenith (−0.8, 0.0), 9mobile (−0.1, −0.6)

Interpretation: GTBank is moving more digital and premium; UBA is static; Zenith slipping into traditional-accessible corner; white space in “premium + traditional” (niche for premium offline banking) is unoccupied.
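The movement on the perceptual map can be quantified as a displacement vector per brand. The following is a minimal sketch using the Month 1 and Month 6 coordinates reported above; the dictionary layout and variable names are assumptions made for illustration.

```python
import numpy as np

# (PC1, PC2) coordinates as reported in the case study.
month1 = {"GTBank": (0.8, 0.5), "UBA": (0.4, 0.3),
          "Zenith": (-0.6, 0.1), "9mobile": (-0.2, -0.5)}
month6 = {"GTBank": (1.2, 0.6), "UBA": (0.5, 0.2),
          "Zenith": (-0.8, 0.0), "9mobile": (-0.1, -0.6)}

shifts = {}
for brand, start in month1.items():
    delta = np.array(month6[brand]) - np.array(start)
    shifts[brand] = {
        "d_pc1": float(delta[0]),                    # change along PC1
        "d_pc2": float(delta[1]),                    # change along PC2
        "distance": float(np.linalg.norm(delta)),    # Euclidean length of the move
    }

# Rank brands by how far they moved over the six months.
for brand, s in sorted(shifts.items(), key=lambda kv: -kv[1]["distance"]):
    print(f"{brand:<8} dPC1={s['d_pc1']:+.1f}  dPC2={s['d_pc2']:+.1f}  "
          f"distance={s['distance']:.2f}")
```

Because PC1 explains far more variance than PC2, a unit move along PC1 arguably matters more than the same move along PC2; weighting the axes by explained variance is one possible refinement.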

Brand Equity CFA

Fit a 4-factor CFA (awareness, associations, perceived quality, loyalty) for each bank. Cronbach’s alphas:

GTBank: Awareness 0.84, Associations 0.81, Quality 0.86, Loyalty 0.82 (all > 0.80, excellent)
UBA: Awareness 0.79, Associations 0.74, Quality 0.77, Loyalty 0.75 (all between 0.70 and 0.80, good)
Zenith: Awareness 0.71, Associations 0.66, Quality 0.68, Loyalty 0.64 (mostly between 0.60 and 0.70, acceptable at best)

Factor correlations for GTBank: all factors correlate 0.65–0.78, indicating they’re distinct but related (discriminant validity OK). For Zenith, associations-quality correlation = 0.92 (potential overlap; recommend item review).

Strategic Recommendations

GTBank: Maintain digital-premium positioning. Continue ₦80m monthly digital spend (strong ROI evident in funnel gains). Test premium-offline offerings (private banking, wealth management) to occupy white space.
UBA: Reposition from “accessible digital” to “premium digital” to differentiate from GTBank. Current positioning overlaps; recommend ₦50m shift from mass-market channels to high-income targeting, premium fintech partnerships.
Zenith: Crisis intervention. Associations and quality perception collapsing. Recommend: (a) short-term: PR campaign addressing customer pain points; (b) long-term: reposition as “trusted traditional bank with modern convenience” to occupy premium-traditional white space; allocate ₦30m to brand refresh campaign.
9mobile (Cross-Check with Telecom Data): Position as challenger/value disruptor. Segment marketing by use case (data plans for students, bundles for rural), avoiding direct premium competition.

Caveats: Survey-based brand tracking captures perceptions, not behavior. A respondent might report high preference for GTBank but maintain loyalty to UBA due to switching costs, employer benefits, or inertia. Integrate brand metrics with purchase-behavior data (transaction data) for a comprehensive picture. Also, GTBank’s funnel gains (month 1→6) coincide with its campaign spend, but that correlation alone does not establish campaign effectiveness: the campaign may have been timed or targeted where gains were already likely (selection bias), or other factors may have moved the funnel. Consider causal inference methods (matching, IV) for rigorous attribution.

51.8 Chapter Exercises

Chapter 46 Exercises

Exercise 46.1: The Brand Funnel

A major Nigerian bank commissions a brand tracking survey. From 1,000 respondents in its target market (working adults aged 25–55), the research firm reports:

Funnel Stage	Count	% of Target Market
Awareness (have heard of the bank)	920	92%
Familiarity (know at least one product)	640	64%
Consideration (would consider using)	380	38%
Preference (prefer over other banks)	190	19%
Current Customer	120	12%

Calculate the conversion rate between each consecutive funnel stage (e.g., Awareness → Familiarity: 640/920 = 69.6%). Where is the biggest “leakage” in the funnel?
A competitor bank has identical awareness (92%) but a Familiarity rate of 78% and Consideration rate of 55%. Compare the two banks’ funnels. What does the competitor do better, and what business implication does this have?
If the bank doubles its marketing spend to raise Awareness from 92% to 98% (6 percentage points), but does nothing about the Awareness→Familiarity conversion, how many more customers would it gain (at the end of the funnel)? Show your calculation.
Alternatively, if the bank improves its Consideration→Preference conversion rate from 50% to 60% (while keeping all other stages constant), how many more customers does it gain? Which investment — improving Awareness or improving Preference conversion — is more valuable?
Explain why brand funnels do not always flow in one direction. Give two real-world examples where a customer might move backward up the funnel (e.g., from customer to non-consideration).

Exercise 46.2: Net Promoter Score in Practice

A telecoms brand surveys 500 customers and receives these responses to: “On a scale of 0–10, how likely are you to recommend us to a friend?”

Score	0	1	2	3	4	5	6	7	8	9	10
Responses	8	4	5	9	12	25	35	60	110	140	92

Classify each score: Detractors (0–6), Passives (7–8), Promoters (9–10). Calculate the total count and percentage for each group.
Calculate the NPS: % Promoters − % Detractors.
The industry average NPS for Nigerian telecoms is +15. How does this brand compare?
A brand manager says: “Our NPS went from +18 last quarter to +22 this quarter — great improvement!” A statistician says: “Not so fast.” What concern would the statistician raise? (Hint: think about sample size and confidence intervals.)
NPS is often criticised as a “blunt instrument” — it reduces rich customer opinion to a single number. List two pieces of valuable information that NPS does NOT capture, and suggest one additional question you would add to the survey to capture each.

Exercise 46.3: Brand Perception and Positioning

A perceptual mapping study asks 200 consumers to rate four Nigerian banks on five attributes, each on a 1–7 scale:

Bank	Innovation	Trustworthiness	Convenience	Customer Service	Value for Money
Bank A	6.1	5.2	5.8	4.9	4.2
Bank B	3.8	6.5	4.1	5.8	5.6
Bank C	5.2	4.8	6.4	4.2	3.9
Bank D	4.0	5.0	3.8	3.5	6.3

Which bank is perceived as the most innovative? The most trustworthy? The best value?
Bank A scores highest on Innovation but lowest on Value for Money. What positioning strategy does this reflect? Who is its target customer?
A new bank is entering the market. Looking at the gaps in this perceptual map, what positioning — which combination of attribute strengths — is least occupied and might represent a viable market opportunity?
If you were advising Bank D (strong on Value, weak on Customer Service), what investment or campaign strategy would you recommend to improve its competitive position without abandoning its value positioning?
Perceptual maps based on consumer ratings can differ from maps based on actual product/service performance data. Explain why this gap might exist and what it means for brand strategy.

Exercise 46.4: Share of Voice and Social Monitoring

You are tracking brand health for a Nigerian FMCG company using social media data over a 12-week period. The company and its two main competitors generate the following weekly mention counts on Twitter:

Week	Your Brand	Competitor A	Competitor B
1–4	~800/wk	~1,200/wk	~600/wk
5	2,500	1,100	550
6–8	~900/wk	~1,100/wk	~650/wk
9	850	850	2,800
10–12	~850/wk	~1,150/wk	~500/wk

Calculate your brand’s Share of Voice (SOV) for three periods: Weeks 1–4 (baseline), Week 5 (product launch), and Week 9 (Competitor B crisis). SOV = your brand mentions ÷ total category mentions.
After Week 5, your SOV returns to baseline despite the product launch. What does this suggest about the longevity of the launch’s impact? What could you do to sustain higher SOV?
In Week 9, Competitor B’s mentions spike. This is a crisis situation for them. Explain two different marketing strategies your company could employ during Competitor B’s crisis period.
Mentions are not all equal — a negative mention is very different from a positive one. How would you modify the SOV calculation to account for sentiment? Propose a specific formula.
SOV data from social media may not represent the views of your entire target market. What types of customers are likely overrepresented in social media brand discussions, and what types are likely underrepresented?

Exercise 46.5: Capstone — Brand Health Report

You are a brand analyst for a leading Nigerian beverage company. You have access to:

Monthly NPS data for 24 months (2 years)
Quarterly brand funnel data (awareness through preference)
Social media mention sentiment scores (weekly)
Advertising spend by channel (weekly)
Sales data (weekly)

Design a brand health scorecard with 6 key metrics. For each metric, specify: what it measures, how often it should be tracked, and what threshold or benchmark would trigger action.
Over the past year, your NPS has declined from +32 to +18, while sales have remained flat. How do you interpret this combination? What might it suggest about future sales risk?
Social media sentiment shows that negative mentions about “product availability” (out-of-stock situations) have tripled in the last 3 months. This is a supply chain issue, not a brand quality issue. How do supply chain problems affect brand perception, and what should the brand team communicate to consumers during this period?
The CEO asks: “What is our brand worth in monetary terms?” Describe two approaches to brand valuation (one financial, one survey-based) and explain what each approach does and does not measure.
Write a 250-word Brand Health Executive Summary for the board, covering: the current state of the brand, the most important risk, and one specific recommended action. Write in plain English suitable for non-marketing board members.

51.9 Further Reading

David Aaker (1991). “Managing Brand Equity.” Foundational text on brand equity framework and measurement.
Kevin Lane Keller (2008). “Strategic Brand Management” (3rd ed.). Comprehensive coverage of brand positioning, brand equity, and tracking.
Wayne DeLozier & David Fernandez (2014). “The Brand Equity Tracking Study” in Journal of Brand Strategy. Best practices in longitudinal brand measurement.
Lavaan R package: Documentation on CFA and SEM at https://lavaan.ugent.be/.
PyMC-Marketing Brand Tracking Module: Bayesian approach to brand funnel forecasting.

51.10 Chapter 46 Appendix: Mathematical Derivations

51.10.1 A46.1 Aaker Brand Equity Model Formulation

Aaker’s brand equity model posits four pillars:

\[\text{Brand Equity} = f(\text{Awareness}, \text{Associations}, \text{Perceived Quality}, \text{Loyalty})\]

One illustrative multiplicative specification (a simplification, not a formula from Aaker's original framework) is:

\[\text{Brand Equity} = \text{Awareness} \times (\text{Perceived Quality} + \text{Associations} + \text{Loyalty}) / 3\]

This implies zero brand equity if awareness = 0 (necessary condition). More generally, Rasch models or item response theory (IRT) can map survey responses to latent equity scores with formal psychometric properties.
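The multiplicative form can be checked numerically. The sketch below assumes awareness is expressed as a proportion in [0, 1] and the other three pillars as scores on a common 0 to 1 scale; the function name and the input values are hypothetical.

```python
def brand_equity_index(awareness, quality, associations, loyalty):
    """Awareness times the mean of the other three pillars (illustrative only)."""
    return awareness * (quality + associations + loyalty) / 3.0

# Hypothetical pillar scores for a well-known brand and an unknown brand.
known_brand = brand_equity_index(0.78, 0.70, 0.60, 0.50)
unknown_brand = brand_equity_index(0.0, 0.90, 0.90, 0.90)

print(f"Known brand index:   {known_brand:.3f}")
print(f"Unknown brand index: {unknown_brand:.3f}")
```

The second call shows the necessary-condition property: strong quality, associations, and loyalty scores contribute nothing when awareness is zero.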

51.10.2 A46.2 Net Promoter Decomposition

NPS is defined as:

\[\text{NPS} = P(\text{Promoter}) - P(\text{Detractor}) = P(R \in \{9, 10\}) - P(R \le 6)\]

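The likelihood of recommending follows from perceived quality and satisfaction. Under a Thurstone-Halo model (an ordered-probit threshold formulation), the cumulative response probability is:

\[P(R_i \le j) = \Phi\left(\frac{\theta_j - \eta_i}{\sigma}\right)\]

where \(\eta_i\) is respondent satisfaction, \(\theta_j\) are category thresholds (for the three NPS groups, the Detractor/Passive boundary between scores 6 and 7 and the Passive/Promoter boundary between scores 8 and 9), and \(\Phi\) is the cumulative normal. The probability of landing in a given category is the difference between adjacent cumulative probabilities. High-engagement segments (frequent users) have higher \(\eta_i\).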
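To make the threshold model concrete, the sketch below reads \(\Phi((\theta_j - \eta_i)/\sigma)\) as the cumulative probability of responding at or below threshold \(\theta_j\) (the usual ordered-probit convention, assumed here) and derives Detractor, Passive, and Promoter probabilities from two thresholds. The threshold values and satisfaction levels are hypothetical, not estimates from any survey.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def segment_probs(eta, theta_low, theta_high, sigma=1.0):
    """Return P(Detractor), P(Passive), P(Promoter) for satisfaction eta.

    theta_low  : threshold between scores 6 and 7 (assumed value)
    theta_high : threshold between scores 8 and 9 (assumed value)
    """
    p_detractor = norm_cdf((theta_low - eta) / sigma)
    p_up_to_passive = norm_cdf((theta_high - eta) / sigma)
    p_passive = p_up_to_passive - p_detractor
    p_promoter = 1.0 - p_up_to_passive
    return p_detractor, p_passive, p_promoter

def expected_nps(eta, theta_low, theta_high, sigma=1.0):
    """Expected NPS (in points) for a respondent or segment with satisfaction eta."""
    p_det, _, p_pro = segment_probs(eta, theta_low, theta_high, sigma)
    return 100.0 * (p_pro - p_det)

# Hypothetical thresholds; compare a low- and a high-engagement segment.
for label, eta in [("Occasional users", 0.0), ("Frequent users", 1.0)]:
    print(f"{label}: expected NPS = {expected_nps(eta, -0.5, 0.5):+.1f}")
```

With symmetric thresholds, a segment whose satisfaction sits midway between them has an expected NPS of zero, and the score rises as \(\eta_i\) increases.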

51.10.3 A46.3 Mixed-Effects Model REML Estimation

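The log-likelihood for REML is:

\[\ell_{\text{REML}} = -\frac{1}{2} \log|\mathbf{V}| - \frac{1}{2} \log|\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}| - \frac{1}{2} (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}})^T \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}) + \text{const}\]

where \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{y}\) is the generalised least squares estimate and \(\mathbf{V} = \mathbf{Z} \boldsymbol{\Psi} \mathbf{Z}^T + \boldsymbol{\Theta}\) is the marginal variance (blocks of random intercept covariance plus residual error). The \(\log|\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}|\) term is what distinguishes REML from ordinary ML: it accounts for the degrees of freedom used in estimating the fixed effects. REML therefore concentrates on \(\boldsymbol{\Psi}\) (between-unit variance) and \(\boldsymbol{\Theta}\) (within-unit variance), reducing the downward bias that ML variance estimates incur when fixed effects are estimated.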
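A direct, unoptimised evaluation of the REML criterion for a random-intercept model can be sketched with NumPy. The version below includes the standard REML adjustment term \(-\tfrac{1}{2}\log|\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}|\) and plugs in the GLS estimate of \(\boldsymbol{\beta}\). The simulated panel (30 units tracked over 6 months), the true parameter values, and the grid search are all assumptions made for illustration; production code would use a dedicated mixed-model routine.

```python
import numpy as np

def reml_loglik(y, X, groups, sigma2_u, sigma2_e):
    """REML log-likelihood (up to a constant) and GLS beta, random-intercept model."""
    n = len(y)
    _, idx = np.unique(groups, return_inverse=True)
    Z = np.zeros((n, idx.max() + 1))          # indicator matrix: observation -> unit
    Z[np.arange(n), idx] = 1.0
    V = sigma2_u * (Z @ Z.T) + sigma2_e * np.eye(n)   # V = Z Psi Z' + Theta
    V_inv = np.linalg.inv(V)
    XtVinvX = X.T @ V_inv @ X
    beta_hat = np.linalg.solve(XtVinvX, X.T @ V_inv @ y)   # GLS estimate
    resid = y - X @ beta_hat
    logdet_V = np.linalg.slogdet(V)[1]
    logdet_X = np.linalg.slogdet(XtVinvX)[1]
    ll = -0.5 * (logdet_V + logdet_X + resid @ V_inv @ resid)
    return ll, beta_hat

# Simulated tracking panel (hypothetical): intercept 2.0, trend 0.8 per month,
# between-unit variance 1.0, residual variance 0.25.
rng = np.random.default_rng(0)
n_groups, n_months = 30, 6
groups = np.repeat(np.arange(n_groups), n_months)
month = np.tile(np.arange(n_months), n_groups).astype(float)
u = rng.normal(0.0, 1.0, n_groups)
y = 2.0 + 0.8 * month + u[groups] + rng.normal(0.0, 0.5, n_groups * n_months)
X = np.column_stack([np.ones_like(month), month])

# Coarse grid search over the two variance components.
grid_u = np.linspace(0.2, 2.5, 24)
grid_e = np.linspace(0.1, 0.6, 11)
candidates = [(reml_loglik(y, X, groups, su, se)[0], su, se)
              for su in grid_u for se in grid_e]
best_ll, best_u, best_e = max(candidates, key=lambda t: t[0])
_, beta_hat = reml_loglik(y, X, groups, best_u, best_e)

print(f"sigma2_u = {best_u:.2f}, sigma2_e = {best_e:.2f}, trend = {beta_hat[1]:.3f}")
```

Forming and inverting the full \(n \times n\) matrix \(\mathbf{V}\) is only practical for small samples; mixed-model software exploits the block structure instead.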

51.10.4 A46.4 Confirmatory Factor Analysis Identification

A CFA model is identified if its parameters can be uniquely determined from the data covariance matrix. Common identification rules: (1) each factor should have at least 3 items, or at least 2 items if the factor is allowed to correlate with another factor in the model; (2) the scale of each factor must be set by fixing either its variance or one loading per factor to a known value (typically 1.0); (3) residuals of items belonging to different factors are uncorrelated. In addition, the number of free parameters must not exceed the number of unique variances and covariances, \(p(p+1)/2\) for \(p\) items. Under these conditions, ML estimation of \(\boldsymbol{\Lambda}\) (loadings), \(\boldsymbol{\Psi}\) (latent factor variances and covariances), and \(\boldsymbol{\Theta}\) (measurement error variances) is well defined.
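The counting side of identification can be sketched as simple arithmetic: unique covariance moments minus free parameters. The helper below is hypothetical and assumes marker-variable scaling (one loading per factor fixed to 1.0) and uncorrelated residuals. A non-negative result is necessary for identification but not sufficient on its own.

```python
def cfa_degrees_of_freedom(items_per_factor, correlated_factors=True):
    """Degrees of freedom for a simple-structure CFA with marker-variable scaling."""
    p = sum(items_per_factor)            # number of observed items
    m = len(items_per_factor)            # number of latent factors
    moments = p * (p + 1) // 2           # unique variances and covariances
    free_loadings = p - m                # one loading per factor is fixed to 1.0
    residual_variances = p
    factor_variances = m
    factor_covariances = m * (m - 1) // 2 if correlated_factors else 0
    free_params = (free_loadings + residual_variances
                   + factor_variances + factor_covariances)
    return moments - free_params

df_four_factor = cfa_degrees_of_freedom([3, 3, 3, 3])  # the 4-factor brand equity model
df_one_factor_three_items = cfa_degrees_of_freedom([3])
df_one_factor_two_items = cfa_degrees_of_freedom([2])

print(df_four_factor, df_one_factor_three_items, df_one_factor_two_items)
```

For four factors with three items each, the count is 78 moments minus 30 free parameters, which agrees with the df = 48 reported in the chi-square line of the fit output. A single factor with three items is just-identified (zero degrees of freedom), while a single factor with two items comes out negative and is under-identified.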

51.10.5 A46.5 TF-IDF and PCA in Dimensionality Reduction

TF-IDF matrix \(\mathbf{M} \in \mathbb{R}^{B \times W}\) (B brands, W words) is standardised and decomposed via SVD: \(\mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\). PCA extracts the first k columns of \(\mathbf{V}\) as principal components. The projection onto PC1 and PC2 is:

\[\text{Brand}_i^{\text{PC}} = \mathbf{M}_i \times \mathbf{V}_{:, 1:2}\]

Explained variance by PC k is \(\sigma_k^2 / \sum_j \sigma_j^2\). Typically, k=2 captures 60–80% of variance; higher k better preserves information but reduces interpretability.
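The projection and explained-variance formulas can be illustrated with a small NumPy sketch. The TF-IDF weights, brand labels, and terms below are invented for illustration, and the matrix is only column-centred (not rescaled) before the SVD, which is one of several reasonable standardisation choices.

```python
import numpy as np

# Hypothetical TF-IDF weights: 4 brands (rows) x 5 terms (columns).
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
terms = ["digital", "innovation", "trusted", "traditional", "affordable"]
M = np.array([
    [0.9, 0.8, 0.2, 0.1, 0.3],
    [0.6, 0.5, 0.4, 0.3, 0.5],
    [0.1, 0.2, 0.8, 0.9, 0.4],
    [0.3, 0.2, 0.3, 0.4, 0.9],
])

# Centre each term column so the SVD corresponds to PCA.
M_centred = M - M.mean(axis=0)
U, s, Vt = np.linalg.svd(M_centred, full_matrices=False)

# Share of variance per component: sigma_k^2 / sum_j sigma_j^2.
explained = s ** 2 / np.sum(s ** 2)

# Brand coordinates on PC1 and PC2: rows of M_centred times the first two columns of V.
coords = M_centred @ Vt[:2].T

for name, (pc1, pc2) in zip(brands, coords):
    print(f"{name}: PC1={pc1:+.2f}, PC2={pc2:+.2f}")
print("Explained variance (PC1, PC2):", np.round(explained[:2], 3))
```

The signs of the components are arbitrary, so the axes should be labelled by inspecting which terms load most heavily on each row of `Vt`.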

End of Chapter 46
