33 Sentiment Analysis

📋 Learning Objectives

Understand the three levels of sentiment analysis: document polarity (positive/negative/neutral), aspect-level (opinions on features), and emotion classification
Implement lexicon-based sentiment using VADER and AFINN, with handling of negation and intensifiers
Build and evaluate a supervised sentiment classifier using logistic regression and TF-IDF
Extract aspect-level sentiment from unstructured text using dependency parsing
Apply pre-trained transformer models (RoBERTa, DistilBERT) for state-of-the-art sentiment prediction
Address Nigerian Pidgin and multilingual sentiment challenges using code-switching-aware approaches
Build an interactive sentiment monitoring dashboard for brand reputation tracking

33.1 What Sentiment Analysis Measures

Sentiment analysis quantifies subjective information in text. Three levels exist:

Document Polarity: Classifies an entire document as positive, negative, or neutral. A product review “This phone is excellent quality, great battery life” is positive; “The screen broke after one week” is negative. Useful for brand monitoring, customer satisfaction scoring, and feedback triage.

Aspect-Level Sentiment: Identifies opinions about specific features or aspects. “The design is sleek but the battery drains quickly” has mixed sentiment: positive on design, negative on battery. A restaurant review: “Food was delicious (aspect: quality, sentiment: positive), but service was slow (aspect: service, sentiment: negative)”. Actionable for product development.

Emotion Classification: Beyond polarity, classifies discrete emotions. Ekman’s six basic emotions: joy, sadness, anger, disgust, surprise, fear. A tweet “This is the best news ever! 🎉” expresses joy. “I can’t believe they’re shutting down the factory” expresses surprise and sadness. Useful for understanding emotional drivers of customer behaviour.

Business Applications: - Customer Service: Triage incoming messages by severity (angry complaints vs. neutral inquiries). - Product Development: Identify feature-level complaints for prioritisation. - Brand Monitoring: Track sentiment trends across social media. - Investor Relations: Gauge sentiment from earnings call transcripts and shareholder communications. - HR Analytics: Monitor employee sentiment for engagement and retention risks.

33.2 Lexicon-Based Sentiment: VADER and AFINN

A lexicon is a dictionary of words annotated with sentiment polarities. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based system built for social media; AFINN assigns numeric scores (−5 to +5).

VADER Strengths: Handles negation (“not good” → negative), intensifiers (“very good” → stronger positive), and punctuation (“Good!!!!” → stronger). Returns a compound score (−1 to +1, normalised).

Formula: \[\text{compound} = \frac{\sum \text{sentiment scores}}{\sqrt{\sum \text{sentiment scores}^2 + \alpha}}\]

The denominator normalises by the sum of squared raw scores, producing a bounded output. The constant \(\alpha\) prevents division by zero.

AFINN: Simpler, assigns each word a score. “good” = +2, “bad” = −3. Document sentiment is the sum of word scores.

Limitations: - Lexicons are static and miss new words or slang. “E don spoil” (Nigerian Pidgin: “it broke”) is not in English lexicons. - Context-dependent words (“great” in “that’s great, I’m devastated” is sarcasm) confuse lexicon methods. - Domain-specific language (financial “rally” is positive, but in other contexts may be neutral) isn’t captured.

📘 Theory: Lexicon-Based Sentiment Scoring

Let \(L = \{w_1: s_1, w_2: s_2, \ldots, w_n: s_n\}\) be a lexicon where each word \(w_i\) has a sentiment score \(s_i\). For a document with words \(\{w_1^{\text{doc}}, \ldots, w_m^{\text{doc}}\}\):

\[\text{sentiment} = \sum_{i=1}^{m} s_i \times g(w_i^{\text{doc}})\]

where \(g(w_i^{\text{doc}})\) is a modifier function accounting for negation and intensifiers.

🔑 Key Formula

VADER Compound Score: \[\text{compound} = \frac{\text{normalized\_sum}}{\sqrt{\text{normalized\_sum}^2 + \alpha}}\] where \(\text{normalized\_sum}\) is the sum of adjusted sentiment scores and \(\alpha\) is a normalisation constant (typically 15).

Show code

library(tidyverse)
library(syuzhet)  # Includes NRC lexicon
library(tidytext)

# Synthetic Nigerian social media posts about telecom companies
tweets <- c(
  "MTN's data bundle is so cheap and fast! Best network ever 🙌",
  "e no work at all!!! waste of money, airtel customer service is terrible",
  "Good coverage in Lagos but terrible in rural areas, not worth the cost",
  "Love the new Airtel plan, very affordable and reliable connection 💚",
  "MTN network keeps dropping, not very good experience this month",
  "Amazing speed! Airtel 5G is the future 🚀",
  "The prices are high but network quality is excellent, would recommend",
  "Worst customer service ever, waited 3 hours on hold. Very disappointed!",
  "MTN gives me consistent service, very satisfied with my monthly plan",
  "e sweet well-well! Airtel customer service resolved my issue quickly"
)

tweets_df <- data.frame(
  tweet_id = 1:length(tweets),
  text = tweets,
  company = c("MTN", "Airtel", "Airtel", "Airtel", "MTN",
              "Airtel", "MTN", "Airtel", "MTN", "Airtel")
)

cat("Sentiment Analysis: Nigerian Telecom Social Media\n")
#> Sentiment Analysis: Nigerian Telecom Social Media
cat("Sample tweets:\n\n")
#> Sample tweets:
print(head(tweets_df, 10))
#>    tweet_id
#> 1         1
#> 2         2
#> 3         3
#> 4         4
#> 5         5
#> 6         6
#> 7         7
#> 8         8
#> 9         9
#> 10       10
#>                                                                       text
#> 1              MTN's data bundle is so cheap and fast! Best network ever 🙌
#> 2  e no work at all!!! waste of money, airtel customer service is terrible
#> 3   Good coverage in Lagos but terrible in rural areas, not worth the cost
#> 4      Love the new Airtel plan, very affordable and reliable connection 💚
#> 5          MTN network keeps dropping, not very good experience this month
#> 6                                 Amazing speed! Airtel 5G is the future 🚀
#> 7    The prices are high but network quality is excellent, would recommend
#> 8  Worst customer service ever, waited 3 hours on hold. Very disappointed!
#> 9     MTN gives me consistent service, very satisfied with my monthly plan
#> 10    e sweet well-well! Airtel customer service resolved my issue quickly
#>    company
#> 1      MTN
#> 2   Airtel
#> 3   Airtel
#> 4   Airtel
#> 5      MTN
#> 6   Airtel
#> 7      MTN
#> 8   Airtel
#> 9      MTN
#> 10  Airtel

# VADER sentiment using syuzhet
vader_sentiment <- tweets_df |>
  mutate(
    # Syuzhet provides NRC emotion lexicon; for VADER-like scores, we use sentiment
    sentiment_score = sapply(text, function(t) {
      # Simple sentiment: count positive/negative words
      positive_words <- c("best", "good", "great", "amazing", "love", "excellent",
                          "fast", "cheap", "affordable", "reliable", "sweet")
      negative_words <- c("worst", "terrible", "bad", "waste", "disappointed",
                          "dropping", "high", "slow", "waste")

      pos_count <- sum(tolower(strsplit(t, "\\s+")[[1]]) %in% positive_words)
      neg_count <- sum(tolower(strsplit(t, "\\s+")[[1]]) %in% negative_words)

      (pos_count - neg_count) / (pos_count + neg_count + 1)  # Normalise
    })
  ) |>
  mutate(
    sentiment_label = case_when(
      sentiment_score > 0.2 ~ "Positive",
      sentiment_score < -0.2 ~ "Negative",
      TRUE ~ "Neutral"
    )
  )

cat("\n\nSentiment Scores and Labels (VADER-like):\n")
#> 
#> 
#> Sentiment Scores and Labels (VADER-like):
print(vader_sentiment |>
        select(tweet_id, company, text, sentiment_score, sentiment_label) |>
        mutate(sentiment_score = round(sentiment_score, 2)))
#>    tweet_id company
#> 1         1     MTN
#> 2         2  Airtel
#> 3         3  Airtel
#> 4         4  Airtel
#> 5         5     MTN
#> 6         6  Airtel
#> 7         7     MTN
#> 8         8  Airtel
#> 9         9     MTN
#> 10       10  Airtel
#>                                                                       text
#> 1              MTN's data bundle is so cheap and fast! Best network ever 🙌
#> 2  e no work at all!!! waste of money, airtel customer service is terrible
#> 3   Good coverage in Lagos but terrible in rural areas, not worth the cost
#> 4      Love the new Airtel plan, very affordable and reliable connection 💚
#> 5          MTN network keeps dropping, not very good experience this month
#> 6                                 Amazing speed! Airtel 5G is the future 🚀
#> 7    The prices are high but network quality is excellent, would recommend
#> 8  Worst customer service ever, waited 3 hours on hold. Very disappointed!
#> 9     MTN gives me consistent service, very satisfied with my monthly plan
#> 10    e sweet well-well! Airtel customer service resolved my issue quickly
#>    sentiment_score sentiment_label
#> 1             0.67        Positive
#> 2            -0.67        Negative
#> 3             0.00         Neutral
#> 4             0.75        Positive
#> 5             0.50        Positive
#> 6             0.50        Positive
#> 7            -0.50        Negative
#> 8            -0.50        Negative
#> 9             0.00         Neutral
#> 10            0.50        Positive

# Summary by company
company_sentiment <- vader_sentiment |>
  group_by(company) |>
  summarise(
    mean_sentiment = mean(sentiment_score),
    median_sentiment = median(sentiment_score),
    positive_count = sum(sentiment_label == "Positive"),
    negative_count = sum(sentiment_label == "Negative"),
    neutral_count = sum(sentiment_label == "Neutral"),
    total = n(),
    .groups = "drop"
  ) |>
  mutate(
    positive_pct = positive_count / total * 100,
    negative_pct = negative_count / total * 100,
    neutral_pct = neutral_count / total * 100
  )

cat("\n\nSentiment Summary by Company:\n\n")
#> 
#> 
#> Sentiment Summary by Company:
print(company_sentiment |>
        select(company, mean_sentiment, positive_pct, negative_pct, neutral_pct) |>
        mutate(across(where(is.numeric), round, 2)))
#> # A tibble: 2 × 5
#>   company mean_sentiment positive_pct negative_pct neutral_pct
#>   <chr>            <dbl>        <dbl>        <dbl>       <dbl>
#> 1 Airtel            0.1            50         33.3        16.7
#> 2 MTN               0.17           50         25          25

# Visualisation: sentiment distribution
sentiment_plot <- vader_sentiment |>
  count(company, sentiment_label) |>
  ggplot(aes(x = company, y = n, fill = sentiment_label)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("Positive" = "green", "Negative" = "red", "Neutral" = "gray")) +
  labs(title = "Sentiment Distribution: Nigerian Telecom Tweets",
       x = "Company", y = "Tweet Count", fill = "Sentiment") +
  theme_minimal() +
  theme(legend.position = "top")

print(sentiment_plot)

Show code


# Word-sentiment association
vader_sentiment |>
  unnest_tokens(word, text) |>
  inner_join(
    data.frame(
      word = c("best", "good", "great", "amazing", "love", "excellent",
               "fast", "cheap", "affordable", "reliable", "sweet",
               "worst", "terrible", "bad", "waste", "disappointed",
               "dropping", "high", "slow"),
      sentiment = c(rep("Positive", 11), rep("Negative", 8))
    ),
    by = "word"
  ) |>
  count(word, sentiment, company) |>
  group_by(company, sentiment) |>
  top_n(5, n) |>
  ungroup() |>
  ggplot(aes(x = reorder_within(word, n, sentiment), y = n, fill = sentiment)) +
  geom_col(alpha = 0.8) +
  facet_wrap(~sentiment, scales = "free_x") +
  coord_flip() +
  scale_x_reordered() +
  scale_fill_manual(values = c("Positive" = "green", "Negative" = "red")) +
  labs(title = "Most Common Positive and Negative Words",
       x = NULL, y = "Frequency", fill = "Sentiment") +
  theme_minimal() +
  theme(legend.position = "none")

Show code

from nltk.sentiment import SentimentIntensityAnalyzer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk

try:
    nltk.data.find('sentiment/vader_lexicon')
except LookupError:
    nltk.download('vader_lexicon')
#> True

# Synthetic tweets
tweets = [
    "MTN's data bundle is so cheap and fast! Best network ever 🙌",
    "e no work at all!!! waste of money, airtel customer service is terrible",
    "Good coverage in Lagos but terrible in rural areas, not worth the cost",
    "Love the new Airtel plan, very affordable and reliable connection 💚",
    "MTN network keeps dropping, not very good experience this month",
    "Amazing speed! Airtel 5G is the future 🚀",
    "The prices are high but network quality is excellent, would recommend",
    "Worst customer service ever, waited 3 hours on hold. Very disappointed!",
    "MTN gives me consistent service, very satisfied with my monthly plan",
    "e sweet well-well! Airtel customer service resolved my issue quickly"
]

df_tweets = pd.DataFrame({
    'tweet_id': range(1, len(tweets) + 1),
    'text': tweets,
    'company': ['MTN', 'Airtel', 'Airtel', 'Airtel', 'MTN',
                'Airtel', 'MTN', 'Airtel', 'MTN', 'Airtel']
})

print("Sentiment Analysis: Nigerian Telecom Social Media\n")
#> Sentiment Analysis: Nigerian Telecom Social Media
print(df_tweets.head())
#>    tweet_id                                               text company
#> 0         1  MTN's data bundle is so cheap and fast! Best n...     MTN
#> 1         2  e no work at all!!! waste of money, airtel cus...  Airtel
#> 2         3  Good coverage in Lagos but terrible in rural a...  Airtel
#> 3         4  Love the new Airtel plan, very affordable and ...  Airtel
#> 4         5  MTN network keeps dropping, not very good expe...     MTN

# VADER sentiment analysis
sia = SentimentIntensityAnalyzer()

sentiment_results = []
for idx, row in df_tweets.iterrows():
    scores = sia.polarity_scores(row['text'])
    sentiment_label = 'Positive' if scores['compound'] > 0.05 else \
                     ('Negative' if scores['compound'] < -0.05 else 'Neutral')
    sentiment_results.append({
        'tweet_id': row['tweet_id'],
        'company': row['company'],
        'text': row['text'],
        'compound': round(scores['compound'], 3),
        'negative': round(scores['neg'], 3),
        'neutral': round(scores['neu'], 3),
        'positive': round(scores['pos'], 3),
        'sentiment_label': sentiment_label
    })

df_sentiment = pd.DataFrame(sentiment_results)

print("\n\nSentiment Scores (VADER):")
#> 
#> 
#> Sentiment Scores (VADER):
print(df_sentiment[['tweet_id', 'company', 'compound', 'sentiment_label']].to_string(index=False))
#>  tweet_id company  compound sentiment_label
#>         1     MTN     0.670        Positive
#>         2  Airtel    -0.839        Negative
#>         3  Airtel    -0.637        Negative
#>         4  Airtel     0.637        Positive
#>         5     MTN    -0.387        Negative
#>         6  Airtel     0.624        Positive
#>         7     MTN     0.852        Positive
#>         8  Airtel    -0.831        Negative
#>         9     MTN     0.475        Positive
#>        10  Airtel     0.611        Positive

# Summary by company
company_sentiment = df_sentiment.groupby('company').agg({
    'compound': ['mean', 'median'],
    'sentiment_label': lambda x: (x == 'Positive').sum(),
    'company': 'count'
}).round(3)
company_sentiment.columns = ['mean_sentiment', 'median_sentiment', 'positive_count', 'total']
company_sentiment['positive_pct'] = (company_sentiment['positive_count'] / company_sentiment['total'] * 100).round(1)

print("\n\nSentiment Summary by Company:")
#> 
#> 
#> Sentiment Summary by Company:
print(company_sentiment)
#>          mean_sentiment  median_sentiment  positive_count  total  positive_pct
#> company                                                                       
#> Airtel           -0.072            -0.013               3      6          50.0
#> MTN               0.402             0.572               3      4          75.0

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Sentiment distribution by company
sentiment_counts = df_sentiment.groupby(['company', 'sentiment_label']).size().unstack(fill_value=0)
sentiment_counts.plot(kind='bar', ax=ax1, color=['red', 'gray', 'green'], alpha=0.8)
ax1.set_title('Sentiment Distribution: Nigerian Telecom Tweets')
ax1.set_ylabel('Tweet Count')
ax1.set_xlabel('Company')
ax1.legend(title='Sentiment')
ax1.grid(True, alpha=0.3)

# Compound sentiment score by company
df_sentiment.boxplot(column='compound', by='company', ax=ax2)
ax2.set_title('Compound Sentiment Score Distribution')
ax2.set_ylabel('Compound Score')
ax2.set_xlabel('Company')
ax2.grid(True, alpha=0.3)
plt.suptitle('')

plt.tight_layout()
plt.show()

📝 Section 28.2 Review Questions

Explain how VADER handles negation (e.g., “not good”). How does the score differ from “good”?
What is the difference between VADER and AFINN? When would you use each?
Why does punctuation matter in VADER (“Good!” vs. “Good!!!!”)? How does it influence the score?
If a tweet contains “The food is good but the service is bad”, what overall sentiment should be assigned?
How would you adapt VADER or AFINN for Nigerian Pidgin where many words are not in English lexicons?

33.3 Supervised Sentiment Classification: Building a Classifier

Lexicon methods are fast and interpretable but rigid. A supervised classifier learns from labelled examples (e.g., product reviews with star ratings as labels) and can adapt to domain-specific language.

Approach: 1. Use star ratings (1–5) as proxy labels: 1–2 stars = Negative, 3 stars = Neutral, 4–5 stars = Positive. 2. Preprocess reviews (tokenise, remove stop words, etc.). 3. Compute TF-IDF features. 4. Train logistic regression to predict sentiment from TF-IDF vectors. 5. Evaluate: precision, recall, F1 per class.

Advantages: Adapts to domain (product-specific language), learns word weights. Disadvantage: Requires labelled training data.

📘 Theory: Logistic Regression for Sentiment Classification

Given features \(\mathbf{x}\) (TF-IDF vector), predict sentiment class \(y \in \{\text{Negative}, \text{Neutral}, \text{Positive}\}\) using: \[P(y = k | \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x} + b_k}}{\sum_{j=1}^{3} e^{\mathbf{w}_j^T \mathbf{x} + b_j}}\] (Softmax for multiclass). Weights \(\mathbf{w}_k\) and biases \(b_k\) are learned via maximum likelihood (cross-entropy loss).

🔑 Key Metric: Precision, Recall, F1

For each class: \[\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{F1} = 2 \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\]

Show code

library(tidyverse)
library(tidytext)
library(caret)
library(tm)
library(SnowballC)

# Synthetic Nigerian e-commerce product reviews (800 train, 200 test)
set.seed(888)

review_templates <- list(
  poor_1 = c(
    "Product arrived damaged, waste of money",
    "Very poor quality, broke after one use",
    "Useless product, not as described",
    "Terrible quality, never buying again",
    "Awful experience, customer service ignored me"
  ),
  mediocre_2 = c(
    "Average product, nothing special",
    "Okay quality but expensive",
    "Works but not as good as expected",
    "Decent but could be better",
    "Fair quality for the price"
  ),
  neutral_3 = c(
    "Product is acceptable",
    "Does what it says but nothing more",
    "Average, neither good nor bad",
    "Okay quality, reasonable price",
    "Satisfactory but no surprises"
  ),
  good_4 = c(
    "Very good product, highly recommended",
    "Excellent quality, great value",
    "Love this product, will buy again",
    "High quality, fast delivery",
    "Really satisfied with my purchase"
  ),
  excellent_5 = c(
    "Outstanding product, 5 stars!",
    "Best purchase ever, absolutely fantastic",
    "Excellent quality and service",
    "Amazing product, exceeded expectations",
    "Perfect! Worth every naira"
  )
)

# Generate 1000 reviews
generate_reviews <- function(n = 1000) {
  star_distribution <- rep(1:5, times = c(100, 150, 200, 250, 300))
  reviews <- character(n)
  for (i in 1:n) {
    stars <- sample(star_distribution, 1)
    templates <- review_templates[[paste0(c("poor_1", "mediocre_2", "neutral_3", "good_4", "excellent_5")[stars])]]
    reviews[i] <- sample(templates, 1)
  }
  return(reviews)
}

reviews <- generate_reviews(1000)
df_reviews <- data.frame(
  review_id = 1:1000,
  text = reviews,
  stars = rep(1:5, times = c(100, 150, 200, 250, 300))
) |>
  mutate(sentiment = case_when(
    stars %in% c(1, 2) ~ "Negative",
    stars == 3 ~ "Neutral",
    stars %in% c(4, 5) ~ "Positive"
  ))

# Split train-test
train_idx <- 1:800
test_idx <- 801:1000

df_train <- df_reviews[train_idx, ]
df_test <- df_reviews[test_idx, ]

cat("Supervised Sentiment Classification\n")
#> Supervised Sentiment Classification
cat("Training set: ", nrow(df_train), " reviews\n")
#> Training set:  800  reviews
cat("Test set: ", nrow(df_test), " reviews\n")
#> Test set:  200  reviews
cat("Class distribution (train):\n")
#> Class distribution (train):
print(table(df_train$sentiment))
#> 
#> Negative  Neutral Positive 
#>      250      200      350

# Preprocessing
preprocess_review <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z0-9\\s]", "", text)
  tokens <- unlist(strsplit(text, "\\s+"))
  tokens <- tokens[tokens != ""]

  stop_words <- c("the", "a", "an", "is", "are", "be", "have", "has",
                  "and", "or", "but", "in", "on", "at", "to", "for", "of")
  tokens <- tokens[!(tokens %in% stop_words)]
  tokens <- wordStem(tokens, language = "english")
  return(paste(tokens, collapse = " "))
}

df_train <- df_train |> mutate(text_clean = sapply(text, preprocess_review))
df_test <- df_test |> mutate(text_clean = sapply(text, preprocess_review))

# TF-IDF vectorisation (using training vocab)
library(tm)
train_corpus <- Corpus(VectorSource(df_train$text_clean))
test_corpus <- Corpus(VectorSource(df_test$text_clean))

control_list <- list(tokenize = function(x) unlist(strsplit(x, "\\s+")),
                     removePunctuation = FALSE,
                     stopwords = FALSE,
                     stemming = FALSE)

dtm_train <- DocumentTermMatrix(train_corpus,
                                 control = list(
                                   weighting = function(x) weightTfIdf(x),
                                   tokenize = function(x) unlist(strsplit(x, "\\s+"))
                                 ))
dtm_test <- DocumentTermMatrix(test_corpus,
                                control = list(
                                  weighting = function(x) weightTfIdf(x),
                                  tokenize = function(x) unlist(strsplit(x, "\\s+"))
                                ))

# Convert to data frames
df_tfidf_train <- as.data.frame(as.matrix(dtm_train))
df_tfidf_train$sentiment <- df_train$sentiment

# Align test DTM to train vocabulary (dense approach avoids slam subscript limits)
train_terms <- colnames(dtm_train)
test_mat <- as.matrix(dtm_test)
aligned_test <- matrix(0, nrow = nrow(test_mat), ncol = length(train_terms),
                       dimnames = list(rownames(test_mat), train_terms))
common_terms <- intersect(train_terms, colnames(test_mat))
if (length(common_terms) > 0) aligned_test[, common_terms] <- test_mat[, common_terms]
df_tfidf_test <- as.data.frame(aligned_test)
df_tfidf_test$sentiment <- df_test$sentiment

# Train multinomial logistic regression
library(nnet)
fit_multinom <- multinom(sentiment ~ ., data = df_tfidf_train, trace = FALSE)

# Predict on test set
pred_test <- predict(fit_multinom, newdata = df_tfidf_test, type = "class")

# Evaluate — ensure both factors share the same levels
sentiment_levels <- c("Negative", "Neutral", "Positive")
confusion <- confusionMatrix(
  factor(pred_test,                  levels = sentiment_levels),
  factor(df_tfidf_test$sentiment,    levels = sentiment_levels)
)

cat("\n\nModel Performance:\n")
#> 
#> 
#> Model Performance:
print(confusion)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Negative Neutral Positive
#>   Negative        0       0       33
#>   Neutral         0       0       18
#>   Positive        0       0      149
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.745           
#>                  95% CI : (0.6787, 0.8039)
#>     No Information Rate : 1               
#>     P-Value [Acc > NIR] : 1               
#>                                           
#>                   Kappa : 0               
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: Negative Class: Neutral Class: Positive
#> Sensitivity                       NA             NA           0.745
#> Specificity                    0.835           0.91              NA
#> Pos Pred Value                    NA             NA              NA
#> Neg Pred Value                    NA             NA              NA
#> Prevalence                     0.000           0.00           1.000
#> Detection Rate                 0.000           0.00           0.745
#> Detection Prevalence           0.165           0.09           0.745
#> Balanced Accuracy                 NA             NA              NA

# Extract precision, recall, F1
metrics <- confusion$byClass
cat("\n\nPer-Class Metrics:\n")
#> 
#> 
#> Per-Class Metrics:
print(round(metrics, 3))
#>                 Sensitivity Specificity Pos Pred Value Neg Pred Value Precision
#> Class: Negative          NA       0.835             NA             NA         0
#> Class: Neutral           NA       0.910             NA             NA         0
#> Class: Positive       0.745          NA             NA             NA         1
#>                 Recall    F1 Prevalence Detection Rate Detection Prevalence
#> Class: Negative     NA    NA          0          0.000                0.165
#> Class: Neutral      NA    NA          0          0.000                0.090
#> Class: Positive  0.745 0.854          1          0.745                0.745
#>                 Balanced Accuracy
#> Class: Negative                NA
#> Class: Neutral                 NA
#> Class: Positive                NA

# Feature importance via coefficients
# nnet::multinom stores coefficients in a matrix: rows = classes (K-1), cols = features
cat("\n\nTop 10 Terms Most Associated with Each Sentiment:\n\n")
#> 
#> 
#> Top 10 Terms Most Associated with Each Sentiment:
coef_mat <- coef(fit_multinom)  # rows = classes, cols = features
for (cls in rownames(coef_mat)) {
  row_coefs <- coef_mat[cls, ]
  top_idx   <- order(abs(row_coefs), decreasing = TRUE)[1:min(10, length(row_coefs))]
  top_terms <- names(row_coefs)[top_idx]
  cat(sprintf("%s: %s\n", cls, paste(top_terms, collapse = ", ")))
}
#> Neutral: awfulexperiencecustomerserviceignoredm, (Intercept), worksbutnotasgoodasexpect, reallysatisfiedwithmypurchas, satisfactorybutnosurpris, outstandingproduct5star, verygoodproducthighlyrecommend, productarriveddamagedwasteofmoney, uselessproductnotasdescrib, lovethisproductwillbuyagain
#> Positive: doeswhatitsaysbutnothingmor, satisfactorybutnosurpris, (Intercept), awfulexperiencecustomerserviceignoredm, fairqualityforthepric, productisaccept, excellentqualityandservic, bestpurchaseeverabsolutelyfantast, averageneithergoodnorbad, terriblequalityneverbuyingagain

Show code

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support, classification_report
import matplotlib.pyplot as plt

np.random.seed(888)

# Generate synthetic reviews
review_templates = {
    1: ["Product arrived damaged, waste of money",
        "Very poor quality, broke after one use",
        "Useless product, not as described"],
    2: ["Average product, nothing special",
        "Okay quality but expensive",
        "Works but not as good as expected"],
    3: ["Product is acceptable",
        "Does what it says but nothing more",
        "Okay quality, reasonable price"],
    4: ["Very good product, highly recommended",
        "Excellent quality, great value",
        "Love this product, will buy again"],
    5: ["Outstanding product, 5 stars!",
        "Best purchase ever, absolutely fantastic",
        "Excellent quality and service"]
}

reviews = []
stars = []
# Generate equal-ish class sizes so train AND test have all 3 sentiments
star_counts = {1: 100, 2: 150, 3: 200, 4: 250, 5: 300}
for s in [1, 2, 3, 4, 5]:
    for _ in range(star_counts[s]):
        reviews.append(np.random.choice(review_templates[s]))
        stars.append(s)

df_reviews = pd.DataFrame({
    'text': reviews,
    'stars': stars,
    'sentiment': ['Negative' if s <= 2 else ('Neutral' if s == 3 else 'Positive') for s in stars]
}).sample(frac=1, random_state=888).reset_index(drop=True)   # shuffle

# Train-test split (stratified by shuffling)
train_size = 800
df_train = df_reviews.iloc[:train_size]
df_test  = df_reviews.iloc[train_size:]

print(f"Supervised Sentiment Classification")
#> Supervised Sentiment Classification
print(f"Training set: {len(df_train)} reviews")
#> Training set: 800 reviews
print(f"Test set: {len(df_test)} reviews")
#> Test set: 200 reviews
print(f"\nClass distribution (train):")
#> 
#> Class distribution (train):
print(df_train['sentiment'].value_counts())
#> sentiment
#> Positive    439
#> Negative    206
#> Neutral     155
#> Name: count, dtype: int64

# TF-IDF vectorisation
vectorizer = TfidfVectorizer(max_features=500, min_df=1, max_df=0.95, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(df_train['text'])
X_test = vectorizer.transform(df_test['text'])

# Label encoding
le = LabelEncoder()
y_train = le.fit_transform(df_train['sentiment'])
y_test = le.transform(df_test['sentiment'])

# Train logistic regression
clf = LogisticRegression(max_iter=1000, random_state=888)
clf.fit(X_train, y_train)
#> LogisticRegression(max_iter=1000, random_state=888)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
print(f"\n\nModel Performance:")
#> 
#> 
#> Model Performance:
print(f"Accuracy: {(y_pred == y_test).mean():.3f}")
#> Accuracy: 1.000

precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, labels=np.unique(y_train))

print(f"\n\nPer-Class Metrics:")
#> 
#> 
#> Per-Class Metrics:
for i, label in enumerate(le.classes_):
    print(f"{label}: Precision={precision[i]:.3f}, Recall={recall[i]:.3f}, F1={f1[i]:.3f}")
#> Negative: Precision=1.000, Recall=1.000, F1=1.000
#> Neutral: Precision=1.000, Recall=1.000, F1=1.000
#> Positive: Precision=1.000, Recall=1.000, F1=1.000

# Feature importance
feature_names = vectorizer.get_feature_names_out()

print(f"\n\nTop Terms Most Associated with Each Sentiment:")
#> 
#> 
#> Top Terms Most Associated with Each Sentiment:
n_coef_rows = clf.coef_.shape[0]
for i, label in enumerate(le.classes_):
    coef_row   = clf.coef_[min(i, n_coef_rows - 1)]
    k          = min(10, len(feature_names))
    top_idx    = np.argsort(coef_row)[-k:][::-1]
    top_terms  = feature_names[top_idx]
    print(f"{label}: {', '.join(top_terms)}")
#> Negative: as, quality but, expensive, but expensive, product nothing, special, nothing special, average, average product, not
#> Neutral: product is, is, is acceptable, acceptable, quality reasonable, reasonable, price, reasonable price, says but, it says
#> Positive: excellent quality, excellent, product stars, outstanding product, outstanding, stars, and service, and, service, quality and

# Confusion matrix visualization
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(cm, cmap='Blues')
ax.set_xticks(range(len(le.classes_)))
#> [<matplotlib.axis.XTick object at 0x000001D0DBD77610>, <matplotlib.axis.XTick object at 0x000001D0DBE98E10>, <matplotlib.axis.XTick object at 0x000001D0DBE991D0>]
ax.set_yticks(range(len(le.classes_)))
#> [<matplotlib.axis.YTick object at 0x000001D0DBD779D0>, <matplotlib.axis.YTick object at 0x000001D0DBE99590>, <matplotlib.axis.YTick object at 0x000001D0DBE99950>]
ax.set_xticklabels(le.classes_)
#> [Text(0, 0, 'Negative'), Text(1, 0, 'Neutral'), Text(2, 0, 'Positive')]
ax.set_yticklabels(le.classes_)
#> [Text(0, 0, 'Negative'), Text(0, 1, 'Neutral'), Text(0, 2, 'Positive')]
ax.set_xlabel('Predicted')
#> Text(0.5, 0, 'Predicted')
ax.set_ylabel('Actual')
#> Text(0, 0.5, 'Actual')
ax.set_title('Confusion Matrix: Sentiment Classification')
#> Text(0.5, 1.0, 'Confusion Matrix: Sentiment Classification')

for i in range(len(le.classes_)):
    for j in range(len(le.classes_)):
        text = ax.text(j, i, cm[i, j], ha="center", va="center", color="white")

plt.colorbar(im, ax=ax)
#> <matplotlib.colorbar.Colorbar object at 0x000001D0DBE8EE40>
plt.tight_layout()
plt.show()

📝 Section 28.3 Review Questions

Why use star ratings as proxy labels instead of manual annotation? What are the risks?
In a multiclass sentiment problem (Negative, Neutral, Positive), how does logistic regression assign probabilities?
What does high precision but low recall for the “Negative” class mean? How would you address it?
Why is TF-IDF appropriate for feeding into a logistic regression classifier?
How would you handle class imbalance if 80% of reviews are positive, 15% neutral, 5% negative?

33.4 Aspect-Based Sentiment Analysis: Feature-Level Opinions

“The phone has excellent battery life but the camera is poor” expresses mixed sentiment across aspects. Aspect-Based Sentiment Analysis (ABSA) extracts pairs (aspect, sentiment), enabling granular feedback for product development.

Pipeline: 1. Aspect Extraction: Identify noun phrases referring to product features. Use dependency parsing or NER. 2. Opinion Word Extraction: Find adjectives and verbs expressing opinions. 3. Link Aspects to Opinions: Use dependency relations (e.g., “camera” with nmod of “poor”).

Example: Parse the sentence “The screen is bright but the battery drains quickly.” - Aspect 1: “screen”, Opinion: “bright” → Positive. - Aspect 2: “battery”, Opinion: “drains quickly” → Negative.

📘 Theory: Dependency Parsing for ABSA

Dependency parsing identifies grammatical relationships between words. A dependency graph shows subject-verb, adjective-noun, etc. relations. For “The battery drains quickly”: - “battery” is the nsubj (nominal subject) of “drains”. - “quickly” is an advmod (adverbial modifier) of “drains”. - Opinion “drains quickly” is negatively associated with aspect “battery”.

Show code

library(tidyverse)
# spacyr requires Python+spaCy; this chunk uses simulated data instead

# Note: spacyr requires Python and spacy to be installed.
# For demo purposes, we manually simulate aspect extraction.

# Synthetic Nigerian telecom reviews with aspects
reviews_aspects <- c(
  "The network coverage is excellent in Lagos but terrible in villages",
  "MTN customer service is responsive, but data prices are too high",
  "Airtel's call quality is clear, though the connection drops frequently",
  "The registration process is simple, but the app keeps crashing",
  "Data bundles are cheap and reliable, best value in the market",
  "Network speed is fast, but billing is confusing",
  "E-credit loading is quick, but interface is not user-friendly"
)

# Manual aspect extraction (in production, use spaCy with dependency parsing)
aspect_opinion_pairs <- list(
  c(aspect = "network coverage", opinion = "excellent", sentiment = "positive"),
  c(aspect = "network coverage", opinion = "terrible", sentiment = "negative"),
  c(aspect = "customer service", opinion = "responsive", sentiment = "positive"),
  c(aspect = "data prices", opinion = "high", sentiment = "negative"),
  c(aspect = "call quality", opinion = "clear", sentiment = "positive"),
  c(aspect = "connection", opinion = "drops frequently", sentiment = "negative"),
  c(aspect = "registration process", opinion = "simple", sentiment = "positive"),
  c(aspect = "app", opinion = "crashes", sentiment = "negative"),
  c(aspect = "data bundles", opinion = "cheap", sentiment = "positive"),
  c(aspect = "data bundles", opinion = "reliable", sentiment = "positive"),
  c(aspect = "network speed", opinion = "fast", sentiment = "positive"),
  c(aspect = "billing", opinion = "confusing", sentiment = "negative"),
  c(aspect = "e-credit loading", opinion = "quick", sentiment = "positive"),
  c(aspect = "interface", opinion = "not user-friendly", sentiment = "negative")
)

# Convert to dataframe
df_aspects <- bind_rows(lapply(aspect_opinion_pairs, function(x) {
  tibble(aspect = x[["aspect"]], opinion = x[["opinion"]], sentiment = x[["sentiment"]])
}))

cat("Aspect-Based Sentiment Analysis: Nigerian Telecom Reviews\n\n")
#> Aspect-Based Sentiment Analysis: Nigerian Telecom Reviews
cat("Extracted Aspect-Opinion-Sentiment Triples:\n")
#> Extracted Aspect-Opinion-Sentiment Triples:
print(df_aspects)
#> # A tibble: 14 × 3
#>    aspect               opinion           sentiment
#>    <chr>                <chr>             <chr>    
#>  1 network coverage     excellent         positive 
#>  2 network coverage     terrible          negative 
#>  3 customer service     responsive        positive 
#>  4 data prices          high              negative 
#>  5 call quality         clear             positive 
#>  6 connection           drops frequently  negative 
#>  7 registration process simple            positive 
#>  8 app                  crashes           negative 
#>  9 data bundles         cheap             positive 
#> 10 data bundles         reliable          positive 
#> 11 network speed        fast              positive 
#> 12 billing              confusing         negative 
#> 13 e-credit loading     quick             positive 
#> 14 interface            not user-friendly negative

# Summarise by aspect
aspect_summary <- df_aspects |>
  group_by(aspect) |>
  summarise(
    positive_opinions = sum(sentiment == "positive"),
    negative_opinions = sum(sentiment == "negative"),
    net_sentiment = positive_opinions - negative_opinions,
    .groups = "drop"
  ) |>
  arrange(desc(net_sentiment))

cat("\n\nAspect Sentiment Summary:\n")
#> 
#> 
#> Aspect Sentiment Summary:
print(aspect_summary)
#> # A tibble: 12 × 4
#>    aspect               positive_opinions negative_opinions net_sentiment
#>    <chr>                            <int>             <int>         <int>
#>  1 data bundles                         2                 0             2
#>  2 call quality                         1                 0             1
#>  3 customer service                     1                 0             1
#>  4 e-credit loading                     1                 0             1
#>  5 network speed                        1                 0             1
#>  6 registration process                 1                 0             1
#>  7 network coverage                     1                 1             0
#>  8 app                                  0                 1            -1
#>  9 billing                              0                 1            -1
#> 10 connection                           0                 1            -1
#> 11 data prices                          0                 1            -1
#> 12 interface                            0                 1            -1

# Visualisation
aspect_summary |>
  pivot_longer(cols = c(positive_opinions, negative_opinions),
               names_to = "sentiment_type", values_to = "count") |>
  ggplot(aes(x = reorder(aspect, net_sentiment), y = count, fill = sentiment_type)) +
  geom_col(alpha = 0.8) +
  coord_flip() +
  scale_fill_manual(values = c("positive_opinions" = "green", "negative_opinions" = "red")) +
  labs(title = "Aspect-Level Sentiment: Nigerian Telecom",
       x = "Aspect", y = "Opinion Count", fill = "Sentiment") +
  theme_minimal() +
  theme(legend.position = "top")

Show code


# Sentiment distribution per aspect
cat("\n\nOpinions for Selected Aspects:\n\n")
#> 
#> 
#> Opinions for Selected Aspects:
for (asp in c("network coverage", "customer service", "data prices")) {
  opinions <- df_aspects |>
    filter(aspect == asp) |>
    pull(opinion)

  cat(sprintf("%s: %s\n", asp, paste(opinions, collapse = ", ")))
}
#> network coverage: excellent, terrible
#> customer service: responsive
#> data prices: high

# Recommendations for product team
cat("\n\nProduct Development Recommendations:\n")
#> 
#> 
#> Product Development Recommendations:
cat("1. URGENT: Fix app crashes and billing clarity (strong negative sentiment).\n")
#> 1. URGENT: Fix app crashes and billing clarity (strong negative sentiment).
cat("2. IMPROVE: Expand network coverage to rural areas (weak point).\n")
#> 2. IMPROVE: Expand network coverage to rural areas (weak point).
cat("3. MAINTAIN: Data reliability and bundle pricing (strong positive).\n")
#> 3. MAINTAIN: Data reliability and bundle pricing (strong positive).
cat("4. ENHANCE: Interface usability (specific complaint about user-friendliness).\n")
#> 4. ENHANCE: Interface usability (specific complaint about user-friendliness).

Show code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Synthetic Nigerian telecom reviews (manually annotated aspects)
reviews_aspects = [
    ("The network coverage is excellent in Lagos but terrible in villages",
     [("network coverage", "excellent", "positive"),
      ("network coverage", "terrible", "negative")]),
    ("MTN customer service is responsive, but data prices are too high",
     [("customer service", "responsive", "positive"),
      ("data prices", "high", "negative")]),
    ("Airtel's call quality is clear, though the connection drops frequently",
     [("call quality", "clear", "positive"),
      ("connection", "drops frequently", "negative")]),
    ("The registration process is simple, but the app keeps crashing",
     [("registration process", "simple", "positive"),
      ("app", "crashes", "negative")]),
    ("Data bundles are cheap and reliable, best value in the market",
     [("data bundles", "cheap", "positive"),
      ("data bundles", "reliable", "positive")]),
    ("Network speed is fast, but billing is confusing",
     [("network speed", "fast", "positive"),
      ("billing", "confusing", "negative")]),
    ("E-credit loading is quick, but interface is not user-friendly",
     [("e-credit loading", "quick", "positive"),
      ("interface", "not user-friendly", "negative")])
]

# Extract aspect-opinion pairs
aspect_pairs = []
for review, pairs in reviews_aspects:
    for aspect, opinion, sentiment in pairs:
        aspect_pairs.append({
            'aspect': aspect,
            'opinion': opinion,
            'sentiment': sentiment
        })

df_aspects = pd.DataFrame(aspect_pairs)

print("Aspect-Based Sentiment Analysis: Nigerian Telecom Reviews\n")
#> Aspect-Based Sentiment Analysis: Nigerian Telecom Reviews
print("Extracted Aspect-Opinion-Sentiment Triples:")
#> Extracted Aspect-Opinion-Sentiment Triples:
print(df_aspects)
#>                   aspect            opinion sentiment
#> 0       network coverage          excellent  positive
#> 1       network coverage           terrible  negative
#> 2       customer service         responsive  positive
#> 3            data prices               high  negative
#> 4           call quality              clear  positive
#> 5             connection   drops frequently  negative
#> 6   registration process             simple  positive
#> 7                    app            crashes  negative
#> 8           data bundles              cheap  positive
#> 9           data bundles           reliable  positive
#> 10         network speed               fast  positive
#> 11               billing          confusing  negative
#> 12      e-credit loading              quick  positive
#> 13             interface  not user-friendly  negative

# Summarise by aspect
aspect_summary = df_aspects.groupby('aspect').agg({
    'sentiment': lambda x: (x == 'positive').sum() - (x == 'negative').sum()
}).rename(columns={'sentiment': 'net_sentiment'})
aspect_summary['positive_count'] = df_aspects.groupby('aspect')['sentiment'].apply(lambda x: (x == 'positive').sum())
aspect_summary['negative_count'] = df_aspects.groupby('aspect')['sentiment'].apply(lambda x: (x == 'negative').sum())
aspect_summary = aspect_summary.sort_values('net_sentiment', ascending=False)

print("\n\nAspect Sentiment Summary:")
#> 
#> 
#> Aspect Sentiment Summary:
print(aspect_summary)
#>                       net_sentiment  positive_count  negative_count
#> aspect                                                             
#> data bundles                      2               2               0
#> call quality                      1               1               0
#> customer service                  1               1               0
#> e-credit loading                  1               1               0
#> network speed                     1               1               0
#> registration process              1               1               0
#> network coverage                  0               1               1
#> app                              -1               0               1
#> connection                       -1               0               1
#> billing                          -1               0               1
#> data prices                      -1               0               1
#> interface                        -1               0               1

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
aspects = aspect_summary.index
x = np.arange(len(aspects))
width = 0.35

ax.barh(x - width/2, aspect_summary['positive_count'], width,
        label='Positive', color='green', alpha=0.8)
#> <BarContainer object of 12 artists>
ax.barh(x + width/2, -aspect_summary['negative_count'], width,
        label='Negative', color='red', alpha=0.8)
#> <BarContainer object of 12 artists>
ax.set_yticks(x)
#> [<matplotlib.axis.YTick object at 0x000001D0E0D84190>, <matplotlib.axis.YTick object at 0x000001D0E0D87110>, <matplotlib.axis.YTick object at 0x000001D0E0D874D0>, <matplotlib.axis.YTick object at 0x000001D0E0D87890>, <matplotlib.axis.YTick object at 0x000001D0E0D87C50>, <matplotlib.axis.YTick object at 0x000001D0E0DF0050>, <matplotlib.axis.YTick object at 0x000001D0E0DF0410>, <matplotlib.axis.YTick object at 0x000001D0E0DF07D0>, <matplotlib.axis.YTick object at 0x000001D0E0DF0B90>, <matplotlib.axis.YTick object at 0x000001D0E0DF0F50>, <matplotlib.axis.YTick object at 0x000001D0E0DF1310>, <matplotlib.axis.YTick object at 0x000001D0E0DF16D0>]
ax.set_yticklabels(aspects)
#> [Text(0, 0, 'data bundles'), Text(0, 1, 'call quality'), Text(0, 2, 'customer service'), Text(0, 3, 'e-credit loading'), Text(0, 4, 'network speed'), Text(0, 5, 'registration process'), Text(0, 6, 'network coverage'), Text(0, 7, 'app'), Text(0, 8, 'connection'), Text(0, 9, 'billing'), Text(0, 10, 'data prices'), Text(0, 11, 'interface')]
ax.set_xlabel('Opinion Count')
#> Text(0.5, 0, 'Opinion Count')
ax.set_title('Aspect-Level Sentiment: Nigerian Telecom')
#> Text(0.5, 1.0, 'Aspect-Level Sentiment: Nigerian Telecom')
ax.legend()
#> <matplotlib.legend.Legend object at 0x000001D0E0E08980>
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

Show code


# Opinions per aspect
print("\n\nOpinions for Selected Aspects:")
#> 
#> 
#> Opinions for Selected Aspects:
for asp in ['network coverage', 'customer service', 'data prices']:
    opinions = df_aspects[df_aspects['aspect'] == asp]['opinion'].tolist()
    print(f"{asp}: {', '.join(opinions)}")
#> network coverage: excellent, terrible
#> customer service: responsive
#> data prices: high

📝 Section 28.4 Review Questions

Why is aspect-based sentiment more actionable than document-level sentiment?
Name three challenges in extracting aspects from unstructured text.
In “The phone’s design is sleek, but it’s heavy”, identify the aspects and sentiments.
How would you handle implicit aspects? (e.g., “This store is amazing!” implies positive sentiment on multiple aspects: service, cleanliness, etc.)
How would dependency parsing help identify aspect-opinion relations?

33.5 Pre-Trained Transformer Models: Production-Grade Sentiment

State-of-the-art performance comes from fine-tuned transformer models. HuggingFace provides pre-trained models (DistilBERT, RoBERTa, ALBERT) for sentiment classification. They outperform lexicon and traditional supervised methods but require no additional training for many tasks.

Benefits: Single line of code, strong generalisation, handles complex linguistic phenomena (sarcasm, negation, code-switching).

Trade-off: Black-box predictions; less interpretable than lexicon or linear models.

📘 Theory: Fine-Tuned Transformers for Sentiment

A pre-trained BERT model has learned contextual representations from 3.3 billion words of English text. To adapt it for sentiment classification, we add a classification head (linear layer + softmax) and fine-tune on labelled sentiment data. Transfer learning reduces data requirements: good results with 500–1,000 labelled examples.

Show code

# Note: Running transformers in R via reticulate or external Python calls.
# Here we simulate results from HuggingFace transformers.

library(tidyverse)

# Simulated transformer predictions (in production, use transformers library in Python)
tweets_test <- c(
  "This product is absolutely amazing, best purchase ever!",
  "Terrible quality, completely disappointed with my purchase",
  "The product is okay, nothing special but does the job",
  "I love this company's customer service, very helpful and quick",
  "Worst experience ever, would give negative stars if possible",
  "Good product for the price, would recommend to friends",
  "The item arrived damaged, very frustrating",
  "Excellent quality, exceeded all my expectations"
)

# Simulated HuggingFace transformer outputs
transformer_results <- data.frame(
  text = tweets_test,
  sentiment = c("POSITIVE", "NEGATIVE", "NEUTRAL", "POSITIVE", "NEGATIVE",
                "POSITIVE", "NEGATIVE", "POSITIVE"),
  confidence = c(0.98, 0.95, 0.72, 0.96, 0.97, 0.88, 0.91, 0.99)
)

cat("Pre-Trained Transformer Sentiment Analysis\n")
#> Pre-Trained Transformer Sentiment Analysis
cat("Model: distilbert-base-uncased-finetuned-sst-2-english\n\n")
#> Model: distilbert-base-uncased-finetuned-sst-2-english

print(transformer_results)
#>                                                             text sentiment
#> 1        This product is absolutely amazing, best purchase ever!  POSITIVE
#> 2     Terrible quality, completely disappointed with my purchase  NEGATIVE
#> 3          The product is okay, nothing special but does the job   NEUTRAL
#> 4 I love this company's customer service, very helpful and quick  POSITIVE
#> 5   Worst experience ever, would give negative stars if possible  NEGATIVE
#> 6         Good product for the price, would recommend to friends  POSITIVE
#> 7                     The item arrived damaged, very frustrating  NEGATIVE
#> 8                Excellent quality, exceeded all my expectations  POSITIVE
#>   confidence
#> 1       0.98
#> 2       0.95
#> 3       0.72
#> 4       0.96
#> 5       0.97
#> 6       0.88
#> 7       0.91
#> 8       0.99

# Sentiment distribution
sentiment_dist <- transformer_results |>
  group_by(sentiment) |>
  summarise(count = n(), avg_confidence = mean(confidence), .groups = "drop")

cat("\n\nSentiment Distribution:\n")
#> 
#> 
#> Sentiment Distribution:
print(sentiment_dist)
#> # A tibble: 3 × 3
#>   sentiment count avg_confidence
#>   <chr>     <int>          <dbl>
#> 1 NEGATIVE      3          0.943
#> 2 NEUTRAL       1          0.72 
#> 3 POSITIVE      4          0.952

# Confidence analysis
cat("\n\nConfidence Analysis:\n")
#> 
#> 
#> Confidence Analysis:
cat(sprintf("Mean confidence: %.3f\n", mean(transformer_results$confidence)))
#> Mean confidence: 0.920
cat(sprintf("Min confidence: %.3f\n", min(transformer_results$confidence)))
#> Min confidence: 0.720
cat(sprintf("Max confidence: %.3f\n", max(transformer_results$confidence)))
#> Max confidence: 0.990
cat(sprintf("Predictions with <80%% confidence: %d\n",
            sum(transformer_results$confidence < 0.8)))
#> Predictions with <80% confidence: 1

# Visualisation
ggplot(transformer_results, aes(x = reorder(text, confidence), y = confidence,
                               fill = sentiment)) +
  geom_col(alpha = 0.8) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red",
             linewidth = 0.5, label = "Threshold") +
  coord_flip() +
  scale_fill_manual(values = c("POSITIVE" = "green", "NEGATIVE" = "red",
                               "NEUTRAL" = "gray")) +
  labs(title = "Pre-Trained Transformer Sentiment with Confidence Scores",
       x = NULL, y = "Confidence Score", fill = "Sentiment") +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 9),
        legend.position = "top")

Show code

# The 'transformers' package requires a large download and GPU for best performance.
# This cell demonstrates the API pattern; install with:
#   pip install transformers[torch]   (requires pytorch)
# then uncomment the code below.
#
# from transformers import pipeline
# sentiment_pipeline = pipeline(
#     "sentiment-analysis",
#     model="distilbert-base-uncased-finetuned-sst-2-english"
# )
# results = sentiment_pipeline(tweets_test)
#
# We simulate realistic transformer outputs below to illustrate the output format.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulated transformer output (as would be returned by HuggingFace pipeline)
tweets_test = [
    "This product is absolutely amazing, best purchase ever!",
    "Terrible quality, completely disappointed with my purchase",
    "The product is okay, nothing special but does the job",
    "I love this company's customer service, very helpful and quick",
    "Worst experience ever, would give negative stars if possible",
    "Good product for the price, would recommend to friends",
    "The item arrived damaged, very frustrating",
    "Excellent quality, exceeded all my expectations"
]

simulated_results = [
    {'label': 'POSITIVE', 'score': 0.9997},
    {'label': 'NEGATIVE', 'score': 0.9991},
    {'label': 'POSITIVE', 'score': 0.7843},   # borderline
    {'label': 'POSITIVE', 'score': 0.9984},
    {'label': 'NEGATIVE', 'score': 0.9996},
    {'label': 'POSITIVE', 'score': 0.9876},
    {'label': 'NEGATIVE', 'score': 0.9962},
    {'label': 'POSITIVE', 'score': 0.9993},
]

print("Pre-Trained Transformer Sentiment Analysis")
#> Pre-Trained Transformer Sentiment Analysis
print("Model: distilbert-base-uncased-finetuned-sst-2-english")
#> Model: distilbert-base-uncased-finetuned-sst-2-english
print("(Output simulated — install transformers package to run live)\n")
#> (Output simulated — install transformers package to run live)

df_results = pd.DataFrame({
    'text':       tweets_test,
    'sentiment':  [r['label']  for r in simulated_results],
    'confidence': [r['score']  for r in simulated_results]
})

print(df_results.to_string(index=False))
#>                                                           text sentiment  confidence
#>        This product is absolutely amazing, best purchase ever!  POSITIVE      0.9997
#>     Terrible quality, completely disappointed with my purchase  NEGATIVE      0.9991
#>          The product is okay, nothing special but does the job  POSITIVE      0.7843
#> I love this company's customer service, very helpful and quick  POSITIVE      0.9984
#>   Worst experience ever, would give negative stars if possible  NEGATIVE      0.9996
#>         Good product for the price, would recommend to friends  POSITIVE      0.9876
#>                     The item arrived damaged, very frustrating  NEGATIVE      0.9962
#>                Excellent quality, exceeded all my expectations  POSITIVE      0.9993
print(f"\nMean confidence: {df_results['confidence'].mean():.3f}")
#> 
#> Mean confidence: 0.971
print(f"Low-confidence predictions (<0.80): {(df_results['confidence'] < 0.80).sum()}")
#> Low-confidence predictions (<0.80): 1

# Visualisation
fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#2ca02c' if s == 'POSITIVE' else '#d62728' for s in df_results['sentiment']]
short_text = [t[:55] + '…' if len(t) > 55 else t for t in df_results['text']]
ax.barh(short_text, df_results['confidence'], color=colors, alpha=0.8)
#> <BarContainer object of 8 artists>
ax.axvline(0.80, color='black', linestyle='--', linewidth=1.2, label='80% confidence threshold')
#> <matplotlib.lines.Line2D object at 0x000001D0E0E0BE00>
ax.set_xlabel('Confidence Score', fontsize=11)
#> Text(0.5, 0, 'Confidence Score')
ax.set_title('Transformer Sentiment: Score Distribution', fontweight='bold')
#> Text(0.5, 1.0, 'Transformer Sentiment: Score Distribution')
ax.set_xlim(0, 1)
#> (0.0, 1.0)
ax.legend()
#> <matplotlib.legend.Legend object at 0x000001D0E0E0BCB0>
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

📝 Section 28.5 Review Questions

Why do transformers outperform traditional supervised methods on sentiment classification?
What is the difference between a base model and a fine-tuned model? Which should you use for sentiment?
If a transformer assigns 0.65 confidence to a prediction, should you trust it? How would you use low-confidence predictions?
How would you adapt a transformer trained on English to work on Nigerian Pidgin?
What is the trade-off between using a large model (e.g., RoBERTa-large) vs. a small model (e.g., DistilBERT)?

33.6 Nigerian Pidgin and Multilingual Challenges

Standard NLP tools fail on Nigerian Pidgin: “e don spoil” (it broke), “no be small thing” (very significant), “e sweet well-well” (it’s very good). BERT and RoBERTa models trained on English don’t recognise Pidgin vocabulary.

Solutions: 1. XLM-RoBERTa: A multilingual transformer trained on 100+ languages. Handles code-switching (mixing English and Pidgin in one sentence). 2. Pidgin-Specific Models: Limited availability but growing. Nigerian researchers have released Pidgin NLP resources. 3. Custom Preprocessing: Expand Pidgin slang before feeding to English models (“e” → “it/he”, “don” → “have”).

Resources: - Nigerian-Pidgin-English Dictionary (community-curated). - Works by researchers at universities in Nigeria and diaspora. - Cautionary note: Automated translation risks losing nuance; manual validation recommended for business-critical tasks.

📘 Theory: Code-Switching Detection

Code-switching (mixing languages) is detected via: 1. Lexicon matching: Flag words not in English dictionary. 2. Language model probabilities: P(word | English) vs. P(word | Pidgin). 3. Contextual markers: Pidgin has distinct structure (SVO word order, tone marking, copula omission).

XLM-RoBERTa’s multilingual embeddings capture these patterns implicitly.

Show code

library(tidyverse)

# Nigerian Pidgin examples with English translations
pidgin_examples <- data.frame(
  text = c(
    "e don spoil joor",
    "no be small thing",
    "e sweet well-well",
    "abeg make e work",
    "the product fine fine, no vex",
    "e no dey work like before",
    "supply chain don scatter scatter",
    "money done finish for this month"
  ),
  english = c(
    "it broke badly",
    "that is significant/serious",
    "it's very good",
    "please make it work",
    "the product is nice, don't be upset",
    "it's not working as it used to",
    "the supply chain is in disarray",
    "money has run out this month"
  ),
  sentiment = c(
    "negative", "negative", "positive", "neutral",
    "positive", "negative", "negative", "negative"
  )
)

cat("Nigerian Pidgin Sentiment Challenges\n\n")
#> Nigerian Pidgin Sentiment Challenges
print(pidgin_examples)
#>                               text                             english
#> 1                 e don spoil joor                      it broke badly
#> 2                no be small thing         that is significant/serious
#> 3                e sweet well-well                      it's very good
#> 4                 abeg make e work                 please make it work
#> 5    the product fine fine, no vex the product is nice, don't be upset
#> 6        e no dey work like before      it's not working as it used to
#> 7 supply chain don scatter scatter     the supply chain is in disarray
#> 8 money done finish for this month        money has run out this month
#>   sentiment
#> 1  negative
#> 2  negative
#> 3  positive
#> 4   neutral
#> 5  positive
#> 6  negative
#> 7  negative
#> 8  negative

# Demonstrate code-switching
code_switched <- c(
  "The network is fast but e no dey last long",
  "I love this product, e don change my life",
  "Customer service wey no fine at all",
  "The design is sleek sha, but battery don die"
)

cat("\n\nCode-Switching Examples (English + Pidgin):\n\n")
#> 
#> 
#> Code-Switching Examples (English + Pidgin):
for (i in seq_along(code_switched)) {
  cat(sprintf("%d. %s\n", i, code_switched[i]))
}
#> 1. The network is fast but e no dey last long
#> 2. I love this product, e don change my life
#> 3. Customer service wey no fine at all
#> 4. The design is sleek sha, but battery don die

# Challenges table
challenges <- data.frame(
  challenge = c(
    "Pidgin vocabulary",
    "Grammar variations",
    "Tone marking",
    "Regional dialects",
    "Contraction patterns",
    "Sentiment reversal"
  ),
  example = c(
    "'e' means 'it/he/she', not in English dicts",
    "Pidgin lacks verb conjugation (tense via auxiliaries)",
    "Tone changes meaning; hard to represent in text",
    "Lagos Pidgin differs from Calabar Pidgin",
    "'don' = 'have' (past), 'go' = 'will' (future)",
    "'sweet' can mean 'good' but context matters"
  ),
  impact = c(
    "Words flagged as misspellings",
    "Grammar models fail",
    "Meaning lost in translation",
    "Regional models needed",
    "Tokenisation errors",
    "Sentiment reversed if literal"
  )
)

cat("\n\nKey Challenges for NLP in Nigerian Pidgin:\n\n")
#> 
#> 
#> Key Challenges for NLP in Nigerian Pidgin:
print(challenges)
#>              challenge                                               example
#> 1    Pidgin vocabulary           'e' means 'it/he/she', not in English dicts
#> 2   Grammar variations Pidgin lacks verb conjugation (tense via auxiliaries)
#> 3         Tone marking       Tone changes meaning; hard to represent in text
#> 4    Regional dialects              Lagos Pidgin differs from Calabar Pidgin
#> 5 Contraction patterns         'don' = 'have' (past), 'go' = 'will' (future)
#> 6   Sentiment reversal           'sweet' can mean 'good' but context matters
#>                          impact
#> 1 Words flagged as misspellings
#> 2           Grammar models fail
#> 3   Meaning lost in translation
#> 4        Regional models needed
#> 5           Tokenisation errors
#> 6 Sentiment reversed if literal

# Recommendation matrix
recommendations <- data.frame(
  scenario = c(
    "Pure English text",
    "Pidgin-dominant text",
    "Code-switched text",
    "Regional Pidgin dialect"
  ),
  recommended_model = c(
    "English BERT/RoBERTa",
    "XLM-RoBERTa or custom Pidgin model",
    "XLM-RoBERTa (handles mixed languages)",
    "Fine-tuned regional model or ensemble"
  ),
  additional_steps = c(
    "Standard preprocessing",
    "Pidgin lexicon expansion + XLM-RoBERTa",
    "Language ID per token, then model selection",
    "Collect regional examples for fine-tuning"
  )
)

cat("\n\nModel Recommendation Matrix:\n\n")
#> 
#> 
#> Model Recommendation Matrix:
print(recommendations)
#>                  scenario                     recommended_model
#> 1       Pure English text                  English BERT/RoBERTa
#> 2    Pidgin-dominant text    XLM-RoBERTa or custom Pidgin model
#> 3      Code-switched text XLM-RoBERTa (handles mixed languages)
#> 4 Regional Pidgin dialect Fine-tuned regional model or ensemble
#>                              additional_steps
#> 1                      Standard preprocessing
#> 2      Pidgin lexicon expansion + XLM-RoBERTa
#> 3 Language ID per token, then model selection
#> 4   Collect regional examples for fine-tuning

# Practical example: sentiment on Pidgin
cat("\n\nPractical Example: Sentiment on Pidgin Text\n\n")
#> 
#> 
#> Practical Example: Sentiment on Pidgin Text

pidgin_tweet <- "MTN network e sweet well-well joor! Worth the money no vex"
cat(sprintf("Pidgin tweet: '%s'\n\n", pidgin_tweet))
#> Pidgin tweet: 'MTN network e sweet well-well joor! Worth the money no vex'

cat("Challenge with English model (e.g., VADER):\n")
#> Challenge with English model (e.g., VADER):
cat("- 'sweet' interpreted as literal (food taste) rather than slang (good)\n")
#> - 'sweet' interpreted as literal (food taste) rather than slang (good)
cat("- 'no vex' not in English dictionaries\n")
#> - 'no vex' not in English dictionaries
cat("- Likely to assign neutral or negative sentiment incorrectly\n\n")
#> - Likely to assign neutral or negative sentiment incorrectly

cat("Solution with XLM-RoBERTa:\n")
#> Solution with XLM-RoBERTa:
cat("- Trained on multilingual data including Pidgin-adjacent patterns\n")
#> - Trained on multilingual data including Pidgin-adjacent patterns
cat("- Understands 'sweet' in context of praise\n")
#> - Understands 'sweet' in context of praise
cat("- Correctly identifies positive sentiment\n")
#> - Correctly identifies positive sentiment
cat("- Confidence may be lower due to Pidgin being low-resource\n")
#> - Confidence may be lower due to Pidgin being low-resource

Show code

# Transformer pipeline requires the 'transformers' package (pip install transformers[torch]).
# This cell demonstrates the concepts without live inference.

import pandas as pd

# Nigerian Pidgin examples with human-assigned sentiment
pidgin_examples = pd.DataFrame({
    'text': [
        "e don spoil joor",
        "no be small thing",
        "e sweet well-well",
        "abeg make e work",
        "the product fine fine, no vex",
        "e no dey work like before",
        "supply chain don scatter scatter",
        "money done finish for this month"
    ],
    'english_gloss': [
        "it broke badly",
        "that is significant/serious",
        "it's very good",
        "please make it work",
        "the product is nice, don't be upset",
        "it's not working as it used to",
        "the supply chain is in disarray",
        "money has run out this month"
    ],
    'sentiment': [
        "negative", "negative", "positive", "neutral",
        "positive", "negative", "negative", "negative"
    ],
    # Simulated DistilBERT (English-only) scores — miscalibrated for Pidgin
    'distilbert_label':     ["NEGATIVE","POSITIVE","NEGATIVE","POSITIVE",
                             "POSITIVE","NEGATIVE","NEGATIVE","NEGATIVE"],
    'distilbert_score':     [0.94, 0.61, 0.53, 0.72, 0.81, 0.89, 0.76, 0.88],
    # Simulated XLM-RoBERTa (multilingual) — better on code-switched text
    'xlmr_label':           ["NEGATIVE","NEGATIVE","POSITIVE","NEUTRAL",
                             "POSITIVE","NEGATIVE","NEGATIVE","NEGATIVE"],
    'xlmr_score':           [0.91, 0.78, 0.87, 0.55, 0.83, 0.92, 0.85, 0.90],
})

print("Nigerian Pidgin Sentiment: Model Comparison\n")
#> Nigerian Pidgin Sentiment: Model Comparison
print(pidgin_examples[['text','sentiment','distilbert_label','xlmr_label']].to_string(index=False))
#>                             text sentiment distilbert_label xlmr_label
#>                 e don spoil joor  negative         NEGATIVE   NEGATIVE
#>                no be small thing  negative         POSITIVE   NEGATIVE
#>                e sweet well-well  positive         NEGATIVE   POSITIVE
#>                 abeg make e work   neutral         POSITIVE    NEUTRAL
#>    the product fine fine, no vex  positive         POSITIVE   POSITIVE
#>        e no dey work like before  negative         NEGATIVE   NEGATIVE
#> supply chain don scatter scatter  negative         NEGATIVE   NEGATIVE
#> money done finish for this month  negative         NEGATIVE   NEGATIVE

# Accuracy comparison
correct_distil = (pidgin_examples['distilbert_label'].str.lower() ==
                  pidgin_examples['sentiment'].str.upper().str.replace('NEUTRAL','POSITIVE')).mean()
correct_xlmr   = (pidgin_examples['xlmr_label'].str.lower() ==
                  pidgin_examples['sentiment']).mean()

print(f"\nDistilBERT (English-only) accuracy on Pidgin: {correct_distil:.0%}")
#> 
#> DistilBERT (English-only) accuracy on Pidgin: 0%
print(f"XLM-RoBERTa (multilingual) accuracy on Pidgin: {correct_xlmr:.0%}")
#> XLM-RoBERTa (multilingual) accuracy on Pidgin: 100%
print("\nRecommendation: XLM-RoBERTa for Pidgin/code-switched text.")
#> 
#> Recommendation: XLM-RoBERTa for Pidgin/code-switched text.
print("For production: fine-tune on labelled Nigerian Pidgin data (NaijaSenti corpus).")
#> For production: fine-tune on labelled Nigerian Pidgin data (NaijaSenti corpus).

# Key challenges table
challenges = pd.DataFrame({
    'Challenge': ["Vocabulary", "Grammar", "Code-switching", "Dialects", "Sentiment reversal"],
    'Example':   ["'e' = it/he/she; 'joor' = emphasis particle",
                  "No verb conjugation; tense via 'don'/'go'",
                  "Mix of English + Pidgin in one utterance",
                  "Lagos Pidgin ≠ Calabar Pidgin",
                  "'sweet' = good; 'e no fine' = it's bad"]
})
print("\n\nKey NLP Challenges for Nigerian Pidgin:")
#> 
#> 
#> Key NLP Challenges for Nigerian Pidgin:
print(challenges.to_string(index=False))
#>          Challenge                                     Example
#>         Vocabulary 'e' = it/he/she; 'joor' = emphasis particle
#>            Grammar   No verb conjugation; tense via 'don'/'go'
#>     Code-switching    Mix of English + Pidgin in one utterance
#>           Dialects               Lagos Pidgin ≠ Calabar Pidgin
#> Sentiment reversal      'sweet' = good; 'e no fine' = it's bad

📝 Section 28.6 Review Questions

Why do standard English NLP models fail on Nigerian Pidgin?
What is code-switching, and why is it important to handle it?
How would XLM-RoBERTa perform compared to English BERT on Pidgin text?
If you have 1,000 Pidgin reviews but no labelled training data, what approach would you use?
Name three domain-specific Pidgin words in your industry and suggest how to handle them in sentiment analysis.

33.7 Case Study: Brand Sentiment Dashboard for Nigerian Telecoms

A telecommunications company monitors weekly sentiment across Twitter/X for two competitors (MTN and Airtel) using 500 synthetic tweets per company. We build a dashboard showing: - Weekly compound sentiment trend (VADER). - Aspect-level breakdown (network, customer service, pricing). - Side-by-side competitor comparison. - Interactive charts (HTML/static PDF).

Show code

library(tidyverse)
library(lubridate)

# Synthetic 500 tweets (250 per company) over 12 weeks
set.seed(999)

generate_telecom_tweets <- function(n = 500, company = "MTN") {
  sentiments_positive <- c(
    "love this network, very fast",
    "excellent customer service",
    "great value for money",
    "network is stable, very reliable",
    "best data plan in the market",
    "e sweet well-well",
    "service na better quality",
    "would recommend to everyone"
  )

  sentiments_negative <- c(
    "network keeps dropping",
    "terrible customer service",
    "prices too high",
    "connection is unstable",
    "waste of money",
    "e don spoil completely",
    "service no good",
    "very disappointed with quality"
  )

  sentiments_neutral <- c(
    "using their service",
    "works as expected",
    "okay for the price",
    "average quality",
    "network is acceptable"
  )

  tweets <- c(
    rep(sample(sentiments_positive, 200, replace = TRUE), 1),
    rep(sample(sentiments_negative, 150, replace = TRUE), 1),
    rep(sample(sentiments_neutral, 150, replace = TRUE), 1)
  )

  return(sample(tweets, n))
}

tweets_mtn <- generate_telecom_tweets(250, "MTN")
tweets_airtel <- generate_telecom_tweets(250, "Airtel")

# Create dataframe with dates — match exactly 500 tweets
n_tweets <- length(c(tweets_mtn, tweets_airtel))  # 500
weeks_seq <- seq(as.Date("2024-01-01"), by = "week", length.out = 12)

df_tweets_weekly <- data.frame(
  date    = rep(weeks_seq, length.out = n_tweets),
  week    = rep(1:12,       length.out = n_tweets),
  company = rep(c("MTN", "Airtel"), length.out = n_tweets),
  tweet   = c(tweets_mtn, tweets_airtel)
) |>
  mutate(
    year_week = format(date, "%Y-W%V"),
    # Sentiment scoring (simplified VADER-like)
    sentiment_score = ifelse(
      grepl("love|excellent|great|fast|stable|reliable|sweet|better|recommend|good",
            tweet, ignore.case = TRUE),
      0.7,
      ifelse(
        grepl("drop|terrible|high|unstable|waste|spoil|disappointing|bad|no good",
              tweet, ignore.case = TRUE),
        -0.7,
        0
      )
    ),
    sentiment_label = case_when(
      sentiment_score > 0.3 ~ "Positive",
      sentiment_score < -0.3 ~ "Negative",
      TRUE ~ "Neutral"
    )
  )

cat("Nigerian Telecoms Sentiment Dashboard\n")
#> Nigerian Telecoms Sentiment Dashboard
cat("Data: 500 synthetic tweets (250 MTN, 250 Airtel) over 12 weeks\n\n")
#> Data: 500 synthetic tweets (250 MTN, 250 Airtel) over 12 weeks

# Weekly sentiment trends
weekly_sentiment <- df_tweets_weekly |>
  group_by(year_week, company) |>
  summarise(
    mean_sentiment = mean(sentiment_score),
    positive_pct = sum(sentiment_label == "Positive") / n() * 100,
    negative_pct = sum(sentiment_label == "Negative") / n() * 100,
    neutral_pct = sum(sentiment_label == "Neutral") / n() * 100,
    n_tweets = n(),
    .groups = "drop"
  )

cat("Weekly Sentiment Summary (first 5 weeks):\n")
#> Weekly Sentiment Summary (first 5 weeks):
print(head(weekly_sentiment, 5) |>
        mutate(across(c(mean_sentiment, positive_pct, negative_pct, neutral_pct),
                      ~ round(., 2))))
#> # A tibble: 5 × 7
#>   year_week company mean_sentiment positive_pct negative_pct neutral_pct
#>   <chr>     <chr>            <dbl>        <dbl>        <dbl>       <dbl>
#> 1 2024-W01  MTN               0.3          57.1         14.3        28.6
#> 2 2024-W02  Airtel            0.03         33.3         28.6        38.1
#> 3 2024-W03  MTN               0.02         33.3         31.0        35.7
#> 4 2024-W04  Airtel            0.1          38.1         23.8        38.1
#> 5 2024-W05  MTN               0.15         40.5         19.0        40.5
#> # ℹ 1 more variable: n_tweets <int>

# Sentiment trend plot
ggplot(weekly_sentiment, aes(x = year_week, y = mean_sentiment, colour = company)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_colour_manual(values = c("MTN" = "blue", "Airtel" = "red")) +
  labs(title = "Weekly Sentiment Trend: MTN vs Airtel",
       subtitle = "Compound sentiment score over 12 weeks",
       x = "Week", y = "Mean Sentiment Score", colour = "Company") +
  theme_minimal() +
  theme(legend.position = "top")

Show code


# Aspect analysis (manual categorisation)
df_tweets_weekly <- df_tweets_weekly |>
  mutate(
    aspect = case_when(
      grepl("network|speed|drop|connect|coverage", tweet, ignore.case = TRUE) ~ "Network",
      grepl("customer|service|support|help", tweet, ignore.case = TRUE) ~ "Customer Service",
      grepl("price|cost|money|plan|bundle|value", tweet, ignore.case = TRUE) ~ "Pricing",
      grepl("quality|reliable|stable", tweet, ignore.case = TRUE) ~ "Quality",
      TRUE ~ "General"
    )
  )

aspect_sentiment <- df_tweets_weekly |>
  group_by(company, aspect) |>
  summarise(
    mean_sentiment = mean(sentiment_score),
    positive_count = sum(sentiment_label == "Positive"),
    negative_count = sum(sentiment_label == "Negative"),
    .groups = "drop"
  ) |>
  filter(aspect != "General")

cat("\n\nAspect-Level Sentiment:\n")
#> 
#> 
#> Aspect-Level Sentiment:
print(aspect_sentiment |> mutate(mean_sentiment = round(mean_sentiment, 2)))
#> # A tibble: 8 × 5
#>   company aspect           mean_sentiment positive_count negative_count
#>   <chr>   <chr>                     <dbl>          <int>          <int>
#> 1 Airtel  Customer Service           0.26             40             14
#> 2 Airtel  Network                    0.24             29             11
#> 3 Airtel  Pricing                   -0.11             12             20
#> 4 Airtel  Quality                    0                 0              0
#> 5 MTN     Customer Service           0.22             29             12
#> 6 MTN     Network                    0.28             37             10
#> 7 MTN     Pricing                   -0.07             13             19
#> 8 MTN     Quality                    0                 0              0

# Aspect comparison plot
aspect_sentiment |>
  pivot_longer(cols = c(positive_count, negative_count),
               names_to = "sentiment_type", values_to = "count") |>
  ggplot(aes(x = aspect, y = count, fill = sentiment_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  facet_wrap(~company) +
  scale_fill_manual(values = c("positive_count" = "green", "negative_count" = "red")) +
  labs(title = "Aspect-Level Sentiment: MTN vs Airtel",
       x = "Aspect", y = "Tweet Count", fill = "Sentiment") +
  theme_minimal() +
  theme(legend.position = "top")

Show code


# Management summary
cat("\n\n=== SENTIMENT DASHBOARD SUMMARY ===\n\n")
#> 
#> 
#> === SENTIMENT DASHBOARD SUMMARY ===

for (company_name in c("MTN", "Airtel")) {
  comp_data <- df_tweets_weekly |> filter(company == company_name)
  mean_sent <- mean(comp_data$sentiment_score)
  pos_pct <- sum(comp_data$sentiment_label == "Positive") / nrow(comp_data) * 100
  neg_pct <- sum(comp_data$sentiment_label == "Negative") / nrow(comp_data) * 100

  cat(sprintf("%s:\n", company_name))
  cat(sprintf("  Overall sentiment score: %.2f\n", mean_sent))
  cat(sprintf("  Positive: %.0f%%, Negative: %.0f%%\n", pos_pct, neg_pct))

  top_aspect <- aspect_sentiment |>
    filter(company == company_name) |>
    arrange(desc(mean_sentiment)) |>
    slice(1) |>
    pull(aspect)

  cat(sprintf("  Strongest aspect: %s\n\n", top_aspect))
}
#> MTN:
#>   Overall sentiment score: 0.16
#>   Positive: 42%, Negative: 20%
#>   Strongest aspect: Network
#> 
#> Airtel:
#>   Overall sentiment score: 0.13
#>   Positive: 42%, Negative: 23%
#>   Strongest aspect: Customer Service

cat("RECOMMENDATIONS:\n")
#> RECOMMENDATIONS:
cat("1. Monitor sentiment weekly; flag if negative exceeds 30%.\n")
#> 1. Monitor sentiment weekly; flag if negative exceeds 30%.
cat("2. Network quality is the top driver; prioritise infrastructure investments.\n")
#> 2. Network quality is the top driver; prioritise infrastructure investments.
cat("3. Customer service is lagging; invest in training and response time.\n")
#> 3. Customer service is lagging; invest in training and response time.
cat("4. Pricing is sensitive; communicate value in bundle plans.\n")
#> 4. Pricing is sensitive; communicate value in bundle plans.

33.8 Chapter Exercises

Chapter 28 Exercises

Exercise 28.1: Lexicon-Based Sentiment Analysis

Apply VADER (or a similar rule-based lexicon) to the following Nigerian customer reviews. Before checking with a tool, manually predict whether each is Positive, Negative, or Neutral, and estimate a rough polarity score. Then compare your intuition to what a lexicon would likely produce.

Review 1: “I have been using this bank for 5 years and they continue to DISAPPOINT me. The ATM was down for 3 days last week!”

Review 2: “Not the worst experience but definitely not good. I expected better from a Tier 1 bank.”

Review 3: “Superb service! The agent resolved my issue in under 5 minutes. Very impressed.”

Review 4: “The app sometimes crashes but it’s not too bad. Generally works fine for transfers.”

Review 5: “I wouldn’t say the service was terrible, but calling it excellent would be a stretch.”

For each review, manually identify the key sentiment-bearing words (words that signal positive or negative sentiment).
Review 2 contains the phrase “Not the worst” — a double negative. Explain why this is challenging for lexicon-based approaches that simply sum word scores.
Review 5 contains negation and hedging (“wouldn’t say”, “would be a stretch”). Is this review positive, negative, or neutral overall? How would a simple lexicon approach likely misclassify it?
A VADER analysis returns these compound scores: Review 1: −0.72, Review 2: −0.21, Review 3: +0.87, Review 4: +0.22, Review 5: −0.15. Using a threshold of compound > 0.05 = Positive, compound < −0.05 = Negative, otherwise Neutral — classify each review. Do you agree with the classifications?
For a bank deploying sentiment analysis on customer feedback, what is the cost of misclassifying a genuinely angry complaint as neutral? How might you adjust the threshold to minimise this type of error?

Exercise 28.2: Training a Sentiment Classifier

You are building a sentiment classifier for a Nigerian telecoms company. You have 2,000 labelled customer reviews: 1,200 positive and 800 negative.

Before building the model, you split the data 80/20 (train/test). What is the baseline accuracy if you simply predicted “positive” for every test observation?
Explain why accuracy alone is misleading for this task. Which metric would you prioritise and why?
After training a Naive Bayes classifier, you get the following test set results:

720 reviews labelled Positive: model correctly classifies 680, misclassifies 40 as Negative
480 reviews labelled Negative: model correctly classifies 360, misclassifies 120 as Positive

Build the complete confusion matrix.

Calculate Precision, Recall, and F1-score for the Negative class (treating Negative as the “positive” case of interest — the one we most want to catch correctly). Show your working.
The company’s customer experience team says: “We’d rather investigate 50 false alarms than miss a single genuine complaint.” What adjustment to the classification threshold would achieve this, and what is the trade-off?

Exercise 28.3: Aspect-Based Sentiment Analysis

The following customer review mentions several aspects of a Nigerian airline’s service:

“The flight itself was on time and the seats were comfortable. However, the cabin crew were rude and unhelpful when I asked for assistance. The food was absolutely terrible — cold and tasteless. Check-in was efficient though, much faster than my previous experience.”

Identify all the aspects mentioned in this review (e.g., “flight punctuality”, “seat comfort”, etc.). List them.
For each aspect, determine the sentiment expressed: Positive, Negative, or Neutral.
The overall VADER sentiment for this review might be neutral or slightly negative (mixed signals from the different aspects). Explain why overall sentiment analysis fails to capture the useful information in this review for the airline’s operations team.
If you were the airline’s Head of Customer Experience, which aspects would you prioritise for immediate action based on this review? What business decision would you make?
You want to build an automated aspect-based sentiment analysis system for 50,000 airline reviews per month. Describe the two main approaches you could use (rule-based vs. machine learning) and the trade-offs between them for this scale.

Exercise 28.4: Sentiment Analysis at Scale — Business Applications

A fast-food chain in Nigeria wants to monitor brand sentiment across Twitter, Instagram, and Google Reviews in real time. List four specific business decisions that could be improved by having real-time sentiment scores (not just “we know how people feel” — be specific about the decision).
You build a model that achieves 78% accuracy on your test set of 1,000 reviews. Six months after deployment, the customer service team notices the model frequently misclassifies complaints about a new payment feature. What has likely happened, and what is the technical term for this phenomenon?
Social media sentiment data has a survivorship bias problem: only customers who feel strongly enough to post tend to do so. Silent customers (happy or mildly dissatisfied) are underrepresented. How might this bias affect: (i) the estimated proportion of positive sentiment; (ii) the types of complaints the model learns to recognise?
A telecoms company uses sentiment scores as one of several inputs to a customer churn model. Sentiment analysis predicts that a customer is highly negative. The churn model predicts a 90% probability of churn. The retention team plans to call this customer with a special offer. At what stage in this process should human judgment be applied, and why?

Exercise 28.5: Capstone — Brand Sentiment Dashboard

Design a sentiment analytics solution for a major Nigerian commercial bank that wants to track public perception of its brand.

Identify four data sources the bank could use for sentiment data (beyond customer service calls and email). For each, describe: what data is available, how frequently it updates, and any quality concerns.
The bank wants to track sentiment for three dimensions separately: (i) overall brand perception; (ii) specific product complaints; (iii) response to marketing campaigns. Design the data labelling schema — specifically, what labels would you assign to each piece of text?
The bank’s marketing team wants a weekly sentiment dashboard. List six metrics (with descriptions) that should appear on this dashboard. For each, explain what a change in this metric would signal to the team.
The bank’s social media manager asks: “Can we use this to automatically respond to negative reviews?” Describe the design of such a system, including what it can do automatically and what must involve a human being.
Write a 200-word proposal for the bank’s Board explaining the business case for investing in real-time sentiment analytics. Include: the business problem, the proposed solution, expected benefits, and key risks. Use no technical jargon.

33.9 Further Reading

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135.
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., AL-Smadi, M., … & Eryigit, G. (2016). SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016).
Rosenthal, S., Farra, N., & Nakov, P. (2017). SemEval-2017 task 4: Sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
Laitinen, M., & Tarvainen, K. (2013). Sentiment analysis of Twitter data. University of Tampere, 8.

33.10 Chapter 28 Appendix: Mathematical Derivations

33.10.1 A28.1 VADER Compound Score Normalisation

VADER computes a compound score by normalising the sum of adjusted sentiment scores: \[\text{compound} = \frac{\sum s_i}{\sqrt{(\sum s_i)^2 + \alpha}}\]

where \(s_i\) are sentiment scores (adjusted for negation, intensifiers, punctuation) and \(\alpha\) is a normalisation constant (typically 15).

Derivation: The denominator \(\sqrt{(\sum s_i)^2 + \alpha}\) is related to the Euclidean norm. As \(|\sum s_i|\) grows, the fraction approaches ±1 (bounded output). The constant \(\alpha\) prevents division by zero and dampens small scores.

33.10.2 A28.2 Logistic Regression for Multiclass Sentiment

For \(K\) classes (Negative, Neutral, Positive), the softmax function outputs: \[P(y = k | \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x} + b_k}}{\sum_{j=1}^{K} e^{\mathbf{w}_j^T \mathbf{x} + b_j}}\]

The model is trained by minimising cross-entropy loss: \[\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}_{y_i = k} \log P(y_i = k | \mathbf{x}_i)\]

where \(\mathbb{1}_{y_i = k}\) is the indicator function (1 if true class is \(k\), else 0).

33.10.3 A28.3 Aspect Extraction via Dependency Parsing

Dependency parsing represents grammatical relationships as a directed graph. For “The battery drains quickly”: - “battery” is nsubj (nominal subject) of “drains”. - “drains” is the root predicate. - “quickly” is advmod (adverbial modifier) of “drains”.

To extract aspect-opinion pairs: 1. Find all nouns (potential aspects). 2. Find adjectival and verbal dependents or heads. 3. Link based on dependency relations (amod, nsubj, nmod, etc.).

33.10.4 A28.4 Transformer Attention and Contextual Embeddings

In a transformer’s multi-head attention: \[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V\]

For sentiment, each token’s contextual embedding is updated based on attention to all tokens in the sentence. Sentiment-bearing words (e.g., “good”, “bad”) attend to the target entity, contextualising the opinion.