70 Appendix E — Glossary of Key Terms

This glossary defines 150+ essential concepts from “AI-Powered Business Analytics” in plain, accessible language suitable for MBA students and professionals new to data science. Terms are arranged alphabetically with cross-references to chapters where they are introduced or applied.

70.1 A

A/B Testing A controlled experiment comparing two variants (A and B) of a product, feature, or marketing message to determine which performs better. Users are randomly assigned to each variant, and a metric (conversion rate, engagement, revenue) is compared. See Ch. 14, 15.
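
One common way to compare conversion rates between the two variants is a two-proportion z-test. The sketch below is illustrative only: the function name `two_proportion_z_test` and the visitor and conversion counts are assumptions made up for the example, and the book's chapters may use a different test.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 5,000 users per variant, 4% vs 5% conversion.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(round(z, 2), round(p_value, 3))  # 2.41 0.016
```

With these made-up numbers the p-value falls below 0.05, so variant B's higher conversion rate would usually be called statistically significant at the 5% level.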

Accuracy The proportion of predictions that are correct in a classification model: (TP + TN) / (TP + TN + FP + FN). A basic metric but can be misleading on imbalanced datasets. See Ch. 6, 23.

Adstock The lagged effect of advertising spend on sales. A customer exposed to an ad today may purchase next week. Adstock models capture this delayed response, typically using geometric decay. See Ch. 37.
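
The geometric-decay idea can be sketched in a few lines. This is a minimal illustration rather than the book's implementation: the function name `geometric_adstock` and the decay rate of 0.5 are assumptions chosen for the example.

```python
def geometric_adstock(spend, decay=0.5):
    """Each period's adstock = current spend + decay * previous period's adstock."""
    adstocked = []
    carryover = 0.0
    for x in spend:
        carryover = x + decay * carryover
        adstocked.append(carryover)
    return adstocked

# A single burst of spend in week 1 keeps influencing later weeks.
weekly_spend = [100, 0, 0, 0]
effect = geometric_adstock(weekly_spend, decay=0.5)
print(effect)  # [100.0, 50.0, 25.0, 12.5]
```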

AIC (Akaike Information Criterion) A measure of model fit that balances goodness-of-fit with model complexity. Among models fitted to the same data, lower AIC indicates a better trade-off. Used for model selection, including comparisons of non-nested models. See Ch. 6, 24.

ARIMA (AutoRegressive Integrated Moving Average) A statistical model for time series forecasting. Parameters (p, d, q) control the autoregressive, differencing, and moving average components respectively. See Ch. 9, 31.

Association Rules In market basket analysis, rules like “if customer buys item A, they likely buy item B”, evaluated with metrics such as support, confidence, and lift. Found using algorithms like Apriori. See Ch. 17.

AUC (Area Under Curve) The area under the ROC (Receiver Operating Characteristic) curve. Ranges from 0 to 1: 0.5 corresponds to random guessing and 1.0 to perfect ranking; values below 0.5 indicate predictions worse than random. Measures classification model performance across all probability thresholds. See Ch. 6, 23.

70.2 B

Backpropagation The algorithm used to train neural networks by computing gradients of the loss function with respect to weights, then updating weights to minimise loss. See Ch. 7.

Bagging (Bootstrap Aggregating) An ensemble method that trains multiple models on bootstrap samples of data, then averages predictions. Reduces variance and overfitting. See Ch. 7.

Bayes’ Theorem A foundational probability formula: P(A|B) = P(B|A) × P(A) / P(B). The basis for Bayesian inference and updating beliefs with new evidence. See Ch. 8, 52.
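
A short worked example may make the formula concrete. The scenario and all probabilities below are hypothetical: 10% of customers churn, a warning flag fires for 80% of churners and for 20% of non-churners. The function name `posterior` is likewise an assumption for illustration.

```python
def posterior(prior, likelihood, false_alarm_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded over A and not-A."""
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# P(churn | flag) given P(churn)=0.10, P(flag | churn)=0.80, P(flag | no churn)=0.20
p = posterior(prior=0.10, likelihood=0.80, false_alarm_rate=0.20)
print(round(p, 3))  # 0.308
```

Even with a fairly accurate flag, fewer than one in three flagged customers actually churns, because churners are rare to begin with.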

Bayesian Inference A statistical approach to updating prior beliefs (prior distribution) with observed data (likelihood) to obtain posterior beliefs. Used in many modern analytics applications. See Ch. 8, 29.

Betweenness Centrality A network centrality measure: the extent to which a node lies on the shortest paths between other nodes. High betweenness = important broker/intermediary. See Ch. 11, 22.

Bias-Variance Trade-off The tension in model building: high bias models are simple but underfit; high variance models are complex but overfit. The goal is balance. See Ch. 6.

Bootstrap A resampling method: draw repeated samples (with replacement) from data to estimate the distribution of a statistic. Useful for confidence intervals and hypothesis tests. See Ch. 5.
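
The resampling idea can be sketched as a percentile confidence interval for a mean. This is one simple variant among several; the function name `bootstrap_ci`, the number of resamples, the seed, and the order values are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval: resample with replacement, then take quantiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical order values; the interval brackets the sample mean of 17.5.
order_values = [12, 15, 9, 22, 30, 18, 14, 25, 11, 19]
low, high = bootstrap_ci(order_values)
print(low, high)
```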

Brand Equity The added value a brand name gives to a product beyond functional attributes. Measured via awareness, perception, loyalty, and price premium. See Ch. 35.

70.3 C

Cash Flow The actual movement of money in and out of a business. Includes operating, investing, and financing activities. Critical for liquidity and solvency. See Ch. 3, 49.

Centrality (Network) Measures of how “central” a node is in a network: degree (connections), closeness (distance to others), betweenness (intermediary role), eigenvector (importance of neighbours). See Ch. 11.

Chi-Squared Test A statistical test for independence between categorical variables. Tests whether observed frequencies differ significantly from expected under independence. See Ch. 5.

Classification A supervised learning task: predict a categorical target variable (e.g., churn: yes/no). Includes logistic regression, decision trees, neural networks. See Ch. 6, 23.

Cluster Analysis Unsupervised learning to group similar observations. Methods include K-means, hierarchical clustering, DBSCAN. No predetermined target. See Ch. 28.

Coefficient of Variation Standard deviation divided by mean, expressed as percentage. A unit-free measure of relative variability; useful for comparing spread across variables with different scales. See Ch. 5.

Collaborative Filtering A recommendation technique: users with similar past behaviour receive similar recommendations. Includes user-based and item-based variants. See Ch. 21.

Confusion Matrix A table showing predicted vs. actual class labels for a classifier. Enables calculation of accuracy, precision, recall, F1 score. See Ch. 6, 23.
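
The four cells of a binary confusion matrix are enough to compute the metrics named above. The sketch below uses made-up counts for an imbalanced churn problem; the function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical data: 1,000 customers, of whom 50 actually churn.
m = classification_metrics(tp=20, fp=10, fn=30, tn=940)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.960
# precision: 0.667
# recall: 0.400
# f1: 0.500
```

Accuracy looks excellent at 96%, yet the model finds only 40% of churners, which is why accuracy alone can mislead on imbalanced data.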

Convolutional Neural Network (CNN) A deep learning architecture for image data. Uses convolutional layers to extract spatial features, pooling to reduce dimensions. See Ch. 7, 20.

Correlation A measure of association between two variables, ranging from -1 to +1. Pearson measures linear association between continuous variables; Spearman and Kendall are rank-based and capture monotonic association. See Ch. 5, 16.

Covariate An observed variable that may affect the outcome but is not the primary focus. Often controlled for in regression models. See Ch. 5, 24.

Cross-Validation A model evaluation technique: divide data into k folds, train on k-1 folds, test on the remaining fold, repeat. Provides robust estimate of model performance. See Ch. 6.
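
The fold-splitting step can be sketched without any modelling library. This is a simplified illustration: the function name `k_fold_indices` is an assumption, and the sketch does not shuffle the data, which most practical implementations do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`; the k scores are then averaged.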

70.4 D

Data Mining The discovery of patterns and knowledge in large datasets. Includes clustering, association rules, anomaly detection. See Ch. 17.

Decision Tree A tree-shaped model for classification/regression. Splits data recursively on features to minimise impurity. Interpretable but prone to overfitting. See Ch. 7, 27.

Demand Forecasting Predicting future customer demand for products. Methods include time series, causal regression, judgment-based approaches. See Ch. 31, 45.

Dendrogram A tree diagram showing hierarchical clustering results. Height of branches indicates dissimilarity; cutting at different heights yields different cluster counts. See Ch. 28.

DBSCAN (Density-Based Spatial Clustering) A clustering algorithm grouping points by density. Does not require specifying k in advance; can find arbitrary-shaped clusters. See Ch. 28.

Dimensionality Reduction Techniques to reduce the number of features while retaining information: PCA, feature selection, feature engineering. See Ch. 18.

70.5 E

Elastic Net A regularised regression technique combining L1 (LASSO) and L2 (Ridge) penalties. Balances feature selection with coefficient shrinkage. See Ch. 18.

Ensemble Method Combining multiple models to improve prediction. Includes bagging, boosting, stacking. Often outperforms individual models. See Ch. 7.

Explainability / Interpretability The ability to understand why a model makes a specific prediction. SHAP, LIME, feature importance provide post-hoc explanations. See Ch. 29, 51.

70.6 F

Factor Analysis Unsupervised technique to discover latent factors explaining correlations among observed variables. Similar to PCA but with different assumptions. See Ch. 18.

Feature Engineering Creating new features from raw data to improve model performance. Includes transformations, interactions, domain knowledge. See Ch. 6, 18.

Feature Importance A measure of how much each feature contributes to model predictions. Computed via tree-based methods, permutation, SHAP values. See Ch. 6, 29.

F1 Score Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall). Balances precision and recall; useful for imbalanced classes. See Ch. 6, 23.

Forecasting Predicting future values based on historical data. Includes time series (ARIMA, Prophet), causal regression, judgmental methods. See Ch. 9, 31.

Fraud Detection Identifying fraudulent transactions or behaviour. Methods include anomaly detection, isolation forests, supervised classification. See Ch. 29, 41.

70.7 G

Gradient Boosting An ensemble technique iteratively training models to correct residuals of previous models. Includes XGBoost, LightGBM, CatBoost. Often achieves top performance. See Ch. 7.

70.8 H

Hazard Ratio In survival analysis, the ratio of hazard rates for two groups. HR > 1 indicates higher risk in first group; HR < 1 indicates lower risk. See Ch. 30.

Hierarchical Clustering Clustering technique building a dendrogram of nested clusters. Agglomerative (bottom-up) starts with individuals; divisive (top-down) starts with all data. See Ch. 28.

Hypothesis Testing Statistical procedure to test a claim about a population. Includes null hypothesis, test statistic, p-value, significance level, power. See Ch. 5.

70.9 I

Information Value (IV) A measure of predictive power for a variable, particularly in credit scoring. Higher IV = stronger predictor. See Ch. 27.

Isolation Forest An anomaly detection algorithm exploiting the idea that anomalies are “isolated” in feature space. Efficient for high-dimensional data. See Ch. 29.

70.10 K

K-Means Clustering An unsupervised algorithm partitioning data into k clusters by minimising within-cluster variance. Requires specifying k; sensitive to initialisation. See Ch. 28.

Kaplan-Meier Estimator A non-parametric method estimating survival probability at each time point, accounting for censoring. Basis for survival curves. See Ch. 30.
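
The product-limit calculation behind the estimator can be sketched as follows. The function name `kaplan_meier` and the customer durations are hypothetical; `observed=False` marks a censored customer who had not yet churned when observation ended.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one event."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose duration reaches t is still at risk, censored or not.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Months until churn for five hypothetical customers.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored customers never count as events, but they do stay in the at-risk group until their last observed month.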

Kruskal-Wallis Test A non-parametric test comparing central tendencies of 3+ groups. Generalisation of Mann-Whitney U test; does not assume normality. See Ch. 5.

70.11 L

LASSO (Least Absolute Shrinkage and Selection Operator) Regularised regression using L1 penalty. Shrinks some coefficients to exactly zero, performing feature selection. See Ch. 18.

Lead Scoring Ranking sales prospects by likelihood to convert. Models combine explicit signals (firmographics) and implicit signals (engagement behaviour). See Ch. 27.

Lift In market basket analysis or targeting, the ratio of response rate in a group to overall response rate. Lift > 1 indicates group responds better than average. See Ch. 17, 37.

Linear Programming Optimisation technique for linear objective functions subject to linear constraints. Includes graphical method, simplex method. See Ch. 25.

Logistic Regression Classification model for binary outcomes. Outputs probabilities via logistic function; easily interpreted coefficients represent log-odds. See Ch. 6, 23.

LDA (Latent Dirichlet Allocation) A topic model discovering latent topics in a document collection. Each document is a mixture of topics; each topic is a distribution over words. See Ch. 10, 19.

70.12 M

Market Basket Analysis Discovering associations between products purchased together. Uses support, confidence, lift metrics and Apriori/Eclat algorithms. See Ch. 17.
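
Support, confidence, and lift for a single rule can be computed directly from transaction counts. The baskets below are made up, and the function name `rule_metrics` is an assumption for illustration; algorithms such as Apriori exist to search many candidate rules efficiently, which this sketch does not attempt.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_b = sum(1 for t in transactions if consequent in t)
    has_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = has_both / n             # P(A and B)
    confidence = has_both / has_a      # P(B | A)
    lift = confidence / (has_b / n)    # P(B | A) / P(B)
    return support, confidence, lift

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(baskets, "bread", "butter")
print(round(support, 3), round(confidence, 3), round(lift, 3))  # 0.4 0.667 1.111
```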

Mean Absolute Error (MAE) Average of absolute differences between predicted and actual values. Less sensitive to outliers than RMSE; same units as target variable. See Ch. 6, 31.

Minimum Detectable Effect (MDE) The smallest effect size an A/B test is designed to reliably detect given sample size and power. Determined before running experiment. See Ch. 14, 15.

Multicollinearity When predictor variables are highly correlated, causing unstable regression coefficients. Detected via VIF; addressed via regularisation or feature selection. See Ch. 6, 18.

70.13 N

Neural Network A machine learning model inspired by biological neurons. Layers of interconnected nodes with weights; trained via backpropagation. See Ch. 7.

NDCG (Normalized Discounted Cumulative Gain) A ranking metric for recommendation systems. Accounts for position (top items matter more) and relevance. Ranges 0-1; higher is better. See Ch. 21.

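NPS (Net Promoter Score) Customer loyalty metric: percentage of promoters (likelihood-to-recommend rating of 9-10) minus percentage of detractors (0-6); ratings of 7-8 are passives and count only in the total. Ranges from -100 to +100. Simple, widely used in practice. See Ch. 35.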
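
The calculation is simple enough to sketch directly. The survey ratings below are invented, and the function name `net_promoter_score` is an assumption for illustration.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Ten hypothetical responses: 5 promoters, 2 passives, 3 detractors.
ratings = [10, 9, 9, 8, 7, 6, 5, 10, 3, 9]
nps = net_promoter_score(ratings)
print(nps)  # 20.0
```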

70.14 O

Odds Ratio In logistic regression and 2x2 tables, the ratio of odds of outcome under two conditions. OR > 1 indicates increased odds; OR < 1 indicates decreased odds. See Ch. 23.

One-Hot Encoding Converting categorical variable with k categories into k binary variables (1 if category applies, 0 otherwise). Used before fitting many ML models. See Ch. 6.
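
A minimal sketch of the transformation, using only built-in types. The function name `one_hot` and the loyalty-tier values are assumptions for illustration; in practice a data-frame or ML library would normally handle this step.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, encoded = one_hot(["gold", "silver", "gold", "bronze"])
print(categories)  # ['bronze', 'gold', 'silver']
print(encoded)     # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

For linear models with an intercept, one of the k columns is usually dropped, because the full set is perfectly collinear with the intercept.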

Overfitting When a model fits training data too closely, capturing noise rather than signal. Results in poor generalisation to new data. See Ch. 6.

70.15 P

PageRank An algorithm ranking nodes in a network by importance, based on link structure. Assumes a page is important if other important pages link to it. See Ch. 11.

PCA (Principal Component Analysis) Dimensionality reduction: linear transformation creating uncorrelated principal components in order of variance explained. See Ch. 18.

Pearson Correlation Correlation coefficient for continuous variables measuring linear association. Ranges -1 to +1; assumes normality for hypothesis tests. See Ch. 5.

Precision In classification, proportion of positive predictions that are correct: TP / (TP + FP). Important when false positives are costly. See Ch. 6, 23.

Predictive Maintenance Using models to predict equipment failures before they occur, enabling proactive maintenance. Reduces downtime and costs. See Ch. 9.

Principal Component In PCA, a linear combination of original variables. First component explains most variance; subsequent components explain residual variance. See Ch. 18.

Prior (Bayesian) Initial probability distribution of a parameter before observing data. Updated with data to obtain posterior. Reflects prior belief or domain knowledge. See Ch. 8.

Prophet An open-source time series forecasting library (originally developed at Facebook, now Meta) handling seasonality, trend, holidays. Robust to missing data and outliers. See Ch. 9, 31.

p-Chart Statistical process control chart for defect proportions. Plots sample proportion over time; includes control limits. See Ch. 41.

70.16 Q

Quantile Regression Regression method estimating conditional quantiles (e.g., median, 25th percentile) rather than mean. Robust to outliers. See Ch. 24.

70.17 R

Random Forest Ensemble of decision trees, each trained on bootstrap sample and random feature subset. Combines predictions via averaging (regression) or voting (classification). See Ch. 7, 29.

Recall In classification, proportion of actual positives correctly identified: TP / (TP + FN). Important when false negatives are costly. See Ch. 6, 23.

Regularisation Technique to prevent overfitting by penalising model complexity. L1 (LASSO), L2 (Ridge), dropout (neural networks). See Ch. 6, 7.

Regression Supervised learning predicting a continuous target. Includes linear, multiple, and polynomial regression. Logistic regression, despite its name, is used for categorical targets and is treated as classification. See Ch. 24.

RFM Analysis Customer segmentation based on Recency (last purchase), Frequency (purchase count), Monetary (total spent). Simple, interpretable, actionable. See Ch. 28, 34.

RMSE (Root Mean Squared Error) Square root of average squared prediction errors. Penalises large errors more than MAE. See Ch. 6, 31.
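
The difference between MAE and RMSE shows up when errors are concentrated in one observation. The two forecast sets below are invented so that they share the same MAE; the function names `mae` and `rmse` are defined here for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100, 100, 100, 100]
even_errors = [90, 110, 90, 110]     # four errors of 10
one_big_error = [100, 100, 100, 60]  # one error of 40

print(mae(actual, even_errors), rmse(actual, even_errors))      # 10.0 10.0
print(mae(actual, one_big_error), rmse(actual, one_big_error))  # 10.0 20.0
```

Both forecasts have the same MAE, but RMSE doubles for the forecast with a single large miss.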

ROC Curve Plot of True Positive Rate vs. False Positive Rate across all classification thresholds. AUC summarises model performance. See Ch. 6, 23.

70.18 S

SARIMA Seasonal ARIMA extending ARIMA to handle seasonality. Includes seasonal autoregressive and moving average components. See Ch. 9.

Segmentation Dividing customers or products into distinct groups. Includes demographic, behavioural, value-based segmentation. See Ch. 28, 34.

Sensitivity Same as Recall (True Positive Rate). In classification, the ability to correctly identify positive cases. See Ch. 6.

SHAP (SHapley Additive exPlanations) Post-hoc explainability method computing each feature’s contribution to predictions. Provides local and global interpretability. See Ch. 29, 51.

Sigmoid Function Mathematically, σ(x) = 1 / (1 + e^(-x)). Maps any input to (0, 1); used in logistic regression and neural networks to output probabilities. See Ch. 6, 7.
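
A direct translation of the formula, offered as a sketch. It is not numerically hardened: `math.exp` raises `OverflowError` for very large negative inputs (below roughly -709), which production libraries guard against.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-4, 0, 4):
    print(x, round(sigmoid(x), 3))
# -4 0.018
# 0 0.5
# 4 0.982
```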

Silhouette Score Measure of cluster cohesion and separation in clustering. Ranges -1 to +1; higher indicates better-defined clusters. Used to select optimal k. See Ch. 28.

SMOTE (Synthetic Minority Over-sampling) Technique for handling imbalanced classification by synthesising new minority class samples. Improves model ability to learn rare class. See Ch. 29.

Specificity In classification, proportion of actual negatives correctly identified: TN / (TN + FP). Equal to 1 - false positive rate. See Ch. 6.

Spearman Correlation Rank-based correlation coefficient, non-parametric. Measures monotonic (not necessarily linear) association. Robust to outliers. See Ch. 5.

Statistical Power Probability of correctly rejecting a false null hypothesis (detecting a true effect). Equal to 1 - β, where β is the Type II error rate. Goal typically 80-90%. See Ch. 5, 14.

Stationarity In time series, statistical properties (mean, variance) constant over time. ARIMA assumes the series is stationary after differencing (the d parameter). Commonly tested via the Augmented Dickey-Fuller (ADF) test. See Ch. 9.

Support (Association Rules) In market basket analysis, proportion of transactions containing both items: P(A ∩ B). Minimum support threshold filters rare rules. See Ch. 17.

Survival Analysis Statistical methods for time-to-event data (e.g., customer lifetime, time to failure). Handles censoring (incomplete observations). See Ch. 30.

Support Vector Machine (SVM) Classification algorithm finding hyperplane maximising margin between classes. Handles non-linear boundaries via kernel trick. See Ch. 6.

70.19 T

t-Test Hypothesis test comparing means of one or two groups. One-sample (vs. value), two-sample (independent groups), paired. See Ch. 5.

TF-IDF (Term Frequency-Inverse Document Frequency) Weighting scheme for text analysis. TF measures word frequency; IDF downweights common words. See Ch. 10, 19.
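
TF-IDF has several variants. The sketch below assumes one simple form, with TF as a term's share of the document's words and IDF as the natural log of (number of documents / documents containing the term). The function name `tf_idf` and the three review snippets are made up for the example.

```python
import math

def tf_idf(docs):
    """Return one {term: score} dict per document, splitting on whitespace."""
    n_docs = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = {}
    for tokens in tokenised:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    scores = []
    for tokens in tokenised:
        scores.append({
            term: (tokens.count(term) / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term in set(tokens)
        })
    return scores

reviews = ["great product great price", "great service", "slow delivery"]
scores = tf_idf(reviews)
print(sorted(scores[0].items(), key=lambda kv: -kv[1]))
```

In the first review, "great" occurs twice but scores lower than "product" or "price", because it also appears in another review and is therefore less distinctive.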

Time Series Sequence of observations ordered by time. Properties include trend, seasonality, cyclical patterns, noise. See Ch. 9.

Transfer Learning Using a model trained on one task to solve another task. Common in deep learning (e.g., pre-trained image models). See Ch. 7, 20.

Type I Error False positive: rejecting true null hypothesis. Probability controlled by significance level α. See Ch. 5.

Type II Error False negative: failing to reject false null hypothesis. Probability denoted β; (1 - β) is power. See Ch. 5.

70.20 V

VIF (Variance Inflation Factor) Quantifies multicollinearity. VIF = 1 (no collinearity); VIF > 5-10 indicates problematic collinearity. Computed per predictor. See Ch. 18.

Voronoi Diagram Partitioning space based on distance to seed points. Used in location analytics to define regions. See Ch. 11, 40.

70.21 W

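Weight of Evidence (WOE) For a category of a predictor, the natural log of the share of all goods falling in that category divided by the share of all bads falling in it. Used in information value calculations and as an input transformation for logistic regression. See Ch. 27.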
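
WOE and Information Value can be computed together from good/bad counts per category. The sketch assumes the common credit-scoring convention WOE = ln(share of goods / share of bads) and IV = sum of (share of goods - share of bads) × WOE. The function name `woe_iv` and the housing-status counts are hypothetical, and the code assumes every category has at least one good and one bad.

```python
import math

def woe_iv(bins):
    """bins maps category -> (n_good, n_bad). Returns ({category: woe}, iv)."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for category, (good, bad) in bins.items():
        share_good = good / total_good
        share_bad = bad / total_bad
        woe[category] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[category]
    return woe, iv

# Hypothetical loan book: 1,000 good and 100 bad accounts by housing status.
woe, iv = woe_iv({"owner": (400, 20), "renter": (400, 60), "other": (200, 20)})
print({k: round(v, 3) for k, v in woe.items()}, round(iv, 3))
# {'owner': 0.693, 'renter': -0.405, 'other': 0.0} 0.22
```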

70.22 X

XGBoost Extreme Gradient Boosting: optimised gradient boosting library. Widely used; competitive on Kaggle and in industry. See Ch. 7, 29.

70.23 Z

Z-Score Standardisation: (x - mean) / standard deviation. Rescales a variable to mean 0 and standard deviation 1 (it does not make the distribution normal); facilitates comparison across variables. See Ch. 5.
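
A minimal sketch of the standardisation, using the sample standard deviation (n - 1 denominator); some texts use the population version instead. The function name `z_scores` and the revenue figures are assumptions for illustration.

```python
import statistics

def z_scores(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]

revenue = [120, 150, 180, 210, 240]
z = z_scores(revenue)
print([round(v, 3) for v in z])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
```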

70.24 Frequently Confused Terms

Precision vs. Recall: Precision answers “of predicted positives, how many are correct?” Recall answers “of actual positives, how many did we find?” Both matter in imbalanced data.

Correlation vs. Causation: Correlation measures association; causation requires causal mechanism, temporal ordering, confounding control. See Ch. 14, 52.

Bias vs. Variance: Bias is error from wrong assumptions (underfitting); variance is error from sensitivity to training data (overfitting). See Ch. 6.

Sensitivity vs. Specificity: Sensitivity (recall) = ability to find positives; specificity = ability to reject negatives. Trade-off controlled by probability threshold. See Ch. 6.

Supervised vs. Unsupervised: Supervised learning has labelled target (classification/regression); unsupervised has no target (clustering, dimensionality reduction). See Ch. 6, 28.

70.25 References to Chapters

For each term, the primary chapter(s) where it is introduced or most heavily used are noted. Example: “See Ch. 6, 23” means the term is introduced in Ch. 6 and applied again in Ch. 23.

To find terms related to a specific topic, consult the Method Coverage Map in Appendix C or the comprehensive chapter index.

This glossary covers concepts across all 56 chapters. For technical proofs and deeper mathematical foundations, refer to the main text and references in Appendix G.

--- title: "Appendix E — Glossary of Key Terms" --- # Appendix E — Glossary of Key Terms This glossary defines 150+ essential concepts from "AI-Powered Business Analytics" in plain, accessible language suitable for MBA students and professionals new to data science. Terms are arranged alphabetically with cross-references to chapters where they are introduced or applied. --- ## A **A/B Testing** A controlled experiment comparing two variants (A and B) of a product, feature, or marketing message to determine which performs better. Users are randomly assigned to each variant, and a metric (conversion rate, engagement, revenue) is compared. See Ch. 14, 15. **Accuracy** The proportion of predictions that are correct in a classification model: (TP + TN) / (TP + TN + FP + FN). A basic metric but can be misleading on imbalanced datasets. See Ch. 6, 23. **Adstock** The lagged effect of advertising spend on sales. A customer exposed to an ad today may purchase next week. Adstock models capture this delayed response, typically using geometric decay. See Ch. 37. **AIC (Akaike Information Criterion)** A measure of model fit that balances goodness-of-fit with model complexity. Lower AIC indicates a better model. Used for model selection when comparing non-nested models. See Ch. 6, 24. **ARIMA (AutoRegressive Integrated Moving Average)** A statistical model for time series forecasting. Parameters (p, d, q) control the autoregressive, differencing, and moving average components respectively. See Ch. 9, 31. **Association Rules** In market basket analysis, rules like "if customer buys item A, they likely buy item B" with metrics support and confidence. Found using algorithms like Apriori. See Ch. 17. **AUC (Area Under Curve)** The area under the ROC (Receiver Operating Characteristic) curve. Ranges from 0.5 (random) to 1.0 (perfect). Measures classification model performance across all probability thresholds. See Ch. 6, 23. 
--- ## B **Backpropagation** The algorithm used to train neural networks by computing gradients of the loss function with respect to weights, then updating weights to minimise loss. See Ch. 7. **Bagging (Bootstrap Aggregating)** An ensemble method that trains multiple models on bootstrap samples of data, then averages predictions. Reduces variance and overfitting. See Ch. 7. **Bayes' Theorem** A foundational probability formula: P(A|B) = P(B|A) × P(A) / P(B). The basis for Bayesian inference and updating beliefs with new evidence. See Ch. 8, 52. **Bayesian Inference** A statistical approach to updating prior beliefs (prior distribution) with observed data (likelihood) to obtain posterior beliefs. Used in many modern analytics applications. See Ch. 8, 29. **Betweenness Centrality** A network centrality measure: the extent to which a node lies on the shortest paths between other nodes. High betweenness = important broker/intermediary. See Ch. 11, 22. **Bias-Variance Trade-off** The tension in model building: high bias models are simple but underfit; high variance models are complex but overfit. The goal is balance. See Ch. 6. **Bootstrap** A resampling method: draw repeated samples (with replacement) from data to estimate the distribution of a statistic. Useful for confidence intervals and hypothesis tests. See Ch. 5. **Brand Equity** The added value a brand name gives to a product beyond functional attributes. Measured via awareness, perception, loyalty, and price premium. See Ch. 35. --- ## C **Cash Flow** The actual movement of money in and out of a business. Includes operating, investing, and financing activities. Critical for liquidity and solvency. See Ch. 3, 49. **Centrality (Network)** Measures of how "central" a node is in a network: degree (connections), closeness (distance to others), betweenness (intermediary role), eigenvector (importance of neighbours). See Ch. 11. **Chi-Squared Test** A statistical test for independence between categorical variables. 
Tests whether observed frequencies differ significantly from expected under independence. See Ch. 5. **Classification** A supervised learning task: predict a categorical target variable (e.g., churn: yes/no). Includes logistic regression, decision trees, neural networks. See Ch. 6, 23. **Cluster Analysis** Unsupervised learning to group similar observations. Methods include K-means, hierarchical clustering, DBSCAN. No predetermined target. See Ch. 28. **Coefficient of Variation** Standard deviation divided by mean, expressed as percentage. A unit-free measure of relative variability; useful for comparing spread across variables with different scales. See Ch. 5. **Collaborative Filtering** A recommendation technique: users with similar past behaviour receive similar recommendations. Includes user-based and item-based variants. See Ch. 21. **Confusion Matrix** A table showing predicted vs. actual class labels for a classifier. Enables calculation of accuracy, precision, recall, F1 score. See Ch. 6, 23. **Convolutional Neural Network (CNN)** A deep learning architecture for image data. Uses convolutional layers to extract spatial features, pooling to reduce dimensions. See Ch. 7, 20. **Correlation** A measure of linear association between two variables, ranging from -1 to +1. Pearson (continuous), Spearman (rank-based), Kendall (rank-based, robust). See Ch. 5, 16. **Covariate** An observed variable that may affect the outcome but is not the primary focus. Often controlled for in regression models. See Ch. 5, 24. **Cross-Validation** A model evaluation technique: divide data into k folds, train on k-1 folds, test on the remaining fold, repeat. Provides robust estimate of model performance. See Ch. 6. --- ## D **Data Mining** The discovery of patterns and knowledge in large datasets. Includes clustering, association rules, anomaly detection. See Ch. 17. **Decision Tree** A tree-shaped model for classification/regression. 
Splits data recursively on features to minimise impurity. Interpretable but prone to overfitting. See Ch. 7, 27. **Demand Forecasting** Predicting future customer demand for products. Methods include time series, causal regression, judgment-based approaches. See Ch. 31, 45. **Dendrogram** A tree diagram showing hierarchical clustering results. Height of branches indicates dissimilarity; cutting at different heights yields different cluster counts. See Ch. 28. **DBSCAN (Density-Based Spatial Clustering)** A clustering algorithm grouping points by density. Does not require specifying k in advance; can find arbitrary-shaped clusters. See Ch. 28. **Dimensionality Reduction** Techniques to reduce the number of features while retaining information: PCA, feature selection, feature engineering. See Ch. 18. --- ## E **Elastic Net** A regularised regression technique combining L1 (LASSO) and L2 (Ridge) penalties. Balances feature selection with coefficient shrinkage. See Ch. 18. **Ensemble Method** Combining multiple models to improve prediction. Includes bagging, boosting, stacking. Often outperforms individual models. See Ch. 7. **Explainability / Interpretability** The ability to understand why a model makes a specific prediction. SHAP, LIME, feature importance provide post-hoc explanations. See Ch. 29, 51. --- ## F **Factor Analysis** Unsupervised technique to discover latent factors explaining correlations among observed variables. Similar to PCA but with different assumptions. See Ch. 18. **Feature Engineering** Creating new features from raw data to improve model performance. Includes transformations, interactions, domain knowledge. See Ch. 6, 18. **Feature Importance** A measure of how much each feature contributes to model predictions. Computed via tree-based methods, permutation, SHAP values. See Ch. 6, 29. **F1 Score** Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall). 
Balances precision and recall; useful for imbalanced classes. See Ch. 6, 23. **Forecasting** Predicting future values based on historical data. Includes time series (ARIMA, Prophet), causal regression, judgmental methods. See Ch. 9, 31. **Fraud Detection** Identifying fraudulent transactions or behaviour. Methods include anomaly detection, isolation forests, supervised classification. See Ch. 29, 41. --- ## G **Gradient Boosting** An ensemble technique iteratively training models to correct residuals of previous models. Includes XGBoost, LightGBM, CatBoost. Often achieves top performance. See Ch. 7. --- ## H **Hazard Ratio** In survival analysis, the ratio of hazard rates for two groups. HR > 1 indicates higher risk in first group; HR < 1 indicates lower risk. See Ch. 30. **Hierarchical Clustering** Clustering technique building a dendrogram of nested clusters. Agglomerative (bottom-up) starts with individuals; divisive (top-down) starts with all data. See Ch. 28. **Hypothesis Testing** Statistical procedure to test a claim about a population. Includes null hypothesis, test statistic, p-value, significance level, power. See Ch. 5. --- ## I **Information Value (IV)** A measure of predictive power for a variable, particularly in credit scoring. Higher IV = stronger predictor. See Ch. 27. **Isolation Forest** An anomaly detection algorithm exploiting the idea that anomalies are "isolated" in feature space. Efficient for high-dimensional data. See Ch. 29. --- ## K **K-Means Clustering** An unsupervised algorithm partitioning data into k clusters by minimising within-cluster variance. Requires specifying k; sensitive to initialisation. See Ch. 28. **Kaplan-Meier Estimator** A non-parametric method estimating survival probability at each time point, accounting for censoring. Basis for survival curves. See Ch. 30. **Kruskal-Wallis Test** A non-parametric test comparing central tendencies of 3+ groups. Generalisation of Mann-Whitney U test; does not assume normality. 
See Ch. 5. --- ## L **LASSO (Least Absolute Shrinkage and Selection Operator)** Regularised regression using L1 penalty. Shrinks some coefficients to exactly zero, performing feature selection. See Ch. 18. **Lead Scoring** Ranking sales prospects by likelihood to convert. Models combine explicit signals (firmographics) and implicit signals (engagement behaviour). See Ch. 27. **Lift** In market basket analysis or targeting, the ratio of response rate in a group to overall response rate. Lift > 1 indicates group responds better than average. See Ch. 17, 37. **Linear Programming** Optimisation technique for linear objective functions subject to linear constraints. Includes graphical method, simplex method. See Ch. 25. **Logistic Regression** Classification model for binary outcomes. Outputs probabilities via logistic function; easily interpreted coefficients represent log-odds. See Ch. 6, 23. **LDA (Latent Dirichlet Allocation)** A topic model discovering latent topics in a document collection. Each document is a mixture of topics; each topic is a distribution over words. See Ch. 10, 19. --- ## M **Market Basket Analysis** Discovering associations between products purchased together. Uses support, confidence, lift metrics and Apriori/Eclat algorithms. See Ch. 17. **Mean Absolute Error (MAE)** Average of absolute differences between predicted and actual values. Robust to outliers; same units as target variable. See Ch. 6, 31. **Minimum Detectable Effect (MDE)** The smallest effect size an A/B test is designed to reliably detect given sample size and power. Determined before running experiment. See Ch. 14, 15. **Multicollinearity** When predictor variables are highly correlated, causing unstable regression coefficients. Detected via VIF; addressed via regularisation or feature selection. See Ch. 6, 18. --- ## N **Neural Network** A machine learning model inspired by biological neurons. Layers of interconnected nodes with weights; trained via backpropagation. See Ch. 7. 

**NDCG (Normalized Discounted Cumulative Gain)** A ranking metric for recommendation systems. Accounts for both position (top items matter more) and relevance. Ranges from 0 to 1; higher is better. See Ch. 21.

**NPS (Net Promoter Score)** Customer loyalty metric: the percentage of promoters (scores of 9-10 on a 0-10 likelihood-to-recommend scale) minus the percentage of detractors (scores of 0-6). Simple and widely used in practice. See Ch. 35.

---

## O

**Odds Ratio** In logistic regression and 2x2 tables, the ratio of the odds of an outcome under two conditions. OR > 1 indicates increased odds; OR < 1 indicates decreased odds. See Ch. 23.

**One-Hot Encoding** Converting a categorical variable with k categories into k binary variables (1 if the category applies, 0 otherwise). Used before fitting many ML models. See Ch. 6.

**Overfitting** When a model fits training data too closely, capturing noise rather than signal. Results in poor generalisation to new data. See Ch. 6.

---

## P

**PageRank** An algorithm that ranks nodes in a network by importance, based on link structure. Assumes that a page is important if other important pages link to it. See Ch. 11.

**PCA (Principal Component Analysis)** Dimensionality reduction: a linear transformation that creates uncorrelated principal components, ordered by variance explained. See Ch. 18.

**Pearson Correlation** Correlation coefficient for continuous variables measuring linear association. Ranges from -1 to +1; hypothesis tests on it assume normality. See Ch. 5.

**Precision** In classification, the proportion of positive predictions that are correct: TP / (TP + FP). Important when false positives are costly. See Ch. 6, 23.

**Predictive Maintenance** Using models to predict equipment failures before they occur, enabling proactive maintenance. Reduces downtime and costs. See Ch. 9.

**Principal Component** In PCA, a linear combination of the original variables. The first component explains the most variance; each subsequent component explains the most remaining variance while staying uncorrelated with the earlier ones. See Ch. 18.

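As an illustration of "ordered by variance explained" in the PCA and Principal Component entries, the sketch below computes the share of variance captured by each of the two principal components of a two-variable dataset. It uses the closed-form eigenvalues of a 2x2 covariance matrix so that it needs no external library. The function name and the spend and sales figures are hypothetical, and the code assumes the data are not all constant.

```python
import math


def explained_variance_2d(xs, ys):
    """Share of total variance captured by each principal component of 2-D data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    c = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form; these are the
    # variances along the first and second principal components.
    centre = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    first, second = centre + spread, centre - spread
    total = first + second  # assumed non-zero (data are not all constant)
    return first / total, second / total


# Hypothetical data: advertising spend and sales for six periods.
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sales = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

pc1_share, pc2_share = explained_variance_2d(spend, sales)
print(round(pc1_share, 3), round(pc2_share, 3))
```

Because the two variables move together almost perfectly, nearly all the variance falls on the first component, which is why dropping the second would lose little information.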

**Prior (Bayesian)** Initial probability distribution of a parameter before observing data. Updated with data to obtain the posterior. Reflects prior belief or domain knowledge. See Ch. 8.

**Prophet** An open-source time series forecasting library from Facebook (now Meta) that handles seasonality, trend, and holidays. Robust to missing data and outliers. See Ch. 9, 31.

**p-Chart** Statistical process control chart for defect proportions. Plots the sample proportion over time together with control limits. See Ch. 41.

---

## Q

**Quantile Regression** Regression method that estimates conditional quantiles (e.g., the median or 25th percentile) rather than the mean. Robust to outliers. See Ch. 24.

---

## R

**Random Forest** Ensemble of decision trees, each trained on a bootstrap sample, with a random subset of features considered at each split. Combines predictions via averaging (regression) or voting (classification). See Ch. 7, 29.

**Recall** In classification, the proportion of actual positives correctly identified: TP / (TP + FN). Important when false negatives are costly. See Ch. 6, 23.

**Regularisation** Technique to prevent overfitting by penalising model complexity. Examples include L1 (LASSO), L2 (Ridge), and dropout (neural networks). See Ch. 6, 7.

**Regression** Supervised learning that predicts a continuous target. Includes linear, multiple, and polynomial regression. Logistic regression, despite its name, is used when the target is categorical. See Ch. 24.

**RFM Analysis** Customer segmentation based on Recency (time since last purchase), Frequency (purchase count), and Monetary value (total spent). Simple, interpretable, and actionable. See Ch. 28, 34.

**RMSE (Root Mean Squared Error)** Square root of the average squared prediction error. Penalises large errors more than MAE. See Ch. 6, 31.

**ROC Curve** Plot of True Positive Rate against False Positive Rate across all classification thresholds. AUC summarises the curve in a single number. See Ch. 6, 23.

---

## S

**SARIMA** Seasonal ARIMA, which extends ARIMA to handle seasonality. Adds seasonal autoregressive, differencing, and moving average components. See Ch. 9.

**Segmentation** Dividing customers or products into distinct groups.
Includes demographic, behavioural, and value-based segmentation. See Ch. 28, 34.

**Sensitivity** Same as Recall (True Positive Rate). In classification, the ability to correctly identify positive cases. See Ch. 6.

**SHAP (SHapley Additive exPlanations)** Post-hoc explainability method that computes each feature's contribution to a prediction. Provides local and global interpretability. See Ch. 29, 51.

**Sigmoid Function** Mathematically, σ(x) = 1 / (1 + e^(-x)). Maps any input to (0, 1); used in logistic regression and neural networks to output probabilities. See Ch. 6, 7.

**Silhouette Score** Measure of cluster cohesion and separation. Ranges from -1 to +1; higher values indicate better-defined clusters. Used to select the optimal k. See Ch. 28.

**SMOTE (Synthetic Minority Over-sampling Technique)** Technique for handling imbalanced classification by synthesising new minority class samples. Improves the model's ability to learn the rare class. See Ch. 29.

**Specificity** In classification, the proportion of actual negatives correctly identified: TN / (TN + FP). Equal to 1 minus the false positive rate. See Ch. 6.

**Spearman Correlation** Rank-based, non-parametric correlation coefficient. Measures monotonic (not necessarily linear) association. Robust to outliers. See Ch. 5.

**Statistical Power** Probability of correctly rejecting a false null hypothesis (detecting a true effect). Equals 1 - β, where β is the Type II error rate. Typical targets are 80-90%. See Ch. 5, 14.

**Stationarity** In time series, the property that statistical characteristics (mean, variance) are constant over time. ARIMA models assume the series is stationary after differencing. Commonly tested with the Augmented Dickey-Fuller (ADF) test. See Ch. 9.

**Support (Association Rules)** In market basket analysis, the proportion of transactions containing both items: P(A ∩ B). A minimum support threshold filters out rare rules. See Ch. 17.

**Survival Analysis** Statistical methods for time-to-event data (e.g., customer lifetime, time to failure). Handles censoring (incomplete observations). See Ch. 30.

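The Support definition above, together with the Lift entry under L, can be worked through on a handful of shopping baskets. This is a minimal sketch with made-up transactions; the function names `support` and `lift` are illustrative and do not refer to any particular library.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def lift(transactions, antecedent, consequent):
    """Purchase rate of `consequent` among baskets with `antecedent`, relative to overall."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    return confidence / support(transactions, consequent)


# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))  # 2 of 4 baskets -> 0.5
print(round(lift(baskets, {"bread"}, {"milk"}), 3))
```

A lift below 1 for these baskets would mean that bread buyers purchase milk slightly less often than shoppers overall, so the rule "bread implies milk" would not be worth acting on despite its reasonable support.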

**Support Vector Machine (SVM)** Classification algorithm that finds the hyperplane maximising the margin between classes. Handles non-linear boundaries via the kernel trick. See Ch. 6.

---

## T

**t-Test** Hypothesis test comparing the means of one or two groups. Variants: one-sample (against a fixed value), two-sample (independent groups), and paired. See Ch. 5.

**TF-IDF (Term Frequency-Inverse Document Frequency)** Weighting scheme for text analysis. TF measures how often a word appears in a document; IDF downweights words that are common across documents. See Ch. 10, 19.

**Time Series** Sequence of observations ordered by time. Properties include trend, seasonality, cyclical patterns, and noise. See Ch. 9.

**Transfer Learning** Using a model trained on one task to solve another task. Common in deep learning (e.g., pre-trained image models). See Ch. 7, 20.

**Type I Error** False positive: rejecting a true null hypothesis. Its probability is controlled by the significance level α. See Ch. 5.

**Type II Error** False negative: failing to reject a false null hypothesis. Its probability is denoted β; (1 - β) is the power. See Ch. 5.

---

## V

**VIF (Variance Inflation Factor)** Quantifies multicollinearity. VIF = 1 means no collinearity; a VIF above roughly 5 to 10 indicates problematic collinearity. Computed per predictor. See Ch. 18.

**Voronoi Diagram** Partitioning of space into regions, each containing the points closest to one seed point. Used in location analytics to define regions. See Ch. 11, 40.

---

## W

**Weight of Evidence (WOE)** Log of the ratio of goods to bads in a category, relative to the same ratio overall. Used in information value calculations and logistic regression. See Ch. 27.

---

## X

**XGBoost** Extreme Gradient Boosting: an optimised gradient boosting library. Widely used; competitive on Kaggle and in industry. See Ch. 7, 29.

---

## Z

**Z-Score** Standardisation: (x - mean) / standard deviation. Rescales a variable to mean 0 and standard deviation 1, which facilitates comparison across variables. See Ch. 5.

---

## Frequently Confused Terms

**Precision vs. Recall**: Precision answers "of predicted positives, how many are correct?"
Recall answers "of actual positives, how many did we find?" Both matter most on imbalanced data, where accuracy alone can mislead.

**Correlation vs. Causation**: Correlation measures association; establishing causation requires a causal mechanism, temporal ordering, and control of confounding. See Ch. 14, 52.

**Bias vs. Variance**: Bias is error from wrong assumptions (underfitting); variance is error from sensitivity to the training data (overfitting). See Ch. 6.

**Sensitivity vs. Specificity**: Sensitivity (recall) is the ability to find positives; specificity is the ability to reject negatives. The trade-off between them is controlled by the probability threshold. See Ch. 6.

**Supervised vs. Unsupervised**: Supervised learning has a labelled target (classification, regression); unsupervised learning has no target (clustering, dimensionality reduction). See Ch. 6, 28.

---

## References to Chapters

Each entry notes the primary chapter(s) where the term is introduced or most heavily used. For example, "See Ch. 6, 23" means the term is introduced in Ch. 6 and applied again in Ch. 23.

To find terms related to a specific topic, consult the Method Coverage Map in Appendix C or the comprehensive chapter index.

---

*This glossary covers concepts across all 56 chapters. For technical proofs and deeper mathematical foundations, refer to the main text and references in Appendix G.*