71 Appendix F — Mathematical Notation Reference

This appendix provides a comprehensive reference of all mathematical symbols and notation used throughout “AI-Powered Business Analytics.” Symbols are organised by category for easy lookup.


71.1 General Mathematical Notation

Symbol Name Definition First Introduced
∑ Summation Sum of values: ∑_{i=1}^{n} x_i = x_1 + x_2 + … + x_n Ch. 5
∏ Product Product of values: ∏_{i=1}^{n} x_i = x_1 × x_2 × … × x_n Ch. 5
∂ Partial derivative Rate of change of a function with respect to one variable, holding the others constant Ch. 7
∫ Integration Continuous summation; inverse of differentiation Ch. 8
∈ Element of x ∈ A means x is a member of set A Ch. 5
⊆ Subset of A ⊆ B means all elements of A are in B Ch. 5
∪ Union A ∪ B contains all elements in A or B Ch. 8
∩ Intersection A ∩ B contains elements in both A and B Ch. 8
≈ Approximately equal x ≈ y means x and y are very close but not exactly equal Ch. 5
≤, ≥ Less/Greater than or equal x ≤ y, x ≥ y Ch. 5
∝ Proportional to y ∝ x means y = kx for some constant k Ch. 24
∞ Infinity Without bound; very large limit Ch. 5
! Factorial n! = n × (n-1) × … × 1 Ch. 5
|x| Absolute value Distance from 0; e.g. |-5| = 5
≡ Identically equal True for all values in domain Ch. 5
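
As a quick check on the summation, product, factorial, absolute value, and approximate-equality entries above, the following minimal Python sketch evaluates each on a small list. The values in `x` are arbitrary illustration data, not taken from the book.

```python
import math

# Arbitrary illustration values (an assumption, not from the book)
x = [1, 2, 3, 4]

total = sum(x)                         # ∑ x_i = 1 + 2 + 3 + 4
product = math.prod(x)                 # ∏ x_i = 1 × 2 × 3 × 4
n_factorial = math.factorial(len(x))   # n! with n = 4

# |x| and ≈: absolute value and an approximate-equality test
distance = abs(-5)
close = math.isclose(0.1 + 0.2, 0.3)   # approximately equal in floating point
exact = (0.1 + 0.2 == 0.3)             # but not exactly equal
```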

71.2 Probability Notation

Symbol Name Definition First Introduced
P(A) Probability of A Likelihood of event A occurring; ranges [0, 1] Ch. 5
P(A|B) Conditional probability Probability of A given B has occurred: P(A|B) = P(A∩B) / P(B) Ch. 8
P(A^c) Complement Probability A does NOT occur: P(A^c) = 1 - P(A) Ch. 5
E[X] Expected value Mean or average value of random variable X Ch. 5
Var[X] or σ² Variance Expected squared deviation from mean: E[(X - E[X])²] Ch. 5
SD[X] or σ Standard deviation Square root of variance: SD = √Var Ch. 5
Cov[X, Y] Covariance Expected product of deviations: E[(X - E[X])(Y - E[Y])] Ch. 5
Cor[X, Y] or ρ Correlation Standardised covariance: ρ = Cov[X,Y] / (SD[X] × SD[Y]) Ch. 5
μ Population mean True average of entire population Ch. 5
σ² Population variance True variance of entire population Ch. 5
μ̂ or x̄ Sample mean Average of sample: ∑x_i / n Ch. 5
s² Sample variance Variance of sample: ∑(x_i - x̄)² / (n - 1) Ch. 5
∼ Distributed as X ∼ N(μ, σ²) means X follows a normal distribution with mean μ and variance σ² Ch. 5
⊥ Independent X ⊥ Y means X and Y are statistically independent Ch. 8
iid Independent, identically distributed Observations are independent and from same distribution Ch. 5
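
The sample-level definitions above can be sketched in a few lines of Python. The paired observations are hypothetical, and the covariance and correlation here are the sample versions, which use the same n - 1 denominator as the sample variance.

```python
import math

# Hypothetical paired observations (illustration only)
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

x_bar = sum(x) / n                     # sample mean x̄ = ∑x_i / n
y_bar = sum(y) / n

# Sample variance uses the n - 1 denominator
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Sample covariance: average product of deviations from the means
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation: covariance standardised by both standard deviations
rho = cov_xy / (math.sqrt(s2_x) * math.sqrt(s2_y))
```

Python's built-in `statistics.variance` also uses the n - 1 denominator, so it should agree with `s2_x`.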

71.3 Statistical Distributions

Distribution Notation Parameters Used In
Normal X ∼ N(μ, σ²) μ (mean), σ² (variance) Ch. 5, throughout
Standard Normal Z ∼ N(0, 1) Mean 0, variance 1 Ch. 5
t-distribution t_{df} df (degrees of freedom) Ch. 5
Chi-Squared χ²_{df} df (degrees of freedom) Ch. 5
F-distribution F_{df1,df2} df1, df2 (numerator and denominator df) Ch. 5
Binomial X ∼ Binomial(n, p) n (trials), p (success probability) Ch. 5, 14
Poisson X ∼ Poisson(λ) λ (rate parameter) Ch. 5
Exponential X ∼ Exp(λ) λ (rate parameter) Ch. 9, 30
Uniform X ∼ Uniform(a, b) a (min), b (max) Ch. 8, 26
Beta X ∼ Beta(α, β) α, β (shape parameters) Ch. 8
Gamma X ∼ Gamma(α, β) α (shape), β (rate) Ch. 8
Dirichlet X ∼ Dirichlet(α) α (concentration parameters) Ch. 10
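
One practical point when moving from this notation to code: N(μ, σ²) is written with the variance, whereas Python's standard-library `statistics.NormalDist` takes the standard deviation σ. A minimal sketch, with parameter values chosen arbitrarily for illustration:

```python
from statistics import NormalDist

# X ∼ N(μ = 10, σ² = 4): pass σ = 2, not σ² = 4
X = NormalDist(mu=10, sigma=2)
variance = X.variance          # σ² recovered from the distribution object

# Z ∼ N(0, 1): the standard normal is the default
Z = NormalDist()
p = Z.cdf(1.96)                # P(Z ≤ 1.96), roughly 0.975
```

Other libraries make their own choices (for example rate versus scale for the Exponential and Gamma), so it is worth checking the parameterisation before translating a formula.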

71.4 Linear Algebra Notation

Symbol Name Definition First Introduced
x Vector Column vector of n elements Ch. 6
X Matrix Rectangular array of m × n elements Ch. 6
X^T Transpose Flips rows and columns: (X^T)_{ij} = X_{ji} Ch. 6
X^{-1} Inverse Matrix such that X X^{-1} = X^{-1} X = I; defined only for square, non-singular X Ch. 6
|X| or det(X) Determinant Scalar property of a square matrix; non-zero if and only if the matrix is invertible Ch. 6
tr(X) Trace Sum of diagonal elements: ∑_i X_{ii} Ch. 6
||x|| or ||x||_2 Euclidean norm Length of vector: √(∑ x_i²) Ch. 18
||x||_1 L1 norm Sum of absolute values: ∑ |x_i| Ch. 18
x · y or ⟨x, y⟩ Dot product ∑ x_i × y_i; measure of vector alignment Ch. 6
λ Eigenvalue Scalar for which Xv = λv has a non-zero solution v Ch. 18
v Eigenvector Non-zero vector satisfying Xv = λv for some scalar λ Ch. 18
rank(X) Rank Dimension of column/row space; max # linearly independent columns Ch. 6
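
The norm, dot product, trace, and determinant entries above can be checked by hand on small inputs. The sketch below uses plain Python lists and assumed values; in practice a numerical library would normally be used instead.

```python
import math

# Illustrative vectors (assumed values)
x = [3.0, 4.0]
y = [1.0, 2.0]

l2_norm = math.sqrt(sum(xi ** 2 for xi in x))   # ||x||_2 = √(∑ x_i²)
l1_norm = sum(abs(xi) for xi in x)              # ||x||_1 = ∑ |x_i|
dot = sum(xi * yi for xi, yi in zip(x, y))      # x · y = ∑ x_i y_i

# A 2 × 2 matrix as nested lists: trace and determinant
X = [[2.0, 1.0],
     [1.0, 3.0]]
trace = X[0][0] + X[1][1]                       # tr(X) = ∑ X_ii
det = X[0][0] * X[1][1] - X[0][1] * X[1][0]     # non-zero, so X is invertible
```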

71.5 Regression Notation

Symbol Name Definition First Introduced
y Response / Dependent variable Target variable being predicted Ch. 6
X Design matrix Matrix of predictor variables; rows = observations, columns = features Ch. 6
β Coefficient vector Parameters to estimate: β = (β₀, β₁, …, β_p) Ch. 6
β̂ Estimated coefficient Estimate of β from data: β̂ = (X^T X)^{-1} X^T y Ch. 6
ε or e Error / Residual ε is the unobserved error term; e is the residual, the difference between observed and predicted: e_i = y_i - ŷ_i Ch. 5
ŷ Fitted / Predicted value Model prediction: ŷ = Xβ̂ Ch. 6
σ² or σ_{ε}² Error variance Variance of residuals Ch. 5
R² Coefficient of determination Proportion of variance explained: 1 - (SS_{res} / SS_{tot}) Ch. 6
R²_{adj} Adjusted R² R² penalised for number of predictors Ch. 6
SE(β̂) Standard error Standard deviation of coefficient estimate Ch. 5
t-stat t-statistic β̂ / SE(β̂); test whether β significantly differs from 0 Ch. 5
p-value p-value Probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one computed Ch. 5
AIC Akaike Information Criterion -2 ln(L) + 2k; L = likelihood, k = # parameters Ch. 6
BIC Bayesian Information Criterion -2 ln(L) + k ln(n); penalises complexity more than AIC whenever ln(n) > 2 (n ≥ 8) Ch. 6
VIF Variance Inflation Factor 1 / (1 - R²_j); measures multicollinearity for predictor j Ch. 18
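
To make the R² entries concrete, here is a minimal sketch that computes R² from observed and fitted values and then applies an adjustment. The table above does not spell out the adjusted formula; the code assumes the common definition 1 - (1 - R²)(n - 1) / (n - p - 1), and the data are hypothetical.

```python
def r_squared(y, y_hat):
    """R² = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Assumed common definition: 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed and fitted values from a one-predictor model
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.5, 4.5, 7.5, 8.5]

r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n=len(y), p=1)
```

The adjusted value is never larger than R², which reflects the penalty for the number of predictors p.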

71.6 Logistic Regression Notation

Symbol Name Definition First Introduced
p or π Probability P(Y = 1); probability of success Ch. 6
logit(p) Logit function ln(p / (1 - p)); log-odds Ch. 23
odds Odds p / (1 - p); ratio of success to failure Ch. 23
OR Odds ratio exp(β); multiplicative change in odds for 1-unit increase Ch. 23

71.7 Machine Learning Notation

Symbol Name Definition First Introduced
D Dataset Collection of observations and labels Ch. 6
D_{train} Training set Data used to fit model Ch. 6
D_{test} Test set Data used to evaluate model Ch. 6
n Sample size Number of observations Ch. 5
p Number of features Dimensionality Ch. 6
L(y, ŷ) Loss function Penalty for incorrect prediction Ch. 6
ℓ(θ) Likelihood Joint probability of data given parameters θ Ch. 8
∇ Gradient Vector of partial derivatives Ch. 7
∇L Gradient of loss Direction of steepest increase in loss Ch. 7
θ Parameters Model parameters to estimate Ch. 6
w or w_i Weight Parameter in neural network Ch. 7
b or b_i Bias Intercept term in neural network Ch. 7
η Learning rate Step size in gradient descent Ch. 7
TP True positives Correctly predicted positive cases Ch. 6
TN True negatives Correctly predicted negative cases Ch. 6
FP False positives Negative cases incorrectly predicted as positive Ch. 6
FN False negatives Positive cases incorrectly predicted as negative Ch. 6
TPR True positive rate TP / (TP + FN); Sensitivity, Recall Ch. 6
FPR False positive rate FP / (FP + TN); 1 - Specificity Ch. 6
AUC Area under curve Area under ROC curve; ranges over [0, 1], where 0.5 corresponds to random guessing and 1.0 to perfect ranking Ch. 6
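
The four confusion-matrix counts and the two rates built from them can be sketched as follows. The labels are hypothetical, with 1 standing for the positive class and 0 for the negative class.

```python
# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # positives predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # negatives predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # negatives predicted positive
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # positives predicted negative

tpr = tp / (tp + fn)   # sensitivity / recall
fpr = fp / (fp + tn)   # 1 - specificity
```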

71.8 Time Series Notation

Symbol Name Definition First Introduced
y_t Time series observation Value at time t Ch. 9
ŷ_{t+h} Forecast Predicted value h steps ahead Ch. 9
ŷ_{t+h|t} Conditional forecast Forecast of y_{t+h} given information up to time t Ch. 31
φ or φ_j AR coefficient Autoregressive coefficient in ARIMA(p, d, q) Ch. 9
θ or θ_j MA coefficient Moving average coefficient in ARIMA(p, d, q) Ch. 9
ε_t or a_t Shock / Innovation Random error term at time t Ch. 9
B or L Backshift operator By_t = y_{t-1}; facilitates ARIMA notation Ch. 9
Δ Differencing operator Δy_t = y_t - y_{t-1} = (1 - B)y_t; removes trend Ch. 9
S or m Seasonality period Number of periods per seasonal cycle (e.g., 12 for monthly data) Ch. 9
ACF Autocorrelation function Correlation between y_t and y_{t-k} Ch. 9
PACF Partial autocorrelation function Correlation between y_t and y_{t-k} controlling for lags between Ch. 9
MSE Mean squared error (1/n) ∑ (y_t - ŷ_t)² Ch. 6, 31
MAE Mean absolute error (1/n) ∑ |y_t - ŷ_t| Ch. 6, 31
RMSE Root mean squared error √MSE Ch. 6, 31
MAPE Mean absolute percentage error (1/n) ∑ |(y_t - ŷ_t) / y_t|, often multiplied by 100 to give a percentage; undefined when any y_t = 0 Ch. 31
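
The four forecast-error measures above differ only in how each error y_t - ŷ_t is transformed before averaging. A minimal sketch on hypothetical actuals and forecasts:

```python
import math

# Hypothetical actual values and forecasts
y = [100.0, 200.0, 400.0]
y_hat = [110.0, 190.0, 400.0]
n = len(y)

mse = sum((a - f) ** 2 for a, f in zip(y, y_hat)) / n    # mean squared error
mae = sum(abs(a - f) for a, f in zip(y, y_hat)) / n      # mean absolute error
rmse = math.sqrt(mse)                                    # same units as y

# MAPE as a fraction: absolute error relative to the actual value.
# This assumes no actual value is zero.
mape = sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / n
```

Here `mape` is a fraction; multiplying by 100 expresses it as a percentage.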

71.9 Survival Analysis Notation

Symbol Name Definition First Introduced
T Time to event Duration until event of interest (e.g., churn) Ch. 30
C Censoring time Time until observation ends (if event not observed) Ch. 30
S(t) Survival function P(T > t); probability of surviving past time t Ch. 30
h(t) Hazard function Instantaneous risk of event at time t Ch. 30
H(t) Cumulative hazard ∫_0^t h(u) du; cumulative risk Ch. 30
δ_i Event indicator 1 if event occurred, 0 if censored Ch. 30
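
The hazard, cumulative hazard, and survival function are linked by the standard identity S(t) = exp(-H(t)), which the table does not state explicitly. The sketch below approximates H(t) numerically and applies that identity; the constant hazard of 0.1 per period is an assumption chosen only to make the result easy to check.

```python
import math


def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-H(t)), with H(t) = ∫_0^t h(u) du by the trapezoid rule."""
    du = t / steps
    cumulative_hazard = sum(
        (hazard(i * du) + hazard((i + 1) * du)) / 2 * du for i in range(steps)
    )
    return math.exp(-cumulative_hazard)


lam = 0.1  # assumed constant hazard per period (illustration only)

s_0 = survival_from_hazard(lambda u: lam, t=0.0)    # nobody has churned at t = 0
s_12 = survival_from_hazard(lambda u: lam, t=12.0)  # close to exp(-0.1 × 12)
```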

71.10 Bayesian Notation

Symbol Name Definition First Introduced
p(θ) Prior Probability distribution of parameter θ before observing data Ch. 8
L(D|θ) Likelihood Probability of data D given parameter θ Ch. 8
p(θ|D) Posterior Probability of θ given data D: ∝ L(D|θ) × p(θ) Ch. 8
p(D) Marginal likelihood / Evidence ∫ L(D|θ) p(θ) dθ; normalising constant Ch. 8
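
The relationship between prior, likelihood, evidence, and posterior can be illustrated on a small grid of candidate parameter values, where the integral for p(D) becomes a sum. Everything in this sketch is an assumption for illustration: a binomial likelihood, three candidate values of θ, a uniform prior, and data of 3 successes in 4 trials.

```python
import math

# Candidate parameter values and a uniform prior p(θ) over them (assumed)
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical data D: k successes in n trials
n, k = 4, 3


def likelihood(theta):
    """Binomial L(D|θ) for k successes in n trials."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


# Posterior ∝ likelihood × prior
unnormalised = [likelihood(t) * p for t, p in zip(thetas, prior)]

# Evidence p(D): the normalising constant (a sum on this discrete grid)
evidence = sum(unnormalised)

posterior = [u / evidence for u in unnormalised]
```

With these inputs the posterior puts most of its mass on θ = 0.8, the candidate closest to the observed success rate of 0.75.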

71.11 Specific Model Notation

Model Notation Definition First Introduced
ARIMA ARIMA(p, d, q) Autoregressive (p), Integrated (d), Moving Average (q) Ch. 9
SARIMA SARIMA(p,d,q)(P,D,Q)_m Seasonal ARIMA with seasonal parameters P, D, Q and period m Ch. 9
SHAP φ_i Shapley value for feature i; contribution to prediction Ch. 29
XGBoost f_t(x) Prediction of tree t; ensemble prediction = ∑ f_t(x) Ch. 7
SVM w · φ(x) + b Support vector machine decision function Ch. 6
PCA PC_k k-th principal component Ch. 18
k-means μ_k Centroid of cluster k Ch. 28
Clustering d(x_i, x_j) Distance between observations i and j Ch. 28

71.12 Common Abbreviations in Formulas

Abbreviation Meaning
s.t. Subject to
i.e. That is
w.r.t. With respect to
o.w. Otherwise
lhs / rhs Left-hand side / Right-hand side
LHS / RHS Left-hand side / Right-hand side (capitalized)
a.s. Almost surely
w.p. 1 With probability 1
∃ There exists
∀ For all
QED Proof complete

71.13 Subscript and Superscript Conventions

Symbol Meaning
x_i i-th element of vector or observation index
x_{ij} (i,j)-th element of matrix
x^{(k)} k-th sample or iteration
x^* Optimal value
x̂ Estimate of x
x̄ Mean of x
x̃ Transformed or adjusted x
x′ Derivative of x with respect to time (prime notation)
ẋ Derivative of x with respect to time (Newton notation)

71.14 Examples of Common Formula Notations

71.14.1 Linear Regression

y_i = β_0 + β_1 x_i + ε_i where i = 1, …, n; ε_i ∼ N(0, σ²)
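
For a single predictor with an intercept, the least-squares estimate β̂ = (X^T X)^{-1} X^T y reduces to two scalar formulas: the slope is the ratio of the summed cross-deviations to the summed squared deviations of x, and the intercept follows from the means. A minimal sketch on hypothetical data:

```python
# Hypothetical data, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: ∑(x_i - x̄)(y_i - ȳ) / ∑(x_i - x̄)²
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx

# Intercept: ȳ - β̂_1 x̄
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values ŷ_i and residuals e_i = y_i - ŷ_i
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
```

With an intercept in the model, the residuals sum to zero up to floating-point error.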

71.14.2 Logistic Regression

logit(p_i) = β_0 + β_1 x_i ⇒ p_i = 1 / (1 + exp(-(β_0 + β_1 x_i)))
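
The logit and its inverse, together with the odds-ratio reading of β_1, can be checked numerically. The coefficient values below are hypothetical.

```python
import math


def logit(p):
    """Log-odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Inverse of the logit: 1 / (1 + exp(-z))."""
    return 1 / (1 + math.exp(-z))


def odds(p):
    return p / (1 - p)


beta0, beta1 = -1.0, 0.5   # hypothetical coefficients

p_at_2 = inv_logit(beta0 + beta1 * 2)   # predicted probability at x = 2
p_at_3 = inv_logit(beta0 + beta1 * 3)   # predicted probability at x = 3

# A 1-unit increase in x multiplies the odds by exp(β_1)
odds_ratio = odds(p_at_3) / odds(p_at_2)
```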

71.14.3 ARIMA(p, d, q)

Δ^d y_t = φ_1 Δ^d y_{t-1} + … + φ_p Δ^d y_{t-p} + ε_t + θ_1 ε_{t-1} + … + θ_q ε_{t-q}
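
The Δ^d terms in the equation above mean that the differencing operator is applied d times before the AR and MA parts are fitted. A minimal sketch of that step alone, using a hypothetical series with a linear trend (`difference` is an illustrative helper, not a library function):

```python
def difference(series, d=1):
    """Apply Δy_t = y_t - y_{t-1} to the series d times."""
    for _ in range(d):
        series = [current - previous for previous, current in zip(series, series[1:])]
    return series


# Hypothetical series with a linear trend
trend = [3, 5, 7, 9, 11]

first = difference(trend)         # d = 1 removes the linear trend
second = difference(trend, d=2)   # d = 2 differences the result again
```

Each round of differencing shortens the series by one observation.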

71.14.4 Survival Analysis (Cox Model)

h(t|X) = h_0(t) exp(β_1 X_1 + … + β_p X_p)
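
Because the baseline hazard h_0(t) multiplies every individual's hazard, it cancels when two individuals are compared, leaving a ratio that depends only on the coefficients and covariates. The sketch below illustrates this with hypothetical coefficients and two hypothetical customers who differ by one unit in the first covariate.

```python
import math


def relative_hazard(beta, x):
    """exp(β_1 X_1 + … + β_p X_p), the multiplier applied to h_0(t)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


beta = [0.7, -0.2]          # hypothetical coefficients

customer_a = [1.0, 3.0]     # first covariate is one unit higher than for customer_b
customer_b = [0.0, 3.0]

# h(t|a) / h(t|b): h_0(t) cancels, leaving exp(β_1) for a 1-unit difference in X_1
hazard_ratio = relative_hazard(beta, customer_a) / relative_hazard(beta, customer_b)
```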

71.14.5 K-Means Objective

min ∑_{k=1}^{K} ∑_{i:C(i)=k} ||x_i - μ_k||²
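
The objective can be evaluated directly for any fixed assignment C(i): compute each centroid μ_k as the mean of its members, then sum the squared distances. The points and assignment below are hypothetical, and the sketch only evaluates the objective; it does not run the k-means iterations.

```python
# Hypothetical 2-D points and a fixed cluster assignment C(i)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
assignment = [0, 0, 1, 1]
K = 2


def centroid(members):
    """μ_k: coordinate-wise mean of the points in one cluster."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def squared_distance(a, b):
    """||a - b||² for two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


centroids = [
    centroid([p for p, c in zip(points, assignment) if c == k]) for k in range(K)
]

# ∑_k ∑_{i:C(i)=k} ||x_i - μ_k||²
objective = sum(squared_distance(p, centroids[c]) for p, c in zip(points, assignment))
```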


71.15 Symbol Quick Reference by Unicode

Unicode LaTeX Symbol Name
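U+03A3 \Sigma Σ Capital sigma; used for summation (the dedicated n-ary operator ∑ is U+2211, \sum)
U+03A0 \Pi Π Capital pi; used for product (the dedicated n-ary operator ∏ is U+220F, \prod)
U+03BB \lambda λ Lambda (eigenvalue, rate)
U+03BC \mu μ Mu (mean)
U+03C3 \sigma σ Sigma (std. dev)
U+03B8 \theta θ Theta (parameter)
U+03B2 \beta β Beta (coefficient)
U+03C1 \rho ρ Rho (correlation)
U+03B1 \alpha α Alpha (significance level)
U+03B4 \delta δ Delta (change, indicator)
U+2202 \partial ∂ Partial derivative
U+222B \int ∫ Integral
U+2208 \in ∈ Element of
U+2217 \ast ∗ Convolution, multiplication
U+2260 \neq ≠ Not equal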

All notation in this appendix is used consistently throughout the book. When in doubt about a symbol, return to this reference or check the specific chapter where the concept is introduced.