---
title: "Appendix F — Mathematical Notation Reference"
---
# Appendix F — Mathematical Notation Reference
This appendix provides a comprehensive reference of all mathematical symbols and notation used throughout "AI-Powered Business Analytics." Symbols are organised by category for easy lookup.
---
## General Mathematical Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| ∑ | Summation | Sum of values: ∑_{i=1}^{n} x_i = x_1 + x_2 + ... + x_n | Ch. 5 |
| ∏ | Product | Product of values: ∏_{i=1}^{n} x_i = x_1 × x_2 × ... × x_n | Ch. 5 |
| ∂ | Partial derivative | Rate of change of function with respect to one variable, holding others constant | Ch. 7 |
| ∫ | Integration | Continuous summation; inverse of differentiation | Ch. 8 |
| ∈ | Element of | x ∈ A means x is a member of set A | Ch. 5 |
| ⊆ | Subset of | A ⊆ B means all elements of A are in B | Ch. 5 |
| ∪ | Union | A ∪ B contains all elements in A or B | Ch. 8 |
| ∩ | Intersection | A ∩ B contains elements in both A and B | Ch. 8 |
| ≈ | Approximately equal | x ≈ y means x and y are close in value, though not necessarily equal | Ch. 5 |
| ≤, ≥ | Less/Greater than or equal | x ≤ y, x ≥ y | Ch. 5 |
| ∝ | Proportional to | y ∝ x means y = kx for some constant k | Ch. 24 |
| ∞ | Infinity | Unbounded quantity; used in limits such as n → ∞ | Ch. 5 |
| ! | Factorial | n! = n × (n-1) × ... × 1 | Ch. 5 |
| \|x\| | Absolute value | Distance from 0; \|-5\| = 5 | Ch. 5 |
| ≡ | Identically equal | True for all values in domain | Ch. 5 |
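
Several of these symbols have direct counterparts in Python's standard library. The minimal sketch below uses an illustrative three-element list. In it, `math.isclose` (default relative tolerance 1e-9) is one practical way to test ≈ in code; it is not a definition of the symbol.

```python
import math

x = [2, 3, 4]                 # illustrative values x_1, x_2, x_3

total = sum(x)                # ∑ x_i = 2 + 3 + 4 = 9
product = math.prod(x)        # ∏ x_i = 2 × 3 × 4 = 24

# n! is the product of the integers 1..n
n = 5
assert math.factorial(n) == math.prod(range(1, n + 1)) == 120

print(abs(-5))                       # absolute value: prints 5
print(0.1 + 0.2 == 0.3)              # False: floating-point results are rarely exactly equal
print(math.isclose(0.1 + 0.2, 0.3))  # True: approximately equal within a tolerance
```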
---
## Probability Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| P(A) | Probability of A | Chance that event A occurs; always lies in [0, 1] | Ch. 5 |
| P(A\|B) | Conditional probability | Probability of A given that B has occurred: P(A\|B) = P(A∩B) / P(B), defined when P(B) > 0 | Ch. 8 |
| P(A^c) | Complement | Probability A does NOT occur: P(A^c) = 1 - P(A) | Ch. 5 |
| E[X] | Expected value | Mean or average value of random variable X | Ch. 5 |
| Var[X] or σ² | Variance | Expected squared deviation from mean: E[(X - E[X])²] | Ch. 5 |
| SD[X] or σ | Standard deviation | Square root of variance: SD = √Var | Ch. 5 |
| Cov[X, Y] | Covariance | Expected product of deviations: E[(X - E[X])(Y - E[Y])] | Ch. 5 |
| Cor[X, Y] or ρ | Correlation | Standardised covariance: ρ = Cov[X,Y] / (SD[X] × SD[Y]) | Ch. 5 |
| μ | Population mean | True average of entire population | Ch. 5 |
| σ² | Population variance | True variance of entire population | Ch. 5 |
| μ̂ or x̄ | Sample mean | Average of sample: ∑x_i / n | Ch. 5 |
| s² | Sample variance | Variance of sample: ∑(x_i - x̄)² / (n - 1) | Ch. 5 |
| ∼ | Distributed as | X ∼ N(μ, σ²) means X follows normal distribution | Ch. 5 |
| ⊥ | Independent | X ⊥ Y means X and Y are statistically independent | Ch. 8 |
| iid | Independent, identically distributed | Observations are independent and from same distribution | Ch. 5 |
---
## Statistical Distributions
| Distribution | Notation | Parameters | Used In |
|--------------|----------|-----------|---------|
| Normal | X ∼ N(μ, σ²) | μ (mean), σ² (variance) | Ch. 5, throughout |
| Standard Normal | Z ∼ N(0, 1) | Mean 0, variance 1 | Ch. 5 |
| t-distribution | t_{df} | df (degrees of freedom) | Ch. 5 |
| Chi-Squared | χ²_{df} | df (degrees of freedom) | Ch. 5 |
| F-distribution | F_{df1,df2} | df1, df2 (numerator and denominator df) | Ch. 5 |
| Binomial | X ∼ Binomial(n, p) | n (trials), p (success probability) | Ch. 5, 14 |
| Poisson | X ∼ Poisson(λ) | λ (rate parameter) | Ch. 5 |
| Exponential | X ∼ Exp(λ) | λ (rate parameter) | Ch. 9, 30 |
| Uniform | X ∼ Uniform(a, b) | a (min), b (max) | Ch. 8, 26 |
| Beta | X ∼ Beta(α, β) | α, β (shape parameters) | Ch. 8 |
| Gamma | X ∼ Gamma(α, β) | α (shape), β (rate) | Ch. 8 |
| Dirichlet | X ∼ Dirichlet(α) | α (concentration parameters) | Ch. 10 |
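
Parameterisation conventions differ between texts and libraries, so check them before simulating. The sketch below uses Python's `random` module; the sample size and parameter values are arbitrary. Three conventions matter here:

- `random.gauss` takes σ, not σ².
- `random.expovariate` takes the rate λ.
- `random.gammavariate` takes a scale, so the table's rate β must be passed as 1/β.

```python
import math
import random
import statistics

rng = random.Random(42)   # seeded for reproducibility
N = 100_000

# Normal N(μ, σ²): random.gauss expects the standard deviation σ, not the variance
mu, var = 10.0, 4.0
normal = [rng.gauss(mu, math.sqrt(var)) for _ in range(N)]

# Exponential Exp(λ): random.expovariate takes the rate λ; the mean is 1/λ
lam = 0.5
expo = [rng.expovariate(lam) for _ in range(N)]

# Gamma(α, β) with β a RATE, as in the table: random.gammavariate's second
# argument is a SCALE, so pass 1/β. The mean is α/β.
alpha, beta = 2.0, 4.0
gamma = [rng.gammavariate(alpha, 1 / beta) for _ in range(N)]

# Beta(α, β): both are shape parameters; the mean is α / (α + β)
beta_draws = [rng.betavariate(2.0, 6.0) for _ in range(N)]

# Sample means should be close to the theoretical values:
# about 10, 1/λ = 2, α/β = 0.5, and 2/8 = 0.25
means = [statistics.fmean(s) for s in (normal, expo, gamma, beta_draws)]
print([round(m, 3) for m in means])
```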
---
## Linear Algebra Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| **x** or x | Vector | Column vector of n elements | Ch. 6 |
| **X** or X | Matrix | Rectangular array of m × n elements | Ch. 6 |
| **X**^T | Transpose | Flips rows and columns: (X^T)_{ij} = X_{ji} | Ch. 6 |
| **X**^{-1} | Inverse | For a square, non-singular X, the matrix such that X × X^{-1} = X^{-1} × X = I | Ch. 6 |
| \|X\| or det(X) | Determinant | Scalar property of a square matrix; non-zero if and only if X is invertible | Ch. 6 |
| tr(X) | Trace | Sum of diagonal elements: ∑_{i} X_{ii} | Ch. 6 |
| \|\|x\|\| or \|\|x\|\|_2 | Euclidean norm | Length of vector: √(∑ x_i²) | Ch. 18 |
| \|\|x\|\|_1 | L1 norm | Sum of absolute values: ∑ \|x_i\| | Ch. 18 |
| **x** · **y** or ⟨x, y⟩ | Dot product | ∑ x_i × y_i; measure of vector alignment | Ch. 6 |
| λ | Eigenvalue | Scalar for which X**v** = λ**v** has a non-zero solution **v** (X square) | Ch. 18 |
| **v** | Eigenvector | Non-zero vector satisfying X**v** = λ**v** for an eigenvalue λ | Ch. 18 |
| rank(X) | Rank | Dimension of column/row space; max # linearly independent columns | Ch. 6 |
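
Here is a plain-Python sketch of several of these definitions, using a small symmetric 2 × 2 matrix chosen for illustration. In practice a library such as NumPy would normally be used, but nothing here assumes it.

```python
import math

# Matrices are lists of rows; vectors are plain lists (no NumPy assumed)
X = [[2.0, 1.0],
     [1.0, 2.0]]
x = [3.0, -4.0]
y = [1.0, 2.0]


def transpose(M):
    # (M^T)_{ij} = M_{ji}
    return [list(col) for col in zip(*M)]


def trace(M):
    # tr(M) = sum of the diagonal elements M_{ii}
    return sum(M[i][i] for i in range(len(M)))


def det2(M):
    # Determinant of a 2 × 2 matrix: M_11 M_22 - M_12 M_21
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matvec(M, v):
    # Matrix-vector product Mv
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


l2 = math.sqrt(sum(xi ** 2 for xi in x))   # ||x||_2 = sqrt(9 + 16) = 5.0
l1 = sum(abs(xi) for xi in x)              # ||x||_1 = 3 + 4 = 7.0
dot = sum(a * b for a, b in zip(x, y))     # x · y = 3 - 8 = -5.0

print(transpose([[1, 2, 3], [4, 5, 6]]))   # [[1, 4], [2, 5], [3, 6]]

# v = (1, 1) is an eigenvector of X with eigenvalue λ = 3, since Xv = (3, 3) = 3v
v = [1.0, 1.0]
assert matvec(X, v) == [3.0 * vi for vi in v]

# X's eigenvalues are 3 and 1: they sum to tr(X) and multiply to det(X)
assert trace(X) == 3.0 + 1.0
assert det2(X) == 3.0 * 1.0
```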
---
## Regression Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| **y** | Response / Dependent variable | Target variable being predicted | Ch. 6 |
| **X** | Design matrix | Matrix of predictor variables; rows = observations, columns = features | Ch. 6 |
| **β** | Coefficient vector | Parameters to estimate: β = (β₀, β₁, ..., β_p) | Ch. 6 |
| β̂ | Estimated coefficient | Estimate of β from data: β̂ = (X^T X)^{-1} X^T **y** | Ch. 6 |
| ε or e | Residual / Error | ε_i is the unobserved error y_i - **x**_i^T **β**; e_i = y_i - ŷ_i is the observed residual that estimates it | Ch. 5 |
| ŷ | Fitted / Predicted value | Model prediction: ŷ = X**β̂** | Ch. 6 |
| σ² or σ_{ε}² | Error variance | Variance of the error term ε; estimated from the residuals as SS_{res} / (n - p - 1) | Ch. 5 |
| R² | Coefficient of determination | Proportion of variance explained: 1 - (SS_{res} / SS_{tot}) | Ch. 6 |
| R²_{adj} | Adjusted R² | R² penalised for the number of predictors p: 1 - (1 - R²)(n - 1) / (n - p - 1) | Ch. 6 |
| SE(β̂) | Standard error | Standard deviation of coefficient estimate | Ch. 5 |
| t-stat | t-statistic | β̂ / SE(β̂); test whether β significantly differs from 0 | Ch. 5 |
| p-value | p-value | Probability, assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed | Ch. 5 |
| AIC | Akaike Information Criterion | -2 ln(L) + 2k; L = maximised likelihood, k = # parameters | Ch. 6 |
| BIC | Bayesian Information Criterion | -2 ln(L) + k ln(n); penalises complexity more than AIC whenever ln(n) > 2, i.e. n ≥ 8 | Ch. 6 |
| VIF | Variance Inflation Factor | 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors; measures multicollinearity | Ch. 18 |
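
For a single predictor, β̂ = (X^T X)^{-1} X^T **y** reduces to closed-form slope and intercept estimates, so the table's quantities are easy to trace by hand. This is a minimal sketch on made-up data. It assumes one predictor (p = 1) plus an intercept, so the residual degrees of freedom are n - p - 1.

```python
import math

# Made-up data: one predictor, n = 5 observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 1   # p = number of predictors (intercept not counted)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# With one predictor, (X^T X)^{-1} X^T y reduces to these closed forms
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]             # fitted values ŷ
e = [yi - yh for yi, yh in zip(y, y_hat)]      # residuals e_i = y_i - ŷ_i

ss_res = sum(ei ** 2 for ei in e)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Slope standard error and t-statistic; s2 estimates σ² with n - p - 1 df
s2 = ss_res / (n - p - 1)
se_b1 = math.sqrt(s2 / s_xx)
t_b1 = b1 / se_b1

print(f"b0={b0:.2f}, b1={b1:.2f}, R²={r2:.4f}, adj R²={r2_adj:.4f}, t={t_b1:.1f}")
```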
---
## Logistic Regression Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| p or π | Probability | P(Y = 1); probability of success | Ch. 6 |
| logit(p) | Logit function | ln(p / (1 - p)); log-odds | Ch. 23 |
| odds | Odds | p / (1 - p); probability of success divided by probability of failure | Ch. 23 |
| OR | Odds ratio | exp(β); multiplicative change in the odds for a 1-unit increase in the predictor | Ch. 23 |
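
The link between the logit, the odds, and the odds ratio can be checked numerically. In this sketch the coefficients β₀ = -1.0 and β₁ = 0.7 are hypothetical. The point is that the odds ratio for a one-unit increase equals exp(β₁), whatever the starting value of x.

```python
import math


def logit(p):
    # Log-odds ln(p / (1 - p)), defined for 0 < p < 1
    return math.log(p / (1 - p))


def inv_logit(z):
    # Inverse of the logit: p = 1 / (1 + exp(-z))
    return 1 / (1 + math.exp(-z))


b0, b1 = -1.0, 0.7   # hypothetical coefficients in logit(p) = b0 + b1 * x


def odds(x):
    # Odds p / (1 - p) at predictor value x
    p = inv_logit(b0 + b1 * x)
    return p / (1 - p)


# The odds ratio for a one-unit increase is exp(b1) at every starting x
for x in (0.0, 2.5, 10.0):
    assert math.isclose(odds(x + 1) / odds(x), math.exp(b1))

print(round(logit(0.8), 4))  # ln(0.8 / 0.2) = ln 4, prints 1.3863
```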
---
## Machine Learning Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| D | Dataset | Collection of observations and labels | Ch. 6 |
| D_{train} | Training set | Data used to fit model | Ch. 6 |
| D_{test} | Test set | Data used to evaluate model | Ch. 6 |
| n | Sample size | Number of observations | Ch. 5 |
| p | Number of features | Dimensionality | Ch. 6 |
| L(y, ŷ) | Loss function | Penalty for incorrect prediction | Ch. 6 |
| ℓ(θ) | Likelihood | Joint probability (or density) of the observed data, viewed as a function of the parameters θ | Ch. 8 |
| ∇ | Gradient | Vector of partial derivatives | Ch. 7 |
| ∇L | Gradient of loss | Direction of steepest increase in loss | Ch. 7 |
| θ | Parameters | Model parameters to estimate | Ch. 6 |
| w or w_i | Weight | Parameter in neural network | Ch. 7 |
| b or b_i | Bias | Intercept term in neural network | Ch. 7 |
| η | Learning rate | Step size in gradient descent | Ch. 7 |
| TP | True positives | Correctly predicted positive cases | Ch. 6 |
| TN | True negatives | Correctly predicted negative cases | Ch. 6 |
| FP | False positives | Incorrectly predicted positive cases | Ch. 6 |
| FN | False negatives | Incorrectly predicted negative cases | Ch. 6 |
| TPR | True positive rate | TP / (TP + FN); Sensitivity, Recall | Ch. 6 |
| FPR | False positive rate | FP / (FP + TN); 1 - Specificity | Ch. 6 |
| AUC | Area under curve | Area under the ROC curve; ranges over [0, 1], where 0.5 corresponds to random ranking and 1.0 to perfect separation | Ch. 6 |
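
The classification metrics can be computed directly from labels and scores. This sketch uses made-up data and an assumed decision threshold of 0.5. It computes AUC through the equivalent Mann-Whitney form: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. Reversing the scores shows how AUC can fall below 0.5.

```python
# Made-up labels (1 = positive) and model scores
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.1, 0.7]


def confusion(y_true, y_score, threshold=0.5):
    # Count TP, FP, TN, FN after labelling scores >= threshold as positive
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(y_true, y_score):
    # Probability that a random positive outranks a random negative (ties = 1/2)
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


tp, fp, tn, fn = confusion(y_true, y_score)
tpr = tp / (tp + fn)   # sensitivity / recall
fpr = fp / (fp + tn)   # 1 - specificity

print(tpr, fpr, auc(y_true, y_score))
print(auc(y_true, [-s for s in y_score]))  # reversed ranking gives 1 - AUC
```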
---
## Time Series Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| y_t | Time series observation | Value at time t | Ch. 9 |
| y_{t+h} | Future value | Value h steps ahead of time t; the quantity an h-step-ahead forecast predicts | Ch. 9 |
| ŷ_{t+h\|t} | Conditional forecast | Forecast of y_{t+h} given information up to time t | Ch. 31 |
| φ or φ_j | AR coefficient | Autoregressive coefficient in ARIMA(p, d, q) | Ch. 9 |
| θ or θ_j | MA coefficient | Moving average coefficient in ARIMA(p, d, q) | Ch. 9 |
| ε_t or a_t | Shock / Innovation | Random error term at time t | Ch. 9 |
| B or L | Backshift operator | Bx_t = x_{t-1}; facilitates ARIMA notation | Ch. 9 |
| Δ | Differencing operator | Δy_t = y_t - y_{t-1} = (1 - B)y_t; removes a linear trend, and Δ^d applies it d times | Ch. 9 |
| S or m | Seasonality period | Number of periods per seasonal cycle (e.g., 12 for monthly data) | Ch. 9 |
| ACF | Autocorrelation function | Correlation between y_t and y_{t-k} | Ch. 9 |
| PACF | Partial autocorrelation function | Correlation between y_t and y_{t-k} after removing the effect of the intermediate lags y_{t-1}, ..., y_{t-k+1} | Ch. 9 |
| MSE | Mean squared error | (1/n) ∑ (y_t - ŷ_t)² | Ch. 6, 31 |
| MAE | Mean absolute error | (1/n) ∑ \|y_t - ŷ_t\| | Ch. 6, 31 |
| RMSE | Root mean squared error | √MSE | Ch. 6, 31 |
| MAPE | Mean absolute percentage error | (100% / n) ∑ \|(y_t - ŷ_t) / y_t\|; undefined if any y_t = 0 | Ch. 31 |
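
The sketch below applies the error measures above to four made-up actual/forecast pairs, with MAPE expressed as a percentage. Its final step applies the differencing operator Δ to a series with a linear trend, which leaves a constant.

```python
import math

# Four made-up actual/forecast pairs
y = [100.0, 110.0, 120.0, 130.0]      # actuals y_t
y_hat = [98.0, 113.0, 120.0, 126.0]   # forecasts ŷ_t
n = len(y)

errors = [a - f for a, f in zip(y, y_hat)]    # y_t - ŷ_t = 2, -3, 0, 4

mse = sum(e ** 2 for e in errors) / n         # (4 + 9 + 0 + 16) / 4 = 7.25
mae = sum(abs(e) for e in errors) / n         # (2 + 3 + 0 + 4) / 4 = 2.25
rmse = math.sqrt(mse)
# MAPE in percent; raises ZeroDivisionError if any actual y_t is 0
mape = 100 * sum(abs(e / a) for e, a in zip(errors, y)) / n

# Differencing: a series rising by 10 per period becomes constant
diff = [y[t] - y[t - 1] for t in range(1, n)]  # [10.0, 10.0, 10.0]
```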
---
## Survival Analysis Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| T | Time to event | Duration until event of interest (e.g., churn) | Ch. 30 |
| C | Censoring time | Time until observation ends (if event not observed) | Ch. 30 |
| S(t) | Survival function | P(T > t); probability of surviving past time t | Ch. 30 |
| h(t) | Hazard function | Instantaneous rate of the event at time t, given survival up to t | Ch. 30 |
| H(t) | Cumulative hazard | ∫_0^t h(u) du; cumulative risk | Ch. 30 |
| δ_i | Event indicator | 1 if event occurred, 0 if censored | Ch. 30 |
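
The survival quantities are linked by S(t) = exp(-H(t)), a standard identity for continuous time-to-event distributions. The sketch below assumes a hypothetical constant hazard λ = 0.1 per month, which makes T exponential, Exp(λ). Integrating h numerically (trapezoidal rule) and exponentiating recovers the closed form S(t) = exp(-λt).

```python
import math

lam = 0.1   # hypothetical constant hazard: 0.1 events per month, so T ~ Exp(0.1)


def hazard(t):
    # h(t): constant in this example
    return lam


def cumulative_hazard(t, steps=1000):
    # H(t) = integral of h(u) from 0 to t, approximated with the trapezoidal rule
    dt = t / steps
    return sum((hazard(i * dt) + hazard((i + 1) * dt)) / 2 * dt for i in range(steps))


def survival(t):
    # S(t) = P(T > t) = exp(-H(t))
    return math.exp(-cumulative_hazard(t))


# With a constant hazard, S(t) matches the exponential survival function exp(-λt)
for t in (1.0, 6.0, 12.0):
    assert math.isclose(survival(t), math.exp(-lam * t))
```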
---
## Bayesian Notation
| Symbol | Name | Definition | First Introduced |
|--------|------|-----------|------------------|
| p(θ) | Prior | Probability distribution of parameter θ before observing data | Ch. 8 |
| L(D\|θ) | Likelihood | Probability of data D given parameter θ | Ch. 8 |
| p(θ\|D) | Posterior | Probability of θ given data D: ∝ L(D\|θ) × p(θ) | Ch. 8 |
| p(D) | Marginal likelihood / Evidence | ∫ L(D\|θ) p(θ) dθ; normalising constant | Ch. 8 |
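
Bayes' rule can be made concrete with a grid approximation. In this hypothetical sketch, a conversion rate θ has a Beta(2, 2) prior, and the data are k = 7 successes in n = 20 Binomial trials. The grid of 1,000 midpoints is an arbitrary choice. The Beta prior is conjugate to the Binomial likelihood, so the exact posterior Beta(a + k, b + n - k) is available to check the numerical p(D) and posterior mean.

```python
import math

# Hypothetical example: Beta(2, 2) prior on θ; k = 7 successes in n = 20 trials
a, b = 2.0, 2.0
k, n = 7, 20

M = 1000
d_theta = 1 / M
grid = [(i + 0.5) * d_theta for i in range(M)]   # midpoints of M cells in (0, 1)


def beta_fn(p, q):
    # Beta function B(p, q) = Γ(p)Γ(q) / Γ(p + q), computed via log-gamma
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def prior(theta):
    # p(θ): Beta(a, b) density
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / beta_fn(a, b)


def likelihood(theta):
    # L(D | θ): Binomial probability of k successes in n trials
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


unnormalised = [likelihood(t) * prior(t) for t in grid]   # L(D|θ) × p(θ)
evidence = sum(unnormalised) * d_theta                    # p(D), midpoint-rule integral
posterior = [u / evidence for u in unnormalised]          # p(θ|D) on the grid

post_mean = sum(t * p for t, p in zip(grid, posterior)) * d_theta

# Conjugacy check: exact posterior is Beta(a + k, b + n - k)
exact_mean = (a + k) / (a + b + n)                        # 9 / 24 = 0.375
exact_evidence = math.comb(n, k) * beta_fn(a + k, b + n - k) / beta_fn(a, b)
```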
---
## Specific Model Notation
| Model | Notation | Definition | First Introduced |
|-------|----------|-----------|------------------|
| ARIMA | ARIMA(p, d, q) | Autoregressive (p), Integrated (d), Moving Average (q) | Ch. 9 |
| SARIMA | SARIMA(p,d,q)(P,D,Q)_m | Seasonal ARIMA with seasonal parameters P, D, Q and period m | Ch. 9 |
| SHAP | φ_i | Shapley value for feature i; contribution to prediction | Ch. 29 |
| XGBoost | f_t(x) | Prediction of tree t; ensemble prediction = ∑ f_t(x) | Ch. 7 |
| SVM | w · φ(x) + b | Support vector machine decision function | Ch. 6 |
| PCA | PC_k | k-th principal component | Ch. 18 |
| k-means | μ_k | Centroid of cluster k | Ch. 28 |
| Clustering | d(x_i, x_j) | Distance between observations i and j | Ch. 28 |
---
## Common Abbreviations in Formulas
| Abbreviation | Meaning |
|--------------|---------|
| s.t. | Subject to |
| i.e. | That is |
| w.r.t. | With respect to |
| o.w. | Otherwise |
| LHS / RHS | Left-hand side / Right-hand side (also written lhs / rhs) |
| a.s. | Almost surely |
| w.p. 1 | With probability 1 |
| ∃ | There exists |
| ∀ | For all |
| QED | Proof complete |
---
## Subscript and Superscript Conventions
| Symbol | Meaning |
|--------|---------|
| x_i | i-th element of vector or observation index |
| x_{ij} | (i,j)-th element of matrix |
| x^{(k)} | k-th sample or iteration |
| x^* | Optimal value |
| x̂ | Estimate of x |
| x̄ | Mean of x |
| x̃ | Transformed or adjusted x |
| x' | Derivative of x with respect to its argument (Lagrange notation) |
| ẋ | Derivative of x with respect to time (Newton notation) |
---
## Examples of Common Formula Notations
### Linear Regression
**y_i = β_0 + β_1 x_i + ε_i**
where i = 1, ..., n and the errors are iid, ε_i ∼ N(0, σ²)
### Logistic Regression
**logit(p_i) = β_0 + β_1 x_i ⇒ p_i = 1 / (1 + exp(-(β_0 + β_1 x_i)))**
### ARIMA(p, d, q)
**Δ^d y_t = φ_1 Δ^d y_{t-1} + ... + φ_p Δ^d y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}**
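
The ARIMA equation says that the d-times-differenced series follows an ARMA(p, q) recursion. The sketch below builds an ARIMA(1, 1, 1) series from simulated shocks, with hypothetical values φ_1 = 0.6 and θ_1 = 0.3. The start-up values w_0 = ε_0 and y_0 = 0 are arbitrary assumptions. The code then checks that the first difference satisfies the equation term by term.

```python
import math
import random

rng = random.Random(0)          # seeded for reproducibility
phi, theta = 0.6, 0.3           # hypothetical φ_1 and θ_1
T = 50
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]   # shocks ε_t ~ N(0, 1)

# w_t = Δy_t follows the ARMA(1, 1) recursion; w_0 = ε_0 is an arbitrary start-up value
w = [eps[0]]
for t in range(1, T):
    w.append(phi * w[t - 1] + eps[t] + theta * eps[t - 1])

# Undo one difference (d = 1): y_t = y_{t-1} + w_t, starting from y_0 = 0
y = [0.0]
for t in range(1, T):
    y.append(y[t - 1] + w[t])

# Check Δy_t = φ_1 Δy_{t-1} + ε_t + θ_1 ε_{t-1} wherever both differences exist
for t in range(2, T):
    lhs = y[t] - y[t - 1]
    rhs = phi * (y[t - 1] - y[t - 2]) + eps[t] + theta * eps[t - 1]
    assert math.isclose(lhs, rhs, abs_tol=1e-9)
```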
### Survival Analysis (Cox Model)
**h(t|X) = h_0(t) exp(β_1 X_1 + ... + β_p X_p)**
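
In the Cox model, the covariates scale the baseline hazard h_0(t) by the same factor at every t. The ratio of two hazards therefore depends on neither t nor the form of h_0. The sketch below uses hypothetical coefficients β = (0.4, -0.2) and two arbitrary baseline hazards to show that a one-unit difference in X_1 gives a hazard ratio of exp(β_1).

```python
import math

beta = [0.4, -0.2]   # hypothetical Cox coefficients (β_1, β_2)


def relative_hazard(x):
    # exp(β_1 x_1 + ... + β_p x_p): the factor that multiplies h_0(t)
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def hazard(t, x, h0):
    # h(t | X) = h_0(t) exp(β_1 x_1 + ... + β_p x_p)
    return h0(t) * relative_hazard(x)


# Two hypothetical customers who differ by one unit in X_1 only
x_a, x_b = [3.0, 1.0], [2.0, 1.0]

baselines = [lambda t: 0.05,               # constant baseline hazard
             lambda t: 0.02 + 0.01 * t]    # increasing baseline hazard

for h0 in baselines:
    for t in (1.0, 5.0, 20.0):
        ratio = hazard(t, x_a, h0) / hazard(t, x_b, h0)
        assert math.isclose(ratio, math.exp(beta[0]))  # h_0(t) cancels
```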
### K-Means Objective
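**min ∑_{k=1}^K ∑_{i:C(i)=k} ||x_i - μ_k||²**
where C(i) is the cluster assigned to observation i, μ_k is the centroid of cluster k, and the minimum is taken over both the assignments and the centroids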
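
One iteration of Lloyd's algorithm (the usual k-means procedure) alternates the two minimisations in this objective. First it assigns each point to its nearest centroid; then it moves each centroid to the mean of its assigned points. The sketch uses six made-up 2-D points and arbitrary starting centroids. With the assignments held fixed, the update step cannot increase the objective.

```python
# Six made-up 2-D points and two arbitrary starting centroids (K = 2)
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]


def sq_dist(p, q):
    # Squared Euclidean distance ||p - q||²
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def assign(points, centroids):
    # C(i): index of the centroid nearest to point i
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def objective(points, centroids, C):
    # Sum over clusters k of ||x_i - μ_k||² for the points with C(i) = k
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, C))


def update(points, C, centroids):
    # μ_k = mean of the points assigned to cluster k (unchanged if the cluster is empty)
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, C) if c == k]
        if members:
            new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new.append(old)
    return new


C = assign(points, centroids)
before = objective(points, centroids, C)
centroids = update(points, C, centroids)
after = objective(points, centroids, C)

# With assignments fixed, cluster means minimise the within-cluster sum of squares
assert after <= before
```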
---
## Symbol Quick Reference by Unicode
| Unicode | LaTeX | Symbol | Name |
|---------|-------|--------|------|
| U+2211 | \sum | ∑ | Summation |
| U+220F | \prod | ∏ | Product |
| U+03BB | \lambda | λ | Lambda (eigenvalue, rate) |
| U+03BC | \mu | μ | Mu (mean) |
| U+03C3 | \sigma | σ | Sigma (std. dev) |
| U+03B8 | \theta | θ | Theta (parameter) |
| U+03B2 | \beta | β | Beta (coefficient) |
| U+03C1 | \rho | ρ | Rho (correlation) |
| U+03B1 | \alpha | α | Alpha (significance level) |
| U+03B4 | \delta | δ | Delta (change, indicator) |
| U+2202 | \partial | ∂ | Partial derivative |
| U+222B | \int | ∫ | Integral |
| U+2208 | \in | ∈ | Element of |
| U+2217 | \ast | ∗ | Asterisk operator (convolution; optimal value, as in x^*) |
| U+2260 | \neq | ≠ | Not equal |
---
*All notation in this appendix is used consistently throughout the book. Some symbols carry more than one meaning depending on context (for example λ, θ, φ, p, and L); when in doubt about a symbol, return to this reference or check the specific chapter where the concept is introduced.*