71 Appendix F — Mathematical Notation Reference

This appendix provides a comprehensive reference of all mathematical symbols and notation used throughout “AI-Powered Business Analytics.” Symbols are organised by category for easy lookup.

71.1 General Mathematical Notation

Symbol	Name	Definition	First Introduced
∑	Summation	Sum of values: ∑_{i=1}^{n} x_i = x_1 + x_2 + … + x_n	Ch. 5
∏	Product	Product of values: ∏_{i=1}^{n} x_i = x_1 × x_2 × … × x_n	Ch. 5
∂	Partial derivative	Rate of change of a function with respect to one variable, holding the others constant	Ch. 7
∫	Integration	Continuous summation; inverse of differentiation	Ch. 8
∈	Element of	x ∈ A means x is a member of set A	Ch. 5
⊆	Subset of	A ⊆ B means all elements of A are in B	Ch. 5
∪	Union	A ∪ B contains all elements in A or B	Ch. 8
∩	Intersection	A ∩ B contains elements in both A and B	Ch. 8
≈	Approximately equal	x ≈ y means x and y are close in value; exact equality is not implied	Ch. 5
≤, ≥	Less/Greater than or equal	x ≤ y, x ≥ y	Ch. 5
∝	Proportional to	y ∝ x means y = kx for some constant k	Ch. 24
∞	Infinity	Unbounded quantity; used in limits such as n → ∞	Ch. 5
!	Factorial	n! = n × (n-1) × … × 1, with 0! = 1	Ch. 5
|x|	Absolute value	Distance from 0; e.g., |-5| = 5	Ch. 5
≡	Identically equal	Equality that holds for all values in the domain (an identity)	Ch. 5

71.2 Probability Notation

Symbol	Name	Definition	First Introduced
P(A)	Probability of A	Chance that event A occurs; takes values in [0, 1]	Ch. 5
P(A|B)	Conditional probability	Probability of A given that B has occurred: P(A|B) = P(A∩B) / P(B), defined when P(B) > 0	Ch. 8
P(A^c)	Complement	Probability A does NOT occur: P(A^c) = 1 - P(A)	Ch. 5
E[X]	Expected value	Mean or average value of random variable X	Ch. 5
Var[X] or σ²	Variance	Expected squared deviation from mean: E[(X - E[X])²]	Ch. 5
SD[X] or σ	Standard deviation	Square root of variance: SD = √Var	Ch. 5
Cov[X, Y]	Covariance	Expected product of deviations: E[(X - E[X])(Y - E[Y])]	Ch. 5
Cor[X, Y] or ρ	Correlation	Standardised covariance: ρ = Cov[X,Y] / (SD[X] × SD[Y])	Ch. 5
μ	Population mean	True average of entire population	Ch. 5
σ²	Population variance	True variance of entire population	Ch. 5
μ̂ or x̄	Sample mean	Average of sample: ∑x_i / n	Ch. 5
s²	Sample variance	Variance of sample: ∑(x_i - x̄)² / (n - 1)	Ch. 5
∼	Distributed as	X ∼ N(μ, σ²) means X follows a normal distribution with mean μ and variance σ²	Ch. 5
⊥	Independent	X ⊥ Y means X and Y are statistically independent	Ch. 8
iid	Independent, identically distributed	Observations are mutually independent and drawn from the same distribution	Ch. 5
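
The sample formulas above can be checked numerically. The following is a minimal sketch in plain Python using only the standard library. That choice is an assumption, since the book's own tooling is not specified here. The function names and the example data are hypothetical.

```python
import math

def sample_mean(xs):
    # x̄ = ∑ x_i / n
    return sum(xs) / len(xs)

def sample_variance(xs):
    # s² = ∑ (x_i - x̄)² / (n - 1)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    # Sample version of Cov[X, Y] = E[(X - E[X])(Y - E[Y])], with the same n - 1 divisor
    xbar, ybar = sample_mean(xs), sample_mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    # ρ̂ = Cov[X, Y] / (SD[X] × SD[Y]); the n - 1 divisors cancel
    return sample_covariance(xs, ys) / math.sqrt(sample_variance(xs) * sample_variance(ys))

def conditional_probability(p_a_and_b, p_b):
    # P(A|B) = P(A∩B) / P(B), defined only when P(B) > 0
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Hypothetical data for illustration
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 7.0, 8.0]

print(sample_mean(xs))       # 5.0
print(sample_variance(xs))   # 32 / 7, about 4.571
print(sample_correlation(xs, ys))
```

The n - 1 divisor in s² matches the table's definition. Dividing by n instead would give the population-style variance, 4.0 for this data.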

71.3 Statistical Distributions

Distribution	Notation	Parameters	Used In
Normal	X ∼ N(μ, σ²)	μ (mean), σ² (variance)	Ch. 5, throughout
Standard Normal	Z ∼ N(0, 1)	Mean 0, variance 1	Ch. 5
t-distribution	t_{df}	df (degrees of freedom)	Ch. 5
Chi-Squared	χ²_{df}	df (degrees of freedom)	Ch. 5
F-distribution	F_{df1,df2}	df1, df2 (numerator and denominator df)	Ch. 5
Binomial	X ∼ Binomial(n, p)	n (trials), p (success probability)	Ch. 5, 14
Poisson	X ∼ Poisson(λ)	λ (rate parameter)	Ch. 5
Exponential	X ∼ Exp(λ)	λ (rate parameter)	Ch. 9, 30
Uniform	X ∼ Uniform(a, b)	a (min), b (max)	Ch. 8, 26
Beta	X ∼ Beta(α, β)	α, β (shape parameters)	Ch. 8
Gamma	X ∼ Gamma(α, β)	α (shape), β (rate)	Ch. 8
Dirichlet	X ∼ Dirichlet(α)	α (concentration parameters)	Ch. 10
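
Software libraries do not always parameterise these distributions the way this table does, and the mismatch is a common source of silent errors. The sketch below uses Python's standard `random` module as an illustration. `random.gauss` takes the standard deviation σ rather than the variance σ². `random.gammavariate` takes a scale parameter, so the rate β in Gamma(α, β) must be passed as 1/β. The helper names are hypothetical, and the Monte Carlo means are approximate.

```python
import math
import random

rng = random.Random(42)  # seeded so the sketch is reproducible

def draw_normal(mu, sigma2):
    # N(μ, σ²) uses the variance; random.gauss expects the standard deviation σ
    return rng.gauss(mu, math.sqrt(sigma2))

def draw_exponential(lam):
    # Exp(λ) with rate λ; random.expovariate also takes the rate, so E[X] = 1/λ
    return rng.expovariate(lam)

def draw_gamma(alpha, beta_rate):
    # Gamma(α, β) with β a rate; random.gammavariate takes a scale, so pass 1/β
    # Pitfall: rng.gammavariate(3.0, 2.0) has mean 3 × 2 = 6, not 3 / 2 = 1.5
    return rng.gammavariate(alpha, 1.0 / beta_rate)

def draw_beta(alpha, beta):
    # Beta(α, β): both arguments are shape parameters, as in the table
    return rng.betavariate(alpha, beta)

def mc_mean(draw, n=100_000):
    # Monte Carlo estimate of E[X] from n independent draws
    return sum(draw() for _ in range(n)) / n

print(mc_mean(lambda: draw_normal(0.0, 4.0) ** 2))  # close to σ² = 4
print(mc_mean(lambda: draw_exponential(2.0)))       # close to 1/λ = 0.5
print(mc_mean(lambda: draw_gamma(3.0, 2.0)))        # close to α/β = 1.5
print(mc_mean(lambda: draw_beta(2.0, 6.0)))         # close to α/(α+β) = 0.25
```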

71.4 Linear Algebra Notation

Symbol	Name	Definition	First Introduced
𝐱 (bold) or x	Vector	Column vector of n elements	Ch. 6
𝐗 (bold) or X	Matrix	Rectangular array of m × n elements	Ch. 6
X^T	Transpose	Flips rows and columns: (X^T)_{ij} = X_{ji}	Ch. 6
X^{-1}	Inverse	Matrix such that X X^{-1} = X^{-1} X = I (the identity); exists only for square, non-singular X	Ch. 6
|X| or det(X)	Determinant	Scalar property of a square matrix; non-zero if and only if X is invertible	Ch. 6
tr(X)	Trace	Sum of diagonal elements: ∑_i X_{ii}	Ch. 6
||x|| or ||x||_2	Euclidean (L2) norm	Length of a vector: √(∑ x_i²)	Ch. 18
||x||_1	L1 norm	Sum of absolute values: ∑ |x_i|	Ch. 18
x · y or ⟨x, y⟩	Dot product	∑ x_i × y_i; measures how closely two vectors are aligned	Ch. 6
λ	Eigenvalue	Scalar λ for which Xv = λv has a non-zero solution v (X square)	Ch. 18
v	Eigenvector	Non-zero vector v satisfying Xv = λv	Ch. 18
rank(X)	Rank	Dimension of the column (equivalently, row) space; maximum number of linearly independent columns	Ch. 6
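
A small pure-Python sketch of these operations follows. It uses a hypothetical symmetric 2 × 2 matrix so that every quantity can be checked by hand. The eigen helper solves the 2 × 2 characteristic equation λ² - tr(X) λ + det(X) = 0 and is valid only for that special case. In practice a numerical library would be used.

```python
import math

def transpose(M):
    # (Mᵀ)_{ij} = M_{ji}
    return [list(row) for row in zip(*M)]

def trace(M):
    # tr(M) = ∑_i M_{ii}
    return sum(M[i][i] for i in range(len(M)))

def det2(M):
    # Determinant of a 2 × 2 matrix [[a, b], [c, d]] is ad - bc
    (a, b), (c, d) = M
    return a * d - b * c

def dot(u, v):
    # ⟨u, v⟩ = ∑ u_i v_i
    return sum(ui * vi for ui, vi in zip(u, v))

def norm2(u):
    # Euclidean norm ||u||_2 = √(∑ u_i²)
    return math.sqrt(dot(u, u))

def norm1(u):
    # L1 norm ||u||_1 = ∑ |u_i|
    return sum(abs(ui) for ui in u)

def matvec(M, v):
    # Matrix-vector product Mv
    return [dot(row, v) for row in M]

def eigen_sym2(M):
    # Eigenpairs of a symmetric 2 × 2 matrix from λ² - tr(M) λ + det(M) = 0.
    # For M = [[a, b], [b, d]] with b != 0, v = (b, λ - a) satisfies Mv = λv.
    t, d = trace(M), det2(M)
    disc = math.sqrt(t * t / 4 - d)
    a, b = M[0]
    return [(lam, [b, lam - a]) for lam in (t / 2 + disc, t / 2 - disc)]

X = [[2.0, 1.0],
     [1.0, 3.0]]
x = [3.0, 4.0]
y = [1.0, -2.0]

print(trace(X), det2(X))      # 5.0 5.0
print(norm2(x), norm1(x))     # 5.0 7.0
print(dot(x, y))              # -5.0
```

As a check, the two eigenvalues sum to tr(X) and multiply to det(X).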

71.5 Regression Notation

Symbol	Name	Definition	First Introduced
y	Response / Dependent variable	Target variable being predicted	Ch. 6
X	Design matrix	Matrix of predictor variables; rows = observations, columns = features	Ch. 6
β	Coefficient vector	Parameters to estimate: β = (β₀, β₁, …, β_p)	Ch. 6
β̂	Estimated coefficient	Estimate of β from data; the ordinary least squares (OLS) estimate is β̂ = (X^T X)^{-1} X^T y	Ch. 6
ε or e	Error / Residual	ε_i is the unobserved error term; the residual e_i = y_i - ŷ_i (observed minus predicted) is its observable counterpart	Ch. 5
ŷ	Fitted / Predicted value	Model prediction: ŷ = Xβ̂	Ch. 6
σ² or σ_{ε}²	Error variance	Variance of the error term ε, estimated from the residuals	Ch. 5
R²	Coefficient of determination	Proportion of variance explained: 1 - (SS_{res} / SS_{tot})	Ch. 6
R²_{adj}	Adjusted R²	R² penalised for the number of predictors: 1 - (1 - R²)(n - 1) / (n - p - 1)	Ch. 6
SE(β̂)	Standard error	Estimated standard deviation of the sampling distribution of β̂	Ch. 5
t-stat	t-statistic	β̂ / SE(β̂); tests whether β differs significantly from 0	Ch. 5
p-value	p-value	Probability, assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed	Ch. 5
AIC	Akaike Information Criterion	-2 ln(L) + 2k; L = maximised likelihood, k = number of parameters	Ch. 6
BIC	Bayesian Information Criterion	-2 ln(L) + k ln(n); n = sample size; penalises complexity more than AIC once n ≥ 8 (ln n > 2)	Ch. 6
VIF	Variance Inflation Factor	1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors; measures multicollinearity	Ch. 18
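
With a single predictor, the OLS formula β̂ = (X^T X)^{-1} X^T y reduces to the familiar slope and intercept expressions. The sketch below is pure Python with hypothetical data. It computes those expressions, the residuals e_i, R², and adjusted R².

```python
def simple_ols(xs, ys):
    # One-predictor case of β̂ = (XᵀX)⁻¹Xᵀy:
    # β̂_1 = ∑(x_i - x̄)(y_i - ȳ) / ∑(x_i - x̄)²,  β̂_0 = ȳ - β̂_1 x̄
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def r_squared(ys, yhat):
    # R² = 1 - SS_res / SS_tot
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, yhat))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    # p = number of predictors, not counting the intercept
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = simple_ols(xs, ys)
yhat = [b0 + b1 * x for x in xs]                 # ŷ_i
residuals = [y - f for y, f in zip(ys, yhat)]    # e_i = y_i - ŷ_i
r2 = r_squared(ys, yhat)
r2_adj = adjusted_r_squared(r2, n=len(xs), p=1)
print(round(b0, 4), round(b1, 4))  # 0.09 1.97
```

With an intercept in the model, OLS residuals sum to zero and are orthogonal to each predictor. Both properties are handy sanity checks.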

71.6 Logistic Regression Notation

Symbol	Name	Definition	First Introduced
p or π	Probability	P(Y = 1); probability of success	Ch. 6
logit(p)	Logit function	ln(p / (1 - p)); log-odds	Ch. 23
odds	Odds	p / (1 - p); ratio of the probability of success to the probability of failure	Ch. 23
OR	Odds ratio	exp(β); multiplicative change in the odds for a 1-unit increase in the corresponding predictor	Ch. 23
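
The link between the logit, the odds, and the odds ratio exp(β) can be verified directly. A minimal sketch follows, assuming a one-predictor logistic model with hypothetical coefficients.

```python
import math

def logit(p):
    # Log-odds: ln(p / (1 - p)), defined for 0 < p < 1
    return math.log(p / (1 - p))

def inv_logit(z):
    # Logistic function, the inverse of logit: p = 1 / (1 + exp(-z))
    return 1 / (1 + math.exp(-z))

def odds(p):
    # p / (1 - p)
    return p / (1 - p)

beta0, beta1 = -1.5, 0.4   # hypothetical fitted coefficients

p2 = inv_logit(beta0 + beta1 * 2)   # P(Y = 1) at x = 2
p3 = inv_logit(beta0 + beta1 * 3)   # P(Y = 1) at x = 3
odds_ratio = odds(p3) / odds(p2)    # equals exp(β_1) for any 1-unit step in x
print(round(odds_ratio, 4), round(math.exp(beta1), 4))  # 1.4918 1.4918
```

The ratio does not depend on the starting value of x. This is why a single exp(β) summarises the effect of one predictor.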

71.7 Machine Learning Notation

Symbol	Name	Definition	First Introduced
D	Dataset	Collection of observations and labels	Ch. 6
D_{train}	Training set	Data used to fit model	Ch. 6
D_{test}	Test set	Data used to evaluate model	Ch. 6
n	Sample size	Number of observations	Ch. 5
p	Number of features	Dimensionality	Ch. 6
L(y, ŷ)	Loss function	Penalty for incorrect prediction	Ch. 6
ℓ(θ)	Likelihood	Joint probability (or density) of the observed data, viewed as a function of the parameters θ	Ch. 8
∇	Gradient	Vector of partial derivatives	Ch. 7
∇L	Gradient of loss	Direction of steepest increase in loss	Ch. 7
θ	Parameters	Model parameters to estimate	Ch. 6
w or w_i	Weight	Parameter in a neural network	Ch. 7
b or b_i	Bias	Intercept term in a neural network	Ch. 7
η	Learning rate	Step size in gradient descent	Ch. 7
TP	True positives	Correctly predicted positive cases	Ch. 6
TN	True negatives	Correctly predicted negative cases	Ch. 6
FP	False positives	Incorrectly predicted positive cases	Ch. 6
FN	False negatives	Incorrectly predicted negative cases	Ch. 6
TPR	True positive rate	TP / (TP + FN); Sensitivity, Recall	Ch. 6
FPR	False positive rate	FP / (FP + TN); 1 - Specificity	Ch. 6
AUC	Area under curve	Area under the ROC curve; ranges over [0, 1], where 0.5 corresponds to random ranking and 1.0 to perfect ranking	Ch. 6
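
The sketch below shows how the confusion-matrix counts, TPR, FPR, and AUC relate, using hypothetical labels and scores. AUC is computed here with the pairwise (Mann-Whitney) formulation rather than by integrating an ROC curve. The two give the same value: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The 0.5 threshold is an arbitrary choice for illustration.

```python
def confusion_counts(y_true, y_pred):
    # Returns (TP, TN, FP, FN) for binary labels coded 0/1
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def auc_pairwise(y_true, scores):
    # AUC = P(score of a random positive > score of a random negative), ties count 1/2
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.4, 0.7]

threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)   # sensitivity / recall
fpr = fp / (fp + tn)   # 1 - specificity
print(tp, tn, fp, fn, tpr, fpr)        # 3 3 1 1 0.75 0.25
print(auc_pairwise(y_true, scores))    # 0.875
```

Reversing the scores gives 1 - AUC, here 0.125. A model that ranks systematically backwards therefore has AUC below 0.5.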

71.8 Time Series Notation

Symbol	Name	Definition	First Introduced
y_t	Time series observation	Value at time t	Ch. 9
y_{t+h}	Forecast target	Value h steps ahead of time t; its forecast is written ŷ_{t+h|t}	Ch. 9
ŷ_{t+h|t}	Conditional forecast	Forecast of y_{t+h} given information up to time t	Ch. 31
φ or φ_j	AR coefficient	Autoregressive coefficient in ARIMA(p, d, q)	Ch. 9
θ or θ_j	MA coefficient	Moving average coefficient in ARIMA(p, d, q)	Ch. 9
ε_t or a_t	Shock / Innovation	Random error term at time t	Ch. 9
B or L	Backshift (lag) operator	B y_t = y_{t-1}; lets ARIMA models be written compactly	Ch. 9
Δ	Differencing operator	Δy_t = y_t - y_{t-1} (equivalently Δ = 1 - B); removes a linear trend	Ch. 9
S or m	Seasonality period	Number of periods per seasonal cycle (e.g., 12 for monthly data)	Ch. 9
ACF	Autocorrelation function	Correlation between y_t and y_{t-k}, as a function of the lag k	Ch. 9
PACF	Partial autocorrelation function	Correlation between y_t and y_{t-k} after controlling for the intermediate lags y_{t-1}, …, y_{t-k+1}	Ch. 9
MSE	Mean squared error	(1/n) ∑ (y_t - ŷ_t)²	Ch. 6, 31
MAE	Mean absolute error	(1/n) ∑ \|y_t - ŷ_t\|	Ch. 6, 31
RMSE	Root mean squared error	√MSE	Ch. 6, 31
MAPE	Mean absolute percentage error	(1/n) ∑ |(y_t - ŷ_t) / y_t|, usually multiplied by 100 to express it as a percentage; undefined when any y_t = 0	Ch. 31
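
The error metrics and operators in this table are short enough to write out directly. A minimal sketch follows, with hypothetical actuals and forecasts. MAPE is reported in percent, and the sample ACF uses the common full-sample denominator. Both choices are assumptions, since other conventions exist.

```python
import math

def mse(y, yhat):
    # (1/n) ∑ (y_t - ŷ_t)²
    return sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y)

def mae(y, yhat):
    # (1/n) ∑ |y_t - ŷ_t|
    return sum(abs(a - f) for a, f in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    # √MSE
    return math.sqrt(mse(y, yhat))

def mape(y, yhat):
    # (100 / n) ∑ |(y_t - ŷ_t) / y_t|, reported in percent
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs((a - f) / a) for a, f in zip(y, yhat)) / len(y)

def difference(y, lag=1):
    # Δy_t = y_t - y_{t-1} for lag=1; lag=m gives the seasonal difference y_t - y_{t-m}
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def acf(y, k):
    # Sample autocorrelation at lag k (full-sample denominator)
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# Hypothetical actuals and forecasts
actual = [100.0, 110.0, 120.0, 125.0]
forecast = [98.0, 112.0, 115.0, 130.0]

print(mse(actual, forecast), mae(actual, forecast))  # 14.5 3.5
print(difference(actual))                            # [10.0, 10.0, 5.0]
```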

71.9 Survival Analysis Notation

Symbol	Name	Definition	First Introduced
T	Time to event	Duration until event of interest (e.g., churn)	Ch. 30
C	Censoring time	Time until observation ends (if event not observed)	Ch. 30
S(t)	Survival function	P(T > t); probability of surviving past time t	Ch. 30
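h(t)	Hazard function	Instantaneous rate of the event at time t, given survival up to t: h(t) = f(t) / S(t), where f is the density of T	Ch. 30
H(t)	Cumulative hazard	∫_0^t h(u) du; cumulative risk, with S(t) = exp(-H(t))	Ch. 30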
δ_i	Event indicator	1 if event occurred, 0 if censored	Ch. 30
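
Estimating S(t) from censored data requires an estimator that this table does not name. One standard choice is the Kaplan-Meier product-limit estimator, Ŝ(t) = ∏_{t_j ≤ t} (1 - d_j / n_j). Here d_j is the number of events at time t_j and n_j is the number still at risk just before t_j. The sketch below assumes that estimator and uses hypothetical churn data. Observations censored at the same time as an event are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    # times: observed durations min(T_i, C_i); events: δ_i (1 = event, 0 = censored)
    # Returns a list of (t_j, Ŝ(t_j)) at each distinct event time
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations that share the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by (1 - d_j / n_j)
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored cases both leave the risk set after t
    return curve

# Hypothetical customer lifetimes (months) and event indicators δ_i
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))   # 2 0.8333, then 3 0.6667, 5 0.4444, 8 0.0 (one pair per line)
```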

71.10 Bayesian Notation

Symbol	Name	Definition	First Introduced
p(θ)	Prior	Probability distribution of parameter θ before observing data	Ch. 8
L(D|θ)	Likelihood	Probability of data D given parameter θ	Ch. 8
p(θ|D)	Posterior	Distribution of θ after observing data D: p(θ|D) ∝ L(D|θ) × p(θ)	Ch. 8
p(D)	Marginal likelihood / Evidence	∫ L(D|θ) p(θ) dθ; normalising constant, so that p(θ|D) = L(D|θ) p(θ) / p(D)	Ch. 8
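
Bayes' rule in this form becomes concrete with a conjugate pair. Assume a Beta(α, β) prior on a success probability θ and Binomial data with k successes in n trials. The posterior is then Beta(α + k, β + n - k). The sketch below, with hypothetical numbers, computes that update. It then checks the result against a brute-force grid, where dividing by the sum of L(D|θ) p(θ) plays the role of p(D).

```python
def beta_binomial_update(alpha, beta, successes, trials):
    # Prior θ ~ Beta(α, β); data D: k successes in n trials
    # Posterior θ | D ~ Beta(α + k, β + n - k)
    return alpha + successes, beta + trials - successes

def beta_mean(alpha, beta):
    # E[θ] for θ ~ Beta(α, β)
    return alpha / (alpha + beta)

def grid_posterior_mean(alpha, beta, k, n, m=10_000):
    # Brute-force check: evaluate L(D|θ) p(θ) on a grid of θ values, dropping constants
    # that do not depend on θ; dividing by the total plays the role of p(D)
    thetas = [(j + 0.5) / m for j in range(m)]
    w = [t ** (k + alpha - 1) * (1 - t) ** (n - k + beta - 1) for t in thetas]
    return sum(t * wi for t, wi in zip(thetas, w)) / sum(w)

prior = (2.0, 8.0)   # hypothetical prior: success rate believed to be around 20%
k, n = 30, 100       # hypothetical data: 30 conversions in 100 trials
post = beta_binomial_update(*prior, successes=k, trials=n)
print(post, round(beta_mean(*post), 4))   # (32.0, 78.0) 0.2909
```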

71.11 Specific Model Notation

Model	Notation	Definition	First Introduced
ARIMA	ARIMA(p, d, q)	Autoregressive (p), Integrated (d), Moving Average (q)	Ch. 9
SARIMA	SARIMA(p,d,q)(P,D,Q)_m	Seasonal ARIMA with seasonal parameters P, D, Q and period m	Ch. 9
SHAP	φ_i	Shapley value for feature i; contribution to prediction	Ch. 29
XGBoost	f_t(x)	Prediction of tree t; ensemble prediction ŷ = ∑_{t=1}^{T} f_t(x) over T trees	Ch. 7
SVM	w · φ(x) + b	Support vector machine decision function; its sign gives the predicted class (φ here is a feature map)	Ch. 6
PCA	PC_k	k-th principal component	Ch. 18
k-means	μ_k	Centroid of cluster k	Ch. 28
Clustering	d(x_i, x_j)	Distance between observations i and j	Ch. 28

71.12 Common Abbreviations in Formulas

Abbreviation	Meaning
s.t.	Subject to
i.e.	That is
w.r.t.	With respect to
o.w.	Otherwise
LHS / RHS	Left-hand side / Right-hand side (also written lhs / rhs)
a.s.	Almost surely
w.p. 1	With probability 1
∃	There exists
∀	For all
QED	Quod erat demonstrandum ("which was to be demonstrated"); marks the end of a proof

71.13 Subscript and Superscript Conventions

Symbol	Meaning
x_i	i-th element of a vector, or the value for observation i
x_{ij}	(i,j)-th element of matrix
x^{(k)}	k-th sample or iteration
x^*	Optimal value
x̂	Estimate of x
x̄	Mean of x
x̃	Transformed or adjusted x
x′	Derivative of x with respect to its argument (Lagrange notation)
ẋ	Derivative of x with respect to time (Newton notation)

71.14 Examples of Common Formula Notations

71.14.1 Linear Regression

y_i = β_0 + β_1 x_i + ε_i where i = 1, …, n; ε_i ∼ N(0, σ²), iid

71.14.2 Logistic Regression

logit(p_i) = β_0 + β_1 x_i ⇒ p_i = 1 / (1 + exp(-(β_0 + β_1 x_i)))

71.14.3 ARIMA(p, d, q)

Δ^d y_t = φ_1 Δ^d y_{t-1} + … + φ_p Δ^d y_{t-p} + ε_t + θ_1 ε_{t-1} + … + θ_q ε_{t-q}
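
Take the special case of an ARIMA(1, 1, 0) model without a constant. The equation above then reduces to Δy_t = φ_1 Δy_{t-1} + ε_t. Future shocks have expected value 0, so each forecast difference is φ_1 times the previous one, and the levels are recovered by adding the differences back. A minimal sketch follows, with a hypothetical series and a hypothetical, already-estimated φ_1.

```python
def arima_110_forecast(y, phi, h):
    # ARIMA(1, 1, 0) without constant: Δy_t = φ_1 Δy_{t-1} + ε_t.
    # Future ε have expectation 0, so the forecast difference is φ_1 × previous difference.
    level = y[-1]
    diff = y[-1] - y[-2]        # last observed Δy_T
    forecasts = []
    for _ in range(h):
        diff = phi * diff       # forecast of the next difference
        level = level + diff    # undo the differencing to get ŷ_{T+j|T}
        forecasts.append(level)
    return forecasts

y = [100.0, 104.0, 110.0, 114.0]  # hypothetical series
print(arima_110_forecast(y, phi=0.5, h=3))   # [116.0, 117.0, 117.5]
```

With φ_1 = 0 the model becomes a random walk and every forecast equals the last observation. With φ_1 = 1 the last observed change is carried forward indefinitely.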

71.14.4 Survival Analysis (Cox Model)

h(t|X) = h_0(t) exp(β_1 X_1 + … + β_p X_p)
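
Under the proportional-hazards structure, the baseline hazard h_0(t) cancels when two individuals are compared. The hazard ratio exp(∑ β_j (X_{aj} - X_{bj})) is therefore the same at every t. A sketch follows, with hypothetical coefficients and covariates.

```python
import math

def cox_hazard(h0_t, betas, x):
    # h(t|X) = h_0(t) exp(β_1 X_1 + … + β_p X_p)
    return h0_t * math.exp(sum(b * xi for b, xi in zip(betas, x)))

betas = [0.7, -0.2]      # hypothetical fitted coefficients
customer_a = [1.0, 3.0]  # hypothetical covariates; differs from customer_b only in X_1
customer_b = [0.0, 3.0]

# Baseline hazard values at three arbitrary times: the ratio does not depend on them
ratios = [cox_hazard(h0, betas, customer_a) / cox_hazard(h0, betas, customer_b)
          for h0 in (0.01, 0.05, 0.2)]
print([round(r, 4) for r in ratios])  # each equals exp(0.7), about 2.0138
```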

71.14.5 K-Means Objective

min_{C, μ_1, …, μ_K} ∑_{k=1}^{K} ∑_{i: C(i)=k} ||x_i - μ_k||²
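
This objective is usually minimised with Lloyd's algorithm. The algorithm alternates an assignment step with a centroid-update step, and neither step can increase the objective. The sketch below assumes Lloyd's algorithm, since the appendix does not name a solver, and uses hypothetical two-dimensional points and starting centroids.

```python
def sq_dist(a, b):
    # ||a - b||²
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_objective(X, labels, centroids):
    # ∑_k ∑_{i: C(i)=k} ||x_i - μ_k||²
    return sum(sq_dist(x, centroids[c]) for x, c in zip(X, labels))

def lloyd_step(X, centroids):
    # Assignment: C(i) = argmin_k ||x_i - μ_k||²
    labels = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k])) for x in X]
    # Update: μ_k = mean of the points assigned to cluster k (kept unchanged if empty)
    new_centroids = []
    for k, mu in enumerate(centroids):
        members = [x for x, c in zip(X, labels) if c == k]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(mu))
    return labels, new_centroids

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
init = [[0.0, 0.0], [10.0, 10.0]]   # hypothetical starting centroids

labels, centroids = lloyd_step(X, init)
print(labels, centroids)   # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
print(kmeans_objective(X, labels, init), kmeans_objective(X, labels, centroids))  # 2.0 1.0
```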

71.15 Symbol Quick Reference by Unicode

Unicode	LaTeX	Symbol	Name
U+2211	\sum	∑	Summation
U+220F	\prod	∏	Product
U+03BB	\lambda	λ	Lambda (eigenvalue, rate)
U+03BC	\mu	μ	Mu (mean)
U+03C3	\sigma	σ	Sigma (std. dev.)
U+03B8	\theta	θ	Theta (parameter)
U+03B2	\beta	β	Beta (coefficient)
U+03C1	\rho	ρ	Rho (correlation)
U+03B1	\alpha	α	Alpha (significance level)
U+03B4	\delta	δ	Delta (change, indicator)
U+2202	\partial	∂	Partial derivative
U+222B	\int	∫	Integral
U+2208	\in	∈	Element of
U+2217	\ast	∗	Asterisk operator (convolution; optimal value, as in x^*)
U+2260	\neq	≠	Not equal
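
Several of these symbols have visual look-alikes with different code points. Examples are the Greek capital sigma Σ (U+03A3) versus the summation operator ∑ (U+2211), and the micro sign µ (U+00B5) versus Greek mu μ (U+03BC). Such look-alikes can break text search or code that matches on symbols. A minimal sketch using Python's standard `unicodedata` module follows. It prints the official Unicode name for each code point, so a symbol copied from a document can be identified unambiguously.

```python
import unicodedata

# Code points for the summation and product operators and the Greek letters and
# operators used in this appendix
code_points = [0x2211, 0x220F, 0x03BB, 0x03BC, 0x03C3, 0x03B8, 0x03B2,
               0x03C1, 0x03B1, 0x03B4, 0x2202, 0x222B, 0x2208, 0x2217, 0x2260]

for cp in code_points:
    ch = chr(cp)
    print(f"U+{cp:04X}  {ch}  {unicodedata.name(ch)}")

# Look-alikes that are different characters
print(unicodedata.name("\u03a3"), "vs", unicodedata.name("\u2211"))  # Greek capital sigma vs summation sign
print(unicodedata.name("\u00b5"), "vs", unicodedata.name("\u03bc"))  # micro sign vs Greek mu
# NFKC normalisation folds the micro sign into Greek mu
print(unicodedata.normalize("NFKC", "\u00b5") == "\u03bc")          # True
```

Note that Unicode spells the name of λ as "GREEK SMALL LETTER LAMDA".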

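All notation in this appendix is used consistently throughout the book. However, a few symbols take different meanings in different contexts:

- p: number of features, a probability, or the AR order in ARIMA(p, d, q)
- θ: generic model parameters or MA coefficients
- φ: AR coefficients, Shapley values, or an SVM feature map
- L: loss or likelihood
- λ: eigenvalue or rate parameter

When in doubt about a symbol, return to this reference or check the chapter where the concept is introduced.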
