🎯 Complete Statistics Formula Collection

Descriptive Statistics: This reference guide collects the core formulas for descriptive statistics: central tendency, variability, position, distribution shape, and several specialized measures.

Coverage: From basic calculations such as mean and median to more advanced measures such as skewness, kurtosis, and coefficient of variation.

Applications: Perfect for students, researchers, data analysts, and anyone working with statistical data analysis and interpretation.

📝 Mathematical Notation Guide

x̄ = Sample mean
μ = Population mean
s = Sample standard deviation
σ = Population standard deviation
s² = Sample variance
σ² = Population variance
n = Sample size
N = Population size
Σ = Sum of values
xi = Individual data value
f = Frequency
Q1, Q2, Q3 = Quartiles
📈 Basic Measures

🔢Count

n = Number of data values
The total number of observations or data points in the dataset.
Example: For data [2, 4, 6, 8, 10], count n = 5

Sum

Σx = x₁ + x₂ + x₃ + ... + xₙ
The total of all data values added together.
Example: For data [2, 4, 6, 8, 10], sum = 2+4+6+8+10 = 30

⬇️Minimum

Min = Smallest value in dataset
The lowest value present in the data collection.
Example: For data [2, 4, 6, 8, 10], minimum = 2

⬆️Maximum

Max = Largest value in dataset
The highest value present in the data collection.
Example: For data [2, 4, 6, 8, 10], maximum = 10

↔️Range

Range = Max - Min
The difference between the largest and smallest values in the dataset.
Example: For data [2, 4, 6, 8, 10], range = 10 - 2 = 8

🎯Midrange

Midrange = (Max + Min) / 2
The average of the maximum and minimum values.
Example: For data [2, 4, 6, 8, 10], midrange = (10 + 2) / 2 = 6
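
The basic measures above map directly onto Python built-ins. The following is a minimal sketch; the function name `basic_measures` and the dictionary keys are illustrative choices, not part of the original guide.

```python
def basic_measures(data):
    """Return count, sum, min, max, range, and midrange for a non-empty list."""
    if not data:
        raise ValueError("data must contain at least one value")
    smallest, largest = min(data), max(data)
    return {
        "count": len(data),                    # n
        "sum": sum(data),                      # Σx
        "min": smallest,
        "max": largest,
        "range": largest - smallest,           # Max - Min
        "midrange": (largest + smallest) / 2,  # (Max + Min) / 2
    }


result = basic_measures([2, 4, 6, 8, 10])
print(result)
```

For the running example [2, 4, 6, 8, 10] this reproduces the values shown above: count 5, sum 30, range 8, midrange 6.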
🎯 Central Tendency

📊Mean (Average)

x̄ = Σx / n

μ = Σx / N
The arithmetic average of all values. Use x̄ for sample mean, μ for population mean.
Example: For data [2, 4, 6, 8, 10], mean = 30/5 = 6

🎯Median

If n is odd: Median = x₍ₖ₎, where k = (n + 1)/2

If n is even: Median = (x₍ₖ₎ + x₍ₖ₊₁₎) / 2, where k = n/2
The middle value when data is arranged in ascending order. Here x₍ₖ₎ denotes the k-th smallest value.
Example: For [2, 4, 6, 8, 10], median = 6 (middle value)

🔄Mode

Mode = Most frequently occurring value(s)
The value(s) that appear most often in the dataset. Can be unimodal, bimodal, or multimodal.
Example: For [1, 2, 2, 3, 4], mode = 2 (appears twice)

⚖️Weighted Mean

x̄w = Σ(xi × wi) / Σwi
Mean calculated when different values have different weights or importance.
Example: Values [80, 90] with weights [3, 2]: (80×3 + 90×2)/(3+2) = 84
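
As a rough illustration, the central tendency measures can be computed with Python's standard `statistics` module. The helper `weighted_mean` is a hypothetical name written for this sketch, since the formula is simple enough to implement directly.

```python
import statistics


def weighted_mean(values, weights):
    """Σ(xi × wi) / Σwi. Assumes the weights do not sum to zero."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


data = [2, 4, 6, 8, 10]
mean_value = statistics.mean(data)               # Σx / n
median_value = statistics.median(data)           # middle of the sorted data
modes = statistics.multimode([1, 2, 2, 3, 4])    # list of all most-frequent values
weighted = weighted_mean([80, 90], [3, 2])

print(mean_value, median_value, modes, weighted)
```

`statistics.multimode` returns a list, so bimodal and multimodal data are reported in full rather than being reduced to a single value.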
📊 Measures of Variability

📈Sample Variance

s² = Σ(xi - x̄)² / (n - 1)
Measures the average squared deviation from the sample mean. Uses n-1 for unbiased estimation.
Example: For sample [2, 4, 6, 8, 10], x̄ = 6 and Σ(xi - x̄)² = 16+4+0+4+16 = 40, so s² = 40/(5-1) = 10

📊Population Variance

σ² = Σ(xi - μ)² / N
Measures the average squared deviation from the population mean. Uses N for entire population.
Example: Treating [2, 4, 6, 8, 10] as an entire population, μ = 6 and Σ(xi - μ)² = 40, so σ² = 40/5 = 8

📏Sample Standard Deviation

s = √[Σ(xi - x̄)² / (n - 1)]
Square root of sample variance. Measures spread in same units as original data.
Example: If sample variance = 16, then standard deviation = √16 = 4

📐Population Standard Deviation

σ = √[Σ(xi - μ)² / N]
Square root of population variance. Measures spread for entire population.
Example: If population variance = 25, then standard deviation = √25 = 5
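
The sample and population versions differ only in the divisor. A short sketch using the standard `statistics` module shows the two side by side; the variable names are illustrative.

```python
import statistics

data = [2, 4, 6, 8, 10]

sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by N
sample_sd = statistics.stdev(data)                # square root of sample variance
population_sd = statistics.pstdev(data)           # square root of population variance

print(sample_variance, population_variance, sample_sd, population_sd)
```

For the same five values, the sample variance (10) is larger than the population variance (8) because the sum of squared deviations, 40, is divided by 4 instead of 5.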

📊Mean Absolute Deviation

MAD = Σ|xi - x̄| / n
Average of absolute deviations from the mean. Less sensitive to outliers than standard deviation.
Example: For [2, 4, 6, 8, 10], Σ|xi - x̄| = 4+2+0+2+4 = 12, so MAD = 12/5 = 2.4

🎯Coefficient of Variation

CV = (s / x̄) × 100%

CV = (σ / μ) × 100%
Relative measure of variability. Expresses standard deviation as percentage of mean.
Example: If mean = 50 and std dev = 10, then CV = (10/50) × 100% = 20%
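
Neither of these two measures has a dedicated function in the standard `statistics` module, so the sketch below implements the formulas directly. The function names are hypothetical, and the CV helper assumes the sample form (s / x̄).

```python
import statistics


def mean_absolute_deviation(data):
    """Σ|xi - x̄| / n."""
    center = statistics.mean(data)
    return sum(abs(x - center) for x in data) / len(data)


def coefficient_of_variation(data):
    """Sample CV as a percentage: (s / x̄) × 100. Undefined when the mean is 0."""
    center = statistics.mean(data)
    if center == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / center * 100


data = [2, 4, 6, 8, 10]
mad = mean_absolute_deviation(data)
cv = coefficient_of_variation(data)
print(mad, cv)
```

Because CV divides by the mean, it is only meaningful for data measured on a scale with a true zero and a mean well away from zero.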
📍 Position Measures

🔢Percentiles

Position = (P/100) × (n + 1)

Where P = desired percentile
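Values below which a certain percentage of data falls. 50th percentile = median. When the position is fractional, interpolate between the two neighboring sorted values. Other position conventions exist and give slightly different results.
Example: For 25th percentile with n=20: Position = (25/100) × 21 = 5.25, so the value lies one quarter of the way from the 5th to the 6th sorted value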
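
The position rule can be sketched as a small function. This is one possible implementation, assuming linear interpolation for fractional positions and clamping positions that fall outside 1..n; the name `percentile` and both of those choices are assumptions of this sketch rather than something the formula itself specifies.

```python
def percentile(data, p):
    """Value at the p-th percentile using Position = (p/100) × (n + 1)."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("data must contain at least one value")
    position = (p / 100) * (n + 1)        # 1-based position in the sorted data
    position = min(max(position, 1), n)   # clamp to the valid range 1..n
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # fractional part
    if lower >= n:
        return values[-1]
    # Interpolate between the lower-th and (lower + 1)-th sorted values.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


p25 = percentile(range(1, 21), 25)
print(p25)
```

With the values 1 through 20, position 5.25 falls a quarter of the way between the 5th value (5) and the 6th value (6), giving 5.25.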

📊First Quartile (Q1)

Q1 = 25th percentile

Position = 0.25 × (n + 1)
Value below which 25% of the data falls. Lower quartile.
Example: Q1 divides the lower 25% from upper 75% of data

🎯Second Quartile (Q2)

Q2 = 50th percentile = Median

Position = 0.50 × (n + 1)
Middle value that divides data into two equal halves.
Example: Q2 is the same as the median of the dataset

📈Third Quartile (Q3)

Q3 = 75th percentile

Position = 0.75 × (n + 1)
Value below which 75% of the data falls. Upper quartile.
Example: Q3 divides the lower 75% from upper 25% of data

📏Interquartile Range (IQR)

IQR = Q3 - Q1
Range of the middle 50% of data. Measure of spread resistant to outliers.
Example: If Q3 = 80 and Q1 = 60, then IQR = 80 - 60 = 20

⚠️Outliers Detection

Lower fence = Q1 - 1.5 × IQR

Upper fence = Q3 + 1.5 × IQR
Values beyond these fences are considered potential outliers.
Example: Any value < lower fence or > upper fence is flagged as a potential outlier and should be checked before it is removed
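
The quartile and fence formulas can be combined into a short outlier check. The sketch below relies on `statistics.quantiles`, whose default `method="exclusive"` uses the same (n + 1) position rule as the quartile formulas above; the function names `outlier_fences` and `flag_outliers` are illustrative.

```python
import statistics


def outlier_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k × IQR, Q3 + k × IQR)."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(data, k)
    return [x for x in data if x < lower or x > upper]


data = [2, 4, 6, 8, 10, 100]
fences = outlier_fences(data)
outliers = flag_outliers(data)
print(fences, outliers)
```

Here Q1 = 3.5 and Q3 = 32.5, so IQR = 29 and the fences are -40 and 76. Only the value 100 lies outside them.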
📐 Distribution Shape

↗️Skewness

Skewness = [n/((n-1)(n-2))] × Σ[(xi-x̄)/s]³
Measures asymmetry of distribution. Positive = right-skewed, Negative = left-skewed, Zero = symmetric.
Interpretation: |Skewness| < 0.5 = approximately symmetric, 0.5-1 = moderately skewed, >1 = highly skewed

⛰️Kurtosis

Kurtosis = [n(n+1)/((n-1)(n-2)(n-3))] × Σ[(xi-x̄)/s]⁴ - 3(n-1)²/((n-2)(n-3))
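Measures tail heaviness of the distribution compared to a normal distribution. This bias-adjusted sample formula already subtracts the normal-distribution baseline (the final term), so it returns excess kurtosis directly.
Interpretation: Kurtosis = 0 (normal), > 0 (heavy tails), < 0 (light tails)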

📊Excess Kurtosis

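Excess Kurtosis = Raw Kurtosis - 3
Raw kurtosis is the fourth standardized moment, which equals 3 for a normal distribution. Apply this subtraction only to raw kurtosis; the sample formula listed under Kurtosis is already an excess measure and should not have 3 subtracted again.
Interpretation: Excess = 0 (normal), > 0 (leptokurtic), < 0 (platykurtic)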
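
The skewness and kurtosis formulas in this section can be transcribed term by term. The sketch below follows the two bias-adjusted sample formulas as written, with s taken as the sample standard deviation. The kurtosis formula includes the subtraction term, so the result is centered on 0 for normal data. The function names are hypothetical.

```python
import statistics


def sample_skewness(data):
    """[n / ((n-1)(n-2))] × Σ[(xi - x̄)/s]³. Needs n >= 3 and s > 0."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    center = statistics.mean(data)
    s = statistics.stdev(data)
    cubes = sum(((x - center) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubes


def sample_excess_kurtosis(data):
    """Bias-adjusted sample kurtosis, centered on 0. Needs n >= 4 and s > 0."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    center = statistics.mean(data)
    s = statistics.stdev(data)
    fourths = sum(((x - center) / s) ** 4 for x in data)
    leading = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourths
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return leading - correction


data = [2, 4, 6, 8, 10]
skew = sample_skewness(data)
kurt = sample_excess_kurtosis(data)
print(skew, kurt)
```

For the evenly spaced sample [2, 4, 6, 8, 10], skewness is 0 (up to rounding) because the data are symmetric, and the kurtosis value is -1.2, indicating lighter tails than a normal distribution.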
🔬 Advanced Measures

🔢Sum of Squares

SS = Σ(xi - x̄)²

Computational form: SS = Σxi² - (Σxi)²/n
Sum of squared deviations from the mean. Both forms give the same value; SS is the numerator in the variance formulas.
Example: For [2, 4, 6, 8, 10], SS = 16+4+0+4+16 = 40, or equivalently 220 - 900/5 = 40
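
A quick sketch can confirm that the definitional form and the form based on Σxi² and (Σxi)²/n agree. The function names are illustrative.

```python
def ss_definitional(data):
    """Σ(xi - x̄)²."""
    center = sum(data) / len(data)
    return sum((x - center) ** 2 for x in data)


def ss_computational(data):
    """Σxi² - (Σxi)² / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


data = [2, 4, 6, 8, 10]
print(ss_definitional(data), ss_computational(data))
```

In floating-point arithmetic the second form can lose precision when the mean is large relative to the spread, because it subtracts two nearly equal numbers. The definitional form is usually the safer choice in code.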

📐Root Mean Square (RMS)

RMS = √(Σxi² / n)
Square root of the arithmetic mean of squares. Useful for measuring magnitude.
Example: For [3, 4, 5], RMS = √((9+16+25)/3) = √(50/3) ≈ 4.08
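
The RMS formula is a one-liner in Python. A minimal sketch, with `rms` as an illustrative function name:

```python
import math


def rms(data):
    """√(Σxi² / n) for a non-empty sequence."""
    return math.sqrt(sum(x * x for x in data) / len(data))


value = rms([3, 4, 5])
print(round(value, 2))
```

Unlike the standard deviation, RMS does not subtract the mean first, so it measures overall magnitude rather than spread around the center.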

⚠️Standard Error of Mean

SE = s / √n

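SE = σ / √n (when the population standard deviation σ is known)
Standard deviation of the sampling distribution of the sample mean. In both forms n is the sample size, not the population size.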
Example: If s = 10 and n = 25, then SE = 10/√25 = 10/5 = 2
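
The standard error can be computed either from raw sample data or from a known standard deviation and sample size. Both helpers below are hypothetical names written for this sketch.

```python
import math
import statistics


def standard_error(sample):
    """s / √n, with s estimated from the sample itself."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_from_sd(sd, n):
    """sd / √n for a given standard deviation and sample size n."""
    return sd / math.sqrt(n)


se_example = standard_error_from_sd(10, 25)
se_sample = standard_error([2, 4, 6, 8, 10])
print(se_example, se_sample)
```

The first call reproduces the worked example (10 / 5 = 2). Because of the square root, quadrupling the sample size only halves the standard error.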

📊Mean Deviation

MD = Σ|xi - x̄| / n
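Average absolute deviation from the mean. This is the same quantity as the Mean Absolute Deviation (MAD) listed under Measures of Variability, under another name.
Example: For [2, 4, 6, 8, 10], MD = (4+2+0+2+4)/5 = 2.4, which is less sensitive to extreme values than the standard deviation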

📏Absolute Deviation

|xi - x̄|

Median Absolute Deviation = Median(|xi - Median|)
Absolute difference between each value and the mean (or median).
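Example: For [2, 4, 6, 8, 100], median = 6 and the absolute deviations are [4, 2, 0, 2, 94], so median absolute deviation = 2. This robust measure is barely affected by the outlier 100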
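
The median absolute deviation takes the median twice: once to find the center, and once over the absolute deviations from it. A minimal sketch, with an illustrative function name:

```python
import statistics


def median_absolute_deviation(data):
    """Median(|xi - Median|)."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])


clean = median_absolute_deviation([2, 4, 6, 8, 10])
with_outlier = median_absolute_deviation([2, 4, 6, 8, 100])
print(clean, with_outlier)
```

Both calls return 2: replacing 10 with 100 leaves the result unchanged, which illustrates the robustness of this measure.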

🎯Z-Score (Standardization)

z = (xi - x̄) / s

z = (xi - μ) / σ
Number of standard deviations a value is from the mean. Standardizes different scales.
Example: z = 2 means the value is 2 standard deviations above the mean
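
Standardizing a whole dataset applies the z-score formula to each value. The sketch below assumes the sample form (x̄ and s); `z_scores` is an illustrative name.

```python
import statistics


def z_scores(data):
    """(xi - x̄) / s for every value. Requires at least 2 values and s > 0."""
    center = statistics.mean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - center) / s for x in data]


scores = z_scores([2, 4, 6, 8, 10])
print([round(z, 3) for z in scores])
```

The value equal to the mean gets z = 0, and the scores sum to zero because deviations from the mean always cancel out.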

🚀 Quick Reference Guide

When to Use Sample vs Population

Use sample formulas (n-1) when data represents a subset. Use population formulas (N) when data includes entire population.

Choosing Central Tendency

Mean: Normal distribution. Median: Skewed data or outliers. Mode: Categorical data or most common value.

Variability Measures

Standard deviation: Most common. IQR: Resistant to outliers. Range: Simple but sensitive to extremes.

Distribution Shape

Skewness: Asymmetry direction. Kurtosis: Tail heaviness. Both help understand data distribution characteristics.

🔬 Advanced Statistical Concepts

Degrees of Freedom: In sample statistics, we use (n-1) instead of n to account for the constraint that deviations from the sample mean must sum to zero.

Robust Statistics: Median and IQR are robust measures less affected by outliers, while mean and standard deviation are sensitive to extreme values.

Standardization: Z-scores allow comparison of values from different distributions by expressing them in terms of standard deviations from the mean.

Distribution Properties: Normal distributions have skewness ≈ 0 and excess kurtosis ≈ 0. Deviations indicate non-normal characteristics.