Statistics Formulas Reference Guide

🎯 Complete Statistics Formula Collection

Descriptive Statistics: This reference guide contains all essential formulas for calculating basic and advanced descriptive statistics including central tendency, variability, distribution characteristics, and specialized measures.

Coverage: From basic calculations like mean and median to advanced measures like skewness, kurtosis, and coefficient of variation - everything you need for statistical analysis.

Applications: Perfect for students, researchers, data analysts, and anyone working with statistical data analysis and interpretation.

📝 Mathematical Notation Guide

x̄ = Sample mean

μ = Population mean

s = Sample standard deviation

σ = Population standard deviation

s² = Sample variance

σ² = Population variance

n = Sample size

N = Population size

Σ = Summation (sum over all values)

xi = Individual data value

f = Frequency

Q1, Q2, Q3 = Quartiles

📈 Basic Measures

🔢Count

n = Number of data values

The total number of observations or data points in the dataset.

Example: For data [2, 4, 6, 8, 10], count n = 5

➕Sum

Σx = x₁ + x₂ + x₃ + ... + xₙ

The total of all data values added together.

Example: For data [2, 4, 6, 8, 10], sum = 2+4+6+8+10 = 30

⬇️Minimum

Min = Smallest value in dataset

The lowest value present in the data collection.

Example: For data [2, 4, 6, 8, 10], minimum = 2

⬆️Maximum

Max = Largest value in dataset

The highest value present in the data collection.

Example: For data [2, 4, 6, 8, 10], maximum = 10

↔️Range

Range = Max - Min

The difference between the largest and smallest values in the dataset.

Example: For data [2, 4, 6, 8, 10], range = 10 - 2 = 8

🎯Midrange

Midrange = (Max + Min) / 2

The average of the maximum and minimum values.

Example: For data [2, 4, 6, 8, 10], midrange = (10 + 2) / 2 = 6
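
The six basic measures above can be computed in a few lines. The following is a minimal Python sketch; the function name `basic_measures` is a hypothetical helper chosen for illustration, not part of any library.

```python
def basic_measures(data):
    """Return count, sum, min, max, range, and midrange for a non-empty list."""
    if not data:
        raise ValueError("data must contain at least one value")
    lo, hi = min(data), max(data)
    return {
        "count": len(data),          # n
        "sum": sum(data),            # Σx
        "min": lo,
        "max": hi,
        "range": hi - lo,            # Max - Min
        "midrange": (hi + lo) / 2,   # (Max + Min) / 2
    }


summary = basic_measures([2, 4, 6, 8, 10])
print(summary)
```

For the running example [2, 4, 6, 8, 10] this reproduces the values shown in the entries above (n = 5, sum = 30, range = 8, midrange = 6).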

🎯 Central Tendency

📊Mean (Average)

x̄ = Σx / n

μ = Σx / N

The arithmetic average of all values. Use x̄ for sample mean, μ for population mean.

Example: For data [2, 4, 6, 8, 10], mean = 30/5 = 6

🎯Median

If n is odd: Median = x₍ₖ₎, where k = (n + 1)/2

If n is even: Median = (x₍ₖ₎ + x₍ₖ₊₁₎) / 2, where k = n/2

The middle value when data is arranged in ascending order.

Example: For [2, 4, 6, 8, 10], median = 6 (middle value). For [2, 4, 6, 8], median = (4 + 6) / 2 = 5

🔄Mode

Mode = Most frequently occurring value(s)

The value(s) that appear most often in the dataset. Can be unimodal, bimodal, or multimodal.

Example: For [1, 2, 2, 3, 4], mode = 2 (appears twice)

⚖️Weighted Mean

x̄w = Σ(xi × wi) / Σwi

Mean calculated when different values have different weights or importance.

Example: Values [80, 90] with weights [3, 2]: (80×3 + 90×2)/(3+2) = 84
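
As a rough illustration of the central tendency formulas, the sketch below uses Python's standard `statistics` module for mean, median, and mode. The `weighted_mean` function is a hypothetical helper written for this example, since the formula is short enough to code directly.

```python
import statistics

data = [2, 4, 6, 8, 10]

mean = statistics.mean(data)                    # Σx / n
median = statistics.median(data)                # middle value of the sorted data
modes = statistics.multimode([1, 2, 2, 3, 4])   # list of all most frequent values


def weighted_mean(values, weights):
    """Σ(xi × wi) / Σwi for two sequences of equal length."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


wm = weighted_mean([80, 90], [3, 2])
print(mean, median, modes, wm)
```

`statistics.multimode` returns a list, so bimodal and multimodal data are reported with every tied value rather than only one of them.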

📊 Measures of Variability

📈Sample Variance

s² = Σ(xi - x̄)² / (n - 1)

Measures the average squared deviation from the sample mean. Uses n-1 for unbiased estimation.

Example: For sample [2, 4, 6, 8, 10], x̄ = 6 and Σ(xi - x̄)² = 16 + 4 + 0 + 4 + 16 = 40, so s² = 40 / (5 - 1) = 10

📊Population Variance

σ² = Σ(xi - μ)² / N

Measures the average squared deviation from the population mean. Uses N for entire population.

Example: For population [2, 4, 6, 8, 10], μ = 6 and Σ(xi - μ)² = 40, so σ² = 40 / 5 = 8

📏Sample Standard Deviation

s = √[Σ(xi - x̄)² / (n - 1)]

Square root of sample variance. Measures spread in same units as original data.

Example: If sample variance = 16, then standard deviation = √16 = 4

📐Population Standard Deviation

σ = √[Σ(xi - μ)² / N]

Square root of population variance. Measures spread for entire population.

Example: If population variance = 25, then standard deviation = √25 = 5
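
The only difference between the sample and population formulas is the divisor. The sketch below makes that explicit with a hypothetical helper, `variance_by_formula`, and cross-checks it against the standard `statistics` module, whose `variance`/`stdev` functions use n - 1 and whose `pvariance`/`pstdev` functions use N.

```python
import math
import statistics


def variance_by_formula(data, sample=True):
    """Sum of squared deviations divided by (n - 1) for a sample, or N for a population."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # Σ(xi - mean)²
    return ss / (n - 1 if sample else n)


data = [2, 4, 6, 8, 10]

s2 = variance_by_formula(data, sample=True)       # sample variance
sigma2 = variance_by_formula(data, sample=False)  # population variance
s = math.sqrt(s2)                                 # sample standard deviation
sigma = math.sqrt(sigma2)                         # population standard deviation

print(s2, sigma2, round(s, 4), round(sigma, 4))
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as n grows.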

📊Mean Absolute Deviation

MAD = Σ|xi - x̄| / n

Average of absolute deviations from the mean. Less sensitive to outliers than standard deviation.

Example: For data [2, 4, 6, 8, 10], x̄ = 6, so MAD = (4 + 2 + 0 + 2 + 4) / 5 = 2.4

🎯Coefficient of Variation

CV = (s / x̄) × 100%

CV = (σ / μ) × 100%

Relative measure of variability. Expresses standard deviation as percentage of mean.

Example: If mean = 50 and std dev = 10, then CV = (10/50) × 100% = 20%
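
A short sketch of the last two measures, assuming plain Python lists as input. Both function names are hypothetical helpers for illustration. The coefficient of variation is only meaningful for ratio-scale data with a non-zero mean, so the helper rejects a zero mean.

```python
def mean_absolute_deviation(data):
    """Σ|xi - mean| / n."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


mad = mean_absolute_deviation([2, 4, 6, 8, 10])
cv = coefficient_of_variation(sd=10, mean=50)
print(mad, cv)
```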

📍 Position Measures

🔢Percentiles

Position = (P/100) × (n + 1)

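Where P = desired percentile and Position = 1-based rank in the sorted data. This (n + 1) rule is one of several common conventions, so software packages may return slightly different values.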

Values below which a certain percentage of data falls. 50th percentile = median.

Example: For 25th percentile with n=20: Position = (25/100) × 21 = 5.25, so the value lies a quarter of the way from the 5th to the 6th sorted value

📊First Quartile (Q1)

Q1 = 25th percentile

Position = 0.25 × (n + 1)

Value below which 25% of the data falls. Lower quartile.

Example: Q1 divides the lower 25% from upper 75% of data

🎯Second Quartile (Q2)

Q2 = 50th percentile = Median

Position = 0.50 × (n + 1)

Middle value that divides data into two equal halves.

Example: Q2 is the same as the median of the dataset

📈Third Quartile (Q3)

Q3 = 75th percentile

Position = 0.75 × (n + 1)

Value below which 75% of the data falls. Upper quartile.

Example: Q3 divides the lower 75% from upper 25% of data

📏Interquartile Range (IQR)

IQR = Q3 - Q1

Range of the middle 50% of data. Measure of spread resistant to outliers.

Example: If Q3 = 80 and Q1 = 60, then IQR = 80 - 60 = 20

⚠️Outliers Detection

Lower fence = Q1 - 1.5 × IQR

Upper fence = Q3 + 1.5 × IQR

Values beyond these fences are considered potential outliers.

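Example: If Q1 = 60 and Q3 = 80, then IQR = 20, lower fence = 60 - 30 = 30, and upper fence = 80 + 30 = 110; any value < 30 or > 110 is flagged as a potential outlier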
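
The position formulas above can be turned into values as sketched below. Two things here are assumptions rather than part of the original formulas: fractional positions are resolved by linear interpolation between neighbouring sorted values, and positions that fall outside 1..n are clamped to the minimum or maximum. The function names are hypothetical.

```python
def percentile(data, p):
    """Value at the p-th percentile using Position = (p / 100) × (n + 1)."""
    if not data:
        raise ValueError("data must contain at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(data)
    n = len(values)
    pos = (p / 100) * (n + 1)     # 1-based rank, possibly fractional
    if pos <= 1:
        return values[0]          # clamp below the first value (assumption)
    if pos >= n:
        return values[-1]         # clamp above the last value (assumption)
    lower = int(pos)              # rank of the value just below the position
    frac = pos - lower
    # Linear interpolation between ranks `lower` and `lower + 1` (assumption)
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])


def tukey_fences(data, k=1.5):
    """Return (lower fence, upper fence) based on Q1, Q3, and the IQR."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 100]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
low, high = tukey_fences(data)
outliers = [x for x in data if x < low or x > high]
print(q1, q2, q3, low, high, outliers)
```

Python's built-in `statistics.quantiles(data, n=4, method="exclusive")` is documented as using the same (n + 1) positioning, so it can serve as a cross-check for quartiles.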

📐 Distribution Shape

↗️Skewness

Skewness = [n/((n-1)(n-2))] × Σ[(xi-x̄)/s]³

Measures asymmetry of distribution. Positive = right-skewed, Negative = left-skewed, Zero = symmetric.

Interpretation: |Skewness| < 0.5 = approximately symmetric, 0.5-1 = moderately skewed, >1 = highly skewed

⛰️Kurtosis

Kurtosis = [n(n+1)/((n-1)(n-2)(n-3))] × Σ[(xi-x̄)/s]⁴ - 3(n-1)²/((n-2)(n-3))

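Measures tail heaviness of a distribution relative to the normal distribution. This bias-corrected sample formula already subtracts the normal baseline, so it returns excess kurtosis directly.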

Interpretation: Kurtosis = 0 (normal), > 0 (heavy tails), < 0 (light tails)

📊Excess Kurtosis

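Excess Kurtosis = Raw Kurtosis - 3

Kurtosis relative to the normal distribution, whose raw (Pearson) kurtosis equals 3. Apply this subtraction only to raw kurtosis; the sample formula in the Kurtosis entry is already in excess form and needs no further adjustment.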

Interpretation: Excess = 0 (normal), > 0 (leptokurtic), < 0 (platykurtic)
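
The two shape formulas can be sketched in Python as follows. The function names are hypothetical. Both use the sample standard deviation s (n - 1 divisor), as in the formulas above, and the kurtosis function returns the excess form, so a normal sample is expected to score near 0.

```python
import statistics


def sample_skewness(data):
    """[n / ((n-1)(n-2))] × Σ((xi - mean) / s)³, requires n >= 3."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis, requires n >= 4."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction


symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 2, 3, 4, 20]
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed))
```

For the evenly spaced sample the skewness is zero and the excess kurtosis is negative (lighter tails than a normal distribution), while the sample with one large value has positive skewness.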

🔬 Advanced Measures

🔢Sum of Squares

SS = Σ(xi - x̄)²

SS = Σxi² - (Σxi)²/n (equivalent computational form)

Sum of squared deviations from the mean. Used in variance calculations.

Example: For data [2, 4, 6, 8, 10], SS = 16 + 4 + 0 + 4 + 16 = 40, or equivalently 220 - 900/5 = 40

📐Root Mean Square (RMS)

RMS = √(Σxi² / n)

Square root of the arithmetic mean of squares. Useful for measuring magnitude.

Example: For [3, 4, 5], RMS = √((9+16+25)/3) = √(50/3) ≈ 4.08
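
The sketch below checks that the two sum-of-squares forms agree and computes the RMS example. All three function names are hypothetical helpers. With floating-point data whose mean is large relative to its spread, the computational form can lose precision through cancellation, so the deviation form is usually the safer choice in code.

```python
import math


def sum_of_squares(data):
    """Definitional form: Σ(xi - mean)²."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sum_of_squares_shortcut(data):
    """Computational form: Σxi² - (Σxi)² / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


def root_mean_square(data):
    """√(Σxi² / n)."""
    return math.sqrt(sum(x * x for x in data) / len(data))


data = [2, 4, 6, 8, 10]
ss_def = sum_of_squares(data)
ss_short = sum_of_squares_shortcut(data)
rms = root_mean_square([3, 4, 5])
print(ss_def, ss_short, round(rms, 2))
```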

⚠️Standard Error of Mean

SE = s / √n

SE = σ / √n (when the population standard deviation σ is known; n is still the sample size)

Standard deviation of the sampling distribution of the sample mean.

Example: If s = 10 and n = 25, then SE = 10/√25 = 10/5 = 2
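
A brief sketch of the standard error calculation, assuming the usual case where σ is unknown and s is estimated from the sample. Both function names are hypothetical.

```python
import math
import statistics


def standard_error_from_sd(sd, n):
    """sd / √n, where n is the sample size."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


def standard_error(data):
    """Standard error of the mean estimated from raw sample data."""
    return standard_error_from_sd(statistics.stdev(data), len(data))


se_example = standard_error_from_sd(sd=10, n=25)
se_data = standard_error([2, 4, 6, 8, 10])
print(se_example, round(se_data, 4))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.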

📊Mean Deviation

MD = Σ|xi - x̄| / n

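Average absolute deviation from the mean. This is the same quantity as the Mean Absolute Deviation (MAD) listed under Measures of Variability, and is less sensitive to extreme values than standard deviation.

Example: For data [2, 4, 6, 8, 10], MD = (4 + 2 + 0 + 2 + 4) / 5 = 2.4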

📏Absolute Deviation

|xi - x̄|

Median Absolute Deviation = Median(|xi - Median|)

Absolute difference between each value and the mean (or median).

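Example: For [2, 4, 6, 8, 100], the median is 6 and the median absolute deviation is Median(4, 2, 0, 2, 94) = 2, a robust measure that the outlier 100 barely affects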
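
To illustrate why the median absolute deviation is called robust, the sketch below compares it with the mean absolute deviation on the same data with and without an extreme value. Both function names are hypothetical helpers.

```python
import statistics


def median_absolute_deviation(data):
    """Median(|xi - Median|)."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


def mean_absolute_deviation(data):
    """Σ|xi - mean| / n."""
    mean = statistics.fmean(data)
    return sum(abs(x - mean) for x in data) / len(data)


clean = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]

for label, values in (("clean", clean), ("with outlier", with_outlier)):
    print(label,
          median_absolute_deviation(values),
          round(mean_absolute_deviation(values), 2))
```

Replacing 10 with 100 leaves the median absolute deviation unchanged, while the mean absolute deviation grows by more than a factor of ten.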

🎯Z-Score (Standardization)

z = (xi - x̄) / s

z = (xi - μ) / σ

Number of standard deviations a value is from the mean. Standardizes different scales.

Example: z = 2 means the value is 2 standard deviations above the mean
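
Standardizing a whole dataset can be sketched as below. The `z_scores` function is a hypothetical helper; its `sample` flag chooses between the s and σ versions of the formula.

```python
import statistics


def z_scores(data, sample=True):
    """Return (xi - mean) / sd for every value in data."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / sd for x in data]


z = z_scores([2, 4, 6, 8, 10])
print([round(v, 3) for v in z])
```

After standardization the values have mean 0 and standard deviation 1, which is what makes scores from different scales comparable.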

🚀 Quick Reference Guide

When to Use Sample vs Population

Use sample formulas (n-1) when data represents a subset. Use population formulas (N) when data includes entire population.

Choosing Central Tendency

Mean: Roughly symmetric data without extreme outliers. Median: Skewed data or outliers. Mode: Categorical data or most common value.

Variability Measures

Standard deviation: Most common. IQR: Resistant to outliers. Range: Simple but sensitive to extremes.

Distribution Shape

Skewness: Asymmetry direction. Kurtosis: Tail heaviness. Both help understand data distribution characteristics.

🔬 Advanced Statistical Concepts

Degrees of Freedom: In sample statistics, we use (n-1) instead of n to account for the constraint that deviations from the sample mean must sum to zero.

Robust Statistics: Median and IQR are robust measures less affected by outliers, while mean and standard deviation are sensitive to extreme values.

Standardization: Z-scores allow comparison of values from different distributions by expressing them in terms of standard deviations from the mean.

Distribution Properties: Normal distributions have skewness ≈ 0 and excess kurtosis ≈ 0. Deviations indicate non-normal characteristics.