Descriptive Statistics
Numra’s numra-stats crate provides a comprehensive set of descriptive
statistics functions for summarizing data. All functions are generic over
S: Scalar (f32 or f64) and return Result types for proper error handling.
Central Tendency
Mean
The arithmetic mean (average) of the values:
use numra::stats::mean;
let data = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
let m = mean(&data).unwrap();
assert!((m - 3.0).abs() < 1e-14);
Median
The middle value of the sorted data. For even-length data, it is the average of the two central values.
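The odd/even rule described here can be sketched in plain Rust. This is a minimal illustration using only the standard library; `median_sketch` is a hypothetical helper, not numra's implementation, and it returns `Option` rather than the crate's `Result` type.

```rust
/// Median of a slice, or `None` if the slice is empty.
/// Hypothetical helper for illustration only; not part of numra.
fn median_sketch(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        // Odd length: the single middle element.
        Some(sorted[mid])
    } else {
        // Even length: average of the two central elements.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

Sorting a copy costs an allocation but leaves the input untouched, which matches the summary table's note that the median sorts internally.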
use numra::stats::median;
// Odd number of values
let data = vec![3.0_f64, 1.0, 2.0];
assert!((median(&data).unwrap() - 2.0).abs() < 1e-14);
// Even number of values: average of middle two
let data = vec![1.0_f64, 2.0, 3.0, 4.0];
assert!((median(&data).unwrap() - 2.5).abs() < 1e-14);
Percentile
The $p$-th percentile is the value below which $p\%$ of the data falls. The function interpolates linearly between adjacent data points.
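Linear interpolation between adjacent points can be sketched as follows. The sketch assumes the fractional rank is $p/100 \cdot (n - 1)$ over the sorted data, which is one common convention; numra's exact convention is not stated here and may differ. `percentile_sketch` is a hypothetical name.

```rust
/// Percentile by linear interpolation between the two nearest sorted values.
/// Assumes rank = p / 100 * (n - 1); hypothetical helper, not numra's code.
fn percentile_sketch(data: &[f64], p: f64) -> Option<f64> {
    // Reject empty input and any p outside [0, 100] (including NaN).
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional position of the requested percentile in the sorted data.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    // Blend the two neighbours; when rank is an integer, lo == hi and frac == 0.
    Some(sorted[lo] + frac * (sorted[hi] - sorted[lo]))
}
```

Under this convention the 0th and 100th percentiles are the minimum and maximum, and the 50th percentile coincides with the median.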
use numra::stats::percentile;
let data = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
// Boundary values
assert!((percentile(&data, 0.0).unwrap() - 1.0).abs() < 1e-14); // minimum
assert!((percentile(&data, 100.0).unwrap() - 5.0).abs() < 1e-14); // maximum
// 50th percentile = median
assert!((percentile(&data, 50.0).unwrap() - 3.0).abs() < 1e-14);
// Quartiles
let q1 = percentile(&data, 25.0).unwrap();
let q3 = percentile(&data, 75.0).unwrap();
let iqr = q3 - q1; // interquartile range
println!("IQR = {}", iqr);
Computing the IQR
The interquartile range (IQR), $Q_3 - Q_1$, measures statistical dispersion as the spread of the middle 50% of the data. Unlike variance, it is robust to outliers.
use numra::stats::percentile;
let data = vec![2.0_f64, 7.0, 3.0, 12.0, 5.0, 8.0, 4.0, 6.0, 9.0, 1.0];
let q1 = percentile(&data, 25.0).unwrap();
let q3 = percentile(&data, 75.0).unwrap();
let iqr = q3 - q1;
// Outlier detection: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
let lower = q1 - 1.5 * iqr;
let upper = q3 + 1.5 * iqr;
let outliers: Vec<f64> = data.iter()
    .filter(|&&x| x < lower || x > upper)
    .copied()
    .collect();
println!("Outliers: {:?}", outliers);
Dispersion
Variance
Sample variance with Bessel’s correction (divides by $n - 1$ rather than $n$):
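As a rough illustration of Bessel's correction, the two-pass computation below sums squared deviations from the mean and divides by $n - 1$. `sample_variance_sketch` is a hypothetical helper written against the standard library only; it is not numra's implementation.

```rust
/// Sample variance: sum of squared deviations divided by (n - 1).
/// Returns `None` for fewer than two samples. Hypothetical helper.
fn sample_variance_sketch(data: &[f64]) -> Option<f64> {
    let n = data.len();
    if n < 2 {
        return None;
    }
    // First pass: the mean.
    let mean = data.iter().sum::<f64>() / n as f64;
    // Second pass: squared deviations from the mean.
    let sum_sq: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    // Bessel's correction: divide by n - 1 instead of n.
    Some(sum_sq / (n - 1) as f64)
}
```

For the data set `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7.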
use numra::stats::variance;
let data = vec![2.0_f64, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
let v = variance(&data).unwrap();
assert!((v - 4.571428571428571).abs() < 1e-10);
Standard Deviation
The square root of the sample variance:
use numra::stats::{std_dev, variance};
let data = vec![2.0_f64, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
let s = std_dev(&data).unwrap();
let v = variance(&data).unwrap();
// std_dev is the square root of variance
assert!((s * s - v).abs() < 1e-12);
Skewness
Fisher’s adjusted skewness measures the asymmetry of the distribution:
| Skewness | Interpretation |
|---|---|
| $\approx 0$ | Symmetric distribution |
| $> 0$ | Right-skewed (long right tail) |
| $< 0$ | Left-skewed (long left tail) |
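
One common form of adjusted skewness is the adjusted Fisher-Pearson coefficient, $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, \frac{m_3}{m_2^{3/2}}$, where $m_k$ is the $k$-th biased central moment. The sketch below assumes that form; whether numra uses exactly this estimator is an assumption, and `adjusted_skewness_sketch` is a hypothetical name.

```rust
/// Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^(3/2).
/// Assumed estimator for illustration; hypothetical helper, not numra's code.
fn adjusted_skewness_sketch(data: &[f64]) -> Option<f64> {
    let n = data.len();
    if n < 3 {
        return None; // the (n - 2) factor needs at least three samples
    }
    let nf = n as f64;
    let mean = data.iter().sum::<f64>() / nf;
    // Biased (divide-by-n) second and third central moments.
    let m2 = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / nf;
    let m3 = data.iter().map(|x| (x - mean).powi(3)).sum::<f64>() / nf;
    if m2 == 0.0 {
        return None; // constant data: skewness is undefined
    }
    let g1 = m3 / m2.powf(1.5);
    Some((nf * (nf - 1.0)).sqrt() / (nf - 2.0) * g1)
}
```

The sign comes entirely from the third central moment, which is why a long right tail gives a positive value and a long left tail a negative one.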
use numra::stats::skewness;
// Symmetric data: skewness near 0
let symmetric = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
assert!(skewness(&symmetric).unwrap().abs() < 1e-10);
// Right-skewed data
let right_skew = vec![1.0_f64, 1.5, 2.0, 2.5, 3.0, 10.0, 20.0];
assert!(skewness(&right_skew).unwrap() > 0.0);
Kurtosis
Fisher’s excess kurtosis measures the “tailedness” of the distribution relative to a normal distribution (which has excess kurtosis of 0):
| Kurtosis | Interpretation |
|---|---|
| $\approx 0$ | Mesokurtic (normal-like tails) |
| $> 0$ | Leptokurtic (heavy tails, sharp peak) |
| $< 0$ | Platykurtic (light tails, flat peak) |
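
A widely used bias-adjusted estimator of excess kurtosis is $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\,g_2 + 6\right]$ with $g_2 = m_4 / m_2^2 - 3$, where $m_k$ is the $k$-th biased central moment. The sketch below assumes that form; numra's exact estimator is not confirmed here, and `excess_kurtosis_sketch` is a hypothetical name.

```rust
/// Adjusted excess kurtosis G2 = (n-1) / ((n-2)(n-3)) * ((n+1) * g2 + 6),
/// where g2 = m4 / m2^2 - 3. Assumed estimator; hypothetical helper.
fn excess_kurtosis_sketch(data: &[f64]) -> Option<f64> {
    let n = data.len();
    if n < 4 {
        return None; // the (n - 3) factor needs at least four samples
    }
    let nf = n as f64;
    let mean = data.iter().sum::<f64>() / nf;
    // Biased (divide-by-n) second and fourth central moments.
    let m2 = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / nf;
    let m4 = data.iter().map(|x| (x - mean).powi(4)).sum::<f64>() / nf;
    if m2 == 0.0 {
        return None; // constant data: kurtosis is undefined
    }
    // Subtracting 3 makes a normal distribution score 0.
    let g2 = m4 / (m2 * m2) - 3.0;
    Some((nf - 1.0) / ((nf - 2.0) * (nf - 3.0)) * ((nf + 1.0) * g2 + 6.0))
}
```

For equally spaced values such as `0, 1, ..., 99`, this estimator evaluates to $-6/5$, in line with the negative excess kurtosis expected for uniform-like data.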
use numra::stats::kurtosis;
// Uniform-like data has negative excess kurtosis
let data: Vec<f64> = (0..100).map(|i| i as f64).collect();
let k = kurtosis(&data).unwrap();
assert!(k < 0.0); // approximately -1.2 for uniform
println!("Excess kurtosis: {:.4}", k);
Covariance
Sample Covariance
Measures the joint variability of two data sets:
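For reference, sample covariance is $\frac{1}{n-1}\sum_{i}(x_i - \bar{x})(y_i - \bar{y})$. A minimal standard-library sketch of that formula follows; `sample_covariance_sketch` is a hypothetical helper, not numra's implementation.

```rust
/// Sample covariance with the (n - 1) divisor.
/// Returns `None` if the lengths differ or there are fewer than two pairs.
/// Hypothetical helper for illustration only.
fn sample_covariance_sketch(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let nf = n as f64;
    let mean_x = x.iter().sum::<f64>() / nf;
    let mean_y = y.iter().sum::<f64>() / nf;
    // Sum of products of paired deviations from each mean.
    let sum: f64 = x
        .iter()
        .zip(y)
        .map(|(a, b)| (a - mean_x) * (b - mean_y))
        .sum();
    Some(sum / (nf - 1.0))
}
```

Because each deviation in `y = 2x` is twice the matching deviation in `x`, the covariance of `x` with `2x` is twice the variance of `x`.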
use numra::stats::{covariance, variance};
let x = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.0, 6.0, 8.0, 10.0]; // y = 2x
let cov = covariance(&x, &y).unwrap();
// Cov(x, 2x) = 2 * Var(x)
let var_x = variance(&x).unwrap();
assert!((cov - 2.0 * var_x).abs() < 1e-12);
// Covariance of a variable with itself = variance
assert!((covariance(&x, &x).unwrap() - var_x).abs() < 1e-12);
Covariance Matrix
For $p$ variables, the covariance matrix is a symmetric $p \times p$ matrix where entry $(i, j)$ is $\mathrm{Cov}(X_i, X_j)$. The diagonal contains the variances.
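The layout can be sketched with a flat, row-major vector in which entry $(i, j)$ lives at index `i * p + j`. The function below is a hypothetical standard-library illustration of that layout, not numra's implementation; it computes only the upper triangle and mirrors it.

```rust
/// Covariance matrix of `p` equal-length variables as a flat row-major Vec
/// of length p * p. Hypothetical helper for illustration only.
fn covariance_matrix_sketch(vars: &[Vec<f64>]) -> Option<Vec<f64>> {
    let p = vars.len();
    let n = vars.first()?.len(); // `None` when no variables are given
    if n < 2 || vars.iter().any(|v| v.len() != n) {
        return None;
    }
    let means: Vec<f64> = vars
        .iter()
        .map(|v| v.iter().sum::<f64>() / n as f64)
        .collect();
    let mut cov = vec![0.0; p * p];
    for i in 0..p {
        for j in i..p {
            let s: f64 = vars[i]
                .iter()
                .zip(&vars[j])
                .map(|(a, b)| (a - means[i]) * (b - means[j]))
                .sum();
            let c = s / (n - 1) as f64;
            // Fill both (i, j) and (j, i): the matrix is symmetric.
            cov[i * p + j] = c;
            cov[j * p + i] = c;
        }
    }
    Some(cov)
}
```

For two variables the flat vector reads `[Var(x), Cov(x, y), Cov(y, x), Var(y)]`, so indices 0 and 3 are the diagonal.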
use numra::stats::{covariance_matrix, variance};
let x = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
let y = vec![5.0_f64, 4.0, 3.0, 2.0, 1.0]; // inversely correlated with x
let cov = covariance_matrix(&[x.clone(), y.clone()]).unwrap();
assert_eq!(cov.len(), 4); // 2x2 matrix, row-major
// Diagonal = variances
assert!((cov[0] - variance(&x).unwrap()).abs() < 1e-12); // Var(x)
assert!((cov[3] - variance(&y).unwrap()).abs() < 1e-12); // Var(y)
// Off-diagonal: negative (inversely correlated)
assert!(cov[1] < 0.0);
// Symmetric
assert!((cov[1] - cov[2]).abs() < 1e-12);
Summary Statistics Table
| Function | Formula | Min Samples | Notes |
|---|---|---|---|
| mean | $\frac{1}{n}\sum_{i} x_i$ | 1 | Arithmetic mean |
| median | Middle value | 1 | Sorts internally |
| percentile(p) | Linear interpolation | 1 | |
| variance | $\frac{1}{n-1}\sum_{i}(x_i - \bar{x})^2$ | 2 | Bessel’s correction |
| std_dev | $\sqrt{\text{variance}}$ | 2 | Sample std dev |
| skewness | Adjusted third moment | 3 | Fisher’s definition |
| kurtosis | Adjusted fourth moment | 4 | Fisher’s excess kurtosis |
| covariance | $\frac{1}{n-1}\sum_{i}(x_i - \bar{x})(y_i - \bar{y})$ | 2 | Requires equal lengths |
| covariance_matrix | $p \times p$ matrix | 2 | Row-major storage |
Error Handling
All functions return Result<_, StatsError>. Common error cases:
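In application code it is usually preferable to match on the `Result` or propagate it with `?` rather than call `unwrap()`. The sketch below shows that pattern with a stand-in error type: `SketchError` and its variants are hypothetical, since the variants of numra's `StatsError` are not listed here and may differ.

```rust
use std::fmt;

/// Hypothetical stand-in for an error type; NOT numra's `StatsError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    EmptyData,
    TooFewSamples { needed: usize, got: usize },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::EmptyData => write!(f, "input is empty"),
            SketchError::TooFewSamples { needed, got } => {
                write!(f, "need at least {} samples, got {}", needed, got)
            }
        }
    }
}

fn mean_sketch(data: &[f64]) -> Result<f64, SketchError> {
    if data.is_empty() {
        return Err(SketchError::EmptyData);
    }
    Ok(data.iter().sum::<f64>() / data.len() as f64)
}

fn variance_sketch(data: &[f64]) -> Result<f64, SketchError> {
    // `?` forwards EmptyData from mean_sketch to our caller.
    let mean = mean_sketch(data)?;
    if data.len() < 2 {
        return Err(SketchError::TooFewSamples { needed: 2, got: data.len() });
    }
    let sum_sq: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    Ok(sum_sq / (data.len() - 1) as f64)
}

/// Turns either outcome into a message instead of panicking.
fn describe(data: &[f64]) -> String {
    match variance_sketch(data) {
        Ok(v) => format!("variance = {:.4}", v),
        Err(e) => format!("cannot summarize: {}", e),
    }
}
```

The same `match` or `?` structure should carry over to the real functions, with `StatsError` in place of the stand-in type. The assertions that follow list the error cases themselves.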
use numra::stats::{mean, variance, median, percentile, covariance};
// Empty data
assert!(mean::<f64>(&[]).is_err());
assert!(variance::<f64>(&[]).is_err());
assert!(median::<f64>(&[]).is_err());
// Variance needs at least 2 data points
assert!(variance(&[1.0_f64]).is_err());
// Percentile must be in [0, 100]
let data = vec![1.0_f64, 2.0, 3.0];
assert!(percentile(&data, -1.0).is_err());
assert!(percentile(&data, 101.0).is_err());
// Covariance requires equal-length inputs
assert!(covariance(&[1.0, 2.0], &[1.0, 2.0, 3.0]).is_err());
Complete Example: Data Summary
use numra::stats::{mean, median, std_dev, variance, skewness, kurtosis, percentile};
fn main() {
    let data = vec![
        12.5, 14.2, 11.8, 13.1, 15.0, 12.9, 14.7, 11.3, 13.6, 14.1,
        12.2, 13.8, 15.5, 11.0, 14.9, 13.3, 12.7, 14.4, 13.0, 12.1,
    ];
    println!("=== Data Summary (n = {}) ===", data.len());
    println!("Mean: {:.4}", mean(&data).unwrap());
    println!("Median: {:.4}", median(&data).unwrap());
    println!("Std Dev: {:.4}", std_dev(&data).unwrap());
    println!("Variance: {:.4}", variance(&data).unwrap());
    println!("Skewness: {:.4}", skewness(&data).unwrap());
    println!("Kurtosis: {:.4}", kurtosis(&data).unwrap());
    let q1 = percentile(&data, 25.0).unwrap();
    let q3 = percentile(&data, 75.0).unwrap();
    println!("Q1: {:.4}", q1);
    println!("Q3: {:.4}", q3);
    println!("IQR: {:.4}", q3 - q1);
    println!("Min: {:.4}", percentile(&data, 0.0).unwrap());
    println!("Max: {:.4}", percentile(&data, 100.0).unwrap());
}