Hypothesis Testing

Numra provides a suite of hypothesis tests for common statistical inference tasks: t-tests for means, the chi-squared test for categorical data, the Kolmogorov-Smirnov test for distribution fitting, and one-way ANOVA for comparing group means.

All hypothesis tests return a TestResult<S> struct:

```rust
pub struct TestResult<S: Scalar> {
    pub statistic: S, // test statistic (t, chi2, D, F, ...)
    pub p_value: S,   // p-value
    pub reject: bool, // whether to reject H0 at the given alpha
}
```

The decision rule: reject $H_0$ when `p_value < alpha`.

The one-sample t-test (`ttest_1samp`) tests whether the population mean equals a hypothesized value $\mu_0$.

Hypotheses:

  • $H_0: \mu = \mu_0$
  • $H_1: \mu \neq \mu_0$ (two-tailed)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

with $n - 1$ degrees of freedom.
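As a sanity check on the formula, the statistic can be computed by hand in plain Rust, independent of the library (the sample values and $\mu_0 = 10$ below are made up for illustration):

```rust
fn main() {
    // Hypothetical sample, tested against mu0 = 10
    let data = [10.0_f64, 11.0, 12.0, 10.5, 11.5];
    let mu0 = 10.0;
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    // Sample standard deviation with n - 1 in the denominator
    let s = (data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    let t = (mean - mu0) / (s / n.sqrt());
    println!("t = {t:.4}"); // t = 2.8284 (exactly 2 * sqrt(2)), with 4 degrees of freedom
}
```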

```rust
use numra::stats::ttest_1samp;

// Data centered around 0 -- should not reject H0: mean = 0
let data = vec![-1.0_f64, -0.5, 0.0, 0.5, 1.0];
let result = ttest_1samp(&data, 0.0, 0.05).unwrap();
assert!(!result.reject);
assert!(result.statistic.abs() < 1e-12); // t = 0 exactly

// Data clearly not centered at 0
let data = vec![10.0_f64, 11.0, 12.0, 10.5, 11.5];
let result = ttest_1samp(&data, 0.0, 0.05).unwrap();
assert!(result.reject);
println!("t = {:.4}, p = {:.6}", result.statistic, result.p_value);
```

The independent two-sample t-test (`ttest_ind`) compares the means of two independent groups without assuming equal variances (Welch's t-test).

Hypotheses:

  • $H_0: \mu_1 = \mu_2$
  • $H_1: \mu_1 \neq \mu_2$

Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

Degrees of freedom are estimated using the Welch-Satterthwaite approximation:

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
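To make the approximation concrete, here is a small plain-Rust sketch (independent of the library) that evaluates $\nu$ for the two groups used below; with equal variances and equal sizes it reduces to $n_1 + n_2 - 2$:

```rust
// Sample variance with n - 1 in the denominator
fn var(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let m = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0)
}

fn main() {
    let a = [1.0_f64, 2.0, 3.0, 4.0, 5.0];
    let b = [1.5_f64, 2.5, 3.5, 4.5, 5.5];
    let (n1, n2) = (a.len() as f64, b.len() as f64);
    let (v1, v2) = (var(&a) / n1, var(&b) / n2); // the s_i^2 / n_i terms
    // Welch-Satterthwaite approximation
    let nu = (v1 + v2).powi(2) / (v1.powi(2) / (n1 - 1.0) + v2.powi(2) / (n2 - 1.0));
    println!("nu = {nu:.1}"); // nu = 8.0, i.e. n1 + n2 - 2 here
}
```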
```rust
use numra::stats::ttest_ind;

let group_a = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
let group_b = vec![1.5_f64, 2.5, 3.5, 4.5, 5.5];
let result = ttest_ind(&group_a, &group_b, 0.05).unwrap();
// Small difference, small sample: should not reject
assert!(!result.reject);
println!("t = {:.4}, p = {:.4}", result.statistic, result.p_value);
```

```rust
use numra::stats::ttest_ind;

let control = vec![5.0_f64, 5.5, 4.8, 5.2, 5.1, 4.9, 5.3];
let treatment = vec![8.0, 8.5, 7.8, 8.2, 8.1, 7.9, 8.3];
let result = ttest_ind(&control, &treatment, 0.05).unwrap();
assert!(result.reject); // significant difference
println!("t = {:.4}, p = {:.2e}", result.statistic, result.p_value);
```

The paired t-test (`ttest_rel`) compares two related measurements on the same subjects (before/after, left/right, etc.).

Hypotheses:

  • $H_0: \mu_d = 0$ (mean difference is zero)
  • $H_1: \mu_d \neq 0$

Internally, this computes the differences and applies a one-sample t-test with $\mu_0 = 0$.
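That reduction can be seen directly by computing the pairwise differences and applying the one-sample formula with $\mu_0 = 0$; the numbers below are made up for illustration:

```rust
fn main() {
    let before = [12.0_f64, 15.0, 11.0, 14.0, 13.0];
    let after = [10.0_f64, 12.0, 10.0, 12.0, 11.0];
    // Pairwise differences: [2, 3, 1, 2, 2]
    let d: Vec<f64> = before.iter().zip(&after).map(|(b, a)| b - a).collect();
    let n = d.len() as f64;
    let mean = d.iter().sum::<f64>() / n;
    let s = (d.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    // One-sample t statistic on the differences, mu0 = 0
    let t = mean / (s / n.sqrt());
    println!("t = {t:.4}"); // t = 6.3246 (exactly 2 * sqrt(10))
}
```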

```rust
use numra::stats::ttest_rel;

let before = vec![200.0_f64, 220.0, 190.0, 210.0, 230.0];
let after = vec![195.0, 216.0, 184.0, 205.0, 226.0];
let result = ttest_rel(&before, &after, 0.05).unwrap();
// Consistent decrease of roughly 5 units: should reject
// (the differences vary slightly, so the t statistic stays finite)
assert!(result.reject);
println!("Paired t = {:.4}, p = {:.4}", result.statistic, result.p_value);
```
| Test | Function | Inputs | Pairing | Use case |
| --- | --- | --- | --- | --- |
| One-sample | `ttest_1samp` | One sample + $\mu_0$ | N/A | Is the mean a specific value? |
| Independent | `ttest_ind` | Two samples | No | Compare two group means |
| Paired | `ttest_rel` | Two samples | Yes | Before/after on same subjects |

The chi-squared goodness-of-fit test (`chi2_test`) tests whether observed categorical frequencies match expected frequencies.

Hypotheses:

  • $H_0$: Observed data follows the expected distribution
  • $H_1$: Observed data does not follow the expected distribution

Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

with $k - 1$ degrees of freedom.
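Evaluating the sum by hand for the fair-die counts used below shows how small the statistic is when the fit is good (a plain-Rust sketch, independent of the library):

```rust
fn main() {
    // 100 die rolls: observed counts per face vs a uniform expectation
    let observed = [18.0_f64, 16.0, 17.0, 15.0, 17.0, 17.0];
    let expected = 100.0 / 6.0;
    let chi2: f64 = observed.iter().map(|o| (o - expected).powi(2) / expected).sum();
    println!("chi2 = {chi2:.2}"); // chi2 = 0.32, far below the 5% critical value (~11.07) for 5 df
}
```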

```rust
use numra::stats::chi2_test;

// Fair die: observed vs expected (100 total rolls)
let observed = vec![18.0_f64, 16.0, 17.0, 15.0, 17.0, 17.0];
let expected = vec![100.0_f64 / 6.0; 6]; // 100/6 each
let result = chi2_test(&observed, &expected, 0.05).unwrap();
assert!(!result.reject); // good fit -- die appears fair
println!("chi2 = {:.4}, p = {:.4}", result.statistic, result.p_value);
```

```rust
use numra::stats::chi2_test;

// Heavily biased die: face 1 appears too often
let observed = vec![35.0_f64, 10.0, 12.0, 15.0, 14.0, 14.0];
let expected = vec![100.0_f64 / 6.0; 6];
let result = chi2_test(&observed, &expected, 0.05).unwrap();
assert!(result.reject); // significant deviation from uniform
```

The Kolmogorov-Smirnov test (`ks_test`) tests whether a sample comes from a specified continuous distribution by measuring the maximum distance between the empirical CDF and the theoretical CDF.

Hypotheses:

  • $H_0$: Sample is drawn from the specified distribution
  • $H_1$: Sample is not drawn from the specified distribution

Test statistic:

$$D = \sup_x |F_n(x) - F(x)|$$

where $F_n$ is the empirical CDF and $F$ is the theoretical CDF.
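For intuition, $D$ can be evaluated by hand on a tiny sorted sample against Uniform(0, 1), whose CDF is simply $F(x) = x$ (the sample is made up for illustration, and the code is independent of the library):

```rust
fn main() {
    // Tiny sorted sample tested against Uniform(0, 1)
    let xs = [0.1_f64, 0.4, 0.7];
    let n = xs.len() as f64;
    let f = |x: f64| x; // theoretical CDF, assumed Uniform(0, 1) here
    let mut d = 0.0_f64;
    for (i, &x) in xs.iter().enumerate() {
        let above = (i as f64 + 1.0) / n - f(x); // ECDF just at/after x minus F(x)
        let below = f(x) - i as f64 / n;         // F(x) minus ECDF just before x
        d = d.max(above).max(below);
    }
    println!("D = {d:.4}"); // D = 0.3000: largest gap is at x = 0.7, where F_n = 1 but F = 0.7
}
```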

```rust
use numra::stats::{ks_test, Normal, ContinuousDistribution};
use rand::SeedableRng;

// Generate data from N(0, 1)
let dist = Normal::<f64>::standard();
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
let data = dist.sample_n(&mut rng, 200);

// Test against N(0, 1) -- should not reject
let result = ks_test(&data, &dist, 0.05).unwrap();
assert!(!result.reject);
println!("D = {:.4}, p = {:.4}", result.statistic, result.p_value);
```

```rust
use numra::stats::{ks_test, Normal, Uniform, ContinuousDistribution};
use rand::SeedableRng;

// Generate uniform data but test against normal
let uniform = Uniform::new(0.0_f64, 1.0);
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
let data = uniform.sample_n(&mut rng, 200);
let normal = Normal::new(0.5, 0.3);
let result = ks_test(&data, &normal, 0.05).unwrap();
// Should reject: data is uniform, not normal
println!("D = {:.4}, p = {:.4}, reject = {}", result.statistic, result.p_value, result.reject);
```

One-way ANOVA (`anova_oneway`) tests whether the means of three or more groups are all equal.

Hypotheses:

  • $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$
  • $H_1$: At least one mean differs

Test statistic:

$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} = \frac{\text{SS}_{\text{between}} / (k-1)}{\text{SS}_{\text{within}} / (N-k)}$$
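Working the decomposition by hand for the three well-separated groups used below shows why $F$ ends up so large (a plain-Rust sketch, independent of the library):

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    let groups: [&[f64]; 3] = [&[1.0, 2.0, 3.0], &[10.0, 11.0, 12.0], &[20.0, 21.0, 22.0]];
    let all: Vec<f64> = groups.iter().flat_map(|g| g.iter().copied()).collect();
    let grand = mean(&all);
    let k = groups.len() as f64;
    let n_total = all.len() as f64;
    // Between-groups sum of squares: sum n_i (xbar_i - xbar)^2
    let ssb: f64 = groups.iter().map(|g| g.len() as f64 * (mean(g) - grand).powi(2)).sum();
    // Within-groups sum of squares: sum sum (x_ij - xbar_i)^2
    let ssw: f64 = groups
        .iter()
        .map(|g| { let m = mean(g); g.iter().map(|x| (x - m).powi(2)).sum::<f64>() })
        .sum();
    let f = (ssb / (k - 1.0)) / (ssw / (n_total - k));
    println!("F = {f:.1}"); // SSB = 542, SSW = 6, so F = 271.0
}
```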
```rust
use numra::stats::anova_oneway;

// Three groups with similar means -- should not reject
let g1 = vec![1.0_f64, 2.0, 3.0, 4.0, 5.0];
let g2 = vec![1.5, 2.5, 3.5, 4.5, 5.5];
let g3 = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let result = anova_oneway(&[&g1, &g2, &g3], 0.05).unwrap();
assert!(!result.reject);
```

```rust
use numra::stats::anova_oneway;

// Three clearly separated groups -- should reject
let g1 = vec![1.0_f64, 2.0, 3.0];
let g2 = vec![10.0, 11.0, 12.0];
let g3 = vec![20.0, 21.0, 22.0];
let result = anova_oneway(&[&g1, &g2, &g3], 0.05).unwrap();
assert!(result.reject);
println!("F = {:.4}, p = {:.2e}", result.statistic, result.p_value);
```
| Source | Sum of squares | df | Mean square |
| --- | --- | --- | --- |
| Between groups | $\sum n_i (\bar{x}_i - \bar{x})^2$ | $k - 1$ | $\text{SS}_B / (k-1)$ |
| Within groups | $\sum\sum (x_{ij} - \bar{x}_i)^2$ | $N - k$ | $\text{SS}_W / (N-k)$ |
| Total | $\sum\sum (x_{ij} - \bar{x})^2$ | $N - 1$ | |
| Question | Test | Function |
| --- | --- | --- |
| Is the mean a specific value? | One-sample t-test | `ttest_1samp` |
| Do two independent groups differ? | Welch's t-test | `ttest_ind` |
| Do paired measurements differ? | Paired t-test | `ttest_rel` |
| Do categorical frequencies match? | Chi-squared | `chi2_test` |
| Does data follow a distribution? | Kolmogorov-Smirnov | `ks_test` |
| Do 3+ group means differ? | One-way ANOVA | `anova_oneway` |

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one computed, assuming $H_0$ is true.

| p-value | Interpretation |
| --- | --- |
| $> 0.10$ | No evidence against $H_0$ |
| $0.05$–$0.10$ | Weak evidence against $H_0$ |
| $0.01$–$0.05$ | Moderate evidence against $H_0$ |
| $< 0.01$ | Strong evidence against $H_0$ |
| $< 0.001$ | Very strong evidence against $H_0$ |
A complete worked example: a Welch two-sample test on conversion rates from an A/B test.

```rust
use numra::stats::{ttest_ind, mean, std_dev};

fn main() {
    // Conversion rates (as proportions) from an A/B test
    let control = vec![
        0.12, 0.15, 0.11, 0.13, 0.14, 0.12, 0.13, 0.11,
        0.14, 0.12, 0.13, 0.15, 0.12, 0.14, 0.11, 0.13,
    ];
    let treatment = vec![
        0.15, 0.18, 0.14, 0.16, 0.17, 0.15, 0.16, 0.14,
        0.17, 0.15, 0.16, 0.18, 0.15, 0.17, 0.14, 0.16,
    ];

    let m_c = mean(&control).unwrap();
    let m_t = mean(&treatment).unwrap();
    let s_c = std_dev(&control).unwrap();
    let s_t = std_dev(&treatment).unwrap();
    println!("Control:   mean = {:.4}, std = {:.4}", m_c, s_c);
    println!("Treatment: mean = {:.4}, std = {:.4}", m_t, s_t);
    println!("Lift: {:.1}%", (m_t - m_c) / m_c * 100.0);

    let result = ttest_ind(&control, &treatment, 0.05).unwrap();
    println!("\nWelch's t-test:");
    println!("  t-statistic = {:.4}", result.statistic);
    println!("  p-value     = {:.6}", result.p_value);
    println!("  Reject H0?  {}", if result.reject { "Yes" } else { "No" });
}
```
| Function | Signature | Description |
| --- | --- | --- |
| `ttest_1samp` | `fn ttest_1samp<S>(data, mu0, alpha) -> Result<TestResult<S>>` | One-sample t-test |
| `ttest_ind` | `fn ttest_ind<S>(data1, data2, alpha) -> Result<TestResult<S>>` | Independent two-sample t-test |
| `ttest_rel` | `fn ttest_rel<S>(data1, data2, alpha) -> Result<TestResult<S>>` | Paired t-test |
| `chi2_test` | `fn chi2_test<S>(observed, expected, alpha) -> Result<TestResult<S>>` | Chi-squared goodness-of-fit |
| `ks_test` | `fn ks_test<S>(data, dist, alpha) -> Result<TestResult<S>>` | Kolmogorov-Smirnov test |
| `anova_oneway` | `fn anova_oneway<S>(groups, alpha) -> Result<TestResult<S>>` | One-way ANOVA |