Skip to content

Performance Tuning

This chapter provides practical guidance for getting the most out of Numra’s solvers. The right configuration can mean orders-of-magnitude speedups.

The most impactful performance knob is the tolerance pair (rtol, atol).

Rule of thumb: set rtol to the number of correct digits you need, and atol to a value below the smallest meaningful solution magnitude.

use numra::ode::SolverOptions;
// "Engineering" accuracy: 6 digits
let opts = SolverOptions::default().rtol(1e-6).atol(1e-9);
// "Scientific" accuracy: 10 digits
let opts = SolverOptions::default().rtol(1e-10).atol(1e-12);
// When solution passes through zero, atol dominates:
// If y ~ 1e-8 and atol = 1e-9, you still get 1 correct digit
let opts = SolverOptions::default().rtol(1e-6).atol(1e-14);
ToleranceTypical useRelative cost
rtol = 1e-3Quick exploration1x
rtol = 1e-6Engineering2-5x
rtol = 1e-8Publication quality5-15x
rtol = 1e-10Reference solutions15-50x
rtol = 1e-12Verification50-200x

Tighter tolerance means smaller steps means more RHS calls. The cost scales roughly as rtol to the power of -1/p where p is the method order.

Choosing the right solver is the second most impactful decision.

  1. Is the problem stiff?

    • No: Use an explicit method (DoPri5, Tsit5, Vern6/7/8)
    • Yes: Use an implicit method (Radau5, ESDIRK, BDF)
    • Not sure: Use Auto
  2. What accuracy do you need?

    • Moderate (rtol ~ 1e-6): DoPri5 or ESDIRK54
    • High (rtol ~ 1e-10): Vern6 or Vern7
    • Very high (rtol ~ 1e-12): Vern8
  3. Is it a long integration?

    • Energy conservation matters: Tighten tolerances
    • Many RHS calls: Consider FSAL methods (Tsit5)
SolverOrderStagesRHS calls/stepBest for
Tsit55(4)7 (FSAL: 6)6General non-stiff
DoPri55(4)76-7General non-stiff
Vern66(5)98-9Medium-high accuracy
Vern77(6)1010High accuracy
Vern88(7)1313Very high accuracy
ESDIRK322(1)33 + LUMild stiffness
ESDIRK433(2)44 + LUModerate stiffness
ESDIRK544(3)66 + LUStiff, high accuracy
Radau553 (implicit)3*n + LUSevere stiffness, DAEs
BDF1-5multistep1 + LUSevere stiffness

The right-hand-side function is called thousands to millions of times. Make it fast:

// BAD: allocates every call
let problem = OdeProblem::new(
|t, y: &[f64], dydt: &mut [f64]| {
let temp = vec![0.0; y.len()]; // allocation!
// ...
},
0.0, 10.0, vec![1.0],
);
// GOOD: pre-allocate outside, or compute in-place
let problem = OdeProblem::new(
|t, y: &[f64], dydt: &mut [f64]| {
dydt[0] = -y[0]; // no allocation
},
0.0, 10.0, vec![1.0],
);
// BAD: recomputes every call
let problem = OdeProblem::new(
|t, y: &[f64], dydt: &mut [f64]| {
let omega = (2.0 * std::f64::consts::PI * 60.0); // recomputed!
dydt[0] = omega.cos() * y[0];
},
0.0, 10.0, vec![1.0],
);
// GOOD: capture pre-computed value
let omega = 2.0 * std::f64::consts::PI * 60.0;
let problem = OdeProblem::new(
move |t, y: &[f64], dydt: &mut [f64]| {
dydt[0] = omega.cos() * y[0];
},
0.0, 10.0, vec![1.0],
);

For implicit methods, the Jacobian J = df/dy is needed. The default finite-difference approximation requires n extra RHS calls per Jacobian (where n is the system dimension). Providing an analytical Jacobian avoids this cost:

use numra::ode::{OdeSystem, Radau5, Solver, SolverOptions};
struct Robertson;
impl OdeSystem<f64> for Robertson {
fn dim(&self) -> usize { 3 }
fn rhs(&self, _t: f64, y: &[f64], dydt: &mut [f64]) {
dydt[0] = -0.04 * y[0] + 1e4 * y[1] * y[2];
dydt[1] = 0.04 * y[0] - 1e4 * y[1] * y[2] - 3e7 * y[1] * y[1];
dydt[2] = 3e7 * y[1] * y[1];
}
fn jacobian(&self, _t: f64, y: &[f64], jac: &mut [f64]) {
// Row-major 3x3 Jacobian
jac[0] = -0.04; jac[1] = 1e4 * y[2]; jac[2] = 1e4 * y[1];
jac[3] = 0.04; jac[4] = -1e4*y[2] - 6e7*y[1]; jac[5] = -1e4 * y[1];
jac[6] = 0.0; jac[7] = 6e7 * y[1]; jac[8] = 0.0;
}
}
let result = DoPri5::solve(&problem, 0.0, 100.0, &[1.0], &opts).unwrap();
let s = &result.stats;
println!("Accepted steps: {}", s.n_accept);
println!("Rejected steps: {}", s.n_reject);
println!("Jacobian computations: {}", s.n_jac);
println!("LU decompositions: {}", s.n_lu);
println!("Reject ratio: {:.1}%", 100.0 * s.n_reject as f64 / s.n_accept as f64);
SymptomLikely causeFix
High reject ratio (> 30%)Tolerance too tight for method orderUse higher-order method or loosen tolerance
Millions of stepsWrong solver class for problemSwitch explicit to implicit
Many LU decompositionsRapidly changing JacobianTighten Newton tolerance or use ESDIRK
max_steps exceededExtremely stiff or singular problemIncrease max_steps, check problem formulation
Slow individual stepsExpensive RHS functionProfile and optimize RHS, provide Jacobian

If an explicit solver takes far more steps than expected, the problem may be stiff. Try Auto which detects stiffness automatically:

use numra::ode::{Auto, Solver};
let result = Auto::solve(&problem, 0.0, 100.0, &y0, &opts).unwrap();
// Auto starts with Tsit5 and switches to Radau5 if stiffness is detected

The solver estimates an initial step size automatically. Override for problems where the auto-estimate is poor:

// Force a small initial step for problems with fast initial transients
let opts = SolverOptions::default().h0(1e-6);

Limit the step size to capture features the error estimator might miss:

// Ensure at least 100 steps over the interval
let opts = SolverOptions::default().h_max(0.1); // for [0, 10]

Enabling dense output stores interpolation coefficients at every step, which can use significant memory for long integrations:

// If you only need the final state, don't enable dense output:
let opts = SolverOptions::default(); // dense_output = false by default
// If you need output at specific times, use the t_eval option instead of dense:
let times = vec![1.0, 2.0, 5.0, 10.0];
let opts = SolverOptions::default().t_eval(times);

The SolverResult stores the full trajectory. For very large systems (n > 10000) over many time steps, this can be gigabytes. Use the t_eval option to limit saved output, or extract components:

// Only save output at 100 points instead of every adaptive step
let times: Vec<f64> = (0..100).map(|i| i as f64 * 0.1).collect();
let opts = SolverOptions::default().t_eval(times);
GoalConfiguration
Fastest solveLoose tolerance + low-order method
Most accurateTight tolerance + Vern8
Stiff problemsRadau5 or BDF
Minimize memoryUse t_eval option, avoid .dense()
Debug convergenceCheck result.stats fields
Long integrationsSet max_steps and h_max