Unconstrained Optimization

This chapter covers algorithms for solving

$$\min_{x \in \mathbb{R}^n} f(x)$$

without any constraints on the decision variables. Numra provides four unconstrained minimizers spanning the spectrum from gradient-based quasi-Newton methods to derivative-free direct search.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm maintains a dense approximation $H_k \approx (\nabla^2 f)^{-1}$ to the inverse Hessian and updates it each iteration with a rank-2 correction:

$$H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top}, \qquad \rho_k = \frac{1}{y_k^{\top} s_k},$$

where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$. A Wolfe line search determines the step length $\alpha_k$ at each iteration.
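
For intuition, the rank-2 update can be applied directly to a dense matrix stored as a flat row-major slice. The following is a minimal sketch of the formula above with a hypothetical helper name; it is not Numra's internal code:

fn bfgs_update(h: &mut [f64], s: &[f64], y: &[f64]) {
    // h holds the current inverse-Hessian approximation H_k (n x n, row-major).
    let n = s.len();
    // rho_k = 1 / (y_k^T s_k); the Wolfe conditions guarantee y^T s > 0.
    let rho = 1.0 / y.iter().zip(s).map(|(yi, si)| yi * si).sum::<f64>();
    // hy = H_k y_k
    let hy: Vec<f64> = (0..n)
        .map(|i| (0..n).map(|j| h[i * n + j] * y[j]).sum::<f64>())
        .collect();
    let yhy: f64 = y.iter().zip(&hy).map(|(yi, hyi)| yi * hyi).sum();
    // Expanding (I - rho s y^T) H (I - rho y s^T) + rho s s^T gives
    // H - rho (s (Hy)^T + (Hy) s^T) + (rho^2 y^T H y + rho) s s^T.
    for i in 0..n {
        for j in 0..n {
            h[i * n + j] += -rho * (s[i] * hy[j] + hy[i] * s[j])
                + (rho * rho * yhy + rho) * s[i] * s[j];
        }
    }
}

In practice you only supply the objective and its gradient; the library handles the update and line search internally: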

use numra::optim::{bfgs_minimize, OptimOptions};

// Rosenbrock function: f(x) = (1 - x0)^2 + 100*(x1 - x0^2)^2
let f = |x: &[f64]| {
    (1.0 - x[0]).powi(2) + 100.0 * (x[1] - x[0] * x[0]).powi(2)
};
let grad = |x: &[f64], g: &mut [f64]| {
    g[0] = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] * x[0]);
    g[1] = 200.0 * (x[1] - x[0] * x[0]);
};

let opts = OptimOptions::default().gtol(1e-8).max_iter(2000);
let result = bfgs_minimize(f, grad, &[-1.0, 1.0], &opts).unwrap();

assert!(result.converged);
assert!((result.x[0] - 1.0).abs() < 1e-5);
assert!((result.x[1] - 1.0).abs() < 1e-5);
println!("Iterations: {}, f_evals: {}", result.iterations, result.n_feval);

The Bfgs struct wraps the same algorithm with an object-oriented interface:

use numra::optim::{Bfgs, OptimOptions};

let bfgs = Bfgs::new(OptimOptions::default().gtol(1e-10));
let result = bfgs.minimize(
    |x: &[f64]| x[0] * x[0] + x[1] * x[1],
    |x: &[f64], g: &mut [f64]| { g[0] = 2.0 * x[0]; g[1] = 2.0 * x[1]; },
    &[3.0, 4.0],
).unwrap();

assert!(result.converged);

BFGS converges superlinearly on smooth, strongly convex functions. On the $n$-dimensional Rosenbrock function it typically converges in roughly 50-200 iterations regardless of $n$, because the inverse Hessian approximation captures the curvature of the narrow valley.

Storage cost: $O(n^2)$ for the dense inverse Hessian. For $n > 1000$, use L-BFGS instead.

L-BFGS replaces the dense $n \times n$ inverse Hessian with a compact representation built from the most recent $m$ correction pairs $\{(s_i, y_i)\}$. The search direction is computed via the two-loop recursion in $O(mn)$ time and $O(mn)$ storage:

$$d_k = -H_k^0 \nabla f_k + \sum_{i} \beta_i s_i,$$

where $H_k^0 = \gamma_k I$ with $\gamma_k = s_{k-1}^{\top} y_{k-1} / y_{k-1}^{\top} y_{k-1}$ provides automatic scaling.
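
As an illustration of the two-loop recursion (a hypothetical helper, not Numra's internal code), the direction can be assembled from the stored pairs like this, with `s` and `y` ordered from oldest to newest:

fn lbfgs_direction(grad: &[f64], s: &[Vec<f64>], y: &[Vec<f64>]) -> Vec<f64> {
    fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(ai, bi)| ai * bi).sum()
    }
    let m = s.len();
    let mut q = grad.to_vec();
    let mut alpha = vec![0.0; m];
    // First loop: newest pair to oldest.
    for i in (0..m).rev() {
        let rho = 1.0 / dot(&y[i], &s[i]);
        alpha[i] = rho * dot(&s[i], &q);
        for (qj, yj) in q.iter_mut().zip(&y[i]) {
            *qj -= alpha[i] * yj;
        }
    }
    // Initial scaling H_k^0 = gamma_k I from the newest pair.
    let gamma = if m > 0 {
        dot(&s[m - 1], &y[m - 1]) / dot(&y[m - 1], &y[m - 1])
    } else {
        1.0
    };
    let mut r: Vec<f64> = q.iter().map(|qi| gamma * qi).collect();
    // Second loop: oldest pair to newest.
    for i in 0..m {
        let rho = 1.0 / dot(&y[i], &s[i]);
        let beta = rho * dot(&y[i], &r);
        for (rj, sj) in r.iter_mut().zip(&s[i]) {
            *rj += (alpha[i] - beta) * sj;
        }
    }
    // The search direction is the negative of the result.
    for ri in r.iter_mut() {
        *ri = -*ri;
    }
    r
}

Through the library you only pass the objective, gradient, and memory size: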

use numra::optim::{lbfgs_minimize, LbfgsOptions};

// Large-scale quadratic: f(x) = sum(x_i^2), n = 100
let n = 100;
let f = |x: &[f64]| x.iter().map(|xi| xi * xi).sum::<f64>();
let grad = |x: &[f64], g: &mut [f64]| {
    for i in 0..x.len() { g[i] = 2.0 * x[i]; }
};
let x0: Vec<f64> = (1..=n).map(|i| i as f64 * 0.1).collect();

let opts = LbfgsOptions::default().memory(10).gtol(1e-10);
let result = lbfgs_minimize(f, grad, &x0, &opts).unwrap();

assert!(result.converged);
assert!(result.f < 1e-10);

Option     Default   Description
memory     10        Number of correction pairs to store
gtol       1e-8      Gradient norm convergence tolerance
max_iter   1000      Maximum number of iterations
ftol       1e-12     Relative function change tolerance

When to use L-BFGS over BFGS: use L-BFGS when $n \gtrsim 1000$. For small problems ($n < 50$), BFGS can converge in fewer iterations because the full inverse Hessian captures curvature more accurately.

The Nelder-Mead method is a derivative-free optimizer that maintains a simplex of $n+1$ vertices and transforms it via four operations: reflection, expansion, contraction, and shrink.

Numra’s implementation uses adaptive parameters (Gao & Han, 2012) that scale with problem dimension:

$$\alpha = 1, \qquad \gamma = 1 + \frac{2}{n}, \qquad \rho = 0.75 - \frac{1}{2n}, \qquad \sigma = 1 - \frac{1}{n}.$$

These adaptive parameters significantly improve convergence in high dimensions compared to the classical fixed parameters $(1, 2, 0.5, 0.5)$.
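
For reference, here are the coefficients as a function of $n$ in a small hypothetical helper (not part of the Numra API); at $n = 2$ they reduce exactly to the classical values:

fn gao_han_params(n: usize) -> (f64, f64, f64, f64) {
    let n = n as f64;
    let alpha = 1.0;                   // reflection
    let gamma = 1.0 + 2.0 / n;         // expansion
    let rho = 0.75 - 1.0 / (2.0 * n);  // contraction
    let sigma = 1.0 - 1.0 / n;         // shrink
    (alpha, gamma, rho, sigma)
}

// The classical parameters are recovered for n = 2.
assert_eq!(gao_han_params(2), (1.0, 2.0, 0.5, 0.5));
// In higher dimensions the expansion, contraction, and shrink steps become milder.
let (_, gamma, rho, sigma) = gao_han_params(50);
println!("n = 50: gamma = {:.3}, rho = {:.3}, sigma = {:.3}", gamma, rho, sigma);

Typical usage on a gradient-free objective: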

use numra::optim::{nelder_mead, NelderMeadOptions};

// Beale function (non-convex, gradient not readily available)
let f = |x: &[f64]| {
    (1.5 - x[0] + x[0] * x[1]).powi(2)
        + (2.25 - x[0] + x[0] * x[1] * x[1]).powi(2)
        + (2.625 - x[0] + x[0] * x[1].powi(3)).powi(2)
};

let opts = NelderMeadOptions {
    max_iter: 50_000,
    fatol: 1e-8,
    xatol: 1e-8,
    adaptive: true,
    ..Default::default()
};
let result = nelder_mead(f, &[1.0, 1.0], &opts).unwrap();

assert!(result.converged);
assert!(result.f < 1e-4);
println!("Function evaluations: {}", result.n_feval);

Option                  Default   Description
fatol                   1e-8      Function value tolerance (convergence when spread < fatol)
xatol                   1e-8      Simplex size tolerance
initial_simplex_scale   0.05      Scale factor for initial simplex construction
adaptive                true      Use adaptive Gao-Han parameters
max_iter                10000     Maximum iterations

Convergence: Nelder-Mead converges linearly at best and can stall on degenerate problems. It uses zero gradient evaluations (n_geval = 0), making it ideal for black-box functions, noisy objectives, or non-differentiable problems.

Powell’s method builds a set of conjugate directions by performing sequential 1-D line minimizations along each direction in turn, then replaces the direction along which the largest decrease occurred with the net displacement of the cycle. Each line minimization uses Brent’s method (combining parabolic interpolation with golden-section search).
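
For intuition, one outer cycle of that direction-set idea looks roughly like the sketch below. This is an illustration only, not Numra's implementation: the 1-D minimizer here is a deliberately crude golden-section search on a fixed bracket (the library uses Brent's method), and the direction replacement is shown in its simplest form, without Powell's safeguards against linear dependence.

// Minimize f along direction d starting from x; returns the step length.
fn line_min(f: &impl Fn(&[f64]) -> f64, x: &[f64], d: &[f64]) -> f64 {
    let phi = (5.0_f64.sqrt() - 1.0) / 2.0;
    let eval = |a: f64| {
        let trial: Vec<f64> = x.iter().zip(d).map(|(xi, di)| xi + a * di).collect();
        f(&trial)
    };
    // Golden-section search on a fixed bracket, for simplicity.
    let (mut lo, mut hi) = (-10.0_f64, 10.0_f64);
    while hi - lo > 1e-10 {
        let (a, b) = (hi - phi * (hi - lo), lo + phi * (hi - lo));
        if eval(a) < eval(b) { hi = b; } else { lo = a; }
    }
    0.5 * (lo + hi)
}

// One Powell cycle: line-minimize along each direction, then replace the
// direction of largest decrease with the net displacement of the cycle.
fn powell_cycle(f: &impl Fn(&[f64]) -> f64, x: &mut [f64], dirs: &mut [Vec<f64>]) {
    let x_start = x.to_vec();
    let (mut best_decrease, mut best_i) = (0.0_f64, 0);
    for (i, d) in dirs.iter().enumerate() {
        let f_before = f(&x[..]);
        let step = line_min(f, x, d);
        for (xj, dj) in x.iter_mut().zip(d) {
            *xj += step * dj;
        }
        let decrease = f_before - f(&x[..]);
        if decrease > best_decrease {
            best_decrease = decrease;
            best_i = i;
        }
    }
    // The net displacement over the cycle becomes a new search direction.
    let d_new: Vec<f64> = x.iter().zip(&x_start).map(|(a, b)| a - b).collect();
    let step = line_min(f, x, &d_new);
    for (xj, dj) in x.iter_mut().zip(&d_new) {
        *xj += step * dj;
    }
    dirs[best_i] = d_new;
}

Through the library the whole loop is handled for you: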

use numra::optim::{powell, PowellOptions};
let f = |x: &[f64]| (x[0] - 2.0).powi(2) + (x[1] - 3.0).powi(2);
let opts = PowellOptions::default();
let result = powell(f, &[0.0, 0.0], &opts).unwrap();
assert!(result.converged);
assert!((result.x[0] - 2.0).abs() < 1e-4);
assert!((result.x[1] - 3.0).abs() < 1e-4);

Option                 Default   Description
ftol                   1e-8      Function change convergence tolerance
xtol                   1e-8      Step size convergence tolerance
max_line_search_iter   100       Maximum iterations per Brent line search
max_iter               10000     Maximum outer iterations

Solver        Gradients   Storage   Convergence               Best For
BFGS          Required    O(n^2)    Superlinear               Smooth, small-to-medium scale (n < 1000)
L-BFGS        Required    O(mn)     Superlinear               Smooth, large scale (n >> 1000)
Nelder-Mead   None        O(n^2)    Linear                    Black-box, noisy, non-smooth
Powell        None        O(n^2)    Superlinear (quadratic)   Derivative-free, smooth

All gradient-based solvers share the OptimOptions struct:

use numra::optim::OptimOptions;

let opts = OptimOptions::default()
    .max_iter(2000)
    .gtol(1e-10)   // gradient norm threshold
    .ftol(1e-14)   // relative function change threshold
    .xtol(1e-14);  // step size threshold

Default values: max_iter = 1000, gtol = 1e-8, ftol = 1e-12, xtol = 1e-12.

Every solver returns an OptimResult<S>:

let result = bfgs_minimize(f, grad, &x0, &opts).unwrap();

println!("Minimizer: {:?}", result.x);
println!("Minimum: {:.6e}", result.f);
println!("Gradient: {:?}", result.grad);
println!("Converged: {}", result.converged);
println!("Status: {:?}", result.status); // e.g., GradientConverged
println!("Iterations: {}", result.iterations);
println!("f evaluations: {}", result.n_feval);
println!("g evaluations: {}", result.n_geval);
println!("Wall time: {:.3} s", result.wall_time_secs);

// Convergence history (objective, gradient norm, step size per iteration)
for rec in &result.history {
    println!(
        "  iter {}: f={:.4e}, ||g||={:.2e}, alpha={:.2e}",
        rec.iteration, rec.objective, rec.gradient_norm, rec.step_size
    );
}

OptimStatus         Meaning
GradientConverged   ‖∇f‖ < gtol
FunctionConverged   |f_{k+1} - f_k| < ftol · (1 + |f_k|)
StepConverged       ‖s_k‖ < xtol · (1 + ‖x_k‖)
MaxIterations       Iteration limit reached without convergence
LineSearchFailed    Wolfe line search could not find an acceptable step
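
For example, to branch on the outcome of the earlier bfgs_minimize call (the exact import path of the status enum is assumed here):

use numra::optim::OptimStatus;

match result.status {
    OptimStatus::GradientConverged
    | OptimStatus::FunctionConverged
    | OptimStatus::StepConverged => println!("converged: f = {:.6e}", result.f),
    OptimStatus::MaxIterations => {
        eprintln!("hit the iteration limit; raise max_iter or loosen the tolerances")
    }
    OptimStatus::LineSearchFailed => {
        eprintln!("line search failed; check the gradient or rescale the problem")
    }
    _ => {} // in case the enum carries additional variants
}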