Intermediate II: stopping rules
The theory of SDDP tells us that the algorithm converges to an optimal policy almost surely in a finite number of iterations. In practice, this number is very large. Therefore, we need some way of pre-emptively terminating SDDP when the solution is “good enough.” We call heuristics for pre-emptively terminating SDDP stopping rules.
Basic limits
The training of an SDDP policy can be terminated after a fixed number of iterations using the iteration_limit keyword.
Kokako.train(model, iteration_limit = 10)
The training of an SDDP policy can be terminated after a fixed number of seconds using the time_limit keyword.
Kokako.train(model, time_limit = 2.0)
Stopping rules
In addition to the limits provided as keyword arguments, a variety of other stopping rules are available. These can be passed to Kokako.train as a vector to the stopping_rules keyword. For example:
Kokako.train(model, stopping_rules = [Kokako.BoundStalling(10, 1e-4)])
Here are the stopping rules implemented in SDDP.jl:
Kokako.IterationLimit — Type.
IterationLimit(limit::Int)
Terminate the algorithm after limit iterations.
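For example, the following call should behave the same as passing iteration_limit = 10 directly (a sketch; model is assumed to be an already-constructed policy graph):
Kokako.train(model, stopping_rules = [Kokako.IterationLimit(10)])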
Kokako.TimeLimit — Type.
TimeLimit(limit::Float64)
Terminate the algorithm after limit seconds of computation.
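Similarly, the time_limit keyword shown earlier should be equivalent to this sketch:
Kokako.train(model, stopping_rules = [Kokako.TimeLimit(2.0)])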
Kokako.Statistical — Type.
Statistical(; num_replications, iteration_period = 1, z_score = 1.96, verbose = true)
Perform an in-sample Monte Carlo simulation of the policy with num_replications replications every iteration_period iterations. Terminate if the deterministic bound (lower if minimizing) falls into the confidence interval for the mean of the simulated cost. If verbose = true, print the confidence interval.
Note that this test assumes that the simulated values are normally distributed. In infinite-horizon models, this is almost never the case; the distribution is usually closer to exponential or log-normal.
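For example, to simulate 500 in-sample replications every 100 iterations and stop once the bound falls inside the resulting 95% confidence interval, a call along these lines should work (the values 500 and 100 are illustrative, not defaults):
Kokako.train(
    model,
    stopping_rules = [Kokako.Statistical(num_replications = 500, iteration_period = 100)],
)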
Kokako.BoundStalling — Type.
BoundStalling(num_previous_iterations::Int, tolerance::Float64)
Terminate the algorithm once the deterministic bound (lower if minimizing, upper if maximizing) fails to improve by more than tolerance in absolute terms for more than num_previous_iterations consecutive iterations.
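Multiple stopping rules can be passed in the same vector; training should terminate as soon as any one of them is satisfied. For example, the following sketch stops after 60 seconds or once the bound has stalled for 10 iterations, whichever comes first:
Kokako.train(
    model,
    stopping_rules = [Kokako.TimeLimit(60.0), Kokako.BoundStalling(10, 1e-4)],
)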
In the next tutorial, Intermediate III: policy graphs, we discuss generic extensions to Kokako.LinearPolicyGraph and Kokako.MarkovianPolicyGraph.