AbstractInitStrategy for optimisation #2735

@penelopeysm

Description

#2676 introduced the use of initial_params::AbstractInitStrategy for MCMC sampling, but intentionally deferred the use of initial_params::AbstractInitStrategy for optimisation. This is partly because initialisation in optimisation is more complex, and also partly because the optimisation interface was due some reworking anyway.

I realised we don't have an issue to actually track this, so here are some thoughts.

For what I'm guessing is the large majority of use cases, a user will either specify initial parameters, or provide box constraints, or provide no constraints. What this means is that, at least for the univariate case, we can nest AbstractInitStrategy:

struct InitFromBoxConstraints{T}
    # maps VarNames to (lb, ub) tuples of bounds
    constraints::T
end

function DynamicPPL.init(rng, vn, dist, strategy::InitFromBoxConstraints)
    # no constraints given for this VarName, sample from prior
    if !haskey(strategy.constraints, vn)
        return rand(rng, dist), identity
    end
    # we have constraints, use them to generate parameters
    lb, ub = strategy.constraints[vn]
    return if !isfinite(lb) && !isfinite(ub)
        # no actual constraints, just sample from prior
        rand(rng, dist), identity
    elseif !isfinite(lb)
        # only the upper bound is finite
        rand(rng, Uniform(ub - 4.0, ub)), identity
    elseif !isfinite(ub)
        # only the lower bound is finite
        rand(rng, Uniform(lb, lb + 4.0)), identity
    else
        rand(rng, Uniform(lb, ub)), identity
    end
end

Here 4.0 is quite arbitrary but it parallels SampleFromUniform. I suppose InitFromBoxConstraints could allow customisation of this parameter as well in principle.
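To illustrate, a minimal sketch of what that customisation could look like; `BoxWindow` and `sampling_interval` are hypothetical names, not existing API:

```julia
# Hypothetical sketch: making the half-open window width configurable.
# `width` replaces the hard-coded 4.0 (chosen to parallel SampleFromUniform).
struct BoxWindow
    width::Float64
end
BoxWindow() = BoxWindow(4.0)

# Given one (possibly infinite) pair of bounds, pick the interval to sample
# uniformly from; `nothing` signals "unconstrained, sample from the prior".
function sampling_interval(lb::Real, ub::Real, w::BoxWindow)
    if !isfinite(lb) && !isfinite(ub)
        return nothing
    elseif !isfinite(lb)
        return (ub - w.width, ub)
    elseif !isfinite(ub)
        return (lb, lb + w.width)
    else
        return (lb, ub)
    end
end
```

This just factors out the branching in the `init` method above, so the window width becomes a field of the strategy rather than a magic constant.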

The problem here is mainly: how do we maintain a meaningful API? In theory, one would like to have

estimate_mode(...; box_constraints=nt_or_dict, initial_params=strategy)

where nt_or_dict is some mapping of VarNames to constraints. What I propose is this:

  • if box_constraints == nothing and initial_params == nothing, we use InitFromPrior.
  • if box_constraints != nothing and initial_params == nothing, we use InitFromBoxConstraints(box_constraints).
  • InitFromParams is currently the only strategy that allows for a 'fallback'. That means that, if strategy is a series of nested InitFromParams then we can obtain the 'leaf strategy' using a recursive function.
  • If the leaf strategy is nothing and box constraints are provided, then we can replace nothing with InitFromBoxConstraints. Otherwise, if the leaf strategy is nothing and there are no box constraints, replace the leaf strategy with InitFromPrior.
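The last two bullets could be sketched roughly as follows. The type definitions here are simplified stand-ins for DynamicPPL's actual types, and `resolve_strategy` / `leaf_strategy` are hypothetical names:

```julia
# Simplified stand-ins for DynamicPPL's strategy types (not the real definitions)
abstract type AbstractInitStrategy end
struct InitFromPrior <: AbstractInitStrategy end
struct InitFromBoxConstraints{T} <: AbstractInitStrategy
    constraints::T
end
struct InitFromParams{P,F} <: AbstractInitStrategy
    params::P
    fallback::F   # another strategy, or `nothing`
end

# Recursively follow fallbacks to obtain the 'leaf strategy'
leaf_strategy(s::InitFromParams) = leaf_strategy(s.fallback)
leaf_strategy(s) = s

# Hypothetical: replace a `nothing` leaf depending on whether box constraints
# were provided; any other leaf strategy is left untouched.
resolve_strategy(s::InitFromParams, box_constraints) =
    InitFromParams(s.params, resolve_strategy(s.fallback, box_constraints))
resolve_strategy(::Nothing, box_constraints) =
    isnothing(box_constraints) ? InitFromPrior() : InitFromBoxConstraints(box_constraints)
resolve_strategy(s::AbstractInitStrategy, box_constraints) = s
```

For example, `resolve_strategy(InitFromParams(nt, nothing), constraints)` swaps the `nothing` leaf for `InitFromBoxConstraints(constraints)`, while an explicit `InitFromPrior` leaf is preserved.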

There are scenarios where this could in principle lead to inconsistencies:

  • Consider when box_constraints is nonempty, but initial_params is InitFromParams(nt_or_dict, InitFromPrior()). In this case if nt_or_dict does not contain all the parameters, then we hit the fallback InitFromPrior, which could generate initial values that fall outside the box. I think this is not a big deal because it's easily fixed by the user setting a fallback of nothing instead; however we will need to clearly document this and provide some error messages too.

  • Note that InitFromBoxConstraints must generate parameters in unlinked space. As written above, it does not make any sense for multivariate distributions and especially not distributions where parameters are mutually dependent (e.g. Dirichlet, LKJCholesky). A naive approach of generating vectorised parameters and using from_vec_transform(dist) could easily lead to invalid samples. We could instead adopt a SampleFromUniform approach and sample in linked space before using from_linked_vec_transform, but that would be incorrect because the box constraints are supposed to be specified in unlinked space.

    However, it should be noted that this problem already exists on main because the current generation of parameters from box constraints uses rand.(Uniform.(lb, ub)). It doesn't error, but it will give you nonsensical results.
    Thus I believe that an approach of attempting to use from_vec_transform, but throwing an error if the lp is NaN/Inf, is still going to be an improvement over the current situation.

julia> @model f() = x ~ Dirichlet(2, 1) # Dirichlet samples are between 0 and 1
f (generic function with 2 methods)

julia> maximum_a_posteriori(f(); lb=[1.0, 1.0], ub=[2.0, 2.0])
ModeResult with maximized lp of -Inf
[1.3910560913090908, 1.4022767128409757]
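The proposed guard could look something like the sketch below, written generically over any log-density function so it stays self-contained; `check_initial_params` is a hypothetical name:

```julia
# Hypothetical guard: error loudly instead of silently proceeding with a
# non-finite log density. `logdensity` stands in for evaluating the model's
# log density at `params` (e.g. via LogDensityProblems.logdensity).
function check_initial_params(logdensity, params)
    lp = logdensity(params)
    isfinite(lp) || error(
        "Initial parameters $params have non-finite log density ($lp); " *
        "the box constraints may be incompatible with the distribution's support.",
    )
    return params
end
```

In the Dirichlet example above, this check would fire because `[1.39..., 1.40...]` lies outside the simplex and thus has `lp == -Inf`, turning today's silent nonsense into an informative error.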

One might ask, "ahhhh, but I would like to set constraints in linked space". Well, the current optimisation framework does not really allow for that. In fact, the current optimisation framework is broken in this regard, because it makes an internal decision about whether to perform optimisation in linked or unlinked space. It will transform initial_params accordingly. However, it does not also transform lb and ub (!!), meaning that whether lb and ub refer to constraints in linked or unlinked space depends on whether Turing decided to link it or not.

Essentially, the argument I'm making here is that it's better to loudly error on some things that are not easy to handle, rather than to pretend that we are handling all cases nicely and end up giving nonsensical results OR silently distorting the meaning of the inputs.

Finally, none of this can handle transformed constraints (cons, lcons, ucons) well. I think this is beyond the capabilities of Turing to handle, and it is currently my opinion that if a user wants to do this, they should construct a LogDensityFunction themselves and perform optimisation directly on that.
