AbstractInitStrategy for optimisation #2735

@penelopeysm

Description

#2676 introduced the use of initial_params::AbstractInitStrategy for MCMC sampling, but intentionally deferred the use of initial_params::AbstractInitStrategy for optimisation. This is partly because initialisation in optimisation is more complex, and also partly because the optimisation interface was due some reworking anyway.

I realised we don't have an issue to actually track this, so here are some thoughts.

For what I'm guessing is the large majority of use cases, a user will either specify initial parameters, or provide box constraints, or provide no constraints. What this means is that, at least for the univariate case, we can nest AbstractInitStrategy:

struct InitFromBoxConstraints{T}
    # maps VarNames to (lb, ub) tuples of bounds
    constraints::T
end

function DynamicPPL.init(rng, vn, dist, strategy::InitFromBoxConstraints)
    # no constraints given for this VarName, sample from prior
    if !haskey(strategy.constraints, vn)
        return rand(rng, dist), identity
    end
    # we have constraints, use them to generate parameters
    lb, ub = strategy.constraints[vn]
    return if !isfinite(lb) && !isfinite(ub)
        # no actual constraints, just sample from prior
        rand(rng, dist), identity
    elseif !isfinite(lb)
        # only the upper bound is finite
        rand(rng, Uniform(ub - 4.0, ub)), identity
    elseif !isfinite(ub)
        # only the lower bound is finite
        rand(rng, Uniform(lb, lb + 4.0)), identity
    else
        rand(rng, Uniform(lb, ub)), identity
    end
end

Here 4.0 is quite arbitrary but it parallels SampleFromUniform. I suppose InitFromBoxConstraints could allow customisation of this parameter as well in principle.
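To illustrate, a minimal sketch of what that customisation could look like; `BoxWindow` and `sampling_interval` are hypothetical names, not existing API:

```julia
# Hypothetical sketch: making the half-open window width configurable.
# `width` replaces the hard-coded 4.0 (chosen to parallel SampleFromUniform).
struct BoxWindow
    width::Float64
end
BoxWindow() = BoxWindow(4.0)

# Given one (possibly infinite) pair of bounds, pick the interval to sample
# uniformly from; `nothing` signals "unconstrained, sample from the prior".
function sampling_interval(lb::Real, ub::Real, w::BoxWindow)
    if !isfinite(lb) && !isfinite(ub)
        return nothing
    elseif !isfinite(lb)
        return (ub - w.width, ub)
    elseif !isfinite(ub)
        return (lb, lb + w.width)
    else
        return (lb, ub)
    end
end
```

This just factors out the branching in the `init` method above, so the window width becomes a field of the strategy rather than a magic constant.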

The problem here is mainly: how do we maintain a meaningful API? In theory, one would like to have

estimate_mode(...; box_constraints=nt_or_dict, initial_params=strategy)

where nt_or_dict is some mapping of VarNames to constraints. What I propose is this:

  • if box_constraints == nothing and initial_params == nothing, we use InitFromPrior.
  • if box_constraints != nothing and initial_params == nothing, we use InitFromBoxConstraints(box_constraints).
  • InitFromParams is currently the only strategy that allows for a 'fallback'. That means that, if strategy is a series of nested InitFromParams then we can obtain the 'leaf strategy' using a recursive function.
  • If the leaf strategy is nothing and box constraints are provided, then we can replace nothing with InitFromBoxConstraints. Otherwise, if the leaf strategy is nothing and there are no box constraints, replace the leaf strategy with InitFromPrior.
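The last two bullets could be sketched roughly as follows. The type definitions here are simplified stand-ins for DynamicPPL's actual types, and `resolve_strategy` / `leaf_strategy` are hypothetical names:

```julia
# Simplified stand-ins for DynamicPPL's strategy types (not the real definitions)
abstract type AbstractInitStrategy end
struct InitFromPrior <: AbstractInitStrategy end
struct InitFromBoxConstraints{T} <: AbstractInitStrategy
    constraints::T
end
struct InitFromParams{P,F} <: AbstractInitStrategy
    params::P
    fallback::F   # another strategy, or `nothing`
end

# Recursively follow fallbacks to obtain the 'leaf strategy'
leaf_strategy(s::InitFromParams) = leaf_strategy(s.fallback)
leaf_strategy(s) = s

# Hypothetical: replace a `nothing` leaf depending on whether box constraints
# were provided; any other leaf strategy is left untouched.
resolve_strategy(s::InitFromParams, box_constraints) =
    InitFromParams(s.params, resolve_strategy(s.fallback, box_constraints))
resolve_strategy(::Nothing, box_constraints) =
    isnothing(box_constraints) ? InitFromPrior() : InitFromBoxConstraints(box_constraints)
resolve_strategy(s::AbstractInitStrategy, box_constraints) = s
```

For example, `resolve_strategy(InitFromParams(nt, nothing), constraints)` swaps the `nothing` leaf for `InitFromBoxConstraints(constraints)`, while an explicit `InitFromPrior` leaf is preserved.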

There are scenarios where this could in principle lead to inconsistencies:

  • Consider when box_constraints is nonempty, but initial_params is InitFromParams(nt_or_dict, InitFromPrior()). In this case if nt_or_dict does not contain all the parameters, then we hit the fallback InitFromPrior, which could generate initial values that fall outside the box. I think this is not a big deal because it's easily fixed by the user setting a fallback of nothing instead; however we will need to clearly document this and provide some error messages too.

  • Note that InitFromBoxConstraints must generate parameters in unlinked space. As written above, it does not make any sense for multivariate distributions and especially not distributions where parameters are mutually dependent (e.g. Dirichlet, LKJCholesky). A naive approach of generating vectorised parameters and using from_vec_transform(dist) could easily lead to invalid samples. We could instead adopt a SampleFromUniform approach and sample in linked space before using from_linked_vec_transform, but that would be incorrect because the box constraints are supposed to be specified in unlinked space.

    However, it should be noted that this problem already exists on main because the current generation of parameters from box constraints uses rand.(Uniform.(lb, ub)). It doesn't error, but it will give you nonsensical results.
    Thus I believe that an approach of attempting to use from_vec_transform, but throwing an error if the lp is NaN/Inf, is still going to be an improvement over the current situation.

julia> @model f() = x ~ Dirichlet(2, 1) # Dirichlet samples are between 0 and 1
f (generic function with 2 methods)

julia> maximum_a_posteriori(f(); lb=[1.0, 1.0], ub=[2.0, 2.0])
ModeResult with maximized lp of -Inf
[1.3910560913090908, 1.4022767128409757]
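The proposed guard could look something like the sketch below, written generically over any log-density function so it stays self-contained; `check_initial_params` is a hypothetical name:

```julia
# Hypothetical guard: error loudly instead of silently proceeding with a
# non-finite log density. `logdensity` stands in for evaluating the model's
# log density at `params` (e.g. via LogDensityProblems.logdensity).
function check_initial_params(logdensity, params)
    lp = logdensity(params)
    isfinite(lp) || error(
        "Initial parameters $params have non-finite log density ($lp); " *
        "the box constraints may be incompatible with the distribution's support.",
    )
    return params
end
```

In the Dirichlet example above, this check would fire because `[1.39..., 1.40...]` lies outside the simplex and thus has `lp == -Inf`, turning today's silent nonsense into an informative error.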

One might ask, "ahhhh, but I would like to set constraints in linked space". Well, the current optimisation framework does not really allow for that. In fact, the current optimisation framework is broken in this regard, because it makes an internal decision about whether to perform optimisation in linked or unlinked space. It will transform initial_params accordingly. However, it does not also transform lb and ub (!!), meaning that whether lb and ub refer to constraints in linked or unlinked space depends on whether Turing decided to link it or not.

Essentially, the argument I'm making here is that it's better to loudly error on some things that are not easy to handle, rather than to pretend that we are handling all cases nicely and end up giving nonsensical results OR silently distorting the meaning of the inputs.

Finally, none of this can handle transformed constraints (cons, lcons, ucons) well. I think this is beyond the capabilities of Turing to handle, and it is currently my opinion that if a user wants to do this, they should construct a LogDensityFunction themselves and perform optimisation directly on that.
