#2676 introduced the use of `initial_params::AbstractInitStrategy` for MCMC sampling, but intentionally deferred its use for optimisation. This is partly because initialisation in optimisation is more complex, and partly because the optimisation interface was due for some reworking anyway.
I realised we don't have an issue to actually track this, so here are some thoughts.
For what I'm guessing is the large majority of use cases, a user will either specify initial parameters, provide box constraints, or provide no constraints at all. What this means is that, at least for the univariate case, we can handle box constraints with a new `AbstractInitStrategy` (which can then be nested inside other strategies):
```julia
using Distributions: Uniform

# `constraints` is a mapping of VarNames to (lb, ub) tuples
struct InitFromBoxConstraints{C} <: DynamicPPL.AbstractInitStrategy
    constraints::C
end

function DynamicPPL.init(rng, vn, dist, strategy::InitFromBoxConstraints)
    # no constraints given for this variable, sample from prior
    if !haskey(strategy.constraints, vn)
        return rand(rng, dist), identity
    end
    # we have constraints, use them to generate parameters
    lb, ub = strategy.constraints[vn]
    return if !isfinite(lb) && !isfinite(ub)
        # no actual constraints, just sample from prior
        rand(rng, dist), identity
    elseif !isfinite(lb)
        rand(rng, Uniform(ub - 4.0, ub)), identity
    elseif !isfinite(ub)
        rand(rng, Uniform(lb, lb + 4.0)), identity
    else
        rand(rng, Uniform(lb, ub)), identity
    end
end
```

Here 4.0 is quite arbitrary, but it parallels `SampleFromUniform`. I suppose `InitFromBoxConstraints` could allow customisation of this parameter as well in principle.
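If we did want to expose that, one way it could look is the variant below; the `width` field and keyword are invented names, purely for illustration.

```julia
# Hypothetical variant: the half-width used for one-sided constraints becomes a
# field instead of the hard-coded 4.0. `width` is a made-up name.
struct InitFromBoxConstraints{C,T<:Real} <: DynamicPPL.AbstractInitStrategy
    constraints::C
    width::T
end
InitFromBoxConstraints(constraints; width=4.0) = InitFromBoxConstraints(constraints, width)
```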
The problem here is mainly: how do we maintain a meaningful API? In theory, one would like to have
```julia
estimate_mode(...; box_constraints=nt_or_dict, initial_params=strategy)
```

where `nt_or_dict` is some mapping of `VarName`s to constraints. What I propose is this:
- if `box_constraints == nothing` and `initial_params == nothing`, we use `InitFromPrior`.
- if `box_constraints != nothing` and `initial_params == nothing`, we use `InitFromBoxConstraints(box_constraints)`.
- `InitFromParams` is currently the only strategy that allows for a 'fallback'. That means that, if `strategy` is a series of nested `InitFromParams`, then we can obtain the 'leaf strategy' using a recursive function (see the sketch after this list).
- If the leaf strategy is `nothing` and box constraints are provided, then we can replace `nothing` with `InitFromBoxConstraints`. Otherwise, if the leaf strategy is `nothing` and there are no box constraints, replace the leaf strategy with `InitFromPrior`.
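A rough sketch of that recursive replacement, assuming `InitFromParams` stores its values and fallback in fields that I'm calling `params` and `fallback` here (the real field names may differ):

```julia
# Sketch only: walk nested InitFromParams until we reach the leaf, then
# substitute it according to the rules above. Field names are assumptions.
resolve_leaf(s::DynamicPPL.InitFromParams, box_constraints) =
    DynamicPPL.InitFromParams(s.params, resolve_leaf(s.fallback, box_constraints))
resolve_leaf(::Nothing, box_constraints) =
    box_constraints === nothing ? DynamicPPL.InitFromPrior() : InitFromBoxConstraints(box_constraints)
# any other explicit leaf strategy is kept as-is
resolve_leaf(s::DynamicPPL.AbstractInitStrategy, _) = s
```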
There are scenarios where this could in principle lead to inconsistencies:
- Consider when `box_constraints` is nonempty, but `initial_params` is `InitFromParams(nt_or_dict, InitFromPrior())`. In this case, if `nt_or_dict` does not contain all the parameters, then we hit the fallback `InitFromPrior`, which could generate initial values that fall outside the box. I think this is not a big deal because it's easily fixed by the user setting a fallback of `nothing` instead; however, we will need to clearly document this and provide some error messages too.
- Note that `InitFromBoxConstraints` must generate parameters in unlinked space. As written above, it does not make any sense for multivariate distributions, and especially not distributions whose parameters are mutually dependent (e.g. Dirichlet, LKJCholesky). A naive approach of generating vectorised parameters and using `from_vec_transform(dist)` could easily lead to invalid samples. We could instead adopt a `SampleFromUniform`-style approach and sample in linked space before using `from_linked_vec_transform`, but that would be incorrect because the box constraints are supposed to be specified in unlinked space. However, it should be noted that this problem already exists on `main`, because the current generation of parameters from box constraints uses `rand.(Uniform.(lb, ub))`. It doesn't error, but it will give you nonsensical results.
Thus I believe that an approach of attempting to use `from_vec_transform`, but throwing an error if the lp is `NaN`/`Inf`, is still going to be an improvement over the current situation.
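Roughly, the guard I have in mind looks like the following; the variable names are placeholders and this is not where the check would actually live, it just illustrates the idea.

```julia
# Placeholder sketch, not actual Turing internals: draw a vectorised candidate
# from the box, map it back with from_vec_transform, and error loudly if the
# resulting value lies outside the distribution's support.
x_vec = rand.(Uniform.(lb, ub))
y = DynamicPPL.from_vec_transform(dist)(x_vec)
isfinite(logpdf(dist, y)) ||
    error("initial value for $vn generated from box constraints lies outside the support of $dist; please supply `initial_params` explicitly")
```

For reference, this is what the current behaviour on `main` looks like: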
```julia
julia> @model f() = x ~ Dirichlet(2, 1) # Dirichlet samples are between 0 and 1
f (generic function with 2 methods)

julia> maximum_a_posteriori(f(); lb=[1.0, 1.0], ub=[2.0, 2.0])
ModeResult with maximized lp of -Inf
[1.3910560913090908, 1.4022767128409757]
```

One might ask, "ahhhh, but I would like to set constraints in linked space". Well, the current optimisation framework does not really allow for that. In fact, the current optimisation framework is broken in this regard, because it makes an internal decision about whether to perform optimisation in linked or unlinked space. It will transform `initial_params` accordingly. However, it does not also transform `lb` and `ub` (!!), meaning that whether `lb` and `ub` refer to constraints in linked or unlinked space depends on whether Turing decided to link it or not.
Essentially, the argument I'm making here is that it's better to loudly error on some things that are not easy to handle, rather than to pretend that we are handling all cases nicely and end up giving nonsensical results OR silently distorting the meaning of the inputs.
Finally, none of this can handle transformed constraints well (`cons`, `lcons`, `ucons`). I think that this is beyond the capabilities of Turing to handle, and my current opinion is that if a user wants to do this, they should construct a `LogDensityFunction` themselves and perform optimisation directly on that.
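A minimal sketch of that escape hatch, assuming Optimization.jl with OptimizationOptimJL as the backend (the model and names below are just placeholders; `cons`/`lcons`/`ucons` would then be specified on the `OptimizationProblem`, entirely outside of Turing):

```julia
using DynamicPPL, Distributions, LogDensityProblems
using Optimization, OptimizationOptimJL

# toy model, purely for illustration
@model function demo(y)
    x ~ Normal()
    y ~ Normal(x)
end

ldf = DynamicPPL.LogDensityFunction(demo(1.5))

# optimise the negative log joint directly; any general constraints would be
# attached to the OptimizationFunction / OptimizationProblem here if needed
negll(u, _) = -LogDensityProblems.logdensity(ldf, u)
optf = OptimizationFunction(negll)
prob = OptimizationProblem(optf, zeros(LogDensityProblems.dimension(ldf)))
sol = solve(prob, NelderMead())
```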