Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fit_mle methods #1670

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

dmetivie
Copy link

Hello,
I open this PR after asking a question on discourse and an issue (more like feature enhancement) here. I was not too convinced by the answer on Discourse, so I decided to show you directly what I meant in this PR.

Here are the additions of this PR

  1. Main one add fit_mle(dist::Distribution, x) instead of just fit_mle(::Type{<:Distribution}, x).
    Why?
    • Why not: From the tests I ran, it does not break anything. If fit_mle(::Type{<:Distribution}, x) is available, then it will use it.
    • It allows to fully use multiple dispatch of fit_mle as all “structure parameters” are now stored in dist and not as extra parameters. E.g. fit_mle(Binomial(n, 0.1), x) is fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x). Of course, the 0.1 is not used and is kind of useless here.
    • This dist can be used when needed as a starting point for a maximum likelihood estimation.
    • This could maybe benefit many downstream pkg using fit_mle for complex distributions like Copulas.jl, ExpectationMaximization.jl or HMMBase.jl.
  2. Point 1. allows adding very easily a fit_mle method for Product distributions.
    fit_mle(product_distribution([Exponential(0.5), Normal(11.3, 3.2)]),x) is simply product_distribution([fit_mle(Exponential, x[1,:]), fit_mle(Normal, x[2, :])])
    I am not too sure how it would be done only with types. Thought this seems to be done in Copulas.jl (notation becomes quite heavy), but again not sure how one would infer the components of a mixtures distribution or a product distribution.
  3. Add fit_mle(Laplace, x, w) using the weighted median
  4. Add fit_mle(Dirac, x[,w]) (why not).
  5. I added test for all the new things and doc.
  6. I added in the estimation of mixture section a ref to my pkg ExpectationMaximization.jl which fits mixture models (and thanks to fit_mle(dist::Distribution, x) is able to use EM algorithm on very generic distribution e.g., mixture of mixtures).

I tried to follow as much as I understood the contributions guidelines (my VSCode formatter changed quite a bit of things).

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hej! Thank you for taking the time and putting together a PR. Before diving into more detailed discussions, I think it would be good if you could revert the formatting changes. Additionally, I think it would be good to move a few parts to separate PRs as they do not involve any design questions (see my comments below).

src/multivariate/product.jl Show resolved Hide resolved
src/univariate/continuous/laplace.jl Outdated Show resolved Hide resolved
@@ -196,6 +196,7 @@ suffstats(::Type{T}, data::BinomData, w::AbstractArray{<:Real}) where {T<:Binomi

fit_mle(::Type{T}, ss::BinomialStats) where {T<:Binomial} = T(ss.n, ss.ns / (ss.ne * ss.n))

fit_mle(d::T, x::AbstractArray{<:Integer}) where {T<:Binomial} = fit_mle(T, suffstats(T, ntrials(d), x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not a big fan of supporting both instances and types if it is not really needed (https://docs.julialang.org/en/v1/manual/style-guide/#Avoid-confusion-about-whether-something-is-an-instance-or-a-type). In this line here, I'm not sure if there is a very strong argument for adding the method based on the instance since the same functionality already exists. More generally, I think it could be confusing to know which parameters of d are fixed and which ones are optimized with MLE.

Copy link
Author

@dmetivie dmetivie Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arguments are the ones I made before.
Mostly, it allows using fit_mle(dist, x) in a very generic code without having to care about extra arguments for some distributions like ProductDistribution, Binomial and Categorical.

For me, the point is not really adding a functionality to Binomial but all that comes with the form fit_mle(dist, x). The only cost is some possible minor confusion.
Docs should be clear on the difference between the instance and types difference. I tried typing something in that sense.

src/univariate/discrete/categorical.jl Outdated Show resolved Hide resolved
src/univariate/discrete/dirac.jl Outdated Show resolved Hide resolved
Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more comments to explain my reservations. Currently, my opinion is that the suggestions in this PR would lead to confusing behaviour that is difficult to predict, without a clear benefit. I'm also not completely convinced by the definitions for product distributions as they are not very flexible and make some seemingly strong assumptions such as e.g. that the same arguments are provided to all fits.

Therefore currently I would suggest not merging this PR.

docs/src/fit.md Outdated Show resolved Hide resolved
Comment on lines +81 to +89
fit_mle(Binomial(n, 0.1), x)
# equivalent to fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x), here the parameter 0.1 is not used

fit_mle(Categorical(p), x)
# equivalent to fit_mle(Categorical, ncategories(Categorical(p)), x), here the only the length of p is used not its values

d = product_distribution([Exponential(0.5), Normal(11.3, 3.2)])
fit_mle(d, x)
# equivalent to product_distribution([fit_mle(Exponential, x[1,:]), fit_mle(Normal, x[2, :])]). Again parameters of d are not used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this different behaviour of the distribution instances is very confusing and exactly shows the problem I had in mind in https://github.com/JuliaStats/Distributions.jl/pull/1670/files#r1092571161. There's no clear and obvious way to tell which properties of the distribution will be fixed when fitting. And even in the case of the product distribution the resulting code is not (significantly) shorter than the current code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Categorical the current fit_mle(Categorical, x) = fit_mle(Categorical, max(x), x) could also lead to confusion (and error in really unlucky case where there are 0 samples of the last category) and not so different of fit_mle(Categorical, ncategories(Categorical(p)), x).

It is not about code length. See my comment below.

```

Note that for standard distributions, the values of the distribution parameters `d` are not used in `fit_mle` only the “structure” of `d` is passed into `fit_mle`.
However, for complex Maximum Likelihood estimation requiring optimization, e.g., EM algorithm, one could use `D` as an initial guess.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is yet another behaviour and IMO adds to the confusion.

src/genericfit.jl Outdated Show resolved Hide resolved
src/multivariate/product.jl Outdated Show resolved Hide resolved
function fit_mle(g::Product, x::AbstractMatrix, args...)
d = size(x, 1)
length(g) == d || throw(DimensionMismatch("The dimensions of g and x are inconsistent."))
return product_distribution([fit_mle(g.v[s], y, args...) for (s, y) in enumerate(eachrow(x))])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to what I mentioned above, I think there's another issue here: It is not possible to pass different arguments/options to the fits of the different components, which makes this approach much less flexible than just fitting each component manually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that: is there a way to add kwargs = [kwargs[i] for i in 1:d] and apply them to each distribution?
About the arguments, I believe in most situation you have a sample and fit the whole ProductDistribution at once.
Hence, IMO the code above would correspond to the most typical situation. If you think about a situation where some distributions have weighted samples and some other not, one could either try to define a method for this case or as you said do it manually.

Comment on lines +39 to +40
params(g::Product) = params.(g.v)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated:

Suggested change
params(g::Product) = params.(g.v)

Comment on lines +77 to +78
params(d::ArrayOfUnivariateDistribution) = params.(d.dists)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
params(d::ArrayOfUnivariateDistribution) = params.(d.dists)

dmetivie and others added 3 commits February 2, 2023 13:08
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@dmetivie
Copy link
Author

dmetivie commented Feb 2, 2023

Thanks for giving thoughts and time.
The main benefits of this PR are not about shorter code. I find that the instance version of fit_mle convey more information than the type version. In Distributions.jl advantages are not super obvious (except for ProductDistributions) since for simple univariate of multivariate normal MLE is known exactly.
The instance version allows writing the EM algorithm very simply for all kind of distributions see here sure I am very bias about that.
I feel that with types only we are missing something:

  • How would one code fit_mle for product distributions with types only? IMO fit_mle for product distribution is a very neat feature (even if it is not hard to code).
  • How would you extract the types of a mixture with only the type version of fit_mle (or another similar function)

@lrnv
Copy link

lrnv commented Mar 23, 2023

Although i really have no oppinon on the confusion argument, i think that a better solution would be to parametrize mixtures distributions:

MixtureModel{Gamma,Normal,Binomial} would be the type, and not only MixtureModel. This would be coherent with the current behavior of ProductDistribution, which is parametrized by types. But I agree that this is probably a more dificult change to implement (might even be considered breaking ?).

@ParadaCarleton
Copy link
Contributor

  • How would you extract the types of a mixture with only the type version of fit_mle (or another similar function)

Hmm, maybe the types of a mixture should also include how many of each component is included? e.g.

Mixture{MixtureComponent{Normal, 2}, MixtureComponent{Gamma, 3}}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants