Add fit_mle methods #1670
base: master
Conversation
…tributions.jl into add-fit_mle-methods
Hej! Thank you for taking the time and putting together a PR. Before diving into more detailed discussions, I think it would be good if you could revert the formatting changes. Additionally, I think it would be good to move a few parts to separate PRs as they do not involve any design questions (see my comments below).
```
@@ -196,6 +196,7 @@ suffstats(::Type{T}, data::BinomData, w::AbstractArray{<:Real}) where {T<:Binomial}

fit_mle(::Type{T}, ss::BinomialStats) where {T<:Binomial} = T(ss.n, ss.ns / (ss.ne * ss.n))

fit_mle(d::T, x::AbstractArray{<:Integer}) where {T<:Binomial} = fit_mle(T, suffstats(T, ntrials(d), x))
```
I'm generally not a big fan of supporting both instances and types if it is not really needed (https://docs.julialang.org/en/v1/manual/style-guide/#Avoid-confusion-about-whether-something-is-an-instance-or-a-type). In this line here, I'm not sure if there is a very strong argument for adding the method based on the instance since the same functionality already exists. More generally, I think it could be confusing to know which parameters of `d` are fixed and which ones are optimized with MLE.
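To make the ambiguity concrete, here is a small sketch using the existing type-based API (which, as far as I understand, the proposed instance method would dispatch to):

```julia
using Distributions

x = [3, 5, 4]  # observed success counts, each out of n = 10 trials

# Current type-based API: the number of trials is passed explicitly.
d = fit_mle(Binomial, 10, x)
succprob(d)  # 0.4, since mean(x) / 10 = 0.4

# The proposed instance method, fit_mle(Binomial(10, 0.9), x), would keep
# n = 10 fixed and ignore the 0.9 entirely -- nothing in the call says so.
```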
The arguments are the ones I made before. Mostly, it allows using `fit_mle(dist, x)` in very generic code without having to care about extra arguments for some distributions like `ProductDistribution`, `Binomial` and `Categorical`. For me, the point is not really adding functionality to `Binomial` but everything that comes with the form `fit_mle(dist, x)`. The only cost is some possible minor confusion. The docs should be clear about the difference between instances and types; I tried to write something to that effect.
I added some more comments to explain my reservations. Currently, my opinion is that the suggestions in this PR would lead to confusing behaviour that is difficult to predict, without a clear benefit. I'm also not completely convinced by the definitions for product distributions, as they are not very flexible and make some seemingly strong assumptions, e.g., that the same arguments are provided to all fits.
Therefore, I would currently suggest not merging this PR.
```julia
fit_mle(Binomial(n, 0.1), x)
# equivalent to fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x); here the parameter 0.1 is not used

fit_mle(Categorical(p), x)
# equivalent to fit_mle(Categorical, ncategories(Categorical(p)), x); here only the length of p is used, not its values

d = product_distribution([Exponential(0.5), Normal(11.3, 3.2)])
fit_mle(d, x)
# equivalent to product_distribution([fit_mle(Exponential, x[1, :]), fit_mle(Normal, x[2, :])]). Again, the parameters of d are not used.
```
IMO this different behaviour of the distribution instances is very confusing and exactly shows the problem I had in mind in https://github.com/JuliaStats/Distributions.jl/pull/1670/files#r1092571161. There's no clear and obvious way to tell which properties of the distribution will be fixed when fitting. And even in the case of the product distribution the resulting code is not (significantly) shorter than the current code.
For `Categorical`, the current `fit_mle(Categorical, x) = fit_mle(Categorical, maximum(x), x)` could also lead to confusion (and to an error in the really unlucky case where there are 0 samples of the last category), and is not so different from `fit_mle(Categorical, ncategories(Categorical(p)), x)`.
It is not about code length. See my comment below.
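To make the unlucky case concrete, here is a sketch with the current type-based API:

```julia
using Distributions

x = [1, 2, 2, 1]  # drawn from a 3-category distribution; category 3 happens to be unobserved

d_bad = fit_mle(Categorical, x)
ncategories(d_bad)  # 2 -- the category count is inferred from maximum(x), silently dropping category 3

d_ok = fit_mle(Categorical, 3, x)
probs(d_ok)  # [0.5, 0.5, 0.0] -- passing the category count explicitly keeps the empty category
```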
Note that for standard distributions, the values of the parameters of `d` are not used in `fit_mle`; only the “structure” of `d` is passed into `fit_mle`.
However, for complex maximum likelihood estimation requiring optimization, e.g., the EM algorithm, one could use `d` as an initial guess.
This is yet another behaviour and IMO adds to the confusion.
```julia
function fit_mle(g::Product, x::AbstractMatrix, args...)
    d = size(x, 1)
    length(g) == d || throw(DimensionMismatch("The dimensions of g and x are inconsistent."))
    return product_distribution([fit_mle(g.v[s], y, args...) for (s, y) in enumerate(eachrow(x))])
end
```
In addition to what I mentioned above, I think there's another issue here: It is not possible to pass different arguments/options to the fits of the different components, which makes this approach much less flexible than just fitting each component manually.
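For instance, fitting the components manually lets each one take its own options. A sketch (assuming `fit_mle(Normal, x; mu=...)`, the keyword form that fixes the mean during the univariate Normal fit):

```julia
using Distributions

x = rand(2, 100)  # row 1: data for the Exponential component, row 2: for the Normal component

# Manual per-component fitting: each call can take different arguments,
# e.g. fixing the Normal mean at 0 while the Exponential fit takes none.
d = product_distribution([
    fit_mle(Exponential, x[1, :]),
    fit_mle(Normal, x[2, :]; mu = 0.0),
])
```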
I wondered about that: is there a way to add `kwargs = [kwargs[i] for i in 1:d]` and apply them to each distribution?
About the arguments, I believe in most situations you have one sample set and fit the whole `ProductDistribution` at once.
Hence, IMO the code above would correspond to the most typical situation. If you think about a situation where some distributions have weighted samples and some others do not, one could either try to define a method for this case or, as you said, do it manually.
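One hypothetical way to do this (not part of the PR; `fit_mle_each` and its `component_kwargs` argument are made-up names for illustration, mirroring the PR's use of the `Product` type and its `v` field):

```julia
using Distributions

# Hypothetical helper: one NamedTuple of keyword arguments per component.
function fit_mle_each(g::Distributions.Product, x::AbstractMatrix,
                      component_kwargs::AbstractVector{<:NamedTuple})
    length(g) == size(x, 1) || throw(DimensionMismatch("The dimensions of g and x are inconsistent."))
    return product_distribution([
        fit_mle(typeof(g.v[s]), y; component_kwargs[s]...)
        for (s, y) in enumerate(eachrow(x))
    ])
end

# Usage sketch: no options for the Exponential, a fixed mean for the Normal.
g = product_distribution([Exponential(1.0), Normal(0.0, 1.0)])
x = rand(g, 50)
fit_mle_each(g, x, [NamedTuple(), (mu = 0.0,)])
```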
```julia
params(g::Product) = params.(g.v)
```
This seems unrelated:
```julia
params(g::Product) = params.(g.v)
```
```julia
params(d::ArrayOfUnivariateDistribution) = params.(d.dists)
```
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Thanks for giving this your thought and time.

Although I really have no opinion on the confusion argument, I think that a better solution would be to parametrize mixture distributions:

Hmm, maybe the types of a mixture should also include how many of each component are included? E.g. `Mixture{MixtureComponent{Normal, 2}, MixtureComponent{Gamma, 3}}`
Hello,

I open this PR after asking a question on Discourse and an issue (more like a feature enhancement) here. I was not too convinced by the answer on Discourse, so I decided to show you directly what I meant in this PR.

Here are the additions of this PR:

- `fit_mle(dist::Distribution, x)` instead of just `fit_mle(::Type{<:Distribution}, x)`. Why?
  - If `fit_mle(::Type{<:Distribution}, x)` is available, then it will use it.
  - It simplifies the use of `fit_mle`, as all “structure parameters” are now stored in `dist` and not passed as extra arguments. E.g. `fit_mle(Binomial(n, 0.1), x)` is `fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x)`. Of course, the 0.1 is not used and is kind of useless here.
  - `dist` can be used when needed as a starting point for a maximum likelihood estimation.
  - It eases the definition of `fit_mle` for complex distributions, as in Copulas.jl, ExpectationMaximization.jl or HMMBase.jl (with ExpectationMaximization.jl, `fit_mle(dist::Distribution, x)` is able to use the EM algorithm on very generic distributions, e.g., mixtures of mixtures).
- A `fit_mle` method for Product distributions: `fit_mle(product_distribution([Exponential(0.5), Normal(11.3, 3.2)]), x)` is simply `product_distribution([fit_mle(Exponential, x[1, :]), fit_mle(Normal, x[2, :])])`. I am not too sure how it would be done only with types. Though this seems to be done in Copulas.jl (the notation becomes quite heavy), again I am not sure how one would infer the components of a mixture distribution or a product distribution.
- `fit_mle(Laplace, x, w)` using the weighted `median`.
- `fit_mle(Dirac, x[, w])` (why not).

I tried to follow the contribution guidelines as much as I understood them (my VSCode formatter changed quite a few things).