
Add mean et al. for truncated log normal #1874

Open
wants to merge 3 commits into base: master
Conversation

ararslan
Member

@ararslan ararslan commented Jun 26, 2024

This PR adds mgf for truncated normal and uses that to implement mean, var, skewness, and kurtosis for truncated log normal based on this observation. median is also implemented for truncated log normal.

Fixes #709
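
The observation in question is that if X is a truncated normal and Y = exp(X), then E[Y^k] = E[e^{kX}] = mgf(X, k), so the moments of a truncated log-normal reduce to MGF evaluations of the underlying truncated normal. A minimal numerical sanity check of that reduction (sketched in Python/SciPy rather than Julia; the closed-form MGF below is the standard textbook formula, and `truncnorm_mgf` is an illustrative name, not this PR's code):

```python
# Check that the truncated-normal MGF reproduces E[exp(t*X)] computed by
# numerical integration. For X ~ N(mu, sigma^2) truncated to [a, b]:
#   M_X(t) = exp(mu*t + sigma^2*t^2/2)
#            * (Phi(beta - sigma*t) - Phi(alpha - sigma*t)) / (Phi(beta) - Phi(alpha))
# where alpha = (a - mu)/sigma and beta = (b - mu)/sigma.
import math

from scipy.integrate import quad
from scipy.stats import norm, truncnorm


def truncnorm_mgf(t, mu, sigma, a, b):
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = norm.cdf(beta) - norm.cdf(alpha)
    return math.exp(mu * t + 0.5 * (sigma * t) ** 2) * (
        norm.cdf(beta - sigma * t) - norm.cdf(alpha - sigma * t)
    ) / z


mu, sigma, a, b = 0.3, 0.8, -1.0, 2.0
tn = truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)

# The mean of the truncated log-normal exp(X) is E[exp(X)] = M_X(1).
numeric, _ = quad(lambda x: math.exp(x) * tn.pdf(x), a, b)
closed = truncnorm_mgf(1.0, mu, sigma, a, b)
assert abs(numeric - closed) < 1e-8
```

Here `truncnorm_mgf(1, ...)` plays the role that mean of the truncated log-normal plays in the PR; higher moments come from t = 2, 3, 4.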

@devmotion
Member

My worry (which was also expressed in issues such as #968) is that numerical integration is generally challenging and a fallback might lead to silently incorrect results. Such a fallback would be wrong (or at least problematic) if the moments are not finite, as is the case for, e.g., Cauchy.

So my general feeling is that numerical integration should maybe be restricted to a smaller subset of distributions, or perhaps only be available as a separate function. If we do want to use it more broadly, I think it would also be safer to error when the integration error estimate is too large, to reduce the probability of silently incorrect results.

@PaulSoderlind

@devmotion Is there any numerical integration in this code?

@ararslan
Member Author

@devmotion, perhaps you're thinking of #1875? As @PaulSoderlind noted, there's no integration here.

@codecov-commenter

codecov-commenter commented Jun 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.71%. Comparing base (65f056c) to head (70c7810).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1874      +/-   ##
==========================================
- Coverage   85.99%   85.71%   -0.29%     
==========================================
  Files         144      145       +1     
  Lines        8666     8706      +40     
==========================================
+ Hits         7452     7462      +10     
- Misses       1214     1244      +30     


@devmotion
Member

Oh yes, indeed, it seems I commented on the wrong PR.

@ararslan ararslan marked this pull request as ready for review June 28, 2024 20:07
@ararslan ararslan requested a review from devmotion June 28, 2024 20:07
@ararslan
Member Author

CI failures on nightly are unrelated.

Comment on lines 10 to 11
a = d.lower === nothing ? nothing : log(T(minimum(d)))
b = d.upper === nothing ? nothing : log(T(maximum(d)))
Member

Is there a reason to use minimum/maximum instead of d.lower/d.upper?

Member Author

I feel like maybe I had a reason but who knows what it was

Member Author

Ah, I remember now: it was to handle truncation points outside of the support, e.g. truncated(LogNormal(); lower=-1). In that case, d.lower == -1 but minimum(d) == 0. I should add a test for that.

@@ -118,6 +118,36 @@ function entropy(d::Truncated{<:Normal{<:Real},Continuous})
0.5 * (log2π + 1.) + log(σ * z) + (aφa - bφb) / (2.0 * z)
end

function mgf(d::Truncated{<:Normal{<:Real},Continuous}, t::Real)
T = promote_type(partype(d), typeof(t))
Member

I guess

Suggested change
T = promote_type(partype(d), typeof(t))
T = float(promote_type(partype(d), typeof(t)))

would be more correct.

return @horner(m1, m4, -4m3, 6m2, 0, -3) / v^2 - 3
end

median(d::Truncated{<:LogNormal}) = exp(median(_truncnorm(d)))
Member

This suggests we should add a similar definition for quantile as well?

Member Author

Oh good call, it turns out that this relation holds in general, not just for the median.
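
The general relation is that quantiles commute with strictly increasing transforms: if Y = exp(X) for a truncated normal X, then quantile(Y, p) = exp(quantile(X, p)). A quick check of that identity (a Python/SciPy sketch, not this PR's code):

```python
# If Y = exp(X), then P(Y <= exp(q)) = P(X <= q), so the p-quantile of Y is
# exp of the p-quantile of X. Verify by pushing the candidate quantile back
# through the CDF of X.
from math import exp, log

from scipy.stats import truncnorm

mu, sigma, a, b = 0.0, 1.0, 0.5, 3.0
tn = truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)

for p in (0.1, 0.5, 0.9):
    y = exp(tn.ppf(p))  # candidate quantile of the truncated log-normal
    # CDF of Y at y equals CDF of X at log(y), which should recover p
    assert abs(tn.cdf(log(y)) - p) < 1e-8
```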

Member Author

Actually, defining median/quantile here is unnecessary: there's already a fallback definition for computing quantiles of truncated distributions, and it's more efficient than the additional layer of indirection involved in constructing a truncated normal.

function var(d::Truncated{<:LogNormal})
tn = _truncnorm(d)
# Ensure the variance doesn't end up negative, which can occur due to numerical issues
return max(mgf(tn, 2) - mgf(tn, 1)^2, 0)
Member

Repeated evaluation of mgf involves repeated calculations of the same (intermediate) quantities. But my fear is that optimizing this further will lead to less readable code...

Member Author

Yeah, I had the same thought and wasn't sure what to do about it so I just... didn't address it, haha
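
One middle ground is to compute the shared normalization constant Φ(β) − Φ(α) once and close over it, so repeated MGF evaluations only pay for the t-dependent terms. A hypothetical sketch of that refactor (Python/SciPy, illustrative names, not the PR's code):

```python
# Hypothetical refactor: share the normalization constant across the MGF
# evaluations needed for the mean and variance, instead of recomputing it
# inside every mgf call.
from math import exp

from scipy.stats import norm


def truncated_lognormal_mean_var(mu, sigma, a, b):
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = norm.cdf(beta) - norm.cdf(alpha)  # computed once, shared below

    def mgf(t):
        return exp(mu * t + 0.5 * (sigma * t) ** 2) * (
            norm.cdf(beta - sigma * t) - norm.cdf(alpha - sigma * t)
        ) / z

    m1, m2 = mgf(1.0), mgf(2.0)
    # clamp at zero, matching the PR's guard against numerical negativity
    return m1, max(m2 - m1 * m1, 0.0)


mean, var = truncated_lognormal_mean_var(0.0, 0.5, -1.0, 1.0)
assert mean > 0 and var >= 0
```

The readability cost is modest here, but it grows once skewness and kurtosis want m3 and m4 as well, which is presumably the trade-off being weighed above.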

@ararslan
Member Author

Ugh, type inference issue on 1.3 and I can't test on 1.3 locally 😑

Comment on lines +10 to +11
a = d.lower === nothing || d.lower <= 0 ? nothing : log(T(d.lower))
b = d.upper === nothing || isinf(d.upper) ? nothing : log(T(d.upper))
Member

Tests on Julia 1.3 pass locally when changing this to

Suggested change
a = d.lower === nothing || d.lower <= 0 ? nothing : log(T(d.lower))
b = d.upper === nothing || isinf(d.upper) ? nothing : log(T(d.upper))
a = d.lower === nothing ? nothing : log(T(max(d.lower, 0)))
b = d.upper === nothing ? nothing : log(T(d.upper))

Member Author

That isn't functionally equivalent, though; IIRC, I was relying on a and/or b being nothing in those cases so that truncated would handle it in a particular way.

Member

But arguably this optimization (d.lower === nothing or d.upper === nothing) should already have happened, either by a user or internally, when constructing d = truncated(LogNormal(...)). Maybe one shouldn't expect unoptimized inputs to lead to optimized algorithms.

AFAICT the optimization of truncated(Normal(...), ...) is also only exploited in the case where this returns a Normal (a = b = nothing); the code for Truncated{<:Normal} does not seem to use the fact that a bound might be nothing. In that Normal case, arguably the LogNormal shouldn't have been truncated in the first place.

Member Author

I'm currently sick and can only barely comprehend that message, but if you'd prefer to go with your suggested change, feel free to apply it; I trust your judgement.

Member Author

Did we have this conversation a few months ago? My brain similarly can't comprehend whether this is the same discussion: #1874 (comment)

I should probably just log off and go sleep


Successfully merging this pull request may close these issues.

Error calculating mean of Truncated Log Normal
4 participants