Skip to content

Conversation

JaoCR
Copy link

@JaoCR JaoCR commented Aug 18, 2025

Following the TODO in the Hypergeometric distribution sampling code, this PR implements the H2PE/HIM algorithm presented by Kachitvichyanukul & Schmeiser in 1984, removing this dependency on RFunctions and improving the sampling performance with a penalty in setup time and some extra stored constants.

Added some test cases for complete coverage of the distribution code. More extensive testing with the distribution of the generated variables was done (locally, manually) on both of the sub-cases of this algorithm.

Example case of HIM

The HIM algorith runs for smaller distributions. To exemplify, the Hypergeometric(20, 20, 10):

Setup Benchmark (with RFunctions):
image

Setup Benchmark (HIM):
image

Sampling Benchmark (with RFunctions):
image

Sampling Benchmark (HIM):
image

QQPlot with 1000 samples

plot_3

Example case of H2PE

The H2PE algorithm is used for larger cases. To examplify, Hypergeometric(200, 200, 100):

Setup Benchmark (with RFunctions):
image

Setup Benchmark (H2PE)
image

Sampling Benchmark (with RFunctions):
image

Sampling Benchmark (H2PE):
image

QQPlot with 1000 samples
plot_4

Feel free to ask any questions and do/request any adjustments. Have a nice week.

@devmotion
Copy link
Member

👍 Can you update the PR to follow the design of Distributions.jl, ie create a new sampler in src/samplers instead of changing the distribution type? 🙂

@JaoCR
Copy link
Author

JaoCR commented Aug 18, 2025

Tanks for taking a look at it, will do. Will be back with an update in the near future. (Thinking about it, this may solve the test that is currently failing, I'm dispatching on a type parameter to switch between algorithms and this seems to be against the intended pattern here)

@JaoCR

This comment was marked as resolved.

@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.45%. Comparing base (a1b8bb3) to head (c1e848c).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1994      +/-   ##
==========================================
+ Coverage   86.28%   86.45%   +0.17%     
==========================================
  Files         146      147       +1     
  Lines        8787     8861      +74     
==========================================
+ Hits         7582     7661      +79     
+ Misses       1205     1200       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JaoCR

This comment was marked as resolved.

@JaoCR

This comment was marked as resolved.

@JaoCR

This comment was marked as resolved.

@JaoCR
Copy link
Author

JaoCR commented Aug 23, 2025

I'm back. The errors seen before, after further investigation, also happen with other distributions due to the lack of support for the logpdf function for Dual values of x in StatsFuns. Those other distributions are already excluded from the test. I've included the Hypergeometric distribution to this exclusion rule (it is still tested with Float64 values of x in the logpdf). Also added one more test to cover the modes function which was uncovered before.

@JaoCR
Copy link
Author

JaoCR commented Aug 23, 2025

Checks are passing and test coverage seems to be complete now. @devmotion , please review this when you have time, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants