
Conversation

@benjamin-lieser (Member) commented Oct 27, 2025

  • Added a CHANGELOG.md entry

Summary

Added a TruncatedNormal distribution

Motivation

#7

Details

It follows Robert, Christian P. (1995), "Simulation of truncated normal variables".

The tests still need to be improved. The code seems to be working, but it is still a draft.

@benjamin-lieser (Member, Author) commented Oct 28, 2025

The two-sided algorithm can fail (very low acceptance probability) when the interval is wide but far from the mean.
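
For reference, a minimal sketch of that two-sided uniform-proposal algorithm (the helper name and the use of rand 0.9's `Rng::random`/`random_range` are assumptions, not the draft's code); the acceptance ratio in the comment is exactly what collapses in the failure case:

```rust
use rand::Rng;

/// Standard normal truncated to [a, b], sampled with a uniform proposal
/// (Robert 1995, the two-sided algorithm). Sketch only.
fn sample_uniform_proposal<R: Rng + ?Sized>(rng: &mut R, a: f64, b: f64) -> f64 {
    // Peak of exp(-z^2/2) on [a, b]: at 0 if the interval straddles it,
    // otherwise at the endpoint closest to 0.
    let m2 = if a <= 0.0 && 0.0 <= b {
        0.0
    } else {
        a.abs().min(b.abs()).powi(2)
    };
    loop {
        let z = rng.random_range(a..b);
        // Acceptance ratio exp((m^2 - z^2) / 2). When [a, b] is wide and
        // far out in a tail, most of the interval has z^2 >> m^2, so this
        // is almost always near zero and the loop rarely terminates.
        if rng.random::<f64>() <= ((m2 - z * z) / 2.0).exp() {
            return z;
        }
    }
}
```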

One method is still missing: it uses the one-sided algorithm and does rejection sampling to get to the two-sided case. (I will implement this; see the sketch after the list below.)

The problem is how to determine efficiently which of the four methods to use:

  • Naive rejection sampling
  • One-sided
  • Two-sided with the uniform proposal
  • Two-sided with rejection sampling on the one-sided case
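
The sketch mentioned above, assuming the same rand 0.9 API; the one-sided algorithm is the `b = f64::INFINITY` special case:

```rust
use rand::Rng;

/// Standard normal truncated to [a, b] with a >= 0, via Robert's (1995)
/// translated-exponential proposal. With b = f64::INFINITY this is the
/// one-sided algorithm; a finite b gives the "two-sided via the
/// one-sided case" method from the list. Sketch only.
fn sample_exponential_tail<R: Rng + ?Sized>(rng: &mut R, a: f64, b: f64) -> f64 {
    // Optimal rate for the shifted exponential proposal (Robert 1995).
    let lambda = 0.5 * (a + (a * a + 4.0).sqrt());
    loop {
        // Propose z = a + Exp(lambda); the 1 - u avoids ln(0).
        let z = a - (1.0 - rng.random::<f64>()).ln() / lambda;
        // Reject proposals past the upper bound, then thin with the exact
        // acceptance ratio exp(-(z - lambda)^2 / 2).
        if z <= b && rng.random::<f64>() <= (-0.5 * (z - lambda) * (z - lambda)).exp() {
            return z;
        }
    }
}
```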

The paper gives useful acceptance probabilities for making this choice, but they are all quite expensive to evaluate. This would not matter if many samples are drawn, but it could be significant overhead if only a few are used.
Maybe we should add a tuning option to spend more up front in exchange for better sampling later.

I think we could also use the inverse CDF approach implemented by @Caellian, with some numerical optimizations. This has the advantage that it does not use rejection sampling, and it might vectorize well.
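
A sketch of what that could look like, using the statrs crate for Φ and Φ⁻¹ purely for illustration (the numerical optimizations, e.g. log-space CDFs for extreme bounds, are elided):

```rust
use rand::Rng;
use statrs::distribution::{ContinuousCDF, Normal};

/// Inverse-CDF sampler for the standard normal truncated to [a, b].
/// No rejection loop: one uniform in, one sample out.
fn sample_inverse_cdf<R: Rng + ?Sized>(rng: &mut R, a: f64, b: f64) -> f64 {
    let std_normal = Normal::new(0.0, 1.0).unwrap();
    let (fa, fb) = (std_normal.cdf(a), std_normal.cdf(b));
    // Map u ~ U(0, 1) through x = Phi^-1(Phi(a) + u * (Phi(b) - Phi(a))).
    let u = rng.random::<f64>();
    std_normal.inverse_cdf(fa + u * (fb - fa))
}
```

Because each sample is a fixed sequence of arithmetic with no data-dependent branching, this is the variant that would vectorize.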

@Caellian commented Oct 29, 2025

In the paper you're referencing (Robert, Christian P. (1995)), and later in Chopin (2012), a cutoff is computed analytically as 0.47 σ (or 0.477 σ) for finite intervals and 0.5 σ for semi-finite ones; so if a ≥ 0.477 σ, switch to an exponential-tail sampler (Robert/Marsaglia/Chopin).

So you can compare both bounds in the constructor to figure out where they fall, and then just use whichever algorithm is most efficient for that specific range. Branching is unavoidable: it will always be either a vtable lookup or a jump (preferable). Rust is more likely to optimize away a simple jump if the range is constant, and CPU branch prediction should remove most of this cost in the remaining cases.

```rust
let a = (start - mean) / std;
let b = (end - mean) / std;
match () {
    // Extremely narrow interval: treat as degenerate (point mass at the midpoint)
    _ if (b - a).abs() < 1e-6 => 0.5 * (a + b),

    // Narrow interval: inverse CDF works best here, with f64.
    // Use log-space for extreme a, b (e.g., > 8 sigma or < -8 sigma).
    _ if (b - a) < 1e-3 => sample_inverse_cdf(a, b),

    // Both bounds positive (left cutoff above the mean)
    _ if a >= 0.47 => sample_exponential_tail(a, b),
    // Both bounds negative (right cutoff below the mean):
    // symmetric, so flip and reuse the upper-tail sampler
    _ if b <= -0.47 => -sample_exponential_tail(-b, -a),

    // Straddling zero (typical central case):
    // the standard rejection sampler works efficiently
    _ if a < 0.47 && b > -0.47 => sample_rejection(a, b),

    // Asymmetric truncation (one tail plus a near-zero cutoff):
    // mixed region with a tail on one side and a cutoff on the other;
    // use exponential or hybrid rejection
    _ if a >= 0.0 => sample_rejection(a, b),
    _ if b <= 0.0 => -sample_rejection(-b, -a),
    _ => panic!("Invalid truncation bounds or NaN"),
}
```
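
One way to pay for that branch only once, as a hedged sketch (the names are illustrative, not a proposed API): resolve the match in the constructor into a small enum and have sampling dispatch on the tag:

```rust
/// Illustrative only: the constructor runs the case analysis above once
/// and stores the result, so each sample is a single predictable jump.
enum Method {
    /// Interval narrow enough to treat as a point mass.
    Degenerate(f64),
    /// Narrow interval: inverse CDF.
    InverseCdf,
    /// Both bounds on one side of the mean: exponential-tail sampler;
    /// `flip` marks the mirrored (negative) case.
    ExpTail { flip: bool },
    /// Central or mildly asymmetric interval: rejection sampler.
    Rejection { flip: bool },
}
```

This keeps the expensive decision out of the per-sample path and gives a natural place to precompute constants such as the exponential proposal rate.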

