Skip to content

std::rand::distributions::Exp1 is biased #10084

Closed
@huonw

Description

@huonw

The following program generates 10,000 samples of the sample mean of 100,000 Exponential(1) random variates.

use std::rand::{StdRng, Rng};
use std::rand::distributions::Exp1;

static SAMPLES_PER_MEAN: uint = 100_000;
static NUM_MEANS: uint = 10_000;

fn main() {
    let mut rng = StdRng::new();
    let n = SAMPLES_PER_MEAN as f64;

    for _ in range(0, NUM_MEANS) {
        let mut s = 0.;
        for _ in range(0, SAMPLES_PER_MEAN) {
            s += *rng.gen::<Exp1>();
        }
        println!("{:.20}", s / n)
    }
}

The following plot plots the kernel density estimate of the above (solid) line, against the expected density based on the central limit theorem (dashed). They should match fairly closely with samples of this size. (The vertical line is the expected mean value of the samples, i.e. the peak should lie there.)

exp-density

Note that the scale is slightly deceptive and the bias isn't huge, the sample mean is converging to approximately 0.9964 rather than 1. However, it is highly unlikely that this is by chance with such large samples (1 billion samples in total), and my experimentation has shown it to be stable, i.e. running it several times with many different values of NUM_MEANS and SAMPLES_PER_MEAN all give plots similar to the above, with overall means around 0.9964.

(Plot created with R via:

dat <- read.table('out.csv')$V1
x <- seq(0.98, 1.02, by=0.0001)
png('exp-density.png', width=600, height=600)
plot(density(dat), main='Density of 10000 means of 100000 samples of Exp(1)', lty=1)
lines(x, dnorm(x, 1, 1/sqrt(100000)), lty=2)
lines(c(1.0, 1.0), c(-1000, 1000), lty=3)
legend('topright', c('Sample density', 'True density'), lty=c(1,2))
dev.off()

FWIW, generating a similar sample in R matches the expected density almost exactly.)


Having done a very small amount of investigation, I think this might be an error in the Ziggurat tables for exp, or an error in the implementation of Ziggurat for non-symmetric random variables. (Reasons: the f64 generator seems to match a uniform(0,1) closely, and the normal generator (which also uses Ziggurat) matches its expected distribution closely too. That said, I didn't check either of these super closely.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions