Description
The following program generates 10,000 samples of the sample mean of 100,000 Exponential(1) random variates.
use std::rand::{StdRng, Rng};
use std::rand::distributions::Exp1;
static SAMPLES_PER_MEAN: uint = 100_000;
static NUM_MEANS: uint = 10_000;
fn main() {
let mut rng = StdRng::new();
let n = SAMPLES_PER_MEAN as f64;
for _ in range(0, NUM_MEANS) {
let mut s = 0.;
for _ in range(0, SAMPLES_PER_MEAN) {
s += *rng.gen::<Exp1>();
}
println!("{:.20}", s / n)
}
}
The following plot plots the kernel density estimate of the above (solid) line, against the expected density based on the central limit theorem (dashed). They should match fairly closely with samples of this size. (The vertical line is the expected mean value of the samples, i.e. the peak should lie there.)
Note that the scale is slightly deceptive and the bias isn't huge, the sample mean is converging to approximately 0.9964 rather than 1. However, it is highly unlikely that this is by chance with such large samples (1 billion samples in total), and my experimentation has shown it to be stable, i.e. running it several times with many different values of NUM_MEANS
and SAMPLES_PER_MEAN
all give plots similar to the above, with overall means around 0.9964.
(Plot created with R via:
dat <- read.table('out.csv')$V1
x <- seq(0.98, 1.02, by=0.0001)
png('exp-density.png', width=600, height=600)
plot(density(dat), main='Density of 10000 means of 100000 samples of Exp(1)', lty=1)
lines(x, dnorm(x, 1, 1/sqrt(100000)), lty=2)
lines(c(1.0, 1.0), c(-1000, 1000), lty=3)
legend('topright', c('Sample density', 'True density'), lty=c(1,2))
dev.off()
FWIW, generating a similar sample in R matches the expected density almost exactly.)
Having done a very small amount of investigation, I think this might be an error in the Ziggurat tables for exp, or an error in the implementation of Ziggurat for non-symmetric random variables. (Reasons: the f64
generator seems to match a uniform(0,1) closely, and the normal generator (which also uses Ziggurat) matches its expected distribution closely too. That said, I didn't check either of these super closely.)