Description
I'm having trouble using the VBGMM and DPGMM for density estimation. As far as I understand, both should have the same interface as the "normal" GMM. However, while the normal GMM produces a good fit, the VBGMM and DPGMM produce bad fits and non-normalised densities. This makes me wonder whether something deeper is wrong, beyond me simply using the code incorrectly.
The problem first shows up in the density estimation example, by appending the line:
print(np.sum(np.exp(-Z)) * (x[1] - x[0]) * (y[1] - y[0]))
This is approximately 1 when using a normal GMM, but much smaller when using the VB or DP GMMs.
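To make the check concrete: for any properly normalised density, summing the values evaluated on a grid and multiplying by the cell area should come out close to 1. A minimal pure-Python sketch of the same test applied to a known density (a standard bivariate normal; the function name is my own):

```python
import math

def gauss2d_pdf(x, y):
    """Standard bivariate normal density (independent, unit-variance axes)."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

# Midpoint Riemann sum over [-6, 6] x [-6, 6], mirroring
# np.sum(np.exp(-Z)) * dx * dy from the example above.
step = 0.05
total = 0.0
for i in range(-120, 120):
    for j in range(-120, 120):
        total += gauss2d_pdf((i + 0.5) * step, (j + 0.5) * step)
total *= step * step
print(total)  # ~1.0 for a correctly normalised density
```

Any fitted model whose predicted density fails this check is not returning a probability density at all, which is exactly what the VB/DP variants appear to do.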
The same behaviour is shown on a toy 1D density estimation problem:
import numpy as np
import numpy.random as rnd
import sklearn.mixture as skmix
import matplotlib.pyplot as plt

# Two-component toy data: 70% from N(-5, 1), 30% from N(3, 0.3^2)
X = rnd.randn(int(0.7 * 300), 1) - 5
X = np.vstack((X, rnd.randn(int(0.3 * 300), 1) * 0.3 + 3))

# gmm = skmix.GMM(2)
gmm = skmix.DPGMM(2)
gmm.fit(X)

x = np.linspace(-10, 10, 1000)
p = np.exp(gmm.score(x[:, np.newaxis]))  # per-sample log-likelihoods

plt.hist(X, bins=50, normed=True)
plt.plot(x, p)
plt.show()

# Riemann-sum estimate of the integral of the predictive density
integral = np.sum(p) * (x[1] - x[0])
print(integral)
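For reference, the same integral test applied to the true generating density of the toy data should come out at 1, independently of any fitted model. A minimal pure-Python sketch (the helper names are my own):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def true_mixture_pdf(x):
    # The density the toy data is drawn from: 70% N(-5, 1), 30% N(3, 0.3^2)
    return 0.7 * gauss_pdf(x, -5.0, 1.0) + 0.3 * gauss_pdf(x, 3.0, 0.3)

# Midpoint Riemann sum over [-10, 10], the same check as
# np.sum(p) * (x[1] - x[0]) in the script above.
dx = 0.01
integral = sum(true_mixture_pdf(-10.0 + (i + 0.5) * dx) for i in range(2000)) * dx
print(integral)  # ~1.0; both components lie well inside [-10, 10]
```

So a fit that is merely stuck in a bad local optimum would still integrate to roughly 1 over this range; an integral far below 1 points at the returned scores themselves.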
Is this behaviour just the result of a poor fit due to a local optimum or something similar? The fact that the predictive densities don't normalise leads me to believe it's something else.
I asked the same question on StackOverflow.