Description
This is a cool project, and thanks for referencing our paper on random-projection encoding methods! Just FYI: the random projection encoding you have implemented is a bit different from those discussed in the referenced paper. Let x be an n-dimensional input point. The encodings discussed in the paper have the form
z = sign(Mx)
where M is a d x n matrix whose rows are sampled from the uniform distribution over the unit sphere. A simple way to generate samples from that distribution is to draw a sample from the n-dimensional standard normal distribution and normalize it. I don't think that sampling uniformly from the cube [-1,1]^n and normalizing results in a uniform distribution over the sphere. In particular, I think this approach results in too little mass distributed near the equator and poles. The following code should do the trick:
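# Generate the embedding matrix:
import numpy as np

d = 10000  # encoding dimension
n = 100    # input dimension
# Rows of i.i.d. standard normal entries, each scaled to unit length,
# are uniformly distributed over the unit sphere.
M = np.random.normal(size=(d, n))
M /= np.linalg.norm(M, axis=1, keepdims=True)

# Encode a point:
x = np.random.rand(n)
z = np.sign(M.dot(x))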
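
To make the concern about normalizing cube samples a bit more concrete, here is a rough sanity check in two dimensions (chosen only because the numbers are easy to reason about). It is a sketch, not part of the paper: the helper names `normalize_rows` and `fraction_near_axes` are made up for illustration, and the seed and sample count are arbitrary. It measures the fraction of directions that land within 22.5 degrees of a coordinate axis; for a uniform distribution on the circle that fraction should be about 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
num_samples = 200_000


def normalize_rows(a):
    """Scale each row of `a` to unit Euclidean length (illustrative helper)."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def fraction_near_axes(dirs):
    """Fraction of 2-D unit vectors within 22.5 degrees of the x- or y-axis."""
    # Fold every direction into the first quadrant, so theta lies in [0, pi/2].
    theta = np.arctan2(np.abs(dirs[:, 1]), np.abs(dirs[:, 0]))
    near_axis = (theta < np.pi / 8) | (theta > 3 * np.pi / 8)
    return near_axis.mean()


# Scheme 1: normalize standard normal samples.
gauss_dirs = normalize_rows(rng.normal(size=(num_samples, 2)))
# Scheme 2: normalize samples drawn uniformly from the square [-1, 1]^2.
cube_dirs = normalize_rows(rng.uniform(-1.0, 1.0, size=(num_samples, 2)))

gauss_frac = fraction_near_axes(gauss_dirs)
cube_frac = fraction_near_axes(cube_dirs)

print(f"normalized Gaussian: {gauss_frac:.3f}")
print(f"normalized cube:     {cube_frac:.3f}")
```

If the reasoning above is right, the normalized Gaussian samples should come out close to 0.5, while the normalized cube samples should come out noticeably lower, since the corners of the square pull mass toward the diagonals and away from the axes.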
The sign function is important because, without it, the sense in which the encoding preserves distances between points is different (and I'm not sure it is what one would want). That said, you may not want to use the sign function because it messes with gradients: its derivative is zero everywhere except at zero, where it does not exist. If you want to omit the sign function and use a linear projection, I would recommend looking into the "Johnson-Lindenstrauss Transform" (see here).
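
As a rough illustration of the difference between the two options, the sketch below compares what each one appears to preserve. It relies on two standard facts that are not spelled out above, so treat them as my assumptions: with the sign, the fraction of coordinates on which two encodings disagree is approximately the angle between the inputs divided by pi; without it, a Gaussian matrix scaled by 1/sqrt(d) approximately preserves Euclidean distances, which is the Johnson-Lindenstrauss setting. The variable names and the seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
d, n = 10000, 100

# Rows uniformly distributed over the unit sphere, as in the snippet above.
M = rng.normal(size=(d, n))
M /= np.linalg.norm(M, axis=1, keepdims=True)

x = rng.random(n)
y = rng.random(n)

# With the sign: the disagreement rate between encodings tracks the angle.
zx = np.sign(M @ x)
zy = np.sign(M @ y)
hamming = np.mean(zx != zy)
angle = np.arccos(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
print(f"disagreement rate: {hamming:.3f}   angle / pi: {angle / np.pi:.3f}")

# Without the sign: a Gaussian matrix scaled by 1/sqrt(d) (one common
# Johnson-Lindenstrauss construction) roughly preserves Euclidean distance.
G = rng.normal(size=(d, n)) / np.sqrt(d)
proj_dist = np.linalg.norm(G @ x - G @ y)
true_dist = np.linalg.norm(x - y)
print(f"projected distance: {proj_dist:.3f}   true distance: {true_dist:.3f}")
```

Note that in the first case the encoding only sees the direction of x, not its length, which is one reason the two notions of "preserving distance" differ.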