In this paper, we address the problem of learning low-dimensional, discrete representations of real-valued vectors. We propose a new algorithm called similarity matrix construction and decomposition (C\&D). In the preparation phase, we constructively generate a set of consistent, unbiased, and comprehensive anchor vectors and obtain their low-dimensional forms with PCA. The C\&D algorithm learns the discrete representations of vectors in batches. For a batch of input vectors, we first construct a similarity matrix between them and the anchor vectors, and then learn their discrete representations by decomposing the similarity matrix, with the low-dimensional forms of the anchor vectors treated as a fixed factor of the decomposition. The decomposition is a mixed-integer optimization problem; we derive the optimal solution for each bit analytically and solve the problem with the discrete coordinate descent method. The C\&D algorithm does not learn discrete representations directly from the input vectors, which distinguishes it from other discrete learning algorithms. We evaluate the C\&D algorithm on sentence embedding compression tasks. Extensive experimental results show that the C\&D algorithm outperforms four recent methods and achieves state-of-the-art performance. Detailed analysis and an ablation study further validate the design of the C\&D algorithm.
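The sketch below illustrates the batch step described above: build a similarity matrix between a batch of input vectors and the anchor vectors, then learn binary codes by decomposing that matrix with the PCA-projected anchors held as a fixed factor, updating one bit at a time by discrete coordinate descent. This is a minimal illustration, not the paper's implementation: the dot-product similarity, the squared-error objective `||S - B P^T||_F^2`, the `{-1, +1}` code alphabet, the random initialization, and the function name `cd_batch_codes` are all assumptions made for the example.

```python
import numpy as np

def cd_batch_codes(X, anchors, P, n_iters=5, seed=0):
    """Sketch of the batch coding step (assumed objective: min_B ||S - B P^T||_F^2).

    X       : (n, d) batch of real-valued input vectors
    anchors : (m, d) anchor vectors from the preparation phase
    P       : (m, k) low-dimensional (PCA) forms of the anchors, kept fixed
    Returns B in {-1, +1}^(n, k), the discrete representations of the batch.
    """
    rng = np.random.default_rng(seed)
    S = X @ anchors.T                          # (n, m) similarity matrix (assumed dot product)
    n, k = X.shape[0], P.shape[1]
    B = np.where(rng.standard_normal((n, k)) >= 0, 1.0, -1.0)  # random init in {-1, +1}
    Q = S @ P                                  # (n, k) precomputed term S P
    G = P.T @ P                                # (k, k) Gram matrix of the fixed factor
    for _ in range(n_iters):
        for j in range(k):
            # Closed-form optimum for bit j with all other bits fixed:
            # b_j = sign(q_j - B_{-j} g_{-j,j}); the b_j g_{jj} term is added back
            # because B @ G[:, j] still contains the current column j.
            residual = Q[:, j] - B @ G[:, j] + B[:, j] * G[j, j]
            B[:, j] = np.where(residual >= 0, 1.0, -1.0)
    return B
```

Because `P` is fixed, each bit update has a closed-form sign solution, so the inner loop never needs a relaxation of the integer constraint; sweeping the `k` bits a few times is typically enough for the objective to stop decreasing in this kind of discrete coordinate descent scheme.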