Description
Currently we have 3 different similarity functions:
hamming_similarity
cosine_similarity
dot_similarity
With the future introduction of complex hypervectors, we will likely add a fourth one if we follow the current design. I think, however, that we should provide only one similarity function that changes its behavior based on the dtype of the input tensors. It would also be nice if it handled batched operations, i.e., with input shapes (*, d) and (n, d) the output shape should be (*, n), holding the similarity score of each input sample against each of the n other elements.
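A rough sketch of what such a dtype-dispatching, batched function could look like. This is not an existing torchhd implementation: the function body, the use of torch.bool for binary hypervectors, and the choice of the real part of the normalized Hermitian inner product for complex inputs are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def similarity(input: torch.Tensor, others: torch.Tensor) -> torch.Tensor:
    """Hypothetical unified similarity: (*, d) and (n, d) -> (*, n)."""
    if input.dtype == torch.bool:
        # Assumption: binary hypervectors are stored as bool tensors.
        # Map {False, True} to {-1, +1}; the dot product divided by d is
        # then the fraction of matching positions rescaled to [-1, +1].
        a = input.float() * 2 - 1
        b = others.float() * 2 - 1
        return a @ b.T / input.size(-1)

    if input.is_complex():
        # Assumption: complex cosine similarity taken as the real part
        # of the normalized Hermitian inner product.
        a = input / torch.linalg.vector_norm(input, dim=-1, keepdim=True)
        b = others / torch.linalg.vector_norm(others, dim=-1, keepdim=True)
        return (a @ b.conj().T).real

    # Real floating-point hypervectors: ordinary cosine similarity.
    a = F.normalize(input, dim=-1)
    b = F.normalize(others, dim=-1)
    return a @ b.T
```

With this sketch, an input of shape (2, 5, d) compared against others of shape (7, d) gives an output of shape (2, 5, 7), because the matrix product broadcasts over the leading dimensions.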
To unify the output domain, we can stick to the [-1, +1] range that the cosine similarity and its complex variant produce, where 0 means orthogonal, +1 identical, and -1 the exact opposite. We can simply rescale hamming_similarity to fall in this domain.
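The rescaling could be as simple as the following, assuming hamming_similarity returns the number of matching positions out of d. The helper name rescale_hamming is hypothetical and only illustrates the mapping.

```python
def rescale_hamming(matches: int, d: int) -> float:
    """Map a count of matching positions in [0, d] to a score in [-1, +1]."""
    # matches / d is the fraction of equal elements in [0, 1];
    # stretching by 2 and shifting by -1 gives the [-1, +1] range.
    return 2.0 * matches / d - 1.0
```

Under this mapping, all positions matching gives +1, half matching gives 0 (what two random binary hypervectors produce on average), and no positions matching gives -1.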
dot_similarity will then be removed from the library, but the dot product is still available as part of PyTorch and can therefore still be used in specific instances.
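For instance, a pairwise dot product with the same (*, d) and (n, d) shape convention can be written directly with PyTorch's matrix multiplication. The helper name pairwise_dot is hypothetical, used here only to show the replacement.

```python
import torch


def pairwise_dot(input: torch.Tensor, others: torch.Tensor) -> torch.Tensor:
    """Unnormalized dot product: (*, d) and (n, d) -> (*, n)."""
    # others.T has shape (d, n), so the matrix product contracts over d.
    return input @ others.T


x = torch.randn(10, 10000)
y = torch.randn(3, 10000)
scores = pairwise_dot(x, y)  # shape (10, 3)
```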
API design
import torchhd

x = torchhd.random_hv(10, 10000)  # 10 random hypervectors of dimension 10000
torchhd.functional.similarity(x, x)  # proposed; aliased as torchhd.similarity(x, x)