Skip to content

Add methods to form thermometer codes #93

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 20, 2022
Merged

Add methods to form thermometer codes #93

merged 9 commits into from
Nov 20, 2022

Conversation

denkle
Copy link
Collaborator

@denkle denkle commented Nov 11, 2022

No description provided.

Copy link
Member

@mikeheddes mikeheddes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, it's looking good! I just have a couple comments about some details.

device=rand_hv.device,
)

for i in range(num_vectors):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perform a quick benchmark (timing) of your implementation against something like torch.triu_indices? This would let you set all elements in parallel so I suspect it would be quite a bit faster especially as the number of dimensions increases but I haven't tried it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! I agree it is no-brainer that the usage of indexing will make the implementation extremely fast. So the code was modified accordingly

)

for i in range(num_vectors):
if (model == BSC) | (model == FHRR):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering shouldn't the FHRR also use bipolar elements? Since it's a generalization of the bipolar hypervectors. This would also be inline with the model agnostic version I mentioned in another comment using the identity and negative identity.

requires_grad=False,
**kwargs,
) -> VSA_Model:
"""Creates a thermometer code for given dimensionality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another way to look at the thermometer codes is to take the first hypervector as the identity hypervector of the model and the last hypervector as the negative of the identity and then perform the interpolation in between. This could make it such that the function is agnostic to the model used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to save this point somewhere! I think the current implementation with the use of torch.tril_indices is pretty compact but as we expand the number of supported HD/VSA models we might want to switch to this idea of interpolation.

) -> VSA_Model:
"""Creates a thermometer code for given dimensionality.

Implements similarity-preserving hypervectors as described in "Sparse Binary Distributed Encoding of Scalars" <https://doi.org/10.1615/J Automat Inf Scien.v37.i6.20>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the thermometer_hv and Thermometer to the documentation under docs/torchhd.rst and docs/embeddings.rst? Please double check using the sphinx documentation generation command in the README whether the link appears correctly, there might be an error in your syntax for the reference in the function description.

Copy link
Collaborator Author

@denkle denkle Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have not thought about the documentation in the first place. Fixed it now by updating the corresponding .rst files. There was indeed an error in the link formatting. Fixed it as well so it works in the generated .html

input, self.low_value, self.high_value, self.num_embeddings
).clamp(0, self.num_embeddings - 1)

return super(Thermometer, self).forward(indices).as_subclass(MAP)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeheddes a question here. I followed the structure as in Level(). It, however, looks weird that MAP is used at the end when doing .as_subclass(). It seems that we rather need to specify a "model" parameter at the input and use the corresponding model with return.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. It is currently this way to be compatible with the way we implemented it before v4. But we should allow the user to specify the model of the various embedding classes. I will make a new issue for this.

@denkle denkle requested a review from mikeheddes November 14, 2022 11:59
@mikeheddes
Copy link
Member

I'm wondering whether it would be better to keep the same API as for the other hypervector creation functions where you can specify the number of vectors and their dimensions. Because for 10000 dimension this will create a very large matrix while you might only need to use 6 hypervectors or something.

@denkle
Copy link
Collaborator Author

denkle commented Nov 19, 2022

I'm wondering whether it would be better to keep the same API as for the other hypervector creation functions where you can specify the number of vectors and their dimensions. Because for 10000 dimension this will create a very large matrix while you might only need to use 6 hypervectors or something.

Will work now on revising the code to address this.

Copy link
Member

@mikeheddes mikeheddes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denkle thank you for revising the code and congrats with merging your first PR!

I removed the VSA model argument from the embedding class so that all the embeddings have a consistent API. I think it's better if we change all the embedding classes to accept a VSA model argument in a separate PR.

@mikeheddes mikeheddes merged commit 1fd1344 into main Nov 20, 2022
@mikeheddes mikeheddes deleted the Thermometer branch November 20, 2022 00:45
@denkle
Copy link
Collaborator Author

denkle commented Nov 22, 2022

Thanks, @mikeheddes! Agree with the point on separate PR for changing embeddings' API for all classes. Just let me know if you would like to start it or whether I should give the first try. It might at least be worth checking if the code for Thermometer was reasonable.

@mikeheddes
Copy link
Member

@denkle you can start working on that feature if you want. What you had before was in the right direction I think, it's a similar API to the functional ones. The only thing that was still missing is making sure that the dtype of the created tensors matches (or is compatible with) the provided VSA_Model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants