Release v0.2.0

@laserkelvin laserkelvin released this 23 Jan 06:07
d450d0c

This release makes a major change to the way embeddings are calculated. While the model weights are unchanged, the main user API for embedding molecular strings has been revised: the previous implementation did not really take structure into account, as it simply performed the einsum over raw nn.Embedding lookups.

The embed_molecule and related methods now run the word embeddings through the encoder, then perform the einsum operation over non-padding tokens only. The resulting embeddings should now reflect structural differences between molecules. The old behavior possibly explains why cosine similarities were very close to 1 for many molecules.
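To illustrate the difference, here is a minimal sketch in PyTorch. It is not the package's actual code: `ToyEmbedder`, `embed_raw`, `embed_encoded`, `PAD_ID`, the GRU encoder, and the `"bld,bl->bd"` masked-sum einsum are all assumptions made for illustration, since these notes do not spell out the encoder architecture or the exact einsum expression. Padding is assumed to be at the end of each sequence.

```python
import torch
from torch import nn

PAD_ID = 0  # hypothetical padding token index


class ToyEmbedder(nn.Module):
    """Stand-in model: an embedding table followed by a small GRU encoder."""

    def __init__(self, vocab_size: int = 16, dim: int = 8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim, padding_idx=PAD_ID)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    @staticmethod
    def pool(states: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Sum over the token axis, with padding positions weighted by zero.
        mask = (tokens != PAD_ID).to(states.dtype)
        return torch.einsum("bld,bl->bd", states, mask)

    @torch.no_grad()
    def embed_raw(self, tokens: torch.Tensor) -> torch.Tensor:
        # Old-style behavior (assumed): pool the raw lookups directly.
        return self.pool(self.embedding(tokens), tokens)

    @torch.no_grad()
    def embed_encoded(self, tokens: torch.Tensor) -> torch.Tensor:
        # New-style behavior (assumed): encode first, then pool.
        states, _ = self.encoder(self.embedding(tokens))
        return self.pool(states, tokens)


torch.manual_seed(0)
model = ToyEmbedder().eval()

# Two sequences with the same tokens in a different order, right-padded.
tokens = torch.tensor([[3, 4, 5, PAD_ID], [5, 4, 3, PAD_ID]])
raw = model.embed_raw(tokens)
encoded = model.embed_encoded(tokens)

cosine = nn.functional.cosine_similarity
print("raw lookups:", cosine(raw[0], raw[1], dim=0).item())
print("encoded:    ", cosine(encoded[0], encoded[1], dim=0).item())
```

In this toy setup the raw-lookup embeddings of the two sequences coincide, because summing lookups ignores token order, while the encoded embeddings differ because the encoder sees the order. This only demonstrates one way that pooling raw lookups can hide structure; it does not reproduce the package's numbers.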

Full Changelog: v0.1.4...v0.2.0