Tags: dnbaker/bioseq

v0.1.7

Bug fix - alias remapping.

v0.1.6

Pre-made tokenizers, CNN- and transformer-style encoders + training, BLOSUM-based augmentation, and flatten_swiss, which extracts tabular data for NLP pretraining.

v0.1.5

Training for autoregressive language modeling of biological sequences.

v0.1.4

Numpy type-parity, trainable sparse softmax, and a SeqEncoder factory for biological sequence embeddings.

v0.1.3

Add is_padded(), includes_bos(), includes_eos(), alphabet_size(), pad(), eos(), bos() functions to Tokenizers; add prebuilt Tokenizers, and a bioseq.make_embedding utility to handle padding and torch.nn.Embedding creation.

v0.1.2

v0.1.2 - add direct tokenizers which can be used in conjunction with torch.nn.Embedding.

v0.1.1

Expanded alphabet usage