A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
Updated Mar 19, 2025 - Python
Byte-Pair Encoding tokenizer for training large language models on huge datasets
(1) Train large language models to help people with automatic essay scoring. (2) Extract essay features and train a new tokenizer to build tree models for score prediction.
A modular Byte Pair Encoding (BPE) tokenizer implemented in Python
From-Scratch High Valyrian MiniGPT — Build and train a GPT-like transformer model fully from scratch, including custom tokenizer, dataloader, transformer, training, and inference! Learn the internals of LLMs layer-by-layer.
TF-IDF Calculation
Self-made byte-pair encoding tokenizer
Implementation of Byte-Pair-Encoding (BPE) tokenization