Releases: RWKV/RWKV-infctx-trainer
v2.3.0 - S3FS datapath support
What's Changed
- Fixed a bug in DS3 for A100/H100 nodes
- Added support for S3 datapath
- Changed the lr scheduler to cosine (credit: @SmerkyG); a rough illustration of a cosine schedule follows the changelog link below
- Changes to step calculation (may affect your training scripts/templates)
- Several bug fixes (credit: @SmerkyG)
Full Changelog: v2.2.1...v2.3.0
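The release notes do not include the scheduler change itself; purely as a rough illustration of what a cosine schedule does (placeholder `lr_init`, `lr_final` and `total_steps` values, not the trainer's actual wiring), here is a minimal PyTorch sketch:

```python
# Minimal sketch of a cosine learning-rate schedule (illustrative only,
# not the trainer's actual implementation). lr_init, lr_final and
# total_steps are hypothetical placeholder values.
import torch

lr_init, lr_final, total_steps = 1e-4, 1e-5, 1000

model = torch.nn.Linear(8, 8)  # dummy model standing in for RWKV
optimizer = torch.optim.AdamW(model.parameters(), lr=lr_init)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=lr_final
)

for step in range(total_steps):
    # ... forward / backward would go here ...
    optimizer.step()
    scheduler.step()  # smoothly decays the lr from lr_init towards lr_final

print(optimizer.param_groups[0]["lr"])  # ~= lr_final after total_steps
```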
Example of S3 datapath config
```yaml
data:
  # Skip the datapath setup
  #
  # Ignored if using preload_datapath.py; useful for speeding up the trainer startup,
  # provided you have your datasets all properly preinitialized
  # ---
  skip_datapath_setup: True

  # dataset_path for the prebuilt dataset, using HF `load_from_disk()`
  #
  # Use this if you have built your own dataset and saved it with `save_to_disk()`,
  # with source left as null. Otherwise configure this to a directory in which the
  # dataset will be built and tokenized by the huggingface dataset process.
  #
  # If using a relative path, this should be relative to the trainer script path
  data_path: s3://bucket-name/subpath/

  # Data path storage options, used to support cloud storage
  # via the huggingface dataset API. See:
  # https://huggingface.co/docs/datasets/v2.16.1/en/filesystems#amazon-s3
  #
  # Note: As of Jan 2023, these options have only been tested to work with AWS S3 and Backblaze. YMMV
  # For S3 bucket support you will also need to install s3fs: `python3 -m pip install s3fs`
  #
  # If you want to reduce the risk of accidental key/secret commits, you can use the
  # `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead
  #
  # For the datapath, use the `s3://bucket-name/subpath` format
  # ---
  data_path_storage_options:
    key: <example S3 key>
    secret: <example S3 secret>
    endpoint_url: <example S3 endpoint>
```
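For reference, these storage options map directly onto the Hugging Face `datasets` filesystem API mentioned in the comments above. A minimal sketch of what loading such a prebuilt dataset looks like (bucket name, subpath and credentials are placeholders):

```python
# Minimal sketch of loading a prebuilt dataset from S3 via the Hugging Face
# `datasets` API, the same mechanism the data_path / data_path_storage_options
# keys configure. Bucket name, subpath and credentials are placeholders.
# Requires: python3 -m pip install s3fs
from datasets import load_from_disk

storage_options = {
    "key": "<example S3 key>",        # or rely on AWS_ACCESS_KEY_ID
    "secret": "<example S3 secret>",  # or rely on AWS_SECRET_ACCESS_KEY
    "endpoint_url": "<example S3 endpoint>",
}

dataset = load_from_disk("s3://bucket-name/subpath/", storage_options=storage_options)
print(dataset)
```

As noted in the config comments, omitting `key`/`secret` and setting the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables instead avoids committing credentials.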
v2.2.1 - bug fix for minibatch and mixed dataset + some experimental flags
What's Changed
- Fixed a bug where microbatch size > 1 combined with mixed dataset sizes caused errors
- Additional experimental flags for training tweaks
Full Changelog: v2.2.0...v2.2.1
v2.2.0
What's Changed
- Dataset packing by @PicoCreator in #56
- When combined with microbatches, it can drastically increase tokens/s for a well-tuned setup; this is similar to what was done for the axolotl trainer (a conceptual sketch of packing follows this section's changelog link)
- V5 batching support by @PicoCreator in #46
- Updating to latest state / saaaa / sbbbb / scccc / sdddd code impleme… by @PicoCreator in #47
- fix: loss forward segment count moved to correct device by @m8than in #48
- Fix config example by @m8than in #49
- Conversational data feature by @PicoCreator in #55
Full Changelog: v2.1.0...v2.2.0
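Dataset packing itself is not spelled out above; conceptually (this is an illustrative sketch, not the trainer's actual implementation) it concatenates many short tokenized samples and re-splits them into fixed-length blocks, so each training step is filled with real tokens rather than padding:

```python
# Conceptual sketch of dataset packing (illustrative only, not the trainer's
# actual implementation): concatenate short tokenized samples, then re-split
# them into fixed-length blocks so training steps carry little to no padding.
from typing import List

def pack_samples(samples: List[List[int]], block_size: int) -> List[List[int]]:
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for tokens in samples:
        buffer.extend(tokens)
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks  # leftover tokens in `buffer` are simply dropped in this sketch

# Example: three short "documents" packed into blocks of 8 tokens
print(pack_samples([[1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12]], block_size=8))
# -> [[1, 2, 3, 4, 5, 6, 7, 8]] (the remaining 4 tokens stay in the unfinished buffer)
```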
v2.1.0 - Batching support, minor logging behaviour and default changes
What's Changed
- Added batching support with the new trainer config `microbatch_size`
  - Use this to trade VRAM for a substantial tokens/sec increase (50%++)
  - This brings infctx trainer speed much closer to the main trainer; I should have done this earlier lol
- Changed real_ctx_len to data_ctx_len in wandb logging, to better reflect the above, as it will now be an average over the microbatch (see the short sketch after this section)
- Changed the default behaviour of multi_column, and fixed #35
- Improved documentation for #34
- Microbatch support #25
Full Changelog: v2.0.1...v2.1.0
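As a small illustration of the data_ctx_len logging note above (placeholder numbers, not the trainer's actual logging code), the logged value is simply the average of the real sample lengths inside each microbatch:

```python
# Illustrative sketch (not the trainer's actual logging code): with
# microbatch_size > 1, a data_ctx_len style metric becomes the average of the
# real (unpadded) sample lengths within the microbatch.
sample_lengths = [512, 2048, 1024, 4096]  # hypothetical token counts per sample
microbatch_size = 4

data_ctx_len = sum(sample_lengths[:microbatch_size]) / microbatch_size
print(data_ctx_len)  # 1920.0, the average context length for this step
```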
v2.0.1 - Breaking change, for v5.2 / V5-R4 specification support
WARNING: this version only supports the latest v5.2 (aka v5-R4) models, and breaks all previous models. Moving forward we will only be supporting v5-R4.
What's Changed
- Fix `segment_count` not properly moved onto device by @TearGosling in #37
- Full v5-R4 rewrite by @PicoCreator and @harrisonvanderbyl in #44
  - Dropped the v5beta2 folder
  - `bptt_truncate=true` is enforced until state gradients are supported by the CUDA kernel (a generic sketch of truncated BPTT follows this list)
    - Technically the non-CUDA kernel works, but it is so slow (~100x slower) that you really should not train without GPUs
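The enforced bptt_truncate behaviour corresponds to standard truncated backpropagation through time; here is a generic sketch (toy RNN standing in for RWKV, not the trainer's actual code) of carrying state across segments while cutting gradients at segment boundaries:

```python
# Generic sketch of truncated BPTT (illustrative only, not the trainer's or
# RWKV's actual code): recurrent state is carried across segments, but
# detached so gradients never flow across a segment boundary.
import torch

rnn = torch.nn.RNN(input_size=16, hidden_size=32, batch_first=True)
optimizer = torch.optim.AdamW(rnn.parameters(), lr=1e-4)

segment_len, num_segments = 64, 8
state = None  # recurrent state carried from one segment to the next

for _ in range(num_segments):
    x = torch.randn(1, segment_len, 16)  # one segment of a much longer sequence
    out, state = rnn(x, state)
    loss = out.pow(2).mean()  # placeholder loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    state = state.detach()  # truncate: no gradient flows into earlier segments
```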
New Contributors / Acknowledgements
- @TearGosling made their first contribution in #37
- @harrisonvanderbyl who helped directly to debug the code, and for his RNN-Factory codebase which was used for reference
Full Changelog: v1.1.1...v2.0.1
v1.1.1 - v5r3 model bug fix
Fixed a missing output normalisation used in the v5 model
Full Changelog: v1.1.0...v1.1.1
v1.1.0 - breaking change for v5 (treat it as beta3)
WARNING: this version breaks existing v5 models; use v5-beta2 for previous models. Until an official v5 1.5B or larger model is released, we will not be treating v5 breaking changes as major version changes. The existing v5 code (r3) has not been thoroughly tested, and may be subject to future changes.
What's Changed
- Upgraded v5 to be in sync with v5r3 from Blink's official repo
- Moved the existing v5 code to the v5-beta2 folder (as I know some folks have already started experimenting with v5)
- Various readme / documentation / example changes
- Added data offset and limit params (still to be documented)
- WIP docker container
- Fix for older python/lightning version for multi-gpu sync
Additional changes that were merged in
- limited dataloader num_worker max to 8 by @diannaojiang in #17
- (optional) Added token 0 to the tokenizer. by @m8than in #18
- Dataset Sorting + multi column suffix features by @m8than in #23
New Contributors
- @diannaojiang made their first contribution in #17
- @m8than made their first contribution in #18
Full Changelog: v1.0.2...v1.1.0
v1.0.2 - Fixing world tokenizer vocab size
v1.0.1 - Incremental bug fixes
Various incremental fixes
- Export with BF16 support
- Potential fix for 4 tokens with issues in the world tokenizer
- Updated requirements.txt with the missing `jsonargparse[signatures]` dependency
- Fixed an issue with Python 3.10 when doing GPU sync (you should still use 3.11 though)
The following non-stable features were added:
- WIP: Docker Env Container
- PROTOTYPE: loss_bias support
v1.0.0 official release of infctx trainer
This is a major release with a huge list of features over the original infctx trainer
- HF First dataset configuration (see: https://github.com/RWKV/RWKV-infctx-trainer/tree/main/notebook/dataset-config)
- Deepspeed 3 support
- Support for world tokenizer
- Script included to initialize new models, to train models from scratch
- RWKV v5 support (to finetune upcoming models)
- BPTT support (default), for training arbitrary data context lengths
Thanks to all those who helped test the trainer for bugs and issues, even when it was in a very rough early stage. While there are still some features to be added, and performance and docs that need improving, for the vast majority of use cases you should be able to get started with this new trainer for your finetuning (non-LoRA) needs.
Special thanks to @Blealtan @BlinkDL @Yuzaboto @Bananaman