Tags · instructlab/training

v0.14.1

fix _no_split_modules subscript error for transformers v5 (#683)

Feb 11, 2026
c517712

v0.14.0

Add MLflow support and expose logging configuration in TrainingArgs (#680)

* add support for mlflow

* fix formatting changes

* Add tensorboard_log_dir to TrainingArgs for configurable TensorBoard logging

- Add tensorboard_log_dir field to TrainingArgs in config.py
- Update setup_metric_logger to use tensorboard_log_dir when provided
- Add CLI argument for tensorboard_log_dir
- Wire tensorboard_log_dir through run_training() to subprocess command

This allows users to specify a custom directory for TensorBoard logs,
defaulting to output_dir if not specified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Address PR review feedback

- Replace defensive getattr() with direct attribute access in main_ds.py
  since args are guaranteed to exist from argparse defaults
- Remove unused log_dir parameter from MLflowHandler
- Add debug logging for non-numeric metrics skipped by MLflowHandler

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* removes generic `run_name` and `logger_type` kwargs

* review comments

* something something mlflow active runs

* review comments

* coderabbit

* adds install targets for logging backends

* add targets for loggers

* messaging

* comments

* interim changes

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
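
The two behaviours described in this commit message (falling back to `output_dir` when `tensorboard_log_dir` is unset, and skipping non-numeric metrics with a debug log) can be sketched roughly as follows. This is a minimal illustration, not the library's code: `LoggingArgs`, `resolve_tensorboard_dir`, and `numeric_metrics` are hypothetical names, and treating booleans as non-numeric is an assumption.

```python
import logging
from dataclasses import dataclass
from numbers import Real
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class LoggingArgs:
    """Hypothetical stand-in for the logging-related fields of TrainingArgs."""

    output_dir: str
    tensorboard_log_dir: Optional[str] = None


def resolve_tensorboard_dir(args: LoggingArgs) -> str:
    # Use the custom directory when provided, otherwise default to output_dir.
    return args.tensorboard_log_dir or args.output_dir


def numeric_metrics(metrics: dict[str, Any]) -> dict[str, float]:
    """Keep numeric values only; log skipped entries at debug level."""
    kept: dict[str, float] = {}
    for key, value in metrics.items():
        # Assumption: booleans are treated as non-numeric even though bool
        # is a subclass of int in Python.
        if isinstance(value, Real) and not isinstance(value, bool):
            kept[key] = float(value)
        else:
            logger.debug("Skipping non-numeric metric %s=%r", key, value)
    return kept
```

With these definitions, `resolve_tensorboard_dir(LoggingArgs(output_dir="out"))` falls back to `"out"`, and a metrics dict containing a string value has that entry dropped before it would be sent to a tracking backend.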

Feb 5, 2026
0c47c97

v0.13.0

Exposes API for processing pretraining data (#672)

This commit enables the data processing code to create pre-training style datasets. The training loop is also updated to ingest pretraining-style datasets, where documents are chunked by some `block_size` and the chunks are then treated as independent and fully-unmasked samples.
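
As a rough sketch of the chunking idea described above, assuming a document is already tokenized into a flat list of token IDs: each block of `block_size` tokens becomes its own sample, and the labels are a copy of the inputs so that no position is masked. The function name `chunk_document`, the sample keys, and the choice to keep a shorter final block are assumptions for illustration, not the library's actual API.

```python
def chunk_document(token_ids: list[int], block_size: int) -> list[dict[str, list[int]]]:
    """Split one tokenized document into independent, fully-unmasked samples."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")

    samples = []
    for start in range(0, len(token_ids), block_size):
        block = token_ids[start : start + block_size]
        # Labels mirror the inputs: every token contributes to the loss.
        # Assumption: a trailing block shorter than block_size is kept as-is.
        samples.append({"input_ids": block, "labels": list(block)})
    return samples
```

For example, a five-token document with `block_size=2` yields three samples holding two, two, and one token respectively.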

Jan 8, 2026
574f946

v0.12.1

fix(torchrun): Omit empty arguments and correct nproc_per_node type (#661)

* fix(torchrun): Omit empty arguments and correct nproc_per_node type

The command generation logic is updated to dynamically
build the torchrun command, excluding arguments that
are empty or None. This prevents them from overriding
environment variables, ensuring that torchrun can
correctly inherit its configuration. An exception is
made for integer arguments where 0 is a valid value.

Additionally, the nproc_per_node argument type has been
changed from int to str to support special values
accepted by PyTorch, such as 'auto', 'gpu', and 'cpu'.

Reference: https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py#L77-L88

Signed-off-by: Saad Zaher <szaher@redhat.com>

* only dynamically add torchrun args & change rdzv_id type to str

Signed-off-by: Saad Zaher <szaher@redhat.com>

* fix smoke tests

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Enable both dtypes str, int for nproc_per_node, rdzv_id

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Use python3.11 style for pydantic model

Signed-off-by: Saad Zaher <szaher@redhat.com>

* add all torchrun args and validate them

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Remove non-required dependencies

Signed-off-by: Saad Zaher <szaher@redhat.com>

* update datatypes only

Signed-off-by: Saad Zaher <szaher@redhat.com>

* replace _ with - when passing torchrun args

Signed-off-by: Saad Zaher <szaher@redhat.com>

* make nproc_per_node to only accept gpu or int

Signed-off-by: Saad Zaher <szaher@redhat.com>

* add master_{addr, port} validate args

Signed-off-by: Saad Zaher <szaher@redhat.com>

* check for not set or empty rdzv endpoint

Signed-off-by: Saad Zaher <szaher@redhat.com>

* fix formatting error

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Update src/instructlab/training/config.py

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Update tests/smoke/test_train.py

Signed-off-by: Saad Zaher <szaher@redhat.com>

* Update src/instructlab/training/main_ds.py

Signed-off-by: Saad Zaher <szaher@redhat.com>

* fixes indentation

Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>

* formatting

* add standalone as the fallback when neither master_addr nor rdzv_endpoint are provided

Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>

* clarify rdzv-backend arg

---------

Signed-off-by: Saad Zaher <szaher@redhat.com>
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
Co-authored-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
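
The command-building behaviour this commit describes (omit arguments that are empty or `None`, keep integer `0`, and replace `_` with `-` in flag names) might look something like the sketch below. `build_torchrun_command` is a hypothetical helper written for illustration, and the `--flag=value` form is an assumption about how the flags are emitted; it is not the repository's implementation.

```python
def build_torchrun_command(script: str, torchrun_args: dict[str, object]) -> list[str]:
    """Build a torchrun argv list, skipping unset arguments."""
    command = ["torchrun"]
    for name, value in torchrun_args.items():
        # Omit None and empty strings so torchrun can still inherit the
        # setting from environment variables. The integer 0 is not skipped
        # because it is a valid value (for example, a node rank).
        if value is None or value == "":
            continue
        flag = "--" + name.replace("_", "-")
        command.append(f"{flag}={value}")
    command.append(script)
    return command
```

Calling it with `{"nproc_per_node": "gpu", "node_rank": 0, "rdzv_endpoint": "", "master_addr": None}` produces only the `--nproc-per-node=gpu` and `--node-rank=0` flags ahead of the script path.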

Oct 14, 2025
637afae

v0.12.0

Add kernels>0.9.0 to CUDA requirements (#658)

Signed-off-by: Mustafa Eyceoz <meyceoz@redhat.com>

Sep 17, 2025
536ebfb

v0.11.1

Fix isort errors

Aug 2, 2025
bfd0d73

v0.10.4

Merge pull request #634 from instructlab/mergify/bp/release-v0.10/pr-628

uncap accelerate in `requirements-cuda.txt` (backport #628)

Jul 3, 2025
0cc2e30

v0.10.3

Merge pull request #546 from instructlab/mergify/bp/release-v0.10/pr-455

moves deepspeed requirements into their own file; add deepspeed extras (backport #455)

May 8, 2025
40e1e8c

v0.11

Merge pull request #528 from fynnsu/pylint-unused-argument

Enable pylint 'unused-argument' check

May 6, 2025
e8eb284

v0.10.2

Merge pull request #518 from instructlab/mergify/bp/release-v0.10/pr-517

deps: Remove caps on ROCm dependencies (backport #517)

May 1, 2025
a9a69e9
