Failing to run library on Kaggle

Hello,

I am trying to use this library to train a model and test the results. I started by trying to get the code to run without errors before adding my own data. So far, I have run into the basic error that torchrun_args and training_args are not valid run_training inputs. I searched the repo and found no matches for either name. Should I try an older version?

Thanks for your assistance as I am very interested in using this library. 

ERROR:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 46
     43 os.makedirs(training_args.data_output_dir, exist_ok=True)
     45 # Run the training
---> 46 run_training(
     47     torchrun_args=TorchrunArgs(
     48         nnodes=1,
     49         nproc_per_node=1,
     50         node_rank=0,                  # Node rank
     51         rdzv_id=0,                    # Changed rdzv_id to an integer
     52         rdzv_endpoint="localhost:29500",  # Endpoint
     53     ),
     54     training_args=training_args
     55 )
     57 print("Training completed successfully.")

TypeError: run_training() got an unexpected keyword argument 'torchrun_args'

PYTHON CODE ON KAGGLE 

#!pip install instructlab-training
import json
import os
from instructlab.training import run_training, TrainingArgs, TorchrunArgs

# Step 1: Create a small hardcoded synthetic dataset in JSONL format
def create_synthetic_data(output_file="dataset.jsonl"):
    examples = [
        {"instruction": "Translate 'Hello' to Spanish.", "response": "Hola"},
        {"instruction": "What is the capital of France?", "response": "Paris"},
        {"instruction": "Solve 5 + 3.", "response": "8"},
        {"instruction": "Provide a synonym for 'happy'.", "response": "Joyful"},
        {"instruction": "List three primary colors.", "response": "Red, Blue, Yellow"}
    ]
    
    with open(output_file, 'w') as f:
        for example in examples:
            f.write(json.dumps(example) + '\n')
    print(f"Synthetic dataset created at {output_file}")

# Generate dataset
create_synthetic_data()

# Step 2: Define training arguments with all required fields
training_args = TrainingArgs(
    model_path="ibm-granite/granite-3.0-1b-a400m-instruct",
    data_path="dataset.jsonl",
    ckpt_output_dir="data/saved_checkpoints",
    data_output_dir="data/outputs",
    max_seq_len=512,
    max_batch_len=64,  # Added max_batch_len
    num_epochs=1,
    effective_batch_size=8,
    save_samples=1000,  # Added save_samples
    learning_rate=2e-6,
    warmup_steps=100,    # Added warmup_steps
    is_padding_free=True, # Added is_padding_free
    random_seed=42,
)

# Ensure output directories exist
os.makedirs(training_args.ckpt_output_dir, exist_ok=True)
os.makedirs(training_args.data_output_dir, exist_ok=True)

# Run the training
run_training(
    torchrun_args=TorchrunArgs(
        nnodes=1,
        nproc_per_node=1,
        node_rank=0,                  # Node rank
        rdzv_id=0,                    # Changed rdzv_id to an integer
        rdzv_endpoint="localhost:29500",  # Endpoint
    ),
    training_args=training_args
)

print("Training completed successfully.")
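One way to check which keyword names a function accepts, instead of searching the repo, is to inspect its signature at runtime. The snippet below is only a minimal sketch of that check: `example_run` and its parameter names are hypothetical stand-ins, not the library's real signature, and `accepted_keywords` and `unexpected_keywords` are helper names made up for this illustration.

```python
import inspect


def accepted_keywords(func):
    """Return the parameter names that `func` accepts as keyword arguments."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters
    return [name for name, p in params.items() if p.kind in keyword_kinds]


def unexpected_keywords(func, **kwargs):
    """Return the names in `kwargs` that `func` would reject with a TypeError."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means any keyword name is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    allowed = set(accepted_keywords(func))
    return [name for name in kwargs if name not in allowed]


# Hypothetical stand-in so the sketch runs without the library installed.
# Its parameter names are placeholders, not the real run_training signature.
def example_run(first_args, second_args):
    pass


print(accepted_keywords(example_run))
print(unexpected_keywords(example_run, torchrun_args=None, training_args=None))

# With the library installed, the same check could be pointed at the real function:
#   from instructlab.training import run_training
#   print(accepted_keywords(run_training))
```

Running the last two commented lines in the Kaggle notebook would show the keyword names that the installed version of run_training expects, which can then be compared against the names used in the call above.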


Failing to run library on Kaggle #317
