Skip to content

Failing to run library on Kaggle #317

Closed as not planned
Closed as not planned
@kittycattoys

Description

@kittycattoys

Hello,

I am trying to use this library to train a model and test the results. I have started by trying to get the code to work without errors and then add my data. So far, I have run into the basic errors that torchrun_args and training_args are not valid run_training inputs. I search the repo here and there were no matches for these either. Should i try an older version?

Thanks for your assistance as I am very interested in using this library.

ERROR:


TypeError Traceback (most recent call last)
Cell In[7], line 46
43 os.makedirs(training_args.data_output_dir, exist_ok=True)
45 # Run the training
---> 46 run_training(
47 torchrun_args=TorchrunArgs(
48 nnodes=1,
49 nproc_per_node=1,
50 node_rank=0, # Node rank
51 rdzv_id=0, # Changed rdzv_id to an integer
52 rdzv_endpoint="localhost:29500", # Endpoint
53 ),
54 training_args=training_args
55 )
57 print("Training completed successfully.")

TypeError: run_training() got an unexpected keyword argument 'torchrun_args'

PYTHON CODE ON KAGGLE

#!pip install instructlab-training
import json
import os
from instructlab.training import run_training, TrainingArgs, TorchrunArgs

Step 1: Create a small hardcoded synthetic dataset in JSONL format

def create_synthetic_data(output_file="dataset.jsonl"):
examples = [
{"instruction": "Translate 'Hello' to Spanish.", "response": "Hola"},
{"instruction": "What is the capital of France?", "response": "Paris"},
{"instruction": "Solve 5 + 3.", "response": "8"},
{"instruction": "Provide a synonym for 'happy'.", "response": "Joyful"},
{"instruction": "List three primary colors.", "response": "Red, Blue, Yellow"}
]

with open(output_file, 'w') as f:
    for example in examples:
        f.write(json.dumps(example) + '\n')
print(f"Synthetic dataset created at {output_file}")

Generate dataset

create_synthetic_data()

Step 2: Define training arguments with all required fields

training_args = TrainingArgs(
model_path="ibm-granite/granite-3.0-1b-a400m-instruct",
data_path="dataset.jsonl",
ckpt_output_dir="data/saved_checkpoints",
data_output_dir="data/outputs",
max_seq_len=512,
max_batch_len=64, # Added max_batch_len
num_epochs=1,
effective_batch_size=8,
save_samples=1000, # Added save_samples
learning_rate=2e-6,
warmup_steps=100, # Added warmup_steps
is_padding_free=True, # Added is_padding_free
random_seed=42,
)

Ensure output directories exist

os.makedirs(training_args.ckpt_output_dir, exist_ok=True)
os.makedirs(training_args.data_output_dir, exist_ok=True)

Run the training

run_training(
torchrun_args=TorchrunArgs(
nnodes=1,
nproc_per_node=1,
node_rank=0, # Node rank
rdzv_id=0, # Changed rdzv_id to an integer
rdzv_endpoint="localhost:29500", # Endpoint
),
training_args=training_args
)

print("Training completed successfully.")

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions