Description
Hello,
I am trying to use this library to train a model and test the results. I have started by trying to get the code to work without errors and then add my data. So far, I have run into the basic errors that torchrun_args and training_args are not valid run_training inputs. I search the repo here and there were no matches for these either. Should i try an older version?
Thanks for your assistance as I am very interested in using this library.
ERROR:
TypeError Traceback (most recent call last)
Cell In[7], line 46
43 os.makedirs(training_args.data_output_dir, exist_ok=True)
45 # Run the training
---> 46 run_training(
47 torchrun_args=TorchrunArgs(
48 nnodes=1,
49 nproc_per_node=1,
50 node_rank=0, # Node rank
51 rdzv_id=0, # Changed rdzv_id to an integer
52 rdzv_endpoint="localhost:29500", # Endpoint
53 ),
54 training_args=training_args
55 )
57 print("Training completed successfully.")
TypeError: run_training() got an unexpected keyword argument 'torchrun_args'
PYTHON CODE ON KAGGLE
#!pip install instructlab-training
import json
import os
from instructlab.training import run_training, TrainingArgs, TorchrunArgs
Step 1: Create a small hardcoded synthetic dataset in JSONL format
def create_synthetic_data(output_file="dataset.jsonl"):
examples = [
{"instruction": "Translate 'Hello' to Spanish.", "response": "Hola"},
{"instruction": "What is the capital of France?", "response": "Paris"},
{"instruction": "Solve 5 + 3.", "response": "8"},
{"instruction": "Provide a synonym for 'happy'.", "response": "Joyful"},
{"instruction": "List three primary colors.", "response": "Red, Blue, Yellow"}
]
with open(output_file, 'w') as f:
for example in examples:
f.write(json.dumps(example) + '\n')
print(f"Synthetic dataset created at {output_file}")
Generate dataset
create_synthetic_data()
Step 2: Define training arguments with all required fields
training_args = TrainingArgs(
model_path="ibm-granite/granite-3.0-1b-a400m-instruct",
data_path="dataset.jsonl",
ckpt_output_dir="data/saved_checkpoints",
data_output_dir="data/outputs",
max_seq_len=512,
max_batch_len=64, # Added max_batch_len
num_epochs=1,
effective_batch_size=8,
save_samples=1000, # Added save_samples
learning_rate=2e-6,
warmup_steps=100, # Added warmup_steps
is_padding_free=True, # Added is_padding_free
random_seed=42,
)
Ensure output directories exist
os.makedirs(training_args.ckpt_output_dir, exist_ok=True)
os.makedirs(training_args.data_output_dir, exist_ok=True)
Run the training
run_training(
torchrun_args=TorchrunArgs(
nnodes=1,
nproc_per_node=1,
node_rank=0, # Node rank
rdzv_id=0, # Changed rdzv_id to an integer
rdzv_endpoint="localhost:29500", # Endpoint
),
training_args=training_args
)
print("Training completed successfully.")