
Start agent traces #414


Draft · aymeric-roucher wants to merge 50 commits into main from agent-traces

Commits (50)
352008b
Start agent traces
aymeric-roucher Feb 24, 2025
6c231d2
Working local version with o1
aymeric-roucher Feb 25, 2025
69b2651
Update api addr
aymeric-roucher Feb 26, 2025
ad948c2
Increase concurrent requests
aymeric-roucher Feb 26, 2025
a00f0ee
Update sbatch params
aymeric-roucher Feb 26, 2025
143fcfa
Add conda activation
aymeric-roucher Feb 26, 2025
0af9e75
Use local model
aymeric-roucher Feb 26, 2025
6cffffe
128 concurrent
aymeric-roucher Feb 26, 2025
cf13c2b
Log
aymeric-roucher Feb 26, 2025
cffa362
Add conda init
aymeric-roucher Feb 26, 2025
e35800c
Fix slurm script
aymeric-roucher Feb 26, 2025
b47a4be
Add await
aymeric-roucher Feb 26, 2025
0cd0999
Try fixing async func
aymeric-roucher Feb 26, 2025
dd15ad8
Add stop sequences
aymeric-roucher Feb 26, 2025
d2588cd
Add port
aymeric-roucher Feb 27, 2025
b738e58
Make synchronous
aymeric-roucher Feb 28, 2025
f78b865
Small adaptations to script
aymeric-roucher Feb 28, 2025
cb2a2c2
More detailed error logging
aymeric-roucher Feb 28, 2025
9a2d16f
Even more detailed request error logging
aymeric-roucher Feb 28, 2025
2a1ff76
Reduce context length
aymeric-roucher Feb 28, 2025
a97eb27
Add token counting
aymeric-roucher Feb 28, 2025
d8cb19b
Fix message roles and add token counting
aymeric-roucher Feb 28, 2025
e42b1cd
Add dummy completion
aymeric-roucher Feb 28, 2025
83a679f
Test
aymeric-roucher Feb 28, 2025
d87e3f3
Running with gpt-4o
aymeric-roucher Feb 28, 2025
8e70ca4
Update timeouts
aymeric-roucher Feb 28, 2025
2876d52
Adjust
aymeric-roucher Feb 28, 2025
cf52433
Flatten messages
aymeric-roucher Feb 28, 2025
a07cd54
Prompt more around testing the function
aymeric-roucher Feb 28, 2025
ddc1cdd
Improve explanations in prompt
aymeric-roucher Feb 28, 2025
4c2fce6
Also store final outputs
aymeric-roucher Mar 13, 2025
4a20ba4
Try Qwen Coder 32B
aymeric-roucher Apr 2, 2025
6961c36
Remove some dependencies to work on mac
aymeric-roucher Apr 3, 2025
2b1bc05
Merge branch 'main' into agent-traces
aymeric-roucher Apr 3, 2025
38efcfc
Working trace generation with auto verification by running test cases
aymeric-roucher Apr 3, 2025
b7522e3
Add training scripts for agents
aymeric-roucher Apr 3, 2025
2ddf70e
Change job name
aymeric-roucher Apr 3, 2025
49083cc
Swap sft training configs
aymeric-roucher Apr 3, 2025
de2b792
Point to proper config file
aymeric-roucher Apr 3, 2025
5647c26
Add distributed type
aymeric-roucher Apr 3, 2025
8a7951c
Revert to zero3 config
aymeric-roucher Apr 3, 2025
d28d07b
Remove deepspeed config
aymeric-roucher Apr 4, 2025
cae3c7c
Update train slurm
aymeric-roucher Apr 4, 2025
2a08444
Switch to new venv
aymeric-roucher Apr 8, 2025
1eaf1d1
Move script to proper file
aymeric-roucher Apr 8, 2025
2043be9
Change job name
aymeric-roucher Apr 8, 2025
2030e16
Increase epochs
aymeric-roucher Apr 8, 2025
08a449c
Update dataset name
aymeric-roucher Apr 9, 2025
60472f6
Increase epochs
aymeric-roucher Apr 9, 2025
9347590
adding qwen 3b training setup
Apr 15, 2025
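
Taken together, the commit history sketches the pipeline: serve a model locally (the "Update api addr", "Use local model", and slurm commits), query it with bounded concurrency, and keep a trace only when the generated solution passes its test cases ("Working trace generation with auto verification by running test cases"). Below is a minimal sketch of that loop, assuming smolagents' CodeAgent and OpenAIServerModel pointed at a local OpenAI-compatible endpoint; the endpoint address, model id, and the problems structure are illustrative, not taken from the PR.

from smolagents import CodeAgent, OpenAIServerModel

# Assumption: an OpenAI-compatible server (e.g. vLLM) runs locally, as the
# "Update api addr" / "Add port" commits suggest; address and port are illustrative.
model = OpenAIServerModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # per the "Try Qwen Coder 32B" commit
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
)
agent = CodeAgent(tools=[], model=model)

def passes_tests(candidate_code: str, tests: list[str]) -> bool:
    # Define the candidate function, then run each test case;
    # the trace is kept only if every test passes.
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        for test in tests:
            exec(test, namespace)
        return True
    except Exception:
        return False

# Hypothetical problem records; the real dataset schema is not shown in this diff.
problems = [
    {"prompt": "Write a function add(a, b) that returns a + b.",
     "tests": ["assert add(2, 3) == 5"]},
]

traces = []
for problem in problems:
    answer = str(agent.run(problem["prompt"]))
    if passes_tests(answer, problem["tests"]):
        traces.append({"prompt": problem["prompt"], "completion": answer})

Traces verified this way would then feed the SFT recipes below (dataset_name: smolagents/training-traces).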
46 changes: 46 additions & 0 deletions recipes/Qwen2.5-3B-Instruct/sft/config.yaml
@@ -0,0 +1,46 @@
# Model arguments
# You can download the model and manually change the RoPE theta to 300k/500k and max_position_embeddings to 32768
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the RoPE theta, but you also need to update the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 5.0e-05
gradient_accumulation_steps: 2
per_device_eval_batch_size: 4
per_device_train_batch_size: 4 # Change this depending on the context length of the model to keep a 500M global batch size (GBS).

# SFT trainer config
max_steps: -1
num_train_epochs: 3
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: OpenR1-Qwen-7B-SFT
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
packing: true
output_dir: data/OpenR1-Qwen-7B-SFT
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42
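
The per_device_train_batch_size comment above is about holding the token-level global batch size (GBS) constant when max_length changes. A back-of-the-envelope check, assuming a single 8-GPU node (the config does not state the GPU count) and packed sequences of max_length tokens:

# Hedged arithmetic for the GBS comment; the GPU count of 8 is an assumption.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 8
max_length = 8192  # with packing: true, each sample is filled to this length
tokens_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus * max_length
print(tokens_per_step)  # 524288, i.e. roughly 0.5M tokens per optimizer step

Under that assumption, the "500M" in the comment reads more plausibly as ~0.5M tokens per optimizer step.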
46 changes: 46 additions & 0 deletions recipes/Qwen2.5-3B-Instruct/sft/config_agent.yaml
@@ -0,0 +1,46 @@
# Model arguments
# You can download the model and manually change the RoPE theta to 300k/500k and max_position_embeddings to 32768
model_name_or_path: Qwen/Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: smolagents/training-traces
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the RoPE theta, but you also need to update the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 4.0e-05
gradient_accumulation_steps: 1
per_device_eval_batch_size: 4
per_device_train_batch_size: 2 # Change this depending on the context length of the model to keep a 500M global batch size (GBS).

# SFT trainer config
max_steps: -1
num_train_epochs: 2
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: oR1-Qwen-3B-Agentic-e2-lr4e-b2
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
packing: true
output_dir: data/oR1-Qwen-3B-Agentic-e2-lr4e-b2
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42
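
A launch command along these lines matches open-r1's usual SFT entry point; the ZeRO-3 accelerate config path is an assumption, though the "Revert to zero3 config" commit suggests it was used here:

accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py --config recipes/Qwen2.5-3B-Instruct/sft/config_agent.yaml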
46 changes: 46 additions & 0 deletions recipes/SmolLM2-1.7B-Instruct/sft/config_agent.yaml
@@ -0,0 +1,46 @@
# Model arguments
# You can download the model and manually change the RoPE theta to 300k/500k and max_position_embeddings to 32768
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: sdpa

# Data training arguments
dataset_name: smolagents/training-traces
dataset_num_proc: 48

# SFT hyperparameters
max_length: 8192 # You can set this to 32768 if you change the RoPE theta, but you also need to update the config.json file
weight_decay: 0.0001
optim: adamw_torch
lr_scheduler_type: linear
warmup_ratio: 0.1
learning_rate: 5.0e-05
gradient_accumulation_steps: 2
per_device_eval_batch_size: 4
per_device_train_batch_size: 4 # Change this depending on the context length of the model to keep a 500M global batch size (GBS).

# SFT trainer config
max_steps: -1
num_train_epochs: 6
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
hub_model_id: OpenR1-SmolLM2-1.7B-Instruct-Agentic
hub_strategy: every_save
log_level: info
logging_steps: 5
logging_strategy: steps
packing: true
output_dir: data/OpenR1-SmolLM2-1.7B-Instruct-Agentic
overwrite_output_dir: true
push_to_hub: true
report_to:
- wandb
save_strategy: "steps"
save_steps: 500
save_total_limit: 1
seed: 42
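
Since the three recipes differ only in a few fields (model, dataset, epochs, hub id), individual values can also be overridden at launch time rather than duplicated in YAML; open-r1's TRL-based config parser accepts flag-style overrides, for example (override values illustrative):

accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py --config recipes/SmolLM2-1.7B-Instruct/sft/config_agent.yaml \
    --num_train_epochs=3 --per_device_train_batch_size=2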