How to use local dataset #41
Comments
Hi, Could you check your Best,
The version is 0.2.0. Where did you find version 0.4.0?
Another question is about accelerate_configs. I found three YAML files: deepspeed_zero3, fsdp, and multi_gpu. If I want to change the number of GPUs from 4 to 8, which parameter should I change? 'num_processes' in deepspeed_zero3? And what is multi_gpu.yaml used for?
Best,
Since my training environment cannot connect to the internet, I downloaded the model and dataset and saved them to the local disk.
The arguments:
model path: ModelArguments(base_model_revision=None, model_name_or_path='/home/models/huggingface/llama-3-8b--777cfbd-C11', model_revision='main', model_code_revision=None, torch_dtype=None, tokenizer_name_or_path=None, trust_remote_code=False, use_flash_attention_2=True, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
data path: DataArguments(chat_template=None, dataset_mixer='/home/SimPO/data/ultrafeedback_binarized', text_column='text', dataset_splits=['train_prefs', 'test_prefs'], dataset_configs=None, preprocessing_num_workers=12, truncation_side=None, auto_insert_empty_system_msg=True)
But there is an error when running the script:
Traceback (most recent call last):
  File "/home/SimPO/scripts/run_simpo.py", line 319, in <module>
    main()
  File "/home/SimPO/scripts/run_simpo.py", line 162, in main
    raw_datasets = get_datasets(
  File "/usr/local/lib/python3.10/dist-packages/alignment/data.py", line 170, in get_datasets
    raw_datasets = mix_datasets(
  File "/usr/local/lib/python3.10/dist-packages/alignment/data.py", line 215, in mix_datasets
    for ds, frac in dataset_mixer.items():
AttributeError: 'str' object has no attribute 'items'
I think there may be something wrong with how I specify the local data path. How can I fix it?
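The traceback suggests that dataset_mixer is expected to be a mapping from dataset path to sampling fraction rather than a plain string, since the failing line calls .items() on it. The sketch below reproduces that with a small stand-in function; iter_mixer is hypothetical and only mimics the loop shown at line 215 of alignment/data.py, and the fraction 1.0 is an assumed value meaning "use the whole dataset".

```python
from typing import Dict, List, Tuple


def iter_mixer(dataset_mixer: Dict[str, float]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the loop in mix_datasets (not the library code)."""
    pairs = []
    # Same call that fails in the traceback: it requires a dict-like object.
    for ds, frac in dataset_mixer.items():
        pairs.append((ds, frac))
    return pairs


# What the printed DataArguments show: a plain string, which has no .items().
as_string = "/home/SimPO/data/ultrafeedback_binarized"
try:
    iter_mixer(as_string)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Assumed fix: map the local path to the fraction of the dataset to use.
as_mapping = {"/home/SimPO/data/ultrafeedback_binarized": 1.0}
pairs = iter_mixer(as_mapping)
print(pairs)
```

If the value comes from a YAML config, the equivalent would presumably be a nested mapping under dataset_mixer (the path as the key, the fraction as the value) instead of a single string. Whether the installed alignment package can then read a directory saved on local disk is a separate question that depends on its version, so it is worth checking how mix_datasets loads each entry.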