Using a different `ref_model` from `model` leads to incorrect results #2307
Open
Description
System Info
- Platform: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.4.0
- CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
- Transformers version: 4.46.1
- Accelerate version: 1.1.0
- Accelerate config:
- compute_environment: LOCAL_MACHINE
- distributed_type: FSDP
- mixed_precision: bf16
- use_cpu: False
- debug: True
- num_processes: 4
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- fsdp_config: {'fsdp_activation_checkpointing': True, 'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch': 'BACKWARD_PRE', 'fsdp_cpu_ram_efficient_loading': True, 'fsdp_forward_prefetch': True, 'fsdp_offload_params': False, 'fsdp_sharding_strategy': 'FULL_SHARD', 'fsdp_state_dict_type': 'FULL_STATE_DICT', 'fsdp_sync_module_states': True, 'fsdp_use_orig_params': True}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
- Datasets version: 3.1.0
- HF Hub version: 0.26.2
- TRL version: 0.12.0
- bitsandbytes version: 0.44.1
- DeepSpeed version: not installed
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: not installed
- PEFT version: not installed
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
I noticed that the DPO trainer uses `processing_class` to tokenize the inputs for both `model` and `ref_model`. Is there a way to allow a `ref_model` from a different base model that does not share the same tokenizer config as `model`? For example, when using a Llama-3.1-8b model to align a Llama-3.2-3b model, training with this configuration currently produces a constant loss of 1.0.
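
A minimal sketch of the setup that triggers this, assuming TRL 0.12's `DPOTrainer` API; the exact checkpoints and dataset are illustrative stand-ins, not the actual training script:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy model to be aligned and a *different* base model as reference
# (model names are assumptions for illustration).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
ref_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Only one processing_class can be passed, so the policy model's tokenizer
# is also used to tokenize the inputs fed to ref_model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Any DPO-formatted preference dataset; this one is just an example.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(output_dir="dpo-cross-model", bf16=True),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()  # with this configuration the reported loss stays pinned at 1.0
```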
Expected behavior
The trainer should accept two processing classes, so that `ref_model` and `model` can come from different model classes with different tokenizers.
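
Something along the lines of the sketch below would cover this use case, reusing the objects from the reproduction above; `ref_processing_class` is a hypothetical parameter name used purely to illustrate the request and does not exist in TRL today:

```python
# Hypothetical API sketch: `ref_processing_class` is NOT an existing TRL
# parameter; it only illustrates the requested behavior.
from transformers import AutoTokenizer

policy_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
ref_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(output_dir="dpo-cross-model", bf16=True),
    train_dataset=train_dataset,
    processing_class=policy_tokenizer,      # tokenizes inputs for `model`
    # ref_processing_class=ref_tokenizer,   # proposed: tokenizes inputs for `ref_model`
)
```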