Description
Reproduction
Due to a refactor in transformers (huggingface/transformers#33771), the latest stable releases of transformers (4.52.2) and liger-kernel (0.5.9) currently don't work together in SFTTrainer when use_liger_kernel=True is set in SFTConfig. Command to repro:
python trl/scripts/sft.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/Capybara \
--learning_rate 2.0e-5 \
--num_train_epochs 1 \
--packing \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--gradient_checkpointing \
--eos_token '<|im_end|>' \
--logging_steps 25 \
--eval_strategy steps \
--eval_steps 100 \
--output_dir Qwen2-0.5B-SFT \
--use_liger_kernel
outputs:
Traceback (most recent call last):
  File "/fsx/lewis/git/hf/trl/trl/scripts/sft.py", line 149, in <module>
    main(script_args, training_args, model_args)
  File "/fsx/lewis/git/hf/trl/trl/scripts/sft.py", line 117, in main
    trainer = SFTTrainer(
              ^^^^^^^^^^^
  File "/fsx/lewis/git/hf/trl/trl/trainer/sft_trainer.py", line 385, in __init__
    super().__init__(
  File "/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/transformers/trainer.py", line 531, in __init__
    from liger_kernel.transformers import _apply_liger_kernel_to_instance
  File "/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/liger_kernel/transformers/__init__.py", line 97, in __getattr__
    module = importlib.import_module("liger_kernel.transformers.monkey_patch")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/admin/home/lewis/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/liger_kernel/transformers/monkey_patch.py", line 16, in <module>
    from liger_kernel.transformers.model.gemma import lce_forward as gemma_lce_forward
  File "/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/liger_kernel/transformers/model/gemma.py", line 11, in <module>
    from transformers.models.gemma.modeling_gemma import _CONFIG_FOR_DOC
ImportError: cannot import name '_CONFIG_FOR_DOC' from 'transformers.models.gemma.modeling_gemma' (/fsx/lewis/git/hf/trl/trl-env/lib/python3.11/site-packages/transformers/models/gemma/modeling_gemma.py)
This has been fixed on liger-kernel@main in linkedin/Liger-Kernel#712, so a current workaround is to install from source via:
pip install git+https://github.com/linkedin/Liger-Kernel.git
When the next version of liger-kernel is published, we should bump our lower bound in the trl dependencies. cc @kashif for viz
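To check whether a given environment is affected before launching a training run, something like the following sketch might help. It repeats the import that transformers' Trainer performs in the traceback above and prints the installed versions. The helper names (installed_version, check_liger_import) are made up for illustration and are not part of trl, transformers, or liger-kernel. The distribution names passed to importlib.metadata are assumed to match the PyPI package names.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_liger_import():
    """Repeat the import that Trainer performs when use_liger_kernel=True.

    Returns (ok, detail). Hypothetical helper, for diagnosis only.
    """
    try:
        module = importlib.import_module("liger_kernel.transformers")
        # Per the traceback, this attribute is resolved lazily via the package's
        # __getattr__, which imports liger_kernel.transformers.monkey_patch.
        getattr(module, "_apply_liger_kernel_to_instance")
    except Exception as exc:  # report any failure, not only ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "import succeeded"


if __name__ == "__main__":
    for name in ("transformers", "liger-kernel", "trl"):
        print(f"{name}: {installed_version(name)}")
    ok, detail = check_liger_import()
    print("liger import ok" if ok else f"liger import failed -> {detail}")
```

With transformers 4.52.2 and liger-kernel 0.5.9 installed, the probe would be expected to report the ImportError shown above. After installing liger-kernel from source it should succeed.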
System Info
- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.11
- TRL version: 0.18.0.dev0+a528b9c
- PyTorch version: 2.6.0
- accelerator(s): NVIDIA H100 80GB HBM3
- Transformers version: 4.52.2
- Accelerate version: 1.7.0.dev0
- Accelerate config: not found
- Datasets version: 3.5.0
- HF Hub version: 0.30.2
- bitsandbytes version: 0.45.5
- DeepSpeed version: 0.16.6
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.9
- LLM-Blender version: 0.0.2
- OpenAI version: 1.75.0
- PEFT version: 0.15.2
- vLLM version: 0.8.4
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete