ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers' #1878

Describe the bug
I am trying to fine-tune the model llava-onevision-qwen2-0_5b-ov with SFT using the following command:

swift sft \
    --model_type llava-onevision-qwen2-0_5b-ov \
    --dataset rlaif-v#1000 \
    --dataset_test_ratio 0.1 \
    --num_train_epochs 5 \
    --output_dir output

I received the error `ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers'`. The error message along with the complete stack trace is posted below.

(swift) m.banerjee@PHYVDGPU02PRMV:/VDIL_COREML/m.banerjee/ms-swift$ CUDA_VISIBLE_DEVICES=0,1,2,3,5 \
swift sft \
    --model_type llava-onevision-qwen2-0_5b-ov \
    --dataset rlaif-v#1000 \
    --dataset_test_ratio 0.1 \
    --num_train_epochs 5 \
    --output_dir output

run sh: `/VDIL_COREML/m.banerjee/anaconda3/envs/swift/bin/python /VDIL_COREML/m.banerjee/ms-swift/swift/cli/sft.py --model_type llava-onevision-qwen2-0_5b-ov --dataset rlaif-v#1000 --dataset_test_ratio 0.1 --num_train_epochs 5 --output_dir output`
[INFO:swift] Successfully registered `/VDIL_COREML/m.banerjee/ms-swift/swift/llm/data/dataset_info.json`
[INFO:swift] No vLLM installed, if you are using vLLM, you will get `ImportError: cannot import name 'get_vllm_engine' from 'swift.llm'`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
[INFO:swift] Start time of running main: 2024-08-31 01:32:33.881287
[INFO:swift] Setting template_type: llava-onevision-qwen
[INFO:swift] Setting args.lazy_tokenize: True
[INFO:swift] Setting args.dataloader_num_workers: 1
[INFO:swift] output_dir: /VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234
[INFO:swift] args: SftArguments(model_type='llava-onevision-qwen2-0_5b-ov', model_id_or_path='AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf', model_revision='master', full_determinism=False, sft_type='lora', freeze_parameters=0.0, additional_trainable_parameters=[], tuner_backend='peft', template_type='llava-onevision-qwen', output_dir='/VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234', add_output_dir_suffix=True, ddp_backend=None, ddp_find_unused_parameters=None, ddp_broadcast_buffers=None, seed=42, resume_from_checkpoint=None, resume_only_model=False, ignore_data_skip=False, dtype='bf16', packing=False, train_backend='transformers', tp=1, pp=1, min_lr=None, sequence_parallel=False, dataset=['rlaif-v#1000'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.1, use_loss_scale=False, loss_scale_config_path='/VDIL_COREML/m.banerjee/ms-swift/swift/llm/agent/default_loss_scale_config.json', system=None, tools_prompt='react_en', max_length=2048, truncation_strategy='delete', check_dataset_strategy='none', streaming=False, streaming_val_size=0, streaming_buffer_size=16384, model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, rescale_image=-1, target_modules='^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*', target_regex=None, modules_to_save=[], lora_rank=8, lora_alpha=32, lora_dropout=0.05, lora_bias_trainable='none', lora_dtype='AUTO', lora_lr_ratio=None, use_rslora=False, use_dora=False, init_lora_weights='true', fourier_n_frequency=2000, fourier_scaling=300.0, rope_scaling=None, boft_block_size=4, boft_block_num=0, boft_n_butterfly_factor=1, boft_dropout=0.0, vera_rank=256, vera_projection_prng_key=0, vera_dropout=0.0, vera_d_initial=0.1, adapter_act='gelu', adapter_length=128, use_galore=False, galore_target_modules=None, galore_rank=128, galore_update_proj_gap=50, galore_scale=1.0, galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, galore_quantization=False, galore_proj_quant=False, galore_proj_bits=4, galore_proj_group_size=256, galore_cos_threshold=0.4, galore_gamma_proj=2, galore_queue_size=5, adalora_target_r=8, adalora_init_r=12, adalora_tinit=0, adalora_tfinal=0, adalora_deltaT=1, adalora_beta1=0.85, adalora_beta2=0.85, adalora_orth_reg_weight=0.5, ia3_feedforward_modules=[], llamapro_num_new_blocks=4, llamapro_num_groups=None, neftune_noise_alpha=None, neftune_backend='transformers', lisa_activated_layers=0, lisa_step_interval=20, reft_layers=None, reft_rank=4, reft_intervention_type='LoreftIntervention', reft_args=None, gradient_checkpointing=True, deepspeed=None, batch_size=1, eval_batch_size=1, auto_find_batch_size=False, num_train_epochs=5, max_steps=-1, optim='adamw_torch', adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, learning_rate=0.0001, weight_decay=0.1, gradient_accumulation_steps=16, max_grad_norm=1, predict_with_generate=False, lr_scheduler_type='cosine', lr_scheduler_kwargs={}, warmup_ratio=0.05, warmup_steps=0, eval_steps=50, save_steps=50, save_only_model=False, save_total_limit=2, logging_steps=5, acc_steps=1, dataloader_num_workers=1, dataloader_pin_memory=True, dataloader_drop_last=False, push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, push_hub_strategy='push_best', test_oom_error=False, disable_tqdm=False, lazy_tokenize=True, 
preprocess_num_proc=1, use_flash_attn=None, ignore_args_error=False, check_model_is_latest=True, logging_dir='/VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234/runs', report_to=['tensorboard'], acc_strategy='token', save_on_each_node=False, evaluation_strategy='steps', save_strategy='steps', save_safetensors=True, gpu_memory_fraction=None, include_num_input_tokens_seen=False, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, device_max_memory=[], max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, fsdp='', fsdp_config=None, sequence_parallel_size=1, model_layer_cls_name=None, metric_warmup_step=0, fsdp_num=1, per_device_train_batch_size=None, per_device_eval_batch_size=None, eval_strategy=None, self_cognition_sample=0, train_dataset_mix_ratio=0.0, train_dataset_mix_ds=['ms-bench'], train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, only_save_model=None, neftune_alpha=None, deepspeed_config_path=None, model_cache_dir=None, lora_dropout_p=None, lora_target_modules=[], lora_target_regex=None, lora_modules_to_save=[], boft_target_modules=[], boft_modules_to_save=[], vera_target_modules=[], vera_modules_to_save=[], ia3_target_modules=[], ia3_modules_to_save=[], custom_train_dataset_path=[], custom_val_dataset_path=[])
[INFO:swift] Global seed set to 42
device_count: 5
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Downloading the model from ModelScope Hub, model_id: AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /VDIL_COREML/m.banerjee/.cache/modelscope/hub/AI-ModelScope/llava-onevision-qwen2-0___5b-ov-hf
Traceback (most recent call last):
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/sft.py", line 215, in llm_sft
    model, tokenizer = get_model_tokenizer(
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/utils/model.py", line 6347, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/utils/model.py", line 5889, in get_model_tokenizer_llava_onevision
    from transformers import LlavaOnevisionForConditionalGeneration
ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers' (/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/__init__.py)
(swift) m.banerjee@PHYVDGPU02PRMV:/VDIL_COREML/m.banerjee/ms-swift$ 
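
The failing import can be reproduced outside of swift. Below is a minimal check run in the same conda environment (my assumption: the installed transformers build simply does not export this class yet, since LlavaOnevision support was only added to transformers recently):

import transformers

print(transformers.__version__)

# Reproduces the import done in swift/llm/utils/model.py; on a transformers build
# without LlavaOnevision support this prints the same ImportError message as above.
try:
    from transformers import LlavaOnevisionForConditionalGeneration  # noqa: F401
    print("LlavaOnevisionForConditionalGeneration is available")
except ImportError as e:
    print("not available:", e)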

Your hardware and system info
CUDA Version: 12.4
System: Ubuntu 22.04.3 LTS
GPU
torch==2.4.0
transformers==4.45.0.dev0
trl==0.9.6
peft==0.12.0
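
Note that transformers==4.45.0.dev0 is a development build, so whether it contains the LlavaOnevision model depends on the commit it was installed from. One way to check the installed tree directly (a sketch, not tied to any specific transformers commit):

import importlib.util

# Returns None if the llava_onevision model package is absent from the installed
# transformers, which would explain the ImportError above.
spec = importlib.util.find_spec("transformers.models.llava_onevision")
print("llava_onevision present:", spec is not None)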
