
Phi-3.5-vision-instruct fine-tuning best practices (LaTeX OCR fine-tuning) #1809

@Jintao-Huang


HuggingFace model: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Fine-tuning dataset: https://huggingface.co/datasets/linxy/LaTeX_OCR

Fine-tuning a multimodal large model usually means training it on a custom dataset. Here we walk through a runnable demo.
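Each LaTeX OCR sample pairs a formula image with its LaTeX source, and that pair has to end up in a custom dataset file. The sketch below assumes the JSONL layout with `query`, `response` and `images` fields used by ms-swift 2.x multimodal custom datasets; treat that layout as an assumption and check the ms-swift custom dataset docs for your installed version. `json_escape` and `make_record` are hypothetical helpers, not part of ms-swift. They escape the backslashes and double quotes that LaTeX labels typically contain.

```shell
#!/usr/bin/env bash
# Sketch: emit one custom-dataset JSONL record per sample.
# ASSUMPTION: ms-swift 2.x multimodal layout {"query", "response", "images"}.
# ASSUMPTION: inputs contain no newlines or other control characters.

# Escape backslashes first, then double quotes, so the text is valid inside JSON quotes.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# make_record QUERY RESPONSE IMAGE_PATH  (hypothetical helper)
make_record() {
  printf '{"query": "%s", "response": "%s", "images": ["%s"]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

For example, `make_record '<image>Convert the formula in the image to LaTeX.' '\frac{a}{b}' images/0001.png >> train.jsonl` appends one sample; the prompt text and paths are placeholders.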

Before fine-tuning, set up the environment:

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e '.[llm]'

Inference

# ModelScope
CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type phi3_5-vision-instruct \
  --use_flash_attn false

# HuggingFace
USE_HF=1 CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type phi3_5-vision-instruct \
  --model_id_or_path microsoft/Phi-3.5-vision-instruct \
  --use_flash_attn false
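The two commands above differ only in `USE_HF=1` and `--model_id_or_path`. A hypothetical wrapper, `phi35_infer` (not part of ms-swift), can select the hub with one argument. It reuses only the flags shown above, and setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two `swift infer` invocations shown above.
# Usage: phi35_infer [modelscope|hf]   (DRY_RUN=1 prints instead of running)
phi35_infer() {
  local hub="${1:-modelscope}"
  local -a env_vars=("CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}")
  local -a cmd=(swift infer --model_type phi3_5-vision-instruct)

  case "$hub" in
    modelscope) ;;  # default hub, no extra flags
    hf)
      env_vars+=(USE_HF=1)
      cmd+=(--model_id_or_path microsoft/Phi-3.5-vision-instruct)
      ;;
    *)
      echo "unknown hub: $hub (expected modelscope or hf)" >&2
      return 1
      ;;
  esac
  cmd+=(--use_flash_attn false)

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${env_vars[*]} ${cmd[*]}"
  else
    env "${env_vars[@]}" "${cmd[@]}"
  fi
}
```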

Results

<<< who are you
I am Phi, an AI developed by Microsoft to assist with providing information, answering questions, and helping users find solutions to their queries. How can I assist you today?
--------------------------------------------------
<<< <image>please describe the image.
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image features a close-up of a kitten with striking blue eyes and a white and grey striped coat. The kitten's fur is soft and fluffy, and it appears to be looking directly at the camera with a curious and innocent expression. The background is blurred, which puts the focus entirely on the kitten's face.
--------------------------------------------------
<<<  <image>What is the result of the calculation?
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The result of the calculation 1452 + 45304 is 46756.
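The model's answer to the math prompt can be checked with plain shell arithmetic:

```shell
# Verify the sum the model reported for math.png.
echo $((1452 + 45304))   # prints 46756
```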

GPU Memory:

[Screenshot: GPU memory usage during inference, taken 2024-08-23 18:09:29]
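The GPU memory reading was shared as a screenshot. One way to capture it as text (an addition, not from the original post) is to poll `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` while `swift infer` is running. The hypothetical helper `format_gpu_mem` below reformats that CSV output, whose values are in MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pretty-print per-GPU memory from nvidia-smi CSV output.
# Expects lines of the form "index, used, total" (MiB), e.g. "0, 8192, 16384", as produced by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
format_gpu_mem() {
  awk -F', *' '{ printf "GPU %s: %s / %s MiB (%.1f%%)\n", $1, $2, $3, 100 * $2 / $3 }'
}

# Live usage while `swift infer` runs in another terminal:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | format_gpu_mem
```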
