# Phi-3.5-vision-instruct Fine-tuning Best Practices (LaTeX OCR Fine-tuning) #1809

HuggingFace model: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Fine-tuning dataset: https://huggingface.co/datasets/linxy/LaTeX_OCR

Fine-tuning a multimodal large model usually means training it on a custom dataset. This guide walks through a runnable demo on the LaTeX OCR dataset linked above.

Before you start fine-tuning, set up the environment by installing ms-swift from source:

```bash
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# Quote the extras so shells such as zsh do not treat the brackets as a glob pattern
pip install -e '.[llm]'
```
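After installing, a quick check can confirm that the tools used in the rest of this guide are on `PATH`: `git` and `pip` for setup, the `swift` CLI installed by ms-swift, and `nvidia-smi` as a rough proxy for a working NVIDIA driver (the commands below set `CUDA_VISIBLE_DEVICES`). This is a minimal sketch; the function name `check_prereqs` is hypothetical, and it only checks that each command resolves, not its version.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: report whether each named command is on PATH.
# Returns 0 if every command is found, 1 if any is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Tools used by the setup and inference steps in this guide.
check_prereqs git pip nvidia-smi swift || echo "Install the missing tools before continuing." >&2
```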


### Inference

```bash
# ModelScope
CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type phi3_5-vision-instruct \
  --use_flash_attn false

# HuggingFace
USE_HF=1 CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type phi3_5-vision-instruct \
  --model_id_or_path microsoft/Phi-3.5-vision-instruct \
  --use_flash_attn false
```
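The ModelScope and HuggingFace commands differ only in the `USE_HF=1` switch and the explicit `--model_id_or_path`. A minimal sketch of a wrapper that assembles either command line from the flags shown above; the function name `phi35_infer` and the `DRY_RUN` variable are hypothetical and not part of ms-swift.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two `swift infer` invocations above.
# Usage: phi35_infer [modelscope|hf]; set DRY_RUN=1 to print the command instead of running it.
phi35_infer() {
  local hub="${1:-modelscope}"
  local -a envs=(CUDA_VISIBLE_DEVICES=0)
  local -a cmd=(swift infer --model_type phi3_5-vision-instruct --use_flash_attn false)
  case "$hub" in
    modelscope) ;;  # same as the "# ModelScope" command above
    hf)
      # Same as the "# HuggingFace" command above.
      envs+=(USE_HF=1)
      cmd+=(--model_id_or_path microsoft/Phi-3.5-vision-instruct)
      ;;
    *)
      echo "unknown hub: $hub (expected modelscope or hf)" >&2
      return 2
      ;;
  esac
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${envs[*]} ${cmd[*]}"
  else
    env "${envs[@]}" "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 phi35_infer hf` prints the HuggingFace command line, and `phi35_infer hf` runs it. Building the command as a bash array keeps each flag a separate argument, so no `eval` is needed.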

**Results**
```
<<< who are you
I am Phi, an AI developed by Microsoft to assist with providing information, answering questions, and helping users find solutions to their queries. How can I assist you today?
--------------------------------------------------
<<< <image>please describe the image.
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image features a close-up of a kitten with striking blue eyes and a white and grey striped coat. The kitten's fur is soft and fluffy, and it appears to be looking directly at the camera with a curious and innocent expression. The background is blurred, which puts the focus entirely on the kitten's face.
--------------------------------------------------
<<<  <image>What is the result of the calculation?
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The result of the calculation 1452 + 45304 is 46756.
```
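The model's answer to the `math.png` prompt can be checked independently: 1452 + 45304 = 46756, so the reply is correct. A minimal sketch of a hypothetical helper, `check_sum_answer`, that extracts the `A + B is C` pattern from a reply and verifies it with bash integer arithmetic; it assumes the reply uses exactly that phrasing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify an "A + B is C" answer from the model.
# Returns 0 if correct, 1 if the sum is wrong, 2 if the pattern is not found.
check_sum_answer() {
  local answer="$1"
  local re='([0-9]+) \+ ([0-9]+) is ([0-9]+)'
  if [[ "$answer" =~ $re ]]; then
    local a="${BASH_REMATCH[1]}" b="${BASH_REMATCH[2]}" c="${BASH_REMATCH[3]}"
    # 10# forces base 10 so numbers with leading zeros are not read as octal.
    if (( 10#$a + 10#$b == 10#$c )); then
      echo "correct: $a + $b = $c"
    else
      echo "incorrect: $a + $b = $((10#$a + 10#$b)), model said $c"
      return 1
    fi
  else
    echo "no 'A + B is C' pattern found" >&2
    return 2
  fi
}

# prints: correct: 1452 + 45304 = 46756
check_sum_answer "The result of the calculation 1452 + 45304 is 46756."
```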

**GPU Memory**

<img width="585" alt="Screenshot 2024-08-23 18 09 29" src="https://github.com/user-attachments/assets/faee9bdd-1a39-4ca8-9258-5e43948b8cd4">
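To record a comparable memory figure on your own hardware, one option is to sample `nvidia-smi` while `swift infer` runs and keep the peak per GPU. A minimal sketch; the function name `peak_gpu_mem` is hypothetical, and it expects `index, memory.used` lines as produced by `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` (memory in MiB).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<gpu index>, <memory.used in MiB>" lines on stdin
# and print the peak memory seen for each GPU once input ends.
peak_gpu_mem() {
  awk -F', *' '
    NF >= 2 { if (!($1 in peak) || $2 + 0 > peak[$1]) peak[$1] = $2 + 0 }
    END { for (g in peak) printf "GPU %s peak: %d MiB\n", g, peak[g] }
  '
}
```

For example, in a second terminal: `timeout 300 nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits -l 1 | peak_gpu_mem`. Using `timeout` instead of Ctrl-C lets `awk` reach end of input and print its summary; Ctrl-C would interrupt the whole pipeline, including `awk`.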

