[Feature Request]: Can we use a  VLM to do document parser?

### Is there an existing issue for the same feature request?

- [x] I have checked the existing issues.

### Is your feature request related to a problem?

```Markdown
Unable to accurately parse charts or other documents.
```

### Describe the feature you'd like

Can we use a multimodal large model, such as Qwen2.5-VL, to extract content from images, scanned PDFs, or charts embedded in DOC files? If there is an interface that can be configured, it would be very flexible.

### Describe implementation you've considered

_No response_

### Documentation, adoption, use case

```Markdown

```

### Additional information

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request]: Can we use a VLM to do document parser? #5499

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

Describe the feature you'd like

Describe implementation you've considered

Documentation, adoption, use case

Additional information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature Request]: Can we use a VLM to do document parser? #5499

Description

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

Describe the feature you'd like

Describe implementation you've considered

Documentation, adoption, use case

Additional information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions