Skip to content

[Feature Request]: Can we use a VLM to do document parser? #5499

Closed
@sinopec

Description

@sinopec

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Unable to accurately parse charts or other documents.

Describe the feature you'd like

Can we use a multimodal large model, such as Qwen2.5-VL, to extract content from images, scanned PDFs, or charts embedded in DOC files? If there is an interface that can be configured, it would be very flexible.

Describe implementation you've considered

No response

Documentation, adoption, use case

Additional information

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions