
Multimodal Prompt Collaboration #98

@lmr2706

Description


Dear authors,

I'm very interested in the text-visual interaction and collaboration described in your paper. In real-world tasks, multiple prompts are often used together, and it is challenging to make these different prompts coordinate effectively to achieve the best results. I've read the paper and tried to find the corresponding implementation of text-visual collaboration in the code, but most of it appears to delegate the processing directly to the T-Rex2 API, which didn't fully answer my questions. I'd like to understand the specific implementation details of how multiple types of prompts (such as text and visual prompts) are coordinated to work together. Could you please provide some insights or guidance on this?
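
For context, below is a rough sketch of the kind of coordination I currently have in mind on my side: running the two prompt types independently and merging their detections with a simple score-based NMS. The helpers `detect_with_text_prompt` and `detect_with_visual_prompt` are purely hypothetical placeholders (not actual T-Rex2 API calls), and the box-level late fusion is only my own assumption, not something taken from the paper.

```python
import numpy as np

def detect_with_text_prompt(image, text):
    """Hypothetical placeholder: detection conditioned on a text prompt.
    Assumed to return (boxes [N, 4] in xyxy format, scores [N])."""
    raise NotImplementedError

def detect_with_visual_prompt(image, exemplar_boxes):
    """Hypothetical placeholder: detection conditioned on visual exemplar boxes."""
    raise NotImplementedError

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        # IoU between the current top-scoring box and the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = order[1:][iou <= iou_thresh]
    return keep

def fuse_text_and_visual(image, text, exemplar_boxes, iou_thresh=0.5):
    """Naive late fusion (my assumption): run both prompt types independently,
    concatenate their detections, and resolve duplicates with NMS."""
    boxes_t, scores_t = detect_with_text_prompt(image, text)
    boxes_v, scores_v = detect_with_visual_prompt(image, exemplar_boxes)
    boxes = np.concatenate([boxes_t, boxes_v], axis=0)
    scores = np.concatenate([scores_t, scores_v], axis=0)
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```

What I'd really like to understand is whether your method instead fuses the prompts at the feature/embedding level inside the model rather than at the box level like this, and if so, where that happens in the code.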

Thank you!
