Conversation

@ngxson ngxson commented Oct 28, 2025

Important

This is still WIP

Supersedes #15474 and #16745

Fix #13694

In this PR, I add one more dimension to llama_batch.pos when using M-RoPE. This is exactly the solution illustrated in this comment: #15474 (comment) — but currently only the image batch has this extra dimension.

The output is more accurate now, using the same test from #13694

[
        {"bbox_2d": [168, 679, 434, 837], "label": "rectangle"},
        {"bbox_2d": [312, 575, 480, 764], "label": "rectangle"},
        {"bbox_2d": [599, 708, 672, 775], "label": "rectangle"}
]

TODO: add the same dim to text batch

@ngxson ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Oct 28, 2025

ngxson commented Oct 28, 2025

On second thought, I suspect that implementing the 2-dim positions for the text batch as described in this comment would break quite a lot of things, especially in llama-server.

Instead, I'm thinking about an alternative approach: patch llama_kv_cache::set_input_kq_mask to produce the correct KQ mask for M-RoPE.


ggerganov commented Oct 28, 2025

TODO: add the same dim to text batch

Is there a use case for 2-dim pos for the text batch? Or is your idea just for consistency with the M-RoPE case? (nvm, ignore this comment)


ngxson commented Oct 28, 2025

Superseded by #16825

@ngxson ngxson closed this Oct 28, 2025

Development

Successfully merging this pull request may close these issues.

Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates
