Conversation

nArn0 commented Jan 16, 2026

This PR implements JoyCaption as a specific llava model. All information and the quantized JoyCaption model are available here: https://huggingface.co/n-Arno/joycaption-mlx-mxfp4

I know this PR may not fit perfectly in mlx-vlm, but just in case, I'd rather propose it even if it ends up being rejected.

Thanks for your work on mlx-vlm!

Blaizzy (Owner) commented Jan 17, 2026

Hey @nArn0
This is actually a perfect fit, thanks for the contribution!

A couple of nits:

  1. If the model uses the llava arch, then in this PR you can just add a model remapping key, mapping llava_joycaption to the existing llava (see the sketch after this list): https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/utils.py#L26-L33
  2. The example (examples/test_joycaption.py) could be a notebook with more details; you can check the existing notebooks for reference.
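
For 1., the change would be roughly the following. This is a minimal sketch assuming the table at the linked lines is a plain dict named MODEL_REMAPPING keyed by the checkpoint's model_type; the names are illustrative, so adjust to whatever is actually there:

```python
# mlx_vlm/utils.py (sketch only)
MODEL_REMAPPING = {
    # ... existing entries ...
    "llava_joycaption": "llava",  # route JoyCaption checkpoints to the existing llava implementation
}

# Illustrative only: a typical way such a table is consumed when picking the model module.
def resolve_model_type(model_type: str) -> str:
    return MODEL_REMAPPING.get(model_type, model_type)
```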

nArn0 (Author) commented Jan 18, 2026

Oh! Thanks for your feedback! I'll look into it. JoyCaption looks like a classic llava, but since it uses SigLIP2, I had to take your implementation from mlx-embeddings for the vision part, which didn't fit the classic CLIP that llava uses.

Indeed, a notebook would help for the example.

The only issue I really couldn't figure out how to fix is that, if torchvision is present, torch.nn.functional.interpolate fails with a cryptic error.

Blaizzy (Owner) commented Jan 18, 2026

My pleasure!

In that case, just import all the common components from llava (inherit from them) and add the new ones. For instance, inherit from Model and override vision_tower.

Check mistral3 for reference:

https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/mistral3/mistral3.py#L6
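
Roughly like this. A sketch only, not the mistral3 code itself; the import paths and the SigLip2VisionModel name are illustrative, assuming the SigLIP2 tower from mlx-embeddings gets ported into the new module:

```python
# mlx_vlm/models/llava_joycaption/llava_joycaption.py (sketch only)
from ..llava.llava import Model as LlavaModel, ModelConfig

# Hypothetical SigLIP2 vision tower ported from mlx-embeddings into this package.
from .vision import SigLip2VisionModel


class Model(LlavaModel):
    def __init__(self, config: ModelConfig):
        super().__init__(config)
        # Swap the CLIP-style llava vision tower for the SigLIP2 one.
        self.vision_tower = SigLip2VisionModel(config.vision_config)
```

Building the base model first and then replacing the tower keeps the sketch short; in practice you would probably construct the SigLIP2 tower directly instead of allocating the CLIP one and discarding it.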
