
Add support for phi-3-vision-128k-instruct #36

Merged
merged 11 commits into Blaizzy:main on Jun 24, 2024

Conversation

JosefAlbers
Contributor

As per #28

@Blaizzy
Owner

Blaizzy commented Jun 5, 2024

Hi Josef,

Thanks, this is a really great contribution! 🚀

Could you add test cases, run pre-commit, and rename phi3vision.py to match the model name, phi3_v.py?
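
For the rename itself, something like the following should do; the directory is an assumption about where the file currently sits in the repo:

git mv mlx_vlm/models/phi3vision.py mlx_vlm/models/phi3_v.py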

@JosefAlbers
Contributor Author

Oh wow, thanks! Will do right away.

@Blaizzy
Owner

Blaizzy commented Jun 6, 2024

Thanks for updating the test cases!

Looking at the model structure, I think we can sanitize the weights to create the following structure:

Model

  • language_model
  • vision_model -> vision_embed_tokens

This way we'll have a single structure for all models:

import mlx.core as mx
import mlx.nn as nn
import numpy as np

# ModelConfig, VisionModel and LanguageModel are the model's own config and submodules.

class Model(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.model_type = config.model_type
        self.config = config

        self.vision_model = VisionModel(config.vision_config)
        self.language_model = LanguageModel(config.text_config)

    def __call__(
        self,
        inputs: mx.array,
        pixel_values=None,
        image_sizes=None,
        cache=None,
    ):
        # Embed the prompt tokens.
        h = self.language_model.embed_tokens(inputs)
        # Positions of the image placeholder tokens (negative ids).
        p = np.argwhere(inputs < 0).tolist()
        if pixel_values is not None:
            # Merge the image features into the text embeddings.
            h = self.vision_model(pixel_values, h, image_sizes, p)
        return self.language_model(h)

This will help with testing and debugging, and it will simplify future contributions.

Example: https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/nanoLlava/nanoLlava.py#L240
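
For illustration, a minimal sketch of what the weight sanitization could look like. The checkpoint key names used here (`model.vision_embed_tokens` for the vision tower, everything else going under `language_model`) are assumptions and would need to be checked against the actual phi-3-vision weights and against how the repo actually hooks sanitization in:

def sanitize(weights):
    # Hypothetical key remapping: move the vision tower under `vision_model`
    # and everything else under `language_model` (prefixes are assumptions).
    sanitized = {}
    for k, v in weights.items():
        if "vision_embed_tokens" in k:
            sanitized[k.replace("model.vision_embed_tokens", "vision_model")] = v
        else:
            sanitized["language_model." + k] = v
    return sanitized

With keys remapped along these lines, the Model above can keep a single vision_model / language_model split regardless of how the original checkpoint names its modules.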

@JosefAlbers
Contributor Author

Right, I will look into it.

@Blaizzy
Owner

Blaizzy commented Jun 8, 2024

Hey @JosefAlbers

Any updates?

@JosefAlbers
Contributor Author

@Blaizzy sorry, not yet.

@Blaizzy
Owner

Blaizzy commented Jun 20, 2024

Hey @JosefAlbers

Any updates?

@JosefAlbers
Contributor Author

JosefAlbers commented Jun 23, 2024

Hi @Blaizzy, thanks again for the feedback and for patiently guiding me through this process. I apologize for the delay in responding.

After further reflection on the proposed model restructuring, however, I believe it might be more practical to defer this change to a future iteration. While I see the value in standardizing the model structure across the repository, the current structure aligns better with the original Phi-3-Vision model implementation, and maintaining this alignment could be beneficial in the short term.

My primary reason is to prioritize making this model available to users as quickly as possible. There are already many newer and possibly better VLMs than Phi-3-Vision at this point (e.g., Kosmos, Florence, ...), and introducing a structural change now could delay the merge, as it would require additional testing to ensure everything still functions as expected. I'd propose the following approach:

  1. Merge the model in its current state.
  2. Add a note to the issue tracker or project roadmap highlighting the potential for future refactoring to align with other models in the repository.

This would allow us to revisit the structure at a later time when we have more bandwidth for potential adjustments and testing. I'm open to discussing this further and collaborating on the best path forward. Please let me know your thoughts on this approach.

Lastly, I want to thank you again for the great code you've been writing in this repository. Your work has been an invaluable learning resource for me.

@Blaizzy
Owner

Blaizzy commented Jun 23, 2024

Most welcome!

Alright, I agree with you: we can merge it now and then update it.

Before that, could you rebase your branch and test that it still works as expected?

@JosefAlbers
Contributor Author

JosefAlbers commented Jun 23, 2024

@Blaizzy, yep, I got a new branch rebased up to date, copy-pasted the files from the original into the new rebased branch, and ran pre-commit and the tests. 🤞🏾

@Blaizzy
Owner

Blaizzy commented Jun 23, 2024

Please rebase/sync this branch with main to fix the conflicts, then run the tests and pre-commit:

pytest .

pre-commit run --all-files
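
For reference, a possible sequence, assuming the remote pointing at Blaizzy/mlx-vlm is named `upstream` (adjust to your setup):

git fetch upstream
git rebase upstream/main
pytest .
pre-commit run --all-files
git push --force-with-lease

The --force-with-lease push is needed because rebasing rewrites the branch history.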

@JosefAlbers
Contributor Author

Sorry for the silly error; I realize I might not have pushed after running pre-commit on my local machine. It's fixed now.

@Blaizzy
Owner

Blaizzy left a comment

LGTM!

Just a tiny nit.

mlx_vlm/models/phi3_v/vision.py (review comment, outdated and resolved)
jjb21 approved these changes Jun 24, 2024
Blaizzy changed the title from "Support for phi-3-vision-128k-instruct" to "Add support for phi-3-vision-128k-instruct" on Jun 24, 2024
@Blaizzy
Owner

Blaizzy commented Jun 24, 2024

Thank you very much @JosefAlbers for your hard work! 🎉👏🏾

We got some more models to port here: #39

Blaizzy merged commit 8ca2a55 into Blaizzy:main on Jun 24, 2024
1 check passed
@Blaizzy
Owner

Blaizzy commented Jun 24, 2024

@JosefAlbers what's your twitter handle?

@JosefAlbers
Contributor Author

@Blaizzy You're very welcome, I'm just happy to help this awesome project grow. And I can't wait to see these new model ports in action!

@Blaizzy
Owner

Blaizzy commented Jun 25, 2024

Thank you very much!

You are an absolute legend 👏🏾

Please feel free to share more suggestions or pick any of the good first issues.

I'm currently working on the trainer 🚀
