
COMM

The PyTorch implementation of the paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models.

Overview

COMM is an MLLM that integrates the visual embeddings of CLIP and DINOv2 via multi-level features merging to enhance the visual capabilities of multi-modal large language models.
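Since the official code is not yet released, the following is only an illustrative sketch of the multi-level merging idea described above, not the authors' implementation. The layer-weighting scheme, feature dimensions, and the name `MultiLevelMerge` are all assumptions for the example.

```python
import torch
import torch.nn as nn


class MultiLevelMerge(nn.Module):
    """Illustrative sketch (NOT the official COMM code): merge per-layer
    features from two vision encoders (e.g. CLIP and DINOv2) with learnable
    layer weights, then project the fused tokens to the LLM embedding width.
    """

    def __init__(self, num_layers: int, clip_dim: int, dino_dim: int, llm_dim: int):
        super().__init__()
        # One learnable scalar weight per transformer layer (assumption).
        self.clip_layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.dino_layer_weights = nn.Parameter(torch.zeros(num_layers))
        # Project the concatenated features into the LLM's token space.
        self.proj = nn.Linear(clip_dim + dino_dim, llm_dim)

    def forward(self, clip_feats: torch.Tensor, dino_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (num_layers, batch, tokens, clip_dim)
        # dino_feats: (num_layers, batch, tokens, dino_dim)
        w_c = torch.softmax(self.clip_layer_weights, dim=0)
        w_d = torch.softmax(self.dino_layer_weights, dim=0)
        # Weighted sum over the layer dimension.
        clip_merged = (w_c[:, None, None, None] * clip_feats).sum(dim=0)
        dino_merged = (w_d[:, None, None, None] * dino_feats).sum(dim=0)
        # Concatenate the two encoders' tokens along the channel dimension.
        fused = torch.cat([clip_merged, dino_merged], dim=-1)
        return self.proj(fused)


# Toy example with random stand-in features (dimensions are illustrative).
merger = MultiLevelMerge(num_layers=12, clip_dim=1024, dino_dim=768, llm_dim=4096)
clip_feats = torch.randn(12, 2, 256, 1024)
dino_feats = torch.randn(12, 2, 256, 768)
tokens = merger(clip_feats, dino_feats)
print(tokens.shape)  # torch.Size([2, 256, 4096])
```

The fused tokens would then be fed to the LLM as visual inputs, as in LLaVA-style architectures.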

News

[10/16] We released From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models, which integrates CLIP and DINOv2 with multi-level features merging to enhance the visual capabilities of MLLMs. Check out the paper (the PDF is included in the /images folder).
[10/18] We apologize that the paper and code are under the corporation's legal review, so the code release will be delayed. Thanks for your patience!

Performance

We evaluate COMM on five major categories of multi-modal tasks: Referring Expression Comprehension, Referring Expression Generation, Object Hallucination Benchmark, Visual Question Answering, and Image Captioning. COMM achieves SOTA performance on multiple vision-language tasks, as shown below.

Examples




Citation

Please cite our paper if the code is helpful to your research.

@article{jiang2023from,
    author  = {Jiang, Dongsheng and Liu, Yuchen and Liu, Songlin and Zhang, Xiaopeng and Li, Jin and Xiong, Hongkai and Tian, Qi},
    title   = {From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models},
    journal = {arXiv preprint arXiv:2310.08825},
    year    = {2023}
}

Acknowledgement

  • LLaVA and Shikra: The codebases we built upon, which have amazing multi-modal capabilities!
  • Vicuna: The powerful LLM we used.
  • DINOv2: The vision encoder we used.

Thanks for their wonderful work.
