ChatBridge is an approach to learning a unified multimodal model that can interpret, correlate, and reason about various modalities without relying on all combinations of paired data.
ChatBridge is a multimodal language model capable of perceiving real-world multimodal information, as well as following instructions, thinking, and interacting with humans in natural language. Inspired by Flamingo and BLIP-2, we introduce perceiver modules to bridge the modality encoders and the LLM. We choose the open-sourced Vicuna-13B as the LLM, which is built upon LLaMA and reported to achieve 90% of ChatGPT's quality per GPT-4's evaluation. As for the modality-specific encoders, we choose EVA-ViT-G as the vision encoder to encode images and videos, and BEATs as the audio encoder to encode audio.
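To make the bridging idea concrete, below is a minimal PyTorch sketch of a perceiver-style module that compresses variable-length encoder features into a fixed number of tokens projected into the LLM embedding space. The class name, layer counts, and feature dimensions are illustrative assumptions for this sketch, not the exact implementation in this repository.

```python
import torch
import torch.nn as nn

class PerceiverBridge(nn.Module):
    """Illustrative perceiver module: a fixed set of learnable queries
    cross-attends to variable-length encoder features and yields a fixed
    number of tokens projected into the LLM embedding space."""

    def __init__(self, enc_dim, llm_dim, num_queries=32, num_heads=8, num_layers=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, enc_dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        ])
        self.proj = nn.Linear(enc_dim, llm_dim)  # map to the LLM token space

    def forward(self, enc_feats):                # enc_feats: (B, N, enc_dim)
        b = enc_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        for attn in self.layers:
            out, _ = attn(q, enc_feats, enc_feats)
            q = q + out                          # residual update of the queries
        return self.proj(q)                      # (B, num_queries, llm_dim)

# Usage sketch: bridge frozen modality encoders to the frozen LLM.
# Dimensions are placeholders (e.g. EVA-ViT-G features -> Vicuna-13B hidden size).
vision_bridge = PerceiverBridge(enc_dim=1408, llm_dim=5120)
audio_bridge = PerceiverBridge(enc_dim=768, llm_dim=5120)
vision_tokens = vision_bridge(torch.randn(1, 257, 1408))  # image/video features
audio_tokens = audio_bridge(torch.randn(1, 496, 768))     # audio features
# These tokens are concatenated with text token embeddings as input to the LLM.
```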
- Stage 1: Multimodal alignment training. Bridge each modality with language by leveraging large-scale language-paired two-modality data, including image-text, video-text, and audio-text pairs.
- Stage 2: Multimodal instruction tuning. Instruction-finetune ChatBridge on our multimodal instruction dataset MULTIS to align the model with user intent, enabling more effective zero-shot generalization on multimodal tasks (a rough training sketch follows this list).
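As a rough illustration of the two-stage pipeline, the sketch below runs the same optimization loop over both stages and only swaps the training data; the `model(samples=...)` interface, the dataloader names, and the hyperparameters are assumptions for illustration, not the repository's actual training script.

```python
import torch

def train_stage(model, dataloader, epochs, lr=1e-4):
    # Only the perceiver bridges are assumed trainable; encoders and LLM stay frozen.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    for _ in range(epochs):
        for batch in dataloader:
            # The model conditions the LLM on the bridged modality tokens and
            # computes a next-token language-modeling loss on the text target.
            loss = model(samples=batch)["loss"]
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: multimodal alignment on language-paired two-modality data
# (image-text, video-text, audio-text pairs); targets are captions.
# train_stage(model, alignment_loader, epochs=1)

# Stage 2: multimodal instruction tuning on MULTIS; targets are responses to
# instruction-formatted prompts, improving zero-shot generalization.
# train_stage(model, multis_instruction_loader, epochs=1)
```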
More examples can be found on the project page.
Code and data will be released in June!
- BLIP-2: The model architecture of ChatBridge follows BLIP-2. Don't forget to check out this great open-source work if you haven't seen it before!
- Lavis: This repository is built upon Lavis!
- Vicuna: The language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
- MiniGPT-4 and LLaVA: We use their instruction data and draw inspiration from their approach to design a more comprehensive multimodal instruction dataset. They are all open-source!
If you're using ChatBridge in your research or applications, please cite using this BibTeX:
@article{zhao2023chatbridge,
  title={ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst},
  author={Zhao, Zijia and Guo, Longteng and Yue, Tongtian and Chen, Sihan and Shao, Shuai and Zhu, Xinxin and Yuan, Zehuan and Liu, Jing},
  journal={arXiv preprint arXiv:2305.16103},
  year={2023}
}
This repository is under the BSD 3-Clause License. Much of the code is based on Lavis, which is also licensed under the BSD 3-Clause License here.