
LEGO: Language-Enhanced Multi-modal Grounding Model


standardgalactic/lego

 
 

Introduction

LEGO is an end-to-end multimodal grounding model that accurately comprehends inputs and has robust grounding capabilities across multiple modalities, including images, audio, and video. To address the issue of limited data, we construct a diverse and high-quality multimodal training dataset. This dataset contains multimodal data enriched with spatial and temporal information, making it a valuable resource for further advances in this field. Extensive experimental evaluations validate the effectiveness of the LEGO model in understanding and grounding tasks across various modalities.

More details are available on our project page.


The overall structure of LEGO. Blue boxes represent video as input, while yellow boxes represent image as input.

Release

We will soon open-source our datasets, code, and models. Stay tuned!

Content

Demo

Acknowledgement
