- Zhejiang University
- Hangzhou
LLM
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Awesome-LLM: a curated list of Large Language Models
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing…
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! It also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
Recent LLM-based CV and related works. Welcome to comment/contribute!
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Code for ALBEF: a new vision-language pre-training method
LAVIS - A One-stop Library for Language-Vision Intelligence
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch
Code and documentation to train Stanford's Alpaca models, and generate the data.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
ChatGLM2-6B: An Open Bilingual Chat LLM | Open-source bilingual dialogue language model
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
QLoRA: Efficient Finetuning of Quantized LLMs
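Several entries in this list (LLaMA-Factory, the PEFT-based ChatGLM tuning repo, QLoRA) center on parameter-efficient fine-tuning of quantized LLMs. As a rough illustration of the QLoRA idea, the sketch below loads a causal LM in 4-bit NF4 precision and attaches LoRA adapters using Hugging Face transformers, bitsandbytes, and peft. The base model name, LoRA rank, and target modules are placeholder assumptions for illustration, not settings taken from any of the repositories above.

```python
# Minimal QLoRA-style sketch: 4-bit quantized base model + LoRA adapters.
# Assumes `transformers`, `peft`, and `bitsandbytes` are installed; the model
# name and hyperparameters below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit NF4 quantization with double quantization, as described in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and prepare the model for k-bit training.
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters to the attention projections (illustrative choice).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here the adapted model can be trained with a standard supervised fine-tuning loop; only the small LoRA matrices receive gradients, which is what lets a quantized 7B-scale model be fine-tuned on a single consumer GPU.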