Sprinter1999
🏀 Working out

Stars

🎶Multi-modal

44 repositories

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook · 28,365 stars · 3,537 forks · Updated Jul 23, 2024

This repository hosts the code and models for "Separate What You Describe: Language-Queried Audio Source Separation" (Interspeech 2022)

Python · 145 stars · 8 forks · Updated Oct 11, 2023

[ICASSP 2023] FedAudio: A Federated Learning Benchmark for Audio and Speech Tasks

Python · 49 stars · 1 fork · Updated Feb 21, 2024

Code for the paper "Learning Audio-Visual Dereverberation"

Python · 27 stars · 5 forks · Updated Aug 10, 2022

Toolkits for Multimodal Emotion Recognition

Python · 196 stars · 17 forks · Updated May 26, 2024

[CVPR 2023] iQuery: Instruments as Queries for Audio-Visual Sound Separation

Python · 65 stars · Updated Jul 25, 2023

Using Segment-Anything and CLIP to generate pixel-aligned semantic features.

Python · 39 stars · 3 forks · Updated Apr 27, 2023

A toolkit for researchers in multimodal sound separation.

16 stars · Updated Oct 20, 2023

Survey Paper List - Efficient LLM and Foundation Models

242 stars · 18 forks · Updated Sep 22, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook · 5,169 stars · 681 forks · Updated Aug 5, 2024

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python · 5,855 stars · 381 forks · Updated Mar 14, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 22,143 stars · 2,437 forks · Updated Aug 12, 2024

Large World Model: Modeling Text and Video with Million-Length Context

Python · 7,269 stars · 557 forks · Updated Oct 19, 2024

Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]

Python · 97 stars · 4 forks · Updated Aug 22, 2023

Separable Diffusion Model Unlearning

Python · 12 stars · Updated Jan 29, 2025

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency (AAAI24)

Python · 42 stars · 4 forks · Updated Aug 20, 2024

Image captioning using Python and BLIP

Python · 47 stars · 11 forks · Updated Aug 16, 2023

A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source, high-quality data extraction tool.

Python · 30,209 stars · 2,406 forks · Updated Apr 9, 2025

Data annotation toolbox that supports image, audio, and video data.

Python · 1,142 stars · 116 forks · Updated Apr 10, 2025

[ECCV2024] The Official Implementation for "AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection"

Python · 209 stars · 9 forks · Updated Dec 26, 2024

Python · 7 stars · Updated Nov 16, 2023

[ACL 2024] Official resources of "ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models".

Python · 293 stars · 25 forks · Updated Aug 17, 2024

[NeurIPS 2024, spotlight] Scaling Out-of-Distribution Detection for Multiple Modalities

Python · 56 stars · 4 forks · Updated Apr 8, 2025

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Python · 53 stars · 9 forks · Updated Mar 5, 2023

[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Python · 87 stars · 7 forks · Updated Nov 28, 2023

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python · 2,169 stars · 320 forks · Updated Apr 10, 2025

Get your documents ready for gen AI

Python · 26,735 stars · 1,607 forks · Updated Apr 9, 2025

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Python · 263 stars · 26 forks · Updated Apr 2, 2025

A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.

489 stars · 22 forks · Updated Apr 1, 2025