Show Lab

71 repositories

Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
200•3.4k•0•0•Updated Nov 18, 2024
ShowUI
Public
0•7•0•0•Updated Nov 17, 2024
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python•Other•62•1.1k•8•7•Updated Nov 15, 2024
computer_use_ootb
Public
An out-of-the-box (OOTB) version of Anthropic Claude Computer Use for Windows and macOS
Python•MIT License•27•263•9•2•Updated Nov 14, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python•17•251•7•0•Updated Nov 12, 2024
Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python•Apache License 2.0•44•1k•30•0•Updated Nov 11, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python•MIT License•2•63•1•0•Updated Nov 10, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python•1•63•0•0•Updated Nov 7, 2024
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
9•212•0•0•Updated Nov 7, 2024
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
1•33•2•0•Updated Nov 3, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python•2•13•1•0•Updated Oct 30, 2024
Exo2Ego-V
Public
0•6•0•0•Updated Oct 29, 2024
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
10•193•0•0•Updated Oct 27, 2024
watermark-steganalysis
Public
Python•0•2•0•0•Updated Oct 24, 2024
videogui
Public
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript•0•21•0•0•Updated Oct 22, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python•0•40•0•0•Updated Oct 14, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
13•455•1•0•Updated Oct 10, 2024
MovieSeq
Public
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook•1•29•1•0•Updated Oct 1, 2024
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript•0•3•0•0•Updated Sep 22, 2024
RingID
Public
Python•0•18•1•0•Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python•Apache License 2.0•53•847•21•0•Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python•Apache License 2.0•27•227•17•0•Updated Aug 15, 2024
X-Adapter
Public
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python•Apache License 2.0•44•742•17•4•Updated Aug 14, 2024
afformer
Public
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
deep-learning pytorch
Python•2•39•6•0•Updated Jul 26, 2024
cvpr2024-tutorial-video-diffusion-models
Public
HTML•MIT License•0•1•0•0•Updated Jul 16, 2024
DragAnything
Public
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Python•15•431•20•0•Updated Jul 2, 2024
AssistGaze
Public
Python•0•1•0•0•Updated Jun 25, 2024
cosmo
Public
Python•4•72•2•2•Updated May 10, 2024
EgoVLP
Public
[NeurIPS2022] Egocentric Video-Language Pretraining
pretraining video-language egocentric-vision pytorch
Python•20•229•5•0•Updated May 9, 2024
UniVTG
Public
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
video-summarization video-grounding pretraining moment-retrieval highlight-detection video-language
Python•MIT License•29•322•19•0•Updated May 8, 2024