zouhaoa

ZouHao zouhaoa

3D Computer Vision, 3D Object Tracking, Semantic Scene Completion

23 followers · 218 following

Zhejiang University
Hangzhou

Lists (6)

BEV Perception

22 repositories

diffusion

35 repositories

🔮 Future ideas

LLM

99 repositories

Multimodality

radar

Radar perception

17 repositories

Starred repositories

deepseek-ai / DeepSeek-V3

Python 17,924 1,409 Updated Jan 7, 2025

pkunlp-icler / FastV

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 329 12 Updated Jan 4, 2025

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 757 66 Updated Dec 30, 2024

apple / ml-aim

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python 1,143 54 Updated Nov 22, 2024

rhymes-ai / Aria

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 959 80 Updated Dec 18, 2024

cybertronai / gradient-checkpointing

Make huge neural nets fit in memory

Python 2,743 272 Updated Apr 26, 2020

AFeng-x / PixWizard

Python 124 Updated Oct 9, 2024

daixiangzi / Awesome-Token-Compress

A paper list of recent works on token compression for ViT and VLM

266 14 Updated Jan 8, 2025

dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,235 282 Updated May 4, 2024

bklieger-groq / g1

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,135 376 Updated Dec 6, 2024

Coobiw / MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conversations}. Don't let the poverty limit your imagination! Tr…

Jupyter Notebook 397 21 Updated Dec 9, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 19,294 1,361 Updated Jan 9, 2025

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python 4,132 251 Updated Jan 8, 2025

rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 37,613 4,841 Updated Jan 8, 2025

facebookresearch / sapiens

High-resolution models for human tasks.

Python 4,733 269 Updated Nov 18, 2024

showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,126 48 Updated Dec 26, 2024

facebookresearch / chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,899 112 Updated Jul 29, 2024

thunlp / LLaVA-UHD

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Python 348 15 Updated Jan 6, 2025

dvlab-research / ControlNeXt

Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,467 72 Updated Sep 25, 2024

nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,310 725 Updated Aug 5, 2024

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,542 1,317 Updated Dec 25, 2024

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 14,002 1,140 Updated May 23, 2024

OpenGVLab / Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Python 232 14 Updated Aug 6, 2024

zhangxulu1996 / awesome-personalization

11 1 Updated May 10, 2024

xichenpan / Kosmos-G

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python 57 3 Updated May 25, 2024

csyxwei / ELITE

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)

Python 519 30 Updated Jan 8, 2024

315386775 / DeepLearing-Interview-Awesome-2024

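AIGC-interview/CV-interview/LLMs-interview: a repository collecting interview questions and answers, also covering new ideas, new problems, new resources, and new projects from work and research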

1,901 185 Updated Dec 24, 2024

tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

Jupyter Notebook 5,528 350 Updated Jun 28, 2024

instantX-research / InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,288 827 Updated Jul 18, 2024

Yangyi-Chen / SOLO

[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 127 4 Updated Nov 14, 2024

Starred topics

Awesome Lists

3d-object-detection