zouhaoa
  • Zhejiang University
  • Hangzhou

Starred repositories

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python · 329 stars · 12 forks · Updated Jan 4, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python · 757 stars · 66 forks · Updated Dec 30, 2024

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python · 1,143 stars · 54 forks · Updated Nov 22, 2024

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook · 959 stars · 80 forks · Updated Dec 18, 2024

Make huge neural nets fit in memory

Python · 2,743 stars · 272 forks · Updated Apr 26, 2020

Python · 124 stars · Updated Oct 9, 2024

A paper list of recent works on token compression for ViTs and VLMs

266 stars · 14 forks · Updated Jan 8, 2025

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python · 3,235 stars · 282 forks · Updated May 4, 2024

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python · 4,135 stars · 376 forks · Updated Dec 6, 2024

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Tr…

Jupyter Notebook · 397 stars · 21 forks · Updated Dec 9, 2024

Official inference repo for FLUX.1 models

Python · 19,294 stars · 1,361 forks · Updated Jan 9, 2025

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python · 4,132 stars · 251 forks · Updated Jan 8, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook · 37,613 stars · 4,841 forks · Updated Jan 8, 2025

High-resolution models for human tasks.

Python · 4,733 stars · 269 forks · Updated Nov 18, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,126 stars · 48 forks · Updated Dec 26, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python · 1,899 stars · 112 forks · Updated Jul 29, 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Python · 348 stars · 15 forks · Updated Jan 6, 2025

Controllable video and image generation: SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python · 1,467 stars · 72 forks · Updated Sep 25, 2024

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python · 9,310 stars · 725 forks · Updated Aug 5, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook · 13,542 stars · 1,317 forks · Updated Dec 25, 2024

llama3 implemented one matrix multiplication at a time

Jupyter Notebook · 14,002 stars · 1,140 forks · Updated May 23, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Python · 232 stars · 14 forks · Updated Aug 6, 2024

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python · 57 stars · 3 forks · Updated May 25, 2024

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)

Python · 519 stars · 30 forks · Updated Jan 8, 2024

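AIGC-interview/CV-interview/LLMs-interview: a repository collecting interview questions and answers, also including new ideas, new problems, new resources, and new projects from work and research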

1,901 stars · 185 forks · Updated Dec 24, 2024

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from image prompts.

Jupyter Notebook · 5,528 stars · 350 forks · Updated Jun 28, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python · 11,288 stars · 827 forks · Updated Jul 18, 2024

[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook · 127 stars · 4 forks · Updated Nov 14, 2024