
small audio language model for reasoning

Python 50 1 Updated Mar 25, 2025

A Unified Tokenizer for Visual Generation and Understanding

Python 218 5 Updated Mar 3, 2025

Pytorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting

Python 69 2 Updated Feb 17, 2025

Official Implementation of VideoDPO

Python 72 Updated Jan 12, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 9,342 1,014 Updated Mar 29, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 1,023 51 Updated Feb 25, 2025

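《代码随想录》 LeetCode practice guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps. Supports C++, Java, Python, Go, JavaScript, and other languages, so studying algorithms is no longer bewildering! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀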

Shell 55,147 11,917 Updated Mar 17, 2025

Paper collections of multi-modal LLM for Math/STEM/Code.

84 4 Updated Mar 30, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 321 14 Updated Feb 28, 2025

Witness the aha moment of VLM with less than $3.

Python 3,432 271 Updated Mar 1, 2025

A fork to add multimodal model training to open-r1

Python 1,143 58 Updated Feb 8, 2025

The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention

Python 2,434 179 Updated Mar 18, 2025

FastVideo is a lightweight framework for accelerating large video diffusion models.

Python 1,284 76 Updated Mar 30, 2025

Code for the Molmo Vision-Language Model

Python 347 26 Updated Dec 12, 2024

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

Jupyter Notebook 9,358 875 Updated Dec 10, 2024

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 440 12 Updated Mar 27, 2025

A Tutorial for Diffusion Models

Jupyter Notebook 45 5 Updated Jul 17, 2023

A paper list of recent work on token compression for ViT and VLM

394 20 Updated Mar 27, 2025

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 296 1 Updated Mar 5, 2025
Python 389 44 Updated Jul 30, 2024

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

Python 29,352 2,319 Updated Mar 27, 2025

[ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation

Python 55 5 Updated Mar 13, 2024

A collection of LaTeX templates commonly used in Chinese scientific research

TeX 495 68 Updated Mar 11, 2025

LiVOS: Light Video Object Segmentation with Gated Linear Matching (CVPR 2025)

Python 28 2 Updated Mar 10, 2025

A suite of image and video neural tokenizers

Jupyter Notebook 1,589 74 Updated Feb 11, 2025

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 123 2 Updated Dec 13, 2024

Align Anything: Training All-modality Model with Feedback

Python 3,136 395 Updated Mar 30, 2025
Python 366 27 Updated Feb 28, 2025

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 1,065 62 Updated Feb 7, 2025