
- Beijing, China
- https://sprinter1999.github.io/website/
Lists (30)
⏩Acceleration
AIGC
🎵Audio
🚗Auto-drive
Awesome List
🏫Course
🐱CV
🏘FL
Foundation Model
🌐Graph
✨ Inspiration
👾Interesting
📚IR
iris
LT-experiments
MLsys
🎶Multi-modal
My track
📕NLP
🏴OOD&OD
💻RecSys
🤖RL
Security
Self-Supervised
Series learning
Tabular
Transfer Learning
👻Util
⚡Workspace
Σ Math Inspired
Starred repositories
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels, FA2, HGEMM via MMA and CuTe (~99% TFLOPS of cuBLAS/FA2 🎉).
Official Pytorch Implementation of "OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
Official implementation of paper "VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation"
"AI-Researcher: Fully-Automated Scientific Discovery with LLM Agents" & "Open-Sourced Alternative to Google AI Co-Scientist"
[NeurIPS 2024] NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
Latest Advances on Vision-Language-Action Models.
AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025)
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
Zero-Shot Detection via Vision and Language Knowledge Distillation
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[CVPR 2025] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
AMD 0.9B efficient text to video diffusion model
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Recent Advances on MLLM's Reasoning Ability
A framework for few-shot evaluation of language models.
[NeurIPS '18] "Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?" Official Implementation.
Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
[AAAI Alignment Track 25 Poster] Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction