  • Zhejiang University
  • Hangzhou, China


verl: Volcano Engine Reinforcement Learning for LLMs

Python 16 2 Updated Aug 28, 2025

Implementation of FP8/INT8 rollout for RL training without performance drop.

Python 188 13 Updated Sep 1, 2025

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Python 41 Updated Aug 17, 2025

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

Python 14 Updated May 26, 2025

[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Python 42 Updated May 27, 2025
Jupyter Notebook 11 Updated Aug 18, 2025

Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025

Python 71 2 Updated Mar 14, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,123 58 Updated Aug 11, 2025
Python 3 Updated Jul 25, 2025

Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference

Python 11 Updated Jun 7, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 233 33 Updated Sep 4, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,091 93 Updated Sep 5, 2025
Python 98 8 Updated Sep 9, 2024

CUDA Python: Performance meets Productivity

Python 2,956 203 Updated Sep 7, 2025

This is a Chinese translation of the CUDA programming guide

1,669 247 Updated Nov 13, 2024

Curated collection of papers in MoE model inference

253 10 Updated Aug 1, 2025

A collection of 150+ surveys on LLMs

325 24 Updated Feb 19, 2025

a distributed deep learning platform

C++ 3,532 1,272 Updated Sep 5, 2025

FlashMLA: Efficient MLA kernels

C++ 11,719 897 Updated Aug 27, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,007 1,076 Updated Sep 5, 2025

An Open Source Toolkit For LLM Distillation

Python 721 93 Updated Jul 8, 2025

[ICLR 2025] Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Python 73 2 Updated Mar 29, 2025

Efficient Mixture of Experts for LLM Paper List

Python 123 5 Updated Sep 7, 2025

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python 230 24 Updated Nov 18, 2024

[EMNLP 2024] CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

Python 3 Updated Jun 5, 2025

Bringing BERT into modernity via both architecture changes and scaling

Python 1,506 122 Updated Jun 30, 2025

Zhejiang University Computer Organization RISC-V: lab section (Vivado 2020)

VHDL 16 6 Updated Jan 13, 2022

This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation.

Jupyter Notebook 17 2 Updated Oct 25, 2024

A pre-built agent for TableGPT2.

Python 608 55 Updated Aug 28, 2025

CCKS2023-PromptCBLUE: Code implementation for the TianChi competition

Python 20 2 Updated Feb 27, 2024