AMD | Tsinghua University
- California, USA
- https://yushengsu-thu.github.io/
- @thu_yushengsu
Stars
A Next-Generation Training Engine Built for Ultra-Large MoE Models
I recently interviewed with some AI labs and these are the notes I took during my study for ML fundamentals and Design. This was in Mar 2025 and given how fast the field of AI moves, some of it may…
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
Renderer for the harmony response format to be used with gpt-oss
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Allow torch tensor memory to be released and resumed later
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
yushengsu-thu / slime
Forked from THUDM/slime
slime is an LLM post-training framework aiming at scaling RL.
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Ongoing research training transformer models at scale
slime is an LLM post-training framework for RL scaling.
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
SkyRL: A Modular Full-stack RL Library for LLMs
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Code to automatically prove or verify estimates in analysis
My learning notes/codes for ML SYS.
Some environment examples for LLM agents, designed to be integrated with VeRL
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
yushengsu-thu / verl
Forked from volcengine/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Sky-T1: Train your own O1 preview model within $450