Ethan (Yusheng) Su yushengsu-thu

#ML #NLP #LLM Goal: Building a model toward AGI.

112 followers · 74 following

AMD | Tsinghua University
California, USA
https://yushengsu-thu.github.io/
@thu_yushengsu

Stars

InternLM / xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 4,765 358 Updated Sep 8, 2025

asiddhant / ai_interview_prep_notes_mar_2025

I recently interviewed with some AI labs and these are the notes I took during my study for ML fundamentals and Design. This was in Mar 2025 and given how fast the field of AI moves, some of it may…

17 3 Updated Aug 21, 2025

Multi-LLM / prism-research

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

Python 24 Updated Aug 15, 2025

sgl-project / sgl-learning-materials

Materials for learning SGLang

566 47 Updated Aug 31, 2025

openai / harmony

Renderer for the harmony response format to be used with gpt-oss

Rust 3,758 193 Updated Aug 15, 2025

SandAI-org / MagiAttention

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 501 30 Updated Sep 8, 2025

yushengsu-thu / torch_memory_saver

Forked from fzyzcjy/torch_memory_saver

Allow torch tensor memory to be released and resumed later

Python 2 Updated Aug 12, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,890 367 Updated Sep 8, 2025

NVlabs / Long-RL

Long-RL: Scaling RL to Long Sequences

Python 604 21 Updated Sep 8, 2025

yushengsu-thu / slime

Forked from THUDM/slime

slime is an LLM post-training framework aiming at scaling RL.

Python 1 Updated Sep 8, 2025

yushengsu-thu / Pai-Megatron-Patch-amd_version

Forked from alibaba/Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 1 Updated Jun 30, 2025

yushengsu-thu / Megatron-LM-amd_version

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer models at scale

Python 1 Updated Jun 29, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 1,675 146 Updated Sep 8, 2025

ISEEKYAN / mbridge

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 116 16 Updated Sep 5, 2025

RLFoundation / vllm-patch

Python 1 Updated Sep 8, 2025

ROCm / clr

C++ 150 79 Updated Sep 8, 2025

NovaSky-AI / SkyRL

SkyRL: A Modular Full-stack RL Library for LLMs

Python 823 93 Updated Sep 8, 2025

yushengsu-thu / yushengsu-thu.github.io

Forked from academicpages/academicpages.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

JavaScript 1 1 Updated Aug 24, 2025

yushengsu-thu / yushengsu-thu

1 Updated Sep 6, 2025

teorth / estimates

Code to automatically prove or verify estimates in analysis

JavaScript 303 24 Updated Jul 1, 2025

fzyzcjy / torch_memory_saver

Allow torch tensor memory to be released and resumed later

Python 124 20 Updated Aug 29, 2025

yushengsu-thu / Awesome-ML-SYS-Tutorial

Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 1 Updated Jun 2, 2025

Python 2 2 Updated Jul 8, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 25,394 2,368 Updated Sep 8, 2025

NovaSky-AI / SkyThought

Sky-T1: Train your own O1 preview model within $450

Python 3,327 338 Updated Jul 12, 2025

Lists (3)

Leaded Projects

In-processing-project

Reference_project
