- 👋 Hi, I’m Lingling.
- 👀 I’m interested in LLM inference/serving and photonic compilers.
- 🌱 I’m currently focused on model quantization and parallelism.
- 📫 How to reach me: linglingfan@stanford.edu
- 😄 Pronouns: She/Her
- ⚡ Fun fact: I am a pianist.
LF started this repo solely for open-source exploration. It does not represent her affiliations with any company or school.
Pinned
- SageAttention (CUDA, forked from [thu-ml/SageAttention](https://github.com/thu-ml/SageAttention)): Quantized attention that achieves 2-3x and 3-5x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.