Shaw Xiao9905

Hi, welcome to my GitHub 👋

I am Xiao Liu, a fourth-year PhD student at Tsinghua University (enrolled in 2021), expected to graduate in June 2026.

🔭 Interested in Machine Learning, Natural Language Processing, and Foundation Models.
🌱 Find my up-to-date publication list on Google Scholar! Some works I am proud of as a lead author:
Large Language Model (LLM) Training and Prompt Learning
- P-tuning and P-tuning v2 (ACL'22): pioneering work on prompt tuning
- GLM-130B (ICLR'23): an open bilingual (English & Chinese) pre-trained model with 130 billion parameters based on GLM (ACL'22); better than GPT-3 175B on LAMBADA and MMLU.
- ChatGLM-6B & ChatGLM2-6B & ChatGLM3-6B & GLM-4: a family of open bilingual dialogue language models, with over 14,000,000 global downloads and widely starred on GitHub.
- WebGLM (KDD'23): an efficient web-enhanced question answering system based on GLM-10B, outperforming WebGPT-13B and approaching WebGPT-175B performance in human evaluation.
Foundational Agents For Real-world Challenging Missions
- AgentBench (ICLR'24): the first systematic multi-dimensional benchmark to evaluate LLMs as agents in 8 distinct environments derived from practical real-world missions.
- AutoWebGLM (KDD'24): a strong web navigation agent built on ChatGLM3-6B, outperforming prompted GPT-4 on Mind2Web, WebArena, and our newly constructed dataset AutoWebBench.
- VisualAgentBench (ICLR'25): a comprehensive framework to train and test Large Multimodal Models (LMMs) to serve as visual foundation agents.
- WebRL (ICLR'25): self-evolving online curriculum RL that enables open LLMs to outperform GPT-4-Turbo on web agent tasks by 160%.
- AndroidLab (ACL'25): training and systematic benchmarking of autonomous Android agents.
- AutoGLM: autonomous foundation agents for GUIs, the first Phone Use and Web Browser Use agent family.
Alignment and Scalable Oversight over LLMs and Diffusers
- ImageReward (NeurIPS'23): the first general-purpose text-to-image human preference reward model (RM) for RLHF, outperforming CLIP/BLIP/Aesthetic by 30% in terms of human preference prediction.
- BPO (Black-box Prompt Optimization, ACL'24): a novel direction to align LLMs via preference-aware prompt optimization. It improves the human-preference win rates of ChatGPT, Claude, and LLaMA by 20%+ without training them.
- AlignBench (ACL'24): the first comprehensive benchmark for evaluating LLMs' Chinese alignment, derived from real online ChatGLM scenarios. Adopted by top Chinese LLMs (ChatGLM, Qwen, DeepSeek, Yi, Baichuan, Abab, etc.).
- SPaR (ICLR'25): Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Self-supervised Learning and Reasoning
- Self-supervised Learning: Generative or Contrastive (TKDE'21): one of the most cited surveys on self-supervised learning
- SelfKG (WWW'22): shows that self-supervised alignment can be comparable to supervised alignment; Best Paper Nominee at WWW 2022.
- kgTransformer (KDD'22): pre-training knowledge graph transformers with mixture-of-experts (MoE) for complex logical reasoning
🤔 Dedicated to building the next generation of AI systems via both large pre-trained models and symbolic agent reasoning.
💬 Feel free to drop me an email for:
- Any form of collaboration
- Any issues with my work or code
- Interesting ideas to discuss, or just a chat
