🤩 A comprehensive list of the papers covered in *Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions*.
Figure: High-level view of a compound AI system and its optimization.
Note: Contributions welcome!
- Open an issue or PR to add new or missing papers.
- If your paper has been accepted at a venue, consider updating its entry with the relevant information.
- If you think your paper is more suitable for another category, submit a PR or contact us.
- Thank you for helping us maintain a comprehensive and accurate survey!
Recent advancements in large language models (LLMs) and AI systems have led to a paradigm shift in the design and optimization of complex AI workflows. By integrating multiple components, compound AI systems have become increasingly adept at performing sophisticated tasks. However, as these systems grow in complexity, new challenges arise in optimizing not only individual components but also their interactions. While traditional optimization methods such as supervised fine-tuning (SFT) and reinforcement learning (RL) remain foundational, the rise of natural language feedback introduces promising new approaches, especially for optimizing non-differentiable systems. This paper provides a systematic review of recent progress in optimizing compound AI systems, encompassing both numerical and language-based techniques. We formalize the notion of compound AI system optimization, classify existing methods along several key dimensions, and highlight open research challenges and future directions in this rapidly evolving field.
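
To ground what "compound AI system optimization" refers to, here is a toy sketch of a two-stage pipeline: the component prompts are the tunable parameters, and only a scalar end-to-end metric is available. The `llm()` helper, the prompt templates, and the metric are illustrative assumptions, not from any specific framework.

```python
# Minimal sketch of a two-stage compound AI system.
# The two prompts below are the tunable parameters; the only learning signal
# is a scalar metric computed on the final output (non-differentiable).

def llm(prompt: str) -> str:
    """Hypothetical placeholder for any chat/completion API call."""
    raise NotImplementedError("plug in your model client here")

PLANNER_PROMPT = "Break the question into retrieval-friendly sub-questions:\n{question}"
ANSWER_PROMPT = "Answer the question using the plan.\nQuestion: {question}\nPlan: {plan}"

def compound_system(question: str) -> str:
    plan = llm(PLANNER_PROMPT.format(question=question))                 # component 1
    answer = llm(ANSWER_PROMPT.format(question=question, plan=plan))     # component 2
    return answer

def exact_match(prediction: str, gold: str) -> float:
    """System-level metric over the final output."""
    return float(prediction.strip().lower() == gold.strip().lower())
```
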
The proposed 2×2 taxonomy spans Structural Flexibility (y-axis) and Learning Signals (x-axis):
- Fixed Structure, NL Feedback
- Fixed Structure, Numerical Signals
- Flexible Structure, NL Feedback
- Flexible Structure, Numerical Signals
Learning Signals are classified into two categories, with Numerical Signals further divided by their utilization schemes:
📊 System Metrics
(a) Devise rule-based algorithms that directly learn from raw system performance metrics
🎯 Formalized Training Objectives
Transform system evaluation results into formalized training objectives:
(b1) Supervised Fine-tuning (SFT) losses
(b2) Reinforcement Learning (RL) reward functions
(b3) Direct Preference Optimization (DPO) losses
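
For reference, these objectives typically take the following standard forms for a tunable component with policy $\pi_\theta$ (generic notation, not tied to any single paper below): $r(x,y)$ is a reward derived from system evaluation, $(y_w, y_l)$ are preferred/dispreferred outputs for the same input, $\pi_{\mathrm{ref}}$ is a frozen reference model, and $\beta$ is the DPO temperature.

$$\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\log \pi_\theta(y \mid x)\right]$$

$$J_{\mathrm{RL}}(\theta) = \mathbb{E}_{x\sim\mathcal{D},\ y\sim\pi_\theta(\cdot\mid x)}\left[r(x, y)\right]$$

$$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$
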
Fixed Structure, NL Feedback:

| Paper Title | Date | Conference/Journal |
|---|---|---|
| LLM-AutoDiff: Auto-Differentiate Any LLM Workflow | 2025/01 | arXiv |
| How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | 2024/12 | arXiv |
| Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization | 2024/12 | arXiv |
| metaTextGrad: Automatically optimizing language model optimizers | 2024/10 | NeurIPS |
| AIME: AI System Optimization via Multiple LLM Evaluators | 2024/10 | arXiv |
| Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs | 2024/06 | NeurIPS |
| TextGrad: Automatic "Differentiation" via Text | 2024/06 | Nature |
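
To make this quadrant concrete, here is a minimal sketch of refining one component's prompt from natural-language feedback while the pipeline structure stays fixed. It is a generic illustration in the spirit of the textual-feedback methods above, not the API of TextGrad, Trace, or any listed paper; `llm()` is a hypothetical stub.

```python
# Sketch: an evaluator LLM produces a textual critique of the system output,
# and an optimizer LLM rewrites the prompt template accordingly.

def llm(prompt: str) -> str:
    """Hypothetical placeholder for any chat/completion API call."""
    raise NotImplementedError("plug in your model client here")

def optimize_prompt(prompt: str, examples: list[dict], steps: int = 3) -> str:
    """Iteratively refine a prompt template from natural-language critiques."""
    for _ in range(steps):
        ex = examples[0]  # in practice, iterate over a (mini-)batch
        output = llm(prompt.format(**ex["inputs"]))
        critique = llm(
            "You are an evaluator. Explain what is wrong with the output and "
            "how the prompt could be improved.\n"
            f"Input: {ex['inputs']}\nOutput: {output}\nReference: {ex['gold']}"
        )
        prompt = llm(
            "Rewrite the following prompt template to address the critique. "
            "Keep all placeholders intact.\n"
            f"Prompt template: {prompt}\nCritique: {critique}"
        )
    return prompt
```
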
Fixed Structure, Numerical Signals:

| Paper Title | Date | Conference/Journal | Signals Type |
|---|---|---|---|
| Aligning Compound AI Systems via System-level DPO | 2025/02 | AAAI | b3 |
| MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | 2025/02 | arXiv | b2 |
| Optimizing Model Selection for Compound AI Systems | 2025/02 | arXiv | a |
| SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | 2025/02 | arXiv | b1 |
| Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains | 2025/01 | ICLR | b1 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | 2024/07 | EMNLP | a, b1 |
| Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | 2024/06 | EMNLP | a |
| Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components | 2024/05 | ICML | a |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | 2023/10 | ICLR | a |
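
The sketch below shows one common way scalar system scores are turned into the (b1) and (b3) objectives: keep high-scoring traces as SFT targets, and pair high- vs. low-scoring outputs for the same input as DPO preferences. The field names and threshold are illustrative assumptions, not taken from any listed paper.

```python
# Sketch: convert scored execution traces into SFT data and DPO preference pairs.
from collections import defaultdict

def build_sft_data(traces: list[dict], threshold: float = 0.9) -> list[dict]:
    """traces: [{'input': str, 'output': str, 'score': float}, ...] -> SFT examples."""
    return [
        {"prompt": t["input"], "completion": t["output"]}
        for t in traces
        if t["score"] >= threshold
    ]

def build_dpo_pairs(traces: list[dict]) -> list[dict]:
    """Pair the best and worst outputs observed for each input."""
    by_input = defaultdict(list)
    for t in traces:
        by_input[t["input"]].append(t)
    pairs = []
    for inp, group in by_input.items():
        group.sort(key=lambda t: t["score"], reverse=True)
        if len(group) >= 2 and group[0]["score"] > group[-1]["score"]:
            pairs.append(
                {"prompt": inp, "chosen": group[0]["output"], "rejected": group[-1]["output"]}
            )
    return pairs
```
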
Flexible Structure, NL Feedback:

| Paper Title | Date | Conference/Journal |
|---|---|---|
| DebFlow: Automating Agent Creation via Agent Debate | 2025/03 | COLM |
| Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | 2025/02 | arXiv |
| AFlow: Automating Agentic Workflow Generation | 2024/10 | ICLR |
| Automated Design of Agentic Systems | 2024/08 | ICLR |
| Symbolic Learning Enables Self-Evolving Agents | 2024/06 | arXiv |
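
A minimal sketch of flexible-structure search driven by NL feedback: a meta-LLM reads an archive of previously tried workflows and their critiques, then proposes a new candidate workflow as code. This follows the general spirit of meta-agent approaches such as those above, but is not the actual interface of any listed paper; `llm()` is again a hypothetical stub.

```python
# Sketch: propose a new workflow (as Python source) from an archive of
# prior candidates and their textual critiques.

def llm(prompt: str) -> str:
    """Hypothetical placeholder for any chat/completion API call."""
    raise NotImplementedError("plug in your model client here")

def propose_workflow(archive: list[dict]) -> str:
    """archive: [{'code': str, 'feedback': str}, ...] -> new candidate workflow code."""
    history = "\n\n".join(
        f"# Candidate {i}\n{a['code']}\n# Feedback: {a['feedback']}"
        for i, a in enumerate(archive)
    )
    return llm(
        "You design multi-agent workflows as a Python function "
        "`def workflow(task: str) -> str`. Here are prior candidates and critiques:\n"
        f"{history}\n\n"
        "Propose a new workflow that addresses the critiques. Return only code."
    )
```
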
Flexible Structure, Numerical Signals:

| Paper Title | Date | Conference/Journal | Signals Type |
|---|---|---|---|
| MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning | 2025/06 | arXiv | b2 |
| FlowReasoner: Reinforcing Query-Level Meta-Agents | 2025/04 | arXiv | b1, b2 |
| Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors | 2025/04 | arXiv | b2 |
| MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | 2025/03 | arXiv | b1 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | 2025/02 | arXiv | b3 |
| Multi-agent Architecture Search via Agentic Supernet | 2025/01 | ICML | b2 |
| AutoFlow: Automated Workflow Generation for Large Language Model Agents | 2024/07 | arXiv | b2 |
| Language Agents as Optimizable Graphs | 2024/02 | ICML | b2 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | 2023/10 | COLM | a |
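
Finally, a minimal sketch of combining a flexible structure with purely numerical signals: sample several candidate workflows per task, execute and score them, and keep the best; the score-ranked candidates could equally feed the (b2)/(b3) objectives above. All helpers here are hypothetical placeholders, not the method of any listed paper.

```python
# Sketch: rank candidate workflows by a scalar score and select the best one.
from typing import Callable

def select_workflow(
    task: str,
    generate_candidates: Callable[[str, int], list[Callable[[str], str]]],
    score: Callable[[str, str], float],
    n_candidates: int = 4,
):
    """Return the highest-scoring candidate workflow and its score for this task."""
    candidates = generate_candidates(task, n_candidates)
    scored = [(score(task, wf(task)), wf) for wf in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score, best_wf = scored[0]
    return best_wf, best_score
```
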
We welcome contributions from researchers and developers to enhance this 'Awesome Compound AI System Optimization Methods' collection.
If you know of relevant papers that should be included in this repository, please reach out to us.
Contact: r12946015@ntu.edu.tw / r13922053@ntu.edu.tw