Debate2Create is a research codebase for co-optimizing robot morphology and control with multi-agent LLM debate. Design agents propose MuJoCo XML edits, control agents write reward functions, judge agents critique the candidates, and the resulting robots are evaluated with reinforcement learning in Brax/MuJoCo locomotion environments.
Overview of the Debate2Create framework: agents debate, synthesize, train, and evaluate candidate robot designs and rewards.
- Jointly proposes robot morphologies and reward functions instead of treating body design and control design as separate problems.
- Uses multi-agent debate to critique and revise candidate XML/reward pairs before reinforcement-learning evaluation.
- Trains candidates with PPO or SAC in Brax/MuJoCo and compares methods with a shared simulator score.
- Includes robot assets, Hydra configs, command-line utilities, and CPU smoke tests for Ant, HalfCheetah, Hopper, Swimmer, and Walker2d.
Use Python 3.10 from the repository root:
python3.10 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e . --no-deps
Run the local CPU smoke tests:
PYTHONPATH=src:. JAX_PLATFORM_NAME=cpu MUJOCO_GL=disable \
python -m unittest discover -s tests
Run the full local preflight:
scripts/preflight.sh
Render one included D2C design/reward pair without calling an LLM or training a policy:
PYTHONPATH=src:. JAX_PLATFORM_NAME=cpu MUJOCO_GL=disable \
d2c-render --xml baselines/swimmer/d2c/swimmer_modified.xml \
--reward baselines/swimmer/d2c/reward_id0.py --env-name swimmer \
--steps 5 --out outputs/readme_swimmer_static.html
GPU training requires a CUDA-enabled JAX/JAXLIB installation matching your
machine. For local headless CPU checks, use MUJOCO_GL=disable; for rendering
or GPU execution, choose the MuJoCo GL backend appropriate for the system.
The main Hydra config is cfg/config.yaml. Select an environment with
env=<name>:
PYTHONPATH=src:. python src/debate.py --help
LLM-backed runs require an API key for the selected provider:
export OPENAI_API_KEY=...
# or, for Gemini-backed agents:
export GEMINI_API_KEY=...
Provider selection is controlled by LLM_PROVIDER or by choosing a model name
with a provider-specific prefix, such as model=gemini-2.5-pro. For Gemini,
GOOGLE_API_KEY is also accepted as a compatibility alias, but set only one of
GEMINI_API_KEY or GOOGLE_API_KEY.
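The selection rules above can be sketched as a small check to run before launching a debate. This is an illustration only, not the repository's implementation: the function name `resolve_provider`, the precedence of LLM_PROVIDER over the model prefix, and the fallback to OpenAI for unprefixed model names are all assumptions.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_provider(
    model: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, str]:
    """Return (provider, name of the API-key variable that is set).

    Hypothetical helper: precedence and defaults are assumptions.
    """
    env = os.environ if env is None else env

    # Assumption: an explicit LLM_PROVIDER wins over the model-name prefix.
    explicit = env.get("LLM_PROVIDER", "").strip().lower()
    if explicit:
        provider = explicit
    elif model.lower().startswith("gemini"):
        provider = "gemini"
    else:
        # Assumption: unprefixed model names fall back to OpenAI.
        provider = "openai"

    if provider == "gemini":
        candidates = ("GEMINI_API_KEY", "GOOGLE_API_KEY")
    elif provider == "openai":
        candidates = ("OPENAI_API_KEY",)
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    present = [name for name in candidates if env.get(name)]
    if len(present) > 1:
        # The README asks for only one of the two Gemini variables.
        raise ValueError(f"set only one of {', '.join(candidates)}")
    if not present:
        raise RuntimeError(f"no API key found; expected {' or '.join(candidates)}")
    return provider, present[0]
```

Calling `resolve_provider("gemini-2.5-pro")` with only GOOGLE_API_KEY exported would return `("gemini", "GOOGLE_API_KEY")` under these assumptions; exporting both Gemini variables raises an error instead of silently picking one.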
Weights & Biases logging is off by default in cfg/config.yaml. To disable it
regardless of configuration, for example in shared environments:
export WANDB_MODE=disabled
A small functionality check on Hopper:
OPENAI_API_KEY=... PYTHONPATH=src:. python src/debate.py env=hopper \
debate.rounds=1 sample=1 design_sample=1 debate.enable_judges=false \
rl.envs.hopper.sac.num_timesteps=10000
The timestep override is only a smoke setting; use larger training budgets for meaningful policies.
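Each `key=value` argument in the command above addresses a node in the nested Hydra config by its dotted path. The sketch below mimics that mapping in plain Python to show which nested key each override touches. It is a simplified illustration, not Hydra's parser: the helper name `apply_override` is hypothetical, and Hydra's real override grammar (lists, `+key`, `~key`, quoting) is much richer.

```python
def parse_value(raw: str):
    """Convert an override value to bool, int, or float, else keep the string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_override(config: dict, override: str) -> dict:
    """Apply one dotted `a.b.c=value` override to a nested dict in place."""
    key, sep, raw = override.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {override!r}")
    parts = key.split(".")
    node = config
    # Walk (or create) the intermediate levels, then set the leaf.
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = parse_value(raw)
    return config


cfg = {}
for item in [
    "env=hopper",
    "debate.rounds=1",
    "debate.enable_judges=false",
    "rl.envs.hopper.sac.num_timesteps=10000",
]:
    apply_override(cfg, item)
```

After the loop, `cfg["rl"]["envs"]["hopper"]["sac"]["num_timesteps"]` holds the integer 10000, which shows that the timestep override only affects the SAC settings for Hopper.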
PYTHONPATH=src:. python scripts/train_xml_reward.py --help
PYTHONPATH=src:. python scripts/render_xml_reward.py --help
d2c-debate --help
d2c-train-xml --help
d2c-render --help
- src/debate.py: main Hydra entry point for Debate2Create.
- scripts/train_xml_reward.py: train a policy for a specific XML and reward.
- scripts/render_xml_reward.py: train and render an XML/reward pair.
Reward files are Python code and are executed locally when compiled. Review LLM-generated or third-party reward files before running training or rendering commands against them.
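One way to support that review is to inspect a reward file statically, without importing or executing it. The sketch below uses the standard-library `ast` module to list imported modules and flag a few names that deserve a closer look. The helper name `summarize_reward_source` and the two watch lists are assumptions chosen for illustration; this is not a sandbox and does not replace reading the file.

```python
import ast

# Hypothetical watch lists; adjust to your own threat model.
SUSPICIOUS_IMPORTS = {"os", "subprocess", "socket", "shutil", "sys", "ctypes"}
SUSPICIOUS_CALLS = {"eval", "exec", "open", "__import__", "compile"}


def summarize_reward_source(source: str) -> dict:
    """Parse reward code without running it and report imports and flagged names."""
    tree = ast.parse(source)
    imports = set()
    calls = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Keep only the top-level package, e.g. "jax" for "jax.numpy".
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.add((node.module or ".").split(".")[0])
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                calls.add(func.id)
            elif isinstance(func, ast.Attribute):
                calls.add(func.attr)
    flagged = sorted((imports & SUSPICIOUS_IMPORTS) | (calls & SUSPICIOUS_CALLS))
    return {"imports": sorted(imports), "flagged": flagged}
```

To use it, read a candidate such as baselines/swimmer/d2c/reward_id0.py as text and pass the string to `summarize_reward_source`. A non-empty `flagged` list is a prompt for manual inspection, not proof of a problem, and an empty list is not proof of safety.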
Runtime outputs should go under ignored directories such as outputs/, runs/,
or logs/. A typical debate run writes one directory per round:
outputs/debate/<timestamp>/debate_runs/
  config.yaml
  exchange_history.json
  round_000/
    feedback_context.txt
    round_exchange.json
    reward_scores_cand00.json
    thesis_00/
      critique.txt
      design_thesis.json
      <env>_modified.xml
      prompts/
    synthesis_00/
      critique.txt
      design_synthesis.json
      <env>_modified.xml
      reward_id0.py
      train_metrics_0.json
      persona_feedback/
      prompts/
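To compare candidates after a run, the metrics files in the layout above can be gathered with a short script. This is a minimal sketch that only assumes the `train_metrics_*.json` file-name pattern shown in the listing. It searches recursively so it does not depend on the exact nesting, and it makes no assumption about the keys inside each JSON file. The helper name `collect_train_metrics` is hypothetical.

```python
import json
from pathlib import Path


def collect_train_metrics(run_dir) -> dict:
    """Map each train_metrics_*.json path (relative to run_dir) to its parsed content."""
    run_dir = Path(run_dir)
    results = {}
    # rglob searches every subdirectory, so round/thesis/synthesis nesting is not assumed.
    for path in sorted(run_dir.rglob("train_metrics_*.json")):
        with path.open() as handle:
            results[path.relative_to(run_dir).as_posix()] = json.load(handle)
    return results
```

Pass it the `debate_runs/` directory of a finished run. The returned dictionary is keyed by relative path, so the round and candidate of each metrics file can be read from the key.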
assets/     MuJoCo XML robot assets
baselines/  Reference XML/reward assets and benchmark utilities
cfg/        Hydra configuration
docs/       Additional documentation and README figures
envs/       Brax environment definitions
scripts/    Training, rendering, and analysis utilities
src/        Debate, design, training, judging, and rendering code
tests/      CPU smoke tests
utils/      LLM, reward, XML, prompt, and filesystem utilities
If you use Debate2Create in your research, please cite:
@inproceedings{qiu2026debate2create,
title={Debate2Create: Robot Co-design via Multi-Agent {LLM} Debate},
author={Qiu, Kevin and Cygan, Marek},
booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
year={2026},
note={To appear}
}
This project is licensed under the Apache License 2.0. See LICENSE for details. Third-party code and asset attributions are listed in THIRD_PARTY_NOTICES.md.
