Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Updated Jun 3, 2026 - Python
Official code repo for NeurIPS 2025 Spotlight paper, "Debate or Vote: Which Yields Better Decisions in Multi-Agent LLMs?"
Framework: Multi-Agent LLMs For Conversational Task-Solving (MALLM)
Research-backed methodology for multi-AI collaborative decision-making with structured debate, consensus synthesis, and bias reduction
Source code for the paper: Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention
Human-in-the-loop adversarial workflows for high-stakes research audit: from ChatGPT-Gemini duels to 4-model MAD.
Code for "Multiple LLM Agents Debate for Equitable Cultural Alignment" [ACL 2025 Oral]
Code review, but with 5 models arguing first.
A brutally fault-tolerant Mixture-of-Agents (MoA) pipeline built in pure Python. Designed to orchestrate chaotic, round-robin LLM proxy endpoints through a rigorous 4-stage Agentic Workflow (Generate ➔ Cross-Critique ➔ Rebuttal ➔ Judge). Built to eradicate hallucination and guarantee absolute accuracy in complex, multi-step reasoning tasks.
Three Claude Code skills for working with Codex CLI: codex-bridge (one-shot Codex calls), mad-build (Claude+Codex collaboration with cross-review), and mad-research (three-stream adversarial audit of papers, grants, reports with anonymized cross-critique and fresh-Codex synthesis).
Enable autonomous AI agents to optimize LLM training code through iterative experiments and improve models overnight without manual intervention
Research paper on how agentic debate pipelines can be constructed to reduce hallucinations in LLMs with open-source and commercial models
Generate research papers autonomously by chatting with OpenClaw, using Python 3.11+, with a self-evolving framework and extensive test coverage.
AI Agent Workspace Redesign: A structured multi-agent debate methodology for managing AI agent workspaces (memory, file organization, protection tiers, boot sequences)
Supporting code for the study on multi-agent debate protocols
An adversarial AI expert workshop that stress-tests a research paper (rival-tradition referees argue; every comment quote-grounded and independently re-verified) and then rebuilds it: tracked-changes redline, clean version, your code re-run under a provenance wall, and a replication package. A Claude Code skill.
NeurIPS paper code: evaluating and enhancing Large Language Models (LLMs) on mathematical datasets through a Multi-Agent Debate architecture, without traditional fine-tuning or Retrieval-Augmented Generation techniques. This project explores strategies to boost LLM capabilities in mathematical reasoning.
Multi-LLM debate orchestrator that drives ChatGPT, Claude, and DeepSeek web UIs (no API keys) through a 5-phase loop: propose → critique → revise → synthesize → ratify-or-veto. Editorial dark UI.
Run your decisions through a jury of 12 AI minds before you commit.
Build autonomous experiment loops that edit files, run tests, and keep only improvements for any project type