OpenMLRL/LLM_Collab_Writing

LLM Collaboration – Writing

This repo provides the extended environments for CoMLRL.

This repository contains the writing-task experiments from the paper [AAAI26] LLM Collaboration with Multi-Agent Reinforcement Learning.

[Figure: writing demo]

Installation

Install CoMLRL:

pip install comlrl
# Install PyTorch compatible with your device

Or via conda-forge:

conda install -c conda-forge comlrl
# Install PyTorch compatible with your device

Benchmarks

  • ArXiv Abstract Expansion: LovelyBuggies/arXiv_abstract (train[:1000], val[:1000])
  • TLDR Summarization: trl-lib/tldr (train[:1000], test[:1000])

Training Scripts

python LLM_Collab_Writing/train_grpo.py \
  --config LLM_Collab_Writing/configs/grpo_arxiv_config.yaml

python LLM_Collab_Writing/train_magrpo.py \
  --config LLM_Collab_Writing/configs/magrpo_tldr_config.yaml

Override any configuration value inline with --override:

python LLM_Collab_Writing/train_magrpo.py \
  --config LLM_Collab_Writing/configs/magrpo_arxiv_config.yaml \
  --override model.name='Qwen/Qwen3-7B' magrpo.learning_rate=3e-6
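For intuition, dotted overrides like the ones above can be mapped onto a nested config as in the following minimal sketch. The helper name apply_overrides and the parsing rules are assumptions for illustration; the actual handling inside the training scripts may differ.

```python
import ast


def apply_overrides(config, overrides):
    """Apply dotted key=value overrides to a nested dict (hypothetical helper)."""
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Parse numbers, booleans, lists, etc. when the text is a Python literal.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (for example a model name) is kept as a plain string.
            value = raw
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Walk down the nesting, creating intermediate sections if missing.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


# Placeholder starting values; the real YAML contents are not shown here.
config = {"model": {"name": "placeholder"}, "magrpo": {"learning_rate": 1e-5}}
apply_overrides(config, ["model.name=Qwen/Qwen3-7B", "magrpo.learning_rate=3e-6"])
```

After the call, config["model"]["name"] holds the string from the command line and config["magrpo"]["learning_rate"] holds the parsed float.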

Settings

Single Turn

Writing runs are strictly single-turn. Both training entrypoints enforce num_turns=1; configs that specify other values will raise an error.
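A guard of this kind could look like the sketch below. The function name check_single_turn, the section name passed to it, and the error message are assumptions; only the num_turns=1 requirement comes from the description above.

```python
def check_single_turn(config, section="magrpo"):
    """Reject configs whose num_turns is not 1 (hypothetical validation helper)."""
    # A missing value is treated as 1 here; that default is an assumption.
    num_turns = config.get(section, {}).get("num_turns", 1)
    if num_turns != 1:
        raise ValueError(
            f"Writing tasks are single-turn; got {section}.num_turns={num_turns}"
        )
    return config


# A single-turn config passes through unchanged.
ok = check_single_turn({"magrpo": {"num_turns": 1}})
```

Failing fast at config load time means a misconfigured run stops before any model weights are loaded.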

Formatters

  • ArXiv: Agent 1 writes background/motivation; Agent 2 writes methodology/implications.
  • TLDR: Agent 1 produces a concise summary; Agent 2 expands with additional details and vocabulary diversity.
  • GRPO mode: A single agent emits both paragraphs separated by [PARAGRAPH_SPLIT], which the reward splits internally.
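In GRPO mode, recovering the two paragraphs from a single completion might be done roughly as follows. The function name split_completion and the handling of a missing marker are assumptions; only the [PARAGRAPH_SPLIT] token is taken from the description above.

```python
SPLIT_TOKEN = "[PARAGRAPH_SPLIT]"


def split_completion(text):
    """Split one completion into (first, second) paragraphs at the first marker."""
    first, _, second = text.partition(SPLIT_TOKEN)
    # If the marker is absent, the second paragraph comes back empty
    # (an assumption; the repository's reward may treat this case differently).
    return first.strip(), second.strip()


first, second = split_completion(
    "Background and motivation. [PARAGRAPH_SPLIT] Methodology and implications."
)
```

Splitting this way lets a single-agent output be scored with the same two-paragraph metrics used for the two-agent setting.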

Reward Structure

Rewards reuse the level-based metrics from the paper:

  1. Structural token limits.
  2. Relative length coordination.
  3. Vocabulary diversity (unique word ratios).
  4. Style mix (transition-word coverage + Jaccard overlap).

The same functions are also used by the evaluation loggers for the baselines.
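The text-level quantities named in the list can be illustrated with the sketch below. It is not the repository's reward code: the tokenization, the transition-word set, and all function names are assumptions, and the thresholds and level gating from the paper are omitted.

```python
import re

# Hypothetical transition-word set; the list used in the paper is not shown here.
TRANSITIONS = frozenset({"however", "therefore", "moreover", "furthermore"})


def words(text):
    """Lowercase word tokens (a simple regex tokenizer, chosen for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


def length_ratio(first, second):
    """Word count of the second paragraph relative to the first."""
    n_first = len(words(first))
    return len(words(second)) / n_first if n_first else 0.0


def unique_word_ratio(text):
    """Vocabulary diversity: distinct words divided by total words."""
    w = words(text)
    return len(set(w)) / len(w) if w else 0.0


def jaccard_overlap(first, second):
    """Jaccard similarity between the two paragraphs' vocabularies."""
    a, b = set(words(first)), set(words(second))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transition_coverage(text, transitions=TRANSITIONS):
    """Fraction of the transition-word set that appears in the text."""
    return len(set(words(text)) & transitions) / len(transitions)
```

A reward would then combine these values against target ranges; that combination is left out because its exact form is not given here.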

Logging

Evaluation wrappers adapt the original logging utilities to the unified MAGRPOTrainer API, yielding aggregated metrics such as token ratios, transition coverage, and gated vs. ungated rewards. Weights & Biases configs mirror the code-generation project; set wandb.project, wandb.entity, and wandb.name in YAML or via overrides.

