🔥 [ICLR'26] Official repository for the paper "Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs"


HKUSTDial/LiteCoST


Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

A two-stage RL-enhanced framework that equips SLMs for high-accuracy long-document QA.


🎉 News

  • [2026-01-26] LiteCoST has been accepted to ICLR'26.

📋 Overview

Overview Figure

Pillar 1: Chain-of-Structured-Thought (CoST) uses a high-capability LLM purely as a trace generator: it proposes a minimal structure, executes a step-wise, structure-guided trace over the documents, serializes the result, and verifies/refines it (optionally with an LLM-as-judge).

Overview Figure

Pillar 2: SLM fine-tuning (SFT → GRPO) trains an SLM on the CoST supervision in two phases: Supervised Fine-Tuning to learn structural patterns, formatting rules, and reasoning steps, followed by Group Relative Policy Optimization with dual signals that reward both answer/format quality and step/process consistency. This transfers structure-first behavior to an efficient SLM for low-latency deployment.
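As a rough illustration of what a dual-signal reward could look like, the sketch below combines an answer/format check with a step-consistency score under an assumed 0.7/0.3 weighting. The tag format, scoring rules, and weights here are assumptions for illustration, not the actual logic in src/reward.py.

```python
import re

# Hypothetical dual-signal reward sketch: answer/format correctness plus
# step/process consistency, combined with an assumed 0.7/0.3 weighting.
def cost_reward(completion, gold_answer, gold_steps,
                answer_weight=0.7, process_weight=0.3):
    # Answer/format signal: the response must wrap its final answer in
    # <answer> tags and match the gold answer exactly.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    format_ok = m is not None
    answer_ok = format_ok and m.group(1).strip() == gold_answer.strip()
    answer_signal = 1.0 if answer_ok else (0.1 if format_ok else 0.0)

    # Process signal: fraction of gold reasoning steps echoed in the trace.
    hits = sum(1 for step in gold_steps if step in completion)
    process_signal = hits / len(gold_steps) if gold_steps else 0.0

    return answer_weight * answer_signal + process_weight * process_signal
```

A completion that gets the answer right but skips the structured trace still loses part of its reward, which is what pushes the SLM toward structure-first behavior rather than answer-only shortcuts.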

πŸ—οΈ Method & Architecture

CoST: Structure-First Reasoning and Trace Generation

  1. πŸ” Structure Analysis
  2. 🧠 Trace Geneartion
  3. βœ… Data Verification
  4. πŸ” Data Refinement

SLM Fine-Tuning: SFT → GRPO

  1. 🎯 Supervised Fine-Tuning (SFT)
  2. ⚡ Group Relative Policy Optimization (GRPO)
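GRPO's defining trait is that advantages are computed relative to a group of completions sampled for the same prompt, rather than from a learned value model. A minimal sketch of that normalization:

```python
import statistics

# Minimal sketch of GRPO's group-relative advantage: each sampled
# completion's reward is normalized against the mean and std of its
# own group, so no critic network is needed.
def group_relative_advantages(rewards, eps=1e-8):
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

A completion that beats its group mean gets a positive advantage and one below it gets a negative advantage, regardless of the absolute reward scale.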

The core execution of LiteCoST is implemented in the src directory (the GRPO training code lives under verl/):

src
├── convert_func.py              # Conversion function module
├── data_refinement.py           # Data refinement module
├── data_verification.py         # Data verification module
├── extract/                     # Extraction module
│   ├── graph.py                 # Graph class
│   ├── main.py                  # Main program
│   ├── table.py                 # Table class
│   ├── to_desc.py               # Convert to description
│   ├── to_graph.py              # Convert to graph
│   └── to_table.py              # Convert to table
├── sft.py                       # SFT module
├── prompt.py                    # Prompt template module
├── reasoner.py                  # Reasoning module
├── reward.py                    # Reward module
├── structure_analysis/          # Structure analysis module
│   ├── query2schema.py          # Schema construction
│   └── structure_decision.py    # Structure decision
├── cal_latenct.py               # Calculate latency
└── utils.py                     # Utility functions module

πŸ› οΈ Usage

  1. Generate the Serialized Structured Output

python main.py --model gpt-4o --dataset Loong --structured --document

cd src
python data_verification.py
python data_refinement.py

  2. Conduct SFT Training

python -m src.convert_func  # data format conversion
python -m src.sft

  3. Conduct GRPO Optimization

cd verl
bash scripts/run_grpo_cost.sh

## merge model 
python scripts/model_merger.py merge --backend fsdp --local_dir checkpoints/cost-sft/cost-sft-llama3.2-3b-ins/global_step_1566/actor --target_dir merged/cost-grpo/llama3.2-3b-ins

Usage Examples

1. Quick Deployment
cd Loong/src
bash vllm_example.sh

2. Run the pipeline
python main.py --model deployed_model --dataset Loong --structured --document

🎯 Performance

Efficacy of Chain-of-Structured-Thought (CoST).

(Result figures: Finance)

Effectiveness: How good is LiteCoST for Serialized Structured Output (SSO) generation?

(Result figures: Finance and Legal)

Acknowledgement

We implement our reinforcement learning algorithm by extending the veRL framework. For efficient inference, we leverage vLLM, and we develop evaluation scripts based on the Loong datasets. We sincerely thank these communities for their valuable contributions!
