A two-stage RL-enhanced framework that equips small language models (SLMs) for high-accuracy long-document QA.
- [2026-01-26] Our LiteCoST is accepted to ICLR'26.
Pillar 1: Chain-of-Structured-Thought (CoST) uses a high-capability LLM purely as a trace generator: it proposes a minimal structure, executes a step-wise, structure-guided trace over the documents, serializes the result, and verifies/refines it (optionally with an LLM-as-judge).
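The four stages of Pillar 1 can be sketched as a simple pipeline. This is a minimal illustration only: `call_llm` stands in for any chat-completion client, and all function names and prompts below are hypothetical placeholders, not the repository's actual API.

```python
# Minimal sketch of the CoST trace-generation pipeline (Pillar 1).
# `call_llm` stands in for any chat-completion client; every name and
# prompt here is illustrative, not the repository's actual API.

def generate_cost_trace(call_llm, question: str, documents: list[str]) -> dict:
    # 1. Structure analysis: let the LLM propose a minimal structure
    #    (table / graph / description) suited to the question.
    structure = call_llm(f"Choose a minimal structure (table/graph/description) "
                         f"for answering: {question}")

    # 2. Trace generation: execute a step-wise, structure-guided trace
    #    over each document, filling in the chosen structure.
    steps = [call_llm(f"Using a {structure}, extract evidence for "
                      f"'{question}' from:\n{doc}") for doc in documents]

    # 3. Serialization: merge the per-document results into one
    #    serialized structured output (SSO).
    sso = call_llm("Merge these partial structures into one serialized "
                   "structured output:\n" + "\n".join(steps))

    # 4. Verification / refinement: an LLM-as-judge checks the trace
    #    and refines it if inconsistencies are found.
    verdict = call_llm(f"Judge whether this trace supports the answer:\n{sso}")
    if "inconsistent" in verdict.lower():
        sso = call_llm(f"Refine the trace to fix the issues:\n{sso}\n{verdict}")

    return {"structure": structure, "trace": sso}
```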
Pillar 2: SLM fine-tuning (SFT → GRPO) trains an SLM on the CoST supervision in two phases: Supervised Fine-Tuning to learn structural patterns, formatting rules, and reasoning steps, followed by Group Relative Policy Optimization with dual signals that reward both answer/format quality and step/process consistency, transferring structure-first behavior to an efficient SLM for low-latency deployment.
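The dual-signal reward of the GRPO phase can be sketched as below. The tag format, weights, and scoring thresholds are assumptions for illustration; the actual logic lives in `src/reward.py`.

```python
# Hypothetical sketch of the dual-signal GRPO reward: one signal for
# answer/format quality, one for step/process consistency.  The tag
# format, weights, and thresholds are assumptions, not src/reward.py.
import re

def dual_reward(completion: str, gold_answer: str,
                w_outcome: float = 0.7, w_process: float = 0.3) -> float:
    # Outcome signal: a correctly formatted and correct final answer.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    format_ok = m is not None
    answer_ok = format_ok and m.group(1).strip() == gold_answer.strip()
    outcome = 1.0 if answer_ok else (0.2 if format_ok else 0.0)

    # Process signal: non-empty reasoning steps are present
    # (here modeled as tagged <step> blocks, capped at 3 steps).
    steps = re.findall(r"<step>(.*?)</step>", completion, re.DOTALL)
    process = min(len([s for s in steps if s.strip()]) / 3.0, 1.0)

    return w_outcome * outcome + w_process * process
```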
- Structure Analysis
- Trace Generation
- Data Verification
- Data Refinement
- Supervised Fine-Tuning (SFT)
- Group Relative Policy Optimization (GRPO)
The core execution of LiteCoST is implemented in the `src` directory (see GRPO in `verl/`):
```text
src
├── convert_func.py           # Conversion function module
├── data_refinement.py        # Data refinement module
├── data_verification.py      # Data verification module
├── extract/                  # Extraction module
│   ├── graph.py              # Graph class
│   ├── main.py               # Main program
│   ├── table.py              # Table class
│   ├── to_desc.py            # Convert to description
│   ├── to_graph.py           # Convert to graph
│   └── to_table.py           # Convert to table
├── sft.py                    # SFT module
├── prompt.py                 # Prompt template module
├── reasoner.py               # Reasoning module
├── reward.py                 # Reward module
├── structure_analysis/       # Structure analysis module
│   ├── query2schema.py       # Schema construction
│   └── structure_decision.py # Structure decision
├── cal_latenct.py            # Calculate latency
└── utils.py                  # Utility functions module
```
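For intuition, serializing an extracted table structure into a textual SSO could look like the sketch below, in the spirit of `extract/to_table.py`. The function name and Markdown rendering are illustrative assumptions; the real module's API may differ.

```python
# Illustrative sketch of serializing an extracted table into a textual
# SSO, in the spirit of extract/to_table.py.  The name and the Markdown
# rendering are assumptions, not the real module's API.

def serialize_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render a table as a Markdown string usable inside a reasoning trace."""
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```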
- Generate the Serialized Structured Output

```bash
python main.py --model gpt-4o --dataset Loong --structured --document
```

- Conduct Data Verification and Refinement

```bash
cd src
python data_verification.py
python data_refinement.py
```

- Conduct SFT Training

```bash
python -m src.convert_func   # data format conversion
python -m src.sft
```

- Conduct GRPO Optimization

```bash
cd verl
bash scripts/run_grpo_cost.sh

# merge model
python scripts/model_merger.py merge --backend fsdp --local_dir checkpoints/cost-sft/cost-sft-llama3.2-3b-ins/global_step_1566/actor --target_dir merged/cost-grpo/llama3.2-3b-ins
```

1. Quick Deployment

```bash
cd Loong/src
bash vllm_example.sh
```

2. Run the pipeline

```bash
python main.py --model deployed_model --dataset Loong --structured --document
```

Efficacy of Chain-of-Structured-Thought (CoST).
Effectiveness: How good is LiteCoST for SSO Generation?
We implement our reinforcement learning algorithm by extending the veRL framework. For efficient inference, we leverage vLLM, and we develop evaluation scripts based on the Loong datasets. We sincerely thank these communities for their valuable contributions!




