
Conversation

@shivamshan

What does this PR do?

This PR introduces TACO-RL (Task-Aware Prompt Compression Optimization with Reinforcement Learning), a new submodule that extends LLMLingua with reinforcement learning so that pre-trained compression models can be fine-tuned for new tasks using reward signals from language models such as GPT-3.5.

Key Features Added

New TACO-RL Submodule

  • Location: llmlingua/taco-rl/ - Main submodule containing the PromptCompressorReinforce class
  • Experiments: experiments/taco-rl/ - Training scripts, utilities, and configuration files

Research Foundation

Based on the paper "TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning" (arXiv:2409.13035), this implementation addresses:

  • Q1: How to design a prompt compression model that effectively leverages bidirectional context while providing low inference latency?
  • Q2: How to efficiently train a model with proper guidance from task-specific reward signals while minimizing computational cost?
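Q2 concerns training the compressor with task-specific reward signals. As a rough illustration only, the sketch below shows a generic REINFORCE-style update for per-token keep/drop decisions; the actual training loop lives in experiments/taco-rl/train_reinforce.py, and the names here (reinforce_step, keep_logits, baseline) are placeholders rather than APIs from this PR. It assumes a PyTorch token-classification backbone like the one used by LLMLingua-2.

import torch

def reinforce_step(keep_logits, reward, baseline, optimizer):
    # keep_logits: per-token scores from the compression encoder (requires grad)
    # reward:      scalar task score from the downstream LLM (e.g., GPT-3.5)
    # baseline:    running-average reward, used to reduce gradient variance
    dist = torch.distributions.Bernoulli(logits=keep_logits)
    mask = dist.sample()                    # 1 = keep token, 0 = drop token
    log_prob = dist.log_prob(mask).sum()    # log-probability of the sampled mask
    loss = -(reward - baseline) * log_prob  # REINFORCE: push up masks that earned high reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mask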

Directory Structure

llmlingua/taco-rl/
├── README.md                     # Main documentation
├── prompt_compressor_reinforce.py  # RL-enhanced compressor class
└── __init__.py                   # Module initialization

experiments/taco-rl/
├── README.md                     # Training and implementation guide
├── train_reinforce.py            # Main training script
├── utils.py                      # Utilities and API configuration
├── metrics.py                    # Evaluation metrics
├── configs/                      # Configuration files
│   └── train_reinforce.yaml      # Training configuration
└── logs/                         # Training logs (created during training)
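Training settings are read from configs/train_reinforce.yaml. One way to inspect the resolved configuration uses hydra-core (one of the added dependencies), sketched below; the compose/initialize calls are standard Hydra, but the config's actual fields are defined in this PR and are not reproduced here.

from hydra import compose, initialize
from omegaconf import OmegaConf

# Run from experiments/taco-rl/ so that config_path="configs" resolves.
with initialize(config_path="configs", version_base=None):
    cfg = compose(config_name="train_reinforce")
print(OmegaConf.to_yaml(cfg))  # dump the resolved training settings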

Usage Example

from llmlingua.taco_rl import PromptCompressorReinforce

# Load fine-tuned model
compressor = PromptCompressorReinforce(
    model_name="path/to/fine_tuned_model",
    use_llmlingua2=True
)

# Use for compression during training
compressed_prompt = compressor.compress_prompt_llmlingua2(
    ["Your prompt here..."],
    rate=0.5
)

Dependencies Added

Core Dependencies

  • llmlingua (main package)

Additional Dependencies

pip install openai evaluate csv_logger hydra-core rouge_score

Documentation

  • Main README: Overview, architecture, and integration guide
  • Experiments README: Detailed training instructions, configuration examples, and troubleshooting
  • API Configuration: User guide for setting up Azure OpenAI endpoints (a minimal client sketch follows after this list)
  • Evaluation: Links to existing evaluation framework in LLMLingua2 experiments
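For reference, a minimal Azure OpenAI client setup with the openai package (>= 1.0) looks like the sketch below; the environment-variable names, API version, and deployment name are placeholders, and the actual configuration steps are described in utils.py and the experiments README.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder variable name
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder variable name
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # your Azure deployment name
    messages=[{"role": "user", "content": "Summarize: ..."}],
)
print(response.choices[0].message.content)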

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a Github issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@iofu728

@shivamshan
Author

@shivamshan please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"

@iofu728 iofu728 requested review from Copilot and iofu728 and removed request for Copilot July 3, 2025 14:09
@iofu728 iofu728 self-assigned this Jul 3, 2025
