
@wangzhixin-ai
Collaborator

Summary

Major codebase refactoring to improve modularity, maintainability, and user experience. This PR restructures the core
architecture into clearer responsibility boundaries, introduces Python-based pipeline APIs, and significantly enhances
documentation with comprehensive user interface guides.

Key improvements:

  • Simplified architecture: Consolidated DAGWorker from 5 mixins (~3500 lines) into a core module (dagworker.py) plus a few focused helper modules.
  • User-facing APIs: New Python-based pipeline builder with clear interfaces for rewards, filters, and metrics.
  • Better organization: Renamed siirl/workers/ → siirl/engine/ and separated execution logic into siirl/execution/.
  • Enhanced documentation: Added 4 new user interface guides (1323+ lines) and comprehensive code structure documentation.
  • Removed legacy code: Deleted YAML-based workflow configs and removed the deprecated DataProto in favor of TensorDict/Sample.
  • Data coordinator: Refactored distributed data management with clearer separation between DataBuffer and DataLoader.

Major Changes

1. Architecture Restructuring

DAGWorker consolidation (siirl/dag_worker/):

  • Merged 5 mixin classes into focused modules:
    • dagworker.py: Core execution logic (1319 lines)
    • validator.py: Validation logic (extracted from validation_mixin)
    • metrics_collector.py: Metrics collection (extracted from utilities_mixin)
    • checkpoint_manager.py: Checkpoint management
    • metric_aggregator.py: Metrics aggregation
  • Decoupled instance state dependencies for better testability
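To illustrate what replacing mixins with composition buys for testability, here is a minimal sketch. The class names mirror the modules listed above, but their methods (`validate`, `record`, `maybe_save`) and the `DAGWorker` constructor shown here are hypothetical and do not reflect siiRL's actual signatures.

```python
from typing import Dict, List


class Validator:
    """Stand-in for validator.py (hypothetical API)."""

    def validate(self, step: int) -> Dict[str, float]:
        # A real validator would evaluate the model on a held-out set here.
        return {"val/score": 1.0}


class MetricsCollector:
    """Stand-in for metrics_collector.py (hypothetical API)."""

    def __init__(self) -> None:
        self.history: List[Dict[str, float]] = []

    def record(self, metrics: Dict[str, float]) -> None:
        self.history.append(dict(metrics))


class CheckpointManager:
    """Stand-in for checkpoint_manager.py (hypothetical API)."""

    def __init__(self, save_every: int) -> None:
        self.save_every = save_every
        self.saved_steps: List[int] = []

    def maybe_save(self, step: int) -> None:
        if step % self.save_every == 0:
            self.saved_steps.append(step)  # a real manager would write to disk


class DAGWorker:
    """Core loop that delegates to injected collaborators instead of inheriting mixins."""

    def __init__(self, validator: Validator, collector: MetricsCollector,
                 checkpointer: CheckpointManager, val_every: int = 2) -> None:
        self.validator = validator
        self.collector = collector
        self.checkpointer = checkpointer
        self.val_every = val_every

    def run(self, num_steps: int) -> None:
        for step in range(1, num_steps + 1):
            metrics = {"train/step": float(step)}
            if step % self.val_every == 0:
                metrics.update(self.validator.validate(step))
            self.collector.record(metrics)
            self.checkpointer.maybe_save(step)
```

Because the collaborators are passed in rather than inherited, a unit test can replace any of them with a fake without building the rest of the worker's state.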

Directory reorganization:

  • Renamed siirl/workers/ → siirl/engine/ (model workers: actor, critic, rollout, reward)
  • Created siirl/execution/ for orchestration:
    • execution/dag/: TaskGraph, Pipeline API, builtin pipelines
    • execution/scheduler/: Task scheduling, resource management
    • execution/metric_worker/: Distributed metrics collection
  • siirl/params/ moved from siirl/utils/params/ to top-level for clarity

2. Python-Based Pipeline API

New pipeline builder (siirl/execution/dag/pipeline.py):

```python
# Replace YAML-based workflow with Python code
from siirl.execution.dag.pipeline import Pipeline

pipeline = Pipeline()
pipeline.add_node("rollout_actor", ...)
pipeline.add_node("function_reward", ..., deps=["rollout_actor"])
# ...
```
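The `deps` argument is what turns a list of nodes into a DAG. The sketch below is a hypothetical stand-in (`ToyPipeline`, not siiRL's `Pipeline`) showing one way such a builder can derive a valid execution order, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Node names are reused from the snippet above; the node functions are placeholders.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Optional

NodeFn = Callable[[dict], dict]


class ToyPipeline:
    """Hypothetical minimal DAG builder; not siiRL's Pipeline class."""

    def __init__(self) -> None:
        self._funcs: Dict[str, NodeFn] = {}
        self._deps: Dict[str, List[str]] = {}

    def add_node(self, name: str, func: NodeFn,
                 deps: Optional[List[str]] = None) -> "ToyPipeline":
        if name in self._funcs:
            raise ValueError(f"duplicate node: {name}")
        self._funcs[name] = func
        self._deps[name] = list(deps or [])
        return self

    def order(self) -> List[str]:
        # Every dependency must name a registered node.
        for name, deps in self._deps.items():
            missing = [d for d in deps if d not in self._funcs]
            if missing:
                raise KeyError(f"{name} depends on unknown nodes {missing}")
        # static_order() yields each node after all of its dependencies and
        # raises graphlib.CycleError if the graph contains a cycle.
        return list(TopologicalSorter(self._deps).static_order())

    def run(self, batch: dict) -> dict:
        # Nodes run sequentially here; a real scheduler could run
        # independent nodes concurrently.
        for name in self.order():
            batch = self._funcs[name](batch)
        return batch


if __name__ == "__main__":
    pipe = ToyPipeline()
    # Registration order does not matter; deps decide the execution order.
    pipe.add_node("function_reward", lambda b: {**b, "reward": len(b["response"])},
                  deps=["rollout_actor"])
    pipe.add_node("rollout_actor", lambda b: {**b, "response": b["prompt"] + "!"})
    print(pipe.order())                          # ['rollout_actor', 'function_reward']
    print(pipe.run({"prompt": "hi"})["reward"])  # 3
```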

Built-in pipelines (siirl/execution/dag/builtin_pipelines.py):
- GRPO, PPO, DAPO pipelines implemented in Python
- Users can define custom pipelines via dag.custom_pipeline_fn config
- Example: examples/custom_pipeline_example/custom_grpo.py
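How the `dag.custom_pipeline_fn` value is parsed is not spelled out here. One common pattern is a `"package.module:function"` string resolved at startup with `importlib`; the sketch below assumes that format purely for illustration (siiRL may accept a different one, for example a file path), and `resolve_callable` is a hypothetical helper.

```python
import importlib
from typing import Callable


def resolve_callable(spec: str) -> Callable:
    """Resolve an assumed 'package.module:function' string to a callable."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Allow nested attributes such as 'module:SomeClass.build'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return obj


if __name__ == "__main__":
    # A stdlib function stands in for a user's pipeline factory; a real value
    # might look like "my_project.pipelines:build_custom_grpo" (hypothetical).
    fn = resolve_callable("math:sqrt")
    print(fn(16.0))  # 4.0
```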

Removed deprecated configs:
- Deleted 7 YAML workflow files (siirl/client/config/workflow_*.yaml)
- Removed ppo_dag_trainer.yaml and embodied_srpo_trainer.yaml

3. Data Coordinator Refactor

Simplified data management (siirl/data_coordinator/):
- New lightweight data_buffer.py (350 lines, down from 484)
- Simplified protocol.py (159 lines, down from 1084)
- Introduced sample.py with TensorDict-based Sample class (322 lines)
- Removed legacy DataProto and batch manager abstractions
- DataLoader moved from root to data_coordinator/dataloader/
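The move from batch-level DataProto objects to per-sample records can be pictured with the sketch below. It uses a plain dataclass (`ToySample`, hypothetical) instead of TensorDict so it runs without extra dependencies; the field names and the right-padding `collate` helper are assumptions, not siiRL's Sample API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToySample:
    """Hypothetical per-sample record; siiRL's Sample is TensorDict-based."""
    uid: str
    prompt_ids: List[int]
    response_ids: List[int] = field(default_factory=list)
    extra: Dict[str, Any] = field(default_factory=dict)  # e.g. reward, metadata

    @property
    def seq_len(self) -> int:
        return len(self.prompt_ids) + len(self.response_ids)


def collate(samples: List[ToySample], pad_id: int = 0) -> Dict[str, Any]:
    """Right-pad prompt+response ids into a rectangular batch plus attention mask."""
    if not samples:
        raise ValueError("cannot collate an empty list")
    max_len = max(s.seq_len for s in samples)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for s in samples:
        ids = s.prompt_ids + s.response_ids
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"uid": [s.uid for s in samples],
            "input_ids": input_ids,
            "attention_mask": attention_mask}
```

Keeping samples as independent records until the last moment lets a buffer route, filter, or rebalance them individually and pad only when a worker actually needs a dense batch.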

Key improvements:
- Better separation between DataBuffer (distributed KV store) and DataLoader (per-GPU loading)
- Reuses local cache for better performance
- Fixed sequence balancing and metrics collection in pipeline parallel scenarios
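Sequence balancing typically means spreading samples across data-parallel ranks so that each rank processes a similar number of tokens. A minimal sketch using the greedy longest-processing-time heuristic is below; it is an assumption that siiRL uses this particular heuristic, and `balance_by_length` is a hypothetical helper.

```python
import heapq
from typing import List


def balance_by_length(seq_lens: List[int], num_ranks: int) -> List[List[int]]:
    """Greedy LPT: give each sample (longest first) to the least-loaded rank.

    Returns, for each rank, the list of sample indices assigned to it.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    heap = [(0, rank) for rank in range(num_ranks)]  # (token_load, rank)
    heapq.heapify(heap)
    parts: List[List[int]] = [[] for _ in range(num_ranks)]
    for idx in sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True):
        load, rank = heapq.heappop(heap)
        parts[rank].append(idx)
        heapq.heappush(heap, (load + seq_lens[idx], rank))
    return parts
```

For `seq_lens=[8, 7, 6, 5, 4]` on two ranks this yields `[[0, 3, 4], [1, 2]]` (17 vs. 13 tokens). This version does not force equal sample counts per rank; implementations that need identical micro-batch counts usually add that constraint on top.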

4. User Interface Abstractions

New interface directories (siirl/user_interface/):
- filter_interface/: DAPO filter, embodied filter (223 lines doc)
- rewards_interface/: Custom reward function examples (231 lines doc)
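A custom reward function usually comes down to parsing the model's final answer and comparing it with the reference. The sketch below follows the GSM8K convention of a final answer written after `####`; the function name `gsm8k_reward`, its parameters, and the scoring scheme are hypothetical and are not taken from `custom_gsm8k_reward.py`.

```python
import re
from typing import Optional

# A number after '####', allowing a sign, thousands separators, and decimals.
_ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last number after a '####' marker, with commas removed."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_reward(solution_str: str, ground_truth: str,
                 format_score: float = 0.0, score: float = 1.0) -> float:
    """Hypothetical reward: `score` for a correct answer, `format_score` for a
    well-formed but wrong answer, and 0.0 when no answer can be extracted."""
    answer = extract_final_answer(solution_str)
    if answer is None:
        return 0.0
    return score if answer == ground_truth.replace(",", "").strip() else format_score
```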

Updated entry point:
- siirl/main_dag.py refactored for clearer orchestration (moved from siirl/client/)

5. Documentation Overhaul

New user guides (docs/user_interface/):
- filter_interface.rst (223 lines): How to define custom filters
- metrics_interface.rst (692 lines): Metrics collection and customization
- pipeline_interface.rst (177 lines): Building custom pipelines
- reward_interface.rst (231 lines): Custom reward functions

New programming guide:
- docs/programming_guide/code_structure.rst (428 lines): Complete architecture overview
- Replaces outdated code_explained/siiRL-code-explained.md (removed 196 lines)
- Moved guide.rst into programming_guide/ directory

Config documentation update:
- Simplified docs/examples/config.rst (294 additions, 600 deletions)
- Better organized with Hydra-based configuration examples

6. Bug Fixes and Improvements

Metrics collection:
- Fixed actor metrics loss in distributed scenarios
- Fixed throughput calculation in pipeline/tensor parallel modes
- Added Ray-based distributed metrics aggregation
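Aggregating metrics across ranks has two common pitfalls. Averaging per-rank means is wrong when ranks saw different sample counts, and under tensor or pipeline parallelism several ranks process the same batch, so summing every rank's token count over-counts throughput. The sketch below shows both corrections with hypothetical helpers (`Partial`, `merge`, `tokens_per_second`); it does not use Ray and is not siiRL's aggregator.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Partial:
    """Per-rank partial statistics for one metric (hypothetical format)."""
    total: float
    count: int
    min_val: float
    max_val: float


def merge(partials: Iterable[Partial]) -> Dict[str, float]:
    """Combine per-rank partials into a global mean/min/max.

    total/count over all ranks is the sample-weighted mean; averaging
    per-rank means would over-weight ranks that saw fewer samples.
    """
    parts = list(partials)
    count = sum(p.count for p in parts)
    if count == 0:
        raise ValueError("no samples reported")
    return {
        "mean": sum(p.total for p in parts) / count,
        "min": min(p.min_val for p in parts),
        "max": max(p.max_val for p in parts),
    }


def tokens_per_second(tokens_per_rank: Dict[int, int],
                      dp_owner_ranks: Iterable[int], elapsed_s: float) -> float:
    """Throughput that counts each data-parallel replica once.

    With tensor/pipeline parallelism several ranks process the same batch, so
    only one designated rank per data-parallel group is summed (how that rank
    is chosen is an assumption left to the caller).
    """
    return sum(tokens_per_rank[r] for r in dp_owner_ranks) / elapsed_s
```

With one rank reporting 10 samples averaging 1.0 and another reporting a single sample of 3.0, `merge` gives 13/11 ≈ 1.18, whereas a mean of per-rank means would report 2.0.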

Data handling:
- Fixed sequence balance in data_buffer
- Fixed DataBuffer operations in pipeline parallel mode
- Fixed async_generate mode bugs
- Improved cache cleanup logic

Multi-agent/Embodied:
- Fixed embodied import bugs
- Fixed DAPO and VLA filter logic
- Fixed multi-agent support after DAG refactor
- Moved the multi-turn agent loop from a separate package into siirl/execution/rollout_flow/multiturn/
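For context on the DAPO filter: DAPO's dynamic sampling drops prompts whose sampled responses all receive the same reward, because the group-normalized advantage is then zero for every response in the group and contributes no gradient. A minimal sketch of that mask is below; the `dapo_keep_mask` helper and its input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def dapo_keep_mask(group_ids: Sequence[str], rewards: Sequence[float],
                   eps: float = 1e-8) -> List[bool]:
    """Return True for samples whose prompt group has non-constant rewards.

    group_ids[i] identifies the prompt that response i was sampled from.
    Groups where every reward is (numerically) equal are dropped, since their
    group-normalized advantages are all zero.
    """
    if len(group_ids) != len(rewards):
        raise ValueError("group_ids and rewards must have the same length")
    by_group: Dict[str, List[float]] = defaultdict(list)
    for gid, reward in zip(group_ids, rewards):
        by_group[gid].append(reward)
    informative = {gid for gid, rs in by_group.items() if max(rs) - min(rs) > eps}
    return [gid in informative for gid in group_ids]
```

In DAPO the filtered batch is then topped up by sampling more prompts until the target batch size is reached; that refill loop is omitted here.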

NPU/Hardware support:
- Fixed Mindspeed repatch issues
- Updated examples for NPU training (Qwen2.5/Qwen3 models)

7. Examples and Scripts

Updated run scripts (all examples/*/run_*.sh):
- Updated entry point from siirl.client.main_dag → siirl.main_dag
- Removed YAML workflow overrides (now using Python pipelines)
- Deleted obsolete single-GPU and outdated examples

New examples:
- examples/custom_pipeline_example/custom_grpo.py: Custom pipeline definition
- siirl/user_interface/rewards_interface/custom_gsm8k_reward.py: Custom reward example

8. Testing

New tests:
- tests/dag_worker/test_dapo_pipeline.py (195 lines): DAPO pipeline tests
- tests/dag_worker/test_dapo_merge.py (320 lines): DAPO merge logic tests
- tests/data_buffer/performance_test_data_buffer.py (209 lines): Performance benchmarks
- tests/data_buffer/detailed_put_performance_test.py (174 lines): Put operation profiling

Updated tests:
- Updated test_dag_worker.py and test_data_buffer.py for new architecture

Code Statistics

- 249 files changed: 10,305 insertions, 13,164 deletions (net -2,859 lines)
- Removed legacy code: ~6,000+ lines of deprecated mixins, batch managers, and YAML configs
- Added documentation: ~1,750+ lines of new user guides and architecture docs
- New functionality: ~3,500+ lines of refactored core logic and new APIs

Breaking Changes

⚠️ Entry point change:
```bash
# Old (deprecated)
python -m siirl.client.main_dag

# New
python -m siirl.main_dag
```

⚠️ YAML workflows removed: Users must migrate to Python-based pipelines or use built-in pipelines

⚠️ Import path changes:
- siirl.workers.* → siirl.engine.* (models)
- siirl.workers.dag.* → siirl.execution.dag.* (orchestration)
- siirl.utils.params.* → siirl.params.*
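For larger codebases the import changes above can be applied mechanically. The script below is a hypothetical, text-based helper that rewrites only the prefixes listed in this PR, plus the entry-point module used in run scripts. It does not parse Python and assumes every other siirl.workers.* module moved to siirl.engine.*, so review the resulting diff before committing.

```python
import re
from typing import List, Tuple

# Old prefix -> new prefix, from the list above. The more specific
# siirl.workers.dag rule must run before the generic siirl.workers rule.
IMPORT_MAP: List[Tuple[str, str]] = [
    ("siirl.workers.dag", "siirl.execution.dag"),
    ("siirl.workers", "siirl.engine"),
    ("siirl.utils.params", "siirl.params"),
    ("siirl.client.main_dag", "siirl.main_dag"),  # run scripts: python -m ...
]

# Match a prefix only as a whole dotted name: not preceded by a word char or
# dot, and not followed by a word char (so siirl.workers_old is left alone).
_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(rf"(?<![\w.]){re.escape(old)}(?!\w)"), new) for old, new in IMPORT_MAP
]


def migrate_imports(text: str) -> str:
    for pattern, new in _RULES:
        text = pattern.sub(new, text)
    return text


if __name__ == "__main__":
    import pathlib
    import sys

    # Usage: python migrate_imports.py path/to/file.py path/to/run.sh ...
    for path in map(pathlib.Path, sys.argv[1:]):
        old = path.read_text()
        new = migrate_imports(old)
        if new != old:
            path.write_text(new)
            print(f"updated {path}")
```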

Migration Guide

For users with custom workflows:
1. Replace YAML workflow files with Python pipeline definitions (see examples/custom_pipeline_example/)
2. Update imports from siirl.workers to siirl.engine or siirl.execution
3. Update run scripts to use new entry point: python -m siirl.main_dag

For developers:
- Review new code structure: docs/programming_guide/code_structure.rst
- Check user interface docs for custom rewards/filters: docs/user_interface/

Test Plan

- All existing unit tests pass (tests/dag_worker/, tests/data_buffer/)
- New DAPO pipeline tests added
- Performance tests for DataBuffer operations
- GRPO training validated (examples/grpo_trainer/run_qwen3-8b.sh)
- Multi-node training tested (GRPO/PPO/DAPO)
- Embodied RL examples functional (examples/embodied_srpo_trainer/)
- Multi-turn agent loop tested (examples/experimental/multiturn_server/)

@SII-limingliu merged commit c3ecc6a into main on Dec 7, 2025