[Feature] Add single-controller mode #260

@nuzant

Checklist

  • This feature will maintain backward compatibility with the current APIs in
    areal/api/. If not, please raise a refactor issue first.

Background

Currently, the refactored AReaL only supports running experiments in SPMD mode, which is simple but constrains the development of advanced features. Designing and implementing an additional single-controller mode for AReaL would enable new features such as centralized data management and elastic resource allocation.

Potential Solution

In the single-controller mode, an additional controller level will be introduced between algorithm orchestration and the training/inference engines. Each controller will wrap an engine and expose the engine's APIs to a centralized training script instead of an SPMD one. For ease of use, controllers will have APIs identical to those of the engines. There will also be schedulers and distributed data utilities for managing resources and data in a centralized manner.
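To make the idea concrete, here is a minimal sketch of a controller that wraps an engine and exposes the same method to a centralized script. Everything in it is hypothetical and for illustration only: `Engine`, `ToyEngine`, `Controller`, and `train_batch` are not taken from `areal/api/`, the engine is modeled as a list of in-process per-worker replicas, and a real controller would presumably dispatch to remote workers rather than call them directly.

```python
from __future__ import annotations

from typing import Protocol


class Engine(Protocol):
    """Hypothetical engine interface (NOT the real areal/api/ definitions)."""

    def train_batch(self, batch: list[float]) -> dict[str, float]: ...


class ToyEngine:
    """Stand-in for one worker's engine; its 'loss' is just the batch mean."""

    def __init__(self, rank: int) -> None:
        self.rank = rank

    def train_batch(self, batch: list[float]) -> dict[str, float]:
        return {"loss": sum(batch) / len(batch), "n": float(len(batch))}


class Controller:
    """Wraps the per-worker engines and exposes the same method name,
    so a single centralized script can call it like a plain engine."""

    def __init__(self, engines: list[Engine]) -> None:
        if not engines:
            raise ValueError("Controller needs at least one engine")
        self.engines = engines

    def train_batch(self, batch: list[float]) -> dict[str, float]:
        if not batch:
            raise ValueError("empty batch")
        # Centralized data management (assumed): the controller holds the
        # full batch and shards it round-robin across workers.
        n_workers = len(self.engines)
        shards = [batch[i::n_workers] for i in range(n_workers)]
        # Skip workers whose shard is empty (batch smaller than worker count).
        results = [e.train_batch(s) for e, s in zip(self.engines, shards) if s]
        # Merge per-worker statistics into one result, weighted by shard size.
        total = sum(r["n"] for r in results)
        loss = sum(r["loss"] * r["n"] for r in results) / total
        return {"loss": loss, "n": total}


if __name__ == "__main__":
    # The centralized training script sees one object with an engine-like API.
    controller = Controller([ToyEngine(rank) for rank in range(2)])
    print(controller.train_batch([1.0, 2.0, 3.0, 4.0]))
```

Because the controller keeps the engine's method signature, a script written against the engine could in principle switch to the controller without changes, which is the backward-compatibility property the checklist asks for.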

Detailed discussion is pending.

Additional Information

Progress and Checklist

TBD

Metadata

Assignees

No one assigned

Labels

in-progress (Something is under our internal development), stale
