Description
Checklist
- This feature will maintain backward compatibility with the current APIs in `areal/api/`. If not, please raise a refactor issue first.
Background
Currently, the refactored AReaL only supports running experiments in SPMD mode, which is simple but constrains the development of advanced features. Designing and implementing an additional single-controller mode for AReaL would enable new features such as centralized data management and elastic resource allocation.
Potential Solution
In single-controller mode, an additional controller level will be introduced between algorithm orchestration and the training/inference engines. Each controller will wrap an engine and expose the engine's APIs to a centralized training script instead of an SPMD one. For ease of use, controllers will have the same APIs as the engines they wrap. There will also be schedulers and distributed data utilities that manage resources and data centrally.
Detailed discussion is pending.
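To make the controller idea above more concrete, here is a minimal, non-authoritative sketch. Every name in it (`Engine`, `ToyEngine`, `TrainController`, `train_batch`) is hypothetical and chosen only for illustration; none of them is taken from `areal/api/`, and the engines run in-process rather than on remote workers. It only shows the shape of the proposal: a controller that wraps engines, exposes the same method as the engine, and shards data centrally.

```python
from typing import List, Protocol


class Engine(Protocol):
    """Hypothetical minimal engine interface (NOT the real areal/api/ definition)."""

    def train_batch(self, batch: List[float]) -> float: ...


class ToyEngine:
    """Stand-in engine; in SPMD mode each process would own one of these."""

    def __init__(self, rank: int) -> None:
        self.rank = rank

    def train_batch(self, batch: List[float]) -> float:
        # Pretend the "loss" is the mean of the local shard.
        return sum(batch) / len(batch)


class TrainController:
    """Hypothetical controller: same method name as the engine it wraps."""

    def __init__(self, engines: List[Engine]) -> None:
        if not engines:
            raise ValueError("at least one engine is required")
        self.engines = engines

    def train_batch(self, batch: List[float]) -> float:
        n = len(self.engines)
        # Centralized data management: the controller shards the global batch.
        shards = [batch[i::n] for i in range(n)]
        # Skip engines whose shard is empty (batch smaller than engine count).
        losses = [e.train_batch(s) for e, s in zip(self.engines, shards) if s]
        if not losses:
            raise ValueError("empty batch")
        # Aggregate per-engine results into one value for the training script.
        return sum(losses) / len(losses)


# The centralized training script sees one object with the engine's API.
controller = TrainController([ToyEngine(rank) for rank in range(2)])
loss = controller.train_batch([1.0, 2.0, 3.0, 4.0])
print(loss)
```

In an actual implementation the engines would presumably live on remote workers and the controller would dispatch calls through a scheduler; that part is left open here because the detailed design is still pending.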
Additional Information
Progress and Checklist
TBD