Open
Labels
enhancement: New feature or request
Description
Is your feature request related to a problem? Please describe.
We have added infrastructure for integration tests in training, with tests covering most use cases plus a couple of additional ones (restart, restart from existing checkpoint, use existing graph, etc.). However, some aspects of training are not tested, or are tested for only a couple of use cases, and some problems are missed because the datasets used for these tests have fewer parameters.
Describe the solution you'd like
A couple of things we could think about adding:
- more comprehensive tests for checkpoints / checkpoint migrations -- currently only testing gnn global
- tests for rollout -- currently not tested
- add multi-gpu tests to test sharding for different models (probably partially covered in benchmark tests)
- review the datasets used for testing: can we keep them small and still catch more of the potential problems?
- tests for forking runs (potentially better placed in system-level tests)
Depending on how comprehensively we want to test these, it might be enough to add a few tests, or it might be better to revisit the existing structure of fixtures to make them more reusable.
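To make the "more reusable" option a bit more concrete, here is a rough sketch of one possible direction: describing each integration test as a (model, scenario) pair and generating the combinations from a single place. All names here (`MODELS`, `SCENARIOS`, `TrainingCase`, `build_cases`, and the model and scenario labels) are hypothetical and do not correspond to anything in the current test suite.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels for illustration only; the real suite may name
# its models and scenarios differently.
MODELS = ("gnn", "model_b", "model_c")
SCENARIOS = ("train", "restart", "checkpoint_migration", "rollout")


@dataclass(frozen=True)
class TrainingCase:
    """One integration-test case: a model run under one scenario."""

    model: str
    scenario: str

    @property
    def test_id(self) -> str:
        # Readable identifier, e.g. "gnn-restart", for test reports.
        return f"{self.model}-{self.scenario}"


def build_cases(models=MODELS, scenarios=SCENARIOS, skip=frozenset()):
    """Return every model/scenario pair except those listed in `skip`."""
    return [
        TrainingCase(model, scenario)
        for model, scenario in product(models, scenarios)
        if (model, scenario) not in skip
    ]


# Example: assume one combination is not supported yet and is skipped.
CASES = build_cases(skip={("model_c", "rollout")})
```

A list like `CASES` could then feed `pytest.mark.parametrize("case", CASES, ids=lambda c: c.test_id)`, so that covering a new scenario (rollout, forking, ...) means adding one entry rather than one fixture per model.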
mchantry and anaprietonem
Status
To be triaged