### Description

The purpose of this issue is to track all the changes we need to make to our tests so that they are more robust.
- FSDP sharding
  - Unit test for `training.shard_model` (see [bug] fix sharding multimodal #1889)
  - Unit test for `training.get_shard_conditions` (see the pytest sketch below)
  - Test `custom_sharded_layers`
  - Test that memory(FSDP) < memory(single_device)
- Test correct initialization of LoRA params when using the meta device (a la `test_lora_meta_device_init_fsdp`; see the sketch below)
- Better testing for eval recipe: add test for generation and log likelihood tasks #1873
- Add recipe tests which use DoRA (see [bug] DoRA is broken #1903) (closed by DoRA fixes #2139)
- Config CI: [WIP] Config Continous Integration (CCI) #1717
  - In the config CI, add a test to make sure that `_component_` is always the first item (see the sketch below)
  - Check that the component's module isn't deprecated or wrong; gemma2 had wrong path to scheduler #2013 (comment)
- Add a regression test for peak memory, especially with gradient accumulation enabled (see the sketch below)
- We should have tests that run for 2 epochs, instead of 1, to make sure intermediate checkpoints are saved correctly
- Add tests for Gemma2 attention
- Too many of our tests use torchtune-format checkpoints (e.g.). We should keep a couple of these, but the majority of our test checkpoints should be in HF format.
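A minimal pytest sketch for the `training.get_shard_conditions` unit test referenced above. It assumes `get_shard_conditions(name, module, names_to_match=None)` returns `True` for decoder-layer names of the form `layers.<i>` and for names listed in `names_to_match` (the `custom_sharded_layers` path); the exact matching rules should be confirmed against the implementation before landing.

```python
# Sketch of a unit test for training.get_shard_conditions.
# Assumption: get_shard_conditions(name, module, names_to_match=None) returns True
# for decoder-layer names like "layers.0" and for names listed in names_to_match.
import torch.nn as nn

from torchtune import training


class TestGetShardConditions:
    def test_matches_decoder_layers(self):
        dummy = nn.Linear(2, 2)
        assert training.get_shard_conditions("layers.0", dummy)
        assert not training.get_shard_conditions("norm", dummy)

    def test_matches_custom_sharded_layers(self):
        dummy = nn.Embedding(4, 2)
        # names_to_match corresponds to custom_sharded_layers in the recipes
        assert training.get_shard_conditions(
            "tok_embeddings", dummy, names_to_match=["tok_embeddings", "output"]
        )
        assert not training.get_shard_conditions("tok_embeddings", dummy)
```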
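A sketch for the meta-device LoRA initialization test (a la `test_lora_meta_device_init_fsdp`). It assumes `LoRALinear` exposes `lora_a`/`lora_b` and an `initialize_parameters()` hook that zero-initializes `lora_b`; adjust to the actual module API and to the distributed setup used in the existing test.

```python
# Sketch: verify LoRA params are materialized and correctly initialized after
# building on the meta device. Assumes LoRALinear(in_dim, out_dim, rank, alpha)
# exposes lora_a / lora_b and an initialize_parameters() hook that zeroes lora_b.
import torch

from torchtune.modules.peft import LoRALinear


def test_lora_meta_device_init():
    with torch.device("meta"):
        lora = LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=8)

    # Materialize on CPU, then re-run the LoRA-specific initialization,
    # mirroring what the distributed recipes do after sharding.
    lora.to_empty(device=torch.device("cpu"))
    lora.initialize_parameters()

    for name, p in lora.named_parameters():
        assert not p.is_meta, f"{name} was left on the meta device"
    # lora_b is zero-initialized so the adapter is a no-op at init
    assert torch.all(lora.lora_b.weight == 0)
```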
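A sketch of the two config CI checks mentioned above (ordering of `_component_` and validity of the dotted path). The `recipes/configs` glob and the way deprecation is surfaced are assumptions; the real CCI work is tracked in #1717.

```python
# Sketch of config CI checks: every block that instantiates a component should
# list _component_ first, and the dotted path should resolve to a real object
# (catching wrong/renamed paths like the gemma2 scheduler one).
# The recipes/configs glob and config layout are assumptions.
import importlib
from pathlib import Path

import pytest
import yaml

CONFIG_DIR = Path("recipes/configs")  # hypothetical location of the shipped configs


def _iter_component_blocks(node):
    """Yield every dict in the config tree that contains a _component_ key."""
    if isinstance(node, dict):
        if "_component_" in node:
            yield node
        for value in node.values():
            yield from _iter_component_blocks(value)
    elif isinstance(node, list):
        for value in node:
            yield from _iter_component_blocks(value)


@pytest.mark.parametrize("config_path", sorted(CONFIG_DIR.rglob("*.yaml")), ids=str)
def test_component_blocks(config_path):
    config = yaml.safe_load(config_path.read_text())
    for block in _iter_component_blocks(config):
        # _component_ should always be the first key in the block
        assert next(iter(block)) == "_component_", f"{config_path}: {block}"

        # The dotted path should resolve, so renamed/removed modules fail CI.
        # A deprecation check could additionally assert the resolved object
        # is not marked as deprecated.
        module_path, _, attr = block["_component_"].rpartition(".")
        obj = getattr(importlib.import_module(module_path), attr)
        assert obj is not None
```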
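A sketch of the peak-memory regression test with gradient accumulation. The toy model, step counts, and `EXPECTED_PEAK_MB` bound are placeholders to be replaced with numbers recorded from a known-good recipe run.

```python
# Sketch of a peak-memory regression test with gradient accumulation.
# The model size, number of steps, and EXPECTED_PEAK_MB bound are placeholders.
import pytest
import torch
import torch.nn as nn

EXPECTED_PEAK_MB = 512  # placeholder: record from a known-good run
TOLERANCE = 1.10  # allow 10% drift before failing


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_peak_memory_regression_with_grad_accum():
    device = torch.device("cuda")
    torch.cuda.reset_peak_memory_stats(device)

    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
    ).to(device)
    optim = torch.optim.AdamW(model.parameters())
    grad_accum_steps = 4

    for step in range(8):
        x = torch.randn(32, 1024, device=device)
        loss = model(x).sum() / grad_accum_steps
        loss.backward()
        if (step + 1) % grad_accum_steps == 0:
            optim.step()
            optim.zero_grad(set_to_none=True)

    peak_mb = torch.cuda.max_memory_allocated(device) / 1024**2
    assert peak_mb <= EXPECTED_PEAK_MB * TOLERANCE, (
        f"Peak memory {peak_mb:.1f} MB exceeds expected {EXPECTED_PEAK_MB} MB"
    )
```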
### Tasks
- [ ] Add regression tests for adapter accuracy (maybe manually merge the original checkpoint + adapter and resume from checkpoint)
- [ ] Define a manual test for resuming from a checkpoint
- [ ] https://github.com/pytorch/torchtune/pull/1989
- [ ] Add recipe tests for vision models
- [ ] Test a distributed recipe with state dict hooks (e.g. Llama 3.2 Vision) to ensure state dict save and load work properly. This caused breakages such as #2277