### Description

The purpose of this issue is to track all the changes we need to make to our tests so that they are more robust.
- FSDP sharding
  - Unit test for `training.shard_model` (see [bug] fix sharding multimodal #1889)
  - Unit test for `training.get_shard_conditions` (see the pytest sketch below)
  - Test `custom_sharded_layers`
  - Test that memory(FSDP) < memory(single_device)
- Test correct initialization of LoRA params when using the meta device (a la `test_lora_meta_device_init_fsdp`; see the sketch below)
- Better testing for eval recipe: add test for generation and log likelihood tasks #1873
- Add recipe tests which use DoRA (see [bug] DoRA is broken #1903) (closed by DoRA fixes #2139)
- Config CI: [WIP] Config Continous Integration (CCI) #1717
  - In the config CI, add a test to make sure that `_component_` is always the first item (see the sketch below)
  - Check that the component's module isn't deprecated or wrong; gemma2 had wrong path to scheduler #2013 (comment)
- Add a regression test for peak memory, especially with gradient accumulation enabled (see the sketch below)
- We should have tests that run for 2 epochs, instead of 1, to make sure intermediate checkpoints are saved correctly
- Add tests for Gemma2 attention
- Too many of our tests use torchtune-format checkpoints (e.g.). We should keep a couple of these, but the majority of our test checkpoints should be in HF format.
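A minimal pytest sketch for the `training.get_shard_conditions` unit test referenced above. It assumes `get_shard_conditions(name, module, names_to_match=None)` returns `True` for decoder-layer names of the form `layers.<i>` and for names listed in `names_to_match` (the `custom_sharded_layers` path); the exact matching rules should be confirmed against the implementation before landing.

```python
# Sketch of a unit test for training.get_shard_conditions.
# Assumption: get_shard_conditions(name, module, names_to_match=None) returns True
# for decoder-layer names like "layers.0" and for names listed in names_to_match.
import torch.nn as nn

from torchtune import training


class TestGetShardConditions:
    def test_matches_decoder_layers(self):
        dummy = nn.Linear(2, 2)
        assert training.get_shard_conditions("layers.0", dummy)
        assert not training.get_shard_conditions("norm", dummy)

    def test_matches_custom_sharded_layers(self):
        dummy = nn.Embedding(4, 2)
        # names_to_match corresponds to custom_sharded_layers in the recipes
        assert training.get_shard_conditions(
            "tok_embeddings", dummy, names_to_match=["tok_embeddings", "output"]
        )
        assert not training.get_shard_conditions("tok_embeddings", dummy)
```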
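A sketch for the meta-device LoRA initialization test (a la `test_lora_meta_device_init_fsdp`). It assumes `LoRALinear` exposes `lora_a`/`lora_b` and an `initialize_parameters()` hook that zero-initializes `lora_b`; adjust to the actual module API and to the distributed setup used in the existing test.

```python
# Sketch: verify LoRA params are materialized and correctly initialized after
# building on the meta device. Assumes LoRALinear(in_dim, out_dim, rank, alpha)
# exposes lora_a / lora_b and an initialize_parameters() hook that zeroes lora_b.
import torch

from torchtune.modules.peft import LoRALinear


def test_lora_meta_device_init():
    with torch.device("meta"):
        lora = LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=8)

    # Materialize on CPU, then re-run the LoRA-specific initialization,
    # mirroring what the distributed recipes do after sharding.
    lora.to_empty(device=torch.device("cpu"))
    lora.initialize_parameters()

    for name, p in lora.named_parameters():
        assert not p.is_meta, f"{name} was left on the meta device"
    # lora_b is zero-initialized so the adapter is a no-op at init
    assert torch.all(lora.lora_b.weight == 0)
```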
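A sketch of the two config CI checks mentioned above (ordering of `_component_` and validity of the dotted path). The `recipes/configs` glob and the way deprecation is surfaced are assumptions; the real CCI work is tracked in #1717.

```python
# Sketch of config CI checks: every block that instantiates a component should
# list _component_ first, and the dotted path should resolve to a real object
# (catching wrong/renamed paths like the gemma2 scheduler one).
# The recipes/configs glob and config layout are assumptions.
import importlib
from pathlib import Path

import pytest
import yaml

CONFIG_DIR = Path("recipes/configs")  # hypothetical location of the shipped configs


def _iter_component_blocks(node):
    """Yield every dict in the config tree that contains a _component_ key."""
    if isinstance(node, dict):
        if "_component_" in node:
            yield node
        for value in node.values():
            yield from _iter_component_blocks(value)
    elif isinstance(node, list):
        for value in node:
            yield from _iter_component_blocks(value)


@pytest.mark.parametrize("config_path", sorted(CONFIG_DIR.rglob("*.yaml")), ids=str)
def test_component_blocks(config_path):
    config = yaml.safe_load(config_path.read_text())
    for block in _iter_component_blocks(config):
        # _component_ should always be the first key in the block
        assert next(iter(block)) == "_component_", f"{config_path}: {block}"

        # The dotted path should resolve, so renamed/removed modules fail CI.
        # A deprecation check could additionally assert the resolved object
        # is not marked as deprecated.
        module_path, _, attr = block["_component_"].rpartition(".")
        obj = getattr(importlib.import_module(module_path), attr)
        assert obj is not None
```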
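A sketch of the peak-memory regression test with gradient accumulation. The toy model, step counts, and `EXPECTED_PEAK_MB` bound are placeholders to be replaced with numbers recorded from a known-good recipe run.

```python
# Sketch of a peak-memory regression test with gradient accumulation.
# The model size, number of steps, and EXPECTED_PEAK_MB bound are placeholders.
import pytest
import torch
import torch.nn as nn

EXPECTED_PEAK_MB = 512  # placeholder: record from a known-good run
TOLERANCE = 1.10  # allow 10% drift before failing


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_peak_memory_regression_with_grad_accum():
    device = torch.device("cuda")
    torch.cuda.reset_peak_memory_stats(device)

    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
    ).to(device)
    optim = torch.optim.AdamW(model.parameters())
    grad_accum_steps = 4

    for step in range(8):
        x = torch.randn(32, 1024, device=device)
        loss = model(x).sum() / grad_accum_steps
        loss.backward()
        if (step + 1) % grad_accum_steps == 0:
            optim.step()
            optim.zero_grad(set_to_none=True)

    peak_mb = torch.cuda.max_memory_allocated(device) / 1024**2
    assert peak_mb <= EXPECTED_PEAK_MB * TOLERANCE, (
        f"Peak memory {peak_mb:.1f} MB exceeds expected {EXPECTED_PEAK_MB} MB"
    )
```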
### Tasks
- [ ] Add regression tests for adapter accuracy (maybe manually merge the original checkpoint + adapter and resume from checkpoint)
- [ ] Define a manual test for resuming from a checkpoint
- [ ] https://github.com/pytorch/torchtune/pull/1989
- [ ] Add recipe tests for vision models
- [ ] Test a distributed recipe with state dict hooks (e.g. Llama 3.2 Vision) to ensure state dict save and load work properly. This caused breakages such as #2277