Description
Hi there 👋
Back in March, I left a comment under a PR that tries to refactor the recipes to make them simpler.
My idea (which is quite straightforward) is to avoid recipes that each contain the complete training code, essentially deviating from the single-file approach.
Let’s consider full fine-tuning and LoRA fine-tuning as examples. In this case, only the model instantiation and checkpoint saving should differ.
The proposal is to have full fine-tuning as the basis, and in the LoRA variant, only override the LoRA-specific methods.
The advantages and disadvantages, plus the example code, can be found in this PR.
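To make the proposal a bit more concrete, here is a minimal, hypothetical sketch (class names, config keys, and helpers are made up and not actual code from this repo): the full fine-tuning recipe is the base class, and the LoRA recipe overrides only model instantiation and checkpoint saving.

```python
import torch
import torch.nn as nn


class FullFinetuneRecipe:
    """Base recipe: owns the shared training loop."""

    def setup_model(self, cfg):
        # Stand-in for building the full model from the recipe config.
        return nn.Linear(cfg["in_dim"], cfg["out_dim"])

    def save_checkpoint(self, model, path):
        # Full fine-tuning saves the entire state dict.
        torch.save(model.state_dict(), path)

    def train(self, cfg):
        model = self.setup_model(cfg)
        # ... optimizer setup, data loading, training loop (shared code) ...
        self.save_checkpoint(model, cfg["output_path"])


class LoRAFinetuneRecipe(FullFinetuneRecipe):
    """LoRA variant: overrides only the LoRA-specific steps."""

    def setup_model(self, cfg):
        model = super().setup_model(cfg)
        # Attach LoRA adapters here (omitted; would come from the LoRA utils).
        return model

    def save_checkpoint(self, model, path):
        # Save only the adapter weights instead of the full state dict.
        adapter_state = {
            k: v for k, v in model.state_dict().items() if "lora" in k
        }
        torch.save(adapter_state, path)
```

Everything shared (optimizer setup, data loading, the training loop itself) lives in one place, and each variant reads as a small diff on top of it.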
Now, let’s discuss another matter.
If we adhere to this approach and:
- Maintain full fine-tuning as the basis
- Override only the LoRA-specific methods in the LoRA fine-tuning variant
- Override only the distributed-specific methods in the distributed full fine-tuning variant
… what should we do with the LoRA fine-tuning distributed variant?
- Option A: Override both the LoRA- and distributed-specific methods, which essentially means duplicating all of that code and brings us back to square one.
- Option B: Reuse the LoRA-specific code from the LoRA recipe and the distributed-specific code from the distributed recipe (see the sketch right after these options). In this case, to read the LoRA distributed recipe, we would need to jump between three files, which is cumbersome and far from ideal.
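To illustrate why Option B gets hard to read, here is a rough continuation of the sketch above (again, all names are hypothetical): following `setup_model` now means tracing Python's MRO across three classes, i.e. three files.

```python
class DistributedFinetuneRecipe(FullFinetuneRecipe):
    """Distributed variant: overrides only the distributed-specific steps."""

    def setup_model(self, cfg):
        model = super().setup_model(cfg)
        # Wrap for multi-GPU training (FSDP/DDP); details omitted.
        return model


class LoRADistributedFinetuneRecipe(LoRAFinetuneRecipe, DistributedFinetuneRecipe):
    """LoRA + distributed: no new code here, but to see what setup_model()
    or save_checkpoint() actually do, the reader has to follow the MRO
    through LoRAFinetuneRecipe, DistributedFinetuneRecipe, and
    FullFinetuneRecipe, i.e. three different files."""
```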
Another option would be to make the recipe code hardware-agnostic: regardless of whether there is a single GPU or multiple GPUs, the recipe code stays the same, and all the device/distribution logic is encapsulated in a separate class.
This approach would eliminate the need for separate recipes for distributed cases.
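As a very rough illustration of what such a hardware-agnostic setup could look like (all names here are hypothetical, not existing classes in this repo), the environment detection and model wrapping could live in one place:

```python
import os

import torch
import torch.distributed as dist


class TrainingEnvironment:
    """Hides whether the recipe runs on one GPU or many (hypothetical sketch)."""

    def __init__(self):
        # torchrun sets WORLD_SIZE; treat anything > 1 as a distributed run.
        self.distributed = int(os.environ.get("WORLD_SIZE", "1")) > 1
        self.local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        if self.distributed:
            backend = "nccl" if torch.cuda.is_available() else "gloo"
            dist.init_process_group(backend=backend)
        self.device = torch.device(
            f"cuda:{self.local_rank}" if torch.cuda.is_available() else "cpu"
        )

    def wrap_model(self, model):
        model = model.to(self.device)
        if self.distributed:
            # Could just as well be FSDP; DDP keeps the sketch short.
            model = torch.nn.parallel.DistributedDataParallel(
                model,
                device_ids=[self.local_rank] if self.device.type == "cuda" else None,
            )
        return model

    @property
    def is_main_process(self):
        return not self.distributed or dist.get_rank() == 0
```

A single recipe would then create `env = TrainingEnvironment()`, call `model = env.wrap_model(self.setup_model(cfg))`, and gate checkpoint saving on `env.is_main_process`, without ever branching on single- vs multi-GPU itself.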
The question is: what does the core team think about this?