Hi, thanks again for sharing this great work!
I have a small question about the fine-tuning setup in your paper. In Appendix B.1, you describe the implementation details, such as using LoRA with rank r = 8 and α = 16, training for 3 epochs with a peak learning rate of 5e-5 under a cosine decay schedule, and AdamW with a warmup ratio of 0.1.
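For concreteness, here is a minimal sketch of how I am currently reproducing that recipe with HuggingFace Transformers + PEFT. Please note that the model names, target modules, dropout, batch size, and precision below are my own guesses, since they are not spelled out in Appendix B.1:

```python
# Sketch of my reading of the Appendix B.1 recipe (Transformers + PEFT).
# Lines marked "guess" are my assumptions, not values stated in the paper.
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2.5-7B"  # or "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA settings from Appendix B.1: r = 8, alpha = 16.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                    # guess: dropout not given in the appendix
    target_modules=["q_proj", "v_proj"],  # guess: target modules not given
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Optimizer / schedule from Appendix B.1: AdamW, peak LR 5e-5,
# cosine decay, warmup ratio 0.1, 3 epochs.
training_args = TrainingArguments(
    output_dir="./lora-finetune",
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    per_device_train_batch_size=4,        # guess: batch size not given
    gradient_accumulation_steps=4,        # guess
    bf16=True,                            # guess
    logging_steps=10,
)

# train_dataset would be the paper's fine-tuning data, tokenized; omitted here.
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()
```

If any of the guessed settings above differ from what you actually used, or differ between the two backbones, it would be very helpful to know.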
May I ask whether these hyperparameters and the overall training/inference recipe are exactly the same for LLaMA-3.1-8B-Instruct as for Qwen2.5-7B, or whether any settings differ between the two models?
If you have time to reply, I would really appreciate it.