
Question about fine-tuning settings for LLaMA-3.1-8B-Instruct #10

@zjwulbx


Hi, thanks again for sharing this great work!

I have a small question about the fine-tuning setup in your paper. In Appendix B.1, you describe the implementation details, such as using LoRA with rank r = 8 and α = 16, training for 3 epochs with a peak learning rate of 5e-5 under a cosine decay schedule, and AdamW with a warmup ratio of 0.1.
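For reference, here is a minimal sketch of how I am currently reading those settings on my side, assuming a Hugging Face `peft` + `transformers` setup (the library choice, target modules, and output path are my own assumptions, not something stated in the paper):

```python
# Minimal sketch of my reading of the Appendix B.1 settings.
# Library choice (peft + transformers) and unspecified fields are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                    # LoRA rank r = 8
    lora_alpha=16,          # alpha = 16
    task_type="CAUSAL_LM",
    # target_modules not specified in the paper, so I leave the library default here
)

training_args = TrainingArguments(
    output_dir="./llama3.1-8b-instruct-lora",  # hypothetical path
    num_train_epochs=3,
    learning_rate=5e-5,           # peak learning rate
    lr_scheduler_type="cosine",   # cosine decay schedule
    warmup_ratio=0.1,
    optim="adamw_torch",          # AdamW
)
```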

May I ask whether these hyperparameters and the overall training/inference recipe are exactly the same for LLaMA-3.1-8B-Instruct as for Qwen2.5-7B?

If you have time to reply, I would really appreciate it.

