
Cannot Reproduce Fine-Tuning #30

Open
Concyclics opened this issue Oct 28, 2024 · 1 comment

Comments

@Concyclics
First and foremost, thank you for your outstanding work on this project. We'd like to build on this work and fine-tune a model from deepseek-coder 1.3B using your datasets, but we cannot achieve a promising result. Could you share the fine-tuning settings, such as batch size, learning rate, and other specifications?

@Anindyadeep
Member

If I remember correctly (I need to check, though), the batch size was set to 4, the learning rate to 1e-5, and we used some synthetic datasets to fine-tune the models. One observation I can share: these small models do not generalize very well. PremSQL-1B was heavily focused on BirdBench, so what we tried was generating synthetic samples similar to the BirdBench training data. Training on those gave a huge leap in the results.

As of now, the fine-tuning scripts in PremSQL might be a bit buggy, and I am working on them. However, the secret sauce was really the different datasets we used, together with continual fine-tuning. A rough sketch of the setup is below.
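
For anyone trying to reproduce this, here is a minimal sketch of a fine-tuning run with the two hyperparameters mentioned above (batch size 4, learning rate 1e-5), using the Hugging Face `Trainer`. The dataset file, the `text` column name, the epoch count, and the precision flag are assumptions for illustration, not the exact settings PremSQL used:

```python
# Minimal fine-tuning sketch. Only batch size (4) and learning rate (1e-5)
# come from the thread; everything else is an assumption.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The collator pads batches, so the tokenizer needs a pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical text-to-SQL training file with a "text" column
# containing prompt + SQL pairs.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="premsql-finetune",
    per_device_train_batch_size=4,  # batch size mentioned in the comment
    learning_rate=1e-5,             # learning rate mentioned in the comment
    num_train_epochs=3,             # assumption; not stated in the thread
    bf16=True,                      # assumption; depends on hardware
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # Causal LM objective: labels are the (shifted) input tokens.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The continual fine-tuning mentioned above would amount to repeating this loop, resuming from the previous checkpoint each time a new synthetic dataset is generated, rather than training once from the base model.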
