[QEff Finetune]: Made fixes to training script #439
base: main
Conversation
def get_preprocessed_samsum(dataset_config, tokenizer, split, context_length=None):
-    dataset = datasets.load_dataset("Samsung/samsum", split=split, trust_remote_code=True)
+    dataset = datasets.load_dataset("knkarthick/samsum", split=split, trust_remote_code=True)
Please check if this dataset can be used.
We are not distributing this dataset; hence, it should not be a problem.
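For reference, a minimal sketch of loading the replacement dataset and inspecting one record, assuming the Hugging Face datasets API, Hub access, and the standard SAMSum field names (dialogue, summary); the actual preprocessing in the PR may differ:

    import datasets

    # Load the SAMSum mirror used in this PR; trust_remote_code mirrors the call in the diff.
    dataset = datasets.load_dataset("knkarthick/samsum", split="train", trust_remote_code=True)

    # Each record carries a dialogue and its reference summary.
    sample = dataset[0]
    print(sample["dialogue"])
    print(sample["summary"])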
Force-pushed from 6d833cf to d269d0c
@@ -235,11 +241,23 @@ def train(
        train_step_metric.append(step_metric_val)

        if train_config.grad_scaler:
            scaler.scale(loss).backward()  # backward pass
        if train_config.enable_ddp:
            with model.no_sync():
This will result in no syncing of gradients at any step.
Yes, correct. Removed this no_sync change from this PR. We will raise a separate PR for that.
        if train_config.enable_ddp:
            # FIXME: We cannot stop the transfer of gradients across devices every time.
            # In gradient accumulation, the last step should transfer gradients across devices.
            with model.no_sync():
This will result in no syncing of gradients at any step here as well.
Yes, correct. Removed this no_sync change from this PR. We will raise a separate PR for that.
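For context, a minimal sketch of the pattern the FIXME points at, assuming PyTorch DDP: no_sync() suppresses the gradient all-reduce only on intermediate micro-steps, and the final micro-step of each accumulation window syncs and steps the optimizer. The gradient_accumulation_steps field is hypothetical, and model, optimizer, train_dataloader, and train_config are assumed to exist as in the training script; this is not the code proposed in the PR.

    import contextlib

    # model is assumed to be wrapped in torch.nn.parallel.DistributedDataParallel.
    for step, batch in enumerate(train_dataloader):
        is_last_micro_step = (step + 1) % train_config.gradient_accumulation_steps == 0

        # Skip the gradient all-reduce on intermediate micro-steps; sync on the last one.
        if train_config.enable_ddp and not is_last_micro_step:
            sync_context = model.no_sync()
        else:
            sync_context = contextlib.nullcontext()

        with sync_context:
            loss = model(**batch).loss / train_config.gradient_accumulation_steps
            loss.backward()

        if is_last_micro_step:
            optimizer.step()
            optimizer.zero_grad()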
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Force-pushed from 12e48a9 to 085052e
…ght parameter to make the loss for padded samples as zero. Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
…well. Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
… and zero the loss for padded samples. Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
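One of the commits above mentions a weight parameter used to make the loss for padded samples zero. A minimal sketch of that idea, assuming PyTorch's CrossEntropyLoss and a hypothetical pad token id; the exact loss setup in the PR may differ:

    import torch
    import torch.nn as nn

    PAD_TOKEN_ID = 0  # hypothetical; use the tokenizer's actual pad token id

    # Option 1: ignore_index drops padded positions from the loss entirely.
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_TOKEN_ID)

    # Option 2: per-token masking zeroes the contribution of padded positions,
    # then normalizes by the number of real (non-padded) tokens.
    logits = torch.randn(4, 8, 32000)          # (batch, seq_len, vocab)
    labels = torch.randint(1, 32000, (4, 8))   # (batch, seq_len)
    labels[:, 6:] = PAD_TOKEN_ID               # pretend the tail is padding

    per_token_loss = nn.CrossEntropyLoss(reduction="none")(
        logits.view(-1, logits.size(-1)), labels.view(-1)
    )
    mask = (labels.view(-1) != PAD_TOKEN_ID).float()
    loss = (per_token_loss * mask).sum() / mask.sum().clamp(min=1)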
Force-pushed from 085052e to 21eb82d
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Force-pushed from dfa26af to 19697d0
…e loss fn. Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Force-pushed from dde5ebb to eee5328
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Force-pushed from 07b5f70 to 7f5f3b4
Below are the numbers with this PR:

Run 1: Dataset: Samsum, Model: Llama-3.2-1B, Epoch: 1
Run 2: Dataset: Samsum, Model: Llama-3.1-8B, Epoch: 1