[QEff. Finetune]: Removed samsum dataset references from FT code. #482
base: main
quic-meetkuma
commented
Jun 27, 2025
- Removed all references to the samsum dataset from the finetuning code.
- The samsum dataset can still be used via the custom dataset path.
DATASET_PREPROC = {
    "alpaca_dataset": partial(get_alpaca_dataset),
    "grammar_dataset": get_grammar_dataset,
    "samsum_dataset": get_samsum_dataset,
Removing just this line is enough to keep finetune.py from throwing a DatasetNotFoundError when it is run with "--dataset samsum_dataset". Instead, it will raise the following error: 'finetune.py: error: argument --dataset: invalid choice: 'samsum_dataset' (choose from 'alpaca_dataset', 'grammar_dataset', 'gsm8k_dataset', 'custom_dataset', 'imdb_dataset')'
The rest of the code changes in this PR are not required.
This way we can keep the samsum_dataset code for internal testing, and if Hugging Face restores the Samsum dataset, we would need only a single line of code to support it through QEfficient.
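To illustrate the behavior described above, here is a minimal sketch. It assumes the `--dataset` choices are derived from the keys of `DATASET_PREPROC`; that wiring, the stub functions, and the helper names `build_parser` and `parse_dataset` are hypothetical and may not match the actual finetune.py.

```python
import argparse
from functools import partial


# Hypothetical stand-ins for the real preprocessing functions.
def get_alpaca_dataset(*args, **kwargs):
    return "alpaca"


def get_grammar_dataset(*args, **kwargs):
    return "grammar"


# Registry with the "samsum_dataset" entry removed. The real registry
# also lists other datasets; only two are kept here for brevity.
DATASET_PREPROC = {
    "alpaca_dataset": partial(get_alpaca_dataset),
    "grammar_dataset": get_grammar_dataset,
}


def build_parser():
    # Assumption: the allowed --dataset values come from the registry keys.
    parser = argparse.ArgumentParser(prog="finetune.py")
    parser.add_argument(
        "--dataset",
        choices=list(DATASET_PREPROC),
        default="alpaca_dataset",
    )
    return parser


def parse_dataset(argv):
    # Returns the selected dataset name. For a value outside the allowed
    # choices, argparse prints an "invalid choice" error and exits.
    return build_parser().parse_args(argv).dataset
```

With this wiring, `parse_dataset(["--dataset", "samsum_dataset"])` is rejected by argparse at parse time, so the dataset lookup is never reached.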
This was discussed with Anuj and VB, and the decision was to remove all references to this code. Users should use this dataset only via the custom_dataset path.
5deac57 to 6c98570
@@ -171,6 +170,28 @@ pipeline {
}
}
}
@quic-hemagnih, @vbaddi, @quic-rishinr - FYI, I made a separate environment for the FT tests.
5015cba to 8fc47cd
4955cf2 to 3f3728a
Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
3f3728a to 88c3cd6