
How to fine-tune T5 with a Causal Language Modeling objective? #1097

Open
nanbeitk opened this issue Apr 29, 2023 · 0 comments
Dear all,
I am new to NLP and have some possibly strange questions; I will try to explain them clearly.

My goal is to fine-tune the t5-base model on a specific corpus with a causal language modeling (CLM) objective. I found this document, which uses AutoModelForCausalLM, but that auto class simply does not include the T5 family of models.
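For reference, this is roughly what I run into (a minimal sketch; the exact error wording may vary across transformers versions):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# The causal-LM auto class has no mapping for T5, which is an
# encoder-decoder model, so this raises a ValueError:
try:
    AutoModelForCausalLM.from_pretrained("t5-base")
except ValueError as err:
    print(err)  # "Unrecognized configuration class ... T5Config ..."

# T5 loads through the seq2seq auto class instead:
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
```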

So my questions are:

  1. How should I fine-tune a T5 model for the CLM objective? In my understanding, CLM is the process of predicting token_2 from token_1, then token_3 from token_1, token_2, and so on until the end of the input sequence, so I am confused about how to implement this process myself.

  2. I tried splitting one of my training examples into rows like this (ti == token_i, 1 == eos_token):

     input_ids                          labels
     [t1, 1, 1, 1, 1, 1, ...]           [t1, t2, 1, 1, 1, 1, ...]
     [t1, t2, 1, 1, 1, 1, ...]          [t1, t2, t3, 1, 1, 1, ...]
     [t1, t2, t3, 1, 1, 1, ...]         [t1, t2, t3, t4, 1, 1, ...]
     [t1, t2, t3, t4, 1, 1, ...]        [t1, t2, t3, t4, t5, 1, ...]

     The first problem is obvious: the expanded dataset is much larger and needs far more time to fine-tune. The second problem is that this construction feels strange, and I don't know whether it actually fulfills the CLM objective's requirements. It is the only idea I have come up with; does it work? (See the sketch after this list for the alternative I am wondering about.)
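To make the question concrete, here is a minimal sketch of the alternative I am wondering about. It relies on teacher forcing: since the decoder is causally masked, one forward pass already yields a prediction for every position, which would avoid the per-prefix expansion above. The empty encoder input and the example sentence are my own assumptions, not something I found in the docs:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# A made-up training example standing in for one line of my corpus.
text = "the quick brown fox jumps over the lazy dog"

# The full sequence goes to the decoder as `labels`; the encoder gets an
# (almost) empty input -- tokenizer("") yields just the </s> token.
enc = tokenizer("", return_tensors="pt")
labels = tokenizer(text, return_tensors="pt").input_ids

# With `labels` set, T5 shifts them right to build the decoder inputs, and
# the decoder's causal mask lets position i attend only to positions <= i.
# So this single pass scores t2 from t1, t3 from (t1, t2), and so on --
# no one-row-per-prefix expansion of the dataset is needed.
out = model(input_ids=enc.input_ids,
            attention_mask=enc.attention_mask,
            labels=labels)
print(out.loss)  # mean cross-entropy over all next-token predictions
```

If this is a valid way to get a CLM-style loss out of T5, the table above is unnecessary; if not, what is the right construction?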

Thanks!!
