
[Question] Question about the tokenizer of required pretrained model stabilityai/stablelm-2-1_6 #88

Open
Taylorfire opened this issue Aug 5, 2024 · 1 comment

Comments

@Taylorfire
Question

Thanks for your excellent work! When I try to fine-tune using StableLM-1.6B as the LLM, I run into a tokenizer inconsistency that confuses me.

As ./scripts/stablelm/finetune.sh requires, I downloaded the pretrained LLM "stabilityai/stablelm-2-1_6" from Hugging Face. Its tokenizer_config.json indicates that the tokenizer belongs to the class "GPT2TokenizerFast". However, in your code moellava/train/train.py, the tokenizer class used for StableLM is Arcade100kTokenizer. This inconsistency causes the tokenizer loading to fail.

Can you tell me whether there is anything wrong with my setup? Should I still use "stabilityai/stablelm-2-1_6" as the pretrained LLM?

@2002DQJ

2002DQJ commented Aug 20, 2024

There is no available API for this on Hugging Face because of a mistake by the development team. You can search for "Auto classes" on Google to find the correct API, and then choose an available VLM model.

Auto classes
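The Auto classes approach mentioned above can be sketched roughly as follows. This is a hedged example, not confirmed by the maintainers: the model id `stabilityai/stablelm-2-1_6b` and the use of `trust_remote_code=True` (so that `transformers` can load the custom `Arcade100kTokenizer` shipped in the model repo instead of falling back to `GPT2TokenizerFast`) are assumptions based on this thread.

```python
# Hypothetical sketch: let the Auto classes resolve the tokenizer class
# from the model repo, rather than hard-coding GPT2TokenizerFast or
# Arcade100kTokenizer. Model id and kwargs are assumptions, not verified.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "stabilityai/stablelm-2-1_6b",  # assumed model id; adjust to the repo you use
    trust_remote_code=True,          # allow the repo's custom tokenizer class to load
    use_fast=False,                  # custom tokenizers often ship only a slow version
)

# Inspect which class was actually resolved, to confirm it matches
# what moellava/train/train.py expects.
print(type(tokenizer).__name__)
```

If the resolved class still does not match what the training code expects, comparing `type(tokenizer).__name__` against the class referenced in moellava/train/train.py is a quick way to confirm the mismatch before starting a fine-tuning run.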
