Translator layer to cut down on vocab size #588
What do you think about adding a translator layer that preprocesses all the training data and translates every document into a smaller vocabulary? For example, the translator would itself be an LLM, but one that only performs the task of translation. It would then be used again during inference, for both input and output.

What do you think about this idea? How impactful could this layer be in reducing vocab size, and how would that affect performance?
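A rough sketch of what that pipeline could look like, purely to make the data flow concrete. Every name here (`translate_to_small_vocab`, `preprocess_corpus`, `main_model.generate`) is hypothetical, not an existing API; the translators are identity stand-ins so the sketch runs:

```python
# Hypothetical sketch of the proposed translator-layer pipeline.
# All components are stand-ins; only the wiring matters.

def translate_to_small_vocab(text: str) -> str:
    # In the proposal this would be a separate small LLM trained
    # only to rewrite text into a restricted vocabulary.
    return text  # identity placeholder so the sketch runs

def translate_from_small_vocab(text: str) -> str:
    # Inverse direction: restore natural phrasing from the small vocab.
    return text  # identity placeholder

def preprocess_corpus(documents: list[str]) -> list[str]:
    # Training-time step: rewrite every document into the small
    # vocabulary, so the main model only ever sees the reduced token set.
    return [translate_to_small_vocab(doc) for doc in documents]

def generate(main_model, user_prompt: str) -> str:
    # Inference-time step: the translator wraps the main model
    # on both the input side and the output side.
    small_prompt = translate_to_small_vocab(user_prompt)
    small_output = main_model.generate(small_prompt)
    return translate_from_small_vocab(small_output)
```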
Replies: 1 comment

@IlyaGazman I was also thinking along similar lines. We could have a strong model trained only on English, and then, for each other language, a language-specific encoder and decoder that translates that language to English and back. That way the strong model supports other languages as well (see the sketch below).

@karpathy I was also curious to explore whether we could create a new language that has a bigger vocab size but is independent of grammar (no rules about which word comes first, or how word order changes with tense). The training process would then focus more on which word to output than on where to place it. I'm not sure how well or badly such a model would perform.
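A minimal sketch of that wrapper, under the same caveat: `detect_language`, `TranslatorPair`, and `core_model.generate` are hypothetical stand-ins for the per-language encoder/decoder idea, not real components:

```python
# Hypothetical sketch: an English-only core model wrapped by
# language-specific translator pairs. All components are stand-ins.

ENGLISH = "en"

def detect_language(text: str) -> str:
    # Stand-in for a real language-identification model.
    return ENGLISH

class TranslatorPair:
    """Encoder/decoder for one language: X -> English and English -> X."""
    def to_english(self, text: str) -> str:
        return text  # identity placeholder
    def from_english(self, text: str) -> str:
        return text  # identity placeholder

# One translator pair per supported language besides English.
TRANSLATORS: dict[str, TranslatorPair] = {
    "fr": TranslatorPair(),
    "de": TranslatorPair(),
}

def answer(core_model, prompt: str) -> str:
    lang = detect_language(prompt)
    if lang == ENGLISH:
        return core_model.generate(prompt)
    pair = TRANSLATORS[lang]
    english_prompt = pair.to_english(prompt)            # encode into English
    english_reply = core_model.generate(english_prompt)  # English-only core
    return pair.from_english(english_reply)              # decode back out
```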