Add Korean documentation for BitNet #297
base: main
Conversation
This commit introduces three new files:

1. `README_ko.md`: A Korean translation of the original README.md.
2. `Pretrain-Tuning.md`: A document in Korean outlining methods and considerations for pretraining BitNet language models, with a focus on Korean-language data. It highlights that BitNet models are trained from scratch with ternary weights (a minimal sketch of that scheme follows below).
3. `SFT-Tuning.md`: A document in Korean detailing Supervised Fine-Tuning (SFT) techniques for Korean BitNet models, including dataset preparation and the training process.

These documents aim to provide essential information for understanding, pretraining, and fine-tuning BitNet models.
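For orientation, here is a minimal sketch of the ternary (1.58-bit) weight quantization that `Pretrain-Tuning.md` refers to, assuming PyTorch and the absmean scheme from the BitNet b1.58 paper; the function name `weight_quant` is illustrative and not taken from this PR's code:

```python
import torch

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to ternary values {-1, 0, +1} using absmean scaling.

    Sketch of the BitNet b1.58 scheme: scale by the mean absolute value,
    round to the nearest integer, and clamp to the ternary range.
    """
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale
```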
@samuggi please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contributor License Agreement: This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
…tical details. I've significantly updated the Korean BitNet documentation (`Pretrain-Tuning.md` and `SFT-Tuning.md`) based on your feedback and my analysis of the tinyllama-bitnet repository. Here are the key improvements I made:

- **Clarified the BitNet definition:** I emphasized that BitNet is a comprehensive training and inference framework, not just a model architecture.
- **Detailed `Pretrain-Tuning.md`:**
  - I added practical guidance on data preparation for Korean LLMs.
  - I included environment setup details, including library requirements and model configuration.
  - I provided an in-depth explanation of the BitLinear layer and its implementation (activation/weight quantization, STE); see the sketch after this list.
  - I outlined step-by-step pretraining execution methods with hyperparameter details, referencing `tinyllama-bitnet`.
- **Detailed `SFT-Tuning.md`:**
  - I included guidance on preparing Korean SFT datasets (instruction/dialogue).
  - I added SFT environment setup and model loading information.
  - I detailed SFT execution methods with specific hyperparameters.
  - I introduced Direct Preference Optimization (DPO); a sketch of the DPO loss also follows below.
- **Added comprehensive reference sections** to both documents, including links to relevant papers, code repositories (tinyllama-bitnet, microsoft/unilm, microsoft/BitNet), and other resources.

These enhancements should provide more practical, in-depth, and actionable information for pretraining or fine-tuning BitNet models for the Korean language.
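To make the BitLinear explanation concrete, here is a minimal sketch of the layer, assuming PyTorch and following the tinyllama-bitnet / BitNet b1.58 reference style. The 8-bit absmax activation quantization, the absmean weight quantization, and the detach-based straight-through estimator (STE) are assumptions based on that public code, not necessarily the exact implementation in `Pretrain-Tuning.md`; the pre-quantization RMSNorm used in some implementations is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    """Per-token absmax quantization of activations to 8-bit integer levels."""
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    """Absmean quantization of weights to ternary values {-1, 0, +1}."""
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

class BitLinear(nn.Linear):
    """Drop-in replacement for nn.Linear used during BitNet pretraining.

    Quantization is applied only in the forward pass; the detach trick acts
    as a straight-through estimator (STE), so gradients flow back to the
    full-precision latent weights.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return F.linear(x_q, w_q, self.bias)

# Usage sketch: swap nn.Linear modules in the model for BitLinear before
# pretraining, e.g. proj = BitLinear(hidden_size, hidden_size, bias=False).
```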
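Since `SFT-Tuning.md` introduces Direct Preference Optimization (DPO), a minimal sketch of the DPO objective may help readers connect it to the fine-tuned model. This is a generic PyTorch formulation of the loss from the DPO paper (Rafailov et al., 2023), not code from this PR; the per-sequence log-probabilities are assumed to be computed elsewhere.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_rejected | x)
    ref_chosen_logps: torch.Tensor,       # log p_ref(y_chosen | x) from the frozen SFT model
    ref_rejected_logps: torch.Tensor,     # log p_ref(y_rejected | x)
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO loss: widen the chosen-vs-rejected margin relative to the reference model."""
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```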