This repository contains a demo script for you to fine-tune a Gemma model for train.flock.io.
To set up your environment, run the following commands:
conda create -n training-node python==3.10
conda activate training-node
pip install -r requirements.txt
dataset.py
- Contains the logic to process the raw data from demo_data.jsonl.
demo_data.jsonl
- Follows the shareGPT format. The training data you receive from the fed-ledger is in exactly the same format; see the example record after this list.
merge.py
- Contains the utility function for merging LoRA weights. If you are training with LoRA, please ensure you merge the adapter into the base model before uploading to your Hugging Face repository.
demo.py
- A training script that implements LoRA fine-tuning for a Gemma-2B model.
full_automation.py
- A script that automates everything: getting a task, downloading the training data, fine-tuning Gemma-2B on that data, merging the weights, uploading the model to your Hugging Face repository, and submitting the task to fed-ledger.
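For reference, a shareGPT-format record might look like the following. The from/value keys shown here follow the common shareGPT convention; the exact schema this repository expects is defined in dataset.py.
{"conversations": [{"from": "human", "value": "What does LoRA stand for?"}, {"from": "gpt", "value": "Low-Rank Adaptation."}]}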
Execute the following command to start the training:
HF_TOKEN="hf_yourhftoken" CUDA_VISIBLE_DEVICES=0 python demo.py
The HF token is required because access to the Gemma model is gated by the Gemma license.
This command fine-tunes on the demo dataset, saves the fine-tuned model, merges the adapter into the base model, and saves the final merged model.
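For intuition, the merge step performed by merge.py typically boils down to a few lines with the peft library. This is only a sketch: the paths and model id below are illustrative assumptions, not the repository's actual code.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the trained LoRA adapter, and fold the
# adapter weights back into the base weights (illustrative paths).
base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
merged = PeftModel.from_pretrained(base, "output/lora-adapter").merge_and_unload()

# Save the stand-alone merged model (plus tokenizer) for upload.
merged.save_pretrained("output/merged")
AutoTokenizer.from_pretrained("google/gemma-2b").save_pretrained("output/merged")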
Once the merged model has been uploaded to your Hugging Face repository, submit the task to fed-ledger:
curl --location 'https://fed-ledger-prod.flock.io/api/v1/tasks/submit-result' \
--header 'flock-api-key: <your-api-key-with-staking-on-this-task-as-node>' \
--header 'Content-Type: application/json' \
--data '{
"task_id": <task id>,
"data":{
"hg_repo_id": "<your-hg-repo-id>",
"base_model": "gemma"
}
}'
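The same submission can also be made from Python with the requests library; this is a sketch assuming exactly the endpoint and payload shown above:
import requests

response = requests.post(
    "https://fed-ledger-prod.flock.io/api/v1/tasks/submit-result",
    headers={"flock-api-key": "<your-api-key-with-staking-on-this-task-as-node>"},
    json={
        "task_id": 123,  # placeholder: replace with your task id
        "data": {
            "hg_repo_id": "<your-hg-repo-id>",
            "base_model": "gemma",
        },
    },
)
response.raise_for_status()
print(response.json())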
To automate all of the above steps with full_automation.py, simply run:
TASK_ID=<task-id> FLOCK_API_KEY="<your-flock-api-key-stakes-as-node-for-the-task>" HF_TOKEN="<your-hf-token>" CUDA_VISIBLE_DEVICES=0 HF_USERNAME="your-hg-user-name" python full_automation.py
We also provide a Python interface for the training node.
from node import TrainNode
TASK_ID = "<task-id>"
# If you have already set up the environment variables, you can skip the following lines.
FLOCK_API_KEY = "<your-flock-api-key-stakes-as-node-for-the-task>"
HF_TOKEN = "<your-hf-token>"
HF_USERNAME = "your-hg-user-name"
# Define the TrainNode.
node = TrainNode(TASK_ID, FLOCK_API_KEY=FLOCK_API_KEY, HF_TOKEN=HF_TOKEN,
HF_USERNAME=HF_USERNAME)
# Train
node.train()
# You can also pass custom training arguments.
node.train({'num_train_epochs': 3})
# Push the model.
node.push()
# You can also load a local model and tokenizer and push them directly.
model = 'LOADED_MODEL'          # placeholder: your loaded model object
tokenizer = 'LOADED_TOKENIZER'  # placeholder: your loaded tokenizer object
node.push(model, tokenizer)