```yaml
# Bootstrap shell commands which will be executed before running entry commands.
# Supports multiple lines, and can be empty.
bootstrap: bash bootstrap.sh

computing:
  minimum_num_gpus: 1            # minimum number of GPUs to provision
  maximum_cost_per_hour: $3000   # maximum cost per hour per GPU card for your job
  resource_type: A100-80G        # e.g., A100-80G; list available types with "fedml show-resource-type" or visit https://open.fedml.ai/accelerator_resource_type
```
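The `bootstrap` entry above runs `bash bootstrap.sh` before the entry commands. A minimal sketch of what such a script might contain (the contents below are an assumption for illustration, not the repo's actual script):

```shell
#!/usr/bin/env bash
# Hypothetical bootstrap.sh -- the real script in this repo may differ.
set -e  # abort on the first failing command

# verify the interpreter the job needs is available before doing any work
command -v python3 >/dev/null || { echo "python3 not found" >&2; exit 1; }

# environment setup steps (e.g. installing dependencies) would go here

echo "Bootstrap finished."
```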
> When using PyTorch DDP with LoRA and gradient checkpointing, you need to turn off `find_unused_parameters`
> by passing `--ddp_find_unused_parameters "False"` in the command line.
### Train with FEDML Launch

If you have trouble finding computing resources, you can launch your training job via [FEDML Launch](https://doc.fedml.ai/launch) and let FEDML find the most cost-effective resources for your task.
```shell
# install fedml library
pip3 install fedml

# launch your training job
fedml launch job.yaml
```
You can modify the training job in [job.yaml](job.yaml) by:

- specifying training settings in the `job` section
- specifying environment setup settings in the `bootstrap` section
- specifying compute resources in the `computing` section
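For reference, a minimal job.yaml might combine these sections as follows. The entry command and model name shown in the `job` section are placeholder assumptions for illustration; substitute the repo's actual training command.

```yaml
# minimal job.yaml sketch -- the entry command below is a placeholder, not the repo's actual command
job: |
  python3 train.py --model_name "EleutherAI/pythia-2.8b"   # hypothetical entry command

# environment setup commands run before the job starts
bootstrap: bash bootstrap.sh

# compute resources to provision
computing:
  minimum_num_gpus: 1
  maximum_cost_per_hour: $3000
  resource_type: A100-80G
```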
## How to Use Llama 2

Our example uses Pythia by default, but we recently added support for Llama 2.
If you'd like to use Llama 2, please see the following instructions before getting started.
To use [Llama 2](https://ai.meta.com/llama/), you need to apply for access from Meta and request access to Meta's private Hugging Face repo.

1. Make sure your `transformers` version is `4.31.0` or newer. You can update `transformers` via `pip install --upgrade transformers`.
2. Visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and apply for access.
3. Apply for access to [Meta's private repo](https://huggingface.co/meta-llama/Llama-2-7b-hf) on [Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b-hf). See the image below for details.
   
4. Once both requests are granted, you can start using Llama 2 by passing `--model_name "meta-llama/Llama-2-7b-hf"` to the training script.
> **Warning**
> Since Llama 2 is in a private Hugging Face repo, you need to either log in to Hugging Face or provide your access token.
> - To log in to Hugging Face (see https://huggingface.co/settings/tokens for details), run `huggingface-cli login` in the command line.
> - To pass an access token, do one of the following:
>   - Set the environment variable `HUGGING_FACE_HUB_TOKEN="<your access token>"`.
>   - For centralized/conventional training, pass `--auth_token "<your access token>"` in the command line.
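The precedence between the two token options can be sketched as a small helper. `resolve_hf_token` is a hypothetical name for illustration only; it is not part of the fedml or huggingface_hub APIs:

```python
import os

def resolve_hf_token(explicit_token=None):
    """Return an explicitly passed token (like --auth_token) if given,
    otherwise fall back to the HUGGING_FACE_HUB_TOKEN environment variable."""
    if explicit_token:
        return explicit_token
    return os.environ.get("HUGGING_FACE_HUB_TOKEN")

# an explicit token takes precedence over the environment variable
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_env_token"
print(resolve_hf_token("hf_cli_token"))  # -> hf_cli_token
print(resolve_hf_token())                # -> hf_env_token
```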
### Dependencies

We have tested our implementation with the following setup: