---
layout: blog_detail
title: "Integrating Ascend Backend with Torchtune through PyTorch Multi-Device Support"
author: "Huawei PyTorch Team: Chenguang Li (Huawei), Mengqing Cao (Huawei)"
---

In this blog, we will briefly introduce torchtune, the Ascend backend, and demonstrate how torchtune can be used to fine-tune models with Ascend.

## Introduction to Torchtune

Torchtune is a PyTorch-native library designed to simplify the fine-tuning of Large Language Models (LLMs). Staying true to PyTorch’s design principles, it provides composable and modular building blocks, as well as easily extensible training recipes. torchtune allows developers to fine-tune popular LLMs with different training methods and model architectures while supporting training on a variety of consumer-grade and professional GPUs.

You can explore more about torchtune's code and tutorials here:

1. **GitHub Repository**: The source code for torchtune is hosted on GitHub, where you can find the full implementation, commit history, and development documentation. Access the code repository here: [Torchtune GitHub Repository](https://github.com/pytorch/torchtune)
2. **Tutorials and Documentation**: Torchtune provides detailed tutorials to help users quickly get started with the fine-tuning process and demonstrate how to use torchtune for various tasks like training and evaluation. You can access the official tutorials here: [Torchtune Tutorials](https://pytorch.org/torchtune/main/overview.html)

In these resources, you'll find not only how to fine-tune large language models using torchtune but also how to integrate it with tools like Hugging Face. They offer comprehensive documentation and examples for both beginners and advanced users, helping everyone customize and optimize their model training pipelines.

## Introduction to Ascend Backend

Ascend is a series of AI computing products launched by Huawei, offering a full-stack AI computing infrastructure that includes processors, hardware, foundational software, AI computing frameworks, development toolchains, management and operation tools, as well as industry-specific applications and services. These products together create a powerful and efficient AI computing platform that caters to various AI workloads.

You can explore more about Ascend here: [Ascend Community](https://www.hiascend.com/en/)

## How Torchtune Integrates with Ascend

Initially, devices were primarily matched using device strings. However, torchtune later introduced an abstraction layer for devices, leveraging the *get_device_support()* method to dynamically retrieve relevant devices based on the current environment.

![flow diagram](/assets/images/ascend-backend-w-torchtune.png){:style="width:100%"}

Ascend is seamlessly integrated into torchtune via the *PrivateUse1* feature provided by PyTorch. By importing *torch_npu* and replacing the CUDA-like device operations with the corresponding namespace resolved by *get_device_support()* for the current environment (such as *torch.npu* or *torch.cuda*), Ascend is effectively incorporated into torchtune. The PR is [here](https://github.com/pytorch/torchtune/pull/1826).

*torch_npu* is a plugin developed for PyTorch, designed to seamlessly integrate Ascend NPU with the PyTorch framework, enabling developers to leverage the powerful computational capabilities of Ascend AI processors for deep learning training and inference. This plugin allows users to directly utilize Ascend’s computational resources within PyTorch without the need for complex migration or code changes.
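
To make the dispatch idea concrete, here is a minimal sketch of resolving a device namespace at runtime. The helper name below is illustrative, not torchtune's actual internal API:

```
import torch

def get_torch_device_namespace():
    """Resolve the torch.<backend> namespace for the current environment."""
    try:
        # Importing torch_npu registers the Ascend "npu" backend with
        # PyTorch through the PrivateUse1 mechanism.
        import torch_npu  # noqa: F401
        if torch.npu.is_available():
            return torch.npu
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.cuda
    return None  # no accelerator found; fall back to CPU code paths

ns = get_torch_device_namespace()
if ns is not None:
    ns.manual_seed(42)  # the same call works for both torch.npu and torch.cuda
```

Code written once against the resolved namespace runs unchanged on either backend, which is what lets torchtune swap CUDA-style calls for their NPU counterparts.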

## Torchtune Quick Start with Ascend

In torchtune, there are two key concepts that are essential for customizing and optimizing the fine-tuning process: **Config** and **Recipe**. Together, they let users adapt the fine-tuning process to different needs and hardware environments.

* A Config is a file used by torchtune to configure the training process. It contains settings for the model, data, training parameters, and more. By modifying the Config file, users can easily adjust various aspects of the training process, such as data loading, optimizer settings, and learning rate adjustments. Config files are typically written in YAML format, making them clear and easy to modify.
* A Recipe in torchtune is a simple, transparent single-file training script in pure PyTorch. Recipes provide the full end-to-end training workflow but are designed to be hackable and easy to extend. Users can choose an existing Recipe or create a custom one to meet their fine-tuning needs, as shown below.
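
You can browse the built-in Recipes and Configs, and copy one out for customization, with the tune CLI. The config name below is only an example; run *tune ls* to see what is available in your installation:

```
# List all built-in recipes and their associated configs
tune ls

# Copy a built-in config into the working directory for editing
tune cp llama3/8B_full_single_device ./ascend_config.yaml
```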

When fine-tuning a model using the Ascend backend, torchtune simplifies the process by allowing you to specify the device type directly in the configuration file. Once you specify ***npu*** as the device type, torchtune automatically detects and utilizes the Ascend NPU for training and inference. This design allows users to focus on model fine-tuning without needing to worry about hardware details.

Specifically, you just need to set the relevant parameters in the **Config** file, indicating ***npu*** as the device type, such as:

```
# Environment
device: npu
dtype: bf16

# Dataset
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: json
  data_files: ascend_dataset.json
  train_on_input: False
  packed: False
  split: train

# Other Configs …
```
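
Before launching a run, it can help to sanity-check that PyTorch can see the Ascend device. A minimal check, assuming *torch* and *torch_npu* are installed:

```
import torch
import torch_npu  # registers the "npu" device with PyTorch

print(torch.npu.is_available())     # True if an Ascend NPU is visible
x = torch.ones(2, 2, device="npu")  # tensors can be created directly on the NPU
print(x.device)                     # e.g. npu:0
```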

Once you've specified the ***npu*** device type in your configuration file, you can easily begin the model fine-tuning process. Simply run the following command, and torchtune will automatically start the fine-tuning process on the Ascend backend:

```
tune run <recipe_name> --config <your_config_file>.yaml
```

For example, if you're using a full fine-tuning recipe (full_finetune_single_device) and your configuration file is located at `ascend_config.yaml`, you can start the fine-tuning process with this command:

```
tune run full_finetune_single_device --config ascend_config.yaml
```

This command will trigger the fine-tuning process, where torchtune will automatically handle data loading, model fine-tuning, evaluation, and other steps, leveraging the Ascend NPU's computational power to accelerate the training process.

When you see the following log, it means that the model has been fine-tuned successfully on the Ascend NPU.

```
……
dataset:
  _component_: torchtune.datasets.instruct_dataset
  data_files: ascend_dataset.json
  packed: false
  source: json
  split: train
  train_on_input: false
device: npu
dtype: bf16
enable_activation_checkpointing: true
epochs: 10
……
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:Memory stats after model init:
    NPU peak memory allocation: 1.55 GiB
    NPU peak memory reserved: 1.61 GiB
    NPU peak memory active: 1.55 GiB
INFO:torchtune.utils._logging:Tokenizer is initialized from file.
INFO:torchtune.utils._logging:Optimizer is initialized.
INFO:torchtune.utils._logging:Loss is initialized.
……
INFO:torchtune.utils._logging:Model checkpoint of size 4.98 GB saved to /home/lcg/tmp/torchtune/ascend_llama/hf_model_0001_9.pt
INFO:torchtune.utils._logging:Model checkpoint of size 5.00 GB saved to /home/lcg/tmp/torchtune/ascend_llama/hf_model_0002_9.pt
INFO:torchtune.utils._logging:Model checkpoint of size 4.92 GB saved to /home/lcg/tmp/torchtune/ascend_llama/hf_model_0003_9.pt
INFO:torchtune.utils._logging:Model checkpoint of size 1.17 GB saved to /home/lcg/tmp/torchtune/ascend_llama/hf_model_0004_9.pt
INFO:torchtune.utils._logging:Saving final epoch checkpoint.
INFO:torchtune.utils._logging:The full model checkpoint, including all weights and configurations, has been saved successfully. You can now use this checkpoint for further training or inference.
10|20|Loss: 0.2997712790966034: 100%|██████████████████████████████| 2/2 [01:00<00:00, 30.03s/it]
```

## Generating with Fine-Tuned Models

In the previous section, we fine-tuned the model on an identity-related dataset similar to [identity.json](https://huggingface.co/datasets/ilyq69/identity.json), with some adjustments of our own.
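
For reference, an identity-style record in *ascend_dataset.json* might look like the following. This sample is illustrative, with field names following the identity.json format, and is not the exact dataset used for this post. The instruction asks "Who are you?", and the output is the identity response you will see in the generation demo below:

```
[
  {
    "instruction": "你是谁?",
    "input": "",
    "output": "您好,我是 Torchtune Helper,由 PyTorch 开发,旨在为用户提供智能化的回答和帮助。"
  }
]
```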

In this section, we will use our model to perform some generation tasks. For this, we’ll use the [generate recipe](https://github.com/pytorch/torchtune/blob/main/recipes/generate.py) and the associated [config](https://github.com/pytorch/torchtune/blob/main/recipes/configs/generation.yaml).

Let’s first copy over the config to our local working directory so we can make changes.

```
tune cp generation ./ascend_generation_config.yaml
```

Let’s modify ***ascend_generation_config.yaml*** to include the following changes. You only need to replace two fields: **output_dir** and **checkpoint_files**.

```
# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: ${output_dir}/original/tokenizer.model
  prompt_template: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ${output_dir}
  checkpoint_files: [
    hf_model_0001_9.pt,
    ……
    hf_model_0004_9.pt,
  ]
  output_dir: ${output_dir}

# Generation arguments; defaults taken from gpt-fast
prompt:
  system: null
  user: "你是谁?"  # "Who are you?"

# Environment
device: npu

# Other Configs …
```

Next, we will run our generate recipe.

```
tune run generate --config ascend_generation_config.yaml
```

The results of the execution are as follows, and we can see that our assistant has learned to identify itself as the Torchtune Helper! Its reply to "你是谁?" ("Who are you?") translates to: "Hello, I am Torchtune Helper, developed by PyTorch, designed to provide users with intelligent answers and help."

```
……
INFO:torchtune.utils._logging:你是谁?您好,我是 Torchtune Helper,由 PyTorch 开发,旨在为用户提供智能化的回答和帮助。
INFO:torchtune.utils._logging:Time for inference: 4.75 sec total, 5.47 tokens/sec
INFO:torchtune.utils._logging:Bandwidth achieved: 89.18 GB/s
INFO:torchtune.utils._logging:Memory used: 0.00 GB
```