
Commit 4144d34

Translate intermediate_source/process_group_cpp_extension_tutorial.rst (#764)
Translate intermediate_source/tensorboard_profiler_tutorial.py
1 parent 139ae22 commit 4144d34

File tree

1 file changed

+55
-91
lines changed


intermediate_source/process_group_cpp_extension_tutorial.rst

Lines changed: 55 additions & 91 deletions
@@ -1,78 +1,55 @@
Customize Process Group Backends Using Cpp Extensions
=====================================================

**Author**: `Feng Tian <https://github.com/ftian1>`__, `Shen Li <https://mrshenli.github.io/>`__, `Min Si <https://minsii.github.io/>`__

**Translator**: `박재윤 <https://github.com/jenner9212>`_

.. note::
   |edit| View and edit this tutorial in `github <https://github.com/pytorch/tutorials/blob/main/intermediate_source/process_group_cpp_extension_tutorial.rst>`__.

Prerequisites:

- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
- `PyTorch Collective Communication Package <https://pytorch.org/docs/stable/distributed.html>`__
- `PyTorch Cpp Extension <https://pytorch.org/docs/stable/cpp_extension.html>`__
- `Writing Distributed Applications with PyTorch <https://tutorials.pytorch.kr/intermediate/dist_tuto.html>`__

This tutorial demonstrates how to implement a custom ``ProcessGroup``
backend and plug that into
`PyTorch distributed package <https://pytorch.org/docs/stable/distributed.html>`__ using
`cpp extensions <https://pytorch.org/docs/stable/cpp_extension.html>`__. This is helpful when you need a specialized software
stack for your hardware, or when you would like to experiment with new
collective communication algorithms.


Basics
------

PyTorch collective communications power several widely adopted distributed
training features, including
`DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__,
`ZeroRedundancyOptimizer <https://pytorch.org/docs/stable/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer>`__, and
`FullyShardedDataParallel <https://github.com/pytorch/pytorch/blob/master/torch/distributed/_fsdp/fully_sharded_data_parallel.py>`__.
In order to make the same collective communication API work with
different communication backends, the distributed package abstracts collective
communication operations into a
`ProcessGroup <https://github.com/pytorch/pytorch/blob/release/1.10/torch/csrc/distributed/c10d/ProcessGroup.hpp>`__
class. Different backends can
then be implemented as subclasses of ``ProcessGroup`` using preferred
third-party libraries. PyTorch distributed comes with three default backends,
``ProcessGroupNCCL``, ``ProcessGroupGloo``, and ``ProcessGroupMPI``. However,
beyond these three backends, there are also other communication libraries
(e.g., `UCC <https://github.com/openucx/ucc>`__,
`OneCCL <https://github.com/oneapi-src/oneCCL>`__), different types of hardware
(e.g., `TPU <https://cloud.google.com/tpu>`__,
`Trainium <https://aws.amazon.com/machine-learning/trainium/>`__), and emerging
communication algorithms (e.g.,
`Herring <https://www.amazon.science/publications/herring-rethinking-the-parameter-server-at-scale-for-the-cloud>`__,
`Reduction Server <https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai>`__).
Therefore, the distributed package exposes extension APIs to allow customizing
collective communication backends.


The four steps below show how to implement a dummy ``ProcessGroup`` backend
and use that in Python application code. Please note that this tutorial focuses
on demonstrating the extension APIs, instead of developing a functioning
communication backend. Hence, the ``dummy`` backend just covers a subset of the
APIs (``all_reduce`` and ``all_gather``), and simply sets the values of tensors
to 0.


Step 1: Implement a Subclass of ``ProcessGroup``
------------------------------------------------

This first step is to implement a ``ProcessGroup`` subclass that overrides
target collective communication APIs and runs the custom communication algorithm.
The extension also needs to implement a ``Work`` subclass, which
serves as a future of communication results and allows asynchronous execution in
application code. If the extension uses third-party libraries, it can
include the headers and call into the library APIs from the ``ProcessGroupDummy``
subclass. The two code snippets below present the implementation of ``dummy.hpp`` and
``dummy.cpp``. See the `dummy collectives <https://github.com/mrshenli/dummy_collectives>`__
repository for the full implementation.

.. code-block:: cpp

   // file name: dummy.hpp
   #include <torch/python.h>

   #include <torch/csrc/distributed/c10d/ProcessGroup.hpp>
@@ -98,8 +75,8 @@ repository for the full implementation.
         std::vector<at::Tensor>& tensors,
         const AllreduceOptions& opts = AllreduceOptions()) override;

     // The collective communication APIs without a custom implementation
     // will error out if invoked by application code.
   };

   class WorkDummy : public Work {
@@ -108,12 +85,11 @@ repository for the full implementation.
         OpType opType,
         c10::intrusive_ptr<c10::ivalue::Future> future) // future of the output
         : Work(
               -1, // rank, only used by recvAnySource, irrelevant in this demo
               opType),
           future_(std::move(future)) {}
     // There are several additional helper functions that need to be
     // implemented. Please refer to https://github.com/mrshenli/dummy_collectives
     // for the full implementation.
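     // As an illustration only -- a sketch, not the repository's exact
     // code -- those helper overrides can be trivial here, because the
     // dummy backend completes all of its work inline:
     bool isCompleted() override {
       return true;
     }
     bool isSuccess() const override {
       return true;
     }
     bool wait(std::chrono::milliseconds timeout) override {
       return true;
     }
     c10::intrusive_ptr<c10::ivalue::Future> getFuture() override {
       return future_;
     }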

    private:
     c10::intrusive_ptr<c10::ivalue::Future> future_;
@@ -123,13 +99,13 @@ repository for the full implementation.

.. code-block:: cpp

   // file name: dummy.cpp
   #include "dummy.hpp"

   namespace c10d {

   // This is a dummy allgather that sets all output tensors to zero
   // Modify the implementation to conduct real communication asynchronously
   c10::intrusive_ptr<Work> ProcessGroupDummy::allgather(
           std::vector<std::vector<at::Tensor>>& outputTensors,
           std::vector<at::Tensor>& inputTensors,
@@ -146,8 +122,8 @@ repository for the full implementation.
       return c10::make_intrusive<WorkDummy>(OpType::ALLGATHER, std::move(future));
   }

   // This is a dummy allreduce that sets all output tensors to zero
   // Modify the implementation to conduct real communication asynchronously
   c10::intrusive_ptr<Work> ProcessGroupDummy::allreduce(
           std::vector<at::Tensor>& tensors,
           const AllreduceOptions& opts) {
@@ -162,17 +138,14 @@ repository for the full implementation.
   }
   } // namespace c10d

Step 2: Expose The Extension Python APIs
----------------------------------------

The backend constructors are called
`from Python side <https://github.com/pytorch/pytorch/blob/v1.9.0/torch/distributed/distributed_c10d.py#L643-L650>`__,
so the extension also needs to expose the constructor APIs to Python. This can
be done by adding the following methods. In this example, ``store`` and
``timeout`` are ignored by the ``ProcessGroupDummy`` instantiation method, as
those are not used in this dummy implementation. However, real-world extensions
should consider using the ``store`` to perform rendezvous and supporting the
``timeout`` argument.

.. code-block:: cpp

@@ -187,8 +160,7 @@ should consider using the ``store`` to perform rendezvous and supporting the
     py::object module = py::module::import("torch.distributed");
     py::object register_backend =
         module.attr("Backend").attr("register_backend");
     // torch.distributed.Backend.register_backend will add `dummy` as a
     // new valid backend.
     register_backend("dummy", py::cpp_function(createProcessGroupDummy));
   }
   }
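
For orientation, here is a hedged sketch of the ``createProcessGroupDummy`` factory that the snippet above registers; the surrounding code is abridged in this diff, so treat the exact signature as illustrative rather than the repository's verbatim code.

.. code-block:: cpp

   // Sketch: the constructor-factory handed to register_backend above.
   // The store and timeout parameters are accepted but ignored, matching
   // the dummy backend described in this step.
   c10::intrusive_ptr<ProcessGroup> createProcessGroupDummy(
       const c10::intrusive_ptr<::c10d::Store>& /* store: unused */,
       int rank,
       int size,
       const std::chrono::duration<float>& /* timeout: unused */) {
     return c10::make_intrusive<ProcessGroupDummy>(rank, size);
   }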
@@ -208,22 +180,17 @@ should consider using the ``store`` to perform rendezvous and supporting the
   }

Step 3: Build The Custom Extension
----------------------------------

Now, the extension source code files are ready. We can then use
`cpp extensions <https://pytorch.org/docs/stable/cpp_extension.html>`__
to build it. To do that, create a ``setup.py`` file that prepares the paths and
commands. Then call ``python setup.py install`` to install the extension.

If the extension depends on third-party libraries, you can also specify
``library_dirs`` and ``libraries`` to the cpp extension APIs. See the
`torch ucc <https://github.com/openucx/torch-ucc>`__
project as a real-world example.

.. code-block:: python

   # file name: setup.py
   import os
   import sys
   import torch
@@ -253,20 +220,17 @@ project as a real-world example.
       cmdclass={'build_ext': cpp_extension.BuildExtension}
   )

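Since the diff shows only the head and tail of ``setup.py``, the following is a minimal self-contained sketch of what the whole file might look like; the source path and extension name below are assumptions for illustration, not the repository's verbatim contents.

.. code-block:: python

   # A minimal sketch of a complete setup.py for a cpp-extension backend,
   # assuming the C++ sources above live in src/dummy.cpp.
   import os

   from setuptools import setup
   from torch.utils import cpp_extension

   sources = ["src/dummy.cpp"]
   include_dirs = [f"{os.path.dirname(os.path.abspath(__file__))}/include/"]

   module = cpp_extension.CppExtension(
       name="dummy_collectives",     # illustrative module name
       sources=sources,
       include_dirs=include_dirs,
   )

   setup(
       name="Dummy-Collectives",
       version="0.0.1",
       ext_modules=[module],
       cmdclass={"build_ext": cpp_extension.BuildExtension},
   )
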
Step 4: Use The Extension in Application
----------------------------------------

After installation, you can conveniently use the ``dummy`` backend when calling
`init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__
as if it is a builtin backend.

.. code-block:: python

   import os

   import torch
   # importing dummy_collectives makes torch.distributed recognize `dummy`
   # as a valid backend.
   import dummy_collectives

   import torch.distributed as dist
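
The diff ends at the imports; a hedged sketch of how the application code might continue is shown below, with single-process rendezvous values chosen purely for illustration.

.. code-block:: python

   # Sketch: rendezvous settings for a single local process (illustrative).
   os.environ["MASTER_ADDR"] = "localhost"
   os.environ["MASTER_PORT"] = "29500"

   # Initialize the process group with the custom `dummy` backend,
   # exactly as one would with a builtin backend.
   dist.init_process_group("dummy", rank=0, world_size=1)

   x = torch.ones(6)
   dist.all_reduce(x)
   # The dummy all_reduce sets tensor values to zero.
   print(f"allreduce result: {x}")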
