
Commit 81091f9

Revert "Translate through line 114 in intermediate_source/optimizer_step_in_backward_tutorial.py to Korean"
This reverts commit d867a41.
1 parent 890e430 commit 81091f9

1 file changed: +26 -26 lines changed

intermediate_source/optimizer_step_in_backward_tutorial.py

Lines changed: 26 additions & 26 deletions
@@ -11,7 +11,7 @@
 that is, as long as you do not need gradient accumulation)
 This tutorial covers the following:
 
-1. What takes up memory during a training or finetuning loop,
+1. What takes up memory during a training or finetuning step,
 2. How to capture and visualize memory snapshots to identify bottlenecks,
 3. The new ``Tensor.register_post_accumulate_grad_hook(hook)`` API, and
 4. How to save memory with just 10 lines of code that put all of this together.
@@ -97,35 +97,35 @@ def train(model, optimizer):
 # .. figure:: /_static/img/optim_step_in_bwd/snapshot.jpg
 #    :alt: snapshot.png loaded into CUDA Memory Visualizer
 #
-# Since the model parameters were already loaded into memory before the training loop,
-# a chunk of memory allocated to the weights is visible right from the start.
-# As the forward pass starts, memory is gradually allocated for the activations.
-# These activations are the tensors saved in order to compute gradients in the backward pass.
-# As the backward pass starts, the activations are gradually freed while the memory
-# taken by the gradients starts to build up.
-#
-# Lastly, once the optimizer kicks in, its state is lazily
-# initialized, so the optimizer state memory only grows gradually during the optimizer
-# step of the first training loop. In subsequent loops, the optimizer
-# memory remains and is updated in place. The memory taken by the gradients is
-# freed accordingly when ``zero_grad`` is called at the end of every training loop.
+# The model parameters have already been loaded in memory before the training
+# step, so we see a chunk of memory devoted to the weights right off the bat.
+# As we start our forward pass, memory is allocated gradually for the activations,
+# or the tensors we are saving to be able to compute gradients in the backward pass.
+# Once we start the backward pass, the activations are gradually freed while memory
+# of the gradients starts building up.
+#
+# Lastly, as the optimizer kicks in, its state will be lazily initialized, so we
+# should see the optimizer state memory gradually increase during the optimizer
+# step of the first training loop only. In future loops, the optimizer memory
+# will remain and be updated in-place. The memory for the gradients is then
+# freed accordingly at the end of every training loop when ``zero_grad`` is called.
 #
 # Where in this training loop does the memory bottleneck occur? That is, where is memory
 # usage at its highest?
 #
-# Memory usage is at its highest during the optimizer step! As expected, the memory at
-# that point consists of ~1.2GB of parameters, ~1.2GB of gradients, and ~2.4GB=2*1.2GB
-# of optimizer state. The last ~1.2GB is memory the Adam optimizer needs for its
-# intermediate steps, bringing the total to ~6GB.
-# In fact, setting ``Adam(model.parameters(), foreach=False)`` removes that last 1.2GB of
-# optimizer intermediate memory, which is a way of trading runtime for
-# memory. If this ``foreach`` optimization alone saves as much memory as you need,
-# great, but keep reading this tutorial if you want to learn about a better way!
-#
-# With the method we are about to introduce, the ~1.2GB of **gradient memory** and the
-# **optimizer intermediates memory** are no longer needed, lowering peak memory usage.
-# So what will the new peak memory usage be?
-# The answer is revealed in the `next` snapshot.
+# The peak memory usage is during the optimizer step! Note the memory then
+# consists of ~1.2GB of parameters, ~1.2GB of gradients, and ~2.4GB=2*1.2GB of
+# the optimizer state as expected. The last ~1.2GB comes from Adam optimizer
+# requiring memory for intermediates, totaling to ~6GB of peak memory.
+# Technically, you can remove the need for the last 1.2GB for optimizer
+# intermediates if you set ``Adam(model.parameters(), foreach=False)`` which
+# would trade off runtime for memory. If switching off the ``foreach`` runtime
+# optimization is sufficient in memory savings for you, nice, but please
+# read on if you're curious how this tutorial can help you do better!
+# With the technique we will soon introduce, we will reduce peak memory by
+# removing the need for the ~1.2GB of **gradients memory** as well as **optimizer
+# intermediates memory**. Now, what would you expect the new peak memory to be?
+# The answer will be revealed in the `next` snapshot.
 #
 # Caveat: this method is **not** suitable for all cases
 # """""""""""""""""""""""""""""""""""""""""""""

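As a rough sanity check on the numbers in the restored text, assuming fp32 parameters at 4 bytes each: ~1.2GB corresponds to roughly 300 million parameters; Adam keeps two fp32 buffers per parameter (the running averages ``exp_avg`` and ``exp_avg_sq``), which accounts for the ~2.4GB = 2*1.2GB of optimizer state; and the intermediates used by the default ``foreach`` path add one more parameter-sized chunk, the ~1.2GB that ``foreach=False`` trades away for runtime.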
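For context on where the restored text is heading: ``Tensor.register_post_accumulate_grad_hook(hook)`` lets an optimizer update run for each parameter as soon as that parameter's gradient has finished accumulating during the backward pass, so the gradient can be applied and freed immediately instead of being held until a separate ``optimizer.step()``. Below is a minimal sketch of that pattern, assuming one small Adam instance per parameter; the ``optimizer_dict`` and ``optimizer_hook`` names and the toy model are illustrative, not taken from this commit.

    import torch

    model = torch.nn.Linear(4096, 4096, device="cuda")

    # One optimizer per parameter; foreach=False since each one only sees a single tensor.
    optimizer_dict = {p: torch.optim.Adam([p], foreach=False) for p in model.parameters()}

    def optimizer_hook(parameter) -> None:
        # Runs right after this parameter's .grad has been accumulated,
        # while the backward pass is still in flight.
        optimizer_dict[parameter].step()
        optimizer_dict[parameter].zero_grad()

    for p in model.parameters():
        p.register_post_accumulate_grad_hook(optimizer_hook)

    # The training loop then only calls backward(); there is no separate
    # optimizer.step() or zero_grad() for these parameters.
    loss = model(torch.randn(64, 4096, device="cuda")).sum()
    loss.backward()

This is how both the gradient memory and the optimizer intermediates can drop out of the peak that the restored comments quantify: each gradient exists only for the moment between its accumulation and its hook.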