Commit 65387d3
add DTensor to optimizer state dict (#2585)
Summary:
To support 2D parallelism checkpointing, we introduce DTensor to the optimizer state dict. It is enabled through fused_params["output_dtensor"] = True, meaning when table shards are outputted in DTensor so are optimizer shards.
This diff allows us to leverage N-dimensional device meshes with support for abritrary replication/sharding groups - making checkpointing easy as DCP/Modelstore support replicated/sharded placements on a device mesh (something that is unsupported in ShardedTensor)
Differential Revision: D655554551 parent fad795e commit 65387d3
File tree
3 files changed
+106
-22
lines changed- torchrec
- distributed
- optim
3 files changed
+106
-22
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| 49 | + | |
49 | 50 | | |
50 | 51 | | |
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
55 | 56 | | |
| 57 | + | |
56 | 58 | | |
57 | 59 | | |
| 60 | + | |
58 | 61 | | |
59 | 62 | | |
60 | 63 | | |
| |||
213 | 216 | | |
214 | 217 | | |
215 | 218 | | |
| 219 | + | |
216 | 220 | | |
217 | 221 | | |
218 | 222 | | |
| |||
389 | 393 | | |
390 | 394 | | |
391 | 395 | | |
392 | | - | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
393 | 400 | | |
394 | 401 | | |
395 | 402 | | |
| |||
410 | 417 | | |
411 | 418 | | |
412 | 419 | | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
413 | 423 | | |
414 | 424 | | |
415 | 425 | | |
| |||
474 | 484 | | |
475 | 485 | | |
476 | 486 | | |
477 | | - | |
| 487 | + | |
478 | 488 | | |
479 | 489 | | |
480 | 490 | | |
| |||
528 | 538 | | |
529 | 539 | | |
530 | 540 | | |
531 | | - | |
532 | | - | |
533 | | - | |
534 | | - | |
535 | | - | |
536 | | - | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
537 | 576 | | |
538 | 577 | | |
539 | 578 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
71 | | - | |
| 71 | + | |
72 | 72 | | |
73 | 73 | | |
74 | 74 | | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
75 | 80 | | |
76 | 81 | | |
77 | 82 | | |
| |||
110 | 115 | | |
111 | 116 | | |
112 | 117 | | |
| 118 | + | |
113 | 119 | | |
114 | 120 | | |
115 | 121 | | |
| |||
153 | 159 | | |
154 | 160 | | |
155 | 161 | | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | | - | |
162 | | - | |
163 | | - | |
164 | | - | |
165 | | - | |
166 | | - | |
167 | | - | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
168 | 184 | | |
169 | 185 | | |
170 | 186 | | |
| |||
220 | 236 | | |
221 | 237 | | |
222 | 238 | | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
223 | 249 | | |
224 | 250 | | |
225 | 251 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
| 30 | + | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
| |||
111 | 111 | | |
112 | 112 | | |
113 | 113 | | |
| 114 | + | |
| 115 | + | |
114 | 116 | | |
115 | 117 | | |
116 | 118 | | |
| |||
134 | 136 | | |
135 | 137 | | |
136 | 138 | | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
137 | 156 | | |
138 | 157 | | |
139 | 158 | | |
| |||
0 commit comments