TE convert model with deferred initialization #3646

mayukh-stackav · 2025-06-20T15:57:46Z

This PR adds a memory-efficient way to convert models with Transformer Engine by using lazy (deferred) weight initialization. Transformer Engine added support for deferred initialization in NVIDIA/TransformerEngine#596, and this PR brings that support into the `convert_model` function. Loading a large model directly into memory can cause OOMs, especially in FSDP training workflows. With this change, the model no longer has to be initialized before it is passed into the FSDP wrapper.
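A minimal sketch of the idea follows. It uses plain PyTorch and is not the code in this PR. `convert_linear_lazily` and `StandInTELinear` are hypothetical names. `StandInTELinear` stands in for `transformer_engine.pytorch.Linear` so that the sketch runs without Transformer Engine installed, and the real layer's constructor arguments may differ. The key point is that when a layer already lives on the `meta` device, the replacement is also created on `meta` and no weights are copied. Nothing is allocated until the model is materialized later, for example by the FSDP wrapper.

```python
import torch
import torch.nn as nn


class StandInTELinear(nn.Linear):
    """Hypothetical stand-in for transformer_engine.pytorch.Linear,
    used only so this sketch runs without Transformer Engine."""


def convert_linear_lazily(module: nn.Module, replacement_cls=StandInTELinear) -> None:
    """Hypothetical helper: recursively swap nn.Linear children for replacement_cls.

    Layers on the meta device are replaced by meta-device layers without copying
    weights (deferred initialization). Layers on a real device are replaced
    and have their existing weights copied over (eager path).
    """
    # Materialize the list so that replacing children while iterating is safe.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear) and not isinstance(child, replacement_cls):
            has_bias = child.bias is not None
            device = child.weight.device
            new = replacement_cls(
                child.in_features,
                child.out_features,
                bias=has_bias,
                device=device,
                dtype=child.weight.dtype,
            )
            if device.type != "meta":
                # Eager path: weights already exist, so copy them.
                with torch.no_grad():
                    new.weight.copy_(child.weight)
                    if has_bias:
                        new.bias.copy_(child.bias)
            # On the meta device there is nothing to copy; parameters stay unallocated.
            setattr(module, name, new)
        else:
            convert_linear_lazily(child, replacement_cls)


if __name__ == "__main__":
    # Build the model on the meta device (the context manager needs PyTorch >= 2.0).
    with torch.device("meta"):
        lazy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    convert_linear_lazily(lazy)
    print(type(lazy[0]).__name__, lazy[0].weight.is_meta)
```

Skipping the copy on the meta path is what keeps peak memory low. The trade-off is that the converted layers still need a real initialization step, such as an FSDP `param_init_fn` or the module's own reset logic, before training.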

Review

Fully-Sharded Data Parallelism: @SunMarc @zach-huggingface

github-actions · 2025-07-21T15:08:23Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

mayukh-stackav force-pushed the transformer-engine-meta-device-loading branch from 753277a to c3bfab1 June 23, 2025 13:16

te convert model lazy loading

b3c9e56

mayukh-stackav force-pushed the transformer-engine-meta-device-loading branch from c3bfab1 to b3c9e56 June 23, 2025 13:17

S1ro1 mentioned this pull request Jun 25, 2025

Transformer Engine memory-efficient initialization to convert_model for large models #3652

Open
