Revert hack that leads to OOM during fine-tuning #3858

arnavgarg1 · 2024-01-04T19:55:57Z

A few weeks ago, we merged #3830 as a temporary workaround for a fine-tuning issue with gradient checkpointing that was introduced in Transformers 4.36. The original issue is tracked here: huggingface/transformers#28023

Since then, the Transformers team has shipped a patch release (Transformers 4.36.2) that fixes the original issue for all model types, including Llama-2, Mixtral, Phi, etc. However, the interaction between the new transformers version and our hacky fix causes memory to balloon: we manually map each module to the correct mode and, in the process, set some modules to train mode when they shouldn't be.

The workaround from #3830 is no longer needed once transformers is at 4.36.2 or later, since that patch release contains the upstream fix. I've pinned the minimum version of transformers to 4.36.2 as part of this PR.
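For anyone who wants to confirm their environment meets the new minimum before fine-tuning, here is a minimal sketch of the version comparison. It is not code from this PR: `MIN_TRANSFORMERS`, `release_tuple`, and `meets_minimum` are hypothetical names, and the parsing only looks at the leading numeric release segment, so suffixes such as `.dev0` or `rc1` are ignored.

```python
import re

# Minimum version named in this PR description (assumed to be the pinned value).
MIN_TRANSFORMERS = "4.36.2"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment as a tuple of ints.

    Hypothetical helper: "4.37.0.dev0" becomes (4, 37, 0); any
    pre-release or dev suffix is dropped rather than compared.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Check whether `installed` is at least `minimum`."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad the shorter tuple with zeros so "4.37" compares like "4.37.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

In a real environment the installed version string could be read with `importlib.metadata.version("transformers")` and passed to `meets_minimum`. A full version parser would also be needed if pre-release ordering matters.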

alexsherstinsky

LGTM

github-actions · 2024-01-04T20:22:36Z

Unit Test Results

  6 files ±0   6 suites ±0 14m 12s ⏱️ +20s
12 tests ±0   9 ✔️ ±0   3 💤 ±0 0 ❌ ±0
60 runs ±0 42 ✔️ ±0 18 💤 ±0 0 ❌ ±0

Results for commit afe1867. ± Comparison against base commit d45566b.

arnavgarg1 added 2 commits January 5, 2024 01:22

Revert hack that leads to OOM during fine-tuning

4c61baf

Pin minimum version of transformers

afe1867

arnavgarg1 requested review from w4nderlust, tgaddair, justinxzhao, geoffreyangus, jeffkinnison, Infernaught and alexsherstinsky as code owners January 4, 2024 19:55

alexsherstinsky approved these changes Jan 4, 2024

tgaddair approved these changes Jan 4, 2024

arnavgarg1 merged commit 29ad837 into master Jan 4, 2024
18 checks passed

arnavgarg1 deleted the revert_oom_commit branch January 4, 2024 20:37
