Add mark step and inplace residual add in llama model code to reduce memory consumption #65
Conversation
Mark step helps reduce workspace memory by approximately twice the size of (BS, seq len, hidden dim). The in-place add helps reduce persistent tensors by approximately twice the size of (BS, seq len, hidden dim). Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>
@dvarshney-habana - please review.
Please add the mark_step calls under the lazy mode flag. The same modeling file is also used for torch.compile mode, where mark_step is not relevant.
@MrGeva - You may want to review this. Accuracy seems fine. However, I need to address the mark step comment from Vivek and also need to check the finetuning script.
@mandy-li - this PR is very important from a memory usage perspective for llama inference. As an example, for the config of BS=172, seq len=2048, hidden dim=8191 for llama-70B on 8x, the tensor size is ~5.3 GB.
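A quick back-of-the-envelope check of the quoted ~5.3 GB figure, assuming bf16 activations (2 bytes per element; the dtype is an assumption, not stated in the thread). The saving described in the PR is roughly twice this tensor's size.

```python
# Size of one (BS, seq len, hidden dim) activation tensor in bf16.
bs, seq_len, hidden_dim = 172, 2048, 8191
bytes_per_elem = 2  # bf16 (assumption)
tensor_bytes = bs * seq_len * hidden_dim * bytes_per_elem
tensor_gib = tensor_bytes / 2**30
print(f"{tensor_gib:.2f} GiB")  # roughly the quoted ~5.3 GB
```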
@schoi-habana - Can you check finetuning once with this PR?
@vivekgoe - lazy mode flag and check added.
LGTM.
The in-place add caused a loss divergence issue during training, so the PR has been updated to perform the in-place add operation only in inference. Ran the command below without any of this PR's fixes: Ran the command below with the updated changes in this PR (only the mark step fix applies to finetuning): @libinta, @schoi-habana - FYI.
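The inference-only gating of the residual add can be sketched as follows. This is a hedged sketch under assumptions, not the PR's actual code: the function name `residual_add` and the `training` flag are illustrative, and NumPy arrays stand in for device tensors to show the in-place vs. out-of-place distinction.

```python
import numpy as np

def residual_add(hidden_states, residual, training):
    """Sketch of an inference-only in-place residual add.

    In-place updates can clobber buffers that backprop still needs,
    which matches the loss divergence seen in training, so the
    in-place path is taken only at inference time.
    """
    if training:
        # Out-of-place: allocates a fresh buffer, safe for backprop.
        return hidden_states + residual
    # In-place: reuses hidden_states' memory, avoiding an extra
    # persistent (BS, seq len, hidden dim) sized buffer per add.
    hidden_states += residual
    return hidden_states
```

At inference the returned array is the same object as the input (no new allocation); during training a fresh buffer is returned.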
…memory consumption (#65) * Add mark step and inplace add. Mark step helping in reducing workspace memory by approx twice of (BS, seq len, hidden dim). Inplace add helping in reducing persistent tensors by approx twice of (BS, seq len, hidden dim). Signed-off-by: Puneesh Khanna <pkhanna@habana.ai> * Add lazy mode parameter * Move mark step within the loop * Move mark step before the loop * Fix indentation * update in place add only for inference --------- Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>
What does this PR do?
Mark step helps in reducing workspace memory by approx twice the size of (BS, seq len, hidden dim).
Inplace add helps in reducing persistent tensors by approx twice the size of (BS, seq len, hidden dim).