Add Context Parallelism support to cudnn Flash Attention #1133
Conversation
gobbleturk left a comment
Awesome, this looks great! Just one change to the sharding rules (with the current state of this PR, it looks like the old sequence parallelism is broken).
I'd appreciate it if you could also run bash code_style.sh to run our linter (I recommend saving the branch before running this just in case...)
A9isha left a comment
This is great! Thank you so much.
Curious, what kind of improvement are you observing with load balancing enabled?
Oh, also please merge all commits into one before pushing.
For llama3-8b, with load balancing and cp=2, we see roughly a 10% performance improvement.
@gobbleturk @A9isha once I get an LGTM from you, I can work on resolving the merge conflicts and also merge all the commits into one.
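For context on why load balancing helps here: with a causal mask, a naive contiguous split of the sequence across CP ranks gives the later ranks far more key positions to attend to than the earlier ones. The sketch below (plain Python, not MaxText code; the cp=2 and seq_len values are illustrative) quantifies that imbalance.

```python
# Rough illustration (not MaxText code) of the causal-mask work imbalance that
# load balancing removes when the sequence is split contiguously across cp ranks.
def causal_work(q_positions):
    # Under a causal mask, query position i attends to keys 0..i,
    # so its cost is proportional to (i + 1).
    return sum(i + 1 for i in q_positions)

seq_len, cp = 8192, 2
chunk = seq_len // cp
per_rank = [causal_work(range(r * chunk, (r + 1) * chunk)) for r in range(cp)]
total = sum(per_rank)
print([round(w / total, 3) for w in per_rank])  # ~[0.25, 0.75]: rank 1 does ~3x rank 0's work
```

Since the training step waits on the slowest rank, rebalancing the split so each rank does an equal share of attention work can plausibly account for an end-to-end gain on the order of the ~10% reported above.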
gobbleturk left a comment
LGTM! I would like either @khatwanimohit or @aireenmei to review the data loading change (enable_packing=True).
A9isha left a comment
LGTM - thank you!
Force-pushed a3e1739 to 5016d35
Hi @gobbleturk, could you please approve so that the unit tests can run?
Force-pushed 5016d35 to dbf3d70
1. Implemented input sequence re-ordering at the data loading stage.
2. Added a context_parallel_load_balance flag in base.yml to turn load balancing on/off.
3. Added the sequence packing flag enable_packing in base.yml and modified the associated data processing files.
4. Added/modified unit tests for context parallelism and GPU flash attention (cudnn_flash_te).
5. Added sharding of parameters across CP ranks (modified the logical axis sharding rules).
6. Fixed the q, k, v and out_proj sharding axis names by adding MODEL_MODE_TRAIN in attention.py.
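Item 1 above is the key piece for load balancing. A common approach, sketched below in plain Python with NumPy as an illustration of the general technique rather than the PR's exact implementation, is to cut each sequence into 2*cp chunks at the data loading stage and give rank r the pair of chunks r and 2*cp-1-r, so every rank holds one cheap (early) chunk and one expensive (late) chunk under the causal mask.

```python
import numpy as np

def reorder_for_cp_load_balance(tokens: np.ndarray, cp: int) -> np.ndarray:
    """Reorder a [batch, seq_len] token array so that a contiguous split across
    cp context-parallel ranks is balanced under a causal mask.

    The sequence is cut into 2*cp equal chunks; rank r receives chunks
    r and (2*cp - 1 - r), pairing an early (cheap) chunk with a late
    (expensive) chunk.
    """
    _, seq_len = tokens.shape
    assert seq_len % (2 * cp) == 0, "seq_len must be divisible by 2*cp"
    chunks = np.split(tokens, 2 * cp, axis=1)
    order = []
    for r in range(cp):
        order += [r, 2 * cp - 1 - r]
    return np.concatenate([chunks[i] for i in order], axis=1)

# Tiny example: seq_len=8, cp=2 -> chunk order [0, 3, 1, 2].
x = np.arange(8)[None, :]
print(reorder_for_cp_load_balance(x, cp=2))  # [[0 1 6 7 2 3 4 5]]
```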
Force-pushed dbf3d70 to 336fb38
A9isha left a comment
Thank you!
aireenmei left a comment
LGTM for data input changes!
All tests passed; manually adding the pull ready label.
The pull ready label may also have been applied automatically after I allowed the next workflow steps to run; I'm not sure.
Merged commit c4f7060 into AI-Hypercomputer:main
Description
This PR adds Context Parallelism support to GPU Flash Attention. It is necessary to support large sequence lengths in MaxText. Right now, the support is offered through Transformer-Engine and uses an All-Gather type implementation. Note that it requires the mask type to be causal and does not work with sliding window attention yet. It also requires transformer-engine==1.13 or above.
NEW
- context_parallel_load_balance flag in base.yml to turn load balancing on/off.
- enable_packing flag in base.yml, plus modifications to the associated data processing files.
- Unit tests for context parallelism and GPU flash attention (cudnn_flash_te).
Tests
A unit test is included with the PR, run with the base model on 4 x A100 GPUs.
Checklist
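As a usage sketch (not part of the PR text): the new flags would be combined with an ordinary MaxText launch roughly as follows. context_parallel_load_balance and enable_packing come from this PR, and attention=cudnn_flash_te selects the TE cuDNN flash attention path it targets; the run name and the ici_context_parallelism mesh override are assumptions for illustration.

```python
# Minimal launch sketch under the stated assumptions; run from a MaxText checkout.
import subprocess

overrides = [
    "run_name=cp_llama3_8b_test",          # illustrative run name
    "attention=cudnn_flash_te",            # TE cuDNN flash attention backend
    "ici_context_parallelism=2",           # assumed flag name for the CP mesh dimension
    "context_parallel_load_balance=True",  # new flag introduced by this PR
    "enable_packing=True",                 # new flag introduced by this PR
]

subprocess.run(
    ["python3", "MaxText/train.py", "MaxText/configs/base.yml", *overrides],
    check=True,
)
```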