
Conversation

@bvandermoon (Collaborator)

Description

Accept the inputs shape instead of the full inputs in Attention/MLA. This is needed because we are about to move Attention initialization into __init__ of the decoder layers, and the full inputs are not available at that point.

Example:

The Attention initialization needs to move to __init__ in LlamaDecoderLayer, but lnx is generated in __call__, so it won't be available in __init__.
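As a rough illustration of the pattern (a minimal Flax NNX sketch; ToyAttention, ToyDecoderLayer, and their parameters are made-up names for illustration, not the actual MaxText classes), the attention module can be constructed in __init__ from the inputs shape alone, while the activation lnx only comes into existence inside __call__:

import jax.numpy as jnp
from flax import nnx

class ToyAttention(nnx.Module):
  """Attention stub whose parameters depend only on the inputs shape."""
  def __init__(self, inputs_shape, num_heads, head_dim, rngs: nnx.Rngs):
    emb_dim = inputs_shape[-1]  # only the shape is needed, not the tensor
    self.out_proj = nnx.Linear(emb_dim, num_heads * head_dim, rngs=rngs)

  def __call__(self, x):
    return self.out_proj(x)

class ToyDecoderLayer(nnx.Module):
  def __init__(self, inputs_shape, rngs: nnx.Rngs):
    self.norm = nnx.LayerNorm(inputs_shape[-1], rngs=rngs)
    # Attention can now be built eagerly here: only inputs_shape is required.
    # The activation (lnx) does not exist until __call__ runs.
    self.attention = ToyAttention(inputs_shape, num_heads=2, head_dim=4, rngs=rngs)

  def __call__(self, inputs):
    lnx = self.norm(inputs)  # lnx is produced inside __call__
    return self.attention(lnx)

layer = ToyDecoderLayer(inputs_shape=(1, 8, 16), rngs=nnx.Rngs(0))
out = layer(jnp.ones((1, 8, 16)))  # output shape (1, 8, 8)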

Tests

  • Base model train gives same perf before/after:
python3 -m MaxText.train MaxText/configs/base.yml \
    run_name=<run_name> \
    base_output_directory=gs://<gcs_bucket> \
    dataset_type=synthetic \
    steps=10
  • Deepseek3-test gives same perf before/after (mla_naive_kvcache=False is not needed for train):
python3 -m MaxText.train MaxText/configs/base.yml \
    run_name=bvandermoon-$RANDOM \
    base_output_directory=gs://bvandermoon-multipod-maxtext \
    dataset_type=synthetic \
    steps=10 \
    model_name=deepseek3-test \
    mla_naive_kvcache=False \
    max_target_length=256 \
    per_device_batch_size=1 \
    ici_fsdp_parallelism=-1 \
    scan_layers=false \
    weight_dtype=bfloat16 opt_type=sgd

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@bvandermoon force-pushed the bvandermoon-nnx-attention-shape branch from 1f0ebd8 to dffa82b on August 6, 2025 at 17:04.
@NuojCheng (Collaborator) left a comment:
LGTM

@richjames0 (Collaborator) left a comment:
LGTM

@copybara-service bot merged commit 3c30585 into main on Aug 6, 2025 (23 checks passed).
@copybara-service bot deleted the bvandermoon-nnx-attention-shape branch on August 6, 2025 at 18:32.