@shuningjin (Collaborator) commented Aug 29, 2025

Description

The change in #2023 altered the definition of "activation_length". Specifically, "expert" was added to it, which causes some conflicts:

  • As a result, it should not be used together with "activation_batch"; otherwise sharding such as CP/SP will be ignored. See details in b/441547754.
  • Similarly, it should not be used together with "activation_embed_and_logits_batch", e.g., b/433561718#comment22.
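The conflict described above can be illustrated with a toy resolver. This is a minimal sketch, not the MaxText or Flax implementation: the rule table, the mesh axis names, and the `resolve` function are hypothetical, and it assumes that a rule is skipped as a whole when any of its mesh axes has already been claimed by another dimension.

```python
# Toy model of logical-axis resolution; NOT the MaxText or Flax implementation.
# The rule table and mesh axis names are hypothetical, chosen only to show
# how a shared "expert" mesh axis can cause a sequence dimension to lose its sharding.
RULES = [
    ("activation_batch", ("data", "fsdp", "expert")),
    ("activation_length", ("sequence", "context", "expert")),
    ("activation_length_no_exp", ("sequence", "context")),
    ("activation_embed", ("tensor",)),
]


def resolve(logical_axes, rules=RULES):
    """Map each logical axis name to mesh axes, or None if left unsharded."""
    result = [None] * len(logical_axes)
    used = set()
    for name, mesh_axes in rules:
        if name not in logical_axes:
            continue
        pos = logical_axes.index(name)
        # Assumption: a rule is skipped entirely when any of its mesh axes
        # is already claimed by another dimension of the same array.
        if result[pos] is None and used.isdisjoint(mesh_axes):
            result[pos] = mesh_axes
            used.update(mesh_axes)
    return result


# "expert" is claimed by the batch dimension, so the length rule is skipped.
conflicting = resolve(["activation_batch", "activation_length", "activation_embed"])
# Without "expert" in the length rule, the sequence/context sharding survives.
restored = resolve(["activation_batch", "activation_length_no_exp", "activation_embed"])

print(conflicting)
print(restored)
```

Under these assumptions, the length dimension in the first case ends up unsharded, which mirrors the "CP/SP will be ignored" symptom; in the second case it keeps its sequence and context axes.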

1. For the conflicting combination "activation_batch" + "activation_length" within a file, update "activation_length" to either "activation_length_no_exp" or "activation_norm_length".

  • We can restore the old definition by using "activation_length_no_exp".
  • In some cases, it should be "activation_norm_length", aligning with Tensor Sequence Parallelism (TSP).

2. For the conflicting combination "activation_embed_and_logits_batch" + "activation_length", we can restore the old definition by using "activation_length_no_exp".
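The two rules above could be checked mechanically. The following is a hypothetical helper, not part of this PR: the names `has_conflict` and `restore_old_definition` are invented for illustration, and the helper only applies the "activation_length_no_exp" fallback. It does not decide when "activation_norm_length" is the right choice for TSP.

```python
# Hypothetical lint-style helper; not part of the PR or of MaxText.
# It flags logical-axis tuples that combine "activation_length" with one of
# the two batch names listed above, and applies the "_no_exp" fallback.
CONFLICTS_WITH_ACTIVATION_LENGTH = {
    "activation_batch",
    "activation_embed_and_logits_batch",
}


def has_conflict(logical_axes):
    """Return True if "activation_length" is paired with a conflicting batch name."""
    names = set(logical_axes)
    return "activation_length" in names and bool(
        names & CONFLICTS_WITH_ACTIVATION_LENGTH
    )


def restore_old_definition(logical_axes):
    """Swap "activation_length" for "activation_length_no_exp" when it conflicts."""
    if not has_conflict(logical_axes):
        return tuple(logical_axes)
    return tuple(
        "activation_length_no_exp" if axis == "activation_length" else axis
        for axis in logical_axes
    )


before = ("activation_embed_and_logits_batch", "activation_length", "activation_vocab")
after = restore_old_definition(before)
print(after)
```

Cases that should follow TSP, such as the decoder block, would still need a manual switch to "activation_norm_length".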

FIXES: b/441547754

1 Changes related to "activation_batch"

TSP should use "activation_norm_length" for the decoder block

TSP should use "activation_length_no_exp" for MLP and MoE

  • As in the TSP PR, these remained "activation_length", which should now be "activation_length_no_exp".
  • linears.py: restore, activation_length -> activation_length_no_exp
  • moe.py: restore, activation_length -> PR2023 -> activation_norm_length -> activation_length_no_exp

Additional changes

  • gpt3.py: attention layer, restore
  • test: restore
  • pyconfig.py: correct typo

2 Changes related to "activation_embed_and_logits_batch"

train.py, decoder.py, embedding.py: restore to "activation_length_no_exp"

Tests

N/A

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@github-actions

This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Automatically applied to stale PRs. label Sep 30, 2025

github-actions bot commented Oct 8, 2025

This PR was closed because it has been inactive for a while. Please reopen it if you are still working on it.

@github-actions github-actions bot closed this Oct 8, 2025