@grimoire
Collaborator

backends/moe.py and nn/moe.py have been refactored.
Reuse token dispatcher in DLBlas

@lvhan028 lvhan028 requested a review from CUHKSZzxy November 24, 2025 03:49
# we don't need to read this, it would be passed to ray workers
# If Ray is launched from outside, it may fail to access the environment variables.
os.getenv('DEEPEP_MAX_BATCH_SIZE', None)
os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', None)
Collaborator

Do we need to set those envs manually?
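
For context on the question, here is a minimal sketch of one way to forward these variables explicitly instead of relying on the worker inheriting them. The helper name `collect_env_vars` and its `environ` parameter are hypothetical and not part of this PR; only the two variable names come from the diff.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Variable names taken from the diff under review.
DEEPEP_ENV_KEYS = ('DEEPEP_MAX_BATCH_SIZE', 'DEEPEP_MAX_TOKENS_PER_RANK')


def collect_env_vars(keys: Iterable[str] = DEEPEP_ENV_KEYS,
                     environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the subset of `keys` that is set in `environ` (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Unset variables are skipped rather than forwarded as empty strings.
    return {key: env[key] for key in keys if key in env}


# An explicit mapping is used here so the result does not depend on the shell.
forwarded = collect_env_vars(environ={'DEEPEP_MAX_BATCH_SIZE': '128', 'PATH': '/usr/bin'})
print(forwarded)  # {'DEEPEP_MAX_BATCH_SIZE': '128'}
```

A dict like `forwarded` could then be handed to Ray through `runtime_env={'env_vars': forwarded}`. Whether this PR does so is not stated in the thread.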

Collaborator Author

hidden_dim: int,
top_k: int,
layer_idx: int = 0,
chunk_size: Optional[int] = 32 * 1024,
Collaborator

chunk_size is not used by FusedMoENormal

2 participants