[Feature] Support flashinfer allreduce fusion tuning

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

We need to tune the flashinfer allreduce fusion workspace buffer sizes to get the best performance in more general cases. Currently, `max_token_num=2048` is hardcoded [here](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/flashinfer_comm_fusion.py#L208).

See the benchmark results [here](https://github.com/vllm-project/vllm/pull/22086#issuecomment-3178874180).
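
One possible shape for the tunable, as a rough sketch only: instead of a fixed `2048`, the token limit could be resolved from an explicit override or from a per-world-size workspace budget. Everything below is hypothetical and not existing sglang or flashinfer API: the helper name `resolve_max_token_num`, the environment variable `EXAMPLE_ALLREDUCE_FUSION_MAX_TOKEN_NUM`, the budget table, and the assumption that the workspace scales as `max_token_num * hidden_size * dtype_bytes`.

```python
import os

# The value currently hardcoded, per this issue.
DEFAULT_MAX_TOKEN_NUM = 2048

# Hypothetical environment variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_ALLREDUCE_FUSION_MAX_TOKEN_NUM"


def resolve_max_token_num(world_size, hidden_size, dtype_bytes=2,
                          budget_mb_by_world_size=None):
    """Pick max_token_num for the fusion workspace (illustrative only).

    Priority: explicit env override, then a per-world-size budget in MB,
    then the current default of 2048.
    """
    override = os.environ.get(ENV_VAR)
    if override is not None:
        value = int(override)
        if value <= 0:
            raise ValueError(f"{ENV_VAR} must be positive, got {value}")
        return value

    if not budget_mb_by_world_size or world_size not in budget_mb_by_world_size:
        return DEFAULT_MAX_TOKEN_NUM

    # Assumption: one token needs hidden_size * dtype_bytes bytes of workspace.
    budget_bytes = int(budget_mb_by_world_size[world_size] * 1024 * 1024)
    return max(1, budget_bytes // (hidden_size * dtype_bytes))
```

For example, with a hypothetical 64 MB budget at world size 2, `hidden_size=4096`, and a 2-byte dtype, this yields 8192 tokens; the actual budgets would have to come from benchmarking, which is what this issue asks for.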

### Related resources

_No response_

[Feature] Support flashinfer allreduce fusion tuning #9901

Checklist

Motivation

Related resources

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature] Support flashinfer allreduce fusion tuning #9901

Description

Checklist

Motivation

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions