
[Feature] Support flashinfer allreduce fusion tuning #9901

@BBuf


Motivation

We need to tune the flashinfer allreduce fusion workspace buffer sizes to get the best performance in more general cases. Currently, max_token_num=2048 is hardcoded here.

Refer to the benchmark results here.
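
As a rough sketch of what the tuning could look like (not the actual implementation): sweep a few candidate max_token_num values, measure each one, and keep the fastest. Everything named below (`pick_max_token_num`, `measure_latency_us`, `fake_latency_us`, the candidate list) is hypothetical and only for illustration. In practice the measurement callable would run the real fused allreduce benchmark.

```python
from typing import Callable, Dict, Iterable, Tuple


def pick_max_token_num(
    candidates: Iterable[int],
    measure_latency_us: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Measure every candidate and return (best candidate, all measurements)."""
    results = {c: measure_latency_us(c) for c in candidates}
    if not results:
        raise ValueError("candidates must not be empty")
    # Lowest latency wins; ties resolve to the first candidate measured.
    best = min(results, key=results.get)
    return best, results


def fake_latency_us(max_token_num: int) -> float:
    # Placeholder cost model so the sketch runs anywhere.
    # These numbers are NOT benchmark results.
    return abs(max_token_num - 1024) + 10.0


best, table = pick_max_token_num([512, 1024, 2048, 4096], fake_latency_us)
print(f"best max_token_num: {best}")
```

The selected value could then replace the hardcoded 2048, ideally chosen per configuration (for example per TP size or hidden size) rather than globally.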

Related resources

No response
