Checklist
Motivation
We need to tune the FlashInfer allreduce fusion workspace buffer sizes to get the best performance in more general cases. Currently, max_token_num=2048 is hardcoded here.
Refer to the benchmark results here.
Related resources
No response