Issues: flashinfer-ai/flashinfer
C++ benchmarks CMake error caused by enable_fp16 option in generate.py (#734, opened Jan 13, 2025 by rtxxxpro)
[RFC]: Introducing ReproSpec for Strong Reproducibility in LLM Inference (#733, opened Jan 11, 2025 by yzh119)
Inconsistent results between different sequences with sequence lengths less than a single page size (#725, opened Jan 8, 2025 by fergusfinn)
RuntimeError: Qwen2-VL does not support _Backend.FLASHINFER backend now (#720, opened Jan 7, 2025 by duzw9311)
[Question] How to support custom stride of paged_kv for hopper prefill attention (#702, opened Dec 27, 2024 by jianfei-wangg)
Deprecation Notice: Python 3.8 Wheel Support to End in future releases (#682, opened Dec 18, 2024 by yzh119)
[Bug] FlashInfer latest main wheel issue (#669, opened Dec 16, 2024 by zhyncs; labels: bug, priority: high)
[Question] Overflow risks when batch size and sequence length grows extremely large (#596, opened Nov 8, 2024 by rchardx)
[Feature Request] Add an argument to control the number of CTAs used in attention APIs (#591, opened Nov 7, 2024 by yzh119)
ImportError: cannot import name '_grouped_size_compiled_for_decode_kernels' from 'flashinfer.decode' (#549, opened Oct 23, 2024 by Hutlustc)
Runtime error with single_prefill_with_kv_cache while Compilation (#541, opened Oct 20, 2024 by YudiZh)