[Doc] Documentation of Python APIs and tutorials on page layout (#90)
Showing 17 changed files with 1,279 additions and 188 deletions.
.. _apicascade:

flashinfer.cascade
==================

.. currentmodule:: flashinfer.cascade

.. _api-merge-states:

Merge Attention States
----------------------

.. autosummary::
   :toctree: ../../generated

   merge_state
   merge_state_in_place
   merge_states
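The merge primitives combine partial attention outputs with their per-head log-sum-exp (LSE) values, so attention computed over disjoint KV chunks can be fused into attention over their concatenation. A pure-NumPy sketch of the semantics (``merge_state`` here is an illustrative reference, not the CUDA kernel):

```python
import numpy as np

def merge_state(v_a, s_a, v_b, s_b):
    """Merge two attention states: partial outputs plus per-head log-sum-exp.

    v_a, v_b: partial attention outputs, shape [num_heads, head_dim]
    s_a, s_b: log-sum-exp of the attention scores, shape [num_heads]
    """
    s_max = np.maximum(s_a, s_b)          # subtract the max for numerical stability
    w_a = np.exp(s_a - s_max)             # un-normalized weight of state a
    w_b = np.exp(s_b - s_max)             # un-normalized weight of state b
    v = (v_a * w_a[:, None] + v_b * w_b[:, None]) / (w_a + w_b)[:, None]
    s = s_max + np.log(w_a + w_b)         # merged log-sum-exp
    return v, s
```

Merging the states of two disjoint KV chunks reproduces softmax attention over the concatenated chunks, which is what cascade inference exploits to compute attention over a shared prefix once and merge it with each request's unique suffix.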
.. _api-cascade-attention:

Cascade Attention
-----------------

.. autosummary::
   :toctree: ../../generated

   batch_decode_with_shared_prefix_padded_kv_cache

Cascade Attention Wrapper Classes
---------------------------------

.. autoclass:: BatchDecodeWithSharedPrefixPagedKVCacheWrapper
   :members:

.. autoclass:: BatchPrefillWithSharedPrefixPagedKVCacheWrapper
   :members:
.. _apidecode:

flashinfer.decode
=================

.. currentmodule:: flashinfer.decode

Single Request Decoding
-----------------------

.. autosummary::
   :toctree: ../../generated

   single_decode_with_kv_cache
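For orientation, here is a pure-NumPy reference of the computation decode attention performs for a single query token (a sketch of the semantics, not the kernel; it assumes equal numbers of query and KV heads and the NHD layout):

```python
import numpy as np

def single_decode_reference(q, k, v):
    """Reference for single-request decode attention (one query token).

    q: [num_heads, head_dim], k/v: [kv_len, num_heads, head_dim]
    Returns the attention output, shape [num_heads, head_dim].
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("hd,nhd->hn", q, k) * scale   # [num_heads, kv_len]
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)               # softmax over kv_len
    return np.einsum("hn,nhd->hd", p, v)
```

The decode kernels compute exactly this softmax-weighted sum, but without materializing the full score matrix in GPU memory.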
Batch Decoding
--------------

.. autosummary::
   :toctree: ../../generated

   batch_decode_with_padded_kv_cache
   batch_decode_with_padded_kv_cache_return_lse

.. autoclass:: BatchDecodeWithPagedKVCacheWrapper
   :members:
.. _apipage:

flashinfer.page
===============

Kernels to manipulate paged kv-cache.

.. currentmodule:: flashinfer.page

Append new K/V tensors to Paged KV-Cache
----------------------------------------

.. autosummary::
   :toctree: ../../generated

   append_paged_kv_cache
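To illustrate what appending to a paged kv-cache involves, here is a minimal NumPy sketch of the bookkeeping (names and layout are illustrative, not the ``append_paged_kv_cache`` signature): the cache is a global pool of fixed-size pages, and each request owns an ordered list of page indices.

```python
import numpy as np

PAGE_SIZE, NUM_PAGES, NUM_HEADS, HEAD_DIM = 4, 8, 2, 3

# Global page pool: one slot per (page, position-in-page, head).
k_pool = np.zeros((NUM_PAGES, PAGE_SIZE, NUM_HEADS, HEAD_DIM))
v_pool = np.zeros_like(k_pool)
free_pages = list(range(NUM_PAGES))

def append_kv(page_table, seq_len, k_new, v_new):
    """Append k_new/v_new ([n, NUM_HEADS, HEAD_DIM]) to one request's cache.

    page_table: ordered list of page indices owned by the request.
    seq_len: number of tokens already cached. Returns the new seq_len.
    """
    for i in range(k_new.shape[0]):
        pos = seq_len + i
        if pos % PAGE_SIZE == 0:              # current page is full: grab a new one
            page_table.append(free_pages.pop(0))
        page = page_table[pos // PAGE_SIZE]
        slot = pos % PAGE_SIZE
        k_pool[page, slot] = k_new[i]
        v_pool[page, slot] = v_new[i]
    return seq_len + k_new.shape[0]

def gather_k(page_table, seq_len):
    """Read a request's cached keys back in token order."""
    flat = k_pool[page_table].reshape(-1, NUM_HEADS, HEAD_DIM)
    return flat[:seq_len]
```

Paging avoids reserving a contiguous max-length buffer per request: sequences grow one page at a time, and pages can be shared or recycled.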
.. _apiprefill:

flashinfer.prefill
==================

Attention kernels for prefill & append attention, in both single-request and batch serving settings.

.. currentmodule:: flashinfer.prefill

Single Request Prefill/Append Attention
---------------------------------------

.. autosummary::
   :toctree: ../../generated

   single_prefill_with_kv_cache
   single_prefill_with_kv_cache_return_lse
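In the append setting, ``qo_len`` new query tokens attend to a cache of ``kv_len >= qo_len`` keys under a causal mask. A pure-NumPy sketch of the semantics (an illustrative reference, not the kernel):

```python
import numpy as np

def append_attention_reference(q, k, v):
    """Causal attention for qo_len new tokens appended at the end of kv_len keys.

    q: [qo_len, num_heads, head_dim], k/v: [kv_len, num_heads, head_dim]
    Query i (the token at position kv_len - qo_len + i) may attend to
    kv positions j <= kv_len - qo_len + i.
    """
    qo_len, kv_len = q.shape[0], k.shape[0]
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("ihd,jhd->hij", q, k) * scale     # [heads, qo, kv]
    i = np.arange(qo_len)[:, None]
    j = np.arange(kv_len)[None, :]
    mask = j <= (kv_len - qo_len) + i                    # causal/append mask
    scores = np.where(mask[None], scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)                   # masked softmax
    return np.einsum("hij,jhd->ihd", p, v)               # [qo, heads, dim]
```

With ``qo_len == kv_len`` this is ordinary causal prefill; with ``qo_len == 1`` it degenerates to decode attention over the whole cache.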
Batch Prefill/Append Attention
------------------------------

.. autoclass:: BatchPrefillWithPagedKVCacheWrapper
   :members:

.. autoclass:: BatchPrefillWithRaggedKVCacheWrapper
   :members:
.. _installation:

Installation
============

Python Package
--------------

FlashInfer is available as a Python package, built on top of `PyTorch <https://pytorch.org/>`_, to
easily integrate with your Python applications.

Prerequisites
^^^^^^^^^^^^^

- OS: Linux only
- Python: 3.10, 3.11
- PyTorch CUDA 11.8/12.1

  - Use ``python -c "import torch; print(torch.version.cuda)"`` to check your PyTorch CUDA version.

- Supported GPU architectures: sm_80, sm_86, sm_89, sm_90 (sm_75 support is work in progress).

Quick Start
^^^^^^^^^^^

.. tabs::

    .. tab:: PyTorch CUDA 11.8

        .. code-block:: bash

            pip install flashinfer -i https://flashinfer.ai/whl/cu118/

    .. tab:: PyTorch CUDA 12.1

        .. code-block:: bash

            pip install flashinfer -i https://flashinfer.ai/whl/cu121/

C++ API
-------

FlashInfer is a header-only library whose only dependencies are CUDA and the C++ standard library;
it can be directly integrated into your C++ project without installation.

You can check our `unittest and benchmarks <https://github.com/flashinfer-ai/flashinfer/tree/main/src>`_ for examples of how to use our C++ APIs at the moment.

.. note::
    The ``nvbench`` and ``googletest`` dependencies in the ``3rdparty`` directory are only
    used to compile unittests and benchmarks, and are not required for the library itself.
Updated requirements for building the documentation: Sphinx is upgraded from 5.2.3 to 7.2.6, and ``sphinx-rtd-theme``/``tlcpack-sphinx-addon`` are replaced by the ``furo`` theme::

    sphinx-tabs == 3.4.1
    sphinx == 7.2.6
    sphinx-toolbox == 3.4.0
    sphinxcontrib_httpdomain == 1.8.1
    sphinxcontrib-napoleon == 0.7
    sphinx-reredirects == 0.1.2
    furo == 2024.01.29