
Comparing changes

base repository: flashinfer-ai/flashinfer
base: v0.0.6
head repository: flashinfer-ai/flashinfer
compare: v0.0.7
  • 16 commits
  • 52 files changed
  • 3 contributors

Commits on Jun 22, 2024

  1. ci: separate update_whl_index from github action files (#328)

    and bump doc version.
    yzh119 authored Jun 22, 2024
    1df7b03
  2. f237f5f

Commits on Jun 23, 2024

  1. doc: bugfix on documentation about mask usage (#331)

    This PR should fix #330.
    yzh119 authored Jun 23, 2024
    947830b

Commits on Jun 24, 2024

  1. bugfix: fix the scheduler behavior of large batch size (#333)

    When `128 / page == 0` (integer division, i.e. the page size exceeds
    128), the scheduler's binary search could run into a division by zero.
    yzh119 authored Jun 24, 2024
    4d08c63
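
A minimal sketch of the failure mode described in #333, not the actual patch. The function name `min_pages_per_chunk`, its parameters, and the clamp with `max(..., 1)` are assumptions made for illustration. The lower bound of the search is expressed in pages, so `128 // page_size` becomes 0 once the page size exceeds 128, and a binary search that starts at 0 can eventually divide by a midpoint of 0.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return (a + b - 1) // b


def min_pages_per_chunk(num_pages, max_chunks, page_size, min_kv_chunk_size=128):
    """Hypothetical helper: smallest pages-per-chunk that fits in max_chunks.

    num_pages is a list with the number of KV pages of each request.
    """
    # Lower bound in pages. Without the clamp this is 0 whenever
    # page_size > min_kv_chunk_size, and ceil_div(n, 0) would raise.
    low = max(min_kv_chunk_size // page_size, 1)
    high = max(max(num_pages, default=0), low)
    while low < high:
        mid = (low + high) // 2
        total_chunks = sum(ceil_div(n, mid) for n in num_pages)
        if total_chunks > max_chunks:
            low = mid + 1   # too many chunks: chunks must be larger
        else:
            high = mid      # fits: try smaller chunks
    return low


# Page size 256 > 128: the clamped lower bound keeps the search well defined.
print(min_pages_per_chunk([10, 20, 30], max_chunks=6, page_size=256))
```

The search relies on the total chunk count being non-increasing in the chunk size, which is why a plain binary search over the chunk size is sufficient here.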
  2. ea89492

Commits on Jun 27, 2024

  1. perf: more options for kv tile size (#336)

    For small query sizes, a larger kv tile size may now be used.
    yzh119 authored Jun 27, 2024
    bf2a6c7

Commits on Jun 28, 2024

  1. bugfix: fix the forward_return_lse function in `BatchPrefillWithRaggedKVCache` class (#337)
    
    Add more tests for coverage.
    yzh119 authored Jun 28, 2024
    10e6b17
  2. 3afb6d3
  3. feat: customize logits_soft_cap value (#339)

    This PR supports customized logits soft cap values. Different models
    might use different logits soft cap values (e.g. Grok-1 uses 30 and
    Gemma-2 uses 50).
    yzh119 authored Jun 28, 2024
    a2498f5
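
For readers unfamiliar with logits soft capping as mentioned in #339, here is a minimal sketch assuming the commonly used definition `cap * tanh(logit / cap)`. The helper name `soft_cap` is hypothetical and this is not FlashInfer's kernel code. The cap values 30 and 50 are the ones the commit message attributes to Grok-1 and Gemma-2.

```python
import math


def soft_cap(logit: float, cap: float) -> float:
    """Hypothetical helper: squash a logit smoothly into (-cap, cap)."""
    # Assumed formula: cap * tanh(logit / cap). Small logits pass through
    # almost unchanged, large ones saturate near +/- cap.
    return cap * math.tanh(logit / cap)


for cap in (30.0, 50.0):
    capped = [soft_cap(x, cap) for x in (-1000.0, -1.0, 0.0, 1.0, 1000.0)]
    print(cap, capped)
```

Making the cap a parameter rather than a compile-time constant lets one set of kernels serve models that use different values.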
  4. chore(main): release 0.0.7 (#327)

    🤖 I have created a release *beep* *boop*
    ---
    
    
    ##
    [0.0.7](v0.0.6...v0.0.7)
    (2024-06-28)
    
    ### Bugfix
    
    * fix the `forward_return_lse` function in
    `BatchPrefillWithRaggedKVCache` class
    ([#337](#337))
    * fix the scheduler behavior of large page size
    ([#333](#333))
    
    ### Features
    
    * customize `logits_soft_cap` value
    ([#339](#339))
    ([a2498f5](a2498f5))
    
    
    ### Performance Improvements
    
    * change minimal `kv_chunk_size` back to 128
    ([#329](#329))
    ([f237f5f](f237f5f))
    * more options for kv tile size
    ([#336](#336))
    ([bf2a6c7](bf2a6c7))
    
    ---
    This PR was generated with [Release
    Please](https://github.com/googleapis/release-please). See
    [documentation](https://github.com/googleapis/release-please#release-please).
    
    ---------
    
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: Zihao Ye <expye@outlook.com>
    github-actions[bot] and yzh119 authored Jun 28, 2024
    95f507f
  5. [CMake][Bugfix] Set default value for FLASHINFER_GEN_MASK_MODES (#340)

    This commit resolves a build-time error with the following message:
    
    ```
    CMake Error at 3rdparty/flashinfer/CMakeLists.txt:313 (add_library):
      No SOURCES given to target: prefill_kernels
    ```
    
    This occurred after #266, which replaced the `FLASHINFER_GEN_CASUALS`
    option with `FLASHINFER_GEN_MASK_MODES`. However, the definition
    `flashinfer_option(FLASHINFER_GEN_CASUALS ...)` was not replaced. As a
    result, the loop over the empty `MASK_MODES` did not produce any of the
    kernels that should be compiled.
    
    This commit updates the `flashinfer_option(FLASHINFER_GEN_CASUALS ...)`
    line to define `FLASHINFER_GEN_MASK_MODES` instead, using the same
    default value as `config.cmake`.
    Lunderberg authored Jun 28, 2024
    df59f71
  6. 457eb78

Commits on Jun 29, 2024

  1. misc: use https for submodule spdlog (#342)

    Replace ssh with https
    yzh119 authored Jun 29, 2024
    e0a233a

Commits on Jun 30, 2024

  1. refactor: reduce the binary size of batch decode kernels (#343)

    This PR refactors the batch decode kernels and makes the following
    breaking changes:
    1. Remove the `batch_decode_with_padded_kv_cache` operator; users are
    encouraged to use `BatchDecodeWithPagedKVCacheWrapper` instead.
    2. Delete redundant DTypeQ * DTypeKV combinations; only the following
    cases are now supported:
      1. DTypeQ == DTypeKV
      2. DTypeQ is a float16 and DTypeKV is a float8
    
    The output data type follows the query data type.
    yzh119 authored Jun 30, 2024
    0d333ff
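
The dtype rule from #343 can be sketched as a small check. This is an illustration only: the helper names and the dtype strings (`"float16"`, `"float8_e4m3"`, `"float8_e5m2"`) are assumptions, and the commit message does not say whether other 16-bit types such as bfloat16 count as "a float16" for the mixed case.

```python
# Hypothetical dtype names for the float8 family (an assumption).
FLOAT8_DTYPES = {"float8_e4m3", "float8_e5m2"}


def is_supported(dtype_q: str, dtype_kv: str) -> bool:
    """Return True for the two cases listed in the commit message."""
    same_dtype = dtype_q == dtype_kv
    fp16_query_fp8_kv = dtype_q == "float16" and dtype_kv in FLOAT8_DTYPES
    return same_dtype or fp16_query_fp8_kv


def output_dtype(dtype_q: str, dtype_kv: str) -> str:
    """The output data type follows the query data type."""
    if not is_supported(dtype_q, dtype_kv):
        raise ValueError(f"unsupported combination: q={dtype_q}, kv={dtype_kv}")
    return dtype_q


print(output_dtype("float16", "float8_e4m3"))
```

Restricting the combinations this way shrinks the number of kernel instantiations, which is the binary-size motivation stated in the PR title.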
  2. ci: update CHANGELOG (#344)

    Also reduce binary size by limiting the maximum number of registers for
    `x_frag` and `o_frag` to 200.
    yzh119 authored Jun 30, 2024
    80a376f
  3. ci: remove redundant NUM_FRAGS_Z (#345)

    Do not compile `NUM_FRAGS_Z=6` to reduce wheel size.
    Also revert #341, as it had no effect.
    yzh119 authored Jun 30, 2024
    fec77d0