
[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency #12921

Merged
merged 7 commits on Feb 12, 2025
fix typing
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
lingfanyu committed Feb 7, 2025
commit 0d0601d9bf6ce38bc08b9fa30c0f26d94b1fa7d8
4 changes: 2 additions & 2 deletions tests/neuron/test_prefix_prefill.py
@@ -212,8 +212,8 @@ def test_contexted_kv_attention(
     "--internal-hlo2tensorizer-options='--verify-hlo'",
     "--retry_failed_compilation",
 ]
-compiler_flags = " ".join(compiler_flags)
-os.environ["NEURON_CC_FLAGS"] = compiler_flags
+compiler_flags_str = " ".join(compiler_flags)
+os.environ["NEURON_CC_FLAGS"] = compiler_flags_str

 torch.manual_seed(0)
 torch.set_printoptions(sci_mode=False)
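
The change above appears to fix a static-typing complaint: rebinding `compiler_flags` from `list[str]` to the `str` returned by `" ".join()` changes the variable's inferred type mid-function, which checkers such as mypy reject. A minimal sketch of the pattern (standalone, not the vLLM test itself):

```python
import os

compiler_flags = [
    "--internal-hlo2tensorizer-options='--verify-hlo'",
    "--retry_failed_compilation",
]

# Before the fix: compiler_flags = " ".join(compiler_flags)
# rebinds a list[str] variable to str, which mypy flags.
# Binding the joined string to a new name keeps both types stable.
compiler_flags_str = " ".join(compiler_flags)
os.environ["NEURON_CC_FLAGS"] = compiler_flags_str
```

Each variable now has a single consistent type for its whole lifetime, so the annotation-checking pass succeeds without casts.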