Adds no-topk variant for kernel performance analysis #20
Conversation
Introduces a new benchmark function that skips the topk computation to isolate pure kernel performance from selection overhead. This enables more accurate measurement of the core attention mechanism by removing the topk bottleneck. The PR also expands the benchmark configurations to include inference scenarios with varying key lengths, and updates the output formatting to display both standard and no-topk performance metrics with speedup comparisons, giving insight into the impact of topk overhead on overall performance.

Changes keep_window_size from key_len to 0 when topk is disabled, preventing potential memory issues and incorrect attention calculations. Also removes trailing whitespace from the main function call.
Pull Request Overview
This PR introduces a new benchmark function that isolates kernel performance by skipping the topk computation. It expands benchmark configurations to capture both standard and no-topk performance and updates output formatting to include additional speedup comparisons.
- Added a new function dynamic_mask_attention_cuda_no_topk to measure kernel performance without topk overhead.
- Updated benchmark loops and result reporting to include no-topk metrics.
- Extended the benchmark configuration with new inference scenarios.
Comments suppressed due to low confidence (1)
benchmarks/benchmark_forward_performance.py:248
- Consider adding an inline comment explaining why keep_window_size is set to 0 in the subsequent call to apply_dynamic_mask_attention, clarifying its role in skipping the topk computation for kernel performance measurements.
zero_hold_states = calculate_zero_hold_states(value_states, dt_proj, A, causal_mask)
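For illustration, the suggested inline comment might look like the sketch below; the exact argument list of apply_dynamic_mask_attention is not shown in this diff, so the call uses placeholder arguments rather than the actual code:

```python
zero_hold_states = calculate_zero_hold_states(value_states, dt_proj, A, causal_mask)

# keep_window_size=0 skips the topk selection entirely, so this call measures
# only the core attention kernel without the selection overhead.
attn_output = apply_dynamic_mask_attention(
    query_states, key_states, value_states,  # placeholder arguments for illustration
    zero_hold_states,
    keep_window_size=0,
)
```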
    torch.cuda.empty_cache()

    # Warmup runs
    for _ in range(warmup_runs):
Copilot AI commented on Jun 21, 2025
[nitpick] The loops for warmup and benchmark runs for dynamic_mask_attention_no_topk are similar to those used for the standard dynamic_mask_attention. Consider refactoring these into a helper function to reduce duplication and improve maintainability.
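A possible shape for such a helper, sketched below; the helper, its name, and the default run counts are not part of this PR and are only illustrative (only standard torch.cuda timing utilities are assumed):

```python
import torch

def time_attention_fn(attention_fn, *args, warmup_runs=3, benchmark_runs=10, **kwargs):
    """Time a CUDA attention callable: warmup first, then return mean runtime in ms."""
    torch.cuda.empty_cache()

    # Warmup runs (not timed)
    for _ in range(warmup_runs):
        attention_fn(*args, **kwargs)
    torch.cuda.synchronize()

    # Timed benchmark runs, measured with CUDA events
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(benchmark_runs):
        attention_fn(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / benchmark_runs
```

Both the standard and no-topk paths could then share this helper, e.g. time_attention_fn(dynamic_mask_attention_cuda, ...) and time_attention_fn(dynamic_mask_attention_cuda_no_topk, ...).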
    print(f"🔧 {'Configuration':<42} ⚡ {'Flash (ms)':<12} 🚀 {'DMA (ms)':<12} 🚀 {'DMA-Skip-All (ms)':<22} 📈 {'Speedup':<12} 📈 {'Skip-All-Speedup':<20} 💾 {'Memory':<10}")
    print("🔄" + "-" * 155 + "🔄")
Copilot AI commented on Jun 21, 2025
[nitpick] The header printing uses hard-coded column widths. Consider extracting these widths as constants or using a formatting helper to improve flexibility and maintainability.
Original:

    print(f"🔧 {'Configuration':<42} ⚡ {'Flash (ms)':<12} 🚀 {'DMA (ms)':<12} 🚀 {'DMA-Skip-All (ms)':<22} 📈 {'Speedup':<12} 📈 {'Skip-All-Speedup':<20} 💾 {'Memory':<10}")
    print("🔄" + "-" * 155 + "🔄")

Suggested change:

    print(f"🔧 {'Configuration':<{CONFIGURATION_WIDTH}} ⚡ {'Flash (ms)':<{FLASH_WIDTH}} 🚀 {'DMA (ms)':<{DMA_WIDTH}} 🚀 {'DMA-Skip-All (ms)':<{DMA_SKIP_ALL_WIDTH}} 📈 {'Speedup':<{SPEEDUP_WIDTH}} 📈 {'Skip-All-Speedup':<{SKIP_ALL_SPEEDUP_WIDTH}} 💾 {'Memory':<{MEMORY_WIDTH}}")
    print("🔄" + "-" * (CONFIGURATION_WIDTH + FLASH_WIDTH + DMA_WIDTH + DMA_SKIP_ALL_WIDTH + SPEEDUP_WIDTH + SKIP_ALL_SPEEDUP_WIDTH + MEMORY_WIDTH + 20) + "🔄")
SNHuan left a comment:
LGTM