@yurekami

Summary

Add support for the SM120 architecture (compute capability 12.0), such as RTX PRO 6000 Blackwell workstation GPUs.

Changes

  • Add a `get_gpu_arch()` function that detects the GPU compute capability via `nvidia-smi`
  • Add a `FLASH_MLA_DISABLE_SM120` environment variable to control SM120 compilation
  • Generate SM120 arch flags when NVCC 12.9+ is available
  • Auto-detect SM120 GPUs and log a detection message
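
The detection step might look roughly like the sketch below. It assumes `get_gpu_arch()` returns the first visible GPU's compute capability as a `(major, minor)` tuple, or `None` when `nvidia-smi` is unavailable; the PR's actual return type may differ, and `parse_compute_cap` and the log message are hypothetical. The `compute_cap` query field is only available in reasonably recent `nvidia-smi` versions.

```python
import subprocess
from typing import Optional, Tuple


def parse_compute_cap(output: str) -> Optional[Tuple[int, int]]:
    """Parse `nvidia-smi --query-gpu=compute_cap` output such as "12.0\\n"."""
    for line in output.splitlines():
        major, _, minor = line.strip().partition(".")
        if major.isdigit() and minor.isdigit():
            # First GPU that reports a valid capability wins.
            return int(major), int(minor)
    return None


def get_gpu_arch() -> Optional[Tuple[int, int]]:
    """Return the first GPU's compute capability, or None if it cannot be detected."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi missing, failing, or hanging (e.g. a CPU-only build machine).
        return None
    return parse_compute_cap(result.stdout)


if __name__ == "__main__":
    if get_gpu_arch() == (12, 0):
        print("Detected SM120 GPU")  # hypothetical log message
```

Returning `None` instead of raising keeps the build usable on machines without a GPU, where only the NVCC version can decide which arch flags to emit.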

Test plan

  • Build with NVCC 12.9+ and verify that SM120 arch flags are generated
  • Verify that the build works on an SM120 GPU
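
For the first test-plan step, a small helper can confirm the compiler version and check the flags passed to `nvcc`. This is a sketch: `parse_nvcc_version`, `get_nvcc_version`, and `sm120_flags_present` are hypothetical helpers, not part of this PR, and it assumes the build's NVCC flags are available as a list of strings.

```python
import re
import subprocess
from typing import List, Optional, Tuple


def parse_nvcc_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'Cuda compilation tools, release 12.9, ...'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def get_nvcc_version(nvcc: str = "nvcc") -> Optional[Tuple[int, int]]:
    """Run `nvcc --version`; return None if the compiler is missing or fails."""
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_nvcc_version(result.stdout)


def sm120_flags_present(nvcc_flags: List[str]) -> bool:
    """True if any flag targets sm_120 (an sm_120a target also matches)."""
    return any("sm_120" in flag for flag in nvcc_flags)
```

When `get_nvcc_version()` returns a version of at least `(12, 9)`, `sm120_flags_present(...)` would be expected to return `True` for the flags the build passed to `nvcc`.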

🤖 Generated with Claude Code

yurekami and others added 3 commits December 26, 2025 01:52
Changed `flash_mla_with_kvcache_sm90` to `flash_mla_with_kvcache`
in the `get_mla_metadata` docstring to match the actual function name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CUDA 13 moved `cuda/std/utility` and other standard library headers into
CCCL (the CUDA C++ Core Libraries). This commit adds the CUDA include path
explicitly to resolve build errors on CUDA 13+ ARM64 systems.

Fixes deepseek-ai#121

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
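
The include-path fix could be expressed as a small helper like the one below. This is a sketch under stated assumptions: the toolkit root is known (for example from `torch.utils.cpp_extension.CUDA_HOME`), and gating on CUDA 13+ is an assumption; the commit only says the CUDA include path is added explicitly. `extra_cuda_include_dirs` is a hypothetical name.

```python
import os
from typing import List, Optional, Tuple


def extra_cuda_include_dirs(
    cuda_home: Optional[str], cuda_version: Optional[Tuple[int, int]]
) -> List[str]:
    """Return extra include dirs for the extension build (hypothetical helper)."""
    if cuda_home is None or cuda_version is None:
        # Unknown toolkit location or version: keep the default include paths.
        return []
    if cuda_version >= (13, 0):
        # CUDA 13+: add the toolkit's include directory explicitly so that
        # headers such as <cuda/std/utility> are found.
        return [os.path.join(cuda_home, "include")]
    return []
```

The returned list would then be passed as `include_dirs=` when constructing the extension, for example a `torch.utils.cpp_extension.CUDAExtension`.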
- Add get_gpu_arch() function to detect GPU compute capability
- Add FLASH_MLA_DISABLE_SM120 environment variable
- Generate SM120 arch flags when NVCC 12.9+ is available
- Auto-detect SM120 GPUs and log detection message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
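
The opt-out and flag-generation logic described in this commit could be sketched as follows. The helper names, the set of truthy values accepted for `FLASH_MLA_DISABLE_SM120`, and the exact `-gencode` target (plain `sm_120` rather than an architecture-specific `sm_120a`) are assumptions, not taken from the PR.

```python
import os
from typing import List, Mapping, Optional, Tuple

# Hypothetical truthy spellings accepted for FLASH_MLA_DISABLE_SM120.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag_set(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the environment variable is set to a truthy value."""
    return environ.get(name, "").strip().lower() in _TRUTHY


def sm120_gencode_flags(
    nvcc_version: Optional[Tuple[int, int]],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return NVCC flags targeting SM120, or [] when disabled or unsupported."""
    if env_flag_set("FLASH_MLA_DISABLE_SM120", environ):
        return []  # user opted out of SM120 compilation
    if nvcc_version is None or nvcc_version < (12, 9):
        return []  # compiler unknown or older than 12.9
    return ["-gencode", "arch=compute_120,code=sm_120"]
```

For example, `sm120_gencode_flags((12, 9), {})` returns `["-gencode", "arch=compute_120,code=sm_120"]`, while setting `FLASH_MLA_DISABLE_SM120=1` yields an empty list. Passing `environ` explicitly keeps the function testable without mutating the real process environment.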