Conversation

@loci-dev
Mirrored from ggml-org/llama.cpp#18436

This PR adds architectures to enable the recent Blackwell-specific MXFP4 optimizations for non-native builds. The problem with 120f-virtual, which we were using in the initial PR, is that it doesn't match the regex that CMake uses to validate CUDA architectures. The same regex does accept 120a-real and 121a-real, so I suggest we simply build those for now, since there is no other hardware to cover. Newer CMake versions ship a bugfix for the regex, so this should become less of a problem for us in the future. @CISC is there a way to run the Windows CUDA release CI without merging a PR?
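
For reference, a non-native build with the proposed architecture list could be configured roughly like this (a minimal sketch; it assumes the GGML_CUDA option used by llama.cpp's CMake build, and the exact option names may differ for your setup):

```
# Configure a non-native CUDA build targeting the Blackwell architectures
# accepted by CMake's validation regex (120a-real and 121a-real).
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="120a-real;121a-real"

# Build the release binaries.
cmake --build build --config Release
```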

@loci-agentic-ai
Explore the complete analysis in the Version Insights.

Perfect! I've generated a summary report for your project. Here are the key highlights:

Summary Report for llama.cpp PR #731

Performance Impact: ✅ MINIMAL (Safe to Merge)

The analysis compared two versions of the auroralabs-loci/llama.cpp repository and found:

  • No significant performance changes detected (threshold: 2%)
  • Response Time: No functions showed changes > 2%
  • Throughput Time: No functions showed changes > 2%

Conclusion: This pull request is performance-neutral and safe to merge from a performance perspective. The changes introduced do not cause any measurable performance regressions or improvements.

The report analyzed:

  • Base Version: 793d9b2c-82a8-4f82-b9eb-c416e43dcac0
  • Target Version: 3ecc3b5d-3ee9-407e-bb36-8b93071a6286

@loci-dev force-pushed the main branch 2 times, most recently from f2e8c7f to b3f45e1 on December 29, 2025 at 06:15.