[Example] Add block level high performance gemv example #1097
This pull request refactors and extends the GEMV example to introduce a new block reduction kernel (`gemv_alloc_reducer`), reorganize autotuning configurations, and improve benchmarking and testing workflows. The changes make the codebase more modular and enable easier experimentation with different GEMV kernel variants.

New kernel and autotuning infrastructure:
- A `gemv_alloc_reducer` kernel using block reduction and allocation-based reduction, with its own autotuning configuration generator (`get_block_template_configs`) and a kernel definition using the `tl.autotune` and `tl.jit` decorators.
- A separate configuration generator (`get_thread_template_configs`), with the kernel interface updated to use `get_autotuned_kernel` for clarity and modularity.

Benchmarking and correctness improvements:
- The `gemv_alloc_reducer` kernel is now included, and the benchmarking logic was expanded to compare both SIMT and block reduction implementations when `do_bench` is set to `False`.

Testing improvements:
- Tests call `main(do_bench=False)`, ensuring all kernel variants are tested without running the full benchmark suite.

Code cleanup and bug fixes:
- A `do_bench` flag for more flexible execution.
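
To make the workflow described above easier to picture, here is a minimal pure-Python sketch. It is not the TileLang kernel from this PR. It only models the idea of a configuration generator, a block-wise reduction over the K dimension, and a `do_bench` switch that skips timing while still checking correctness. The names `get_block_template_configs`, `main`, and `do_bench` are taken from the description, but their bodies here are assumptions. `block_n`, `block_k`, `gemv_reference`, and `gemv_block_reduce` are hypothetical names introduced for illustration.

```python
import itertools
import random
import time


def get_block_template_configs():
    # Hypothetical search space: the PR's real parameters and values are not shown.
    block_n = [2, 4]
    block_k = [4, 8]
    return [{"block_n": n, "block_k": k}
            for n, k in itertools.product(block_n, block_k)]


def gemv_reference(a, x):
    # Plain row-by-row dot products, used as the correctness baseline.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def gemv_block_reduce(a, x, block_n, block_k):
    # Each tile of block_n rows keeps a local accumulator and walks K in
    # chunks of block_k, adding one partial sum per chunk.
    n, k = len(a), len(x)
    y = [0.0] * n
    for n0 in range(0, n, block_n):
        rows = range(n0, min(n0 + block_n, n))
        acc = {i: 0.0 for i in rows}
        for k0 in range(0, k, block_k):
            cols = range(k0, min(k0 + block_k, k))
            for i in rows:
                acc[i] += sum(a[i][j] * x[j] for j in cols)
        for i in rows:
            y[i] = acc[i]
    return y


def main(do_bench=True):
    rng = random.Random(0)
    n, k = 6, 10
    a = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    x = [rng.uniform(-1, 1) for _ in range(k)]
    expected = gemv_reference(a, x)
    timings = {}
    for cfg in get_block_template_configs():
        # Correctness is always checked, for every configuration.
        got = gemv_block_reduce(a, x, **cfg)
        assert all(abs(g - e) < 1e-9 for g, e in zip(got, expected))
        if do_bench:
            # Timing runs only when benchmarking is requested.
            start = time.perf_counter()
            for _ in range(100):
                gemv_block_reduce(a, x, **cfg)
            timings[(cfg["block_n"], cfg["block_k"])] = time.perf_counter() - start
    return timings
```

Calling `main(do_bench=False)` in this sketch validates every configuration against the reference and returns an empty timing table, which mirrors how a test can exercise all variants without paying for the benchmark loop.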