Please leave any comments or edit this issue directly to adjust the release notes! Also see the rc0 vote thread in #12103.
## Introduction
The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new quarterly release schedule and includes many highlights, such as:
- MetaSchedule's full implementation
- ARM cascading scheduler for Arm Ethos(TM)-U NPUs
- Collage which brings tuning to BYOC
- Several microTVM improvements
- New `tvm.relay.build` parameters: `runtime=`, `executor=`
- AOT: support for the C++ runtime (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
- Hexagon RPC support
  - Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
  - AOT and USMP support
  - Threading
  - Initial op support
- MLF: support for multiple modules in a single MLF artifact
- Several TIR schedule primitives and transforms including (abridged):
  - `schedule.transform_layout` - Applies a layout transformation to a buffer as specified by an IndexMap.
  - `schedule.transform_block_layout` - Applies a schedule transformation to a block as specified by an IndexMap.
  - `schedule.set_axis_separators` - Sets axis separators in a buffer to lower to multi-dimensional memory (e.g. texture memory).
  - `transform.InjectSoftwarePipeline` - Transforms an annotated loop nest into a pipeline prologue, body and epilogue where producers and consumers are overlapped.
  - `transform.CommonSubexprElimTIR` - Implements common-subexpression elimination for TIR.
  - `transform.InjectPTXAsyncCopy` - Rewrites global-to-shared memory copies in CUDA with async copy when annotated with `tir::attr::async_scope`.
  - `transform.LowerCrossThreadReduction` - Enables support for reductions across threads on GPUs.
- And many more! See the list of RFCs and PRs included in v0.9.0 for a complete list, as well as the full change list.
## RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
- [RFC] TUNIP: TVMScript Unified Printer (#74) (48d47c5)
- [RFC][Backend] RFC-CSI-NN2-Integration (#75) (cfcf114)
- [RFC] Introducing DeclBuffer (#70) (87ff1fa)
- [RFC][MLF] Model Library Format with Multiple Modules (#76) (f47c6ad)
- [RFC] UMA Universal Modular Accelerator Interface (#60) (6990e13)
- [RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72) (a518000)
- [RFC] Quarterly Releases (#67) (70293c7)
- RFC-BYOC-DNNL-Integration (#73) (7aed0ca)
- [RFC] Relay Next Roadmap (#69) (ac15f2a)
- RFC: clarifying buffer declaration and access (#63) (de4fe97)
- Inclusive Language RFC (#68) (4203bd2)
- [USMP] Adding U4 usecase (#65) (b9e246f)
- Collage RFC (#62) (23250f5)
- Replace codeowners with more relevant automation (#58) (540c1f8)
- [RFC][TIR] Layout transformations on buffer access (#39) (b675ef8)
- Module Based Model Runtime for AOT (#46) (d9dd6eb)
- `@slow` test RFC (#55) (9b6203a)
- [RFC][Roadmap] TVM Continuous Integration & Testing Roadmap (#54) (41e5ba0)
- Bring `PackedFunc` into TVM Object System (#51) (2e0de6c)
- [RFC][OpenCLML] OpenCLML integration as BYOC (#52) (f5ef65f)
- Introduce the Arm(R) Ethos(TM)-U Cascading Scheduler (#37) (f9fa824)
- [RFC][Roadmap] microTVM roadmap (#53) (1b14456)
- Add Managed Jenkins Infrastructure for TVM RFC (#49) (a3a7d2c)
- TVM Roadmap RFC (#50) (263335f)
- [RFC] Integrate LIBXSMM with TVM. (#47) (1a3d4f1)
- [RELAY][AST] Add virtual device as a first class field to Relay expressions (#45) (67c39d2)
## What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: v0.8.0...v0.9.0.rc0.
### AOT
- [AOT] Calculate used memory at the callsite of primitive functions #11208 - Calculate used memory at the callsite of primitive functions
- Fix function number datatype from char to uint16_t #11365 - Fix function number datatype from char to uint16_t
- [AOT] Enable A-Normal Form in the AOT executor #11091 - Enable A-Normal Form in the AOT executor
- [AOT] Support LLVM backend with C++ runtime #10753 - Support LLVM backend with C++ runtime
- [AOT] Use python temporary directory for AOT tests #10518 - Use python temporary directory for AOT tests
- [AOT] BugFix of workspace calculation #10337 - BugFix of workspace calculation
- [runtime] Add Metadata classes for AOTExecutor #10282 - [runtime] Add Metadata classes for AOTExecutor
- [3/3][AOT][DeviceAPI] Wire up cpacked Device API context #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
- [2/3][AOT][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
- [1/3][AOT][DeviceAPI] Connecting devices structure to relevant operators #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators
### BYOC
- [BYOC] Two helper passes for external codegen using RelayToTIR custom pass machinery #11474 - Two helper passes for external codegen using RelayToTIR custom pass machinery
- Remove support for run-time linked-params from codegen #11144 - Remove support for run-time linked-params from codegen
- Add order to functions in C Codegen #10590 - Add order to functions in C Codegen
- [DNNL][CBLAS][BYOC] Unifies all MKLDNN/DNNL to DNNL #11638 - [DNNL][CBLAS] Unifies all MKLDNN/DNNL to DNNL
- [BYOC] RelayToTIR custom codegen passes can still depend on dynamic shape functions #11619 - RelayToTIR custom codegen passes can still depend on dynamic shape functions
- DNNL - [DNNL] Add bfloat16 type support for dnnl conv2d kernel #11902, Enable QNN primitives for DNNL runtime #11642, [BYOC][DNNL] Improve performance of DNNL BYOC dense operator #11513, [DNNL][Relay extern-schedule] DNNL Conv2D Kernel enable by assigning "-libs=mkldnn" #11571, [DNNL] Fix end of line in test_dnnl UT file #11560, [DNNL] Add TensorRequisite concept. Multi instance support #11345, [BYOC] Enable bfloat16 in DNNL BYOC #11111, [BYOC-DNNL] enable conv3d->bn folding #10837, [BYOC-DNNL] Support DNNL optimal layout #10421, [BYOC-DNNL] add support for more ops and fusion patterns #9995, DNNL-BYOC enhancement #9797
- TensorRT - [BYOC] InlineCompilerFunctions helper pass #11923, [TENSORRT] Improvements and fixes for TensorRT #11203, [BYOC][TRT] Add DFPattern support for TRT backend #10759, [TRT] Remove deprecated integration tests with mxnet zoo importers #10772, [BYOC][TENSORRT] Add support for FP16 on TensorRT BYOC flow #10388
- CMSIS-NN - [CMSIS-NN] Fixed the case with repeating operands in the QNN binary ops #11732, Making CMSIS-NN tests pylint compliant #11625, [CMSIS-NN] Moved TFLite model making to common area #10939, [CMSIS-NN] Add Arm(R) Cortex(R)-M55 CPU and CMSIS-NN demo #11013, [CMSIS-NN] Aligned scale computation with TFLM to fix numerical mismatch #10817, [CMSIS-NN] Scalar to tensor constant pass to support only qnn.add and qnn.multiply #10563, [CMSIS-NN] enable USMP with CMSIS-NN #10224, [CMSIS-NN] Moved all asserts in tests under a single utils function #10148, [CMSIS-NN] Convert scalar constants to tensor constants #10100, [4a/10] [CMSIS-NN] Calculate CMSIS-NN buffer size with respect to architecture extensions #9338, [7/10] Code generation for Pooling and Fully Connected via CMSIS-NN #9531, [5/10] Code generation for Depthwise Convolution via CMSIS-NN #9409, [4/10] Code generation for Conv2D via CMSIS-NN #9331
- OpenCLML - [BYOC-OpenCLML] OpenCLML integration with TVM. #10243
- CUTLASS - [BYOC] Make CUTLASS BYOC integration 'Collage friendly' #11631, [CUTLASS] Add parallel split-k support to wgrad #10185, [CUTLASS] Initial support for conv2d wgrad #10177, [CUTLASS] Conv2d dgrad #10110, [CUTLASS] Profile only the largest-possible alignment by default #10036, [CUTLASS] Support more kernels: int8, tf32, and 3xtf32 #9899, [CUTLASS] Residual connection fusion #9820, [CUTLASS] Refactor cutlass kernel generation and selection #9800, [CUTLASS] Conv2d activation fusion, part 2: Sigmoid fp16, SiLU and HardSwish #9795, [CUTLASS] Support conv2d activation fusion #9746, [CUTLASS] Add conv2d profiler #9737, [CUTLASS] More robust support for pattern matching and alignment #9698, [CUTLASS] Initial conv2d support #9595, [CUTLASS] Refactor GEMM generator in preparation for conv2d #9571
- CUDNN - [CUDNN] Add partitioning support for fused conv2d+bias+act #10997, [CUDNN] Support gradient kernels #9986, [CUDNN] Refactor descriptor initialization, remove `cudnn.conv.output_shape_from_cudnn` #9948
- ACL - [BYOC][ACL] Fix list is not supported as an input node #10801
- PTX - [PTX] `ldmatrix` builtin to accelerate copying data from shared memory to warp memory #10855, [PTX] Support mma.sp to use Sparse Tensor Cores and refactor mma codegen #10339, [PTX-MMA] Add full PTX MMA code generation support #9909
- CUBLAS - [CUBLAS] Add support for nn.dense and nn.batch_matmul #10826, [CUBLAS] Add cuBLAS as a Relay partitioning target (BYOC) #10820
### CI
- [CI] Refactor of tvm.testing.requires_* annotations #11313 - Refactor of tvm.testing.requires_* annotations
- [ci] Enable pylint for tests/python/ci #11666 - Enable pylint for tests/python/ci
- [CI] Apply linting rules to AOT tests #11657 - Apply linting rules to AOT tests
- [ci] Restructure Jenkinsfile #11380 - Restructure Jenkinsfile
- Automation - [ci][docker] Send a PR to bump the Docker images nightly #11813, [ci][docker] Fall back to tlcpackstaging if images don't exist #11775, [ci] Add @tvm-bot rerun #11480, [ci] Clean up mergebot commit messages #11437, [ci] Add GitHub Actions bot to merge PRs on demand #10833, [ci] Add auto-updating `last-successful` branch #10056, Add bot to ping reviewers after no activity #9973, Add Action to add cc'ed people as reviewers #9934
- User experience improvements - [ci] Remove apt cache from the docker images #11470, [ci][1/n] Rebuild Docker images if necessary #11329, [ci] Add guards to pytest_wrapper #11553, [ci][docker] Prune all non-relevant images #11497, [ci] Add local test re-run info #11051, [ci] Don't diff when running clang-format #10933, [CI] Bump black version to 22.3.0 #10960, [ci] Rebuild docker images on docker changes #10525, Add remaining targets to ci.py #10425, [ci] Add workflow to cc teams #10322, Fix JUnit failure reporting #10121, Use ci.py explicitly in docs building instructions #9971, Implement [skip ci] for Jenkins #9554, Usability fixes to CI runner script #9752, Add labels to each Jenkins step #9556
- Reduce CI runtime - [ci] Add more test shards #11402, [ci] Use S3 for artifacts #11349, [ci] Use r5.large nodes for builds and lint #11258, [ci] Shard Hexagon tests #11132, [ci] Break out test steps for Hexagon / microTVM #10946, [ci] Remove hardcoded test shards #10743, [ci] Use sccache and all available CPUs in builds #10359
- Code cleanups - [ci] Migrate all test steps to macros #10968, [ci] Generate Jenkinsfile from a template #10740
### Frontends
- PaddlePaddle - [Frontend][PaddlePaddle] split test_forward_math_api function #11537, [Frontend][PaddlePaddle] Enhance paddlepaddle frontend with more operators #9724, [Frontend][PaddlePaddle] Support conv2d_transpose/rnn/fill_constant_batch_size_like #9564
- TFLite - [TFLite] Add support to int16 data type in TFLite frontend #10915, [TFLite] Quantized unary elemwise ops #10566
- OneFlow - OneFlow frontend support more model and fix bug #11321, Add oneflow frontend tutorials #11036, [RELAY][FRONTEND] Initial OneFlow frontend support. #8790
- PyTorch - [Frontend][PyTorch] Add: Relay stft operator #11190, Complete pytorch grid_sample #10504, Support PyTorch grid_sample #10184, [Torch] Experimental support for FX-quantized models #10091
- ONNX - [ONNX] Add imports for BERT contrib operators #10949, [Frontend][ONNX] Support ONNX Scan operator #9438, [ONNX] Add MatMulInteger16 contrib op #9186, [Frontend][ONNX] Support RandomNormal operator #9493, [ONNX][Relay] Support "tf_crop_and_resize" in relay Resize op. #9475
- Keras - [frontend][keras] Add support for TimeDistributed #7006
### Hexagon
- [HEXAGON] Initial clip operator for Hexagon #11549 - Initial clip operator for Hexagon
- [HEXAGON] Add op resize2d for hexagon #11834 - Add op resize2d for hexagon
- [Hexagon] Softmax slice op initial version #11559 - Softmax slice op initial version
- [HEXAGON] Slice ops added - add, subtract, multiply #11529 - Slice ops added - add, subtract, multiply
- [hexagon][testing] add max_pool2d benchmark #11720 - [testing] add max_pool2d benchmark
- [Hexagon] Implement avg_pool2d slice op #11417 - Implement avg_pool2d slice op
- [Hexagon] Add HexagonThreadManager #11653 - Add HexagonThreadManager
- [Hexagon] Run single RPC server on Android in each testing session #11547 - Run single RPC server on Android in each testing session
- [hexagon][testing] add TVMScript elemwise-add #11490 - [testing] add TVMScript elemwise-add
- [hexagon][testing] refactor benchmark-table code #11400 - [testing] refactor benchmark-table code
- [Hexagon] moves conftest.py to tvm.contrib.hexagon so outside repos can access the testing fixtures #11277 - moves conftest.py to tvm.contrib.hexagon so outside repos can access the testing fixtures
- [Hexagon] Add unit tests for Hexagon Device API #11319 - Add unit tests for Hexagon Device API
- [Hexagon] Add USMP tests #11279 - Add USMP tests
- [Hexagon] Update Readme #11283 - Update Readme
- [Hexagon] capture gtest output and return over FFI #11239 - capture gtest output and return over FFI
- [Hexagon] Add schedule and test for conv2d_transpose_nchw #11175 - Add schedule and test for conv2d_transpose_nchw
- [Hexagon][Runtime] Add QuRT thread pool backend #11018 - [Runtime] Add QuRT thread pool backend
- [Hexagon] Add support for on-device unit testing using gtest #11145 - Add support for on-device unit testing using gtest
- [Hexagon] Add test for depthwise conv2d schedule #11138 - Add test for depthwise conv2d schedule
- [Hexagon] Add test for registered schedules #11016 - Add test for registered schedules
- [Hexagon] Add mobilenet test #11104 - Add mobilenet test
- [Hexagon] Delete offload runtime, move files to right places #11090 - Delete offload runtime, move files to right places
- [Hexagon] AoT with LLVM Codegen on Hexagon #11065 - AoT with LLVM Codegen on Hexagon
- [Hexagon] Deprecate USE_HEXAGON_DEVICE, introduce USE_HEXAGON #11025 - Deprecate USE_HEXAGON_DEVICE, introduce USE_HEXAGON
- HVX scheduling and bench-marking of TE element-wise add #10604 - HVX scheduling and bench-marking of TE element-wise add
- [Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout #10905 - [LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout
- [Hexagon] Move aot/graph_executor interactions into launcher #10907 - Move aot/graph_executor interactions into launcher
- [Hexagon] Register basic strategies and schedules for common operators #10919 - Register basic strategies and schedules for common operators
- [Hexagon] Add unit tests executing 2-d VTCM usage #10904 - Add unit tests executing 2-d VTCM usage
- [Hexagon] Refactor to keep HexagonBuffer private to the device api #10910 - Refactor to keep HexagonBuffer private to the device api
- [Hexagon][LLVM][CodeGen] Make CodeGenHexagon a subclass of CodeGenCPU #10908 - [LLVM][CodeGen] Make CodeGenHexagon a subclass of CodeGenCPU
- [Hexagon] Generalized HexagonBuffer::CopyTo/CopyFrom #10878 - Generalized HexagonBuffer::CopyTo/CopyFrom
- [Hexagon] Support both 1-d and 2-d VTCM allocations #10846 - Support both 1-d and 2-d VTCM allocations
- [Hexagon] Improved ergonomics of HexagonLauncher in unit tests. #10581 - Improved ergonomics of HexagonLauncher in unit tests.
- [Hexagon] Refactor tvm.contrib.hexagon, NFC #10616 - Refactor tvm.contrib.hexagon, NFC
- [Hexagon] Deprecate SDK 3.x, rewrite HexagonSDK.cmake #10612 - Deprecate SDK 3.x, rewrite HexagonSDK.cmake
- [Hexagon] Codegen for 2d Load/Store #10586 - Codegen for 2d Load/Store
- [Hexagon] Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon #10558 - Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon
- [Runtime][PipelineExecutor] Add the pipeline internal forwarding logic. #10543 - [Runtime][PipelineExecutor] Add the pipeline internal forwarding logic.
- [Hexagon] Add doc on TVM - Hexagon RPC flow #10507 - Add doc on TVM - Hexagon RPC flow
- [Hexagon] Resolve breakage in test_hexagon/test_cache_read_write #10520 - Resolve breakage in test_hexagon/test_cache_read_write
- [runtime][Hexagon] AOTExecutor implementation for C Codegen #10311 - [runtime]AOTExecutor implementation for C Codegen
- [Hexagon] Allow execution on target or simulator from HexagonLauncher #10454 - Allow execution on target or simulator from HexagonLauncher
- Lower cache_read and cache_write to Hexagon DMA via tensorize #10365 - Lower cache_read and cache_write to Hexagon DMA via tensorize
- [Hexagon] RPC server/client for simulator #10361 - RPC server/client for simulator
- [CI][Hexagon] Add Hexagon Tests to pipeline #10302 - [CI]Add Hexagon Tests to pipeline
- [Docker][Hexagon] Add docker file and scripts #10263 - [Docker]Add docker file and scripts
- [Hexagon] Refactor Hexagon.cmake #10227 - Refactor Hexagon.cmake
- Adding support for Hexagon User DMA Engine #10217 - Adding support for Hexagon User DMA Engine
- [Hexagon] Update hexagon API build instruction and cleanup hexagon_proxy_rpc #10068 - Update hexagon API build instruction and cleanup hexagon_proxy_rpc
- [Hexagon] Do not auto-build apps when building TVM #9970 - Do not auto-build apps when building TVM
- Add unit tests for HexagonBuffer #9736 - Add unit tests for HexagonBuffer
- Add Hexagon VTCM and discontiguous allocation support #9525 - Add Hexagon VTCM and discontiguous allocation support
- [Hexagon] Add RPC Mechanism for Hexagon #9631 - Add RPC Mechanism for Hexagon
- cleanup Hexagon conv2d tests #9473 - cleanup Hexagon conv2d tests
### MetaSchedule
- [MetaSchedule] Postproc: Rewrite-Layout #11884 - Postproc: Rewrite-Layout
- [OpStrategy] Support MetaSchedule Layout #11848 - [OpStrategy] Support MetaSchedule Layout
- [Relay][Pass] Meta-Schedule-Layout-Rewrite #11845 - [Relay][Pass] Meta-Schedule-Layout-Rewrite
- [MetaSchedule][Runtime] Enhance Runner RandomFill #11758 - [Runtime] Enhance Runner RandomFill
- [MetaSchedule] Distributed Measurement #11683 - Distributed Measurement
- [MetaSchedule][Minor] Organize Testing Scripts #11751 - [Minor] Organize Testing Scripts
- [MetaSchedule] Modify Profiler Timers #11735 - Modify Profiler Timers
- [MetaSchedule] Developer Ergonomics Enhancement II #11727 - Developer Ergonomics Enhancement II
- [MetaSchedule] Apply-History-Best Task Filtering #11692 - Apply-History-Best Task Filtering
- [MetaSchedule] Add Profiler Support For Tuning Efficiency Optimization #11486 - Add Profiler Support For Tuning Efficiency Optimization
- [MetaSchedule] JSONDatabase Utilities #11680 - JSONDatabase Utilities
- [MetaSchedule] Generate MetaSchedule Dataset #11641 - Generate MetaSchedule Dataset
- [MetaSchedule] Developer Ergonomics Enhancement #11622 - Developer Ergonomics Enhancement
- [MetaSchedule] Resolve dependencies between header files #11604 - Resolve dependencies between header files
- [MetaSchedule] Add Testing Script with ONNX Support #11587 - Add Testing Script with ONNX Support
- [MetaSchedule] Evo Independence from TaskScheduler #11590 - Evo Independence from TaskScheduler
- [MetaSchedule] No explicit unrolling for spatial PrimFunc #11534 - No explicit unrolling for spatial PrimFunc
- [MetaSchedule] Enable Task Filtering #11512 - Enable Task Filtering
- [MetaSchedule] AutoBind rule and MutateThreadBinding #11177 - AutoBind rule and MutateThreadBinding
- [MetaSchedule] Logging Interface Unification #11157 - Logging Interface Unification
- [Metaschedule] Auto tensorization for CPU / GPU dot product #11088 - Auto tensorization for CPU / GPU dot product
- [MetaSchedule][Refactor] Introduce TuneConfig #10986 - [Refactor] Introduce TuneConfig
- [Metaschedule, Refactor] Move MultiLevelTilingNode decl to a header #11020 - [Metaschedule, Refactor] Move MultiLevelTilingNode decl to a header
- [MetaSchedule][Refactor] Clarify Integration Logic #10927 - [Refactor] Clarify Integration Logic
- [Metaschedule] Add utility API to ease using manual schedules #10876 - Add utility API to ease using manual schedules
- [MetaSchedule][BugFix] Fix skipped tests #10885 - [BugFix] Fix skipped tests
- [MetaSchedule] Add Gradient Based Task Scheduler #10366 - Add Gradient Based Task Scheduler
- [MetaSchedule] Fine-Grained Rewrite Unbound Block #10823 - Fine-Grained Rewrite Unbound Block
- [Metaschedule] Add demonstration of selectively tuning relay ops with TIR schedules #10793 - Add demonstration of selectively tuning relay ops with TIR schedules
- [MetaSchedule] Support grouping in the cost model #10811 - Support grouping in the cost model
- [MetaSchedule] Extract task weights during task extraction #10810 - Extract task weights during task extraction
- [TIR][MetaSchedule] Estimate TIR FLOPs #10782 - [TIR]Estimate TIR FLOPs
- [MetaSchedule] Misc updates for tuning end-to-end workloads #10776 - Misc updates for tuning end-to-end workloads
- [MetaSchedule] Upstream the leftover changes #10689 - Upstream the leftover changes
- [Meta Schedule] Refactor meta schedule testing utils #10648 - [Meta Schedule] Refactor meta schedule testing utils
- [Metaschedule] New relay backend for meta schedule task extraction #10578 - New relay backend for meta schedule task extraction
- [MetaSchedule] Bug Fix for Relay Integration #10534 - Bug Fix for Relay Integration
- [MetaSchedule] Update scripts for subgraph tuning #10501 - Update scripts for subgraph tuning
- [MetaSchedule] Refactor testing workloads #10497 - Refactor testing workloads
- [MetaSchedule] Enable AutoTVM-style template-based search space #10461 - Enable AutoTVM-style template-based search space
- [MetaSchedule] Fix Cyclic Dependency in PyClass Family #10368 - Fix Cyclic Dependency in PyClass Family
- [MetaSchedule] Arithmetic analysis #10403 - Arithmetic analysis
- [MetaSchedule] Update Tuning Interfaces. #10367 - Update Tuning Interfaces.
- [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079 - [M4a] User-API: Tune-TE/TIR/Relay
- [MetaSchedule][M4a] Rewrite-Cooperative-Fetch #10081 - [M4a] Rewrite-Cooperative-Fetch
- [MetaSchedule][M4b] Testcases for TensorRT builder/runner #10055 - [M4b] Testcases for TensorRT builder/runner
- [MetaSchedule][M4a] Mutator: Mutate-Tile-Size #10092 - [M4a] Mutator: Mutate-Tile-Size
- [MetaSchedule][M4a] Mutator: Mutate Parallel #10096 - [M4a] Mutator: Mutate Parallel
- [MetaSchedule][M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll #10071 - [M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll
- [MetaSchedule][M4a] Schedule Rule: Multi-Level-Tiling #10043 - [M4a] Schedule Rule: Multi-Level-Tiling
- [MetaSchedule] Mutator: Mutate-Unroll #10045 - Mutator: Mutate-Unroll
- [MetaSchedule][M4a] Schedule Rule: Parallelize-Vectorize-Unroll #10033 - [M4a] Schedule Rule: Parallelize-Vectorize-Unroll
- [MetaSchedule][M4a] PostProcessor: Rewrite-Unbound-Block #10027 - [M4a] PostProcessor: Rewrite-Unbound-Block
- [MetaSchedule] Mutator: Mutate-Compute-Location #10028 - Mutator: Mutate-Compute-Location
- [MetaSchedule][M4a] PostProcessor: Disallow-Dynamic-Loop #9997 - [M4a] PostProcessor: Disallow-Dynamic-Loop
- [MetaSchedule][M4a] Schedule Rule: Cross-Thread-Reduction #9994 - [M4a] Schedule Rule: Cross-Thread-Reduction
- [MetaSchedule][M4a] PostProcessor: Rewrite Reduction Block #10013 - [M4a] PostProcessor: Rewrite Reduction Block
- [MetaSchedule][M4a] Schedule Rule: Add-RFactor #9975 - [M4a] Schedule Rule: Add-RFactor
- [MetaSchedule][M4a] PostProcessor: Verify-GPU-Code #9945 - [M4a] PostProcessor: Verify-GPU-Code
- [MetaSchedule][M4a] Schedule Rule: Random-Compute-Location #9940 - [M4a] Schedule Rule: Random-Compute-Location
- [MetaSchedule][M4a] Schedule Rule: Auto-Inline #9943 - [M4a] Schedule Rule: Auto-Inline
- [MetaSchedule][M3c] Add Per-Store-Feature #9860 - [M3c] Add Per-Store-Feature
- [MetaSchedule][M3c] XGB-based Cost Model #9859 - [M3c] XGB-based Cost Model
- [MetaSchedule][M4a] Add EvolutionarySearch Search Strategy #9836 - [M4a] Add EvolutionarySearch Search Strategy
- [MetaSchedule][M4a] Add ReplayFunc Search Strategy #9799 - [M4a] Add ReplayFunc Search Strategy
- [MetaSchedule][M3c] Update TuneContext, TaskScheduler & Search Strategy Design #9789 - [M3c] Update TuneContext, TaskScheduler & Search Strategy Design
- [MetaSchedule][M3c] Add More Measure Callbacks #9780 - [M3c] Add More Measure Callbacks
- [MetaSchedule][M4a] Add ScheduleRule class & PostOrderApply space generator #9761 - [M4a] Add ScheduleRule class & PostOrderApply space generator
- [MetaSchedule][M3c] Random Feature Extractor #9760 - [M3c] Random Feature Extractor
### MicroTVM
- [microTVM] Refactor RVM scripts and fix DNS network issue #11741 - Refactor RVM scripts and fix DNS network issue
- [microTVM][ARM]Add tests for arm schedules #11472 - [ARM]Add tests for arm schedules
- [microTVM] Update pyproject to python3.7 #11634 - Update pyproject to python3.7
- Zephyr support - [microTVM][zephyr] Add support for host-driven AoT execution on zephyr #11650
- RPC - [RPC] Revert "Implemented rpc logging (#10967)" #11227, [rpc] Implemented rpc logging #10967
### Relay
- [relay][pass] add split infer shape with convert op layout pass #11825 - [relay][pass] add split infer shape with convert op layout pass
- [Relay] Finish implementations of WithFields #11674 - Finish implementations of WithFields
- [Relay] IndexedGraph improvements in preparation for Collage #11481 - IndexedGraph improvements in preparation for Collage
- [Relay] Plumb external codegen target via Target.current() #11432 - Plumb external codegen target via Target.current()
- [Pass] Add MaxPool, AvgPool to FoldExplicitPadding #11494 - [Pass] Add MaxPool, AvgPool to FoldExplicitPadding
- Add unidirectional sequence lstm #11183 - Add unidirectional sequence lstm
- Add 'static_library' runtime::Module #11442 - Add 'static_library' runtime::Module
- [Topi][Relay] Support for FP16 ERF on CPU. #11413 - [Topi]Support for FP16 ERF on CPU.
- Finish support for list-of-targets #11382 - Finish support for list-of-targets
- [Tests] Replace the Relay interpreter with the VM in the op tests #11386 - [Tests] Replace the Relay interpreter with the VM in the op tests
- [Relay] Support i16, f16 scalars in Relay text #11224 - Support i16, f16 scalars in Relay text
- Fix eltwise alter op layout for broadcast axis #11337 - Fix eltwise alter op layout for broadcast axis
- [Relay] Flexible shape dispatch transformation #11199 - Flexible shape dispatch transformation
- [Relay] Support 'external codegen targets'. #11173 - Support 'external codegen targets'.
- Add FlattenAtrousConv transformation #10996 - Add FlattenAtrousConv transformation
- [CUDNN] Add cuDNN as a Relay partitioning target (BYOC) #10871 - [CUDNN] Add cuDNN as a Relay partitioning target (BYOC)
- [Pass][Bugfix] Disable re-use of non-flat buffers in StorageRewrite. #10787 - [Pass][Bugfix] Disable re-use of non-flat buffers in StorageRewrite.
- [FQ2I] Add leaky relu to FQ2I #10378 - [FQ2I] Add leaky relu to FQ2I
- [Relay] RelayViz graphviz renderer #10400 - RelayViz graphviz renderer
- [RELAY] [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field #10352 - [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field
- [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 #10310 - [ARM_CPU] Conv2d int8 intrinsic for cortex-A72
- RelayViz interface and terminal ast-dump #10085 - RelayViz interface and terminal ast-dump
- Add a conversion of individual operations in FQ2I pass. #10239 - Add a conversion of individual operations in FQ2I pass.
- [Refactor] Clean up type relations that are declared as template for no reason #10236 - [Refactor] Clean up type relations that are declared as template for no reason
- Fix broadcast InferCorrectLayout #10156 - Fix broadcast InferCorrectLayout
- [Relay][VM] Relay VM memory liveness/lifetime analysis #10026 - [VM] Relay VM memory liveness/lifetime analysis
- [Relay][Pass] Add a relay pass to extract fake quantized ops #10089 - [Pass] Add a relay pass to extract fake quantized ops
- Change function constructors to WithFields #9690 - Change function constructors to WithFields
- [Relay][DefuseOps pass] bug fix: To support function body types other… #10069 - [DefuseOps pass] bug fix: To support function body types other…
- [Relay] Add `conv2d_backward_weight` op (without topi) #9954 - Add `conv2d_backward_weight` op (without topi)
- [FoldScaleAxis] Support dense and bias_add op in fold scale axis #9838 - [FoldScaleAxis] Support dense and bias_add op in fold scale axis
- Add sliding_window operator #9816 - Add sliding_window operator
- Add a JSON converter for 0.7 -> 0.8 and 0.8 -> 0.9 #9874 - Add a JSON converter for 0.7 -> 0.8 and 0.8 -> 0.9
- [AMP][Pass][Typing] Add faster type inference #9735 - [AMP][Pass][Typing] Add faster type inference
- [Frontend] Add Span filling for frontends to Relay #9723 - [Frontend] Add Span filling for frontends to Relay
- [Relay] Fix invalid shape function for "copy" operator #9749 - Fix invalid shape function for "copy" operator
- [Relay] s/SEScope/VirtualDevice/g #9759 - s/SEScope/VirtualDevice/g
- [Relay] Support large constants saved/loaded outside of VM executable #9734 - Support large constants saved/loaded outside of VM executable
- [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. #9613 - Re-run PlanDevices after LowerTE to flow new memory scope constraints.
- [Relay] PlanDevices supports 'free' on_device annotations #9693 - PlanDevices supports 'free' on_device annotations
- [RELAY] [AST] Add virtual_device as a first class field in Relay #9641 - [AST] Add virtual_device as a first class field in Relay
- [Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. #9483 - Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}.
- [Relay] WithFields method for Call, Function, Var, TupleGetItem, If, Let, RefCreate, RefRead, RefWrite, Match, and Clause #9569 - WithFields method for Call, Function, Var, TupleGetItem, If, Let, RefCreate, RefRead, RefWrite, Match, and Clause
- WithFields for Tuples #9533 - WithFields for Tuples
- [Relay] Prepare for switching VM to LowerTEPass. #9550 - Prepare for switching VM to LowerTEPass.
- [Relay] Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc. #9542 - Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.
- [TVMC][Relay] Introduce executor and runtime parameters #9352 - [TVMC]Introduce executor and runtime parameters
- Add the Arm(R) Ethos(TM)-U NPU identity operator #9457 - Add the Arm(R) Ethos(TM)-U NPU identity operator
- Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes. #9326 - Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes.
- QNN - [QNN] Enable constant folding for QNN operations. #11228, [QNN] Add per-channel quantization to add/subtract/multiply #10718, [QNN] Register a bunch of unary elementwise ops #10086, [QNN] Lookup operations for hard to implement operators #10053, Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. #9637, [QNN] Add qnn.rsqrt op #9982
### Runtime
- [Runtime][PipelineExecutor] Add manual graph-splitting logic to the unit test (#11334)
- [Runtime][PipelineExecutor] Refactor PipelineExecutor.py and add cross-compile support for pipeline executor (#11133)
- Move WrapTimeEvaluator from RPC to profiling, NFC (#11172)
- [Runtime][PipelineExecutor] Add forwarding queue logic for set input (#10990)
- [Runtime][Vulkan] Add RGP support to TVM for vulkan device (#10953)
- [Runtime][PipelineExecutor] Getting the asynchronous output (#10723)
- [Runtime] AOTExecutor implementation and c target code-generator (#10283)
- [Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting (#9802)
- [Runtime][Pipeline Executor] Multiple threads management and the data forwarding notification mechanism (#10234)
- [Runtime] Improved log information with function signature (#10326)
- [Runtime][PackedFunc] Bring `PackedFunc` into TVM Object System (#10032)
- [Runtime][PipelineExecutor] Pipeline Executor sequential execution (#10082)
- [Runtime][PipelineExecutor] Add Pipeline Executor Interface (#10010)
- [Runtime][Pipeline executor] Global parameters group name and runtime modules parameters map (#9846)
- [GraphExecutor] Add API `get_input_info` to graph_executor (#9889)
- [Runtime][Pipeline Executor] Add the map logic of global input and subgraph input (#9751)
TE
- [TE] Support schedulable TIR compute definitions in TOPI (#11589)
- [TE] Optimized version of concatenation layer (#11341)
- [TECompiler] Decouple TE compute and schedule lowering in ScheduleBuilder (#10561)
TIR
- [TIR] HoistExpression, a generalization of HoistIfThenElse (#11592)
- [TIR][Pass] Remove-Weight-Layout-Rewrite-Block (#11870)
- [TIR, analysis] Add GetAutoTensorizeMappingInfo to generate transforms for auto tensorization (#11740)
- [TIR] Add preserve-unit-iters (#11585)
- [TIR] Register CUDA WMMA tensor intrinsics (#11677)
- [TIR, CUDA] Add pass to replace global to shared memory copy with cp.async (#11658)
- [TIR][Schedule] Allow named block and buffer arguments in Schedule (#11624)
- [PASS] Refactor a couple of TIR passes: BindTarget, AnnotateEntryFunc, Filter, LowerInitBlock (#11628)
- [TIR] CSE pass: restrict the equivalence to be decided by a normal form, avoiding comparison of terms (#11574)
- [TIR] Schedule primitive: Add-Unit-Loop (#11575)
- [TIR] Add schedule primitive ReIndex (#11515)
- [TIR][Arith] Additional simplifications inside conditionals (#11524)
- [TIR] Add schedule primitive TransformBlockLayout (#11485)
- [Software pipeline] Fix hardcoded index in `access_ptr` rewriting, add a GPU test with depth 4 (#11495)
- [TIR][Schedule] Transform layout quality of life (#11269)
- [TIR] Support tensorization using ldmatrix + MMA (#11355)
- [Schedule] Allowed typing.Tuple in tir.schedule._type_checker (#11289)
- [TIR] Support affine expressions as indices in reverse compute inline (#11317)
- [TIR][Arith] Implemented padded inverses in IndexMap (#11235)
- [ROOFLINE] Calculate roofline from existing TIR PrimFunc (#11238)
- [TIR] Add schedule primitive SetAxisSeparator (#11225)
- [TIR] Get read/write access precisely for opaque access (#11110)
- [TIR] Enhance software pipeline validation and fix predicate of epilogue (#11106)
- [TIR] StmtFunctor RenewDefs (#10843)
- [TIR] Add function to tile a block according to a given tensor intrinsic (#11075)
- [TIR] Utility function to decide loop mapping for auto tensorization (#11050)
- [ROCM] DP4A intrinsic support for TE/TIR (#11009)
- [TIR] VNNI and ARM dot product intrinsic for tensorization (#10925)
- [TIR][Schedule] Relax reorder primitive's affine binding check (#10887)
- [TIR][Analysis] Add SuggestIndexMap for layout rewriting (#10732)
- [TIR][Schedule] Transform layout (#10538)
- [TIR] Change the behavior of read/write region analysis for reduction blocks (#10638)
- Use local complete block and local reduction block to identify compact dataflow (#10705)
- [TIR] Tuple reduction support in CreatePrimFunc (#10671)
- [TE][TIR] Implement layout transformations, non-flat memory buffers (#9727)
- [TensorIR] Update VerifyGPU (#10405)
- [TensorIR] Renormalize split pattern (#10401)
- [TIR, Relay] Improve bfloat16 support (#10112)
- [TIR] TIR constants integration into compilation pipeline (#8509)
- [TIR] Add support for multi-blocking layouts and their transformation (#9996)
- [TIR] Add software pipelining (#10066)
- Support sub-warp reduction for CUDA target (#10207)
- Implementation of Common Subexpression Elimination for TIR (#9482)
- [TIR] Allow compute_at to create block predicates for non-trivial bounds and support floordiv pattern (#9527)
- [TIR][Schedule] Update compact_dataflow constraint (#10158)
- [TIR][Schedule] Blockize and Tensorize (#9871)
- [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (#10016)
- [TIR] Encode conditional accesses info into block read/write regions (#9880)
- [TIR] Affine utility support for iter lower bound and diagnostics (#9699)
- [TIR][Schedule] Add Annotate/Unannotate primitives (#9742)
- [TensorIR] Primitive "SetScope" (#9738)
- [TIR][Schedule] Analysis functions to check if compute_inline and com… (#9743)
- [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs (#9689)
- [TensorIR][UX] Type annotation-based runtime type checking (#9559)
- Add a 'rolling_buffer' scheduling primitive (#9444)
- [TensorIR] Cross-Thread Reduction (#9360)
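
Several of the TIR items above (#9482, #11574, and `transform.CommonSubexprElimTIR`) concern common-subexpression elimination: repeated subexpressions are computed once, bound to a temporary, and reused. The sketch below is not TVM's implementation, just a minimal pure-Python illustration of the idea over tuple-encoded expression trees:

```python
# Illustrative CSE over tuple-encoded expression trees like
# ("+", ("*", "a", "b"), ...). NOT TVM's implementation.

def eliminate_common_subexprs(expr):
    """Return (bindings, simplified): each subtree occurring more than
    once is bound to a fresh temporary and replaced by its name."""
    counts = {}

    def count(e):
        if isinstance(e, tuple):
            counts[e] = counts.get(e, 0) + 1
            for child in e[1:]:
                count(child)

    count(expr)
    bindings = []  # (temp_name, expression) in dependency order
    cache = {}     # already-extracted subtree -> temp name

    def rewrite(e):
        if not isinstance(e, tuple):
            return e
        if e in cache:
            return cache[e]
        new_e = (e[0],) + tuple(rewrite(c) for c in e[1:])
        if counts[e] > 1:
            name = f"t{len(bindings)}"
            bindings.append((name, new_e))
            cache[e] = name
            return name
        return new_e

    return bindings, rewrite(expr)

# (a*b) + (a*b)*c  -->  t0 = a*b;  t0 + t0*c
expr = ("+", ("*", "a", "b"), ("*", ("*", "a", "b"), "c"))
bindings, simplified = eliminate_common_subexprs(expr)
print(bindings)    # [('t0', ('*', 'a', 'b'))]
print(simplified)  # ('+', 't0', ('*', 't0', 'c'))
```

TVM's TIR pass additionally has to decide *semantic* equivalence (the normal-form restriction added in #11574); this sketch only deduplicates structurally identical subtrees.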
TOPI
- [TOPI] TE implementation of LSTM using scan (#11531)
- Add Adreno GPU target and topi supporting textures with dynamically allocated textures (#11161)
- [TOPI] VNNI support for batch matmul (#10332)
- [TOPI] Add support for grouped conv3d (#9873)
- [TOPI] VNNI support for int8 dense (#10230)
- [Op][Topi] 5 ops can accept unsigned integers as indices (#10098)
- [TOPI] Support grouped conv1d (#9832)
- [TOPI] Add generic batch norm (#9694)
- [Topi] Cortex-M DSP support (#9233)
TVMScript
- [TVMScript] Represent ramp as index slice (#11308)
- [TVMScript] Support T.buffer_decl using data pointer from Let/Allocate (#10099)
- [TVMScript] Improve printer for TIR syntax sugar (#9680)
- [TVMScript] Add syntax sugar for T.handle and T.match_buffer (#9492)
- [TVMScript] Add for loop syntax sugar (#9620)
- [TVMScript] Misc error message improvements (#9543)
- [TVMScript][Fix] Add type hints for more uncovered cases (#9505)
USMP
- [USMP] U3 use case (#11015)
- [USMP] Adding support for the U1 use case for constant pools (#10189)
- [USMP] Adding support for the U4 use case (#10785)
- [USMP] Adding support for the U2 and U3 use cases (#10193)
- [USMP] Add performance characteristics to PoolInfo (#10005)
- [TIR][USMP] Integrating USMP into the AoT Executor (#9565)
- [USMP] Hill Climb allocator (#9704)
- [TIR][USMP] Adding the pass to convert to pool offsets (#9418)
- [TIR][USMP] Augmenting the algo interface with memory pressure (#9649)
- [TIR][USMP] Greedy memory planning algorithm (#9214)
- [TIR][USMP] Added buffer info extraction pass (#8468)
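
The USMP planners above assign every intermediate buffer an offset inside a shared memory pool so that buffers whose live ranges overlap never overlap in memory. The following is a minimal greedy-by-size sketch of that idea in pure Python; it is an illustration of the general technique, not TVM's algorithm (#9214), and the buffer data is made up:

```python
# Minimal greedy-by-size static memory planner (illustrative only).
# Each buffer has a size and a [first_use, last_use] liveness interval;
# buffers live at the same time must not overlap in the pool.

def greedy_by_size(buffers):
    """buffers: dict name -> (size, first_use, last_use).
    Returns (offsets, pool_size)."""
    placed = []   # (offset, size, first, last)
    offsets = {}
    # Place largest buffers first: they are the hardest to fit.
    for name, (size, first, last) in sorted(
            buffers.items(), key=lambda kv: -kv[1][0]):
        offset = 0
        # Sweep placed buffers by offset, bumping past every one that is
        # both live at the same time and overlapping in address range.
        for off, sz, f, l in sorted(placed):
            if f <= last and first <= l:                  # liveness overlap
                if offset < off + sz and off < offset + size:
                    offset = off + sz                     # conflict: move up
        placed.append((offset, size, first, last))
        offsets[name] = offset
    pool_size = max(off + buffers[n][0] for n, off in offsets.items())
    return offsets, pool_size

bufs = {
    "a": (64, 0, 2),   # live during steps 0-2
    "b": (32, 1, 3),   # overlaps a in time, needs its own space
    "c": (64, 3, 4),   # a is dead by step 3, so c can reuse its space
}
offsets, pool = greedy_by_size(bufs)
print(offsets, pool)   # c shares a's offset; pool is 96, not 64+32+64
```

A hill-climb allocator (as in #9704) would instead perturb the placement order and keep changes that shrink the resulting pool size.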
microNPU
- [microNPU] Optimize separate padding operation for conv2d (#11468)
- [microNPU] Add transform matrices and part matcher to identity op (#11453)
- [microNPU] Add E2E tests with cascader without striping (#11410)
- [microNPU] Expose compute cycle annotations to TIR lowering (#11288)
- [microNPU] Add a pass to reorder copy and compute nodes (#10959)
- [microNPU] Add various options to the cascader (#10509)
- [microNPU] Adding an option to enable striping (#11263)
- [microNPU] Add support for conv2d running on two cores on U65 (#10251)
- [microNPU] Integrate the cascader (#10862)
- [microNPU] Integrate rolling buffers in Arm(R) Ethos(TM)-U (#10344)
- [microNPU] Some housekeeping in the test_ethosu folder (#10824)
- [microNPU] Tweak a layout transform matrix (#10763)
- [microNPU] Add a pass to move allocate nodes to the outer scope (#10725)
- [microNPU] Determine block configs using the cascader (#10695)
- [microNPU] Refactor Relay to TIR hook (#10599)
- [microNPU] Improve cascader memory transfer estimates (#10508)
- [microNPU] Add support for TFLite FULLY_CONNECTED (#10345)
- [microNPU] Introduce a pass to remove redundant identity operations (#10254)
- [microNPU][5] Convert Proposals to te.Schedules (#10062)
- [microNPU][4] Add the cascader Proposal generator (#9959)
- [microNPU] Enable USMP (#10022)
- [microNPU] Add support for LeakyReLU (#10127)
- Add FreeRTOS variant of NPU demo (#10004)
- [microNPU] Refactor type inference data type checks (#10060)
- [microNPU] Add support for pack and unpack (#9960)
- [microNPU] Fix layout assignment in layout optimizer pass (#10143)
- [microNPU][3] Plan generation for the cascader (#9890)
- [microNPU] Add support for transpose convolution (#9855)
- [microNPU] Add support for nearest neighbor and bilinear upsampling (#9841)
- [microNPU] Removing constant args from PrimFunc (#9951)
- [microNPU] Refactor base address determination to codegen (#9929)
- [microNPU] Add support for requantize (#9910)
- [microNPU] Move optimization passes to be a module pass and ensure they are running (#9831)
- [microNPU][2d] Add more Part matchers to cascader (#9785)
- [microNPU][2c] Add performance modelling to cascader (#9778)
- [microNPU][2b] Create CascaderGraphs from TE graphs (#9471)
- [microNPU][2a] Add CascaderGraph for cascading analysis (#9469)
- [microNPU] Add support for SPLIT and SPLIT_V (#9621)
- [microNPU] Update Conv2D tests to use the TF API to generate test cases (#9508)
- [microNPU] Add support for SIGMOID (#9627)
- [microNPU] Add support for TFLite concatenate (#9589)
- [microNPU] Refactor codegen tests (#9623)
- [microNPU] Add NHWC -> NHCWB16 layout transformation pass (#9561)
- [microNPU] Mean legalization support (#9576)
- [microNPU] Move the compilation to use Target Hooks (#9597)
- [microNPU][1] Add affine analysis structures for the cascader (#9458)
- [microNPU] Add the infrastructure for lookup table and TANH (#9547)
- [microNPU] Support binary elementwise with non-4D inputs (#9521)
- [microNPU] Fix incorrectly calculated stride when converting NHWC to NHCWB16 (#9560)
- [microNPU] Add unary elementwise operator infrastructure with ABS (#9530)
- [microNPU] Adding rounding mode attribute to operators (#9514)
- [microNPU] Allow constants to be given as input to an operator (#9515)
microTVM
- [microTVM][ARM] Add Relay tests for conv2d registered schedules (#11250)
- [rpc] Implemented RPC logging (#11232)
- [microTVM] Add support for host-driven AoT Executor (#11044)
- Better version handling for Arduino (#11043)
- [microTVM] Enable micro tvmc tutorial testing in CI (#10555)
- [microTVM][RVM] Add scripts for automated build and testing (#10194)
- [microTVM] TVMCon 2021 Zephyr demo with CMSIS-NN (#10144)
- [microTVM][tvmc] Add TVMC Micro tutorial for Zephyr (#10024)
- [microTVM] Fix zephyr/test_zephyr_armv7m test (#9684)
- [microTVM][TVMC] Add TVMC test for Arduino and Zephyr (#9584)
- Add minimal forwarding RPC server for host-driven python execution on Hexagon (#9526)
- Zephyr support: add CMSIS dependencies in Zephyr project build (#11362), update RVM to Zephyr 2.7 (#10138)
Misc
- Add cooldown interval logic for the profiling functionality (#11465)
- [LLVM] Include LLVM headers in files that use them, not in llvm_common.h (#11888)
- [Arith] Simplification of ceil, log2, and left_shift (#11646)
- [MLF] Add support for multiple modules in Model Library Format (#11464)
- [AutoTVM][Autoscheduler] Default build funcs inherit PassContext (#11632)
- [OpenCL] Implement conv2d_winograd algorithm for Adreno (#11543)
- [Arith] Merge surjective/non-surjective iter mapping detections (#11287)
- Add utility to replace direct call to pytest.main (#11393)
- [ROOFLINE] Roofline analysis over RPC (#11252)
- [Graph Debugger] Expose way to benchmark individual nodes (#11000)
- Bump PyTorch version to 1.11 (#10794)
- [REFACTOR] Remove legacy nnvm folder (#10821)
- [Arith] Remove diagnostic ctx argument from DetectIterMap (#10798)
- [Refactor] Reduced repetition in CodeGenLLVM's buffer access (#10567)
- [AUTO_SCHEDULER] Add feature extraction directly from PrimFunc (#10455)
- RFC: initial stab at TorchScript fallback (#7401)
- [vulkan] Add integer dot product (4xint8, 4xuint8) tensorization for the vulkan SPIR-V target (#10391)
- [VirtualMachine] New method allowing to set one input tensor by its index or name (#10293)
- Generate correct output tensor names in C Interface API (#10191)
- Parameterize test_link_params (#9276)
- [Rust] Update Rust bindings (#9808)
- [PROFILING] Add ability to profile a single function (#9553)
- [CMAKE] Automatically detect newly added source files (#9611)
- [Target] Enable -arch=sm_xx for assigning cuda target arch and deprecate the autotvm.measure.set_cuda_target_arch API (#9544)
- Profiler: #11530, #11066
- Docs: #10921, #11403, #10774, #10912, #9633, #9906, #9534, #9307, #9654, #9580
- Android: #11241
- ETHOSN: #11261, #10486, #10018, #9596
- TVMC: #11012, #10962, #10722, #9817, #9529, #9229
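
The roofline items above (#11238, #11252, #10455) report how close a kernel runs to hardware limits. The underlying roofline model is simple enough to state directly; the sketch below is only the textbook formula with made-up hardware numbers, not TVM's profiling output:

```python
# Classic roofline model: attainable performance is bounded either by
# peak compute or by memory bandwidth times arithmetic intensity,
# whichever is hit first. Hardware numbers here are hypothetical.

def attainable_gflops(peak_gflops, peak_gbps, flops, bytes_moved):
    """Roofline bound for a kernel doing `flops` FLOPs while moving
    `bytes_moved` bytes to/from memory."""
    intensity = flops / bytes_moved          # FLOPs per byte
    return min(peak_gflops, peak_gbps * intensity)

# A kernel at 4 FLOPs/byte on a machine with 100 GFLOP/s compute and
# 20 GB/s bandwidth is memory-bound at 20 * 4 = 80 GFLOP/s.
print(attainable_gflops(100.0, 20.0, flops=400, bytes_moved=100))  # 80.0
```

TVM's roofline analysis extracts `flops` and `bytes_moved` from the TIR PrimFunc itself and measures the peaks on the target device (locally or over RPC).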