Hzfengsy
Member

@Hzfengsy Hzfengsy commented Jul 3, 2019

It can now run on a GPU server and supports the schedule primitive bind.

@Hzfengsy
Member Author

@merrymercy @tqchen I have finished shared memory support and passed the GEMM test.

@merrymercy
Contributor

merrymercy commented Jul 24, 2019

We synced with upstream/master to get the recent updates

Please rebase.

Other changes to file arrangement:

  • I think the function Schedule::ToHalide is very important and will grow as we evolve, so I moved it to its own file, src/tensorir/to_halide.cc

@Hzfengsy
Member Author

@merrymercy Rebased.

Contributor

@merrymercy merrymercy left a comment

Some quick comments. Will review changes to schedule part later this morning.

return tx, yi

split_calc(CC)
tx, yi = split_calc(C)
Contributor

In the original schedule, we only do split_calc once. Can we do it once in the new schedule?

Member Author

I think it will be possible once we implement cache_read. It is difficult, however, if cache_read is called in the original schedule but the split happens in Tensor IR.

@Hzfengsy
Member Author

Please take another look @merrymercy

return fused_node;
}

Array<AxisTreeNode> Schedule::reorder(AxisTreeNode outer, AxisTreeNode inner) {
Contributor

see #5.
Choose one with reasons and document the behavior in the code.

}
now = operator->()->father_map[now];
}

Contributor

Do not delete these blank lines.

}
}
Array<IntSet> iter_domain = arith::SolveCover(block->vars, produces, Flatten2DArray(ranges));

Contributor

Do not delete these blank lines.

for (int i = static_cast<int>(iter_domain.size()) - 1; i >= 0; --i) {
Var iter_var("ax" + std::to_string(i));
AxisTreeNode node;

Contributor

Do not delete these blank lines.


// dependency analysis
Array<Array<IntSet> > Schedule::GatherRegion(Array<Tensor> tensors, AxisTreeNode axis, int start_child_index) const {
Array<Array<IntSet> > Schedule::GatherRegion(BlockTreeNode block, AxisTreeNode axis, int start_child_index) const {
Contributor

Why do you change the signature? The original one is more general, and you only use block->outputs in this function.

}

if (const arith::IntervalSetNode* set = iter_domain[i].as<arith::IntervalSetNode>()) {
if (const auto * variable = block->args[i].as<Variable>())
Contributor

This is already checked by L453. Remove this if/else and directly use const auto * variable = block->args[i].as<Variable>()

@merrymercy
Contributor

merrymercy commented Jul 26, 2019

Add abstraction of memory scope

  • Memory scope
    1. Local: local intermediate buffers. These buffers can be shrunk arbitrarily (i.e. they should be stored at a position that greedily reduces the memory size)
    2. Shared: must be placed between BlockIdx and ThreadIdx
    3. Global: inputs (placeholder) and output buffers. Their size cannot be changed.
  • Calculate the buffer size and insertion place of Realize in to_halide.cc
  • possible new schedule primitive store_at
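
The three scopes listed above could be modelled roughly as follows. This is only a sketch: `MemoryScope` and `can_shrink` are hypothetical names, not part of the code under review, and treating shared buffers as resizable is an assumption (the comment only says so explicitly for local buffers).

```python
from enum import Enum


class MemoryScope(Enum):
    """Hypothetical enum mirroring the three scopes listed above."""
    LOCAL = "local"
    SHARED = "shared"
    GLOBAL = "global"


def can_shrink(scope: MemoryScope) -> bool:
    """Return True if a buffer in this scope may be resized.

    Global buffers (placeholders and outputs) have a fixed size.
    Assumption: everything else, including shared buffers, may shrink.
    """
    return scope is not MemoryScope.GLOBAL
```

Keeping the rule in one helper would let the Realize-placement code in to_halide.cc ask a single question per buffer instead of comparing scope strings in several places.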

Other problems

  • Some annotations are missing; we should add them back when lowering to HalideIR
    1. unroll, vectorize
    2. double_buffer_scope

Contributor

@merrymercy merrymercy left a comment

Overall the GPU support looks good! Some comments on style.

return ScheduleUntensorize(self, block)

def annotate(self, axis, type):
return ScheduleAnnotate(self, axis, type)
Contributor

Do not use type as an argument name; it shadows the Python built-in.

CHECK(outer->children.size() == 1 && outer->children[0] == inner);

ReplaceChild(outer, inner);
Array<AxisTreeNode> Schedule::reorder(Array<AxisTreeNode> axises) {
Contributor

axises -> axes

return origin;
}

Array<AxisTreeNode> Schedule::binary_reorder(AxisTreeNode outer, AxisTreeNode inner) {
Contributor

Google C++ style: binary_reorder -> BinaryReorder


class Attr;

class AttrNode : public Node {
Contributor

@merrymercy merrymercy Aug 5, 2019

This class seems to be redundant or will cause ambiguous naming issues (so many Attrs in tvm and relay). Could we reuse the existing AttrStmt but use a nullptr for the body field?


s.compute_at(AA, ko)
s.compute_at(BB, ko)
# s.unroll(kt)
Contributor

enable this?

if (const arith::IntervalSetNode* set = o.as<arith::IntervalSetNode>()) {
region.push_back(Range::make_by_min_extent(set->min_value, set->max_value - set->min_value + 1));
}
else {
Contributor

code style } else {

This pattern appears many times in your PR. Please replace all of them.


void CheckFatherLink();

bool isAncestor(ScheduleTreeNode outer, ScheduleTreeNode inner) const;
Contributor

Move this after L110?

// Go upwards until all used vars are fetched.
// Then we find the root node of this block
ScheduleTreeNode now = node;
ScheduleTreeNode shared_memory_node = node;
Contributor

useless variable

for (const auto& x : n->outputs) {
related_nodes[x->data].push_back(now);
if (operator->()->raw_realize_scope.count(x->data->op)) {
if (operator->()->raw_realize_scope.at(x->data->op) != "") {
Contributor

"" indicates it will use default scope, which is "global". So the condition here should be

if (str == "global" || str == "") {
 ...
} else {
 ...
}
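
The same default-scope rule, sketched in Python for illustration. `is_global_scope` is a hypothetical helper name; the only fact taken from the comment above is that an empty string means the default scope, which is "global".

```python
def is_global_scope(scope: str) -> bool:
    """Return True for "global" and for "", which means the default scope."""
    return scope in ("", "global")
```

Any check written as `scope != ""` alone would treat an explicit "global" as a non-default scope, which is the case this comment is pointing at.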


for (const auto& x : n->outputs) {
related_nodes[x->data].push_back(now);
if (operator->()->raw_realize_scope.count(x->data->op)) {
Contributor

Is this condition always true? Otherwise, we'd better fix the construction of raw_realize_scope

Member Author

Actually, the block inputs contain the inputs of the compute graph (in GEMM, the placeholders A and B). Those inputs will not be in the realize scope.

auto regions = GatherRegion(Array<Tensor>{x.first}, GetRef<AxisTreeNode>(axis), 0);
Region region;
for (const auto& int_set : regions[0]) {
arith::Analyzer analyzer;
Contributor

@merrymercy merrymercy Aug 5, 2019

Creating the analyzer takes some time. Move it outside of the loop.

Or we could use Simplify(expr) directly

Array<Stmt> new_stmts;
for (const auto &stmt : stmts) {
arith::Analyzer analyzer;
new_stmts.push_back(SubstituteAndEquationSimplify(stmt, vmap, &analyzer));
Contributor

We don't need equation simplify here. Remove analyzer and just use Substitute(stmt, vmap)

// translate nodes
if (const AxisTreeNodeNode* n = node.as<AxisTreeNodeNode>()) {
bool new_thread_axis;
if (operator->()->bind_var.count(n->loop_var)) {
Contributor

What's the usage of this part?
new_thread_axis and binded_threads seem useless

ForType::Serial, DeviceAPI::None,
ArrayToBlock(stmts)));
auto var = n->loop_var;
auto bind_var = operator->()->bind_var;
Contributor

const auto&

auto var = n->loop_var;
auto bind_var = operator->()->bind_var;
auto it = bind_var.find(var);
if (it != bind_var.end()) {
Contributor

To keep the nest level shallow, we can append this if-else block to the if-elseif nest of L203.

for (const auto& child : lca->children) {
if (FindAccess(child, x.first)) {
attached_allocation[child].push_back(x.first);
if (operator->()->raw_realize_scope.at(x.first->op) != "") {
Contributor

!= "" && != "global"

dom_map[iter_var]->min,
dom_map[iter_var]->extent,
const auto * variable = block->args[i].as<Variable>();
Var var(variable->GetNodePtr());
Contributor

Var var = Downcast<Var>(block->args[i]);

output_tensors.push_back(x->data);
}
Array<Array<IntSet> > ranges = GatherRegion(output_tensors, axis, after_pos);
auto& realize_region = operator->()->raw_realize_region;
Contributor

Delete all the code you added in L410-L451. It is useless now.


return tx, yi

tx, yi = split_calc(CC)
Contributor

@merrymercy merrymercy Aug 5, 2019

We can just safely delete this line. In L176, all upper threadIdx, blockIdx will be inherited from C.

@merrymercy
Contributor

merrymercy commented Aug 5, 2019

After fixing all the bugs, the final remaining task is to add automatic scope setting when doing compute_at.

scope = "" means the scope can be changed accordingly. In the original tvm schedule:
If a stage is computed at BlockIdx, its scope is set to "shared" automatically.
If a stage is computed at ThreadIdx, its scope is set to "local".
If an intermediate stage is computed at an axis of another stage, its scope is set to "local".

However, our argument to compute_at is a block, not a stage. So I think we should do the following:

  1. At the beginning of to_halide, analyze the realize location of each buffer and assign the correct scope to buffers with scope = ""
  2. Translate according to the scope
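
A rough Python sketch of the first step, under stated assumptions: `infer_scope` and its `enclosing_axes` argument are hypothetical, the thread tags are assumed to be strings such as "blockIdx.x", and the rule for an intermediate stage attached to an ordinary axis of another stage is not modelled here.

```python
from typing import List


def infer_scope(declared: str, enclosing_axes: List[str]) -> str:
    """Pick a concrete scope for a buffer whose declared scope may be "".

    `enclosing_axes` holds the thread tags of the loops that enclose the
    buffer's realize location, outermost first (assumed representation).
    """
    if declared != "":
        return declared  # an explicit scope is never overridden
    if any(tag.startswith("threadIdx") for tag in enclosing_axes):
        return "local"   # realized inside a thread loop
    if any(tag.startswith("blockIdx") for tag in enclosing_axes):
        return "shared"  # realized inside a block loop only
    return "global"      # default scope when nothing is bound
```

For example, a buffer with scope "" realized under blockIdx.x and threadIdx.x would come out as "local", while the same buffer realized directly under blockIdx.x would come out as "shared".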

@Hzfengsy
Member Author

Hzfengsy commented Aug 5, 2019

cc @merrymercy


void GatherVarDomain(ScheduleTreeNode node,
std::unordered_map<const Variable*, arith::IntSet>* dom_map) {
// todo(@siyuan)
Contributor

What's the content of todo?

AxisTreeNode fuse(AxisTreeNode outer, AxisTreeNode inner);
Array<AxisTreeNode> reorder(AxisTreeNode outer, AxisTreeNode inner);
Array<AxisTreeNode> reorder(Array<AxisTreeNode> axes);
Array<AxisTreeNode> BinaryReorder(AxisTreeNode outer, AxisTreeNode inner);
Contributor

@merrymercy merrymercy Aug 6, 2019

Sorry for my last review comment... To keep the schedule primitive APIs consistent, this should be binary_reorder.

for (size_t i = 0; i < n->args.size(); ++i) {
used_vars.insert(GatherVars(n->args[i]));
const auto vars = GatherVars(n->args[i]);
used_vars.insert(vars);
Contributor

@merrymercy merrymercy Aug 6, 2019

Undo this change

attached_allocation[child].push_back(x.first);
const auto& str = operator->()->raw_realize_scope.at(x.first->op);
if (str != "" && str != "global") {
const auto *axis = lca.as<AxisTreeNodeNode>();
Contributor

add CHECK(axis != nullptr)

}
}

std::unordered_set<std::string> binded_threads;
Contributor

useless

np.testing.assert_allclose(x, y, atol=1e-4)
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
Contributor

@merrymercy merrymercy Aug 6, 2019

delete this. You should use your own environment variables

inline void ReplaceChild(ScheduleTreeNode old_child, ScheduleTreeNode new_child);
inline void ReplaceChild(ScheduleTreeNode old_child, Array<ScheduleTreeNode> new_children);
void RemoveLeaf(ScheduleTreeNode node);
bool isAncestor(ScheduleTreeNode outer, ScheduleTreeNode inner) const;
Contributor

Suggested change
bool isAncestor(ScheduleTreeNode outer, ScheduleTreeNode inner) const;
bool IsAncestor(ScheduleTreeNode outer, ScheduleTreeNode inner) const;
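
For readers unfamiliar with the helper, an ancestor check over a parent map can be sketched as below. This is not the implementation from the PR: `is_ancestor` and the dictionary-based `father_map` are illustrative stand-ins, and counting a node as its own ancestor is an assumption.

```python
from typing import Dict, Hashable, Optional


def is_ancestor(father_map: Dict[Hashable, Optional[Hashable]],
                outer: Hashable, inner: Hashable) -> bool:
    """Return True if `outer` lies on the path from `inner` up to the root."""
    now: Optional[Hashable] = inner
    while now is not None:
        if now == outer:
            return True
        # Step to the parent; nodes missing from the map are treated as roots.
        now = father_map.get(now)
    return False
```

The walk is linear in the depth of the tree and assumes the parent map contains no cycles.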

@merrymercy merrymercy merged commit 46454cc into tlc-pack:master Aug 6, 2019
junrushao pushed a commit that referenced this pull request Aug 28, 2021
* Add C++ API for computing type key from type index

* Try and isolate leak

* Rewrite the bindings to fix the ArgValue lifetime issue

There are still quite a few issues left to resolve in this patch, but I believe the runtime
changes stabilize memory consumption as long as the parameters are only set once. ByteArray
also has some totally broken unsafe code which I am unsure of how it was introduced.

* Finish handling tvm-rt issues due to ArgValue lifetime

This patch further refactors the bindings to better handle the
lifetime issues introduced by detecting the argument memory leak.

* WIP memory leak

* There is issue using TVMCb function which is breaking refcount

* Fix fallout from the lifetime refactor

* Another tweak

* Follow up work from the memory leak, attempt to clean up ByteArray

* Add some todos for future work

* Fix doc string

* Clean up the changes

* Format
junrushao pushed a commit that referenced this pull request Sep 3, 2021
…ter (#8835)

* # This is a combination of 2 commits.
# This is the 1st commit message:

Initial changes

# This is the commit message #2:

Ftarget string -> Target object works!

* Fix remaining target strings

* fix bad rebase

* Fix typo

* 1 more bad rebase fix

* Lint

* typo

* Forgot to commit this

* Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn

* Passing most tests, yay

* remove some comments

* lint

* target-str-to-target-object

* Respond to change requests

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
zxybazh pushed a commit that referenced this pull request Oct 4, 2021
… only to `/docs` (#9031)

* Add script to look for changed in doc dir

* Modify Jenkinsfile

* Minor changes in scripts

* Working Jenkinsfile on selective stages on docs

* Pass groovy formatter on Jenkinsfile

* Implementation of relay_to_tir target hook (#8423)

This is the first new hook proposed in the Additional Target Hooks RFC, longer
term the compilation should move to using `Target` proper but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`

Ideally the host target would be annotated onto the `IRModule` so as this `Pass` could use it instead of defaulting to C but this is fine for now.

* [CUDA] Fix dense tensorcore legalize type error when units is specified (#9030)

* Fix dense tensorcore legalize type error when units is specified

* revert black change due to different version from CI

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (#9017)

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op

* Fix linter error for variable name and else after return

* Separate quantized avg_pool impl and add TODO for global_avg_pool

* Fix comment typo

* Fix line break in `setup.py` (#9029)

* [Onnx] Add SoftmaxCrossEntropyLoss (#8906)

* nll loss v1

* add converter

* decode strings in byte form

* decode variable length inputs

* make shapes correct

* unsqueeze

* proper weight handling

* simplify if statement

* fix tests

* add comment about tests

* delete extra file

* lint

* so cool

* Update CI Lint Image Version (#8841)

* Update CI Lint Image Version

* trigger

* [BUG] ToBasicBlockNormalForm immutability (#8778)

* ToBasicBlockNormalForm immutability

* better comment on ToBasicBlock

* refine comment of ToBasicBlockForm

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807)

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm

This new benchmarking function is just a convenience function for
calling time_evaluator on the underlying module. Hopefully this should
make it easier for users to get good benchmarks of their code.

* formatting

* import order

* more test, more comments, more precision

* fix tests

* add seconds descriptions to doc

* Apply CPPLint to CRT Tests (#8844)

This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting.

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584)

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost.

Added initial tunable autotvm templates for depthwise conv2d with
NHWC layout for Mali and Bifrost.

* [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost.

- Fix assert for Bifrost.
- Set reasonable default axis splits to avoid using tophub for NHWC.
- Fixed typo: arm cpu -> Mali.

* [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost.

* Support for CMSIS-NN in Corstone300 Makefile (#8831)

Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a

* [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846)

* increase timeout

* trigger

* [AMP] Bump up tolerance on flaky test (#8850)

* bumpy up tol

* bumped tolerance up even more

* jostle ci

* [Hexagon] Rework tvm.target.hexagon() interface (#8823)

* [Hexagon] Rework tvm.target.hexagon() interface

Make the tvm.target.hexagon() function take most options as keyword
parameters. This will allow adding additional parameters without changing
the interface.

No changes are required to existing code, except for changing positional
parameters following the CPU version to keyword parameters, and updating
the names of the keyword parameters:
  sim_args  -> sim_options,
  llvm_args -> llvm_options,
although the old names will be accepted for the time being.

* formatting

* change ' to "

* Rename 'args' to 'config' for clarity

* Use 'strip' instead of 'replace'

* Restart build

* [Pattern matching] Add an option to rewrite the graph only once (#8843)

* [Pattern matching] Add an option to rewrite the graph only once

If the graph returned from the callback consists of the original
pattern, the rewriter will run in the loop, which is not always desired.
So this patch proposes an option to run the rewriter only once.

Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06

* Make rewrite_once default to false

Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0

* update gpu and cpu (#8853)

* VTA cmake change to include Verilator header for building tsim library (#8797)

* VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI

* Refactor Verilator check conditions

* Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target.

* remove build/libvta_tsim.so from non tsim targeting builds

* Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings

* [FIX] Bug fix for a floormod rewrite simplify rule (#8852)

* Update rewrite_simplify.cc

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* move rust lint script (#8726)

* [AMP] Disallow fp16 conversion for summation-like ops (#8810)

* [AMP] Disallow fp16 conversion for summation-like ops

* test only structural equality

* [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605)

* [topi] add spconv2d_3x3 nhwc

* [relay] sparse_conv2d: add kernel_size attr

* [relay] add strategy for spconv2d_3x3 nhwc

* [relay] pass to convert spconv2d with const args

* [relay] convert sparse conv2d pass fixes

* use array for sparse conv2d attr

* fixup 1x1 tests; new 3x3 tests

* extend repeat_interleave op for relay.Expr (#8839)

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>

* Change AOT from ExprVisitor to MixedModeVisitor (#8856)

This should allow better scale-ability for AOT when targeting larger networks.

* Add a PaddlePaddle Frontend (#8645)

* fix some problems for matmul

* fix some problems for matmul

* add alpha parameter for matmul

* remove unnecessary condition

* add TranslatedLayer which support model loaded by jit.load

* add mul operator support

* Add padding mode support for conv/pool2d

* support 4 two-tuples

* add paddle test case

* add paddle conv2d  case

* update test_forward.py

* fix paddle convert_matmul

* add paddle multiply and matmul op test case

* add test case and fix bug

* delete import pandas

* add paddlepaddle tests

* modify the variable name of convert_reshape

* formatting

* formatting

* use black to format python code

* pylint check

* Remove fluid api

* black format

Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com>
Co-authored-by: wjj19950828 <wjjisloser@163.com>
Co-authored-by: heliqi <1101791222@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>

* [Runtime] add set_output_zero_copy (#8497)

* Update graph_executor.h

* Update graph_executor.cc

* modify zero copy UT add set input zero copy

* modify C style

* add runtime test

* realy build  generatr the json

Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local>

* [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859)

A crash occurs when automatically deleting an instance of
CodeGenHexagon because the LLVMContext object has already been
freed. Objects of both types are created using unique_ptr, but
the object managed by the LLVMContext unique_ptr is passed to
CodeGenHexagon object (not as a unique_ptr).

This crash is fixed by moving the declaration of the LLVMContext
object before the CodeGenHexagon object. I'm not sure if this
is the best way to fix this, but it does fix the crash. Also,
in other files, the LLVMContext object is always created first.

Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com>

* [Graph Executor, VM] Add end to end benchmarking of models (#8858)

Add benchmarking that includes the overhead of transferring inputs and
outputs to and from the device. This should give an accurate measurement
of the runtime a user would see when using the model. This is
accomplished by adding functions that run from inputs to return values
into the graph executor and the VM.

* [UnitTests] Expose TVM pytest helpers as plugin (#8532)

* [UnitTests] Expose TVM pytest helpers as plugin

Previously, pytest helper utilities such as automatic parametrization
of `target`/`dev`, or `tvm.testing.parameter` were only available for
tests within the `${TVM_HOME}/tests` directory.  This PR extracts the
helper utilities into an importable plugin, which can be used in
external tests (e.g. one-off debugging).

* [UnitTests] Refactor the plugin-specific logic out into plugin.py.

* [UnitTests] Moved marker definition out to global variable.

* Remove AOT Executor header from Arduino project (#8857)

* [Community] @mdw-octoml -> Reviewer (#8868)

* [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855)

* init

* fix

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

* address

Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

* [Autoscheduler] Configurable workload keys (#8862)

* change workload keys

* remove binary string comparison

* append the tuple not every integer

* clean up

* lint

* dump workload keys to dags

* fix things

* change some strings

* misc fixes, add tests

* jostle ci

* [Tutorial][Executor] Fix the usage of executors in tutorials (#8586)

* fix: executor usage for keras tutorial

* fix: executor usage for onnx tutorial

* [Tutorial][Executor] Fix executors in tutorials

* [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867)

* Simplify onnx input since name accesses are no longer supported.

* move Celu importer.

* [TIR] GetBlockReadWriteRegion (#8875)

* [TIR] GetBlockReadWriteRegion

* Fix black issue

* Use constant reference for the interface

* Fix lint issue

* [RISCV] Add support for llvm parameter -mabi (-target-abi) (#8860)

* [Community] @manupa-arm -> Committer (#8870)

* adding Manupa to the contributors list

* re-trigger CI

* [RPC] Fix ios_rpc build (#8864)

* [Vulkan][Target] Added the driver name to the vulkan target string. (#8882)

Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read
from the `driverName` property in
[VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html),
or is left as `"unknown_driver_name"` if the driver does not support
querying the driver name.

* [ONNX][TOPI] Support select_last_index for argmin/max (#8816)

* support select_last_index for argmin/max

* reverse conditions which made on accident

* forward args in reduce.py

* make proper nodes for reduction ops

* remove complicated nested lambdas

* fix lambda capture for conversion

* forward more arguments

* forward more args

* enable onnx tests

* wrapping casts to remove ambiguity

* revert changes extraneous

* correct incorrect attrs being used for ops

* change attributes

* remove old impl

* register new attribute node

* clean up test

* reformat

* reformat

* coolio

* stable comparison

* casts to avoid ambiguity

* casting more

* correct arg passing

* fix broken input

* OneElementReduceAttrs-->ArgReduceAttrs"

* reduce boilerplate

* change names

* remove log statement

* jostle ci

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>

* refactor optimize GEMM on CPU tutorial (#8825)

* refactor optimize GEMM on CPU tutorial

* fix lint errors

* fix more lint errors

* fix typo

* fix problem with redefinition of `k`
add TODO and comments around loop unrolling
clarify note on the array packing figure

* reword general description of array packing

* grap kaxis from compute definition

* remove duplicate comments on unrolling

* Change target string to Target object in the TE compiler and interpreter (#8835)

* # This is a combination of 2 commits.
# This is the 1st commit message:

Initial changes

# This is the commit message #2:

Ftarget string -> Target object works!

* Fix remaining target strings

* fix bad rebase

* Fix typo

* 1 more bad rebase fix

* Lint

* typo

* Forgot to commit this

* Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn

* Passing most tests, yay

* remove some comments

* lint

* target-str-to-target-object

* Respond to change requests

Co-authored-by: Jared Roesch <roeschinc@gmail.com>

* [TensorIR][M2a] CacheRead/Write (#8863)

Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>

* [CI] make pre-commit hooks to run on every push instead of every commit (#8888)

* [TVMScript] Fix printing ForNode annotations (#8891)

* [1/10] CMSIS-NN graph partitioner for softmax (#8653)

* cmsis graph partitioner for softmax

Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83

* Updated docstring in the partitioning function

Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055

* [microTVM][RVM] Add Arduino RVM (#8748)

* Functioning Arduino Vagrant VM

Begin building Arduino Vagrant VM

Mostly working Vagrant VM

Changes for debugging

Add ignored json file

Fix venv path

* Generalize parts of RVM for multiple platforms

cwd hack

Add unit tests from apps directory to task_python_microtvm.sh

Generalize parts of RVM for multiple platforms

* Add Vagrantfile lint exceptions

* Address PR comments

Address Mehrdad's PR comments

More PR comments

Documentation tweaks

Add dialout group to user

* Rerun tests

* Spresense fix

* Rerun CI tests

* Rerun tests

* sce loss example

* add comments, remove other tests

* lint

* lint

* jostle

* lint up

* jostle

* uncomment some tests

* proper return

* clean up

* lint

* minor merge errors

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai>
Co-authored-by: Jiawei Liu <jaway.liu@gmail.com>
Co-authored-by: Tristan Konolige <tkonolige@octoml.ai>
Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com>
Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com>
Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com>
Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com>
Co-authored-by: Elen Kalda <elen.kalda@arm.com>
Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com>
Co-authored-by: Chenfan <jcf94@outlook.com>
Co-authored-by: masahi <masahi129@gmail.com>
Co-authored-by: Tantalus13A98B5F <jsl_713@live.com>
Co-authored-by: Valery Chernov <black.chervi@gmail.com>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: Jason <928090362@qq.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com>
Co-authored-by: wjj19950828 <wjjisloser@163.com>
Co-authored-by: heliqi <1101791222@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Swift.Sun <sunjiwei@yeah.net>
Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local>
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com>
Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com>
Co-authored-by: Yizhi Liu <liuyizhi@apache.org>
Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com>
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
Co-authored-by: Adam Straw <astraw@octoml.ai>
Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com>
Co-authored-by: Jared Roesch <roeschinc@gmail.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com>
Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>

* [Hexagon] Don't use {} initialization with FastRPC structures (#9033)

The data members in FastRPC structures aren't guaranteed to remain
in the same order. Replace aggregate initialization with direct,
member-by-member initialization.
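
A minimal sketch of the pattern this commit describes, assuming a hypothetical struct `RemoteBuf` with made-up fields `data` and `len` (the actual FastRPC structures and their members are not shown in this log). Positional aggregate initialization such as `RemoteBuf buf = {ptr, n};` silently depends on declaration order, while assigning each member by name does not:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a FastRPC-style structure. The name and the
// fields are assumptions for illustration, not the real FastRPC layout.
struct RemoteBuf {
  void* data;
  int32_t len;
};

// Fragile form (avoided here): RemoteBuf buf = {ptr, n};
// If the header later reorders or inserts members, the positional values
// end up in the wrong fields, or the code stops compiling.

// Order-independent form: assign every member by name.
RemoteBuf MakeBuf(void* ptr, int32_t n) {
  RemoteBuf buf;
  buf.data = ptr;
  buf.len = n;
  return buf;
}
```

The named assignments keep working regardless of how the members are ordered in the struct definition, which is the property the commit message is after.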

* Test

* Minor checkstyle issue

* Test

* Test file

* Revert changes in unit tests

* Change script name

* Test

* Revert format on groovy file

* Remove test file

* Minor change in script

* Minor formatting changes

* Revert logic in conditions for changed files

Co-authored-by: Christopher Sidebottom <christopher.sidebottom@arm.com>
Co-authored-by: masahi <masahi129@gmail.com>
Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com>
Co-authored-by: Leandro Nunes <leandro.nunes@arm.com>
Co-authored-by: AndrewZhaoLuo <andrew.zhao.luo@gmail.com>
Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai>
Co-authored-by: Jiawei Liu <jaway.liu@gmail.com>
Co-authored-by: Tristan Konolige <tkonolige@octoml.ai>
Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com>
Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com>
Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com>
Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com>
Co-authored-by: Elen Kalda <elen.kalda@arm.com>
Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com>
Co-authored-by: Chenfan <jcf94@outlook.com>
Co-authored-by: Tantalus13A98B5F <jsl_713@live.com>
Co-authored-by: Valery Chernov <black.chervi@gmail.com>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: Jason <928090362@qq.com>
Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com>
Co-authored-by: wjj19950828 <wjjisloser@163.com>
Co-authored-by: heliqi <1101791222@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Swift.Sun <sunjiwei@yeah.net>
Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local>
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com>
Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com>
Co-authored-by: Yizhi Liu <liuyizhi@apache.org>
Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com>
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
Co-authored-by: Adam Straw <astraw@octoml.ai>
Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com>
Co-authored-by: Jared Roesch <roeschinc@gmail.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>