[Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix #7635
Conversation
LGTM
cc @antinucleon
Do you have any performance numbers or comparisons against existing manual schedules?
This PR just enables end-to-end tuning of sparse networks, taking it from 0 to 1. My colleague @yuchaoli has some results on an ARM mobile phone, but there currently seems to be a problem with the TVM main branch that prevents ARM from reaching the performance we expect. I'll spend some time to fix it.
python/tvm/topi/nn/sparse.py
Outdated
@@ -470,6 +470,38 @@ def _traverse(t):
    return sparse_input_map


def random_bsr_matrix(m, n, bs_r, bs_c, density, dtype):
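For readers unfamiliar with this helper, here is a rough, self-contained sketch of what a random_bsr_matrix-style generator typically does (an illustration, not the exact TVM implementation): scatter random (bs_r, bs_c) blocks into a dense (m, n) array until the requested density is reached, then convert the result to scipy's BSR format.

import itertools
import numpy as np
import scipy.sparse as sp

def random_bsr_matrix_sketch(m, n, bs_r, bs_c, density, dtype="float32"):
    # Dense scratch buffer that we fill with randomly placed blocks.
    assert m % bs_r == 0 and n % bs_c == 0
    dense = np.zeros((m, n), dtype=dtype)
    num_blocks = int(density * m * n / (bs_r * bs_c)) + 1
    # All possible top-left corners of a (bs_r, bs_c) block; pick a random subset.
    candidates = np.array(list(itertools.product(range(0, m, bs_r), range(0, n, bs_c))))
    chosen = candidates[np.random.choice(len(candidates), size=num_blocks, replace=False)]
    for r, c in chosen:
        dense[r : r + bs_r, c : c + bs_c] = np.random.randn(bs_r, bs_c)
    # scipy stores the result as (num_blocks, bs_r, bs_c) data plus indices/indptr arrays.
    return sp.bsr_matrix(dense, blocksize=(bs_r, bs_c))

# Example: a 128x128 weight with 16x1 blocks and roughly 15% non-zero values.
w = random_bsr_matrix_sketch(128, 128, 16, 1, 0.15)
print(w.data.shape, w.indices.shape, w.indptr.shape)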
IMO this should not be part of Topi. Either put it where it is used, or in testing.
Fine, but I'm finding that this has been used in many different places. I'll try to find a better position.
Moved to topi/sparse/utils.
I am sorry for the delayed response; I missed your reply somehow. My suggestion is that random_bsr_matrix() does not qualify to be in Topi unless it is required by some ops. From what I can see it is just a utility for the tutorial, so let's keep this utility function in the tutorial file itself. Otherwise we have one more option: we can put it in tvm.testing, which can help other tutorials and test cases as well.
            1 - sparsity,
        )
        register_task_input_buffer(
            "default", prefix + "W_data", tvm.runtime.ndarray.array(sparse_weight.data)
Let's not hard-code it; we can use {name + ".data", name + ".indices", name + ".indptr"}.
The problem is that we cannot get the "name" during measuring.
Okay, thanks for the clarification. But I just wonder: if name is not available, then how does the logic above for prefix work (I mean line number 98)? It's in the same flow, right? Please let me know in case I am mistaken.
I was just wondering whether we can divide this PR into two: one with the bug fix for conv2d, and the other with the sparse_dense auto-scheduler & TFLite bug fix. These two are quite unrelated, so it may not be good for them to go in one PR.
Yeah, I agree.
@@ -1872,7 +1872,7 @@ def convert_fully_connected(self, op):
                out_dtype="int32",
            )
        else:
-           out = _op.nn.dense(in_expr, weight_expr)
+           out = _op.nn.dense(in_expr, weight_expr, units=weight_shape[0])
One small suggestion: if possible, can we add a test case that exercises the issue fixed here? It will help catch future breaks.
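For context on the one-line change above, here is a minimal Relay snippet (shapes are made up) showing the effect of passing units to nn.dense: the op then carries its output dimension as an explicit attribute instead of relying on the weight shape alone.

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 128), dtype="float32")
weight = relay.var("weight", shape=(256, 128), dtype="float32")

# The fix passes units (the first dimension of the weight shape) explicitly.
out = relay.nn.dense(data, weight, units=256)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
mod = relay.transform.InferType()(mod)
print(mod)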
    if use_sparse:
        # This is a test workload that manually transforms a dense model to sparse.
        # Check `tutorials/frontend/deploy_sparse.py` for more examples on how to import a
        # pretrained model.

        def random_sparse_dense_params(func, params, density, BS_R, BS_C):
            def deepcopy(param_dic):
                ret = {}
                for k, v in param_dic.items():
                    ret[k] = tvm.nd.array(v.asnumpy())
                return ret

            new_params = deepcopy(params)
            dense_weight_names = relay.analysis.sparse_dense._search_dense_op_weight(func)
            for item in dense_weight_names:
                name = str(item)
                shape = new_params[name].shape
                if shape[0] % BS_R == 0 and shape[1] % BS_C == 0:
                    new_w = random_bsr_matrix(
                        shape[0], shape[1], BS_R, BS_C, density, "float32"
                    ).todense()
                    new_params[name] = tvm.nd.array(new_w)
            return new_params

        bs_r = 1
        sparsity = 0.85

        # Currently we only support converting dense matmul to sparse dense matmul
        mod, params = ddo.simplify_fc_transpose.convert(mod["main"], params)
        params = random_sparse_dense_params(mod, params, BS_R=bs_r, BS_C=1, density=1 - sparsity)
        mod, params = ddo.bsr_dense.convert(mod, params, (bs_r, 1), sparsity_threshold=0.8)

        mod = tvm.IRModule.from_expr(mod)
Wrap this as a function. Do not let this huge block of code confuse readers who only want to know how to use auto-scheduler for regular networks.
Simplified this part, and moved the big block to sparse.utils.
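For illustration, a rough sketch of what the slimmed-down flow can look like once the helpers live in topi/sparse/utils (the toy model, shapes, and density below are assumptions, not the tutorial's actual code):

import numpy as np
import tvm
from tvm import relay
from tvm.relay import data_dep_optimization as ddo
from tvm.topi.sparse.utils import random_bsr_matrix  # new home of the helper after this PR

# Toy model: a single dense layer whose weight we replace with a block-sparse one.
x = relay.var("x", shape=(8, 64), dtype="float32")
w = relay.var("w", shape=(64, 64), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))
params = {
    "w": tvm.nd.array(
        np.array(random_bsr_matrix(64, 64, 1, 1, 0.15, "float32").todense(), dtype="float32")
    )
}

# Same flow as the tutorial block above: simplify, then rewrite sufficiently
# sparse dense ops into sparse_dense using the BSR weights.
func, params = ddo.simplify_fc_transpose.convert(mod["main"], params)
func, params = ddo.bsr_dense.convert(func, params, (1, 1), sparsity_threshold=0.8)
mod = tvm.IRModule.from_expr(func)
print(mod)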
@merrymercy @FrozenGene I have removed the ARM CPU modifications from this PR.
I only have one comment on the tutorial. We should make the tutorials more readable and modifiable for new users. Other parts look good to me.
Dismissed: the fix for ARM will be in another PR.
[Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix (apache#7635)
* Add sparse dense end to end model tuning support
* Add sparse tuning for arm network
* Bug fix for tflite frontend dense with layout rewrite
* Move the random_bsr_matrix to sparse.utils
#7313 introduced tuning support for sparse ops; this PR brings end-to-end model tuning support.
cc @merrymercy @comaniac @FrozenGene @yuchaoli
This PR also contains some bug fixes: