Add attention operator and adapter for onert #400
Conversation
force-pushed from 7cb756e to 454bbc5
The generated circle needs to run using onert. I heard there is:

```python
from test.utils import tag

@tag.use_onert
class Gemma3(TestModuleBase):
    def __init__(self):
```

But it requires a class definition, while my case does not have a class definition at all. How can I run the generated circle?
force-pushed from 454bbc5 to 1089e80
```python
import torch
from torch.library import Library

lib = Library("circle", "DEF")
lib.define(
    """
    attention.llama(
        Tensor hidden_states,
        Tensor wq,
        Tensor wk,
        Tensor wv,
        Tensor wo,
        Tensor position_cos,
        Tensor position_sin,
        Tensor attention_mask,
        Tensor past_key,
        Tensor past_value,
        Tensor cache_position
    ) -> Tensor
    """
)


# ATTENTION FUSER


@torch.library.register_fake("circle::attention.llama")
def attention_llama(*args, **kwargs):
    (
        hidden_states,
        q_proj,
        k_proj,
        v_proj,
        o_proj,
        position_cos,
        position_sin,
        attention_mask,
        past_key,
        past_value,
        cache_position,
    ) = args
    return hidden_states
```
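For reference (not part of the PR's diff), once the schema and fake impl above are registered, a fuser pass can emit a call to the op as sketched below; since there is no real kernel, the call only resolves under tracing with fake tensors (e.g. inside torch.export), not in eager mode.

```python
# Illustrative only: emitting the custom op defined above from a fuser pass.
import torch

def emit_fused_attention(hidden_states, wq, wk, wv, wo,
                         position_cos, position_sin, attention_mask,
                         past_key, past_value, cache_position):
    # Resolves via the "circle" namespace and the "llama" overload; during
    # tracing the registered fake impl supplies the output metadata.
    return torch.ops.circle.attention.llama(
        hidden_states, wq, wk, wv, wo,
        position_cos, position_sin, attention_mask,
        past_key, past_value, cache_position,
    )
```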
You can use the @custom_op decorator instead. And the registration code should be placed in tico/utils/register_custom_op.py.
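For comparison, a minimal sketch of the @custom_op route. The op name is flattened to attention_llama here (an assumption, since torch.library.custom_op takes a plain namespace::name rather than a schema overload), and the eager body is a placeholder, not the PR's implementation.

```python
import torch
from torch.library import custom_op

@custom_op("circle::attention_llama", mutates_args=())
def attention_llama(
    hidden_states: torch.Tensor,
    wq: torch.Tensor,
    wk: torch.Tensor,
    wv: torch.Tensor,
    wo: torch.Tensor,
    position_cos: torch.Tensor,
    position_sin: torch.Tensor,
    attention_mask: torch.Tensor,
    past_key: torch.Tensor,
    past_value: torch.Tensor,
    cache_position: torch.Tensor,
) -> torch.Tensor:
    # Placeholder eager kernel; custom ops must not alias their inputs,
    # hence the clone.
    return hidden_states.clone()

@attention_llama.register_fake
def _(hidden_states, wq, wk, wv, wo, position_cos, position_sin,
      attention_mask, past_key, past_value, cache_position):
    # Shape/dtype propagation only; mirrors the output of the eager kernel.
    return torch.empty_like(hidden_states)
```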
@mhs4670go Thank you for the review. I already added "No Merge" after comparing with #266. I will rearrange the code to follow TICO's way.
How can I run the generated circle using onert?
I think you can refactor this script into a single class. test/modules/model/TinyLlamaWithFusedRMSNorm/model.py can be a reference.
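For illustration, a rough sketch of what such a class-based test module could look like. The import paths, the get_example_inputs hook name, and the checkpoint are assumptions for illustration, not TICO's actual test API; only TestModuleBase and tag.use_onert appear earlier in this thread.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from test.modules.base import TestModuleBase  # assumed import path
from test.utils import tag


@tag.use_onert
class TinyLlamaWithFusedAttention(TestModuleBase):
    def __init__(self):
        super().__init__()
        name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.model = AutoModelForCausalLM.from_pretrained(name).eval()

    def forward(self, input_ids, attention_mask):
        return self.model(input_ids=input_ids, attention_mask=attention_mask).logits

    def get_example_inputs(self):
        # Hypothetical hook: returns the (args, kwargs) the framework feeds to forward().
        enc = self.tokenizer("Hello, my dog is cute", return_tensors="pt")
        return (enc["input_ids"], enc["attention_mask"]), {}
```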
Yes, I already know the PR. I did not want tedious and unnecessary work like making a class, providing hand-made random inputs, and so on. I think there should be a better way. However, I am not familiar with TICO, and it seems to be necessary for TICO's value test.
When I ran the commands below, I got errors.

```
pip install -r test/modules/model/TinyLlamaWithFusedAttention/requirements.txt
python test/modules/model/TinyLlamaWithFusedAttention/decode.py
```

```
Traceback (most recent call last):
  File "/home/seongwoo/TICO/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 3119, in inline_call_
    sub_locals = func.bind_args(parent, args, kwargs)
  File "/home/seongwoo/TICO/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 233, in bind_args
    bound = inspect.signature(fake_func).bind(*args, **kwargs)
  File "/home/seongwoo/.pyenv/versions/3.10.4/lib/python3.10/inspect.py", line 3179, in bind
    return self._bind(args, kwargs)
  File "/home/seongwoo/.pyenv/versions/3.10.4/lib/python3.10/inspect.py", line 3149, in _bind
    raise TypeError('missing a required argument: {arg!r}'. \
TypeError: missing a required argument: 'past_key_value'
```
@mhs4670go
torch: 2.6.0

After installing transformers==4.49.0 and using …
```diff
@@ -0,0 +1,154 @@
+# Copyright (c) 2025 Samsung Electronics Co., Ltd. All Rights Reserved
```
@jinevening Are you okay with this location? You told me that you prefer a separate directory for onert-only operators like op_attention.
The locations can be like below.
- Visitor: tico/serialize/operators/
- Custom op registration: tico/utils/register_custom_op.py
- Adapter: tico/serialize/operators/adapters/
Could you put adapters in the current directory (tico/serialize/operators/adapters/onert) and place the other code according to @mhs4670go's suggestion?
adapters may be open to users, so I think it would be good to specify "This adapter is only for onert".
> Could you put adapters in the current directory (tico/serialize/operators/adapters/onert) and place the other code according to @mhs4670go's suggestion?
> adapters may be open to users, so I think it would be good to specify "This adapter is only for onert".
Yes, that is exactly the same as my understanding. Thank you for confirming.
As I wrote in #400 (comment), I will follow TICO's way like #266.
(Though I don't think it is a good way to put all operators in a single file, register_custom_op.py.)
(It is 740 lines and will keep growing as the number of supported operators increases.)
(It would be good to give each operator its own file, and let register_custom_op.py do the registering only.)
Anyway, again, I will follow TICO's current way.
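To make the per-operator layout concrete, a tiny sketch; all module and function names here are hypothetical, not TICO's actual structure, and the schema is shortened.

```python
# Hypothetical file: tico/utils/custom_op/attention.py
import torch

def register(lib: torch.library.Library) -> None:
    # Schema shortened here; the full one appears earlier in this PR.
    lib.define("attention.llama(Tensor hidden_states) -> Tensor")

# Hypothetical file: tico/utils/register_custom_op.py
_LIB = torch.library.Library("circle", "DEF")

def register_all() -> None:
    # One call per operator module; each module owns its schema and fake impl.
    register(_LIB)
```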
> adapters may be open to users, so I think it would be good to specify "This adapter is only for onert".

@jinevening I totally agree. adapters will vary per model.
I am trying an encoder-decoder model (for a translation task).
I will suggest an adapters code structure after finishing the encoder-decoder model, which requires another adapter (i.e. TRIVMultiheadAttention).
Thank you.
force-pushed from 69f5e92 to 82460ad
force-pushed from a6b7de6 to 8f43620
test/pt2_to_circle_test/builder.py (outdated)

```diff
         )

-        verify_circle(circle_model_path, opt_circle_model_path)
+        # verify_circle(circle_model_path, opt_circle_model_path)
```
@Samsung/tico_developers Can I disable verify_circle? The output model contains an attention op, which is not supported by circle2circle.
Well, I don't think it's possible. The workaround I can come up with as of now is as follows.
- Add a skip decorator (a sketch is shown after this list). Before merging this PR, the test should still pass without verify_circle, though.
- Wait until circle2circle supports CircleAttention.
- Remove the decorator.
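A sketch of the skip from the first item, assuming the value test is a plain unittest case; TICO's harness may expose its own skip helper instead.

```python
import unittest

# Skip the whole value test until circle2circle can handle CircleAttention.
@unittest.skip("circle2circle does not support CircleAttention yet")
class TinyLlamaWithFusedAttentionTest(unittest.TestCase):
    def test_value(self):
        ...
```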
```python
        if self._call_count == 2:
            return self.fused_model(*args, **kwargs)
        else:
```
The TICO test framework gets expected values using the original model, then converts the model. For the 1st run, it should not use the adapter.
I tried not to use call_count but to use inspect to get the caller's name; however, that is not allowed during torch tracing.
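For clarity, the pattern described above looks roughly like this (a sketch, not the PR's exact code).

```python
import torch

class FusedAttentionAdapter(torch.nn.Module):
    """First call (expected-value collection) runs the original model;
    second call (the one that gets exported and converted) runs the fused model."""

    def __init__(self, original_model, fused_model):
        super().__init__()
        self.original_model = original_model
        self.fused_model = fused_model
        self._call_count = 0

    def forward(self, *args, **kwargs):
        self._call_count += 1
        if self._call_count == 2:
            return self.fused_model(*args, **kwargs)
        return self.original_model(*args, **kwargs)
```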
Actually, if torch.ops.circle_custom.attention had its own kernel instead of returning None, this trick wouldn't be necessary.
I don't think it is a trick at all. circle_custom.attention is a virtual operator for fusing, not for computing. I found no reason to put an actual implementation there.
The call_count-related code can be eliminated if the TICO test framework is improved by separating inferring the model from converting the model. I see it already had a problem:
TICO/test/pt2_to_circle_test/builder.py, lines 139 to 147 in b941ab2:

```python
        # Let's infer torch model before `export`
        # WHY?
        # Some model changes its state during export (e.g., EfficientFormerL1)
        # See https://github.com/pytorch/pytorch/issues/155114
        torch_result = infer_nnmodule(
            self.nnmodule,
            forward_args=deepcopy(self.forward_args),
            forward_kwargs=deepcopy(self.forward_kwargs),
        )
```
force-pushed from 8d9b46a to 7d9f385
@Samsung/tico_developers
The output itself is valid in terms of TICO, but it is incomplete in terms of … . That is the reason it gets the following errors: …
Before running the circle, I manipulate the TICO-generated model further using Samsung/ONE#16233. If you want to run the test using …, … .
Also, these are beyond the PR's and this repo's responsibility. I would like to hear your opinions.
Even though I installed …, … Could you try to run this test to check whether you get the same error?
Hmm, I added …
Here are some workarounds I came up with. Let me know if there's something wrong.
The first error can be resolved by generating inputs with … .
If you implement a kernel for …
I already wrote the error with … .
The kernel spec should not be changed because of a limitation of the frontend tool. I don't like changing things (manual inputs, original models, runtime kernels, ...) just to generate what I want using TICO. That is the reason …
No, the detailed manipulation needs to be done offline with an explicit tool. I don't understand what you mean by … .
If you're asking about changing the example inputs, it just takes a prompt string.
Maybe I missed something. Could you elaborate more? Because I got a model that has …

```python
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding="max_length",
    max_length=32,  # HERE
    truncation=True,
)
```

The inputs' shape should be [1, 16, 30, 4]?
As I said in #400 (comment), could you wait just a few more days until we have our meeting?
I was confused and am still confused by "custom". As I understand, …
I hope TICO defines "custom" precisely and supports custom operators (the 2nd case) soon.
As a result of the meeting, we agreed to enable the following features without testing code: …
To make things easier, I separated the relevant portion into this PR. Please take a look at it. I'll close this PR soon. Regarding testing, a number of components are not compatible with TICO. As demonstrated in this PR, it would be more appropriate to keep the tests in the ONE repository or another suitable location.
@Samsung/tico_developers
Yes, I wrote Samsung/ONE#16283 to show how the whole end-to-end process is done. (You can find TICO in the middle of the process.)

@mhs4670go
This one :) Please feel free to modify this PR instead of mine.
Thank you. It already has 13 commits and 46 comments (including this one).
force-pushed from 543463b to a9f93c9
```python
    past_value: torch.Tensor,
    cache_position: torch.Tensor,
) -> torch.Tensor:
    return None
```
@glistening If you'd like to enable some tests on TICO, it could be done by implementing this function and adding a simple 'attention' function with KV cache. TICO also uses an onert nightly, so it's testable.
```
# FILE: test/requirements_pre.txt
onert==0.2.0.dev250922
```
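For reference, a very rough sketch of such a kernel. It is not the PR's code: it pretends the attention is single-head with head_dim equal to hidden_size, assumes the KV cache is laid out as [batch, max_seq_len, hidden], and assumes the mask broadcasts over the score shape, whereas the real LlamaAttention is multi-head with per-head RoPE.

```python
import torch
import torch.nn.functional as F

def reference_attention(hidden_states, wq, wk, wv, wo,
                        position_cos, position_sin, attention_mask,
                        past_key, past_value, cache_position):
    # Projections (weights assumed to be [out_features, in_features]).
    q = hidden_states @ wq.T
    k = hidden_states @ wk.T
    v = hidden_states @ wv.T

    # Simplified rotary embedding applied over the whole hidden dimension.
    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    q = q * position_cos + rotate_half(q) * position_sin
    k = k * position_cos + rotate_half(k) * position_sin

    # Write the new keys/values into the cache at the decoded positions
    # (out-of-place update; cache layout assumed [batch, max_seq_len, hidden]).
    key = past_key.index_copy(1, cache_position, k)
    value = past_value.index_copy(1, cache_position, v)

    attn = F.scaled_dot_product_attention(q, key, value, attn_mask=attention_mask)
    return attn @ wo.T
```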
I have no plan to test attention in TICO. It is beyond the scope of TICO.
It would be done in the scope of GGMA (end-to-end), perhaps in another repo.
I edited the title to specify the scope of this PR.
If it is meaningful, the prefill phase of TinyLlama (which has no fused attention) may be testable using onert in TICO without modifying or extending the TICO test framework much, as you suggested.
It fuses LlamaAttention from TinyLlama model. Fused attention works as onert attention op. TICO-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
force-pushed from a9f93c9 to c817971
mhs4670go left a comment:

> It fuses LlamaAttention from TinyLlama model.
> Fused attention works as onert attention op.
> TICO-DCO-1.0-Signed-off-by: Sanggyu Lee sg5.lee@samsung.com