dialects (arm): Add mixed vector/scalar fmul op #4053
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@                  Coverage Diff                   @@
##      emma/arm_fpsimd_reg_type    #4053    +/-    ##
====================================================
- Coverage                89.04%   89.04%   -0.01%
====================================================
  Files                      320      318       -2
  Lines                    43432    43437       +5
  Branches                  5403     5405       +2
====================================================
+ Hits                     38675    38677       +2
- Misses                    3416     3419       +3
  Partials                  1341     1341
        comment: str | StringAttr | None = None,
    ):
        if isinstance(arrangement, str):
            valid_arrangements = {"4H", "8H", "2S", "4S", "2D"}
I thought the bitwidth specifiers were per register?
I'm not sure I understand this correctly. They are specified with the register, yes, but as I understand it the same register can be used with different arrangement specifiers.
If I understand correctly, the arrangement specifier describes the registers but is carried by the instruction (whereas in x86, the arrangement specifier is essentially carried by the registers' names)?
Yes, that's my understanding too.
What about SADDL2 V0.2D, V1.4S, V2.4S, from page 102 of https://cs140e.sergio.bz/docs/ARMv8-A-Programmer-Guide.pdf?
Yes, I'm aware of some of these instructions where the destination has a different specifier than the source. My initial approach was just to get it working for what we wanted, with the intention of adding handling for the different cases as we build it up. But maybe that's not how we want to go, in which case I'm happy to do some more digging around the docs to try to establish the rules for these specifiers :)
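As a rough illustration of the kind of rule being discussed, the sketch below decodes an arrangement specifier into a lane count and element width, then checks the SADDL2 example from the comment above, where the destination elements are twice as wide as the source elements. The helper names (`decode_arrangement`, `is_widening_upper_half`) are hypothetical and not part of xDSL; this is only one way such a check could look.

```python
# Element letter -> element width in bits (AArch64 Advanced SIMD).
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def decode_arrangement(arrangement: str) -> tuple[int, int]:
    """Split e.g. "4S" into (lanes, element_bits) == (4, 32)."""
    lanes, letter = int(arrangement[:-1]), arrangement[-1].upper()
    if letter not in ELEMENT_BITS:
        raise ValueError(f"unknown element letter in {arrangement!r}")
    bits = ELEMENT_BITS[letter]
    # A vector arrangement covers either the low 64 bits or all 128 bits.
    if lanes * bits not in (64, 128):
        raise ValueError(f"{arrangement!r} is not a 64- or 128-bit arrangement")
    return lanes, bits


def is_widening_upper_half(dest: str, src: str) -> bool:
    """Check the shape of a "long, second part" instruction such as SADDL2.

    The sources are full 128-bit vectors, only their upper half is read,
    and each result element is twice as wide as a source element.
    """
    dest_lanes, dest_bits = decode_arrangement(dest)
    src_lanes, src_bits = decode_arrangement(src)
    return (
        src_lanes * src_bits == 128
        and dest_bits == 2 * src_bits
        and dest_lanes * 2 == src_lanes
    )


# SADDL2 V0.2D, V1.4S, V2.4S: destination 2D, sources 4S.
print(is_widening_upper_half("2D", "4S"))
```

The same-arrangement case used by the op in this PR would simply require all operands to decode to the same pair.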
I now think it's worth not overthinking it; we can fix things later. It would be great to add more documentation around the place to explain the design for future readers.
Yeah I think it's fine. What we basically need is (to generate) runnable code implementing tiled matrix multiplications. We can refine it incrementally.
I'm now thinking this does not yet fully reflect what I need for my asm implementation. Instead of the vector/vector instruction, it should be the vector/scalar version, so I will try to update this.
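For readers unfamiliar with the distinction, the vector/scalar ("by element") form of FMUL multiplies every lane of one vector by a single indexed lane of another, e.g. `fmul v0.4S, v1.4S, v2.S[0]`. A minimal sketch of how such an assembly line could be formatted is shown below. The function name and signature are made up for illustration and are not the op added in this PR; the sketch also does not model further encoding restrictions.

```python
# Arrangements accepted in the diff under review.
VECTOR_ARRANGEMENTS = {"4H", "8H", "2S", "4S", "2D"}

# Highest valid lane index per element letter. The indexed operand is
# addressed within the full 128-bit register, whatever the arrangement.
MAX_INDEX = {"H": 7, "S": 3, "D": 1}


def format_fmul_by_element(
    dest: str, lhs: str, scalar_reg: str, index: int, arrangement: str
) -> str:
    """Format a vector-by-element FMUL (hypothetical helper)."""
    arrangement = arrangement.upper()
    if arrangement not in VECTOR_ARRANGEMENTS:
        raise ValueError(f"invalid arrangement {arrangement!r}")
    letter = arrangement[-1]  # "H", "S" or "D"
    if not 0 <= index <= MAX_INDEX[letter]:
        raise ValueError(f"lane index {index} out of range for .{letter}")
    # The scalar operand uses only the element letter plus a lane index.
    return (
        f"fmul {dest}.{arrangement}, {lhs}.{arrangement}, "
        f"{scalar_reg}.{letter}[{index}]"
    )


print(format_fmul_by_element("v0", "v1", "v2", 0, "4S"))
```

Note that the scalar operand carries an element specifier rather than a full arrangement, which is one more case where operands of a single instruction are written differently.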
        - "2D" → 2 double-precision floats
        """

        arrangement = attr_def(StringAttr)
This should be an EnumAttribute; maybe we can do this as a first PR?
Or rather before this one, and after the register type
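To make the suggestion concrete, here is a sketch of the arrangement set expressed as a closed enum instead of a free-form string. It uses only the standard-library `enum` module; `NeonArrangement` and `parse_arrangement` are hypothetical names, and how such an enum would be wrapped into an xDSL attribute is deliberately left out.

```python
from enum import Enum


class NeonArrangement(str, Enum):
    """Arrangement specifiers accepted in the diff under review."""

    H4 = "4H"
    H8 = "8H"
    S2 = "2S"
    S4 = "4S"
    D2 = "2D"

    @property
    def lanes(self) -> int:
        # The leading digit of the specifier is the lane count.
        return int(self.value[:-1])


def parse_arrangement(text: str) -> NeonArrangement:
    """Convert user input such as "4s" into an enum member."""
    try:
        return NeonArrangement(text.upper())
    except ValueError:
        valid = ", ".join(a.value for a in NeonArrangement)
        raise ValueError(
            f"invalid arrangement {text!r}; expected one of {valid}"
        ) from None


print(parse_arrangement("4s").lanes)
```

With an enum, the set of valid values lives in one place, so the `valid_arrangements` check in the constructor would no longer need to be repeated per op.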
Note: Stacked PR.
Add fmul op for NEON vector registers