
[TMTensor] Add lowering of attention with an explicit scale value #4593

Open
AGindinson wants to merge 1 commit into llvm:main from AGindinson:tmtensor-attention


@AGindinson

Contributor

This makes it possible to pass the scale constant along when it is set at the Torch level, and lifts the restriction on dynamic head dim size.

Essentially follows up on commit f53b5be.
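
For reference, the role of the scale can be written out as below. This is standard scaled dot-product attention rather than anything taken from the patch itself. The symbols $Q$, $K$, $V$, $s$ and $d$ are notation introduced here, with $d$ standing for the head dim and $s$ for the scale, and any attention mask is left out for brevity.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(s \, Q K^{\top}\right) V,
\qquad
s = \frac{1}{\sqrt{d}} \quad \text{when the Torch-level scale is None}
```

With an explicit scale, $s$ no longer has to be derived from $d$, which is presumably why a dynamic head dim becomes acceptable in that case.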

if (relativeError > 1e-6) {
return rewriter.notifyMatchFailure(
loc, "scale must be None or 1/sqrt(headDim)");
double expectedScale = 1.0 / std::sqrt(static_cast<double>(headDim));

@ziereis Jun 11, 2026


Do we still need this? As I understand it, the scale was previously restricted to 1.0/sqrt(head_dim) because we could not pass the scale as a parameter to the op, and IREE would always implicitly assume it is 1.0/sqrt(head_dim). If we can now set it as an attribute, I assume we no longer need this limitation?
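
For context, the kind of check under discussion can be sketched in plain C++ as follows. This is a minimal illustration, not the code in the patch: `isDefaultAttentionScale` is a hypothetical helper, and the way the relative error is computed is an assumption, since the snippet above only shows the `1e-6` threshold and the `expectedScale` formula.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of torch-mlir): returns true when `scale`
// matches the default attention scale 1/sqrt(headDim) within a relative
// tolerance. The relative-error formula is an assumption for illustration.
bool isDefaultAttentionScale(double scale, int64_t headDim,
                             double relTol = 1e-6) {
  // The default scale is only defined for a statically known, positive size.
  assert(headDim > 0 && "head dimension must be a known positive size");
  const double expectedScale = 1.0 / std::sqrt(static_cast<double>(headDim));
  const double relativeError = std::abs(scale - expectedScale) / expectedScale;
  return relativeError <= relTol;
}
```

Under these assumptions, `isDefaultAttentionScale(0.125, 64)` holds while `isDefaultAttentionScale(0.5, 64)` does not. A check of this shape needs a static `headDim`, so it cannot be evaluated at all when the head dim is dynamic.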

mlir::FloatAttr scaleAttr{};
if (!isa<Torch::NoneType>(scale.getType())) {
double scaleFloat;
if (!matchPattern(scale, m_TorchConstantFloat(&scaleFloat)))


I guess it would be even nicer if we could just make the scale a regular input to the op, so we don't have to require it to be a constant. However, I see how this might be a bit complicated with the variadic arg handling the op currently has.
