
[Torch] Fix verifier crashes on malformed global slot initializer IR #4612

Open
alex1xu wants to merge 1 commit into llvm:main from alex1xu:torch-global-slot-initializer-verifier-crash



@alex1xu alex1xu commented Jun 17, 2026


Fixes #4411.

torch-mlir-opt asserts instead of emitting a diagnostic on a malformed torch.global_slot.module_initializer. #4411 reports one case — a non-symbol slotSymNames entry — but the same root cause produces two related crashes on the same op.

The root cause is verification ordering. GlobalSlotModuleInitializerOp used hasVerifier, so its verify() runs on op entrance, before the nested torch.initialize.global_slots terminator's own invariants. The verifier reaches into that terminator (getBody()->getTerminator(), then cast<FlatSymbolRefAttr> over slotSymNames) and dereferences attributes the child verifier has not validated yet. Three parseable inputs crash:

Repro:

torch.global_slot.module_initializer {
  %0 = torch.constant.int 1
  "torch.initialize.global_slots"(%0) <{slotSymNames = [159]}> : (!torch.int) -> ()
}

The fix closes each at the layer that owns the invariant:

  • verify() → verifyRegions(), so the module-wide checks run on op exit, after the nested terminator is verified; the parent's casts then only run once the child's invariants hold,
  • slotSymNames: SymbolRefArrayAttr → FlatSymbolRefArrayAttr, enforcing the element type declaratively (covers both [159] and [@a::@b]); the custom parser only emits flat refs, so no valid IR changes,
  • add HasParent<ModuleOp>, matching the sibling torch.initialize.global_slots op.
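
To illustrate why the ordering matters, here is a minimal, self-contained C++17 sketch. It is a toy model, not the MLIR or torch-mlir API: `Attr`, `Terminator`, `verifyTerminator`, `verifyParentSlots`, `runVerifier`, and the diagnostic strings are all hypothetical names invented for this example. `std::get` stands in for an unchecked `cast<FlatSymbolRefAttr>`, and a thrown `std::bad_variant_access` stands in for the assertion failure. Only the slot-name case is modeled; the count-mismatch and non-module-parent crashes are not.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Toy stand-ins for attributes (hypothetical, not MLIR types).
struct FlatRef { std::string name; };    // models a flat ref such as @a
struct NestedRef { std::string path; };  // models a nested ref such as @a::@b
using Attr = std::variant<int, FlatRef, NestedRef>;

// Toy model of the nested terminator op.
struct Terminator {
  std::vector<Attr> slotSymNames;
  int numOperands = 0;
};

// Child invariants: every slot name is a flat ref, one operand per name.
bool verifyTerminator(const Terminator &t, std::string &diag) {
  for (const Attr &a : t.slotSymNames) {
    if (!std::holds_alternative<FlatRef>(a)) {
      diag = "slotSymNames must contain only flat symbol refs";
      return false;
    }
  }
  if (t.slotSymNames.size() != static_cast<std::size_t>(t.numOperands)) {
    diag = "expected one operand per slot name";
    return false;
  }
  return true;
}

// Parent check that assumes the child invariants already hold.
// std::get throws std::bad_variant_access on a type mismatch, which
// models an unchecked cast asserting on malformed input.
bool verifyParentSlots(const Terminator &t, std::string &diag) {
  for (const Attr &a : t.slotSymNames) {
    if (std::get<FlatRef>(a).name.empty()) {
      diag = "empty slot name";
      return false;
    }
  }
  return true;
}

enum class Order { ParentFirst, ChildFirst };

// Returns "ok", a diagnostic string, or "crash".
std::string runVerifier(const Terminator &t, Order order) {
  std::string diag;
  try {
    if (order == Order::ParentFirst) {
      // Models the old behaviour: parent checks run on op entrance.
      if (!verifyParentSlots(t, diag)) return diag;
      if (!verifyTerminator(t, diag)) return diag;
    } else {
      // Models the new behaviour: parent checks run after the child.
      if (!verifyTerminator(t, diag)) return diag;
      if (!verifyParentSlots(t, diag)) return diag;
    }
  } catch (const std::bad_variant_access &) {
    return "crash";
  }
  return "ok";
}
```

In this model, a terminator whose only slot name is the integer 159 yields "crash" under `Order::ParentFirst` and the child's diagnostic under `Order::ChildFirst`, which is the behaviour change the first two bullets aim for.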

Testing: test/Dialect/Torch/invalid.mlir adds three negative cases under -verify-diagnostics, one per fix — a nested-ref slot name (rejected by the FlatSymbolRefArrayAttr constraint), an operand/slot count mismatch (rejected by the terminator's own verifier, which now runs before the parent's per-slot loop), and a non-module parent (rejected by the HasParent trait). The existing module_initializer error tests and the GlobalizeObjectGraph / inline-global-slots / erase-module-initializer suites pass under the verifyRegions() move. No e2e test is added — this is a verifier guard, so the lit negative tests are the correct layer.

@alex1xu alex1xu force-pushed the torch-global-slot-initializer-verifier-crash branch 2 times, most recently from 28c0944 to 3e13583 Compare June 17, 2026 19:05
@alex1xu alex1xu marked this pull request as ready for review June 17, 2026 19:09
@alex1xu alex1xu force-pushed the torch-global-slot-initializer-verifier-crash branch from 3e13583 to ee0007d Compare June 17, 2026 20:53
GlobalSlotModuleInitializerOp::verify() inspected its nested
InitializeGlobalSlotsOp terminator before that op's own invariants ran,
so malformed IR reached unchecked casts and crashed instead of producing
a diagnostic: a non-flat or non-symbol slot name hit
cast<FlatSymbolRefAttr>, a slot/operand count mismatch indexed out of
bounds, and a non-module parent hit cast<ModuleOp>.

Move the module-wide checks to verifyRegions() so the nested op is
verified before the enclosing op runs, tighten slotSymNames to
FlatSymbolRefArrayAttr so the element type is enforced declaratively, and
constrain the parent with HasParent<ModuleOp>.
@alex1xu alex1xu force-pushed the torch-global-slot-initializer-verifier-crash branch from ee0007d to 0ca73cb Compare June 18, 2026 03:14
alex1xu commented Jun 18, 2026


@IanWood1 @zjgarvey, would you have time to take a look when you get a chance? This is my first contribution to torch-mlir (fixing #4411). Happy to loop in someone else if you're not the right reviewers. Thanks!


Development

Successfully merging this pull request may close these issues.

Crash in GlobalSlotModuleInitializerOp verifier due to invalid attribute type in slotSymNames
