-
Notifications
You must be signed in to change notification settings - Fork 130
feat[vortex-expr]: add expr rewrite rules #5348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@claude review please |
CodSpeed Performance ReportMerging #5348 will not alter performanceComparing Summary
Benchmarks breakdown
Footnotes
|
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
| /// Untyped reduce rules (require only RewriteContext), indexed by expression ID | ||
| reduce_rules: HashMap<ExprId, Vec<Arc<dyn DynReduceRule>>>, | ||
| /// Parent reduce rules, indexed by expression ID | ||
| parent_rules: HashMap<ExprId, Vec<Arc<dyn DynParentReduceRule>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we store both child and parent id?
|
@claude review please |
|
Claude finished @joseph-isaacs's task —— View job Code Review Complete ✅I've completed a thorough review of this PR introducing the expression rewrite rule system. Overall, this is excellent architectural work with strong type safety and clean separation of concerns. Below are my findings: 🎯 Architecture & DesignStrengths
Design Observations
📋 Code Quality Review✅ Adherence to Project Standards
🔍 Potential Issues1. Missing Doc Comment (Minor)File: // TODO(joe): follow up with rule debug info.
impl Debug for RewriteRuleRegistry {The TODO suggests this needs follow-up. The Recommendation: Consider adding rule metadata (name, description) to improve debuggability. 2. Code Coverage Gaps (Per Codecov)The PR has 62.57% patch coverage with 128 lines missing. Key areas:
Recommendation: Add more tests for error paths and edge cases, especially in the registry and traversal code. 3. Potential Infinite Loop Risk (Medium)Files: The traversal applies rules until no more rules match. If a rule is buggy and returns the same expression (or creates a cycle), this could loop indefinitely. Current code (simplify_typed.rs:52-55): expr.transform(
|node| rewrite(node, ctx, session),
|node| rewrite(node, ctx, session),
)Recommendation: Consider adding:
4. Review Comment: Child vs Parent ID StorageThe author asked: "should we store both child and parent id?" in the rewrite.rs file. Context: Currently rules are indexed by child expression ID only. When applying parent rules, we look up rules by the child's ID. Analysis:
Recommendation: The current approach is correct. Parent rules already receive both the child expression AND the parent expression, so they can check the parent's type in the rule logic itself. Indexing by both would be premature optimization unless profiling shows the type-checking overhead is significant. 5. Rule Application OrderFiles: Rules are applied in registration order within each category (typed/untyped, parent/reduce). This could lead to non-deterministic behavior if rules have overlapping conditions. Current: Typed rules before untyped, parent before reduce, but within each category it's insertion order. Recommendation: Document this behavior clearly, or consider:
6. Memory Efficiency (Minor)File: type RuleRegistry<Rule> = HashMap<ExprId, Vec<Arc<Rule>>>;Each rule is Recommendation: Document if rules are expected to be singleton objects or if the same rule instance should be reused across multiple registrations. 🧪 TestingStrengths
Gaps
🐛 Potential Bugs1. Traversal Order AssumptionFile: The Severity: Low (rules shouldn't reorder children by design) 2. Error Propagation in Merge RuleFile: if merge.data() == &DuplicateHandling::Error && !duplicate_names.is_empty() {
vortex_bail!(
"merge: duplicate fields in children: {}",
duplicate_names.into_iter().format(", ")
)
}This check happens inside the rule, meaning duplicate errors are only caught during rewrite, not when the Merge expression is created. Recommendation: Consider validating at Merge construction time for earlier error detection. 🎨 Code Style & ReadabilityExcellent
Minor Suggestions
📊 Performance Considerations
Recommendation: Add benchmarks to track rule application overhead as the system grows. ✨ Recommendations SummaryHigh Priority
Medium Priority
Low Priority
💬 Response to Review CommentQuestion: "should we store both child and parent id?" Answer: No, the current approach (indexing by child ID only) is the right choice:
🎯 Final VerdictThis is excellent work. The architecture is well-thought-out, type-safe, and extensible. The code quality is high and follows project conventions. The main areas for improvement are:
I recommend approving and merging this PR with follow-up issues for the medium/low priority items. Files Reviewed
|
gatesn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some early comments - I have only gone through vortex-array atm
| ); | ||
|
|
||
| let expr = simplify_typed(expr.clone(), scope)?; | ||
| let expr = simplify_typed(expr.clone(), scope, session)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an odd place to run a full optimizer pass no? Isn't this itself run within an array optimizer rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its useful to apply get_item + pack reductions. Shall I add a todo to revisit?
| } | ||
|
|
||
| /// Type-erased wrappers that allows dynamic dispatch. | ||
| pub(crate) trait DynReduceRule: Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't got there yet, but presumably there are some structs that wrap the generics version and impl the Dyn versions. I think they should be defined/moved here and called ReduceRuleAdapter<T>(T) etc to follow the expression/array pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they where called RuleAdapter
| pub fn simplify(e: Expression, session: &ExprSession) -> VortexResult<Expression> { | ||
| let ctx = EmptyRewriteContext; | ||
|
|
||
| let e = apply_parent_rules(e, &ctx, session)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to reduce child first no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I think both make sense, really we want to iterate
…i/expr-reduce-rules # Conflicts: # vortex-gpu/benches/gpu_bitunpack.rs # vortex-layout/src/layouts/dict/reader.rs
# Conflicts: # vortex-gpu/benches/gpu_bitunpack.rs
| impl RewriteContext for RuleContext {} | ||
|
|
||
| impl private::Sealed for &RuleContext {} | ||
| impl RewriteContext for &RuleContext {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need these
Features
Adds pluggable rewrite rule and use them in the optimizer.
Breaking:
vortex/vortex-layout/src/layouts/struct_/reader.rs
Line 145 in 39004fc
Using a vortex session in more places