Skip to content

Conversation

@joseph-isaacs
Copy link
Contributor

@joseph-isaacs joseph-isaacs commented Nov 14, 2025

Features

Adds pluggable rewrite rule and use them in the optimizer.

Breaking:

Using a vortex session in more places

  • simplify_typed/simplify
  • Layout::new_reader

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs
Copy link
Contributor Author

@claude review please

@joseph-isaacs joseph-isaacs added the changelog/feature A new feature label Nov 14, 2025
@joseph-isaacs joseph-isaacs changed the title wip feat[vortex-expr]: add expr rewrite rules Nov 14, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 14, 2025

CodSpeed Performance Report

Merging #5348 will not alter performance

Comparing ji/expr-reduce-rules (f7a9031) with develop (34a6ac1)

Summary

✅ 1420 untouched
🆕 7 new
⏩ 638 skipped1
🗄️ 28 archived benchmarks run2

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 decompress[("alp_for_bp_f64", 0x46395e0)] N/A 24.2 ms N/A
🆕 decompress[("datetime_for_bp", 0x463c4a0)] N/A 34.9 ms N/A
🆕 decompress[("dict_fsst_varbin_bp_string", 0x463b7e0)] N/A 14.5 ms N/A
🆕 decompress[("dict_fsst_varbin_string", 0x463b340)] N/A 14.5 ms N/A
🆕 decompress[("dict_varbinview_string", 0x463a000)] N/A 14.7 ms N/A
🆕 decompress[("for_bp_u64", 0x4638e90)] N/A 2.5 ms N/A
🆕 decompress[("runend_for_bp_u32", 0x463a490)] N/A 2 ms N/A

Footnotes

  1. 638 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. 28 benchmarks were run, but are now archived. If they were deleted in another branch, consider rebasing to remove them from the report. Instead if they were added back, click here to restore them.

@codecov
Copy link

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 87.64881% with 83 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.99%. Comparing base (07f5039) to head (f7a9031).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
vortex-array/src/expr/session/rewrite.rs 73.38% 33 Missing ⚠️
vortex-array/src/expr/session/mod.rs 41.66% 21 Missing ⚠️
vortex-array/src/expr/exprs/select/transform.rs 87.30% 8 Missing ⚠️
vortex-array/src/expr/transform/simplify_typed.rs 87.27% 7 Missing ⚠️
vortex-array/src/expr/exprs/merge/transform.rs 91.66% 5 Missing ⚠️
vortex-array/src/expr/transform/simplify.rs 94.79% 5 Missing ⚠️
vortex-array/src/expr/transform/optimizer.rs 66.66% 3 Missing ⚠️
vortex-array/src/expr/traversal/mod.rs 92.30% 1 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
t
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Nov 18, 2025

Deploying vortex-bench with  Cloudflare Pages  Cloudflare Pages

Latest commit: 56a05f4
Status:🚫  Build failed.

View logs

@joseph-isaacs joseph-isaacs marked this pull request as ready for review November 18, 2025 11:49
@vortex-data vortex-data deleted a comment from claude bot Nov 18, 2025
/// Untyped reduce rules (require only RewriteContext), indexed by expression ID
reduce_rules: HashMap<ExprId, Vec<Arc<dyn DynReduceRule>>>,
/// Parent reduce rules, indexed by expression ID
parent_rules: HashMap<ExprId, Vec<Arc<dyn DynParentReduceRule>>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we store both child and parent id?

u
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
u
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
u
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs
Copy link
Contributor Author

@claude review please

@claude
Copy link
Contributor

claude bot commented Nov 18, 2025

Claude finished @joseph-isaacs's task —— View job


Code Review Complete ✅

I've completed a thorough review of this PR introducing the expression rewrite rule system. Overall, this is excellent architectural work with strong type safety and clean separation of concerns. Below are my findings:


🎯 Architecture & Design

Strengths

  1. Excellent type safety through generics - The ReduceRule<V, C> and ParentReduceRule<V, C> design provides compile-time type safety while using trait objects (dyn) for runtime flexibility
  2. Clean separation of typed vs untyped contexts - The RewriteContext vs TypedRewriteContext hierarchy is well-designed
  3. Unified adapter pattern - RuleAdapter<V, R> elegantly handles all four trait combinations (DynReduceRule, DynTypedReduceRule, DynParentReduceRule, DynTypedParentReduceRule) through conditional trait implementations
  4. Two-phase traversal - Top-down parent rules followed by bottom-up reduce rules is a sensible design for expression optimization

Design Observations

  1. Rule registry organization - Storing rules in separate HashMaps by type (typed/untyped, reduce/parent) is clean and enables efficient lookup
  2. Built-in rules are well-chosen - RemoveSelectRule, RemoveMergeRule, and PackGetItemRule demonstrate practical optimizations

📋 Code Quality Review

✅ Adherence to Project Standards

  • Doc comments: All public APIs have doc comments ✅
  • Error handling: Uses vortex_err! and vortex_bail! appropriately ✅
  • Tests: Good coverage with rstest parameterization in places ✅
  • Naming: Test modules named tests (not test) ✅
  • Module structure: Correctly promoted files to directory modules (get_item, merge, select) ✅

🔍 Potential Issues

1. Missing Doc Comment (Minor)

File: vortex-array/src/expr/session/rewrite.rs:123

// TODO(joe): follow up with rule debug info.
impl Debug for RewriteRuleRegistry {

The TODO suggests this needs follow-up. The Debug impl shows counts but not rule details, which could make debugging harder.

Recommendation: Consider adding rule metadata (name, description) to improve debuggability.

2. Code Coverage Gaps (Per Codecov)

The PR has 62.57% patch coverage with 128 lines missing. Key areas:

  • session.rs: 61 lines missing (53.78% coverage)
  • transform/reducer.rs: 49 lines missing (42.35% coverage)

Recommendation: Add more tests for error paths and edge cases, especially in the registry and traversal code.

3. Potential Infinite Loop Risk (Medium)

Files: simplify.rs, simplify_typed.rs

The traversal applies rules until no more rules match. If a rule is buggy and returns the same expression (or creates a cycle), this could loop indefinitely.

Current code (simplify_typed.rs:52-55):

expr.transform(
    |node| rewrite(node, ctx, session),
    |node| rewrite(node, ctx, session),
)

Recommendation: Consider adding:

  • Maximum iteration count with an error after N iterations
  • Cycle detection by tracking expression hashes/ids
  • Debug logging when rules are applied repeatedly

4. Review Comment: Child vs Parent ID Storage

The author asked: "should we store both child and parent id?" in the rewrite.rs file.

Context: Currently rules are indexed by child expression ID only. When applying parent rules, we look up rules by the child's ID.

Analysis:

  • Current approach (index by child ID): Simpler, fewer lookups, less memory
  • Alternative (index by both): Would allow rules like "GetItem under Binary" but adds complexity

Recommendation: The current approach is correct. Parent rules already receive both the child expression AND the parent expression, so they can check the parent's type in the rule logic itself. Indexing by both would be premature optimization unless profiling shows the type-checking overhead is significant.

5. Rule Application Order

Files: simplify.rs:31-41, simplify_typed.rs:64-78

Rules are applied in registration order within each category (typed/untyped, parent/reduce). This could lead to non-deterministic behavior if rules have overlapping conditions.

Current: Typed rules before untyped, parent before reduce, but within each category it's insertion order.

Recommendation: Document this behavior clearly, or consider:

  • Priority-based rule ordering
  • Warning when multiple rules match the same expression
  • Ensuring rules are mutually exclusive by design

6. Memory Efficiency (Minor)

File: session/rewrite.rs:105

type RuleRegistry<Rule> = HashMap<ExprId, Vec<Arc<Rule>>>;

Each rule is Arc-wrapped. For a small number of rules this is fine, but if there are many registrations of the same rule, this could be optimized.

Recommendation: Document if rules are expected to be singleton objects or if the same rule instance should be reused across multiple registrations.


🧪 Testing

Strengths

  • Good test coverage in rule implementation files
  • Tests use clear naming and assert expected behavior
  • Multi-level and nested cases are tested (e.g., test_deeply_nested_pack_get_item)
  • Parent rule testing in simplify.rs with AddZeroRule example

Gaps

  • Limited testing of rule registry edge cases (empty registry, no matching rules)
  • No tests for rule application ordering
  • Missing tests for error conditions (malformed expressions, type mismatches)
  • No performance/benchmarking tests for rule application overhead

🐛 Potential Bugs

1. Traversal Order Assumption

File: traversal/mod.rs:492-529

The Node implementation for Expression assumes children are always traversed in order. If any rule reorders children, this could break.

Severity: Low (rules shouldn't reorder children by design)

2. Error Propagation in Merge Rule

File: vortex-array/src/expr/exprs/merge/transform.rs:54-59

if merge.data() == &DuplicateHandling::Error && !duplicate_names.is_empty() {
    vortex_bail!(
        "merge: duplicate fields in children: {}",
        duplicate_names.into_iter().format(", ")
    )
}

This check happens inside the rule, meaning duplicate errors are only caught during rewrite, not when the Merge expression is created.

Recommendation: Consider validating at Merge construction time for earlier error detection.


🎨 Code Style & Readability

Excellent

  • Clear naming throughout (ReduceRule, ParentReduceRule, TypedRewriteContext)
  • Good use of documentation explaining "why" not just "what"
  • Type parameters are well-explained in doc comments

Minor Suggestions

  1. Consistent terminology: Sometimes "rewrite" vs "reduce" vs "transform" - they're used somewhat interchangeably
  2. More inline comments: Complex trait bound logic (e.g., in RuleAdapter) could benefit from explaining the variance

📊 Performance Considerations

  1. HashMap lookups: Rules are looked up by ExprId for every node in the tree. For deep trees, this could be a hotspot. Consider profiling.
  2. Arc cloning: Children are cloned during traversal (line 507 in traversal/mod.rs: cloned()). For large expressions, this could be expensive.
  3. Repeated traversals: Each rule category requires a full tree traversal. Could potentially be combined.

Recommendation: Add benchmarks to track rule application overhead as the system grows.


✨ Recommendations Summary

High Priority

  1. Merge this PR - The architecture is solid and well-implemented
  2. 🔄 Add cycle detection or iteration limits to prevent infinite loops in rule application
  3. 📊 Improve test coverage, especially for registry edge cases and error paths

Medium Priority

  1. 📝 Document rule application order guarantees
  2. 🐛 Consider moving validation from rules to construction time where applicable
  3. 🔍 Add debug metadata to rules for better debugging (address the TODO)

Low Priority

  1. ⚡ Add performance benchmarks
  2. 📖 Clarify terminology (rewrite vs reduce vs transform)
  3. 🧪 Add integration tests showing the full rewrite pipeline

💬 Response to Review Comment

Question: "should we store both child and parent id?"

Answer: No, the current approach (indexing by child ID only) is the right choice:

  • Simpler: Single-key lookup is faster and uses less memory
  • Sufficient: Parent rules receive both child and parent expressions, so they can check parent.id() in the rule logic
  • Flexible: Adding parent ID indexing would optimize for a specific pattern ("rule X only applies to child Y under parent Z") that may not be common
  • Alternative: If this pattern becomes common, consider a builder pattern for rules that can express such constraints, rather than changing the index structure

🎯 Final Verdict

This is excellent work. The architecture is well-thought-out, type-safe, and extensible. The code quality is high and follows project conventions. The main areas for improvement are:

  1. Test coverage (especially edge cases)
  2. Safety against infinite loops
  3. Documentation of behavior guarantees

I recommend approving and merging this PR with follow-up issues for the medium/low priority items.


Files Reviewed

  • vortex-array/src/expr/transform/rules.rs - Core trait definitions
  • vortex-array/src/expr/session/rewrite.rs - Registry and adapters
  • vortex-array/src/expr/session/mod.rs - Session integration
  • vortex-array/src/expr/exprs/select/transform.rs - RemoveSelectRule
  • vortex-array/src/expr/exprs/merge/transform.rs - RemoveMergeRule
  • vortex-array/src/expr/exprs/get_item/transform.rs - PackGetItemRule
  • vortex-array/src/expr/transform/simplify.rs - Untyped simplification
  • vortex-array/src/expr/transform/simplify_typed.rs - Typed simplification
  • vortex-array/src/expr/traversal/mod.rs - Traversal infrastructure
  • vortex-array/src/expr/transform/partition.rs - Partitioning logic

@joseph-isaacs joseph-isaacs requested a review from gatesn November 18, 2025 13:06
Copy link
Contributor

@gatesn gatesn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some early comments - I have only gone through vortex-array atm

);

let expr = simplify_typed(expr.clone(), scope)?;
let expr = simplify_typed(expr.clone(), scope, session)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd place to run a full optimizer pass no? Isn't this itself run within an array optimizer rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its useful to apply get_item + pack reductions. Shall I add a todo to revisit?

}

/// Type-erased wrappers that allows dynamic dispatch.
pub(crate) trait DynReduceRule: Send + Sync {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't got there yet, but presumably there are some structs that wrap the generics version and impl the Dyn versions. I think they should be defined/moved here and called ReduceRuleAdapter<T>(T) etc to follow the expression/array pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they where called RuleAdapter

pub fn simplify(e: Expression, session: &ExprSession) -> VortexResult<Expression> {
let ctx = EmptyRewriteContext;

let e = apply_parent_rules(e, &ctx, session)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to reduce child first no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I think both make sense, really we want to iterate

joseph-isaacs and others added 5 commits November 18, 2025 15:10
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
…i/expr-reduce-rules

# Conflicts:
#	vortex-gpu/benches/gpu_bitunpack.rs
#	vortex-layout/src/layouts/dict/reader.rs
# Conflicts:
#	vortex-gpu/benches/gpu_bitunpack.rs
@joseph-isaacs joseph-isaacs requested a review from gatesn November 18, 2025 16:25
@joseph-isaacs joseph-isaacs added the changelog/break A breaking API change label Nov 18, 2025
@joseph-isaacs joseph-isaacs enabled auto-merge (squash) November 18, 2025 16:28
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
impl RewriteContext for RuleContext {}

impl private::Sealed for &RuleContext {}
impl RewriteContext for &RuleContext {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need these

@joseph-isaacs joseph-isaacs merged commit 60492c4 into develop Nov 18, 2025
40 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/expr-reduce-rules branch November 18, 2025 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/break A breaking API change changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants