Skip to content

Add materialized views#3464

Merged
robfrank merged 22 commits intomainfrom
materialized-views
Feb 20, 2026
Merged

Add materialized views#3464
robfrank merged 22 commits intomainfrom
materialized-views

Conversation

@robfrank
Copy link
Collaborator

What does this PR do?

implementation for #3463

@robfrank robfrank added this to the 26.3.1 milestone Feb 17, 2026
@robfrank robfrank self-assigned this Feb 17, 2026
@robfrank robfrank added the enhancement New feature or request label Feb 17, 2026
@robfrank robfrank linked an issue Feb 17, 2026 that may be closed by this pull request
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers materialized views to ArcadeDB, a significant feature enabling efficient caching of query results. The implementation provides flexible refresh strategies—manual, incremental (event-driven post-commit), and periodic—and integrates deeply with the database's schema and SQL language. A key architectural enhancement is the new transaction post-commit callback system, which ensures data consistency during incremental updates and offers a reusable pattern for other transactional features. The changes span core engine components, SQL parsing, and schema management, all backed by a robust testing suite to guarantee reliability.

Highlights

  • Materialized Views Introduction: Introduced materialized views as a schema-level abstraction to store results of SQL SELECT queries, supporting full, incremental, and periodic refresh modes.
  • Transaction Post-Commit Callbacks: Implemented a new general-purpose post-commit callback mechanism within TransactionContext, crucial for event-driven incremental refreshes and extensible for future features.
  • SQL DDL Integration: Added comprehensive SQL DDL statements (CREATE MATERIALIZED VIEW, DROP MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, ALTER MATERIALIZED VIEW) for managing materialized views.
  • Schema Persistence and Management: Integrated materialized view definitions into schema.json and extended LocalSchema to manage their lifecycle, including creation, retrieval, dropping, and crash recovery.
  • Extensive Testing Coverage: Included a wide range of tests: unit tests for core functionality, SQL syntax, incremental refresh, edge cases, remote client integration, HTTP API integration, and HA replication tests.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • docs/plans/2026-02-06-materialized-views-design.md
    • Added a comprehensive design document outlining the architecture and concepts for materialized views.
  • docs/plans/2026-02-06-materialized-views-impl.md
    • Added a detailed implementation plan for materialized views, breaking down the work into phases and tasks.
  • docs/plans/2026-02-14-materialized-views-implementation.md
    • Added an updated and more detailed implementation plan for materialized views, including specific tasks and code snippets.
  • docs/plans/2026-02-15-materialized-views-testing.md
    • Added a plan for integration and end-to-end testing of materialized views, including remote client and HA replication scenarios.
  • engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4
    • Modified the SQL lexer to include new keywords required for materialized view DDL statements.
  • engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4
    • Modified the SQL parser grammar to define rules for creating, dropping, refreshing, and altering materialized views.
  • engine/src/main/java/com/arcadedb/database/TransactionContext.java
    • Modified the transaction context to introduce a general-purpose post-commit callback mechanism.
  • engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaMaterializedViewsStep.java
    • Added a new execution step to retrieve metadata about materialized views from the schema.
  • engine/src/main/java/com/arcadedb/query/sql/executor/SelectExecutionPlanner.java
    • Updated the select execution planner to handle queries against 'schema:materializedViews'.
  • engine/src/main/java/com/arcadedb/query/sql/parser/AlterMaterializedViewStatement.java
    • Added a new parser class for the 'ALTER MATERIALIZED VIEW' SQL statement.
  • engine/src/main/java/com/arcadedb/query/sql/parser/CreateMaterializedViewStatement.java
    • Added a new parser class for the 'CREATE MATERIALIZED VIEW' SQL statement.
  • engine/src/main/java/com/arcadedb/query/sql/parser/DropMaterializedViewStatement.java
    • Added a new parser class for the 'DROP MATERIALIZED VIEW' SQL statement.
  • engine/src/main/java/com/arcadedb/query/sql/parser/RefreshMaterializedViewStatement.java
    • Added a new parser class for the 'REFRESH MATERIALIZED VIEW' SQL statement.
  • engine/src/main/java/com/arcadedb/schema/LocalSchema.java
    • Modified the local schema to manage materialized view definitions, handle persistence, and integrate with refresh schedulers.
  • engine/src/main/java/com/arcadedb/schema/MaterializedView.java
    • Added a new interface defining the contract for materialized views.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewBuilder.java
    • Added a new builder class to facilitate the creation of materialized views.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewChangeListener.java
    • Added a new listener class responsible for triggering incremental refreshes of materialized views.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewImpl.java
    • Added a new concrete implementation of the 'MaterializedView' interface.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewQueryClassifier.java
    • Added a new utility class to classify the complexity of materialized view queries.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewRefreshMode.java
    • Added a new enum to define the different refresh modes for materialized views.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewRefresher.java
    • Added a new utility class to handle the full refresh logic for materialized views.
  • engine/src/main/java/com/arcadedb/schema/MaterializedViewScheduler.java
    • Added a new scheduler class to manage periodic refreshes of materialized views.
  • engine/src/main/java/com/arcadedb/schema/Schema.java
    • Modified the schema interface to expose methods for managing materialized views.
  • engine/src/test/java/com/arcadedb/database/TransactionCallbackTest.java
    • Added new unit tests to verify the functionality of transaction post-commit callbacks.
  • engine/src/test/java/com/arcadedb/schema/MaterializedViewEdgeCaseTest.java
    • Added new unit tests to cover various edge cases and scenarios for materialized views.
  • engine/src/test/java/com/arcadedb/schema/MaterializedViewIncrementalTest.java
    • Added new unit tests to validate the incremental refresh mechanism of materialized views.
  • engine/src/test/java/com/arcadedb/schema/MaterializedViewSQLTest.java
    • Added new unit tests to verify the correctness of SQL syntax for materialized view DDL.
  • engine/src/test/java/com/arcadedb/schema/MaterializedViewTest.java
    • Added new unit tests for the core functionality and JSON serialization of materialized views.
  • network/src/main/java/com/arcadedb/remote/RemoteMaterializedView.java
    • Added a new class to represent materialized views when accessed via a remote database connection.
  • network/src/main/java/com/arcadedb/remote/RemoteSchema.java
    • Modified the remote schema to correctly handle materialized view operations and metadata retrieval.
  • server/src/test/java/com/arcadedb/remote/RemoteMaterializedViewIT.java
    • Added new integration tests to verify remote client interactions with materialized views.
  • server/src/test/java/com/arcadedb/server/HTTPMaterializedViewIT.java
    • Added new integration tests to validate materialized view operations through the HTTP REST API.
  • server/src/test/java/com/arcadedb/server/ha/ReplicationMaterializedViewIT.java
    • Added new integration tests to ensure materialized view schema and data replicate correctly in an HA cluster.
Activity
  • The pull request introduces a significant new feature: materialized views.
  • It includes detailed design and implementation plans in the 'docs/plans' directory.
  • Extensive unit and integration tests have been added across 'engine', 'network', and 'server' modules.
  • The implementation of 'RemoteSchema' methods for materialized views was completed, enabling remote client interaction.
  • HA replication tests were added to ensure the feature works correctly in a clustered environment.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-designed feature: materialized views. The implementation is comprehensive, covering core engine changes, SQL syntax, transaction handling, and testing across different layers (unit, integration, HA). The design documents are thorough, though there are some minor inconsistencies between them and the final implementation. I've pointed out a few areas for improvement, mainly around making some logic more robust by using the SQL parser instead of string manipulation, and aligning documentation with the code. Overall, this is an excellent contribution.

Comment on lines 172 to 193
private List<String> extractSourceTypes(final String sql) {
// Parse the SQL and extract type names from the FROM clause
final List<String> types = new ArrayList<>();
final String upper = sql.toUpperCase();
int fromIdx = upper.indexOf("FROM ");
while (fromIdx >= 0) {
final int start = fromIdx + 5;
int i = start;
while (i < sql.length() && sql.charAt(i) == ' ')
i++;
final int nameStart = i;
while (i < sql.length() && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_'))
i++;
if (i > nameStart) {
final String typeName = sql.substring(nameStart, i);
if (!typeName.isEmpty() && !types.contains(typeName))
types.add(typeName);
}
fromIdx = upper.indexOf("FROM ", i);
}
return types;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The extractSourceTypes method uses a simple string-matching approach to find the source type from the FROM clause. This can be brittle and may fail with more complex queries, such as those with quoted identifiers or JOINs. The design documents suggest using the SQL parser for this, which would be a more robust solution.

import com.arcadedb.query.sql.SQLQueryEngine;
import com.arcadedb.query.sql.parser.FromClause;
import com.arcadedb.query.sql.parser.FromItem;
import com.arcadedb.query.sql.parser.SelectStatement;
import com.arcadedb.query.sql.parser.Statement;

...

  private List<String> extractSourceTypes(final String sql) {
    final List<String> types = new ArrayList<>();
    try {
      final Statement parsed = ((SQLQueryEngine) database.getQueryEngine("sql")).parse(sql, database);
      if (parsed instanceof SelectStatement select) {
        final FromClause from = select.getTarget();
        if (from != null) {
          final FromItem item = from.getItem();
          if (item != null && item.getIdentifier() != null) {
            types.add(item.getIdentifier().getStringValue());
          }
          // TODO: Add support for more complex FROM clauses like JOINs if materialized views can be defined on them.
        }
      }
    } catch (final Exception e) {
      // Parser failed, could log a warning. For now, this will result in an empty list of source types.
    }
    return types;
  }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation at L191-203 already uses the SQL parser — it calls database.getStatementCache().get(sql) which returns a fully parsed Statement and then walks the AST (SelectStatement, FromClause, FromItem, identifier). There is no string matching happening here. The suggested replacement is functionally equivalent but bypasses the statement cache, which is a mild regression. No change needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation already uses the SQL parser — database.getStatementCache().get(sql) parses and caches the AST. The subsequent item.getIdentifier().getStringValue() call reads from the parsed AST node, not from string matching. Using the cache is preferred over calling parse() directly since it avoids re-parsing queries that are seen multiple times.

Comment on lines 30 to 63
public static boolean isSimple(final String sql, final Database database) {
final String upper = sql.toUpperCase().trim();

if (upper.contains("GROUP BY"))
return false;
if (upper.contains("SUM(") || upper.contains("COUNT(") || upper.contains("AVG(") ||
upper.contains("MIN(") || upper.contains("MAX("))
return false;
if (upper.contains("TRAVERSE"))
return false;
if (upper.contains(" JOIN "))
return false;

// Check for subqueries
final int firstSelect = upper.indexOf("SELECT");
if (firstSelect >= 0) {
final int secondSelect = upper.indexOf("SELECT", firstSelect + 6);
if (secondSelect >= 0)
return false;
}

// Count FROM clauses
int fromCount = 0;
int searchFrom = 0;
while ((searchFrom = upper.indexOf("FROM ", searchFrom)) >= 0) {
fromCount++;
searchFrom += 5;
}
if (fromCount > 1)
return false;

return true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The isSimple method uses string contains checks to classify a query. This approach is fragile and can be easily fooled by comments, string literals, or aliased column names that happen to match keywords (e.g., SELECT my_sum AS total FROM...). Using the SQL parser to inspect the AST of the query would provide a much more reliable classification.

import com.arcadedb.query.sql.SQLQueryEngine;
import com.arcadedb.query.sql.parser.SelectStatement;
import com.arcadedb.query.sql.parser.Statement;

...

  public static boolean isSimple(final String sql, final Database database) {
    try {
      final Statement parsed = ((SQLQueryEngine) database.getQueryEngine("sql")).parse(sql, database);
      if (!(parsed instanceof SelectStatement select)) {
        return false;
      }

      if (select.getGroupBy() != null) {
        return false;
      }
      if (select.getProjection() != null && select.getProjection().isAggregate()) {
        return false;
      }
      if (select.getTarget() != null && select.getTarget().getItem() != null && select.getTarget().getItem().getJoin() != null) {
          return false;
      }
      // NOTE: This does not check for subqueries or TRAVERSE yet, but is more robust than string matching.

      return true;
    } catch (final Exception e) {
      // If parsing fails, it's safer to treat it as a complex query.
      return false;
    }
  }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version of isSimple already uses the SQL parser via database.getStatementCache().get(sql) and inspects the AST — no string contains checks are present. It handles GROUP BY, aggregate functions per projection item, subqueries in FROM, function calls in FROM, and absence of a type identifier. Also note: the suggested replacement calls FromItem.getJoin() which does not exist in this codebase, so that code would not compile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSimple already inspects the parsed AST — select.getGroupBy(), item.isAggregate(ctx), item.getStatement(), item.getFunctionCall(), and item.getIdentifier() are all AST node accessors, not string checks. The suggested replacement is actually less complete: it omits the per-item aggregate check (item.isAggregate(ctx)) that the current implementation has, which would cause aggregate functions like sum() or count() to be misclassified as simple.

Comment on lines 25 to 29
public enum MaterializedViewRefreshMode {
MANUAL,
ON_COMMIT,
PERIODIC
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This implementation plan consistently uses ON_COMMIT for the incremental refresh mode. However, the final implementation in MaterializedViewRefreshMode.java and other related files uses INCREMENTAL. To avoid confusion, this document should be updated to use INCREMENTAL to match the actual code and the main design document.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — this plan doc was written before the naming was finalised. The term ON_COMMIT in it is stale; the implementation landed as INCREMENTAL. The plan docs are kept for historical context rather than as live documentation, so no code change is needed, but acknowledged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced all ON_COMMIT references with INCREMENTAL in this plan doc to match the final implementation.

Comment on lines 116 to 117
assertThat(database.getSchema().buildMaterializedView()).isNull();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test plan suggests that database.getSchema().buildMaterializedView() should return null for a remote database. However, the implementation in RemoteSchema.java throws an UnsupportedOperationException, which is a better practice for unsupported methods. The test plan should be updated to reflect this, asserting that an exception is thrown.

Suggested change
assertThat(database.getSchema().buildMaterializedView()).isNull();
});
// buildMaterializedView is not supported remotely — use SQL instead
assertThatThrownBy(() -> database.getSchema().buildMaterializedView())
.isInstanceOf(UnsupportedOperationException.class);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the plan doc is stale here. The actual test in RemoteMaterializedViewIT.java already asserts assertThatThrownBy(() -> database.getSchema().buildMaterializedView()).isInstanceOf(UnsupportedOperationException.class), which matches the implementation. No code change needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated the test plan snippet to assert UnsupportedOperationException instead of isNull(), matching the actual RemoteSchema implementation.

@claude
Copy link

claude bot commented Feb 17, 2026

Code Review — Add Materialized Views (#3464)

This is a well-scoped and architecturally coherent first implementation of materialized views. The design document, phased plan, and good test coverage make it easy to follow the intent. Below is detailed feedback organized by severity.


Critical / Bugs

1. SQL injection in RemoteSchemaexistsMaterializedView / getMaterializedView

The view name is injected directly into a SQL string without escaping:

"SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'"

An attacker (or just an unusual view name containing a ') can break the query or potentially execute arbitrary SQL. The dropTrigger method in the same class already uses the correct backtick-quoting pattern:

remoteDatabase.command("sql", "drop trigger `" + triggerName + "`");

Apply the same backtick quoting here, or use query parameters.

2. INTEGER_LITERAL ambiguity in visitCreateMaterializedViewStmt

The buckets count is parsed as:

if (bodyCtx.BUCKETS() \!= null)
    stmt.buckets = Integer.parseInt(bodyCtx.INTEGER_LITERAL().getText());

bodyCtx.INTEGER_LITERAL() returns the first INTEGER_LITERAL in the rule's child list. If REFRESH EVERY N SECOND is also present in the same rule body, the refresh interval N is also an INTEGER_LITERAL, and calling .INTEGER_LITERAL() (no index) will return the first one — which could be the interval, not the bucket count. Use an indexed accessor or give each integer a distinct label in the grammar.

3. fullRefresh status transition is not thread-safe

view.setStatus("BUILDING");          // volatile write
database.transaction(() -> { ... }); // may take a long time
view.updateLastRefreshTime();        // volatile write — only on success
view.setStatus("VALID");             // volatile write — only on success

A concurrent periodic refresh can start a second refresh while the first is still running (BUILDING), and if the second also fails, status stays BUILDING indefinitely. Consider making lastRefreshTime + status updates atomic, or guard against concurrent refreshes.

4. Incremental refresh does a full re-scan — misleading naming

MaterializedViewChangeListener calls MaterializedViewRefresher.fullRefresh() on every record change — a full DELETE + re-execute, not truly incremental. At scale this is O(N) on every write to a source type. The INCREMENTAL mode name implies per-record delta application. The simpleQuery classification is computed but unused in the refresh code. Consider renaming the mode to ON_COMMIT or AUTO, or adding a prominent disclaimer.


Design / Architecture Concerns

5. Source type extraction via indexOf("FROM ") is fragile

MaterializedViewBuilder.extractSourceTypes and MaterializedViewQueryClassifier both do raw-string analysis with indexOf/contains. This matches FROM inside string literals (e.g., WHERE name = 'FROM here') and mishandles subqueries. Since CreateMaterializedViewStatement already holds the fully parsed SelectStatement AST, walking the AST to extract from-clause type names would be far more robust and consistent with the rest of the pipeline.

6. Backing type name equals view name — no namespace separation

The materialized view's backing DocumentType has exactly the same name as the view. A plain SELECT FROM ViewName queries the raw backing data, and a DELETE FROM ViewName corrupts the view silently. A prefix convention (e.g., __mv_ViewName) or a type-level flag marking the type as view-managed would add a safety boundary.

7. status field uses raw String instead of an enum

MaterializedViewImpl.setStatus(String) accepts any string, making typos (e.g., "STALE" vs "stale") silently produce invalid states. An internal enum Status { VALID, BUILDING, STALE, ERROR } would be safer; the public API can still return String for compatibility.

8. Listener deregistration happens outside recordFileChanges in dropMaterializedView

Cancelling the scheduler and unregistering listeners happen before recordFileChanges. If HA replication fails mid-operation, listeners are already gone in memory but the schema on disk/replicas still has the view. Moving listener deregistration inside recordFileChanges would make the operation more consistent.


Code Quality

9. AlterMaterializedViewStatement ships as a fully wired-up stub that always throws

Grammar rule + statement class + visitor are all present, but executeDDL immediately throws CommandExecutionException("ALTER MATERIALIZED VIEW is not yet supported"). Users who discover ALTER in the grammar will be surprised. Either defer the grammar additions until ALTER is implemented, or document the limitation clearly.

10. MaterializedViewQueryClassifier — aggregate detection misses spaced calls

upper.contains("COUNT(") misses COUNT (*) (space before parenthesis). Use a regex like \b(SUM|COUNT|AVG|MIN|MAX)\s*\( for reliable detection.

11. MaterializedViewScheduler — single thread for all periodic views

Executors.newSingleThreadScheduledExecutor() means a long-running refresh blocks all other periodic views from firing on time. Document this limitation or use newScheduledThreadPool(N) with a configurable pool size.

12. MaterializedView.toJSON() in the public interface couples to an internal type

Returning com.arcadedb.serializer.json.JSONObject from the public interface exposes an internal class to all implementors (including RemoteMaterializedView). Consider keeping toJSON() on MaterializedViewImpl only, or returning Map<String, Object> from the interface.

13. docs/plans/ AI planning artifacts committed to the repo

Four large planning Markdown files (~5000 lines total) are under docs/plans/. These are useful during development but add noise. If retained, link them from proper user-facing documentation rather than leaving them as a standalone plans/ folder.


Minor

  • LocalSchema.materializedViews is protected final but accessed directly from MaterializedViewBuilder. A package-private accessor would be cleaner.
  • MaterializedViewImpl.fromJSON accesses JSON fields without the two-argument default-value form recommended by CLAUDE.md (use json.getString("field", defaultValue) to avoid null checks).
  • Test @AfterEach teardown should drop views/types unconditionally to prevent state from leaking between tests on failure.
  • RemoteMaterializedView.refresh() and drop() throw UnsupportedOperationException at runtime. Interface Javadoc should make it clear which methods are unavailable remotely.

Summary

The architecture is sound and the v1 feature set is well-chosen. Most important items to resolve before leaving DRAFT:

  1. SQL injection in RemoteSchema — use backtick quoting consistently
  2. INTEGER_LITERAL ambiguity in grammar visitor (buckets vs. interval)
  3. Regex-based source type extraction — use the existing AST instead of indexOf("FROM ")
  4. Rename INCREMENTAL or prominently document that it currently triggers full refreshes

The remaining items are suggestions for follow-up iterations.

@mergify
Copy link
Contributor

mergify bot commented Feb 17, 2026

🧪 CI Insights

Here's what we observed from your CI run for 1390ad1.

🟢 All jobs passed!

But CI Insights is watching 👀

@codacy-production
Copy link

codacy-production bot commented Feb 17, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-10.32% 75.05%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5ebbdb7) 86378 61629 71.35%
Head commit (fe9fda3) 116780 (+30402) 71270 (+9641) 61.03% (-10.32%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3464) 517 388 75.05%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 78.30018% with 120 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.87%. Comparing base (7c92d6f) to head (1390ad1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...a/com/arcadedb/schema/MaterializedViewBuilder.java 75.75% 14 Missing and 10 partials ⚠️
...va/com/arcadedb/query/sql/antlr/SQLASTBuilder.java 70.90% 8 Missing and 8 partials ⚠️
...va/com/arcadedb/remote/RemoteMaterializedView.java 65.78% 7 Missing and 6 partials ⚠️
...src/main/java/com/arcadedb/schema/LocalSchema.java 84.21% 4 Missing and 8 partials ⚠️
...java/com/arcadedb/schema/MaterializedViewImpl.java 85.18% 11 Missing and 1 partial ⚠️
...executor/FetchFromSchemaMaterializedViewsStep.java 71.05% 9 Missing and 2 partials ⚠️
...cadedb/schema/MaterializedViewQueryClassifier.java 54.16% 6 Missing and 5 partials ⚠️
...com/arcadedb/schema/MaterializedViewScheduler.java 71.87% 7 Missing and 2 partials ⚠️
...com/arcadedb/schema/MaterializedViewRefresher.java 75.86% 6 Missing and 1 partial ⚠️
...rcadedb/schema/MaterializedViewChangeListener.java 83.33% 4 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3464      +/-   ##
==========================================
+ Coverage   63.43%   63.87%   +0.44%     
==========================================
  Files        1321     1331      +10     
  Lines       87643    88194     +551     
  Branches    17970    18063      +93     
==========================================
+ Hits        55594    56332     +738     
+ Misses      24226    23929     -297     
- Partials     7823     7933     +110     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude
Copy link

claude bot commented Feb 17, 2026

Code Review: PR #3464 — Add Materialized Views

This is a substantial feature (~9000 lines). The overall architecture is sound and follows existing ArcadeDB patterns well. Below is my detailed review.


What Works Well

  • Reusing DocumentType as the backing store is clever — views are queryable as regular types without changes to the query planner.
  • The post-commit callback mechanism in TransactionContext is a clean, general-purpose addition.
  • addAfterCommitCallbackIfAbsent() keyed deduplication elegantly handles bulk inserts in a single transaction.
  • Schema persistence in schema.json mirrors the triggers pattern exactly.
  • Crash recovery (BUILDINGSTALE on reopen) is handled.
  • MaterializedViewQueryClassifier uses the real ANTLR AST rather than fragile string matching.
  • New keywords added as soft keywords in the grammar's identifier rule prevents conflicts with existing code using VIEW, REFRESH, etc. as identifiers.
  • RemoteMaterializedView correctly rejects mutations with helpful error messages.
  • Good test breadth covering rollback semantics, HA replication, and HTTP API.

Issues

Bugs

B1 — SQL injection risk in MaterializedViewRefresher

database.command("sql", "DELETE FROM " + backingTypeName);

Should use backtick escaping to be safe:

database.command("sql", "DELETE FROM `" + backingTypeName + "`");

The same pattern should be applied to the defining query re-execution.

B2 — DELETE FROM instead of TRUNCATE TYPE for refresh (performance + correctness)

Full refresh uses DELETE FROM <backingType>, which generates per-record WAL entries and fires delete event listeners on the backing type. Use TRUNCATE TYPE <name> UNSAFE instead — it's a bulk operation and avoids spurious listener callbacks that could trigger unintended cascades (e.g., if someone attaches a listener to the backing type).

B3 — isSimpleQuery() flag computed but not used

MaterializedViewQueryClassifier.isSimple() result is stored in MaterializedViewImpl.simpleQuery and exposed via isSimpleQuery(), but MaterializedViewChangeListener always calls fullRefresh() regardless of this flag. The design intent was for simple queries to get per-record incremental updates, not a full rescan. Either implement the per-record path or remove the simpleQuery field and classifier to avoid misleading the reader.

B4 — RemoteSchema.existsMaterializedView() uses command() for a SELECT

remoteDatabase.command("sql", "SELECT FROM schema:materializedViews WHERE name = ...");

SELECT should use query(), not command(). This may work today but is semantically wrong and may bypass read-only optimizations.

B5 — AlterMaterializedViewStatement.toString() has wrong logical operator

if ("PERIODIC".equalsIgnoreCase(refreshMode) || refreshInterval > 0)
    sb.append(" EVERY ").append(refreshInterval)...

Should be && not ||. With ||, a non-PERIODIC mode with a positive interval would produce invalid SQL output.

B6 — MaterializedViewBuilder.create() called inside database.transaction() in edge-case tests

create() manages its own transaction coordination via schema.recordFileChanges(). Wrapping it inside an explicit database.transaction(() -> { ... }) (as done in MaterializedViewEdgeCaseTest) may cause nested transaction issues. The other test classes correctly call create() outside explicit transactions.

B7 — Empty transaction fires post-commit callbacks unexpectedly

In TransactionContext.commit(), when phase1 == null (nothing to commit), the code now calls resetAndFireCallbacks() instead of reset(). This means an empty/no-op transaction will fire post-commit callbacks. This should be intentional — but if so, it deserves an explicit comment explaining why.


Performance

P1 — No batching window for INCREMENTAL mode

addAfterCommitCallbackIfAbsent deduplicates callbacks within a single transaction (good), but not across successive transactions. If 100 separate transactions commit in rapid succession, there will be 100 full refreshes. Consider adding a short cooldown/debounce window (e.g., coalesce callbacks fired within the same millisecond or use a scheduled delay).

P2 — Full scan on every INCREMENTAL refresh

INCREMENTAL mode still performs a full rescan of the source type on every change, regardless of query complexity. For large source types, this means INCREMENTAL and PERIODIC are functionally equivalent in cost. This limitation should be clearly documented in the user-facing docs so users can make an informed choice.


Code Quality

Q1 — Planning documents should not be committed

The four docs/plans/*.md files are AI working documents and contain internal agent instructions (REQUIRED SUB-SKILL: Use superpowers:executing-plans). They are not appropriate for the production codebase. Please remove them before merging.

Q2 — MaterializedViewChangeListener stores Database then immediately casts it

private final Database database;
// ...
final DatabaseInternal db = (DatabaseInternal) database;

Store DatabaseInternal directly in the field.

Q3 — changeListener field in MaterializedViewImpl is not volatile

status and lastRefreshTime are correctly volatile, but changeListener is written from registerListeners()/unregisterListeners() and read from scheduled callbacks. It should also be volatile for correctness on non-x86 architectures.

Q4 — extractSourceTypes() only extracts the first FROM clause target

Queries joining multiple types (SELECT * FROM A, B) or using subqueries in FROM will return an incomplete source types list, so INCREMENTAL listeners won't be registered for all source types. This is mostly safe because complex queries fall through to full refresh, but the introspection metadata (view.getSourceTypes()) will be incorrect for multi-table queries. At minimum, log a warning when source type extraction is incomplete.

Q5 — ResultInternal() constructor used inconsistently

Some DDL statements use new ResultInternal() (no-arg) while others use new ResultInternal(context.getDatabase()). Verify the no-arg constructor is valid for the DDL return path.

Q6 — Fragile string assertion in ReplicationMaterializedViewIT

assertThat(content.contains("HighMetrics")).isTrue();

Parse the schema JSON and assert on the structured data instead, following the project's preferred assertThat(x).isEqualTo(y) style.


Thread Safety

TS1 — LocalSchema.materializedViews (LinkedHashMap) accessed concurrently without explicit synchronization

Verify this is protected by the same lock/guard used for triggers and triggerAdapters. If recordFileChanges() provides mutual exclusion for writes but reads are lock-free, a concurrent read during a write could cause a ConcurrentModificationException.


Missing Test Coverage

  • No test for periodic refresh actually firing (timer-driven)
  • No test for crash recovery (BUILDINGSTALEREFRESHED on reopen)
  • No test for REFRESH MATERIALIZED VIEW NonExistent (user-facing error message)
  • No test for multi-source-type queries in INCREMENTAL mode verifying both source types trigger refresh
  • ALTER MATERIALIZED VIEW survives HA replication (only CREATE/DROP are tested in the HA test)

Summary

Category Count
Bugs 7
Performance 2
Code quality 6
Thread safety 1
Missing tests 5

The core architecture is solid. The main items to address before merging are: remove the planning docs (Q1), fix the toString() logical operator bug (B5), replace DELETE FROM with TRUNCATE TYPE UNSAFE (B2), add backtick escaping (B1), and clarify the isSimpleQuery design gap (B3).

@claude
Copy link

claude bot commented Feb 17, 2026

Code Review: Add Materialized Views (#3464)

This is a well-structured, large-scope feature addition. The design doc is thorough, the code is generally clean, and the test coverage is solid. Below are my findings, organized by severity.


Correctness / Potential Bugs

1. resetAndFireCallbacks() fires callbacks even when commit2ndPhase() fails

In TransactionContext, resetAndFireCallbacks() is called in the finally block of commit2ndPhase(). If the commit fails mid-way (e.g., disk error), callbacks will still fire, causing a view refresh against source data that may not actually be committed. The callbacks should only fire on a successful commit, not in finally. The original reset() in the finally was correct for cleanup; callbacks need to be guarded with a success flag:

boolean committed = false;
try {
  // ... commit logic ...
  committed = true;
} finally {
  if (committed)
    resetAndFireCallbacks();
  else
    reset(); // discard callbacks without firing
}

2. Schema persistence race in MaterializedViewBuilder.create()

saveConfiguration() is called twice — once after storing the view in the map, and again after the initial full refresh. Between the two saves, the view status transitions from the initial VALID to BUILDING then back to VALID. If the process crashes between the first and second saveConfiguration(), the view will be persisted with status VALID but contain no data (since the refresh was interrupted). The view definition should be saved initially with status STALE so that crash recovery triggers a re-refresh on reopen.

3. extractSourceTypes() only handles a single FROM target

MaterializedViewBuilder.extractSourceTypes() extracts only the first identifier from the FROM clause. If the query has JOINs or multiple sources, those source types are silently ignored — the INCREMENTAL change listener won't be registered for them. The query classifier correctly marks such queries as "complex" (falling back to full refresh), but nothing prevents silent gaps in listener registration. At minimum, document this limitation with a comment.

4. fullRefresh() throws after setting status to ERROR, preventing schema save

In MaterializedViewRefresher.fullRefresh(), an exception is caught, status is set to ERROR, and then re-thrown. The caller (MaterializedViewBuilder.create()) does not catch this, so saveConfiguration() is never called with the ERROR status — the view remains in BUILDING state on disk. Crash recovery will promote BUILDING to STALE on reopen, which happens to be correct, but this is coincidental. The error path should explicitly document (or enforce) what state is persisted.

5. DELETE FROM inside a refresh transaction is expensive for large views

MaterializedViewRefresher.fullRefresh() uses database.command("sql", "DELETE FROM " + backingTypeName) to truncate the backing type. For large views, this executes a full scan-delete within the same transaction as the re-insert, creating massive transaction journal entries and memory pressure. Consider using the type's truncate API if available, or at least document this as a known constraint for large materialized views.


Design / API Concerns

6. MaterializedView interface exposes implementation details

isSimpleQuery() and getRefreshInterval() are internal optimization details — isSimpleQuery() is a refresh-strategy hint, and getRefreshInterval() is only meaningful for PERIODIC views. These leak internal concerns into the public interface. Consider keeping them on the implementation class only.

7. changeListener field is not volatile

MaterializedViewImpl has several volatile fields (status, lastRefreshTime) but changeListener is not:

private transient MaterializedViewChangeListener changeListener;

This field is written from one thread (during view creation/schema reload) and read from another (the record event callback thread during incremental refresh). Without volatile or synchronization, the listener reference may not be visible across threads.

8. RemoteSchema.existsMaterializedView() uses string concatenation for identifiers

"SELECT FROM schema:materializedViews WHERE name = `" + viewName + "`"

View names are user-controlled. Using string concatenation for identifiers is inconsistent with how parameters are handled elsewhere and is a potential injection vector. Use query parameters or validate/escape the identifier.

9. Inconsistency: SQL syntax is REFRESH INCREMENTAL but issue/design doc says REFRESH ON COMMIT

The linked issue (#3463) and the design document use REFRESH ON COMMIT as the syntax, but the implementation uses REFRESH INCREMENTAL. Pick one and align all references before merging.

10. Source type validation only at creation time

MaterializedViewBuilder.create() validates that source types exist at creation, but the refresh logic has no guard for missing source types. If a source type is dropped via a code path that bypasses the dropType guard (e.g., direct replication), refresh will throw unguardedly. The refresh logic should handle missing types gracefully.


Test Coverage Gaps

11. No test for crash recovery (BUILDING → STALE → auto-refresh)

The design specifies that a view with BUILDING status on database reopen should become STALE and be auto-refreshed. There is no test that simulates this scenario by directly setting BUILDING status in the persisted schema and verifying the recovery behavior on reopen.

12. No behavioral test for ALTER MATERIALIZED VIEW

MaterializedViewSQLTest tests that the SQL parses correctly, but no test verifies that changing from MANUAL to INCREMENTAL actually registers the change listener, or from INCREMENTAL to MANUAL actually unregisters it.

13. rollbackDoesNotAffectView test pattern may not exercise a true rollback

The test calls database.begin() and database.rollback() directly. Depending on TestHelper's configuration, this may not exercise the expected code path. Verify the post-commit callback is not invoked on rollback.


Minor / Style

14. Plan documents in docs/plans/ should not be merged

These files contain implementation checklists ("✅ COMPLETE", "⏳ IN PROGRESS") and agent-specific guidance ("For Claude: REQUIRED SUB-SKILL..."). They are working notes, not project documentation, and should be removed before merging.

15. CreateMaterializedViewStatement.toString() has a logic bug

if ("PERIODIC".equalsIgnoreCase(refreshMode) || refreshInterval > 0)
    sb.append(" EVERY ").append(refreshInterval)...

The || refreshInterval > 0 branch can produce invalid SQL when refreshMode is MANUAL or INCREMENTAL. This should be &&.

16. FetchFromSchemaMaterializedViewsStep shares result list as an instance field

The result list is an instance field, but the returned ResultSet.close() calls result.clear(). If the ResultSet is closed early (before iteration completes), the backing data is destroyed. Consider making result a local variable in syncPull() captured by the anonymous ResultSet.

17. transient keyword on non-Serializable class is misleading

private transient MaterializedViewChangeListener changeListener;

transient only applies to Java serialization. Since MaterializedViewImpl does not implement Serializable, this keyword has no effect and is misleading. Replace with a comment like // not persisted to JSON.


Summary

Category Count
Correctness bugs 5
Design concerns 5
Test gaps 3
Style/minor 4

The most important issues to address before merging:

  1. Bug build support #1 — post-commit callbacks fire even on failed commits (data consistency risk)
  2. Bug Bump gremlin.version from 3.4.10 to 3.5.1 #2 — schema persistence race during initial refresh
  3. Issue Bump maven-resources-plugin from 3.1.0 to 3.2.0 #14 — remove plan documents (they contain agent-specific metadata and are not project docs)
  4. Issue Bump json from 20201115 to 20210307 #9 — align SQL syntax (INCREMENTAL vs ON COMMIT) with the issue/docs

The architecture is sound and the overall implementation quality is high. Once the correctness issues are addressed, this will be a solid addition to ArcadeDB.

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

Overall this is a well-structured, ambitious feature with solid design documentation and reasonable test coverage. The architecture mirrors existing patterns (triggers, schema builders) appropriately. The comments below focus on areas that need attention before merging.


Critical / Bugs

1. Post-commit callbacks fire after reset() — transaction state is gone

In TransactionContext.resetAndFireCallbacks(), reset() is called before firing the callbacks. Each callback then calls MaterializedViewRefresher.fullRefresh(), which starts a new transaction on the same database. This is correct by design, but the naming resetAndFireCallbacks is misleading — the reset clears afterCommitCallbacks and registeredCallbackKeys, yet those are snapshotted beforehand. This is fine, but add a comment to make the intent explicit. More importantly: if commit2ndPhase is called inside a finally block, a callback failure will suppress the original exception path — confirm this is the intended behaviour.

2. Incremental refresh always does a full refresh

MaterializedViewChangeListener calls MaterializedViewRefresher.fullRefresh() for every source change, including when view.isSimpleQuery() is true. The design document describes per-record incremental updates for simple queries (§4.1, Task 4.2 with _sourceRID tracking), but the implementation skips this entirely and always does a full refresh. This means:

  • Inserting one row into a 10M-row source type truncates and rebuilds the entire view.
  • The simpleQuery flag is computed and stored but never used for its intended purpose.

Either implement the per-record path or document that incremental mode currently always does a full refresh and the _sourceRID approach is deferred.

3. MaterializedViewRefresher.fullRefresh re-throws after setting ERROR status

} catch (final Exception e) {
    view.setStatus("ERROR");
    LogManager.instance().log(..., Level.SEVERE, ...);
    throw e;   // <-- re-thrown
}

When called from the post-commit callback (inside resetAndFireCallbacks), the exception is caught and logged there — so the view ends up ERROR which is acceptable. But when called directly from MaterializedViewBuilder.create() during initial refresh, the exception propagates out and the backing DocumentType is left orphaned in the schema (the view entry was already added to materializedViews before the refresh). Add a cleanup step: on initial-refresh failure, call dropMaterializedView (or at minimum remove the entry and drop the backing type).

4. DELETE FROM <backingTypeName> inside the refresh transaction

The refresh uses a SQL DELETE FROM statement to truncate, which creates per-record WAL entries and can be slow for large views. Consider using the truncate() method on the backing type directly if one exists, or at minimum note this as a known limitation.

5. refreshMaterializedViewBody rule is ambiguous with identifier

REFRESH is added both as a new lexer token and to the identifier alternative list. If a user has a type named Refresh, existing queries using Refresh as an identifier would parse the keyword token first. Please verify there is no regression for queries like SELECT * FROM Refresh or column aliases named refresh.


Design / Architecture

6. Status field uses raw String instead of an enum

MaterializedViewImpl stores status as a volatile String with ad-hoc values "VALID", "STALE", "BUILDING", "ERROR". These are checked by string equality throughout the code ("BUILDING".equals(view.getStatus())). An enum MaterializedViewStatus would prevent typos, make switch statements exhaustive, and align with the enum pattern used for MaterializedViewRefreshMode.

7. alterMaterializedView in RemoteSchema throws UnsupportedOperationException

The interface Schema now declares alterMaterializedView(...), but RemoteSchema throws UnsupportedOperationException. This breaks the Liskov Substitution Principle for any code that holds a Schema reference. Either:

  • Remove the method from the interface and make it local to LocalSchema, exposing it via a LocalSchema-specific API, or
  • Implement it in RemoteSchema by issuing the equivalent SQL (same as buildMaterializedView() — the error message suggests this).

8. Concurrent refresh() calls are not guarded

Two threads calling view.refresh() concurrently will both execute fullRefresh, leading to interleaved DELETE FROM / INSERT in separate transactions. Add a lock or a AtomicBoolean refreshing guard that causes a concurrent call to either no-op or wait.

9. MaterializedViewScheduler uses a single shared ScheduledExecutorService

A single daemon thread shared across all PERIODIC views means views refresh serially. For long-running refreshes this can cause missed intervals. This is an acceptable initial trade-off, but it should be documented.


Code Quality

10. extractSourceTypes only supports a single FROM target

MaterializedViewBuilder.extractSourceTypes() extracts at most one source type and throws if the query has no FROM target. The design and grammar support JOINs conceptually; if multi-source queries are not supported yet, the error message should be explicit ("only single-source queries are supported") rather than a generic parse error.

11. SECOND / MINUTE / HOUR case-sensitivity in AlterMaterializedViewStatement

The unit conversion uses "MINUTE".equalsIgnoreCase(refreshUnit) — good. But the toString() method produces upper-case tokens unconditionally. Consistent, but worth noting the grammar tokens are defined uppercase-only in the lexer.

12. FetchFromSchemaMaterializedViewsSteprefreshInterval cast

The metadata step casts the internal MaterializedViewImpl to access refreshInterval. This leaks the implementation type through the query layer. Expose getRefreshInterval() on the MaterializedView interface instead.

13. Docs directory contains multiple overlapping design documents

Four planning documents were committed (2026-02-06-materialized-views-design.md, 2026-02-06-materialized-views-impl.md, 2026-02-14-materialized-views-implementation.md, 2026-02-15-materialized-views-testing.md). These are useful during development but add noise to the repository. Consider consolidating or removing them, keeping only what's relevant as end-user or developer documentation.


Test Coverage

14. No test for crash recovery (BUILDINGSTALE on reopen)

LocalSchema.readConfiguration() sets STALE for views caught in BUILDING state, but there is no test that validates this path. A test that sets status to BUILDING (e.g. by directly manipulating schema.json), reopens the database, and asserts status becomes STALE would cover this important durability guarantee.

15. No concurrency test for incremental refresh

MaterializedViewIncrementalTest tests single-threaded scenarios. Given the eventual-consistency model (post-commit callback fires a new transaction), a test with concurrent inserts from multiple threads would be valuable to catch race conditions.

16. MaterializedViewSQLTest — what does it cover?

The diff summary shows MaterializedViewSQLTest.java exists but its contents weren't visible in the diff. Please confirm it covers CREATE MATERIALIZED VIEW / DROP / REFRESH / ALTER SQL syntax end-to-end, including IF NOT EXISTS and IF EXISTS branches.

17. ReplicationMaterializedViewIT — HA behaviour not clear from diff

The HA integration test exists but its assertions were not visible. Key scenarios to verify: view refresh on replica after leader executes refresh, view state after leader failover.


Minor

  • MaterializedViewImpl constructor sets status = "VALID" before the first refresh has ever run. It should start as "STALE" and transition to "VALID" after the initial fullRefresh succeeds.
  • MaterializedViewBuilder.create() calls saveConfiguration() twice (once inside recordFileChanges, once after refresh). The second call is outside recordFileChanges — verify this is intentional and thread-safe.
  • view.setStatus("ERROR") in the scheduler's catch block does not attempt to reschedule a recovery refresh; views in ERROR status stay stuck until manually refreshed or the DB is restarted.
  • The schema:materializedviews metadata target uses lowercase with no separator, while similar targets use indexes, buckets, database — consistent, but worth documenting in the user guide.

Summary

Category Count
Critical / Bugs 5
Design / Architecture 4
Code Quality 4
Test Coverage 4
Minor 4

The most important items to address before merging are #2 (incremental always does full refresh — the feature is advertised differently), #3 (orphaned backing type on initial refresh failure), and #6 (string-based status — enum is safer). Items #4 and #8 are performance/correctness concerns worth addressing in this PR or in a fast-follow.

Good work on the overall structure and the thorough design documentation.

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a well-structured implementation of a significant new feature. The design document is thorough and the code follows existing patterns closely. Below is detailed feedback organized by area.


Architecture & Design

Strengths:

  • Clean separation of concerns: MaterializedViewImpl, MaterializedViewRefresher, MaterializedViewChangeListener, MaterializedViewScheduler are each focused and small.
  • Good use of existing patterns (builder pattern matching TypeBuilder, JSON serialization matching TriggerImpl, listener adapter matching TriggerListenerAdapter).
  • The general-purpose post-commit callback mechanism in TransactionContext is a solid building block for future use.
  • Crash recovery (BUILDING → STALE on restart) is a nice detail.

Issues & Concerns

1. Incremental refresh is not truly incremental — it's "full refresh triggered by a change"

The MaterializedViewChangeListener and MaterializedViewQueryClassifier both exist to support "simple" vs "complex" query detection, but the actual MaterializedViewChangeListener always does a full refresh regardless of the query complexity classification:

private void schedulePostCommitRefresh() {
    tx.addAfterCommitCallbackIfAbsent(callbackKey, () -> {
        MaterializedViewRefresher.fullRefresh(database, view);  // always full
    });
}

The design doc describes per-record incremental updates with _sourceRID tracking for simple queries, but this isn't implemented — MaterializedViewQueryClassifier and simpleQuery field are stored but never used to differentiate behavior. This means INCREMENTAL mode is effectively the same as MANUAL with an auto-trigger, and the _sourceRID index described in the design doc was never added.

This is fine for an MVP, but the naming INCREMENTAL is misleading. Consider either:

  • Renaming to AUTO or ON_CHANGE to accurately describe the semantics, or
  • Documenting explicitly that incremental means "triggered full refresh" in this version

2. Post-commit refresh runs in a nested transaction that can deadlock

In MaterializedViewRefresher.fullRefresh():

database.transaction(() -> {
    database.command("sql", "DELETE FROM `" + backingTypeName + "`");
    // ... insert results
});

This callback fires inside resetAndFireCallbacks(), which is called at the end of commit2ndPhase() — after reset() clears the outer transaction. Starting a new transaction here is safe in the happy path, but if the source transaction holds file-level locks that haven't been released before callbacks fire, this could deadlock. Please verify the lock release order in commit2ndPhase() happens before resetAndFireCallbacks() is called.

Additionally, if the view refresh transaction itself fails, the source transaction already committed successfully. The view is marked STALE, but the user gets no notification. For a system that promises eventual consistency this is acceptable — but it should be documented clearly in the MaterializedView interface Javadoc.

3. SQL injection vulnerability in MaterializedViewRefresher

database.command("sql", "DELETE FROM `" + backingTypeName + "`");

The backingTypeName comes from the schema (not user input at this point), so the risk is lower, but this pattern is worth hardening. Consider using the Java API (database.getSchema().getType(backingTypeName).truncate() or iterating and deleting) rather than building SQL strings, consistent with how other internal operations work.

4. MaterializedViewImpl is not truly immutable but some fields should be final

refreshMode, simpleQuery, and refreshInterval are set once in the constructor and never changed — yet they're not final. The mutable fields (lastRefreshTime, status, changeListener) are correctly volatile. The immutable ones should be final to make the intent clear.

5. setStatus(String) in tests vs setStatus(MaterializedViewStatus) in production code

MaterializedViewImpl exposes setStatus(MaterializedViewStatus status) in its public API, but the test statusTransitions calls view.setStatus("STALE") — a String overload that doesn't appear in the production code shown. Either there's a missing overload or the test is incorrect. Please verify.

6. MaterializedViewBuilder.create() wraps creation in a transaction but view creation involves DDL

MaterializedViewBuilder.create() calls database.transaction(...) to perform the initial refresh. However, the backing type creation (schema.buildDocumentType()...create()) is already a DDL operation that is schema-managed (via recordFileChanges). Wrapping the refresh in a transaction is correct, but mixing DDL (type creation) with a DML transaction (the initial refresh) may interact poorly with HA replication. The dropMaterializedView correctly uses recordFileChanges — consider verifying the create path does too.

7. Thread safety of materializedViews map in LocalSchema

The materializedViews field is a LinkedHashMap accessed from:

  • The main thread (DDL operations)
  • The MaterializedViewScheduler daemon thread (periodic refresh writing status)
  • Post-commit callbacks (which fire on the committing thread)

The LocalSchema has a pattern of using readWriteLock for type/index operations, but the new materialized view methods don't follow this pattern consistently. getMaterializedViews() does materializedViews.values().toArray(...) which is not thread-safe if another thread is calling dropMaterializedView concurrently. Consider wrapping accesses in the existing schema lock.

8. alterMaterializedView has a TOCTOU window

// Tear down old refresh infrastructure
if (oldView.getRefreshMode() == MaterializedViewRefreshMode.INCREMENTAL)
    MaterializedViewBuilder.unregisterListeners(this, oldView);
if (materializedViewScheduler != null)
    materializedViewScheduler.cancel(viewName);

recordFileChanges(() -> {
    final MaterializedViewImpl newView = oldView.copyWithRefreshMode(newMode, newIntervalMs);
    materializedViews.put(viewName, newView);
    ...

Between unregisterListeners and materializedViews.put(newView), a source record mutation could fire the old listener which has already been unregistered but the new one isn't yet in place. The tear-down and re-setup should be atomic with the map update.

9. RemoteSchema.alterMaterializedView throws UnsupportedOperationException inconsistently

buildMaterializedView() and alterMaterializedView() throw UnsupportedOperationException while dropMaterializedView() works via SQL. Returning an UnsupportedOperationException for buildMaterializedView is understandable since it returns a MaterializedViewBuilder, but alterMaterializedView should be implementable via SQL (ALTER MATERIALIZED VIEW ...) just as drop is. This inconsistency may surprise users of the remote API.

10. Missing license header in some new files

RefreshMaterializedViewStatement.java, DropMaterializedViewStatement.java, and AlterMaterializedViewStatement.java are missing the Apache 2.0 license header that all other files in the codebase include. Compare to CreateMaterializedViewStatement.java which has it correctly.


Performance Considerations

  • Full refresh on every commit (INCREMENTAL mode): For high-write workloads with large views, this could be extremely expensive. Each insert to a source type triggers a full DELETE + re-INSERT of the entire view. The design doc acknowledges this as a future improvement, but it's worth a warning in the SQL reference docs.
  • FetchFromSchemaMaterializedViewsStep sorts by calling sorted() on a stream: Fine for the number of materialized views expected, no concern here.
  • materializedViews.values().toArray(new MaterializedView[0]): Minor — prefer new MaterializedView[materializedViews.size()] to avoid array reallocation, though this is negligible.

Test Coverage

The test suite is comprehensive and well-written. Specifically appreciated:

  • TransactionCallbackTest tests the callback mechanism in isolation before testing the higher-level feature.
  • rollbackDoesNotAffectView and incrementalRefreshWorksAfterRollback correctly test the important failure modes.
  • viewSurvivesReopen tests persistence correctly.
  • cannotDropBackingType tests the guard against accidental data loss.

Missing tests:

  • Concurrent access to a view (two threads writing to the source type simultaneously)
  • alterMaterializedView changing refresh mode from INCREMENTAL to MANUAL (listener unregistration)
  • HA replication test for materialized view state — is there a test in ReplicationMaterializedViewIT that verifies the view definition and data are replicated, not just the DDL command?
  • Periodic refresh: no test that actually waits for the scheduler to fire (understandable for CI speed, but worth a note)

Minor Style Issues

  • MaterializedViewBuilder uses var in CreateMaterializedViewStatement (final var builder = ...) which is fine for Java 21, but inconsistent with the rest of the codebase which uses explicit types. Check the existing style preference.
  • MaterializedViewImpl.toString() uses string concatenation rather than StringBuilder in a non-hot path — not a performance concern, just consistency with the rest of the codebase.
  • The docs/plans/ files are internal planning artifacts. Consider whether they belong in the repository long-term or should be removed before merge (or moved to a wiki/issue).

Summary

Category Assessment
Design Good — follows existing patterns, well-documented
Core functionality Works, but INCREMENTAL is misleading (full refresh, not per-record)
Thread safety Needs review around the materializedViews map and alterMaterializedView
Security SQL string building in MaterializedViewRefresher should use Java API instead
Test coverage Good breadth, a few concurrent/HA scenarios missing
License headers 3 files missing headers

The PR is a solid foundation. Recommend addressing issues #1 (naming clarity), #3 (SQL injection pattern), #7 (thread safety), and #10 (license headers) before merging.

@codacy-production
Copy link

codacy-production bot commented Feb 18, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-9.95% 82.24%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (533c7c5) 86378 62318 72.15%
Head commit (ce0f96f) 116798 (+30420) 72638 (+10320) 62.19% (-9.95%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3464) 535 440 82.24%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a significant feature addition and the overall architecture is well-thought-out. The design document is comprehensive and the test coverage is broad. Below are issues found across the implementation, grouped by severity.


Critical Issues

1. SQL Injection in RemoteSchemaexistsMaterializedView / getMaterializedView

The view name is interpolated directly into a SQL string with single-quote delimiters:

remoteDatabase.command("sql",
    "SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'");

A view name containing a single quote will produce malformed SQL or allow injection. Use a parameterised form:

remoteDatabase.command("sql",
    "SELECT FROM schema:materializedViews WHERE name = ?", viewName);

2. FetchFromSchemaMaterializedViewsStep.close() Clears Shared Field — Corrupts Re-iteration

@Override
public void close() {
    result.clear(); // clears the field-level list
}

result is a field shared across syncPull calls. Clearing it in close() means any subsequent call (e.g., after a reset()) returns zero rows even though cursor is still valid. The inner anonymous ResultSet.close() should not mutate the outer step's state. Compare with other FetchFromSchema*Step classes in the codebase for the correct pattern.

3. Concurrent Refresh Race — No Mutex on fullRefresh

Two concurrent calls to MaterializedViewRefresher.fullRefresh for the same view (e.g., a manual refresh() racing with the periodic scheduler, or two near-simultaneous incremental post-commit callbacks from concurrent transactions) will produce two interleaved DELETE FROM + insert sequences. The result is non-deterministic — the view may end up empty or with duplicated rows. A per-view ReentrantLock or AtomicBoolean refreshing guard is needed.

4. extractSourceTypes Only Handles a Single FROM Target

final FromItem item = from.getItem();
if (item != null && item.getIdentifier() != null)
    types.add(item.getIdentifier().getStringValue());

Only one source type is extracted. Queries with JOINs, subqueries, or implicit multi-source traversals will have an empty (or incomplete) sourceTypeNames list, silently making INCREMENTAL mode non-functional for those types — no listeners are registered for the missed types. The method should either extract all referenced types or throw an error/log a warning when it detects a complex query in INCREMENTAL mode.


High Severity

5. DELETE FROM Instead of TRUNCATE in fullRefresh

database.command("sql", "DELETE FROM `" + backingTypeName + "`");

DELETE FROM without a WHERE clause creates individual WAL entries for every record. For large views this is orders of magnitude slower than a truncate. Use TRUNCATE TYPE or direct bucket truncation instead.

6. Status / lastRefreshTime Updated Outside Transaction

In MaterializedViewRefresher.fullRefresh, the status is set to VALID and lastRefreshTime is updated after the transaction commits:

database.transaction(() -> { /* DELETE + INSERT */ });
view.updateLastRefreshTime();   // ← outside transaction
view.setStatus(VALID);          // ← outside transaction

A JVM crash between the transaction commit and the status update leaves the in-memory state inconsistent with what was persisted. The crash-recovery path in readConfiguration handles the BUILDING case, but saveConfiguration() should be called after the status is set to VALID, not before.

7. First saveConfiguration() Before Initial Refresh Creates Crash Window

In MaterializedViewBuilder.create(), saveConfiguration() is called once before fullRefresh() and again after. A crash in the window between these two calls leaves schema.json with the view at BUILDING status. The recovery code handles this correctly, but the first saveConfiguration() call is unnecessary — only the post-refresh save is needed. Removing it eliminates the crash window.

8. getStatus() Returns String Instead of MaterializedViewStatus Enum

The public interface declares String getStatus() even though MaterializedViewStatus is a public enum. This forces callers to use fragile string comparisons (view.getStatus().equals("VALID")). The return type should be MaterializedViewStatus.

9. INCREMENTAL Mode Always Performs a Full Refresh — Misleading API

The isSimpleQuery() classifier exists and is exposed in the public interface and schema metadata, implying it has operational meaning. But MaterializedViewChangeListener always performs a full refresh regardless:

// NOTE: currently always performs a full refresh even for simple queries.

Either remove isSimpleQuery() from the public API until the optimization is implemented, or add clear Javadoc marking it as a reserved/future field. As-is, the INCREMENTAL mode name implies per-record deltas but behaves as "full refresh on every commit."


Medium Severity

10. AlterMaterializedViewStatement.toString() Emits Invalid SQL

The toString() output for a periodic view is:

ALTER MATERIALIZED VIEW foo REFRESH PERIODIC EVERY 5 MINUTE

REFRESH PERIODIC is not valid grammar — only REFRESH EVERY N UNIT is. PERIODIC is an internal enum value, not a SQL keyword. The same issue exists in CreateMaterializedViewStatement.toString().

11. ResultInternal() No-Arg Constructor in DDL Statements

All four new statement classes use new ResultInternal() (no database reference), while existing DDL statements (e.g., CreateDocumentTypeStatement) use new ResultInternal(context.getDatabase()). This inconsistency may cause failures if the result is introspected by callers expecting a database-backed result.

12. materializedViews Map — Inconsistent Synchronisation in LocalSchema

toJSON() and readConfiguration() are synchronized, but dropMaterializedView(), alterMaterializedView(), existsMaterializedView(), and getMaterializedView() are not. The materializedViews field is a plain LinkedHashMap, so concurrent DDL operations (e.g., a drop racing with schema serialisation) can cause ConcurrentModificationException or data loss. Either synchronise all access methods or upgrade to ConcurrentHashMap.

13. MaterializedViewScheduler.shutdown() — No awaitTermination

executor.shutdownNow() sends an interrupt but does not wait for completion. If a periodic refresh holds a database lock when shutdown is called (e.g., during LocalSchema.close()), the interrupt may leave a transaction mid-flight. Add executor.awaitTermination(timeout, unit) as done in other ArcadeDB scheduled services.

14. Possible Nested Transaction During Initial fullRefresh

MaterializedViewBuilder.create() calls schema.recordFileChanges(...), which runs inside a transaction. Inside that lambda, MaterializedViewRefresher.fullRefresh is called, which calls database.transaction(...). This is a nested transaction — the behaviour depends on ArcadeDB's transaction model but is worth verifying explicitly.

15. Backtick-Quoted Backing Type Name in DELETE FROM Is Fragile

database.command("sql", "DELETE FROM `" + backingTypeName + "`");

A type name containing a backtick would produce malformed SQL. While uncommon, the schema should explicitly forbid backticks in view names, or the deletion should use a non-SQL code path (e.g., direct bucket truncation).


Low Severity / Style

16. FetchFromSchemaMaterializedViewsStep — Unnecessary instanceof Cast

if (view instanceof MaterializedViewImpl)
    r.setProperty("refreshInterval", ((MaterializedViewImpl) view).getRefreshInterval());

getRefreshInterval() is declared on the MaterializedView interface, so the cast is unnecessary: view.getRefreshInterval() works directly.

17. context.setVariable("current", r) Set During Build, Not During Iteration

In FetchFromSchemaMaterializedViewsStep, current is set inside the construction loop. After the loop it points to the last view, not the currently iterated one. In other schema fetch steps current is updated inside next() so it always reflects the item being returned to the caller. This should be consistent.

18. MaterializedView Interface Leaks Implementation Details

toJSON(), isSimpleQuery(), and getRefreshInterval() are implementation/internal concerns that leak into the public interface. toJSON() is a serialisation detail; isSimpleQuery() is currently unused for any optimisation; getRefreshInterval() is only meaningful for PERIODIC mode. The public interface should be kept minimal.

19. RemoteMaterializedView — Unchecked Cast for sourceTypeNames

this.sourceTypeNames = result.getProperty("sourceTypes");

getProperty() returns Object. The implicit cast to List<String> is unchecked and will throw ClassCastException if the server returns a different type. A safe conversion (as done in MaterializedViewImpl.fromJSON) should be used.

20. MaterializedViewBuilder — No Validation of refreshInterval for Non-PERIODIC Modes

.withRefreshMode(MANUAL).withRefreshInterval(60000L) silently ignores the interval with no warning. A LogManager warning would help users diagnose misconfigured views.


Test Coverage Gaps

21. No Concurrent-Refresh Test
No test exercises two threads calling refresh() simultaneously on the same view, which is the scenario for the race condition in issue #3 above.

22. alterNonExistentViewThrows Uses Exception.class — Too Broad

assertThatThrownBy(...).isInstanceOf(Exception.class);

Use the specific expected exception type (e.g., SchemaException.class) so the test catches only the expected failure mode.

23. Missing: REFRESH MATERIALIZED VIEW on Non-Existent View
No test exercises the error path in RefreshMaterializedViewStatement when the view doesn't exist.

24. HTTPMaterializedViewIT — Fragile Index-0 Assertion

final JSONObject viewMeta = records.getJSONObject(0);
assertThat(viewMeta.getString("name")).isEqualTo("MetricView");

This assumes MetricView is always at index 0. With a shared server instance across tests, leftover views from a failed test cleanup would break this assertion with a confusing message.

25. Missing: Periodic Scheduler Lifecycle After Database Reopen
No test verifies that PERIODIC views are correctly rescheduled after a database close + reopen cycle.

26. Missing: Multi-Source INCREMENTAL Mode Behaviour
No test verifies the behaviour (or expected error) when INCREMENTAL mode is used with a query referencing multiple types. This is the scenario where extractSourceTypes silently returns an incomplete list.

27. viewSurvivesReopen Should Verify Property Values, Not Just Count
The test verifies the view exists after reopen but should also verify that the actual property values are correctly persisted through the serialisation round-trip.


Summary

Severity Count
Critical 4
High 5
Medium 6
Low/Style 5
Tests 7

The most urgent items to address before merging are:

  1. SQL injection in RemoteSchema (build support #1)
  2. close() bug in FetchFromSchemaMaterializedViewsStep (Bump gremlin.version from 3.4.10 to 3.5.1 #2)
  3. Concurrent refresh race condition (Bump slf4j-simple.version from 1.7.31 to 1.7.32 #3)
  4. TRUNCATE vs DELETE FROM performance (Bump jedis from 3.6.1 to 3.6.3 #5)
  5. Thread safety of the materializedViews map (Bump gson from 2.8.7 to 2.8.8 #12)

The feature is well-conceived and the test suite is commendably broad for a new feature of this size. With the issues above addressed this would be in good shape.

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a well-structured implementation of materialized views with a solid design foundation. The architecture is sound and the TDD approach with comprehensive test coverage is excellent. Below is my detailed review.


Positive Highlights

  • Clean architecture: Using a real DocumentType as the backing store (rather than a virtual view) is a pragmatic choice that avoids query-planner changes and lets the existing execution pipeline handle view queries transparently.
  • TDD approach: Test-first development with good coverage across unit, SQL, incremental, and server integration tests.
  • addAfterCommitCallbackIfAbsent: The keyed deduplication mechanism is a clean solution to avoid registering multiple callbacks per view per transaction.
  • Crash recovery: Setting BUILDING → STALE on database reopen is a good safety net.
  • Correct use of volatile on MaterializedViewImpl.status, lastRefreshTime, and changeListener.
  • Listener cleanup on drop: Unregistering listeners and cancelling the scheduler on view drop is handled correctly.

Issues and Concerns

Critical / Correctness

1. SQL Injection in RemoteSchema.existsMaterializedView and getMaterializedView

remoteDatabase.command("sql", "SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'");

View names are user-supplied and concatenated directly into SQL. A name like x' OR '1'='1 could produce unintended results. Use parameterized queries or at least backtick-quote the view name for consistency with dropMaterializedView which uses backtick quoting.

// In dropMaterializedView:
remoteDatabase.command("sql", "DROP MATERIALIZED VIEW `" + viewName + "`"); // backtick-quoted ✓

// But in existsMaterializedView/getMaterializedView:
"SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'"  // string concat ✗

Also, schema:materializedViews as a SQL query target for a filter operation seems non-standard — this should be verified to work correctly with the FetchFromSchemaMaterializedViewsStep.

2. Post-commit callbacks fire after reset() — transaction state is cleared before callbacks run

In the resetAndFireCallbacks() pattern (as described in the plan docs and TransactionContext), callbacks fire after reset(). This means database.isTransactionActive() returns false during the callback. The MaterializedViewRefresher.fullRefresh() starts a new transaction inside, which is correct, but any code that checks isTransactionActive() assuming it's inside the committing transaction will be surprised. Ensure this is explicitly documented on the addAfterCommitCallback method.

3. MaterializedViewRefresher.fullRefresh re-throws exceptions but status is not persisted

When fullRefresh fails and sets status to ERROR, saveConfiguration() is not called, so the ERROR status is lost on the next database open (it will reload as VALID or whatever was last persisted). The view will then be queryable with stale/empty data but show VALID status.

// In fullRefresh():
view.setStatus(MaterializedViewStatus.ERROR);
// ❌ No saveConfiguration() call here
LogManager.instance().log(...);
throw e;

4. Refresh inside an existing transaction in MaterializedViewBuilder.create()

MaterializedViewRefresher.fullRefresh calls database.transaction(() -> { ... }) which starts a new transaction. But create() is itself called inside a transaction (tests do database.transaction(() -> { schema.buildMaterializedView()...create(); })). This would create a nested transaction. Verify that ArcadeDB's nested transaction semantics work here, or document that create() must be called outside a transaction.

5. extractSourceTypes only handles single-type FROM clauses

For queries like SELECT a.name, b.total FROM TypeA JOIN TypeB, only the first FROM item is extracted. The validation then only checks one source type, leaving other source types unchecked and unlisted in the view's metadata. Incremental refresh would miss changes to unlisted types. This should either error on multi-type queries or be documented as a known limitation.


Design / Architecture

6. The callbackKey in MaterializedViewChangeListener uses thread ID instead of a stable transaction ID

From the plan docs, an earlier design used Thread.currentThread().threadId() to track registered callbacks. The final implementation uses "mv-refresh:" + view.getName() as the key, which is cleaner and correct since there's one key per view per transaction. Good change. No issue here — this is confirmed correct in the final implementation.

7. MaterializedViewBuilder stores ifNotExists but the current behavior returns existing view without re-running refresh

if (schema.existsMaterializedView(name)) {
  if (ifNotExists)
    return schema.getMaterializedView(name); // Returns existing view, no refresh
  ...
}

This is acceptable behavior but should be documented — CREATE MATERIALIZED VIEW IF NOT EXISTS will not refresh the existing view.

8. Potential double saveConfiguration() call in create()

schema.saveConfiguration() is called once after materializedViews.put() and again after fullRefresh(). The first save persists the view with lastRefreshTime=0 and status=VALID (before the initial refresh completes). If the process crashes between the two saves, the view has an incorrect lastRefreshTime. Consider saving only once, after the refresh succeeds.

9. No concurrency protection for concurrent INCREMENTAL refreshes

If two transactions on different threads commit simultaneously with changes to the same source type, two post-commit callbacks for the same view will both start fullRefresh concurrently. Both will call database.transaction(() -> DELETE FROM ...; INSERT INTO ...). ArcadeDB's MVCC will handle data isolation, but the result could be two concurrent truncate+repopulate cycles interfering with each other.


Minor Issues

10. MaterializedViewImpl getters use single-line body style inconsistently with the codebase

@Override public String getName() { return name; }

Per CLAUDE.md, this project uses the standard Java formatting. Single-line methods should follow the project's existing style for consistency.

11. The plan docs reference ON_COMMIT but the enum uses INCREMENTAL

The earlier design plan (impl.md) references ON_COMMIT as the refresh mode name, but the actual implementation uses INCREMENTAL. Minor inconsistency between plans and implementation — not a code issue but worth cleaning up in the docs since they're checked into the repo.

12. AlterMaterializedViewStatement.toString() doesn't handle MANUAL or INCREMENTAL correctly

if ("PERIODIC".equalsIgnoreCase(refreshMode) && refreshInterval > 0)
  sb.append(" EVERY ").append(refreshInterval).append(' ').append(refreshUnit);

When refreshMode is MANUAL or INCREMENTAL, this is fine. But the toString() appends refreshMode directly from the field, which might be null if not set. Add a null check.

13. No test for periodic refresh actually triggering

The periodic refresh scheduler is tested only structurally (that it exists). A test that verifies the scheduler actually calls fullRefresh after the interval elapses would improve confidence.

14. Plan documents checked into docs/plans/ add 5,000+ lines

The three plan/design markdown files account for the bulk of the 9,171 additions. Consider whether these should live in the PR description / issue tracker or a wiki rather than the repository itself, as they're implementation notes rather than project documentation.


Summary

The core implementation is solid and correct for the happy path. The main issues to address before merging:

Priority Issue
🔴 High SQL injection in RemoteSchema string-concatenated queries
🔴 High ERROR status not persisted to schema.json on refresh failure
🟡 Medium Nested transaction in create() when called inside user transaction
🟡 Medium Multi-type FROM clause silently ignored in extractSourceTypes
🟡 Medium Concurrent INCREMENTAL refresh race condition
🟢 Low Double saveConfiguration() before initial refresh completes
🟢 Low Single-line method style inconsistency

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a substantial feature addition covering schema management, SQL grammar, transaction callbacks, event-driven refresh, periodic scheduling, and remote API support. The overall architecture is sound and consistent with existing patterns. Below are my findings, organized by severity.


High Priority

1. extractSourceTypes() only handles single-table FROM clauses

MaterializedViewBuilder.extractSourceTypes() only extracts the first FromItem:

final FromItem item = from.getItem();
if (item != null && item.getIdentifier() != null)
    types.add(item.getIdentifier().getStringValue());

Any query with a subquery or multiple source types silently returns an empty or single-entry list. In INCREMENTAL mode, no listeners are registered for missing source types, so the view is never automatically refreshed. There's no error surfaced to the user — the view is created but silently broken for incremental refresh. Consider throwing an UnsupportedOperationException or at minimum a warning when source types cannot be fully extracted and INCREMENTAL mode is requested.

2. Crash recovery: BUILDING status not persisted before initial refresh

In MaterializedViewBuilder.create():

schema.materializedViews.put(name, view);  // view starts as VALID (constructor default)
schema.saveConfiguration();                // saves status=VALID  ← first save
MaterializedViewRefresher.fullRefresh(...); // sets status BUILDING (in memory only)
schema.saveConfiguration();                // saves status=VALID again after success

fullRefresh sets status to BUILDING in memory but never saves it to disk. If the JVM crashes mid-refresh, the schema file shows VALID with zero records — crash recovery in readConfiguration() only checks for BUILDING in the persisted file, so the empty view is never auto-recovered. Fix: set and persist BUILDING status before the refresh.

3. Crash recovery marks STALE but does not schedule auto-refresh

The design doc says "Schedule an automatic full refresh for all STALE views" on startup. The implementation only does:

if (MaterializedViewStatus.BUILDING.name().equals(view.getStatus()))
    view.setStatus(MaterializedViewStatus.STALE);

No auto-refresh is scheduled. Users must manually issue REFRESH MATERIALIZED VIEW to recover. This should either be fixed or documented explicitly as a known gap.

4. SQL injection risk via backtick quoting in TRUNCATE TYPE

database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");

The backing type name is set to the user-supplied view name. A view name containing a backtick (`) would break the quoting. ArcadeDB should validate that view/type names don't contain backtick characters, or use a parameterized command form. The same pattern appears in MaterializedViewChangeListener (line with DELETE FROM ).


Medium Priority

5. Post-commit callbacks fire on no-op (empty) transactions

if (phase1 != null)
    commit2ndPhase(phase1);
else
    resetAndFireCallbacks();  // fires even when nothing was committed

When commit1stPhase() returns null (no dirty pages), resetAndFireCallbacks() is called instead of plain reset(). In practice, record event listeners only register callbacks when a mutation occurs, so this is safe for now. But it's a semantic trap for future callers: any callback registered during a transaction that touches no dirty pages (e.g., calling addAfterCommitCallback manually on a read-only transaction) will incorrectly fire. Add a comment documenting this behavior, or only fire callbacks in the non-null branch.

6. alterMaterializedView(): listener teardown outside recordFileChanges()

In LocalSchema.alterMaterializedView(), old infrastructure (listener unregistration, scheduler cancellation) is torn down before recordFileChanges(), while new infrastructure is set up inside. In HA mode, the recordFileChanges payload is replicated to followers. On a follower, applying the change would set up new infrastructure without first tearing down old, potentially resulting in both old and new listeners being active simultaneously.

7. dropType() iterates materializedViews without synchronization

for (final MaterializedViewImpl view : materializedViews.values()) { // not synchronized
    if (view.getBackingTypeName().equals(typeName))
        throw new SchemaException(...);
}

materializedViews is a LinkedHashMap (not thread-safe). If a MV is being created concurrently, this is a data race. Since dropMaterializedView() and alterMaterializedView() are synchronized, the guard in dropType() should be too (or the loop should be inside a synchronized(this) block).

8. RemoteMaterializedView: unchecked cast and missing null check

this.sourceTypeNames = result.getProperty("sourceTypes"); // unchecked cast, no null check

If the server doesn't return sourceTypes (e.g., older server version), this will be null and getSourceTypeNames() will throw NullPointerException. Use result.getProperty("sourceTypes", List.of()) or add an explicit null check.


Low Priority / Style

9. FetchFromSchemaMaterializedViewsStep: unnecessary instanceof cast

if (view instanceof MaterializedViewImpl)
    r.setProperty("refreshInterval", ((MaterializedViewImpl) view).getRefreshInterval());

getRefreshInterval() is declared in the MaterializedView interface, so the cast is unnecessary. Call view.getRefreshInterval() directly.

10. MaterializedView interface exposes getStatus() as String

The interface returns a String while internally MaterializedViewStatus is an enum. This forces callers to do string comparison ("STALE", "VALID") rather than type-safe enum comparison. Consider returning MaterializedViewStatus from the interface, or add convenience methods like isStale() / isValid().

11. New keywords may shadow existing identifiers

MATERIALIZED, VIEW, REFRESH, EVERY, SECOND, MINUTE, HOUR, MANUAL, INCREMENTAL are now lexer tokens. They're added back to the identifier rule, but this only covers grammar contexts that explicitly use identifier. If any existing user has a field or type named (for example) second, minute, manual, or view and queries it without backtick-quoting, the lexer will now emit keyword tokens. This is a potential compatibility break worth noting in the release notes.

12. MaterializedViewQueryClassifier.isSimple() is incomplete

The classifier does not check for TRAVERSE or LET subqueries (mentioned in the design doc as making queries "complex"). Since the incremental implementation always does a full refresh anyway, this is not a current correctness issue, but the isSimple flag would mislead future optimizations.

13. Docs/design files committed to docs/plans/

Four large planning documents (5900+ lines combined) are included in the PR. These are useful artifacts for understanding the design but probably don't belong in the main source tree. Consider moving them to a wiki page or PR description if the plan is to merge them.


Test Coverage

The test suite is comprehensive for a new feature of this size:

  • MaterializedViewTest — core Java API
  • MaterializedViewSQLTest — SQL DDL/DML
  • MaterializedViewIncrementalTest — INCREMENTAL mode
  • MaterializedViewEdgeCaseTest — edge cases
  • TransactionCallbackTest — the new afterCommitCallbacks mechanism
  • HTTPMaterializedViewIT / RemoteMaterializedViewIT — server/remote
  • ReplicationMaterializedViewIT — HA replication

A few gaps worth addressing:

  • No test for crash recovery (view in BUILDING state on restart)
  • No test for the empty-transaction callback-fires case
  • No test covering backtick in view name (security regression)
  • No test for alterMaterializedView() in HA mode

Summary

The overall design is well-thought-out and consistent with ArcadeDB's existing patterns. The most important issues to address before merging are:

  1. Crash recovery gap (persisting BUILDING before first save)
  2. Source type extraction limited to single-table FROM
  3. SQL injection via backtick in type name
  4. Auto-refresh of STALE views on startup (design intent vs. implementation gap)

The PR is marked as DRAFT, so these are well-timed to address. Good work on the breadth of this feature implementation.

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a substantial and well-structured feature implementation. The overall design is solid and fits well within ArcadeDB's existing architecture. Below is my feedback organized by area.


Architecture & Design

Strengths:

  • Clean interface/implementation split (MaterializedView interface + MaterializedViewImpl)
  • Immutable-style copyWithRefreshMode() for ALTER MATERIALIZED VIEW avoids in-place mutation bugs
  • Correct use of recordFileChanges() for HA atomicity on create/drop
  • Crash recovery logic (BUILDING → STALE on startup) is a thoughtful defensive measure
  • WeakReference<Database> in the scheduler prevents memory leaks when a DB is closed

Concerns:

  1. INCREMENTAL mode is misleading. MaterializedViewChangeListener triggers a full refresh on every source record change, which is semantically indistinguishable from PERIODIC or MANUAL plus a trigger. The mode name suggests row-level delta tracking. If this is intentional technical debt, it should be documented in the Schema.java Javadoc or in the SQL grammar comment, not just in an inline code comment. Users choosing INCREMENTAL will expect better performance characteristics than they actually get, especially at scale.

  2. Single shared scheduler thread. MaterializedViewScheduler uses newSingleThreadScheduledExecutor() for all PERIODIC views across all databases in the JVM. If one refresh is slow (large dataset), it will delay all other scheduled views. Consider newScheduledThreadPool(N) or at minimum document the limitation.

  3. Backing type name equals view name. This conflation means users cannot create a regular document type with the same name as a materialized view. The guard in dropType() is correct, but the constraint should also be checked in createDocumentType() to prevent the reverse scenario.


Potential Bugs

  1. SQL injection via view name in MaterializedViewRefresher. The fullRefresh method builds:

    database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");

    While backticks in names are rejected during MaterializedViewBuilder.create(), the backingTypeName stored in JSON and reloaded via fromJSON() bypasses that check. A carefully crafted schema.json could inject SQL. The backing type name should be validated on load.

  2. resetAndFireCallbacks() swallows exceptions silently. When a post-commit callback fails (e.g., the INCREMENTAL refresh fails), the view is set to STALE but the calling transaction has already committed successfully. The log level of WARNING may be too quiet for a data consistency event — consider SEVERE.

  3. MaterializedViewBuilder.extractSourceTypes() only handles simple FROM clauses. JOINs (e.g., SELECT a.x, b.y FROM TypeA a, TypeB b) will not extract both source types. For INCREMENTAL mode, this means changes to TypeB will not trigger refresh. This should be documented as a known limitation.


Code Quality

  1. RemoteMaterializedView constructor uses raw null-checks and casts. The constructor does:

    this.simpleQuery = result.getProperty("simpleQuery") != null ? (Boolean) result.getProperty("simpleQuery") : false;
    this.refreshInterval = result.getProperty("refreshInterval") != null ? ((Number) result.getProperty("refreshInterval")).longValue() : 0;

    If Result has typed getters, prefer those over manual null-checks and casts to reduce boilerplate.

  2. docs/plans/ files should not be committed to the main source tree. Four design/implementation planning markdown files (2026-02-06-materialized-views-design.md, 2026-02-06-materialized-views-impl.md, 2026-02-14-materialized-views-implementation.md, 2026-02-15-materialized-views-testing.md) are included in this PR. These belong in a wiki, GitHub discussions, or linked from the issue (Add Materialized Views support #3463) — not in the repository source tree. Remove these before merging.

  3. MaterializedViewQueryClassifier is untested directly. The isSimple classification is a non-trivial piece of logic (GROUP BY detection, aggregate functions, subqueries in FROM) that deserves its own focused unit test with various query patterns.


Test Coverage

Good coverage overall. The test suite (MaterializedViewTest, MaterializedViewIncrementalTest, MaterializedViewSQLTest, MaterializedViewEdgeCaseTest, RemoteMaterializedViewIT, HTTPMaterializedViewIT, ReplicationMaterializedViewIT) covers the main paths well.

Missing test cases:

  1. PERIODIC mode end-to-end test. There is no test that creates a view with REFRESH EVERY 1 SECOND, waits for a refresh, and confirms data is updated. The MaterializedViewScheduler code path is entirely untested.

  2. Concurrent refresh scenario. What happens if two threads trigger fullRefresh simultaneously on the same view (e.g., rapid inserts in INCREMENTAL mode)? There is no lock preventing concurrent refreshes — two concurrent TRUNCATE + INSERT cycles could result in an empty or partially populated view.

  3. alterMaterializedView from PERIODIC to MANUAL — the scheduler cancel() should be tested to ensure the task is actually cancelled and no further refreshes fire after the mode change.


Minor Issues

  • MaterializedViewRefreshMode.PERIODIC has no direct SQL keyword — only EVERY N UNIT syntax maps to it. This is fine but should be noted in the documentation.
  • The new top-level REFRESH keyword in REFRESH MATERIALIZED VIEW should be verified not to conflict with REFRESH used as an identifier in existing queries.

Summary

Priority Issue
High Remove docs/plans/ markdown files from the PR
High Document or rename INCREMENTAL mode (it currently performs a full refresh on change, not incremental)
Medium Add concurrent refresh protection in MaterializedViewRefresher
Medium Add MaterializedViewQueryClassifier unit tests
Medium Add PERIODIC mode end-to-end test
Low Single shared scheduler thread limitation — document or fix
Low extractSourceTypes() multi-type JOIN limitation — document

Given this is still in DRAFT state, these are the right issues to address before marking ready for review. The foundation is solid.

@claude
Copy link

claude bot commented Feb 18, 2026

Code Review: Add Materialized Views (#3464)

This is a well-structured, large feature addition. The design document is thorough and the implementation follows existing project conventions closely. Below is my detailed feedback.


Architecture & Design

The overall architecture is sound: materialized views backed by real DocumentType instances, schema persistence in schema.json, and a post-commit callback mechanism for incremental refresh. The design document is excellent and the implementation matches it closely.

Positive highlights:

  • Reuses the existing TriggerImpl JSON round-trip pattern for persistence.
  • The addAfterCommitCallbackIfAbsent(key, callback) deduplication is a clean solution for coalescing multiple mutations in one transaction into a single refresh.
  • The AtomicBoolean refreshInProgress guard in MaterializedViewRefresher correctly prevents concurrent refreshes.
  • Crash recovery (BUILDING → STALE on startup) is a nice touch.
  • The WeakReference<Database> in MaterializedViewScheduler prevents GC leaks.

Issues & Concerns

1. SQL Injection in MaterializedViewRefresher.fullRefresh()Security

database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");

The backing type name is user-supplied (from the CREATE MATERIALIZED VIEW statement). While backtick quoting reduces risk, a name containing a backtick character would break or escape this. The backing type name should be validated to contain only safe characters at creation time, or the truncation should be done via the Java API directly rather than constructing a SQL string. Since the project already has type-safe APIs for truncation, this is the safer path.

2. Nested Transaction in fullRefresh()Potential Bug

MaterializedViewRefresher.fullRefresh() wraps the truncate+insert in database.transaction(() -> {...}). For incremental refresh this runs inside the post-commit callback, which executes after the outer transaction commits — that is correct. But for manual refresh triggered by REFRESH MATERIALIZED VIEW (which itself runs inside executeDDL() which may or may not be in a transaction), nesting could be silently swallowed depending on the transaction isolation mode. Please verify the behavior when refresh() is called inside an active transaction.

3. Stale Listener Reference After ALTERBug

The copyWithRefreshMode() method creates a whole new MaterializedViewImpl and replaces the old entry in the materializedViews map during alterMaterializedView(). Any code holding a reference to the old MaterializedViewImpl — including the MaterializedViewChangeListener whose view field still points to the old object — now has a stale reference. After an ALTER, the change listener would report status on the old (now orphaned) view object. Consider using a mutable refreshMode field (with appropriate synchronization) rather than the copy-on-write approach, or ensure listeners are re-pointed to the new instance.

4. RemoteSchema.alterMaterializedView() Throws UnsupportedOperationExceptionInterface Contract Violation

throw new UnsupportedOperationException(
    "alterMaterializedView() is not supported remotely. Use SQL ALTER MATERIALIZED VIEW instead.");

This breaks the Schema interface contract. Callers programming to the Schema interface have no way to know this method will throw at runtime. Either implement it by delegating to SQL (like dropMaterializedView does), throw a specific checked exception, or document it in the interface Javadoc. The same applies to buildMaterializedView() on RemoteSchema.

5. Instance-Level Cursor in FetchFromSchemaMaterializedViewsStepThread-Safety

private int cursor = 0;

The cursor field is instance-level state. If syncPull() is called concurrently (possible in a parallel execution plan), two threads would corrupt each other's cursor. Compare with FetchFromSchemaIndexesStep to see how this is handled in existing code.

6. TRUNCATE TYPE ... UNSAFE and Index Consistency — Data Safety

TRUNCATE TYPE ... UNSAFE bypasses the WAL and index updates. This could leave indexes in an inconsistent state if the backing type has any indexes defined. Crash recovery handles the empty-view case (BUILDING → STALE), but a corrupted index is harder to recover. Consider using the Java truncate API instead.

7. schema:materializedviews Case Sensitivity — Usability

In SelectExecutionPlanner.java:

case "materializedviews"-> plan.chain(new FetchFromSchemaMaterializedViewsStep(context));

All existing schema targets (indexes, database, buckets, etc.) are lowercase. The test uses SELECT FROM schema:materializedViews (camelCase). Verify whether the parser normalizes this before the switch statement, or if users must type schema:materializedviews (all lowercase).

8. Post-Commit Callback Reentrancy Risk — Correctness

The incremental refresh callback calls MaterializedViewRefresher.fullRefresh()database.transaction(...). If that inner transaction triggers record events (e.g., the backing type is also a source for another view), unbounded reentrancy is possible. A guard or depth counter should be considered.

9. refreshInProgress Package-Private Access — Encapsulation

final AtomicBoolean refreshInProgress = new AtomicBoolean(false);

This is accessed directly by MaterializedViewRefresher. Prefer a package-private method boolean tryBeginRefresh() / void endRefresh() that encapsulates the CAS logic, keeping the AtomicBoolean private.


Minor / Style

  • Dead code: MaterializedViewQueryClassifier and isSimpleQuery() in MaterializedViewImpl are currently dead — the per-record incremental path is not implemented. Consider removing them until needed, or file a follow-up issue to avoid confusing future contributors.

  • MaterializedViewScheduler: When WeakReference.get() returns null, the scheduled task silently stops. Log this at FINE level so operators can diagnose why a PERIODIC view stopped refreshing.

  • LocalSchema.dropType() guard: The error message when blocking a drop of a backing type should name the materialized view that owns it.

  • Test style: CLAUDE.md requires assertThat(x).isTrue() style. Verify no assertThat(...).isEqualTo(true) patterns remain. Also verify no System.out.println calls remain in test or implementation code.


Test Coverage

Overall the test coverage is good. Specific gaps:

  1. HA replication of incremental refresh: ReplicationMaterializedViewIT tests schema replication but not whether view data is replicated to replicas after an incremental refresh on the leader.

  2. Concurrent refresh suppression: No test exercises the refreshInProgress.compareAndSet guard — two threads attempting fullRefresh() simultaneously. This correctness guarantee deserves a test.

  3. Listener cleanup after ALTER: alterFromIncrementalToManual() verifies no auto-refresh happens, but does not verify that the change listener was actually unregistered.

  4. Error recovery: No test covers the ERROR status path — what happens when the defining query fails at runtime.


Summary

The feature is well-conceived and the implementation is largely solid. The main issues to address before merging:

  1. SQL injection risk in the TRUNCATE TYPE string construction (item 1)
  2. Stale listener reference after ALTER MATERIALIZED VIEW (item 3)
  3. UnsupportedOperationException contract violation on RemoteSchema (item 4)

The remaining items are lower priority but worth addressing before a stable release.

@codacy-production
Copy link

codacy-production bot commented Feb 18, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-10.59% 86.08%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (2eeeb3f) 87621 64089 73.14%
Head commit (2e4d962) 118059 (+30438) 73855 (+9766) 62.56% (-10.59%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3464) 553 476 86.08%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 19, 2026

Materialized Views — Code Review

This is a solid and well-structured implementation. The design is cohesive, the code follows existing ArcadeDB conventions, and test coverage is good. Below are observations organized by severity.


Correctness / Potential Bugs

1. commit2ndPhase fires callbacks even on rollback path

The resetAndFireCallbacks() is called inside finally { if (committed) ... }, which looks correct at first glance. However, the committed flag is set before the finally block, so if an exception is thrown after committed = true but before the method returns, callbacks will fire even though the commit may be in a broken state. This should be verified carefully — it's a subtle race condition.

2. Full refresh inside a post-commit callback re-opens a new transaction

In MaterializedViewRefresher.fullRefresh(), the refresh runs inside database.transaction(() -> ...). This transaction runs after the source transaction has committed (via post-commit callback). That's correct. But note that the INCREMENTAL change listener fires fullRefresh (not a per-record delta), so for workloads with many small transactions on large views, this will be expensive. The code comments acknowledge this as a future optimization, but it should be documented in the PR description and/or as a known limitation.

3. FetchFromSchemaMaterializedViewsStep schema query uses case-insensitive lower-case match

In SelectExecutionPlanner.java, the routing uses:

case "materializedviews" -> ...

This means SELECT FROM schema:materializedViews only works if the routing lowercases the token before the switch. Verify the existing infrastructure does this consistently for all other schema:* targets (e.g., schema:indexes). If not, schema:MaterializedViews (with capital M) would silently fall through to the default exception branch.

4. RemoteSchema.dropMaterializedView uses string concatenation, not a parameter

remoteDatabase.command("sql", "DROP MATERIALIZED VIEW `" + viewName + "`");

Even though backtick quoting is used, a view name containing a backtick would break this. The builder already validates names don't contain backticks on creation, but the remote drop path has no such validation. Consider using a parameterized form or at minimum re-validating.

5. Source type existence is validated at creation, but not at schema reload

In readConfiguration(), views are loaded from JSON and re-registered without re-validating that their source types still exist. If a source type is dropped between sessions (e.g., the schema.json is edited externally or through some crash scenario), the listener registration will throw at startup. Add a guard or log-and-skip for missing source types during reload.


Design / API Concerns

6. MaterializedView.getStatus() returns String instead of MaterializedViewStatus

The interface declares String getStatus() but the enum MaterializedViewStatus already exists. This forces callers to parse/compare strings rather than using the type-safe enum. The RemoteMaterializedView implementation stores it as a String for deserialization convenience, but the interface should return the enum and RemoteMaterializedView should parse it. This is a clean API issue that's worth fixing before the feature is considered stable.

7. RemoteMaterializedView.refresh() and drop() throw UnsupportedOperationException

Since the MaterializedView interface includes refresh() and drop(), remote clients calling these methods will get a runtime exception with no compile-time warning. Consider either:

  • Splitting the interface into a read-only MaterializedViewInfo and a write-capable MaterializedView
  • Or adding Javadoc to the interface clearly marking these methods as unsupported in remote contexts

8. MaterializedViewImpl has package-private refreshInProgress field

final AtomicBoolean refreshInProgress = new AtomicBoolean(false);

This is accessed directly from MaterializedViewRefresher (same package). While it works, it's better to expose this through a method (e.g., boolean tryLockRefresh() / void unlockRefresh()) to make the intent clearer and the coupling explicit.

9. alterMaterializedView does a full infrastructure teardown/rebuild

The current alterMaterializedView in LocalSchema tears down listeners and scheduler, creates a new MaterializedViewImpl copy, then re-registers everything. This means there's a window where the view is unprotected from changes. Consider holding the schema lock across the full operation (it may already be synchronized, verify this holds for the scheduler cancellation path too).


Performance

10. TRUNCATE TYPE UNSAFE inside a transaction

MaterializedViewRefresher.fullRefresh() wraps both the truncate and re-insert in a single transaction. For large views, this means an extremely large transaction that will put pressure on the WAL and page cache. Consider doing the truncate and re-insert as separate transactions, or documenting the size implications.

11. Sorting in FetchFromSchemaMaterializedViewsStep

Arrays.stream(views).sorted(...).collect(Collectors.toList())

This allocates a List on every query of schema:materializedViews. For a small number of views this is fine, but it's inconsistent with the performance-conscious style in the rest of ArcadeDB. Consider sorting the underlying LinkedHashMap on insertion instead.


Code Style

12. CreateMaterializedViewStatement refreshMode field is a String not an enum

The statement classes (CreateMaterializedViewStatement, AlterMaterializedViewStatement) store refreshMode as a plain String ("MANUAL", "INCREMENTAL", "PERIODIC") and then do case-insensitive string comparisons in executeDDL. Using the MaterializedViewRefreshMode enum directly in the AST node would be cleaner and eliminate the string-comparison boilerplate.

13. docs/plans/ files in the PR

The PR includes 4 planning documents under docs/plans/. These are useful for understanding the design, but they're AI-generated implementation planning artifacts (Claude agent output). Consider whether they belong in the repository long-term or whether the design rationale should be extracted into a proper docs/ page or the issue/PR description instead.

14. Missing getBackingTypeName() on the MaterializedView interface

MaterializedViewImpl has a public getBackingTypeName() method but the interface only exposes getBackingType(). The RemoteMaterializedView stores backingTypeName but can't return it via the interface. This creates an asymmetry.


Test Coverage

The test suite is comprehensive. A few gaps worth addressing:

  • Concurrent refresh: No test for concurrent calls to refresh() racing on the AtomicBoolean refreshInProgress — worth adding to verify the guard works correctly.
  • PERIODIC scheduler: MaterializedViewScheduler is tested indirectly, but there's no test that actually waits for a scheduled refresh to fire (understandable for timing reasons, but at least a unit test of the scheduler's schedule()/cancel()/shutdown() contract would be valuable).
  • HA/Replication test: ReplicationMaterializedViewIT is present (good!), but verify it covers the case where a refresh is triggered on a follower — schema-level views should only be refreshed on the leader.
  • ALTER MATERIALIZED VIEW with invalid view name: Test coverage for error paths in AlterMaterializedViewStatement appears missing.

Minor

  • MaterializedViewQueryClassifier.isSimple() silently returns false on parse exceptions. Consider logging at FINE level so operators can diagnose unexpected classifications.
  • The refreshInterval field stores milliseconds but the Javadoc/variable name doesn't mention the unit. Add a comment or rename to refreshIntervalMs.
  • MaterializedViewImpl.copyWithRefreshMode() is package-private but could be made explicitly /* package */ with a comment explaining why it's not private.

Overall this is a well-executed feature. The main items worth addressing before merge are #6 (type-safe status enum in the interface) and #3 (verifying the case-insensitive routing). Everything else can be addressed as follow-up issues or during review iteration.

@claude
Copy link

claude bot commented Feb 19, 2026

Comprehensive Code Review: PR #3464 - Materialized Views

Excellent implementation overall! The code demonstrates strong understanding of ArcadeDB's architecture and follows project conventions meticulously. Here's my detailed review:


✅ STRENGTHS

Code Quality

  • Perfect adherence to CLAUDE.md conventions: Uses assertThat() style, no System.out, proper use of final keyword
  • Clean architecture: Well-integrated with existing schema, transaction, and event systems
  • Excellent test coverage: Comprehensive unit and integration tests for core functionality
  • Proper HA support: Uses recordFileChanges() for atomic replication
  • Smart transaction handling: New after-commit callback mechanism is well-designed

Architecture Highlights

  • MaterializedViewImpl properly integrated into Schema interface
  • Persisted in schema.json with crash recovery (BUILDING → STALE on restart)
  • Event-driven incremental refresh uses existing listener infrastructure
  • Proper lifecycle management (create/drop/alter)

🐛 CRITICAL ISSUES (Must Fix)

1. Source Type Deletion Vulnerability

Location: MaterializedViewBuilder.java:186-194, LocalSchema.dropType()

Issue: If a source type is deleted while a materialized view depends on it:

  • Listener unregistration fails silently (line 191: continue when type doesn't exist)
  • View becomes orphaned with invalid query
  • Subsequent refresh will fail with SchemaException

Current protection: MaterializedViewBuilder validates source types exist at creation (lines 111-113), but nothing prevents deletion afterward.

Fix needed in LocalSchema.dropType():

// Check if type is source for any materialized view
for (MaterializedViewImpl view : materializedViews.values()) {
  if (view.getSourceTypeNames().contains(typeName))
    throw new SchemaException("Cannot drop type '" + typeName + 
      "' because it is a source for materialized view '" + view.getName() + "'");
}

2. Field Visibility Issue

Location: MaterializedViewImpl.java:42

final AtomicBoolean refreshInProgress = new AtomicBoolean(false);

Issue: Should be private for proper encapsulation. Package-private exposes internal state unnecessarily.


⚠️ HIGH PRIORITY ISSUES

3. Performance: Full Refresh on Every Change

Location: MaterializedViewChangeListener.java:64-76

The comment says it all:

// NOTE: currently always performs a full refresh even for simple queries.

Impact: For high-volume insert workloads, this causes severe performance degradation:

  • 1000 inserts = 1000 full table scans + rebuilds
  • No batching or throttling mechanism
  • The isSimpleQuery() classification exists but isn't utilized

Recommendation:

  • Document this limitation prominently in user-facing docs
  • Consider adding refresh throttling (e.g., max 1 refresh per 5 seconds) or batch window
  • Track as technical debt for incremental refresh implementation

4. Scheduler Resource Management Issues

Location: MaterializedViewScheduler.java:45-67

Issues:

final WeakReference<Database> dbRef = new WeakReference<>(database);
final ScheduledFuture<?> future = executor.scheduleAtFixedRate(() -> {
  final Database db = dbRef.get();
  if (db == null || !db.isOpen()) {
    cancel(view.getName());
  • Race condition: Database could be GC'd or closed between null check and refresh call
  • Single-threaded executor: One slow refresh blocks all others
  • No timeout: Long-running refresh can hang indefinitely

Recommendations:

  • Add explicit lifecycle management (scheduler notified on database close)
  • Consider thread pool with view-specific tasks
  • Add maximum refresh duration timeout

5. Silent Concurrent Refresh Skipping

Location: MaterializedViewRefresher.java:32-36

if (!view.refreshInProgress.compareAndSet(false, true)) {
  LogManager.instance().log(..., "Skipping concurrent refresh...");
  return;
}

Issue:

  • Logged at FINE level (typically not visible)
  • Incremental refresh callbacks silently dropped during manual refresh
  • No mechanism to queue or retry

Fix: Log at WARNING level or add refresh queue mechanism.

6. Incomplete Query Validation

Location: MaterializedViewBuilder.java:196-208

private List<String> extractSourceTypes(final String sql) {
  // Only extracts the FIRST FROM item
  if (item != null && item.getIdentifier() != null)
    types.add(item.getIdentifier().getStringValue());

Issues:

  • Doesn't handle JOINs, subqueries, or multiple FROM items
  • No validation that SELECT is read-only (could contain mutations)
  • Empty list returned for complex queries (limits incremental refresh)

Fix: Add validation to reject non-SELECT or statements with side effects.


🔒 SECURITY CONCERNS

7. SQL Injection Surface (Minor)

Location: MaterializedViewRefresher.java:43

database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");

Current protection: Backtick validation at creation prevents injection (MaterializedViewBuilder.java:88-89).

Issue: String concatenation for SQL is risky. Defense-in-depth suggests using parameterized commands.

8. No Query Re-validation on Load

Issue: Defining query stored as string and executed without re-validation when loading from schema.json.

Risk: If schema.json modified externally:

  • Malicious query could read unauthorized data
  • Expensive operations could DOS the system
  • No validation of query safety

Fix: Re-validate queries on load from schema.json, not just at creation.

9. Missing Resource Limits

No protection against:

  • Maximum query execution time for refresh
  • Maximum result set size
  • Memory limits for materialized view data
  • Rate limiting on manual refresh calls

🧪 MISSING TEST COVERAGE

Excellent coverage overall, but missing:

Critical Scenarios:

  1. Concurrency:

    • Multiple threads refreshing same view
    • Concurrent ALTER and REFRESH
    • Refresh during source type modification
  2. Source Type Lifecycle:

    • Must add: Drop source type while view exists
    • Modify source type schema (add/remove properties)
    • Rename source type
  3. Error Scenarios:

    • Refresh failure mid-way (partial data)
    • Out of memory during refresh
    • Invalid query after schema change
  4. HA/Distributed:

    • Refresh during leader election
    • Replica reading stale view data
    • Network partition during refresh
  5. Periodic Refresh:

    • Scheduler behavior under load
    • Multiple overlapping periodic refreshes
    • Scheduler cleanup on database close

💡 NICE-TO-HAVE IMPROVEMENTS

Performance Optimizations (Future PRs):

  • Implement incremental refresh for simple queries (using _sourceRID tracking as documented)
  • Add refresh throttling/batching mechanism
  • Support differential refresh strategies (merge vs rebuild)
  • Add refresh progress tracking for large datasets

Architectural Enhancements:

  • Support views referencing other views (dependency graph)
  • Materialized view partitioning
  • Parallel refresh for large datasets

📋 SUMMARY RECOMMENDATIONS

Must Fix Before Merge:

  1. ✅ Add source type deletion protection (add check in LocalSchema.dropType())
  2. ✅ Make refreshInProgress field private in MaterializedViewImpl.java:42
  3. ✅ Add test for source type deletion scenario
  4. ✅ Document eventual consistency model and full-refresh limitation in user docs

Should Fix (High Priority):

  1. Improve scheduler resource management (thread pool + timeout)
  2. Log WARNING when concurrent refresh skipped (not FINE)
  3. Add query re-validation on schema.json load
  4. Add basic tests for concurrent operations
  5. Validate query is read-only (reject mutations)

Future Enhancements:

  1. Implement incremental refresh for simple queries
  2. Add refresh throttling/batching
  3. Add query resource limits
  4. Comprehensive HA/concurrency testing

🎯 OVERALL ASSESSMENT

Rating: ⭐⭐⭐⭐½ (9/10)

This is a high-quality, well-architected implementation that demonstrates excellent understanding of ArcadeDB internals. The code is clean, follows all project conventions, and has solid test coverage.

Main concerns:

  • Missing source type deletion protection (critical bug - easy fix)
  • Full refresh performance for high-volume workloads (design limitation - document it)
  • Some security hardening needed (medium priority)

Recommendation: Approve with required changes - Fix the source type deletion issue (#1-4 above), then this is ready to merge. Issues #5-9 can be addressed in follow-up PRs if desired, but aren't blocking given this is a new feature in draft status.

Excellent work overall! 🚀

@claude
Copy link

claude bot commented Feb 19, 2026

Code Review: Materialized Views Implementation (PR #3464)

This is an impressive implementation of materialized views for ArcadeDB. The feature is well-architected with comprehensive test coverage and good attention to security/safety concerns. Below are my findings organized by category.


Strengths

Architecture & Design

  • Clean separation of concerns: Materialized views are treated as first-class schema objects with dedicated interfaces and implementations
  • Post-commit callback mechanism: The new TransactionContext.addAfterCommitCallback() is a reusable pattern that prevents duplicate refreshes within a single transaction
  • Crash recovery: BUILDING status is persisted before initial refresh, enabling recovery on restart
  • HA replication support: Uses recordFileChanges() to ensure atomic replication of view creation/drop across replicas
  • Backing type safety: Prevents direct deletion of backing types - must use DROP MATERIALIZED VIEW

Security

  • SQL injection prevention: Good use of parameterized queries in RemoteSchema (e.g., :name parameters)
  • Backtick validation: Validates view names don't contain backticks in both create() and fromJSON() to prevent injection via tampered schema.json
  • Concurrent refresh protection: AtomicBoolean refreshInProgress prevents race conditions during TRUNCATE operations

Code Quality

  • Comprehensive tests:
    • Unit tests (MaterializedViewTest, MaterializedViewSQLTest)
    • Integration tests (MaterializedViewEdgeCaseTest, MaterializedViewIncrementalTest)
    • E2E tests (RemoteMaterializedViewIT, HTTPMaterializedViewIT, ReplicationMaterializedViewIT)
    • Load tests updated with MV scenarios
  • Thread safety: Proper use of volatile for lastRefreshTime, status, and changeListener
  • Resource cleanup: unregisterListeners() properly removes event listeners on view drop
  • Error handling: Callbacks logged but don't affect commit; refresh failures clean up orphaned types

⚠️ Issues & Recommendations

1. Thread Safety Concerns in LocalSchema

Issue: The materializedViews map is a LinkedHashMap without synchronization, but is accessed from multiple threads:

  • Line 95: protected final Map<String, MaterializedViewImpl> materializedViews = new LinkedHashMap<>()
  • Multiple accessor methods read/write this map without synchronization

Location: engine/src/main/java/com/arcadedb/schema/LocalSchema.java:95

Risk: Race conditions during concurrent view creation, queries, or schema reloads

Recommendation:

// Option 1: Use ConcurrentHashMap
protected final Map<String, MaterializedViewImpl> materializedViews = new ConcurrentHashMap<>();

// Option 2: If ordering matters for LinkedHashMap, synchronize all access methods
public synchronized MaterializedView getMaterializedView(String name) { ... }
public synchronized boolean existsMaterializedView(String name) { ... }
// etc.

2. Potential Memory Leak in TransactionContext

Issue: The registeredCallbackKeys HashSet is never cleared in reset(), only the afterCommitCallbacks list is nulled

Location: engine/src/main/java/com/arcadedb/database/TransactionContext.java:779-785

Current Code:

public void reset() {
  // ... other cleanup ...
  afterCommitCallbacks = null;
  txId = -1;
}

Missing:

registeredCallbackKeys = null;  // This line is missing!

Risk: Over time, the registeredCallbackKeys set will accumulate keys from old transactions, causing memory growth

Fix: Add registeredCallbackKeys = null; to the reset() method at line ~785


3. SQL Parser Error Handling

Issue: In MaterializedViewBuilder.extractSourceTypes(), if the query is not a valid SelectStatement, the method silently returns an empty list instead of throwing an error

Location: engine/src/main/java/com/arcadedb/schema/MaterializedViewBuilder.java:196-208

Current Behavior:

private List<String> extractSourceTypes(final String sql) {
  final List<String> types = new ArrayList<>();
  final Statement parsed = database.getStatementCache().get(sql);
  if (parsed instanceof SelectStatement select) {
    // ... extract types ...
  }
  return types;  // Returns empty list for non-SELECT statements
}

Risk: Creating a view with "DELETE FROM foo" or other non-SELECT statements would fail later with confusing errors

Recommendation:

if (!(parsed instanceof SelectStatement)) {
  throw new IllegalArgumentException("Materialized view query must be a SELECT statement, got: " + parsed.getClass().getSimpleName());
}

4. Inconsistent Error Messages

Issue: Schema exceptions use different formats for missing resources:

// In LocalSchema.getMaterializedView():
throw new SchemaException("Materialized view '" + viewName + "' not found");

// In LocalSchema.getType() (existing code):
throw new SchemaException("Type '" + typeName + "' does not exist");

Recommendation: Use consistent wording ("does not exist" vs "not found") across all schema exceptions for better UX


5. Documentation: Missing @param and @return

Issue: Several public methods lack JavaDoc parameter/return documentation:

Examples:

  • MaterializedViewBuilder.withName(String name) - missing @param
  • MaterializedViewBuilder.create() - missing @return
  • MaterializedViewImpl.tryBeginRefresh() - missing @return documentation
  • LocalSchema.getMaterializedView(String) - missing @param/@return/@throws

Recommendation: Add complete JavaDoc for all public API methods, especially the builder pattern methods


6. Potential Race in Periodic Refresh

Issue: In MaterializedViewScheduler, the ScheduledExecutorService is created lazily in a synchronized method, but the scheduler itself could be shut down while tasks are being scheduled

Location: engine/src/main/java/com/arcadedb/schema/MaterializedViewScheduler.java

Risk: If database is closing while periodic refresh is scheduled, could throw RejectedExecutionException

Recommendation: Add null/shutdown checks before scheduling:

public void schedule(final Database database, final MaterializedViewImpl view) {
  final ScheduledExecutorService exec = getScheduler();
  if (exec.isShutdown()) {
    LogManager.instance().log(..., "Scheduler is shut down, skipping periodic refresh");
    return;
  }
  // ... schedule task ...
}

7. Test Coverage Gap: Concurrent Access

Observation: While the implementation has good test coverage for functional scenarios, there are no concurrent access tests for:

  • Multiple threads creating views simultaneously
  • Concurrent refresh while querying
  • Concurrent ALTER operations
  • Schema reload during active refresh

Recommendation: Add a MaterializedViewConcurrencyIT test class with scenarios like:

@Test
void concurrentViewCreationFromMultipleThreads() { ... }

@Test
void queryDuringFullRefresh() { ... }

@Test
void alterDuringPeriodicRefresh() { ... }

8. Minor: Hardcoded TRUNCATE Command

Issue: The TRUNCATE TYPE command is built with string concatenation in MaterializedViewRefresher.fullRefresh()

Location: engine/src/main/java/com/arcadedb/schema/MaterializedViewRefresher.java:43

database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");

While backtick-quoted names are safe here, consider using a constant or extracting to a method for better maintainability:

private static final String TRUNCATE_TEMPLATE = "TRUNCATE TYPE `%s` UNSAFE";
database.command("sql", String.format(TRUNCATE_TEMPLATE, backingTypeName));

📊 Performance Considerations

Positive

  • ✅ Uses TRUNCATE TYPE UNSAFE instead of DELETE FROM for full refresh (no per-record WAL entries)
  • ✅ Post-commit callback deduplication prevents N refreshes for N inserts in one transaction
  • ✅ Concurrent refresh skip prevents multiple simultaneous full refreshes

Potential Optimizations

  • Incremental refresh: Currently always does full refresh even for "simple" queries. The TODO comments mention per-record incremental updates via _sourceRID tracking - this would be a valuable future optimization
  • Batch insert during refresh: The refresh loop saves documents one at a time. Consider batching saves for large result sets:
    // Current: doc.save() in loop
    // Future: batch.add(doc); if (batch.size() >= 1000) flush()

🔒 Security Assessment

✅ Secure Practices

  • Parameterized queries in RemoteSchema prevent SQL injection
  • Backtick validation blocks injection via view names
  • Schema changes require appropriate permissions (inherits from database security model)

⚠️ Minor Concern

The RemoteSchema.existsTrigger() method at line 99-100 still uses string concatenation instead of parameterized query:

"select from schema:triggers where name = '" + triggerName + "'"

This is pre-existing code (not part of this PR), but should be fixed for consistency:

"select from schema:triggers where name = :name", Map.of("name", triggerName)

📝 Code Style & CLAUDE.md Compliance

✅ Follows Project Guidelines

  • Uses AssertJ's assertThat(...).isTrue() syntax in tests ✓
  • No System.out.println in final code ✓
  • Proper Apache 2.0 license headers ✓
  • No Claude listed as author ✓
  • Reuses existing components (JSONObject, JSONArray) ✓
  • Imports classes instead of fully qualified names ✓

✅ Testing Approach

  • Tests written and passing ✓
  • Regression tests included ✓
  • Integration tests for remote/HTTP/HA scenarios ✓

Minor Style Notes

  • Some if statements have single-statement blocks without braces (line 102-105 in MaterializedViewBuilder), which is acceptable per project style
  • Good use of final keyword on variables and parameters throughout

🎯 Recommendations Summary

Critical (Should fix before merge)

  1. Fix memory leak: Add registeredCallbackKeys = null to TransactionContext.reset()
  2. Add validation in extractSourceTypes() to reject non-SELECT statements

High Priority (Should fix)

  1. Ensure thread-safe access to materializedViews map in LocalSchema
  2. Add null/shutdown checks in MaterializedViewScheduler
  3. Complete JavaDoc for public APIs

Medium Priority (Nice to have)

  1. Add concurrent access integration tests
  2. Standardize error message wording across schema exceptions
  3. Extract hardcoded SQL commands to constants

Low Priority (Future work)

  1. Implement per-record incremental refresh for simple queries (noted in TODOs)
  2. Consider batch inserts during large refreshes

🏆 Overall Assessment

This is high-quality work with excellent architectural design and comprehensive testing. The feature is well-integrated with ArcadeDB's existing patterns (schema persistence, HA replication, transaction model) and includes thoughtful safety mechanisms.

The main concerns are:

  1. The memory leak in TransactionContext.reset() (critical)
  2. Thread safety of the materializedViews map (high priority)
  3. Missing validation for non-SELECT queries (critical)

Once these issues are addressed, this PR will be in excellent shape for merge. The implementation demonstrates strong understanding of both the feature requirements and ArcadeDB's internal architecture.

Suggested Action: Request changes for the critical items, then approve after fixes.


Total Lines Changed: ~19,666 additions (impressive scope!)
Test Coverage: Excellent - unit, integration, e2e, and load tests
Documentation: Good planning docs, could use more inline JavaDoc
Security: Strong, with SQL injection prevention and validation
Performance: Well-optimized with TRUNCATE and callback deduplication

Great work overall! 🎉

@claude
Copy link

claude bot commented Feb 19, 2026

Pull Request Review: Materialized Views Implementation

Overall Assessment

This is an excellent and comprehensive implementation of materialized views for ArcadeDB. The feature is well-designed, thoroughly tested, and follows the project's coding standards. The implementation includes:

  • Core materialized view infrastructure with multiple refresh modes (MANUAL, INCREMENTAL, PERIODIC)
  • SQL syntax support via ANTLR grammar extensions
  • Transaction integration with post-commit callbacks
  • Full test coverage including unit, integration, HA replication, and e2e tests
  • Comprehensive documentation in design and implementation plan docs

The code has been through multiple rounds of review and refinement, addressing SQL injection, concurrency, and HA concerns.


Code Quality: ✅ Excellent

Strengths

  1. Clean Architecture: Well-separated concerns with dedicated classes for builder, refresher, change listener, scheduler, and query classifier

  2. Thread Safety: Proper use of AtomicBoolean for refresh guards, volatile fields for shared state, and synchronized access to shared collections

  3. Transaction Integration: Elegant post-commit callback mechanism prevents listener leaks and ensures callbacks only fire on successful commits

  4. Error Handling: Comprehensive error handling with proper logging and status transitions (VALID → BUILDING → ERROR/STALE)

  5. Code Style Adherence: Follows project conventions (minimal braces, final keyword, no fully qualified names, assertThat() syntax)

Minor Observations

  1. Documentation Comments: Some public methods lack JavaDoc (e.g., MaterializedViewRefresher.fullRefresh)

  2. Magic Strings: The callback key construction "mv-refresh:" + view.getName() could be a constant


Potential Bugs: ✅ None Found

Key concerns have been addressed:

  • ✅ SQL injection prevented via backtick validation
  • ✅ Concurrent refresh properly guarded
  • ✅ Post-commit callbacks only fire on successful transactions
  • ✅ Listener cleanup on view drop
  • ✅ HA replication atomicity
  • ✅ Source type drop protection

Performance Considerations: ⚠️ Minor Concerns

Strengths

  1. TRUNCATE TYPE optimization: Avoids per-record WAL entries during full refresh
  2. Deduplication via CallbackKeys: Prevents redundant refreshes in a single transaction
  3. Lazy Scheduler Creation: Good pattern

Concerns

  1. Full Refresh on Every Change: Even for "simple queries", INCREMENTAL mode performs full refresh on every source record change. Could be expensive for large tables with frequent updates. The code acknowledges this as a "future optimization". Consider documenting performance characteristics and recommending PERIODIC mode for high-write-volume scenarios.

  2. Schema Lock Duration: Materialized view creation holds schema lock during initial population. For very large source tables, this could block other schema operations. Consider async population.


Security Concerns: ✅ Excellent

  • SQL injection prevention via backtick validation and AST-based parsing
  • Uses existing ArcadeDB security framework
  • Concurrent refresh guard prevents DoS

Test Coverage: ✅ Excellent

Comprehensive test coverage across multiple layers:

  • Unit Tests: MaterializedViewTest, EdgeCaseTest, IncrementalTest, SQLTest, TransactionCallbackTest
  • Integration Tests: HTTP, Remote, Replication tests
  • E2E Tests: RemoteDatabaseJavaApiTest
  • Load Tests: Added to load test suite

Coverage includes happy paths, error conditions, all refresh modes, concurrency, HA, remote/HTTP access, and persistence.


Specific File Reviews

TransactionContext.java ✅

Strengths: Clean callback mechanism, proper cleanup, exception isolation

Minor Issue: Line 136 - Callbacks fire even for empty transactions (phase1 == null). Low severity but could be cleaned up.

MaterializedViewImpl.java ✅

Strong implementation with immutable design, thread-safe refresh guard, ALTER support, and backtick validation.

MaterializedViewBuilder.java ✅

Excellent builder pattern with fluent API, comprehensive validation, atomic creation, and proper failure cleanup.

MaterializedViewRefresher.java ✅

Clean and focused with concurrent refresh guard, TRUNCATE optimization, and proper error handling.

MaterializedViewChangeListener.java ✅

Well-designed with deduplication, early returns, and graceful error handling.


Recommendations

Medium Priority

  1. Document Performance Characteristics: Add user-facing docs about INCREMENTAL mode's full-refresh behavior
  2. Verify Synchronization: Double-check all accesses to LocalSchema.materializedViews map are properly synchronized
  3. Add Metrics: Consider metrics for refresh duration, failure counts, health monitoring

Low Priority

  1. JavaDoc Coverage: Add JavaDoc to public API methods
  2. Async Initial Population: For very large tables
  3. Fix Empty Transaction Callback: Address TransactionContext.java:136

Conclusion

Recommendation: ✅ APPROVE with minor suggestions for follow-up improvements.

This is high-quality work demonstrating strong understanding of database internals, attention to security/concurrency, comprehensive testing, and iterative refinement. The materialized views feature is well-architected and production-ready.

Great job @robfrank! This will be a valuable addition to ArcadeDB.

robfrank and others added 22 commits February 20, 2026 10:05
Add afterCommitCallbacks mechanism to TransactionContext that allows
registering Runnable callbacks to fire after a successful commit.
Callbacks are discarded on rollback and errors in callbacks are logged
but do not affect the commit or other callbacks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…face

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tors)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace ThreadLocal-based deduplication with addAfterCommitCallbackIfAbsent
  on TransactionContext to prevent leak on rollback
- Add unregisterListeners() to clean up INCREMENTAL listeners on view drop
- Wrap create/drop in recordFileChanges() for HA replication atomicity
- Clear materializedViews map before reload in readConfiguration()
- Make lastRefreshTime and status volatile for thread safety
- Synchronize getMaterializedViewScheduler() for thread safety
- Use CommandExecutionException for unsupported ALTER MATERIALIZED VIEW

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FetchFromSchemaMaterializedViewsStep for SELECT FROM schema:materializedViews
- Register materializedviews target in SelectExecutionPlanner
- Add RemoteMaterializedView read-only proxy for remote clients
- Implement RemoteSchema MV methods (exists, get, getAll, drop)
- buildMaterializedView() throws UnsupportedOperationException remotely
- getMaterializedView() throws SchemaException when not found (consistent with LocalSchema)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MaterializedViewEdgeCaseTest (9 tests): multiple views, duplicates,
  empty results, large datasets, re-creation, backing type protection
- Add HTTPMaterializedViewIT (7 tests): HTTP API CRUD, refresh modes,
  schema metadata endpoint, idempotent operations
- Add RemoteMaterializedViewIT (2 tests): remote client SQL CRUD,
  schema proxy methods, unsupported operation handling
- Add ReplicationMaterializedViewIT (2 tests): HA create/drop replication,
  schema file synchronization across replicas
- Add incremental tests: listener unregistration on drop, refresh after rollback
- Add SQL test: schema:materializedViews metadata query

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SQL injection in RemoteSchema by using backtick quoting
- Rewrite extractSourceTypes to use SQL parser AST instead of fragile indexOf
- Rewrite MaterializedViewQueryClassifier to use AST inspection instead of string contains

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ALTER MATERIALIZED VIEW support to change refresh mode (MANUAL,
INCREMENTAL, PERIODIC) on existing views. Uses immutable view replacement
pattern via copyWithRefreshMode() to safely swap instances while tearing
down old and setting up new refresh infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix post-commit callbacks firing on failed transactions (TransactionContext)
- Fix || to && in CreateMaterializedViewStatement.toString() for PERIODIC mode
- Add backtick escaping for backing type name in DELETE query
- Clean up orphaned backing type on initial refresh failure in builder
- Introduce MaterializedViewStatus enum replacing string status literals
- Use volatile instead of transient for changeListener field
- Store DatabaseInternal directly in MaterializedViewChangeListener
- Fix backtick-to-single-quote quoting in RemoteSchema WHERE clauses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SQL injection in RemoteSchema by using parameterized queries
- Fix FetchFromSchemaMaterializedViewsStep.close() clearing shared result list
- Replace DELETE FROM with TRUNCATE TYPE UNSAFE in MaterializedViewRefresher
- Fix AlterMaterializedViewStatement.toString() for PERIODIC mode
- Add synchronized to all materializedViews access methods in LocalSchema
- Add e2e test for materialized view creation and query with brewery denormalization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Persist BUILDING status before initial refresh so crash recovery works
- Validate view names do not contain backtick characters
- Synchronize dropType() iteration over materializedViews map
- Guard RemoteMaterializedView.sourceTypeNames against null from server
- Remove unnecessary instanceof cast in FetchFromSchemaMaterializedViewsStep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Validate view name for backtick in fromJSON() to prevent SQL injection
  from a tampered schema.json bypassing the create() guard
- Guard fullRefresh() against concurrent execution with AtomicBoolean;
  a second concurrent refresh is skipped to avoid TRUNCATE + race

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Block dropping a type that is a source for a materialized view
  (LocalSchema.dropType now checks sourceTypeNames in addition to the
  existing backing-type guard)
- Make MaterializedViewImpl.refreshInProgress private; expose it via
  tryBeginRefresh() / endRefresh() methods consumed by the refresher
- Align planning docs: replace ON_COMMIT with INCREMENTAL throughout
  2026-02-06-materialized-views-impl.md; update RemoteSchema test
  snippet in 2026-02-15-materialized-views-testing.md to assert
  UnsupportedOperationException instead of null
- Add tests covering coverage gaps reported by Codecov:
  copyWithRefreshMode, tryBeginRefresh/endRefresh, setLastRefreshTime,
  getRefreshInterval, getChangeListener/setChangeListener, getBackingType,
  concurrent-refresh skip, builder withTotalBuckets/withPageSize,
  PERIODIC scheduler refresh, all QueryClassifier branches (GROUP BY,
  aggregates, non-SELECT), onAfterUpdate and onAfterDelete incremental
  triggers, and the no-active-transaction early-return guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 20, 2026

Code Review: Materialized Views Implementation

Thank you for this comprehensive implementation of materialized views! This is a substantial feature addition with good architecture and testing. Below is my detailed review organized by priority.


Critical Issues - Must Fix Before Merge

1. Missing Authorization Checks
Location: LocalSchema.java - materialized view DDL methods
Issue: No explicit permission checks before creating/dropping/altering views
Fix: Add database.checkPermissionsOnDatabase(SecurityDatabaseUser.DATABASE_ACCESS.UPDATE_SCHEMA) calls similar to other DDL operations in LocalSchema

2. Potential NPE in FetchFromSchemaMaterializedViewsStep
Location: engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaMaterializedViewsStep.java:59
Issue: view.getBackingType().getName() called without null checking getBackingType()
Fix: Add null check to handle race condition where backing type might be dropped


Code Style Issues - Per CLAUDE.md Guidelines

3. Missing final Keywords
Locations: Multiple files
Issue: CLAUDE.md requires using final keyword when possible on variables and parameters
Examples: MaterializedViewBuilder.java:38-44, CreateMaterializedViewStatement.java:29-35

4. Unnecessary Braces on Single-Statement If Blocks
Locations: Multiple files
Issue: CLAUDE.md states if statements with only one child sub-statement don't require curly braces
Examples: MaterializedViewQueryClassifier.java:44-45,53-54,61-62,64-65,67-68


Security Concerns

5. SQL Injection Defense in Depth
Location: MaterializedViewRefresher.java:43
Issue: While backtick validation exists in builder, relying solely on input validation for SQL injection prevention is not best practice
Recommendation: Consider using parameterized commands or add validation at all entry points


Performance Considerations

6. Full Refresh on Every Incremental Change
Location: MaterializedViewChangeListener.java:64-76
Issue: Every insert/update/delete on source type triggers full view rebuild
Concern: For large views with high-velocity source data, this could cause significant performance degradation
Recommendation: Document this limitation clearly in user-facing documentation

7. Synchronization Overhead on Read Operations
Location: LocalSchema.java
Issue: Read methods block each other unnecessarily
Recommendation: Consider using ConcurrentHashMap for materializedViews map

8. Single-Threaded Scheduler Bottleneck
Location: MaterializedViewScheduler.java
Issue: Uses single-threaded executor which could be a bottleneck with many periodic views
Recommendation: Consider using a thread pool with configurable size


Test Coverage Gaps

While existing test coverage is good, consider adding:

  • Concurrent access tests (multiple threads refreshing same view)
  • Security/authorization tests
  • Error recovery tests (database crash during BUILDING status)
  • Performance tests (large dataset refreshes, memory usage)

Excellent Practices Found

Great work on these aspects:

  • Proper use of AtomicBoolean for refresh lock
  • Volatile fields for thread-safe status visibility
  • Defensive copying with List.copyOf()
  • Proper cleanup in LocalSchema.close()
  • Protection against dropping types used by materialized views
  • Schema persistence via recordFileChanges()
  • Comprehensive inline documentation
  • No System.out statements left in code
  • Test assertions use correct assertThat().isTrue() syntax
  • No new external dependencies
  • Good reuse of existing components

Summary

Overall Assessment: Solid implementation with good architecture and proper thread safety considerations.

Priority:

  1. Must fix: Issues 1-2 (authorization, NPE)
  2. Should fix: Issues 3-4 (code style for consistency)
  3. Consider: Issues 5-8 (security hardening, performance optimizations)

The implementation demonstrates excellent understanding of the codebase architecture. With the critical fixes applied, this will be a valuable feature addition.

@codacy-production
Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-10.13% 86.08%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (7c92d6f) 87643 63444 72.39%
Head commit (1390ad1) 118081 (+30438) 73520 (+10076) 62.26% (-10.13%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3464) 553 476 86.08%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@robfrank robfrank marked this pull request as ready for review February 20, 2026 10:47
@robfrank robfrank merged commit d14e063 into main Feb 20, 2026
23 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Materialized Views support

1 participant