Conversation
Summary of ChangesHello @robfrank, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers materialized views to ArcadeDB, a significant feature enabling efficient caching of query results. The implementation provides flexible refresh strategies—manual, incremental (event-driven post-commit), and periodic—and integrates deeply with the database's schema and SQL language. A key architectural enhancement is the new transaction post-commit callback system, which ensures data consistency during incremental updates and offers a reusable pattern for other transactional features. The changes span core engine components, SQL parsing, and schema management, all backed by a robust testing suite to guarantee reliability. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-designed feature: materialized views. The implementation is comprehensive, covering core engine changes, SQL syntax, transaction handling, and testing across different layers (unit, integration, HA). The design documents are thorough, though there are some minor inconsistencies between them and the final implementation. I've pointed out a few areas for improvement, mainly around making some logic more robust by using the SQL parser instead of string manipulation, and aligning documentation with the code. Overall, this is an excellent contribution.
| private List<String> extractSourceTypes(final String sql) { | ||
| // Parse the SQL and extract type names from the FROM clause | ||
| final List<String> types = new ArrayList<>(); | ||
| final String upper = sql.toUpperCase(); | ||
| int fromIdx = upper.indexOf("FROM "); | ||
| while (fromIdx >= 0) { | ||
| final int start = fromIdx + 5; | ||
| int i = start; | ||
| while (i < sql.length() && sql.charAt(i) == ' ') | ||
| i++; | ||
| final int nameStart = i; | ||
| while (i < sql.length() && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) | ||
| i++; | ||
| if (i > nameStart) { | ||
| final String typeName = sql.substring(nameStart, i); | ||
| if (!typeName.isEmpty() && !types.contains(typeName)) | ||
| types.add(typeName); | ||
| } | ||
| fromIdx = upper.indexOf("FROM ", i); | ||
| } | ||
| return types; | ||
| } |
There was a problem hiding this comment.
The extractSourceTypes method uses a simple string-matching approach to find the source type from the FROM clause. This can be brittle and may fail with more complex queries, such as those with quoted identifiers or JOINs. The design documents suggest using the SQL parser for this, which would be a more robust solution.
import com.arcadedb.query.sql.SQLQueryEngine;
import com.arcadedb.query.sql.parser.FromClause;
import com.arcadedb.query.sql.parser.FromItem;
import com.arcadedb.query.sql.parser.SelectStatement;
import com.arcadedb.query.sql.parser.Statement;
...
private List<String> extractSourceTypes(final String sql) {
final List<String> types = new ArrayList<>();
try {
final Statement parsed = ((SQLQueryEngine) database.getQueryEngine("sql")).parse(sql, database);
if (parsed instanceof SelectStatement select) {
final FromClause from = select.getTarget();
if (from != null) {
final FromItem item = from.getItem();
if (item != null && item.getIdentifier() != null) {
types.add(item.getIdentifier().getStringValue());
}
// TODO: Add support for more complex FROM clauses like JOINs if materialized views can be defined on them.
}
}
} catch (final Exception e) {
// Parser failed, could log a warning. For now, this will result in an empty list of source types.
}
return types;
}There was a problem hiding this comment.
The current implementation at L191-203 already uses the SQL parser — it calls database.getStatementCache().get(sql) which returns a fully parsed Statement and then walks the AST (SelectStatement, FromClause, FromItem, identifier). There is no string matching happening here. The suggested replacement is functionally equivalent but bypasses the statement cache, which is a mild regression. No change needed.
There was a problem hiding this comment.
The implementation already uses the SQL parser — database.getStatementCache().get(sql) parses and caches the AST. The subsequent item.getIdentifier().getStringValue() call reads from the parsed AST node, not from string matching. Using the cache is preferred over calling parse() directly since it avoids re-parsing queries that are seen multiple times.
| public static boolean isSimple(final String sql, final Database database) { | ||
| final String upper = sql.toUpperCase().trim(); | ||
|
|
||
| if (upper.contains("GROUP BY")) | ||
| return false; | ||
| if (upper.contains("SUM(") || upper.contains("COUNT(") || upper.contains("AVG(") || | ||
| upper.contains("MIN(") || upper.contains("MAX(")) | ||
| return false; | ||
| if (upper.contains("TRAVERSE")) | ||
| return false; | ||
| if (upper.contains(" JOIN ")) | ||
| return false; | ||
|
|
||
| // Check for subqueries | ||
| final int firstSelect = upper.indexOf("SELECT"); | ||
| if (firstSelect >= 0) { | ||
| final int secondSelect = upper.indexOf("SELECT", firstSelect + 6); | ||
| if (secondSelect >= 0) | ||
| return false; | ||
| } | ||
|
|
||
| // Count FROM clauses | ||
| int fromCount = 0; | ||
| int searchFrom = 0; | ||
| while ((searchFrom = upper.indexOf("FROM ", searchFrom)) >= 0) { | ||
| fromCount++; | ||
| searchFrom += 5; | ||
| } | ||
| if (fromCount > 1) | ||
| return false; | ||
|
|
||
| return true; | ||
| } | ||
| } |
There was a problem hiding this comment.
The isSimple method uses string contains checks to classify a query. This approach is fragile and can be easily fooled by comments, string literals, or aliased column names that happen to match keywords (e.g., SELECT my_sum AS total FROM...). Using the SQL parser to inspect the AST of the query would provide a much more reliable classification.
import com.arcadedb.query.sql.SQLQueryEngine;
import com.arcadedb.query.sql.parser.SelectStatement;
import com.arcadedb.query.sql.parser.Statement;
...
public static boolean isSimple(final String sql, final Database database) {
try {
final Statement parsed = ((SQLQueryEngine) database.getQueryEngine("sql")).parse(sql, database);
if (!(parsed instanceof SelectStatement select)) {
return false;
}
if (select.getGroupBy() != null) {
return false;
}
if (select.getProjection() != null && select.getProjection().isAggregate()) {
return false;
}
if (select.getTarget() != null && select.getTarget().getItem() != null && select.getTarget().getItem().getJoin() != null) {
return false;
}
// NOTE: This does not check for subqueries or TRAVERSE yet, but is more robust than string matching.
return true;
} catch (final Exception e) {
// If parsing fails, it's safer to treat it as a complex query.
return false;
}
}There was a problem hiding this comment.
The current version of isSimple already uses the SQL parser via database.getStatementCache().get(sql) and inspects the AST — no string contains checks are present. It handles GROUP BY, aggregate functions per projection item, subqueries in FROM, function calls in FROM, and absence of a type identifier. Also note: the suggested replacement calls FromItem.getJoin() which does not exist in this codebase, so that code would not compile.
There was a problem hiding this comment.
isSimple already inspects the parsed AST — select.getGroupBy(), item.isAggregate(ctx), item.getStatement(), item.getFunctionCall(), and item.getIdentifier() are all AST node accessors, not string checks. The suggested replacement is actually less complete: it omits the per-item aggregate check (item.isAggregate(ctx)) that the current implementation has, which would cause aggregate functions like sum() or count() to be misclassified as simple.
| public enum MaterializedViewRefreshMode { | ||
| MANUAL, | ||
| ON_COMMIT, | ||
| PERIODIC | ||
| } |
There was a problem hiding this comment.
This implementation plan consistently uses ON_COMMIT for the incremental refresh mode. However, the final implementation in MaterializedViewRefreshMode.java and other related files uses INCREMENTAL. To avoid confusion, this document should be updated to use INCREMENTAL to match the actual code and the main design document.
There was a problem hiding this comment.
Valid — this plan doc was written before the naming was finalised. The term ON_COMMIT in it is stale; the implementation landed as INCREMENTAL. The plan docs are kept for historical context rather than as live documentation, so no code change is needed, but acknowledged.
There was a problem hiding this comment.
Fixed — replaced all ON_COMMIT references with INCREMENTAL in this plan doc to match the final implementation.
| assertThat(database.getSchema().buildMaterializedView()).isNull(); | ||
| }); |
There was a problem hiding this comment.
The test plan suggests that database.getSchema().buildMaterializedView() should return null for a remote database. However, the implementation in RemoteSchema.java throws an UnsupportedOperationException, which is a better practice for unsupported methods. The test plan should be updated to reflect this, asserting that an exception is thrown.
| assertThat(database.getSchema().buildMaterializedView()).isNull(); | |
| }); | |
| // buildMaterializedView is not supported remotely — use SQL instead | |
| assertThatThrownBy(() -> database.getSchema().buildMaterializedView()) | |
| .isInstanceOf(UnsupportedOperationException.class); |
There was a problem hiding this comment.
Agreed — the plan doc is stale here. The actual test in RemoteMaterializedViewIT.java already asserts assertThatThrownBy(() -> database.getSchema().buildMaterializedView()).isInstanceOf(UnsupportedOperationException.class), which matches the implementation. No code change needed.
There was a problem hiding this comment.
Fixed — updated the test plan snippet to assert UnsupportedOperationException instead of isNull(), matching the actual RemoteSchema implementation.
Code Review — Add Materialized Views (#3464)This is a well-scoped and architecturally coherent first implementation of materialized views. The design document, phased plan, and good test coverage make it easy to follow the intent. Below is detailed feedback organized by severity. Critical / Bugs1. SQL injection in The view name is injected directly into a SQL string without escaping: "SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'"An attacker (or just an unusual view name containing a remoteDatabase.command("sql", "drop trigger `" + triggerName + "`");Apply the same backtick quoting here, or use query parameters. 2. The buckets count is parsed as: if (bodyCtx.BUCKETS() \!= null)
stmt.buckets = Integer.parseInt(bodyCtx.INTEGER_LITERAL().getText());
3. view.setStatus("BUILDING"); // volatile write
database.transaction(() -> { ... }); // may take a long time
view.updateLastRefreshTime(); // volatile write — only on success
view.setStatus("VALID"); // volatile write — only on successA concurrent periodic refresh can start a second refresh while the first is still running ( 4. Incremental refresh does a full re-scan — misleading naming
Design / Architecture Concerns5. Source type extraction via
6. Backing type name equals view name — no namespace separation The materialized view's backing 7.
8. Listener deregistration happens outside Cancelling the scheduler and unregistering listeners happen before Code Quality9. Grammar rule + statement class + visitor are all present, but 10.
11.
12. Returning 13. Four large planning Markdown files (~5000 lines total) are under Minor
SummaryThe architecture is sound and the v1 feature set is well-chosen. Most important items to resolve before leaving DRAFT:
The remaining items are suggestions for follow-up iterations. |
🧪 CI InsightsHere's what we observed from your CI run for 1390ad1. 🟢 All jobs passed!But CI Insights is watching 👀 |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3464 +/- ##
==========================================
+ Coverage 63.43% 63.87% +0.44%
==========================================
Files 1321 1331 +10
Lines 87643 88194 +551
Branches 17970 18063 +93
==========================================
+ Hits 55594 56332 +738
+ Misses 24226 23929 -297
- Partials 7823 7933 +110 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code Review: PR #3464 — Add Materialized ViewsThis is a substantial feature (~9000 lines). The overall architecture is sound and follows existing ArcadeDB patterns well. Below is my detailed review. What Works Well
IssuesBugsB1 — SQL injection risk in database.command("sql", "DELETE FROM " + backingTypeName);Should use backtick escaping to be safe: database.command("sql", "DELETE FROM `" + backingTypeName + "`");The same pattern should be applied to the defining query re-execution. B2 — Full refresh uses B3 —
B4 — remoteDatabase.command("sql", "SELECT FROM schema:materializedViews WHERE name = ...");
B5 — if ("PERIODIC".equalsIgnoreCase(refreshMode) || refreshInterval > 0)
sb.append(" EVERY ").append(refreshInterval)...Should be B6 —
B7 — Empty transaction fires post-commit callbacks unexpectedly In PerformanceP1 — No batching window for INCREMENTAL mode
P2 — Full scan on every INCREMENTAL refresh INCREMENTAL mode still performs a full rescan of the source type on every change, regardless of query complexity. For large source types, this means INCREMENTAL and PERIODIC are functionally equivalent in cost. This limitation should be clearly documented in the user-facing docs so users can make an informed choice. Code QualityQ1 — Planning documents should not be committed The four Q2 — private final Database database;
// ...
final DatabaseInternal db = (DatabaseInternal) database;Store Q3 —
Q4 — Queries joining multiple types ( Q5 — Some DDL statements use Q6 — Fragile string assertion in assertThat(content.contains("HighMetrics")).isTrue();Parse the schema JSON and assert on the structured data instead, following the project's preferred Thread SafetyTS1 — Verify this is protected by the same lock/guard used for Missing Test Coverage
Summary
The core architecture is solid. The main items to address before merging are: remove the planning docs (Q1), fix the |
c2fcc17 to
7f4613a
Compare
Code Review: Add Materialized Views (#3464)This is a well-structured, large-scope feature addition. The design doc is thorough, the code is generally clean, and the test coverage is solid. Below are my findings, organized by severity. Correctness / Potential Bugs1. In boolean committed = false;
try {
// ... commit logic ...
committed = true;
} finally {
if (committed)
resetAndFireCallbacks();
else
reset(); // discard callbacks without firing
}2. Schema persistence race in
3.
4. In 5.
Design / API Concerns6.
7.
private transient MaterializedViewChangeListener changeListener;This field is written from one thread (during view creation/schema reload) and read from another (the record event callback thread during incremental refresh). Without 8. "SELECT FROM schema:materializedViews WHERE name = `" + viewName + "`"View names are user-controlled. Using string concatenation for identifiers is inconsistent with how parameters are handled elsewhere and is a potential injection vector. Use query parameters or validate/escape the identifier. 9. Inconsistency: SQL syntax is The linked issue (#3463) and the design document use 10. Source type validation only at creation time
Test Coverage Gaps11. No test for crash recovery (BUILDING → STALE → auto-refresh) The design specifies that a view with 12. No behavioral test for
13. The test calls Minor / Style14. Plan documents in These files contain implementation checklists ("✅ COMPLETE", "⏳ IN PROGRESS") and agent-specific guidance ("For Claude: REQUIRED SUB-SKILL..."). They are working notes, not project documentation, and should be removed before merging. 15. if ("PERIODIC".equalsIgnoreCase(refreshMode) || refreshInterval > 0)
sb.append(" EVERY ").append(refreshInterval)...The 16. The 17. private transient MaterializedViewChangeListener changeListener;
Summary
The most important issues to address before merging:
The architecture is sound and the overall implementation quality is high. Once the correctness issues are addressed, this will be a solid addition to ArcadeDB. |
7f4613a to
fe9fda3
Compare
Code Review: Add Materialized Views (#3464)Overall this is a well-structured, ambitious feature with solid design documentation and reasonable test coverage. The architecture mirrors existing patterns (triggers, schema builders) appropriately. The comments below focus on areas that need attention before merging. Critical / Bugs1. Post-commit callbacks fire after In 2. Incremental refresh always does a full refresh
Either implement the per-record path or document that incremental mode currently always does a full refresh and the 3. } catch (final Exception e) {
view.setStatus("ERROR");
LogManager.instance().log(..., Level.SEVERE, ...);
throw e; // <-- re-thrown
}When called from the post-commit callback (inside 4. The refresh uses a SQL 5.
Design / Architecture6. Status field uses raw
7. The interface
8. Concurrent Two threads calling 9. A single daemon thread shared across all PERIODIC views means views refresh serially. For long-running refreshes this can cause missed intervals. This is an acceptable initial trade-off, but it should be documented. Code Quality10.
11. The unit conversion uses 12. The metadata step casts the internal 13. Docs directory contains multiple overlapping design documents Four planning documents were committed ( Test Coverage14. No test for crash recovery (
15. No concurrency test for incremental refresh
16. The diff summary shows 17. The HA integration test exists but its assertions were not visible. Key scenarios to verify: view refresh on replica after leader executes refresh, view state after leader failover. Minor
Summary
The most important items to address before merging are #2 (incremental always does full refresh — the feature is advertised differently), #3 (orphaned backing type on initial refresh failure), and #6 (string-based status — enum is safer). Items #4 and #8 are performance/correctness concerns worth addressing in this PR or in a fast-follow. Good work on the overall structure and the thorough design documentation. |
Code Review: Add Materialized Views (#3464)This is a well-structured implementation of a significant new feature. The design document is thorough and the code follows existing patterns closely. Below is detailed feedback organized by area. Architecture & DesignStrengths:
Issues & Concerns1. Incremental refresh is not truly incremental — it's "full refresh triggered by a change"The private void schedulePostCommitRefresh() {
tx.addAfterCommitCallbackIfAbsent(callbackKey, () -> {
MaterializedViewRefresher.fullRefresh(database, view); // always full
});
}The design doc describes per-record incremental updates with This is fine for an MVP, but the naming
2. Post-commit refresh runs in a nested transaction that can deadlockIn database.transaction(() -> {
database.command("sql", "DELETE FROM `" + backingTypeName + "`");
// ... insert results
});This callback fires inside Additionally, if the view refresh transaction itself fails, the source transaction already committed successfully. The view is marked 3. SQL injection vulnerability in
|
| Category | Assessment |
|---|---|
| Design | Good — follows existing patterns, well-documented |
| Core functionality | Works, but INCREMENTAL is misleading (full refresh, not per-record) |
| Thread safety | Needs review around the materializedViews map and alterMaterializedView |
| Security | SQL string building in MaterializedViewRefresher should use Java API instead |
| Test coverage | Good breadth, a few concurrent/HA scenarios missing |
| License headers | 3 files missing headers |
The PR is a solid foundation. Recommend addressing issues #1 (naming clarity), #3 (SQL injection pattern), #7 (thread safety), and #10 (license headers) before merging.
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
6759a1f to
ed6b741
Compare
Code Review: Add Materialized Views (#3464)This is a significant feature addition and the overall architecture is well-thought-out. The design document is comprehensive and the test coverage is broad. Below are issues found across the implementation, grouped by severity. Critical Issues1. SQL Injection in The view name is interpolated directly into a SQL string with single-quote delimiters: remoteDatabase.command("sql",
"SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'");A view name containing a single quote will produce malformed SQL or allow injection. Use a parameterised form: remoteDatabase.command("sql",
"SELECT FROM schema:materializedViews WHERE name = ?", viewName);2. @Override
public void close() {
result.clear(); // clears the field-level list
}
3. Concurrent Refresh Race — No Mutex on Two concurrent calls to 4. final FromItem item = from.getItem();
if (item != null && item.getIdentifier() != null)
types.add(item.getIdentifier().getStringValue());Only one source type is extracted. Queries with JOINs, subqueries, or implicit multi-source traversals will have an empty (or incomplete) High Severity5. database.command("sql", "DELETE FROM `" + backingTypeName + "`");
6. Status / In database.transaction(() -> { /* DELETE + INSERT */ });
view.updateLastRefreshTime(); // ← outside transaction
view.setStatus(VALID); // ← outside transactionA JVM crash between the transaction commit and the status update leaves the in-memory state inconsistent with what was persisted. The crash-recovery path in 7. First In 8. The public interface declares 9. The // NOTE: currently always performs a full refresh even for simple queries.Either remove Medium Severity10. The
11. All four new statement classes use 12.
13.
14. Possible Nested Transaction During Initial
15. Backtick-Quoted Backing Type Name in database.command("sql", "DELETE FROM `" + backingTypeName + "`");A type name containing a backtick would produce malformed SQL. While uncommon, the schema should explicitly forbid backticks in view names, or the deletion should use a non-SQL code path (e.g., direct bucket truncation). Low Severity / Style16. if (view instanceof MaterializedViewImpl)
r.setProperty("refreshInterval", ((MaterializedViewImpl) view).getRefreshInterval());
17. In 18.
19. this.sourceTypeNames = result.getProperty("sourceTypes");
20.
Test Coverage Gaps21. No Concurrent-Refresh Test 22. assertThatThrownBy(...).isInstanceOf(Exception.class);Use the specific expected exception type (e.g., 23. Missing: 24. final JSONObject viewMeta = records.getJSONObject(0);
assertThat(viewMeta.getString("name")).isEqualTo("MetricView");This assumes 25. Missing: Periodic Scheduler Lifecycle After Database Reopen 26. Missing: Multi-Source INCREMENTAL Mode Behaviour 27. Summary
The most urgent items to address before merging are:
The feature is well-conceived and the test suite is commendably broad for a new feature of this size. With the issues above addressed this would be in good shape. |
Code Review: Add Materialized Views (#3464)This is a well-structured implementation of materialized views with a solid design foundation. The architecture is sound and the TDD approach with comprehensive test coverage is excellent. Below is my detailed review. Positive Highlights
Issues and ConcernsCritical / Correctness1. SQL Injection in remoteDatabase.command("sql", "SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'");View names are user-supplied and concatenated directly into SQL. A name like // In dropMaterializedView:
remoteDatabase.command("sql", "DROP MATERIALIZED VIEW `" + viewName + "`"); // backtick-quoted ✓
// But in existsMaterializedView/getMaterializedView:
"SELECT FROM schema:materializedViews WHERE name = '" + viewName + "'" // string concat ✗Also, 2. Post-commit callbacks fire after In the 3. When // In fullRefresh():
view.setStatus(MaterializedViewStatus.ERROR);
// ❌ No saveConfiguration() call here
LogManager.instance().log(...);
throw e;4. Refresh inside an existing transaction in
5. For queries like Design / Architecture6. The From the plan docs, an earlier design used 7. if (schema.existsMaterializedView(name)) {
if (ifNotExists)
return schema.getMaterializedView(name); // Returns existing view, no refresh
...
}This is acceptable behavior but should be documented — 8. Potential double
9. No concurrency protection for concurrent INCREMENTAL refreshes If two transactions on different threads commit simultaneously with changes to the same source type, two post-commit callbacks for the same view will both start Minor Issues10. @Override public String getName() { return name; }Per CLAUDE.md, this project uses the standard Java formatting. Single-line methods should follow the project's existing style for consistency. 11. The plan docs reference The earlier design plan (impl.md) references 12. if ("PERIODIC".equalsIgnoreCase(refreshMode) && refreshInterval > 0)
sb.append(" EVERY ").append(refreshInterval).append(' ').append(refreshUnit);When 13. No test for periodic refresh actually triggering The periodic refresh scheduler is tested only structurally (that it exists). A test that verifies the scheduler actually calls 14. Plan documents checked into The three plan/design markdown files account for the bulk of the 9,171 additions. Consider whether these should live in the PR description / issue tracker or a wiki rather than the repository itself, as they're implementation notes rather than project documentation. SummaryThe core implementation is solid and correct for the happy path. The main issues to address before merging:
|
Code Review: Add Materialized Views (#3464)This is a substantial feature addition covering schema management, SQL grammar, transaction callbacks, event-driven refresh, periodic scheduling, and remote API support. The overall architecture is sound and consistent with existing patterns. Below are my findings, organized by severity. High Priority1.
final FromItem item = from.getItem();
if (item != null && item.getIdentifier() != null)
types.add(item.getIdentifier().getStringValue());Any query with a subquery or multiple source types silently returns an empty or single-entry list. In 2. Crash recovery: In schema.materializedViews.put(name, view); // view starts as VALID (constructor default)
schema.saveConfiguration(); // saves status=VALID ← first save
MaterializedViewRefresher.fullRefresh(...); // sets status BUILDING (in memory only)
schema.saveConfiguration(); // saves status=VALID again after success
3. Crash recovery marks STALE but does not schedule auto-refresh The design doc says "Schedule an automatic full refresh for all STALE views" on startup. The implementation only does: if (MaterializedViewStatus.BUILDING.name().equals(view.getStatus()))
view.setStatus(MaterializedViewStatus.STALE);No auto-refresh is scheduled. Users must manually issue 4. SQL injection risk via backtick quoting in database.command("sql", "TRUNCATE TYPE `" + backingTypeName + "` UNSAFE");The backing type name is set to the user-supplied view name. A view name containing a backtick ( Medium Priority5. Post-commit callbacks fire on no-op (empty) transactions if (phase1 != null)
commit2ndPhase(phase1);
else
resetAndFireCallbacks(); // fires even when nothing was committedWhen 6. In 7. for (final MaterializedViewImpl view : materializedViews.values()) { // not synchronized
if (view.getBackingTypeName().equals(typeName))
throw new SchemaException(...);
}
8. this.sourceTypeNames = result.getProperty("sourceTypes"); // unchecked cast, no null checkIf the server doesn't return Low Priority / Style9. if (view instanceof MaterializedViewImpl)
r.setProperty("refreshInterval", ((MaterializedViewImpl) view).getRefreshInterval());
10. The interface returns a 11. New keywords may shadow existing identifiers
12. The classifier does not check for 13. Docs/design files committed to Four large planning documents (5900+ lines combined) are included in the PR. These are useful artifacts for understanding the design but probably don't belong in the main source tree. Consider moving them to a wiki page or PR description if the plan is to merge them. Test CoverageThe test suite is comprehensive for a new feature of this size:
A few gaps worth addressing:
SummaryThe overall design is well-thought-out and consistent with ArcadeDB's existing patterns. The most important issues to address before merging are:
The PR is marked as DRAFT, so these are well-timed to address. Good work on the breadth of this feature implementation. |
Code Review: Add Materialized Views (#3464)This is a substantial and well-structured feature implementation. The overall design is solid and fits well within ArcadeDB's existing architecture. Below is my feedback organized by area. Architecture & DesignStrengths:
Concerns:
Potential Bugs
Code Quality
Test CoverageGood coverage overall. The test suite ( Missing test cases:
Minor Issues
Summary
Given this is still in DRAFT state, these are the right issues to address before marking ready for review. The foundation is solid. |
Code Review: Add Materialized Views (#3464)This is a well-structured, large feature addition. The design document is thorough and the implementation follows existing project conventions closely. Below is my detailed feedback. Architecture & DesignThe overall architecture is sound: materialized views backed by real Positive highlights:
Issues & Concerns1. SQL Injection in
|
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
5e906a2 to
a8b2855
Compare
Materialized Views — Code ReviewThis is a solid and well-structured implementation. The design is cohesive, the code follows existing ArcadeDB conventions, and test coverage is good. Below are observations organized by severity. Correctness / Potential Bugs1. The 2. Full refresh inside a post-commit callback re-opens a new transaction In 3. In case "materializedviews" -> ...This means 4. remoteDatabase.command("sql", "DROP MATERIALIZED VIEW `" + viewName + "`");Even though backtick quoting is used, a view name containing a backtick would break this. The builder already validates names don't contain backticks on creation, but the remote drop path has no such validation. Consider using a parameterized form or at minimum re-validating. 5. Source type existence is validated at creation, but not at schema reload In Design / API Concerns6. The interface declares 7. Since the
8. final AtomicBoolean refreshInProgress = new AtomicBoolean(false);This is accessed directly from 9. The current Performance10.
11. Sorting in Arrays.stream(views).sorted(...).collect(Collectors.toList())This allocates a Code Style12. The statement classes ( 13. The PR includes 4 planning documents under 14. Missing
Test CoverageThe test suite is comprehensive. A few gaps worth addressing:
Minor
Overall this is a well-executed feature. The main items worth addressing before merge are #6 (type-safe status enum in the interface) and #3 (verifying the case-insensitive routing). Everything else can be addressed as follow-up issues or during review iteration. |
Comprehensive Code Review: PR #3464 - Materialized ViewsExcellent implementation overall! The code demonstrates strong understanding of ArcadeDB's architecture and follows project conventions meticulously. Here's my detailed review: ✅ STRENGTHSCode Quality
Architecture Highlights
🐛 CRITICAL ISSUES (Must Fix)1. Source Type Deletion VulnerabilityLocation: Issue: If a source type is deleted while a materialized view depends on it:
Current protection: MaterializedViewBuilder validates source types exist at creation (lines 111-113), but nothing prevents deletion afterward. Fix needed in LocalSchema.dropType(): // Check if type is source for any materialized view
for (MaterializedViewImpl view : materializedViews.values()) {
if (view.getSourceTypeNames().contains(typeName))
throw new SchemaException("Cannot drop type '" + typeName +
"' because it is a source for materialized view '" + view.getName() + "'");
}2. Field Visibility IssueLocation: final AtomicBoolean refreshInProgress = new AtomicBoolean(false);Issue: Should be
|
Code Review: Materialized Views Implementation (PR #3464)This is an impressive implementation of materialized views for ArcadeDB. The feature is well-architected with comprehensive test coverage and good attention to security/safety concerns. Below are my findings organized by category. ✅ StrengthsArchitecture & Design
Security
Code Quality
|
46d0354 to
2e4d962
Compare
Pull Request Review: Materialized Views ImplementationOverall AssessmentThis is an excellent and comprehensive implementation of materialized views for ArcadeDB. The feature is well-designed, thoroughly tested, and follows the project's coding standards. The implementation includes:
The code has been through multiple rounds of review and refinement, addressing SQL injection, concurrency, and HA concerns. Code Quality: ✅ ExcellentStrengths
Minor Observations
Potential Bugs: ✅ None FoundKey concerns have been addressed:
Performance Considerations:
|
Add afterCommitCallbacks mechanism to TransactionContext that allows registering Runnable callbacks to fire after a successful commit. Callbacks are discarded on rollback and errors in callbacks are logged but do not affect the commit or other callbacks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…face Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tors) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace ThreadLocal-based deduplication with addAfterCommitCallbackIfAbsent on TransactionContext to prevent leak on rollback - Add unregisterListeners() to clean up INCREMENTAL listeners on view drop - Wrap create/drop in recordFileChanges() for HA replication atomicity - Clear materializedViews map before reload in readConfiguration() - Make lastRefreshTime and status volatile for thread safety - Synchronize getMaterializedViewScheduler() for thread safety - Use CommandExecutionException for unsupported ALTER MATERIALIZED VIEW Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FetchFromSchemaMaterializedViewsStep for SELECT FROM schema:materializedViews - Register materializedviews target in SelectExecutionPlanner - Add RemoteMaterializedView read-only proxy for remote clients - Implement RemoteSchema MV methods (exists, get, getAll, drop) - buildMaterializedView() throws UnsupportedOperationException remotely - getMaterializedView() throws SchemaException when not found (consistent with LocalSchema) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MaterializedViewEdgeCaseTest (9 tests): multiple views, duplicates, empty results, large datasets, re-creation, backing type protection - Add HTTPMaterializedViewIT (7 tests): HTTP API CRUD, refresh modes, schema metadata endpoint, idempotent operations - Add RemoteMaterializedViewIT (2 tests): remote client SQL CRUD, schema proxy methods, unsupported operation handling - Add ReplicationMaterializedViewIT (2 tests): HA create/drop replication, schema file synchronization across replicas - Add incremental tests: listener unregistration on drop, refresh after rollback - Add SQL test: schema:materializedViews metadata query Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SQL injection in RemoteSchema by using backtick quoting - Rewrite extractSourceTypes to use SQL parser AST instead of fragile indexOf - Rewrite MaterializedViewQueryClassifier to use AST inspection instead of string contains Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ALTER MATERIALIZED VIEW support to change refresh mode (MANUAL, INCREMENTAL, PERIODIC) on existing views. Uses immutable view replacement pattern via copyWithRefreshMode() to safely swap instances while tearing down old and setting up new refresh infrastructure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix post-commit callbacks firing on failed transactions (TransactionContext) - Fix || to && in CreateMaterializedViewStatement.toString() for PERIODIC mode - Add backtick escaping for backing type name in DELETE query - Clean up orphaned backing type on initial refresh failure in builder - Introduce MaterializedViewStatus enum replacing string status literals - Use volatile instead of transient for changeListener field - Store DatabaseInternal directly in MaterializedViewChangeListener - Fix backtick-to-single-quote quoting in RemoteSchema WHERE clauses Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SQL injection in RemoteSchema by using parameterized queries - Fix FetchFromSchemaMaterializedViewsStep.close() clearing shared result list - Replace DELETE FROM with TRUNCATE TYPE UNSAFE in MaterializedViewRefresher - Fix AlterMaterializedViewStatement.toString() for PERIODIC mode - Add synchronized to all materializedViews access methods in LocalSchema - Add e2e test for materialized view creation and query with brewery denormalization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Persist BUILDING status before initial refresh so crash recovery works - Validate view names do not contain backtick characters - Synchronize dropType() iteration over materializedViews map - Guard RemoteMaterializedView.sourceTypeNames against null from server - Remove unnecessary instanceof cast in FetchFromSchemaMaterializedViewsStep Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Validate view name for backtick in fromJSON() to prevent SQL injection from a tampered schema.json bypassing the create() guard - Guard fullRefresh() against concurrent execution with AtomicBoolean; a second concurrent refresh is skipped to avoid TRUNCATE + race Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Block dropping a type that is a source for a materialized view (LocalSchema.dropType now checks sourceTypeNames in addition to the existing backing-type guard) - Make MaterializedViewImpl.refreshInProgress private; expose it via tryBeginRefresh() / endRefresh() methods consumed by the refresher - Align planning docs: replace ON_COMMIT with INCREMENTAL throughout 2026-02-06-materialized-views-impl.md; update RemoteSchema test snippet in 2026-02-15-materialized-views-testing.md to assert UnsupportedOperationException instead of null - Add tests covering coverage gaps reported by Codecov: copyWithRefreshMode, tryBeginRefresh/endRefresh, setLastRefreshTime, getRefreshInterval, getChangeListener/setChangeListener, getBackingType, concurrent-refresh skip, builder withTotalBuckets/withPageSize, PERIODIC scheduler refresh, all QueryClassifier branches (GROUP BY, aggregates, non-SELECT), onAfterUpdate and onAfterDelete incremental triggers, and the no-active-transaction early-return guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2e4d962 to
1390ad1
Compare
Code Review: Materialized Views ImplementationThank you for this comprehensive implementation of materialized views! This is a substantial feature addition with good architecture and testing. Below is my detailed review organized by priority. Critical Issues - Must Fix Before Merge1. Missing Authorization Checks 2. Potential NPE in FetchFromSchemaMaterializedViewsStep Code Style Issues - Per CLAUDE.md Guidelines3. Missing final Keywords 4. Unnecessary Braces on Single-Statement If Blocks Security Concerns5. SQL Injection Defense in Depth Performance Considerations6. Full Refresh on Every Incremental Change 7. Synchronization Overhead on Read Operations 8. Single-Threaded Scheduler Bottleneck Test Coverage GapsWhile existing test coverage is good, consider adding:
Excellent Practices FoundGreat work on these aspects:
SummaryOverall Assessment: Solid implementation with good architecture and proper thread safety considerations. Priority:
The implementation demonstrates excellent understanding of the codebase architecture. With the critical fixes applied, this will be a valuable feature addition. |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
What does this PR do?
implementation for #3463