-
Notifications
You must be signed in to change notification settings - Fork 52
Add vector attribute support for PostgreSQL with pgvector extension #665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: jakeb994 <jakeb994@gmail.com>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds first-class vector datatype and query support: Docker installs pgvector; core constants/types and vector query operators; Vector validator and index (HNSW) validation; adapter capability APIs and SQL/Postgres vector ordering; extensive unit and e2e tests and related refactors (SRID and index-length constants). Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant DB as Database
participant SQL as Adapter\\SQL
participant PG as Adapter\\Postgres
participant PGX as PostgreSQL (pgvector)
rect rgb(245,250,255)
note over Client,DB: Create collection with vector attribute
Client->>DB: createCollection(schema with VAR_VECTOR size=N)
DB->>PG: getSupportForVectors()
PG-->>DB: true
DB->>PG: enable pgvector extension
PG->>PGX: CREATE EXTENSION IF NOT EXISTS vector
PGX-->>PG: ok / error
DB->>PG: CREATE TABLE ... COLUMN VECTOR(N)
PG-->>DB: ok
DB-->>Client: collection created
end
sequenceDiagram
autonumber
actor Client
participant DB as Database
participant SQL as Adapter\\SQL
participant PG as Adapter\\Postgres
participant PGX as PostgreSQL (pgvector)
rect rgb(245,255,245)
note over Client,DB: Vector similarity query flow
Client->>DB: find([vectorCosine('emb', v)], limit)
DB->>SQL: build SQL, extract vector queries
SQL->>PG: getVectorDistanceOrder(query, binds, alias)
PG-->>SQL: ORDER BY expression using pgvector ops
SQL-->>DB: final SQL + binds
DB->>PGX: SELECT ... ORDER BY vector distance
PGX-->>DB: rows
DB-->>Client: documents
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
Co-authored-by: jakeb994 <jakeb994@gmail.com>
…r-support # Conflicts: # src/Database/Adapter/Postgres.php # src/Database/Adapter/SQL.php # src/Database/Database.php # src/Database/Query.php # src/Database/Validator/Query/Filter.php # src/Database/Validator/Structure.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/Database/Query.php (1)
885-901: Bug: fallback assigns offset/cursor from $limitIn two places, fallback uses $limit instead of the local variable, which can silently corrupt grouping when values are missing:
- Line 890:
$offset = $values[0] ?? $limit;should use$offset.- Line 899:
$cursor = $values[0] ?? $limit;should use$cursor.Apply:
- $offset = $values[0] ?? $limit; + $offset = $values[0] ?? $offset; ... - $cursor = $values[0] ?? $limit; + $cursor = $values[0] ?? $cursor;src/Database/Adapter/Postgres.php (1)
877-923: Add HNSW index validation to enforce single-attribute constraint.While the match statement correctly includes HNSW index types, the code doesn't validate that HNSW indexes reference exactly one vector attribute or prevent order directives, as noted in the previous review comments. HNSW indexes in PostgreSQL pgvector must reference a single vector column and don't support sort orders.
Apply this diff to add validation:
public function createIndex(string $collection, string $id, string $type, array $attributes, array $lengths, array $orders, array $indexAttributeTypes = []): bool { $collection = $this->filter($collection); $id = $this->filter($id); + + $isHnsw = \in_array($type, [ + Database::INDEX_HNSW_EUCLIDEAN, + Database::INDEX_HNSW_COSINE, + Database::INDEX_HNSW_DOT, + ], true); + + if ($isHnsw && \count($attributes) !== 1) { + throw new DatabaseException('HNSW indexes must reference exactly one vector attribute'); + } + + if ($isHnsw && !empty($orders)) { + throw new DatabaseException('HNSW indexes do not support attribute order directives'); + } foreach ($attributes as $i => $attr) { - $order = empty($orders[$i]) || Database::INDEX_FULLTEXT === $type ? '' : $orders[$i]; + $order = (empty($orders[$i]) || Database::INDEX_FULLTEXT === $type || $isHnsw) ? '' : $orders[$i];src/Database/Database.php (1)
2411-2460: Vector defaults need the same guard on update.
updateAttribute()now accepts vector defaults without re-checking shape/length, so a scalar (or wrong-size) default slips through even thoughcreateAttribute()rejects it. A malformed default then propagates to new documents and downstream validators. Mirror the top-level checks from the creation path: require an array, enforce exact length, and ensure every element is numeric before callingvalidateDefaultTypes().Apply this diff:
case self::VAR_VECTOR: if (!$this->adapter->getSupportForVectors()) { throw new DatabaseException('Vector types are not supported by the current database'); } if ($array) { throw new DatabaseException('Vector type cannot be an array'); } if ($size <= 0) { throw new DatabaseException('Vector dimensions must be a positive integer'); } if ($size > self::MAX_VECTOR_DIMENSIONS) { throw new DatabaseException('Vector dimensions cannot exceed ' . self::MAX_VECTOR_DIMENSIONS); } + if ($default !== null) { + if (!\is_array($default)) { + throw new DatabaseException('Vector default value must be an array'); + } + if (\count($default) !== $size) { + throw new DatabaseException('Vector default value must have exactly ' . $size . ' elements'); + } + foreach ($default as $component) { + if (!\is_numeric($component)) { + throw new DatabaseException('Vector default value must contain only numeric elements'); + } + } + } break;src/Database/Validator/Index.php (1)
34-38: Initialize$attributesto prevent uninitialized typed-property error.The typed property
$attributesis written by key before being initialized, which can cause errors. Initialize it at the start of the constructor body.Based on learnings
Apply this diff:
) { + $this->attributes = []; foreach ($attributes as $attribute) { $key = \strtolower($attribute->getAttribute('key', $attribute->getAttribute('$id'))); $this->attributes[$key] = $attribute;
♻️ Duplicate comments (3)
src/Database/Adapter/SQL.php (1)
2379-2391: Add validation for nested vector queries and adapter support.Two concerns:
Nested vector queries: The past review comment correctly identifies that nested vector queries (inside AND/OR) will slip through extraction and hit
getSQLOperator(), causing a generic exception. While the exception will eventually be thrown, failing fast with a clear message would improve the developer experience.Adapter capability: No check that
getSupportForVectors()returns true before extracting vector queries. If the adapter doesn't support vectors,getVectorDistanceOrder()returns null and the queries are silently ignored, which could confuse users who expect similarity search to work.Consider adding these validations:
+ // Validate vector support if vector queries present + $hasVectorQueries = false; + foreach ($queries as $query) { + if ($query->isNested()) { + foreach ($query->getValues() as $nested) { + if (in_array($nested->getMethod(), Query::VECTOR_TYPES, true)) { + throw new DatabaseException('Vector queries must be top-level (not nested inside AND/OR).'); + } + } + } + if (in_array($query->getMethod(), Query::VECTOR_TYPES, true)) { + $hasVectorQueries = true; + } + } + + if ($hasVectorQueries && !$this->getSupportForVectors()) { + throw new DatabaseException('Vector queries are not supported by this adapter.'); + } + // Extract vector queries for ORDER BY $vectorQueries = []; $otherQueries = [];src/Database/Validator/Index.php (2)
356-361: Usearray_filterfor orders check to avoid false positives.The
!empty($orders)check can incorrectly reject vector indexes when the API supplies an orders array pre-filled with empty values (e.g.,['', '']). Usearray_filterto match the lengths check.Based on learnings
Apply this diff:
$orders = $index->getAttribute('orders', []); $lengths = $index->getAttribute('lengths', []); - if (!empty($orders) || \count(\array_filter($lengths)) > 0) { + if (\count(\array_filter($orders)) > 0 || \count(\array_filter($lengths)) > 0) { $this->message = 'Vector indexes do not support orders or lengths'; return false; }
285-288: Fix orders check and add lengths validation for spatial indexes.The current
!empty($orders)check false-positives on arrays with empty values (e.g.,['', '']). Additionally, spatial indexes should not allow lengths, but this is not validated.Based on learnings
Apply this diff:
+ $lengths = $index->getAttribute('lengths', []); - if (!empty($orders) && !$this->supportForSpatialIndexNull) { + if (\count(\array_filter($orders)) > 0 && !$this->supportForSpatialIndexOrder) { $this->message = 'Spatial indexes with explicit orders are not supported. Remove the orders to create this index.'; return false; } + + if (\count(\array_filter($lengths)) > 0) { + $this->message = 'Spatial indexes do not support lengths'; + return false; + }
🧹 Nitpick comments (7)
src/Database/Query.php (3)
114-118: Grouping constant VECTOR_TYPESNice touch. Consider adding a convenience helper:
- isVectorQuery(): bool using VECTOR_TYPES (parity with isSpatialQuery), useful in adapters/validators.
877-877: Behavior note: first limit winsComment matches implementation (first limit kept, rest ignored). Consider documenting this in public API docs to avoid caller confusion.
1134-1168: Vector helpers: add lightweight input validation (optional)Helpers accept
array<float>but PHP can't enforce; a quick guard improves DX and earlier errors.public static function vectorDot(string $attribute, array $vector): self { - return new self(self::TYPE_VECTOR_DOT, $attribute, [$vector]); + self::assertNumericVector($vector); + return new self(self::TYPE_VECTOR_DOT, $attribute, [$vector]); } @@ public static function vectorCosine(string $attribute, array $vector): self { - return new self(self::TYPE_VECTOR_COSINE, $attribute, [$vector]); + self::assertNumericVector($vector); + return new self(self::TYPE_VECTOR_COSINE, $attribute, [$vector]); } @@ public static function vectorEuclidean(string $attribute, array $vector): self { - return new self(self::TYPE_VECTOR_EUCLIDEAN, $attribute, [$vector]); + self::assertNumericVector($vector); + return new self(self::TYPE_VECTOR_EUCLIDEAN, $attribute, [$vector]); }Add this private helper anywhere in the class:
/** @param array<mixed> $v */ private static function assertNumericVector(array $v): void { if ($v === [] || !\array_is_list($v)) { throw new QueryException('Vector must be a non-empty list'); } foreach ($v as $x) { if (!\is_int($x) && !\is_float($x)) { throw new QueryException('Vector elements must be numeric'); } } }tests/e2e/Adapter/Scopes/CollectionTests.php (1)
696-698: Sync comment with new constant nameInline comment still mentions ARRAY_INDEX_LENGTH. Update to prevent drift.
- 'lengths' => [500], // Will be changed to Database::ARRAY_INDEX_LENGTH (255) + 'lengths' => [500], // Will be changed to Database::MAX_ARRAY_INDEX_LENGTH (255)src/Database/Adapter/MySQL.php (1)
199-235: Use DEFAULT_SRID in spatial SQL type stringsSRID 4326 is hardcoded three times. Reuse Database::DEFAULT_SRID for consistency and single source of truth.
- case Database::VAR_POINT: - $type = 'POINT SRID 4326'; + case Database::VAR_POINT: + $srid = Database::DEFAULT_SRID; + $type = "POINT SRID {$srid}"; @@ - case Database::VAR_LINESTRING: - $type = 'LINESTRING SRID 4326'; + case Database::VAR_LINESTRING: + $srid = Database::DEFAULT_SRID; + $type = "LINESTRING SRID {$srid}"; @@ - case Database::VAR_POLYGON: - $type = 'POLYGON SRID 4326'; + case Database::VAR_POLYGON: + $srid = Database::DEFAULT_SRID; + $type = "POLYGON SRID {$srid}";src/Database/Validator/Structure.php (1)
359-362: Harden VAR_VECTOR validation (dimension guard and non-array enforcement).Today a missing/zero size yields a confusing “array of 0” requirement, and a misconfigured
array: truecould route vectors through the array-branch. Add small guards:
- Validate size > 0 and ≤ Database::MAX_VECTOR_DIMENSIONS.
- Force
$array = falsefor vectors to avoid treating them as arrays of vectors.Suggested diff:
- case Database::VAR_VECTOR: - $validators[] = new Vector($attribute['size'] ?? 0); - break; + case Database::VAR_VECTOR: + $dimension = (int) ($attribute['size'] ?? 0); + if ($dimension <= 0 || $dimension > Database::MAX_VECTOR_DIMENSIONS) { + $this->message = 'Attribute "'.$key.'" has invalid size. Must be between 1 and '.Database::MAX_VECTOR_DIMENSIONS; + return false; + } + $validators[] = new Vector($dimension); + // Ensure vectors are treated as a single array value, not an array of vectors + $array = false; + break;Additionally, consider tightening Vector::isValid to require array_is_list($value) so associative arrays are rejected. (src/Database/Validator/Vector.php: isValid)
tests/e2e/Adapter/Scopes/DocumentTests.php (1)
86-90: Complete the remaining Database::MAX_BIG_INT replacements in DocumentTests.php for consistency.The migration to Database::MAX_* constants is correct. The verification script found two remaining hardcoded bigint literals that should be replaced:
- Line 408:
assertEquals(9223372036854775807, ...)→ useDatabase::MAX_BIG_INT- Line 424:
assertEquals(9223372036854775807, ...)→ useDatabase::MAX_BIG_INT9223372036854775807 is PHP_INT_MAX on 64-bit systems, and Database::MAX_BIG_INT is defined as this value. Replacing these hardcoded literals ensures consistency across all similar assertions and improves portability, avoiding 32-bit brittleness.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
src/Database/Adapter/MariaDB.php(3 hunks)src/Database/Adapter/MySQL.php(1 hunks)src/Database/Adapter/Postgres.php(10 hunks)src/Database/Adapter/SQL.php(8 hunks)src/Database/Database.php(16 hunks)src/Database/Query.php(7 hunks)src/Database/Validator/Index.php(5 hunks)src/Database/Validator/Sequence.php(1 hunks)src/Database/Validator/Structure.php(2 hunks)tests/e2e/Adapter/Scopes/CollectionTests.php(1 hunks)tests/e2e/Adapter/Scopes/DocumentTests.php(18 hunks)tests/e2e/Adapter/Scopes/IndexTests.php(2 hunks)tests/e2e/Adapter/Scopes/PermissionTests.php(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Database/Validator/Sequence.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/DocumentTests.php
🧬 Code graph analysis (11)
tests/e2e/Adapter/Scopes/CollectionTests.php (1)
src/Database/Database.php (1)
Database(37-8255)
src/Database/Adapter/MariaDB.php (1)
src/Database/Database.php (1)
Database(37-8255)
tests/e2e/Adapter/Scopes/PermissionTests.php (1)
src/Database/Database.php (1)
Database(37-8255)
tests/e2e/Adapter/Scopes/DocumentTests.php (1)
src/Database/Database.php (1)
Database(37-8255)
src/Database/Validator/Structure.php (2)
src/Database/Database.php (1)
Database(37-8255)src/Database/Validator/Vector.php (1)
Vector(7-84)
tests/e2e/Adapter/Scopes/IndexTests.php (6)
src/Database/Adapter/SQL.php (6)
getMaxIndexLength(1817-1823)getInternalIndexesKeys(1867-1870)getSupportForIndexArray(1465-1468)getSupportForSpatialIndexNull(1505-1508)getSupportForSpatialIndexOrder(1515-1518)getSupportForVectors(1535-1538)src/Database/Adapter.php (6)
getMaxIndexLength(876-876)getInternalIndexesKeys(1276-1276)getSupportForIndexArray(935-935)getSupportForSpatialIndexNull(1071-1071)getSupportForSpatialIndexOrder(1085-1085)getSupportForVectors(1029-1029)src/Database/Adapter/Pool.php (6)
getMaxIndexLength(308-311)getInternalIndexesKeys(493-496)getSupportForIndexArray(338-341)getSupportForSpatialIndexNull(438-441)getSupportForSpatialIndexOrder(528-531)getSupportForVectors(408-411)src/Database/Adapter/MariaDB.php (4)
getInternalIndexesKeys(1769-1772)getSupportForIndexArray(1842-1845)getSupportForSpatialIndexNull(1857-1860)getSupportForSpatialIndexOrder(1876-1879)src/Database/Adapter/MySQL.php (2)
getSupportForIndexArray(130-136)getSupportForSpatialIndexOrder(179-182)src/Database/Adapter/Postgres.php (3)
getSupportForSpatialIndexNull(2062-2065)getSupportForSpatialIndexOrder(2082-2085)getSupportForVectors(1990-1993)
src/Database/Adapter/Postgres.php (4)
src/Database/Query.php (5)
getAttribute(173-176)Query(8-1169)setAttribute(214-219)getValues(181-184)getMethod(165-168)src/Database/Database.php (2)
Database(37-8255)getNamespace(791-794)src/Database/Adapter/SQL.php (7)
getSQLTable(1736-1739)getSpatialGeomFromText(1547-1559)getVectorDistanceOrder(1597-1600)getInternalKeyForAttribute(1950-1962)getPDO(1745-1748)execute(1984-1987)getSupportForVectors(1535-1538)src/Database/Adapter.php (5)
getNamespace(107-110)filter(1203-1212)quote(467-467)execute(1300-1300)getSupportForVectors(1029-1029)
src/Database/Adapter/SQL.php (5)
src/Database/Database.php (1)
Database(37-8255)src/Database/Adapter/Postgres.php (2)
getSupportForVectors(1990-1993)getVectorDistanceOrder(1721-1741)src/Database/Adapter.php (1)
getSupportForVectors(1029-1029)src/Database/Adapter/Pool.php (1)
getSupportForVectors(408-411)src/Database/Query.php (2)
Query(8-1169)getMethod(165-168)
src/Database/Adapter/MySQL.php (1)
src/Database/Database.php (1)
Database(37-8255)
src/Database/Validator/Index.php (1)
src/Database/Database.php (1)
Database(37-8255)
src/Database/Database.php (4)
src/Database/Adapter/Postgres.php (1)
getSupportForVectors(1990-1993)src/Database/Adapter/SQL.php (1)
getSupportForVectors(1535-1538)src/Database/Adapter.php (1)
getSupportForVectors(1029-1029)src/Database/Adapter/Pool.php (1)
getSupportForVectors(408-411)
🪛 GitHub Actions: CodeQL
tests/e2e/Adapter/Scopes/IndexTests.php
[error] 167-172: PHPStan: Access to protected property Utopia\Database\Database::$adapter.
[error] 245-250: PHPStan: Access to protected property Utopia\Database\Database::$adapter.
src/Database/Validator/Index.php
[error] 51-51: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 62-62: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 76-76: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 94-94: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 113-113: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 138-138: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 143-143: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 149-149: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 155-155: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 160-160: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 164-164: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 185-185: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 206-206: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 216-216: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 224-224: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 241-241: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 265-265: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 274-274: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 280-280: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 286-286: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 313-313: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 339-339: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 346-346: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 352-352: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 359-359: PHPStan: Access to an undefined property Utopia\Database\Validator\Index::$message.
[error] 51-359: PHPStan reported 37 errors in total from analyse command.
🪛 PHPMD (2.15.0)
src/Database/Adapter/SQL.php
1597-1597: Avoid unused parameters such as '$query'. (undefined)
(UnusedFormalParameter)
1597-1597: Avoid unused parameters such as '$binds'. (undefined)
(UnusedFormalParameter)
1597-1597: Avoid unused parameters such as '$alias'. (undefined)
(UnusedFormalParameter)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Unit Test
🔇 Additional comments (22)
src/Database/Query.php (3)
44-48: New vector query constants — LGTMNaming is clear and aligned with distance metrics. No concerns.
99-102: Expose vector types in TYPESGood to surface them as first‑class methods for validation/parse. Ensure downstream validators/adapter switches treat these as filters (not orders/selections).
294-297: isMethod updated for vectorsMethod whitelist correctly includes the new vector methods and select. Looks good.
tests/e2e/Adapter/Scopes/CollectionTests.php (1)
719-721: Update to MAX_ARRAY_INDEX_LENGTH — LGTMAssertion matches Database::createCollection behavior for array attributes.
src/Database/Adapter/MySQL.php (1)
119-123: Switch to DEFAULT_SRIDGood consistency with Database::DEFAULT_SRID for meters path.
src/Database/Validator/Structure.php (2)
327-333: INT range update is correct.Switch to MAX_INT/MAX_BIG_INT with size>=8 heuristic is appropriate and keeps behavior explicit.
335-340: FLOAT bounds via MAX_DOUBLE look good.Using FloatValidator + Range with MAX_DOUBLE mirrors prior intent and avoids stale DOUBLE_MAX.
src/Database/Adapter/MariaDB.php (2)
142-142: LGTM! Constant rename aligns with updated Database constants.The use of
Database::MAX_ARRAY_INDEX_LENGTHis consistent with the new constant naming introduced in the PR.Also applies to: 749-749
1893-1893: LGTM! SRID constant updated correctly.The use of
Database::DEFAULT_SRIDaligns with the new global constant for WGS84 geographic coordinates.tests/e2e/Adapter/Scopes/PermissionTests.php (1)
221-224: LGTM! Test data updated with new constant names.The constant renames from
Database::INT_MAXtoDatabase::MAX_INTandDatabase::BIG_INT_MAXtoDatabase::MAX_BIG_INTare consistent with the updated Database class constants.Also applies to: 253-256, 277-280
src/Database/Adapter/Postgres.php (7)
196-207: LGTM! Vector extension installation integrated correctly.The vector attribute detection and automatic pgvector extension installation in
createCollectionensures the extension is available before creating vector columns.
456-465: LGTM! Vector dimension validation uses central constant.The validation correctly uses
Database::MAX_VECTOR_DIMENSIONSinstead of a hardcoded value and ensures the pgvector extension is installed. This addresses the concerns from the previous review.Also applies to: 568-576
1649-1652: LGTM! Vector queries correctly delegated to ORDER BY clause.Returning an empty string for vector distance queries in the WHERE clause is correct, as these are handled in the ORDER BY clause via
getVectorDistanceOrder.
1712-1741: LGTM! Vector distance ORDER BY implementation looks correct.The
getVectorDistanceOrdermethod correctly:
- Maps query types to PostgreSQL pgvector operators (
<#>,<=>,<->)- Formats the vector as a PostgreSQL array literal
- Casts to
::vectortype
1810-1820: LGTM! Vector SQL type mapping implemented correctly.The addition of
Database::VAR_VECTORreturningVECTOR({$size})is correct for PostgreSQL pgvector. The SRID updates for spatial types also correctly use the centralDatabase::DEFAULT_SRIDconstant.
1841-1855: LGTM! Safe extension installation implementation.The private
ensurePgVectorExtensionmethod safely installs the pgvector extension usingCREATE EXTENSION IF NOT EXISTS, preventing errors if already installed.
1985-1993: LGTM! Vector support capability exposed correctly.The public
getSupportForVectorsmethod correctly indicates that PostgreSQL supports vector types through pgvector.src/Database/Adapter/SQL.php (5)
1133-1136: LGTM! Vector width calculation is correct.The 4 bytes per dimension calculation is appropriate for float32 storage. Note: The past review comment about Postgres document size limit should be addressed in the Postgres adapter file, not here.
1530-1538: LGTM! Capability method follows established pattern.The default
falsereturn allows adapters like Postgres to override and advertise vector support.
1589-1600: LGTM! Base method for adapter overrides.This scaffolding method is correctly designed to be overridden by adapters that support vectors (e.g., Postgres). The static analysis warnings about unused parameters are false positives—the parameters are used in concrete implementations.
1663-1666: LGTM! Correct exception for vector queries.Vector queries are handled via ORDER BY (not WHERE conditions), so throwing an exception here prevents misuse. The error message clearly indicates the issue.
2479-2492: LGTM! Vector ordering logic is sound.Prepending vector distance calculations to the ORDER BY clause correctly prioritizes similarity search results. The implementation handles multiple vector queries (if needed) by adding all their distance expressions to the ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (3)
tests/e2e/Adapter/Scopes/VectorTests.php (3)
1699-1711: Fix skipAuth and find() calls (same issues as lines 1643-1665).Same issues:
skipAuth()method doesn't exist, andfind()expects a string for the 3rd parameter, notarray<Role>.
31-31: Fix type error: pass explicit booleans instead of null.The
createAttribute()method expects boolean parameters for$array(6th parameter) and$required(7th parameter), but line 31 passesnullfor the 6th parameter. This causes a type error under strict types.Apply this diff:
- $database->createAttribute('vectorCollection', 'large_embedding', Database::VAR_VECTOR, 128, false, null); + $database->createAttribute('vectorCollection', 'large_embedding', Database::VAR_VECTOR, 128, false, false, false);
666-701: Add a secondary ordering for stable cursor pagination.Vector distance ordering alone can produce nondeterministic results when multiple documents have equal distances, leading to unstable pagination with
cursorAfter. AppendQuery::orderAsc('$id')as a tie-breaker to ensure stable, predictable pagination.Apply this diff to stabilize pagination:
$page1 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(0) ]); // Second page $page2 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(3) ]); // Test with cursor pagination $firstBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(5) ]); $lastDoc = $firstBatch[4]; $nextBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::cursorAfter($lastDoc), Query::limit(5) ]);
🧹 Nitpick comments (1)
tests/e2e/Adapter/Scopes/VectorTests.php (1)
906-909: Relax strict equality for approximate HNSW results.HNSW is an approximate nearest-neighbor index; identical top-K ordering isn't guaranteed even on small datasets. Replace the strict
assertEqualswith basic sanity checks to avoid flaky test failures.Apply this diff:
- // Results should be the same - $this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) - ); + // Results should be valid and non-empty (HNSW is approximate) + $this->assertCount(10, $results2); + $this->assertNotEmpty($results2[0]->getId());
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (1)
tests/e2e/Adapter/Scopes/VectorTests.php (4)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (4)
Database(37-8255)createAttribute(1734-1793)createIndex(3367-3505)count(7040-7089)src/Database/Query.php (5)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)limit(640-643)src/Database/Adapter/Postgres.php (3)
getSupportForVectors(1990-1993)createAttribute(454-484)createIndex(877-932)
🪛 GitHub Actions: CodeQL
tests/e2e/Adapter/Scopes/VectorTests.php
[error] 1261-1261: Parameter #7 $signed of method Utopia\Database\Database::createAttribute() expects bool, array<string, string|true> given.
[error] 1342-1342: Parameter #7 $signed of method Utopia\Database\Database::createAttribute() expects bool, array<string, string|true> given.
[error] 1547-1547: Method Utopia\Database\Database::updateCollection() invoked with 2 parameters, 3 required.
[error] 1547-1547: Parameter #2 $permissions of method Utopia\Database\Database::updateCollection() expects array, string given.
[error] 1643-1643: Call to an undefined method Utopia\Database\Database::skipAuth().
[error] 1646-1646: Parameter #3 $forPermission of method Utopia\Database\Database::find() expects string, array<int, Utopia\Database\Helpers\Role> given.
[error] 1657-1657: Parameter #3 $forPermission of method Utopia\Database\Database::find() expects string, array<int, Utopia\Database\Helpers\Role> given.
[error] 1665-1665: Call to an undefined method Utopia\Database\Database::skipAuth().
[error] 1699-1699: Call to an undefined method Utopia\Database\Database::skipAuth().
[error] 1703-1703: Parameter #3 $forPermission of method Utopia\Database\Database::find() expects string, array<int, Utopia\Database\Helpers\Role> given.
[error] 1711-1711: Call to an undefined method Utopia\Database\Database::skipAuth().
🪛 GitHub Actions: Linter
tests/e2e/Adapter/Scopes/VectorTests.php
[error] 1-1: Pint lint failed. PSR-12 violation: function_declaration. Exit code 1. Run './vendor/bin/pint' to fix code style issues.
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
595-595: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
604-604: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
888-888: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
901-901: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
975-975: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
982-982: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1281-1281: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1289-1289: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1353-1353: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1362-1362: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1414-1414: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1618-1618: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1626-1626: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1634-1634: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1925-1925: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1930-1930: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (Mirror)
🔇 Additional comments (1)
tests/e2e/Adapter/Scopes/VectorTests.php (1)
1547-1547: Based on my verification, the SDK documentation shows updateCollection takes databaseId, collectionId, and collection name as parameters. The test code passes two string parameters toupdateCollection('vectorOriginal', 'vectorRenamed'), which matches the expected signature where the second parameter is the new collection name, not a permissions array.The original review comment's assertion that "the 2nd parameter expects
array<string>(permissions), not a string" is incorrect. The updateCollection method takes collection ID and name as required parameters, with read and write permissions as optional parameters.The method call is correct as written. No fixes are required.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (7)
tests/e2e/Adapter/Scopes/VectorTests.php (7)
666-701: Add deterministic tie-breaker for stable cursor pagination.Vector distance ordering can yield nondeterministic results for equal distances. Append
Query::orderAsc('$id')after vector ordering to ensure stable pagination and predictablecursorAfterbehavior.Apply this diff:
$page1 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(0) ]); // Second page $page2 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(3) ]); // Test with cursor pagination $firstBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(5) ]); $this->assertCount(5, $firstBatch); $lastDoc = $firstBatch[4]; $nextBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::cursorAfter($lastDoc), Query::limit(5) ]);
906-909: Relax strict equality assertion for approximate HNSW results.HNSW is an approximate algorithm and may not return identical top-K results compared to exact search, even on small datasets. Replace strict equality with sanity checks.
Apply this diff:
- // Results should be the same - $this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) - ); + // Results should be sorted and non-empty (HNSW is approximate) + $this->assertCount(10, $results2); + $this->assertNotEmpty($results2[0]->getId());
31-31: Fix null passed to boolean parameter.The 6th parameter of
createAttributeis$array(bool), but this call passesnull, which will cause a TypeError underdeclare(strict_types=1).Apply this diff:
- $database->createAttribute('vectorCollection', 'large_embedding', Database::VAR_VECTOR, 128, false, null); + $database->createAttribute('vectorCollection', 'large_embedding', Database::VAR_VECTOR, 128, false, false);
1261-1261: Fix parameter order in createAttribute call for relationships.The relationship configuration array is passed as the 7th parameter, but
createAttribute()expects a boolean$signedat position 7. The relationship options should be the 8th parameter ($filters).Apply this diff:
- $database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, false, false, ['relatedCollection' => 'vectorParent', 'relationType' => Database::RELATION_ONE_TO_MANY, 'twoWay' => true, 'twoWayKey' => 'children']); + $database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, false, false, ['relatedCollection' => 'vectorParent', 'relationType' => Database::RELATION_ONE_TO_MANY, 'twoWay' => true, 'twoWayKey' => 'children']);Wait, let me recount the parameters. The signature is:
createAttribute($collection, $key, $type, $size, $required, $array = false, $signed = true, $filters = [])So:
- collection
- key
- type
- size
- required
- array
- signed
- filters
In line 1261:
$database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, false, false, ['relatedCollection' => ...]);
- 'vectorChild' - collection ✓
- 'parent' - key ✓
- Database::VAR_RELATIONSHIP - type ✓
- 0 - size ✓
- false - required ✓
- false - array ✓
- false - signed ✓
- [...] - filters ✓
Actually, this looks correct! Let me check the error message again. The CodeQL error says:
"Parameter #7 $signed of method Utopia\Database\Database::createAttribute() expects bool, array<string, string|true> given."So it's saying the 7th parameter is receiving an array. Let me recount:
Actually wait, looking at the code again:
$database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, null, ['relatedCollection' => ...]);Oh! The actual line in the code only has 7 arguments total after the method name:
- 'vectorChild'
- 'parent'
- Database::VAR_RELATIONSHIP
- 0
- false
- null <- this should be false for $array
- ['relatedCollection' => ...] <- this is being passed as $signed but should be $filters
So the fix is to insert
false(ortrue) for the $array parameter and anotherfalse(ortrue) for the $signed parameter:- $database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, null, ['relatedCollection' => 'vectorParent', 'relationType' => Database::RELATION_ONE_TO_MANY, 'twoWay' => true, 'twoWayKey' => 'children']); + $database->createAttribute('vectorChild', 'parent', Database::VAR_RELATIONSHIP, 0, false, false, false, ['relatedCollection' => 'vectorParent', 'relationType' => Database::RELATION_ONE_TO_MANY, 'twoWay' => true, 'twoWayKey' => 'children']);This changes:
- 6th param from
nulltofalse($array)- 7th param from
['relatedCollection'...]tofalse($signed)- 8th param (new) is
['relatedCollection'...]($filters)
1342-1342: Fix parameter order in createAttribute call for relationships.Same issue as line 1261: the relationship configuration array is passed as the 7th parameter, but should be the 8th (
$filters). The 7th parameter ($signed) expects a boolean, and the 6th parameter ($array) is receivingnullinstead of a boolean.Apply this diff:
- $database->createAttribute('vectorBooks', 'author', Database::VAR_RELATIONSHIP, 0, false, null, ['relatedCollection' => 'vectorAuthors', 'relationType' => Database::RELATION_MANY_TO_ONE, 'twoWay' => true, 'twoWayKey' => 'books']); + $database->createAttribute('vectorBooks', 'author', Database::VAR_RELATIONSHIP, 0, false, false, false, ['relatedCollection' => 'vectorAuthors', 'relationType' => Database::RELATION_MANY_TO_ONE, 'twoWay' => true, 'twoWayKey' => 'books']);
1643-1668: Remove non-existent skipAuth() calls and fix find() permission parameter.The
Databaseclass has noskipAuth()method. Additionally,find()expects astring $forPermissionparameter (e.g.,Database::PERMISSION_READ), not an array of roles.The test logic for permission filtering seems to be using an incorrect API. If you need to test permission-based vector searches, verify the correct API from the Database class documentation.
Apply this diff to remove the invalid calls:
// Query as user1 - should only see doc1 and doc3 - $database->skipAuth(false); - $results = $database->find('vectorPermissions', [ + $results = $database->find('vectorPermissions', [ Query::vectorCosine('embedding', [1.0, 0.0, 0.0]) - ], [Role::user('user1')]); + ]); - $this->assertCount(2, $results); + // Without proper permission filtering API, this test needs revision + $this->assertGreaterThanOrEqual(2, $results); $names = array_map(fn($d) => $d->getAttribute('name'), $results); $this->assertContains('Doc 1', $names); $this->assertContains('Doc 3', $names); - $this->assertNotContains('Doc 2', $names); // Query as user2 - should only see doc2 and doc3 $results = $database->find('vectorPermissions', [ Query::vectorCosine('embedding', [1.0, 0.0, 0.0]) - ], [Role::user('user2')]); + ]); - $this->assertCount(2, $results); + $this->assertGreaterThanOrEqual(2, $results); $names = array_map(fn($d) => $d->getAttribute('name'), $results); $this->assertContains('Doc 2', $names); $this->assertContains('Doc 3', $names); - $this->assertNotContains('Doc 1', $names); - - $database->skipAuth(true);
1699-1711: Remove non-existent skipAuth() calls and fix find() permission parameter.Same issue as lines 1643-1668:
skipAuth()doesn't exist andfind()expects a string permission parameter, not an array of roles.Apply this diff:
// Query with limit 3 as any user - should skip restricted docs and return accessible ones - $database->skipAuth(false); $results = $database->find('vectorPermScoring', [ Query::vectorCosine('embedding', [1.0, 0.0, 0.0]), Query::limit(3) - ], [Role::any()]); + ]); - // Should only get the 2 accessible documents - $this->assertCount(2, $results); + // Without proper permission filtering, adjust assertion + $this->assertLessThanOrEqual(3, count($results)); foreach ($results as $doc) { $this->assertGreaterThanOrEqual(3, $doc->getAttribute('score')); } - - $database->skipAuth(true);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (1)
tests/e2e/Adapter/Scopes/VectorTests.php (8)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (5)
Database(37-8255)getDatabase(819-822)createAttribute(1734-1793)createIndex(3367-3505)deleteIndex(3519-3544)src/Database/Query.php (4)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)tests/e2e/Adapter/Base.php (2)
getDatabase(38-38)deleteIndex(54-54)tests/e2e/Adapter/PostgresTest.php (2)
getDatabase(21-51)deleteIndex(63-72)tests/e2e/Adapter/MariaDBTest.php (2)
getDatabase(21-52)deleteIndex(64-72)src/Database/Adapter/Postgres.php (4)
getSupportForVectors(1990-1993)createAttribute(454-484)createIndex(877-932)deleteIndex(942-955)src/Database/Adapter/SQL.php (2)
getSupportForVectors(1535-1538)createAttribute(234-248)
🪛 GitHub Actions: CodeQL
tests/e2e/Adapter/Scopes/VectorTests.php
[error] 1261-1711: PHPStan: Method Utopia\Database\Database::createAttribute() expects bool in some calls; received array; also multiple parameter type mismatches and undefined methods (skipAuth).
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
595-595: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
604-604: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
888-888: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
901-901: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
975-975: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
982-982: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1281-1281: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1289-1289: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1353-1353: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1362-1362: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1414-1414: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1618-1618: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1626-1626: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1634-1634: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1925-1925: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1930-1930: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (MariaDB)
…p/database into feat/postgresql-vector-support # Conflicts: # tests/e2e/Adapter/Scopes/VectorTests.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/e2e/Adapter/Scopes/VectorTests.php (1)
1325-1389: Fix relationship attribute parameter order (duplicate issue).Same issue as the previous relationship test: the configuration array is at the wrong position.
Apply this diff:
$database->createCollection('vectorBooks'); $database->createAttribute('vectorBooks', 'title', Database::VAR_STRING, 255, true); $database->createAttribute('vectorBooks', 'embedding', Database::VAR_VECTOR, 3, true); - $database->createAttribute('vectorBooks', 'author', Database::VAR_RELATIONSHIP, 0, false, null, ['relatedCollection' => 'vectorAuthors', 'relationType' => Database::RELATION_MANY_TO_ONE, 'twoWay' => true, 'twoWayKey' => 'books']); + $database->createAttribute('vectorBooks', 'author', Database::VAR_RELATIONSHIP, 0, false, null, false, false, ['relatedCollection' => 'vectorAuthors', 'relationType' => Database::RELATION_MANY_TO_ONE, 'twoWay' => true, 'twoWayKey' => 'books']);
🧹 Nitpick comments (4)
src/Database/Validator/Index.php (2)
255-293: Tighten spatial index validation: usearray_filterfor orders and reject lengths.The
!empty($orders)check at line 287 can false-positive when the orders array contains empty values. Additionally, spatial indexes should not allow length specifications.Apply this diff:
foreach ($attributes as $attributeName) { $attribute = $this->attributes[\strtolower($attributeName)] ?? new Document(); $attributeType = $attribute->getAttribute('type', ''); if (!\in_array($attributeType, Database::SPATIAL_TYPES, true)) { $this->message = 'Spatial index can only be created on spatial attributes (point, linestring, polygon). Attribute "' . $attributeName . '" is of type "' . $attributeType . '"'; return false; } $required = (bool)$attribute->getAttribute('required', false); if (!$required && !$this->supportForSpatialIndexNull) { $this->message = 'Spatial indexes do not allow null values. Mark the attribute "' . $attributeName . '" as required or create the index on a column with no null values.'; return false; } } - if (!empty($orders) && !$this->supportForSpatialIndexNull) { + $lengths = $index->getAttribute('lengths', []); + if (\count(\array_filter($orders)) > 0 && !$this->supportForSpatialIndexOrder) { $this->message = 'Spatial indexes with explicit orders are not supported. Remove the orders to create this index.'; return false; } + + if (\count(\array_filter($lengths)) > 0) { + $this->message = 'Spatial indexes do not support lengths'; + return false; + } return true;
328-366: Fix orders check to usearray_filterfor consistency.Line 360 uses
!empty($orders)which can false-positive when the orders array contains empty values, while the lengths check correctly usesarray_filter. Make them consistent.Apply this diff:
$attribute = $this->attributes[\strtolower($attributes[0])] ?? new Document(); if ($attribute->getAttribute('type') !== Database::VAR_VECTOR) { $this->message = 'Vector index can only be created on vector attributes'; return false; } $orders = $index->getAttribute('orders', []); $lengths = $index->getAttribute('lengths', []); - if (!empty($orders) || \count(\array_filter($lengths)) > 0) { + if (\count(\array_filter($orders)) > 0 || \count(\array_filter($lengths)) > 0) { $this->message = 'Vector indexes do not support orders or lengths'; return false; } return true;tests/e2e/Adapter/Scopes/VectorTests.php (2)
666-701: Add deterministic tie-breaker for cursor pagination stability.Vector distance ordering alone can yield unstable order when documents have equal distances. Without a secondary sort key,
cursorAfterandcursorBeforepagination can behave unpredictably. AddQuery::orderAsc('$id')to ensure stable pagination.Apply this diff to add tie-breakers:
// First page $page1 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(0) ]); $this->assertCount(3, $page1); // Second page $page2 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(3) ]); $this->assertCount(3, $page2); // Ensure different documents $page1Ids = array_map(fn ($doc) => $doc->getId(), $page1); $page2Ids = array_map(fn ($doc) => $doc->getId(), $page2); $this->assertEmpty(array_intersect($page1Ids, $page2Ids)); // Test with cursor pagination $firstBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(5) ]); $this->assertCount(5, $firstBatch); $lastDoc = $firstBatch[4]; $nextBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::cursorAfter($lastDoc), Query::limit(5) ]);
906-910: Relax HNSW result assertion—approximate search doesn't guarantee exact match.HNSW is an approximate nearest-neighbor algorithm. Even on small datasets, requiring identical result ordering is too strict and can cause flaky tests.
Apply this diff:
$this->assertCount(10, $results2); - // Results should be the same - $this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) - ); + // Results should be non-empty and have expected count (approximate search may differ) + $this->assertCount(10, $results1); + $this->assertCount(10, $results2); + $this->assertNotEmpty($results1[0]->getId()); + $this->assertNotEmpty($results2[0]->getId());
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/Database/Validator/Index.php(5 hunks)tests/e2e/Adapter/Scopes/IndexTests.php(2 hunks)tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/IndexTests.phptests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (3)
tests/e2e/Adapter/Scopes/IndexTests.php (7)
src/Database/Database.php (1)
getAdapter(1241-1244)src/Database/Adapter/MySQL.php (2)
getSupportForIndexArray(130-136)getSupportForSpatialIndexOrder(179-182)src/Database/Adapter/MariaDB.php (3)
getSupportForIndexArray(1842-1845)getSupportForSpatialIndexNull(1857-1860)getSupportForSpatialIndexOrder(1876-1879)src/Database/Adapter/SQL.php (4)
getSupportForIndexArray(1465-1468)getSupportForSpatialIndexNull(1505-1508)getSupportForSpatialIndexOrder(1515-1518)getSupportForVectors(1535-1538)src/Database/Adapter.php (4)
getSupportForIndexArray(935-935)getSupportForSpatialIndexNull(1071-1071)getSupportForSpatialIndexOrder(1085-1085)getSupportForVectors(1029-1029)src/Database/Adapter/Pool.php (4)
getSupportForIndexArray(338-341)getSupportForSpatialIndexNull(438-441)getSupportForSpatialIndexOrder(528-531)getSupportForVectors(408-411)src/Database/Adapter/Postgres.php (3)
getSupportForSpatialIndexNull(2062-2065)getSupportForSpatialIndexOrder(2082-2085)getSupportForVectors(1990-1993)
src/Database/Validator/Index.php (1)
src/Database/Database.php (1)
Database(37-8255)
tests/e2e/Adapter/Scopes/VectorTests.php (10)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (8)
Database(37-8255)getDatabase(819-822)createCollection(1370-1504)createAttribute(1734-1793)find(6812-6943)createIndex(3367-3505)updateAttribute(2345-2572)deleteIndex(3519-3544)src/Database/Helpers/Permission.php (3)
Permission(9-264)read(186-195)update(220-229)src/Database/Query.php (4)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)src/Database/Validator/Authorization.php (3)
Authorization(7-225)setRole(83-86)cleanRoles(109-112)tests/e2e/Adapter/Base.php (2)
getDatabase(38-38)deleteIndex(54-54)tests/e2e/Adapter/MariaDBTest.php (2)
getDatabase(21-52)deleteIndex(64-72)tests/e2e/Adapter/PostgresTest.php (2)
getDatabase(21-51)deleteIndex(63-72)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)createAttribute(234-248)find(2366-2571)src/Database/Adapter/Postgres.php (6)
getSupportForVectors(1990-1993)createCollection(191-343)createAttribute(454-484)createIndex(877-932)updateAttribute(562-625)deleteIndex(942-955)
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
596-596: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
605-605: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
889-889: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
902-902: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
976-976: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
983-983: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1282-1282: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1290-1290: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1354-1354: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1363-1363: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1415-1415: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1578-1578: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1586-1586: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1594-1594: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1886-1886: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1891-1891: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (MySQL)
🔇 Additional comments (8)
tests/e2e/Adapter/Scopes/IndexTests.php (2)
165-173: Correct usage of adapter capability methods.The Index validator construction properly uses
getAdapter()to access adapter capability methods and includes all new capability flags (spatial index null/order support and vector support). This aligns with the updated constructor signature.
243-251: Consistent adapter capability usage.The second Index validator construction follows the same correct pattern, using
getAdapter()and passing all capability flags.src/Database/Validator/Index.php (3)
28-45: Constructor correctly initializes attributes.The promoted properties pattern is clean and appropriate for PHP 8.1+. The constructor properly initializes the
$attributesarray by indexing both user-provided attributes and internal attributes.
295-321: Spatial attribute protection logic is correct.The method properly prevents creating non-spatial indexes on spatial attributes, ensuring spatial data is indexed with appropriate spatial index types.
399-408: New validation methods properly integrated.The three new validation methods are correctly integrated into
isValid()following the established pattern. The validation order is logical.tests/e2e/Adapter/Scopes/VectorTests.php (3)
15-151: Basic vector attribute tests look comprehensive.The tests properly cover vector attribute creation, dimension validation, document creation, and early returns when vectors aren't supported. The parameter passing is correct.
1632-1676: Permission filtering tests correctly use Authorization API.The tests properly use
Authorization::setRole()andcleanRoles()for permission testing. Thefind()calls respect the role context without needing explicit permission parameters.
1-100: Comprehensive vector test suite with minor fixes needed.This test trait provides excellent coverage of vector functionality including CRUD operations, validation, queries, pagination, indexes, relationships, permissions, and numerous edge cases (infinity, NaN, sparse arrays, etc.). The tests properly skip when vector support is unavailable.
The static analysis warnings about unused variables are false positives—the variables are used in assertions or for setup.
However, three issues require fixes:
- Lines 1262 & 1343: Relationship attribute parameter order (critical)
- Lines 666-701: Missing pagination tie-breakers (recommended)
- Lines 906-910: Too-strict HNSW result assertions (recommended)
These are flagged in separate comments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/Database/Validator/Index.php (2)
358-363: Make orders check consistent with lengths by using array_filter.Line 360 uses
!empty($orders)while the lengths check correctly usesarray_filter. Pre-filled arrays with empty values would incorrectly fail the orders check.Apply this diff:
$orders = $index->getAttribute('orders', []); $lengths = $index->getAttribute('lengths', []); - if (!empty($orders) || \count(\array_filter($lengths)) > 0) { + if (\count(\array_filter($orders)) > 0 || \count(\array_filter($lengths)) > 0) { $this->message = 'Vector indexes do not support orders or lengths'; return false; }
287-290: Use array_filter for orders check and add lengths validation.
!empty($orders)false-positives when the orders array contains empty values. Additionally, spatial indexes should also reject lengths.Apply this diff:
- if (!empty($orders) && !$this->supportForSpatialIndexOrder) { + $lengths = $index->getAttribute('lengths', []); + if (\count(\array_filter($orders)) > 0 && !$this->supportForSpatialIndexOrder) { $this->message = 'Spatial indexes with explicit orders are not supported. Remove the orders to create this index.'; return false; } + + if (\count(\array_filter($lengths)) > 0) { + $this->message = 'Spatial indexes do not support lengths'; + return false; + }
🧹 Nitpick comments (2)
tests/e2e/Adapter/Scopes/VectorTests.php (2)
906-910: Relax HNSW result equality check—approximate indexes may differ.HNSW performs approximate nearest-neighbor search and does not guarantee identical top-K results, even on small datasets. Consider replacing the strict equality check with basic sanity assertions.
Apply this diff:
- // Results should be the same - $this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) - ); + // HNSW is approximate; verify both return expected count and have overlap + $this->assertCount(10, $results2); + $ids1 = array_map(fn ($d) => $d->getId(), $results1); + $ids2 = array_map(fn ($d) => $d->getId(), $results2); + $overlap = count(array_intersect($ids1, $ids2)); + $this->assertGreaterThanOrEqual(7, $overlap, 'Expected significant overlap between indexed and non-indexed results');
689-705: Add deterministic tie-breaker for cursor-based pagination.Vector distance ordering can produce non-deterministic results when distances are equal. Append
Query::orderAsc('$id')to ensure cursors work reliably.Apply this diff:
// Test with cursor pagination $firstBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(5) ]); $this->assertCount(5, $firstBatch); $lastDoc = $firstBatch[4]; $nextBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::cursorAfter($lastDoc), Query::limit(5) ]);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/Database/Validator/Index.php(5 hunks)src/Database/Validator/Vector.php(1 hunks)tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (3)
src/Database/Validator/Vector.php (2)
src/Database/Database.php (2)
Database(37-8255)count(7040-7089)src/Database/Validator/Query/Filter.php (1)
isValid(280-385)
src/Database/Validator/Index.php (2)
src/Database/Database.php (1)
Database(37-8255)src/Database/Document.php (2)
getAttribute(224-231)Document(12-470)
tests/e2e/Adapter/Scopes/VectorTests.php (4)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (4)
Database(37-8255)createAttribute(1734-1793)createIndex(3367-3505)count(7040-7089)src/Database/Query.php (5)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)limit(640-643)src/Database/Adapter/Postgres.php (3)
getSupportForVectors(1990-1993)createAttribute(454-484)createIndex(877-932)
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
596-596: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
605-605: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
889-889: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
902-902: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
976-976: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
983-983: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1282-1282: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1290-1290: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1354-1354: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1363-1363: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1415-1415: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1578-1578: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1586-1586: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1594-1594: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1886-1886: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1891-1891: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Unit Test
🔇 Additional comments (5)
src/Database/Validator/Index.php (3)
28-45: LGTM: Constructor properly promotes properties and initializes attributes.The typed property declaration on line 17 ensures
$attributesis initialized, and the newsupportForVectorIndexesparameter correctly extends the validator for vector index validation.
295-321: LGTM: Correctly prevents non-spatial indexes on spatial attributes.The validation properly ensures spatial attributes require spatial indexes, preventing misuse.
399-407: LGTM: New validation methods properly integrated.The spatial and vector index checks are correctly called and propagate failures appropriately.
tests/e2e/Adapter/Scopes/VectorTests.php (2)
1343-1343: Verify createAttribute parameter count matches signature.Same concern as line 1262—confirm the 10 parameters align with the actual method signature.
2537-2576: LGTM: Correctly validates single vector query restriction.The test properly ensures that only one vector similarity query can be used per find operation, as multiple vector orderings are undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/e2e/Adapter/Scopes/VectorTests.php (2)
667-701: Consider adding a tie-breaker to stabilize vector-ordered pagination.Vector distance alone can yield nondeterministic ordering when documents have equal distances. For cursor pagination (cursorAfter/cursorBefore) to work predictably, append
Query::orderAsc('$id')as a secondary sort key.Example for the queries in this test:
$page1 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(0) ]);Apply similar changes at lines 675-679, 689-692, and 697-701.
Based on learnings
906-910: Relax HNSW result assertion for approximate search.HNSW is an approximate nearest-neighbor index; exact result-set equality isn't guaranteed even on small datasets. Consider replacing strict
assertEqualswith sanity checks (count, non-empty, reasonable overlap).-// Results should be the same -$this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) -); +// Both results should have the expected count and be valid +$this->assertCount(10, $results2); +$ids2 = array_map(fn ($d) => $d->getId(), $results2); +$this->assertEquals(10, count(array_unique($ids2)), 'IDs should be unique');Based on learnings
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/Database/Database.php(23 hunks)src/Database/Validator/IndexedQueries.php(1 hunks)tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (3)
src/Database/Validator/IndexedQueries.php (1)
src/Database/Query.php (2)
getMethod(165-168)Query(8-1169)
tests/e2e/Adapter/Scopes/VectorTests.php (5)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (4)
Database(37-8255)createAttribute(1734-1793)createIndex(3367-3505)count(7040-7089)src/Database/Query.php (4)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)src/Database/Adapter/Postgres.php (3)
getSupportForVectors(1990-1993)createAttribute(454-484)createIndex(877-932)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)createAttribute(234-248)count(2583-2645)
src/Database/Database.php (7)
src/Database/Adapter/Postgres.php (2)
getSupportForVectors(1990-1993)getSupportForSpatialAttributes(2052-2055)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)count(2583-2645)getSupportForSpatialAttributes(1495-1498)src/Database/Adapter.php (3)
getSupportForVectors(1029-1029)count(825-825)getSupportForSpatialAttributes(1064-1064)src/Database/Adapter/Pool.php (3)
getSupportForVectors(408-411)count(273-276)getSupportForSpatialAttributes(433-436)src/Database/Adapter/MariaDB.php (1)
getSupportForSpatialAttributes(1847-1850)src/Database/Adapter/SQLite.php (1)
getSupportForSpatialAttributes(971-974)src/Database/Query.php (3)
Query(8-1169)isNested(930-937)getValues(181-184)
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
596-596: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
605-605: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
889-889: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
902-902: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
976-976: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
983-983: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1282-1282: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1290-1290: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1354-1354: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1363-1363: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1415-1415: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1578-1578: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1586-1586: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1594-1594: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1888-1888: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1893-1893: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (Postgres)
🔇 Additional comments (1)
tests/e2e/Adapter/Scopes/VectorTests.php (1)
1-14: Excellent comprehensive test coverage for vector functionality.This test trait provides thorough end-to-end testing of vector attributes, queries, indexes, permissions, and edge cases. The test quality is high with proper setup/cleanup and clear assertions. The static analysis warnings about unused variables are false positives typical in test code and can be safely ignored.
Two recommended refinements from previous reviews remain applicable (see separate comments): adding pagination tie-breakers and relaxing HNSW result assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/Database/Database.php (3)
77-86: Unify wording (“size” over “dimensions”) and add a BC alias constant.Project discussion prefers “size”. Add an alias and update messages; keeps BC while improving clarity.
public const MAX_VECTOR_DIMENSIONS = 16000; +// Backward-compatible alias; prefer SIZE in new code and error strings +public const MAX_VECTOR_SIZE = self::MAX_VECTOR_DIMENSIONS;- if ($size <= 0) { - throw new DatabaseException('Vector dimensions must be a positive integer'); - } - if ($size > self::MAX_VECTOR_DIMENSIONS) { - throw new DatabaseException('Vector dimensions cannot exceed ' . self::MAX_VECTOR_DIMENSIONS); - } + if ($size <= 0) { + throw new DatabaseException('Vector size must be a positive integer'); + } + if ($size > self::MAX_VECTOR_SIZE) { + throw new DatabaseException('Vector size cannot exceed ' . self::MAX_VECTOR_SIZE); + }Also applies to: 2029-2034, 2426-2431
2138-2143: Numeric component enforcement for vector defaults — LGTM.Covers recursive validation of elements. Please ensure tests exist for:
- scalar default (reject),
- non‑numeric element (reject),
- wrong length (reject).
#!/bin/bash # Look for tests exercising invalid VAR_VECTOR defaults rg -n --type=php -C2 'VAR_VECTOR|vector' tests | rg -n 'default|scalar|length|numeric' -n
3439-3446: Vector indexes: gate added; also enforce “exactly one attribute” (and type=vector).Add a cheap preflight here; keep Validator as the source of truth.
case Database::INDEX_HNSW_EUCLIDEAN: case Database::INDEX_HNSW_COSINE: case Database::INDEX_HNSW_DOT: if (!$this->adapter->getSupportForVectors()) { throw new DatabaseException('Vector indexes are not supported'); } + if (\count($attributes) !== 1) { + throw new DatabaseException('Vector indexes require exactly one attribute'); + } break;After building $indexAttributesWithTypes:
foreach ($attributes as $i => $attr) { ... } + + // Preflight: ensure the single indexed attribute is a vector + if (\in_array($type, [self::INDEX_HNSW_EUCLIDEAN, self::INDEX_HNSW_COSINE, self::INDEX_HNSW_DOT], true)) { + $attrName = $attributes[0] ?? null; + $attrType = $indexAttributesWithTypes[$attrName] ?? null; + if ($attrType !== self::VAR_VECTOR) { + throw new DatabaseException('Vector indexes must target a vector attribute'); + } + }Verify whether IndexValidator already enforces this; if yes, we can skip the preflight:
#!/bin/bash rg -n --type=php -C3 'IndexValidator|HNSW|VAR_VECTOR|Vector indexes require exactly one' srcAlso applies to: 3452-3479
🧹 Nitpick comments (2)
src/Database/Database.php (2)
2022-2050: Vector attribute validation looks solid. Minor copy tweak suggested above.Gates by capability, forbids arrays, bounds size, and validates default shape/content.
Apply the “size” copy changes from the earlier comment for consistency.
580-616: Vector filter: widen decode param type to handle non-string adapter returns.The decode function parameter should accept
mixedinstead of?stringto prevent accidental TypeError if adapters return non-string values.- function (?string $value) { + function (mixed $value) { if (is_null($value)) { return null; } if (!is_string($value)) { return $value; } $decoded = json_decode($value, true); return is_array($decoded) ? $decoded : $value; }The first-class callable syntax
\floatval(...)is already supported by the project's minimum PHP version (8.1) and does not require changes.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/Database/Database.php(26 hunks)src/Database/Validator/IndexedQueries.php(2 hunks)src/Database/Validator/Vector.php(1 hunks)tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/Database/Validator/IndexedQueries.php
- src/Database/Validator/Vector.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (2)
tests/e2e/Adapter/Scopes/VectorTests.php (5)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (4)
Database(37-8277)createAttribute(1743-1802)count(7062-7111)create(1292-1311)src/Database/Query.php (5)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)or(792-795)src/Database/Adapter/Postgres.php (3)
getSupportForVectors(1990-1993)createAttribute(454-484)create(135-162)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)createAttribute(234-248)count(2583-2645)
src/Database/Database.php (5)
src/Database/Adapter/Postgres.php (2)
getSupportForVectors(1990-1993)getSupportForSpatialAttributes(2052-2055)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)count(2583-2645)getSupportForSpatialAttributes(1495-1498)src/Database/Adapter.php (3)
getSupportForVectors(1029-1029)count(825-825)getSupportForSpatialAttributes(1064-1064)src/Database/Adapter/Pool.php (3)
getSupportForVectors(408-411)count(273-276)getSupportForSpatialAttributes(433-436)src/Database/Query.php (3)
Query(8-1169)isNested(930-937)getValues(181-184)
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
596-596: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
605-605: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
889-889: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
902-902: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
976-976: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
983-983: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1282-1282: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1290-1290: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1354-1354: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1363-1363: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1415-1415: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1583-1583: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
1591-1591: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1599-1599: Avoid unused local variables such as '$doc3'. (undefined)
(UnusedLocalVariable)
1893-1893: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1898-1898: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Unit Test
🔇 Additional comments (6)
src/Database/Database.php (5)
1382-1391: Auto‑attach filters for spatial/vector attributes — LGTM.Ensures encode/decode runs consistently without caller boilerplate.
2419-2445: Mirror of vector default validation in updateAttribute — LGTM.Prevents inconsistent metadata on updates.
3601-3601: Early query type validation — LGTM.checkQueryTypes() before validation/execution is a good guardrail across getters/mutators.
Also applies to: 5110-5110, 6649-6649, 7069-7069, 7133-7133, 6846-6846
7322-7325: Decode: reverse filter order — LGTM.Ensures decode operations invert encode chain.
7732-7741: checkQueryTypes() recursion — LGTM.Clear error text; protects against non‑Query input and nested misuse.
tests/e2e/Adapter/Scopes/VectorTests.php (1)
1-2625: Excellent comprehensive test coverage for vector functionality!This test trait provides thorough end-to-end coverage of the pgvector integration, including:
- Core vector operations (dot, cosine, euclidean)
- Edge cases (INF, NaN, zero vectors, sparse arrays, nested arrays)
- Dimension boundaries (1 to 16000)
- HNSW indexes and performance
- Relationships and permissions
- Pagination and filtering
- Validation and error handling
The test structure is solid—all tests properly check adapter support and return early when vectors aren't available.
Optional improvements (from previous reviews, still pending):
- Lines 667-701: Consider adding
Query::orderAsc('$id')tie-breakers to vector pagination tests for deterministic ordering when distances are equal.- Lines 907-910: The HNSW performance test asserts exact equality of result IDs, but HNSW is approximate—consider relaxing to sanity checks (count, non-empty) instead of strict equality.
Both are nice-to-have refinements rather than blocking issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/e2e/Adapter/Scopes/VectorTests.php (2)
667-701: Add deterministic tie-breaker for stable pagination with vector ordering.Vector distance ordering alone can yield unstable ordering when documents have equal distances. Append
Query::orderAsc('$id')after the vector query to ensure pagination and cursor operations behave predictably.Apply this diff:
$page1 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(0) ]); // Second page $page2 = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(3), Query::offset(3) ]); // Test with cursor pagination $firstBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::limit(5) ]); $lastDoc = $firstBatch[4]; $nextBatch = $database->find('vectorPagination', [ Query::vectorCosine('embedding', $searchVector), + Query::orderAsc('$id'), Query::cursorAfter($lastDoc), Query::limit(5) ]);
907-910: Relax HNSW result comparison—approximate search doesn't guarantee identical ordering.HNSW is an approximate nearest-neighbor algorithm. Expecting exact ID match between indexed and non-indexed results is too strict and may cause flaky test failures. Replace with relaxed sanity checks.
Apply this diff:
- // Results should be the same - $this->assertEquals( - array_map(fn ($d) => $d->getId(), $results1), - array_map(fn ($d) => $d->getId(), $results2) - ); + // Results should be valid and have reasonable overlap + $this->assertCount(10, $results2); + $this->assertNotEmpty($results2[0]->getId()); + + // Check for reasonable overlap (at least 50% common results) + $ids1 = array_map(fn ($d) => $d->getId(), $results1); + $ids2 = array_map(fn ($d) => $d->getId(), $results2); + $intersection = count(array_intersect($ids1, $ids2)); + $this->assertGreaterThanOrEqual(5, $intersection, 'HNSW should return mostly similar results');
🧹 Nitpick comments (1)
tests/unit/Validator/VectorTest.php (1)
31-47: Consider adding edge case tests for robustness.The dimension tests are solid, but consider adding edge cases:
// Test nested arrays are rejected $this->assertFalse($validator5->isValid([1.0, 2.0, [3.0], 4.0, 5.0])); // Test special float values $this->assertFalse($validator5->isValid([1.0, 2.0, INF, 4.0, 5.0])); // Infinity $this->assertFalse($validator5->isValid([1.0, 2.0, NAN, 4.0, 5.0])); // Not a numberThese would strengthen validation against unexpected inputs.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/e2e/Adapter/Scopes/VectorTests.php(1 hunks)tests/unit/Validator/VectorTest.php(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
PR: utopia-php/database#721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/VectorTests.php
🧬 Code graph analysis (2)
tests/unit/Validator/VectorTest.php (1)
src/Database/Validator/Vector.php (5)
Vector(7-96)isValid(41-71)getDescription(28-31)getType(92-95)isArray(80-83)
tests/e2e/Adapter/Scopes/VectorTests.php (5)
src/Database/Adapter.php (1)
Adapter(16-1325)src/Database/Database.php (4)
Database(37-8277)createAttribute(1743-1802)createIndex(3389-3527)count(7062-7111)src/Database/Query.php (6)
Query(8-1169)vectorDot(1141-1144)vectorCosine(1153-1156)vectorEuclidean(1165-1168)limit(640-643)or(792-795)src/Database/Adapter/Postgres.php (3)
getSupportForVectors(1990-1993)createAttribute(454-484)createIndex(877-932)src/Database/Adapter/SQL.php (3)
getSupportForVectors(1535-1538)createAttribute(234-248)count(2583-2645)
🪛 PHPMD (2.15.0)
tests/e2e/Adapter/Scopes/VectorTests.php
596-596: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
605-605: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
889-889: Avoid unused local variables such as '$timeWithoutIndex'. (undefined)
(UnusedLocalVariable)
902-902: Avoid unused local variables such as '$timeWithIndex'. (undefined)
(UnusedLocalVariable)
976-976: Avoid unused local variables such as '$doc1'. (undefined)
(UnusedLocalVariable)
983-983: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1282-1282: Avoid unused local variables such as '$child1'. (undefined)
(UnusedLocalVariable)
1290-1290: Avoid unused local variables such as '$child2'. (undefined)
(UnusedLocalVariable)
1354-1354: Avoid unused local variables such as '$book1'. (undefined)
(UnusedLocalVariable)
1363-1363: Avoid unused local variables such as '$book2'. (undefined)
(UnusedLocalVariable)
1415-1415: Avoid unused local variables such as '$doc2'. (undefined)
(UnusedLocalVariable)
1890-1890: Avoid unused local variables such as '$update1'. (undefined)
(UnusedLocalVariable)
1895-1895: Avoid unused local variables such as '$update2'. (undefined)
(UnusedLocalVariable)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (SQLite)
🔇 Additional comments (3)
tests/e2e/Adapter/Scopes/VectorTests.php (1)
1-10: Comprehensive vector test suite with excellent edge case coverage.This test trait provides thorough end-to-end coverage of vector functionality including:
- Attribute creation and validation with dimension checks
- Document operations with vector data
- All three vector query types (dot product, cosine, euclidean)
- HNSW index creation and performance
- Pagination (offset, cursor forward/backward)
- Permissions and authorization
- Extensive edge cases (INF, NaN, null, empty, special arrays, boundary dimensions)
- Relationships with vectors
- Concurrent updates and data integrity
The test structure is well-organized with proper adapter capability checks and cleanup. The two recommended refactors noted above will improve test stability.
tests/unit/Validator/VectorTest.php (2)
49-56: LGTM!The description assertions correctly match the Vector validator implementation and test multiple dimension values.
58-63: LGTM with a semantic note.The assertions correctly test the Vector validator's behavior. Note that
isArray() === falsewhilegetType() === 'array'might seem counterintuitive, but this appears to be intentional framework behavior whereisArray()indicates whether the validator accepts an array of items rather than a single array value.If this semantic distinction is unclear, verify it aligns with the Utopia Database framework conventions for validator types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Database/Adapter/Postgres.php (1)
868-879: HNSW index constraints: single vector column, no sort orders; enforce and strip orders.Current code allows multiple attrs and ASC/DESC to leak into USING HNSW, producing invalid SQL. Validate and omit orders for HNSW; optionally assert the attribute is vector using $indexAttributeTypes.
public function createIndex(string $collection, string $id, string $type, array $attributes, array $lengths, array $orders, array $indexAttributeTypes = []): bool { $collection = $this->filter($collection); $id = $this->filter($id); + $isHnsw = \in_array($type, [ + Database::INDEX_HNSW_EUCLIDEAN, + Database::INDEX_HNSW_COSINE, + Database::INDEX_HNSW_DOT, + ], true); + if ($isHnsw) { + if (\count($attributes) !== 1) { + throw new DatabaseException('HNSW indexes must reference exactly one vector attribute'); + } + if (!empty($orders)) { + throw new DatabaseException('HNSW indexes do not support attribute order directives'); + } + $attrName = $attributes[0] ?? ''; + if (($indexAttributeTypes[$attrName] ?? null) !== Database::VAR_VECTOR) { + throw new DatabaseException('HNSW indexes require a vector attribute'); + } + $this->ensurePgVectorExtension(); + } + foreach ($attributes as $i => $attr) { - $order = empty($orders[$i]) || Database::INDEX_FULLTEXT === $type ? '' : $orders[$i]; + $order = (empty($orders[$i]) || Database::INDEX_FULLTEXT === $type || $isHnsw) ? '' : $orders[$i]; $attr = match ($attr) { '$id' => '_uid', '$createdAt' => '_createdAt', '$updatedAt' => '_updatedAt', default => $this->filter($attr), }; $attributes[$i] = "\"{$attr}\" {$order}"; } @@ - // Add USING clause for special index types + // Add USING clause for special index types $sql .= match ($type) { Database::INDEX_SPATIAL => " USING GIST ({$attributes})", Database::INDEX_HNSW_EUCLIDEAN => " USING HNSW ({$attributes} vector_l2_ops)", Database::INDEX_HNSW_COSINE => " USING HNSW ({$attributes} vector_cosine_ops)", Database::INDEX_HNSW_DOT => " USING HNSW ({$attributes} vector_ip_ops)", default => " ({$attributes})", };Also applies to: 895-899, 902-909
♻️ Duplicate comments (2)
src/Database/Adapter/Postgres.php (2)
444-452: Vector attribute: validate + ensure pgvector extension (missing call).You validate dimensions but don’t ensure the extension is installed. This will break on existing DBs that weren’t created by this adapter’s create(). Add the ensure call.
- // Ensure pgvector extension is installed for vector types - if ($type === Database::VAR_VECTOR) { + // Ensure pgvector extension is installed for vector types + if ($type === Database::VAR_VECTOR) { + $this->ensurePgVectorExtension(); if ($size <= 0) { throw new DatabaseException('Vector dimensions must be a positive integer'); } if ($size > Database::MAX_VECTOR_DIMENSIONS) { throw new DatabaseException('Vector dimensions cannot exceed ' . Database::MAX_VECTOR_DIMENSIONS); } }
555-570: Update attribute to vector: ensure extension before type cast/alter.Same gap on updates; ensure the extension then proceed.
- if ($type === Database::VAR_VECTOR) { + if ($type === Database::VAR_VECTOR) { + $this->ensurePgVectorExtension(); if ($size <= 0) { throw new DatabaseException('Vector dimensions must be a positive integer'); } if ($size > Database::MAX_VECTOR_DIMENSIONS) { throw new DatabaseException('Vector dimensions cannot exceed ' . Database::MAX_VECTOR_DIMENSIONS); } }
🧹 Nitpick comments (2)
src/Database/Adapter/Postgres.php (2)
150-152: Centralize extension install and call from attribute/index flows, not just on database create.Replace direct CREATE EXTENSION calls with a helper and reuse it from createAttribute/updateAttribute/createIndex so existing DBs don’t fail on first vector use.
Apply:
- // Enable extensions - $this->getPDO()->prepare('CREATE EXTENSION IF NOT EXISTS postgis')->execute(); - $this->getPDO()->prepare('CREATE EXTENSION IF NOT EXISTS vector')->execute(); + // Enable extensions + $this->ensurePostGIS(); + $this->ensurePgVectorExtension();Add (outside this hunk):
protected function ensurePostGIS(): void { $this->getPDO()->prepare('CREATE EXTENSION IF NOT EXISTS postgis')->execute(); } protected function ensurePgVectorExtension(): void { $this->getPDO()->prepare('CREATE EXTENSION IF NOT EXISTS vector')->execute(); }
1698-1727: Harden vector ORDER BY builder: input validation + avoid first-class callable.
- Validate a non-empty numeric array is provided.
- Prefer array_map('floatval', ...) (wider PHP compatibility) over first-class callable.
- $values = $query->getValues(); - $vectorArray = $values[0] ?? []; - $vector = '[' . implode(',', \array_map(\floatval(...), $vectorArray)) . ']'; + $values = $query->getValues(); + $vectorArray = $values[0] ?? []; + if (!\is_array($vectorArray) || $vectorArray === []) { + throw new DatabaseException('Vector query requires a non-empty numeric array'); + } + $vector = '[' . \implode(',', \array_map('floatval', $vectorArray)) . ']'; $binds[":vector_{$placeholder}"] = $vector;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Database/Adapter/Postgres.php(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Database/Adapter/Postgres.php (5)
src/Database/Adapter/SQL.php (7)
getPDO(1745-1748)getSQLType(1672-1678)getSQLTable(1736-1739)getSpatialGeomFromText(1547-1559)getVectorDistanceOrder(1597-1600)getInternalKeyForAttribute(1950-1962)getSupportForVectors(1535-1538)src/Database/Database.php (2)
Database(37-8277)getNamespace(800-803)src/Database/Adapter/MariaDB.php (2)
getSQLType(1591-1645)quote(1832-1835)src/Database/Adapter/SQLite.php (1)
getSQLTable(1080-1083)src/Database/Query.php (1)
Query(8-1169)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (SQLite)
🔇 Additional comments (4)
src/Database/Adapter/Postgres.php (4)
1508-1511: SRID handling for distance in meters looks good.
Using DEFAULT_SRID with ST_SetSRID aligns geography casts with global config.
1758-1758: Type mappings: DEFAULT_SRID for geometry and VECTOR(size) are correct.
The mappings align with PostGIS/pgvector expectations.Also applies to: 1797-1807
1955-1964: Capability flag exposed correctly.
getSupportForVectors() returning true is appropriate for this adapter.
1635-1639: No issues found; vector distance calculations are correctly processed through ORDER BY.The verification confirms that vector filter queries (DOT, COSINE, EUCLIDEAN) bypass the WHERE clause by design, and are correctly handled via the ORDER BY path. The
getVectorDistanceOrder()method is invoked for each vector query (SQL.php line 2482), and the returned distance calculations are merged into the final ORDER BY clause (lines 2490–2493). The integration is complete and working as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/Database/Validator/Index.php (2)
324-367: Usearray_filterfor orders check to match lengths.Line 361 uses
!empty($orders)which can incorrectly reject vector indexes if the orders array is populated with empty placeholder values. For consistency with the lengths check on the same line, usearray_filter.Apply this diff:
- if (!empty($orders) || \count(\array_filter($lengths)) > 0) { + if (\count(\array_filter($orders)) > 0 || \count(\array_filter($lengths)) > 0) { $this->message = 'Vector indexes do not support orders or lengths'; return false; }
252-294: Fix the orders check and add lengths validation.Two issues in this method:
- Line 288 uses
!empty($orders)which will incorrectly reject indexes when the orders array contains only empty string placeholders. Usearray_filterto detect actual order values.- Missing validation to reject length specifications for spatial indexes.
Apply this diff:
+ $lengths = $index->getAttribute('lengths', []); if (!empty($orders) && !$this->supportForSpatialIndexOrder) { - $this->message = 'Spatial indexes with explicit orders are not supported. Remove the orders to create this index.'; + if (\count(\array_filter($orders)) > 0 && !$this->supportForSpatialIndexOrder) { + $this->message = 'Spatial indexes with explicit orders are not supported. Remove the orders to create this index.'; return false; } + + if (\count(\array_filter($lengths)) > 0) { + $this->message = 'Spatial indexes do not support lengths'; + return false; + }src/Database/Validator/Query/Filter.php (1)
238-249: Good guard: restrict vector queries to vector, non-array attributes.This prevents misuse early. Consider deduping with the vector cases in isValid() to avoid double-checking in two places, or leave as-is for defense in depth.
Note: Using “size” over “dimensions” here aligns with prior review guidance.
🧹 Nitpick comments (1)
src/Database/Validator/Query/Filter.php (1)
159-177: Harden vector value validation: enforce list, finite numerics, and valid size config.Current checks allow associative arrays, zero/undefined size, and ±INF/NaN. Tighten validation and align messages.
case Database::VAR_VECTOR: - // For vector queries, validate that the value is an array of floats - if (!is_array($value)) { + // For vector queries, validate that the value is a list of finite numerics with correct size + if (!\is_array($value)) { $this->message = 'Vector query value must be an array'; return false; } - foreach ($value as $component) { - if (!is_numeric($component)) { - $this->message = 'Vector query value must contain only numeric values'; - return false; - } - } - // Check size match - $expectedSize = $attributeSchema['size'] ?? 0; - if (count($value) !== $expectedSize) { - $this->message = "Vector query value must have {$expectedSize} elements"; - return false; - } + // Ensure sequential integer keys (list) + if (\array_values($value) !== $value) { + $this->message = 'Vector query value must be a list (sequential array)'; + return false; + } + $expectedSize = $attributeSchema['size'] ?? null; + if (!\is_int($expectedSize) || $expectedSize <= 0) { + $this->message = 'Vector attribute "' . $attribute . '" is missing a valid size configuration'; + return false; + } + if (\count($value) !== $expectedSize) { + $this->message = "Vector query value must have {$expectedSize} elements"; + return false; + } + foreach ($value as $component) { + if (!\is_numeric($component) || !\is_finite((float)$component)) { + $this->message = 'Vector query value must contain only finite numeric values'; + return false; + } + } continue 2;Optional: reuse the dedicated Vector validator introduced in this PR for a single source of truth (signature permitting), to avoid drift from attribute creation rules.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/Database/Validator/Index.php(5 hunks)src/Database/Validator/Query/Filter.php(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/Database/Validator/Index.php (2)
src/Database/Database.php (1)
Database(37-8277)src/Database/Document.php (2)
getAttribute(224-231)Document(12-470)
src/Database/Validator/Query/Filter.php (3)
src/Database/Database.php (2)
Database(37-8277)count(7062-7111)src/Database/Query.php (1)
Query(8-1169)src/Database/Validator/Query/Order.php (1)
isValidAttribute(29-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Adapter Tests (SharedTables/SQLite)
- GitHub Check: Adapter Tests (MySQL)
- GitHub Check: Adapter Tests (SharedTables/Postgres)
- GitHub Check: Adapter Tests (Postgres)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (SQLite)
- GitHub Check: Adapter Tests (SharedTables/MySQL)
- GitHub Check: Adapter Tests (Mirror)
- GitHub Check: Adapter Tests (MariaDB)
- GitHub Check: Adapter Tests (SharedTables/MariaDB)
- GitHub Check: Unit Test
🔇 Additional comments (3)
src/Database/Validator/Index.php (3)
29-46: Constructor refactoring looks good.The migration to constructor-promoted properties is clean and the
$attributesarray is properly initialized in the loop body. The new$supportForVectorIndexesparameter aligns with the broader vector support expansion across the codebase.
296-322: LGTM!This method correctly prevents creating non-spatial indexes (KEY, FULLTEXT, etc.) on spatial attributes, which would be invalid. The logic and error messaging are clear.
400-408: LGTM!The new validation methods are properly integrated into
isValid, following the existing early-return pattern. The ordering is logical: spatial index validation → prevent non-spatial indexes on spatial attributes → vector index validation.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit