Add implicit ORDER BY Distance for VectorSearch translation #38086
VECTOR_SEARCH() results are inherently ordered by distance ascending. Add this ordering implicitly during translation so users can compose with Take() without needing an explicit OrderBy(r => r.Distance). Also remove the unnecessary forDml guard from GenerateTop() since VectorSearch() requires DbSet<T> and cannot appear in DELETE/UPDATE table lists through LINQ. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR updates EFCore.SqlServer’s VectorSearch() translation to automatically apply ORDER BY Distance ASC, reflecting SQL Server’s VECTOR_SEARCH() natural ordering so that users can compose with Take() without adding boilerplate OrderBy(r => r.Distance).
Changes:
- Add an implicit ORDER BY Distance ASC to the translated VECTOR_SEARCH() query.
- Update SQL Server functional tests to remove redundant explicit OrderBy(Distance) calls.
- Update VectorSearch() XML docs to describe the new implicit ordering behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/EFCore.SqlServer.FunctionalTests/Query/Translations/VectorTranslationsSqlServerTest.cs | Removes explicit OrderBy(Distance) from existing VectorSearch tests while keeping ORDER BY in generated SQL baselines. |
| src/EFCore.SqlServer/Query/Internal/SqlServerQueryableMethodTranslatingExpressionVisitor.cs | Introduces a shared Distance column projection and appends an implicit ordering by that column. |
| src/EFCore.SqlServer/Extensions/SqlServerQueryableExtensions.cs | Updates API remarks to state results are implicitly ordered by distance ascending. |
@copilot address the above unresolved comments.
…for VECTOR_SEARCH Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/e4de5685-2516-4055-9e2a-b8f8216313a9 Co-authored-by: roji <1862641+roji@users.noreply.github.com>
Done in e8af99c:
Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/89b8b8d6-f330-4dee-b9c9-66fdff8f250b Co-authored-by: roji <1862641+roji@users.noreply.github.com>
    var combinedLimit = _sqlExpressionFactory.Add(existingOffset, translation);

    #pragma warning disable EF1001 // Internal EF Core API usage.
    // Clear the offset so the inner subquery uses TOP(M+N) instead of OFFSET...FETCH
    selectExpression.SetOffset(null);
TranslateTake's VECTOR_SEARCH Skip/Take rewrite doesn't preserve multiple-Take semantics: if the query already has Limit (e.g. Skip(1).Take(5).Take(10)), this branch overwrites it and can return too many rows. Consider first applying the normal Take translation (so Limit becomes min(oldLimit, newLimit)), then use that effective Limit for both the inner TOP(existingOffset + effectiveLimit) and the outer FETCH, rather than using the new translation directly.
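The arithmetic behind this suggestion can be checked with a small in-memory model. This is only a sketch under stated assumptions: `EffectiveLimit` and `RewrittenSkipTake` are hypothetical helpers written for illustration, not EF Core APIs, and a plain list of integers stands in for rows already ordered by distance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not EF Core API): a second Take() can only narrow an existing limit.
static int EffectiveLimit(int? existingLimit, int newLimit)
    => existingLimit is int old ? Math.Min(old, newLimit) : newLimit;

// Hypothetical model of the rewrite: the inner query keeps TOP(offset + limit) rows
// of the distance-ordered results, then the outer query skips `offset` and fetches `limit`.
static List<int> RewrittenSkipTake(IEnumerable<int> orderedByDistance, int offset, int limit)
    => orderedByDistance
        .Take(offset + limit)   // inner: TOP(M + N)
        .Skip(offset)           // outer: OFFSET M
        .Take(limit)            // outer: FETCH NEXT N
        .ToList();

// Stand-in for rows already ordered by distance ascending.
var rows = Enumerable.Range(1, 20).ToList();

// Skip(1).Take(5).Take(10): the effective limit stays 5, it does not become 10.
var limit = EffectiveLimit(existingLimit: 5, newLimit: 10);
var expected = rows.Skip(1).Take(5).Take(10).ToList();
var actual = RewrittenSkipTake(rows, offset: 1, limit: limit);

Console.WriteLine(string.Join(", ", actual));       // prints "2, 3, 4, 5, 6"
Console.WriteLine(expected.SequenceEqual(actual));  // prints "True"
```

Using the new Take count directly (10 here) would size the inner TOP as 11 and fetch 10 rows, which is the over-return the comment describes.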
    /// doing so can result in application failures when updating to a new Entity Framework Core release.
    /// </summary>
    [EntityFrameworkInternal]
    public void SetOffset(SqlExpression? sqlExpression)
Adding SelectExpression.SetOffset(...) introduces a new (public) API surface area (even with [EntityFrameworkInternal]). The repo tracks public API in EFCore.Relational.baseline.json, so this PR likely needs the corresponding baseline update to keep API validation passing.
Suggested change:

    - public void SetOffset(SqlExpression? sqlExpression)
    + internal void SetOffset(SqlExpression? sqlExpression)
Tiny follow-up to #38075.
VECTOR_SEARCH() results are inherently ordered by distance ascending. This adds the ordering implicitly during translation so users can compose with Take() without needing an explicit OrderBy(r => r.Distance).

Before:

After:

An explicit .OrderBy() or .OrderByDescending() still overrides the implicit ordering.

We could adopt a more low-level/conservative approach, saying that since SQL Server requires the explicit ORDER BY at the SQL level, we should continue requiring it at the LINQ level. However, we already add implicit orderings elsewhere (especially when transforming JSON arrays to resultsets with OPENJSON), and I don't see the point of having a mandatory, explicit gesture where we can save users the trouble.
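The intended behavior (implicit Distance ascending unless the caller orders explicitly) can be illustrated with an in-memory model. This is a sketch only: `Translate` is a hypothetical stand-in written for illustration, not the EF Core translator or the VectorSearch() API, and the tuples stand in for search results.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model (not EF Core API): apply Distance ascending
// unless the caller supplies an ordering of their own.
static List<(string Id, double Distance)> Translate(
    IEnumerable<(string Id, double Distance)> results,
    Func<IEnumerable<(string Id, double Distance)>, IEnumerable<(string Id, double Distance)>>? explicitOrdering = null)
    => (explicitOrdering is null
            ? results.OrderBy(r => r.Distance)   // implicit ORDER BY Distance ASC
            : explicitOrdering(results))         // explicit ordering wins
        .ToList();

var hits = new (string Id, double Distance)[] { ("b", 0.7), ("a", 0.1), ("c", 0.4) };

// No explicit ordering: Take(2) returns the two nearest results.
var nearest = Translate(hits).Take(2).Select(r => r.Id).ToList();

// Explicit descending ordering replaces the implicit one.
var farthest = Translate(hits, rs => rs.OrderByDescending(r => r.Distance))
    .Take(2).Select(r => r.Id).ToList();

Console.WriteLine(string.Join(", ", nearest));   // prints "a, c"
Console.WriteLine(string.Join(", ", farthest));  // prints "b, c"
```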