Add min score linear retriever #129359

Merged
merged 108 commits into from
Jun 12, 2025
108 commits
12fb2fa
propgating retrievers to inner retrievers
mridula-s109 Jun 2, 2025
81e99b6
test feature taken care of
mridula-s109 Jun 6, 2025
05fb0ab
Merge branch 'elastic:main' into main
mridula-s109 Jun 6, 2025
605c035
Small changes in concurrent multipart upload interfaces (#128977)
tlrx Jun 6, 2025
2dca633
Unmute FollowingEngineTests#testProcessOnceOnPrimary() test (#129054)
martijnvg Jun 6, 2025
4c0e3c9
[Build] Add support for publishing to maven central (#128659)
breskeby Jun 6, 2025
e2189e6
ESQL: Check for errors while loading blocks (#129016)
nik9000 Jun 6, 2025
aec1688
Make `PhaseCacheManagementTests` project-aware (#129047)
nielsbauman Jun 6, 2025
8c423ce
Vector test tools (#128934)
benwtrent Jun 6, 2025
df3ef0d
ES|QL: refactor generative tests (#129028)
luigidellaquila Jun 6, 2025
0eebc8c
Add a test of LOOKUP JOIN against a time series index (#129007)
bpintea Jun 6, 2025
b1e15f0
Make ILM `ClusterStateWaitStep` project-aware (#129042)
nielsbauman Jun 6, 2025
846b09a
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT tes…
elasticsearchmachine Jun 6, 2025
a97d582
Remove `ClusterState` param from ILM `AsyncBranchingStep` (#129076)
nielsbauman Jun 6, 2025
763b502
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT tes…
elasticsearchmachine Jun 6, 2025
8a660c8
Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT t…
elasticsearchmachine Jun 6, 2025
aa16175
Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT t…
elasticsearchmachine Jun 6, 2025
6e58b1e
Mute org.elasticsearch.packaging.test.DockerTests test081SymlinksAreF…
elasticsearchmachine Jun 7, 2025
05f70f0
Threadpool merge executor is aware of available disk space (#127613)
albertzaharovits Jun 8, 2025
713ab42
Add option to include or exclude vectors from _source retrieval (#128…
jimczi Jun 9, 2025
0776562
Remove direct minScore propagation to inner retrievers
mridula-s109 Jun 9, 2025
f145d26
cleaned up skip
mridula-s109 Jun 9, 2025
d8b6897
Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDis…
elasticsearchmachine Jun 9, 2025
82c7ab1
Add transport version for ML inference Mistral chat completion (#129033)
Jan-Kazlouski-elastic Jun 9, 2025
eca383d
Correct index path validation (#129144)
benwtrent Jun 9, 2025
fb6ec9a
Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDis…
elasticsearchmachine Jun 9, 2025
6806b24
Implemented completion task for Google VertexAI (#128694)
leo-hoet Jun 9, 2025
0ef36a1
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 9, 2025
ece13d9
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 9, 2025
36cd91e
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 10, 2025
74b431d
ES|QL - kNN function initial support (#127322)
carlosdelest Jun 10, 2025
c678ebd
Remove optional seed from ES|QL SAMPLE (#128887)
jan-elastic Jun 10, 2025
7d37afa
[Inference API] Add "rerank" task type to "elastic" provider (#126022)
timgrein Jun 10, 2025
eed00f4
Rename target destination for microbenchmarks (#128878)
idegtiarenko Jun 10, 2025
f768664
Include direct memory and non-heap memory in ML memory calculations (…
jan-elastic Jun 10, 2025
2d605ee
Throw better exception for unsupported aggregations over shape fields…
iverase Jun 10, 2025
b68ddd1
Update Test Framework To Handle Query Rewrites That Rely on Non-Null …
Mikep86 Jun 10, 2025
f1bf18e
Update ReproduceInfoPrinter to correctly print a reproduction line fo…
mosche Jun 10, 2025
9abfe1d
Increment inference stats counter for shard bulk inference calls (#12…
jimczi Jun 10, 2025
2fa185a
Synthetic source: avoid storing multi fields of type text and match_o…
martijnvg Jun 10, 2025
ac213d5
Adding `scheduled_report_id` field to kibana reporting template (#127…
ymao1 Jun 10, 2025
01de61e
ES|QL: Add FORK generative tests (#129135)
ioanatia Jun 10, 2025
f48c383
ES|QL Completion command syntax change (#129189)
afoucret Jun 10, 2025
efc6450
Remove optional seed from ES|QL SAMPLE (#128887)
jan-elastic Jun 10, 2025
3b1c5d6
ES|QL Completion command syntax change (#129189)
afoucret Jun 10, 2025
b304882
Remove optional seed from ES|QL SAMPLE (#128887)
jan-elastic Jun 10, 2025
219b424
ES|QL Completion command syntax change (#129189)
afoucret Jun 10, 2025
640d9b5
Add Cluster Feature for L2 Norm (#129181)
mridula-s109 Jun 10, 2025
29f3079
Fix DRA dependenciesInfo task dependency resolution (#129209)
breskeby Jun 10, 2025
35f8315
IVF Hierarchical KMeans Flush & Merge (#128675)
john-wagster Jun 10, 2025
14a2956
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeForkIT tes…
elasticsearchmachine Jun 10, 2025
70cc427
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeForkIT tes…
elasticsearchmachine Jun 10, 2025
d941e9b
[ES|QL] Specify population in StdDev docs (#129225)
limotova Jun 10, 2025
cb49533
Unmute IngestGeoIpClientYamlTestSuiteIT (#129178)
samxbr Jun 11, 2025
941f2f1
Fix an NPE in the ES|QL completion command. (#129235)
afoucret Jun 11, 2025
2e69b33
ESQL: fix bwc test by adding min required version (#129204)
bpintea Jun 11, 2025
5f897ea
ESQL: Fix test by add excluding capability (#129202)
bpintea Jun 11, 2025
266baaa
Fix vault field name (#129184)
idegtiarenko Jun 11, 2025
44fac93
Remove all usages of Metadata customs removal methods (#129043)
nielsbauman Jun 11, 2025
83fc5bc
Replace tuple with record (#128976)
idegtiarenko Jun 11, 2025
2f85047
improve support for bytecode patching signed jars (#128613)
richard-dennehy Jun 11, 2025
6c42ac3
rename ES|QL sample capability (#129193)
jan-elastic Jun 11, 2025
044d810
ESQL: Mute GenerativeForkIT for some LOOKUP JOIN tests (#129248)
alex-spies Jun 11, 2025
4b3adc5
ESQL: Extend `RENAME` syntax to allow a `new = old` syntax (#129212)
bpintea Jun 11, 2025
1a9e672
[DOCS] Adds preview tag to the CHANGE_POINT ES|QL command in the comm…
szabosteve Jun 11, 2025
3e79b2a
ESQL: Skip unused STATS groups by adding a Top N BlockHash implementa…
ivancea Jun 11, 2025
601634d
Add "Searchable Snapshots" to changelog validation schema (#129180)
tlrx Jun 11, 2025
4000c4e
ESQL: Fix FieldAttribute name usage in InferNonNullAggConstraint (#12…
alex-spies Jun 11, 2025
4205dc5
Remove usages of `Metadata.Builder#indexGraveyard` (#129041)
nielsbauman Jun 11, 2025
1c6bb76
Mute org.elasticsearch.compute.data.sort.LongTopNSetTests testCrankyB…
elasticsearchmachine Jun 11, 2025
3e25956
Enable Shard-Level Search-load rate metric (#128660)
drempapis Jun 11, 2025
3e6d15a
[ESQL] Fix typo in search-functions.md (#129260)
leemthompo Jun 11, 2025
fad24a4
ESQL: Log partial failures (#129164)
nik9000 Jun 11, 2025
5a23779
Update Gradle wrapper to 8.14.2 (#129179)
breskeby Jun 11, 2025
5acb2fa
Fix ivf nodestats impl for getOffHeapByteSize (#129259)
benwtrent Jun 11, 2025
29192c8
feat: enable date_detection for all apm data streams (#128913)
kruskall Jun 11, 2025
bc7d5b2
[BC Upgrage] Fix incorrect version parsing in tests (#129243)
ldematte Jun 11, 2025
41b201c
[Build] Build maven aggregation zip as part of DRA build (#129175)
breskeby Jun 11, 2025
92ab146
Throttle indexing when disk IO throttling is disabled (#129245)
albertzaharovits Jun 11, 2025
c033175
Register match_phrase as a function not a snapshot function (#129255)
kderusso Jun 11, 2025
babfc86
[Gradle] Spotless plugin update (#115750)
breskeby Jun 11, 2025
2726f50
Adding support to exclude semantic_text subfields (#127664)
Samiul-TheSoccerFan Jun 11, 2025
73df454
Revert "[Gradle] Spotless plugin update (#115750)"
breskeby Jun 11, 2025
4f01aaf
Switch IVF Writer to ES Logger (#129224)
john-wagster Jun 11, 2025
5295e8d
Add heap usage estimate to ClusterInfo (#128723)
nicktindall Jun 12, 2025
99acf4c
Revert "Use IndexOrDocValuesQuery in NumberFieldType#termQuery implem…
iverase Jun 12, 2025
a610698
Delegated authorization using Microsoft Graph (SDK) (#128396)
richard-dennehy Jun 12, 2025
cae2843
Add `none` chunking strategy to disable automatic chunking for infere…
jimczi Jun 12, 2025
6852ded
Fix broken bwc logic in text field mapper introduced by #129126 (#129…
martijnvg Jun 12, 2025
3f1fba0
[ESQL] Fix SpatialDocValuesExtraction rule replacing TimeSeries agg n…
ivancea Jun 12, 2025
3fc3440
Make `TransportMoveToStepAction` project-aware (#129252)
nielsbauman Jun 12, 2025
d1391d5
[DOCS] Adds term vectors API examples (#129328)
szabosteve Jun 12, 2025
ce949a8
[ESQL] Fix TopNSetTestCase test and unmute it (#129327)
ivancea Jun 12, 2025
6645bf9
ESQL: Change queries ID to be the same as the async (#127472)
GalLalouche Jun 12, 2025
dad2394
Adjust unpromotable shard refresh request validation to allow Refresh…
tlrx Jun 12, 2025
6010898
Add a Multi-Project Search Rest Test (#128657)
tvernum Jun 12, 2025
194a221
Modified LinearRetriever to include minScore
mridula-s109 Jun 12, 2025
c961723
cleaned up
mridula-s109 Jun 12, 2025
a7a0ba3
Made the same changes we did in textSimilarity
mridula-s109 Jun 12, 2025
c6a12df
Fixed a minor error
mridula-s109 Jun 12, 2025
5180d51
cleaned up
mridula-s109 Jun 12, 2025
a24f243
Minscore is working :)
mridula-s109 Jun 12, 2025
f57a793
chore: empty commit to trigger PR update
mridula-s109 Jun 12, 2025
7080a57
merged conflict
mridula-s109 Jun 12, 2025
e789e51
Update docs/changelog/129359.yaml
mridula-s109 Jun 12, 2025
e993a20
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 Jun 12, 2025
8939675
Update 10_linear_retriever.yml
mridula-s109 Jun 12, 2025
d200742
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 12, 2025
5 changes: 5 additions & 0 deletions docs/changelog/129359.yaml
@@ -0,0 +1,5 @@
+pr: 129359
+summary: Add min score linear retriever
+area: Search
+type: enhancement
+issues: []
@@ -14,6 +14,7 @@

 import static org.elasticsearch.search.retriever.CompoundRetrieverBuilder.INNER_RETRIEVERS_FILTER_SUPPORT;
 import static org.elasticsearch.xpack.rank.linear.L2ScoreNormalizer.LINEAR_RETRIEVER_L2_NORM;
+import static org.elasticsearch.xpack.rank.linear.LinearRetrieverBuilder.LINEAR_RETRIEVER_MINSCORE_FIX;
 import static org.elasticsearch.xpack.rank.linear.MinMaxScoreNormalizer.LINEAR_RETRIEVER_MINMAX_SINGLE_DOC_FIX;

 public class RankRRFFeatures implements FeatureSpecification {
@@ -27,6 +28,11 @@ public Set<NodeFeature> getFeatures() {

     @Override
     public Set<NodeFeature> getTestFeatures() {
-        return Set.of(INNER_RETRIEVERS_FILTER_SUPPORT, LINEAR_RETRIEVER_MINMAX_SINGLE_DOC_FIX, LINEAR_RETRIEVER_L2_NORM);
+        return Set.of(
+            INNER_RETRIEVERS_FILTER_SUPPORT,
+            LINEAR_RETRIEVER_MINMAX_SINGLE_DOC_FIX,
+            LINEAR_RETRIEVER_L2_NORM,
+            LINEAR_RETRIEVER_MINSCORE_FIX
+        );
     }
 }
@@ -10,6 +10,7 @@
 import org.apache.lucene.search.ScoreDoc;
 import org.elasticsearch.common.ParsingException;
 import org.elasticsearch.common.util.Maps;
+import org.elasticsearch.features.NodeFeature;
 import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.license.LicenseUtils;
 import org.elasticsearch.search.builder.SearchSourceBuilder;
@@ -46,6 +47,7 @@
  */
 public final class LinearRetrieverBuilder extends CompoundRetrieverBuilder<LinearRetrieverBuilder> {

+    public static final NodeFeature LINEAR_RETRIEVER_MINSCORE_FIX = new NodeFeature("linear_retriever_minscore_fix");
     public static final String NAME = "linear";

     public static final ParseField RETRIEVERS_FIELD = new ParseField("retrievers");
@@ -125,12 +127,35 @@ public LinearRetrieverBuilder(
         this.normalizers = normalizers;
     }

+    public LinearRetrieverBuilder(
+        List<RetrieverSource> innerRetrievers,
+        int rankWindowSize,
+        float[] weights,
+        ScoreNormalizer[] normalizers,
+        Float minScore,
+        String retrieverName,
+        List<QueryBuilder> preFilterQueryBuilders
+    ) {
+        this(innerRetrievers, rankWindowSize, weights, normalizers);
+        this.minScore = minScore;
+        if (minScore != null && minScore < 0) {
+            throw new IllegalArgumentException("[min_score] must be greater than or equal to 0, was: [" + minScore + "]");
+        }
+        this.retrieverName = retrieverName;
+        this.preFilterQueryBuilders = preFilterQueryBuilders;
+    }
+
     @Override
     protected LinearRetrieverBuilder clone(List<RetrieverSource> newChildRetrievers, List<QueryBuilder> newPreFilterQueryBuilders) {
-        LinearRetrieverBuilder clone = new LinearRetrieverBuilder(newChildRetrievers, rankWindowSize, weights, normalizers);
-        clone.preFilterQueryBuilders = newPreFilterQueryBuilders;
-        clone.retrieverName = retrieverName;
-        return clone;
+        return new LinearRetrieverBuilder(
+            newChildRetrievers,
+            rankWindowSize,
+            weights,
+            normalizers,
+            minScore,
+            retrieverName,
+            newPreFilterQueryBuilders
+        );
     }

     @Override
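
The validation in the added constructor can be exercised in isolation. The sketch below is a minimal stand-in, not the Elasticsearch class: `MinScoreValidation` and `validate` are hypothetical names, and only the null check, the comparison, and the message text are taken from the hunk above.

```java
// Hypothetical stand-in for the min_score check in the added constructor.
final class MinScoreValidation {

    // Returns the value unchanged when it is null (unset) or non-negative;
    // otherwise throws with the same message text as the diff.
    static Float validate(Float minScore) {
        if (minScore != null && minScore < 0) {
            throw new IllegalArgumentException("[min_score] must be greater than or equal to 0, was: [" + minScore + "]");
        }
        return minScore;
    }

    public static void main(String[] args) {
        System.out.println(validate(null)); // unset is allowed
        System.out.println(validate(0f));   // zero is allowed
        try {
            validate(-0.5f);                // negative values are rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because `clone` now forwards `minScore` through this constructor, a cloned builder goes through the same check as one built directly.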
@@ -181,6 +206,10 @@ protected RankDoc[] combineInnerRetrieverResults(List<ScoreDoc[]> rankResults, b
             topResults[rank] = sortedResults[rank];
             topResults[rank].rank = rank + 1;
         }
+        // Filter by minScore if set(inclusive)
+        if (minScore != null) {
+            topResults = Arrays.stream(topResults).filter(doc -> doc.score >= minScore).toArray(LinearRankDoc[]::new);
+        }
         return topResults;
     }
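
To see the inclusive filter on its own, here is a minimal sketch that assumes a simplified `Doc` record in place of `LinearRankDoc`. The names `MinScoreFilter`, `Doc`, and `filterByMinScore` are illustrative and do not appear in the PR. As in the hunk above, the `rank` assigned before filtering is left untouched.

```java
import java.util.Arrays;

final class MinScoreFilter {

    // Simplified, hypothetical stand-in for LinearRankDoc: only id, score, and rank.
    record Doc(String id, float score, int rank) {}

    // Keeps docs whose score is >= minScore (inclusive); a null minScore disables filtering.
    static Doc[] filterByMinScore(Doc[] topResults, Float minScore) {
        if (minScore == null) {
            return topResults;
        }
        return Arrays.stream(topResults).filter(doc -> doc.score() >= minScore).toArray(Doc[]::new);
    }

    public static void main(String[] args) {
        Doc[] ranked = { new Doc("4", 1.0f, 1), new Doc("3", 0.8f, 2), new Doc("2", 0.5f, 3) };
        // A doc scoring exactly the threshold is kept because the comparison is inclusive.
        for (Doc doc : filterByMinScore(ranked, 0.8f)) {
            System.out.println(doc.id() + " " + doc.score());
        }
    }
}
```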

@@ -319,7 +319,7 @@ setup:
- close_to: { hits.hits.2._score: { value: 1.6, error: 0.001 } }
- match: { hits.hits.3._id: "3" }
- close_to: { hits.hits.3._score: { value: 1.2, error: 0.001} }

---
"should handle all zero scores in normalization":
- requires:
@@ -1196,6 +1196,111 @@ setup:
rank_window_size: -10
- match: { status: 400 }

+---
+"linear retriever respects min_score after normalization":
+
+  - requires:
+      cluster_features: [ "linear_retriever_minscore_fix" ]
+      reason: test min_score functionality for linear retriever
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            linear:
+              retrievers:
+                - retriever:
+                    standard:
+                      query:
+                        function_score:
+                          query:
+                            match_all: {}
+                          functions:
+                            - filter: { term: { _id: "1" } }
+                              weight: 1
+                            - filter: { term: { _id: "2" } }
+                              weight: 2
+                            - filter: { term: { _id: "3" } }
+                              weight: 3
+                            - filter: { term: { _id: "4" } }
+                              weight: 4
+                  weight: 1.0
+                  normalizer: "minmax"
+              rank_window_size: 10
+              min_score: 0.8
+          size: 10
+
+  - match: { hits.total.value: 1 }
+  - length: { hits.hits: 1 }
+  - match: { hits.hits.0._id: "4" }
+
+---
+"linear retriever with min_score zero includes all docs":
+
+  - requires:
+      cluster_features: [ "linear_retriever_minscore_fix" ]
+      reason: test min score functionality for linear retriever
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            linear:
+              retrievers: [
+                {
+                  retriever: {
+                    standard: {
+                      query: {
+                        match_all: {}
+                      }
+                    }
+                  },
+                  weight: 1.0,
+                  normalizer: "minmax"
+                }
+              ]
+              rank_window_size: 10
+              min_score: 0
+          size: 10
+
+  - match: { hits.total.value: 4 }
+  - length: { hits.hits: 4 }
+
+---
+"linear retriever with high min_score excludes all docs":
+
+  - requires:
+      cluster_features: [ "linear_retriever_minscore_fix" ]
+      reason: test min score functionality for linear retriever
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            linear:
+              retrievers: [
+                {
+                  retriever: {
+                    standard: {
+                      query: {
+                        match_all: {}
+                      }
+                    }
+                  },
+                  weight: 1.0,
+                  normalizer: "minmax"
+                }
+              ]
+              rank_window_size: 10
+              min_score: 2.0
+          size: 10
+
+  - match: { hits.total.value: 0 }
+  - length: { hits.hits: 0 }
+
---
"minmax normalization properly handles a single doc result set":
- requires:
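
The expectation in the first new test (only `_id: "4"` survives `min_score: 0.8`) can be checked by hand. The sketch below assumes that each document's raw score equals its `function_score` weight (1, 2, 3, 4), that `minmax` maps a score `s` to `(s - min) / (max - min)`, and that the retriever weight multiplies the normalized score. All three are assumptions about behaviour outside this diff, and `MinMaxWorkedExample` is a hypothetical name, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;

final class MinMaxWorkedExample {

    // Assumed min-max normalization: (s - min) / (max - min).
    // Requires at least one score and max > min; the all-equal case is out of scope here.
    static float[] minMax(float[] scores) {
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float[] normalized = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = (scores[i] - min) / (max - min);
        }
        return normalized;
    }

    // Returns the ids whose weighted, normalized score is >= minScore (inclusive).
    static List<String> survivors(String[] ids, float[] rawScores, float weight, float minScore) {
        float[] normalized = minMax(rawScores);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (weight * normalized[i] >= minScore) {
                kept.add(ids[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] ids = { "1", "2", "3", "4" };
        float[] raw = { 1f, 2f, 3f, 4f }; // assumed raw scores, one per function_score weight
        System.out.println(survivors(ids, raw, 1.0f, 0.8f));
    }
}
```

Under these assumptions the normalized scores are 0, 1/3, 2/3, and 1, so only document 4 reaches 0.8. A threshold above 1.0, such as the 2.0 used in the third test, cannot be reached by a single retriever with weight 1.0 as long as normalized scores stay within [0, 1].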