[FEATURE] Add z-score for the normalization processor #376 #470

Open
wants to merge 12 commits into
base: feature/z-score-normalization
fix issues due to merge
sam-herman committed Nov 17, 2023
commit b2543364ba7793a58d8038314409f186825f0582
@@ -772,7 +772,6 @@ private String registerModelGroup() {
return modelGroupId;
}


protected List<Map<String, Object>> getNestedHits(Map<String, Object> searchResponseAsMap) {
Map<String, Object> hitsMap = (Map<String, Object>) searchResponseAsMap.get("hits");
return (List<Map<String, Object>>) hitsMap.get("hits");
@@ -787,7 +786,7 @@ protected Optional<Float> getMaxScore(Map<String, Object> searchResponseAsMap) {
Map<String, Object> hitsMap = (Map<String, Object>) searchResponseAsMap.get("hits");
return hitsMap.get("max_score") == null ? Optional.empty() : Optional.of(((Double) hitsMap.get("max_score")).floatValue());
}

/**
* Enumeration for types of pipeline processors, used to lookup resources like create
* processor request as those are type specific
@@ -66,6 +66,7 @@ public void testProcessors() {
null,
mock(IngestService.class),
null,
null,
null
);
Map<String, Processor.Factory> processors = plugin.getProcessors(processorParams);
@@ -125,6 +125,7 @@ public void testComplexQuery_withZScoreNormalization() {
NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder(
TEST_KNN_VECTOR_FIELD_NAME_1,
TEST_QUERY_TEXT,
"",
modelId,
5,
null,
@@ -95,7 +95,8 @@ public void testNormalization_whenResultFromOneShardMultipleSubQueries_thenSucce
new TopDocs(new TotalHits(0, TotalHits.Relation.EQUAL_TO), new ScoreDoc[0]),
new TopDocs(
new TotalHits(3, TotalHits.Relation.EQUAL_TO),
- // Calculated based on the formula (score - mean_score)/std for the values of mean_score = (0.9 + 0.7 + 0.1)/3 ~ 0.56, std = sqrt(((0.9 - 0.56)^2 + (0.7 - 0.56)^2 + (0.1 - 0.56)^2)/3)
+ // Calculated based on the formula (score - mean_score)/std for the values of mean_score = (0.9 + 0.7 + 0.1)/3 ~ 0.56,
+ // std = sqrt(((0.9 - 0.56)^2 + (0.7 - 0.56)^2 + (0.1 - 0.56)^2)/3)
new ScoreDoc[] { new ScoreDoc(3, 0.98058068f), new ScoreDoc(4, 0.39223227f), new ScoreDoc(2, -1.37281295f) }
Member

Can you please add a simple formula or a method as part of the code comments, so we can understand how that score is calculated from the provided individual scores? Having a reference to a method description is good, but not the same. Something like what you added for the integ test assertions would be good.

Author

done

)
)
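
For reference, the expected scores in the hunk above can be reproduced with a few lines of Java. This is only a minimal sketch of the formula quoted in the code comment, (score - mean_score)/std, and not the processor code from this PR. The class `ZScoreSketch` and the method `zScores` are hypothetical names. The sketch assumes the population standard deviation (divide by n, as the comment does) and, as a further assumption, returns 0 when std is 0.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the implementation in this PR.
final class ZScoreSketch {

    // Computes (score - mean) / std for each score, using the population
    // standard deviation (sum of squared deviations divided by n).
    static float[] zScores(float[] scores) {
        double sum = 0.0;
        for (float s : scores) {
            sum += s;
        }
        double mean = sum / scores.length;

        double squaredDeviations = 0.0;
        for (float s : scores) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        double std = Math.sqrt(squaredDeviations / scores.length);

        float[] normalized = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Assumption: map every score to 0 when all scores are equal (std == 0).
            normalized[i] = std == 0.0 ? 0.0f : (float) ((scores[i] - mean) / std);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Same input scores as in the test comment: 0.9, 0.7, 0.1.
        System.out.println(Arrays.toString(zScores(new float[] { 0.9f, 0.7f, 0.1f })));
    }
}
```

With the unrounded mean (1.7/3), the three results agree with the `ScoreDoc` values in the test (0.98058068, 0.39223227, -1.37281295) to within float rounding.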