Skip to content

Conversation

@asimmahmood1
Copy link
Contributor

@asimmahmood1 asimmahmood1 commented Apr 10, 2025

Description

  • for motivation see related issue
  • Had to modify more unit tests than expected, do review the changes

Related Issues

Resolves #17823

Check List

  • Functionality includes testing.
  • API changes companion pull request #9600, if applicable.
  • Public documentation issue/PR created, if applicable.

Testing with useSimilarity:true, took time 10ms, score 1.0

dev-dsk-asimmahm-2c-a6d21262 %  curl -X PUT "http://localhost:9200/big5/_mapping?pretty" \
-H "Content-Type: application/json" \
-d '{
  "properties": {
    "process.name": {
       "type":"keyword", "useSimilarity": false
    }
  }
}'
{
  "acknowledged" : true
}

(25-04-11 0:12:34) <0> [~/workplace/OpenSearch]
dev-dsk-asimmahm-2c-a6d21262 % curl -X POST "http://localhost:9200/big5/_search?pretty=true" \
-H "Content-Type: application/json" \
-d '{"size":3, "explain": false,
  "query": {
    "term": {
      "process.name": "kernel"
    }
  },"_source":false
}'
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "big5",
        "_id" : "xDoCtJQBE3c7bAfikzbk",
        "_score" : 1.0
      },
      {
        "_index" : "big5",
        "_id" : "xzoCtJQBE3c7bAfikzbk",
        "_score" : 1.0
      },
      {
        "_index" : "big5",
        "_id" : "yDoCtJQBE3c7bAfikzbk",
        "_score" : 1.0
      }
    ]
  }
}

Testing with useSimilarity:true, took time 200ms, score 0.8...

dev-dsk-asimmahm-2c-a6d21262 %  curl -X PUT "http://localhost:9200/big5/_mapping?pretty" \
-H "Content-Type: application/json" \
-d '{
  "properties": {
    "process.name": {
       "type":"keyword", "useSimilarity": true
    }
  }
}'
{
  "acknowledged" : true
}

(25-04-11 0:16:04) <0> [~/workplace/OpenSearch]
dev-dsk-asimmahm-2c-a6d21262 % GET "http://localhost:9200/big5/_mapping/field/process.name?pretty=true"
{
  "big5" : {
    "mappings" : {
      "process.name" : {
        "full_name" : "process.name",
        "mapping" : {
          "name" : {
            "type" : "keyword",
            "useSimilarity" : true
          }
        }
      }
    }
  }
}
dev-dsk-asimmahm-2c-a6d21262 % curl -X POST "http://localhost:9200/big5/_search?pretty=true" \
-H "Content-Type: application/json" \
-d '{"size":3, "explain": false,
  "query": {
    "term": {
      "process.name": "kernel"
    }
  },"_source":false
}'
{
  "took" : 200,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 0.8844931,
    "hits" : [
      {
        "_index" : "big5",
        "_id" : "xDoCtJQBE3c7bAfikzbk",
        "_score" : 0.8844931
      },
      {
        "_index" : "big5",
        "_id" : "xzoCtJQBE3c7bAfikzbk",
        "_score" : 0.8844931
      },
      {
        "_index" : "big5",
        "_id" : "yDoCtJQBE3c7bAfikzbk",
        "_score" : 0.8844931
      }
    ]
  }
}


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

@github-actions github-actions bot added bug Something isn't working Search Search query, autocomplete ...etc labels Apr 10, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 6b60c9c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@rishabh6788
Copy link
Contributor

{"run-benchmark-test": "id_1"}

@rishabh6788
Copy link
Contributor

{"run-benchmark-test": "id_4"}

@github-actions
Copy link
Contributor

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/2828/ . Final results will be published once the job is completed.

@asimmahmood1
Copy link
Contributor Author

asimmahmood1 commented Apr 10, 2025

Edit: n/m, code out of sync

I'm getting this error when trying to update the mapping:

dev-dsk-asimmahm-2c-a6d21262 % curl -X PUT "http://localhost:9200/big5/_mapping?pretty" \
-H "Content-Type: application/json" \
-d '{
  "properties": {
    "process.name": {
       "type":"keyword", "useSimilarity": true
    }
  }
}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "unknown parameter [useSimilarity] on mapper [name] of type [keyword]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "unknown parameter [useSimilarity] on mapper [name] of type [keyword]"
  },
  "status" : 400
}

Without that useSimilarity it works:

dev-dsk-asimmahm-2c-a6d21262 % curl -X PUT "http://localhost:9200/big5/_mapping?pretty" \
-H "Content-Type: application/json" \
-d '{
  "properties": {
    "process.name": {
       "type":"keyword"
    }
  }
}'
{
  "acknowledged" : true
}

@github-actions
Copy link
Contributor

❌ Gradle check result for 493a1d3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

* will follow up with unit tests
* making sure I'm on the right track

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
* will follow up with manually testing the new mapping parameter

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
@asimmahmood1 asimmahmood1 force-pushed the keyword_term_constant branch from 493a1d3 to ef419d5 Compare April 11, 2025 00:00
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
@github-actions
Copy link
Contributor

❌ Gradle check result for ef419d5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 16882dc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

* also check all places where we do `X instanceof BoostQuery` to also
  check for `ConstantScoreQuery`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
@asimmahmood1
Copy link
Contributor Author

Failure was real. In case I also where we do X instanceof BoostQuery to also
check for ConstantScoreQuery

»  Caused by: java.lang.IllegalArgumentException: Cannot extract a term from a query of type class org.apache.lucene.search.ConstantScoreQuery: ConstantScore(keyword_field:value1)
»  	at org.opensearch.index.mapper.MappedFieldType.extractTerm(MappedFieldType.java:501) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
»  	at org.opensearch.index.query.SpanTermQueryBuilder.doToQuery(SpanTermQueryBuilder.java:104) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
»  	at org.opensearch.index.query.SpanTermQueryBuilder.doToQuery(SpanTermQueryBuilder.java:54) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]

@github-actions
Copy link
Contributor

❌ Gradle check result for c8e4dcd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❕ Gradle check result for c8e4dcd: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Apr 12, 2025

Codecov Report

Attention: Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.47%. Comparing base (6ff44d9) to head (c8e4dcd).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...rg/opensearch/index/mapper/KeywordFieldMapper.java 80.00% 2 Missing and 1 partial ⚠️
...a/org/opensearch/index/mapper/MappedFieldType.java 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17889      +/-   ##
============================================
+ Coverage     72.43%   72.47%   +0.03%     
- Complexity    66952    66976      +24     
============================================
  Files          5467     5470       +3     
  Lines        309605   309707     +102     
  Branches      45043    45052       +9     
============================================
+ Hits         224272   224466     +194     
+ Misses        66978    66904      -74     
+ Partials      18355    18337      -18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@peterzhuamazon peterzhuamazon merged commit 4308f4c into opensearch-project:main Apr 12, 2025
30 of 31 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Apr 15, 2025
… logic using useSimilarity parameter (opensearch-project#17889)

* Initial attempt to add constant scorer for term keyword search

* will follow up with unit tests
* making sure I'm on the right track

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* will follow up with manually testing the new mapping parameter

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Add changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit test

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Apply style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* since term is a basic query type, it affects many that build on top
* none of these should cause a regression, infact this change should fix
  any possible regression these query types, which may not be copatured
in big5 datasets

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix more unit tests

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix flakly test

* due to the random nature of
  org.opensearch.index.query.SimpleQueryStringBuilderTests.testToQuery,
this can sometimes fail with `but: was <ConstantScore(mapped_string_2:za)>`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
 # Please enter the commit message for your changes. Lines starting

* Fix style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Change parameter to use snake_case: use_similarity

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix integ tests

* also check all places where we do `X instanceof BoostQuery` to also
  check for `ConstantScoreQuery`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

---------

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com> # Please enter the commit message for your changes. Lines starting
Signed-off-by: Asim M <asim.seng@gmail.com>
Signed-off-by: Sriram Ganesh <srignsh22@gmail.com>
Harsh-87 pushed a commit to Harsh-87/OpenSearch that referenced this pull request May 7, 2025
… logic using useSimilarity parameter (opensearch-project#17889)

* Initial attempt to add constant scorer for term keyword search

* will follow up with unit tests
* making sure I'm on the right track

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* will follow up with manually testing the new mapping parameter

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Add changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit test

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Apply style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* since term is a basic query type, it affects many that build on top
* none of these should cause a regression, infact this change should fix
  any possible regression these query types, which may not be copatured
in big5 datasets

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix more unit tests

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix flakly test

* due to the random nature of
  org.opensearch.index.query.SimpleQueryStringBuilderTests.testToQuery,
this can sometimes fail with `but: was <ConstantScore(mapped_string_2:za)>`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
 # Please enter the commit message for your changes. Lines starting

* Fix style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Change parameter to use snake_case: use_similarity

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix integ tests

* also check all places where we do `X instanceof BoostQuery` to also
  check for `ConstantScoreQuery`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

---------

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com> # Please enter the commit message for your changes. Lines starting
Signed-off-by: Asim M <asim.seng@gmail.com>
Signed-off-by: Harsh Kothari <techarsh@amazon.com>
Harsh-87 pushed a commit to Harsh-87/OpenSearch that referenced this pull request May 7, 2025
… logic using useSimilarity parameter (opensearch-project#17889)

* Initial attempt to add constant scorer for term keyword search

* will follow up with unit tests
* making sure I'm on the right track

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* will follow up with manually testing the new mapping parameter

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Add changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit test

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix changelog entry

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Apply style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix unit tests

* since term is a basic query type, it affects many that build on top
* none of these should cause a regression, infact this change should fix
  any possible regression these query types, which may not be copatured
in big5 datasets

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix more unit tests

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix flakly test

* due to the random nature of
  org.opensearch.index.query.SimpleQueryStringBuilderTests.testToQuery,
this can sometimes fail with `but: was <ConstantScore(mapped_string_2:za)>`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
 # Please enter the commit message for your changes. Lines starting

* Fix style check

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Change parameter to use snake_case: use_similarity

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

* Fix integ tests

* also check all places where we do `X instanceof BoostQuery` to also
  check for `ConstantScoreQuery`

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

---------

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com> # Please enter the commit message for your changes. Lines starting
Signed-off-by: Asim M <asim.seng@gmail.com>
Signed-off-by: Harsh Kothari <techarsh@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Disable scoring of Keyword Term search in order to avoid regression since 2.18

6 participants