
Adds a new parameter, max_analyzer_offset, for the highlighter #3893

Merged · 23 commits · Jul 22, 2022

Conversation

@hauck-jvsh (Contributor) commented Jul 13, 2022

Description

Adds a new parameter, max_analyzer_offset, for the highlighter. This new parameter can be used to avoid aborting the request when highlighting large fields, or simply to speed up highlighting by applying it only to the beginning of the field.
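
For illustration, a request along these lines would exercise the new parameter. The index name, field name, and values are hypothetical; only the max_analyzer_offset parameter name comes from this PR, and per the commit history it can be given at the top highlight level or per field.

# Sketch only: index and field names are hypothetical; max_analyzer_offset is the parameter added here.
curl -s -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d '
{
  "query":     { "match": { "message": "opensearch" } },
  "highlight": {
    "max_analyzer_offset": 100000,
    "fields": { "message": {} }
  }
}'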

Issues Resolved

This pull request closes #3842

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@hauck-jvsh requested review from a team and reta as code owners July 13, 2022 22:48
@hauck-jvsh (Contributor, Author)

I couldn't run the check on Windows; it was failing even when I ran it on the main branch. I executed the line below in PowerShell.

gradlew.bat check

Finally, how can I sign the commits?

@tlfeng added labels: enhancement, community, Indexing & Search (Jul 14, 2022)
@hauck-jvsh (Contributor, Author)

I see that tests are failing for multi-cluster searches. However, this is expected, as those other clusters do not support this feature. Is there a way to check the cluster version and only run this when all clusters support it?

@dblock requested a review from andrross July 14, 2022 17:29
@dblock (Member) commented Jul 14, 2022

For DCO, amend your PRs with -s. Not sure about the other questions, @andrross do you know how to do this properly?
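
A minimal sketch of what that looks like with standard git commands (the base branch name is assumed here):

# Sign off the most recent commit, or re-sign every commit on the branch on top of main, then update the PR.
git commit --amend --signoff
git rebase --signoff main
git push --force-with-lease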

@andrross (Member)

> For DCO, amend your PRs with -s. Not sure about the other questions, @andrross do you know how to do this properly?

Do you need to add something like this to your test specification (substituting the correct version of course)?

…max_analyzer_offset. When this parameter is provided, the highlight stops at its value. This prevents the highlighter from going beyond the index maxAnalyzedOffset.

@codecov-commenter commented Jul 15, 2022

Codecov Report

Merging #3893 (4b25f5c) into main (07775ff) will increase coverage by 0.08%.
The diff coverage is 62.73%.

@@             Coverage Diff              @@
##               main    #3893      +/-   ##
============================================
+ Coverage     70.57%   70.66%   +0.08%     
- Complexity    56679    56795     +116     
============================================
  Files          4563     4573      +10     
  Lines        272755   273288     +533     
  Branches      40040    40084      +44     
============================================
+ Hits         192505   193116     +611     
+ Misses        64014    63905     -109     
- Partials      16236    16267      +31     
Impacted Files Coverage Δ
...ark/time/NanoTimeVsCurrentTimeMillisBenchmark.java 0.00% <ø> (ø)
...a/org/opensearch/painless/antlr/PainlessLexer.java 68.18% <ø> (ø)
.../org/opensearch/painless/antlr/PainlessParser.java 68.43% <ø> (ø)
...earch/action/search/PitSearchContextIdForNode.java 0.00% <0.00%> (ø)
...ensearch/action/search/SearchContextIdForNode.java 94.44% <ø> (ø)
...er/src/main/java/org/opensearch/client/Client.java 40.00% <ø> (ø)
.../org/opensearch/client/support/AbstractClient.java 34.96% <0.00%> (-0.19%) ⬇️
...rg/opensearch/common/settings/ClusterSettings.java 91.89% <ø> (ø)
...pensearch/common/settings/IndexScopedSettings.java 100.00% <ø> (ø)
...rch/fetch/subphase/highlight/HighlightBuilder.java 94.37% <0.00%> (-0.39%) ⬇️
... and 513 more

@hauck-jvsh (Contributor, Author)

Do you think this can be back-ported to 2.x?

@andrross (Member)

> Do you think this can be back-ported to 2.x?

I think so. Since it's a new parameter it's fully backward compatible. @reta do you see any reason not to backport?

@dblock (Member) commented Jul 21, 2022

@andrross all yours to final review/merge

@dblock added the backport 2.x label Jul 21, 2022
@dblock requested a review from andrross July 21, 2022 20:03
@andrross merged commit 931813f into opensearch-project:main Jul 22, 2022
opensearch-trigger-bot (bot) pushed a commit that referenced this pull request Jul 22, 2022
* #3842 adds a new parameter to the highlighter, max_analyzer_offset. When this parameter is provided, the highlight stops at its value. This prevents the highlighter from going beyond the index maxAnalyzedOffset.

* Adds a test for the new parameter

* Fix the test added in the previous commit

* This was checking against the wrong field

* Only run the test for the correct version

* Skip the test in Elasticsearch as well

* Remove elastic 3.0 from the test

* Skip all versions

* Remove unnecessary fields as pointed out by @reta

* Compute fieldMaxAnalyzedIsNotValid in the constructor as suggested by @reta

* As discussed, it is better to throw different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled

* Hint what to do to allow highlighting of bigger documents

* Let the user define the new parameter globally for all highlighted fields

* Change fieldMaxAnalyzedOffset to an Integer in order to use null when it is absent in the highlight. This allows the error messages to be much more precise, showing invalid for all negative numbers

* Update javadocs and implement the stream methods for the new fields

* builder.field does not accept null, so a check before calling the method is necessary

* Only send and read the new fields if the version supports it

* The previous commit was checking the wrong field

* Check for version 3.0.0 instead of the current version

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
@reta (Collaborator) commented Jul 24, 2022

> Do you think this can be back-ported to 2.x?
>
> I think so. Since it's a new parameter it's fully backward compatible. @reta do you see any reason not to backport?

No, I don't see any reason not to backport; it is a good change, I think.

hauck-jvsh added a commit to hauck-jvsh/OpenSearch that referenced this pull request Jul 28, 2022
…earch-project#3893)

(cherry picked from commit 931813f)
Signed-off-by: Hauck <joaoh14@gmail.com>
reta added a commit that referenced this pull request Jul 28, 2022
…hlighter (#4031)

* Adds a new parameter, max_analyzer_offset, for the highlighter (#3893)
(cherry picked from commit 931813f)

* Fix the version tests

Signed-off-by: Hauck <joaoh14@gmail.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
dblock pushed a commit that referenced this pull request Nov 8, 2022
* #3842 adds a new parameter to the highlighter, max_analyzer_offset (cherry picked from commit 931813f)

Signed-off-by: Hauck <joaoh14@gmail.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
dblock pushed a commit that referenced this pull request Nov 8, 2022
* #3842 adds a new parameter to the highlighter, max_analyzer_offset (cherry picked from commit 931813f)

Signed-off-by: Hauck <joaoh14@gmail.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
@JekoTarallo

May I ask if this fix was introduced in OpenSearch 2.5? Thanks in advance.

@reta (Collaborator) commented Jul 26, 2023

@JekoTarallo

> May I ask if this fix was introduced in OpenSearch 2.5? Thanks in advance.

As per issue labels, it should be available from 2.2.0 and above

@JekoTarallo

Unfortunately I'm having this issue on both OpenSearch 2.5 and 2.7 clusters on AWS. This is the error message that appears in the Discover section whatever query I use:

illegal_argument_exception
The length of [message] field of [23595] doc of [index-test-log] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!
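
For reference, the error text points at the index-level limit rather than this PR's per-request parameter; a sketch of raising it via the index settings API, reusing the index name from the message above, with a purely illustrative value:

# index.highlight.max_analyzed_offset is the setting named in the error; the value here is only an example.
curl -s -X PUT "localhost:9200/index-test-log/_settings" -H 'Content-Type: application/json' -d '
{
  "index.highlight.max_analyzed_offset": 2000000
}'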

@hauck-jvsh (Contributor, Author)

In your search query, are you passing the option "max_analyzer_offset": 100000?

@JekoTarallo

I'm just doing a classic query:

[screenshot]

When the query bar is empty, the records are loaded; if I search for anything, I receive the error I mentioned before.

@sharebear

A stupid question perhaps, but do I understand correctly that this is a setting that needs to be set per query? How do we ensure this value is set when executing queries from OpenSearch Dashboards?

Labels
backport 2.x · community · enhancement · Indexing & Search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adds option to do not abort the query when highlighting
8 participants