Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds option to do not abort the query when highlighting #3842

Closed
hauck-jvsh opened this issue Jul 11, 2022 · 3 comments · Fixed by #3893
Closed

Adds option to do not abort the query when highlighting #3842

hauck-jvsh opened this issue Jul 11, 2022 · 3 comments · Fixed by #3893
Labels
enhancement Enhancement or improvement to existing feature or request feedback needed Issue or PR needs feedback Indexing & Search v2.2.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@hauck-jvsh
Copy link
Contributor

I have an index with very large texts inside a field, and when a search with highlighting is made against it, the following error is returned:

The length of [content] field of [106071] doc of [arquivos-dre-040-duplorisco] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!

I think that it will be better if the query can execute and only the highlight is disabled. Another option is to add an max_analyzed_offset option to the highlighter and ignore all after this option similar to what is done in the current Elasticsearch version.

@hauck-jvsh hauck-jvsh added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 11, 2022
@mch2 mch2 added feedback needed Issue or PR needs feedback Indexing & Search and removed untriaged labels Jul 11, 2022
@hauck-jvsh
Copy link
Contributor Author

Do you guys needs more info? I can link the Elasticsearch issue here or it is forbidden?

@dblock
Copy link
Member

dblock commented Jul 12, 2022

You should feel free to link issues. If you want to contribute a fix, and if ES has fixed this issue, please feel free to reimplement and confirm that you have not been looking at any non-APLv2-compatible-licensed code.

@hauck-jvsh
Copy link
Contributor Author

This issue was fixed in Elasticsearch 7.12, the issue and the pull request is available in the links bellow:

elastic/elasticsearch#52155
elastic/elasticsearch#69016

hauck-jvsh added a commit to hauck-jvsh/OpenSearch that referenced this issue Jul 13, 2022
…max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.
hauck-jvsh added a commit to hauck-jvsh/OpenSearch that referenced this issue Jul 15, 2022
…max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>
andrross pushed a commit that referenced this issue Jul 22, 2022
* #3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
opensearch-trigger-bot bot pushed a commit that referenced this issue Jul 22, 2022
* #3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
hauck-jvsh added a commit to hauck-jvsh/OpenSearch that referenced this issue Jul 28, 2022
…earch-project#3893)

* opensearch-project#3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
Signed-off-by: Hauck <joaoh14@gmail.com>
@reta reta added v3.0.0 Issues and PRs related to version 3.0.0 v2.2.0 labels Jul 28, 2022
reta added a commit that referenced this issue Jul 28, 2022
…hlighter (#4031)

* Adds a new parameter, max_analyzer_offset, for the highlighter (#3893)

* #3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the version tests

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
dblock pushed a commit that referenced this issue Nov 8, 2022
* #3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
dblock pushed a commit that referenced this issue Nov 8, 2022
* #3842 adds a new parameter to the highlighter, the max_analyzer_offset. When this parameter is provided the highlight stops in its value. This prevents the highlighter to go beyond the index maxAnalyzedOffset.

Signed-off-by: Hauck <joaoh14@gmail.com>

* Adds a test for the new parameter

Signed-off-by: Hauck <joaoh14@gmail.com>

* Fix the test add in the previous commit;

Signed-off-by: Hauck <joaoh14@gmail.com>

* This was checking against the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only runs the test for the correct version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips the test in Elasticsearch as well;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove elastic 3.0 to test

Signed-off-by: Hauck <joaoh14@gmail.com>

* Skips all versions

Signed-off-by: Hauck <joaoh14@gmail.com>

* Remove unnecessary fields as pointed by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* Compute if fieldMaxAnalyzedIsNotValid in the constructor as suggest by @reta

Signed-off-by: Hauck <joaoh14@gmail.com>

* As discussed, it is better to throws different exceptions for when the fieldMaxAnalyzed is not valid and for when it is disabled;

Signed-off-by: Hauck <joaoh14@gmail.com>

* hint what to do to allow highlight of bigger documents
Signed-off-by: Hauck <joaoh14@gmail.com>

* Let the user define the new parameter globally for all fields highlighted

Signed-off-by: Hauck <joaoh14@gmail.com>

* Change the  fieldMaxAnalyzedOffset Integer in order to use null when it is absent in highlight. This allows the error messages to much more precise, showing invalid for all negative numbers;

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update javadocs and implements the stream methods for the new fields;

Signed-off-by: Hauck <joaoh14@gmail.com>

* builder.field do not accept null, so check before calling the method is necessary

Signed-off-by: Hauck <joaoh14@gmail.com>

* Only send and read the new fields if the version supports it

Signed-off-by: Hauck <joaoh14@gmail.com>

* the previous commit was checking the wrong field

Signed-off-by: Hauck <joaoh14@gmail.com>

* Check for version 3.0.0 instead of current version

Signed-off-by: Hauck <joaoh14@gmail.com>

* Update server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Hauck <joaoh14@gmail.com>

* Execute the test after version 3.0.0

Signed-off-by: Hauck <joaoh14@gmail.com>

Co-authored-by: Andriy Redko <drreta@gmail.com>
(cherry picked from commit 931813f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request feedback needed Issue or PR needs feedback Indexing & Search v2.2.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants