Add Highlight Support in PPL and Wildcard in SQL and PPL #110

Merged
merged 1 commit into from
Sep 8, 2022

@forestmvey forestmvey commented Aug 23, 2022

Description

Support the highlight function in PPL. Add support for wildcard in SQL and PPL.

Issues Resolved

Issue: github:636

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Aug 23, 2022

Codecov Report

Merging #110 (43b841d) into integ-highlight-in-ppl (8a7b329) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@                     Coverage Diff                      @@
##             integ-highlight-in-ppl     #110      +/-   ##
============================================================
+ Coverage                     94.85%   94.88%   +0.02%     
- Complexity                     2910     2925      +15     
============================================================
  Files                           288      288              
  Lines                          7830     7875      +45     
  Branches                        570      575       +5     
============================================================
+ Hits                           7427     7472      +45     
  Misses                          349      349              
  Partials                         54       54              
Flag Coverage Δ
query-workbench 62.76% <ø> (ø)
sql-engine 97.81% <100.00%> (+0.01%) ⬆️

Impacted Files Coverage Δ
...ensearch/sql/expression/ExpressionNodeVisitor.java 100.00% <ø> (ø)
...h/sql/expression/function/OpenSearchFunctions.java 100.00% <ø> (ø)
...ain/java/org/opensearch/sql/analysis/Analyzer.java 100.00% <100.00%> (ø)
...rg/opensearch/sql/analysis/ExpressionAnalyzer.java 100.00% <100.00%> (ø)
...search/sql/planner/physical/HighlightOperator.java 100.00% <100.00%> (ø)
.../sql/planner/physical/PhysicalPlanNodeVisitor.java 100.00% <100.00%> (ø)
...ecutor/protector/OpenSearchExecutionProtector.java 100.00% <100.00%> (ø)
...search/sql/opensearch/storage/OpenSearchIndex.java 100.00% <100.00%> (ø)
...java/org/opensearch/sql/ppl/parser/AstBuilder.java 100.00% <100.00%> (ø)
...pensearch/sql/ppl/parser/AstExpressionBuilder.java 100.00% <100.00%> (ø)
... and 1 more

@forestmvey forestmvey requested a review from a team August 23, 2022 16:49
@forestmvey forestmvey marked this pull request as ready for review August 23, 2022 16:53
@forestmvey forestmvey changed the title Add Highlight Support in PPL Add Highlight Support in PPL and Wildcard in SQL and PPL Sep 1, 2022
// In the event of multiple returned highlights and wildcard being
// used in conjunction with other highlight calls, we need to ensure
// only wildcard regex matching is mapped to wildcard call.
if (StringUtils.unquoteText(highlight.toString()).matches("(.+\\*)|(\\*.+)")

Does this regex match only strings beginning or ending with *?
I.e., will it match B*o*d*y?

Author

No, it will not match B*o*d*y; the string must start or end with a star. Also, in your example OpenSearch would not return any value, so the "and" part of this if condition would fail.

I'd like to double-check that. From looking at the code, OpenSearch does support * in the middle of the expression.

Highlight eventually uses HighlightPhase, which uses OpenSearch's Regex class which supports something like B*o*d*y. It was not the best example, but I think B*dy will match.
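
To make the question in this thread concrete, here is a minimal sketch that runs the quoted pattern through `String.matches` on a few sample inputs. The class name `WildcardRegexDemo` and the sample strings are illustrative assumptions, not part of the PR, and the sketch says nothing about what OpenSearch itself returns for these patterns.

```java
class WildcardRegexDemo {
    // The pattern quoted in the review snippet: one or more characters followed
    // by a trailing '*', or a leading '*' followed by one or more characters.
    static final String EDGE_STAR = "(.+\\*)|(\\*.+)";

    public static void main(String[] args) {
        // String.matches requires the whole input to match the pattern.
        System.out.println("Bo*".matches(EDGE_STAR));     // true: ends with '*'
        System.out.println("*dy".matches(EDGE_STAR));     // true: starts with '*'
        System.out.println("B*dy".matches(EDGE_STAR));    // false: '*' only in the middle
        System.out.println("B*o*d*y".matches(EDGE_STAR)); // false: neither starts nor ends with '*'
        System.out.println("*".matches(EDGE_STAR));       // false: each branch needs at least two characters
    }
}
```

Under this pattern a wildcard with `*` only in the middle, such as `B*dy`, is not treated as a wildcard call, which is the case the reviewer is asking about.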

@MaxKsyunz

@forestmvey I'd like to have a second look at Object.toString usages.

I understand that in context it provides required data, but semantically toString is intended for something else. This is what the docs have to say about it.

@forestmvey
Author

Can you please be more specific and give an example of what you would like revised?

private Pair<String, ExprValue> mapHighlight(Environment<Expression, ExprValue> env) {
  String osHighlightKey = "_highlight";
  if (!highlight.toString().contains("*")) {
    osHighlightKey += "." + StringUtils.unquoteText(highlight.toString());

StringUtils.unquoteText(highlight.toString()) is the same as highlight.valueOf(null).string().
It is also used here several times and can be calculated once and stored in a variable.
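
A minimal sketch of the "calculate once and store in a variable" suggestion, using plain strings so it runs on its own. The class `HighlightKeyDemo`, the `unquote` helper (a simplified stand-in for `StringUtils.unquoteText`) and the method `highlightKey` are hypothetical names for illustration; the PR's actual types (`Expression`, `ExprValue`) are not reproduced here.

```java
class HighlightKeyDemo {
    // Hypothetical stand-in for StringUtils.unquoteText: strips one pair of
    // matching surrounding quotes (single, double, or back-tick) if present.
    static String unquote(String text) {
        if (text.length() >= 2) {
            char first = text.charAt(0);
            char last = text.charAt(text.length() - 1);
            if (first == last && (first == '\'' || first == '"' || first == '`')) {
                return text.substring(1, text.length() - 1);
            }
        }
        return text;
    }

    // Builds the lookup key: "_highlight" for wildcard calls,
    // "_highlight.<field>" for a concrete field name.
    static String highlightKey(String rawHighlight) {
        String highlightStr = unquote(rawHighlight); // computed once, reused below
        String osHighlightKey = "_highlight";
        if (!highlightStr.contains("*")) {
            osHighlightKey += "." + highlightStr;
        }
        return osHighlightKey;
    }

    public static void main(String[] args) {
        System.out.println(highlightKey("'Body'")); // _highlight.Body
        System.out.println(highlightKey("'Bo*'"));  // _highlight
    }
}
```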

MaxKsyunz commented Sep 6, 2022

> Can you please be more specific and give an example of what you would like revised?

This is the part I was thinking of.

@MaxKsyunz

Also, based on this comment, it'd be better if HighlightOperator moved to the :opensearch module.

That should also allow us to use OpenSearch's Regex class and make pattern matching consistent.

@MaxKsyunz MaxKsyunz self-requested a review September 6, 2022 19:27
@forestmvey forestmvey force-pushed the dev-highlight-in-ppl branch 3 times, most recently from 458bb34 to ca08438 Compare September 7, 2022 16:03
…t in SQL and PPL.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
Comment on lines +94 to +106
// In the event of multiple returned highlights and wildcard being
// used in conjunction with other highlight calls, we need to ensure
// only wildcard regex matching is mapped to wildcard call.
if (highlightStr.contains("*") && value.type() == STRUCT) {
  value = new ExprTupleValue(
      new LinkedHashMap<String, ExprValue>(value.tupleValue()
          .entrySet()
          .stream()
          .filter(s -> matchesHighlightRegex(s.getKey(), highlightStr))
          .collect(Collectors.toMap(
              e -> e.getKey(),
              e -> e.getValue()))));
}

🌟 I like the comment and the use of filter.
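
For readers without the full diff, the filtering step can be sketched in isolation as below. The body of `matchesHighlightRegex` is not shown in this conversation, so the version here is a hypothetical implementation that treats `*` as "any run of characters"; the class name, the `Map<String, String>` value type and the sample data are also assumptions. The four-argument `Collectors.toMap` with `LinkedHashMap::new` is one way to keep the original key order while filtering.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class HighlightFilterDemo {
    // Hypothetical implementation: quote the literal pieces of the wildcard
    // and join them with ".*", then require a full match on the field name.
    static boolean matchesHighlightRegex(String field, String wildcard) {
        String regex = Arrays.stream(wildcard.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return field.matches(regex);
    }

    // Keeps only the highlight entries whose field name matches the wildcard,
    // preserving the iteration order of the input map.
    static Map<String, String> filterHighlights(Map<String, String> highlights, String wildcard) {
        return highlights.entrySet().stream()
            .filter(e -> matchesHighlightRegex(e.getKey(), wildcard))
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (first, second) -> first, // keys are unique, so this is never invoked
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> highlights = new LinkedHashMap<>();
        highlights.put("Title", "<em>cat</em> facts");
        highlights.put("Body", "a <em>cat</em> sat");
        highlights.put("Tags", "<em>cat</em>");
        System.out.println(filterHighlights(highlights, "T*").keySet()); // [Title, Tags]
    }
}
```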

@forestmvey forestmvey merged commit 74dfcf3 into integ-highlight-in-ppl Sep 8, 2022