Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Array and ExprValue Parsing With Inner Hits #274

Merged

Conversation

forestmvey
Copy link

@forestmvey forestmvey commented Jun 8, 2023

Description

With the addition of the nested function support in the V2 engine the inner_hits portion of the OS response needs to be parsed for OS types. Previously arrays were not entirely parsed due to lack of support.

Functionality Changes

Parsing of each value in arrays needs to be supported for proper nested functionality. The following gives examples of the response formats when parsing arrays.


Object Type - Array Of Objects

Dataset:
"obj": [{"key": "val1"}, {"key": "val2"}]

SQL Query:

SELECT obj FROM objects;

V1 Engine Response:
[{'key': 'val1'},{'key': 'val2'}]

V2 Engine Response:
{'key': 'val1'}


Nested Type - Array Of Objects

Dataset:
"obj": [{"key": "val1"}, {"key": "val2"}]

SQL Query:

SELECT obj FROM nested_objects;

V1 and V2 Engine Response:
[{'key': 'val1'},{'key': 'val2'}]


Array Of Values (Nested only supports objects and not arrays of values):

Dataset:
"obj": ["val1", "val2"]

SQL Query:

SELECT obj FROM objects;

V1 Engine Response:
['val1','val2']

V2 Engine Response:
{'val1'}


Issues Resolved

Issue: 1686

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
Signed-off-by: forestmvey <forestv@bitquilltech.com>
@codecov
Copy link

codecov bot commented Jun 8, 2023

Codecov Report

Merging #274 (1d5f52f) into integ-support-inner-hits-exprvalue (691012d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@                          Coverage Diff                          @@
##             integ-support-inner-hits-exprvalue     #274   +/-   ##
=====================================================================
  Coverage                                 97.27%   97.28%           
- Complexity                                 4330     4354   +24     
=====================================================================
  Files                                       388      388           
  Lines                                     10807    10845   +38     
  Branches                                    761      769    +8     
=====================================================================
+ Hits                                      10513    10551   +38     
  Misses                                      287      287           
  Partials                                      7        7           
Flag Coverage Δ
sql-engine 97.28% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...earch/sql/opensearch/data/utils/ObjectContent.java 100.00% <100.00%> (ø)
...l/opensearch/data/utils/OpenSearchJsonContent.java 100.00% <100.00%> (ø)
...nsearch/data/value/OpenSearchExprValueFactory.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/response/OpenSearchResponse.java 100.00% <100.00%> (ø)
...ensearch/storage/script/core/ExpressionScript.java 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link

@Yury-Fridlyand Yury-Fridlyand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
Can you add more integration tests? Array inside array? (is it possible in OpenSearch though?)
How it works with nested function? For example nested(a.b) where a and/or b is array of multiple objects.
Does it change behavior shown in opensearch-project#1300?

@forestmvey
Copy link
Author

forestmvey commented Jun 9, 2023

Nice! Can you add more integration tests? Array inside array? (is it possible in OpenSearch though?) How it works with nested function? For example nested(a.b) where a and/or b is array of multiple objects. Does it change behavior shown in opensearch-project#1300?

  • You cannot have an array of array values like this:
    "obj":[[3,3],[,3,4]]
  • The nested functions works the same as previously. It supports arrays of objects and nested arrays at the maximum depth. I'll add another unit test and IT test to cover object arrays within object arrays.

  • Yes this does change the behaviour of arrays in opensearch-project#1300. Previously the first index of the object arrays or value arrays was returned and now the whole array is returned. See the description for a clear example of the new vs old functionality. We need to decide if we want to push this new functionality forward or keep the old. The new functionality may be a step forward in array support required for this issue, but does not solve the problem.

I can probably add some logic to maintain the old functionality for non inner hits parsing, but should we?

I am adding logic to maintain the previous V2 array support!

@acarbonetto
Copy link

I can probably add some logic to maintain the old functionality for non inner hits parsing, but should we?

Would this be a breaking change for arrays support?

Right now, we have the following behaviours:

  1. For array of objects or primitives, only the first item of the array is returned (V2 engine only).
  2. For array of objects or primitives, the entire array is returned as an array (V1 and PPL engine).
  3. For nest objects using the nested() function, the entire array is flattened (cartesian product - V1 and V2 engine). This behaviour should also be included in PPL.
  4. Note: there is currently no way to flatten arrays of objects or primitives.

My preference is to reserve 4. logic to partiQL syntax. Here's my thoughts:

Using PartiQL search would be the 'common way' to flatten (using cartesian product) as the syntax and logic is the most relational. PartiQL syntax would define a join between multiple 'tables' (ie, join between the main index and any inner hits of a nested, join, or any array's object). The nested() function in a SELECT already does this, and same-table joins using the has_parent or has_child relation would also produce inner_hits.

We don't have a way for V2 engine to behave like PPL and return the json array object - and maybe that's fine for now. If there is interest, we could investigate ways to use flat-type to support this functionality.

@Yury-Fridlyand
Copy link

Yury-Fridlyand commented Jun 9, 2023

Probably, you had a typo. Can you test your code with this sample?
This works:

POST dbg/_doc
{
  "obj": [[1, 2], [3, 4], 5]
}

search result:

{
  "took" : 361,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "dbg",
        "_id" : "uDBnoYgB5NyEnr3HyanK",
        "_score" : 1.0,
        "_source" : {
          "obj" : [
            [
              1,
              2
            ],
            [
              3,
              4
            ],
            5
          ]
        }
      }
    ]
  }
}

mapping:

{
  "dbg" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "obj" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1686168574104",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "LN-lYUQFSsi9MVpa5VT-Zg",
        "version" : {
          "created" : "136297827"
        },
        "provided_name" : "dbg"
      }
    }
  }
}

Signed-off-by: forestmvey <forestv@bitquilltech.com>
Signed-off-by: forestmvey <forestv@bitquilltech.com>
…arrays of non-object type with inner_hits.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
…them.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
…ponse.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
Copy link

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I noted a couple small improvements.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
@forestmvey forestmvey merged commit 644ef89 into integ-support-inner-hits-exprvalue Jun 14, 2023
@forestmvey forestmvey deleted the dev-support-inner-hits-exprvalue branch June 14, 2023 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants