Skip to content

Conversation

@qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented Apr 23, 2025

Description

It supports parsing flattened document value by populating the ExprTupleValue recursively, so we can access the nested fields as well as the root fields as before.

Related Issues

Resolves #3477

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <qianheng@amazon.com>

/** Implements mergeTo by merging deeply */
@Override
public ExprTupleValue mergeTo(ExprValue base) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add UT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* <p>If there is existing vale for the JsonPath, we need to merge the new value to the old.
*/
private void populateValueRecursive(ExprTupleValue result, JsonPath path, ExprValue value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add UT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Override
public void init() throws Exception {
super.init();
putDocument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yamlTest is more readable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or add this test data to resource folder and load it. It is not as simple as test index, adding the mapping and data to resource could be readable and reusable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penghuo If we want to test this case w/wo Calcite enabled, in yamlTest, is there a convenient way to implement this with less code duplication?

In Java code, we can use inheritance to make it simple like the current code. But in yaml, we may need to duplicate the process with extra settings.

I've thought of using yamlTest but didn't have much information whether it can do such thing conveniently, so I keep using IT to do the verification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a yaml test with only Calcite enabled.

}
}

protected void putDocument(String index, int id, String json) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yamlTest support index any doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be deleted now?

* @param base the target value to merge
* @return The merged value
*/
default ExprValue mergeTo(ExprValue base) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if ExprIpValue mergeTo ExprIntValue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it is replace. Could you comment in code what's the purpose of adding this mergeTo interface?

Copy link
Collaborator Author

@qianheng-aws qianheng-aws Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in the doc, it should override the base under the current implementation. Such case will only happen when there is duplication value for a field.

  1. In normal cases, only ExprTupleValue::mergeTo should be invoked. Cases like:
{"log.json.time": 100, "log.json": { "status": "SUCCESS" }}

For "log.json.time": 100, it will be finally populated into an ExprValue ExprTupleValue(Map.of(\"log\": ExprTupleValue(Map.of({\"json\": ExprTupleValue(Map.of({\"time\": 100}))}))

For "log.json": { "status": "SUCCESS" }, it will be parsed into path: log.json, value: ExprTupleValue(Map.of({\"status\": SUCCESS}))

So after populating the first field, there is key conflict when populating the second field. In this case, we need to invoke ExprTupleValue::mergeTo` to merge 2 ExprTupleValue deeply.

  1. In some edge cases where there is conflict value in customer's document, like:
{"log.json.status": "FAILED", "log.json": { "status": "SUCCESS" }}

There is value conflict for the field log.json.status, so I want the value added most recently to override any previously added values, and ExprValue::megeTo will be invoked.

But, in json, since the sequence of the fields in the same level are unstable actually, we cannot assert which value will be put firstly. Maybe throw exception to tell customer there is incompatible value conflict will also be an option. What's your opinion on this edge case? @LantaoJin @penghuo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw exception to tell customer there is incompatible value conflict will also be an option.

Do you mean throw exception in ppl querying or data ingesting? Data ingesting seems out of the scope here.

Copy link
Member

@LantaoJin LantaoJin Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, if there are conflict fields. Display the any of them would be all fine as long as the field populated out is fixed. For example:

{"id": 1, "log.json.status": "FAILED", "log.json": { "status": "SUCCESS" }}
{"id": 2, "log.json.status": "FAILED", "log.json": { "status": "SUCCESS" }}
{"id": 3, "log.json.status": "FAILED", "log.json": { "status": "SUCCESS" }}

PPL source=t | fields id, log.json.status return

1, FAILED
2, FAILED
3, FAILED

or

1, SUCCESS
2, SUCCESS
3, SUCCESS

are all acceptable. This could be treated as data quality issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for key conflict case?

Copy link
Collaborator Author

@qianheng-aws qianheng-aws Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is value conflict case will flakey because fields order are not ensured in json. We cannot assert which value will be put in the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the UT of testPopulateValueRecursive, where I can control the populate order

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw exception to tell customer there is incompatible value conflict will also be an option. What's your opinion on this edge case?

do not throw exception, current approach looks good. OpenSearch also return the LATEST field.

###
POST {{baseUrl}}/test00001/_doc
Content-Type: application/x-ndjson

{
    "log.json.status": "failed1",
    "log.json": {
      "status": "failed2"
    },
    "log": {
      "json": {
        "status": "failed3"
      }
    },
}

###
GET {{baseUrl}}/test00001/_search
Content-Type: application/x-ndjson

{
    "fields": [
        "log.json.status"
    ],
    "_source": false
}

### Return 
        "fields": {
          "log.json.status": [
            "failed3"
          ]
        }

// Update the current ExprValue by using mergeTo if exists
result
.tupleValue()
.computeIfPresent(path.getRootPath(), (key, oldValue) -> value.mergeTo(oldValue));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what case is coverred by mergeTo, same field with different value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment [here](#3577 (comment))

@Override
public void init() throws Exception {
super.init();
putDocument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or add this test data to resource folder and load it. It is not as simple as test index, adding the mapping and data to resource could be readable and reusable.

* @param base the target value to merge
* @return The merged value
*/
default ExprValue mergeTo(ExprValue base) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it is replace. Could you comment in code what's the purpose of adding this mergeTo interface?

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
@LantaoJin LantaoJin added the bug Something isn't working label Apr 25, 2025
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
LantaoJin
LantaoJin previously approved these changes Apr 27, 2025
@LantaoJin
Copy link
Member

LantaoJin commented Apr 27, 2025

@qianheng-aws could you change this log level from warn to debug in this PR? This PR will be merge to 3.0.0 GA
This warning log here will write out thousands of unnecessary logs when the data is a nested type.

Signed-off-by: Heng Qian <qianheng@amazon.com>
@LantaoJin
Copy link
Member

ping @penghuo @dai-chen

* @param base the target value to merge
* @return The merged value
*/
default ExprValue mergeTo(ExprValue base) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw exception to tell customer there is incompatible value conflict will also be an option. What's your opinion on this edge case?

do not throw exception, current approach looks good. OpenSearch also return the LATEST field.

###
POST {{baseUrl}}/test00001/_doc
Content-Type: application/x-ndjson

{
    "log.json.status": "failed1",
    "log.json": {
      "status": "failed2"
    },
    "log": {
      "json": {
        "status": "failed3"
      }
    },
}

###
GET {{baseUrl}}/test00001/_search
Content-Type: application/x-ndjson

{
    "fields": [
        "log.json.status"
    ],
    "_source": false
}

### Return 
        "fields": {
          "log.json.status": [
            "failed3"
          ]
        }

@penghuo penghuo merged commit 8d531e0 into opensearch-project:main Apr 28, 2025
23 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 28, 2025
* Support parsing documents with flattened value

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add UT and yamlTest

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT by using LinkedHashMap to ensure order

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix log level for JdbcOpenSearchDataTypeConvertor

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
(cherry picked from commit 8d531e0)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
penghuo pushed a commit that referenced this pull request Apr 28, 2025
* Support parsing documents with flattened value



* Add UT and yamlTest



* Refine code



* Refine code



* Refine code



* Fix UT by using LinkedHashMap to ensure order



* Address comments



* Fix UT



* Fix IT



* Address comments



* Address comments



* Fix log level for JdbcOpenSearchDataTypeConvertor



---------


(cherry picked from commit 8d531e0)

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ahkcs pushed a commit to ahkcs/sql that referenced this pull request Jun 10, 2025
* Support parsing documents with flattened value

Signed-off-by: Heng Qian <qianheng@amazon.com>

(cherry picked from commit 8d531e0)
ahkcs pushed a commit to ahkcs/sql that referenced this pull request Jun 10, 2025
* Support parsing documents with flattened value

Signed-off-by: Heng Qian <qianheng@amazon.com>
(cherry picked from commit 8d531e0)
ahkcs pushed a commit to ahkcs/sql that referenced this pull request Jun 10, 2025
* Support parsing documents with flattened value

Signed-off-by: Heng Qian <qianheng@amazon.com>

Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit 8d531e0)
penghuo pushed a commit that referenced this pull request Jun 16, 2025
* Support parsing documents with flattened value

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add UT and yamlTest

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT by using LinkedHashMap to ensure order

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix log level for JdbcOpenSearchDataTypeConvertor

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 3.0 bug Something isn't working v3.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] field name with dot is missing in output without fields command

3 participants