
Extract value from map when the key is nested field alike #600


@kt-eliatra (Contributor) commented Aug 27, 2024

Description

Fixes a bug where PPL cannot extract a value from a map field when the key looks like a nested JSON path (that is, the key itself contains dots).

It looks like Spark uses dots to access nested fields in complex data types such as structs.
But this doesn't work when a map key itself looks like a nested field path. The person reporting the bug noted that SQL can handle this case with select unmapped['userIdentity.sessioncontext.sessionIssuer.type'].
Spark's own SQL parser translates such an expression into the following plan:

Project [
  UnresolvedAlias [
    UnresolvedExtractValue 
      child: [ UnresolvedAttribute[ "unmapped" ] ]
      extraction[ Literal[ "userIdentity.sessioncontext.sessionIssuer.type" ]]
  ]
  ...
]

I also used this approach in my solution.
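To illustrate why extracting with a literal key helps, here is a toy Python model. It is a sketch under stated assumptions, not Spark's or this PR's code: the class names only mirror the Catalyst nodes in the plan above, `resolve_dotted` is a hypothetical naive resolver included for comparison, and the sample value "Role" is made up.

```python
from dataclasses import dataclass
from typing import Any

# Toy stand-ins named after the Catalyst nodes in the plan above.
# They are NOT Spark classes; rows are plain dicts and maps are nested dicts.
@dataclass
class UnresolvedAttribute:
    name: str  # top-level column, e.g. "unmapped"

@dataclass
class Literal:
    value: Any  # used verbatim, dots included

@dataclass
class UnresolvedExtractValue:
    child: UnresolvedAttribute
    extraction: Literal

def evaluate(expr: UnresolvedExtractValue, row: dict) -> Any:
    """Look up the column, then do ONE map lookup with the literal key."""
    container = row[expr.child.name]
    return container.get(expr.extraction.value)

def resolve_dotted(row: dict, path: str) -> Any:
    """Hypothetical naive resolver: treat every dot as a nesting level."""
    current = row
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # no such nested field
        current = current[part]
    return current

key = "userIdentity.sessioncontext.sessionIssuer.type"
row = {"unmapped": {key: "Role"}}  # hypothetical sample value

expr = UnresolvedExtractValue(UnresolvedAttribute("unmapped"), Literal(key))
print(evaluate(expr, row))                     # Role
print(resolve_dotted(row, "unmapped." + key))  # None
```

The naive resolver fails because "userIdentity" is not a key of the map; the extract-value form succeeds because the whole dotted string is used as a single key.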

Issues Resolved

#565

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@YANG-DB (Member) left a comment

Please add the command description to the README file.

@YANG-DB (Member) commented Aug 28, 2024

Also, please run sbt scalafmtAll to fix these formatting errors:
[error] scalafmt: 1 files must be formatted (/home/runner/work/opensearch-spark/opensearch-spark/integ-test)

@kt-eliatra kt-eliatra marked this pull request as ready for review August 29, 2024 11:55
@salyh salyh force-pushed the map-with-keys-like-nested-fields branch from 9653201 to 64e2c52 August 29, 2024 13:40
@YANG-DB (Member) left a comment

Can you please add to the PR description what you've done in this PR and explain how it solves the bug?
Thanks!

Member

Maybe I'm missing something: is this meant to replace the actual mapped nested fields?
Why not add new tests that cover this additional use case?

Contributor Author

Can you describe what you are referring to? I don't follow.

kt-eliatra and others added 4 commits August 30, 2024 18:58
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Hendrik Saly <hendrik.saly@eliatra.com>
Signed-off-by: Hendrik Saly <hendrik.saly@eliatra.com>
@salyh salyh force-pushed the map-with-keys-like-nested-fields branch from 74fe1cc to 968ab8f August 30, 2024 16:58
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
@dai-chen (Collaborator) left a comment

Could you elaborate on why we need to treat this struct as a map? I thought this was no different from accessing a struct column.

CREATE TABLE nested_test (
  age INT,
  user_data STRUCT<`user.first.name`:STRING>
)
USING JSON;

INSERT INTO nested_test
VALUES
( 30, STRUCT("Alice") ),
( 40, STRUCT("Bob") );

SELECT age
FROM nested_test
WHERE user_data.`user.first.name` = 'Bob';
---
40
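The backticks work because a delimited identifier is kept as one name part instead of being split on its dots. The sketch below is a simplified Python illustration, not Spark's parser: `split_name_parts` is a hypothetical helper, and it ignores escaped backticks and the rest of Spark's identifier grammar.

```python
def split_name_parts(identifier):
    """Split a dotted identifier, keeping `quoted` segments intact.

    Simplified: doubled backticks inside a quoted part are not handled.
    """
    parts, current, in_quotes = [], [], False
    for ch in identifier:
        if ch == "`":
            in_quotes = not in_quotes  # toggle quoted mode, drop the backtick
        elif ch == "." and not in_quotes:
            parts.append("".join(current))  # an unquoted dot ends a name part
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError(f"unclosed backtick in {identifier!r}")
    parts.append("".join(current))
    return parts

print(split_name_parts("user_data.`user.first.name`"))  # ['user_data', 'user.first.name']
print(split_name_parts("user_data.user.first.name"))    # ['user_data', 'user', 'first', 'name']
```

With the backticks, the second name part is the full field name user.first.name; without them, the same text would be read as three levels of nesting.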

@kt-eliatra (Contributor Author)

I think it doesn't matter whether this column is of Struct or Map type.

With tables defined as

CREATE TABLE nested_test (
  age INT,
  user_data STRUCT<`user.first.name`:STRING>
)
USING JSON;

INSERT INTO nested_test
VALUES
( 30, STRUCT("Alice") ),
( 40, STRUCT("Bob") );

and

CREATE TABLE nested_test2 (
  age INT,
  user_data MAP<STRING, STRING>
)
USING JSON;

INSERT INTO nested_test2
VALUES
( 30, MAP("user.first.name", "Alice") ),
( 40, MAP("user.first.name", "Bob") );

commands like:

select age from nested_test where user_data.`user.first.name` = 'Bob';
select age from nested_test where user_data['user.first.name'] = 'Bob';

select age from nested_test2 where user_data.`user.first.name` = 'Bob';
select age from nested_test2 where user_data['user.first.name'] = 'Bob';

return the same results.

It looks like PPL already supports commands where a nested field name containing dots is enclosed in backticks.
Do you think this is enough, so that this PR can be closed?
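One way to see why the backtick form and the bracket form return the same results: both reduce to a top-level column plus a single key that is looked up whole. The toy Python sketch below handles only the two expression shapes used in the queries above (the `to_path` and `lookup` helpers are hypothetical) and models both the struct and the map as plain dicts, so it says nothing about Spark's actual type handling. The rows mirror the INSERT statements above.

```python
import re

# Hypothetical toy parser for exactly two shapes:
#   col.`dotted.key`    (delimited identifier)
#   col['dotted.key']   (bracket extraction with a string literal)
_BACKTICK = re.compile(r"^(\w+)\.`([^`]+)`$")
_BRACKET = re.compile(r"^(\w+)\['([^']+)'\]$")

def to_path(expr):
    """Return (column, key) for either supported form."""
    for pattern in (_BACKTICK, _BRACKET):
        match = pattern.match(expr)
        if match:
            return match.group(1), match.group(2)
    raise ValueError(f"unsupported expression: {expr!r}")

def lookup(row, expr):
    column, key = to_path(expr)
    return row[column][key]  # one lookup with the full dotted key

rows = [
    {"age": 30, "user_data": {"user.first.name": "Alice"}},
    {"age": 40, "user_data": {"user.first.name": "Bob"}},
]
for expr in ("user_data.`user.first.name`", "user_data['user.first.name']"):
    print([r["age"] for r in rows if lookup(r, expr) == "Bob"])  # [40]
```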

@dai-chen (Collaborator) commented Sep 4, 2024

Yes, that's my question. I don't quite understand why we need to add special syntax here. I thought the existing syntax for delimited identifiers was sufficient in both SQL and PPL. Let me know if I'm missing anything. Thanks!

@kt-eliatra (Contributor Author)

You're right. I just wanted to introduce syntax similar to what the person reporting this issue used in their SQL query.
But maybe this complicates things unnecessarily, and supporting the existing syntax is enough. If so, this PR can be closed.

@YANG-DB YANG-DB closed this Sep 13, 2024
@kt-eliatra kt-eliatra deleted the map-with-keys-like-nested-fields branch September 16, 2024 06:19