Skip to content

Conversation

@asuresh8
Copy link
Contributor

Description

Enables OpenSearch users to use LakeFormation option when querying the S3Glue data source.
This is important because some Glue tables are managed by LakeFormation. The customer will need to pass in a property for LakeFormation in order to not break backwards compataibility.

Note that this feature is launching in EMR serverless next month, so an error is thrown. Adding it in now so it's in the next release which is in line with the launch of this feature on EMR serverless.

Testing

Built locally with 2.13 as version and then copied over to cluster running 2.13.

Validating that property is read and correct configuration is added to Spark config

curl --request POST  --url http://localhost:9200/_plugins/_query/_datasources   --header 'content-type: application/x-ndjson'   --data '{"name": "mygdc","description": "","connector": "S3GLUE","allowedRoles": [],"properties": {"glue.auth.type": "iam_role","glue.auth.role_arn": "arn:aws:iam::xxxxxxxxxxxx:role/flint-opensearch-execution-role","glue.indexstore.opensearch.uri": "http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9200","glue.indexstore.opensearch.auth": "noauth", "glue.lakeformation.enabled": "true"}}'
curl --request  POST   --url http://localhost:9200/_plugins/_async_query   --header 'content-type: application/x-ndjson'   --data '{"datasource": "mygdc","lang": "sql","query": "SELECT * FROM mygdc.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_vpc_flow_2_0 LIMIT 1"}'
{
  "status": 500,
  "error": {
    "type": "RuntimeException",
    "reason": "There was internal problem at backend",
    "details": "Internal Server Error."
  }
cat ~/tmp/opensearch.log 
com.amazonaws.services.emrserverless.model.ValidationException: Lake Formation configuration can only be specified at the application level, not at job run level. (Service: AWSEMRServerless; Status Code: 400; Error Code: ValidationException; Request ID: 28566859-cdfc-45fa-bb0a-21ad69d0a88a; Proxy: null)

Validating backwards compatibility

curl --request POST   --url http://localhost:9200/_plugins/_query/_datasources   --header 'content-type: application/x-ndjson'   --data '{"name": "mygdc2","description": "","connector": "S3GLUE","allowedRoles": [],"properties": {"glue.auth.type": "iam_role","glue.auth.role_arn": "arn:aws:iam::xxxxxxxxxx:role/flint-opensearch-execution-role","glue.indexstore.opensearch.uri": "http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9200","glue.indexstore.opensearch.auth": "noauth"}}'
curl --request  POST   --url http://localhost:9200/_plugins/_async_query   --header 'content-type: application/x-ndjson'   --data '{"datasource": "mygdc2","lang": "sql","query": "SELECT * FROM mygdc2.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_vpc_flow_2_0 LIMIT 1"}'
{
  "queryId": "b1d5WnhqYkdoMm15Z2RjMg==",
  "sessionId": "eFhDVk85ZUZMMG15Z2RjMg=="
}
curl --request  GET --url http://localhost:9200/_plugins/_async_query/b1d5WnhqYkdoMm15Z2RjMg==
{
  "status": "SUCCESS",
   ...
}

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@vamsimanohar
Copy link
Member

@asuresh8 asuresh8 force-pushed the lf_option branch 2 times, most recently from 60a713f to bd72c12 Compare April 23, 2024 01:42
@codecov
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.37%. Comparing base (2c97be7) to head (1c8998e).
Report is 1 commits behind head on main.

❗ Current head 1c8998e differs from pull request most recent head 08ccd67. Consider uploading reports for the commit 08ccd67 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2624   +/-   ##
=========================================
  Coverage     95.37%   95.37%           
  Complexity     5133     5133           
=========================================
  Files           490      490           
  Lines         14431    14434    +3     
  Branches        971      971           
=========================================
+ Hits          13763    13766    +3     
  Misses          643      643           
  Partials         25       25           
Flag Coverage Δ
sql-engine 95.37% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new config is configurable for open source user, right? Could you update user manual if so, maybe here https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst?

Comment on lines +1115 to +1121
properties.put(
"glue.auth.role_arn", "arn:aws:iam::924196221507:role/FlintOpensearchServiceRole");
properties.put(
"glue.indexstore.opensearch.uri",
"https://search-flint-dp-benchmark-cf5crj5mj2kfzvgwdeynkxnefy.eu-west-1.es.amazonaws.com");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these account and domain info are internal? Can we replace with fake one for test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these values are used for all tests. I can update as part of a follow-up PR.

@dai-chen
Copy link
Collaborator

As discussed earlier, we may need to disable covering index rewrite if LF enabled (until better solution): https://github.com/opensearch-project/opensearch-spark/pull/318/files#diff-cf86f425a16363ac48fd59d28648420bc5c961b1b472edfb54075360633be338R136

Signed-off-by: Adi Suresh <adsuresh@amazon.com>
@asuresh8
Copy link
Contributor Author

This new config is configurable for open source user, right? Could you update user manual if so, maybe here https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst?

Done

As discussed earlier, we may need to disable covering index rewrite if LF enabled (until better solution):

Done

@dai-chen
Copy link
Collaborator

This new config is configurable for open source user, right? Could you update user manual if so, maybe here https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst?

Done

As discussed earlier, we may need to disable covering index rewrite if LF enabled (until better solution):

Done

Thanks a lot! Will get my PR merged soon.

corrected formatting issue.

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
@vamsimanohar vamsimanohar merged commit ea08c8f into opensearch-project:main Apr 29, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 29, 2024
* Add option to use LakeFormation in S3Glue data source.

Signed-off-by: Adi Suresh <adsuresh@amazon.com>

* Update s3glue_connector.rst

corrected formatting issue.

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

---------

Signed-off-by: Adi Suresh <adsuresh@amazon.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
Co-authored-by: Vamsi Manohar <reddyvam@amazon.com>
(cherry picked from commit ea08c8f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ps48 pushed a commit that referenced this pull request May 2, 2024
* Add option to use LakeFormation in S3Glue data source.



* Update s3glue_connector.rst

corrected formatting issue.



---------




(cherry picked from commit ea08c8f)

Signed-off-by: Adi Suresh <adsuresh@amazon.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Vamsi Manohar <reddyvam@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.x enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants