-
Notifications
You must be signed in to change notification settings - Fork 181
Add option to use LakeFormation in S3Glue data source. #2624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@asuresh8 can you update the docs over here with the new property: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst |
spark/src/main/java/org/opensearch/sql/spark/asyncquery/model/SparkSubmitParameters.java
Outdated
Show resolved
Hide resolved
spark/src/test/java/org/opensearch/sql/spark/dispatcher/SparkQueryDispatcherTest.java
Show resolved
Hide resolved
60a713f to
bd72c12
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2624 +/- ##
=========================================
Coverage 95.37% 95.37%
Complexity 5133 5133
=========================================
Files 490 490
Lines 14431 14434 +3
Branches 971 971
=========================================
+ Hits 13763 13766 +3
Misses 643 643
Partials 25 25
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new config is configurable for open source user, right? Could you update user manual if so, maybe here https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst?
| properties.put( | ||
| "glue.auth.role_arn", "arn:aws:iam::924196221507:role/FlintOpensearchServiceRole"); | ||
| properties.put( | ||
| "glue.indexstore.opensearch.uri", | ||
| "https://search-flint-dp-benchmark-cf5crj5mj2kfzvgwdeynkxnefy.eu-west-1.es.amazonaws.com"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these account and domain info are internal? Can we replace with fake one for test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these values are used for all tests. I can update as part of a follow-up PR.
|
As discussed earlier, we may need to disable covering index rewrite if LF enabled (until better solution): https://github.com/opensearch-project/opensearch-spark/pull/318/files#diff-cf86f425a16363ac48fd59d28648420bc5c961b1b472edfb54075360633be338R136 |
Signed-off-by: Adi Suresh <adsuresh@amazon.com>
Done
Done |
Thanks a lot! Will get my PR merged soon. |
corrected formatting issue. Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
* Add option to use LakeFormation in S3Glue data source. Signed-off-by: Adi Suresh <adsuresh@amazon.com> * Update s3glue_connector.rst corrected formatting issue. Signed-off-by: Vamsi Manohar <reddyvam@amazon.com> --------- Signed-off-by: Adi Suresh <adsuresh@amazon.com> Signed-off-by: Vamsi Manohar <reddyvam@amazon.com> Co-authored-by: Vamsi Manohar <reddyvam@amazon.com> (cherry picked from commit ea08c8f) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add option to use LakeFormation in S3Glue data source. * Update s3glue_connector.rst corrected formatting issue. --------- (cherry picked from commit ea08c8f) Signed-off-by: Adi Suresh <adsuresh@amazon.com> Signed-off-by: Vamsi Manohar <reddyvam@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Vamsi Manohar <reddyvam@amazon.com>
Description
Enables OpenSearch users to use LakeFormation option when querying the S3Glue data source.
This is important because some Glue tables are managed by LakeFormation. The customer will need to pass in a property for LakeFormation in order to not break backwards compataibility.
Note that this feature is launching in EMR serverless next month, so an error is thrown. Adding it in now so it's in the next release which is in line with the launch of this feature on EMR serverless.
Testing
Built locally with 2.13 as version and then copied over to cluster running 2.13.
Validating that property is read and correct configuration is added to Spark config
Validating backwards compatibility
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.