-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] fix reusing a sparkSession using the force_reuse_spark_context
keyword
#3245
[BUGFIX] fix reusing a sparkSession using the force_reuse_spark_context
keyword
#3245
Conversation
…to DataSourceConfigSchema
…spark-context-emr # Conflicts: # docs_rtd/changelog.rst
👷 Deploy request for niobium-lead-7998 pending review. 🔨 Explore the source changes: 1ac14a7 |
@gipaetusb I think the issue with the test came from the |
|
Looks like other PRs have the same issue (e.g. #3245). I think this is due to being a PR as stated 👆. Would be great to get some help to resolve that. I will try to add some unittests to avoid regression on the stuff from this PR. |
Thanks for this, @mbakunze ! I think you are right, and that these failing tests are unrelated to this PR. I've set the tests to re-run, and if they still fail, we'll figure out what's wrong. How would it make the most sense to merge this with #3126? Would it make sense for you to attempt to merge this into that branch instead? |
Hey @talagluck - I will try to coordinate with @gipaetusb as he did the real work. I just want to add a unittest and will then figure that out. |
…emr' into maintenance/reuse-spark-context-emr
…emr' into maintenance/reuse-spark-context-emr
@gipaetusb I propose you merge this into your PR and we close this PR here. Maybe we should also re-label this as a bug fix, which imo is a better fit. |
@mbakunze - do you want to try and merge into @gipaetusb's branch instead of |
@talagluck - just tried to create a PR in @gipaetusb 's branch w/o any luck. Looks like I do not have access to do that? 🤷♂️ FWIW: I am not fuzzed about how to get the changes in - I can also spend some time tomorrow to bring this PR here into shape. |
force_reuse_spark_context
keyword
@talagluck this is ready for review |
…emr' into maintenance/reuse-spark-context-emr
Had another quick look at the unittest and modified them to explicitly test the pass through of |
Closing this PR then! |
Changes proposed in this pull request
We are adding the
force_reuse_spark_context
andspark_config
to theDatasourceConfigSchema
in order to pass the values for these along. They seem to have previously been dropped when initiating aDataContext
with a spark source.This PR is based on PR #3126 (and would close it)- and there have been some discussions on Slack regarding this bug.
Context
The usage of
force_reuse_spark_context
is currently broken in a way that when using this keyword either via a.yml
or via dynamically generatedDataContext
the sparkSession still gets stopped by GE. The ketywords were not actually passed to the part of the code that was responisble to get or create the sparkSession. This in turn makes it impossible to reuse a sparkSession and use an in-memory dataframe as input for GE. The ability to reuse the sparkSession is important for use in EMR and spark on kubernetes.Checklist