[MAINTENANCE] Add force_reuse_spark_context to DatasourceConfigSchema #3126
Conversation
@talagluck here's the new attempt for the closed #2968
Thanks for re-opening this, @gipaetusb! We will review over the next week and be in touch.
This would be very useful to have!
Thanks @mbakunze. I mentioned this issue in the Slack channel - this currently seems to block us from using GE with Spark on k8s (at least in the way we wanted to use it :) ).
Thanks so much for the prompt, @mbakunze! By any chance would you have time to take a look and see why the tests are failing here? Otherwise I can prioritize this for next week.
I didn't understand why some tests failed when I glanced at it, but I'll try to take another look.
@gipaetusb I started working on fixing the tests in #3245
@mbakunze thank you very very much!
Hi @talagluck, thanks to @mbakunze this is finally ready to be merged :) !
Thank you so much for this contribution and for your patience, @gipaetusb and @mbakunze! Great work - LGTM!
I've looked into this fix in my current setup; unfortunately, it does not cover my case once I change my config. In this case I want a runtime batch, following the directions laid out here: https://discuss.greatexpectations.io/t/how-to-validate-spark-dataframes-in-0-13/582. I think the solution is to not only pass these parameters through the v2 datasource schema, but through the v3 configuration as well.
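For reference, a v3-style (0.13) datasource configuration along the lines of that guide might look like the sketch below. This is illustrative only: the names are placeholders, and per the discussion here the two Spark parameters were not yet honored by the v3 schema at the time of this PR.

```yaml
datasources:
  my_spark_datasource:
    class_name: Datasource
    execution_engine:
      class_name: SparkDFExecutionEngine
      force_reuse_spark_context: true
    data_connectors:
      my_runtime_data_connector:
        class_name: RuntimeDataConnector
        batch_identifiers:
          - run_id
```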
Nice - I guess @gipaetusb and I were still using the v2 approach. Great if we can get this working in v3 as well - I did not test it.
Thanks for the feedback, @fep2! @mbakunze is exactly right here. This fix was just for V2. |
As much as I would love to contribute, I have other commitments I must prioritize. That said, I'll open a bug or feature ticket and reference this thread. That way the need is addressed, and hopefully this is something users will get sooner rather than later.
That sounds great, thank you @fep2! |
With PR 2733, a `force_reuse_spark_context` parameter was added to `SparkDFDatasource` to let GE reuse an existing Spark context. This was extremely useful.
However, when using a dynamic Data Context configuration (e.g. on EMR), I've found that the `spark_config` and `force_reuse_spark_context` parameters weren't actually passed to `great_expectations.core.util.get_or_create_spark_application()`.
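The datasource block of such a configuration might look like the following sketch (a hypothetical example; names and values are illustrative, not taken from this PR):

```yaml
datasources:
  my_spark_datasource:
    class_name: SparkDFDatasource
    module_name: great_expectations.datasource
    force_reuse_spark_context: true
    spark_config:
      spark.app.name: my_ge_app
```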
In fact, the parameters are lost when the `DataContextConfig` object is dumped into a dictionary, because the `DatasourceConfigSchema` (an element of the `DataContextConfigSchema`) doesn't list these parameters.

Changes proposed in this pull request:
- Add `spark_config` and `force_reuse_spark_context` to the `DatasourceConfigSchema`
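The round-trip loss described above can be illustrated with a simplified model - plain Python standing in for GE's actual marshmallow schemas, with made-up helper names: a dump that keeps only declared fields silently drops the Spark parameters until the schema declares them.

```python
def schema_dump(config: dict, declared_fields: set) -> dict:
    """Keep only the fields the schema declares, mimicking how a
    marshmallow schema drops undeclared fields when dumping."""
    return {key: value for key, value in config.items() if key in declared_fields}

datasource_config = {
    "class_name": "SparkDFDatasource",
    "spark_config": {"spark.app.name": "my_ge_app"},
    "force_reuse_spark_context": True,
}

# Before this PR: the schema does not declare the two Spark parameters,
# so they are silently dropped when the config is dumped to a dictionary.
before_fix = schema_dump(datasource_config, {"class_name"})
print(before_fix)  # {'class_name': 'SparkDFDatasource'}

# After this PR: declaring the fields preserves them through the round trip.
after_fix = schema_dump(
    datasource_config,
    {"class_name", "spark_config", "force_reuse_spark_context"},
)
print(after_fix["force_reuse_spark_context"])  # True
```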