-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MAINTENANCE] Add force_reuse_spark_context to DatasourceConfigSchema #2968
[MAINTENANCE] Add force_reuse_spark_context to DatasourceConfigSchema #2968
Conversation
✔️ Deploy Preview for knoxpod ready! 🔨 Explore the source changes: 227511e 🔍 Inspect the deploy log: https://app.netlify.com/sites/knoxpod/deploys/60ed73cafe66250007930236 😎 Browse the preview: https://deploy-preview-2968--knoxpod.netlify.app |
A new contributor, HUZZAH! Welcome and thanks for joining our community. In order to accept a pull request we require that all contributors sign our Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to GE in a personal or professional capacity. Please sign the one that is applicable to your situation so that we may accept your contribution: Individual Contributor License Agreement v1.0 Once you have signed the CLA, you can add a comment with the text Please reach out to @kyleaton, on our Slack if you have any questions or if you have already signed the CLA and are receiving this message in error. Users missing a CLA: gio@gio |
@cla-bot check |
Hi @gipaetusb, thank you so much for the contribution - this is awesome! Just a few things:
|
…t-expectations#2956) * abstracted out shared spark additional notes
…n citations (great-expectations#2964) * refactor(expectationsuite): Add profiler_config to citations * test(fixture): Add Bobster config to improve tests * chore(misc): Move global YAML obj into fixture
The team was having issues getting the mssql docker image to initialize during Azure CI/CD. Some research into the matter revealed that without explictly stating a version for the image, the default of 'latest' is applied. Docker was unable to find this image (I believe it is to be deprecated on the MS side). Per Microsoft docs, pulling the image from mcr.microsoft.com and specifying the 2019-latest version is the recommended approach. All files that used this outdated image have been updated appropriately. Microsoft docs: https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker?view=sql-server-ver15&pivots=cs1-bash
* feat(profiler): Add citations to Profiler.profile() * docs(profiler): Update citation msg and docstring * test(alice): Update alice fixture to include citations * test(bobby): Update bobby fixture to include citations
…ctations#2963) * Adding in GDOC-127 Moving over files from previous non checked in work * Updating the PR comments Updating for PR comments * Remove an obsolete info note Remove an obsolete info note * Addin stubs and sidebar change Addin stubs and sidebar change
* [DOCS] How to connect to a MySQL database * fixed broken link in next steps
…#2933) * Initial docs * Update documentation for rule based profiler * Linting * Updated tests in script runner * Updated to reflect develop * Updated per feedback * Addressed Alex's feedback * Some updates to structure and text * Updated core concepts * New tutorial and core-concepts format for docs * Incorporated feedback from PRs * Fix broken link * More text updates * Fixed typo * Fixed integration test
* [FEATURE] NumericMetricRangeMultiBatchParameterBuilder -- First MultiBatch Parameter Builder (great-expectations#2860) * Bobby use case: Expectation Suite for number of rows in active batch to be similar to that in previous batches. * Validator will no longer be passed to Rule Based Profiler. * Fixtures for the entire active table multi-batch use-case. * Elegant refactoring of metric computations. * Bootstrapped Estimator and Range Parameter Builder. * Robustness and Efficiency. * Update requirements for scipy (need new bootstrap API). * DocStrings. * Feature Maturity clause. * ChangeLog update.
…hout additional requirements* files (great-expectations#2982) * Fix requirements. * get documentation integration tests to pass
…xl for pandas >= 1.3.0 (great-expectations#2989) - [BUGFIX] Modify read_excel() to handle new optional-dependency openpyxl for pandas >= 1.3.0
* Updated diagram image * Updated profiler docs
…s#2986) * fix minor bug in parameter getter
…tations#2990) * clean up expection suite citations
…rsion 1.3.0 (great-expectations#2995) * Update old CLI tests in response to Pandas upgrade to version 1.3.0 on 7/2/2021.
…tations#2996) * Remove mostly from bobster test fixture and minor cleanup.
…t-expectations#2972) * Add in Contributing fields and updates Solves GDOC-140m GDOC-129, GDOC-123, GDOC-122 * Updating per comments Fixing some links and style * Update Updates * Fix the root and the glossary link Fix the root and the glossary link * Fix typo and heading Fix typo and heading * Update contributing-github.md remove a line Co-authored-by: James Campbell <james.p.campbell@gmail.com>
) The documentation page for instantiating a Data Context without a yaml file was missing an import, without which the code snippet wouldn't work. Adding the missing import in this change. This is the page concerned: https://docs.greatexpectations.io/en/latest/guides/how_to_guides/configuring_data_contexts/how_to_instantiate_a_data_context_without_a_yml_file.html Co-authored-by: talagluck <tal@superconductive.com>
Instrument test_yaml_config() and update Anonymizers
…to_contain_set (great-expectations#2997) * dates for expect_column_distinct_values_to_contain_set * support date in panda schema * add date to internal schema * add missing support for DATE types in SQL backends * lint util.py * fix typos
…or read_parquet (great-expectations#2506) * [BUGFIX] Fix issue where compression key was added to reader_method for read_parquet Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Update tests/execution_engine/test_pandas_execution_engine.py Co-authored-by: Luke Dyer <luke321321@gmail.com> * Update tests for "None" when param should be inferred Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Update test path->bytes Signed-off-by: James Campbell <james.p.campbell@gmail.com> Co-authored-by: Sam Bail <sam@superconductive.com> Co-authored-by: Luke Dyer <luke321321@gmail.com>
* [BUGFIX] fix check for null value * use-isinstance * fix * precommit * syntax tweak Co-authored-by: James Campbell <james.p.campbell@gmail.com> Co-authored-by: James Campbell <james.p.campbell@gmail.com>
* update header to match io * adjust cloud link * mobile header update * adjust community desktop modal width and position * adjust padding of li items in mobile menu and width Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
…erBuilder (great-expectations#3001) * Small fix for batch_filter (the tests already cover it). * Estimator of mostly using the precision measure. * Estimator of mostly using the precision measure. * Estimator of mostly using the precision measure. * docstring * Bootstrap estimator for NumericMetricRangeMultiBatchParameterBuilder needs to include both mean and std statistics. * merge * Use np.quantile for bootstrap statistic. * typo * optimization * pandas upgrade * Implement custom bootstrap and test. * cleanup * use quantiles and false_positive_rate uniformly * clean up * only recent numpy versions allowed for rule_based_profiler tests * Improve code readability Signed-off-by: James Campbell <james.p.campbell@gmail.com> Co-authored-by: James Campbell <james.p.campbell@gmail.com>
* Adding in url links and style Adding in url links and style * Fix broken link Fix a broken link * Update docs/contributing/contributing-test.md Co-authored-by: William Shin <will@superconductive.com> * Update docs/contributing/contributing-setup.md Co-authored-by: William Shin <will@superconductive.com> * Update docs/contributing/contributing.md Co-authored-by: William Shin <will@superconductive.com> * Update docs/contributing/contributing-checklist.md Co-authored-by: William Shin <will@superconductive.com> * Update correct links Co-authored-by: William Shin <will@superconductive.com>
…ations#2985) [DOCS] Getting Started - Clean Up and Integration Tests (great-expectations#2985)
* fix docusaurus lint issue
* validating data without a checkpoint * corrections for links and standardization of TODOs
… from RTD to Docusaurus (great-expectations#3014) * docs: Port .rst guide to .md * fix: Update broken link
…TD to Docusaurus (great-expectations#3011) * docs: Port over guide from RTD to Docusaurus
…eat_expectations into reuse-spark-context
@cla-bot check |
Hi @talagluck, thanka for taking a look:
|
Hi @gipaetusb - thanks for the follow-up! It looks like the CLA verification is still failing, and I still see the user Sounds good about the tests - I'll check in in in a week or so to see how things are going. Thanks again! |
Hi @gipaetusb ! I wanted to check in here. It looks like we are still having CLA issues. I also wanted to check in to see if you had time to fix the failing tests, or if you needed some help? |
Hi @talagluck thank you for the follow up. Much easier to open a new, cleaner PR right? Here you go: #3126 If someone can help with the unit tests, that would be great :) ! |
Thanks for the follow-up, @gipaetusb ! Someone from our team will jump in next week to review and work on the unit tests. |
With PR 2733 a parameter was added to
SparkDFDatasource
, to make GE's context reuse an existing Spark Context.This was extremely useful.
However, when using a dynamic Data Context configuration (e.g. in EMR) like
I've found that the
spark_config
andforce_reuse_spark_context
parameters weren't actually passed togeat_expectations.core.util.get_or_create_spark_application()
.In fact, the parameters are lost when the
DataContextConfig
object is dumped into a dictionary, because theDatasourceConfigSchema
(elem of theDataContextConfigSchema
) doesn't list these parameters.Changes proposed in this pull request:
spark_config
andforce_reuse_spark_context
to theDatasourceConfigSchema
Definition of Done
Please delete options that are not relevant.