-
Notifications
You must be signed in to change notification settings - Fork 175
[Feature] Read streams by 1MB chunks by default. #817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…atabricks-sdk-py into chunk-stream
|
Related PR: #319. @renaudhartert-db Looking at the related PR, it's the responsibility of the caller to enable chunking or not. Was there a specific issue or regression you saw where this didn't happen or do you just want to do the right thing by default here? |
Thanks for calling this out, I've updated the PR description to clarify the intent. |
Signed-off-by: Renaud Hartert <renaud.hartert@databricks.com>
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
|
Test Details: go/deco-tests/11741381709 |
### New Features and Improvements * Read streams by 1MB chunks by default. ([#817](#817)). ### Bug Fixes * Rewind seekable streams before retrying ([#821](#821)). ### Internal Changes * Reformat SDK with YAPF 0.43. ([#822](#822)). * Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)). * Update PR template ([#814](#814)). ### API Changes: * Added `databricks.sdk.service.apps`, `databricks.sdk.service.billing`, `databricks.sdk.service.catalog`, `databricks.sdk.service.compute`, `databricks.sdk.service.dashboards`, `databricks.sdk.service.files`, `databricks.sdk.service.iam`, `databricks.sdk.service.jobs`, `databricks.sdk.service.marketplace`, `databricks.sdk.service.ml`, `databricks.sdk.service.oauth2`, `databricks.sdk.service.pipelines`, `databricks.sdk.service.provisioning`, `databricks.sdk.service.serving`, `databricks.sdk.service.settings`, `databricks.sdk.service.sharing`, `databricks.sdk.service.sql`, `databricks.sdk.service.vectorsearch` and `databricks.sdk.service.workspace` packages. OpenAPI SHA: 2035bf5234753adfd080a79bff325dd4a5b90bc2, Date: 2024-11-15
### New Features and Improvements * Read streams by 1MB chunks by default. ([#817](#817)). ### Bug Fixes * Rewind seekable streams before retrying ([#821](#821)). * Properly serialize nested data classes. ### Internal Changes * Reformat SDK with YAPF 0.43. ([#822](#822)). * Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)). ### API Changes: * Added `service_principal_client_id` field for `databricks.sdk.service.apps.App`. * Added `azure_service_principal`, `gcp_service_account_key` and `read_only` fields for `databricks.sdk.service.catalog.CreateCredentialRequest`. * Added `azure_service_principal`, `read_only` and `used_for_managed_storage` fields for `databricks.sdk.service.catalog.CredentialInfo`. * Added `omit_username` field for `databricks.sdk.service.catalog.ListTablesRequest`. * Added `azure_service_principal` and `read_only` fields for `databricks.sdk.service.catalog.UpdateCredentialRequest`. * Added `external_location_name`, `read_only` and `url` fields for `databricks.sdk.service.catalog.ValidateCredentialRequest`. * Added `is_dir` field for `databricks.sdk.service.catalog.ValidateCredentialResponse`. * Added `only` field for `databricks.sdk.service.jobs.RunNow`. * Added `restart_window` field for `databricks.sdk.service.pipelines.CreatePipeline`. * Added `restart_window` field for `databricks.sdk.service.pipelines.EditPipeline`. * Added `restart_window` field for `databricks.sdk.service.pipelines.PipelineSpec`. * Added `private_access_settings_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`. * Changed `create_credential()` and `generate_temporary_service_credential()` methods for [w.credentials](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/credentials.html) workspace-level service with new required argument order. * Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required. * Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required. * Changed `name` field for `databricks.sdk.service.catalog.CreateCredentialRequest` to be required. * Changed `credential_name` field for `databricks.sdk.service.catalog.GenerateTemporaryServiceCredentialRequest` to be required. OpenAPI SHA: f2385add116e3716c8a90a0b68e204deb40f996c, Date: 2024-11-15
What changes are proposed in this pull request?
This PR changes the
_BaseClientto read streams by chunks of 1MB by default. 1MB was chosen as a good compromise between speed and memory usage (see PR #319).Note that this is not a new feature per se as it was possible to configure chunk size on the returned
_StreamResponsebefore calling its read method. However, the functionality was not easy to discover and led several users to experience memory issues. The new default behavior is more defensive.How is this tested?
Added a few test cases to verify that streams are chunked as expected.