
Conversation

renaudhartert-db (Contributor) commented Nov 5, 2024

What changes are proposed in this pull request?

This PR changes the _BaseClient to read streams in chunks of 1MB by default. 1MB was chosen as a good compromise between speed and memory usage (see PR #319).

Note that this is not a new feature per se, as it was already possible to configure the chunk size on the returned _StreamResponse before calling its read method. However, that functionality was hard to discover and led several users to run into memory issues. The new default behavior is more defensive.
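The chunked-read behavior described above can be sketched as follows. This is a minimal illustration, not the SDK's actual implementation; `read_chunked` and `CHUNK_SIZE` are hypothetical names:

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1MB default, as chosen in this PR

def read_chunked(stream, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a binary stream (hypothetical helper)."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Usage: peak memory stays bounded by chunk_size, not the stream length.
data = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 5))
sizes = [len(c) for c in read_chunked(data)]
```

Because each `read` call returns at most `chunk_size` bytes, a consumer that processes chunks as they arrive never holds the full stream in memory at once.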

How is this tested?

Added a few test cases to verify that streams are chunked as expected.

pietern (Contributor) commented Nov 7, 2024

Related PR: #319.

@renaudhartert-db Looking at the related PR, it's the responsibility of the caller to enable chunking or not. Was there a specific issue or regression you saw where this didn't happen or do you just want to do the right thing by default here?

renaudhartert-db (Contributor, Author):

> Related PR: #319.
>
> @renaudhartert-db Looking at the related PR, it's the responsibility of the caller to enable chunking or not. Was there a specific issue or regression you saw where this didn't happen or do you just want to do the right thing by default here?

Thanks for calling this out, I've updated the PR description to clarify the intent.

renaudhartert-db changed the title from "[Fix] Read streams by 1MB chunks by default." to "[Feature] Read streams by 1MB chunks by default." Nov 7, 2024
Signed-off-by: Renaud Hartert <renaud.hartert@databricks.com>
github-actions bot commented Nov 8, 2024

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 817
  • Commit SHA: b9070bc4104572b1267164813f8e534dcc0293a5

Checks will be approved automatically on success.

eng-dev-ecosystem-bot (Collaborator):

Test Details: go/deco-tests/11741381709

renaudhartert-db added this pull request to the merge queue Nov 8, 2024
Merged via the queue into main with commit 2143e35 Nov 8, 2024
19 checks passed
renaudhartert-db deleted the renaud.hartert/chunk-stream branch November 8, 2024 12:15
renaudhartert-db added a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

 * Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

 * Rewind seekable streams before retrying ([#821](#821)).

### Internal Changes

 * Reformat SDK with YAPF 0.43. ([#822](#822)).
 * Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).
 * Update PR template ([#814](#814)).

### API Changes:

 * Added `databricks.sdk.service.apps`, `databricks.sdk.service.billing`, `databricks.sdk.service.catalog`, `databricks.sdk.service.compute`, `databricks.sdk.service.dashboards`, `databricks.sdk.service.files`, `databricks.sdk.service.iam`, `databricks.sdk.service.jobs`, `databricks.sdk.service.marketplace`, `databricks.sdk.service.ml`, `databricks.sdk.service.oauth2`, `databricks.sdk.service.pipelines`, `databricks.sdk.service.provisioning`, `databricks.sdk.service.serving`, `databricks.sdk.service.settings`, `databricks.sdk.service.sharing`, `databricks.sdk.service.sql`, `databricks.sdk.service.vectorsearch` and `databricks.sdk.service.workspace` packages.

OpenAPI SHA: 2035bf5234753adfd080a79bff325dd4a5b90bc2, Date: 2024-11-15
github-merge-queue bot pushed a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

* Read streams by 1MB chunks by default ([#817](#817)).

### Bug Fixes

* Rewind seekable streams before retrying ([#821](#821)).
* Properly serialize nested data classes.

### Internal Changes

* Reformat SDK with YAPF 0.43 ([#822](#822)).
* Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).

### API Changes:

* Added `service_principal_client_id` field for `databricks.sdk.service.apps.App`.
* Added `azure_service_principal`, `gcp_service_account_key` and `read_only` fields for `databricks.sdk.service.catalog.CreateCredentialRequest`.
* Added `azure_service_principal`, `read_only` and `used_for_managed_storage` fields for `databricks.sdk.service.catalog.CredentialInfo`.
* Added `omit_username` field for `databricks.sdk.service.catalog.ListTablesRequest`.
* Added `azure_service_principal` and `read_only` fields for `databricks.sdk.service.catalog.UpdateCredentialRequest`.
* Added `external_location_name`, `read_only` and `url` fields for `databricks.sdk.service.catalog.ValidateCredentialRequest`.
* Added `is_dir` field for `databricks.sdk.service.catalog.ValidateCredentialResponse`.
* Added `only` field for `databricks.sdk.service.jobs.RunNow`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.CreatePipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.EditPipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.PipelineSpec`.
* Added `private_access_settings_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
* Changed `create_credential()` and `generate_temporary_service_credential()` methods for [w.credentials](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/credentials.html) workspace-level service with new required argument order.
* Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required.
* Changed `name` field for `databricks.sdk.service.catalog.CreateCredentialRequest` to be required.
* Changed `credential_name` field for `databricks.sdk.service.catalog.GenerateTemporaryServiceCredentialRequest` to be required.

OpenAPI SHA: f2385add116e3716c8a90a0b68e204deb40f996c, Date: 2024-11-15