Skip to content

Conversation

@jsjasonseba
Copy link
Contributor

@jsjasonseba jsjasonseba commented Sep 25, 2024

closes: #42439
related: #42439

This PR refactors bucket.get_blob calls in GCSHook to handle validation for non-existent objects. An internal method named _get_blob is created in GCSHook to consolidate bucket.get_blob calls where it includes validation. Exception is raised when object is non-existent.

It also fixes implicit exception in GCSToLocalFilesystemOperator when reading non-existing files by raising an explicit exception indicating object non-existence.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Sep 25, 2024
@jsjasonseba jsjasonseba reopened this Sep 25, 2024
@jsjasonseba
Copy link
Contributor Author

Hi @shahar1, my PR is ready. I noticed that several methods in the GCS hook have the same issue as get_size, and some of them implemented nonexistent file checking too, so I decided to refactor those methods to reduce redundancy. Kindly check the PR when you have time.

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! Few comments:

  1. If you decide to solve the deeper cause of the original issue (which is great!) - please refer to it directly in the PR: title, content and newsfragment (i.e., fixing the issue in the hooks that tilize get_blob, rather than the operator). It still closes the original issue though.
  2. There are 2 failing tests due to the refactoring, try to fix these tests.

There's also an inline comment :)

@jsjasonseba jsjasonseba changed the title Fix implicit exception in GCSToLocalFilesystemOperator when reading non-existing files Refactors bucket.get_blob calls in GCSHook to handle validation for non-existent objects. Sep 26, 2024
@jsjasonseba jsjasonseba changed the title Refactors bucket.get_blob calls in GCSHook to handle validation for non-existent objects. Refactor bucket.get_blob calls in GCSHook to handle validation for non-existent objects. Sep 26, 2024
@jsjasonseba jsjasonseba requested a review from shahar1 September 26, 2024 14:54
@shahar1
Copy link
Contributor

shahar1 commented Sep 27, 2024

Great work!

@shahar1 shahar1 merged commit b0234cb into apache:main Sep 27, 2024
joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
…n for non-existent objects. (apache#42474)

* Fix implicit exception in GCSToLocalFilesystemOperator when reading non-existing files.

* change method to internal method

* Fix unit test expectations

* Refactor bucket.get_blob calls in GCSHook to handle validation for non-existent objects
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
…n for non-existent objects. (apache#42474)

* Fix implicit exception in GCSToLocalFilesystemOperator when reading non-existing files.

* change method to internal method

* Fix unit test expectations

* Refactor bucket.get_blob calls in GCSHook to handle validation for non-existent objects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implicit exception in GCSToLocalFilesystemOperator when reading non-existing files

2 participants