Return destination GCS URIs from ADLSToGCSOperator#61463
Return destination GCS URIs from ADLSToGCSOperator#61463Abhishekmishra2808 wants to merge 2 commits intoapache:mainfrom
Conversation
SameerMesiah97
left a comment
There was a problem hiding this comment.
Looks good overall. But this would benefit from some more polish.
providers/google/src/airflow/providers/google/cloud/transfers/adls_to_gcs.py
Outdated
Show resolved
Hide resolved
providers/google/src/airflow/providers/google/cloud/transfers/adls_to_gcs.py
Outdated
Show resolved
Hide resolved
providers/google/tests/unit/google/cloud/transfers/test_adls_to_gcs.py
Outdated
Show resolved
Hide resolved
providers/google/tests/unit/google/cloud/transfers/test_adls_to_gcs.py
Outdated
Show resolved
Hide resolved
shahar1
left a comment
There was a problem hiding this comment.
Please resolve comment made by @SameerMesiah97 (great review!),
and I think that it will be good to go.
I recall that we should run the system tests before merging this PR to ensure that they don't break, please let us know if you're able to do it - otherwise Google team or someone else will have to do it.
providers/google/src/airflow/providers/google/cloud/transfers/adls_to_gcs.py
Outdated
Show resolved
Hide resolved
a6488b4 to
3583457
Compare
3583457 to
aef396e
Compare
|
@shahar1 @SameerMesiah97 Could you please check now, and tell if something is missing on my end. |
|
@shahar1 Also, it seems this change may already be reflected in |
|
@shahar1 Your views on this ? |
First try to resolve the conflict and see what's going on. Then make sure that your changes are aligned with the appropriate version. |
|
ignore the commits as of now, before merging i will change this to one commit only. but before that @shahar1 please review it |
Something doesn't look right - try to avoid reverting any changes introduced by |
037e823 to
b251621
Compare
|
@shahar1 Could please provide more detailed review? |
|
@yuseok89 Thanks for testing ! |
Hiding it in the kwargs and raising the TypeError exception (which you deleted in this PR) were deliberate decisions to make transition to Gen 2 clearer and explicit to the end-user. |
6df52e2 to
22844cd
Compare
- Migrate from AzureDataLakeHook (Gen1) to AzureDataLakeStorageV2Hook (Gen2) - Update execute() to return list[str] of destination GCS URIs (gs://bucket/object) - Add file_system_name parameter validation via **kwargs with clear TypeError - Update unit tests to verify Gen2 hooks and return values - Add new test for file_system_name TypeError validation - Remove ADLSToGCSOperator from MISSING_EXAMPLES_FOR_CLASSES This change aligns with Azure's retirement of ADLS Gen1 and ensures operators return destination URIs for downstream task dependencies. The file_system_name parameter is required but passed via **kwargs to maintain explicit Gen1-to-Gen2 migration messaging for users.
22844cd to
a50b9f3
Compare
Add destination URI tracking and return list of GCS URIs instead of file paths. Changes: - Update execute() to return list[str] of destination GCS URIs (gs://bucket/object) - Add destination_uris list initialization - Append GCS URI after each successful upload - Update docstring to document return value - Update unit tests to assert returned GCS URIs This enables downstream tasks to reference the uploaded GCS objects directly.
c795d27 to
8653cc1
Compare
|
@shahar1 could you please review this now? |
There was a problem hiding this comment.
Pull request overview
This PR updates ADLSToGCSOperator to return a list of destination GCS URIs (gs://bucket/object) instead of source ADLS paths, aligning with the standardization effort across all ToGCS operators (issue #11323). The return type remains list[str], but the values now represent where files were uploaded rather than where they came from.
Changes:
- Updated
execute()method to return destination GCS URIs in the formatgs://bucket/object - Moved
_parse_gcs_url()call outside the file upload loop for efficiency - Changed from
os.path.jointoposixpath.joinfor cross-platform compatibility - Updated tests to verify destination URIs instead of source paths
- Added return value documentation to the class docstring
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| providers/google/src/airflow/providers/google/cloud/transfers/adls_to_gcs.py | Modified execute() to build and return destination GCS URIs; optimized by moving _parse_gcs_url outside loop; switched to posixpath.join for GCS path construction |
| providers/google/tests/unit/google/cloud/transfers/test_adls_to_gcs.py | Updated test assertions to verify destination GCS URIs match expected format |

Related: #11323
execute()now returns alist[str]of destination GCS URIs (gs://bucket/object) for uploaded files.list[str]; only the returned values are updated to reflect the destination in GCS.