Skip to content

Return destination GCS URIs from ADLSToGCSOperator#61463

Open
Abhishekmishra2808 wants to merge 2 commits intoapache:mainfrom
Abhishekmishra2808:fix/adls-to-gcs-return-destination-uris
Open

Return destination GCS URIs from ADLSToGCSOperator#61463
Abhishekmishra2808 wants to merge 2 commits intoapache:mainfrom
Abhishekmishra2808:fix/adls-to-gcs-return-destination-uris

Conversation

@Abhishekmishra2808
Copy link
Contributor

Related: #11323

  • Return value: execute() now returns a list[str] of destination GCS URIs (gs://bucket/object) for uploaded files.
  • The return type remains list[str]; only the returned values are updated to reflect the destination in GCS.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Feb 4, 2026
Copy link
Contributor

@SameerMesiah97 SameerMesiah97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. But this would benefit from some more polish.

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve comment made by @SameerMesiah97 (great review!),
and I think that it will be good to go.
I recall that we should run the system tests before merging this PR to ensure that they don't break, please let us know if you're able to do it - otherwise Google team or someone else will have to do it.

@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from a6488b4 to 3583457 Compare February 6, 2026 13:28
@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from 3583457 to aef396e Compare February 6, 2026 13:31
@Abhishekmishra2808
Copy link
Contributor Author

@shahar1 @SameerMesiah97 Could you please check now, and tell if something is missing on my end.

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
We need to run the system tests to ensure that nothing breaks (+attach screenshots).

@yuseok89 - if you could help with it, it would be great.

CC: @VladaZakharova @MaksYermak

@yuseok89
Copy link
Contributor

@shahar1
While running the system tests, it looks like this should be using ADLS Gen2 rather than the older Gen1 path.
If I’m mistaken or missing context here, please correct me.

Also, it seems this change may already be reflected in main, so my guess is that resolving the conflict (merge main) and re-running the system tests could be the next step.

@Abhishekmishra2808
Copy link
Contributor Author

@shahar1 Your views on this ?

@shahar1
Copy link
Contributor

shahar1 commented Feb 26, 2026

@shahar1 Your views on this ?

First try to resolve the conflict and see what's going on. Then make sure that your changes are aligned with the appropriate version.

@Abhishekmishra2808
Copy link
Contributor Author

ignore the commits as of now, before merging i will change this to one commit only. but before that @shahar1 please review it

@shahar1
Copy link
Contributor

shahar1 commented Feb 26, 2026

ignore the commits as of now, before merging i will change this to one commit only. but before that @shahar1 please review it

Something doesn't look right - try to avoid reverting any changes introduced by
#61188
(e.g, materializing file_system_name into a property instead of getting it from kwargs), as it was intended for clearing up that gen 1 was retired.

@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from 037e823 to b251621 Compare February 26, 2026 11:43
@Abhishekmishra2808
Copy link
Contributor Author

Abhishekmishra2808 commented Feb 26, 2026

@shahar1 Could please provide more detailed review?
improved upon it by making file_system_name an explicit parameter instead of hiding it in kwargs

Copy link
Contributor

@yuseok89 yuseok89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shahar1
I ran the system test for ADLSToGCSOperator.
It seems to be working now.

Image

@Abhishekmishra2808
Copy link
Contributor Author

@yuseok89 Thanks for testing !

@shahar1
Copy link
Contributor

shahar1 commented Feb 27, 2026

@shahar1 Could please provide more detailed review?
improved upon it by making file_system_name an explicit parameter instead of hiding it in kwargs

Hiding it in the kwargs and raising the TypeError exception (which you deleted in this PR) were deliberate decisions to make transition to Gen 2 clearer and explicit to the end-user.
Please read comments on PR #61188 for more context, and revert changes that were introduced by that PR. To make this PR good to go, at least from my perspective, it should be focused only returning the destination uris.

@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from 6df52e2 to 22844cd Compare February 27, 2026 11:59
- Migrate from AzureDataLakeHook (Gen1) to AzureDataLakeStorageV2Hook (Gen2)
- Update execute() to return list[str] of destination GCS URIs (gs://bucket/object)
- Add file_system_name parameter validation via **kwargs with clear TypeError
- Update unit tests to verify Gen2 hooks and return values
- Add new test for file_system_name TypeError validation
- Remove ADLSToGCSOperator from MISSING_EXAMPLES_FOR_CLASSES

This change aligns with Azure's retirement of ADLS Gen1 and ensures
operators return destination URIs for downstream task dependencies.

The file_system_name parameter is required but passed via **kwargs
to maintain explicit Gen1-to-Gen2 migration messaging for users.
@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from 22844cd to a50b9f3 Compare February 27, 2026 12:02
Add destination URI tracking and return list of GCS URIs instead of file paths.

Changes:
- Update execute() to return list[str] of destination GCS URIs (gs://bucket/object)
- Add destination_uris list initialization
- Append GCS URI after each successful upload
- Update docstring to document return value
- Update unit tests to assert returned GCS URIs

This enables downstream tasks to reference the uploaded GCS objects directly.
@Abhishekmishra2808 Abhishekmishra2808 force-pushed the fix/adls-to-gcs-return-destination-uris branch from c795d27 to 8653cc1 Compare February 27, 2026 12:41
@Abhishekmishra2808
Copy link
Contributor Author

@shahar1 could you please review this now?

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, well done!

CC: @VladaZakharova @MaksYermak

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates ADLSToGCSOperator to return a list of destination GCS URIs (gs://bucket/object) instead of source ADLS paths, aligning with the standardization effort across all ToGCS operators (issue #11323). The return type remains list[str], but the values now represent where files were uploaded rather than where they came from.

Changes:

  • Updated execute() method to return destination GCS URIs in the format gs://bucket/object
  • Moved _parse_gcs_url() call outside the file upload loop for efficiency
  • Changed from os.path.join to posixpath.join for cross-platform compatibility
  • Updated tests to verify destination URIs instead of source paths
  • Added return value documentation to the class docstring

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
providers/google/src/airflow/providers/google/cloud/transfers/adls_to_gcs.py Modified execute() to build and return destination GCS URIs; optimized by moving _parse_gcs_url outside loop; switched to posixpath.join for GCS path construction
providers/google/tests/unit/google/cloud/transfers/test_adls_to_gcs.py Updated test assertions to verify destination GCS URIs match expected format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants