Description
Apache Airflow version
main (development)
If "Other Airflow 2 version" selected, which one?
No response
What happened?
I'm opening this issue to track a needed refactor of airflow.io.path.ObjectStoragePath so that it supports the next release of universal_pathlib without needing a special implementation for Python 3.12.
What you think should happen instead?
Tagging @bolkedebruin, @uranusjr, @potiuk
It would be great if I could have some insight into the intended behaviour for the custom UPath subclass airflow.io.path.ObjectStoragePath
Looking at the current implementation it seems that these are the intended changes:
- (1) extracting a connection ID from the URI
  Lines 148 to 151 in 07fd364

      userinfo, have_info, hostinfo = parsed_url.netloc.rpartition("@")
      if have_info:
          conn_id = conn_id or userinfo or None
          parsed_url = parsed_url._replace(netloc=hostinfo)

- (2) delegating fsspec filesystem creation to airflow.io.store.ObjectStore
  Lines 49 to 65 in 07fd364

      def __init__(
          self,
          parsed_url: SplitResult | None,
          conn_id: str | None = None,
          **kwargs: typing.Any,
      ) -> None:
          # warning: we are not calling super().__init__ here
          # as it will try to create a new fs from a different
          # set if registered filesystems
          if parsed_url and parsed_url.scheme:
              self._store = attach(parsed_url.scheme, conn_id)
          else:
              self._store = attach("file", conn_id)

      @property
      def _fs(self) -> AbstractFileSystem:
          return self._store.fs

- (3) customization and extension of the UPath interface with several custom methods.
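To make point (1) concrete, here is a minimal standalone sketch of the connection-ID extraction, using only the standard library. The helper name `split_conn_id` and the example URI are my own illustrations, not part of Airflow; the body mirrors the snippet quoted above.

```python
from __future__ import annotations

from urllib.parse import SplitResult, urlsplit


def split_conn_id(uri: str, conn_id: str | None = None) -> tuple[SplitResult, str | None]:
    """Hypothetical helper: strip a `conn_id@` prefix from the netloc of a URI.

    An explicitly passed conn_id takes precedence over the one in the URI.
    """
    parsed_url = urlsplit(uri)
    # rpartition returns ("", "", netloc) when there is no "@" in the netloc
    userinfo, have_info, hostinfo = parsed_url.netloc.rpartition("@")
    if have_info:
        conn_id = conn_id or userinfo or None
        parsed_url = parsed_url._replace(netloc=hostinfo)
    return parsed_url, conn_id


# Example URI is made up for illustration
parsed, conn = split_conn_id("s3://aws_default@my-bucket/data/file.csv")
print(conn)             # aws_default
print(parsed.geturl())  # s3://my-bucket/data/file.csv
```

If I read the current behaviour correctly, a URI without an `@` in the netloc is left untouched and the connection ID stays whatever was passed in.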
I am working on supporting (1) and (2) via custom methods that can be overridden in UPath subclasses, which would allow a single implementation for Python versions before 3.12 and for 3.12+; see fsspec/universal_pathlib#172
Once that is integrated in UPath it should simplify the ObjectStoragePath implementation a lot. The custom accessor will be replaced by a factory method for creating fsspec filesystems, and the custom __new__ implementation by a method for parsing additional storage_options.
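As a rough illustration of the shape this could take, here is a dependency-free sketch. Everything in it is an assumption for discussion: `BasePath` is a stand-in for UPath, `FakeFileSystem` is a stand-in for an fsspec filesystem, and the hook names `parse_storage_options` and `fs_factory` are placeholders rather than the final universal_pathlib API.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlsplit


class FakeFileSystem:
    """Stand-in for an fsspec filesystem; it only records how it was built."""

    def __init__(self, protocol: str, **storage_options: Any) -> None:
        self.protocol = protocol
        self.storage_options = storage_options
        self.conn_id: str | None = None


class BasePath:
    """Hypothetical stand-in for UPath exposing two overridable hooks."""

    def __init__(self, urlpath: str, **storage_options: Any) -> None:
        self.protocol = urlsplit(urlpath).scheme or "file"
        self.storage_options = self.parse_storage_options(
            urlpath, self.protocol, storage_options
        )
        self.fs = self.fs_factory(urlpath, self.protocol, self.storage_options)

    @classmethod
    def parse_storage_options(
        cls, urlpath: str, protocol: str, storage_options: dict[str, Any]
    ) -> dict[str, Any]:
        # Default hook: pass the options through unchanged
        return dict(storage_options)

    @classmethod
    def fs_factory(
        cls, urlpath: str, protocol: str, storage_options: dict[str, Any]
    ) -> FakeFileSystem:
        # Default hook: build the filesystem directly from the options
        return FakeFileSystem(protocol, **storage_options)


class ConnAwarePath(BasePath):
    """Sketch of an ObjectStoragePath-like subclass using only the hooks."""

    @classmethod
    def parse_storage_options(cls, urlpath, protocol, storage_options):
        # Covers (1): pull the connection ID out of the URI netloc
        options = dict(storage_options)
        userinfo, have_info, _ = urlsplit(urlpath).netloc.rpartition("@")
        if have_info:
            options["conn_id"] = options.get("conn_id") or userinfo or None
        return options

    @classmethod
    def fs_factory(cls, urlpath, protocol, storage_options):
        # Covers (2): in Airflow, a call such as attach(protocol, conn_id)
        # would presumably go here instead of FakeFileSystem
        options = dict(storage_options)
        conn_id = options.pop("conn_id", None)
        fs = FakeFileSystem(protocol, **options)
        fs.conn_id = conn_id
        return fs


path = ConnAwarePath("s3://aws_default@my-bucket/key")
print(path.fs.protocol, path.fs.conn_id)
```

The point of the sketch is that the subclass needs neither a custom `__new__` nor a custom accessor: both customizations live in two small class methods that behave the same on every Python version.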
Let me know if that sounds good, and what's the best way to collaborate on this issue and move forward.
Cheers,
Andreas 😃
How to reproduce
N/A
Operating System
N/A
Versions of Apache Airflow Providers
Deployment
Other
Deployment details
N/A
Anything else?
N/A
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct