
Update airflow.io.path.ObjectStoragePath for upcoming universal_pathlib release #37067

@ap--

Description


Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I'm opening this issue to track a needed refactor of airflow.io.path.ObjectStoragePath so that it supports the next release of universal_pathlib without requiring a separate implementation for Python 3.12.

What you think should happen instead?

Tagging @bolkedebruin, @uranusjr, @potiuk

It would be great to get some insight into the intended behaviour of the custom UPath subclass airflow.io.path.ObjectStoragePath.

Looking at the current implementation, ObjectStoragePath appears to change the default UPath behaviour in three ways:

  • (1) extracting a connection ID from the URI

    From airflow/airflow/io/path.py, lines 148 to 151 in 07fd364:

        userinfo, have_info, hostinfo = parsed_url.netloc.rpartition("@")
        if have_info:
            conn_id = conn_id or userinfo or None
            parsed_url = parsed_url._replace(netloc=hostinfo)
  • (2) delegating fsspec filesystem creation to airflow.io.store.ObjectStore:

        def __init__(
            self,
            parsed_url: SplitResult | None,
            conn_id: str | None = None,
            **kwargs: typing.Any,
        ) -> None:
            # warning: we are not calling super().__init__ here
            # as it will try to create a new fs from a different
            # set of registered filesystems
            if parsed_url and parsed_url.scheme:
                self._store = attach(parsed_url.scheme, conn_id)
            else:
                self._store = attach("file", conn_id)

        @property
        def _fs(self) -> AbstractFileSystem:
            return self._store.fs
  • (3) customization and extension of the UPath interface with several custom methods.
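To make the behaviour of (1) concrete, here is a minimal, self-contained sketch that wraps the quoted lines in a standalone function. The name `extract_conn_id` is hypothetical and not part of Airflow; the sketch only assumes the standard library's `urllib.parse.urlsplit`, which produces the `SplitResult` the excerpt operates on.

```python
from __future__ import annotations

from urllib.parse import SplitResult, urlsplit


def extract_conn_id(
    url: str, conn_id: str | None = None
) -> tuple[SplitResult, str | None]:
    """Hypothetical helper: split "conn_id@host" out of a URL's netloc.

    Follows the excerpt in (1): an explicitly passed conn_id wins, otherwise
    the userinfo part of the netloc is used, and an empty userinfo becomes None.
    """
    parsed_url = urlsplit(url)
    # rpartition splits on the LAST "@", so the remaining host part has no "@"
    userinfo, have_info, hostinfo = parsed_url.netloc.rpartition("@")
    if have_info:
        conn_id = conn_id or userinfo or None
        # drop the userinfo so only the bucket/host stays in the URL
        parsed_url = parsed_url._replace(netloc=hostinfo)
    return parsed_url, conn_id


if __name__ == "__main__":
    parsed, conn = extract_conn_id("s3://aws_default@my-bucket/data/file.csv")
    print(conn, parsed.geturl())  # prints: aws_default s3://my-bucket/data/file.csv
```

Note that the userinfo is stripped from the URL even when an explicit `conn_id` overrides it, so the filesystem never sees the connection ID as part of the host.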

I am working on supporting (1) and (2) via custom methods that can be overridden in UPath subclasses, which would allow a single implementation for Python versions prior to 3.12 as well as 3.12+; see fsspec/universal_pathlib#172.

Once that is integrated into UPath, it should simplify the ObjectStoragePath implementation considerably: the custom accessor will be replaced by a factory method for creating fsspec filesystems, and the custom __new__ implementation by a method for parsing additional storage_options.
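A rough sketch of how such hooks could fit together, assuming a UPath-like base class that calls an overridable storage-options parser and an overridable filesystem factory during construction. Every name here (`BasePath`, `parse_storage_options`, `fs_factory`, `ObjectStoragePathSketch`, `FakeStore`, and the stand-in `attach`) is hypothetical and is not the actual universal_pathlib or Airflow API; the "filesystem" is a placeholder string rather than a real fsspec `AbstractFileSystem`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any
from urllib.parse import urlsplit


@dataclass
class FakeStore:
    """Hypothetical stand-in for airflow.io.store.ObjectStore."""

    protocol: str
    conn_id: str | None

    @property
    def fs(self) -> str:
        # placeholder string instead of a real fsspec AbstractFileSystem
        return f"<fs protocol={self.protocol} conn_id={self.conn_id}>"


def attach(protocol: str, conn_id: str | None) -> FakeStore:
    """Hypothetical stand-in for airflow.io.store.attach(scheme, conn_id)."""
    return FakeStore(protocol, conn_id)


class BasePath:
    """Hypothetical UPath-like base class exposing two overridable hooks."""

    def __init__(self, url: str, **storage_options: Any) -> None:
        protocol = urlsplit(url).scheme or "file"
        # hook 1: subclasses may rewrite the URL and add storage options
        self.url, self.storage_options = self.parse_storage_options(
            url, protocol, dict(storage_options)
        )
        # hook 2: subclasses decide how the filesystem is created
        self.fs = self.fs_factory(self.url, protocol, self.storage_options)

    @classmethod
    def parse_storage_options(
        cls, url: str, protocol: str, storage_options: dict[str, Any]
    ) -> tuple[str, dict[str, Any]]:
        # default: pass everything through unchanged
        return url, storage_options

    @classmethod
    def fs_factory(
        cls, url: str, protocol: str, storage_options: dict[str, Any]
    ) -> str:
        # default: placeholder for a generic fsspec lookup by protocol
        return f"<default fs for {protocol}>"


class ObjectStoragePathSketch(BasePath):
    """Hypothetical subclass expressing (1) and (2) from the issue as hooks."""

    @classmethod
    def parse_storage_options(cls, url, protocol, storage_options):
        # (1) move "conn_id@" from the netloc into a conn_id storage option;
        # an explicitly passed conn_id takes precedence, as in the excerpt
        parsed = urlsplit(url)
        userinfo, have_info, hostinfo = parsed.netloc.rpartition("@")
        if have_info:
            storage_options["conn_id"] = (
                storage_options.get("conn_id") or userinfo or None
            )
            url = parsed._replace(netloc=hostinfo).geturl()
        return url, storage_options

    @classmethod
    def fs_factory(cls, url, protocol, storage_options):
        # (2) delegate filesystem creation to the (stand-in) ObjectStore
        return attach(protocol, storage_options.get("conn_id")).fs
```

For example, `ObjectStoragePathSketch("s3://aws_default@my-bucket/key")` ends up with `url == "s3://my-bucket/key"` and a filesystem obtained from `attach("s3", "aws_default")`. The subclass never overrides `__new__` or an accessor; it only overrides two classmethods, which is what would let a single implementation serve both pre-3.12 and 3.12+ Python.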

Let me know if that sounds good and what the best way is to collaborate on this issue and move forward.

Cheers,
Andreas 😃

How to reproduce

N/A

Operating System

N/A

Versions of Apache Airflow Providers


Deployment

Other

Deployment details

N/A

Anything else?

N/A

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata


Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
