@luserg luserg commented Jan 30, 2026

Implement Ozone provider

This PR introduces a new Apache Ozone Provider for Apache Airflow, enabling seamless integration with Apache Ozone — a distributed, scalable object store for Apache Hadoop.

Features

  • Native CLI Operations: Manage volumes/buckets and perform filesystem operations on ofs:// and o3fs:// paths via the Ozone CLI
  • Data Transfers: Efficient data movement including HDFS → Ozone (DistCp), Ozone → S3 backups, and in-cluster moves/renames (metadata-only)
  • Hive Integration: Register Ozone paths as Hive table partitions for query access
  • Backup & DR: Create native Ozone bucket snapshots for disaster recovery
  • Monitoring: Sensors for Ozone FS paths/keys and S3 keys
  • SSL/TLS Support: Secure Native CLI and S3 Gateway connections with configurable certificate verification
  • Kerberos Authentication: Enterprise authentication for Native CLI with automatic kinit using keytab
  • Secrets Backend Support: Resolve sensitive values via secret:// references and mask secrets in logs
  • Reliability & Performance: Retries with exponential backoff, subprocess timeouts, detailed logging, and parallel transfers for bulk operations
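
For reviewers, the retry and timeout behaviour in the last bullet can be pictured with a minimal sketch. This is not the provider's actual code: the function name `run_with_retries` and its parameters are hypothetical, and it only assumes that CLI calls go through Python's standard `subprocess` module.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, retries=3, timeout=30.0, base_delay=1.0, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code or a timeout.

    Hypothetical helper for illustration only. The delay before retry n
    (counting from 0) is base_delay * 2 ** n, i.e. exponential backoff.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it runs longer than `timeout` seconds.
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            if result.returncode == 0:
                return result.stdout
            last_error = RuntimeError(
                f"exit code {result.returncode}: {result.stderr.strip()}"
            )
        except subprocess.TimeoutExpired as exc:
            last_error = exc
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"command failed after {retries + 1} attempts") from last_error


if __name__ == "__main__":
    # Stand-in command; a real caller would pass the CLI argument list instead.
    print(run_with_retries([sys.executable, "-c", "print('ok')"]))
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting; whether the provider does the same is not stated in this PR.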

@luserg luserg self-assigned this Jan 30, 2026
@luserg luserg force-pushed the feature/ozone-provider branch from 90aeb40 to edb8ab1 Compare February 8, 2026 22:18
------------

* ``apache-airflow`` >= 2.10.3
* ``apache-airflow-providers-amazon`` >= 8.27.0

@luserg luserg Feb 12, 2026

Will fix (this is a mistake in the docs).

"""
Build Kerberos environment overrides (delta) for a process.

This function is PURE относительно внешних эффектов:

Will fix (Russian text in the comments).

&& rm -rf /var/lib/apt/lists/*; \
fi
USER airflow

This is needed for the UI to work correctly (not part of the Ozone provider).

# Install Node.js/yarn for www asset build. Placed before COPY so this layer is cacheable when only sources change.
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
&& npm install -g yarn@1.22.19 \
&& rm -rf /var/lib/apt/lists/* /root/.npm

This is needed for the UI to work correctly (not part of the Ozone provider).

&& rm -rf airflow/www/node_modules \
&& apt-get update && apt-get purge -y nodejs npm && apt-get autoremove -y --purge \
&& rm -rf /var/lib/apt/lists/*

This is needed for the UI to work correctly: it precompiles the assets (not part of the Ozone provider).

@@ -0,0 +1,24 @@
.. Licensed to the Apache Software Foundation (ASF) under one

This needs to be moved to the docs/apache-airflow-providers-arenadata-ozone/ path.
