1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/airflow_providers_bug_report.yml
@@ -43,6 +43,7 @@ body:
- apache-spark
- apprise
- arangodb
- arenadata-ozone
- asana
- atlassian-jira
- celery
10 changes: 10 additions & 0 deletions Dockerfile
@@ -1544,6 +1544,16 @@ RUN bash /scripts/docker/install_packaging_tools.sh; \
bash /scripts/docker/install_airflow_dependencies_from_branch_tip.sh; \
fi

# When installing from context (sources), hatch build runs compile_www_assets and needs nodejs/yarn.
# Placed before COPY sources so this layer is cacheable when only sources change.
USER root
RUN if [ "${INSTALL_PACKAGES_FROM_CONTEXT}" = "true" ]; then \
apt-get update && apt-get install -y --no-install-recommends nodejs npm \
&& npm install -g yarn@1.22.19 \
&& rm -rf /var/lib/apt/lists/*; \
fi
USER airflow

Author comment: This change is needed for the UI to work correctly; it is not part of the Ozone provider itself.

COPY --chown=airflow:0 ${AIRFLOW_SOURCES_FROM} ${AIRFLOW_SOURCES_TO}

# Add extra python dependencies
11 changes: 11 additions & 0 deletions Dockerfile.ci
@@ -1435,13 +1435,24 @@ RUN if command -v airflow; then \
# Install autocomplete for Kubectl
RUN echo "source /etc/bash_completion" >> ~/.bashrc

# Install Node.js/yarn for www asset build. Placed before COPY so this layer is cacheable when only sources change.
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
&& npm install -g yarn@1.22.19 \
&& rm -rf /var/lib/apt/lists/* /root/.npm
Author comment: This change is needed for the UI to work correctly; it is not part of the Ozone provider itself.


# We can copy everything here. The Context is filtered by dockerignore. This makes sure we are not
# copying over stuff that is accidentally generated or that we do not need (such as egg-info)
# if you want to add something that is missing and you expect to see it in the image you can
# add it with ! in .dockerignore next to the airflow, test etc. directories there
COPY . ${AIRFLOW_SOURCES}/

# Build frontend assets so the UI (Graph/DAG pages) works. Node/yarn removed after build to keep image smaller.
WORKDIR ${AIRFLOW_SOURCES}
RUN python3 scripts/ci/pre_commit/compile_www_assets.py \
&& rm -rf airflow/www/node_modules \
&& apt-get update && apt-get purge -y nodejs npm && apt-get autoremove -y --purge \
&& rm -rf /var/lib/apt/lists/*

Author comment: This pre-compiles the www assets so the UI works correctly; it is not part of the Ozone provider itself.

WORKDIR ${AIRFLOW_SOURCES}

ARG BUILD_ID
16 changes: 16 additions & 0 deletions airflow/providers/arenadata/__init__.py
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
33 changes: 33 additions & 0 deletions airflow/providers/arenadata/ozone/CHANGELOG.rst
@@ -0,0 +1,33 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.


.. NOTE TO CONTRIBUTORS:
Please, only add notes to the Changelog just below the "Changelog" header when there are some breaking changes
and you want to add an explanation to the users on how they are supposed to deal with them.
The changelog is updated and maintained semi-automatically by release manager.

``apache-airflow-providers-ozone``


Changelog
---------

1.0.0
.....

Initial release of the Apache Ozone provider.
39 changes: 39 additions & 0 deletions airflow/providers/arenadata/ozone/__init__.py
@@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES.
#
# IF YOU WANT TO MODIFY THIS FILE, YOU SHOULD MODIFY THE TEMPLATE
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/breeze/src/airflow_breeze/templates` DIRECTORY
#
from __future__ import annotations

import packaging.version

from airflow import __version__ as airflow_version

__all__ = ["__version__"]

__version__ = "1.0.0"

if packaging.version.parse(packaging.version.parse(airflow_version).base_version) < packaging.version.parse(
"2.10.3"
):
raise RuntimeError(
f"The package `apache-airflow-providers-ozone:{__version__}` needs Apache Airflow 2.10.3+"
)
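Note on the nested `parse` call above: comparing `base_version` strips pre-release, dev, and local suffixes before the comparison, so a build such as `2.10.3.dev0` (which is *less than* `2.10.3` under PEP 440) still satisfies the minimum-version check. A minimal illustration of that behavior:

```python
from packaging.version import parse


def meets_minimum(installed: str, minimum: str = "2.10.3") -> bool:
    # parse(...).base_version drops pre-release/dev/local suffixes,
    # so "2.10.3.dev0" is compared as "2.10.3".
    return parse(parse(installed).base_version) >= parse(minimum)


print(meets_minimum("2.10.2"))       # older release: rejected
print(meets_minimum("2.10.3.dev0"))  # dev build of the minimum: accepted
print(meets_minimum("2.11.0"))       # newer release: accepted
```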
17 changes: 17 additions & 0 deletions airflow/providers/arenadata/ozone/example_dags/__init__.py
@@ -0,0 +1,17 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
@@ -0,0 +1,70 @@
#!/usr/bin/env python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Cross-Region Replication Example DAG

This DAG demonstrates replicating data from a primary Ozone cluster to a DR (Disaster Recovery) cluster:
- Uses HdfsToOzoneOperator with distcp for efficient cross-cluster data replication
- Scheduled to run daily (can be adjusted)
- Shows how to replicate critical data for disaster recovery purposes

This example requires:
- Two Ozone clusters (primary and DR) with network connectivity
- Proper Hadoop configuration for cross-cluster communication
- Separate Airflow connections for each cluster (if needed)

Note: This DAG uses distcp, which is the standard tool for cross-cluster replication in Hadoop ecosystems.
"""

from __future__ import annotations

from datetime import timedelta

import pendulum

from airflow.models.dag import DAG
from airflow.providers.arenadata.ozone.transfers.hdfs_to_ozone import HdfsToOzoneOperator

with DAG(
dag_id="example_ozone_cross_region_replication",
start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
catchup=False,
schedule="0 1 * * *", # Daily at 1 AM
tags=["ozone", "example"],
doc_md="""
### Cross-Region Replication Example

This DAG demonstrates replicating data from a primary Ozone cluster to a DR (Disaster Recovery) cluster.
It uses `HdfsToOzoneOperator`, which leverages `distcp`, the standard tool for this task.

**Prerequisites:**
1. Two Airflow connections: `ozone_primary_cluster` and `ozone_dr_cluster`.
2. Network connectivity between the Airflow worker and both clusters.
3. The `distcp` command must be configured to handle cross-cluster communication (e.g., via `core-site.xml`).
""",
) as dag:
replicate_critical_data = HdfsToOzoneOperator(
task_id="replicate_critical_data_to_dr",
# Note: We provide the full HDFS path including the cluster name (nameservice)
source_path="ofs://primary-cluster/critical_data/{{ ds }}/",
dest_path="ofs://dr-cluster/replicated_data/critical_data/{{ ds }}/",
# In a real setup, you might have separate hooks/connections for each
# but distcp handles this at the Hadoop config level.
execution_timeout=timedelta(minutes=5),
)
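The operator's implementation is not part of this diff; assuming it shells out to `hadoop distcp` (as the DAG's comments suggest), the command built for the task above would look roughly like the sketch below. The `build_distcp_command` helper and its `overwrite` flag are illustrative, not the provider's actual API:

```python
from __future__ import annotations


def build_distcp_command(source_path: str, dest_path: str, overwrite: bool = True) -> list[str]:
    # Hypothetical sketch: distcp copies between any Hadoop-compatible
    # filesystems, so ofs:// (Ozone) paths can appear on both sides.
    cmd = ["hadoop", "distcp"]
    if overwrite:
        cmd.append("-overwrite")
    cmd += [source_path, dest_path]
    return cmd


# With the templated {{ ds }} rendered for 2025-01-01:
print(build_distcp_command(
    "ofs://primary-cluster/critical_data/2025-01-01/",
    "ofs://dr-cluster/replicated_data/critical_data/2025-01-01/",
))
```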