Contributor

@goncalo-m-c goncalo-m-c commented Sep 18, 2025

TL;DR: Add support in Helm chart for custom labels in redis, statsd and dagProcessor objects

Description

I would like to customize the labels defined for Airflow Kubernetes resources so that I can comply with my company's guidelines and track all objects consistently.

This PR adds the ability to specify custom labels for all Airflow objects and pods defined in the Helm chart. Labels can be set globally through .Values.labels and component-specifically through <component>.labels. These labels are merged, with component-specific labels taking precedence.
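
The precedence rule can be modelled outside Helm. The following is a minimal Python sketch of the intended semantics only, not the chart's template code; `merge_labels` is a hypothetical helper name introduced for illustration.

```python
def merge_labels(global_labels, component_labels):
    """Combine global and component labels; component values win on key conflicts.

    Illustrative model only: the chart performs the merge in its templates.
    """
    merged = dict(global_labels or {})      # start from a copy of the global labels
    merged.update(component_labels or {})   # component-specific keys override
    return merged


global_labels = {"environment": "production", "service": "airflow"}
component_labels = {"service": "airflow-dag-processor", "role": "dag-processor"}

merged = merge_labels(global_labels, component_labels)
print(merged)
```

Here `service` is defined at both levels, so the component value is the one that survives, while `environment` is inherited unchanged from the global labels.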

Changes

  • Added labels property to redis, statsd and dagProcessor components.
  • Updated respective templates to get custom labels and merge them with the global labels.
  • Updated values.schema.json to include schema definitions for label fields.
  • Added documentation in chart/docs/customizing-labels-for-pods.rst explaining the labeling system.

Example Usage

# Global labels for all objects and pods
labels:
  environment: production

# Component-specific labels
scheduler:
  labels:
    role: scheduler

workers:
  labels:
    role: worker

webserver:
  labels:
    role: ui

Testing

I tested my changes by building the helm chart locally and templating our current values.yaml on top of the local chart archive.

# Package chart in original repo
helm package chart/
cp airflow-1.19.0-dev.tgz ./path/to/my/repo/chart/charts/
# Render full configuration with our own instance of the chart
helm template ./chart --name-template airflow --namespace airflow --values ./path/to/values.yaml --debug > test-template.yaml

Test Case 1 - Using global labels and component labels without overlap

Example config in values.yaml:

# Global labels
labels:
  service: airflow

# Component specific labels
dagProcessor:
  enabled: true
  labels:
    miro_function: etl
    component_name: dag-processor

Generated Dag Processor objects:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-dag-processor
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
    service: airflow
spec:
# ....
  template:
    metadata:
      labels:
        tier: airflow
        component: dag-processor
        release: airflow
        component_name: dag-processor
        miro_function: etl
        service: airflow
# ...
---
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: true
metadata:
  name: "airflow-dag-processor"
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
    component_name: dag-processor
    miro_function: etl
    service: airflow
# ....

Test Case 2 - Override global labels with component labels

Example config in values.yaml:

# Global labels
labels:
  service: airflow

# Component specific labels
dagProcessor:
  enabled: true
  labels:
    service: airflow-dag-processor
    miro_function: etl
    component_name: dag-processor

Generated Dag Processor objects:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-dag-processor
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
    service: airflow
spec:
# ....
  template:
    metadata:
      labels:
        tier: airflow
        component: dag-processor
        release: airflow
        component_name: dag-processor
        miro_function: etl
        service: airflow-dag-processor
# ...
---
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: true
metadata:
  name: "airflow-dag-processor"
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
    component_name: dag-processor
    miro_function: etl
    service: airflow-dag-processor
# ....

Test Case 3 - Use only component labels

Example config in values.yaml:

# Component specific labels
dagProcessor:
  enabled: true
  labels:
    service: airflow-dag-processor
    miro_function: etl
    component_name: dag-processor

Generated Dag Processor objects:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-dag-processor
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
spec:
# ....
  template:
    metadata:
      labels:
        tier: airflow
        component: dag-processor
        release: airflow
        component_name: dag-processor
        miro_function: etl
        service: airflow-dag-processor
# ...
---
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: true
metadata:
  name: "airflow-dag-processor"
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.19.0-dev"
    heritage: Helm
    component_name: dag-processor
    miro_function: etl
    service: airflow-dag-processor
# ....

Checklist

  • Description above provides context of the change
  • Added schema definitions for new configuration options
  • Documented new values in docs/
  • Used mustMerge to properly handle label merging
  • No breaking changes introduced
  • Tested configuration changes

Additional Notes

Deployments are currently only labeled with global labels and I don't know if this is done for a specific reason. If possible, I would also like to implement custom labels for Deployments.
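
To make the current behaviour concrete, here is a small Python model of which custom labels each rendered object receives, based only on the outputs shown in the test cases above. It is a sketch, not the chart's logic; the function name and the `target` strings are hypothetical and exist only for this illustration.

```python
def custom_labels_for(target, global_labels, component_labels):
    """Return the custom labels a rendered object carries, per the outputs above.

    target: "deployment" (Deployment metadata), "pod" (pod template),
            or "serviceaccount". These names are illustrative only.
    """
    global_labels = global_labels or {}
    component_labels = component_labels or {}
    if target == "deployment":
        # Deployment metadata currently receives the global labels only.
        return dict(global_labels)
    if target in ("pod", "serviceaccount"):
        # Pod template and ServiceAccount receive the merged set,
        # with component labels taking precedence.
        return {**global_labels, **component_labels}
    raise ValueError(f"unknown target: {target}")


# Values from Test Case 2.
g = {"service": "airflow"}
c = {
    "service": "airflow-dag-processor",
    "miro_function": "etl",
    "component_name": "dag-processor",
}

print(custom_labels_for("deployment", g, c))
print(custom_labels_for("pod", g, c))
```

Extending custom labels to Deployments would amount to moving "deployment" into the second branch of this model.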



@boring-cyborg

boring-cyborg bot commented Sep 18, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch from 1e144fd to 8e6f559 Compare September 18, 2025 12:27
@eladkal eladkal requested a review from romsharon98 September 18, 2025 12:42
@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch 3 times, most recently from 3a5a297 to 3c2d421 Compare September 18, 2025 13:45
@EMNs

EMNs commented Sep 19, 2025

Looks good!

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch 4 times, most recently from a001ba7 to 31fc098 Compare October 15, 2025 08:42
Contributor

@Miretpl Miretpl left a comment

Nice change!

Could you add test cases that validate the changed behaviour of the labels?

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch from 8cfc187 to e58c261 Compare October 16, 2025 09:38
@goncalo-m-c
Contributor Author

> Nice change!
>
> Could you add test cases that validate the changed behaviour of the labels?

Thanks for reviewing! I've added 3 test cases in the PR description

@goncalo-m-c goncalo-m-c requested a review from Miretpl October 16, 2025 11:07
@goncalo-m-c
Contributor Author

@Miretpl btw, I still think that the whole label configuration is a bit inconsistent. For example, at the moment we can customize labels for Pods created under a Deployment (here), but we cannot customize the Deployment labels themselves (here).

I didn't add such changes in this PR to keep consistency with the other resources, but I will open a proposal for that separately.

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch from e58c261 to 78e97cb Compare October 16, 2025 11:11
Contributor

@Miretpl Miretpl left a comment

Hi @dstandish @jedcunningham @hussein-awala @romsharon98, could you take a look at it?

@Miretpl
Contributor

Miretpl commented Oct 16, 2025

@goncalo-m-c yeah, there are a lot of unfinished things and inconsistencies in the chart. I agree that the helm chart needs a little more love from the whole community.

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch from 78e97cb to f8a9462 Compare October 23, 2025 11:31
Contributor

@jscheffl jscheffl left a comment

Thanks for the PR as well as the documentation added.

Just a small nit - can you add a small test case as well, to ensure the feature does not regress with future changes?

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch 2 times, most recently from 1423795 to 314903c Compare October 29, 2025 10:38
@goncalo-m-c
Contributor Author

> Thanks for the PR as well as the documentation added.
>
> Just a small nit - can you add a small test case as well, to ensure the feature does not regress with future changes?

Thanks for reviewing @jscheffl, can you clarify your request? I've just added new unit tests to reflect the changes I am proposing.
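
For readers following along, a regression check for this behaviour could take roughly the following shape. This is a hedged sketch that works on an already-parsed manifest (a plain dict) and is not the set of tests added in this PR; `labels_at` and `assert_has_labels` are hypothetical helpers introduced for illustration.

```python
def labels_at(manifest, path):
    """Walk a parsed manifest (nested dicts) along `path` and return what is there."""
    node = manifest
    for key in path:
        node = node.get(key, {})
    return node


def assert_has_labels(manifest, path, expected):
    """Fail if any expected label is missing or has a different value."""
    actual = labels_at(manifest, path)
    for key, value in expected.items():
        assert actual.get(key) == value, (
            f"{key}: expected {value!r}, got {actual.get(key)!r}"
        )


# Parsed, abbreviated form of the Deployment from Test Case 2.
deployment = {
    "kind": "Deployment",
    "metadata": {"labels": {"tier": "airflow", "service": "airflow"}},
    "spec": {
        "template": {
            "metadata": {
                "labels": {
                    "tier": "airflow",
                    "component": "dag-processor",
                    "service": "airflow-dag-processor",
                }
            }
        }
    },
}

# The pod template carries the component override; the Deployment keeps the global value.
assert_has_labels(
    deployment,
    ["spec", "template", "metadata", "labels"],
    {"service": "airflow-dag-processor"},
)
assert_has_labels(deployment, ["metadata", "labels"], {"service": "airflow"})
```

Checking only for a subset of labels keeps such a test robust against unrelated labels the chart adds by default.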

@goncalo-m-c goncalo-m-c force-pushed the customize-statsd-dagprocessor-labels branch from 7c72b90 to ecb44e9 Compare October 30, 2025 13:50
@goncalo-m-c
Contributor Author

goncalo-m-c commented Oct 30, 2025

> CI fails now in some static tests. Can you run pre-commit checks which we implemented with prek? (There is also a development quick start guide) --> https://github.com/apache/airflow/blob/main/contributing-docs/03a_contributors_quick_start_beginners.rst That will fix the static checks or tell you what needs to be changed to make CI happy.

My bad, I did not enable those. However, I cannot get all the checks to work properly, either in my local environment or in a clean Codespace. What I was able to do was run only the hooks that were failing, which now pass, and then commit the changes.

# prek end-of-file-fixer ruff-format update-breeze-cmd-output --directory helm-tests
Make sure that there is an empty line at the end.........................Passed
Run 'ruff format'........................................................Passed
Update breeze docs...................................(no files to check)Skipped

Changes are in a separate commit

@goncalo-m-c goncalo-m-c requested a review from jscheffl October 31, 2025 08:47
@jscheffl
Contributor

Oh, one more minor fix is needed: the CLI output of the breeze tooling needs a small update. Can you run prek run -a update-breeze-cmd-output to get the fixes and commit the files to the PR branch?

@goncalo-m-c
Contributor Author

> Oh, one more minor fix is needed: the CLI output of the breeze tooling needs a small update. Can you run prek run -a update-breeze-cmd-output to get the fixes and commit the files to the PR branch?

Updated ✅

@eladkal eladkal added this to the Airflow Helm Chart 1.19.0 milestone Nov 3, 2025
Contributor

@jscheffl jscheffl left a comment

Cool, thanks!

@jscheffl jscheffl merged commit b397baa into apache:main Nov 3, 2025
118 checks passed
@boring-cyborg

boring-cyborg bot commented Nov 3, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@goncalo-m-c goncalo-m-c deleted the customize-statsd-dagprocessor-labels branch November 4, 2025 08:34
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
* feat(helm): Pass custom labels to redis, statsd and dagProcessor

Co-authored-by: Gonçalo <goncalo@miro.com>

* docs(helm): Write helm labels customization doc

Co-authored-by: Gonçalo <goncalo@miro.com>

* feat(helm): Clarify dagProcessor values.yaml comments

Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

* docs(helm): Clarify component-specific labels override behavior

* feat(helm): Remove parenthesis from label checks

* tests(helm): Add label tests for redis, statsd and dagprocessor

* tests(helm): Fix tests formatting

* fix(docs): Update breeze output docs

---------

Co-authored-by: Gonçalo Costa <goncalomc294@gmail.com>
Co-authored-by: Gonçalo <goncalo@miro.com>
Co-authored-by: Przemysław Mirowski <miretpl@gmail.com>

5 participants