@NewSmoke38

Description

Added:

  • verify_notice_files.py: Script to validate NOTICE files in both source tree and distribution packages (.tar.gz, .whl)

Verification checks:

  • Ensures required "Apache Software Foundation" text is present
  • Validates that the copyright year range ends with the current year (2026)
  • Handles special cases (e.g., FAB provider with vendored Flask App Builder)
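As a rough illustration of the checks listed above, here is a minimal sketch. It is not the code from this PR: the function name `check_notice_text`, the regex, and the assumed line shape `Copyright YYYY-YYYY ...` are hypothetical. To tolerate NOTICE files that carry more than one copyright line (as a vendored component might), it only requires that at least one range ends with the given year.

```python
from __future__ import annotations

import re

REQUIRED_TEXT = "Apache Software Foundation"
# Assumed NOTICE line shape: "Copyright 2016-2026 The Apache Software Foundation"
YEAR_RANGE = re.compile(r"Copyright\s+(\d{4})\s*-\s*(\d{4})")


def check_notice_text(text: str, current_year: int) -> list[str]:
    """Return the problems found in one NOTICE file's text (empty list means OK)."""
    problems = []
    if REQUIRED_TEXT not in text:
        problems.append(f"missing required text: {REQUIRED_TEXT!r}")
    ranges = YEAR_RANGE.findall(text)
    if not ranges:
        problems.append("no 'Copyright YYYY-YYYY' year range found")
    elif not any(int(end) == current_year for _start, end in ranges):
        # At least one range must end with the expected year.
        problems.append(f"copyright year range does not end with {current_year}")
    return problems
```

Returning a list of messages instead of raising keeps it easy to collect every problem across many files before deciding on the exit code.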

Integration: Added verification steps to all 5 README_RELEASE_*.md files

Flags: Supports --sources-only and --dist-only for flexible verification during development and release
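For illustration, the two flags could be wired up roughly as follows. This is a sketch, not the script from this PR: making the flags mutually exclusive and checking both scopes when neither is given are assumptions, and `build_parser` and `selected_scopes` are hypothetical names.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Verify NOTICE files (sketch).")
    # Assumption: the two scopes exclude each other; with neither flag, both are checked.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument(
        "--sources-only", action="store_true", help="check only NOTICE files in the source tree"
    )
    scope.add_argument(
        "--dist-only", action="store_true", help="check only NOTICE files inside built packages"
    )
    return parser


def selected_scopes(args: argparse.Namespace) -> tuple[bool, bool]:
    """Return (check_sources, check_dist) for the parsed flags."""
    return (not args.dist_only, not args.sources_only)
```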

Testing

  • Verified all 104 source NOTICE files successfully
  • Tested extraction and validation from real PyPI .whl packages
  • Validated error detection with incorrect copyright years
  • Confirmed proper exit codes (0 for success, 1 for failure)

Future Work

As mentioned in #60540:

  • Exploring symbolic links to reduce duplication of NOTICE files across 200+ locations
  • Implementing pre-commit hooks to keep NOTICE files in sync automatically
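To get a feel for how much duplication there is before choosing between symlinks and a sync hook, one could group the NOTICE files by content. The sketch below is only an exploration aid under that assumption; `group_identical_notices` is a hypothetical helper and not part of this PR.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_notices(root: Path) -> dict[str, list[Path]]:
    """Group NOTICE files under root by the SHA-256 digest of their bytes."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("NOTICE")):
        if path.is_file():
            groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return dict(groups)
```

Any group with more than one path is a candidate for deduplication; groups of size one are files that genuinely differ and would need their own content.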

If this approach is approved, I'd be happy to work on these enhancements in a follow-up PR.


Closes #60540



For distribution packages, after building:

```shell script
python3 dev/verify_notice_files.py --dist-only --year 2026 --verbose
```

Would the year not be auto-discoverable by the script? An optional override might be useful if a release is cut on Dec 31st and somebody validates it on Jan 1st, but in all other cases we are in the same year.

If verbose is on by default, is logging then set up correctly? Or does the verbose output need to be inspected for each release check? I'd favor less verbosity by default and a clear PASS/ERROR result.
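A minimal sketch of what this comment asks for, assuming hypothetical helpers `resolve_year` and `summarize` (neither is from the PR): the year defaults to the current calendar year with an optional override, and the report stays quiet unless something fails or verbose output is requested.

```python
from __future__ import annotations

import datetime


def resolve_year(override: int | None = None, today: datetime.date | None = None) -> int:
    """Use the explicit override when given, otherwise the current calendar year."""
    if override is not None:
        return override
    return (today or datetime.date.today()).year


def summarize(problems: dict[str, list[str]], verbose: bool = False) -> tuple[str, int]:
    """Build a short report and an exit code from {path: [problem, ...]}."""
    failed = {path: msgs for path, msgs in problems.items() if msgs}
    lines = []
    for path, msgs in sorted(problems.items()):
        if msgs:
            lines.extend(f"ERROR {path}: {msg}" for msg in msgs)
        elif verbose:
            # Per-file success lines are shown only on request.
            lines.append(f"OK    {path}")
    # The last line is always the overall verdict.
    lines.append("ERROR" if failed else "PASS")
    return "\n".join(lines), (1 if failed else 0)
```

The `today` parameter exists only so the Dec 31st / Jan 1st case can be exercised without touching the system clock.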

@potiuk
Member

potiuk commented Jan 17, 2026

It feels like there is a lot of AI-generated content here that could be simplified considerably if you assume that we have a prek hook that checks individual files (which we will have to do anyway).

I just generated (also using Claude Sonnet) a minimal prek hook doing such verification in #60699, and it's **really simple**. Take a look at it as an example.

Again, I asked Claude to generate a minimal verification script without any configuration parameters, because we generally do not need them.

And it proposed this:

bash:

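```bash
set -euo pipefail

tmpdir=$(mktemp -d)
# Single quotes defer expansion until the trap runs; the inner quotes guard against spaces.
trap 'rm -rf "$tmpdir"' EXIT

for file in dist/*.{whl,tar.gz}; do
    [[ -f "$file" ]] || continue
    # One subdirectory per package, so extracted NOTICE files do not overwrite each other.
    out="$tmpdir/$(basename "$file")"
    mkdir -p "$out"
    # Wheels are zip archives; anything unzip cannot read is tried as a gzipped tarball.
    unzip -q -o -j "$file" "*/NOTICE" -d "$out" 2>/dev/null || \
    tar -xzf "$file" --wildcards "*/NOTICE" -C "$out" --strip-components=1 2>/dev/null || true
done

notices=("$tmpdir"/*/NOTICE)
[[ -f "${notices[0]}" ]] && ./scripts/ci/prek/check_notice_files.py "${notices[@]}"
```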

Python:

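```python
from __future__ import annotations

import subprocess
import sys
import tarfile
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    # Python's glob has no brace expansion, so each extension is matched separately.
    for f in [*Path("dist").glob("*.whl"), *Path("dist").glob("*.tar.gz")]:
        if zipfile.is_zipfile(f):
            with zipfile.ZipFile(f) as zf:
                # Extract the first NOTICE entry, keeping its path inside the archive.
                if name := next((n for n in zf.namelist() if n.endswith("NOTICE")), None):
                    zf.extract(name, d)
        elif tarfile.is_tarfile(f):
            with tarfile.open(f) as tf:
                if name := next((n for n in tf.getnames() if n.endswith("NOTICE")), None):
                    tf.extract(name, d)
    if n := list(Path(d).rglob("NOTICE")):
        # Propagate the hook's exit code so a failed check fails this script too.
        sys.exit(subprocess.run(["./scripts/ci/prek/check_notice_files.py", *map(str, n)]).returncode)
```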

It probably needs a bit of polish, but it's generally way, way smaller and does the same job IMHO.

@NewSmoke38
Author

Thanks for the feedback! I've simplified the script:

  • Now calls the merged pre-commit hook (scripts/ci/prek/check_notice_files.py)
  • Year is auto-discovered, no flags needed

The script finds all source NOTICE files and extracts the NOTICE files from the dist packages.

Labels

area:dev-tools, backport-to-v3-1-test

Development

Successfully merging this pull request may close these issues.

Add verification steps for NOTICE files to our release processes
