@jscheffl
Contributor

@jscheffl jscheffl commented Feb 2, 2025

Main is failing, e.g. in https://github.com/apache/airflow/actions/runs/13101805105/job/36550974189, as runners are "out of disk space".
This seems to be caused by only 23GB being free on the root filesystem (/), where the main artifacts such as the Docker images are produced. When the runner starts, the following disk space is available:

Run df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   56G   23G  72% /
tmpfs           8.4G  177k  8.4G   1% /dev/shm
tmpfs           3.4G  1.2M  3.4G   1% /run
tmpfs           5.3M     0  5.3M   0% /run/lock
/dev/sda15      110M  6.4M  104M   6% /boot/efi
/dev/sdb1        79G  4.3G   71G   6% /mnt
tmpfs           1.7G   13k  1.7G   1% /run/user/1001

(A lot of build tools are already on the CI image - see the additional debug steps in https://github.com/apache/airflow/actions/runs/13102704519/job/36552932758 for more debug info on where the disk space on the image is allocated.)

...but this means that after building the image only ~5GB are free, and in some corner cases (e.g. main) the build fails. Disk space at the end of the build pipelines is:

Run df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   73G  5.2G  94% /
tmpfs           8.4G  177k  8.4G   1% /dev/shm
tmpfs           3.4G  1.2M  3.4G   1% /run
tmpfs           5.3M     0  5.3M   0% /run/lock
/dev/sdb15      110M  6.4M  104M   6% /boot/efi
/dev/sda1        79G  8.2G   67G  11% /mnt
tmpfs           1.7G   13k  1.7G   1% /run/user/1001
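
To compare the two drives without reading the whole table, the Avail column can be pulled out per mount point. This is only a minimal sketch, assuming `df -H`-style output on stdin; the helper name `avail_for_mount` is made up for illustration and is not part of this PR:

```shell
# Hypothetical helper (not from the PR): print the "Avail" column for one
# mount point, reading `df -H`-style output from stdin.
avail_for_mount() {
    # Skip the header line and match on the last column ("Mounted on").
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $4 }'
}

# Sample rows copied from the post-build output above.
df_sample='Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   73G  5.2G  94% /
/dev/sda1        79G  8.2G   67G  11% /mnt'

printf '%s\n' "${df_sample}" | avail_for_mount /      # prints 5.2G
printf '%s\n' "${df_sample}" | avail_for_mount /mnt   # prints 67G
```

On a live runner the same helper could be fed directly, e.g. `df -H | avail_for_mount /mnt`.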

...and as it seems a LOT of disk space is available on the second drive mounted at /mnt, and triaging showed that the largest chunk of data is the Docker workspace, this PR adds a bind mount so that /var/lib/docker is stored on the second drive. With this, the following free space is available at the end of the image build:

Run df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   54G   25G  69% /
tmpfs           8.4G  177k  8.4G   1% /dev/shm
tmpfs           3.4G  1.2M  3.4G   1% /run
tmpfs           5.3M     0  5.3M   0% /run/lock
/dev/sda15      110M  6.4M  104M   6% /boot/efi
/dev/sdb1        79G   28G   48G  37% /mnt
tmpfs           1.7G   13k  1.7G   1% /run/user/1001

This is added as a function in the Docker cleanup script and applied to all pipelines.
(Numbers were taken from Python 3.9)
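
For readers who want to picture the bind mount, here is a rough sketch of how such a function could look. It is not the code from this PR: the function name, the directory name under /mnt, and the `DRY_RUN` switch are assumptions for illustration, and removing the old Docker data is only reasonable on a throwaway CI runner.

```shell
#!/usr/bin/env bash
# Sketch only. SECOND_DRIVE_DIR, DRY_RUN and the function names are
# hypothetical and chosen for illustration; they are not taken from the PR.
DOCKER_DIR="/var/lib/docker"
SECOND_DRIVE_DIR="${SECOND_DRIVE_DIR:-/mnt/var-lib-docker}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_docker_storage_to_second_drive() {
    # Docker must not be running while its data directory is swapped.
    run sudo systemctl stop docker
    # Drop the existing data (acceptable on a fresh, disposable runner).
    run sudo rm -rf "${DOCKER_DIR}"
    run sudo mkdir -p "${SECOND_DRIVE_DIR}" "${DOCKER_DIR}"
    # Make /var/lib/docker point at the directory on the second drive.
    run sudo mount --bind "${SECOND_DRIVE_DIR}" "${DOCKER_DIR}"
    run sudo systemctl start docker
}

move_docker_storage_to_second_drive
```

With `DRY_RUN=1` (the default in this sketch) the function only prints the five commands, which makes it safe to try outside of CI; setting `DRY_RUN=0` would execute them and needs sudo rights.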

@jscheffl jscheffl added the "full tests needed" and "all versions" labels and removed the "full tests needed" label Feb 2, 2025
@jscheffl jscheffl changed the title from "Triage disk space issues, DO NOT MERGE" to "Fix disk space issues, DO NOT MERGE - still WIP" Feb 2, 2025
@jscheffl jscheffl changed the title from "Fix disk space issues, DO NOT MERGE - still WIP" to "Fix disk space issues on github runners" Feb 2, 2025
@jscheffl jscheffl marked this pull request as ready for review February 2, 2025 22:09
@jscheffl jscheffl requested review from ashb and potiuk as code owners February 2, 2025 22:09
@jscheffl
Contributor Author

jscheffl commented Feb 2, 2025

Mhm, not sure, but it smells like the image constraint generation failure is unrelated to this PR... at least it should be...

@jscheffl jscheffl force-pushed the bugfix/triage-disk-space-issues branch from cbcc89d to a85dcff February 3, 2025 05:53
Contributor

@amoghrajesh amoghrajesh left a comment

Thanks! LGTM +1

@jscheffl
Contributor Author

jscheffl commented Feb 3, 2025

Docs build problems seem to be unrelated, merging. Docs will need another fix.

@jscheffl jscheffl merged commit 8d8afdf into apache:main Feb 3, 2025
152 of 154 checks passed
dabla pushed a commit to dabla/airflow that referenced this pull request Feb 3, 2025
* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Move docker storage to second drive in general

* Cleanup debug and triaging stuff

* Exception of docker volume for constraints building
potiuk added a commit to potiuk/airflow that referenced this pull request Feb 4, 2025
The apache#46358 moved docker to another mounted directory - but this
directory and all files in it are owned by host user. The directory
and all files inside should be owned by root in order to properly
reflect permissions of the files when building docker images.

The change is now simplified. Rather than passing mount directory
by variable and passing it through GitHub Actions, we hard-code
the location of docker in cleanup_docker.sh script - we also
incorporate changing ownership and showing disk space in the same
cleanup_docker.sh script and make sure that script is only called
in the "real" (not composite) actions at the beginning - right
after the repository is checked out - previously that script
was also called in composite actions and changing the repo to be
writeable was done AFTER cleanup_docker.sh - which would not
work as we want the /mnt directory to be still owned by the
host user, but the docker storage should be still owned by root.
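
The ownership rule described in this commit message (Docker storage owned by root, /mnt itself left to the host user) could be sketched roughly as below. This is an illustration under assumptions, not the actual content of cleanup_docker.sh: the path, the function names, and the use of GNU `stat` (as found on Ubuntu runners) are all hypothetical choices.

```shell
# Hypothetical path; the commit message only says the location is hard-coded.
DOCKER_STORAGE_DIR="${DOCKER_STORAGE_DIR:-/mnt/var-lib-docker}"

# Print the numeric owner (uid) of a path. Assumes GNU stat.
owner_uid() {
    stat -c '%u' "$1"
}

# Hand the Docker storage directory to root while leaving /mnt untouched,
# then show the disk space, as the commit message describes.
fix_docker_storage_ownership() {
    if [ "$(owner_uid "${DOCKER_STORAGE_DIR}")" != "0" ]; then
        # Only the Docker directory is changed, never /mnt as a whole.
        sudo chown -R 0:0 "${DOCKER_STORAGE_DIR}"
    fi
    df -H
}
```

Following the commit message, `fix_docker_storage_ownership` would be called once, right after the repository checkout and before any step that makes the repository writeable for the host user.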
potiuk added a commit that referenced this pull request Feb 4, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 4, 2025
niklasr22 pushed a commit to niklasr22/airflow that referenced this pull request Feb 8, 2025
niklasr22 pushed a commit to niklasr22/airflow that referenced this pull request Feb 8, 2025
…e#46428)

ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
…e#46428)

jscheffl added a commit to jscheffl/airflow that referenced this pull request Mar 15, 2025
jscheffl pushed a commit to jscheffl/airflow that referenced this pull request Mar 15, 2025
…e#46428)

jscheffl added a commit that referenced this pull request Mar 15, 2025
* Fix disk space issues on github runners (#46358)

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Triage disk space issues, DO NOT MERGE

* Move docker storage to second drive in general

* Cleanup debug and triaging stuff

* Exception of docker volume for constraints building

* Fix ownership of files that docker uses in mounted directories (#46428)

The #46358 moved docker to another mounted directory - but this
directory and all files in it are owned by host user. The directory
and all files inside should be owned by root in order to properly
reflect permissions of the files when building docker images.

The change is now simplified. Rather than passing mount directory
by variable and passing it through GitHub Actions, we hard-code
the location of docker in cleanup_docker.sh script - we also
incorporate changing ownership and showing disk space in the same
cleanup_docker.sh script and make sure that script is only called
in the "real" (not composite) actions at the beginning - right
after the repository is checked out - previously that script
was also called in composite actions and changing the repo to be
writeable was done AFTER cleanup_docker.sh - which would not
work as we want the /mnt directory to be still owned by the
host user, but the docker storage should be still owned by root.

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@jscheffl jscheffl deleted the bugfix/triage-disk-space-issues branch October 5, 2025 07:39

Labels

all versions: If set, the CI build will be forced to use all versions of Python/K8S/DBs
area:dev-tools
full tests needed: We need to run full set of tests for this PR to merge

2 participants