-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Add venv pycache clean up for the PythonVirtualenvOperator #53390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add venv pycache clean up for the PythonVirtualenvOperator #53390
Conversation
|
Hi @potiuk ! |
3ca211d to
f9bbb4e
Compare
providers/standard/src/airflow/providers/standard/operators/python.py
Outdated
Show resolved
Hide resolved
providers/standard/src/airflow/providers/standard/operators/python.py
Outdated
Show resolved
Hide resolved
providers/standard/src/airflow/providers/standard/operators/python.py
Outdated
Show resolved
Hide resolved
providers/standard/src/airflow/providers/standard/operators/python.py
Outdated
Show resolved
Hide resolved
|
There is a potential problem I see here. On Celery worker when you have multiple tasks in parallel, potentially the same Also I am not sure what problem it solves to be honest - the But there, I'd say a better solution might be to set PYTHONDONOTWRITEBYTECODE before launching interpreter. Yes it might be slower a bit because parsing and bytecode generation happens every time, but in vast majority of cases when tasks runs for seconds, this will be negligible overhead - unless you have vast amount of code to parse. Could you please elaborate a bit more what exactly problem we are trying to solve here? |
|
@potiuk Thank you for the attention and detailed review!
The exact problem - there is as a side effect of running the
With behavior mentioned in the STR above, I see an issue with many operator runs, it may eventually pollute file system, exhaust resources on cloud providers environments. As for
I couldn't agree more! There was also my first intention.
So you mean |
bd62ed8 to
f7e1360
Compare
f7e1360 to
946456b
Compare
|
hi there! |
e44956d to
124960b
Compare
Extend cleanup logic for the PythonVirtualenvOperator, as previous implementation cleans up the venv directory itself, but kept the pycache trace.
124960b to
ce5d005
Compare
Yeah. with temp dir there is no risk. |
|
|
||
| with TemporaryDirectory(prefix="venv") as tmp_dir: | ||
| tmp_path = Path(tmp_dir) | ||
| tmp_dir, temp_venv_dir = tmp_path.relative_to(tmp_path.anchor).parts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added fix here #54214
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, can you please share a way to reproduce it, as for with TemporaryDirectory(prefix="venv") as tmp_dir: I was expecting a path like tmp_dir_path/venv<random_str>, as both parts are 'static'. What is a way to add additional nested dirs there? @gopidesupavan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you can do it simply in local. your assumption will work inside container. but it may not work if it installed standalone in local.
you can follow these steps:
uv venv --python 3.10
source .venv/bin/activate
mkdir airflow_home
export AIRFLOW_HOME=<pwd>/airflow_home
uv pip install -U apache-airflow==3.0.4rc2 --pre --constraints https://github.com/apache/airflow/blob/constraints-3.0.4rc2/constraints-3.10.txt
airflow standalone
pwd -> repalce full path
And you can run an example with PythonVirtualenvOperator


Extend cleanup logic for the PythonVirtualenvOperator, as previous implementation cleans up the venv directory itself, but kept the pycache trace.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.