This is a great investigation @wjddn279! It's great to see my "hypothesis" confirmed as the cause of the unexpected behaviour. @ashb @kaxil @amoghrajesh are likely those who should be looking at this, and we should agree on an approach. I also love the idea of actually documenting memory behaviour somewhere. My thoughts on your discussion points:
I think we should be able to use it. The gc.freeze() documentation recommends this pattern for forking, and gc.freeze() has been available since Python 3.7, so I don't see why we could not use it. We could also check other usages of fork() in our code and see whether it might help with memory usage there as well. DagFileProcessor might benefit a bit from it, though disabling and freezing gc continuously there might have more side effects. I am quite sure, however, that the gc.freeze approach for the Task SDK and task execution could be very helpful in managing memory duplication for the forked process that actually executes the task. Probably worth testing the effect. https://docs.python.org/3/library/gc.html#gc.freeze
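Something like this minimal sketch of the documented pattern (run_task() is a hypothetical stand-in for whatever the forked child executes):

```python
import gc
import os

# Pattern recommended by the gc docs for fork-based workers: disable
# collection early in the parent, freeze right before fork(), and
# re-enable collection early in the child.
gc.disable()   # avoid punching freed "holes" into parent memory pages
# ... heavy parent initialisation (imports, config, DAG parsing, ...) ...
gc.freeze()    # move every tracked object into the permanent generation

pid = os.fork()
if pid == 0:
    # Child: frozen objects are excluded from collection, so the GC never
    # writes to their headers and the shared pages stay copy-on-write clean.
    gc.enable()
    run_task()     # hypothetical task-execution entry point
    os._exit(0)
else:
    # Parent: release objects from the permanent generation and resume GC.
    gc.unfreeze()
    gc.enable()
    os.waitpid(pid, 0)
```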
I agree that if we manage memory in forks more efficiently with gc.freeze() than with the current lazy loading, that might indeed be a better approach.
Yeah. I remember refactoring it in Airflow 2 and trying to get rid of the Executor class from it, because essentially Job is a database model and keeping the executor as a field there never felt right. Back then (1.5 years ago or so) I succeeded only partially, but a lot of the code in this area has changed since for Airflow 3, so refactoring it out seems like a good idea.
With the above work, the COW issue has been resolved, and we no longer observe cases of sudden large memory increases. However, I did notice a gradual increase in heap memory per worker. In a high-load environment where 100 DAGs with 5 tasks each run every minute, memory increased by about 20MB over 12 hours. Using memray, I found that this phenomenon was related to worker log file creation. A memory leak can be observed in the section where worker log files are generated. This issue occurs identically in the current version 3.1.2, confirming it's not related to gc.freeze. However, if log file creation doesn't happen frequently, it's likely not a significant problem. I think it would be good to resolve this COW issue first, then have discussions to address this separate issue.
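For anyone who wants to take a similar capture, a minimal sketch using memray's Tracker API (worker_loop is a hypothetical stand-in for the worker body; this is not necessarily how the profile above was produced):

```python
import memray

# Wrap the suspected code path in a memray Tracker and inspect the capture
# afterwards (e.g. `memray flamegraph worker.bin`) to see which allocations
# from the log-file setup are never freed.
with memray.Tracker("worker.bin"):
    worker_loop()   # hypothetical stand-in for the LocalExecutor worker body
```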
Very good point. I was looking at that issue earlier, and it does look like gc.freeze() should help there without any MySQL-specific solution and without adding unnecessary complexity to connection handling.
This is a great catch, a fantastic find. Yes, what happens in that case is that we create a new logger every time we start a new task, and my guess is that the loggers are never deleted and hold on to all the related resources. It's the first time I've looked at the way we use structlog, but I think for performance reasons we use "cache_logger_on_first_use", and that holds the loggers we create in memory, held by a global cache dictionary. For whatever reason (likely configuration: a different file name is used for each task), each such logger for a different task is a new cached logger that is only reused from the cache while the task is running and then never deleted after the task completes. It needs a bit deeper knowledge of how structlog is used in the LocalExecutor, though. Thanks for being so diligent! @ashb @amoghrajesh -> WDYT? I think those two finds are quite actionable and we should be able to apply some memory optimisations with them.
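If that theory is right, the leak would have roughly this shape. A purely hypothetical illustration (the cache dict and get_task_logger are made up for the example; this is not our actual logging code):

```python
import structlog

structlog.configure(cache_logger_on_first_use=True)

# Hypothetical illustration of the suspected pattern: one logger is created
# (and cached) per task log file, and no entry is ever evicted, so the cache
# grows by one logger plus its bound resources for every task that ever ran.
_logger_cache: dict[str, object] = {}

def get_task_logger(log_path: str):
    if log_path not in _logger_cache:
        _logger_cache[log_path] = structlog.get_logger().bind(log_file=log_path)
    return _logger_cache[log_path]
```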
I investigated the final part of this issue. kaxil's work resolved the first two problems in version 3.1.2.
However, LocalExecutor still shows a continuous memory increase. Using smem, I observed worker processes starting at 20-30MB grow to over 100MB after 1-2 hours: in one case memory rose steadily to ~50MB, and in another it suddenly jumped to ~100MB.
As Jarek mentioned, I confirmed this is due to COW from the parent process by examining the worker processes' pmap output, which showed increased PSS for memory addresses shared with the parent process.
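For reference, PSS (the same signal smem reports) can be scripted straight from /proc; a small Linux-only helper, assuming /proc/&lt;pid&gt;/smaps_rollup is available:

```python
# A COW fault shows up as PSS growth on address ranges that pmap shows
# as shared with the parent process.
def read_pss_kb(pid: int) -> int:
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith("Pss:"):
                return int(line.split()[1])   # reported in kB
    raise RuntimeError(f"no Pss line found for pid {pid}")
```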
Initially I tried minimizing pre-fork memory allocation, but this was ineffective since initialization loads heavy libraries/objects (sqlalchemy, fastapi, LazyDeserializedDAG, etc.) consuming at least 150MB.
I found an article describing how gc.freeze prevents COW by moving objects to a permanent generation, and I confirmed that this prevents COW in our workers. (Benchmark results will be attached with the PR.)
Before sharing the solution code, I'd like to discuss:
Are there side effects of gc.freeze?
The approach is simple and works well for us, but I'd like others' opinions on potential issues. It would only apply to LocalExecutor, and by doing gc.freeze -> worker fork -> gc.unfreeze (to release objects from the permanent generation afterwards), it prevents COW without affecting scheduler operations.
Should we maintain LocalExecutor's current worker creation method?
Current lazy loading would require repeating gc.freeze/unfreeze cycles, potentially affecting scheduler performance. Pre-creating workers at parallelism level (like v2.x) would be more efficient, as in the sketch after the next paragraph. Of course, worker processes might die and need to be recreated irregularly, but this would occur much less frequently than with the current method.
Alternatively, to maintain the existing lazy loading approach, we could fork workers from a gc.freeze'd snapshot process (scheduler -> fork (gc-frozen snapshot) -> worker), but this adds complexity.
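A minimal sketch of the pre-created pool variant, assuming the fork start method and a hypothetical worker_loop entry point:

```python
import gc
import multiprocessing as mp

# Sketch (assumed names): create all workers up front at parallelism level
# behind a single freeze/unfreeze cycle, instead of paying a
# gc.freeze()/gc.unfreeze() round-trip on every lazy fork.
def start_worker_pool(parallelism: int, task_queue: mp.SimpleQueue) -> list[mp.Process]:
    gc.disable()
    gc.freeze()
    workers = [
        mp.Process(target=worker_loop, args=(task_queue,))  # worker_loop is assumed
        for _ in range(parallelism)
    ]
    for w in workers:
        w.start()   # fork happens here, while the parent heap is frozen
    gc.unfreeze()
    gc.enable()
    return workers
```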
Should the Job class be coupled with the executor?
The executor is currently loaded and defined as a field in the Job class. The only class that uses this is the scheduler. It would be more intuitive to explicitly separate the executor loading and pass it as an argument to SchedulerJobRunner. This would also prevent unnecessary memory forking in the LocalExecutor case (even if the COW issue is resolved).
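Roughly this shape (ExecutorLoader.get_default_executor() as the loading mechanism is my assumption, and the executor= argument is the proposed signature, not the current one):

```python
from airflow.executors.executor_loader import ExecutorLoader
from airflow.jobs.job import Job
from airflow.jobs.scheduler_job_runner import SchedulerJobRunner

# Proposed decoupling: load the executor explicitly at the call site and
# hand it to the runner, instead of the Job DB model instantiating and
# holding it as a field. executor= is the proposed, not current, signature.
executor = ExecutorLoader.get_default_executor()
runner = SchedulerJobRunner(job=Job(), executor=executor)
```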
PS. I'd like to document memory management insights from this investigation (e.g., which objects are heavy and should be lazy-loaded). Is there existing documentation for this?