Implement Cleanup of Temporary Directories for built-in Materializers #2257
Comments
Can you assign this to me please
Is there a documented reproduction case for this?
I've examined all of the materializers in zenml/src/zenml/materializers, and here's what I've found:
Do we have a list of the materializers that are engaging in the offensive behavior? Without knowing how they are creating the temporary directories, designing a solution is quite difficult.
Most of the 'real' materializers that people use are in |
Looks like these are the integrations that create temporary objects in their materializer
I'll take a look
Most of those materializers are already using the However there are a few corner cases:
I have a patch that fixes the common cases. I'll work that into a PR and improve the tests where possible. Can anyone answer for these 3 special-case integrations?
@akesterson sorry for the delay.
Thanks!
…reation onto the tempfile module
Sorry for the long delay here. PR is up. There are some comments around the best way to test.
I discovered another materializer that is a special case. The tensorflow dataset materializer intentionally does not clean up after itself in the load() stage, because the dataset is lazily loaded from the generated temporary directory. A decision should be made as to whether these files are actually temporary, or whether something else needs to be done in the artifact storage for this pattern.
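To illustrate the lazy-loading problem described above, here is a minimal sketch using only the Python standard library. It is not the tensorflow materializer's code: `LazyLines` and `load_lazily` are hypothetical names, and a plain text file stands in for the dataset. One possible option, shown here, is to let the returned object own the temporary directory so that cleanup happens when the consumer is finished rather than when load() returns.

```python
import os
import tempfile
from typing import Iterator


class LazyLines:
    """Hypothetical stand-in for a dataset that reads from disk lazily."""

    def __init__(self, path: str, tmp: tempfile.TemporaryDirectory) -> None:
        self._path = path
        # Holding a reference keeps the directory on disk while the
        # object is still in use.
        self._tmp = tmp

    def __iter__(self) -> Iterator[str]:
        # Nothing is read until the caller actually iterates.
        with open(self._path) as f:
            for line in f:
                yield line.rstrip("\n")

    def close(self) -> None:
        # Explicit cleanup once the consumer is done with the data.
        self._tmp.cleanup()


def load_lazily() -> LazyLines:
    """Hypothetical load(): writes files to a temp dir and returns a lazy reader."""
    tmp = tempfile.TemporaryDirectory()
    path = os.path.join(tmp.name, "data.txt")
    with open(path, "w") as f:
        f.write("a\nb\n")
    # Cleaning up here would delete the file before it is ever read.
    return LazyLines(path, tmp)
```

Whether something like this fits depends on who is expected to call `close()`, which is the open decision raised in the comment above.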
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
Contact Details [Optional]
support@zenml.io
What happened?
Some materializers in ZenML create local temporary directories to handle files from the artifact store. These directories are not being cleaned up automatically, leading to potential clutter and storage issues, especially when large or numerous temporary files are involved.
Task Description
Develop a solution within ZenML to ensure that temporary directories created by materializers are cleaned up efficiently after their use. This could involve direct cleanup mechanisms within materializers or a centralized approach, such as a dedicated temporary directory for ZenML step runs, which is automatically cleared post-execution.
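As a rough illustration of the centralized approach mentioned above, the following sketch creates one root temporary directory per step run and removes it when the step finishes, even if the step raises. It uses only the Python standard library. `StepTempDirs` and `step_temp_dirs` are hypothetical names chosen for this example and are not part of ZenML's API.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


class StepTempDirs:
    """Hypothetical holder for all temp dirs created during one step run."""

    def __init__(self) -> None:
        # One root directory per step run; everything else lives under it.
        self.root = tempfile.mkdtemp(prefix="step-run-")

    def new_dir(self) -> str:
        # Materializers would request scratch space here instead of
        # creating unrelated directories elsewhere on disk.
        return tempfile.mkdtemp(dir=self.root)

    def cleanup(self) -> None:
        # Removing the root removes every directory created under it.
        shutil.rmtree(self.root, ignore_errors=True)


@contextmanager
def step_temp_dirs() -> Iterator[StepTempDirs]:
    dirs = StepTempDirs()
    try:
        yield dirs
    finally:
        # Runs after the step body, including when it raises.
        dirs.cleanup()


def run_example() -> str:
    """Simulate a step that writes a file, then return the removed root path."""
    with step_temp_dirs() as dirs:
        work = dirs.new_dir()
        with open(os.path.join(work, "artifact.bin"), "wb") as f:
            f.write(b"payload")
        root = dirs.root
    return root
```

For the direct per-materializer alternative, the standard `tempfile.TemporaryDirectory` context manager gives the same guarantee for a single directory without any shared registry.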
Expected Outcome
Steps to Implement
Additional Context
Ensuring efficient cleanup of temporary directories is crucial for sustainable resource management and to maintain optimal performance in data pipelines managed by ZenML.
Code of Conduct