[IMP] jinja_to_qweb: avoid MemoryError #288
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
upg-2994884
Avoid
MemoryError
and/or killed process because ofmalloc()
failing withinlxml
/libxml2
.Debugging this by determining the size of involved datastructures through means of
sys.getsizeof()
showed that:templates_to_check
grows to roughly 1.4GiB after the various calls toupgrade_jinja_fields()
by upgrade scripts. The process uses ~1.5GiB at that pointverify_upgraded_jinja_fields()
, the process is still at ~1.5GiB. While iterating over all the templates intemplates_to_check
, no significant amount of memory is allocated on top of this to python datastructures. But, with each call tois_converted_template_valid()
, the size of the process increases until it hits the RLIMIT. This function calls intolxml
multiple times, suggesting that the memory is allocated inmalloc()
calls in the C library, evading python's accounting. Internet research shows thatlxml
has a long history of different memory leaks in C code, plus some caching mechanism across documents that could be responsible1. More recent versions of the module seem to have been improved, but still we're stuck with old versions.This patch solves / works around (2) by running the function body of
is_converted_template_valid()
in a forked subprocess that is killed immediately after (instead of using a process pool) and thus no additional memory is ever allocated bylxml
in the main process, its memory use stays at ~1.5GiB during the runtime ofverify_upgraded_jinja_fields()
. We also add a final line to that function that clears the global data when it is processed, at which point the processes memory use drops to 100MiB.This patch does not solve (1). Solving (2) is enough to fully process upg-2994884, but a future upgrade request may require a fix for (1), too. This will probably be a bigger patch, because I think a viable solution for this would mean to remove the global
templates_to_check
and instead store tha data in the database.Footnotes
https://benbernardblog.com/tracking-down-a-freaky-python-memory-leak-part-2/ ↩