chore(ingestion): optionally serialize the calls to loadPlugin to limit memory usage #17391
Problem
In prod-us, plugin-ingestion memory usage spikes up to 12 GiB at startup, and the pods sometimes run out of memory (OOMing). This does not happen on EU or on other plugin-server roles. The overflow role spikes lower, but still reaches around 5 GiB, even when there is no significant traffic to the overflow topic.
We currently have 31k active pluginconfig entries on US, and this causes a significant amount of heap allocation at startup.
#16329 tried to delay the call to `loadPlugin` until an event comes in, but that behaviour would break the scheduler, and that PR has failing tests due to side effects. This PR reduces the impact instead of eliminating it, hopefully by reducing GC pressure.
Changes
`PLUGIN_LOAD_SEQUENTIALLY`: new boolean option to call `loadPlugin` sequentially instead of in parallel for all pluginconfigs.
`loadPlugin` is async, but its only awaits are the frontend app transpilation (only executed on scheduler pods) and `processError` when the plugin is invalid, so I don't expect this to increase startup times significantly. I'm keeping it disabled by default for now anyway.
How did you test this code?
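To illustrate the difference that `PLUGIN_LOAD_SEQUENTIALLY` toggles, here is a minimal TypeScript sketch, not the plugin-server code itself. `PluginConfig`, `LoadResult`, `loadAllPlugins`, the in-flight counter and the `setTimeout`-based stub standing in for `loadPlugin`'s async work are all hypothetical, and reading the flag from the environment is left out.

```typescript
// Hypothetical, simplified stand-in for a pluginconfig row.
interface PluginConfig {
  id: number
}

interface LoadResult {
  loadedIds: number[]
  maxConcurrent: number // peak number of loads that were in flight at once
}

// Hypothetical helper: loads every config either one at a time (sequential)
// or all at once (parallel), and reports how many loads overlapped.
async function loadAllPlugins(configs: PluginConfig[], sequential: boolean): Promise<LoadResult> {
  let inFlight = 0
  let maxConcurrent = 0

  // Hypothetical stub for loadPlugin: the timer models an async step
  // (e.g. transpilation or error reporting) during which other loads may start.
  const loadPlugin = async (config: PluginConfig): Promise<number> => {
    inFlight++
    maxConcurrent = Math.max(maxConcurrent, inFlight)
    await new Promise<void>((resolve) => setTimeout(resolve, 1))
    inFlight--
    return config.id
  }

  let loadedIds: number[]
  if (sequential) {
    loadedIds = []
    for (const config of configs) {
      // Awaiting inside the loop keeps at most one load pending at a time.
      loadedIds.push(await loadPlugin(config))
    }
  } else {
    // Every load is started before any of them finishes, so their
    // intermediate state can be alive at the same time.
    loadedIds = await Promise.all(configs.map((config) => loadPlugin(config)))
  }
  return { loadedIds, maxConcurrent }
}
```

With three configs, `loadAllPlugins(configs, true)` resolves with `maxConcurrent` equal to 1, while `loadAllPlugins(configs, false)` reaches 3; both return the IDs in input order. The trade-off is startup latency against how much per-load state is live simultaneously.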