Reduce resource consumption when using file discovery with same file in multiple probes
Setup:
Multiple HTTP probes use the same file_targets {file_path} (the file is generated by consul-template), each with a different filter{} to select the targets relevant to that probe (a similar approach to run_on {}). The .textpb file is about 220 MiB with more than 1M resource{} entries, and cloudprober.cfg has more than 500 probes.
Problem:
At startup all probes call file.refresh() at once, which kills almost any machine. Looking at the code, all probes open the same file simultaneously and run .Unmarshal() at the same time, which is a memory- and CPU-heavy operation. Later on everything normalizes thanks to random delays and GC.
Description of what you want to happen and what problem will it solve for you.
I tried a few different approaches. A global file mutex serializes all processing, but then it takes ages to gather targets for all probes. Randomizing the initial load the way the refresh loop does would help too, but like serial processing it would take ages. I also tried an approach similar to the one described here: https://firas.dev.sy/post/go/2020/making-timed-maps-golang/ but simpler, with the API extended by a .LoadOrStore() as in sync.Map to reserve the computation (reading the file once, to avoid the thundering herd problem). It uses two maps: one for reservations and another for caching the Unmarshal()'ed resources (before filtering). I got some promising results but don't have working/publishable code yet.
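To make the idea concrete, here is a minimal sketch of the "reserve, then cache" approach. It is not cloudprober code, and it is not the unfinished prototype mentioned above: Cache, NewCache, Get and countParses are hypothetical names, and it folds the reservation map and the result map into a single map whose entries carry a channel that is closed when the parse finishes. It assumes Go 1.18+ (generics) and uses only the standard library. The load callback stands in for "read the file and Unmarshal() it"; per-probe filtering would run on the returned value afterwards.

```go
package main

import (
	"sync"
	"time"
)

// entry holds one parse result. ready is closed once val/err are set.
type entry[V any] struct {
	ready   chan struct{}
	val     V
	err     error
	expires time.Time
}

// Cache deduplicates concurrent loads per key and keeps results for ttl.
type Cache[V any] struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*entry[V]
}

func NewCache[V any](ttl time.Duration) *Cache[V] {
	return &Cache[V]{ttl: ttl, entries: make(map[string]*entry[V])}
}

// Get returns the cached value for key. If there is none (or it failed or
// expired), the first caller runs load while concurrent callers for the same
// key wait for that single result. load is assumed not to panic.
func (c *Cache[V]) Get(key string, load func() (V, error)) (V, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if ok {
		select {
		case <-e.ready:
			// Finished entry: reuse it only if it succeeded and is fresh.
			ok = e.err == nil && time.Now().Before(e.expires)
		default:
			// Load still in flight: wait for it below.
		}
	}
	if ok {
		c.mu.Unlock()
		<-e.ready
		return e.val, e.err
	}
	// Reserve the key before releasing the lock, so nobody else starts a load.
	e = &entry[V]{ready: make(chan struct{})}
	c.entries[key] = e
	c.mu.Unlock()

	e.val, e.err = load()
	e.expires = time.Now().Add(c.ttl)
	close(e.ready)
	return e.val, e.err
}

// countParses simulates `probes` probes starting at once and returns how
// many times the expensive load actually ran.
func countParses(probes int) int {
	cache := NewCache[[]string](time.Minute)
	calls := 0
	load := func() ([]string, error) {
		calls++                           // only the reserving goroutine runs this
		time.Sleep(20 * time.Millisecond) // stand-in for the slow Unmarshal()
		return []string{"host-a", "host-b"}, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < probes; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("targets.textpb", load)
		}()
	}
	wg.Wait()
	return calls
}
```

Calling countParses(500) from a main function should report a single parse for 500 simulated probes. Unlike a global mutex, waiters do not queue up to repeat the work: they all unblock as soon as the one parse completes, so the startup cost would be roughly one Unmarshal() plus the per-probe filtering. The ttl would presumably be set a bit below the refresh interval so that periodic refreshes still pick up file changes.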
Some numbers I've measured (no concurrency):
- loading file - almost instant
- .Unmarshal() - ~50s
- filter targets for probe - ~1s
BTW, everything works with all probes and targets when I put everything statically in the main configuration file.