Remove mmap usage in binary index header reader #6070
Comments
I would like to work on this issue.
My 2c: this is only theory; I personally have never seen us being impacted by this, but I also never benchmarked concurrency latency like this. What I'm saying is: if the new solution is both slower and uses more RAM, then it might not be a good change. Still worth experimenting on it with learnings from @pracucci (: BTW, RAM will appear higher (a lot!), as mmap is not tracked by go_mem metrics nor by the typical cAdvisor RSS metric (it should be tracked by container working set bytes, though).
Running Mimir at Grafana Labs, this was also our experience. We suffered from this issue for more than a year, and the bigger the scale, the worse the situation got. Removing mmap fixed it. At some point we were able to prove it by slowing down the filesystem using FUSE. However, it took a huge effort just to prove it.
Surprise, surprise... it's not. We're now running store-gateway without mmap in all production clusters and, no, store-gateway memory is not higher and store-gateway latency is not higher. Just to be clear, removing mmap doesn't mean you load the whole index-header into memory. We rewrote the Symbols reader, DecBuf & co. in a streaming way. We never read everything into memory; we just read batch-by-batch, reusing buffers and pooling file descriptors, so if you do it properly, memory is not higher.
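A minimal sketch in Go of the streaming idea described above, i.e. reading a file region batch-by-batch through a single reused buffer instead of mapping the whole file; the names and types here are illustrative, not Mimir's actual readers:

```go
package indexheader

import (
	"io"
	"os"
)

// fileReader streams a region of the index-header file through a small,
// reused buffer instead of mmap'ing the whole file. Illustrative sketch only.
type fileReader struct {
	f   *os.File
	buf []byte // reused read buffer (one "batch")
	pos int64
	end int64
}

func newFileReader(f *os.File, start, length int64, batchSize int) *fileReader {
	return &fileReader{
		f:   f,
		buf: make([]byte, batchSize),
		pos: start,
		end: start + length,
	}
}

// next returns the next batch of at most len(r.buf) bytes. ReadAt issues a
// pread(2), so only this goroutine blocks while the data is fetched from disk.
func (r *fileReader) next() ([]byte, error) {
	if r.pos >= r.end {
		return nil, io.EOF
	}
	n := int64(len(r.buf))
	if remaining := r.end - r.pos; remaining < n {
		n = remaining
	}
	read, err := r.f.ReadAt(r.buf[:n], r.pos)
	if read == 0 && err != nil {
		return nil, err
	}
	r.pos += int64(read)
	return r.buf[:read], nil
}
```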
We saw huge improvements in compactor time just by giving it way more memory (even though the compactor process was using only 25% of the available memory), allowing the OS to cache more files and so reducing those pause times (when fetching mapped data from disk). We could also see big improvements in compaction time if we mlock the symbols table. We never saw the blocked time on the store gateway (maybe we just did not look deep enough), but we definitely saw these problems in other places in the TSDB code. I think using buffer pools can improve the blocked times and also give more control over the cache eviction policy.
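As a rough illustration of the buffer-pool idea mentioned above (a sketch only, not code from the compactor or store-gateway; the pool size is an arbitrary example):

```go
package indexheader

import "sync"

// readBufPool reuses fixed-size read buffers across requests so that
// hot-path reads don't allocate on every call; 32 KiB is an example size.
var readBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 32*1024) },
}

// withReadBuffer borrows a buffer from the pool, runs fn with it, and
// returns it to the pool afterwards.
func withReadBuffer(fn func(buf []byte) error) error {
	buf := readBufPool.Get().([]byte)
	defer readBufPool.Put(buf)
	return fn(buf)
}
```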
If the Thanos community is interested in it, we're open to upstreaming our no-mmap readers back to Prometheus.
I think that upstreaming is a good idea.
The problem is that Thanos Store reads binary index-headers using mmap. When a mapped page has to be faulted in from disk, the OS thread blocks outside of the Go runtime's control, so the goroutine scheduler cannot run anything else on that thread. Most of our friends, like Mimir and VictoriaMetrics, have already switched to good old seek/read, so let's do that as well: with regular read/seek syscalls, the goroutine scheduler can schedule other goroutines while one is blocked on I/O.
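A minimal sketch of what the seek/read approach looks like, assuming a hypothetical helper that reads a 4-byte length field out of the index-header (not the actual Thanos reader code):

```go
package indexheader

import (
	"encoding/binary"
	"os"
)

// readTOCLen is a hypothetical example of the seek/read approach proposed
// here: instead of indexing into an mmap'd byte slice (where a page fault
// blocks the OS thread invisibly to the Go scheduler), it uses ReadAt,
// a regular blocking syscall that the runtime can park the goroutine on
// while other goroutines keep running.
func readTOCLen(path string, offset int64) (uint32, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var buf [4]byte
	if _, err := f.ReadAt(buf[:], offset); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}
```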