
Remove mmap usage in binary index header reader #6070

Open

GiedriusS opened this issue Jan 24, 2023 · 6 comments

@GiedriusS (Member)

The problem is that Thanos Store reads binary index headers using mmap. When an mmap'ed page has to be faulted in from disk, the whole OS thread blocks in the kernel and the goroutine scheduler cannot schedule anything else onto it. Most of our friends like Mimir/VictoriaMetrics have already switched to good old seek/read, so let's do that as well. With explicit read/seek syscalls, the goroutine scheduler can run something else while a goroutine is blocked on I/O.
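For illustration, here is a minimal sketch of the two access patterns; the file name and helper are hypothetical, not the actual Thanos reader. The `ReadAt` variant goes through a syscall the Go runtime can see, so a blocked goroutine frees its OS thread for other work, while the mmap variant is a plain memory load whose page fault stalls the thread invisibly:

```go
package main

import (
	"fmt"
	"os"
)

// readAt fetches a range of the index-header via an explicit
// pread(2)-style syscall. If the data is not in the page cache,
// the goroutine blocks inside the syscall and the runtime can
// keep scheduling other goroutines on other threads.
func readAt(f *os.File, off int64, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := f.ReadAt(buf, off); err != nil {
		return nil, err
	}
	return buf, nil
}

// The mmap equivalent is just a slice access:
//
//	b := mmapped[off : off+n] // may page-fault
//
// A major fault here blocks the OS thread in the kernel, and the
// Go scheduler has no visibility into it.

func main() {
	f, err := os.Open("index-header") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	magic, err := readAt(f, 0, 4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("magic: %x\n", magic)
}
```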

@kon3m commented Jan 24, 2023

I would like to work on this issue.

@bwplotka (Member)

2c: This is only a theory; I've personally never seen us impacted by this, but I've also never benchmarked concurrency latency like this. What I'm saying is: if the new solution is both slower and uses more RAM, it might not be a good change. Still worth experimenting on, with learnings from @pracucci (:

BTW, RAM will be higher (a lot!) since mmap'ed memory is not tracked by the go_mem metrics AND the typical cAdvisor RSS metric (it should show up in container working set bytes, though).

@pracucci (Contributor)

> This is only a theory

Running Mimir at Grafana Labs, this was also practice. We had suffered from this issue for more than a year, and the bigger the scale, the worse the situation got. Removing mmap fixed it. At some point we were able to prove it by slowing down the filesystem using FUSE. However, it took a huge effort just to prove it.

> BTW, RAM will be higher (a lot!) since mmap'ed memory is not tracked by the go_mem metrics AND the typical cAdvisor RSS metric (it should show up in container working set bytes, though).

Surprise, surprise... it's not. We're now running the store-gateway without mmap in all production clusters, and no, store-gateway memory is not higher and store-gateway latency is not higher.

Just to be clear, removing mmap doesn't mean you load the whole index-header into memory. We rewrote the Symbols reader, DecBuf & co. in a streaming way; we never read everything into memory. We read batch-by-batch, reusing buffers and pooling file descriptors, so if you do it properly, memory is not higher.
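To give the batch-by-batch approach described above a concrete shape, here is a rough sketch with hypothetical names (`filePool`, `bufPool`, `streamRange`); this is not the actual Mimir code:

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// bufPool reuses fixed-size read buffers so batched reads do not
// allocate on every call.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 32*1024) },
}

// filePool keeps open descriptors for the index-header so concurrent
// readers do not pay an open/close per read.
type filePool struct {
	mu    sync.Mutex
	path  string
	files []*os.File
}

func (p *filePool) get() (*os.File, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.files); n > 0 {
		f := p.files[n-1]
		p.files = p.files[:n-1]
		return f, nil
	}
	return os.Open(p.path)
}

func (p *filePool) put(f *os.File) {
	p.mu.Lock()
	p.files = append(p.files, f)
	p.mu.Unlock()
}

// streamRange reads [off, off+length) batch-by-batch, handing each
// batch to fn, so the full range is never resident at once.
func (p *filePool) streamRange(off, length int64, fn func([]byte) error) error {
	f, err := p.get()
	if err != nil {
		return err
	}
	defer p.put(f)

	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)

	for length > 0 {
		n := int64(len(buf))
		if length < n {
			n = length
		}
		if _, err := f.ReadAt(buf[:n], off); err != nil {
			return err
		}
		if err := fn(buf[:n]); err != nil {
			return err
		}
		off += n
		length -= n
	}
	return nil
}

func main() {
	p := &filePool{path: "index-header"} // hypothetical path
	var total int64
	err := p.streamRange(0, 1<<20, func(batch []byte) error {
		total += int64(len(batch))
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("streamed bytes:", total)
}
```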

@alanprot (Contributor)

We saw huge improvements in compaction time just by giving the compactor way more memory (even though the compactor process itself was using only 25% of the available memory), because it allowed the OS to cache more files and thus reduce those pause times when fetching mapped data from disk. We also saw big improvements in compaction time when we mlock'ed the symbols table; see the sketch at the end of this comment.

We never saw this blocked time on the store gateway (maybe we just didn't look deep enough), but we definitely saw this problem in other places in the TSDB code.

I think using buffer pools can improve the blocked times and also give more control over the cache eviction policy.
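For reference, a minimal sketch of what mlock'ing an mmap'ed symbols-table region looks like on Linux via golang.org/x/sys/unix; the file name and offsets are hypothetical:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("index") // hypothetical TSDB index file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Map the whole file read-only, as an mmap-based reader would.
	data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(data)

	// Pin the symbols-table pages (offsets are hypothetical) so the
	// kernel cannot evict them; later accesses never fault to disk.
	symOff, symLen := 4096, 1<<20
	if err := unix.Mlock(data[symOff : symOff+symLen]); err != nil {
		panic(err)
	}
	fmt.Println("symbols table locked in RAM")
}
```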

@pracucci (Contributor)

If the Thanos community is interested, we're open to upstreaming our no-mmap readers to Prometheus.

@480132 commented Mar 6, 2024

I think upstreaming it is a good idea.
