Make metrics more local to prevent unbounded memory usage#6156
Make metrics more local to prevent unbounded memory usage#6156
Conversation
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
a556fb3 to
858e9e5
Compare
| @@ -1,27 +0,0 @@ | |||
| // SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
This is not used, if any downstream user wants this its easy enough to replicate and it delegates to a 3rd part crate.
| @@ -1,17 +0,0 @@ | |||
| // SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
We don't want global metrics, at least not the way they work now.
CodSpeed Performance ReportMerging this PR will improve performance by 12.07%Comparing
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | u64_values_u32_codes[10M] |
316.5 µs | 282.4 µs | +12.07% |
Footnotes
-
1219 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Part of #6078. I have a bigger branch I'm experimenting with in the background, but I figured it makes to start breaking it into smaller parts while I figure out bigger issues. This PR is a step towards simplifying the metrics API, starting by trying to make it scoped to scan so holding instances of `VortexFile` won't make memory grow in an unbounded way. ## Changes in this PR 1. I've also made some code less reliant on `Arc<dyn VortexReadAt>` in favor of a statically dispatched `R: VortexReadAt + Clone`, otherwise we end up with an `Arc<Arc<Arc<dyn VortexReadAt>>>` if not worse. 2. Made IO-level metrics opt-in, they are enabled by default for vortex-datafusion and `InstrumentedReadAt` is still part of the public API, maybe we should make it even more visible? 3. Make `VortexOpenOptions::open_read` public. I don't want to break the public API here, but it seems more useful than `VortexOpenOptions::read` that requires an `Arc` and just delegates to it anyway. --------- Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Part of #6078.
I have a bigger branch I'm experimenting with in the background, but I figured it makes to start breaking it into smaller parts while I figure out bigger issues.
This PR is a step towards simplifying the metrics API, starting by trying to make it scoped to scan so holding instances of
VortexFilewon't make memory grow in an unbounded way.Changes in this PR
Arc<dyn VortexReadAt>in favor of a statically dispatchedR: VortexReadAt + Clone, otherwise we end up with anArc<Arc<Arc<dyn VortexReadAt>>>if not worse.InstrumentedReadAtis still part of the public API, maybe we should make it even more visible?VortexOpenOptions::open_readpublic. I don't want to break the public API here, but it seems more useful thanVortexOpenOptions::readthat requires anArcand just delegates to it anyway.