Skip to content

Commit

Permalink
Correctly record the rawBytesRead_ metric in TableScan (#8147)
Browse files Browse the repository at this point in the history
Summary:
Based on oerling design:
_rawBytesRead is the quantized size of data read by a query.  Read is the volume copied from storage. The second is 0 if data comes from caches. With DirectBufferedInput  these are close but then read has unused coalesced columns counted where raw read does not. A column can be coalesced into IO but never hit by the query due to filtering._

Currently, `DirectBufferInput` only tracks `rawBytesRead` in the `loadSync` method. It does not track `rawBytesRead` in the asynchronous `loadData` method. This PR adds the `rawBytesRead` metric to the `loadData` method.

Pull Request resolved: #8147

Reviewed By: Yuhta

Differential Revision: D52616884

Pulled By: xiaoxmeng

fbshipit-source-id: f56f72167ff4cd9ab1c1eb083791d7453de55e98
  • Loading branch information
JkSelf authored and facebook-github-bot committed Jan 25, 2024
1 parent de50afd commit 6d2f0d4
Show file tree
Hide file tree
Showing 4 changed files with 9 additions and 1 deletion.
3 changes: 3 additions & 0 deletions velox/dwio/common/CacheInputStream.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -314,6 +314,9 @@ void CacheInputStream::loadPosition() {
// 'region_'
loadRegion.length = std::min<int32_t>(
loadQuantum_, region_.length - (loadRegion.offset - region_.offset));

// There is no need to update the metric in the loadData method because
// loadSync is always executed regardless and updates the metric.
loadSync(loadRegion);
}
auto* entry = pin_.checkedEntry();
Expand Down
1 change: 1 addition & 0 deletions velox/dwio/common/DirectBufferedInput.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -274,6 +274,7 @@ std::vector<cache::CachePin> DirectCoalescedLoad::loadData(bool isPrefetch) {
}
input_->read(buffers, requests_[0].region.offset, LogType::FILE);
ioStats_->read().increment(size);
ioStats_->incRawBytesRead(size - overread);
ioStats_->incRawOverreadBytes(overread);
if (isPrefetch) {
ioStats_->prefetch().increment(size);
Expand Down
3 changes: 3 additions & 0 deletions velox/dwio/common/DirectInputStream.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,9 @@ void DirectInputStream::loadPosition() {
loadedRegion_.length = (offsetInRegion_ + loadQuantum_ <= region_.length)
? loadQuantum_
: (region_.length - offsetInRegion_);

// Since the loadSync method updates the metric, but it is conditionally
// executed, we also need to update the metric in the loadData method.
loadSync();
}

Expand Down
3 changes: 2 additions & 1 deletion velox/exec/tests/TableScanTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,8 @@ TEST_F(TableScanTest, allColumns) {
auto scanNodeId = plan->id();
auto it = planStats.find(scanNodeId);
ASSERT_TRUE(it != planStats.end());
ASSERT_TRUE(it->second.peakMemoryBytes > 0);
ASSERT_GT(it->second.peakMemoryBytes, 0);
ASSERT_GT(it->second.rawInputBytes, 0);
EXPECT_LT(0, exec::TableScan::ioWaitNanos());
}

Expand Down

0 comments on commit 6d2f0d4

Please sign in to comment.