Skip to content

[Core] support scan metrics #2170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 1, 2023
Merged

Conversation

schnappi17
Copy link
Contributor

Purpose

Linked issue: close #2000

@schnappi17 schnappi17 closed this Oct 23, 2023
@schnappi17 schnappi17 reopened this Oct 23, 2023
@schnappi17 schnappi17 force-pushed the scan-metrics branch 5 times, most recently from 9b30db8 to 93c0223 Compare October 25, 2023 06:20
Copy link
Contributor

@tsreaper tsreaper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also bridge this metrics with Flink, just like CommitMetrics?

Iterable<ManifestEntry> entries =
ParallellyExecuteUtils.parallelismBatchIterable(
files ->
files.parallelStream()
.filter(this::filterManifestFileMeta)
.flatMap(m -> readManifest.apply(m).stream())
.filter(this::filterByStats)
.peek(e -> cntEntries.getAndIncrement())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly increase cntEntries by the size of the collected list? If you use peek, cntEntries will be increased by every element in this list. Note that this stream is parallelized, so there will be many concurrent writes which affect the performance.

@@ -247,6 +248,12 @@ public SnapshotReader withBucketFilter(Filter<Integer> bucketFilter) {
return this;
}

@Override
public SnapshotReader withMetricRegistry(MetricRegistry registry) {
// won't register metric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? AuditLogTable also scans from data file.

@tsreaper tsreaper merged commit 683a245 into apache:master Nov 1, 2023
schnappi17 added a commit to schnappi17/flink-table-store that referenced this pull request Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Support scan metrics
2 participants