-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/Core: Make ScanReport and its related classes Immutable #5780
Conversation
b535b85
to
25c378b
Compare
api/src/test/java/org/apache/iceberg/metrics/TestScanReport.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/metrics/TestScanMetricsResultParser.java
Show resolved
Hide resolved
public Timer totalPlanningDuration() { | ||
return totalPlanningDuration; | ||
return metricsContext().timer(TOTAL_PLANNING_DURATION, TimeUnit.NANOSECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler just to use a ScanMetrics.from(MetricsContext)
method so that ScanMetrics
doesn't need to keep track of the context? The from
method could create all of the metrics and pass them into the builder and we wouldn't need all the generated code for initialization and making sure MetricsContext
is only called once per counter/timer. If we also make these required, then there should be no problem with them being null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the MetricsContext
parameter here less as a "this class needs to keep track of this" but rather "this is what's required to create an instance of this class". Your comment is perfectly valid here and I initially had the exact same thought.
Therefore it make sense to have a ScanMetrics.from(MetricsContext)
. In fact we have a ScanMetrics.of(MetricsContext)
that does exactly that.
We could go ahead and change ScanMetrics.of(...)
to:
public static ScanMetrics of(MetricsContext metricsContext) {
return ImmutableScanMetrics.builder()
.metricsContext(metricsContext)
.totalPlanningDuration(
metricsContext.timer(TOTAL_PLANNING_DURATION, TimeUnit.NANOSECONDS))
.resultDataFiles(metricsContext.counter(RESULT_DATA_FILES, MetricsContext.Unit.COUNT))
.resultDeleteFiles(metricsContext.counter(RESULT_DELETE_FILES, MetricsContext.Unit.COUNT))
.scannedDataManifests(
metricsContext.counter(SCANNED_DATA_MANIFESTS, MetricsContext.Unit.COUNT))
.totalDataManifests(
metricsContext.counter(TOTAL_DATA_MANIFESTS, MetricsContext.Unit.COUNT))
.totalDeleteManifests(
metricsContext.counter(TOTAL_DELETE_MANIFESTS, MetricsContext.Unit.COUNT))
.totalFileSizeInBytes(
metricsContext.counter(TOTAL_FILE_SIZE_IN_BYTES, MetricsContext.Unit.BYTES))
.totalDeleteFileSizeInBytes(
metricsContext.counter(TOTAL_DELETE_FILE_SIZE_IN_BYTES, MetricsContext.Unit.BYTES))
.skippedDataManifests(
metricsContext.counter(SKIPPED_DATA_MANIFESTS, MetricsContext.Unit.COUNT))
.build();
}
It would produce the exact same results. The only reason I went with @Value.Derived
is because it does not allow manual creation of metric instances.
ScanMetrics.of(...)
is just a utility method and it doesn't prevent anyone from using the Builder directly. Using @Value.Derived
in this context means essentially that all those metric instances are derived from MetricsContext
but cannot be manually set, thus providing a builder where only ImmutableScanMetrics.builder() .metricsContext(metricsContext).build()
can be set.
The lazy initialization of metrics that @Value.Derived
provides is just a nice thing to have that we get here, but my main motivation was to prevent the different counters/timers to be manually set. This gives us the exact same semantics we had in the previous ScanMetrics
API, since all the metrics were initialized in the constructor.
* Use true immutable objects that are type-safe, thread-safe, null-safe * Get builder classes for free This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing. * Immutable objects are serialization ready (including JSON and its binary forms) * Supports lazy, derived and optional attributes * Immutable objects are constructed once, in a consistent state, and can be safely shared * Will fail if mandatory attributes are missing * Cannot be sneakily modified when passed to other code * Immutable objects are naturally thread-safe and can therefore be safely shared among threads * No excessive copying * No excessive synchronization * Object definitions are pleasant to write and read * No boilerplate setter and getters * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control. Note that we are specifically preventing people from using Jackson-related annotations (`@JsonSerialize` & `@JsonDeserialize`) in order to avoid potential runtime library dependency issues where a clashing jackson lib would prevent jackson-related annotations to be used/processed.
25c378b
to
649c7b6
Compare
Thanks, @nastra! Looks great. |
This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing.
Note that we are specifically preventing people from using Jackson-related annotations (
@JsonSerialize
&@JsonDeserialize
) in order to avoid potential runtime library dependency issues where a clashing jackson lib would prevent jackson-related annotations to be used/processed.