-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Add RESTScanReporter to send scan report to REST endpoint #5407
Conversation
b69475b
to
0bda731
Compare
0bda731
to
228e7be
Compare
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java
Outdated
Show resolved
Hide resolved
8a49731
to
98a9be3
Compare
2505013
to
2c43833
Compare
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
"tables", | ||
RESTUtil.encodeString(identifier.name()), | ||
"metrics", | ||
"scan"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to make multiple endpoints? Why not just post to metrics with a type included in the serialized metrics report? That way there's only one metrics endpoint needed and we don't need to specify this later.
We can also make the update to the REST spec generic using anyOf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment in #5407 (comment) about the initial reasoning for a /metrics/scan
endpoint (rather than just /metrics
).
Obviously we could post simply to /metrics
with a type but eventually you might want to also do a GET
and retrieve metrics of a particular type, so using /metrics/scan
seems more in-line with how REST would model post/get requests.
Example: We could have a /metrics/commit
endpoint where we'd POST
commit metrics to and could also perform a GET
to retrieve them
I understand the reasoning for this, since we'd like to keep changes to the REST spec to a minimum. I don't have a strong opinion on this, but I'd like us to at least consider this. IMO this is a bit cleaner and aligns better with REST to have it this way. However, I'm interested in what others think about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielcweeks, what do you think about having one metrics endpoint vs. separate endpoints for different metrics types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this for a while and I think we should go with the generic metrics endpoint rather than specific endpoints for scan, commit, etc. That keeps the REST spec simpler. I also think that it fits with the potential for future GET requests because we could provide a listing of metric reports and filter them using query params for the .../metrics
endpoint. We could also return individual reports using .../metrics/<report-id>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good. The only concerns I have are:
- Can we make the metrics endpoint and requests generic so that we don't have to modify this to send other metrics?
- Do we need a request object that adds another layer to the JSON? This is minor but it seems odd to have a record that just wraps another record to make the Java types work out (i.e.
ScanReportRequest
is aRESTRequest
).
Not sure if there's an easier way to do it, but essentially our |
2c43833
to
1f459d2
Compare
24f6882
to
698d22a
Compare
31d9d93
to
199c8ba
Compare
core/src/test/java/org/apache/iceberg/rest/requests/TestReportMetricsRequestParser.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/requests/TestReportMetricsRequestParser.java
Show resolved
Hide resolved
*/ | ||
package org.apache.iceberg.metrics; | ||
|
||
public interface MetricsReport {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also expose the metrics from the underlying report. That way we only need parsers to serialize the scan- or commit-specific fields and can then delegate to a metrics parser. In addition, we can also still handle metrics from an unknown report type (minor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might make sense, but I would probably defer this until we have a better picture of how commit reports look like
44f0738
to
3d66379
Compare
core/src/main/java/org/apache/iceberg/rest/requests/ReportMetricsRequestParser.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, other than the serialization format for the metrics report request.
We can also break this into smaller pieces if that's going to be a blocker. All of the API refactoring looks ready to go in to me.
f738731
to
1ee9511
Compare
.addDeserializer(OAuthTokenResponse.class, new OAuthTokenResponseDeserializer()) | ||
.addSerializer(ReportMetricsRequest.class, new ReportMetricsRequestSerializer<>()) | ||
.addDeserializer(ReportMetricsRequest.class, new ReportMetricsRequestDeserializer<>()) | ||
.addSerializer(ImmutableReportMetricsRequest.class, new ReportMetricsRequestSerializer<>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this in a follow-up. I was expecting that registering a custom serializer/deserializer for ReportMetricsRequest
would also work out-of-the-box for ImmutableReportMetricsRequest
but apparently that's not the case and it requires the subclass to be registered explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #5911 to address this
1ee9511
to
99d4539
Compare
@rdblue thanks for the deep review. I've simplified/improved the things you mentioned and a few other things |
() -> ReportMetricsRequestParser.fromJson("{\"report-type\":\"invalid\"}")) | ||
.isInstanceOf(IllegalArgumentException.class) | ||
.hasMessage( | ||
"No enum constant org.apache.iceberg.rest.requests.ReportMetricsRequest.ReportType.INVALID"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: we could replace this with a better error message for the caller. No enum constant doesn't really tell me that I was trying to deserialize an unsupposed report type, invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a similar pattern in a few other parsers as well. I'll follow up with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #5910 to address this
|
||
Assertions.assertThat(ReportMetricsRequestParser.fromJson(json).report()) | ||
.usingRecursiveComparison() | ||
.ignoringFields("projection") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? It seems strange to me that the projection doesn't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no equals()
method defined on Schema
, so it will always perform reference equality check
Nice work, @nastra! This looks ready so I merged it. |
depends on #5427