-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minion taskExecutor for RealtimeToOfflineSegments task #6050
Conversation
96fa857
to
1010a8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some metrics? Number of tables/segments merged, time taken, etc. I think this needs some thought, so you can do it in a subsequent PR, but please add a TODO somewhere, or create an issue, or something to track it.
...on/src/main/java/org/apache/pinot/minion/executor/RealtimeToOfflineSegmentsTaskExecutor.java
Outdated
Show resolved
Hide resolved
...on/src/main/java/org/apache/pinot/minion/executor/RealtimeToOfflineSegmentsTaskExecutor.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
if (sortedColumns != null) { | ||
for (String column : sortedColumns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this check, given that we are pulling from tableconfig (at least in this task). If you are checking, then you should perhaps also check that this sorted column is not same as one of the aggregation columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent table config validation efforts showed that we hadn't been checking for validity of column names used in indexing config. I want to prevent failing in the SegmentProcessorFramework as much as possible, hence the check. If this check is not done, then we'd see failure in reduce step, after a lot of wasted computation.
I don't think we need to check for sorted column not being a metric, as nothing is really stopping a user from setting it that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant if a column is present both as sortedColumn, but specified in aggregateConfigs
above. Not sure if that case is handled correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a valid scenario imo
pinot-core/src/main/java/org/apache/pinot/core/common/MinionConstants.java
Outdated
Show resolved
Hide resolved
/** | ||
* Construct a {@link RecordFilterConfig} by setting a filter function on the time column, for extracting data between window start/end | ||
*/ | ||
private RecordFilterConfig getRecordFilterConfigForWindow(long windowStartMs, long windowEndMs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we will drop the late data once the data from a window get moved to offline?
e.g.
- day1 gets moved to offline table
- day1 data arrived late (this row shows up in the result because of realtime table will index and serve this row)
- day2 gets moved to offline table (the above late data will be dropped by the filter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this will happen. this has been called out in the design doc: https://docs.google.com/document/d/1-e_9aHQB4HXS38ONtofdxNvMsGmAoYfSnc2LP88MbIc/edit#heading=h.5lkm0pm6vp7o
We discussed that the only way to handle this is for user to set a sufficient "bufferTime".
...on/src/main/java/org/apache/pinot/minion/executor/RealtimeToOfflineSegmentsTaskExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/MinionConstants.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#5753
The minion task executor which receives
And then creates Pinot segments using data from that time window.
Uses the SegmentProcessorFramework. Applies:
Next steps:
RealtimeToOfflineSegmentsTaskGenerator