-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segment processing framework #5934
Conversation
80e1c4d
to
eece981
Compare
Codecov Report
@@ Coverage Diff @@
## master #5934 +/- ##
===========================================
- Coverage 66.44% 43.20% -23.24%
===========================================
Files 1075 1210 +135
Lines 54773 62540 +7767
Branches 8168 9529 +1361
===========================================
- Hits 36396 27023 -9373
- Misses 15700 33081 +17381
+ Partials 2677 2436 -241
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments for high-level discussion
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
Show resolved
Hide resolved
...org/apache/pinot/core/segment/processing/transformer/TransformFunctionRecordTransformer.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/function/FunctionEvaluator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/function/InbuiltFunctionEvaluator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/segment/processing/partitioner/PartitionFilter.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/segment/processing/partitioner/PartitionerFactory.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/CollectorConfig.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentReducer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/collector/Collector.java
Outdated
Show resolved
Hide resolved
...-core/src/main/java/org/apache/pinot/core/segment/processing/collector/CollectorFactory.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/RollupCollector.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/RollupCollector.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentReducer.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/segment/processing/partitioner/PartitionerFactory.java
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/segment/processing/utils/SegmentProcessorUtils.java
Show resolved
Hide resolved
2c642d6
to
eece981
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job splitting the work into multiple modules and make it very easy to extend. Mostly minor comments.
One high level comment: since we always use Avro as the intermediate format, should we directly work on GenericRecord
instead of converting back and forth between GenericRow
and GenericRecord
?
Also, we might want to support more input formats other than Pinot segments. We can do it as the next step.
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/collector/Collector.java
Show resolved
Hide resolved
...-core/src/main/java/org/apache/pinot/core/segment/processing/collector/GenericRowSorter.java
Outdated
Show resolved
Hide resolved
...-core/src/main/java/org/apache/pinot/core/segment/processing/collector/GenericRowSorter.java
Outdated
Show resolved
Hide resolved
...-core/src/main/java/org/apache/pinot/core/segment/processing/collector/GenericRowSorter.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/CollectorConfig.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/CollectorConfig.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/collector/Collector.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentProcessorConfig.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/segment/processing/utils/SegmentProcessorUtils.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like that all the core components are interfaced out and easy to extend. I have put some comments. Some of them are questions or points that I would like to discuss.
...t-core/src/main/java/org/apache/pinot/core/segment/processing/collector/RollupCollector.java
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/segment/processing/partitioner/NoOpPartitioner.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/segment/processing/partitioner/ColumnValuePartitioner.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/segment/processing/partitioner/TableConfigPartitioner.java
Show resolved
Hide resolved
...src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentProcessorConfig.java
Show resolved
Hide resolved
.../main/java/org/apache/pinot/core/segment/processing/framework/SegmentProcessorFramework.java
Show resolved
Hide resolved
Addressed the comments. Added TODOs in code and description for those that will be handled in future PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for addressing all the comments!
Description
#5753
A Segment Processing Framework to convert "m" segments to "n" segments
The phases of the Segment Processor are
A SegmentProcessorFrameworkCommand is provided to run this on demand. Run using command
bin/pinot-admin.sh SegmentProcessorFramework -segmentProcessorFrameworkSpec /<path>/spec.json
where spec.json is
Note:
This framework will typically be used by minion tasks, which want to perform some processing on segments
(eg task which merges segments, tasks which aligns segments per time boundaries etc). The existing Segment merge jobs can be changed to use this framework.
Pending
Enhancements like (Added TODOs in code)