Refactor quickstart data source#8567
Conversation
19612e5 to
aa693bb
Compare
Codecov Report
@@ Coverage Diff @@
## master #8567 +/- ##
============================================
+ Coverage 63.68% 63.98% +0.29%
- Complexity 4224 4315 +91
============================================
Files 1677 1646 -31
Lines 88078 86711 -1367
Branches 13354 13233 -121
============================================
- Hits 56094 55479 -615
+ Misses 27851 27205 -646
+ Partials 4133 4027 -106
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
For Reviewer: this line is the only difference from its parent class. That the partitioning key is using RsvpId() instead of EventId(). So in the refactoring I just make the Generator configurable with partition key, and removed this file
There was a problem hiding this comment.
For Reviewer: Note that this line is the difference with MeetupRsvpJsonStream. getEventId() is the different key
There was a problem hiding this comment.
For reviewer: the RsvpStream and RsvpJsonStream is merged to use this class instead, and using the config of _keyColumn to apply different key
walterddr
left a comment
There was a problem hiding this comment.
overall the approach looks good. I have some suggestions, please kindly take a look.
although this is a refactor PR but there are some new features that were added in, for example the rate limiter interface, it's good to also called out exactly what new things were added and how they were tested.
There was a problem hiding this comment.
since this goes into SPI. we need to be careful explaining what's the intended usage. could you give a bit more in the javadoc?
At the first glance, I thought it is representing a row in a partitioned table with the partition key.
Also this seems to be an interface used by pinot-tools, it might work better to put it in the pinot-tools module as a helper utils (along with produce batch)
There was a problem hiding this comment.
Good point. I moved the classed to be inside StreamDataProducer so it will not be used in other places. If in future people see the need for such class they can do refactor
There was a problem hiding this comment.
since we already consolidated all the generator / producer usage all in this class. let's put the batch operation API here as a helper utils.
static produceBatchRowWithKey(_producer, _topicName, rows) {
// .. the default impl
}
There was a problem hiding this comment.
The default void produceKeyedBatch() is introduced so we can have a batch interface too. I hope to introduce this method so we can easily batch produce rows to improve throughput. (E.g., if the _producer is behind a microservice endpoint) In that case the implementation in the StreamProducer can override the produceKeyedBatch to provide a more efficient call
There was a problem hiding this comment.
+1. this actually is very helpful. #8537 is likely due to a batch produce instability.
There was a problem hiding this comment.
SourceGenerator sounds like you are generating a subclass of PinotRealtimeSource. PinotSourceDataGenerator may be a better name.
There was a problem hiding this comment.
Thanks. Changed
0f1f3ba to
7f05c25
Compare
Description
Refactor the quickstart classes so some of the code/logic could be shared and extended easily in future.
This is mostly because the same logic of created a thread, pull from source, writes into Steaming sinks, and shutdown are written in many places.
PinotRealtimeSourceclass as the driver with threaded running loop. It pulls from sources and writes intoStreamDataProducerlike Kafka;AvroFilePinotSourceGeneratoras one source that pulls from Avro files and generate row bytes.AirlineDataStreamto use the driver (PinotRealtimeSource) as an examplePullRequestMergedEventsStreamto use the driver too.MeetupRsvpStream,MeetupRsvpJsonStreamto use the driver too.Tests Done
Extra Note
Based on discussion in #8180 (comment) , the MeetupRSVP data needs some tweaking. I will work on a follow up PR to make it run.