[feature] Add pinot-clp-log plugin to encode user-specified fields with CLP during ingestion #9942
This is a great feature! Can you please give public access to the design doc?
Codecov Report
@@ Coverage Diff @@
## master #9942 +/- ##
=============================================
- Coverage 68.75% 13.58% -55.17%
+ Complexity 5685 176 -5509
=============================================
Files 1996 1941 -55
Lines 107802 105336 -2466
Branches 16388 16094 -294
=============================================
- Hits 74115 14315 -59800
- Misses 28440 89894 +61454
+ Partials 5247 1127 -4120
Apologies for the delay; the previous doc couldn't be shared due to some security settings. I've updated the link to a publicly accessible doc.
add comments. What does extractAll mean?
...-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java
Why is this an extractor test when it actually tests the decoder? Can we test the extractor directly?
We also need to test different scenarios based on the value of props.
…RecordExtractorTest.
6beda35 to 5fcad3e
Thanks for adding this powerful feature. Can you help add a documentation page to the Pinot docs describing when and how to use the feature?
Yep, will do.
At a high level, the plugin takes two inputs: a JSON record and a list of fields (unstructured log messages) to encode with CLP. The plugin extracts and encodes the user-specified fields into CLP's three-column format and stores the output in a Pinot GenericRow object.
This is part of the change requested in #9819 and described in this design doc.
Release notes
New plugin added: pinot-clp-log to encode user-specified fields with CLP during ingestion.
Users can use the plugin by specifying these configuration options in their tableIndexConfig.streamConfigs:
where <field-names> is a comma-separated list of fields you wish to encode with CLP.
Testing performed
message field was replaced with CLP's three fields: message_logtype, message_dictionaryVars, and message_encodedVars.
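To make the field replacement described above concrete, here is a minimal, self-contained sketch of how one configured field could be split into the three derived columns. It is a toy illustration only, not the plugin's or the CLP library's actual encoding. The class name ClpFieldSketch, the method encodeFields, the <int> and <dict> placeholder strings, and the token classification rules are all assumptions made for this example. Only the _logtype, _dictionaryVars, and _encodedVars column suffixes come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of replacing a log field with three CLP-style columns. */
public class ClpFieldSketch {
  // Hypothetical placeholders; the real CLP encoding uses its own markers.
  static final String INT_PLACEHOLDER = "<int>";
  static final String DICT_PLACEHOLDER = "<dict>";

  /**
   * Copies the record, replacing each string field named in fieldsToEncode
   * with <field>_logtype, <field>_dictionaryVars, and <field>_encodedVars.
   * Other fields are copied through unchanged.
   */
  static Map<String, Object> encodeFields(Map<String, Object> record, Set<String> fieldsToEncode) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : record.entrySet()) {
      String name = entry.getKey();
      Object value = entry.getValue();
      if (!fieldsToEncode.contains(name) || !(value instanceof String)) {
        out.put(name, value);
        continue;
      }
      StringBuilder logtype = new StringBuilder();
      List<String> dictionaryVars = new ArrayList<>();
      List<Long> encodedVars = new ArrayList<>();
      String[] tokens = ((String) value).split(" ");
      for (int i = 0; i < tokens.length; i++) {
        if (i > 0) {
          logtype.append(' ');
        }
        String token = tokens[i];
        if (token.matches("-?\\d{1,18}")) {
          // Assumption: plain integers are stored numerically.
          encodedVars.add(Long.parseLong(token));
          logtype.append(INT_PLACEHOLDER);
        } else if (token.matches(".*\\d.*")) {
          // Assumption: other tokens containing digits are treated as variables.
          dictionaryVars.add(token);
          logtype.append(DICT_PLACEHOLDER);
        } else {
          // Static text stays in the logtype.
          logtype.append(token);
        }
      }
      out.put(name + "_logtype", logtype.toString());
      out.put(name + "_dictionaryVars", dictionaryVars);
      out.put(name + "_encodedVars", encodedVars);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("timestamp", 1672531200000L);
    record.put("message", "Task task_12 failed after 3 retries");
    System.out.println(encodeFields(record, Set.of("message")));
  }
}
```

In this sketch the timestamp field passes through untouched, while message disappears from the output and is represented by the three derived columns, mirroring the replacement observed during testing.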