map-string-string dimension column contrib extension #10628
Closed
Description
Introduces a "mapStringString" complex-type dimension column. All the code lives in an isolated contrib extension and implements the relevant extensible Druid core interfaces, e.g. ComplexColumn, ComplexMetricSerde, DimensionHandler, DimensionIndexer, DimensionMerger, etc.
It allows ingestion of a `Map<String,String>` column in the user's input data as a single `dimension` column. At query time, the user can access individual keys in the map as if they were simple `string` dimension columns (see `MapStringStringKeyVirtualColumn.java`), or the column can be accessed as records of `Map<String,String>` type, although that does not have much use in the Druid query layer for now aside from the `select` query.
Users would have to implement their own `InputRowParser` to supply `Map<String,String>` records for ingestion; we used one that was very specific to our needs.
This extension also serves as an example of how users can create their own dimension columns for their specific needs, and it also [indirectly] tests the dimension extensibility support introduced in #10277.
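To make the query-time behaviour described above concrete, here is a minimal, plain-Java sketch of the idea of exposing one key of a `Map<String,String>` column as if it were a simple string dimension. It does not use any Druid classes and is not the code in this PR: `MapKeyColumnSketch`, `row`, `readKey`, and `countByKey` are hypothetical names chosen for illustration, and the handling of missing keys (returned as `null`, skipped when counting) is an assumption rather than a statement about how the extension behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: models a map-typed column as a list of maps,
// one map per row. None of these names come from Druid or from this PR.
public class MapKeyColumnSketch {

    // Hypothetical helper: builds one row from alternating key/value arguments.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Reads a single key across all rows, yielding one string value per row.
    // Rows that do not contain the key yield null (an assumption of this sketch).
    static List<String> readKey(List<Map<String, String>> column, String key) {
        List<String> values = new ArrayList<>(column.size());
        for (Map<String, String> r : column) {
            values.add(r.get(key));
        }
        return values;
    }

    // Groups rows by the value of one key, as a group-by on a string
    // dimension would. Rows without the key are skipped in this sketch.
    static Map<String, Long> countByKey(List<Map<String, String>> column, String key) {
        Map<String, Long> counts = new TreeMap<>();
        for (String v : readKey(column, key)) {
            if (v != null) {
                counts.merge(v, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> column = Arrays.asList(
            row("country", "US", "browser", "firefox"),
            row("browser", "chrome"),
            row("country", "DE"),
            row("country", "US"));

        System.out.println(readKey(column, "country"));    // [US, null, DE, US]
        System.out.println(countByKey(column, "country")); // {DE=1, US=2}
    }
}
```

In the extension itself this per-key view is provided through a virtual column (see `MapStringStringKeyVirtualColumn.java`); the sketch only illustrates the shape of the data a query would see.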
Future: Now that the `VirtualColumn` interface has been enhanced to support vectorization, the virtual column implementation in this extension could perhaps also be improved to support it, for [potentially] better performance.
This PR has: