-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Objective
The objective of this issue is to discuss about FieldConfig.IndexType, which:
- Has been superseded by Index Service Provider Interface (SPI) #10183
- Is being used inconsistently
We cannot remove this enum because we need to keep backward compatibility, but we should decide what to do with the four literals that are not documented and are not being used (or used in an inconsistent way).
Description
FieldConfig.indexTypes was intended to be the new way to create all indexes, but it was superseded by FieldConfig.indexes attribute introduced in #10183. The main problems with FieldConfig.indexTypes were that:
- Most indexes can be configured in some way. To do so,
FieldConfig.indexTypesrequired to introduce unsafe string-based configs inFieldConfig.properties. - IndexTypes is an enum that doesn't contain all valid indexes and includes some extra values that are not valid.
This issue is related to the last point. Reading the documentation, the valid values here should be: Text, FST, Timestamp and h3. But the enum right now contains the following literals:
- TEXT
- FST
- JSON
- H3
- INVERTED
- SORTED
- RANGE
- TIMESTAMP
The last 4 are not included in the documentation and they are not being consistently used in the code.
Inconsistent usage
To do this analysis I used tag release-0.12.1. The reason to do not use master is that #10183 (already merged) modifies a lot of code related to indexes, so I want to be sure that the inconsistent usage of these literals was already there before index-spi was merged:
Inverted
- It was introduced in the first version of this document (2020-02-07).
- It is not read in production code. The only two usages are in MutableSegmentImpl (here, and here). In both cases it is being used to report some error.
- It is being used in test, but its usage doesn't seem actually important.
Sorted
- It was introduced in the first version of this document (2020-02-07).
- It is only being used in HadoopSegmentPreprocessingJob. This class does not exist in master and it is not being used in master. The fact that it is not an actual index type and it was only used here makes me think that it was introduced as an attempt to do something, but it wasn't finished.
Range
- This was introduced in make it possible to specify field config properties for columns with JSON and RANGE indexes #7615 (2021-10-25).
- It is not used in production code. Like inverted, it is being used in tests, but just doesn't seem to do anything.
- Range is an actual index type, but no range index is created when this literal is added in
indexTypes.
Timestamp
- This was introduced in Timestamp type index #8343 (2022-4-11).
- The only usage in production code is a check. This check fails if the FieldConfig points to a column that is not a timestamp but
indexTypescontains the TIMESTAMP literal. - Timestamp indexes are not actual indexes. There is no timestamp index writer, reader or handler. They are syntactic sugar (or an alias). When pinot detects that this syntactic sugar has to be applied (more on that later) what actually happens is that Pinot generates one extra row for each timestamp granularity and index that column with a range index.
- Contrary to what documentation says,
FieldConfig.indexTypesdoesn't have to contain the TIMESTAMP literal in order to activate this syntactic sugar. It is activated if and only if theFieldConfig.timestampConfigis not null. Whether or not TIMESTAMP is included inFieldConfig.indexTypeshas zero impact on that decision. - The timestamp syntactic sugar is just a modification of the table config and the schema and it is applied in: