Conversation
cb8f7d1 to
b197bfe
Compare
Codecov Report
@@ Coverage Diff @@
## master #8343 +/- ##
============================================
+ Coverage 70.57% 70.66% +0.08%
- Complexity 4283 4285 +2
============================================
Files 1672 1674 +2
Lines 87483 87640 +157
Branches 13241 13273 +32
============================================
+ Hits 61745 61931 +186
+ Misses 21454 21398 -56
- Partials 4284 4311 +27
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/IndexCreationContext.java
Outdated
Show resolved
Hide resolved
b197bfe to
b96fdbd
Compare
...in/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
Outdated
Show resolved
Hide resolved
b85926d to
80ecfe8
Compare
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/IndexCreationContext.java
Outdated
Show resolved
Hide resolved
80ecfe8 to
6b76294
Compare
|
Please add description |
0caf992 to
760e539
Compare
Will do. This is still WIP, also need to add unit & integration tests. |
963231b to
de009b2
Compare
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TimestampIndexGranularity.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TimestampIndexGranularity.java
Outdated
Show resolved
Hide resolved
|
Can you explain if there is any dependency on distinct hash code values? I recall seeing hash codes being used as map keys but can’t find it. |
8740247 to
531cf6e
Compare
pinot-common/src/test/java/org/apache/pinot/common/request/RequestContextUtilsTest.java
Outdated
Show resolved
Hide resolved
531cf6e to
ac40d93
Compare
Got suggestions from @Jackie-Jiang to change this logic to use the hint expressions to prevent the hash collision. |
3d077c2 to
5f57202
Compare
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
5f57202 to
cdabfa3
Compare
There was a problem hiding this comment.
This achieves a couple of goals
- use an index for time range queries in common time units
- have a dictionary for time values in common time units, which accelerates group by expressions
- bypasses the time conversions which are slow and have to use a library, so can't be accelerated further
This could be extended to provide inventory for any affine function ax + b for given parameter values for a and b, or any monotonic function. So we could also do this for cents and dollars or celsius and Fahrenheit, assuming we have functions to represent these conversions, or even for logarithms.
I think this is the simplest possible implementation (just add a column for each unit) but I think it's wasteful.
- time range parameters could just be translated into the base unit. e.g. to query a day range, convert the first day's start and the last day's end into milliseconds and query the range index in millisecond units. Parameter translation works for any affine or monotonic function, whereas the materialisation only works for a given scale
aand offsetb. That is, you don't need to store anything extra to make this work, we just need an affine function concept. If an affine functionfis applied to a fieldx, andxhas an index, that index also indexesf(query parameters). - this approach can never work for arbitrary arithmetic expressions like
ax + b, because we need a physical data structure for each pair(a, b).
|
Thinking about it more, the time transforms are indeed costly so storing the values makes sense for this use case, which can be extended to many time conversion functions. I don't think this approach generalises to arithmetic, where the transforms tend to vectorize anyway so aren't as slow as time conversion. |
b5f7ace to
69e50bb
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Based on my understanding, the current implementation does not handle the case of adding timestamp index on the fly. The index addition logic is not implemented in the BaseDefaultColumnHandler
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TimestampIndexGranularity.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TimestampIndexGranularity.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
.../main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Can we add some validation to the TableConfigUtils.validate()? The timestamp index can only be applied to columns with millis granularity (TIMESTAMP data type, or DATE_TIME field with EPOCH:MILLISECONDS, or it will give wrong result.
pinot-common/src/main/java/org/apache/pinot/common/config/provider/TableCache.java
Outdated
Show resolved
Hide resolved
.../main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java
Outdated
Show resolved
Hide resolved
...t-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
Outdated
Show resolved
Hide resolved
...t-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
Outdated
Show resolved
Hide resolved
...t-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
Outdated
Show resolved
Hide resolved
make sense. |
Description
Adding Timestamp index for Pinot data type TIMESTAMP.
Timestamp stores value as millisecond epoch long value. Typical workload for timestamp is filtering on a time range and group by with different time granularities(days/month/etc).
The current implementation requires the query executor to extract values, apply the function then do filter/groupBy, no leverage on dictionary or index.
This PR introduces the Timestamp Index. Users can configure the most useful granularities for a Timestamp data type column.
$${ts_column_name}$${ts_granularity}, e.g. Timestamp columntswith granularitiesDAY,MONTHwill have two extra columns generated:$ts$DAYand$ts$MONTH.2.1 GROUPBY: functions like
dateTrunc('DAY', ts)will be translated to use the underly column$ts$DAYto fetch data.2.2 PREDICATE: range index is auto-built for all granularity columns.
Example query usage:
Some preliminary benchmark shows the query perf over 2.7 billion records improved from 45 secs to 4.2 secs
vs.
Release Notes
Introduces the Timestamp Index. Users can configure the most useful granularities for a Timestamp data type column.
$${ts_column_name}$${ts_granularity}, e.g. Timestamp columntswith granularitiesDAY,MONTHwill have two extra columns generated:$ts$DAYand$ts$MONTH.2.1 SELECTION/GROUPBY: functions like
dateTrunc('DAY', ts)will be translated to use the underly column$ts$DAYto fetch data.2.2 PREDICATE: range index is auto-built for all granularity columns.
Documentation