Skip to content

Null handling for time column#7269

Merged
Jackie-Jiang merged 6 commits intoapache:masterfrom
liuchang0520:null_handling_in_time_col
Aug 17, 2021
Merged

Null handling for time column#7269
Jackie-Jiang merged 6 commits intoapache:masterfrom
liuchang0520:null_handling_in_time_col

Conversation

@liuchang0520
Copy link
Contributor

@liuchang0520 liuchang0520 commented Aug 10, 2021

Description

Add optional time column null value handling. Per the discussion in #7202.

Release Note

This PR introduces a new table config field: allowNullTimeValue.

    "segmentsConfig": {
          ...
          "timeColumnName": "secondsSinceEpoch",
          "timeType": "SECONDS",
          "allowNullTimeValue": true,
          "retentionTimeUnit": "DAYS",
          "peerSegmentDownloadScheme": "http"
          ....
    }

Default value for this field is set to be false to keep backward compatibility.
If "allowNullTimeValue" is set to be true and time column value is null from data source, a default time value based on machine time will be filled in.

@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2021

Codecov Report

Merging #7269 (22a9a98) into master (12e5bcd) will decrease coverage by 0.01%.
The diff coverage is 81.81%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7269      +/-   ##
============================================
- Coverage     71.32%   71.30%   -0.02%     
+ Complexity     3367     3280      -87     
============================================
  Files          1503     1503              
  Lines         74030    74050      +20     
  Branches      10718    10723       +5     
============================================
  Hits          52803    52803              
- Misses        17622    17637      +15     
- Partials       3605     3610       +5     
Flag Coverage Δ
integration1 30.03% <0.00%> (-0.04%) ⬇️
integration2 29.25% <0.00%> (-0.08%) ⬇️
unittests1 68.97% <81.81%> (-0.05%) ⬇️
unittests2 14.63% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../local/recordtransformer/CompositeTransformer.java 83.33% <ø> (ø)
...he/pinot/segment/local/utils/TableConfigUtils.java 66.31% <50.00%> (-0.09%) ⬇️
.../local/recordtransformer/NullValueTransformer.java 90.00% <78.57%> (-10.00%) ⬇️
...ig/table/SegmentsValidationAndRetentionConfig.java 95.83% <100.00%> (+0.27%) ⬆️
...he/pinot/spi/utils/builder/TableConfigBuilder.java 81.25% <100.00%> (+0.39%) ⬆️
...a/manager/realtime/RealtimeSegmentDataManager.java 50.00% <0.00%> (-25.00%) ⬇️
.../helix/core/minion/MinionInstancesCleanupTask.java 60.86% <0.00%> (-8.70%) ⬇️
.../startree/v2/builder/OffHeapSingleTreeBuilder.java 83.23% <0.00%> (-8.39%) ⬇️
...inot/core/util/SegmentCompletionProtocolUtils.java 53.84% <0.00%> (-7.70%) ⬇️
.../segment/index/readers/OnHeapStringDictionary.java 70.00% <0.00%> (-6.67%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12e5bcd...22a9a98. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time field can be of type TIME or DATE_TIME, so we might want to check the field name instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the existing logic will only leave out TIME type as null. DATE_TIME will be filled with default value by
org.apache.pinot.spi.data.FieldSpec.getDefaultNullValue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code has not been updated accordingly when we deprecate the TIME field spec. The main time column can be defined as a DATE_TIME field (also this is recommended), so we should skip the main time column, instead of the TIME field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will update this part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple date time fields. Do we want to handle the null time value for only the time column or all the date time fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the time column.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the release notes about this new config?

@liuchang0520 liuchang0520 marked this pull request as draft August 11, 2021 21:55
@liuchang0520 liuchang0520 force-pushed the null_handling_in_time_col branch from 13a16ce to 44a8179 Compare August 13, 2021 17:54
@liuchang0520 liuchang0520 marked this pull request as ready for review August 13, 2021 18:50
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@kishoreg
Copy link
Member

can we make sure we update the docs.we should have a page for time column

@liuchang0520
Copy link
Contributor Author

can we make sure we update the docs.we should have a page for time column

Will do.

@liuchang0520 liuchang0520 force-pushed the null_handling_in_time_col branch from b2889a6 to 22a9a98 Compare August 17, 2021 02:14
@Jackie-Jiang Jackie-Jiang merged commit f034e28 into apache:master Aug 17, 2021
@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Aug 17, 2021
liuchang0520 added a commit to liuchang0520/pinot-docs that referenced this pull request Aug 18, 2021
Related with [#7269](apache/pinot#7269), add new config allowNullTimeValue about null time value handling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants