Skip to content

Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.#8503

Merged
Jackie-Jiang merged 26 commits intoapache:masterfrom
nizarhejazi:nhejazi-ingestion-bigdecimal
Apr 30, 2022
Merged

Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.#8503
Jackie-Jiang merged 26 commits intoapache:masterfrom
nizarhejazi:nhejazi-ingestion-bigdecimal

Conversation

@nizarhejazi
Copy link
Contributor

@nizarhejazi nizarhejazi commented Apr 11, 2022

  • Support single-valued BigDecimal data type in schema, and type conversion so that data can be ingested as BIG_DECIMAL.
  • Support BigDecimal at both data access layer and data storage layer (BlockValSet, DataBlockCache, DataFetcher, RowBasedBlockValueFetcher, DataTableBuilder, ProjectionBlock, etc.).
  • Add BigDecimalDictionary.
  • Support BigDecimal in dictionary-encoded ForwardIndexReader.
  • Support BIG_DECIMAL in SQL statements:
    • SELECT
    • ORDER BY
    • GROUP BY
    • DISTINCT
    • COUNT
  • Support BIG_DECIMAL in the following set of transforms to unblock BigDecimal ingestion and testing:
    • JsonExtractScalar
    • DefaultJsonPathEvaluator
    • LiteralTransform
    • IdentifierTransform
    • PushDownTransform
    • ScalarTransformFunctionWrapper
    • SumPrecision
  • Fix bug in transformToBytesValuesSV method in BaseTransformFunction.

@codecov-commenter
Copy link

codecov-commenter commented Apr 11, 2022

Codecov Report

Merging #8503 (d78be94) into master (fc12dce) will decrease coverage by 3.33%.
The diff coverage is 62.73%.

❗ Current head d78be94 differs from pull request most recent head 149b476. Consider uploading reports for the commit 149b476 to get more accurate results

@@             Coverage Diff              @@
##             master    #8503      +/-   ##
============================================
- Coverage     63.60%   60.27%   -3.34%     
+ Complexity     4322     4284      -38     
============================================
  Files          1648     1684      +36     
  Lines         86917    88943    +2026     
  Branches      13265    13487     +222     
============================================
- Hits          55281    53606    -1675     
- Misses        27595    31355    +3760     
+ Partials       4041     3982      -59     
Flag Coverage Δ
integration2 25.53% <5.79%> (?)
unittests1 66.47% <62.73%> (-0.04%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../java/org/apache/pinot/common/utils/DataTable.java 94.73% <ø> (ø)
...core/operator/docvalsets/TransformBlockValSet.java 25.58% <0.00%> (-0.93%) ⬇️
...sform/function/ScalarTransformFunctionWrapper.java 54.58% <0.00%> (-0.54%) ⬇️
...impl/dictionary/BytesOffHeapMutableDictionary.java 55.55% <0.00%> (-0.79%) ⬇️
.../impl/dictionary/BytesOnHeapMutableDictionary.java 52.38% <0.00%> (-0.85%) ⬇️
...mpl/dictionary/DoubleOffHeapMutableDictionary.java 36.84% <0.00%> (-0.40%) ⬇️
...impl/dictionary/DoubleOnHeapMutableDictionary.java 33.33% <0.00%> (-0.41%) ⬇️
...impl/dictionary/FloatOffHeapMutableDictionary.java 38.94% <0.00%> (-0.42%) ⬇️
.../impl/dictionary/FloatOnHeapMutableDictionary.java 34.52% <0.00%> (-0.42%) ⬇️
...e/impl/dictionary/IntOffHeapMutableDictionary.java 48.42% <0.00%> (-0.52%) ⬇️
... and 584 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc12dce...149b476. Read the comment docs.

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes bugfix labels Apr 12, 2022
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce BigDecimal to the data access layer yet. We need to revisit whether we want to do implicit type casting. For now, we can enhance the CastTransformFunction to explicitly cast numbers/strings to BigDecimal bytes

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for getting back and forth, but because of the different ordering behavior with BYTES type, we might have to introduce BigDecimal to the data representation layer (the underlying storage format can be the same as BYTES). For an example, if we reuse BytesDictionary for BigDecimal data type, multiple APIs (insertionIndexOf, getMinVal, getMaxVal etc.) will fail. We should add more tests into the BigDecimalQueriesTest to catch these unhandled cases.

Alternatively, if we can find a different way to serialize big decimals so that the ordering is the same as bytes ordering, then we can reuse the bytes type in the representation layer. We might want more discussion here.

@nizarhejazi
Copy link
Contributor Author

  • Added dictionary, transform function, etc. Planning to add raw forward index in separate PR.
  • Tested SumPrecision.
  • Please note that changes related to supporting SQL statements are minimal so I included these changes.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides Dictionary, we also want to add big-decimal support to the ForwardIndexReader, and make it a supported stored type. When we access the values, it should directly return BigDecimal objects instead of byte[] or String

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good, very impressive!

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor comments

There is an extra binary file that seems unrelated: pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-plugins.tar.gz. Can you please check if it is included accidentally?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants