[cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug #41931
Start docker in parallel to reduce external pipeline time
…he#41506) ## Proposed changes Reason: https://issues.apache.org/jira/browse/ARROW-5322 Java readers (parquet-mr) check "dictionaryPageOffset = 0" to determine whether a dictionary page exists, whereas the C readers use "has_dictionaryPageOffset" (the _isset bit in the Thrift message) to determine the same thing, which results in incompatible behaviour. Therefore, we should consider that a dictionary page exists only when `__isset.dictionary_page_offset` is true and `dictionary_page_offset` is greater than 0.
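The rule above can be sketched in C++ as follows. `ColumnMetaDataSketch`, `has_dictionary_page`, and `column_chunk_start` are hypothetical names for illustration, not the actual Doris reader code; the real Thrift-generated struct exposes the flag as `__isset.dictionary_page_offset`, which is flattened into a plain `bool` here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the Thrift-generated Parquet
// ColumnMetaData. The generated struct tracks optional fields in an
// __isset member (__isset.dictionary_page_offset); that flag is
// flattened into a plain bool here.
struct ColumnMetaDataSketch {
    int64_t data_page_offset = 0;
    int64_t dictionary_page_offset = 0;
    bool dictionary_page_offset_isset = false;
};

// A dictionary page is considered present only when the optional field is
// set AND the offset is positive. Some writers emit dictionary_page_offset = 0
// to mean "no dictionary page", so checking the isset flag alone would
// misread those files.
bool has_dictionary_page(const ColumnMetaDataSketch& meta) {
    return meta.dictionary_page_offset_isset && meta.dictionary_page_offset > 0;
}

// Illustrative helper: the byte offset where reading a column chunk begins.
// With a dictionary page, the chunk starts at the earlier of the two offsets.
int64_t column_chunk_start(const ColumnMetaDataSketch& meta) {
    assert(meta.data_page_offset >= 0);
    if (has_dictionary_page(meta)) {
        return std::min(meta.dictionary_page_offset, meta.data_page_offset);
    }
    return meta.data_page_offset;
}
```

With this check, a file whose writer set the field to 0 is read as having no dictionary page, matching the parquet-mr behaviour.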
## Proposed changes Implemented reading Parquet files with the decimal256 type.
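Parquet stores DECIMAL values backed by FIXED_LEN_BYTE_ARRAY or BINARY as the unscaled integer in big-endian two's-complement form, so reading decimal256 mostly means widening those bytes to a 256-bit integer. A minimal sketch, assuming a hypothetical `decode_decimal256` helper that returns four little-endian 64-bit limbs; the scale comes from the schema and is not applied here, and this is not the implementation from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode a big-endian two's-complement byte string
// (1..32 bytes) into a 256-bit integer stored as four 64-bit limbs,
// limbs[0] being the least significant.
std::array<uint64_t, 4> decode_decimal256(const uint8_t* data, size_t len) {
    assert(len >= 1 && len <= 32);
    // Sign-extend: pad with 0xFF when the most significant bit is set.
    const uint8_t fill = (data[0] & 0x80) ? 0xFF : 0x00;
    uint8_t be[32];
    for (size_t i = 0; i < 32 - len; ++i) be[i] = fill;
    for (size_t i = 0; i < len; ++i) be[32 - len + i] = data[i];

    std::array<uint64_t, 4> limbs{};
    for (size_t limb = 0; limb < 4; ++limb) {
        // Limb k covers big-endian bytes [32 - 8 * (k + 1), 32 - 8 * k).
        uint64_t v = 0;
        for (size_t b = 0; b < 8; ++b) {
            v = (v << 8) | be[32 - 8 * (limb + 1) + b];
        }
        limbs[limb] = v;
    }
    return limbs;
}
```

For example, the bytes `{0x01, 0x00}` decode to 256 and `{0x80}` decodes to -128 (all upper limbs filled with ones).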
…der (apache#41683) ## Proposed changes Implement ByteStreamSplitDecoder to decode BYTE_STREAM_SPLIT-encoded Parquet data. Related PR: apache/arrow#42372 > Apache Parquet does not have any encodings suitable for FP data and the available text compressors (zstd, gzip, etc) do not handle FP data very well. It is possible to apply a simple data transformation named "stream splitting". Such could be "byte stream splitting" which creates K streams of length N where K is the number of bytes in the data type (4 for floats, 8 for doubles) and N is the number of elements in the sequence. --------- Co-authored-by: morningman <morningman@163.com>
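The transformation in the quote can be illustrated with a minimal sketch: with K streams of length N, byte j of value i is stored at offset j * N + i, and decoding gathers the bytes back into place. The function names are hypothetical, not the `ByteStreamSplitDecoder` API from this PR, and `decode_floats` assumes a little-endian host, since Parquet stores FLOAT/DOUBLE bytes in little-endian order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// BYTE_STREAM_SPLIT decoding: the encoded buffer holds `width` streams of
// `num_values` bytes each; stream j carries byte j of every value, so byte j
// of value i sits at offset j * num_values + i.
std::vector<uint8_t> byte_stream_split_decode(const uint8_t* src, size_t num_values, size_t width) {
    std::vector<uint8_t> out(num_values * width);
    for (size_t i = 0; i < num_values; ++i) {
        for (size_t j = 0; j < width; ++j) {
            out[i * width + j] = src[j * num_values + i];
        }
    }
    return out;
}

// Inverse transform (scatter each value's bytes into the K streams),
// used here only to build test input.
std::vector<uint8_t> byte_stream_split_encode(const uint8_t* src, size_t num_values, size_t width) {
    std::vector<uint8_t> out(num_values * width);
    for (size_t i = 0; i < num_values; ++i) {
        for (size_t j = 0; j < width; ++j) {
            out[j * num_values + i] = src[i * width + j];
        }
    }
    return out;
}

// Decode a BYTE_STREAM_SPLIT float column (K = 4). Assumes a little-endian
// host, so the reassembled bytes can be copied straight into floats.
std::vector<float> decode_floats(const std::vector<uint8_t>& encoded) {
    assert(encoded.size() % sizeof(float) == 0);
    const size_t n = encoded.size() / sizeof(float);
    if (n == 0) return {};
    const std::vector<uint8_t> raw = byte_stream_split_decode(encoded.data(), n, sizeof(float));
    std::vector<float> out(n);
    std::memcpy(out.data(), raw.data(), raw.size());
    return out;
}
```

The decoder only reorders bytes; any size reduction comes from the downstream compressor seeing similar bytes (such as the exponent bytes of neighbouring floats) grouped together.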
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
run buildall
clang-tidy made some suggestions
#include "byte_stream_split.h"

#include <glog/logging.h>
warning: 'glog/logging.h' file not found [clang-diagnostic-error]
#include <glog/logging.h>
^
…che#41526)" This reverts commit 3ff7697.
run buildall
TeamCity be ut coverage result:
Proposed changes
Cherry-picks the following PRs:
- #41683
- #41506
- #41338
- #40379