-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-1588: [C++/Format] Harden Decimal Format #1211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for doing this! @jacques-n can you review and if you agree with the representation let's open a JIRA to change the Java code to byte swap as appropriate to satisfy BigDecimal?
cpp/src/arrow/util/decimal.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast here -- can this change the bit pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_cast from one builtin integer type to another of the same byte size won't change the underlying bits.
The compiler will also fail to type check this because reinterpret_cast from one integer to another isn't allowed by the standard: http://en.cppreference.com/w/cpp/language/reinterpret_cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify where this behavior is defined in the standard: http://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
The specific language is this:
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2 ** n
where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended [footnote 1] or truncated and unsigned integers are zero-extended or truncated respectively.
Here's the footnote :)
This only applies if the arithmetic is two's complement which is only required for the exact-width integer types. Note, however, that at the moment all platforms with a C++ compiler use two's complement arithmetic
We're using exact-width integer types here, so all behavior here is well-defined and within the standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. there is always *reinterpret_cast<T*>(&val) but this is helpful to know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I will wait also for Jacques to review in case we need to expand the language to elaborate on the representation. I opened https://issues.apache.org/jira/browse/ARROW-1691 about changing the Java implementation
|
I'm surprised the integration tests pass. Shouldn't they not? |
|
Yes, I'm not sure why they're passing. I'll take a look and get to the bottom of it |
|
@jacques-n could you take a look at this? we still need to figure out what's going on with the integration tests |
|
Just to clarify from ARROW-1716 the integration tests pass because the interpretation of decimal bytes isn't being checked in the integration tests, so the bytes compare equal but if we were to print the value as the underlying big integer or int128 in c++ and compare those they would be different. |
|
I think we should merge this and then address the integration test issue and Java fixes in follow up patches. @jacques-n or @siddharthteotia can you please review? |
|
LGTM +1 |
This depends on: - [x] [ARROW-1607](apache/arrow#1128) - [x] [ARROW-1656](apache/arrow#1184) - [x] [ARROW-1588](apache/arrow#1211) - [x] Add tests for writing different sizes of values Author: Phillip Cloud <cpcloud@gmail.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes #403 from cpcloud/PARQUET-1095 and squashes the following commits: 8c3d222 [Phillip Cloud] Remove loop from BytesToInteger 63018bc [Wes McKinney] Suppress C4996 due to arrow/util/variant.h e4b02d3 [Phillip Cloud] Refactor types.h 83948ec [Phillip Cloud] Add last_value_ init 51965cd [Phillip Cloud] Min commit that contains the unique kernel in arrow e25c59b [Phillip Cloud] Fix reader writer test for unique kernel addition da0a7eb [Phillip Cloud] Update for ARROW-1811 16935de [Phillip Cloud] Reverse operand order and explicit cast 6036ca5 [Phillip Cloud] ARROW-1811 c5c4294 [Phillip Cloud] Fix issues 32a4abe [Phillip Cloud] Cleanup iteration a bit 920832a [Phillip Cloud] Update arrow version 9f97c1d [Phillip Cloud] Update for ARROW-1794: rename DecimalArray to Decimal128Array b2e0290 [Phillip Cloud] IWYU 64748a8 [Phillip Cloud] Copy from arrow for now 6c9e2a7 [Phillip Cloud] Reduce the number of decimal test cases 7ab2e5c [Phillip Cloud] Parameterize on precision 30655d6 [Phillip Cloud] Use arrow random_decimals 9ff7eb4 [Phillip Cloud] Remove specific template parameters 1eee6a9 [Phillip Cloud] Remove specific randint call 8808e4c [Phillip Cloud] Bump arrow version 659fbc1 [Phillip Cloud] Fix deprecated API call e162ca1 [Phillip Cloud] Allocate scratch space to hold the byteswapped values 5c9292b [Phillip Cloud] Proper dcheck call 1782da0 [Phillip Cloud] Use arrow 3d243d5 [Phillip Cloud] Checkpoint [ci skip] 028fb03 [Phillip Cloud] Remove garbage values 46dff15 [Phillip Cloud] Clean up uint32 test 613255e [Phillip Cloud] Do not use std::copy when reinterpret_cast will suffice 2917a62 [Phillip Cloud] PARQUET-1095: [C++] Read and write Arrow decimal values
This depends on: - [x] [ARROW-1607](apache#1128) - [x] [ARROW-1656](apache#1184) - [x] [ARROW-1588](apache#1211) - [x] Add tests for writing different sizes of values Author: Phillip Cloud <cpcloud@gmail.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#403 from cpcloud/PARQUET-1095 and squashes the following commits: 8c3d222 [Phillip Cloud] Remove loop from BytesToInteger 63018bc [Wes McKinney] Suppress C4996 due to arrow/util/variant.h e4b02d3 [Phillip Cloud] Refactor types.h 83948ec [Phillip Cloud] Add last_value_ init 51965cd [Phillip Cloud] Min commit that contains the unique kernel in arrow e25c59b [Phillip Cloud] Fix reader writer test for unique kernel addition da0a7eb [Phillip Cloud] Update for ARROW-1811 16935de [Phillip Cloud] Reverse operand order and explicit cast 6036ca5 [Phillip Cloud] ARROW-1811 c5c4294 [Phillip Cloud] Fix issues 32a4abe [Phillip Cloud] Cleanup iteration a bit 920832a [Phillip Cloud] Update arrow version 9f97c1d [Phillip Cloud] Update for ARROW-1794: rename DecimalArray to Decimal128Array b2e0290 [Phillip Cloud] IWYU 64748a8 [Phillip Cloud] Copy from arrow for now 6c9e2a7 [Phillip Cloud] Reduce the number of decimal test cases 7ab2e5c [Phillip Cloud] Parameterize on precision 30655d6 [Phillip Cloud] Use arrow random_decimals 9ff7eb4 [Phillip Cloud] Remove specific template parameters 1eee6a9 [Phillip Cloud] Remove specific randint call 8808e4c [Phillip Cloud] Bump arrow version 659fbc1 [Phillip Cloud] Fix deprecated API call e162ca1 [Phillip Cloud] Allocate scratch space to hold the byteswapped values 5c9292b [Phillip Cloud] Proper dcheck call 1782da0 [Phillip Cloud] Use arrow 3d243d5 [Phillip Cloud] Checkpoint [ci skip] 028fb03 [Phillip Cloud] Remove garbage values 46dff15 [Phillip Cloud] Clean up uint32 test 613255e [Phillip Cloud] Do not use std::copy when reinterpret_cast will suffice 2917a62 [Phillip Cloud] PARQUET-1095: [C++] Read and write Arrow decimal values Change-Id: Ibe81cd5a5961bbe86c66db811ec8b770ae48c38b
No description provided.