Skip to content

Conversation

@cpcloud
Copy link
Contributor

@cpcloud cpcloud commented Oct 18, 2017

No description provided.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for doing this! @jacques-n can you review and if you agree with the representation let's open a JIRA to change the Java code to byte swap as appropriate to satisfy BigDecimal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reinterpret_cast here -- can this change the bit pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_cast from one builtin integer type to another of the same byte size won't change the underlying bits.

The compiler will also fail to type check this because reinterpret_cast from one integer to another isn't allowed by the standard: http://en.cppreference.com/w/cpp/language/reinterpret_cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify where this behavior is defined in the standard: http://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions

The specific language is this:

If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2 ** n
where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended [footnote 1] or truncated and unsigned integers are zero-extended or truncated respectively.

Here's the footnote :)

This only applies if the arithmetic is two's complement which is only required for the exact-width integer types. Note, however, that at the moment all platforms with a C++ compiler use two's complement arithmetic

We're using exact-width integer types here, so all behavior here is well-defined and within the standard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. there is always *reinterpret_cast<T*>(&val) but this is helpful to know

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I will wait also for Jacques to review in case we need to expand the language to elaborate on the representation. I opened https://issues.apache.org/jira/browse/ARROW-1691 about changing the Java implementation

@wesm
Copy link
Member

wesm commented Oct 20, 2017

I'm surprised the integration tests pass. Shouldn't they not?

@cpcloud
Copy link
Contributor Author

cpcloud commented Oct 20, 2017

Yes, I'm not sure why they're passing. I'll take a look and get to the bottom of it

@wesm
Copy link
Member

wesm commented Oct 23, 2017

@jacques-n could you take a look at this? we still need to figure out what's going on with the integration tests

@cpcloud
Copy link
Contributor Author

cpcloud commented Oct 24, 2017

Just to clarify from ARROW-1716 the integration tests pass because the interpretation of decimal bytes isn't being checked in the integration tests, so the bytes compare equal but if we were to print the value as the underlying big integer or int128 in c++ and compare those they would be different.

@wesm
Copy link
Member

wesm commented Oct 24, 2017

I think we should merge this and then address the integration test issue and Java fixes in follow up patches. @jacques-n or @siddharthteotia can you please review?

@jacques-n
Copy link
Contributor

LGTM +1

@wesm wesm closed this in b2596f6 Oct 25, 2017
wesm pushed a commit to apache/parquet-cpp that referenced this pull request Nov 20, 2017
This depends on:
- [x] [ARROW-1607](apache/arrow#1128)
- [x] [ARROW-1656](apache/arrow#1184)
- [x] [ARROW-1588](apache/arrow#1211)
- [x] Add tests for writing different sizes of values

Author: Phillip Cloud <cpcloud@gmail.com>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #403 from cpcloud/PARQUET-1095 and squashes the following commits:

8c3d222 [Phillip Cloud] Remove loop from BytesToInteger
63018bc [Wes McKinney] Suppress C4996 due to arrow/util/variant.h
e4b02d3 [Phillip Cloud] Refactor types.h
83948ec [Phillip Cloud] Add last_value_ init
51965cd [Phillip Cloud] Min commit that contains the unique kernel in arrow
e25c59b [Phillip Cloud] Fix reader writer test for unique kernel addition
da0a7eb [Phillip Cloud] Update for ARROW-1811
16935de [Phillip Cloud] Reverse operand order and explicit cast
6036ca5 [Phillip Cloud] ARROW-1811
c5c4294 [Phillip Cloud] Fix issues
32a4abe [Phillip Cloud] Cleanup iteration a bit
920832a [Phillip Cloud] Update arrow version
9f97c1d [Phillip Cloud] Update for ARROW-1794: rename DecimalArray to Decimal128Array
b2e0290 [Phillip Cloud] IWYU
64748a8 [Phillip Cloud] Copy from arrow for now
6c9e2a7 [Phillip Cloud] Reduce the number of decimal test cases
7ab2e5c [Phillip Cloud] Parameterize on precision
30655d6 [Phillip Cloud] Use arrow random_decimals
9ff7eb4 [Phillip Cloud] Remove specific template parameters
1eee6a9 [Phillip Cloud] Remove specific randint call
8808e4c [Phillip Cloud] Bump arrow version
659fbc1 [Phillip Cloud] Fix deprecated API call
e162ca1 [Phillip Cloud] Allocate scratch space to hold the byteswapped values
5c9292b [Phillip Cloud] Proper dcheck call
1782da0 [Phillip Cloud] Use arrow
3d243d5 [Phillip Cloud] Checkpoint [ci skip]
028fb03 [Phillip Cloud] Remove garbage values
46dff15 [Phillip Cloud] Clean up uint32 test
613255e [Phillip Cloud] Do not use std::copy when reinterpret_cast will suffice
2917a62 [Phillip Cloud] PARQUET-1095: [C++] Read and write Arrow decimal values
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 8, 2018
This depends on:
- [x] [ARROW-1607](apache#1128)
- [x] [ARROW-1656](apache#1184)
- [x] [ARROW-1588](apache#1211)
- [x] Add tests for writing different sizes of values

Author: Phillip Cloud <cpcloud@gmail.com>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#403 from cpcloud/PARQUET-1095 and squashes the following commits:

8c3d222 [Phillip Cloud] Remove loop from BytesToInteger
63018bc [Wes McKinney] Suppress C4996 due to arrow/util/variant.h
e4b02d3 [Phillip Cloud] Refactor types.h
83948ec [Phillip Cloud] Add last_value_ init
51965cd [Phillip Cloud] Min commit that contains the unique kernel in arrow
e25c59b [Phillip Cloud] Fix reader writer test for unique kernel addition
da0a7eb [Phillip Cloud] Update for ARROW-1811
16935de [Phillip Cloud] Reverse operand order and explicit cast
6036ca5 [Phillip Cloud] ARROW-1811
c5c4294 [Phillip Cloud] Fix issues
32a4abe [Phillip Cloud] Cleanup iteration a bit
920832a [Phillip Cloud] Update arrow version
9f97c1d [Phillip Cloud] Update for ARROW-1794: rename DecimalArray to Decimal128Array
b2e0290 [Phillip Cloud] IWYU
64748a8 [Phillip Cloud] Copy from arrow for now
6c9e2a7 [Phillip Cloud] Reduce the number of decimal test cases
7ab2e5c [Phillip Cloud] Parameterize on precision
30655d6 [Phillip Cloud] Use arrow random_decimals
9ff7eb4 [Phillip Cloud] Remove specific template parameters
1eee6a9 [Phillip Cloud] Remove specific randint call
8808e4c [Phillip Cloud] Bump arrow version
659fbc1 [Phillip Cloud] Fix deprecated API call
e162ca1 [Phillip Cloud] Allocate scratch space to hold the byteswapped values
5c9292b [Phillip Cloud] Proper dcheck call
1782da0 [Phillip Cloud] Use arrow
3d243d5 [Phillip Cloud] Checkpoint [ci skip]
028fb03 [Phillip Cloud] Remove garbage values
46dff15 [Phillip Cloud] Clean up uint32 test
613255e [Phillip Cloud] Do not use std::copy when reinterpret_cast will suffice
2917a62 [Phillip Cloud] PARQUET-1095: [C++] Read and write Arrow decimal values

Change-Id: Ibe81cd5a5961bbe86c66db811ec8b770ae48c38b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants