@wesm wesm commented Nov 20, 2016

Modeled after Java version in ARROW-367

wesm added 4 commits November 18, 2016 15:58
Change-Id: I156149572f7913ff47db6adaba7ddb579c706a6e
Change-Id: Ic0b5d44bc2ab04a1a4b6a2dabf3c932b39249787
Change-Id: I0d915b0309ba1515422d23d9ec738bbe3ec0a1f8
Change-Id: Ie39ca80405bad788f306c38853fbb5e768b5a3ef
@wesm wesm changed the title ARROW-383: [C++] WIP: Integration testing CLI tool ARROW-383: [C++] Integration testing CLI tool Nov 21, 2016

wesm commented Nov 21, 2016

Don't know what's going on with Travis CI, but this is ready to go. cc @xhochy

wesm added 3 commits November 21, 2016 12:28
Change-Id: If5af704126a30a45ad7b785f82f70d2da126b32a
…tegration testing

Change-Id: I5cad36511c88dd7d6b8a05d4f58d8ee98da35e3f
Change-Id: Ib36148918bc08435e40edea4f6823c038c8c7075

wesm commented Nov 21, 2016

Java failed due to a transient apt-get issue. I'm going to merge this so I can proceed with ARROW-363 -- if there are comments on this patch I can address them in #211


wesm commented Nov 21, 2016

+1

@asfgit asfgit closed this in f082b17 Nov 21, 2016
@wesm wesm deleted the ARROW-383 branch November 21, 2016 22:55

if (APPLE)
target_link_libraries(json-integration-test
arrow_static

Would be nice to factor the common libraries out and only list the apple-specific ones here.

wesm added a commit to wesm/arrow that referenced this pull request Sep 8, 2018
…t-mr <= 1.2.8

This turned up while reading old data files generated by parquet-mr in 2013. There's a bug in parquet-mr 1.2.8 and lower in which the column chunk metadata in the Parquet file is incorrect. Impala inserted an explicit workaround for this (see https://github.com/apache/incubator-impala/blob/88448d1d4ab31eaaf82f764b36dc7d11d4c63c32/be/src/exec/hdfs-parquet-scanner.cc#L1227).

In this particular file, the dictionary page header is 15 bytes, and the entire column chunk is: 15 (dict page header) + 277 (dictionary) + 17 (data page header) + 28 (data page) bytes, making 337 bytes.

But the metadata says the column chunk is only 322 bytes – the dict page header size got dropped from the accounting.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#209 from wesm/PARQUET-816 and squashes the following commits:

21fdcbe [Wes McKinney] Move FileVersion to an inner class in FileMetaData
64e7f95 [Wes McKinney] Remove unnecessary std::move causing clang warning
bacb815 [Wes McKinney] Fix compilation error in benchmarks
f4c259e [Wes McKinney] cpplint
1e8c160 [Wes McKinney] clang-format
d2aa9a8 [Wes McKinney] Do not continue reading data pages in SerializedPageReader after reading the indicated number of rows in a row group
2638490 [Wes McKinney] Bring in IMPALA-694 workaround for PARQUET-816
bd3e949 [Wes McKinney] Optimistically decode truncated data pages. Add example data file

Change-Id: I1cd6c9e754ad1986c797d624989491af544a26b5
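
The byte accounting in the commit message above can be checked with a small sketch. The struct and function names below are hypothetical and are not the parquet-cpp API. The last function shows one possible padding-style workaround (read a few extra bytes when the writer is known to be affected, without running past the end of the file); whether that matches the linked Impala code or the fix in this commit is an assumption, and the padding amount is left as a parameter rather than taken from either code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of one column chunk's on-disk layout, in bytes.
// These names are illustrative only; they are not the parquet-cpp API.
struct ChunkLayout {
  int64_t dict_page_header;
  int64_t dictionary;
  int64_t data_page_header;
  int64_t data_page;
};

// Total number of bytes the column chunk really occupies in the file.
inline int64_t ActualChunkSize(const ChunkLayout& c) {
  return c.dict_page_header + c.dictionary + c.data_page_header + c.data_page;
}

// Size recorded by the buggy writer: the dictionary page header is left out.
inline int64_t BuggyMetadataSize(const ChunkLayout& c) {
  return ActualChunkSize(c) - c.dict_page_header;
}

// Sketch of a padding-style workaround (an assumption, not the actual patch):
// for an affected writer, read up to max_header_padding extra bytes, but never
// beyond the end of the file.
inline int64_t BytesToRead(int64_t chunk_start, int64_t metadata_size,
                           int64_t file_size, bool affected_writer,
                           int64_t max_header_padding) {
  assert(chunk_start >= 0 && metadata_size >= 0);
  assert(file_size >= chunk_start + metadata_size);
  if (!affected_writer) return metadata_size;
  const int64_t bytes_remaining = file_size - (chunk_start + metadata_size);
  return metadata_size + std::min(bytes_remaining, max_header_padding);
}
```

With the sizes quoted in the commit message, `ChunkLayout{15, 277, 17, 28}` gives an actual size of 337 bytes and a recorded size of 322 bytes, so a reader that trusts the metadata stops 15 bytes short of the end of the data page.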