Commit 17c9ae7

xhochy authored and wesm committed
ARROW-357: Use a single RowGroup for Parquet files as default.
This is not the optimal choice; we should rather have an option to optimise for the underlying block size of the filesystem. Without the infrastructure for that in ``parquet-cpp``, however, writing a single RowGroup is the much better choice.

Author: Uwe L. Korn <uwelk@xhochy.com>

Closes #192 from xhochy/ARROW-357 and squashes the following commits:

9eccefd [Uwe L. Korn] ARROW-357: Use a single RowGroup for Parquet files as default.
1 parent 2a059bd commit 17c9ae7

1 file changed: +3 -2 lines changed

python/pyarrow/parquet.pyx

Lines changed: 3 additions & 2 deletions
@@ -106,7 +106,8 @@ def write_table(table, filename, chunk_size=None, version=None,
     table : pyarrow.Table
     filename : string
     chunk_size : int
-        The maximum number of rows in each Parquet RowGroup
+        The maximum number of rows in each Parquet RowGroup. As a default,
+        we will write a single RowGroup per file.
     version : {"1.0", "2.0"}, default "1.0"
         The Parquet format version, defaults to 1.0
     use_dictionary : bool or list
@@ -121,7 +122,7 @@ def write_table(table, filename, chunk_size=None, version=None,
     cdef WriterProperties.Builder properties_builder
     cdef int64_t chunk_size_ = 0
     if chunk_size is None:
-        chunk_size_ = min(ctable_.num_rows(), int(2**16))
+        chunk_size_ = ctable_.num_rows()
     else:
         chunk_size_ = chunk_size
 
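
To make the effect of the second hunk concrete, here is a minimal pure-Python sketch of the `chunk_size` defaulting before and after this commit. It does not call pyarrow or parquet-cpp. The helper names `effective_chunk_size` and `row_group_count`, and the `single_row_group_default` flag, are hypothetical and exist only for illustration. The RowGroup count assumes the writer simply splits the table into consecutive slices of at most `chunk_size` rows, which the diff itself does not show.

```python
import math

# Row cap used by the line this commit removes: int(2**16)
OLD_DEFAULT_CAP = 2 ** 16


def effective_chunk_size(num_rows, chunk_size=None, single_row_group_default=True):
    """Mirror the chunk_size defaulting shown in the diff (hypothetical helper)."""
    if chunk_size is None:
        if single_row_group_default:
            # New behaviour: the chunk size equals the table's row count
            return num_rows
        # Old behaviour: cap the chunk size at 2**16 rows
        return min(num_rows, OLD_DEFAULT_CAP)
    # An explicit chunk_size is passed through unchanged in both versions
    return chunk_size


def row_group_count(num_rows, chunk_size):
    """RowGroups needed if each holds at most chunk_size rows (assumed splitting)."""
    if num_rows == 0:
        # Guard against dividing by zero; how the real writer treats an
        # empty table is not shown in the diff.
        return 0
    return math.ceil(num_rows / chunk_size)


if __name__ == "__main__":
    rows = 200_000
    old = effective_chunk_size(rows, single_row_group_default=False)
    new = effective_chunk_size(rows)
    print("old default:", old, "rows per RowGroup,", row_group_count(rows, old), "RowGroups")
    print("new default:", new, "rows per RowGroup,", row_group_count(rows, new), "RowGroups")
```

Under these assumptions, a 200,000-row table written with `chunk_size=None` would have been split into four RowGroups before this change and is written as a single RowGroup after it. Passing an explicit `chunk_size` behaves the same in both versions.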

0 commit comments
