Commit b5b4df5

ARROW-356: Add documentation about reading Parquet
Change-Id: I1810ccbb021a79f1da1474cc1b952ab98503f010
1 parent 772bc6e commit b5b4df5

2 files changed, +72 -7 lines changed

python/doc/index.rst

Lines changed: 7 additions & 7 deletions
@@ -31,14 +31,14 @@ additional functionality such as reading Apache Parquet files into Arrow
 structures.
 
 .. toctree::
-   :maxdepth: 4
-   :hidden:
+   :maxdepth: 2
+   :caption: Getting Started
 
    Module Reference <modules.rst>
 
-Indices and tables
-==================
+.. toctree::
+   :maxdepth: 2
+   :caption: Additional Features
+
+   Parquet format <parquet.rst>
 
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`

python/doc/parquet.rst

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

Reading/Writing Parquet files
=============================

If you have built ``pyarrow`` with Parquet support, i.e. ``parquet-cpp`` was
found during the build, you can read and write files in the Parquet format
to and from Arrow memory structures. The Parquet support code is located in
the :mod:`pyarrow.parquet` module.
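
As a minimal sketch, Parquet support can be probed at runtime by attempting
the import. This assumes that importing ``pyarrow.parquet`` raises
``ImportError`` when ``pyarrow`` was built without ``parquet-cpp``; the
``HAS_PARQUET`` flag is a hypothetical name used only for illustration.

.. code-block:: python

    # Probe for Parquet support (assumption: the import fails with
    # ImportError when pyarrow was built without parquet-cpp)
    try:
        import pyarrow.parquet as pq
        HAS_PARQUET = True
    except ImportError:
        pq = None
        HAS_PARQUET = False

    print("Parquet support available:", HAS_PARQUET)

Code that only optionally depends on Parquet can branch on such a flag instead
of failing at import time.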

Reading Parquet
---------------

To read a Parquet file into Arrow memory, you can use the following code
snippet. It reads the whole Parquet file into memory as a
:class:`pyarrow.table.Table`.

.. code-block:: python

    import pyarrow.parquet as pq

    # '<filename>' is a placeholder for the path of an existing Parquet file
    table = pq.read_table('<filename>')
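
The snippet above needs an existing file. The following self-contained sketch
first writes a small Table to a temporary directory and then reads it back
with :func:`pyarrow.parquet.read_table` to inspect the result. It assumes a
recent ``pyarrow`` release (for ``Table.from_pydict``); the column names and
the ``example.parquet`` file name are hypothetical.

.. code-block:: python

    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'example.parquet')  # hypothetical name
        # Create a throwaway file so the example runs without external data
        pq.write_table(pa.Table.from_pydict({'id': [1, 2, 3],
                                             'name': ['a', 'b', 'c']}), path)
        # read_table loads the whole file into memory as a single Table
        table = pq.read_table(path)

    print(table.num_rows)      # 3
    print(table.column_names)  # ['id', 'name']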

Writing Parquet
---------------

Given an instance of :class:`pyarrow.table.Table`, the simplest way to
persist it to Parquet is the :func:`pyarrow.parquet.write_table` function.

.. code-block:: python

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Any existing Table can be written; this small one is only an example
    # (Table.from_pydict assumes a recent pyarrow release)
    table = pa.Table.from_pydict({'n': [1, 2, 3]})
    pq.write_table(table, '<filename>')
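
To check that nothing is lost when writing, a Table can be written, read back
and compared with ``Table.equals``. This is a sketch assuming a recent
``pyarrow`` release; the data and the ``roundtrip.parquet`` file name are
hypothetical.

.. code-block:: python

    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical data, including a null value
    original = pa.Table.from_pydict({'x': [1.5, 2.5, None],
                                     'label': ['p', 'q', 'r']})

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'roundtrip.parquet')
        pq.write_table(original, path)   # persist with default settings
        restored = pq.read_table(path)   # load it back into memory

    # Compare schema and values of the restored Table with the original
    print(restored.equals(original))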

By default this writes the Table as a single row group using ``DICTIONARY``
encoding. To increase the parallelism with which a query engine can process
a Parquet file, set ``chunk_size`` to a fraction of the total number of rows;
the file is then split into several row groups that can be read independently.
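
A sketch of the effect on row groups, assuming a recent ``pyarrow`` release in
which this ``write_table`` parameter is named ``row_group_size`` (the text
above calls it ``chunk_size``). It writes the same hypothetical 10-row Table
twice and reads the row-group count from the file footer with
``pyarrow.parquet.read_metadata``.

.. code-block:: python

    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pydict({'value': list(range(10))})

    with tempfile.TemporaryDirectory() as tmpdir:
        default_path = os.path.join(tmpdir, 'default.parquet')
        split_path = os.path.join(tmpdir, 'split.parquet')

        # Default settings: a table this small ends up in one row group
        pq.write_table(table, default_path)
        # Assumed parameter name row_group_size: at most 3 rows per group,
        # so 10 rows become groups of 3, 3, 3 and 1
        pq.write_table(table, split_path, row_group_size=3)

        default_groups = pq.read_metadata(default_path).num_row_groups
        split_groups = pq.read_metadata(split_path).num_row_groups

    print(default_groups, split_groups)  # 1 4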

If you also want to compress the columns, you can select a compression
method using the ``compression`` argument. Typically, ``GZIP`` is the choice if
you want to minimize file size, and ``SNAPPY`` if you want faster compression
and decompression.
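
A minimal sketch comparing the two codecs, assuming a recent ``pyarrow``
release that accepts lowercase codec names such as ``'snappy'`` and
``'gzip'``. The sample data and file names are hypothetical; actual size
differences depend heavily on the data, so no particular sizes are claimed.

.. code-block:: python

    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical, highly repetitive sample data
    table = pa.Table.from_pydict({'text': ['arrow parquet'] * 1000})

    sizes = {}
    codecs = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for codec in ('snappy', 'gzip'):
            path = os.path.join(tmpdir, codec + '.parquet')
            pq.write_table(table, path, compression=codec)
            sizes[codec] = os.path.getsize(path)
            # The codec actually used is recorded per column chunk
            meta = pq.read_metadata(path)
            codecs[codec] = meta.row_group(0).column(0).compression

    print(sizes)   # file size in bytes per codec
    print(codecs)  # codec name as stored in the file metadata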
