Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Using PyArrow to read the first n rows of a file, as suggested in #24511, could help read part of a large DataFrame that might otherwise not fit into memory.
Feature Description
Add a parameter, n_rows, to read_parquet. The PyArrow implementation would use the ParquetFile.iter_batches() generator, and the fastparquet implementation would use ParquetFile.head() (although the latter would be purely for convenience, with no performance benefit).
Alternative Solutions
You could currently use PyArrow:

```python
import pyarrow as pa
from pyarrow.parquet import ParquetFile

pf = ParquetFile('file_name.pq')
first_ten_rows = next(pf.iter_batches(batch_size=10))
df = pa.Table.from_batches([first_ten_rows]).to_pandas()
```
Additional Context
No response