Support reading Parquet float16 data as float32 #19265

Closed
adamreeve opened this issue Oct 16, 2024 · 1 comment · Fixed by #19278
Labels
A-io-parquet Area: reading/writing Parquet files accepted Ready for implementation enhancement New feature or an improvement of an existing feature support

Comments

@adamreeve
Contributor

Description

Support for working with float16 data in Polars is still being debated: #7288

In the meantime, it would be useful to at least be able to read float16 data from Parquet files. For example:

```python
import numpy as np
import polars as pl
import pyarrow as pa
import pyarrow.parquet as pq


# Write a Parquet file containing a single float16 column
table = pa.Table.from_pydict({
    'x': pa.array(np.array([0.0, 0.5, 1.0, 1.5], dtype=np.float16), type=pa.float16()),
})
pq.write_table(table, 'data.parquet')

# Reading it back with the native Polars reader fails
df = pl.read_parquet('data.parquet')
```

Using Polars 1.9.0, this fails with:

polars.exceptions.ComputeError: parquet: File out of specification: Invalid thrift: bad data

Ideally this would succeed and convert the data to float32, which is what happens with pl.read_parquet('data.parquet', use_pyarrow=True). However, that workaround isn't available when using pl.scan_parquet for lazy or streaming queries.
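For context, the Parquet FLOAT16 logical type annotates a 2-byte FIXED_LEN_BYTE_ARRAY holding a little-endian IEEE 754 half-precision value, and every half-precision value is exactly representable in float32, so the requested upcast is lossless. The sketch below illustrates that decode-and-widen step using only the standard library; `decode_float16_fixed_len` is a hypothetical helper for illustration, not how Polars implements the reader.

```python
import struct
from typing import Iterable, List


def decode_float16_fixed_len(values: Iterable[bytes]) -> List[float]:
    """Decode FLOAT16 values (2-byte little-endian binary16) and widen to float32.

    Hypothetical helper for illustration only; not Polars' implementation.
    """
    out = []
    for raw in values:
        if len(raw) != 2:
            raise ValueError(f"expected 2 bytes per FLOAT16 value, got {len(raw)}")
        (half,) = struct.unpack("<e", raw)  # binary16 -> Python float
        # Round-trip through binary32 to obtain a float32 value;
        # this is exact for any binary16 input, including subnormals, inf and NaN
        (single,) = struct.unpack("<f", struct.pack("<f", half))
        out.append(single)
    return out


# Encode the issue's sample values the way a FLOAT16 column stores them
encoded = [struct.pack("<e", v) for v in (0.0, 0.5, 1.0, 1.5)]
print(decode_float16_fixed_len(encoded))  # [0.0, 0.5, 1.0, 1.5]
```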

adamreeve added the enhancement label (New feature or an improvement of an existing feature) on Oct 16, 2024
@adamreeve
Contributor Author

Being able to write float32 data as float16 Parquet might also be useful, but that would probably depend on #17418.
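Unlike reading, writing float32 data as float16 is a narrowing conversion: values are rounded to binary16's 10-bit mantissa, and magnitudes beyond its largest finite value (65504) cannot be represented. A minimal standard-library sketch of that encode step, assuming the same 2-byte little-endian FLOAT16 layout; `encode_float16_fixed_len` is a hypothetical helper, not a Polars or PyArrow API.

```python
import struct
from typing import Iterable, List


def encode_float16_fixed_len(values: Iterable[float]) -> List[bytes]:
    """Narrow floats to FLOAT16 bytes (2-byte little-endian binary16).

    Hypothetical helper for illustration only. struct rounds each value to the
    nearest representable half-precision value and raises OverflowError when a
    finite input is too large for binary16.
    """
    return [struct.pack("<e", v) for v in values]


encoded = encode_float16_fixed_len([0.5, 0.1])
# 0.5 survives exactly, while 0.1 is rounded to the nearest binary16 value
print([struct.unpack("<e", b)[0] for b in encoded])  # [0.5, 0.0999755859375]
```

A writer offering this option would need to decide whether to error on overflow, as this sketch does, or clamp to infinity.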

coastalwhite added the accepted (Ready for implementation) and A-io-parquet (Area: reading/writing Parquet files) labels on Oct 17, 2024
Projects
Status: Done
3 participants