When datafusion.execution.parquet.coerce_int96 is set, timestamp type is still reported as Timestamp(nanoseconds) #15721

@alamb

Describe the bug

According to its documentation, datafusion.execution.parquet.coerce_int96 is supposed to behave as follows:

If true, parquet reader will read columns of physical type int96 as originating from a different resolution than nanosecond. This is useful for reading data from systems like Spark which stores microsecond resolution timestamps in an int96 allowing it to write values with a larger date range than 64-bit timestamps with nanosecond resolution.
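To illustrate the date-range argument in that description, here is a rough sketch of how far a signed 64-bit count of each time unit reaches past the Unix epoch. It assumes timestamps are stored as i64 counts since 1970-01-01 UTC (as Arrow's Timestamp type does). The helper names are hypothetical and not part of DataFusion.

```python
from datetime import datetime, timedelta, timezone

I64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Ticks per second for each timestamp resolution
TICKS_PER_SECOND = {"ns": 10**9, "us": 10**6, "ms": 10**3, "s": 1}


def max_years_after_epoch(unit: str) -> float:
    """Approximate number of years after 1970 that an i64 count of `unit` can reach."""
    seconds = I64_MAX / TICKS_PER_SECOND[unit]
    return seconds / (365.2425 * 86_400)  # mean Gregorian year in seconds


def max_nanosecond_timestamp() -> datetime:
    """Latest instant representable as i64 nanoseconds, truncated to microseconds."""
    return EPOCH + timedelta(microseconds=I64_MAX // 1_000)


if __name__ == "__main__":
    for unit in TICKS_PER_SECOND:
        print(f"{unit}: about {max_years_after_epoch(unit):,.0f} years after 1970")
    print("last nanosecond-resolution instant:", max_nanosecond_timestamp().isoformat())
```

Under these assumptions, nanosecond resolution runs out in the year 2262, while microsecond resolution extends the range by a factor of 1000. That is why reading Spark's int96 values at a coarser resolution can avoid overflow.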

However, when I set this to us (microseconds), the type is still reported as Timestamp(Nanosecond, None).

To Reproduce

-- Enable coercion of int96 to microseconds
set datafusion.execution.parquet.coerce_int96 = us;

-- Create external table
CREATE EXTERNAL TABLE int96_from_spark
STORED AS PARQUET
LOCATION 'parquet-testing/data/int96_from_spark.parquet';

-- Print schema
describe int96_from_spark;

Results in

+-------------+-----------------------------+-------------+
| column_name | data_type                   | is_nullable |
+-------------+-----------------------------+-------------+
| a           | Timestamp(Nanosecond, None) | YES         |
+-------------+-----------------------------+-------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Expected behavior

I expect the output type to be Timestamp(Microsecond, None)

Labels

bug: Something isn't working