Inconsistent Signedness Of Legacy Parquet Timestamps Written By Spark #7958

@comphead

Describe the bug

DataFusion (DF) reads the Parquet timestamp datatype as nanoseconds, whereas DuckDB and Spark treat it as seconds.

To Reproduce

Create a Parquet file containing the timestamp value -62125747200 and read it back.

DuckDB and Spark read the value correctly:

0001-04-25 00:00:00
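
As a quick sanity check of the expected value, the sketch below interprets -62125747200 as seconds since the Unix epoch using only the Python standard library. It assumes the proleptic Gregorian calendar and no time zone; the helper name `from_epoch_seconds` is illustrative and not part of DataFusion, DuckDB, or Spark.

```python
from datetime import datetime, timedelta

# Unix epoch as a naive datetime (no time zone assumed).
EPOCH = datetime(1970, 1, 1)


def from_epoch_seconds(seconds: int) -> datetime:
    """Interpret an integer as whole seconds since the Unix epoch."""
    # Adding a timedelta avoids platform limits of datetime.fromtimestamp
    # for dates far before 1970.
    return EPOCH + timedelta(seconds=seconds)


print(from_epoch_seconds(-62125747200).isoformat(sep=" "))
# prints: 0001-04-25 00:00:00
```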

but DF reads the timestamp as nanoseconds and returns the wrong answer:

❯ select * from test;
+-------------------------------+
| a                             |
+-------------------------------+
| 1754-12-22T22:43:41.128654848 |
+-------------------------------+
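
One possible explanation for this particular value, offered as an assumption rather than something confirmed in this report: if the seconds value is scaled to nanoseconds and stored in a signed 64-bit integer, the product falls outside the representable range and wraps around. The sketch below simulates that wraparound in plain Python (the helper names are hypothetical) and arrives at the same timestamp that DF printed.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
NANOS_PER_SECOND = 10**9


def wrap_i64(value: int) -> int:
    """Simulate two's-complement wraparound of a signed 64-bit integer."""
    return (value + 2**63) % 2**64 - 2**63


def format_epoch_nanos(nanos: int) -> str:
    """Format nanoseconds since the Unix epoch with nanosecond precision."""
    # divmod floors toward negative infinity, so the remainder is always
    # a non-negative nanosecond fraction.
    seconds, frac = divmod(nanos, NANOS_PER_SECOND)
    whole = EPOCH + timedelta(seconds=seconds)
    return f"{whole.isoformat()}.{frac:09d}"


seconds = -62125747200
exact_nanos = seconds * NANOS_PER_SECOND  # about -6.2e19, outside the i64 range
wrapped_nanos = wrap_i64(exact_nanos)     # what a wrapping i64 multiply would hold

print(wrapped_nanos)                      # prints: -6785514978871345152
print(format_epoch_nanos(wrapped_nanos))  # prints: 1754-12-22T22:43:41.128654848
```

A signed 64-bit nanosecond count can only represent dates from roughly 1677 to 2262, so a timestamp in year 1 cannot be stored at nanosecond resolution without overflow or a coarser time unit.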

Expected behavior

DF should return the same value as DuckDB and Spark: 0001-04-25 00:00:00.

Additional context

No response

Labels: bug