Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import datetime
import zoneinfo
import polars as pl
print(pl.__version__)
"""1.20.0"""
# Working example with unnested dictionary
data = [
    {
        "timestamp": datetime.datetime(
            2021, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo("Europe/Amsterdam")
        )
    }
]
df = pl.DataFrame(data)
print(df)
"""
shape: (1, 1)
┌─────────────────────────┐
│ timestamp               │
│ ---                     │
│ datetime[μs, UTC]       │
╞═════════════════════════╡
│ 2020-12-31 23:00:00 UTC │
└─────────────────────────┘
"""
data = [
    {
        "timestamp": {
            "content": datetime.datetime(
                2021, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo("Europe/Amsterdam")
            )
        }
    }
]
# Broken example with nested dictionary
df2 = pl.DataFrame(data).unnest("timestamp")
print(df2)
# Wrong output, I would have expected a datetime[μs, UTC]
"""
┌─────────────────────┐
│ content             │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-12-31 23:00:00 │
└─────────────────────┘
"""
Log output
Issue description
I need to read data from a legacy data source that returns data as a list of nested objects. This format can be unnested to a regular table. However, in this process the timezone information is dropped from the column schema.
There seem to be a few related issues, but I think this one is not covered by the other issues.
#20264: The suggested workaround is to use a dictionary, which I already do here, but it does not work for nested dictionaries.
#19509: This one seems to break when there is a None value in the timestamp field, which I do not have in my example.
#19268: That issue raises an error, whereas I do not get an error. Also, I'm not using map_elements.
Expected behavior
The timezone information is parsed correctly (the Europe/Amsterdam time is converted to UTC), but no timezone is set on the column dtype. I expect the column dtype in the broken example to be datetime[μs, UTC].
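As a sanity check on the expected instant, independent of Polars, the standard library confirms that midnight in Europe/Amsterdam on 2021-01-01 corresponds to 23:00 UTC on the previous day. A minimal sketch:

```python
import datetime
import zoneinfo

# Midnight local time in Amsterdam (CET, UTC+1 in winter)
local = datetime.datetime(
    2021, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo("Europe/Amsterdam")
)

# Convert the same instant to UTC
as_utc = local.astimezone(datetime.timezone.utc)
print(as_utc.isoformat())  # 2020-12-31T23:00:00+00:00
```

So the values in both outputs above represent the right instant. Only the time zone on the dtype is missing in the nested case.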
Installed versions
--------Version info---------
Polars: 1.20.0
Index type: UInt32
Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.36
Python: 3.11.11 (main, Dec 6 2024, 20:02:44) [Clang 18.1.8 ]
LTS CPU: False
----Optional dependencies----
Azure CLI 2.67.0
adbc_driver_manager <not installed>
altair <not installed>
azure.identity 1.19.0
boto3 <not installed>
cloudpickle <not installed>
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec <not installed>
gevent <not installed>
google.auth <not installed>
great_tables <not installed>
matplotlib <not installed>
nest_asyncio <not installed>
numpy <not installed>
openpyxl <not installed>
pandas <not installed>
pyarrow <not installed>
pydantic 2.10.5
pyiceberg <not installed>
sqlalchemy <not installed>
torch <not installed>
xlsx2csv <not installed>
xlsxwriter <not installed>