Polars does not retain timezone information when reading data from a nested dictionary #20766

Closed
@rikjongerius

Description

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import datetime
import zoneinfo

import polars as pl

print(pl.__version__)
"""1.20.0"""

# Working example with unnested dictionary
data = [
    {
        "timestamp": datetime.datetime(
            2021, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo("Europe/Amsterdam")
        )
    }
]
df = pl.DataFrame(data)
print(df)
"""
shape: (1, 1)
┌─────────────────────────┐
│ timestamp               │
│ ---                     │
│ datetime[μs, UTC]       │
╞═════════════════════════╡
│ 2020-12-31 23:00:00 UTC │
└─────────────────────────┘
"""

# Broken example with nested dictionary
data = [
    {
        "timestamp": {
            "content": datetime.datetime(
                2021, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo("Europe/Amsterdam")
            )
        }
    }
]

df2 = pl.DataFrame(data).unnest("timestamp")
print(df2)
# Wrong output: I expected dtype datetime[μs, UTC], but the time zone is missing
"""
┌─────────────────────┐
│ content             │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-12-31 23:00:00 │
└─────────────────────┘
"""

Issue description

I need to read data from a legacy data source that returns a list of nested objects. The nested structure can be unnested into a regular table, but in the process the time zone is dropped from the column's dtype.

There seem to be a few related issues, but I don't think any of them covers this case:

  • #20264: The suggested workaround is to use a dictionary. I already do that here, and it does not work for nested dictionaries.
  • #19509: That one seems to break when the timestamp field contains a None value, which my example does not have.
  • #19268: That one raises an error, whereas I get no error. I'm also not using map_elements.
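
Since the flat dictionary in the first example keeps its time zone, one possible stopgap is to flatten each record in plain Python before handing it to Polars. The sketch below is only an illustration: `flatten_record` and the `"."` separator are hypothetical names chosen here, not part of Polars, and a fixed +01:00 offset stands in for Europe/Amsterdam so the snippet does not depend on system time zone data.

```python
import datetime


def flatten_record(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Recursively flatten nested dicts into a single-level dict (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        # Build a dotted name such as "timestamp.content" for nested keys
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


# Assumption: +01:00 matches Europe/Amsterdam on 2021-01-01 (winter time)
amsterdam_winter = datetime.timezone(datetime.timedelta(hours=1))

data = [
    {
        "timestamp": {
            "content": datetime.datetime(2021, 1, 1, 0, 0, tzinfo=amsterdam_winter)
        }
    }
]

# Each row is now a flat dict, like the working example in the reproducer
flat_rows = [flatten_record(row) for row in data]
print(list(flat_rows[0].keys()))
```

Passing `flat_rows` to `pl.DataFrame` should then go through the same path as the working flat example, with the column named `timestamp.content` instead of `content`. I have not verified this against every Polars version.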

Expected behavior

The time zone information is parsed correctly (the Europe/Amsterdam time is converted to UTC), but no time zone is set on the column dtype. I expect the column dtype in the broken example to be datetime[μs, UTC].

Installed versions

--------Version info---------
Polars:              1.20.0
Index type:          UInt32
Platform:            Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.36
Python:              3.11.11 (main, Dec  6 2024, 20:02:44) [Clang 18.1.8 ]
LTS CPU:             False

----Optional dependencies----
Azure CLI            2.67.0
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       1.19.0
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             2.10.5
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

    Labels

    • A-temporal (Area: date/time functionality)
    • P-low (Priority: low)
    • bug (Something isn't working)
    • python (Related to Python Polars)
