[SPARK-49693][PYTHON][CONNECT] Refine the string representation of `timedelta`

### What changes were proposed in this pull request?
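Refine the string representation of `timedelta` literals so that it follows the ISO 8601 duration format.
Note that the JVM side (`java.time.Duration`) and pandas use different units: `java.time.Duration` only uses hours, minutes and seconds, while pandas uses all supported units, including days.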
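
The difference in units noted above can be illustrated with a small standard-library sketch. The helpers `format_hms_only` and `format_all_units` are hypothetical names introduced only for this illustration; they approximate the two output styles for non-negative intervals and are not the implementations used by `java.time.Duration` or pandas.

```python
import datetime

_ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def _seconds_text(seconds: int, micros: int) -> str:
    # Render seconds with up to six fractional digits, trimming trailing zeros.
    return f"{seconds}.{micros:06d}".rstrip("0").rstrip(".")


def format_hms_only(delta: datetime.timedelta) -> str:
    """Hypothetical helper: hours/minutes/seconds only, zero parts omitted."""
    if delta < datetime.timedelta(0):
        raise ValueError("this sketch only handles non-negative intervals")
    seconds_total, micros = divmod(delta // _ONE_MICROSECOND, 1_000_000)
    minutes_total, seconds = divmod(seconds_total, 60)
    hours, minutes = divmod(minutes_total, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if seconds or micros or not parts:
        parts.append(f"{_seconds_text(seconds, micros)}S")
    return "PT" + "".join(parts)


def format_all_units(delta: datetime.timedelta) -> str:
    """Hypothetical helper: days, hours, minutes and seconds always present."""
    if delta < datetime.timedelta(0):
        raise ValueError("this sketch only handles non-negative intervals")
    seconds_total, micros = divmod(delta // _ONE_MICROSECOND, 1_000_000)
    minutes_total, seconds = divmod(seconds_total, 60)
    hours_total, minutes = divmod(minutes_total, 60)
    days, hours = divmod(hours_total, 24)
    return f"P{days}DT{hours}H{minutes}M{_seconds_text(seconds, micros)}S"


one_day_one_second = datetime.timedelta(1, 1)
print(format_hms_only(one_day_one_second))   # PT24H1S
print(format_all_units(one_day_one_second))  # P1DT0H0M1S
```

Both strings describe the same interval; only the choice of units differs.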

### Why are the changes needed?
The string representation should not leak the raw internal value (the interval as a plain integer).

### Does this PR introduce _any_ user-facing change?
Yes, the string representation of a `timedelta` literal changes in PySpark Connect.

PySpark Classic:
```
In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'PT24H1S'>
```

PySpark Connect (before):
```
In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'86401000000'>
```

PySpark Connect (after):
```
In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'P1DT0H0M1S'>
```
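
For reference, the raw value in the "before" output is consistent with the interval expressed as a count of microseconds. The check below is a minimal sketch using only the standard library; that the integer is a microsecond count is inferred from the numbers shown above.

```python
import datetime

# Raw value taken from the "before" output above.
raw_micros = 86401000000

# Interpreting it as microseconds gives back the original timedelta(1, 1).
delta = datetime.timedelta(microseconds=raw_micros)
print(delta == datetime.timedelta(days=1, seconds=1))  # True

# Going the other way: integer number of microseconds in the interval.
print(delta // datetime.timedelta(microseconds=1))  # 86401000000
```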

### How was this patch tested?
Added `test_lit_delta_representation` to `python/pyspark/sql/tests/test_column.py`.

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48159 from zhengruifeng/pc_lit_delta.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng committed Sep 19, 2024
1 parent 398457a commit 94dca78
Showing 2 changed files with 33 additions and 2 deletions.
12 changes: 11 additions & 1 deletion python/pyspark/sql/connect/expressions.py
@@ -489,7 +489,17 @@ def __repr__(self) -> str:
             ts = TimestampNTZType().fromInternal(self._value)
             if ts is not None and isinstance(ts, datetime.datetime):
                 return ts.strftime("%Y-%m-%d %H:%M:%S.%f")
-        # TODO(SPARK-49693): Refine the string representation of timedelta
+        elif isinstance(self._dataType, DayTimeIntervalType):
+            delta = DayTimeIntervalType().fromInternal(self._value)
+            if delta is not None and isinstance(delta, datetime.timedelta):
+                import pandas as pd
+
+                # Note: timedelta itself does not provide isoformat method.
+                # Both Pandas and java.time.Duration provide it, but the format
+                # is slightly different:
+                # java.time.Duration only applies HOURS, MINUTES, SECONDS units,
+                # while Pandas applies all supported units.
+                return pd.Timedelta(delta).isoformat()  # type: ignore[attr-defined]
         return f"{self._value}"
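
The new branch can be tried outside of Spark with the sketch below. `describe_interval` is a hypothetical stand-in for the branch, not part of PySpark, and it assumes the internal value is a microsecond count. The guarded import is an assumption made only so the sketch also runs without pandas; the patch itself imports pandas unconditionally inside the branch.

```python
import datetime

try:
    import pandas as pd
except ImportError:  # assumption for this sketch only; the patch does not do this
    pd = None


def describe_interval(raw_micros: int) -> str:
    """Hypothetical stand-in for the DayTimeIntervalType branch of __repr__."""
    delta = datetime.timedelta(microseconds=raw_micros)
    if pd is not None:
        # pandas provides isoformat(); datetime.timedelta does not.
        return pd.Timedelta(delta).isoformat()
    # Without pandas, fall back to the raw value, as in the final return above.
    return f"{raw_micros}"


print(describe_interval(86401000000))
```

With pandas installed, this should print the same `P1DT0H0M1S` string shown in the "after" example of the commit message.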


23 changes: 22 additions & 1 deletion python/pyspark/sql/tests/test_column.py
@@ -19,12 +19,13 @@
 from enum import Enum
 from itertools import chain
 import datetime
+import unittest

 from pyspark.sql import Column, Row
 from pyspark.sql import functions as sf
 from pyspark.sql.types import StructType, StructField, IntegerType, LongType
 from pyspark.errors import AnalysisException, PySparkTypeError, PySparkValueError
-from pyspark.testing.sqlutils import ReusedSQLTestCase
+from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, pandas_requirement_message


 class ColumnTestsMixin:
@@ -289,6 +290,26 @@ def test_lit_time_representation(self):
         ts = datetime.datetime(2021, 3, 4, 12, 34, 56, 1234)
         self.assertEqual(str(sf.lit(ts)), "Column<'2021-03-04 12:34:56.001234'>")

+    @unittest.skipIf(not have_pandas, pandas_requirement_message)
+    def test_lit_delta_representation(self):
+        for delta in [
+            datetime.timedelta(days=1),
+            datetime.timedelta(hours=2),
+            datetime.timedelta(minutes=3),
+            datetime.timedelta(seconds=4),
+            datetime.timedelta(microseconds=5),
+            datetime.timedelta(days=2, hours=21, microseconds=908),
+            datetime.timedelta(days=1, minutes=-3, microseconds=-1001),
+            datetime.timedelta(days=1, hours=2, minutes=3, seconds=4, microseconds=5),
+        ]:
+            import pandas as pd
+
+            # Column<'PT69H0.000908S'> or Column<'P2DT21H0M0.000908S'>
+            s = str(sf.lit(delta))
+
+            # Parse the ISO string representation and compare
+            self.assertTrue(pd.Timedelta(s[8:-2]).to_pytimedelta() == delta)
+
     def test_enum_literals(self):
         class IntEnum(Enum):
             X = 1
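
The new test compares parsed values rather than strings because Classic and Connect render the same interval with different units. The round trip can be sketched without Spark or pandas as follows. `unwrap_column_repr` and `parse_duration` are hypothetical helpers written for this illustration; the parser is a simplified stand-in for `pd.Timedelta(...)` that only handles non-negative durations with day, hour, minute and second parts at microsecond precision.

```python
import datetime
import re

# Simplified ISO 8601 duration pattern: optional days, then T, then optional
# hours, minutes and (possibly fractional) seconds.
_ISO_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+)(?:\.(?P<fraction>\d{1,6}))?S)?"
)


def unwrap_column_repr(text: str) -> str:
    # Drop the 8-character prefix "Column<'" and the 2-character suffix "'>",
    # which is what the s[8:-2] slice in the test does.
    return text[8:-2]


def parse_duration(text: str) -> datetime.timedelta:
    match = _ISO_DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration string: {text!r}")
    parts = match.groupdict()
    # Pad the fractional seconds to six digits to get whole microseconds.
    micros = int((parts["fraction"] or "").ljust(6, "0"))
    return datetime.timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=int(parts["seconds"] or 0),
        microseconds=micros,
    )


expected = datetime.timedelta(days=2, hours=21, microseconds=908)
for rendered in ("Column<'PT69H0.000908S'>", "Column<'P2DT21H0M0.000908S'>"):
    print(parse_duration(unwrap_column_repr(rendered)) == expected)  # True
```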