[SPARK-45018][PYTHON][CONNECT] Add CalendarIntervalType to Python Client
### What changes were proposed in this pull request?
Add CalendarIntervalType to Python Client

### Why are the changes needed?
For feature parity.

### Does this PR introduce _any_ user-facing change?
Yes.

before this PR:
```
In [1]: from pyspark.sql import functions as sf

In [2]: spark.range(1).select(sf.make_interval(sf.lit(1))).schema
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 spark.range(1).select(sf.make_interval(sf.lit(1))).schema

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1687, in DataFrame.schema(self)
   1685     if self._session is None:
   1686         raise Exception("Cannot analyze without SparkSession.")
-> 1687     return self._session.client.schema(query)
   1688 else:
   1689     raise Exception("Empty plan.")

...

Exception: Unsupported data type calendar_interval
```

after this PR:
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
      /_/

Using Python version 3.10.11 (main, May 17 2023 14:30:36)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.

In [1]: from pyspark.sql import functions as sf

In [2]: spark.range(1).select(sf.make_interval(sf.lit(1))).schema
Out[2]: StructType([StructField('make_interval(1, 0, 0, 0, 0, 0, 0)', CalendarIntervalType(), True)])
```

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes apache#42743 from zhengruifeng/py_connect_cal_interval.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng committed Aug 31, 2023
1 parent 9a023c4 commit 9d84369
Showing 3 changed files with 10 additions and 1 deletion.
5 changes: 5 additions & 0 deletions python/pyspark/sql/connect/types.py

```diff
@@ -33,6 +33,7 @@
     TimestampNTZType,
     DayTimeIntervalType,
     YearMonthIntervalType,
+    CalendarIntervalType,
     MapType,
     StringType,
     CharType,
@@ -169,6 +170,8 @@ def pyspark_types_to_proto_types(data_type: DataType) -> pb2.DataType:
     elif isinstance(data_type, YearMonthIntervalType):
         ret.year_month_interval.start_field = data_type.startField
         ret.year_month_interval.end_field = data_type.endField
+    elif isinstance(data_type, CalendarIntervalType):
+        ret.calendar_interval.CopyFrom(pb2.DataType.CalendarInterval())
     elif isinstance(data_type, StructType):
         struct = pb2.DataType.Struct()
         for field in data_type.fields:
@@ -265,6 +268,8 @@ def proto_schema_to_pyspark_data_type(schema: pb2.DataType) -> DataType:
             else None
         )
         return YearMonthIntervalType(startField=start, endField=end)
+    elif schema.HasField("calendar_interval"):
+        return CalendarIntervalType()
     elif schema.HasField("array"):
         return ArrayType(
             proto_schema_to_pyspark_data_type(schema.array.element_type),
```
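The two new branches above mirror each other: serialization dispatches on the Python type with `isinstance`, while deserialization dispatches on which oneof field is populated in the protobuf message via `HasField`. A minimal, self-contained sketch of this round-trip pattern, using hypothetical stand-in classes rather than the real `pb2.DataType` bindings:

```python
# Simplified sketch of the dispatch pattern in pyspark/sql/connect/types.py.
# ProtoDataType and has_field are hypothetical stand-ins for the generated
# protobuf message and its HasField method.

class CalendarIntervalType:
    def __eq__(self, other):
        return isinstance(other, CalendarIntervalType)

class ProtoDataType:
    def __init__(self, kind: str):
        self.kind = kind  # records which "oneof" variant is set

    def has_field(self, name: str) -> bool:
        return self.kind == name

def to_proto(data_type) -> ProtoDataType:
    # Serialization side: dispatch on the Python type.
    if isinstance(data_type, CalendarIntervalType):
        return ProtoDataType("calendar_interval")
    raise ValueError(f"Unsupported data type {data_type}")

def from_proto(schema: ProtoDataType):
    # Deserialization side: dispatch on the populated proto field.
    if schema.has_field("calendar_interval"):
        return CalendarIntervalType()
    raise ValueError("Unsupported proto schema")

# A type unknown to either side raises, which is the behavior behind the
# "Unsupported data type calendar_interval" error before this PR.
roundtripped = from_proto(to_proto(CalendarIntervalType()))
assert roundtripped == CalendarIntervalType()
```

Before this change, `CalendarIntervalType` fell through both chains and hit the unsupported-type error shown in the description above.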
2 changes: 1 addition & 1 deletion python/pyspark/sql/tests/connect/test_parity_types.py

```diff
@@ -86,7 +86,7 @@ def test_rdd_with_udt(self):
     def test_udt(self):
         super().test_udt()

-    @unittest.skip("SPARK-45018: should support CalendarIntervalType")
+    @unittest.skip("SPARK-45026: spark.sql should support datatypes not compatible with arrow")
     def test_calendar_interval_type(self):
         super().test_calendar_interval_type()
```
4 changes: 4 additions & 0 deletions python/pyspark/sql/tests/test_types.py

```diff
@@ -1284,6 +1284,10 @@ def test_calendar_interval_type(self):
         schema1 = self.spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
         self.assertEqual(schema1.fields[0].dataType, CalendarIntervalType())

+    def test_calendar_interval_type_with_sf(self):
+        schema1 = self.spark.range(1).select(F.make_interval(F.lit(1))).schema
+        self.assertEqual(schema1.fields[0].dataType, CalendarIntervalType())
+

 class DataTypeTests(unittest.TestCase):
     # regression test for SPARK-6055
```
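The new test compares a schema field's type against a freshly constructed `CalendarIntervalType()`. This works because PySpark data types compare structurally (roughly: same class, same attributes), so no shared singleton instance is needed. A rough sketch of that equality convention, using a simplified stand-in for the `DataType` base class rather than PySpark's actual implementation:

```python
# Hedged sketch of structural equality for data types, modeled on the
# convention PySpark uses: instances are equal when they have the same
# class and the same attribute values. DataType here is a simplified
# stand-in, not the real pyspark.sql.types.DataType.

class DataType:
    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __hash__(self):
        return hash(self.__class__)

class CalendarIntervalType(DataType):
    pass  # no parameters, so any two instances are equal

class DayTimeIntervalType(DataType):
    def __init__(self, start_field=0, end_field=3):
        self.start_field = start_field
        self.end_field = end_field

# Parameterless types: any two instances compare equal.
assert CalendarIntervalType() == CalendarIntervalType()
# Parameterized types: equal only when the parameters match.
assert DayTimeIntervalType(0, 3) == DayTimeIntervalType(0, 3)
assert DayTimeIntervalType(0, 1) != DayTimeIntervalType(0, 3)
# Different type classes never compare equal.
assert CalendarIntervalType() != DayTimeIntervalType()
```

This is why `self.assertEqual(schema1.fields[0].dataType, CalendarIntervalType())` is a sound assertion even though the type instance in the schema was built by the deserializer, not by the test.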
