[SPARK-48996][SQL][PYTHON] Allow bare literals for __and__ and __or__ of Column

### What changes were proposed in this pull request?

Allow bare literals (for example `True`) as operands of `__and__` and `__or__` in the Column API of Spark Classic, along with the reflected `__rand__` and `__ror__`.

### Why are the changes needed?

Currently, `__and__` and `__or__` of the Column API in Spark Classic do not accept bare literals; the literal has to be wrapped with the `lit()` function. Bare literals should be accepted here, as they are by other operators.

```py
>>> from pyspark.sql.functions import *
>>> c = col("c")
>>> c & True
Traceback (most recent call last):
...
py4j.Py4JException: Method and([class java.lang.Boolean]) does not exist

>>> c & lit(True)
Column<'and(c, true)'>
```

whereas other operators accept a bare literal:

```py
>>> c + 1
Column<'`+`(c, 1)'>
>>> c + lit(1)
Column<'`+`(c, 1)'>
```

Spark Connect already allows this:

```py
>>> c & True
Column<'and(c, True)'>
>>> c & lit(True)
Column<'and(c, True)'>
```
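The traceback above suggests that the backend only offers an `and` method that takes a column, so a raw Python `bool` finds no matching overload. The following pure-Python sketch imitates that situation under stated assumptions: `ToyColumn`, `toy_lit`, and `backend_and` are hypothetical names invented for illustration and are not part of PySpark or Py4J.

```python
class ToyColumn:
    """Hypothetical stand-in for a column expression; not the PySpark class."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __repr__(self) -> str:
        return f"ToyColumn<'{self.expr}'>"


def toy_lit(value: object) -> ToyColumn:
    """Wrap a plain value as a column; pass existing columns through unchanged."""
    if isinstance(value, ToyColumn):
        return value
    # Render booleans in lower case, purely to resemble the output shown above.
    text = str(value).lower() if isinstance(value, bool) else str(value)
    return ToyColumn(text)


def backend_and(left: ToyColumn, right: object) -> ToyColumn:
    """Imitate a backend method that is defined only for column operands."""
    if not isinstance(right, ToyColumn):
        raise TypeError(f"and({type(right).__name__}) does not exist")
    return ToyColumn(f"and({left.expr}, {right.expr})")


c = ToyColumn("c")

# Passing the bare literal straight through fails in this toy model.
try:
    backend_and(c, True)
    failed = False
except TypeError:
    failed = True

# Wrapping the literal first gives the backend the operand type it expects.
wrapped = backend_and(c, toy_lit(True))
print(failed, wrapped)
```

In this model the fix is to normalize the operand before it reaches the backend, so callers no longer need to wrap the literal themselves.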

### Does this PR introduce _any_ user-facing change?

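Yes. In Spark Classic, `&` and `|` on a `Column` now accept a bare literal such as `True` on either side, matching Spark Connect.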

### How was this patch tested?

Added related cases to `test_column_operators` in `python/pyspark/sql/tests/test_column.py`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47474 from ueshin/issues/SPARK-48996/literal_and_or.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Takuya Ueshin <ueshin@databricks.com>
ueshin authored and ilicmarkodb committed Jul 29, 2024
1 parent 0c38865 commit ba82b54
Showing 2 changed files with 20 additions and 4 deletions.
16 changes: 12 additions & 4 deletions python/pyspark/sql/classic/column.py
@@ -309,25 +309,33 @@ def eqNullSafe(
     def __and__(
         self, other: Union[ParentColumn, "LiteralType", "DecimalLiteral", "DateTimeLiteral"]
     ) -> ParentColumn:
-        return _bin_op("and", self, other)
+        from pyspark.sql.functions import lit
+
+        return _bin_op("and", self, lit(other))

     def __or__(
         self, other: Union[ParentColumn, "LiteralType", "DecimalLiteral", "DateTimeLiteral"]
     ) -> ParentColumn:
-        return _bin_op("or", self, other)
+        from pyspark.sql.functions import lit
+
+        return _bin_op("or", self, lit(other))

     def __invert__(self) -> ParentColumn:
         return _func_op("not", self)

     def __rand__(
         self, other: Union[ParentColumn, "LiteralType", "DecimalLiteral", "DateTimeLiteral"]
     ) -> ParentColumn:
-        return _bin_op("and", self, other)
+        from pyspark.sql.functions import lit
+
+        return _bin_op("and", self, lit(other))

     def __ror__(
         self, other: Union[ParentColumn, "LiteralType", "DecimalLiteral", "DateTimeLiteral"]
     ) -> ParentColumn:
-        return _bin_op("or", self, other)
+        from pyspark.sql.functions import lit
+
+        return _bin_op("or", self, lit(other))

     # container operators
     def __contains__(self, item: Any) -> None:
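The change touches both the forward methods (`__and__`, `__or__`) and the reflected ones (`__rand__`, `__ror__`), because Python falls back to the reflected method of the right operand when the left operand cannot handle the operation. For `True & col`, `bool.__and__` returns `NotImplemented`, so Python calls `col.__rand__(True)`. A minimal sketch of that pattern follows; `ToyColumn`, `toy_lit`, and `_toy_bin_op` are hypothetical illustration names, not the PySpark implementation.

```python
from typing import Union


class ToyColumn:
    """Hypothetical column-like class; it only records an expression string."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __repr__(self) -> str:
        return f"ToyColumn<'{self.expr}'>"

    def __and__(self, other: Union["ToyColumn", bool]) -> "ToyColumn":
        # Normalize the operand first, so bare literals and columns both work.
        return _toy_bin_op("and", self, toy_lit(other))

    def __or__(self, other: Union["ToyColumn", bool]) -> "ToyColumn":
        return _toy_bin_op("or", self, toy_lit(other))

    def __rand__(self, other: Union["ToyColumn", bool]) -> "ToyColumn":
        # Reached for `True & col`, after bool.__and__ returns NotImplemented.
        # As in the diff above, `self` stays the first operand.
        return _toy_bin_op("and", self, toy_lit(other))

    def __ror__(self, other: Union["ToyColumn", bool]) -> "ToyColumn":
        return _toy_bin_op("or", self, toy_lit(other))


def toy_lit(value: Union[ToyColumn, bool]) -> ToyColumn:
    """Return columns unchanged; wrap anything else as a literal column."""
    if isinstance(value, ToyColumn):
        return value
    return ToyColumn(str(value).lower())


def _toy_bin_op(name: str, left: ToyColumn, right: ToyColumn) -> ToyColumn:
    """Build the combined expression from two column operands."""
    return ToyColumn(f"{name}({left.expr}, {right.expr})")


c = ToyColumn("c")
print(c & True, True & c, c | False, c & ToyColumn("d"))
```

Because `toy_lit` passes an existing column through unchanged, wrapping unconditionally is harmless for callers that already supply a column, which is why the same one-line normalization can be applied to all four methods in this sketch.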
8 changes: 8 additions & 0 deletions python/pyspark/sql/tests/test_column.py
@@ -94,6 +94,14 @@ def test_column_operators(self):
             cs.startswith("a"),
             cs.endswith("a"),
             ci.eqNullSafe(cs),
+            sf.col("b") & sf.lit(True),
+            sf.col("b") & True,
+            sf.lit(True) & sf.col("b"),
+            True & sf.col("b"),
+            sf.col("b") | sf.lit(True),
+            sf.col("b") | True,
+            sf.lit(True) | sf.col("b"),
+            True | sf.col("b"),
         )
         self.assertTrue(all(isinstance(c, Column) for c in css))
         self.assertTrue(isinstance(ci.cast(LongType()), Column))
