
[SPARK-13647][SQL] also check if numeric value is within allowed range in _verify_type #11492

Closed
wants to merge 2 commits into from
27 changes: 24 additions & 3 deletions python/pyspark/sql/types.py
@@ -1093,8 +1093,11 @@ def _infer_schema_type(obj, dataType):

 def _verify_type(obj, dataType):
     """
-    Verify the type of obj against dataType, raise an exception if
-    they do not match.
+    Verify the type of obj against dataType, raise a TypeError if they do not match.
+
+    Also verify the value of obj against datatype, raise a ValueError if it's not within the allowed
+    range, e.g. using 128 as ByteType will overflow. Note that, Python float is not checked, so it
+    will become infinity when cast to Java float if it overflows.

     >>> _verify_type(None, StructType([]))
     >>> _verify_type("", StringType())
@@ -1111,6 +1114,12 @@ def _verify_type(obj, dataType):
     Traceback (most recent call last):
         ...
     ValueError:...
+    >>> # Check if numeric values are within the allowed range.
+    >>> _verify_type(12, ByteType())
+    >>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+    Traceback (most recent call last):
+        ...
+    ValueError:...
     """
     # all objects are nullable
     if obj is None:
@@ -1137,7 +1146,19 @@ def _verify_type(obj, dataType):
         if type(obj) not in _acceptable_types[_type]:
             raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))

-    if isinstance(dataType, ArrayType):
+    if isinstance(dataType, ByteType):
+        if obj < -128 or obj > 127:
+            raise ValueError("object of ByteType out of range, got: %s" % obj)
+
+    elif isinstance(dataType, ShortType):
+        if obj < -32768 or obj > 32767:
+            raise ValueError("object of ShortType out of range, got: %s" % obj)
+
+    elif isinstance(dataType, IntegerType):
+        if obj < -2147483648 or obj > 2147483647:
+            raise ValueError("object of IntegerType out of range, got: %s" % obj)
+
+    elif isinstance(dataType, ArrayType):
         for i in obj:
             _verify_type(i, dataType.elementType)

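The range checks in this change can be illustrated outside of Spark. The following is a minimal standalone sketch, not the code in the PR: `check_int_range` and `_INT_RANGES` are hypothetical names, and type names are passed as plain strings so the example runs without pyspark. It uses the same signed bounds as the diff (8, 16 and 32 bits) and the same error message format.

```python
# Hypothetical helper, NOT part of pyspark: a table-driven variant of the
# range checks added to _verify_type in this change.
_INT_RANGES = {
    "ByteType": (-2**7, 2**7 - 1),        # -128 .. 127
    "ShortType": (-2**15, 2**15 - 1),     # -32768 .. 32767
    "IntegerType": (-2**31, 2**31 - 1),   # -2147483648 .. 2147483647
}


def check_int_range(value, type_name):
    """Raise ValueError if value does not fit the named signed integer type."""
    low, high = _INT_RANGES[type_name]
    if not low <= value <= high:
        raise ValueError("object of %s out of range, got: %s" % (type_name, value))


check_int_range(127, "ByteType")  # within range, nothing is raised
try:
    check_int_range(128, "ByteType")  # one past the upper bound
except ValueError as exc:
    print(exc)
```

A lookup table keeps the bounds in one place, while the PR spells out each `isinstance` branch so the checks slot into the existing `if`/`elif` chain that handles `ArrayType`.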