[Func/Arith] Multiply, Division and Modulus (#32)

* feat: multiply arithmetic function
* feat: cudf options on multiply
* feat: division arithmetic function
* fix: supplemental description for On_domain_error and On_division_by_zero in divide function
* feat: add modulus function
* fix: modulus function options
* fix: modulus on_domain_error cases
* feat: overflow information in modulus function
* fix: NULL cases in modulus function
substrait-io · Feb 6, 2024 · a2a927a
1 parent 8a10e81
commit a2a927a
Showing 14 changed files with 362 additions and 23 deletions.
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -2,7 +2,7 @@
     "python.formatting.provider": "black",
     "editor.formatOnSave": true,
     "editor.codeActionsOnSave": {
-        "source.organizeImports": true
+        "source.organizeImports": "explicit"
     },
     "isort.args": [
         "--profile",

diff --git a/bft/cases/parser.py b/bft/cases/parser.py
@@ -14,7 +14,7 @@ def __init__(self):
     def __resolve_proto_case(self, case: ProtoCase, function: str) -> Case:
         if case.group not in self.__groups:
             raise Exception(
-                "A case referred to group {case.group} which was not defined in the file"
+                "A case referred to group " + case.group +" which was not defined in the file"
             )
         grp = self.__groups[case.group]
         return Case(function, grp, case.args, case.result, case.options)
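
The removed line was a plain string literal without an `f` prefix, so the braces would have been printed literally instead of the group name. A minimal sketch of the difference, assuming `case.group` is a string and using a hypothetical stand-in `ProtoCase` class (not the real one from `bft`):

```python
from dataclasses import dataclass


@dataclass
class ProtoCase:
    """Hypothetical stand-in for bft's ProtoCase; only `group` is modeled."""

    group: str


case = ProtoCase(group="division_by_zero")

# Before: no f prefix, so "{case.group}" is never interpolated.
before = "A case referred to group {case.group} which was not defined in the file"

# After (as in the commit): explicit string concatenation.
after = (
    "A case referred to group " + case.group + " which was not defined in the file"
)

# An f-string would be an equivalent alternative.
alternative = f"A case referred to group {case.group} which was not defined in the file"
```

Either the concatenation or the f-string puts the actual group name into the exception message.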

diff --git a/cases/arithmetic/divide.yaml b/cases/arithmetic/divide.yaml
@@ -61,7 +61,6 @@ cases:
       on_division_by_zero: ERROR
     result:
       special: error
-      type: i8
   - group: division_by_zero
     args:
       - value: 5

diff --git a/cases/arithmetic/modulus.yaml b/cases/arithmetic/modulus.yaml
@@ -75,26 +75,49 @@ cases:
       value: null
       type: i64
   - group:
-      id: division_by_zero
-      description: Examples demonstrating division by zero
+      id: on_domain_error
+      description: Examples demonstrating operation when the divisor is 0
     args:
       - value: 5
         type: i8
       - value: 0
         type: i8
     options:
-      on_division_by_zero: NONE
+      on_domain_error: "NULL"
     result:
       value: null
       type: i8
-  - group: division_by_zero
+  - group: on_domain_error
     args:
       - value: 5
         type: i8
       - value: 0
         type: i8
     options:
-      on_division_by_zero: NAN
+      on_domain_error: ERROR
     result:
-      value: nan
+      special: error
+  - group:
+      id: division_type
+      description: Examples demonstrating truncate and floor division types
+    args:
+      - value: 8
+        type: i8
+      - value: -3
+        type: i8
+    options:
+      division_type: TRUNCATE
+    result:
+      value: 2
+      type: i8
+  - group: division_type
+    args:
+      - value: 8
+        type: i8
+      - value: -3
+        type: i8
+    options:
+      division_type: FLOOR
+    result:
+      value: -1
       type: i8
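
The two `division_type` cases above follow from computing the remainder as `r = x - y * q`, where `q` is the quotient rounded according to the option. The sketch below is illustrative only: `modulus` is a hypothetical helper, not part of BFT, and it uses unbounded Python integers, so it does not model overflow.

```python
def modulus(x: int, y: int, division_type: str) -> int:
    """Remainder r = x - y * q, with q = x / y rounded per division_type."""
    if y == 0:
        # What happens here is governed by on_domain_error; not modeled.
        raise ZeroDivisionError("divisor is 0")
    if division_type == "FLOOR":
        q = x // y  # Python's // rounds toward negative infinity
    elif division_type == "TRUNCATE":
        q = abs(x) // abs(y)  # magnitude of the quotient
        if (x < 0) != (y < 0):
            q = -q  # restore the sign, i.e. round toward zero
    else:
        raise ValueError(f"unknown division_type: {division_type}")
    return x - y * q


truncated = modulus(8, -3, "TRUNCATE")  # q = -2, r = 8 - 6 = 2
floored = modulus(8, -3, "FLOOR")  # q = -3, r = 8 - 9 = -1
```

With TRUNCATE the remainder takes the sign of the dividend; with FLOOR it takes the sign of the divisor.
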
diff --git a/dialects/cudf.yaml b/dialects/cudf.yaml
@@ -12,19 +12,23 @@ scalar_functions:
       overflow: SILENT # cudf rolls over on overflow
       rounding: TIE_TO_EVEN
   - name: multiply
+    local_name: multiply
     required_options:
-      overflow: SILENT
+      overflow: SILENT # cudf rolls over on overflow
+      rounding: TIE_TO_EVEN
   - name: divide
+    local_name: divide
     required_options:
-      on_division_by_zero: LIMIT
-      overflow: SILENT
+      overflow: SILENT # cudf rolls over on overflow
       rounding: TIE_TO_EVEN
+      on_division_by_zero: LIMIT
+      on_domain_error: NAN
   - name: modulus
     local_name: mod
     required_options:
-      on_division_by_zero: NAN
+      division_type: FLOOR
       overflow: SILENT
-      rounding: TIE_TO_EVEN
+      on_domain_error: "NULL"
   - name: power
     local_name: pow
   - name: sqrt

diff --git a/dialects/datafusion.yaml b/dialects/datafusion.yaml
@@ -25,7 +25,7 @@ scalar_functions:
       overflow: ERROR
       rounding: TIE_TO_EVEN
   - name: modulus
-    unsupported: true
+    unsupported: True
   - name: power
   - name: sqrt
     required_options:

diff --git a/dialects/duckdb.yaml b/dialects/duckdb.yaml
@@ -30,9 +30,18 @@ scalar_functions:
     local_name: "%"
     infix: True
     required_options:
-      on_division_by_zero: NONE
+      division_type: TRUNCATE
       overflow: ERROR
-      rounding: TIE_TO_EVEN
+      on_domain_error: "NULL"
+    unsupported_kernels:
+      - args:
+          - fp32
+          - fp32
+        result: fp32
+      - args:
+          - fp64
+          - fp64
+        result: fp64
   - name: power
     required_options:
       overflow: ERROR

diff --git a/dialects/postgres.yaml b/dialects/postgres.yaml
@@ -68,8 +68,9 @@ scalar_functions:
     local_name: "%"
     infix: True
     required_options:
-      on_division_by_zero: NONE
+      division_type: TRUNCATE
       overflow: ERROR
+      on_domain_error: ERROR
     unsupported_kernels:
       - args:
           - i8

diff --git a/dialects/sqlite.yaml b/dialects/sqlite.yaml
@@ -38,8 +38,9 @@ scalar_functions:
     local_name: "%"
     infix: True
     required_options:
-      on_division_by_zero: NONE
+      division_type: TRUNCATE
       overflow: SILENT
+      on_domain_error: "NULL"
   - name: power
   - name: sqrt
     required_options:

diff --git a/dialects/velox_presto.yaml b/dialects/velox_presto.yaml
@@ -99,8 +99,9 @@ scalar_functions:
   - name: modulus
     local_name: mod
     required_options:
-      on_division_by_zero: NONE
+      division_type: TRUNCATE
       overflow: ERROR
+      on_domain_error: ERROR
     unsupported_kernels:
       - args:
           - i8

diff --git a/supplemental/arithmetic/divide.md b/supplemental/arithmetic/divide.md
@@ -0,0 +1,116 @@
+# Divide
+
+## Options
+
+### Overflow
+
+Dividing two integers can trigger an overflow when the result is outside the
+representable range of the type class. This option controls what happens when
+this overflow occurs.
+
+#### SILENT
+
+If an overflow occurs then an integer value will be returned. The value is
+undefined. It may be any integer and can change from engine to engine or
+even from row to row within the same query. The only constraint is that it
+must be a valid value for the result type class (e.g. dividing two int16
+values cannot yield an int32 on overflow).
+
+#### SATURATE
+
+If an overflow occurs then the largest (for positive overflow) or smallest
+(for negative overflow) possible value for the type class will be returned.
+
+#### ERROR
+
+If an overflow occurs then an error should be raised.
+
+### Rounding
+
+Dividing two floating point numbers can yield a result that is not exactly
+representable in the given type class. In this case the value will be rounded.
+Rounding behaviors are defined as part of the IEEE 754 standard.
+
+#### TIE_TO_EVEN
+
+Round to the nearest value. If the number is exactly halfway between two
+values then round to the number whose least significant digit is even. Or,
+because we are working with binary digits, round to the number whose last digit
+is 0. This is the default behavior in many systems because it helps to avoid
+bias in rounding.
+
+#### TIE_AWAY_FROM_ZERO
+
+Round to the nearest value. If the number is exactly halfway between two values
+then round to the number furthest from zero.
+
+#### TRUNCATE
+
+Round to the value closest to zero, i.e. the nearest representable value whose
+magnitude does not exceed that of the exact result.
+
+#### CEILING
+
+Round to the value closest to positive infinity.
+
+#### FLOOR
+
+Round to the value closest to negative infinity.
+
+### On_domain_error
+
+This option controls what happens when the dividend and divisor in a divide
+function are either both 0 or both ±infinity.
+
+#### NAN
+
+Return a Not a Number value if the dividend and the divisor are either both 0 or
+both ±infinity.
+
+#### ERROR
+
+If the dividend and the divisor are either both 0 or both ±infinity, an error
+should be raised.
+
+### On_division_by_zero
+
+This option controls the function's behavior when the divisor is 0 but the dividend is not 0.
+
+#### LIMIT
+
+Return +infinity or -infinity depending on the signs of the dividend and the divisor.
+
+## Details
+
+### Other floating point exceptions
+
+The IEEE 754 standard defines a number of floating point exceptions beyond
+rounding, for example overflow and underflow. However, these exceptions
+have default behaviors defined by IEEE 754 and, since no known engine deviates
+from these default values, these exceptions are not exposed as options. For more
+information on what happens in these cases refer to the IEEE 754 standard.
+
+### Not commutative
+
+Division, the algebraic operation, is not commutative: divide(a, b) generally
+differs from divide(b, a). It is not associative either, and overflow can
+change the result even in cases where regrouping is mathematically harmless.
+For example, when working with int8, divide(divide(-128, -1), -1) can yield a
+different result than divide(-128, divide(-1, -1)) because the first will
+overflow and the second will not.
+
+## Properties
+
+### Null propagating
+
+If any of the inputs is null then the output will be null.
+
+### NaN propagating
+
+If any of the inputs is NaN (and the other input is not null) then the output
+will be NaN.
+
+### Stateless
+
+The output will be the same regardless of the order of input rows. This is not
+guaranteed to be true for integer division when overflow is SILENT.
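
The options described in `divide.md` can be sketched in plain Python. This is an illustration under stated assumptions, not the behavior of any particular engine: `divide_i8` and `divide_fp` are hypothetical helpers, integer division is assumed to truncate toward zero, and two's-complement wraparound is only one of the values SILENT permits.

```python
import math

I8_MIN, I8_MAX = -128, 127


def divide_i8(x: int, y: int, overflow: str = "ERROR") -> int:
    """int8 division (assumed truncating, y != 0) with an overflow policy."""
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    if I8_MIN <= q <= I8_MAX:
        return q
    if overflow == "SATURATE":
        return I8_MAX if q > 0 else I8_MIN
    if overflow == "SILENT":
        # Any int8 value is allowed; wraparound is just one possibility.
        return (q - I8_MIN) % 256 + I8_MIN
    raise OverflowError(f"{x} / {y} does not fit in int8")


def divide_fp(
    x: float,
    y: float,
    on_domain_error: str = "NAN",
    on_division_by_zero: str = "LIMIT",
) -> float:
    """Floating point division with explicit handling of the special cases."""
    if math.isnan(x) or math.isnan(y):
        return math.nan  # NaN propagating
    both_zero = x == 0 and y == 0
    both_inf = math.isinf(x) and math.isinf(y)
    if both_zero or both_inf:
        if on_domain_error == "ERROR":
            raise ArithmeticError("domain error in divide")
        return math.nan
    if y == 0:
        if on_division_by_zero == "LIMIT":
            # Sign of the result follows the signs of both operands.
            sign = math.copysign(1.0, x) * math.copysign(1.0, y)
            return math.copysign(math.inf, sign)
        raise ZeroDivisionError("division by zero")
    return x / y


# Regrouping changes the int8 result once overflow handling kicks in.
left = divide_i8(divide_i8(-128, -1, "SATURATE"), -1, "SATURATE")  # -127
right = divide_i8(-128, divide_i8(-1, -1, "SATURATE"), "SATURATE")  # -128
```

Under SATURATE the inner `divide_i8(-128, -1)` is clamped to 127, which is why the two groupings disagree even though both equal -128 mathematically.
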
diff --git a/supplemental/arithmetic/modulus.md b/supplemental/arithmetic/modulus.md
@@ -0,0 +1,92 @@
+# Modulus
+
+## Options
+
+### Overflow
+
+The modulus operation is typically computed from the quotient,
+i.e., mod(x, y) = x - y * round_func(x/y), where round_func
+truncates, floors, or otherwise rounds the quotient to an integer. Any step of
+this computation may trigger an overflow when its result is outside the
+representable range of the type class. This option controls what happens when this overflow occurs.
+
+#### SILENT
+
+If an overflow occurs then an integer value will be returned. The value is
+undefined. It may be any integer and can change from engine to engine or
+even from row to row within the same query. The only constraint is that it
+must be a valid value for the result type class (e.g. modulus of int16
+values cannot yield an int32 on overflow).
+
+#### SATURATE
+
+If an overflow occurs then the largest (for positive overflow) or smallest
+(for negative overflow) possible value for the type class will be returned.
+
+#### ERROR
+
+If an overflow occurs then an error should be raised.
+
+### Division_type
+
+Determines how the quotient is rounded to an integer before the
+remainder is computed. The remainder is given by
+r = x - y * round_func(x/y).
+
+#### TRUNCATE
+
+The quotient x/y is truncated, i.e. round_func rounds the
+fractional result towards zero.
+
+#### FLOOR
+
+The quotient x/y is floored, i.e. round_func rounds the
+fractional result to the largest integer value less than
+or equal to it.
+
+### On_domain_error
+
+This option controls what happens when the dividend is ±infinity or
+the divisor is 0 or ±infinity in a modulus function.
+
+#### NULL
+
+Return a NULL if the dividend is ±infinity or the divisor is 0
+or ±infinity.
+
+#### ERROR
+
+If the dividend is ±infinity or the divisor is 0 or ±infinity,
+an error should be raised.
+
+## Details
+
+### Overflow
+
+The Modulus function requires the Overflow option because any of
+the intermediate operations can overflow the range of the type.
+For example, mod(-128, -1) on int8 is evaluated as
+(-128) - (-1) * round_func(-128/-1). The division (-128/-1)
+yields 128, which overflows because the range of int8 is
+-128 to 127, even though the final remainder, 0, is representable.
+This is why the Overflow option is needed.
+
+### Not commutative
+
+Modulus as an arithmetic operation is not commutative by nature.
+
+## Properties
+
+### Null propagating
+
+If any of the inputs is null then the output will be null.
+
+### NaN propagating
+
+If any of the inputs is NaN (and the other input is not null) then the output
+will be NaN.
+
+### Stateless
+
+The output will be the same regardless of the order of input rows. This is not
+guaranteed to be true for integer modulus when overflow is SILENT.
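
The overflow example in `modulus.md` can be traced step by step. The sketch below is illustrative: `mod_i8_steps` is a hypothetical helper that assumes TRUNCATE division and works with unbounded Python integers so that the out-of-range intermediate value stays visible.

```python
I8_MIN, I8_MAX = -128, 127


def fits_i8(value: int) -> bool:
    """True if value is representable as an int8."""
    return I8_MIN <= value <= I8_MAX


def mod_i8_steps(x: int, y: int) -> tuple[int, int, bool]:
    """Return (quotient, remainder, quotient_overflows) for TRUNCATE modulus.

    Assumes y != 0. The arithmetic is exact; the flag reports whether an
    int8 engine would overflow while computing the quotient.
    """
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q  # truncate toward zero
    r = x - y * q
    return q, r, not fits_i8(q)


quotient, remainder, overflows = mod_i8_steps(-128, -1)
# quotient is 128 (outside int8), remainder is 0 (inside int8)
```

The remainder itself is representable, so an engine only needs the Overflow option here if it materializes the quotient in the same integer type.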