Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rounding functions to the Column type #6817

Merged
merged 31 commits into from
Jun 1, 2023
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
8c34ebb
move each_propagate into Range
GregoryTravis May 23, 2023
3d8b55c
Column_Ops
GregoryTravis May 23, 2023
2aa2302
round
GregoryTravis May 23, 2023
5f0d00f
truncate ceil floor
GregoryTravis May 23, 2023
b8e5282
unused
GregoryTravis May 23, 2023
97ab8ec
all but date
GregoryTravis May 23, 2023
07527b6
examples
GregoryTravis May 23, 2023
b0ed569
cleanup
GregoryTravis May 23, 2023
c3818dc
cleanup
GregoryTravis May 23, 2023
35d83a1
date_time truncate
GregoryTravis May 24, 2023
430090f
cleanup
GregoryTravis May 24, 2023
f97554d
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 24, 2023
2db1924
fix format tests
GregoryTravis May 24, 2023
8a62af2
check column types
GregoryTravis May 24, 2023
759de4d
each_p no
GregoryTravis May 24, 2023
d616cea
noop
GregoryTravis May 24, 2023
b85f558
changelog, test fix
GregoryTravis May 24, 2023
82ed443
java fmt
GregoryTravis May 24, 2023
0eeb10e
review, throw
GregoryTravis May 25, 2023
e9be9d5
move adapter
GregoryTravis May 25, 2023
91babe2
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 25, 2023
01f0f65
limit round input for float too
GregoryTravis May 26, 2023
d6aa9ef
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 26, 2023
4a4e886
unimplemented stubs
GregoryTravis May 26, 2023
25cbea1
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 26, 2023
ac93d31
generalize error
GregoryTravis May 30, 2023
45c1754
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 30, 2023
ea1fd0b
remove big num test
GregoryTravis May 31, 2023
a48cf30
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis May 31, 2023
2fab69b
fix test
GregoryTravis Jun 1, 2023
a31681f
Merge branch 'develop' into wip/gmt/6805-col-round
GregoryTravis Jun 1, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -470,6 +470,7 @@
- [Added `Date_Range`.][6621]
- [Implemented the `cast` operation for `Table` and `Column`.][6711]
- [Added `.round` and `.int` to `Integer` and `Decimal`.][6743]
- [Added `.round`, `.truncate`, `.ceil`, and `.floor` to `Column`.][6817]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
Expand Down Expand Up @@ -682,6 +683,7 @@
[6621]: https://github.com/enso-org/enso/pull/6621
[6711]: https://github.com/enso-org/enso/pull/6711
[6743]: https://github.com/enso-org/enso/pull/6743
[6817]: https://github.com/enso-org/enso/pull/6817

#### Enso Compiler

Expand Down
33 changes: 19 additions & 14 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Numbers.enso
Original file line number Diff line number Diff line change
Expand Up @@ -601,14 +601,14 @@ type Decimal
Use Banker's Rounding.

2.5 . round use_bankers=True == 2
round : Integer -> Integer | Decimal ! Illegal_Argument
round : Integer -> Boolean -> Integer | Decimal ! Illegal_Argument
round self decimal_places=0 use_bankers=False =
check_decimal_places decimal_places <|
case self.is_nan || self.is_infinite of
True ->
msg = "round cannot accept " + self.to_text
Error.throw (Arithmetic_Error.Error msg)
False ->
False -> check_round_input self <|
decimal_result = case use_bankers of
False ->
scale = 10 ^ decimal_places
Expand Down Expand Up @@ -923,18 +923,13 @@ type Integer
Round to the nearest hundred, using Banker's Rounding.

12250 . round -2 use_bankers=True == 12200
round : Integer -> Integer ! Illegal_Argument
round : Integer -> Boolean -> Integer ! Illegal_Argument
round self decimal_places=0 use_bankers=False =
check_decimal_places decimal_places <|
case self < round_min_long || self > round_max_long of
True ->
msg = "Error: Integer.round can only accept values between " + round_min_long.to_text + " and " + round_max_long.to_text + "(inclusive), but was " + self.to_text
Error.throw (Illegal_Argument.Error msg)
False ->
## It's already an integer so unless decimal_places is
negative, the value is unchanged.
if decimal_places >= 0 then self else
self.to_decimal.round decimal_places use_bankers . truncate
check_decimal_places decimal_places <| check_round_input self <|
## It's already an integer so unless decimal_places is
negative, the value is unchanged.
if decimal_places >= 0 then self else
self.to_decimal.round decimal_places use_bankers . truncate

## Compute the negation of this.

Expand Down Expand Up @@ -1132,8 +1127,18 @@ round_max_long = 99999999999999
round_min_long : Integer
round_min_long = -99999999999999

## PRIVATE
Restrict rounding decimal_places parameter.
check_decimal_places : Integer -> Function
check_decimal_places decimal_places ~action =
if decimal_places >= round_min_decimal_places && decimal_places <= round_max_decimal_places then action else
msg = "round: decimal_places must be between " + round_min_decimal_places.to_text + " and " + round_max_decimal_places.to_text + "(inclusive), but was " + decimal_places.to_text
msg = "round: decimal_places must be between " + round_min_decimal_places.to_text + " and " + round_max_decimal_places.to_text + " (inclusive), but was " + decimal_places.to_text
Error.throw (Illegal_Argument.Error msg)

## PRIVATE
Restrict allowed range of input to rounding methods.
check_round_input : Number -> Function
check_round_input n ~action =
if n >= round_min_long && n <= round_max_long then action else
msg = "Error: `round` can only accept values between " + round_min_long.to_text + " and " + round_max_long.to_text + " (inclusive), but was " + n.to_text
Error.throw (Illegal_Argument.Error msg)
16 changes: 16 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,22 @@ type Range
@Tail_Call go current+self.step
go self.start

## PRIVATE
ADVANCED

Executes a function for each element in the range. Exits early if the body
produces an `Error`.
each_propagate : (Integer -> Nothing) -> Nothing ! Error
each_propagate self function =
if self.step == 0 then throw_zero_step_error else
end_condition = if self.step > 0 then (>=) else (<=)
go current =
if end_condition current self.end then Nothing else
result = function current
result.if_not_error <|
@Tail_Call go current+self.step
go self.start

## PRIVATE
ADVANCED
Applies a function to each element of the range.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -557,6 +557,27 @@ type Column
op_result = self.make_op "IIF" [when_true, when_false] new_name
adapt_unified_column op_result common_type

## Rounding values is not supported in database columns.
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
round : Integer -> Boolean -> Problem_Behavior -> Column | Illegal_Argument | Invalid_Value_Type
round self decimal_places=0 use_bankers=False on_problems=Report_Warning =
_ = [decimal_places, use_bankers, on_problems]
Error.throw <| Unsupported_Database_Operation.Error "`Column.round` is not implemented yet for the Database backends."

## Truncating values is not supported in database columns.
truncate : Column ! Invalid_Value_Type
truncate self =
Error.throw <| Unsupported_Database_Operation.Error "`Column.truncate` is not implemented yet for the Database backends."

## Taking the ceiling of values is not supported in database columns.
ceil : Column ! Invalid_Value_Type
ceil self =
Error.throw <| Unsupported_Database_Operation.Error "`Column.ceil` is not implemented yet for the Database backends."

## Taking the floor of values is not supported in database columns.
floor : Column ! Invalid_Value_Type
floor self =
Error.throw <| Unsupported_Database_Operation.Error "`Column.floor` is not implemented yet for the Database backends."

## Returns a column of first non-`Nothing` value on each row of `self` and
`values` list.

Expand Down
119 changes: 114 additions & 5 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import project.Data.Type.Storage
import project.Data.Type.Value_Type_Helpers
import project.Data.Table.Table
import project.Internal.Cast_Helpers
import project.Internal.Column_Ops
import project.Internal.Java_Problems
import project.Internal.Naming_Helpers.Naming_Helpers
import project.Internal.Parse_Values_Helper
Expand All @@ -23,8 +24,9 @@ from project.Internal.Column_Format import all
from project.Data.Table import print_table
from project.Data.Type.Value_Type import Value_Type, Auto
from project.Errors import No_Index_Set_Error, Floating_Point_Equality, Invalid_Value_Type, Inexact_Type_Coercion, Conversion_Failure
from project.Internal.Java_Exports import make_string_builder
from project.Internal.Java_Exports import make_string_builder, make_double_builder, make_long_builder, make_date_builder_adapter

polyglot java import org.enso.table.data.column.builder.object.DateBuilder
polyglot java import org.enso.table.data.column.operation.map.MapOperationProblemBuilder
polyglot java import org.enso.table.data.column.storage.Storage as Java_Storage
polyglot java import org.enso.table.data.table.Column as Java_Column
Expand Down Expand Up @@ -656,6 +658,113 @@ type Column
rs = s.iif true_val false_val storage_type
Column.Value (Java_Column.new new_name rs)

## Round the values in a numeric column to a specified number of decimal
places.

For integers, rounding to 0 or more decimal places simply returns the
argument. For negative decimal places, see below.

By default, rounding uses "asymmetric round-half-up", also known as
"round towards positive infinity." If use_bankers=True, then it uses
"round-half-even", also known as "banker's rounding".

If the column is of type `Float` and `decimal_places` > 0, `round`
returns a column of `Float`; otherwise, it returns a column of
`Integer`.

Arguments:
- decimal_places: The number of decimal places to round to. Can be
negative, which results in rounding to positive integer powers of 10.
Must be between -15 and 15 (inclusive).
- use_bankers: Rounds mid-point to nearest even number.
- on_problems: Specifies how to handle if a problem occurs, raising as a
warning by default.

! Error Conditions
Reports `Illegal_Argument` if the number is 15 or more decimal places.
Above 14 digits, it is possible that the underlying long, converted to
double in the rounding process, would lose precision in the least
significant bits.
(See https://en.wikipedia.org/wiki/Double-precision_floating-point_format.)

If `decimal_places` is outside the range -15..15 (inclusive), an
`Illegal_Argument` error is thrown.

? Negative decimal place counts
Rounding to `n` digits can be thought of as "rounding to the nearest
multiple of 10^(-n)". For negative decimal counts, this results in
rounding to the nearest positive integer power of 10.

> Example
Round a column of `Decimal` values`.

Column.from_vector "foo" [1.2, 2.3, 3.6] . round == (Column.from_vector "foo" [1, 2, 4])
round : Integer -> Boolean -> Problem_Behavior -> Column | Illegal_Argument | Invalid_Value_Type
round self decimal_places=0 use_bankers=False on_problems=Report_Warning = Value_Type.expect_numeric self <|
# If it's an integer column and decimal_places >=0 then it's a no-op.
if self.value_type.is_integer && decimal_places >= 0 then self else
returns_double = decimal_places > 0
builder = if returns_double then make_double_builder else make_long_builder
fun = _.round decimal_places use_bankers
Column_Ops.map_over_storage self fun builder skip_nothing=True on_problems=on_problems

## ALIAS int

If the column is numeric, truncate the floating-point values to an
integer by dropping the fractional part. This is equivalent to
"round-toward-zero". If the column is of type `Date_Time`, truncates the
values to `Date`.

> Example
Truncate a column of `Decimal` values.

Column.from_vector "foo" [1.25, 2.33, 3.57] . truncate == (Column.from_vector "foo" [1, 2, 3])

> Example
Truncate a column of `Date_Time` values.
date_times = Column.from_vector "foo" [Date_Time.new 2020 10 24 1 2 3, Date_Time.new 2020 10 24 1 2 3]
dates = Column.from_vector "foo" [Date.new 2020 10 24, Date.new 2020 10 24]
col.truncate == dates
truncate : Column ! Invalid_Value_Type
truncate self =
case self.value_type.is_numeric of
True ->
fun = _.truncate
Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True
False -> case self.value_type == Value_Type.Date_Time of
True ->
fun = _.date
Column_Ops.map_over_storage self fun make_date_builder_adapter skip_nothing=True
False -> Error.throw <| Invalid_Value_Type.Column "Numeric or Date_Time" self.value_type

## Computes the nearest integer above this number for values in a numeric
column.

Returns a column of `Integer`.

> Example
Take the ceiling of a column of `Decimal` values.

Column.from_vector "foo" [1.25, 2.33, 3.57] . ceil == (Column.from_vector "foo" [2, 3, 4])
ceil : Column ! Invalid_Value_Type
ceil self = Value_Type.expect_numeric self <|
fun = _.ceil
Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True
Comment on lines +749 to +752
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure if these operations should be implemented in Enso.

Currently most of the 'primitive' operations, especially on numeric columns are implemented in Java, for better efficiency.

Do we want to move towards doing these in Enso? I think we should measure what is the difference in performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue.

Currently I see that the ceil method may return an EnsoBigInteger (refer to the decimal/CeilNode.java).

However, currently the Table library storage does not support Big Integer storage - only LongStorage. So this will likely fail in some way.

I think we need to have tests checking what happens in the scenario where the double value magnitude exceeds the long range, to verify the behaviour.

IMO currently it could be valid to return Nothing in such a case and attach some kind of Arithmetic_Overflow exception. Extending Table to handle Big Integers is a separate issue (currently unscheduled, @jdunkerley do we have plans to support this?)

Copy link
Member

@radeusgd radeusgd May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear about the first issue. The map_over_storage is OK. Its usage is warranted when the operation we perform on each row must run some Enso logic (some complicated Enso function that is not worth replicating into Java, or has so expensive logic anyway the benefit of inlining it to Java would be negligible).

But I think we shouldn't use Enso operations for primitive operations and instead implement them directly in Java. That is the current architecture of the primitive-value storages in the Table library. We can discuss if it should be changed, but to do so I think we need to do some performance measurements to take informed decisions.

(For example one of the reasons we do the primitive operations in Java is that on long/double types it allows to completely avoid boxing the values, which I don't think is avoidable when talking between Enso and Java).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think worth making this a vectorised operation on the NumberStorage.

We can use the Core_Utils to share the common implementation. Lets put it as a follow up ticket though please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the Core_Utils to share the common implementation. Lets put it as a follow up ticket though please.

We could make the round implementation in Core_Utils, but then it would no longer be able to be written in Enso...

As for all the others - ceil, floor and I think truncate - we seem to be doing just the basic Java operations Math.ceil etc. in the 'primitive' path. The only additional handling is to handle promotion to BigInteger if the result exceeds the range of long. But BigInteger is currently not supported by Table anyway, so I don't think sharing the implementation of these gives us any value at this moment - because the pure Enso side needs the additional BigInteger handling that has some Truffle magic, and the Table side needs to just report a warning/failure in case of an overflow - the common part is just delegating to Math.ceil and co. which IMO isn't enough to make sharing worth it.

(But I'd make these vectorized nonetheless, I'm only unsure about round as that would require us moving the whole implementation into Java - it could make it more efficient for tables though, so may as well be worth it, but it's quite a bit more work).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - not need to put in Core_Utils then.

Worth us seeing the relative performance of rounding 1,000,000 values as a map vs a vectorised op.


## Computes the nearest integer below this number for values in a numeric
column.

Returns a column of `Integer`.

> Example
Take the floor of a column of `Decimal` values.

Column.from_vector "foo" [1.25, 2.33, 3.57] . floor == (Column.from_vector "foo" [1, 2, 3])
floor : Column ! Invalid_Value_Type
floor self = Value_Type.expect_numeric self <|
fun = _.floor
Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True

## Returns a column of first non-`Nothing` value on each row of `self` and
`values` list.

Expand Down Expand Up @@ -1278,18 +1387,18 @@ type Column
new_column = case format of
"" ->
formatter = .to_text
map_over_storage self formatter make_string_builder
Column_Ops.map_over_storage self formatter make_string_builder on_problems=Problem_Behavior.Report_Error
Nothing ->
formatter = .to_text
map_over_storage self formatter make_string_builder
Column_Ops.map_over_storage self formatter make_string_builder on_problems=Problem_Behavior.Report_Error
_ : Text ->
formatter = create_formatter
formatter.if_not_error <|
map_over_storage self (formatter format=format) make_string_builder
Column_Ops.map_over_storage self (formatter format=format) make_string_builder on_problems=Problem_Behavior.Report_Error
format_column : Column -> Value_Type.expect_text format_column <|
formatter = create_formatter
formatter.if_not_error <|
map_2_over_storage self format_column formatter make_string_builder
Column_Ops.map_2_over_storage self format_column formatter make_string_builder
_ -> Error.throw <| Illegal_Argument.Error <| "Unsupported format type: " + format.to_text
new_column

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,59 +56,3 @@ handle_illegal_argument_exception format_string ~action =
Error.throw (Illegal_Argument.Error msg)
Panic.catch IllegalArgumentException handler=handler <|
Panic.catch UnsupportedTemporalTypeException handler=handler action

## PRIVATE
Iterate over a range, exiting early if the body produces an `Error`.
each_propagate : Range -> (Number -> Any) -> Nothing ! Error
each_propagate range function =
if range.step == 0 then Error.throw (Illegal_State.Error "A range with step = 0 is ill-formed.") else
end_condition = if range.step > 0 then (>=) else (<=)
go current =
if end_condition current range.end then Nothing else
result = function current
result.if_not_error <|
@Tail_Call go current+range.step
go range.start

## PRIVATE
Map a text-returning function over the column values, using Storage directly.
The output column has the same name as the input.
map_over_storage : Column -> (Any -> Text) -> (Integer -> Any) -> Boolean -> Column
map_over_storage input_column function builder skip_nothing=True =
input_storage = input_column.java_column.getStorage
num_input_rows = input_storage.size
output_storage_builder = builder num_input_rows
ok = each_propagate (0.up_to num_input_rows) i->
input_value = input_storage.getItemBoxed i
if skip_nothing && input_value.is_nothing then output_storage_builder.append Nothing else
output_value = function input_value
output_value.if_not_error
output_storage_builder.append output_value
ok.if_not_error <|
output_storage = output_storage_builder.seal
Column.from_storage input_column.name output_storage

## PRIVATE
Map a text-returning function over the values of two columns, using Storage
directly. The output column has the same name as the first input column.
`skip_nothing` applies to the first input to the function, not both inputs.
map_2_over_storage : Column -> Column -> (Any -> Any -> Text) -> (Integer -> Any) -> Boolean -> Column
map_2_over_storage input_column_0 input_column_1 function builder skip_nothing=True =
input_storage_0 = input_column_0.java_column.getStorage
input_storage_1 = input_column_1.java_column.getStorage
case input_storage_0.size != input_storage_1.size of
True ->
msg = "Column lengths differ: " + input_storage_0.size.to_text + " != " + input_storage_1.size.to_text
Error.throw (Illegal_Argument.Error msg)
False ->
num_input_rows = input_storage_0.size
output_storage_builder = builder num_input_rows
ok = each_propagate (0.up_to num_input_rows) i->
input_value_0 = input_storage_0.getItemBoxed i
input_value_1 = input_storage_1.getItemBoxed i
if skip_nothing && input_value_0.is_nothing then output_storage_builder.append Nothing else
output_value = function input_value_0 input_value_1
output_storage_builder.append output_value
ok.if_not_error <|
output_storage = output_storage_builder.seal
Column.from_storage input_column_0.name output_storage
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
from Standard.Base import all

import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

import project.Data.Column.Column
import project.Internal.Problem_Builder.Problem_Builder

## PRIVATE
Map a text-returning function over the column values, using Storage directly.
The output column has the same name as the input.
map_over_storage : Column -> (Any -> Text) -> (Integer -> Any) -> Boolean -> Problem_Behavior -> Column
map_over_storage input_column function builder skip_nothing=True on_problems=Report_Warning =
problem_builder = Problem_Builder.new
input_storage = input_column.java_column.getStorage
num_input_rows = input_storage.size
output_storage_builder = builder num_input_rows
0.up_to num_input_rows . each i->
input_value = input_storage.getItemBoxed i
if skip_nothing && input_value.is_nothing then output_storage_builder.append Nothing else
output_value = function input_value . catch Any err->
problem_builder.report_other_warning err
Nothing
output_storage_builder.append output_value
output_storage = output_storage_builder.seal
new_column = Column.from_storage input_column.name output_storage
problem_builder.attach_problems_after on_problems new_column

## PRIVATE
Map a text-returning function over the values of two columns, using Storage
directly. The output column has the same name as the first input column.
`skip_nothing` applies to the first input to the function, not both inputs.
map_2_over_storage : Column -> Column -> (Any -> Any -> Text) -> (Integer -> Any) -> Boolean -> Column
map_2_over_storage input_column_0 input_column_1 function builder skip_nothing=True =
input_storage_0 = input_column_0.java_column.getStorage
input_storage_1 = input_column_1.java_column.getStorage
case input_storage_0.size != input_storage_1.size of
True ->
msg = "Column lengths differ: " + input_storage_0.size.to_text + " != " + input_storage_1.size.to_text
Error.throw (Illegal_Argument.Error msg)
False ->
num_input_rows = input_storage_0.size
output_storage_builder = builder num_input_rows
ok = 0.up_to num_input_rows . each_propagate i->
input_value_0 = input_storage_0.getItemBoxed i
input_value_1 = input_storage_1.getItemBoxed i
if skip_nothing && input_value_0.is_nothing then output_storage_builder.append Nothing else
output_value = function input_value_0 input_value_1
output_storage_builder.append output_value
ok.if_not_error <|
output_storage = output_storage_builder.seal
Column.from_storage input_column_0.name output_storage
Loading