Add rounding functions to the Column type #6817

GregoryTravis · 2023-05-23T18:42:03Z

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso

radeusgd · 2023-05-25T11:25:37Z

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso

+    ceil : Column ! Invalid_Value_Type
+    ceil self = Value_Type.expect_numeric self <|
+        fun = _.ceil
+        Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True


I'm really not sure if these operations should be implemented in Enso.

Currently, most of the 'primitive' operations, especially on numeric columns, are implemented in Java for better efficiency.

Do we want to move towards doing these in Enso? If so, I think we should first measure the difference in performance.

Another issue.

I see that the ceil method may currently return an EnsoBigInteger (see decimal/CeilNode.java).

However, the Table library does not currently support Big Integer storage, only LongStorage, so this will likely fail in some way.
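To illustrate that "fail in some way" may mean failing silently, here is a minimal Java sketch (not code from this PR) of a naive primitive ceil that narrows the result of Math.ceil with a plain cast. Under Java's narrowing rules an out-of-range double saturates to Long.MAX_VALUE or Long.MIN_VALUE and NaN becomes 0, with no exception. The class name CastSaturation is hypothetical.

```java
// Hypothetical illustration: a naive double -> long ceil, as a primitive
// Java path might do it if it only called Math.ceil and cast the result.
final class CastSaturation {
    /** Rounds up, then narrows with a plain (long) cast; no overflow check. */
    static long naiveCeilToLong(double x) {
        return (long) Math.ceil(x);
    }

    /** The value used in the proposed test: (Long.MAX_VALUE + 1.0) * 100.0, about 9.22e20. */
    static double tooBigDouble() {
        return ((double) Long.MAX_VALUE + 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Prints 9223372036854775807 (Long.MAX_VALUE): the overflow is silently clamped.
        System.out.println(naiveCeilToLong(tooBigDouble()));
        // Prints 0: NaN narrows to zero.
        System.out.println(naiveCeilToLong(Double.NaN));
    }
}
```

A test on the too-big value would catch this clamping, which otherwise produces a plausible-looking but wrong number.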

I think we need tests checking what happens when the magnitude of the double value exceeds the range of long, to verify the behaviour.

IMO, for now it would be valid to return Nothing in such a case and attach some kind of Arithmetic_Overflow exception. Extending Table to handle Big Integers is a separate issue (currently unscheduled; @jdunkerley, do we have plans to support this?)
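A minimal Java sketch of the "return Nothing and attach a problem" behaviour for ceil on a double column, assuming a storage made of a primitive long[] plus a BitSet of missing rows. CheckedCeil, Result and the problem messages are hypothetical names, not the Table library's API; NaN and infinities are treated the same way as out-of-range values.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: ceil over a double column that yields a missing value
// (the analogue of Nothing) plus a problem for results that do not fit in a long.
final class CheckedCeil {
    // 2^63 as a double. Any integral double in [-2^63, 2^63) fits in a long exactly.
    private static final double TWO_POW_63 = 0x1p63;

    static final class Result {
        final long[] values;         // meaningful only where missing.get(i) is false
        final BitSet missing;        // set bits play the role of Nothing
        final List<String> problems; // attached warnings

        Result(long[] values, BitSet missing, List<String> problems) {
            this.values = values;
            this.missing = missing;
            this.problems = problems;
        }
    }

    static Result ceil(double[] input) {
        long[] values = new long[input.length];
        BitSet missing = new BitSet(input.length);
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {
            double c = Math.ceil(input[i]);
            // NaN fails every comparison, so it is checked explicitly; infinities fail the range check.
            if (Double.isNaN(c) || c < -TWO_POW_63 || c >= TWO_POW_63) {
                missing.set(i);
                problems.add("Row " + i + ": ceil of " + input[i] + " cannot be represented as a long");
            } else {
                values[i] = (long) c;
            }
        }
        return new Result(values, missing, problems);
    }
}
```

This records one problem per row for simplicity; a real implementation would probably aggregate them so a large column does not flood the user with warnings.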

To be clear about the first issue: map_over_storage itself is OK. Its use is warranted when the per-row operation must run some Enso logic (a complicated Enso function that is not worth replicating in Java, or logic so expensive that the benefit of moving it to Java would be negligible).

But I think we shouldn't use Enso operations for primitive operations and should instead implement them directly in Java. That is the current architecture of the primitive-value storages in the Table library. We can discuss changing it, but to do so I think we need some performance measurements to make informed decisions.

(For example, one of the reasons we do the primitive operations in Java is that, on long/double types, it lets us avoid boxing the values entirely, which I don't think is possible when passing values between Enso and Java.)

I think it is worth making this a vectorised operation on the NumberStorage.

We can use Core_Utils to share the common implementation. Let's put it in a follow-up ticket, though, please.

We could put the round implementation in Core_Utils, but then it could no longer be written in Enso...
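For a sense of what a Java-side round could look like, here is a minimal sketch using BigDecimal decimal rounding. This is one possible technique, not Enso's round algorithm. It assumes round takes a number of decimal places (negative values round to tens, hundreds, and so on, as in `231.2 . round -1`) and a banker's-rounding flag; JavaRound is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical Java-side round for a single double, via BigDecimal.
// Not Enso's actual implementation.
final class JavaRound {
    /**
     * Rounds x to the given number of decimal places. Ties go away from zero
     * (HALF_UP) unless useBankers is set, in which case they go to the nearest
     * even digit (HALF_EVEN).
     *
     * @throws NumberFormatException if x is NaN or infinite (BigDecimal cannot represent them)
     */
    static double round(double x, int decimalPlaces, boolean useBankers) {
        RoundingMode mode = useBankers ? RoundingMode.HALF_EVEN : RoundingMode.HALF_UP;
        // BigDecimal.valueOf goes through Double.toString, so 2.675 is treated as the
        // decimal 2.675 rather than its binary value, which is slightly below 2.675.
        return BigDecimal.valueOf(x).setScale(decimalPlaces, mode).doubleValue();
    }
}
```

Note that HALF_UP rounds negative ties away from zero (-2.5 becomes -3.0), which may or may not match Enso's tie rule; that would need checking against the existing Numbers_Spec tests before such a port.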

As for the others (ceil, floor and, I think, truncate), the 'primitive' path seems to just call the basic Java operations such as Math.ceil. The only additional handling is promotion to BigInteger when the result exceeds the range of long. But BigInteger is currently not supported by Table anyway, so I don't think sharing the implementation of these gives us any value at the moment: the pure Enso side needs the extra BigInteger handling, which involves some Truffle magic, while the Table side just needs to report a warning or failure on overflow. The common part is just delegating to Math.ceil and co., which IMO isn't enough to make sharing worthwhile.

(But I'd make these vectorized nonetheless. I'm only unsure about round, as that would require moving the whole implementation into Java. That could make it more efficient for tables, so it may well be worth it, but it's quite a bit more work.)
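The difference between the two paths described above can be sketched in plain Java: the pure-number path promotes to BigInteger when the result leaves the long range, whereas a Table storage would treat that same branch as an overflow. This is a hypothetical sketch, not the CeilNode implementation (which uses Truffle specialisations); CeilWithPromotion is an illustrative name.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of ceil with promotion: return a Long when the result
// fits, otherwise an exact BigInteger. Not the actual CeilNode code.
final class CeilWithPromotion {
    static Number ceil(double x) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            throw new ArithmeticException("ceil is undefined for " + x);
        }
        double c = Math.ceil(x);
        if (c >= -0x1p63 && c < 0x1p63) {
            return (long) c; // fast path: the result fits in a long
        }
        // c is an integral double, so new BigDecimal(c) is exact and toBigInteger loses nothing.
        return new BigDecimal(c).toBigInteger();
    }
}
```

On the Table side the BigInteger branch would instead become a missing value plus a warning, so the code both paths actually share is essentially the Math.ceil call and the range check.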

OK, no need to put it in Core_Utils then.

It would be worth measuring the relative performance of rounding 1,000,000 values as a map vs. as a vectorised op.
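A rough starting point for that measurement, sketched in Java: a boxed, per-value "map" through a generic function versus a tight loop over primitive arrays, on 1,000,000 values. It only isolates JVM boxing and dispatch costs and does not model the Enso-to-Java interop of map_over_storage, so it likely understates the real gap. RoundBenchmark is a hypothetical name; for trustworthy numbers a harness such as JMH would be preferable to hand-rolled nanoTime timing.

```java
import java.util.Random;
import java.util.function.Function;

// Rough comparison of a boxed per-value map with a primitive, vectorised loop.
final class RoundBenchmark {
    /** Map style: every value is boxed and passed through a generic function. */
    static Object[] mapBoxed(Object[] input, Function<Object, Object> fn) {
        Object[] out = new Object[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] == null ? null : fn.apply(input[i]);
        }
        return out;
    }

    /** Vectorised style: a tight loop over primitive arrays, no boxing. Overflow is not checked here. */
    static long[] ceilPrimitive(double[] input) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (long) Math.ceil(input[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double[] primitive = new double[n];
        Object[] boxed = new Object[n];
        Random random = new Random(42);
        for (int i = 0; i < n; i++) {
            primitive[i] = random.nextDouble() * 1000.0;
            boxed[i] = primitive[i];
        }
        Function<Object, Object> ceil = v -> (long) Math.ceil((Double) v);

        long sink = 0; // consumes results so the JIT cannot discard the work
        // Several iterations give the JIT a chance to warm up; early timings are noisy.
        for (int iteration = 0; iteration < 5; iteration++) {
            long t0 = System.nanoTime();
            Object[] m = mapBoxed(boxed, ceil);
            long t1 = System.nanoTime();
            long[] p = ceilPrimitive(primitive);
            long t2 = System.nanoTime();
            sink += (Long) m[n - 1] + p[n - 1];
            System.out.printf("map: %d ms, vectorised: %d ms%n", (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
        System.out.println("sink: " + sink);
    }
}
```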

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Java_Exports.enso

radeusgd

The API and tests look ok.

I think we need tests checking the behaviour when the double value exceeds the long range. Since the Table currently does not support Big Integer storage, I'm not sure what will happen and I'd like to verify it.
Ideally, I don't think map_over_storage is the right choice for primitive numeric operations like ceil, floor, and possibly truncate. To be consistent with the architecture of the Table library, I think these should be implemented as a UnaryMapOperation.

I think it is fine to use map_over_storage for round, since it includes fairly complex Enso logic that we don't want to replicate in Java. But the other operations, if I'm reading correctly, just delegate to Java methods, so I think they should be 'vectorized'.
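The split suggested here (vectorised ops for ceil, floor and truncate, an Enso-side map for round) could take roughly the following shape. This is a hypothetical Java sketch of the pattern only: DoubleToLongOp and NumericOpRegistry are invented names, not the Table library's UnaryMapOperation API, and overflow handling is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-element op from double to long; not the real UnaryMapOperation interface.
abstract class DoubleToLongOp {
    final String name;

    DoubleToLongOp(String name) {
        this.name = name;
    }

    abstract long apply(double x);
}

// Hypothetical registry of vectorised numeric ops, keyed by operation name.
final class NumericOpRegistry {
    private final Map<String, DoubleToLongOp> ops = new HashMap<>();

    NumericOpRegistry() {
        // Overflow handling is omitted here to keep the shape of the pattern visible.
        register(new DoubleToLongOp("ceil") {
            long apply(double x) { return (long) Math.ceil(x); }
        });
        register(new DoubleToLongOp("floor") {
            long apply(double x) { return (long) Math.floor(x); }
        });
        register(new DoubleToLongOp("truncate") {
            long apply(double x) { return (long) x; } // the cast itself truncates toward zero
        });
    }

    private void register(DoubleToLongOp op) {
        ops.put(op.name, op);
    }

    /** Lets the caller fall back to an Enso-side map (e.g. for round) when no vectorised op exists. */
    boolean isSupported(String name) {
        return ops.containsKey(name);
    }

    long[] run(String name, double[] input) {
        DoubleToLongOp op = ops.get(name);
        if (op == null) {
            throw new UnsupportedOperationException("No vectorised op: " + name);
        }
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = op.apply(input[i]);
        }
        return out;
    }
}
```

The isSupported check is the point where round would take the map_over_storage route instead.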

radeusgd · 2023-05-26T08:52:18Z

I reckon the 'vectorization' of the operations into Java will happen as a separate task.

Still I think we need tests for the case where the double is too big to store in our LongStorage after rounding:

polyglot java import java.lang.Long as Java_Long

... =
    max_long = Java_Long.MAX_VALUE
    # Far outside the long range once rounded to an integer.
    too_big_double = (max_long + 1.0) * 100.0
    table = Table.new [["X", [1.0, 2.9, too_big_double, 12.1]]]
    col = table.at "X"

    c1 = col.ceil
    c1.to_vector . should_equal ?
    # One of these, depending on the chosen behaviour:
    Problems.assume_no_problems c1
    Problems.expect_warning Arithmetic_Error c1

    c2 = col.floor
    c3 = col.truncate
    c4 = col.round
    ...

I assume that DB operations are not in the scope of this PR. However, to keep the APIs consistent (we really should add a test verifying this consistency at some point), please add method stubs that throw Unsupported_Database_Operation.Error "Operation not yet implemented in the Database backend".
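The consistency test mentioned in passing could compare the method names exposed by the two column types. Below is a Java analogue using reflection, offered only to illustrate the idea; the real test would live on the Enso side. InMemoryColumn, DatabaseColumn and ApiConsistency are hypothetical stand-ins, and the Database stub throws with the message requested above.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the in-memory column type.
class InMemoryColumn {
    private final double[] data = {1.2, 2.7};

    long[] ceil() {
        long[] out = new long[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (long) Math.ceil(data[i]);
        }
        return out;
    }
}

// Hypothetical stand-in for the Database column type: a stub keeps the APIs aligned.
class DatabaseColumn {
    long[] ceil() {
        throw new UnsupportedOperationException("Operation not yet implemented in the Database backend");
    }
}

final class ApiConsistency {
    /** Names of methods declared directly on cls, ignoring compiler-generated ones. */
    static Set<String> methodNames(Class<?> cls) {
        Set<String> names = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    /** Method names declared by reference but missing from candidate. */
    static Set<String> missingFrom(Class<?> reference, Class<?> candidate) {
        Set<String> missing = methodNames(reference);
        missing.removeAll(methodNames(candidate));
        return missing;
    }
}
```

Comparing only names keeps the check cheap; it would not catch mismatched argument lists.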

jdunkerley

Let's add a follow-up to look at vectorising the operations and measuring their relative performance.

Otherwise LGTM.

jdunkerley · 2023-05-26T10:16:07Z

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Numbers.enso

radeusgd · 2023-05-29T11:42:06Z

test/Tests/src/Data/Numbers_Spec.enso

@@ -585,7 +585,7 @@ spec =
            231.2 . round . should_be_a Integer
            231.2 . round -1 . should_be_a Integer

-        Test.specify "Edge cases" <|
+        Test.specify "Edge cases" pending="Re-enable this if the 15-digit restriction is removed" <|


Do we plan to ever remove this restriction? If not, I think we should just remove these tests.
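The thread does not spell out what the 15-digit restriction is. One plausible motivation, sketched in Java below, is double precision: a double has a 53-bit significand, so every integer up to 2^53 (9007199254740992, 16 digits) is exact. All 15-digit integers fit below that bound, while some 16-digit ones do not survive a round trip. DoublePrecision is a hypothetical name.

```java
// Hypothetical helper illustrating why ~15 decimal digits is a safe limit for doubles.
final class DoublePrecision {
    /** 2^53: beyond this, not every integer has an exact double representation. */
    static final long MAX_EXACT = 1L << 53; // 9007199254740992, 16 digits

    /** True if n survives a round trip long -> double -> long unchanged. */
    static boolean roundTrips(long n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(999_999_999_999_999L));   // 15 digits: true
        System.out.println(roundTrips(9_007_199_254_740_993L)); // 2^53 + 1: false
    }
}
```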

jdunkerley

Agree with @radeusgd's small comments; worth clearing those up before merging.

distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso

jdunkerley · 2023-05-30T07:46:56Z

GregoryTravis added 8 commits May 23, 2023 10:56

move each_propagate into Range

8c34ebb

Column_Ops

3d8b55c

round

2aa2302

truncate ceil floor

5f0d00f

unused

b8e5282

all but date

97ab8ec

examples

07527b6

cleanup

b0ed569

GregoryTravis changed the title ~~Wip/gmt/6805 col round~~ Add rounding functions to the Column type #6805 May 23, 2023

GregoryTravis linked an issue May 23, 2023 that may be closed by this pull request

Add rounding functions to the Column type #6805

Closed

GregoryTravis added 8 commits May 23, 2023 14:49

cleanup

c3818dc

date_time truncate

35d83a1

cleanup

430090f

Merge branch 'develop' into wip/gmt/6805-col-round

f97554d

fix format tests

2db1924

check column types

8a62af2

each_p no

759de4d

noop

d616cea

GregoryTravis changed the title ~~Add rounding functions to the Column type #6805~~ Add rounding functions to the Column type May 24, 2023

changelog, test fix

b85f558

GregoryTravis marked this pull request as ready for review May 24, 2023 19:23

GregoryTravis requested review from jdunkerley and radeusgd as code owners May 24, 2023 19:23

java fmt

82ed443

radeusgd reviewed May 25, 2023

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso (outdated, resolved)

radeusgd reviewed May 25, 2023

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso (outdated, resolved)

radeusgd reviewed May 25, 2023

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso (outdated, resolved)

radeusgd reviewed May 25, 2023

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Java_Exports.enso (resolved)

radeusgd requested changes May 25, 2023

GregoryTravis added 3 commits May 25, 2023 14:53

review, throw

0eeb10e

move adapter

e9be9d5

Merge branch 'develop' into wip/gmt/6805-col-round

91babe2

GregoryTravis requested a review from radeusgd May 25, 2023 19:00

jdunkerley approved these changes May 26, 2023

GregoryTravis added 4 commits May 26, 2023 14:31

limit round input for float too

01f0f65

Merge branch 'develop' into wip/gmt/6805-col-round

d6aa9ef

unimplemented stubs

4a4e886

Merge branch 'develop' into wip/gmt/6805-col-round

25cbea1

GregoryTravis requested a review from jdunkerley May 26, 2023 18:43

radeusgd reviewed May 29, 2023

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Numbers.enso (outdated, resolved)

radeusgd reviewed May 29, 2023

jdunkerley approved these changes May 30, 2023

radeusgd mentioned this pull request May 30, 2023

Rename the Decimal type to Float? #6889

Closed

5 tasks

generalize error

ac93d31

GregoryTravis requested a review from radeusgd May 30, 2023 18:08

Merge branch 'develop' into wip/gmt/6805-col-round

45c1754

radeusgd approved these changes May 31, 2023

GregoryTravis added 2 commits May 31, 2023 12:23

remove big num test

ea1fd0b

Merge branch 'develop' into wip/gmt/6805-col-round

a48cf30

GregoryTravis added the CI: Ready to merge This PR is eligible for automatic merge label May 31, 2023

GregoryTravis added 2 commits June 1, 2023 12:04

fix test

2fab69b

Merge branch 'develop' into wip/gmt/6805-col-round

a31681f

mergify bot merged commit 0337180 into develop Jun 1, 2023

mergify bot deleted the wip/gmt/6805-col-round branch June 1, 2023 20:06

kevinlu1248 mentioned this pull request Jul 26, 2023

Sweep: Rename the Decimal type to Float? sweepai-dev/enso#1

Open

5 tasks
