Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add trigonometry functions #241

Merged
merged 1 commit into from
Jul 26, 2022

Conversation

thisisnic
Copy link
Contributor

@thisisnic thisisnic commented Jul 8, 2022

This PR adds definitions for common trigonometry functions.

@thisisnic thisisnic force-pushed the add_trig_funcs branch 4 times, most recently from 78838f9 to 6668bf1 Compare July 11, 2022 11:56
@thisisnic thisisnic marked this pull request as ready for review July 11, 2022 11:57
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main thoughts are:

  • do we really need the narrower types?
  • Can you make sure that we correctly state error conditions for each type of function (I don't remember my trig).

description: "Get the cosine of a value in radians."
impls:
- args:
- value: i8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we should have signatures for all these narrower values? Is there some value in doing this over only supporting i64 and fp64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this, in Arrow-c++, we only have kernels for fp32 and fp64 (I believe the docs may be wrong here and imply support for decimal inputs but we insert an implicit cast). You won't get an error if you try the integer types but it will insert an implicit cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only included all of the others to follow how some other functions have been specified, but I don't see any value in doing it this way if we don't need to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if the end result is no different than a cast the kernel is not needed. If there are other cases like this we should get rid of those kernels as well.

Copy link
Contributor

@ianmcook ianmcook Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should allow an impl to specify multiple allowed types for a single value arg, e.g.

    impls:
      - args:
           - value: [ i8, i16, i32, i64, fp32,  fp64 ]
         return: fp64

That would reduce the amount of boilerplate in these YAML files and at the same time avoid Substrait producers needing to insert lots of fussy explicit casts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the current pattern requires this.

Do you have an i32 + i64? Is it because of some performance benefit? I would expect that to be rare. The most common pattern systems have is an implementation of i64 + i64 and they cast i32 to i64 before doing i64 + i64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to just support i64 and fp64 now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fp32 is probably more valuable than i64 and I apologize for being obnoxious :). My goal isn't to limit the number of kernels but to create one kernel per distinct implementation. I think it is common for hardware/engines to have different implementations for float and double. By "different implementation" I mean "passing the same inputs yields a different result"

For an example, consider the C++ standard library implementation for cosine. There are three ways to call std::cos, with a float, with a double, or with an integral type. However, calling cosine with an integral type is identical to casting to double and calling cosine with that double value. We can see this both in the docs:

  1. A set of overloads or a function template accepting an argument of any integral type. Equivalent to 2) (the argument is cast to double).

and in practice:

https://godbolt.org/z/h6Pb5WY86

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going into detail about that, makes sense! Updated to now have fp32 and fp64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this discussion is somewhat related to #251, so linking for posterity.

description: "Get the tangent of a value in radians."
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the right error behaviors? (I can't remember my trig classes...)

Also, same question here wrt all the different input types. If all implementations do an unpromotion, let's not introduce signatures that do that implicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some good input from @jvanstraten on this topic (well, related topic at least) here: #230 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies; should have put this in the original PR. The reasoning for specifying the overflow behaviour options for tangent only was on the basis that the ranges for the returned value for this function are wider than the other ones and so I assumed it'd need it.

My trig is also a bit rusty, but from what I can gather from a bit of a search, in terms of output values:

  • range of the cosine and sine functions: [-1, 1]
  • range of tangent: [-infinity, infinity].
  • range of arctangent, arcsine: [-pi/2, pi/2]
  • range of arccosine: [0,pi]

Though as @westonpace has said, the approach suggested by @jvanstraten might be more appropriate here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tan is technically undefined at (1/2 + k) pi for all integer k, because that's a division by zero when expressed as sin(x) / cos(x). However, due to pi's transcendental nature, none of those numbers can be exactly represented with floating-point numbers, so tan(x) is defined for all finite floating-point values of x. So tan only needs the rounding option.

asin and acos do have domain errors; they are undefined for x < -1 and x > 1. So they should be defined however sqrt ends up being defined. atan2 is also a bit weird when 0, 0 is passed, since it's basically atan(y / x) (with some boundary conditions to deal with x = 0 when y != 0 and to get all quadrants to behave nicely), so I guess it should get the same treatment.

The other trig functions are defined for all finite real numbers, and thus only need rounding options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to pi's transcendental nature

Especially when topped with ice cream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the conversation regarding those other options now appears to be resolved, have updated the options here in line with those suggestions.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, minor nits

I don't know about the hyphen. Merriam-webster gives one word without hyphen. Wolfram Alpha and cpp reference give two words without hyphen.

extensions/functions_arithmetic.yaml Outdated Show resolved Hide resolved
extensions/functions_arithmetic.yaml Outdated Show resolved Hide resolved
extensions/functions_arithmetic.yaml Outdated Show resolved Hide resolved
extensions/functions_arithmetic.yaml Outdated Show resolved Hide resolved
@thisisnic thisisnic force-pushed the add_trig_funcs branch 4 times, most recently from f8bd33b to fa9f79c Compare July 18, 2022 19:17
jvanstraten
jvanstraten previously approved these changes Jul 18, 2022
Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

impls:
- args:
- name: rounding
options: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as elsewhere, are these options in the order of most preferred to least preferred? Remember that an engine that support multiple option values on a non-specified value will pick them in the order they are declared.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIE_TO_EVEN a.k.a. "round to nearest" is the default mode used by glibc, and as such is probably what everyone is already implicitly doing, so I placed that first intentionally.

Side note; reading that description I guess my earlier assumptions that "even" means "even number" is wrong, and it actually refers to the least significant bit of the mantissa. I don't think that changes anything for us though.

@jacques-n
Copy link
Contributor

This looks good to me. Looks like it needs a conflict resolved before merge, however.

@jvanstraten , are you happy with this?

@jvanstraten
Copy link
Contributor

Yep, LGTM

@jacques-n
Copy link
Contributor

@thisisnic, can you do a conflict resolution?

@thisisnic
Copy link
Contributor Author

@jvanstraten Apologies, for some reason when I fixed the conflict and force-pushed, it dismissed your review.

Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM 😄

@jacques-n jacques-n merged commit d83d566 into substrait-io:main Jul 26, 2022
@jacques-n
Copy link
Contributor

Thanks @thisisnic !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants