Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Euclidean modulo #2169

Merged
merged 8 commits into from
Mar 15, 2018
Merged

Euclidean modulo #2169

merged 8 commits into from
Mar 15, 2018

Conversation

varkor
Copy link
Member

@varkor varkor commented Oct 9, 2017

Proposal to add Euclidean modulo & division functionality for integers and floating-point numbers, to address common issues with taking remainders involving negative numbers.

Internals discussion here.

Rendered.
Tracking issue.

@scottmcm scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 10, 2017
@mcarton
Copy link
Member

mcarton commented Oct 10, 2017

While I like the proposal, I really don't like the _e suffix.

@Centril
Copy link
Contributor

Centril commented Oct 11, 2017

@mcarton What part of it? That it has a short suffix, or that it has a suffix and not a prefix? Both?

@Centril
Copy link
Contributor

Centril commented Oct 11, 2017

How common is usage of these operations? Would it warrant adding another operator to the language?

@varkor
Copy link
Member Author

varkor commented Oct 11, 2017

@Centril: I couldn't think of a good way to measure the use of these operations, as they can be implemented in various different ways. The rationale behind advocating new methods, rather than new operators, was that we could see how often they were used in Rust (say, 12 months down the line), to see whether they were used common enough to be worth considering as operators.

@Centril
Copy link
Contributor

Centril commented Oct 11, 2017

@varkor this makes sense to me. However, many users will shy away from functionality until it has been stabilized, thus usage metrics for unstable stuff might not be fully representative.

@mcarton
Copy link
Member

mcarton commented Oct 11, 2017

@mcarton What part of it? That it has a short suffix, or that it has a suffix and not a prefix? Both?

The shortness.

@scottmcm
Copy link
Member

If the output of these are always positive, should they return unsigned types instead?

Similarly, it feels like something here should enable (-1i8).modulo(200u8) => 199u8.

@varkor
Copy link
Member Author

varkor commented Oct 11, 2017

@scottmcm: I followed the precedent set by the abs method for returning an integer of the same type, rather than the unsigned variant, as I imagined there was previous rationale for this decision.

Regarding inter-type modulo, the mod_e method as specified in the proposal behaves similarly to the built-in % operator, which also does not allow operations like -1i8 % 200u8. Again, the focus here was on consistency: it feels like this issue is something better addressed by a separate discussion (e.g. the internals thread on implicit widening).

@est31
Copy link
Member

est31 commented Oct 12, 2017

As someone who has hacked euclidean modulo as ((a % b) + b) % b previously (that's branchless 😎 ; proof they are the same at least for positive b ), I can only support this RFC! I in fact implemented the variant for floats, so it'd be great if the RFC could include them :)

@DerSaidin
Copy link
Contributor

Alternative name possibilities?

  • mod_e
  • emod
  • e_mod
  • euc_mod
  • eucmod

@tspiteri
Copy link

Maybe floor_div and floor_rem can be used: whereas div and rem truncate the quotient towards zero, the floor_ counterparts round the quotient down.

// Comparison of the behaviour of Rust's truncating division
// and remainder, vs Euclidean division & modulo.
(-8 / 3, -8 % 3) // (-2, -2)
(-8.div_e(3), -8.mod_e(3)) // (-3, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. has higher precedence than unary minus. This will be evaluated as -( 8.mod_e(3) ) i.e. -2. You want (-8).mod_e(3).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@varkor
Copy link
Member Author

varkor commented Oct 18, 2017

@est31: I ran some benchmarks before making the RFC comparing the implementation for Euclidean modulo as suggested in the RFC with the branchless variant, and the RFC version turned out to be ~5x faster. It could be that the branch prediction is making the benchmarks nonrepresentative, so if you have any results to the contrary, please do share them!

@tspiteri: Flooring division and modulo is subtly different (see this paper for more details — essentially Euclidean modulo always returns a nonnegative value), so avoiding the terminology floor would be preferable, to avoid confusion.

@varkor
Copy link
Member Author

varkor commented Oct 18, 2017

Regarding the name: I'm happy with a change of name, provided it doesn't become cumbersome — I'd personally prefer the operation name to come first for aiding discoverability (for example, if other methods of rounding are added in the future) via autocomplete, etc. Perhaps div_euc and mod_euc strike the right balance between descriptiveness and succinctness?

@est31
Copy link
Member

est31 commented Oct 18, 2017

@varkor I haven't run any benchmarks and was just guessing that a branchless version is faster. The code path is definitely not hot so I didn't care much.

@tspiteri
Copy link

@varkor Yes, flooring and Euclidean division are different; to use floor_div and floor_rem flooring division would of course need to be used. I much prefer flooring over Euclidean; in Wikipedia's list of modulo operators in various programming languages, flooring division is supported by 71 languages including Common Lisp, Clojure, Haskell, Java, Mathematica, MATLAB, Python and Ruby, while Euclidean division is supported by 8 languages including Maple and Scheme. Also, for example the GMP bignum library supports truncating and flooring division, while not supporting Euclidean division. The paper you linked to makes some arguments for Euclidean over flooring, but I think the arguments are simply a matter of taste, while the similarity with other software is a concrete advantage of flooring over Euclidean.

@tspiteri
Copy link

@est31 Your version can overflow. (50i8 % 100i8) + 100i8 overflows.

@fanzier
Copy link

fanzier commented Oct 18, 2017

@tspiteri I agree that it makes sense to consider what other languages do but it shouldn't be the only argument -- Rust does a lot of things better/differently than other languages. I don't see how differences to other software would be a disadvantage for Rust's modulo function, given that the modulo operator % is already the same as in C, Java etc.

You can also view the bandwagon argument from the opposite side: In mathematics, Euclidean modulo is standard that everyone uses, presumably because it has the most regular properties. I also don't think the arguments in the paper are a matter of taste: the code examples really are more uniform if Euclidean division is used. That is anecdotal evidence but I don't think we have any better kind of evidence.

That said, I don't care very much whether it's flooring or Euclidean modulo. I just think the discussion should include more than personal preference and the bandwagon argument. (Also, I'm kind of afraid this proposal will die because of bikeshedding about this.)

@scottmcm
Copy link
Member

In mathematics, Euclidean modulo is standard that everyone uses

Can you elaborate on usage of negative-divisor modulo in mathematics? I'm only familiar with strictly-positive ones, and for that flooring and euclidean are the same.

@fstirlitz
Copy link

Rambling a bit here:

Parroting mathematical usage can just as well be considered 'the bandwagon argument': π is the standard that everyone uses, mostly for the sake of tradition, even though τ has better mathematical properties.

Euclidean division is defined the way it is because it aligns well with Euclid's division theorem for integers (which requires the remainder to satisfy 0 ≤ r < |d|). This theorem in turn is formulated this way presumably because this is the most concise way to make the division result unique.

But this property cannot be maintained in general Euclidean domains (the 'proper' structure to study division-with-remainder), where a total order relation may fail to exist, nor there may be any way to make the result 'naturally' unique (Gaussian integers, polynomials). From this more general perspective, floor division is just as good a definition of division-with-remainder as the 'Euclidean' division. I see no properly mathematical reason to prefer one over the other.

And I don't recall any programming problem where a division-with-remainder by a negative divisor is meaningful, so from this point of view, there also seems to be no reason to prefer either. (Well, I think I can imagine a problem where the choice of definition may conceivably matter, but I still don't see how either is more advantageous).

That perspective, however, would suggest also adding a ceiling division operator (it's the answer to the question 'how many buckets of capacity n do you need to contain N items?') and a combined division-modulo operation (which answers the problem 'given a scalar offset into a two-dimensional array, what are its corresponding two-dimensional cordinates?'). Admittedly the former can be expressed with (N + n - 1) / n, but that behaves badly in the presence of overflow; while the latter can be probably taken care of by the optimiser, but you may want not to rely on that, and it's still neater to write let (y, x) = offset.divmod(stride) than to compute the quotient and remainder separately.

@tmccombs
Copy link

+1 for a combined division-modulo operation.

Rename `div_e` and `mod_e` to `div_euc` and `mod_euc`, respectively. This is more descriptive whilst still being relatively concise.
@varkor
Copy link
Member Author

varkor commented Oct 23, 2017

I've updated the method names in the proposal with a more descriptive _euc suffix, which is clearer, whilst still remaining relatively concise.

Additionally, I've extended the RFC slightly to include Euclidean division and modulo methods for f32 and f64, as there was support in this thread for a complete implementation (and the floating-point exclusion did seem like a gap in the RFC before).

Regarding the discussion about flooring versus Euclidean division/modulo that arose: I think @fstirlitz summed it up well; in practice, there will very rarely be a difference between the results of flooring and Euclidean modulo/division — taking the negative modulo of a number is a very uncommon operation: it's just a matter of deciding what the behaviour in this particular edge case should be. My feeling was, and the general feeling on the internals thread seemed to be, that Euclidean modulo was the least surprising of the two (and mathematical consistency is a bonus).

I'd be hesitant to add a combined division-modulo method in this RFC — it seems like a separate, though tangentially related, issue — and it'd be better not to crowd this RFC with new methods.

@varkor
Copy link
Member Author

varkor commented Feb 13, 2018

Following on from @est31's point: I don't see the advantage to spelling the name euclidean out fully (when other methods like mul/div, etc. are shortened). Anyone who knows what Euclideon modulo is will know what euc stands for. On the other hand, anyone who doesn't know won't be enlightened by reading the full name — they'll likely just head for the docs, which is equally elucidating for any name. Considering that we expect mod_euc to be reasonably common (especially in particular domains like mathematics or game development), making it significantly longer seems disadvantageous.

I'd like to know what other people think about changing the return type before making any changes, though. It'd be really nice to finalise this RFC at last.

@tspiteri
Copy link

tspiteri commented Feb 13, 2018

About the return type, it depends on whether this is meant to look like the other arithmetic operators, or like other inherent functions. For operators on primitives I would expect the output to be like self, whereas for a non-operator-like inherent function a different return type is not something new, see for example i32::count_ones.

@tspiteri
Copy link

And another thing, returning unsigned values suggests to me that this is only intended for indexing, not as a normal division operation where operating on signed integers should return a signed integer. But in that case it would make more sense to call it something like cyclic_index.

@varkor
Copy link
Member Author

varkor commented Feb 13, 2018

@tspiteri makes a good point. I think that considering div_euc will return signed values, mod_euc should have the same type (like an operator) — because otherwise the identity x = n * x.div_euc(n) + x.mod_euc(n) can't be expressed without casting, which just seems wrong, as the two functions are intrinsically linked.

It's unfortunate that the signature will mean casting is required in some cases, but the consistency seems more important here.

@varkor
Copy link
Member Author

varkor commented Feb 21, 2018

@sfackler: is it possible to follow up regarding the naming convention — taking into account the points others have made, is there strong motivation for requiring euclidean over euc? I feel most people would prefer shorter names (given that they're unambiguous) over longer ones, for common operations.

After that has been resolved, are there any other issues stalling this RFC from a FCP?

@sfackler
Copy link
Member

@est31 something something foolish consistency something something :P

Yeah, we can figure out the naming as part of stabilization.

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 27, 2018

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Feb 27, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 28, 2018
@scottmcm
Copy link
Member

because otherwise the identity x = n * x.div_euc(n) + x.mod_euc(n) can't be expressed without casting

But it's also adding new methods, so maybe that just means the set of added methods isn't sufficient yet. Conveniently, there's even naming and semantic precedent for it in f32::mul_add.

You can have x.div_mod(n) == (a, b)a.mul_add(n, b) == x. And those two methods are exactly the ones you'd need for @fstirlitz's coordinate sectoring use case above.

(Their types being div_mod : (iN, uN) -> (iN, uN) and mul_add : (iN, uN, uN) -> iN)

@varkor
Copy link
Member Author

varkor commented Feb 28, 2018

(Their types being div_mod : (iN, uN) -> (iN, uN) and mul_add : (iN, uN, uN) -> iN))

This signature (and similarly if mod_euc were modified in the same way) is hardly better than the current signature: you can't index with a uN unless N is size, so you'd still have to cast in every other case. (And one can't just return usizes, because this may be smaller than the input types.) On top of that, it comes at the cost of consistency.

On the other hand, the current signature is very amenable to modification in the future when implicit widening is introduced, which should eliminate the problems entirely.

(Additionally, for consistency purposes, such a signature for i/uN::mul_add is undesirable: the current signature is mul_add: (fN, fN, fN) -> fN) and having a different signature for integers seems prone to confusion.)

@varkor
Copy link
Member Author

varkor commented Feb 28, 2018

Regarding div_mod specifically: I think there is good reason to add such a method (or family of methods), but think it's outside the scope of this RFC; it's been long enough that I'd rather start with what is currently proposed, and introduce a combined method later (as though it's related, it's orthogonal to the motivations here). (Maybe such a method would even be minor enough to forgo the RFC process?)

@bstrie
Copy link
Contributor

bstrie commented Mar 8, 2018

Hold up, why is rfcbot only listing four members of the libs team? The libs team has eight people on it.

@sfackler
Copy link
Member

sfackler commented Mar 8, 2018

We subdivided the libs team a bit.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 10, 2018

The final comment period is now complete.

@alexcrichton
Copy link
Member

Alright! FCP has now elapsed and it looks like there were no major things brought up, so I'm going to merge!

Tracking issue

@alexcrichton alexcrichton merged commit dfad2c9 into rust-lang:master Mar 15, 2018
@varkor varkor deleted the euclidean-modulo branch March 15, 2018 15:43
@Centril
Copy link
Contributor

Centril commented Mar 16, 2018

Updated Rendered link + added tracking issue to top post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-arithmetic Arithmetic related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.