Integer Guidelines RFC #741
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
- Start Date: 2015-01-26 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
This RFC proposes usage guidelines for the various integer types. | ||
|
||
# Motivation | ||
|
||
The goal of this RFC is to help people decide what integer types to use when they need | ||
to make a decision for a new API. | ||
|
||
It builds on [https://github.com/rust-lang/rfcs/pull/560](the integer overflow RFC), | ||
which provides debug-time assertions for overflow and underflow. One of the goals | ||
of this RFC is to provide guidance for when to use unsigned types, and these | ||
assertions affect the traditional tradeoffs about unsigned types, which are described | ||
under the detailed design below. | ||
|
||
It also draws inspiration from the [Google C++ Style Guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types) | ||
The Google C++ Style Guide is based on the premise that having a very small language is more valuable than anything else because it makes the code accessible to people who only know Java. Most of the recommendations in it are solely based on that school of thought, which is diametrically opposed to Rust's language design. Their reasoning for forbidding unsigned integers has a lot more to do with removing choice and aligning C++ with Java than anything else.
@thestinger Well, some Googlers might think along those lines but I've never heard anyone at Google say that. Also to commit C++ code, a programmer has to pass a "C++ readability review" or get commit approvals from someone who has. (Ditto in any other language.) The key rationale behind the guidelines is to rule out some gotcha's, make the code more familiar, and avoid code churn due to minor choices. This particular case is most likely a reaction to bugs in production. Indeed, a Rust iterator is a better way to avoid the example count-down bug.
@1fish2: I don't know if any individuals actually think that way, but it's the logic that corporate group think produced for their style guide and the Go language design.
This is nonsense, sorry. I read the (constant) internal discussions that precede changes to the Google C++ Style guide, and what you've said above is incorrect. The reasoning is always "what can we do here to prevent nasty bugs" and never has anything to do with Java. |
||
|
||
This RFC attempts to balance several concerns: | ||
|
||
* Rust developers should have an easy, go-to heuristic for deciding what | ||
If they don't know the bounds, the only valid choice is a big integer. Suggesting anything else is irresponsible.
Big ints have bounds too, they're just bigger. Maybe they should be called "bigger ints"?
@cgaebel "Big Integer" is typically another way of saying "Arbitrary Precision Integer" (it's the name Java uses for it's arbitrary precision integer support). Sure, they are technically bounded (we could run out of system memory), but Rust already doesn't have memory allocation failures.
I mean to say that replacing an integer with bigint is akin to saying "an unbounded number that overflows its type should be represented by a type that also has a bound, just bigger (and destabilizes your system when it gets too big)". |
||
integer sizes to use when they need to use a number type. | ||
* 32-bit integers are considerably faster than 64-bit integers in some | ||
situations, so using 64-bit integers for tiny numbers, especially in | ||
hot code, can result in unnecessarily slower programs. | ||
* Reflexive use of 32-bit integers too often results in overflows when the | ||
So does "reflexive" use of 64-bit integers. It happens to hide overflows in common cases, but it's the edge cases that are hard to find in testing which are the most problematic. You're only exasperating the problem by suggesting that intuition should be used to make this decision. |
||
numbers aren't expected to be "laughably smaller" than the maximum | ||
32-bit number. | ||
* Occasional use of `usize` when building a brand new data structure may | ||
be appropriate, but use of `usize` in general can introduce portability | ||
I don't think it is ever appropriate - it only used to be common to use int/uint before the rename because they were easy names to use. I can't imagine why, even when prototyping you would reach for usize first, unless you are using that number for array accesses/memory manipulation.
I personally took "data structure" to imply "something related to memory". |
||
hazards when the use-case is not proportional to the amount of | ||
addressable memory. | ||
* Use of unsigned integers is traditionally thought to be error-prone, and | ||
I am somewhat cynical of this constraint - I have never come across this school of thought outside the Google style guide linked above. Sure they have invariants you need to be careful not to violate, but no more so than signed integers (they both overflow and underflow, just at different places). Using an unsigned int inappropriately is error prone, but the same goes for signed ints.
More aspects about unsigned integers:
|
||
style guides often suggest avoiding them. That said, Rust's unsigned integers | ||
have built-in underflow assertions, which changes the analysis. | ||
You seem to forget that that RFC is still open. Or maybe that's just distraction. |
||
|
||
# Detailed design | ||
|
||
When deciding what integer type to use for a given situation, you can make the decision | ||
to a first approximation through this heuristic: | ||
|
||
* If you expect all uses of this API to use numbers "laughably smaller" than 32 bits, | ||
This is an inappropriate way to make the decision. If you don't know the bounds, you need a big integer or you need to implement a way of enforcing sane bounds.
I agree - 'expect' and 'laughably smaller' are appeals to intuition and this is an area where things are famously unintuitive. Either you have guarantees on the bounds, or you insert assertions, or you use a BigInt, anything else is hand-waving. Also, how much laughably smaller does something have to be before you should use 8 or 16 bit integers? |
||
use 32-bit numbers. Otherwise, prefer 64-bit numbers. | ||
It'd be helpful to point out that the maximum
Personally, I start using
Agreed.
Choosing integer sizes based on intuition is exactly the problem. It's incorrect and irresponsible.
Not to imply that there aren't appropriate times to use 8- and 16-bit numbers, esp. when you have many of them or the domain is, say, 32-bit ARGB pixels. In all cases, programmers should remember to analyze the range of intermediate values as well as the range of stored values. E.g. can the intermediate values for your unsigned computation go negative?
There are plenty of correct choices: a) use a fixed-size integer type covering the entire possible range Picking a type based on intuition and hoping for the best is not one of them. If you're throwing a 64-bit integer at the problem, it's doubtful that you're going to catch anything with assertions while testing or even most fuzzing but carefully crafted input or just natural edge cases in the wild will do it. It's still going to be broken, just in a less obvious way. |
||
* If you are sure that the number will never be less than 0, use unsigned integers. | ||
Otherwise, use signed numbers. | ||
* If you are building a new data structure, and your number refers to the size of memory, | ||
Suggestion: Make this more precise as "... your number is bounded by the number of bytes in memory".
I like it! |
||
you may want to use `usize`, which is described below. | ||
|
||
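As a rough illustration of the heuristic above, the following sketch picks `u32` for a value expected to stay tiny and `u64` for one with no comfortable bound, and uses checked arithmetic so that overflow is reported instead of hidden. The function names and the retry/byte-count scenarios are hypothetical and are not part of the RFC.

```rust
/// Hypothetical example: a retry counter is expected to stay far below
/// `u32::MAX`, so the heuristic would pick `u32`.
pub fn next_retry(attempt: u32) -> Option<u32> {
    // `checked_add` returns `None` on overflow instead of wrapping or panicking.
    attempt.checked_add(1)
}

/// Hypothetical example: a running byte total has no comfortable bound,
/// so the heuristic would prefer `u64`.
pub fn total_bytes(chunk_sizes: &[u64]) -> Option<u64> {
    // Fold with `checked_add` so an overflowing intermediate sum is surfaced.
    chunk_sizes
        .iter()
        .try_fold(0u64, |acc, &n| acc.checked_add(n))
}
```

Either function can be called as `next_retry(3)` or `total_bytes(&[1, 2, 3])`; a `None` result marks the case where the chosen type turned out to be too small.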
> Note: Rust does not yet have BigInts in `std`, which might, in theory, be a better | ||
These guidelines are brain-dead, not simply worse "in theory" |
||
> go-to big integer type. These guidelines will probably be revised in the future | ||
> once that changes. | ||
|
||
## Signed vs. Unsigned Integers | ||
|
||
Traditionally, style guides for low-level languages have [warned against the use | ||
The fact that one awful style guide makes this recommendation doesn't make it "traditional". |
||
of unsigned integers](http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types) because | ||
they can introduce bugs when comparing an unsigned value with 0 | ||
(`for (unsigned int i = foo.len() - 1; i >= 0; --i) ...`). The Google Style Guide | ||
argues that instead of unsigned integers, programmers should use signed | ||
integers and assertions that the value does not go below 0. | ||
|
||
In Rust, unsigned integers have underflow checking assertions built into | ||
Would something like this be caught?
Yes. That’s in the plan. |
||
the type (assuming that RFC XXX is accepted), so using a `u32` is equivalent | ||
to the advice in the Google style guide (with a larger maximum value). | ||
You can often find situations where a break statement in such a for loop means that it rarely hits zero (and thus rarely decrements below zero). Where, by "often"... I've seen such happen. Other examples include correct code that "subtracts first", e.g. x - 1 + y, where y is positive, where the ephemeral negative value isn't a bad thing. Signed types avoid such edge cases. You can consider me firmly in the camp of signed favoritism. Ideally, use signed types everywhere and never encounter unsigned types. Unfortunately the real situation is that you should use whatever matches best with the libraries and interfaces you're using. It's ridiculous that some RFC would deign to decide this question.
It's reasonable for there to be general guidelines for stuff like this when there's a strong consensus, but I don't think there can be one for these issues. There is little to no evidence in favour of any specific choices, just a lot of dubious claims from every side. |
||
|
||
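The count-down loop quoted in this section can be written in Rust without ever evaluating `0 - 1` on an unsigned value. The sketch below shows two ways to do that with the standard library; the function names are illustrative only and do not come from the RFC.

```rust
/// Visit indices from `len - 1` down to `0` without computing `0 - 1`.
pub fn indices_descending(len: usize) -> Vec<usize> {
    // `(0..len).rev()` yields nothing when `len == 0`, so no subtraction occurs.
    (0..len).rev().collect()
}

/// The explicit form: `checked_sub` returns `None` instead of going below zero.
pub fn last_index(len: usize) -> Option<usize> {
    len.checked_sub(1)
}
```

The iterator form avoids the `i >= 0` comparison entirely, while `checked_sub` forces the caller to decide what an empty collection means.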
## The "size" Types | ||
|
||
The `isize` and `usize` types represent numbers that are proportional to | ||
the size of available memory. Most normal uses of `isize` and `usize` | ||
Yes. Also I think it's good to bring up the fact that these types scale both up and down with the address space. ("Address space" is more accurate than "available memory.") People should beware of using |
||
should arise through the use of existing data structures that expose | ||
values with those types. | ||
|
||
For example, if a new structure wanted to store the length of a `Vec` in | ||
one of its fields, it should store it as a `usize`, because that's the return | ||
value of `vec.len()`. | ||
|
||
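A minimal sketch of the `Vec` length example, assuming a hypothetical `Summary` structure: because the field has the same type that `len()` returns, no cast appears anywhere.

```rust
/// Hypothetical structure that records how many items it was built from.
pub struct Summary {
    /// Stored as `usize` because `Vec::len` (and slice `len`) returns `usize`.
    pub item_count: usize,
}

impl Summary {
    /// Accepts a slice; a `&Vec<T>` coerces to `&[T]` at the call site.
    pub fn of<T>(items: &[T]) -> Summary {
        Summary { item_count: items.len() }
    }
}
```

For example, `Summary::of(&vec![1u8, 2, 3]).item_count` is `3`.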
When building entirely new low-level data structures, you may want to use | ||
`usize` to represent a value (e.g. node id) that scales linearly with the amount | ||
of addressable memory. | ||
|
||
A common use case for usize is for representing pointers as numbers. E.g., if you want to do pointer arithmetic, then pointers are usize and offsets are isize. Also, indexes into arrays. These two should cover the more abstract concept of 'scaling with addressable memory' |
||
## Mixing and Matching (Casting) | ||
|
||
You may occasionally encounter a situation where the integer types provided | ||
by one API do not match your own storage for the value or the integer | ||
type taken by another API. | ||
|
||
If you are working with your own storage, you should change your own | ||
internal storage to match the value you have received. For example, if | ||
you have a field that stores a number of nanoseconds as a `u32`, and | ||
A nanosecond is a billionth of a second so a
Reading this I would at first think the nanoseconds field was designed as part of a seconds, nanoseconds pair, such as in a struct timespec. My first read is that this was a bad example because what you'd actually be doing is dividing and modding the result of |
||
`precise_time_nanos` returns a time as `u64`, you should change your | ||
field to a `u64`. In this situation, you should avoid casting the value. | ||
|
||
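To make the nanosecond example concrete, the arithmetic can be checked directly: a `u32` nanosecond counter tops out after about 4.29 seconds, which is why the field should follow the `u64` source. In the sketch below, `Sample` and `whole_seconds` are hypothetical names, and `now_nanos` merely stands in for the value returned by the `precise_time_nanos` call mentioned in the text.

```rust
/// Nanoseconds per second.
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Whole seconds representable by a nanosecond counter whose maximum is `max_nanos`.
pub fn whole_seconds(max_nanos: u64) -> u64 {
    max_nanos / NANOS_PER_SEC
}

/// Hypothetical record whose field matches the `u64` the time source returns.
pub struct Sample {
    pub taken_at_nanos: u64,
}

impl Sample {
    /// `now_nanos` stands in for the result of the time source; no cast is needed.
    pub fn new(now_nanos: u64) -> Sample {
        Sample { taken_at_nanos: now_nanos }
    }
}
```

Calling `whole_seconds(u64::from(u32::MAX))` gives `4`, so a five-second timestamp already cannot be stored in a `u32` field.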
If you are passing a value provided by one API into another API, it is | ||
generally safe to cast a **smaller** sized value into a **larger** sized | ||
value. | ||
|
||
For example, it is generally safe to cast an `i32` to an `i64`. It is also | ||
generally safe to cast a `usize` to a `u64`, since it will never be bigger | ||
Not expected to exceed u64 in a foreseeable future, but certainly will… at some time. |
||
than `u64`. | ||
|
||
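In current Rust, the widening conversions described here can be written without `as`. The sketch below uses the standard `From` and `TryFrom` traits; the helper names are illustrative. Note that the standard library provides `From<i32> for i64` but, because the width of `usize` is platform-dependent, only `TryFrom<usize> for u64`.

```rust
use std::convert::TryFrom;

/// Widening `i32` to `i64` is lossless, so the standard library offers `From`.
pub fn widen_i32(x: i32) -> i64 {
    i64::from(x)
}

/// `usize` to `u64` goes through `try_from`, which makes the
/// "never bigger than `u64`" assumption explicit instead of implicit.
pub fn widen_usize(n: usize) -> Option<u64> {
    u64::try_from(n).ok()
}
```

On targets where `usize` is at most 64 bits wide, `widen_usize` always returns `Some`.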
However, you should avoid **truncating** casts, which cast a larger sized | ||
integer to a smaller sized integer. You should also avoid casting between | ||
signed and unsigned integers (in either direction). | ||
|
||
If you need to truncate a number, or cast between signed and unsigned | ||
This section has very good advice, which is a stark contrast to the earlier advice to make an intuitive guess about the necessary size. It's funny how one the strongest supporters of having a small standard library is willing to make awful recommendations based on big integers not being in the standard library. |
||
types, you should carefully consider what will happen if the source number | ||
is outside the bounds of the target type, and handle that case explicitly. | ||
|
||
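One way to handle the out-of-bounds case explicitly, sketched here with the standard `TryFrom` trait available in current Rust, is to return an `Option` or to saturate deliberately. The function names are hypothetical; the point is only that the failure path is visible at the call site, unlike with a bare `as` cast.

```rust
use std::convert::TryFrom;

/// Truncating conversion: `None` when `n` does not fit in a `u32`.
pub fn narrow(n: u64) -> Option<u32> {
    u32::try_from(n).ok()
}

/// Signed-to-unsigned conversion: `None` when `n` is negative.
pub fn to_unsigned(n: i32) -> Option<u32> {
    u32::try_from(n).ok()
}

/// An alternative policy: clamp to the largest representable value.
pub fn narrow_saturating(n: u64) -> u32 {
    u32::try_from(n).unwrap_or(u32::MAX)
}
```

Which policy is right (error, clamp, or something else) depends on the API, which is why the text asks for the case to be considered rather than prescribing one answer.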
# Drawbacks | ||
|
||
The primary drawback of this approach is the same as the drawback of | ||
not having a default integer type in the first place: unnecessary | ||
incompatibilities between APIs that chose different sides of the API | ||
tradeoff. | ||
|
||
In today's Rust, a library or program that chooses to use `u32` | ||
If the code was designed sanely by considering or implementing bounds instead of simply making a dubious intuitive guess, this wouldn't come up. |
||
in "laughably small" cases would be incompatible with libraries | ||
that used the more conservative `u64`. This will result in more | ||
casting overall, likely making people less concerned when they | ||
perform more dangerous, truncating casts. | ||
|
||
We have previously discussed the possibility of automatically | ||
performing "widening" casts (`u32` to `u64`). This would allow | ||
libraries to freely be as conservative as they want, and | ||
reserve casts for dangerous situations that truly require | ||
thought. | ||
Another idea discussed is to use different calls for safe same-or-widening vs. possibly-narrowing casts so the latter ones are easier to find and audit.
+1 for implicit widening, I'd like to see an RFC for this, probably after 1.0 |
||
|
||
# Alternatives | ||
|
||
We could decide not to issue guidance here, instead relying | ||
on sub-ecosystems in Rust to make decisions appropriate to | ||
their domains. The drawback of that approach is that developers | ||
new to Rust will lack the benefit of any heuristic, and will | ||
be more likely to cargo-cult solutions. | ||
You're the one proposing cargo-cult solutions like guessing based on intuition. It's cute that you're misrepresenting the other side of the issue this way though. |
||
|
||
We could decide to encourage the use of `u64` even for small | ||
Simply using larger integer types doesn't prevent overflow. It does make the assertions / tests almost completely useless at catching these issues, so if anything it exasperates the issue. Overflow is only prevented when either the integer type is chosen to always cover the necessary bounds or artificial bounds are implemented and correctly enforced. |
||
integers. While this would have the benefit of being | ||
significantly simpler, it would also mean that programs using | ||
tiny numbers would be unnecessarily slow. Because of this, | ||
it is doubtful that many people would even follow such a | ||
guideline, making it a dead letter and being practically | ||
similar to having no guideline at all. | ||
|
||
We could also encourage the use of `u32` as a default integer, | ||
discouraging the use of `u64` unless the user has a specific | ||
reason to believe the number would exceed `u32`. This has | ||
two problems. | ||
|
||
1. This guideline doesn't work very well in libraries, | ||
Stating that there aren't known bounds is equivalent to stating that a big integer is required. Your portrayal of the pros and cons throughout this proposal isn't based in reality. It's just a bunch of circular reasoning and "feel good" solutions to the problem. |
||
which don't usually have a very concrete understanding | ||
of what integers they will be used with. | ||
2. We want to discourage the use of `u32` in situations | ||
where 32-bit overflow is a possibility. The | ||
"laughably smaller" guideline, originally proposed by | ||
@Valloric, captures most of the performance-critical | ||
cases involving small numbers without reintroducing | ||
significant opportunities for 32-bit overflows. | ||
|
||
# Unresolved questions | ||
|
||
Ad hominems are not appropriate in an RFC, please stick to the technical arguments.
You can take your self-righteous, condescending bullshit somewhere else. @wycats is the one who decided to question my competence, and you're just making yourself look like a fool for supporting your own here.
ITT: Mozilla employee supporting a Mozilla contractor calling a volunteer contribute an idiot and attacking the quality of their contributions without provocation.
I unreservedly apologize for using the phrase "technically sloppy" to refer to your work.
As a non-Mozilla employee and someone who has never met anyone in the upper echelons of Rust development, I found your comment here inappropriate and find the personal drama distracting and irrelevant in general. I nearly always find your technical arguments compelling, and worry that they won't get the credence they deserve due to your frequently adversarial approach, which is a shame for the community as a whole. It seems like @nick29581 was just trying to keep the thread on track, which is certainly one of the roles of a core team member on a project like this.
It wasn't intended to be relevant or productive. I'm not here to help Mozilla increase their productivity by greasing the wheels and keeping threads on track. I respect people as a default but it can be lost, as is the case here. The example set by the core developers is that everything is personal. They're allowed to do whatever they want and anyone who speaks up against them is silenced by appealing to their own authority and policies. Rust's code of conduct isn't worth anything because it's just another policy to selectively apply and abuse.
That's an odd response, because you've been a primary beneficiary of selective CoC enforcement. People excuse your abusive behavior time and time again due to the quality of your technical contributions. I am pretty sick of it, and I think it's coming to an end.
When you are eventually banned through consistent enforcement of the CoC, I'm sure you'll frame it as proof of the bogus claim you're making here. That's fine, whatever, but I hope nobody falls for the rather obvious set-up.
Strongly worded criticism of an idea isn't abusive. Shitting on all of the contributions someone has made to the project and calling them an idiot is abusive. A satirical remark responding to that insult is not abusive. The only bogus claims being made in this thread are your own.
Both are abusive. I am partial to the idea that abusive content should be punished more than abusive form, but I'd rather see less of both. I think people should be nice and courteous, both in form and content. |
||
Should we implement automatic widening coercions? |
^^ the URL and link text are reversed in this markup
Good catch.