Integer Guidelines RFC #741

Closed
164 changes: 164 additions & 0 deletions text/0000-int-guidelines.md
@@ -0,0 +1,164 @@
- Start Date: 2015-01-26
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This RFC proposes usage guidelines for the various integer types.

# Motivation

The goal of this RFC is to help people decide what integer types to use when they need
to make a decision for a new API.

It builds on [https://github.com/rust-lang/rfcs/pull/560](the integer overflow RFC),

^^ the URL and link text are reversed in this markup

Contributor Author

Good catch.

which provides debug-time assertions for overflow and underflow. One of the goals
of this RFC is to provide guidance for when to use unsigned types, and these
assertions affect the traditional tradeoffs about unsigned types, which are described
under the detailed design below.

It also draws inspiration from the [Google C++ Style Guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types)

The Google C++ Style Guide is based on the premise that having a very small language is more valuable than anything else because it makes the code accessible to people who only know Java. Most of the recommendations in it are solely based on that school of thought, which is diametrically opposed to Rust's language design. Their reasoning for forbidding unsigned integers has a lot more to do with removing choice and aligning C++ with Java than anything else.

@thestinger Well, some Googlers might think along those lines but I've never heard anyone at Google say that. Also to commit C++ code, a programmer has to pass a "C++ readability review" or get commit approvals from someone who has. (Ditto in any other language.)

The key rationale behind the guidelines is to rule out some gotchas, make the code more familiar, and avoid code churn due to minor choices.

This particular case is most likely a reaction to bugs in production.

Indeed, a Rust iterator is a better way to avoid the example count-down bug.

@1fish2: I don't know if any individuals actually think that way, but it's the logic that corporate group think produced for their style guide and the Go language design.

> The Google C++ Style Guide is based on the premise that having a very small language is more valuable than anything else because it makes the code accessible to people who only know Java. Most of the recommendations in it are solely based on that school of thought, which is diametrically opposed to Rust's language design. Their reasoning for forbidding unsigned integers has a lot more to do with removing choice and aligning C++ with Java than anything else.

This is nonsense, sorry. I read the (constant) internal discussions that precede changes to the Google C++ Style guide, and what you've said above is incorrect. The reasoning is always "what can we do here to prevent nasty bugs" and never has anything to do with Java.


This RFC attempts to balance several concerns:

* Rust developers should have an easy, go-to heuristic for deciding what

If they don't know the bounds, the only valid choice is a big integer. Suggesting anything else is irresponsible.

Big ints have bounds too, they're just bigger. Maybe they should be called "bigger ints"?

@cgaebel "Big Integer" is typically another way of saying "Arbitrary Precision Integer" (it's the name Java uses for its arbitrary precision integer support).

Sure, they are technically bounded (we could run out of system memory), but Rust already doesn't have memory allocation failures.

I mean to say that replacing an integer with bigint is akin to saying "an unbounded number that overflows its type should be represented by a type that also has a bound, just bigger (and destabilizes your system when it gets too big)".

integer sizes to use when they need to use a number type.
* 32-bit integers are considerably faster than 64-bit integers in some
situations, so using 64-bit integers for tiny numbers, especially in
hot code, can result in unnecessarily slower programs.
* Reflexive use of 32-bit integers too often results in overflows when the

So does "reflexive" use of 64-bit integers. It happens to hide overflows in common cases, but it's the edge cases that are hard to find in testing which are the most problematic. You're only exacerbating the problem by suggesting that intuition should be used to make this decision.

numbers aren't expected to be "laughably smaller" than the maximum
32-bit number.
* Occasional use of `usize` when building a brand new data structure may
be appropriate, but use of `usize` in general can introduce portability

Member

I don't think it is ever appropriate - it only used to be common to use int/uint before the rename because they were easy names to use. I can't imagine why, even when prototyping you would reach for usize first, unless you are using that number for array accesses/memory manipulation.

Member

I personally took "data structure" to imply "something related to memory".

hazards when the use-case is not proportional to the amount of
addressable memory.
* Use of unsigned integers is traditionally thought to be error-prone, and

Member

I am somewhat cynical of this constraint - I have never come across this school of thought outside the Google style guide linked above. Sure they have invariants you need to be careful not to violate, but no more so than signed integers (they both overflow and underflow, just at different places). Using an unsigned int inappropriately is error prone, but the same goes for signed ints.

More aspects about unsigned integers:

  • Unsigned integers are fitting for bit patterns, hash codes, wraparound mod 2^N, and values that aren't really numbers.
  • Since Rust requires explicit conversions from signed to unsigned integers, some C/C++ cautionary cases don't apply. [My mistake in not realizing this sooner.] E.g. it's tempting to make a C function take an uint32_t so it doesn't have to deal with negative inputs but it's easy in C to accidentally pass in a negative integer. Now the function can't detect that bad input unless it has a suitable upper bound.
    [BTW, "polymorphic array indexing" would make it OK to index a Rust array with a signed integer. Rustc could implement a signed bounds check with a single unsigned comparison (at usize or larger width) since the upper bound cannot exceed isize::MAX.]
  • Hopefully rustc will always complain about i >= 0 for unsigned i.
  • There are plenty of cases to be careful about exceeding the unsigned conceptual domain. Making a value unsigned looks better than it works out.

  • It's easy for values (esp. intermediate values) to go negative.
  • Be careful. Don't code like my brother.

style guides often suggest avoiding them. That said, Rust's unsigned integers
have built-in underflow assertions, which changes the analysis.

Contributor

You seem to forget that that RFC is still open. Or maybe that's just distraction.


# Detailed design

When deciding what integer type to use for a given situation, you can make the decision
to a first approximation through this heuristic:

* If you expect all uses of this API to use numbers "laughably smaller" than the 32-bit maximum,

> expect

This is an inappropriate way to make the decision. If you don't know the bounds, you need a big integer or you need to implement a way of enforcing sane bounds.

Member

I agree - 'expect' and 'laughably smaller' are appeals to intuition and this is an area where things are famously unintuitive. Either you have guarantees on the bounds, or you insert assertions, or you use a BigInt, anything else is hand-waving.

Also, how much laughably smaller does something have to be before you should use 8 or 16 bit integers?

use 32-bit numbers. Otherwise, prefer 64-bit numbers.

It'd be helpful to point out that the maximum i32 is about 2 billion, and the maximum u32 is about 4 billion. So beware of using these types for any number that gets anywhere near a billion, considering that intermediate values may be larger than stored values.

Contributor Author

Personally, I start using u64 for numbers in the many-millions. Once you get into that territory, it's hard to predict just how big they'll get.

Agreed.

Choosing integer sizes based on intuition is exactly the problem. It's incorrect and irresponsible.

Not to imply that there aren't appropriate times to use 8- and 16-bit numbers, esp. when you have many of them or the domain is, say, 32-bit ARGB pixels.

In all cases, programmers should remember to analyze the range of intermediate values as well as the range of stored values. E.g. can the intermediate values for your unsigned computation go negative?

There are plenty of correct choices:

a) use a fixed-size integer type covering the entire possible range
b) use a big integer type
c) enforce the necessary bounds on inputs to prevent overflow
d) check for overflow, and handle it sanely
e) design with overflow in mind, producing correct results even when it happens

Picking a type based on intuition and hoping for the best is not one of them. If you're throwing a 64-bit integer at the problem, it's doubtful that you're going to catch anything with assertions while testing or even most fuzzing but carefully crafted input or just natural edge cases in the wild will do it. It's still going to be broken, just in a less obvious way.

* If you are sure that the number will never be less than 0, use unsigned integers.
Otherwise, use signed numbers.
* If you are building a new data structure, and your number refers to the size of memory,

Suggestion: Make this more precise as "... your number is bounded by the number of bytes in memory".

Contributor Author

I like it!

you may want to use `usize`, which is described below.
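
To make the heuristic above more concrete: `i32::MAX` is 2,147,483,647 and `u32::MAX` is 4,294,967,295, so intermediate values can overflow even when every stored value fits. The following is only a sketch; `average` and `checked_sum` are hypothetical helpers invented for illustration, not part of any API discussed in this RFC.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: average of two `u32` values.
/// The sum is computed in `u64`, so the intermediate value cannot overflow.
fn average(a: u32, b: u32) -> u32 {
    let sum = u64::from(a) + u64::from(b); // at most 2 * u32::MAX, fits in u64
    // The average of two u32 values always fits back into u32.
    u32::try_from(sum / 2).expect("average of two u32 values fits in u32")
}

/// Hypothetical helper: the same sum done directly in `u32`,
/// with overflow reported as `None` instead of being ignored.
fn checked_sum(a: u32, b: u32) -> Option<u32> {
    a.checked_add(b)
}
```

Both inputs to `average` can sit near 4 billion without trouble, whereas `checked_sum(3_000_000_000, 2_000_000_000)` reports the overflow as `None`.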

> Note: Rust does not yet have BigInts in `std`, which might, in theory, be a better

These guidelines are brain-dead, not simply worse "in theory"

> go-to big integer type. These guidelines will probably be revised in the future
> once that changes.

## Signed vs. Unsigned Integers

Traditionally, style guides for low-level languages have [warned against the use

The fact that one awful style guide makes this recommendation doesn't make it "traditional".

of unsigned integers](http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types) because
they can introduce bugs when comparing an unsigned value with 0
(`for (unsigned int i = foo.len() - 1; i >= 0; --i) ...`). The Google Style Guide
argues that instead of unsigned integers, programmers should use signed
integers and assertions that the value does not go below 0.

In Rust, unsigned integers have underflow checking assertions built into

Would something like this be caught?

let x: u32 = 4000000000;
let y: i32 = x as i32;

Member

Yes. That’s in the plan.

the type (assuming that RFC XXX is accepted), so using a `u32` is equivalent
to the advice in the Google style guide (with a larger maximum value).
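
As a rough illustration of the count-down case, here is one way the C loop above might be written in Rust without ever comparing an unsigned value against 0. This is a sketch, not a recommendation from the RFC; `reversed` and `last_index` are hypothetical names. A reversed range simply stops after index 0, and `checked_sub` makes the empty case explicit rather than relying on an overflow assertion.

```rust
/// Hypothetical helper: collects the elements of `items` from last to first.
/// The reversed range ends after index 0, so no `i >= 0` test is needed.
fn reversed(items: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(items.len());
    for i in (0..items.len()).rev() {
        out.push(items[i]);
    }
    out
}

/// Hypothetical helper: index of the last element, or `None` for an empty
/// slice. `checked_sub` avoids computing `0 - 1` on an unsigned value.
fn last_index(items: &[u8]) -> Option<usize> {
    items.len().checked_sub(1)
}
```

For an empty slice, `reversed` returns an empty vector and `last_index` returns `None`.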

You can often find situations where a break statement in such a for loop means that it rarely hits zero (and thus rarely decrements below zero). Where, by "often"... I've seen such happen. Other examples include correct code that "subtracts first", e.g. x - 1 + y, where y is positive, where the ephemeral negative value isn't a bad thing. Signed types avoid such edge cases. You can consider me firmly in the camp of signed favoritism. Ideally, use signed types everywhere and never encounter unsigned types. Unfortunately the real situation is that you should use whatever matches best with the libraries and interfaces you're using.

It's ridiculous that some RFC would deign to decide this question.

It's reasonable for there to be general guidelines for stuff like this when there's a strong consensus, but I don't think there can be one for these issues. There is little to no evidence in favour of any specific choices, just a lot of dubious claims from every side.


## The "size" Types

The `isize` and `usize` types represent numbers that are proportional to
the size of available memory. Most normal uses of `isize` and `usize`

Yes. Also I think it's good to bring up the fact that these types scale both up and down with the address space. ("Address space" is more accurate than "available memory.") People should beware of using isize or usize where it might not suffice in 32-bit or 16-bit address spaces.

should arise through the use of existing data structures that expose
values with those types.

For example, if a new structure wanted to store the length of a `Vec` in
one of its fields, it should store it as a `usize`, because that's the return
value of `vec.len()`.

When building entirely new low-level data structures, you may want to use
`usize` to represent a value (e.g. node id) that scales linearly with the amount
of addressable memory.
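
The two cases above might look roughly like this. All type and method names here (`Snapshot`, `NodeId`, `Arena`) are hypothetical and exist only for illustration; the one thing taken from the text is that `Vec::len` returns a `usize`.

```rust
/// Hypothetical structure that records how many items a `Vec` held.
struct Snapshot {
    /// Same type as the return value of `Vec::len`, so no cast is needed.
    len: usize,
}

impl Snapshot {
    fn of<T>(items: &[T]) -> Snapshot {
        Snapshot { len: items.len() }
    }
}

/// Hypothetical node id for a new low-level structure. It is an index into
/// a `Vec`, so it scales with addressable memory and `usize` is a natural fit.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

struct Arena {
    nodes: Vec<String>,
}

impl Arena {
    /// Stores a node and returns its id.
    fn add(&mut self, label: &str) -> NodeId {
        self.nodes.push(label.to_string());
        NodeId(self.nodes.len() - 1) // len is at least 1 after the push
    }

    /// Looks a node up by id; an unknown id yields `None`.
    fn get(&self, id: NodeId) -> Option<&String> {
        self.nodes.get(id.0)
    }
}
```

Because the id is already a `usize`, indexing into the backing `Vec` needs no conversion.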

Member

A common use case for usize is for representing pointers as numbers. E.g., if you want to do pointer arithmetic, then pointers are usize and offsets are isize.

Also, indexes into arrays.

These two should cover the more abstract concept of 'scaling with addressable memory'

## Mixing and Matching (Casting)

You may occasionally encounter a situation where the integer types provided
by one API do not match your own storage for the value or the integer
type taken by another API.

If you are working with your own storage, you should change your own
internal storage to match the value you have received. For example, if
you have a field that stores a number of nanoseconds as a `u32`, and

A nanosecond is a billionth of a second so a u32 can only hold 4 seconds in nanoseconds. This case is a good example for "not laughably small" even if precise_time_nanos didn't push us to u64.

Reading this I would at first think the nanoseconds field was designed as part of a seconds, nanoseconds pair, such as in a struct timespec. My first read is that this was a bad example because what you'd actually be doing is dividing and modding the result of precise_time_nanos by a billion, but I guess it's bad because, if you're not doing that, it should never have been a u32 in the first place. (And if you are doing that, u32 or i32 is appropriate.)

`precise_time_nanos` returns a time as `u64`, you should change your
field to a `u64`. In this situation, you should avoid casting the value.

If you are passing a value provided by one API into another API, it is
generally safe to cast a **smaller** sized value into a **larger** sized
value.

For example, it is generally safe to cast an `i32` to an `i64`. It is also
generally safe to cast a `usize` to a `u64`, since it will never be bigger

Member

Not expected to exceed u64 in a foreseeable future, but certainly will… at some time.

than `u64`.
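
A minimal sketch of the widening cases, with hypothetical function names. Where the standard library provides a lossless `From` conversion (as it does for `i32` to `i64`), using it documents that nothing can be lost. The standard library does not provide `From<usize>` for `u64`, so the second helper falls back to `as` and relies on the assumption stated above that `usize` is never wider than 64 bits.

```rust
/// Widening `i32` to `i64` can never lose information, so `From` applies.
fn widen_i32(x: i32) -> i64 {
    i64::from(x)
}

/// Assumption: `usize` is at most 64 bits wide on the target platform,
/// so this `as` cast does not truncate.
fn widen_usize(n: usize) -> u64 {
    n as u64
}
```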

However, you should avoid **truncating** casts, which cast a larger sized
integer to a smaller sized integer. You should also avoid casting between
signed and unsigned integers (in either direction).

If you need to truncate a number, or cast between signed and unsigned

This section has very good advice, which is a stark contrast to the earlier advice to make an intuitive guess about the necessary size. It's funny how one of the strongest supporters of having a small standard library is willing to make awful recommendations based on big integers not being in the standard library.

types, you should carefully consider what will happen if the source number
is outside the bounds of the target type, and handle that case explicitly.
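
One possible way to handle the out-of-range case explicitly is `TryFrom`, which returns an error instead of silently truncating or reinterpreting the bits. This is a sketch with hypothetical helper names; whether to reject, saturate, or do something else on failure depends on the API in question.

```rust
use std::convert::TryFrom;

/// Narrowing `u64` to `u32`: values above `u32::MAX` yield `None`.
fn narrow(x: u64) -> Option<u32> {
    u32::try_from(x).ok()
}

/// Signed to unsigned: negative values yield `None`.
fn to_unsigned(x: i32) -> Option<u32> {
    u32::try_from(x).ok()
}

/// One explicit policy among several: clamp to the maximum on overflow.
fn saturating_narrow(x: u64) -> u32 {
    u32::try_from(x).unwrap_or(u32::MAX)
}
```

By contrast, a bare `as` cast such as `4_000_000_000u32 as i32` compiles and quietly produces a negative number.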

# Drawbacks

The primary drawback of this approach is the same as the drawback of
not having a default integer type in the first place: unnecessary
incompatibilities between APIs that chose different sides of the API
tradeoff.

In today's Rust, a library or program that chooses to use `u32`

If the code was designed sanely by considering or implementing bounds instead of simply making a dubious intuitive guess, this wouldn't come up.

in "laughably small" cases would be incompatible with libraries
that used the more conservative `u64`. This will result in more
casting overall, likely making people less concerned when they
perform more dangerous, truncating casts.

We have previously discussed the possibility of automatically
performing "widening" casts (`u32` to `u64`). This would allow
libraries to freely be as conservative as they want, and
reserve casts for dangerous situations that truly require
thought.

Another idea discussed is to use different calls for safe same-or-widening vs. possibly-narrowing casts so the latter ones are easier to find and audit.

Member

+1 for implicit widening, I'd like to see an RFC for this, probably after 1.0


# Alternatives

We could decide not to issue guidance here, instead relying
on sub-ecosystems in Rust to make decisions appropriate to
their domains. The drawback of that approach is that developers
new to Rust will lack the benefit of any heuristic, and will
be more likely to cargo-cult solutions.

You're the one proposing cargo-cult solutions like guessing based on intuition. It's cute that you're misrepresenting the other side of the issue this way though.


We could decide to encourage the use of `u64` even for small

Simply using larger integer types doesn't prevent overflow. It does make the assertions / tests almost completely useless at catching these issues, so if anything it exacerbates the issue. Overflow is only prevented when either the integer type is chosen to always cover the necessary bounds or artificial bounds are implemented and correctly enforced.

integers. While this would have the benefit of being
significantly simpler, it would also mean that programs using
tiny numbers would be unnecessarily slow. Because of this,
it is doubtful that many people would even follow such a
guideline, making it a dead letter and being practically
similar to having no guideline at all.

We could also encourage the use of `u32` as a default integer,
discouraging the use of `u64` unless the user has a specific
reason to believe the number would exceed `u32`. This has
two problems.

1. This guideline doesn't work very well in libraries,

Stating that there aren't known bounds is equivalent to stating that a big integer is required. Your portrayal of the pros and cons throughout this proposal isn't based in reality. It's just a bunch of circular reasoning and "feel good" solutions to the problem.

which don't usually have a very concrete understanding
of what integers they will be used with.
2. We want to discourage the use of `u32` in situations
where 32-bit overflow is a possibility. The
"laughably smaller" guideline, originally proposed by
@Valloric, captures most of the performance-critical
cases involving small numbers without reintroducing
significant opportunities for 32-bit overflows.

# Unresolved questions

  • Is @wycats claim that I lack technical rigour in my arguments and coding based on an inferiority complex?

Member

Ad hominems are not appropriate in an RFC, please stick to the technical arguments.

You can take your self-righteous, condescending bullshit somewhere else. @wycats is the one who decided to question my competence, and you're just making yourself look like a fool for supporting your own here.

ITT: Mozilla employee supporting a Mozilla contractor calling a volunteer contributor an idiot and attacking the quality of their contributions without provocation.

Contributor Author

@thestinger

> Is @wycats claim that I lack technical rigour in my arguments and coding based on an inferiority complex?

I unreservedly apologize for using the phrase "technically sloppy" to refer to your work.

As a non-Mozilla employee and someone who has never met anyone in the upper echelons of Rust development, I found your comment here inappropriate and find the personal drama distracting and irrelevant in general. I nearly always find your technical arguments compelling, and worry that they won't get the credence they deserve due to your frequently adversarial approach, which is a shame for the community as a whole.

It seems like @nick29581 was just trying to keep the thread on track, which is certainly one of the roles of a core team member on a project like this.

It wasn't intended to be relevant or productive. I'm not here to help Mozilla increase their productivity by greasing the wheels and keeping threads on track. I respect people as a default but it can be lost, as is the case here. The example set by the core developers is that everything is personal. They're allowed to do whatever they want and anyone who speaks up against them is silenced by appealing to their own authority and policies. Rust's code of conduct isn't worth anything because it's just another policy to selectively apply and abuse.

Contributor

That's an odd response, because you've been a primary beneficiary of selective CoC enforcement. People excuse your abusive behavior time and time again due to the quality of your technical contributions. I am pretty sick of it, and I think it's coming to an end.

When you are eventually banned through consistent enforcement of the CoC, I'm sure you'll frame it as proof of the bogus claim you're making here. That's fine, whatever, but I hope nobody falls for the rather obvious set-up.

Strongly worded criticism of an idea isn't abusive. Shitting on all of the contributions someone has made to the project and calling them an idiot is abusive. A satirical remark responding to that insult is not abusive. The only bogus claims being made in this thread are your own.

Member

Both are abusive. I am partial to the idea that abusive content should be punished more than abusive form, but I'd rather see less of both. I think people should be nice and courteous, both in form and content.

Should we implement automatic widening coercions?