Efficient integer formatting into fixed-size buffer #546

Closed
@hanna-kruppe

Proposal

Problem statement

The standard library provides highly optimized implementations of integer to decimal string conversions, but these are only accessible via the core::fmt machinery, which forces 1-2 layers of dynamic dispatch between user code and the actual formatting logic. Benchmarks in the itoa crate demonstrate that side-stepping fmt makes formatting much more efficient. Currently, any Rust user who wants that performance has to use third-party crates like itoa or lexical(-core) which essentially duplicate standard library functionality.
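For reference, here is a minimal sketch of what the status quo looks like in `core`-only code that wants an integer in a stack buffer. `StackBuf` and `format_via_fmt` are hypothetical names made up for this illustration; only `core::fmt::Write` and `write!` are real standard library items. The point is just to show where the `fmt` machinery sits between the caller and the digit-generation code.

```rust
use core::fmt::{self, Write};

/// Hypothetical helper: a fixed-size stack buffer that implements `fmt::Write`.
struct StackBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    fn new() -> Self {
        StackBuf { bytes: [0; N], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` chunks are ever copied in, so the filled prefix is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).expect("only UTF-8 was written")
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Reject the write (without copying anything) if it would not fit.
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        let dst = self.bytes.get_mut(self.len..end).ok_or(fmt::Error)?;
        dst.copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Formats `n` through `core::fmt`, i.e. the path this proposal wants to let users bypass.
fn format_via_fmt(n: u64) -> StackBuf<20> {
    let mut buf = StackBuf::new();
    // `write!` builds a `fmt::Arguments` and calls `Write::write_fmt`.
    write!(buf, "{}", n).expect("20 bytes fit any u64");
    buf
}
```

Calling `format_via_fmt(n).as_str()` gives the decimal string, but the caller has to bring their own `Write` adapter and pay for the formatting indirection.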

Motivating examples or use cases

Solution sketch

impl {iN, uN, usize, isize} {
    const MAX_STR_LEN: usize;
    fn format_into(self, buf: &mut [MaybeUninit<u8>; Self::MAX_STR_LEN]) -> &str;
}

This can be used from safe code, though it's a little more noisy than the itoa API since MaybeUninit::uninit_array is slated for removal:

use_str(itoa::Buffer::new().format(n));
use_str(n.format_into(&mut [const { MaybeUninit::uninit() }; TypeOfN::MAX_STR_LEN]));
// With uninit_array the length could be inferred:
use_str(n.format_into(&mut MaybeUninit::uninit_array()));
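To make the shape of the proposed method more concrete, here is a rough sketch of how it could behave for `u32`, written as a free function because inherent impls cannot be added to primitives outside the standard library. `format_u32_into` and `U32_MAX_STR_LEN` are hypothetical stand-ins for the proposed `u32::format_into` and `u32::MAX_STR_LEN`; the naive digit loop is only for illustration and is not the optimized algorithm the standard library uses.

```rust
use core::mem::MaybeUninit;

/// Longest decimal rendering of a `u32` ("4294967295"); stands in for the
/// proposed `u32::MAX_STR_LEN`.
const U32_MAX_STR_LEN: usize = 10;

/// Hypothetical free-function stand-in for the proposed `u32::format_into`.
fn format_u32_into(mut n: u32, buf: &mut [MaybeUninit<u8>; U32_MAX_STR_LEN]) -> &str {
    // Fill digits from the back of the buffer, least significant digit first.
    let mut start = buf.len();
    loop {
        start -= 1;
        buf[start].write(b'0' + (n % 10) as u8);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let digits = &buf[start..];
    // SAFETY: every element of `digits` was initialized by the loop above, and
    // `MaybeUninit<u8>` has the same layout as `u8`.
    let bytes = unsafe { core::slice::from_raw_parts(digits.as_ptr().cast::<u8>(), digits.len()) };
    // SAFETY: `bytes` contains only ASCII digits, which are valid UTF-8.
    unsafe { core::str::from_utf8_unchecked(bytes) }
}
```

The caller stays entirely in safe code: it creates an uninitialized array of the right length, passes it in, and gets back a `&str` that borrows from that array. Because the array length is part of the type, the function never needs a "buffer too small" error path.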

Alternatively, unsafe code can write directly into whatever buffer it wants. For example, the itoa usage in http could write directly into the spare_capacity_mut() of the BytesMut it creates. I believe it could also replace the homebrew integer formatting in rustix::DecInt. (edit: not so simple, see first reply)
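As an illustration of the "write into spare capacity" pattern, here is a sketch that uses `Vec<u8>` instead of `BytesMut` to avoid an external dependency. `append_decimal` is a hypothetical helper, and it hand-rolls the digit loop because the proposed `format_into` does not exist yet; with the proposed API, the caller would instead have to carve a `[MaybeUninit<u8>; MAX_STR_LEN]` out of the spare capacity, which is part of why this turned out to be less simple than it looks.

```rust
use core::mem::MaybeUninit;

/// Hypothetical helper: appends the decimal form of `n` to `out` by writing
/// into its spare capacity, without going through a temporary buffer.
fn append_decimal(out: &mut Vec<u8>, mut n: u64) {
    // Count digits first so they can be written in place, back to front.
    let mut digits = 1;
    let mut rest = n / 10;
    while rest != 0 {
        digits += 1;
        rest /= 10;
    }

    out.reserve(digits);
    let spare: &mut [MaybeUninit<u8>] = &mut out.spare_capacity_mut()[..digits];
    for slot in spare.iter_mut().rev() {
        slot.write(b'0' + (n % 10) as u8);
        n /= 10;
    }

    // SAFETY: `reserve` guaranteed at least `digits` spare slots, and the loop
    // above initialized exactly the first `digits` of them.
    unsafe { out.set_len(out.len() + digits) };
}
```

For example, starting from a `Vec<u8>` holding `b"Content-Length: "`, calling `append_decimal(&mut v, 1234)` extends it in place with the four digit bytes.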

Alternatives

The obvious option would be to import the API of itoa directly (single Buffer type with fn format(&mut self, n: impl SealedTrait) -> &str), since it's already widely used. However:

  • An API that cannot format directly into part of a buffer you own is insufficient for some users, who end up vendoring their own integer formatting code (e.g., rustix and compact_str as mentioned under motivation). (edit: not so simple, see first reply)
  • If Rust later adds const-generic u<N> and i<N> types with generous limits on N (e.g., Generic Integers V2: It's Time rfcs#3686) then a one-size-fits-all buffer may become excessively large. Even with a limit of N <= 4096, it would be well over a kilobyte. Even though the buffer doesn't have to be initialized, it'll still increase the stack frame size, which can have undesirable side effects on code generation (stack probes are generally inserted for frames larger than one page, SP-relative loads and stores need larger offsets that may no longer fit into an immediate operand).
  • The trait is extra API surface. While it's useful to expose (for bounds and for accessing the associated constant MAX_STR_LEN), the general trend in the standard library is to have inherent associated functions and constants on every integer type, not traits implemented for every integer type.
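The buffer-size point above can be checked with some quick arithmetic. The sketch below estimates the maximum decimal string length for a given bit width as floor(bits * log10(2)) + 1, with log10(2) approximated by 30103 / 100000. The function names are hypothetical, and the rational approximation is an assumption that I have only checked against the widths asserted here, not proven for every width.

```rust
/// Approximate decimal digit count of the largest unsigned `bits`-bit integer:
/// floor(bits * log10(2)) + 1, with log10(2) approximated as 30103 / 100000.
const fn max_unsigned_digits(bits: u64) -> u64 {
    bits * 30_103 / 100_000 + 1
}

/// String length for a signed type of the same width: the magnitude of the
/// minimum value is 2^(bits - 1), plus one byte for the leading '-'.
const fn max_signed_len(bits: u64) -> u64 {
    max_unsigned_digits(bits - 1) + 1
}
```

With this estimate, a hypothetical `u<4096>` needs 1234 bytes, compared to 39 bytes for `u128` and 40 for `i128`, so a single buffer type sized for the widest integer would be roughly 30 times larger than anything needed today.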

Other alternatives:

  • lexical-core has fn write(n: impl SomeTrait, buf: &mut [u8]) -> &[u8], but this requires unsafe to get a string out of it, and even if the return type is changed to &str, it can panic if the buffer is too small for the given n and requires a fully initialized buffer.
  • lexical has fn to_string(n: impl SomeTrait) -> String but this requires alloc (not just core) and does an unnecessary heap allocation when the result is immediately copied into another buffer.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Metadata

    Labels

      • ACP-accepted: API Change Proposal is accepted (seconded with no objections)
      • T-libs-api
      • api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries
