implement ryu 64-bit backend #19484

tiehuis · 2024-03-29T23:01:46Z

The 64-bit backend supports printing all floats up to 64 bits wide. The 128-bit backend continues to be used for larger types.

This implementation uses the same code paths as the 128-bit backend, parameterised by a new table structure.
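As a rough illustration of "same code path, parameterised by tables", here is a minimal sketch. The names `FullTables`, `SmallTables`, `pow5`, and `scaleByPow5` are hypothetical and are not the identifiers used in `format_float.zig`; the point is only that a table set can be passed as a `comptime` type so one function body serves both variants.

```zig
const std = @import("std");

// Hypothetical table set backed by a precomputed lookup table
// (more .rodata, no extra work at runtime).
const FullTables = struct {
    const pow5_table = [_]u64{ 1, 5, 25, 125, 625, 3125, 15625, 78125 };

    pub fn pow5(i: usize) u64 {
        return pow5_table[i];
    }
};

// Hypothetical table set that recomputes the value on demand
// (no table in .rodata, a short loop at runtime).
const SmallTables = struct {
    pub fn pow5(i: usize) u64 {
        var result: u64 = 1;
        var n: usize = 0;
        while (n < i) : (n += 1) result *= 5;
        return result;
    }
};

// One shared code path; the table set is chosen at compile time.
fn scaleByPow5(comptime tables: type, m: u64, i: usize) u64 {
    return m * tables.pow5(i);
}

test "both table sets drive the same code path" {
    var i: usize = 0;
    while (i < 8) : (i += 1) {
        try std.testing.expectEqual(
            scaleByPow5(FullTables, 3, i),
            scaleByPow5(SmallTables, 3, i),
        );
    }
}
```

Running `zig test` on this file exercises both table sets through the single `scaleByPow5` body, which is the property that keeps the two backends in sync.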

I have fuzzed the 128-bit backend against the 64-bit backend and found no differences in output (shortest mode) for all f16 and f32 values and ~1 trillion f64 values. Behaviour is expected to be identical.
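The differential fuzzing described above needs access to both backends, which is not something a user program can select. A related but different check that anyone can run is a round-trip test: format a random f64 in shortest mode, parse the text back, and compare bit patterns. The sketch below is that swapped-in technique, not the harness used for this PR; `nextBits` and `roundTrips` are hypothetical helpers, and it assumes a Zig version where `std.fmt.bufPrint` accepts `{e}` for floats.

```zig
const std = @import("std");

// SplitMix64 step: a small, well-known generator used here only to
// produce reproducible 64-bit patterns.
fn nextBits(state: *u64) u64 {
    state.* +%= 0x9e3779b97f4a7c15;
    var z = state.*;
    z = (z ^ (z >> 30)) *% 0xbf58476d1ce4e5b9;
    z = (z ^ (z >> 27)) *% 0x94d049bb133111eb;
    return z ^ (z >> 31);
}

// Formats `f` in shortest scientific form, parses it back, and reports
// whether the exact bit pattern was recovered.
fn roundTrips(f: f64) !bool {
    var buf: [64]u8 = undefined;
    const text = try std.fmt.bufPrint(&buf, "{e}", .{f});
    const back = try std.fmt.parseFloat(f64, text);
    return @as(u64, @bitCast(back)) == @as(u64, @bitCast(f));
}

test "random f64 bit patterns round-trip through shortest output" {
    var state: u64 = 1;
    var i: usize = 0;
    while (i < 100_000) : (i += 1) {
        const f: f64 = @bitCast(nextBits(&state));
        // NaN payloads are not preserved by the textual form, so skip them.
        if (std.math.isNan(f)) continue;
        try std.testing.expect(try roundTrips(f));
    }
}
```

A round-trip test only shows that the output is accurate enough to recover the value; it does not show that the output is the shortest such string, which is why comparing against a second backend is the stronger check.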

Performance

~3x faster than the 128-bit backend in ReleaseFast (112.36ns to 36.34ns per trial). ReleaseSmall gains more, at ~6x faster (346.16ns to 54.78ns per trial).

Master

# ReleaseFast
perf: type=f64 backend=std seed=1
112.36ns per trial (1000000 trials) (check 0x2e00419)
# ReleaseSmall
perf: type=f64 backend=std seed=1
346.16ns per trial (1000000 trials) (check 0x2e00419)

This PR

# ReleaseFast
perf: type=f64 backend=ryu seed=1
36.34ns per trial (1000000 trials) (check 0x2e00419)
# ReleaseSmall
perf: type=f64 backend=ryu seed=1
54.78ns per trial (1000000 trials) (check 0x2e00419)

Size

ReleaseSmall: 13.7Ki -> 4.58Ki
ReleaseFast: 22.6Ki -> 19.2Ki

Sizes were measured with the following sample program and bloaty (https://github.com/google/bloaty).

const std = @import("std");
const format_float = @import("format_float.zig");

export fn formatFloat(n: [*]u8, len: usize, f: f64) usize {
    const output = format_float.formatFloat(n[0..len], f, .{}) catch return 0;
    return output.len;
}

Master

ReleaseSmall (13.7Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseSmall && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  64.9%  8.88Ki  67.8%  8.88Ki    .rodata
  24.9%  3.41Ki  26.0%  3.40Ki    .text
   4.5%     632   4.7%     632    .eh_frame
   3.2%     448   0.0%       0    [ELF Section Headers]
   1.5%     216   1.6%     209    .rodata.str1.1
   0.5%      72   0.0%       0    .shstrtab
   0.5%      64   0.0%       0    [ELF Header]
 100.0%  13.7Ki 100.0%  13.1Ki    TOTAL

ReleaseFast (22.6Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseFast && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  56.5%  12.8Ki  58.0%  12.8Ki    .text
  39.2%  8.88Ki  40.3%  8.88Ki    .rodata
   1.9%     448   0.0%       0    [ELF Section Headers]
   0.9%     216   0.9%     209    .rodata.str1.1
   0.8%     192   0.8%     192    .eh_frame
   0.3%      72   0.0%       0    .shstrtab
   0.3%      64   0.0%       0    [ELF Header]
 100.0%  22.6Ki 100.0%  22.1Ki    TOTAL

This PR

ReleaseSmall (4.58Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseSmall && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  53.6%  2.45Ki  62.5%  2.45Ki    .text
  16.3%     764  19.0%     764    .rodata
  10.9%     512   0.0%       0    [ELF Section Headers]
   9.7%     456  11.4%     456    .eh_frame
   4.6%     216   5.2%     209    .rodata.str1.1
   1.9%      88   0.0%       0    .shstrtab
   1.6%      76   1.9%      76    .rodata.str4.4
   1.4%      64   0.0%       0    [ELF Header]
 100.0%  4.58Ki 100.0%  3.92Ki    TOTAL

ReleaseFast (19.2Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseFast && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  54.4%  10.5Ki  56.1%  10.5Ki    .rodata
  41.0%  7.88Ki  42.3%  7.88Ki    .text
   2.3%     448   0.0%       0    [ELF Section Headers]
   1.1%     216   1.1%     209    .rodata.str1.1
   0.5%     104   0.5%     104    .eh_frame
   0.4%      72   0.0%       0    .shstrtab
   0.3%      64   0.0%       0    [ELF Header]
 100.0%  19.2Ki 100.0%  18.6Ki    TOTAL

Notes

If a program formats both f64 and f128, both backends will be present in the output binary. I consider this a non-concern.
An f32 backend could easily be added, but until someone requests it (likely an embedded user) I will omit the tables and path in formatFloat.
Unrelated, but the fixed-precision output format differs slightly from upstream. Specifically, we pad zeros after the shortest representation to match the requested precision, whereas upstream ryu prints the complete, accurate output. That approach requires an enormous number of tables, so I note the difference and we keep the current method. https://github.com/ulfjack/ryu/blob/1264a946ba66eab320e927bfd2362e0c8580c42f/ryu/d2fixed_full_table.h
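To make the fixed-precision note concrete, the sketch below formats 0.1 with 20 fractional digits. The helper `fixed20` is hypothetical, and the sketch assumes a Zig version where `std.fmt.bufPrint` accepts `{d:.20}` for floats. The exact binary value of 0.1 begins 0.10000000000000000555, so a formatter that pads zeros after the shortest digits would be expected to show zeros in those trailing positions instead.

```zig
const std = @import("std");

// Formats `x` with exactly 20 digits after the decimal point.
fn fixed20(buf: []u8, x: f64) ![]u8 {
    return std.fmt.bufPrint(buf, "{d:.20}", .{x});
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const text = try fixed20(&buf, 0.1);
    // Compare the printed tail against the exact expansion
    // 0.10000000000000000555... to see which strategy is in use.
    std.debug.print("{s}\n", .{text});
}
```

Either strategy yields a string that parses back to the same f64; the difference only matters when the extra digits are read as the exact value of the float.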

Closes #19264.

The 64-bit backend supports printing all floats up to 64 bits wide. The 128-bit backend continues to be used for larger types. This backend is approximately 3x faster. Code size is a little smaller in the full-table case and much smaller when using the small tables. The implementation uses the same code paths, parameterized by a set of tables and their pow5 implementations. We continue to use the same rounding/formatting mechanisms. Initially I explored a separate implementation, as upstream does this and has specific optimizations for these paths, but for simplicity we don't. The performance loss is small enough at this point, and keeping them combined keeps them in sync. Closes ziglang#19264.

tiehuis · 2024-03-29T23:51:06Z

lib/std/fmt/format_float.zig

    const has_explicit_leading_bit = std.math.floatMantissaBits(T) - std.math.floatFractionalBits(T) != 0;
-    const d = binaryToDecimal(@as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit);
+    const d = binaryToDecimal(DT, @as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit, tables);


Should probably place the comptime tables as the second argument here instead of last.

andrewrk · 2024-03-30T05:15:34Z

Thanks for the follow up! Great work.

tiehuis force-pushed the ryu64-backend branch from 00a72e4 to 124e188 on March 29, 2024 23:08

andrewrk merged commit aff71c6 into ziglang:master Mar 30, 2024

tiehuis deleted the ryu64-backend branch March 30, 2024 07:24

tiehuis mentioned this pull request Feb 7, 2025

Printing a float with high precision is incorrect #22779

Open
