implement ryu 64-bit backend #19484

Merged
merged 1 commit into from
Mar 30, 2024

@tiehuis tiehuis commented Mar 29, 2024

The 64-bit backend supports printing all floats of up to 64 bits. The 128-bit backend continues to be used for larger types.

This implementation uses the same code paths as the 128-bit backend, parameterised by a new table structure.

I have fuzzed the 128-bit backend against the 64-bit backend and found no differences in output (shortest mode) for all f16 and f32 values and ~1 trillion f64 values. Behaviour is expected to be identical.
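A minimal sketch of the kind of differential check described above, for illustration only. The `Formatter` signature, `nextBits` and `firstMismatch` are hypothetical names and not part of this PR; the actual fuzzing harness is not shown here. The idea is to feed identical bit patterns to two formatters and report the first input whose outputs differ.

```zig
const std = @import("std");

/// Hypothetical common signature for the two formatters being compared.
/// Neither backend is exposed in this shape by the standard library.
const Formatter = *const fn (buf: []u8, value: f64) []const u8;

/// One xorshift64 step; `state` must start non-zero.
fn nextBits(state: *u64) u64 {
    var x = state.*;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    state.* = x;
    return x;
}

/// Feeds the same pseudo-random f64 bit patterns to both formatters and
/// returns the first pattern whose outputs differ, or null if none do.
fn firstMismatch(a: Formatter, b: Formatter, seed: u64, trials: usize) ?u64 {
    var state = seed;
    var buf_a: [64]u8 = undefined;
    var buf_b: [64]u8 = undefined;
    var i: usize = 0;
    while (i < trials) : (i += 1) {
        const bits = nextBits(&state);
        // Reinterpreting raw bits also covers NaN, infinities and subnormals.
        const value: f64 = @bitCast(bits);
        const out_a = a(&buf_a, value);
        const out_b = b(&buf_b, value);
        if (!std.mem.eql(u8, out_a, out_b)) return bits;
    }
    return null;
}
```

For f16 and f32 the input space is small enough to enumerate every bit pattern instead of sampling, which is presumably how "all" values were covered for those types.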

Performance

~3x faster than the 128-bit backend in ReleaseFast (112.36ns -> 36.34ns per trial). ReleaseSmall is notably ~6x faster (346.16ns -> 54.78ns per trial).

Master

# ReleaseFast
perf: type=f64 backend=std seed=1
112.36ns per trial (1000000 trials) (check 0x2e00419)
# ReleaseSmall
perf: type=f64 backend=std seed=1
346.16ns per trial (1000000 trials) (check 0x2e00419)

This PR

# ReleaseFast
perf: type=f64 backend=ryu seed=1
36.34ns per trial (1000000 trials) (check 0x2e00419)
# ReleaseSmall
perf: type=f64 backend=ryu seed=1
54.78ns per trial (1000000 trials) (check 0x2e00419)

Size

ReleaseSmall: 13.7Ki -> 4.58Ki
ReleaseFast: 22.6Ki -> 19.2Ki

Measured using the following sample program and bloaty (https://github.com/google/bloaty).

const std = @import("std");
const format_float = @import("format_float.zig");

export fn formatFloat(n: [*]u8, len: usize, f: f64) usize {
    const output = format_float.formatFloat(n[0..len], f, .{}) catch return 0;
    return output.len;
}

Master

ReleaseSmall (13.7Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseSmall && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  64.9%  8.88Ki  67.8%  8.88Ki    .rodata
  24.9%  3.41Ki  26.0%  3.40Ki    .text
   4.5%     632   4.7%     632    .eh_frame
   3.2%     448   0.0%       0    [ELF Section Headers]
   1.5%     216   1.6%     209    .rodata.str1.1
   0.5%      72   0.0%       0    .shstrtab
   0.5%      64   0.0%       0    [ELF Header]
 100.0%  13.7Ki 100.0%  13.1Ki    TOTAL

ReleaseFast (22.6Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseFast && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  56.5%  12.8Ki  58.0%  12.8Ki    .text
  39.2%  8.88Ki  40.3%  8.88Ki    .rodata
   1.9%     448   0.0%       0    [ELF Section Headers]
   0.9%     216   0.9%     209    .rodata.str1.1
   0.8%     192   0.8%     192    .eh_frame
   0.3%      72   0.0%       0    .shstrtab
   0.3%      64   0.0%       0    [ELF Header]
 100.0%  22.6Ki 100.0%  22.1Ki    TOTAL

This PR

ReleaseSmall (4.58Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseSmall && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  53.6%  2.45Ki  62.5%  2.45Ki    .text
  16.3%     764  19.0%     764    .rodata
  10.9%     512   0.0%       0    [ELF Section Headers]
   9.7%     456  11.4%     456    .eh_frame
   4.6%     216   5.2%     209    .rodata.str1.1
   1.9%      88   0.0%       0    .shstrtab
   1.6%      76   1.9%      76    .rodata.str4.4
   1.4%      64   0.0%       0    [ELF Header]
 100.0%  4.58Ki 100.0%  3.92Ki    TOTAL

ReleaseFast (19.2Ki)

 (ryu64-backend =) $ zig build-obj size.zig -O ReleaseFast && strip size.o
 (ryu64-backend =) $ bloaty size.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  54.4%  10.5Ki  56.1%  10.5Ki    .rodata
  41.0%  7.88Ki  42.3%  7.88Ki    .text
   2.3%     448   0.0%       0    [ELF Section Headers]
   1.1%     216   1.1%     209    .rodata.str1.1
   0.5%     104   0.5%     104    .eh_frame
   0.4%      72   0.0%       0    .shstrtab
   0.3%      64   0.0%       0    [ELF Header]
 100.0%  19.2Ki 100.0%  18.6Ki    TOTAL

Notes

  • If formatting f64 and f128 in the same program, the two backends will both be in the output binary. I consider this a non-concern.
  • A dedicated f32 path could easily be added, but until someone requests it (likely an embedded user) I will omit the extra tables and code path in formatFloat.
  • Unrelated, but the fixed-precision output format differs slightly from upstream. Specifically, we pad with 0s after the shortest representation to match the requested precision, whereas upstream ryu prints the complete, exact output. That requires an enormous number of tables, so I note the difference and we keep the current method. https://github.com/ulfjack/ryu/blob/1264a946ba66eab320e927bfd2362e0c8580c42f/ryu/d2fixed_full_table.h
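To make the last note concrete: the shortest form of the f64 nearest 0.1 is 0.1, so at precision 20 the padding scheme would yield 0.10000000000000000000, while the exact expansion begins 0.10000000000000000555. The sketch below only illustrates the padding step. `padFraction` is a hypothetical helper, not the code in format_float.zig, and it assumes the requested precision is at least the number of fractional digits already present (no rounding is attempted).

```zig
const std = @import("std");

/// Pads an already-shortest decimal string with '0' until it has `precision`
/// fractional digits. Returns null if rounding would be needed (out of scope
/// for this sketch) or if `buf` is too small.
fn padFraction(buf: []u8, shortest: []const u8, precision: usize) ?[]const u8 {
    // A string without a decimal point has no fractional digits.
    const dot = std.mem.indexOfScalar(u8, shortest, '.');
    const frac_len = if (dot) |i| shortest.len - i - 1 else 0;
    if (frac_len > precision) return null;

    // Add a '.' only when there was none and digits will follow it.
    const add_dot: usize = if (dot == null and precision > 0) 1 else 0;
    const total = shortest.len + add_dot + (precision - frac_len);
    if (total > buf.len) return null;

    @memcpy(buf[0..shortest.len], shortest);
    var pos = shortest.len;
    if (add_dot == 1) {
        buf[pos] = '.';
        pos += 1;
    }
    @memset(buf[pos..total], '0');
    return buf[0..total];
}
```

The padded digits carry no information beyond the shortest representation, which is why no additional tables are needed for this approach.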

Closes #19264.

The 64-bit backend supports printing all floats up to 64-bits. The
128-bit continues to be used for larger values.

This backend is approximately 3x faster. Code size is a little smaller
in the full table case and much smaller if using the small tables.

The implementation uses the same code-paths, parameterized by a set of
tables and their pow5 implementations. We continue to use the same
rounding/formatting mechanisms. Initially I explored a separate
implementation, as upstream does this and has specific optimizations for
these paths but for simplicity we don't. The performance loss is small
enough at this point and keeping them combined keeps them in sync.

Closes ziglang#19264.
  const has_explicit_leading_bit = std.math.floatMantissaBits(T) - std.math.floatFractionalBits(T) != 0;
- const d = binaryToDecimal(@as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit);
+ const d = binaryToDecimal(DT, @as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit, tables);
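For context on the `has_explicit_leading_bit` expression in the lines above, here is a small sketch using the same two `std.math` helpers. `hasExplicitLeadingBit` is a hypothetical wrapper introduced only for illustration. It shows that the flag is set for f80, which stores its integer bit explicitly, and clear for the IEEE binary formats, including every type the 64-bit backend handles.

```zig
const std = @import("std");

/// Mirrors the expression from the diff: true when the stored mantissa has
/// more bits than the fractional part, i.e. the integer bit is stored.
fn hasExplicitLeadingBit(comptime T: type) bool {
    return std.math.floatMantissaBits(T) - std.math.floatFractionalBits(T) != 0;
}

comptime {
    // f80 (x87 extended) stores a 64-bit mantissa with 63 fractional bits.
    std.debug.assert(hasExplicitLeadingBit(f80));
    // IEEE binary formats keep the leading bit implicit.
    std.debug.assert(!hasExplicitLeadingBit(f16));
    std.debug.assert(!hasExplicitLeadingBit(f32));
    std.debug.assert(!hasExplicitLeadingBit(f64));
    std.debug.assert(!hasExplicitLeadingBit(f128));
}
```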

Should probably place the comptime tables as the second argument here instead of last.

@andrewrk andrewrk merged commit aff71c6 into ziglang:master Mar 30, 2024
@andrewrk

Thanks for the follow up! Great work.

Development

Successfully merging this pull request may close these issues.

std.fmt.formatFloat: implement 32-bit and 64-bit ryu backends
2 participants