Skip to content

Commit fc0c868

Browse files
committed
Implement the big-integer arithmetic algorithm.
Replaces the existing decimal implementation, for substantial performance improvements with near-halfway cases. This is especially fast with a large number of digits. **Big Integer Implementation** A small subset of big-integer arithmetic has been added, with the `bigint` struct. It uses a stack-allocated vector with enough bits to store the float with the large number of significant digits. This is log2(10^(769 + 342)), to account for the largest possible magnitude exponent, and number of digits (3600 bits), and then rounded up to 4k bits. The limb size is determined by the architecture: most 64-bit architectures have efficient 128-bit multiplication, either by a single hardware instruction or 2 native multiplications for the high and low bits. This includes x86_64, mips64, s390x, aarch64, powerpc64, riscv64, and the only known exception is sparcv8 and sparcv9. Therefore, we define a limb size of 64-bits on 64-bit architectures except SPARC, otherwise we fallback to 32-bit limbs. A simple stackvector is used, which just has operations to add elements, index, and truncate the vector. `bigint` is then just a wrapper around this, with methods for big-integer arithmetic. For our algorithms, we just need multiplication by a power (x * b^N), multiplication by a bigint or scalar value, and addition by a bigint or scalar value. Scalar addition and multiplication uses compiler extensions when possible (__builtin_add_overflow and __uint128_t), if not, then we implement simple logic shown to optimize well on MSVC. Big-integer multiplication is done via grade school multiplication, which is more efficient than any asymptotically faster algorithms. Multiplication by a power is then done via bitshifts for powers-of-two, and by iterative multiplications of a large and then scalar value for powers-of-5. **compute_float** Compute float has been slightly modified so if the algorithm cannot round correctly, it returns a normalized, extended-precision adjusted mantissa with the power2 shifted by INT16_MIN so the exponent is always negative. `compute_error` and `compute_error_scaled` have been added. **Digit Optimiations** To improve performance for numbers with many digits, `parse_eight_digits_unrolled` is used for both integers and fractions, and uses a while loop than two nested if statements. This adds no noticeable performance cost for common floats, but dramatically improves performance for numbers with large digits (without these optimizations, ~65% of the total runtime cost is in parse_number_string). **Parsed Number** Two fields have been added to `parsed_number_string`, which contains a slice of the integer and fraction digits. This is extremely cheap, since the work is already done, and the strings are pre-tokenized during parsing. This allows us on overflow to re-parse these tokenized strings, without checking if each character is an integer. Likewise, for the big-integer algorithms, we can merely re-parse the pre-tokenized strings. **Slow Algorithm** The new algorithm is `digit_comp`, which takes the parsed number string and the `adjusted_mantissa` from `compute_float`. The significant digits are parsed into a big integer, and the exponent relative to the significant digits is calculated. If the exponent is >= 0, we use `positive_digit_comp`, otherwise, we use `negative_digit_comp`. `positive_digit_comp` is quite simple: we scale the significant digits to the exponent, and then we get the high 64-bits for the native float, determine if any lower bits were truncated, and use that to direct rounding. `negative_digit_comp` is a little more complex, but also quite trivial: we use the parsed significant digits as the real digits, and calculate the theoretical digits from `b+h`, the halfway point between `b` and `b+u`, the next-positive float. To get `b`, we round the adjusted mantissa down, create an extended-precision representation, and calculate the halfway point. We now have a base-10 exponent for the real digits, and a base-2 exponent for the theoretical digits. We scale these two to the same exponent by multiplying the theoretixal digits by `5**-real_exp`. We then get the base-2 exponent as `theor_exp - real_exp`, and if this is positive, we multipy the theoretical digits by it, otherwise, we multiply the real digits by it. Now, both are scaled to the same magnitude, and we simply compare the digits in the big integer, and use that to direct rounding. **Rust-Isms** A few Rust-isms have been added, since it simplifies logic assertions. These can be trivially removed or reworked, as needed. - a `slice` type has been added, which is a pointer and length. - `FASTFLOAT_ASSERT`, `FASTFLOAT_DEBUG_ASSERT`, and `FASTFLOAT_TRY` have been added - `FASTFLOAT_ASSERT` aborts, even in release builds, if the condition fails. - `FASTFLOAT_DEBUG_ASSERT` defaults to `assert`, for logic errors. - `FASTFLOAT_TRY` is like a Rust `Option` type, which propagates errors. Specifically, `FASTFLOAT_TRY` is useful in combination with `FASTFLOAT_ASSERT` to ensure there are no memory corruption errors possible in the big-integer arithmetic. Although the `bigint` type ensures we have enough storage for all valid floats, memory issues are quite a severe class of vulnerabilities, and due to the low performance cost of checks, we abort if we would have out-of-bounds writes. This can only occur when we are adding items to the vector, which is a very small number of steps. Therefore, we abort if our memory safety guarantees ever fail. lexical has never aborted, so it's unlikely we will ever fail these guarantees.
1 parent 25b240a commit fc0c868

File tree

8 files changed

+1140
-189
lines changed

8 files changed

+1140
-189
lines changed

include/fast_float/ascii_number.h

Lines changed: 32 additions & 123 deletions
Original file line numberDiff line numberDiff line change
@@ -91,16 +91,20 @@ CXX20_CONSTEXPR fastfloat_really_inline bool is_made_of_eight_digits_fast(const
9191
return is_made_of_eight_digits_fast(read_u64(chars));
9292
}
9393

94+
typedef span<const char> byte_span;
95+
9496
struct parsed_number_string {
95-
int64_t exponent;
96-
uint64_t mantissa;
97-
const char *lastmatch;
98-
bool negative;
99-
bool valid;
100-
bool too_many_digits;
97+
int64_t exponent{0};
98+
uint64_t mantissa{0};
99+
const char *lastmatch{nullptr};
100+
bool negative{false};
101+
bool valid{false};
102+
bool too_many_digits{false};
103+
// contains the range of the significant digits
104+
byte_span integer{}; // non-nullable
105+
byte_span fraction{}; // nullable
101106
};
102107

103-
104108
// Assuming that you use no more than 19 digits, this will
105109
// parse an ASCII string.
106110
CXX20_CONSTEXPR fastfloat_really_inline
@@ -125,6 +129,10 @@ parsed_number_string parse_number_string(const char *p, const char *pend, parse_
125129

126130
uint64_t i = 0; // an unsigned int avoids signed overflows (which are bad)
127131

132+
while ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
133+
i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
134+
p += 8;
135+
}
128136
while ((p != pend) && is_integer(*p)) {
129137
// a multiplication by 10 is cheaper than an arbitrary integer
130138
// multiplication
@@ -134,24 +142,24 @@ parsed_number_string parse_number_string(const char *p, const char *pend, parse_
134142
}
135143
const char *const end_of_integer_part = p;
136144
int64_t digit_count = int64_t(end_of_integer_part - start_digits);
145+
answer.integer = byte_span(start_digits, size_t(digit_count));
137146
int64_t exponent = 0;
138147
if ((p != pend) && (*p == decimal_point)) {
139148
++p;
140-
// Fast approach only tested under little endian systems
141-
if ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
142-
i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
143-
p += 8;
144-
if ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
149+
const char* before = p;
150+
// can occur at most twice without overflowing, but let it occur more, since
151+
// for integers with many digits, digit parsing is the primary bottleneck.
152+
while ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
145153
i = i * 100000000 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok
146154
p += 8;
147155
}
148-
}
149156
while ((p != pend) && is_integer(*p)) {
150157
uint8_t digit = uint8_t(*p - '0');
151158
++p;
152159
i = i * 10 + digit; // in rare cases, this will overflow, but that's ok
153160
}
154-
exponent = end_of_integer_part + 1 - p;
161+
exponent = before - p;
162+
answer.fraction = byte_span(before, size_t(p - before));
155163
digit_count -= exponent;
156164
}
157165
// we must have encountered at least one integer!
@@ -179,7 +187,7 @@ parsed_number_string parse_number_string(const char *p, const char *pend, parse_
179187
} else {
180188
while ((p != pend) && is_integer(*p)) {
181189
uint8_t digit = uint8_t(*p - '0');
182-
if (exp_number < 0x10000) {
190+
if (exp_number < 0x10000000) {
183191
exp_number = 10 * exp_number + digit;
184192
}
185193
++p;
@@ -212,23 +220,26 @@ parsed_number_string parse_number_string(const char *p, const char *pend, parse_
212220
if (digit_count > 19) {
213221
answer.too_many_digits = true;
214222
// Let us start again, this time, avoiding overflows.
223+
// We don't need to check if is_integer, since we use the
224+
// pre-tokenized spans from above.
215225
i = 0;
216-
p = start_digits;
226+
p = answer.integer.ptr;
227+
const char* int_end = p + answer.integer.len();
217228
const uint64_t minimal_nineteen_digit_integer{1000000000000000000};
218-
while((i < minimal_nineteen_digit_integer) && (p != pend) && is_integer(*p)) {
229+
while((i < minimal_nineteen_digit_integer) && (p != int_end)) {
219230
i = i * 10 + uint64_t(*p - '0');
220231
++p;
221232
}
222233
if (i >= minimal_nineteen_digit_integer) { // We have a big integers
223234
exponent = end_of_integer_part - p + exp_number;
224235
} else { // We have a value with a fractional component.
225-
p++; // skip the dot
226-
const char *first_after_period = p;
227-
while((i < minimal_nineteen_digit_integer) && (p != pend) && is_integer(*p)) {
236+
p = answer.fraction.ptr;
237+
const char* frac_end = p + answer.fraction.len();
238+
while((i < minimal_nineteen_digit_integer) && (p != frac_end)) {
228239
i = i * 10 + uint64_t(*p - '0');
229240
++p;
230241
}
231-
exponent = first_after_period - p + exp_number;
242+
exponent = answer.fraction.ptr - p + exp_number;
232243
}
233244
// We have now corrected both exponent and i, to a truncated value
234245
}
@@ -238,108 +249,6 @@ parsed_number_string parse_number_string(const char *p, const char *pend, parse_
238249
return answer;
239250
}
240251

241-
242-
// This should always succeed since it follows a call to parse_number_string
243-
// This function could be optimized. In particular, we could stop after 19 digits
244-
// and try to bail out. Furthermore, we should be able to recover the computed
245-
// exponent from the pass in parse_number_string.
246-
CXX20_CONSTEXPR fastfloat_really_inline decimal parse_decimal(const char *p, const char *pend, parse_options options) noexcept {
247-
const char decimal_point = options.decimal_point;
248-
249-
decimal answer;
250-
answer.num_digits = 0;
251-
answer.decimal_point = 0;
252-
answer.truncated = false;
253-
answer.negative = (*p == '-');
254-
if (*p == '-') { // C++17 20.19.3.(7.1) explicitly forbids '+' sign here
255-
++p;
256-
}
257-
// skip leading zeroes
258-
while ((p != pend) && (*p == '0')) {
259-
++p;
260-
}
261-
while ((p != pend) && is_integer(*p)) {
262-
if (answer.num_digits < max_digits) {
263-
answer.digits[answer.num_digits] = uint8_t(*p - '0');
264-
}
265-
answer.num_digits++;
266-
++p;
267-
}
268-
if ((p != pend) && (*p == decimal_point)) {
269-
++p;
270-
const char *first_after_period = p;
271-
// if we have not yet encountered a zero, we have to skip it as well
272-
if(answer.num_digits == 0) {
273-
// skip zeros
274-
while ((p != pend) && (*p == '0')) {
275-
++p;
276-
}
277-
}
278-
// We expect that this loop will often take the bulk of the running time
279-
// because when a value has lots of digits, these digits often
280-
while ((std::distance(p, pend) >= 8) && (answer.num_digits + 8 < max_digits)) {
281-
uint64_t val = read_u64(p);
282-
if(! is_made_of_eight_digits_fast(val)) { break; }
283-
// We have eight digits, process them in one go!
284-
val -= 0x3030303030303030;
285-
write_u64(answer.digits + answer.num_digits, val);
286-
answer.num_digits += 8;
287-
p += 8;
288-
}
289-
while ((p != pend) && is_integer(*p)) {
290-
if (answer.num_digits < max_digits) {
291-
answer.digits[answer.num_digits] = uint8_t(*p - '0');
292-
}
293-
answer.num_digits++;
294-
++p;
295-
}
296-
answer.decimal_point = int32_t(first_after_period - p);
297-
}
298-
// We want num_digits to be the number of significant digits, excluding
299-
// leading *and* trailing zeros! Otherwise the truncated flag later is
300-
// going to be misleading.
301-
if(answer.num_digits > 0) {
302-
// We potentially need the answer.num_digits > 0 guard because we
303-
// prune leading zeros. So with answer.num_digits > 0, we know that
304-
// we have at least one non-zero digit.
305-
const char *preverse = p - 1;
306-
int32_t trailing_zeros = 0;
307-
while ((*preverse == '0') || (*preverse == decimal_point)) {
308-
if(*preverse == '0') { trailing_zeros++; };
309-
--preverse;
310-
}
311-
answer.decimal_point += int32_t(answer.num_digits);
312-
answer.num_digits -= uint32_t(trailing_zeros);
313-
}
314-
if(answer.num_digits > max_digits) {
315-
answer.truncated = true;
316-
answer.num_digits = max_digits;
317-
}
318-
if ((p != pend) && (('e' == *p) || ('E' == *p))) {
319-
++p;
320-
bool neg_exp = false;
321-
if ((p != pend) && ('-' == *p)) {
322-
neg_exp = true;
323-
++p;
324-
} else if ((p != pend) && ('+' == *p)) { // '+' on exponent is allowed by C++17 20.19.3.(7.1)
325-
++p;
326-
}
327-
int32_t exp_number = 0; // exponential part
328-
while ((p != pend) && is_integer(*p)) {
329-
uint8_t digit = uint8_t(*p - '0');
330-
if (exp_number < 0x10000) {
331-
exp_number = 10 * exp_number + digit;
332-
}
333-
++p;
334-
}
335-
answer.decimal_point += (neg_exp ? -exp_number : exp_number);
336-
}
337-
// In very rare cases, we may have fewer than 19 digits, we want to be able to reliably
338-
// assume that all digits up to max_digit_without_overflow have been initialized.
339-
for(uint32_t i = answer.num_digits; i < max_digit_without_overflow; i++) { answer.digits[i] = 0; }
340-
341-
return answer;
342-
}
343252
} // namespace fast_float
344253

345254
#endif

0 commit comments

Comments
 (0)