Simplified number regexes #5430

STRd6 · 2022-10-15T15:27:53Z

#5429 (comment)

I updated the SIMPLENUM regex in src/nodes.coffee and also the decimal parts of the number regex in lexer.coffee.

Also added a fix for numeric separators in decimal big int literals while I was in there.

GeoffreyBooth

LGTM, @edemaine and @Inve1951 ?

edemaine · 2022-10-16T19:00:38Z

Looks good, and nice catch on the BigInt decimals.

I still prefer my regex \d+(?:_\d+)* from #5429 (comment) but this one is OK.

I wrote a small benchmark that confirms my intuition that my regex is a bit more performant, especially for larger strings.

GeoffreyBooth · 2022-10-16T23:43:20Z

Does \d+(?:_\d+)* pass all the tests? If so, is there any reason not to use it instead?

STRd6 · 2022-10-17T03:17:30Z

@GeoffreyBooth that regex works for most places but causes catastrophic backtracking when added to BigInt. I think since the n comes after it keeps trying to match on it with every different combination of number lengths. The version without the + inside the * avoids this.

@edemaine https://jsbench.me/j7l9c2s0zl/1 it seems to only be faster for the SIMPLENUM test so maybe I'll update again with just that.

edemaine · 2022-10-17T13:57:39Z

Fascinating! That n at the end makes a huge difference, relative to the regexes that don't have a trailing n. But I'm not sure it's the +s that are causing backtracking... it's the n itself.

I benchmarked among /^\d(?:_?\d)*n/, /^\d+(?:_\d+)*n/, and /^\d+(?:_?\d)*n/ and found that /^\d+(?:_\d+)*n/ (my original proposal) is fastest. Or did I miss something? Sadly, it's about 3x slower than /^\d+n/ that this PR replaces, but that's the price of supporting underscores.

(I did double-check the numeric separator spec to confirm that double underscore isn't allowed, and these are indeed the correct regexes.)

Inve1951 · 2022-10-24T02:33:56Z

From my experience catastrophic backtracking can be avoided with a negative lookahead (?!n). It causes the (dumb) regex engine to return early.
But I'm having a hard time following what the exact issue is here.

STRd6 · 2022-10-24T02:52:36Z

From my testing changing the regex here: https://github.com/jashkenas/coffeescript/pull/5430/files#diff-b1724eb7967924ebd01486bf963f8281f5fff1acdf6f9354124c9ee50f0addcdR1313 caused the tests to never finish (I gave up after a few minutes when usually they only take six seconds). Changing the regexps on lines 1314 and 1315 seemed fine though. Please test it out on your local and let me know what you find in case I may be gravely mistaken.

Inve1951 · 2022-10-24T03:13:19Z

Oh I see. It should probably be non-greedy:

- \d(?:_?\d)*n
+ \d(?:_?\d)*?n

The difference here is that the first one starts testing from the end of the input, the second one from the start.
This could probably be changed in more places to gain some performance. (At the cost of readability)

STRd6 · 2022-10-24T17:57:57Z

I tried some more testing but non-greedy didn't seem to help with the BigInt backtracking

changing to \d+(?:_?\d+)*?n still took forever on my local machine.

I'm happy with this as it is currently stands since it is close to the minimal amount of change and the performance seems in line with the existing compiler.

@GeoffreyBooth

edemaine · 2022-10-24T20:27:45Z

@STRd6 I confirmed that changing that line to use \d+(?:_?\d+)*n makes cake test very slow (I terminated after a minute; my guess is it's quadratic time).

But my suggestion was \d+(?:_\d+)*n — note lack of ? after _. Then the benchmark runs slightly faster for me (~18 seconds instead of ~19 seconds, on an old CPU running Node 18.2). \d+(?:_\d+)*?n seemed to perform about the same as with greedy *.

STRd6 · 2022-10-24T21:19:28Z

@edemaine Thanks for clarifying, I've updated the regexes to match and I think this is good for now.

GeoffreyBooth · 2022-10-25T00:25:21Z

Okay, so can everyone on this thread please confirm that the regex in this PR is the one we want to go with? 👍 or 👎

simplified number regexes

41afdba

STRd6 force-pushed the number-regexp-improve branch from 7278b11 to 41afdba Compare October 15, 2022 15:38

fix numeric separator in decimal bigint

b5d547a

GeoffreyBooth approved these changes Oct 15, 2022

View reviewed changes

fancier regex

56365c2

switched regex based on alternative benchmark

8ab6b0f

STRd6 added 2 commits October 24, 2022 14:11

regex tune up

e1ef679

updated SIMPLENUM regex

9912fa8

GeoffreyBooth merged commit 98a2331 into jashkenas:main Oct 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Simplified number regexes #5430

Simplified number regexes #5430

STRd6 commented Oct 15, 2022 •

edited

Loading

Uh oh!

GeoffreyBooth left a comment

Uh oh!

edemaine commented Oct 16, 2022

Uh oh!

GeoffreyBooth commented Oct 16, 2022

Uh oh!

STRd6 commented Oct 17, 2022 •

edited

Loading

Uh oh!

edemaine commented Oct 17, 2022 •

edited

Loading

Uh oh!

Inve1951 commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

Inve1951 commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

edemaine commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

GeoffreyBooth commented Oct 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Simplified number regexes #5430

Simplified number regexes #5430

Conversation

STRd6 commented Oct 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GeoffreyBooth left a comment

Choose a reason for hiding this comment

Uh oh!

edemaine commented Oct 16, 2022

Uh oh!

GeoffreyBooth commented Oct 16, 2022

Uh oh!

STRd6 commented Oct 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edemaine commented Oct 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Inve1951 commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

Inve1951 commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

edemaine commented Oct 24, 2022

Uh oh!

STRd6 commented Oct 24, 2022

Uh oh!

GeoffreyBooth commented Oct 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

STRd6 commented Oct 15, 2022 •

edited

Loading

STRd6 commented Oct 17, 2022 •

edited

Loading

edemaine commented Oct 17, 2022 •

edited

Loading