Skip to content

Add isascii to normalize_identifier for 25% reduction in parse time #228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 25, 2023

Conversation

ndinsmore
Copy link
Contributor

I was trying to use JuliaSyntax is some string benchmarking and noticed a lot of calls to utf8_normalization.

In master isascii is now blazing fast, so I thought that checking if a string was ascii before any UTF8 bs would speed things up. Julia code is mostly ascii

I benchmarked by parsing base/abstractarray.jl which is the largest file in the codebase.

Before pr:

julia> @benchmark t= JuliaSyntax.parseall(JuliaSyntax.SyntaxNode,$f)
BenchmarkTools.Trial: 291 samples with 1 evaluation.
 Range (min  max):  15.019 ms  25.913 ms  ┊ GC (min  max): 0.00%  36.31%
 Time  (median):     16.389 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.184 ms ±  1.963 ms  ┊ GC (mean ± σ):  4.12% ±  7.50%

       ▂▂▂█▂
  ▃▅▆▇▇█████▇▆▅█▄▃▄▁▄▃▄▁▁▂▃▃▁▁▃▁▆▃▃▃▄▁▄▂▃▃▂▃▃▁▁▃▃▁▂▁▁▂▁▁▁▂▁▁▂ ▃
  15 ms           Histogram: frequency by time        23.9 ms <

 Memory estimate: 7.97 MiB, allocs estimate: 171744.

After PR

julia> @benchmark t= JuliaSyntax.parseall(JuliaSyntax.SyntaxNode,$f)
BenchmarkTools.Trial: 373 samples with 1 evaluation.
 Range (min  max):  11.547 ms  32.907 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     12.722 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.422 ms ±  2.064 ms  ┊ GC (mean ± σ):  3.20% ± 6.65%

          ▇█▃
  ▂▂▂▃▅█▇████▆▄▄▄▃▃▄▃▃▂▂▂▃▂▃▃▃▂▄▃▄▂▂▃▂▃▂▃▃▁▁▁▂▂▂▂▁▁▁▁▁▁▁▁▂▁▂▂ ▃
  11.5 ms         Histogram: frequency by time        18.6 ms <

 Memory estimate: 6.48 MiB, allocs estimate: 97785.

@c42f
Copy link
Member

c42f commented Mar 23, 2023

Nice!! I assume you refer to JuliaLang/julia#48568

Does this also help with the older isascii in Julia 1.9?

@ndinsmore
Copy link
Contributor Author

The old isascii could only do about 2 bytes a cycle in <1.10, The new version will do 20+bytes/cycle. The UTF8 code was slow so I am assuming the old version should still offer a good benefit.

@KristofferC
Copy link
Member

KristofferC commented Mar 24, 2023

The old isascii could only do about 2 bytes a cycle in <1.10, The new version will do 20+bytes/cycle.

Most identifiers are quite short though, and the chunk size in that PR is 1024, so I am not sure you will get out the full speed from it on your average Julia identifier. Would be interesting to benchmark.

@ndinsmore
Copy link
Contributor Author

I had meant to write on larger strings, none the less it does still speed up things by ~25%. We could likely craft a isascii that is tailored for smaller strings.

I should note that the benchmarking was also done with JuliaLang/julia#48887 applied. But that shouldn't effect the back to back.

The thing is that when I profiled this utf8proc_decompose was taking up the majority of the TreeNode processing time, now it doesn't even show up in the profile.

@KristofferC
Copy link
Member

KristofferC commented Mar 24, 2023

I totally agree that this thing is the right thing to do. I am just not sure the performance win is due to the improvements of isascii on master. There might be the same win in 1.9 for example.

@c42f
Copy link
Member

c42f commented Mar 25, 2023

I agree this probably works and is beneficial on 1.9. I was just hoping you'd go ahead and do the benchmark for me ;-)

I ask because JuliaSyntax isn't just for Julia 1.10+. It's tested and works all the way back to julia 1.0 as needed to support use in the VSCode tooling.

Anyway, I checked and it's also a big improvement on 1.9:

julia> f = read(joinpath(Sys.BINDIR, Base.DATAROOTDIR, "julia", "base", "abstractarray.jl"), String);

# before
julia> @benchmark JuliaSyntax.parseall(JuliaSyntax.SyntaxNode,$f)
BenchmarkTools.Trial: 360 samples with 1 evaluation.
 Range (min … max):  12.632 ms … 22.863 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.998 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.906 ms ±  1.894 ms  ┊ GC (mean ± σ):  5.84% ± 9.83%

  ▅██▇▅▂   ▂                                                   
  ██████▇█▇███▄▄▁▁▁▄▁▁▁▁▁▁▄▁▁▁▁▁▄▇▇▇▇▇▇▇█▆▆▄▆▇█▆▁▆█▁▁▆▆▄▄▁▁▁▄ ▇
  12.6 ms      Histogram: log(frequency) by time      19.6 ms <

 Memory estimate: 8.03 MiB, allocs estimate: 172102.

# after
julia> @benchmark JuliaSyntax.parseall(JuliaSyntax.SyntaxNode,$f)
BenchmarkTools.Trial: 496 samples with 1 evaluation.
 Range (min … max):   9.157 ms … 15.936 ms  ┊ GC (min … max): 0.00% … 35.99%
 Time  (median):      9.444 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.075 ms ±  1.461 ms  ┊ GC (mean ± σ):  5.73% ± 10.30%

  ▄▇█▇▅▃▁                                                      
  ███████▇▇█▅▇▅▅▇▅▄▁▁▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆█▅▇██▆▇▇▁▇▆▁▇▇▆▄▆▄▅▄ ▇
  9.16 ms      Histogram: log(frequency) by time      14.4 ms <

 Memory estimate: 6.53 MiB, allocs estimate: 98538.

@codecov
Copy link

codecov bot commented Mar 25, 2023

Codecov Report

Merging #228 (320738a) into main (56c33b2) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #228   +/-   ##
=======================================
  Coverage   96.31%   96.31%           
=======================================
  Files          15       15           
  Lines        3913     3913           
=======================================
  Hits         3769     3769           
  Misses        144      144           
Impacted Files Coverage Δ
src/literal_parsing.jl 97.87% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@c42f c42f merged commit 6b947db into JuliaLang:main Mar 25, 2023
@PallHaraldsson
Copy link

PallHaraldsson commented Apr 10, 2023

Do you know if the same improvement (automatically) applies in the default Julia parser? Can (and should) it otherwise be added there easily, until yours becomes the default, and maybe even be backported?

@c42f
Copy link
Member

c42f commented Apr 11, 2023

Julia's reference parser already uses an equivalent optimization internally (see the C code in - fl_accum_julia_symbol)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants