Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Path#dirname and Path#extension #11001

Merged
merged 7 commits into from
Sep 30, 2021

Conversation

BlobCodes
Copy link
Contributor

@BlobCodes BlobCodes commented Jul 24, 2021

This PR improves the performance of the Path#dirname and Path#extension methods.
The new methods are also smaller.

#dirname Benchmarks

Benchmark: Path["😎"].dirname
old  56.18M ( 17.80ns) (± 1.34%)  0.0B/op   1.30× slower
new  73.21M ( 13.66ns) (± 0.75%)  0.0B/op        fastest
===
Benchmark: Path["C://"].dirname
old  28.56M ( 35.01ns) (± 0.16%)  0.0B/op   2.44× slower
new  69.58M ( 14.37ns) (± 0.44%)  0.0B/op        fastest
===
Benchmark: Path["C://Users/"].dirname
old  12.08M ( 82.77ns) (± 0.53%)  16.0B/op   2.39× slower
new  28.90M ( 34.60ns) (± 1.97%)  16.0B/op        fastest
===
Benchmark: Path["i/🇱 🇮 🇰 🇪/😎.uni🇨ode"].dirname
old   8.12M (123.19ns) (± 0.83%)  48.0B/op   1.44× slower
new  11.71M ( 85.39ns) (± 1.40%)  48.0B/op        fastest
===
Benchmark: Path["trailing/sepe.rators///"].dirname
old   7.24M (138.13ns) (± 1.02%)  32.0B/op   2.65× slower
new  19.16M ( 52.20ns) (± 1.92%)  32.0B/op        fastest
===
Benchmark: Path["/hidden/.file"].dirname
old  13.62M ( 73.43ns) (± 1.11%)  32.0B/op   2.00× slower
new  27.25M ( 36.70ns) (± 3.91%)  32.0B/op        fastest
===
Benchmark: Path["/mnt"].dirname
old  17.63M ( 56.72ns) (± 1.07%)  48.0B/op   2.40× slower
new  42.32M ( 23.63ns) (± 1.93%)  16.0B/op        fastest
===
Benchmark: Path["."].dirname
old  75.72M ( 13.21ns) (± 0.92%)  0.0B/op   1.27× slower
new  95.96M ( 10.42ns) (± 0.19%)  0.0B/op        fastest
===
Benchmark: Path["file"].dirname
old  29.21M ( 34.23ns) (± 0.36%)  0.0B/op   2.52× slower
new  73.73M ( 13.56ns) (± 1.22%)  0.0B/op        fastest
===
Benchmark: Path["very/long/path_with_many/different_parts_to/test_the_performance_of_the_methods"].dirname
old   3.14M (318.56ns) (± 1.21%)  64.0B/op   2.61× slower
new   8.19M (122.08ns) (± 3.27%)  64.0B/op        fastest
===


Windows:
Benchmark: Path.windows("😎").dirname
old  52.28M ( 19.13ns) (± 0.71%)  0.0B/op   1.15× slower
new  60.33M ( 16.58ns) (± 1.11%)  0.0B/op        fastest
===
Benchmark: Path.windows("C://").dirname
old  11.57M ( 86.46ns) (± 1.76%)  64.0B/op   1.26× slower
new  14.60M ( 68.47ns) (± 1.59%)  64.0B/op        fastest
===
Benchmark: Path.windows("C://Users/").dirname
old   7.70M (129.83ns) (± 2.36%)  64.0B/op        fastest
new   7.70M (129.85ns) (± 4.81%)   128B/op   1.00× slower
===
Benchmark: Path.windows("i/🇱 🇮 🇰 🇪/😎.uni🇨ode").dirname
old   7.94M (125.92ns) (± 2.09%)  48.0B/op   1.93× slower
new  15.30M ( 65.38ns) (± 1.56%)  48.0B/op        fastest
===
Benchmark: Path.windows("trailing/sepe.rators///").dirname
old   7.03M (142.30ns) (± 1.61%)  32.0B/op   2.60× slower
new  18.24M ( 54.83ns) (± 2.03%)  32.0B/op        fastest
===
Benchmark: Path.windows("/hidden/.file").dirname
old  13.31M ( 75.13ns) (± 0.97%)  32.0B/op   1.93× slower
new  25.69M ( 38.92ns) (± 2.83%)  32.0B/op        fastest
===
Benchmark: Path.windows("/mnt").dirname
old  17.35M ( 57.65ns) (± 0.74%)  48.0B/op   2.22× slower
new  38.44M ( 26.02ns) (± 2.35%)  16.0B/op        fastest
===
Benchmark: Path.windows(".").dirname
old  71.74M ( 13.94ns) (± 0.41%)  0.0B/op   1.07× slower
new  77.12M ( 12.97ns) (± 0.69%)  0.0B/op        fastest
===
Benchmark: Path.windows("file").dirname
old  27.96M ( 35.76ns) (± 0.05%)  0.0B/op   2.16× slower
new  60.28M ( 16.59ns) (± 0.63%)  0.0B/op        fastest
===
Benchmark: Path.windows("very/long/path_with_many/different_parts_to/test_the_performance_of_the_methods").dirname
old   3.06M (327.05ns) (± 1.38%)  64.0B/op   2.68× slower
new   8.21M (121.86ns) (± 1.22%)  64.0B/op        fastest
===

#extension Benchmarks

Benchmark: Path["test.ext/foo"].extension
old  57.58M ( 17.37ns) (± 2.22%)  0.0B/op   1.06× slower
new  61.14M ( 16.36ns) (± 0.53%)  0.0B/op        fastest
===
Benchmark: Path["my_test_dir/image.jpeg"].extension
old  22.38M ( 44.69ns) (± 1.20%)  32.0B/op   1.10× slower
new  24.72M ( 40.46ns) (± 1.18%)  32.0B/op        fastest
===
Benchmark: Path["my_test_dir//image.jpeg//"].extension
old  20.84M ( 47.99ns) (± 1.47%)  32.0B/op   1.11× slower
new  23.12M ( 43.26ns) (± 1.16%)  32.0B/op        fastest
===
Benchmark: Path["test"].extension
old  82.85M ( 12.07ns) (± 0.86%)  0.0B/op   1.44× slower
new 119.28M (  8.38ns) (± 1.74%)  0.0B/op        fastest
===
Benchmark: Path["path_with/multiple...dots"].extension
old  21.50M ( 46.51ns) (± 0.77%)  32.0B/op   1.11× slower
new  23.93M ( 41.79ns) (± 1.23%)  32.0B/op        fastest
===
Benchmark: Path[".dotfile"].extension
old  51.82M ( 19.30ns) (± 1.92%)  0.0B/op   1.42× slower
new  73.41M ( 13.62ns) (± 0.91%)  0.0B/op        fastest
===
Benchmark: Path["path/to/.dotfile"].extension
old  41.21M ( 24.26ns) (± 3.96%)  0.0B/op   1.28× slower
new  52.60M ( 19.01ns) (± 2.82%)  0.0B/op        fastest
===
Benchmark: Path["path/archive.tar.gz"].extension
old  23.93M ( 41.78ns) (± 1.40%)  32.0B/op   1.10× slower
new  26.32M ( 37.99ns) (± 1.09%)  32.0B/op        fastest
===
Benchmark: Path["file/ending/with/a/dot."].extension
old  59.40M ( 16.84ns) (± 1.98%)  0.0B/op   1.00× slower
new  59.68M ( 16.76ns) (± 1.00%)  0.0B/op        fastest
===
Benchmark: Path[""].extension
old 291.64M (  3.43ns) (± 0.07%)  0.0B/op   1.08× slower
new 314.09M (  3.18ns) (± 0.25%)  0.0B/op        fastest
===
Benchmark: Path["verylong/file.extensionnnnnnnnnnnnnnnnn"].extension
old  13.10M ( 76.33ns) (± 1.28%)  48.0B/op   1.30× slower
new  16.99M ( 58.85ns) (± 1.19%)  48.0B/op        fastest
===

All specs ran fine, also I tested the methods using the strings seen in the Benchmark.

@BlobCodes
Copy link
Contributor Author

BlobCodes commented Jul 24, 2021

Hmm the #extension method seems to fail some specs (appearantly not all Path specs were in path_spec.cr.. hmm)
I'll probably revert it.

@straight-shoota
Copy link
Member

This looks great, really.

To fix #extension I think you just need to make sure to find any separators after the last dot. bytes.rindex(offset: pos) { |byte| byte === '.' || byte.in?(sep) } should work for that. Then you can check if bytes.unsafe_fetch(dot_index) is actually . and return empty otherwise.

The specs are in file_spec because these methods were originally implemented as class methods of File. We just moved the implementation to Path when that was introduced, and I kept the original specs exactly like they were to demonstrate backwards-compatibility. I think we should better move them to path_spec, though (that should better be a separate PR).

Your code comments in #dirname are really great and explain what's going on really well. #extname on the other hand could use a couple more on the special cases handling. The previous ones could just carry over.

@BlobCodes
Copy link
Contributor Author

It seems like Path#extension now properly works.
It is not as fast as before, but still an improvement.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please fix the spelling of occurrEnce and sepArator in the code comments?

src/path.cr Outdated Show resolved Hide resolved
src/path.cr Outdated Show resolved Hide resolved
src/path.cr Outdated Show resolved Hide resolved
Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Thank you @BlobCodes 🙏

@straight-shoota straight-shoota added this to the 1.2.0 milestone Sep 29, 2021
@straight-shoota straight-shoota merged commit f90d914 into crystal-lang:master Sep 30, 2021
@straight-shoota straight-shoota changed the title Improve performance of Path#dirname and Path#extension Improve performance of Path#dirname and Path#extension Sep 30, 2021
@BlobCodes BlobCodes deleted the path-performance branch January 29, 2022 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants