Closed
Description
⚡ for N in 10000 20000 40000 80000; do (yes /path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scal
10000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found
real 0m2.662s
user 0m6.006s
sys 0m0.323s
20000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found
real 0m6.835s
user 0m13.575s
sys 0m0.627s
40000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found
real 0m24.827s
user 0m50.809s
sys 0m1.815s
80000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found
real 2m4.690s
user 4m5.447s
sys 0m6.882s
I thought I fixed this in scala/scala@0c520b5, but it was only fixed on JDK versions in which String.substring
was cheap (sharing the backing char[]
).
The code in question is in https://github.com/scala/scala/blob/0022db978beadb8759cc2ede6d8ef5e4a8a1c58c/src/compiler/scala/tools/cmd/CommandLineParser.scala#L75
- Write a unit test to show the current behaviour of this code
- Write a JMH benchmark that ranges across different input sizes (like the bash command above) to show the performance problem
- Rewrite the recursive method
commandLine
to avoid taking ever-smaller substrings of the input, perhaps by usingString.subsequence
(which is cheap), or some analagous method keeping track of how much input remains to be parsed.
I believe that most build tools aren't affected by this performance bug, as they tend to programattically provide the list of sources to the compiler, rather than providing them as part of the list of compiler options.