s.t.cmd.CommandLineParser has exponential performance

```
⚡ for N in 10000 20000 40000 80000; do (yes /path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scal
10000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found

real	0m2.662s
user	0m6.006s
sys	0m0.323s
20000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found

real	0m6.835s
user	0m13.575s
sys	0m0.627s
40000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found

real	0m24.827s
user	0m50.809s
sys	0m1.815s
80000
error: source file '/path/to/fileeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.scala' could not be found
one error found

real	2m4.690s
user	4m5.447s
sys	0m6.882s
```

I thought I fixed this in https://github.com/scala/scala/commit/0c520b59a04acf26350f5ea25ba80b9e235aa4f6, but it was only fixed on JDK versions in which `String.substring` was cheap (sharing the backing `char[]`).

The code in question is in https://github.com/scala/scala/blob/0022db978beadb8759cc2ede6d8ef5e4a8a1c58c/src/compiler/scala/tools/cmd/CommandLineParser.scala#L75

 - [ ] Write a unit test to show the current behaviour of this code
 - [ ] Write a JMH benchmark that ranges across different input sizes (like the bash command above) to show the performance problem
 - [ ] Rewrite the recursive method `commandLine` to avoid taking ever-smaller substrings of the input, perhaps by using `String.subsequence` (which is cheap), or some analagous method keeping track of how much input remains to be parsed.

I believe that most build tools aren't affected by this performance bug, as they tend to programattically provide the list of sources to the compiler, rather than providing them as part of the list of compiler options. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

s.t.cmd.CommandLineParser has exponential performance #485

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

s.t.cmd.CommandLineParser has exponential performance #485

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions