Commit d4b77a8
searcher: fix a performance bug with -A/--after-context
Before this change (at the previous commit):
```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l
real 2.321
user 0.674
sys 0.735
maxmem 30 MB
faults 0
1000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l
real 2.513
user 0.823
sys 0.686
maxmem 30 MB
faults 0
10000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l
real 5.067
user 3.254
sys 0.676
maxmem 30 MB
faults 0
100000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l
real 6.658
user 4.841
sys 0.778
maxmem 51 MB
faults 0
1000000
```
Now with this commit:
```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l
real 1.845
user 0.328
sys 0.757
maxmem 30 MB
faults 0
1000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l
real 1.917
user 0.334
sys 0.771
maxmem 30 MB
faults 0
10000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l
real 1.972
user 0.319
sys 0.812
maxmem 30 MB
faults 0
100000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l
real 2.005
user 0.333
sys 0.855
maxmem 30 MB
faults 0
1000000
```
And compare to GNU grep:
```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999) | wc -l
real 1.488
user 0.143
sys 0.866
maxmem 30 MB
faults 0
1000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l
real 1.697
user 0.170
sys 0.986
maxmem 30 MB
faults 1
10000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l
real 1.515
user 0.166
sys 0.856
maxmem 29 MB
faults 0
100000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real 1.490
user 0.174
sys 0.851
maxmem 30 MB
faults 0
1000000
```
Interestingly, GNU grep is still a bit faster. But both commands remain
roughly invariant in search time as `-A` is increased.
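
The sweeps above can be scripted. What follows is a minimal sketch, assuming bash; `sweep_after_context` is a hypothetical helper name, not part of ripgrep or GNU grep. It pipes the file through `cat` so the tool reads from a pipe, as in the measurements above, and prints the number of output lines for each `-A` value. Timing is deliberately left out; add `time` around the tool invocation if needed.

```shell
# Hypothetical helper (not part of ripgrep or grep): run one search tool
# over a file on stdin for several -A values and report output line counts.
sweep_after_context() {
  local tool="$1" pattern="$2" file="$3"
  shift 3
  local n count
  for n in "$@"; do
    # Use a pipe rather than a path argument to match the stdin measurements.
    count=$(cat "$file" | "$tool" -A"$n" "$pattern" | wc -l)
    # $((count)) strips any padding that some wc implementations emit.
    printf '%s -A%s: %s lines\n' "$tool" "$n" "$((count))"
  done
}
```

For example: `sweep_after_context rg ZQZQZQZQZQ bigger.txt 999 9999 99999 999999`. With the match on the first line of the file, each `-A N` run should report `N + 1` lines, as in the transcripts above.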
There is definitely something "odd" about searching `stdin`: it seems
substantially slower than searching the file directly. GNU grep shows the
same effect:
```
$ (time grep ZQZQZQZQZQ -A999999 bigger.txt) | wc -l
real 0.692
user 0.184
sys 0.506
maxmem 30 MB
faults 0
1000000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real 1.700
user 0.201
sys 0.954
maxmem 30 MB
faults 0
1000000
$ (time rg ZQZQZQZQZQ -A999999 bigger.txt) | wc -l
real 0.640
user 0.428
sys 0.209
maxmem 7734 MB
faults 0
1000000
$ (time rg ZQZQZQZQZQ --no-mmap -A999999 bigger.txt) | wc -l
real 0.866
user 0.282
sys 0.581
maxmem 30 MB
faults 0
1000000
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l
real 1.991
user 0.338
sys 0.819
maxmem 30 MB
faults 0
1000000
```
I wonder if this is related to my discovery in the previous commit where
`read` calls on `stdin` seem to never return anything more than ~64K. Oh
well, I'm satisfied at this point, especially given that GNU grep seems
to do a lot worse than ripgrep with bigger values of
`-B/--before-context`:
```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9) | wc -l
real 1.568
user 0.170
sys 0.885
maxmem 30 MB
faults 0
1
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B99) | wc -l
real 1.734
user 0.338
sys 0.879
maxmem 30 MB
faults 0
1
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B999) | wc -l
real 2.349
user 1.723
sys 0.620
maxmem 30 MB
faults 0
1
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9999) | wc -l
real 16.459
user 15.848
sys 0.586
maxmem 30 MB
faults 0
1
$ time grep ZQZQZQZQZQ -B99999 bigger.txt
ZQZQZQZQZQ
real 1:45.06
user 1:44.12
sys 0.772
maxmem 30 MB
faults 0
```
The above pattern occurs regardless of whether you put `bigger.txt` on
stdin or whether you search it directly.
And now ripgrep:
```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9) | wc -l
real 1.965
user 0.326
sys 0.814
maxmem 29 MB
faults 0
1
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99) | wc -l
real 1.941
user 0.423
sys 0.813
maxmem 29 MB
faults 0
1
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B999) | wc -l
real 2.372
user 0.759
sys 0.703
maxmem 30 MB
faults 0
1
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9999) | wc -l
real 2.638
user 0.895
sys 0.665
maxmem 29 MB
faults 0
1
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99999) | wc -l
real 5.172
user 3.282
sys 0.748
maxmem 29 MB
faults 0
1
```
NOTE: To get `bigger.txt`:
```
$ curl -LO 'https://burntsushi.net/stuff/opensubtitles/2018/en/sixteenth.txt.gz'
$ gzip -d sixteenth.txt.gz
$ (echo ZQZQZQZQZQ && for ((i=0;i<10;i++)); do cat sixteenth.txt; done) > bigger.txt
```
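
On the ~64K `read` observation mentioned earlier: one way to look at it from the shell is to ask `dd` for a single large read from a pipe and count what comes back. This is only a sketch, assuming bash and a standard `dd`; `first_read_size` is a hypothetical helper name. On Linux the default pipe capacity is 64 KiB, which may be what caps these reads, but that link is an assumption here and not something this commit confirms.

```shell
# Hypothetical helper: report how many bytes a single read of up to 1 MiB
# returns on stdin. With count=1 and no fullblock flag, dd issues one read
# and copies whatever that read returned, even if it is a partial block.
first_read_size() {
  local n
  n=$(dd bs=1048576 count=1 2>/dev/null | wc -c)
  # $((n)) strips any padding that some wc implementations emit.
  echo "$((n))"
}
```

For example, `cat bigger.txt | first_read_size` shows the size of the first read from the pipe, while `first_read_size < bigger.txt` shows the same for a regular file. The pipe result can vary from run to run depending on how much the writer has buffered when the read happens.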
1 parent 8c6595c commit d4b77a8
1 file changed, +5 -1 lines changed