You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Showing nodes accounting for1.42s, 100% of 1.42s total
210
+
flat flat% sum% cum cum%
211
+
1.41s 99.30% 99.30% 1.41s 99.30% syscall.Syscall
212
+
0.01s 0.7% 100% 1.42s 100% main.readbyte
213
+
00% 100% 1.41s 99.30% internal/poll.(*FD).Read
214
+
00% 100% 1.42s 100% main.main
215
+
00% 100% 1.41s 99.30% os.(*File).Read
216
+
00% 100% 1.41s 99.30% os.(*File).read
217
+
00% 100% 1.42s 100% runtime.main
218
+
00% 100% 1.41s 99.30% syscall.Read
219
+
00% 100% 1.41s 99.30% syscall.read
220
+
```
221
+
The `top` command is one you'll use the most. We can see that 99% of the time this program spends in `syscall.Syscall`, and a small part in `main.readbyte`.
222
+
223
+
We can also visualise this call the with the `web` command. This will generate a directed graph from the profile data. Under the hood this uses the `dot` command from Graphviz.
224
+
225
+

226
+
227
+
On the graph the box that consumes the _most_ CPU time is the largest -- we see `sys call.Syscall` at 99.3% of the total time spent in the program. The string of boxes leading to `syscall.Syscall` represent the immediate callers -- there can be more than one if multiple code paths converge on the same function. The size of the arrow represents how much time was spent in children of a box, we see that from `main.readbyte` onwards they account for near 0 of the 1.41 second spent in this arm of the graph.
228
+
229
+
_Question_: Can anyone guess why our version is so much slower than `wc`?
230
+
231
+
### Improving our version
232
+
233
+
The reason our program is slow is not because Go's `syscall.Syscall` is slow. It is because syscalls in general are expensive operations (and getting more expensive as more Spectre family vulnerabilities are discovered).
234
+
235
+
Each call to `readbyte` results in a syscall.Read with a buffer size of 1. So the number of syscalls executed by our program is equal to the size of the input. We can see that in the pprof graph that reading the input dominates everything else.
236
+
```go
237
+
funcmain() {
238
+
f, err:= os.Open(os.Args[1])
239
+
if err != nil {
240
+
log.Fatalf("could not open file %q: %v", os.Args[1], err)
241
+
}
242
+
243
+
b:= bufio.NewReader(f)
244
+
words:=0
245
+
inword:=false
246
+
for {
247
+
r, err:=readbyte(b)
248
+
if err == io.EOF {
249
+
break
250
+
}
251
+
if err != nil {
252
+
log.Fatalf("could not read file %q: %v", os.Args[1], err)
253
+
}
254
+
if unicode.IsSpace(r) && inword {
255
+
words++
256
+
inword = false
257
+
}
258
+
inword = unicode.IsLetter(r)
259
+
}
260
+
fmt.Printf("%q: %d words\n", os.Args[1], words)
261
+
}
262
+
```
263
+
By inserting a `bufio.Reader` between the input file and `readbyte` will
202
264
203
-
Why did I use a large piece of text to benchmark my code?
265
+
_Exercise_: Compare the times of this revised program to `wc`. How close is it? Take a profile and see what remains.
0 commit comments