Skip to content

Commit 1925e75

Browse files
committed
cpu profile
1 parent 3d0ae3b commit 1925e75

File tree

2 files changed

+66
-8
lines changed

2 files changed

+66
-8
lines changed

3-profiling/1-profiling.md

Lines changed: 66 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -194,13 +194,75 @@ func main() {
194194
Now when we run the program a `cpu.pprof` file is created.
195195
```
196196
% go run main.go moby.txt
197-
2018/08/18 15:46:05 profile: cpu profiling enabled, /var/folders/by/3gf34_z95zg05cyj744_vhx40000gn/T/profile309242028/cpu.pprof
197+
2018/08/25 14:09:01 profile: cpu profiling enabled, /var/folders/by/3gf34_z95zg05cyj744_vhx40000gn/T/profile239941020/cpu.pprof
198198
"moby.txt": 181275 words
199-
2018/08/18 15:46:07 profile: cpu profiling disabled, /var/folders/by/3gf34_z95zg05cyj744_vhx40000gn/T/profile309242028/cpu.pprof
199+
2018/08/25 14:09:03 profile: cpu profiling disabled, /var/folders/by/3gf34_z95zg05cyj744_vhx40000gn/T/profile239941020/cpu.pprof
200200
```
201-
#### Extra credit
201+
Now we have the profile we can analyse it with `go tool pprof`
202+
```
203+
% go tool pprof /var/folders/by/3gf34_z95zg05cyj744_vhx40000gn/T/profile239941020/cpu.pprof
204+
Type: cpu
205+
Time: Aug 25, 2018 at 2:09pm (AEST)
206+
Duration: 2.05s, Total samples = 1.36s (66.29%)
207+
Entering interactive mode (type "help" for commands, "o" for options)
208+
(pprof) top
209+
Showing nodes accounting for 1.42s, 100% of 1.42s total
210+
flat flat% sum% cum cum%
211+
1.41s 99.30% 99.30% 1.41s 99.30% syscall.Syscall
212+
0.01s 0.7% 100% 1.42s 100% main.readbyte
213+
0 0% 100% 1.41s 99.30% internal/poll.(*FD).Read
214+
0 0% 100% 1.42s 100% main.main
215+
0 0% 100% 1.41s 99.30% os.(*File).Read
216+
0 0% 100% 1.41s 99.30% os.(*File).read
217+
0 0% 100% 1.42s 100% runtime.main
218+
0 0% 100% 1.41s 99.30% syscall.Read
219+
0 0% 100% 1.41s 99.30% syscall.read
220+
```
221+
The `top` command is one you'll use the most. We can see that 99% of the time this program spends in `syscall.Syscall`, and a small part in `main.readbyte`.
222+
223+
We can also visualise this call the with the `web` command. This will generate a directed graph from the profile data. Under the hood this uses the `dot` command from Graphviz.
224+
225+
![pprof](images/pprof.png)
226+
227+
On the graph the box that consumes the _most_ CPU time is the largest -- we see `sys call.Syscall` at 99.3% of the total time spent in the program. The string of boxes leading to `syscall.Syscall` represent the immediate callers -- there can be more than one if multiple code paths converge on the same function. The size of the arrow represents how much time was spent in children of a box, we see that from `main.readbyte` onwards they account for near 0 of the 1.41 second spent in this arm of the graph.
228+
229+
_Question_: Can anyone guess why our version is so much slower than `wc`?
230+
231+
### Improving our version
232+
233+
The reason our program is slow is not because Go's `syscall.Syscall` is slow. It is because syscalls in general are expensive operations (and getting more expensive as more Spectre family vulnerabilities are discovered).
234+
235+
Each call to `readbyte` results in a syscall.Read with a buffer size of 1. So the number of syscalls executed by our program is equal to the size of the input. We can see that in the pprof graph that reading the input dominates everything else.
236+
```go
237+
func main() {
238+
f, err := os.Open(os.Args[1])
239+
if err != nil {
240+
log.Fatalf("could not open file %q: %v", os.Args[1], err)
241+
}
242+
243+
b := bufio.NewReader(f)
244+
words := 0
245+
inword := false
246+
for {
247+
r, err := readbyte(b)
248+
if err == io.EOF {
249+
break
250+
}
251+
if err != nil {
252+
log.Fatalf("could not read file %q: %v", os.Args[1], err)
253+
}
254+
if unicode.IsSpace(r) && inword {
255+
words++
256+
inword = false
257+
}
258+
inword = unicode.IsLetter(r)
259+
}
260+
fmt.Printf("%q: %d words\n", os.Args[1], words)
261+
}
262+
```
263+
By inserting a `bufio.Reader` between the input file and `readbyte` will
202264
203-
Why did I use a large piece of text to benchmark my code?
265+
_Exercise_: Compare the times of this revised program to `wc`. How close is it? Take a profile and see what remains.
204266
205267
206268
## Memory profiling

3-profiling/examples/words/main.go

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,13 +6,9 @@ import (
66
"log"
77
"os"
88
"unicode"
9-
10-
"github.com/pkg/profile"
119
)
1210

1311
func main() {
14-
defer profile.Start().Stop()
15-
1612
f, err := os.Open(os.Args[1])
1713
if err != nil {
1814
log.Fatalf("could not open file %q: %v", os.Args[1], err)

0 commit comments

Comments
 (0)