⬅ Index of all go-profiler-notes
This document was last updated for go1.15.6
but probably still applies to older/newer versions for the most parts.
The Go runtime keeps track of all goroutines in a simple slice called allgs. It contains both active and dead goroutines. The latter are kept around for reuse when new goroutines are spawned.
Go has various APIs to inspect the active goroutines in allgs
along with their current stack trace, as well as various other properties. Some APIs expose this information as statistical summaries, while other APIs provide information for each individual goroutine.
Despite the differences between the APIs, the common definition of an "active" goroutine seems to be:
- It's not
dead
- It's not a system goroutine nor finalizer goroutine.
In other words, goroutines that are running as well as those waiting on i/o, locks, channels, scheduling, etc. are all considered to be "active", even so one might naively not think of the latter ones as such.
All Goroutine profiling available in Go requires an O(N)
stop-the-world phase where N
is the number of allocated goroutines. A naive benchmark indicates that the world is stopped for ~1µs per goroutine when using the runtime.GoroutineProfile() API. But this number is likely to fluctuate in response to factors such as the average stack depth of the program, the number of dead goroutines, etc..
As a rule of thumb, applications that are extremely latency sensitive and make use of thousands of active goroutines might want to be a little careful with goroutine profiling in production. That being said, large number of goroutines, and perhaps even Go itself, might not be good idea for such applications to begin with.
Most applications that don't spawn crazy amounts of goroutines and can tolerate a few ms of ocassional extra latency should have no issues with continous goroutine profiling in production.
Goroutines have a lot of properties that can help to debug Go applications. The ones below are particulary interesting and exposed via the APIs described later on in this document to varying degrees.
goid
: The unique id of the goroutine, the main goroutine has id1
.atomicstatus
: The status of the goroutine, one of the following:idle
: just got allocatedrunnable
: on a run queue, waiting to be scheduledrunning
: executing on an OS threadsyscall
: blocked on a syscallwaiting
: parked by the scheduler, seeg.waitreason
dead
: just exited or being reinitializedcopystack
: stack is currently being movedpreempted
: just preempted itself
waitreason
: The reason a goroutine is inwaiting
status, e.g. sleep, channel operations, i/o, gc, etc.waitsince
: The approximate timestamp a goroutine has enteredwaiting
orsyscall
status as determined by the first gc after the wait started.labels
: A set of key/value profiler labels that can be attached to goroutines.stack trace
: The function that is currently being executed as well as its callers. This is exposed as either a plain text output of filenames, function names and line numbers or a slice of program counter addresses (pcs). 🚧 Research more details on this, e.g. can func/file/line text be converted to pcs?gopc
: The program counter address (pc) of thego ...
call that caused this goroutine to be created. Can be converted to the file, function name and line number.lockedm
: The thread this goroutine is locked to, if any.
The feature matrix below give you a quick idea on the current availability of these properties through the various APIs. Also available as a google sheet.
This returns unstructured text output showing the stack of all active goroutines as well as the properties listed in the feature matrix above.
The waitsince
property is included as nanotime() - gp.waitsince()
in minutes, but only if the duration exceeds 1 minute.
pprof.Lookup(debug=2)
is a simplified alias for how this profile is used. The actual invocation looks like this:
profile := pprof.Lookup("goroutine")
profile.WriteTo(os.Stdout, 2)
The profile implementation itself simply invokes runtime.Stack()
.
Below is a truncated example of the returned output, see 2.runtime.stack.txt for a full example.
goroutine 1 [running]:
main.glob..func1(0x14e5940, 0xc0000aa7b0, 0xc000064eb0, 0x2)
/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:29 +0x6f
main.writeProfiles(0x2, 0xc0000c4008, 0x1466424)
/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:106 +0x187
main.main()
/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:152 +0x3d2
goroutine 22 [sleep, 1 minutes]:
time.Sleep(0x3b9aca00)
/usr/local/Cellar/go/1.15.6/libexec/src/runtime/time.go:188 +0xbf
main.shortSleepLoop()
/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:165 +0x2a
created by main.indirectShortSleepLoop2
/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:185 +0x35
goroutine 3 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x1e91e88, 0x72, 0x0)
/usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc00019e018, 0x72, 0x0, 0x0, 0x1465786)
/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc00019e000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_unix.go:394 +0x1fc
...
This profiling method is invoked the same way as pprof.Lookup(debug=2)
, but produces very different data:
- Instead of listing individual goroutines, goroutines with the same stack/labels are listed once along with their count.
- pprof labels are included,
debug=2
does not include them. - Most other goroutine properties from
debug=2
are not included. - The output format is also text based, but looks very different than
debug=2
.
Below is a truncated example of the returned output, see 2.pprof.lookup.goroutine.debug1.txt for a full example.
goroutine profile: total 9
2 @ 0x103b125 0x106cd1f 0x13ac44a 0x106fd81
# labels: {"test_label":"test_value"}
# 0x106cd1e time.Sleep+0xbe /usr/local/Cellar/go/1.15.6/libexec/src/runtime/time.go:188
# 0x13ac449 main.shortSleepLoop+0x29 /Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:165
1 @ 0x103b125 0x10083ef 0x100802b 0x13ac4ed 0x106fd81
# labels: {"test_label":"test_value"}
# 0x13ac4ec main.chanReceiveForever+0x4c /Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:177
1 @ 0x103b125 0x103425b 0x106a1d5 0x10d8185 0x10d91c5 0x10d91a3 0x11b8a8f 0x11cb72e 0x12df52d 0x11707c5 0x117151d 0x1171754 0x1263c2c 0x12d96ca 0x12d96f9 0x12e09ba 0x12e5085 0x106fd81
# 0x106a1d4 internal/poll.runtime_pollWait+0x54 /usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:222
# 0x10d8184 internal/poll.(*pollDesc).wait+0x44 /usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:87
# 0x10d91c4 internal/poll.(*pollDesc).waitRead+0x1a4 /usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:92
# 0x10d91a2 internal/poll.(*FD).Read+0x182 /usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_unix.go:159
# 0x11b8a8e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.15.6/libexec/src/net/fd_posix.go:55
# 0x11cb72d net.(*conn).Read+0x8d /usr/local/Cellar/go/1.15.6/libexec/src/net/net.go:182
# 0x12df52c net/http.(*connReader).Read+0x1ac /usr/local/Cellar/go/1.15.6/libexec/src/net/http/server.go:798
# 0x11707c4 bufio.(*Reader).fill+0x104 /usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:101
# 0x117151c bufio.(*Reader).ReadSlice+0x3c /usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:360
# 0x1171753 bufio.(*Reader).ReadLine+0x33 /usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:389
# 0x1263c2b net/textproto.(*Reader).readLineSlice+0x6b /usr/local/Cellar/go/1.15.6/libexec/src/net/textproto/reader.go:58
# 0x12d96c9 net/textproto.(*Reader).ReadLine+0xa9 /usr/local/Cellar/go/1.15.6/libexec/src/net/textproto/reader.go:39
# 0x12d96f8 net/http.readRequest+0xd8 /usr/local/Cellar/go/1.15.6/libexec/src/net/http/request.go:1012
# 0x12e09b9 net/http.(*conn).readRequest+0x199 /usr/local/Cellar/go/1.15.6/libexec/src/net/http/server.go:984
# 0x12e5084 net/http.(*conn).serve+0x704 /usr/local/Cellar/go/1.15.6/libexec/src/net/http/server.go:1851
...
This profiling method is invoked the same way as pprof.Lookup(debug=1)
, and produces the same data. The only difference is that the data format is the pprof protocol buffer format.
Below is a truncated example of the returned output as reported by go tool pprof -raw
, see 2.pprof.lookup.goroutine.debug0.pb.gz for a full example.
PeriodType: goroutine count
Period: 1
Time: 2021-01-14 16:46:23.697667 +0100 CET
Samples:
goroutine/count
2: 1 2 3
test_label:[test_value]
1: 1 4 5 6
test_label:[test_value]
1: 1 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1: 1 7 8 9 10 11 12 21 14 22 23
test_label:[test_value]
1: 1 7 8 9 24 25 26 27 28 29 30
1: 1 31 32
test_label:[test_value]
1: 1 2 33
test_label:[test_value]
1: 34 35 36 37 38 39 40 41
test_label:[test_value]
Locations
1: 0x103b124 M=1 runtime.gopark /usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:306 s=0
2: 0x106cd1e M=1 time.Sleep /usr/local/Cellar/go/1.15.6/libexec/src/runtime/time.go:188 s=0
3: 0x13ac449 M=1 main.shortSleepLoop /Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:165 s=0
4: 0x10083ee M=1 runtime.chanrecv /usr/local/Cellar/go/1.15.6/libexec/src/runtime/chan.go:577 s=0
5: 0x100802a M=1 runtime.chanrecv1 /usr/local/Cellar/go/1.15.6/libexec/src/runtime/chan.go:439 s=0
6: 0x13ac4ec M=1 main.chanReceiveForever /Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/goroutine/main.go:177 s=0
...
Mappings
1: 0x0/0x0/0x0 [FN]
This function essentially returns a slice of all active goroutines and their current stack trace. The stack traces are given in the form of program addresses which can be resolved to function names using runtime.CallersFrames()
.
This method is used by fgprof to implement wall clock profiling.
The following features are not available, but might be interesting to propose to the Go project in the future:
- Include goroutine properties outlined above that are not available yet, especially labels.
- Filter by pprof labels, this could reduce stop-the-world, but would require additional book keeping by the runtime.
- Limit the number of returned goroutines to a random subset, could also reduce stop-the-world and might be easier to implement than filtering by label.
Below is a truncated example of the returned output, see 2.runtime.goroutineprofile.json for a full example.
[
{
"Stack0": [
20629256,
20629212,
20627047,
20628306,
17018153,
17235329,
...
]
},
{
"Stack0": [
17019173,
17222943,
20628554,
17235329,
...
]
},
...
]
This package exposes the pprof.Lookup("goroutine")
profiles described above via HTTP endpoints. The output is identical.
Goroutine profiling was implemented by Russ Cox and first appeared in the weekly.2012-02-22 release prior to go1.
I'm felixge and work at Datadog on Continuous Profiling for Go. You should check it out. We're also hiring : ).
The information on this page is believed to be correct, but no warranty is provided. Feedback is welcome!