Description
We recently started calling res_init() on DNS failure to fix a glibc bug, and there was some worry that we might be calling a function that isn't thread-safe. It looked like we might not have any thread safety issues with glibc specifically, but now I'm convinced that we do have them in libc on OSX. I haven't reproduced the issue in Rust yet, but the following Go program produces very scary crashes when I run it on OSX (but not on Linux):
    package main

    // #cgo LDFLAGS: -lresolv
    // #include <sys/types.h>
    // #include <netinet/in.h>
    // #include <arpa/nameser.h>
    // #include <resolv.h>
    import "C"

    import "fmt"

    func main() {
        // Loop on res_init() in a background goroutine...
        go func() {
            for {
                fmt.Println("background res_init")
                C.res_init()
            }
        }()

        // ...and also loop on it in the main thread.
        for {
            fmt.Println("foreground res_init")
            C.res_init()
        }
    }
The result is inconsistent from run to run, but here are a couple of examples:
fatal error: unexpected signal during runtime execution
foreground res_init
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7fffddf6705d]
test(30623,0x70000b910000) malloc: *** error for object 0x4601510: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
foreground res_init
foreground res_init
SIGABRT: abort
PC=0x7fffde988d42 m=3 sigcode=0
signal arrived during cgo execution
It might be that the best workaround is to limit our calls to res_init() to when we know we're linking against glibc? That way we could still fix the original bug (stale /etc/resolv.conf data in glibc specifically), take advantage of the thread safety that glibc seems to have here (we might want to audit it more carefully than I'm able to), and not worry about breaking any other platforms.
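
For concreteness, here is a rough sketch of what that gating could look like. This is not the actual change: the function name `on_resolver_failure` and its `bool` return value are made up for illustration. I'm also assuming that glibc exports the function under the symbol `__res_init` (with `res_init` in `<resolv.h>` being a macro alias for it), and that `target_os = "linux"` plus `target_env = "gnu"` is a good enough proxy for "linking against glibc".

```rust
use std::os::raw::c_int;

// Assumption: on glibc the real symbol is `__res_init`; the `res_init`
// name in <resolv.h> is a macro that expands to it.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
extern "C" {
    #[link_name = "__res_init"]
    fn res_init() -> c_int;
}

/// Hypothetical hook, meant to be called after a DNS lookup fails.
/// Returns true if res_init() was actually invoked.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
pub fn on_resolver_failure() -> bool {
    // Re-read /etc/resolv.conf so a later lookup sees fresh data.
    // This relies on glibc's res_init() tolerating concurrent callers,
    // which still needs a proper audit.
    unsafe {
        res_init();
    }
    true
}

/// Everywhere else (OSX included) this is a no-op, so we never reach a
/// res_init() implementation that we don't know to be thread-safe.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
pub fn on_resolver_failure() -> bool {
    false
}
```

The lookup code would call `on_resolver_failure()` in its error path on every platform, and only glibc builds would actually reach res_init().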