
When using DNS in the cluster file and the DNS entries are not resolvable, the client crashes #12166

Open
@johscheuer

Description


During our e2e tests with the operator, we observed that if no coordinator pod is running (and therefore the DNS names cannot be resolved), the FDB client library crashes with a stack trace:

 {"level":"info","ts":"2025-05-23T04:39:02Z","logger":"controller.fdbclient","msg":"Fetch values from FDB","namespace":"nightly-2515-operator-test-vzvtaddx","cluster":"operator-test-8jh94lmm","traceID":"955b160d-f898-484f-8295-3e654354e157","key":"\ufffd\ufffd/status/json"}
  Error determining public address.
  SIGSEGV: segmentation violation
  PC=0x7f27099791fa m=13 sigcode=1 addr=0x40
  signal arrived during cgo execution

  goroutine 286 gp=0xc000585500 m=13 mp=0xc000600708 [syscall]:
  runtime.cgocall(0x1571540, 0xc0005dc758)
  	/usr/local/go/src/runtime/cgocall.go:167 +0x4b fp=0xc0005dc730 sp=0xc0005dc6f8 pc=0x46e1ab
  github.com/apple/foundationdb/bindings/go/src/fdb._Cfunc_fdb_run_network()
  	_cgo_gotypes.go:441 +0x47 fp=0xc0005dc758 sp=0xc0005dc730 pc=0x137cf47
  github.com/apple/foundationdb/bindings/go/src/fdb.startNetwork.func1()
  	/go/pkg/mod/github.com/apple/foundationdb/bindings/go@v0.0.0-20250115161953-f1ab8147ed1c/src/fdb/fdb.go:209 +0x17 fp=0xc0005dc7e0 sp=0xc0005dc758 pc=0x13865b7
  runtime.goexit({})
  	/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0005dc7e8 sp=0xc0005dc7e0 pc=0x47c5a1
  created by github.com/apple/foundationdb/bindings/go/src/fdb.startNetwork in goroutine 285
  	/go/pkg/mod/github.com/apple/foundationdb/bindings/go@v0.0.0-20250115161953-f1ab8147ed1c/src/fdb/fdb.go:208 +0x4b

I think it would be better to return an error during the DB object creation instead of crashing the whole application with a stack trace.
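Until the client itself returns an error here, an application could check that at least one coordinator hostname resolves before opening the database, and surface a normal Go error otherwise. The sketch below uses only the Go standard library and assumes the cluster file format `description:ID@host:port,host:port,...` with an optional `:tls` suffix per address. `coordinatorAddresses` and `resolvableCoordinators` are hypothetical helpers, not part of the FDB Go bindings. This only narrows the window: DNS can still fail between this check and the client's own lookup.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// coordinatorAddresses extracts the coordinator addresses from a cluster file
// string of the (assumed) form "description:ID@addr1,addr2,...".
// A trailing ":tls" on an address is stripped so it can be split as host:port.
func coordinatorAddresses(clusterFile string) ([]string, error) {
	content := strings.TrimSpace(clusterFile)
	at := strings.Index(content, "@")
	if at < 0 {
		return nil, fmt.Errorf("invalid cluster file %q: missing '@'", content)
	}
	var addrs []string
	for _, a := range strings.Split(content[at+1:], ",") {
		a = strings.TrimSpace(a)
		if a == "" {
			continue
		}
		addrs = append(addrs, strings.TrimSuffix(a, ":tls"))
	}
	if len(addrs) == 0 {
		return nil, fmt.Errorf("invalid cluster file %q: no coordinators", content)
	}
	return addrs, nil
}

// resolvableCoordinators returns the coordinator addresses whose host part
// resolves via the local resolver, or an error if none of them do.
func resolvableCoordinators(clusterFile string) ([]string, error) {
	addrs, err := coordinatorAddresses(clusterFile)
	if err != nil {
		return nil, err
	}
	var ok []string
	var lastErr error
	for _, a := range addrs {
		host, _, err := net.SplitHostPort(a)
		if err != nil {
			lastErr = err
			continue
		}
		// IP literals are returned directly; hostnames trigger a DNS lookup.
		if _, err := net.LookupHost(host); err != nil {
			lastErr = err
			continue
		}
		ok = append(ok, a)
	}
	if len(ok) == 0 {
		return nil, fmt.Errorf("no coordinator in cluster file is resolvable: %w", lastErr)
	}
	return ok, nil
}

func main() {
	addrs, err := resolvableCoordinators("test:abcdef@127.0.0.1:4500,127.0.0.2:4500:tls")
	fmt.Println(addrs, err)
}
```

An application would call `resolvableCoordinators` on the cluster file contents and only open the database when it returns no error, so an unresolvable cluster becomes a retryable error instead of a process crash.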

The error happens here: https://github.com/apple/foundationdb/blob/main/fdbclient/NativeAPI.actor.cpp#L2301-L2304 when the client tries to figure out its own public IP address. I haven't looked into the details, but it might be worth allowing the user to at least specify the public IP address, so that the trace file setup cannot cause this crash.

Related: FoundationDB/fdb-kubernetes-operator#2283
