Skip to content

Error handling is not thead-safe #231

Closed
@qmuntal

Description

@qmuntal

We use ERR_get_error to retrieve the error that caused a given function call to fail. This function docs says this:

ERR_get_error() returns the earliest error code from the thread's error queue and removes the entry.

The problem is that cgo calls can be rescheduled on any thread when switching back from the system stack to the goroutine stack, The Go scheduler does this to avoid goroutines having to wait for the original thread to be available.

This means that our call to ERR_get_error might not return the correct error, or even no error at all.

There are two ways to solve this:

  • Calling runtime.LockOSThread and defer runtime.UnlockOSThread before every cgo call so that the goroutine is always recheduled to the same thread.
  • Replacing all OpenSSL cgo calls for a custom C wrapper function that calls the OpenSSL C function and also ERR_get_error if that fails. Note that this will work because a goroutine can't be rescheduled while in the system stack.

The first option is the easiest one, but will impact the performance of our OpenSSL backend even in the no-error cases. I would prefer to investigate how we could implement the second option, which wouldn't have any performance impact. Note that both options will require big refactors, as we will need to wrap existing cgo calls into either a C or a Go function (or even both).

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions