Description
We use `ERR_get_error` to retrieve the error that caused a given function call to fail. Its documentation says:
> ERR_get_error() returns the earliest error code from the thread's error queue and removes the entry.
The problem is that a goroutine can be rescheduled onto any thread once a cgo call returns and execution switches back from the system stack to the goroutine stack. The Go scheduler does this so that goroutines don't have to wait for the original thread to become available.
This means that our call to `ERR_get_error` may run on a different thread than the call that failed, so it might return the wrong error, or no error at all.
There are two ways to solve this:
- Calling `runtime.LockOSThread` and `defer runtime.UnlockOSThread` before every cgo call so that the goroutine is always rescheduled to the same thread.
- Replacing every OpenSSL cgo call with a custom C wrapper function that calls the OpenSSL C function and, if that fails, also calls `ERR_get_error`. Note that this will work because a goroutine can't be rescheduled while on the system stack.
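To make the first option concrete, here is a minimal sketch of what such a Go-side wrapper could look like. The helper name `callLocked` and its two function parameters are hypothetical: `op` stands in for a cgo call into OpenSSL and `fetchErr` for a cgo call to `ERR_get_error`. It also assumes the wrapped function returns 1 on success, which is not true of every OpenSSL function.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// callLocked is a hypothetical helper, not part of any existing package.
// It runs op and, if op fails, fetchErr while the goroutine is pinned to
// one OS thread, so both calls see the same thread-local error queue.
func callLocked(op func() int, fetchErr func() uint64) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Assumption: 1 means success, as for many (not all) OpenSSL functions.
	if op() == 1 {
		return nil
	}
	if code := fetchErr(); code != 0 {
		return fmt.Errorf("openssl error code: %#x", code)
	}
	return errors.New("openssl call failed without queuing an error")
}

func main() {
	// Pure-Go stand-ins for the cgo calls, used only for illustration.
	var queued uint64
	failing := func() int { queued = 0x1234; return 0 }
	pop := func() uint64 { c := queued; queued = 0; return c }

	fmt.Println(callLocked(failing, pop))
	fmt.Println(callLocked(func() int { return 1 }, pop))
}
```

Every call pays for the lock and unlock even when nothing fails, which is the cost that makes this option less attractive.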
The first option is the easiest one, but will impact the performance of our OpenSSL backend even in the no-error cases. I would prefer to investigate how we could implement the second option, which wouldn't have any performance impact. Note that both options will require big refactors, as we will need to wrap existing cgo calls into either a C or a Go function (or even both).