Description
The issue
Disconnecting and reconnecting a pluggable PKCS11 token leads to the PKCS11 provider being inaccessible. To reproduce the issue:
- start the Parsec service with the PKCS11 provider and a pluggable hardware backend (e.g. a NitroKey HSM)
- unplug the hardware backend
- attempt to create a key using the parsec-tool, you'll get:
[INFO ] Creating RSA encryption key...
[ERROR] Subcommand failed: a hardware failure was detected (ParsecClientError(Service(PsaErrorHardwareFailure)))
This is as expected.
- plug the hardware back in
- attempt to create a key again, you'll get
[INFO ] Creating RSA encryption key...
[ERROR] Subcommand failed: there was a communication failure inside the implementation (ParsecClientError(Service(PsaErrorCommunicationFailure)))
This error is NOT expected. The service should continue to operate correctly in this case.
Solution
Some information is still missing and will require more investigation. I'm hoping to find a way to reproduce this using SoftHSM2.
The ideal solution would be for us to simply re-establish a functional connection to the hardware token when we detect that the token has been unplugged and plugged back in. The actual solution will depend on how reliably we can tell whether this has happened and on what options we identify for re-establishing that connection in a clean way.
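To make the provider-level option concrete, here is a minimal sketch of what "re-establish the connection and retry once" could look like. Everything in it is hypothetical: `TokenError`, `Backend`, `Provider`, `FakeToken` and `FakeContext` are illustrative names, not Parsec or PKCS11 APIs, and it assumes we can reliably tell a stale context apart from a token that is still unplugged, which is precisely one of the open questions.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical error categories; real PKCS11 return codes may not map this cleanly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenError {
    /// The token is not plugged in.
    DeviceRemoved,
    /// The token is present, but the context opened before the unplug is unusable.
    StaleContext,
}

/// Hypothetical stand-in for a PKCS11 context and its sessions.
pub trait Backend {
    fn create_key(&mut self, name: &str) -> Result<(), TokenError>;
}

/// Provider that owns a backend plus a way to build a fresh one.
pub struct Provider<B, F> {
    backend: B,
    connect: F,
    pub reconnects: u32,
}

impl<B, F> Provider<B, F>
where
    B: Backend,
    F: FnMut() -> Result<B, TokenError>,
{
    pub fn new(mut connect: F) -> Result<Self, TokenError> {
        let backend = connect()?;
        Ok(Provider { backend, connect, reconnects: 0 })
    }

    pub fn create_key(&mut self, name: &str) -> Result<(), TokenError> {
        match self.backend.create_key(name) {
            // Only a stale context triggers a rebuild; the operation is retried once.
            Err(TokenError::StaleContext) => {
                self.backend = (self.connect)()?;
                self.reconnects += 1;
                self.backend.create_key(name)
            }
            // Success and "device removed" are passed through unchanged.
            other => other,
        }
    }
}

/// Simulated token: `generation` is bumped each time it is plugged back in.
#[derive(Default)]
pub struct FakeToken {
    plugged: Cell<bool>,
    generation: Cell<u32>,
}

/// Simulated context that only works for the plug-in it was created under.
pub struct FakeContext {
    token: Rc<FakeToken>,
    generation: u32,
}

impl Backend for FakeContext {
    fn create_key(&mut self, _name: &str) -> Result<(), TokenError> {
        if !self.token.plugged.get() {
            Err(TokenError::DeviceRemoved)
        } else if self.generation != self.token.generation.get() {
            Err(TokenError::StaleContext)
        } else {
            Ok(())
        }
    }
}

/// Replays the reproduction steps: unplug, try, plug back in, try again.
pub fn demo() -> (Result<(), TokenError>, Result<(), TokenError>, u32) {
    let token = Rc::new(FakeToken::default());
    token.plugged.set(true);
    let t = Rc::clone(&token);
    let mut provider = Provider::new(move || {
        if !t.plugged.get() {
            return Err(TokenError::DeviceRemoved);
        }
        Ok(FakeContext { token: Rc::clone(&t), generation: t.generation.get() })
    })
    .unwrap();

    token.plugged.set(false);
    let while_unplugged = provider.create_key("demo-key");

    token.plugged.set(true);
    token.generation.set(token.generation.get() + 1);
    let after_replug = provider.create_key("demo-key");

    (while_unplugged, after_replug, provider.reconnects)
}
```

In this simulation the first call fails with `DeviceRemoved` while the token is unplugged, and the second call succeeds after exactly one reconnect. Retrying inside the provider is only one possible choice; returning a "retry later" style error after rebuilding the context would use the same structure with the second `create_key` call replaced by an error return.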
Outstanding questions
- What exact error is received after the device is plugged back in? Is this error sufficiently distinctive to identify this exact cause?
- Do we attempt to re-establish the connection at provider level (i.e., initializing a new PKCS11 context), or do we restart the whole service somehow? Does it matter if we have other providers involved?
- If we re-establish the connection at provider level, do we retry the operation that made us realize something's broken? Or just send back something akin to "retry later"?
This is a variant of the more generic approach discussed in #607.