Closed
Description
Version
v18.13.0, v19.4.0, main
Platform
Ubuntu 22.04 without modifications; Linux desktop-ua 5.15.0-57-generic #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
crypto
What steps will reproduce the bug?
$ cat fips-loop.js
const crypto = require('crypto');
crypto.setFips(1);
crypto.randomBytes(20, console.log);
$ gdb --args ./node-v19.4.0-linux-x64/bin/node ./fips-loop.js
[…]
(gdb) r
[…]
^C
[…]
(gdb) thread apply all bt
[…]
#0 0x0000555b0aef4018 in _dopr ()
#1 0x0000555b0aef5112 in BIO_vsnprintf ()
#2 0x0000555b0af952dd in ERR_vset_error ()
#3 0x0000555b0af95433 in ERR_set_error ()
#4 0x0000555b0afbcacb in evp_generic_fetch ()
#5 0x0000555b0afc2192 in EVP_RAND_fetch ()
#6 0x0000555b0b026660 in rand_new_drbg ()
#7 0x0000555b0b0277d6 in RAND_get0_public ()
#8 0x0000555b0b027858 in RAND_bytes_ex ()
#9 0x0000555b09ac730f in node::crypto::CSPRNG (buffer=0x555b0f0b7960, length=20) at ../src/crypto/crypto_util.cc:66
#10 0x0000555b09ab9054 in node::crypto::RandomBytesTraits::DeriveBits (env=0x555b0f32db30, params=..., unused=0x555b0f34cca0) at ../src/crypto/crypto_random.cc:69
#11 0x0000555b09abd1c3 in node::crypto::DeriveBitsJob<node::crypto::RandomBytesTraits>::DoThreadPoolWork (this=0x555b0f34cb90) at ../src/crypto/crypto_util.h:500
#12 0x0000555b097dd98c in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*)#1}::operator()(uv_work_s*) const (__closure=0x0, req=0x555b0f34cbd8) at ../src/threadpoolwork-inl.h:44
#13 0x0000555b097dda7d in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*)#1}::_FUN(uv_work_s*) () at ../src/threadpoolwork-inl.h:47
#14 0x0000555b0ac81a58 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#15 0x00007fafcb516b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#16 0x00007fafcb5a8a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
How often does it reproduce? Is there a required condition?
Always. No.
What is the expected behavior?
Some type of error indicating that OpenSSL is not configured properly for FIPS mode on the machine, which I assume this is the root cause here.
(I am not expecting this to really work and give me random bytes.)
What do you see instead?
Infinite hang.
Additional information
I think this is a problem that other people have run into before, e.g. #38633 (review) cc @danbev @richardlau
In the debugger, it’s visible that RAND_poll
and RAND_status
keep returning 1
but RAND_bytes
keeps returning 0
(code).
Activity
addaleax commentedon Jan 13, 2023
@nodejs/crypto
tniessen commentedon Jan 13, 2023
Quoting myself from an internal discussion of 5cc36c3:
@richardlau mentioned running into this case when experimenting with FIPS.
richardlau commentedon Jan 13, 2023
Yes, I ran into this last year when attempting to extend #44148 to cover the FIPS provider. I got sidetracked onto other Build things, but from memory:
The issue here is that one of the RAND_* calls under the covers attempts to load an algorithm from the providers. On OpenSSL 3, Node.js'
crypto.setFips()
calls EVP_default_properties_enable_fips, which effectively filters the algorithms to those from the FIPS provider:node/src/crypto/crypto_util.cc
Line 218 in a691002
However, it is possible to load Node.js without the FIPS provider configured (either not there at all or incorrectly configured in the openssl conf being loaded) and still enable the filter, in which case no matching algorithm will be available and we end up in the observed loop.
(Note that we do not currently attempt to load the FIPS provider in
crypto.setFips()
(but do if the command line options are used, although we then unload, which I'm uncertain is what we want:node/src/crypto/crypto_util.cc
Lines 96 to 113 in a691002
I think I looked at, or was going to look at, adding a OSSL_provider_available() (which I also remember confusingly checks if the provider is actually loaded rather than being available to load) check to
crypto.setFips()
. I don't remember if that actually worked, or if I had second thoughts because theoretically you could have FIPS providers that were not called "fips" (i.e. someone added their own provider). Supporting FIPS when your FIPS provider is not called "fips" could be an edge case that we choose not to support via a Node.js API (i.e. you'd have to do it via openssl config).Or another option may have been to unconditionally load the "fips" provider (an assumption here being this is always the provider called "fips") in
crypto.setFips()
when enabling fips -- I think OpenSSL does reference counting so loading the provider multiple times should be okay. This should error out in the recreate in this issue's description. I think my question then was whether disabling FIPS viacrypto.setFips()
should also attempt to unload the "fips" provider.richardlau commentedon Jan 13, 2023
I've just written all of that and then went back to find:
and I actually quoted an example (contrived) without FIPS being involved 😆:
Same basic cause -- the RAND_* function that is attempting to load an algorithm doesn't find a matching one and then our code continually loops. I don't recall if I identified the algorithm it was looking for.
addaleax commentedon Jan 17, 2023
@richardlau Sooo … I definitely don’t feel like I am familiar enough with this subject to have an opinion about how to move forward, exactly.
I don’t think this would be a full solution to this problem, but would it make sense to at least detect this specific error in the CSPRNG call and return instead of infinite looping?
or would that be too naïve? I think for me it would solve the problem because then it would be possible to try to see what
crypto.randomBytes()
does immediately after enabling FIPS and seeing if it works, which is at least easily detectable, unlike an infinite loop.richardlau commentedon Jan 17, 2023
@addaleax I've been playing around with something similar: richardlau@def2b01
I'm a bit unsure myself whether this checking for this specific error is distinct enough from the missing entropy case the loop is meant to address, but I'm also tending towards having a detectable error in these cases.
addaleax commentedon Jan 17, 2023
I’d actually feel like in an ideal world, we’d be checking for the missing-entropy error and only in that case continue to run the loop, but at the same time that comes with a larger risk of unintentional breakage (or at least it feels like that to me).
crypto: avoid hang when no algorithm available
crypto: avoid hang when no algorithm available
crypto: avoid hang when no algorithm available
crypto: avoid hang when no algorithm available
3 remaining items