Wasm, Contexts, and Entropy #346

Open
JeremyRubin opened this issue Dec 11, 2021 · 78 comments
@JeremyRubin

Does the signing context generate entropy on creation? Testing now, but this may be a big issue for using verification from a WASM module with no entropy, because the get-random-number methods may panic.

For manually generated contexts, this happens only if you randomize them explicitly, see

rust-secp256k1/src/lib.rs

Lines 354 to 380 in 48683d8

/// (Re)randomizes the Secp256k1 context for cheap sidechannel resistance;
/// see comment in libsecp256k1 commit d2275795f by Gregory Maxwell. Requires
/// compilation with "rand" feature.
#[cfg(any(test, feature = "rand"))]
pub fn randomize<R: Rng + ?Sized>(&mut self, rng: &mut R) {
    let mut seed = [0u8; 32];
    rng.fill_bytes(&mut seed);
    self.seeded_randomize(&seed);
}

/// (Re)randomizes the Secp256k1 context for cheap sidechannel resistance given 32 bytes of
/// cryptographically-secure random data;
/// see comment in libsecp256k1 commit d2275795f by Gregory Maxwell.
pub fn seeded_randomize(&mut self, seed: &[u8; 32]) {
    unsafe {
        let err = ffi::secp256k1_context_randomize(self.ctx, seed.as_c_ptr());
        // This function cannot fail; it has an error return for future-proofing.
        // We do not expose this error since it is impossible to hit, and we have
        // precedent for not exposing impossible errors (for example in
        // `PublicKey::from_secret_key` where it is impossible to create an invalid
        // secret key through the API.)
        // However, if this DOES fail, the result is potentially weaker side-channel
        // resistance, which is deadly and undetectable, so we take out the entire
        // thread to be on the safe side.
        assert_eq!(err, 1);
    }
}

For the global context, it depends on the enabled feature: The global-context feature randomizes the context automatically and depends on rand, and the global-context-less-secure feature gives you a context that is not randomized (and can't be randomized).

Originally posted by @real-or-random in #342 (comment)
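
For illustration, a minimal sketch of that manual path, assuming the crate's "rand" feature is enabled and rand's OsRng actually works on the target (the helper name is made up):

use rand::rngs::OsRng;
use secp256k1::{All, Secp256k1};

fn manually_randomized_context() -> Secp256k1<All> {
    // Nothing is randomized automatically for a manually created context.
    let mut secp = Secp256k1::new();
    // Explicit, opt-in re-randomization; this may panic at runtime on targets
    // where OsRng has no entropy source, which is the issue discussed below.
    secp.randomize(&mut OsRng);
    secp
}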

We should probably think through what it means to use any of these contexts in a WASM context! Especially in wasm32-unknown-unknown you have no syscalls or way of getting entropy, so this could be problematic.

Providing first-class support for fully isolated WASM is really great for a myriad of reasons, but we'd need to think through carefully exactly what we do. Maybe we can detect if we're in wasm and disable some targets if we don't have symbols for getting entropy linked? Not sure :)

@real-or-random
Collaborator

real-or-random commented Dec 13, 2021

Hm, I don't think we should disable anything. This all depends on the threat model of the application and it's hard for a library to know this. (It's already hard for an application to make a guess about the threat model of the user...)

So randomization is defense in depth against timing attacks and it may provide some resistance against other side-channel attacks (electromagnetic -- but that's anyway hard). How much do you care about this in WASM? We can't tell as a library. Anyway, can WASM code even be made resistant against timing leaks? Apparently yes. So okay, without entropy, we lose defense in depth but there's no reason to believe that we lose security.

Especially in wasm32-unknown-unknown you have no syscalls or way of getting entropy, so this could be problematic.

You could try to hash your secrets at least. How do you get your secrets into WASM? (I mean you can't generate them in WASM without external entropy?)

edit: Please tell me if I seem to assume anything which is wrong. To be honest, I really have no idea how WASM works.

@elichai
Member

elichai commented Dec 14, 2021

If you know your default implementation of getrandom either sucks or panics, then you should make your own global context (or use a local one) and randomize it using whatever randomness you do have (and you have to have some kind of randomness as you have a private key).

I'd rather we not change anything in the context until bitcoin-core/secp256k1#988 is done, and then we can overhaul our API.

@apoelstra
Member

It is really unfortunate that rand methods would panic. This seems like an abuse of panic and I don't see any reasonable way we can detect it and recover.

Are there replacements for rand out there which are widely supported?

@TheBlueMatt
Member

This is the exact reason for global-context-less-secure - supporting environments where you can't get global randomness, especially WASM. In these environments there's probably nothing we can do better around global context, we have to hope people use their own local contexts and randomize them manually, if they want. A "better rand" here would still be one that just doesn't get any randomness.

@apoelstra
Member

apoelstra commented Feb 16, 2022

Gotcha. So, in light of #388 (not sure if you've been following this) where we are moving toward removing global-context-less-secure and just using rand whenever it was available, I guess we can't do that, because rand sometimes unpreventably terminates the program from inside library functions.

So I guess we should keep global-context-less-secure, indicate in the docs that this is needed if you expect rand to misbehave, and we should gate the global context randomization on #[cfg(all(feature = "rand", not(feature = "global-context-less-secure")))].
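
Purely illustrative, what that gate could look like applied to a hypothetical randomization helper (the function name is invented, not the crate's API):

#[cfg(all(feature = "rand", not(feature = "global-context-less-secure")))]
fn randomize_global_context(ctx: &mut secp256k1::Secp256k1<secp256k1::All>) {
    // Only compiled when "rand" is enabled and "global-context-less-secure" is not.
    ctx.randomize(&mut rand::thread_rng());
}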

@TheBlueMatt
Member

where we are moving toward removing global-context-less-secure and just using rand whenever it was available, I guess we can't do that, because rand sometimes unpreventably terminates the program from inside library functions.

I don't see why this is a problem? We just need to make sure we do not enable the rand feature/dependency unless the user does.

@JeremyRubin
Author

You can also test rand via https://doc.rust-lang.org/std/panic/fn.catch_unwind.html at runtime, btw.
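
Roughly what such a probe could look like (std-only; whether the RNG panics or fails in some other way depends on the target and getrandom configuration, so treat this as a sketch):

use rand::{rngs::OsRng, RngCore};

fn os_rng_works() -> bool {
    // Probe once at startup: catch the panic rand may raise when no entropy
    // source is available on this target.
    std::panic::catch_unwind(|| {
        let mut buf = [0u8; 32];
        OsRng.fill_bytes(&mut buf);
    })
    .is_ok()
}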

@TheBlueMatt
Member

That is std-only, sadly (not to mention very poor practice, IMO)

@apoelstra
Member

@TheBlueMatt we can't prevent parallel dependencies from enabling the rand feature ... I don't think we can reasonably say "this may break unless feature X is off"

@tcharding
Member

I don't think we need #385. Since we feature gate on "rand-std", and not on "rand", we do fully control the feature gating. If we were to use "rand" we would be at the mercy of other dependencies enabling "rand" as you say @apoelstra.

no-std users will not be enabling "rand-std" so they will not get the global context randomization and will not hit the rand panic issue, right?

@apoelstra
Member

Yeah, I think you're right. Ok.

@Kixunil
Collaborator

Kixunil commented Jul 4, 2022

Skipping over the details, would it be reasonable to provide a method for consumers to dynamically set a global randomization Fn? People could set it when initializing and not lose any security.

@apoelstra
Member

That sounds reasonable to me. @elichai what do you think?

@elichai
Member

elichai commented Jul 10, 2022

That sounds reasonable to me. @elichai what do you think?

It can work, but I doubt most rust users will understand how to use this correctly.
I do think that the best solution is to keep using getrandom, and that if a user uses wasm32-unknown-unknown they would have to depend on getrandom themselves and enable wasm-bindgen / js-sys depending on what they use.

And if they use a weirder environment then it should probably be supported in getrandom.

But if it is common to use various exotic environments and they can't all be supported in getrandom, then we could do something like that, though it sounds a bit weird to me.

FWIW there was a discussion on providing a getrandom-like interface in libstd:
https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/OS.20secure.20randomness
https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/possible.20getrandom.20interface

@TheBlueMatt
Member

I'm not really sure what the use-case of such a method is - if you don't have access to getrandom, you should use global-context without the rand crate and randomize yourself or live without randomization. If you do have access to getrandom, the rand crate should work fine for you.

@Kixunil
Collaborator

Kixunil commented Jul 10, 2022

I doubt most rust users will understand how to use this correctly

I doubt most users will not. :) I imagine the docs reading something like this:

The function is provided for platforms that don't have inherent access to randomness but require some additional bridging.
Wasm is a popular example of such a platform. It doesn't provide an API to get randomness, but in browsers you can still get randomness using JS bindings.

Warning: do not implement a custom randomness function; only bridge to existing randomness APIs on your platform. Make sure that if the API fails, the supplied function fails as well (no fallbacks)!

People knowledgeable enough to use an exotic platform should be knowledgeable enough to write bindings.

@TheBlueMatt

use global-context without the rand crate and randomize yourself

This may be too annoying to work with. While I in general don't like global mutable state, this looks like a very reasonable exception.

@Kixunil
Collaborator

Kixunil commented Jul 10, 2022

Alternative idea: depend on some CSRNG (e.g. from the rand crate) and only accept a seed. Less bother with creating trait objects and the like. It may tempt people to supply all zeroes, though.

@TheBlueMatt
Member

This may be too annoying to work with. While I in general don't like global mutable state, this looks like a very reasonable exception.

I don't really understand the push-back here: afaiu, you really shouldn't (ideally) be randomizing only once and then never re-randomizing again. If you're signing with the same key over and over and over again, afaiu, re-randomizing is something you should strive for, and our API should encourage that. I think we're way over-engineering this for the never-rerandomize case.

@Kixunil
Collaborator

Kixunil commented Jul 10, 2022

My understanding of your previous comment is that you suggest consumers write something like this:

let sig1 = key.sign(ctx, msg1);
ctx.rerandomize(rng);
let sig2 = key.sign(ctx, msg2);
ctx.rerandomize(rng);
let sig3 = key.sign(ctx, msg3);
ctx.rerandomize(rng);

While my suggestion is to globally set a custom RNG (presumably using JS bindings) so that the sign method automatically retrieves entropy from it and re-randomizes, the code would become:

set_rng(my_rng);
let sig1 = key.sign(ctx, msg1);
let sig2 = key.sign(ctx, msg2);
let sig3 = key.sign(ctx, msg3);

Which is shorter and cleaner, and thus, I believe, better achieves your goal of encouraging re-randomization. (I agree with the goal, BTW.)

@TheBlueMatt
Member

I don't believe we can re-randomize on sign unless we make the context mutable, at which point the whole discussion on a global context isn't really relevant. If we're talking about a mutable-context-sign-method there's all kinds of things we can do - we can also have a parallel sign_with_random_bytes that takes randomness to re-seed with, but the mutable context generally makes the API much harder to work with.

@Kixunil
Collaborator

Kixunil commented Jul 10, 2022

We could make it thread-local, internally using RefCell, or just make it non-Sync and use interior mutability.
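
A rough sketch of that shape for a single-threaded consumer (the type and method names are made up):

use core::cell::RefCell;
use secp256k1::{All, Secp256k1};

// Interior mutability instead of &mut everywhere; the type is !Sync, so it cannot
// be shared across threads, which is the trade-off discussed below.
struct RerandomizingContext {
    inner: RefCell<Secp256k1<All>>,
}

impl RerandomizingContext {
    fn rerandomize(&self, seed: &[u8; 32]) {
        self.inner.borrow_mut().seeded_randomize(seed);
    }
}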

@TheBlueMatt
Member

AFAIU, thread_local is a problem on many of the platforms rand is a problem on, and making them non-Sync (Send shouldn't be required?) makes the API harder to use (though maybe non-Sync isn't so bad).

@real-or-random
Collaborator

real-or-random commented Jul 11, 2022

I don't really understand the push-back here: afaiu, you really shouldn't (ideally) be randomizing only once and then never re-randomizing again.

As far as I can speak for upstream, this is something we're currently unsure about. I think the initial idea was that users often call _context_randomize but if you start to think about it, it's pretty unclear what the benefits are. Also, even Core randomizes only once. So I think this deserves some discussion upstream. I had a conversation with @jonasnick about this today, and we should be able to open an issue upstream with some initial thoughts. To give you an idea why this is complicated:

If you're signing with the same key over and over and over again, afaiu, re-randomizing is something you should strive for,

I believe the better protection here is to use synthetic nonces k, i.e., basically k = hash(secret_key || message || randomness) (as done in BIP340). This is strictly better than deterministic nonces even if you ignore side channels, and it gives you some pretty good protection against side-channel leakage. The problem with signing with the same key (and message) over and over is mostly the EC multiplication k*G with a fixed k. This is the only thing protected by the randomization in the context. But if you use synthetic nonces, the k is different every time anyway, and this makes it much, much harder for the attacker.

BUT you can't use this trick when it comes to pubkey generation (computing the pubkey from the seckey). So maybe it would make sense to randomize before every pubkey generation instead of after every signing.
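
To make the synthetic-nonce idea concrete, a toy sketch (assuming the sha2 crate; it deliberately omits BIP340's tagged hashing and the reduction/zero checks a real implementation performs):

use sha2::{Digest, Sha256};

fn synthetic_nonce(seckey: &[u8; 32], msg: &[u8; 32], aux_rand: &[u8; 32]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update(seckey);   // secret input keeps k unpredictable even if aux_rand is weak
    h.update(msg);      // binds k to the message (the deterministic component)
    h.update(aux_rand); // fresh randomness makes k differ across repeated signings
    let digest = h.finalize();
    let mut k = [0u8; 32];
    k.copy_from_slice(digest.as_slice());
    k // would still need to be reduced mod the curve order and rejected if zero
}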

If we're talking about a mutable-context-sign-method there's all kinds of things we can do

This is another thing that I think should be solved upstream, though I admit there's no clear timeline here. We want to change the context API, and we have a wishlist for doing so (bitcoin-core/secp256k1#780), which includes having mutable contexts that can be rerandomized automatically when useful. I don't think that it'll be a good idea to spend time on this in the bindings now. These are improvements where everyone could benefit, not only the Rust users.

TL;DR: I suggest to reconsider this after upstream made up their minds.

@Kixunil
Collaborator

Kixunil commented Jul 11, 2022

@TheBlueMatt hmm, looks like those platforms are screwed already unless they turn off re-randomization. I take back my suggestion to make the context non-Sync (yep, that's what I meant) - consumers can already do it themselves by using RefCell, and forcing everyone to use it just hurts modularity. If RefCell becomes popular, we could just write an additional wrapper. (And maybe we can use UnsafeCell instead?)

@TheBlueMatt
Member

For context, on the LDK end, we re-randomize some, though maybe less than we should, but we do it via our own internal randomness interfaces, which are not global, and provided via a trait from the user. We do not assume any global randomness API, nor do we really want to add that assumption. FWIW, randomizing isn't all that hard.

@apoelstra
Member

I recently learned that getrandom has a js feature that can be enabled if we know that we're compiling for WASM in a browser context where it has access to strong randomness. I believe that if we directly depend on getrandom rather than going through rand, and expose the js feature through this library, that might make everyone happy.

There remains the question of what to do when the target arch is wasm, getrandom is enabled but js is not enabled. I am tempted to issue a compilation error in this case.
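
A sketch of what going through getrandom directly (rather than via rand) could look like for a downstream user, assuming getrandom 0.2's API; on wasm32-unknown-unknown this only works if the build enables getrandom's js feature or supplies a custom backend, and the helper name is made up:

use secp256k1::{All, Secp256k1};

fn randomized_context() -> Result<Secp256k1<All>, getrandom::Error> {
    let mut seed = [0u8; 32];
    // Returns an error instead of panicking when no entropy source is available.
    getrandom::getrandom(&mut seed)?;
    let mut ctx = Secp256k1::new();
    // The 32-byte seeding API quoted earlier in this thread.
    ctx.seeded_randomize(&seed);
    Ok(ctx)
}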

@JeremyRubin
Author

JeremyRubin commented Sep 19, 2022

what if we have rand stay an optional dep, and have js also an optional dep on getrandom and allow users to pick one or the other?

+1 on compiler error as mentioned!

@Kixunil
Collaborator

Kixunil commented Sep 27, 2022

the only solution is to make the user pass the context

Not really; instead of passing a context to each individual function, you could supply a custom "thread local" provider. E.g. disable interrupts and check it's not being run in one on embedded, or return a reference to a static on WASM...

// libsecp256k1
#[cfg(secp256k1_custom_thread_local)]
pub mod custom_thread_local {
    pub struct ThreadLocal(Context);

    impl ThreadLocal {
        /// Initializes the thread-local data.
        ///
        /// This function is unsafe to implement and MUST NOT call `secp256k1_get_thread_local`
        /// nor `secp256k1_release_thread_local`.
        pub fn new() -> Self { /* ... */ }
    }

    extern "Rust" {
        /// Provides a pointer to a secp256k1-owned thread local.
        ///
        /// The implementor must return a valid pointer to a `ThreadLocal` obtained by calling `ThreadLocal::new()`.
        /// The pointer must remain valid until the caller calls `secp256k1_release_thread_local` or the current thread exits, whichever occurs first.
        /// The caller MUST NOT dereference the pointer afterwards.
        /// The function is NOT reentrant: the caller may only call it again after it has called `secp256k1_release_thread_local`.
        ///
        /// Note that if this is called from an interrupt the implementor either has to return a separate instance or panic.
        /// If the interrupt itself can be interrupted it's probably best to just panic, since each call would have to return a new, different instance,
        /// and doing cryptographic operations in such interrupts is most likely a bad idea anyway.
        ///
        /// The function is both unsafe to call and unsafe to *implement*.
        pub fn secp256k1_get_thread_local() -> *const ThreadLocal;

        /// Called when the caller of `secp256k1_get_thread_local` no longer needs access to the thread local.
        ///
        /// This is usually a no-op, but some platforms may implement "thread locals" as globals behind a mutex if there is no better way or, on embedded, these may be implemented by disabling and re-enabling interrupts. (But see the note above.)
        pub fn secp256k1_release_thread_local();
    }

    /// Provides safe access to the thread local.
    pub(crate) fn with_thread_local<R, F: FnOnce(&ThreadLocal) -> R>(fun: F) -> R {
        // Release the thread local even if `fun` panics.
        struct Guard;
        impl Drop for Guard {
            fn drop(&mut self) {
                unsafe { secp256k1_release_thread_local() }
            }
        }

        let ptr = unsafe { secp256k1_get_thread_local() };
        let _guard = Guard;
        fun(unsafe { &*ptr })
    }
}

The root crate then sets the --cfg secp256k1_custom_thread_local and provides the symbols on less common platforms.

If you take existing code and compile for WASM, it'll work great in a threading context, just be randomly unsafe and panic-y.

I've read that there is some kind of access control to the memory so threads can't access the memory of each other unless specifically marked as shared which happens during compilation. Maybe my understanding is wrong though and if you can recommend good resources on the topic I'd love to learn!

@TheBlueMatt
Member

The root crate then sets the --cfg secp256k1_custom_thread_local and provides the symbols on less common platforms.

I don't really buy this at all - requiring a --cfg from the root crate is just going to be missed because dependencies are transitive. Users generally aren't going to realize they need this, and even if they do their response is going to be to switch to another library, not fix it.

Still, I'm not entirely convinced this is a sufficient issue to ever show up at all. Yes, if the user does cryptographic operations in an interrupt context then they need to do something, but the real solution is to just not do cryptographic operations in an interrupt context (or disable interrupts before they do so), not build a TLS API implementation that is really just a Mutex implementation (cause there aren't any threads anyway).

I've read that there is some kind of access control to the memory so threads can't access the memory of each other unless specifically marked as shared which happens during compilation. Maybe my understanding is wrong though and if you can recommend good resources on the topic I'd love to learn!

Ah, I didn't read all the docs there, only looked briefly, so I'm likely wrong. I suppose this means rustc will need a whole new target for wasm-with-locking? ISTM this means you can't combine wasm-threaded and non-wasm-threaded code at all?

@Kixunil
Collaborator

Kixunil commented Sep 28, 2022

is just going to be missed

Compiler errors are pretty hard to miss, I think.

response is going to be to switch to another library, not fix it.

To which library? AFAIK there isn't one with this level of rigor. The thing here is unfixable. The best alternative we could do is add them as normal features and document that library crates must not depend on them but I suspect some will not notice it.

but the real solution is to just not do cryptographic operations in an interrupt context

It is but the crate has to remain sound and panic is better than deadlock. But it just occurred to me that even simpler embedded implementation is if is_this_interrupt() { panic!("attempt to do crypto in interrupt"); } else { STATIC_CTX.ptr() }. The problem is we can't implement is_this_interrupt so we still need user to provide the checking function or just simply ban reentrancy but that should not be the default.

rustc will need a whole new target for wasm-with-locking?

IIRC currently there is -C target-feature=+atomics (std still needs to be rebuilt)

you can't combine wasm-threaded and non-wasm-threaded code at all?

Perhaps you can if they communicate over something like sockets, but they can't access each other's memory.

@TheBlueMatt
Member

TheBlueMatt commented Sep 28, 2022

Compiler errors are pretty hard to miss, I think.

I think we've ended up on very different pages. I don't understand how you make it a compiler error (unless you require all downstream crates to provide an implementation, at which point even I might stop using this crate lol).

To which library? AFAIK there isn't one with this level of rigor. The thing here is unfixable

We've seen many many times people use a number of other crates that are much less rigorous to work around various usability issues. We can't discount this.

It is but the crate has to remain sound and panic is better than deadlock.

I dunno if I buy this? Sometimes people run things threaded and it's fine. If it deadlocks users will see it just as much as they'd see a panic, it just happens to work correctly sometimes in addition.

IIRC currently there is -C target-feature=+atomics (std still needs to be rebuilt)

Right, so basically a different target, just in a broken way (eg I can link with C code with clang and break it, it seems...).

@Kixunil
Collaborator

Kixunil commented Sep 28, 2022

unless you require all downstream crates to provide an implementation

Just the root crate. If there isn't any of the listed cfgs and the crate is no_std we emit compile_error!; with a cfg we can do some limited things (linking to pthread) or just let users implement it themselves. And if there are popular implementations they can be turned into crates so people don't have to reinvent the wheel.

We've seen many many times people use a number of other crates that are much less rigorous to work around various usability issues.

As I said, I want an example to be able to judge that this kind of usability issue would be likely to push people away.

If it deadlocks users will see it just as much as they'd see a panic

There is a crazy huge difference between an app being randomly stuck (which of the dependencies caused it? How do I even get a stacktrace on embedded?) and a clear message "You called x from an interrupt which is forbidden". I've spent days with a colleague debugging deadlocks in a desktop application, which is already easier than embedded. Reading a message takes two seconds, not days, which makes it literally tens of thousands of times faster to debug.

just happens to work correctly sometimes in addition.

"Works correctly sometimes" (Usually in test env) and then randomly deadlocks (usually in production) is the worst thing you can get. A reliably crashing application will not get past QA in any serious company, so the end users won't see anything. With a clear message the developers will spend much less time figuring out what's wrong.

eg I can link with C code with clang and break it, it seems

I guess, interesting. From what I've read they plan to improve it but it's difficult because of poor WASM threading spec.

@TheBlueMatt
Member

Just the root crate. If there isn't any of the listed cfgs and the crate is no_std we emit compile_error!

Yea, I think that's annoying enough that even we'd stop using this crate.

There is a crazy huge difference between an app being randomly stuck (which of the dependencies caused it? How do I even get a stacktrace on embedded?) and a clear message "You called x from an interrupt which is forbidden". I've spent days with a colleague debugging deadlocks in a desktop application, which is already easier than embedded. Reading a message takes two seconds, not days, which makes it literally tens of thousands of times faster to debug.

I mean either way you get a backtrace? Yea, deadlocks are hard to debug, but that's because of having to read the traces and reconstruct the flow in your head. A panic-on-deadlock mutex is just as hard to debug as the deadlock itself - 95% of the time you can gdb and get a trace, even (when testing) on embedded.

In any case I don't think we're gonna agree here - I still don't buy in the slightest that the prevalence of calling-in-interrupt-context uses is greater than that of threading-with-nostd. Worse, rust is already really painful in interrupt contexts - it totally breaks the borrow checking correctness and rustc may incorrectly optimize because of it, not to mention other crates that have interior mutability. Unless there's some common, agreed upon way to handle this, I don't really see how it's "our problem", aside from ensuring we don't end up with undefined behavior.

@TheBlueMatt
Member

I guess, interesting. From what I've read they plan to improve it but it's difficult because of poor WASM threading spec.

Oof. Okay so maybe it's not worth trying to predict where they'll go then.

@Kixunil
Collaborator

Kixunil commented Sep 28, 2022

even we'd stop using this crate.

What would you have used instead? How does it solve the problems we're trying to solve here?

A panic-on-deadlock mutex is just as hard to debug as the deadlock itself

That's why I don't propose that, I propose to give people options including "panic if this is called from interrupt even without actual deadlock".

I still don't buy in the slightest that the prevalence of calling-in-interrupt-context uses is greater than that of threading-with-nostd.

Me neither, I just want to do all I can to not sacrifice one platform to the other. As you've noted some threading-with-nostd platforms have pthread. That's good enough for those I think. Aside from embedded, which exact common no_std platforms that we didn't mention yet don't have pthread?

it totally breaks the borrow checking correctness and rustc may incorrectly optimize because of it

I don't believe this is possible unless you incorrectly use UnsafeCell, static mut (which shouldn't be used anyway), pointers or transmutes. The correctness rules are not affected by how exactly interrupting code at arbitrary point and running different code of the same program happens - be it interrupts or threads. AFAIK the embedded folks already solved these issues.

I don't really see how it's "our problem"

When explaining the rust-bitcoin ecosystem to others I'd really hate to say "It's really great and polished, we did all we could to make mistakes unlikely, but there's an exception: if you use it in embedded and accidentally call a function from an interrupt you may randomly get deadlocks, and because it depends on luck QA is not guaranteed to catch them." I think I'd rather say "Embedded is unsupported because it's very difficult. Read issue 346 to learn more."

@TheBlueMatt
Member

What would you have used instead? How does it solve the problems we're trying to solve here?

C directly without a static context, probably. It is completely and totally unacceptable for us to force downstream users to implement some TLS logic just because they want to use no-std.

That's why I don't propose that, I propose to give people options including "panic if this is called from interrupt even without actual deadlock".

My point is there's no difference in difficulty to debug, but, sure, if you want a debug feature for "panic if you see threading", fine?

Me neither, I just want to do all I can to not sacrifice one platform to the other. As you've noted some threading-with-nostd platforms have pthread. That's good enough for those I think.

Yea offering an option to implement your own TLS wrapper or whatever seems fine, though I'm honestly a bit dubious anyone will use it, and I do feel strongly that the "default" absolutely must work totally fine without effort for such users. A little performance loss is tolerable; things panicking or refusing to compile are not.

Aside from embedded, which exact common no_std platforms that we didn't mention yet don't have pthread?

SGX may be relevant here. Not sure how all the wrappers work but something we do for WASM is compile in no-std to ensure we don't have to use any "surprising" shims once we go load it in JS.

I don't believe this is possible unless you incorrectly use UnsafeCell, static mut (which shouldn't be used anyway), pointers or transmutes. The correctness rules are not affected by how exactly interrupting code at arbitrary point and running different code of the same program happens - be it interrupts or threads. AFAIK the embedded folks already solved these issues.

I believe rustc is back to setting noalias on mut references, which is incorrect if you have an interrupt, no? I suppose if you do proper locking it's okay (but you can't be holding a lock going in to an interrupt) but at that point you're definitely not doing any crypto operations in interrupt contexts :p

When explaining the rust-bitcoin ecosystem to others I'd really hate to say "It's really great and polished, we did all we could to make mistakes unlikely, but there's an exception: if you use it in embedded and accidentally call a function from an interrupt you may randomly get deadlocks, and because it depends on luck QA is not guaranteed to catch them." I think I'd rather say "Embedded is unsupported because it's very difficult. Read issue 346 to learn more."

I don't think anything proposed here is a solution, aside from "lol we don't compile for no-std unless the user trying to use a crate that depends on a dependency of a dependency of a dependency writes a bunch of delicate unsafe code". If this is any kind of substantial concern then we cannot move to static contexts.

I still don't really see an issue with "we support everything except calling cryptographic operations in parallel from a single thread". And if we do that I don't see why we need to sometimes-panic vs sometimes-deadlock.

@Kixunil
Collaborator

Kixunil commented Sep 28, 2022

C directly without a static context

So you provide the context in each call? Isn't it more annoying than having to activate a special feature once?

to force downstream users to implement some TLS logic just because they want to use no-std.

I wouldn't mind keeping the current functions requiring the context argument. 🤷‍♂️ Sadly, I think library crates will be lazy and not use them.

to implement some TLS logic

I don't find "implement" an accurate description. More like "If you are on platforms x, y, z you can activate this feature and not worry about it; if you are on embedded without an OS you need to provide the implementation or check these community crates providing it for common MCUs."

debug feature for "panic if you see threading", fine?

If it's considered a debug feature then perhaps also record the core::panic::Location of the caller currently using the context and use track_caller. :) And have a warning about potential deadlocks at the appropriate place. (Still, "which of my 10 dependencies deadlocked?" is a problem, maybe unsolvable.)

the "default" absolutely must work totally fine without effort

I don't think this is possible in general if they also want static context. But if there is still a way to get randomness we could do:

if GLOBAL_CTX_IS_USED.compare_exchange(false, true, Acquire, Relaxed).is_ok() {
    // We won the flag, so nothing else is currently using the global context.
    let _set_unused_on_drop = SetUnusedOnDrop; // guard doing `GLOBAL_CTX_IS_USED.store(false, Release)` on drop
    // SAFETY: we just checked nothing else uses this
    do_something_with_ctx(unsafe { &mut *GLOBAL_CTX.get_mut() });
} else {
    // Contention (another "thread" already holds the flag): fall back to a fresh randomized context.
    let mut ctx = Secp256k1::new_randomized(GLOBAL_RNG.get_32_b());
    do_something_with_ctx(&mut ctx);
}

Then the default is a performance hit, as you proposed: re-creating a randomized context if there are actually "threads". Maybe we could log a warning somehow, though.
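
For completeness, the flag and guard that sketch assumes might look roughly like this (names as used above; none of this is existing API):

use core::sync::atomic::{AtomicBool, Ordering};

static GLOBAL_CTX_IS_USED: AtomicBool = AtomicBool::new(false);

// RAII guard so the flag is released even if the closure using the context panics.
struct SetUnusedOnDrop;

impl Drop for SetUnusedOnDrop {
    fn drop(&mut self) {
        GLOBAL_CTX_IS_USED.store(false, Ordering::Release);
    }
}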

SGX

Oh, interesting, I will have to check their API/docs to see what's possible. Do you have some recommended reading material?

I believe rustc is back to setting noalias on mut references, which is incorrect if you have an interrupt, no?

It's only incorrect if the said interrupt is creating a mutable reference to the same memory. This is already banned in Rust anyway and it's why you have to go through UnsafeCell and implement some synchronization if you actually want two things to hold those references - e.g. disabling interrupts in case of embedded.

writes a bunch of delicate unsafe code

... or sets a correct flag depending on their platform.

calling cryptographic operations in parallel from a single thread

That's not even physically possible. :) Our static-context functions will not call other static-context functions (while holding &mut ThreadLocal)

@apoelstra
Member

This is a fascinating conversation but I'll just remind people that randomization is a defense-in-depth feature that is not needed for security or correctness, and my stance is that we should just disable it for wasm and maybe expose a js feature to recover it if users think the js rng is usable and ok.

I think this has gotten a bit past WASM to a general "platforms without TLS" ... okay, then if we have std let's use TLS and if we don't we'll just disable randomization on the static context.

@TheBlueMatt
Member

So you provide the context in each call? Isn't it more annoying than having to activate a special feature once?

No - from the LDK perspective that's something we have to do, activating the special feature is something our users have to do. The first is acceptable, the second is not at all, at least not to me. I think this is kinda a big sticking point here - I absolutely refuse to use a library that causes our (LDK's) downstream users to do something with flags or whatever to get it to build.

the "default" absolutely must work totally fine without effort
I don't think this is possible in general if they also want static context.

I believe my proposed "just use a spinlock for no-std" solution works totally perfectly 99.9% of the time - for everyone but the "multi-threaded within the same thread" case, which is kinda a nutso case to care about.

calling cryptographic operations in parallel from a single thread
That's not even physically possible. :)

No, but that's exactly what an interrupt + secp call is from our perspective, even if it's not what's going on physically.

@TheBlueMatt
Member

we should just disable it for wasm and maybe expose a js feature to recover it if users think the js rng is usable and ok.

Yea, I do kinda wonder if there isn't a decent way to get this without relying on rand. It's kinda unclear to me what the "right" API would be for "no, I do really have a RNG, but I also don't have TLS", but I do think we should at least try to build that?

I think this has gotten a bit past WASM to a general "platforms without TLS" ... okay, then if we have std let's use TLS and if we don't we'll just disable randomization on the static context.

Heh, right...sounds good to me.

@apoelstra
Member

Yea, I do kinda wonder if there isn't a decent way to get this without relying on rand. It's kinda unclear to me what the "right" API would be for "no, I do really have a RNG, but I also don't have TLS", but I do think we should at least try to build that?

If it's purely the rand dependency then we can use the secret key as entropy and only support re-randomizing when signing (which, arguably, is the only time that we need to rerandomize). Maybe we should make this change, to at least deal with the current wasm/js issue.

But this still leaves open the issue of having a mutable context at all, and what synchronization we need to layer onto that.

@TheBlueMatt
Member

If it's purely the rand dependency then we can use the secret key as entropy and only support re-randomizing when signing (which, arguably, is the only time that we need to rerandomize). Maybe we should make this change, to at least deal with the current wasm/js issue.

ACK. Just to double-check, there's no concern with the randomization process itself being non-constant-time and leaking bits? Should we instead consider randomizing with the signature itself (or is it useless given the signature is public?)

But this still leaves open the issue of having a mutable context at all, and what synchronization we need to layer onto that.

Heh, now we're back where we started. I still favor a trivial spinlock for no-std with a feature or two to debug deadlocks or provide custom TLS.

@apoelstra
Member

ACK. Just to double-check, there's no concern with the randomization process itself being non-constant-time and leaking bits? Should we instead consider randomizing with the signature itself (or is it useless given the signature is public?)

We should take a tagged hash of the secret key -- the concern is not constant-timeness but rather that the re-randomization internally computes a curvepoint (i.e. the public key) and so the resulting re-randomization wouldn't be a secret. And yes -- to be effective the rerandomization must be secret, so we can't use the signature itself.
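
Something in that spirit, as a toy sketch using the sha2 crate (the tag string and helper name are invented, and a real version would also mix in per-use data such as a counter so repeated calls differ):

use sha2::{Digest, Sha256};

fn rerandomization_seed(seckey: &[u8; 32]) -> [u8; 32] {
    // BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || data).
    let tag_hash = Sha256::digest(b"secp256k1-context-rerandomize"); // invented tag
    let mut h = Sha256::new();
    h.update(&tag_hash);
    h.update(&tag_hash);
    h.update(seckey); // secret input, so the derived seed stays secret (unlike a signature)
    let digest = h.finalize();
    let mut seed = [0u8; 32];
    seed.copy_from_slice(digest.as_slice());
    seed
}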

In our discussions about doing this upstream we observed that when generating a signature, there is a secret bit (the original parity of R) that we just throw away, which we could use to do one bit of re-randomization per signature. Then unless we had such an egregious sidechannel that an attacker could learn information from observing only a couple signatures, we'd get a very fast form of automatic re-randomization.

The hold-up there is (a) upstream having the same kind of API fights that we're having, re the signing function mutating the context object, and (b) trying to formalize exactly what we hope to accomplish, to understand whether 1-bit of rerandomization is effective.

Heh, now we're back where we started. I still favor a trivial spinlock for no-std with a feature or two to debug deadlocks or provide custom TLS.

I tend to agree with you, but I see @Kixunil's point. Personally what I'd like is to just implement this with atomics ... but I need upstream support for this which would be difficult to get because of how badly it'd break their abstraction.

One alternative we haven't considered is to use a variant of a spinlock where after spinning a couple of times we just give up and don't rerandomize. @Kixunil how would you feel about that? This would prevent any possibility of deadlocking and still work in the vast majority of cases. We could even add another atomic to allow run-time configuration of the number of times to spin before giving up, though it's hard for me to imagine anybody understanding or using that capability.

@Kixunil
Collaborator

Kixunil commented Sep 29, 2022

after spinning a couple of times we just give up and don't rerandomize.

This is quite interesting! It's still silently doing things one may not want, so I'd still prefer to give debug options and ideally also a thread-local API. I was burnt badly by code silently swallowing errors way too many times to just ignore it. I wonder if it'd be acceptable to have an optional dependency on the log crate and logging warnings. It has a very conservative MSRV policy and is maintained by the rust-lang team.

Regarding the number of spins I would actually set it to zero. If there is contention on a presumably single-threaded platform then the culprit is most likely reentrancy and there's no point in spinning. Note that I still want pthread or std or whatever means to get TLS, and this should be used as a last resort only.

Also anything surprising like this should be very visibly documented so that anyone who requires randomization knows about it. And ideally I'd love to see some "don't compile dangerous configurations" option somehow. This doesn't help if the crate is a dependency of a dependency but perhaps the consumers who don't audit don't care about this anyway, so it's OK?

@apoelstra
Member

If log is useful by itself then I wouldn't mind having an optional dependency on it. IIRC you usually need some other crate to actually use it and those other crates are a bit less stable.

Regarding the number of spins I would actually set it to zero. If there is contention on a presumably single-threaded platform then the culprit is most likely reentrancy and there's no point in spinning. Note that I still want pthread or std or whatever means to get TLS, and this should be used as a last resort only.

SGTM.

Also anything surprising like this should be very visibly documented so that anyone who requires randomization knows about it.

Sure, we can document it, though even as an author of libsecp I'm having trouble imagining anybody who would require randomization..

And ideally I'd love to see some "don't compile dangerous configurations" option somehow

We can do this with an additional cfg param, but with cargo features I think it'd be too disruptive since it would prevent people composing dependency trees.

@Kixunil
Collaborator

Kixunil commented Sep 29, 2022

I'm having trouble imagining anybody who would require randomization..

Oh, doesn't it imply it's not actually valuable enough to have it?

additional cfg param, but with cargo features I think it'd be too disruptive

Indeed. It's a shame this wasn't addressed in Cargo yet.

@TheBlueMatt
Member

I was burnt badly by code silently swallowing errors way too many times to just ignore it.

Please don't hyperbolize - deadlocking is not "silently swallowing errors" - it very clearly notifies users something is very wrong.

I wonder if it'd be acceptable to have an optional dependency on the log crate and logging warnings. It has a very conservative MSRV policy and is maintained by the rust-lang team.

If someone is paying enough attention to know to add the log dependency, I think they're paying enough attention to also just provide a TLS implementation, but, sure.

Regarding the number of spins I would actually set it to zero.

So, in other words, just panicking immediately? You started with "okay" and then ended at "No". :p

I think we should look more towards "one minute of spins". We're talking about panicking as a way to give developers more immediate feedback than having to gdb, which isn't exactly a huge thing, but I agree it's nice. What we absolutely shouldn't do here is pick a number too low and end up panicking at a very slight bit of contention.

If there is contention on a presumably single-threaded platform then the culprit is most likely reentrancy and there's no point in spinning. Note that I still want pthread or std or whatever means to get TLS, and this should be used as a last resort only.

I don't think it does. I think it's far more likely that we're actually running threaded with no-std, but we've already beaten that horse to death above.

"don't compile dangerous configurations"

Again, please don't hyperbolize here, none of the discussed options are "dangerous". They're only a tradeoff between working in some cases and always panicking.

@apoelstra
Member

@TheBlueMatt we're no longer talking about panicking, but instead just not randomizing.

Oh, doesn't it imply it's not actually valuable enough to have it?

I mean, we should support it when it's low-effort. It's defense in depth. I can't quantify its value. My point is that nobody else can quantify it either, so nobody "needs" it, and we definitely are not going to have people using such obscure options that even I can't tell what their purpose is.

@TheBlueMatt
Member

@TheBlueMatt we're no longer talking about panicking, but instead just not randomizing.

Oh! I totally misread your comment - I thought you were suggesting "spin for a while and then panic". Yea, I'm much more okay with not spinning at all, then. Given we'll still randomize some in the face of contention I don't see why we need to think too hard about this. Logging about it seems fine and letting people debug-flag to see it also does, but given the ill-defined threat model it seems hard to get too upset about it.

@Kixunil
Collaborator

Kixunil commented Sep 30, 2022

deadlocking is not "silently swallowing errors"

I meant the library compiling without complaining but:

we should support it when it's low-effort.

is a nice way to think about it and I can finally understand it better. Although the question is "what is considered low-effort?"
std with thread local: definitely yes; pthread: IDK, maybe? We need to detect it being available and I suspect the only reliable way is to attempt to compile a trivial piece of code using it. No-rerandomize fallback: definitely. A feature to debug it: also. The thread-local extern interface looks simple enough to me as well - trivial on our side and trivial for consumers whose platforms provide thread_local(Key) -> *const X.

For logging, I guess it could be used elsewhere too. That'd make it more justified.

@apoelstra
Member

apoelstra commented Sep 30, 2022

IMHO "low-effort" means

  • with std we use TLS
  • without std we use the "sorta-spinlock which skips randomization if there is any contention at all"

I think pthreads is a bit too much, since as you say it'll complicate the build system. I'd be fine with adding logging though if you want.

@Kixunil
Collaborator

Kixunil commented Oct 7, 2022

OK, I'm on board with doing this (with appropriate documentation, including logging) initially and leaving the rest until someone actually asks for more.

apoelstra added a commit to apoelstra/rust-secp256k1 that referenced this issue Nov 25, 2022
This covers both the std and no-std cases. This is perhaps an unconventional
strategy; see rust-bitcoin#346 for
discussion leading up to it.