BIP39 Implementation #644

evanlinjin · 2022-06-29T09:17:04Z

Description

This is a continuation of PR #607, which closes #561.

This PR includes commits for our own implementation of PBKDF2.
I've also modified the .gitignore; I hope that is okay.

Notes to the reviewers

Although the implementation is complete, I still have some security concerns about it (please see my comment below).

Checklists

All Submissions:

I've signed all my commits
I followed the contribution guidelines
I ran cargo fmt and cargo clippy before committing

New Features:

Add PBKDF2 implementation
I've added docs for the new feature
I've updated CHANGELOG.md

danielabrozzoni

Copy-pasted some review comments from #607 that still seem to need addressing.

CHANGELOG.md

src/keys/bip39/mod.rs

danielabrozzoni · 2022-06-30T10:23:11Z

src/keys/bip39/mod.rs

+    }
+
+    /// Convert a mnemonic to a seed with an optional passphrase
+    fn to_seed(&self, passphrase: Option<String>) -> Seed {


From #607:

@vladimirfomene: It might not be a good idea to change the type of passphrase from &str to Option, as that could break code that consumes this method.

@atalw: It makes sense to have the passphrase as Option as it really is optional, so if a breaking change is okay we can go ahead with this.

Personally, I agree that we should try not to break the API, and leave P: Into<Cow<'a, str>> (https://docs.rs/bip39/latest/src/bip39/lib.rs.html#479-486) here.

In my opinion, having a P: Into<Cow<'a, str>> makes little sense given our implementation. I propose just using a &str; that way, the API shouldn't break for most people.
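To make the compatibility argument concrete, here is a minimal sketch using two hypothetical free functions (`passphrase_bytes_generic`, `passphrase_bytes_str`) in place of the real `to_seed`. String-literal callers compile unchanged against either signature; callers passing an owned `String` by value would have to add a borrow, and callers holding an `Option<&str>` can use `unwrap_or("")`.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a generic signature in the style of
/// `P: Into<Cow<'a, str>>`; it only returns the passphrase bytes.
fn passphrase_bytes_generic<'a, P: Into<Cow<'a, str>>>(passphrase: P) -> Vec<u8> {
    let passphrase: Cow<'a, str> = passphrase.into();
    passphrase.as_bytes().to_vec()
}

/// Hypothetical stand-in for the `&str` signature adopted in dd42594.
fn passphrase_bytes_str(passphrase: &str) -> Vec<u8> {
    passphrase.as_bytes().to_vec()
}
```

The only call shape that breaks is `f(owned_string)`, which must become `f(&owned_string)`.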

Changed in dd42594. Let me know if it is sufficient!

bdk/src/keys/bip39/mod.rs

Lines 204 to 208 in dd42594

pub fn to_seed(&self, passphrase: &str) -> Seed {
    let mut seed = [0_u8; SEED_LEN];
    pbkdf2::generate_seed(self.word_iter(), passphrase, &mut seed);
    seed
}

A couple of features have been implemented in this commit:
- Parse a mnemonic string into the `Mnemonic` type
- Generate a `Mnemonic` from entropy
- Derive a seed from a `Mnemonic` (with and without a passphrase)
- All language wordlists (with a verification test to ensure they have not been tampered with)
- Mnemonic test vectors from BIP39
- Error handling

Function names have mostly been kept the same to maintain backwards compatibility.

Co-authored-by: Vladimir Fomene <vladimirfomene@gmail.com>

* `Mnemonic::parse_in` now verifies the checksum against the entropy.
* Add test: make sure `from_entropy_in` produces an error if the entropy length in bits is less than 128, greater than 256, or not a multiple of 32.
* Add test: invalid mnemonic sentence. Throw an error if the mnemonic sentence has fewer than 12 or more than 24 words, its word count is not a multiple of three, it contains a word that is not in the wordlist, or it has an invalid checksum.
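The length rules these tests exercise follow directly from BIP39: the entropy (ENT) is 128 to 256 bits in steps of 32, the checksum adds ENT/32 bits, and each word encodes 11 bits, so a valid sentence has 12, 15, 18, 21 or 24 words. A minimal sketch of the two checks, using hypothetical helper names rather than the PR's actual functions:

```rust
/// Hypothetical helper: validate an entropy length (in bits) against the
/// BIP39 rules and return the resulting mnemonic word count.
fn word_count_for_entropy_bits(ent: usize) -> Result<usize, String> {
    if !(128..=256).contains(&ent) || ent % 32 != 0 {
        return Err(format!("invalid entropy length: {} bits", ent));
    }
    let cs = ent / 32; // checksum length in bits
    Ok((ent + cs) / 11) // each word encodes 11 bits of ENT+CS
}

/// Hypothetical helper: a word count is valid iff it is 12, 15, 18, 21 or 24.
fn is_valid_word_count(words: usize) -> bool {
    (12..=24).contains(&words) && words % 3 == 0
}
```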

Also removed unused dependencies

* Introduce `Language::word_map` method for faster word index finding.
* Readability changes to various `Mnemonic` methods and tests.
* Re-introduce various methods back into `Mnemonic`.
* `bip39::Error` no longer includes sensitive information.

evanlinjin · 2022-07-02T18:43:47Z

I still have some security concerns regarding the current implementation (although I am no security expert; these are based only on what I've read online). I will list them here, and hopefully someone knowledgeable can provide some clarity.

Should we avoid using heap memory and keep everything on the stack? Apparently, heap memory is more prone to exploits. Reference: github.com/shellphish/how2heap
Is implementing std::fmt::Display and Debug a good idea, given that they may leak secrets into logs? We could remove these implementations completely, provide implementations with redacted secrets, or use a crate such as secrecy.

Thank you all in advance!

P.S. The test blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20 seems to fail occasionally.

Fixes:
* Fixed implementations of `GeneratableKey` to work for all word lengths
* Fixed example in `rpcwallet`

Changes:
* Added various `derive`s for bip39 structures
* Added `Mnemonic::with_passphrase` method
* Added `TryFrom<usize>` implementation for `WordCount`
* Introduced `Bip39TestVector` struct for more comprehensive testing
* Various refactoring

CI/CC Changes:
* Added `all-languages` feature to `[package.metadata.docs.rs]`
* Added `all-languages` feature to code coverage and CI tests
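A sketch of what a `TryFrom<usize>` implementation for `WordCount` might look like. Only `WordCount::Words12` appears elsewhere in this PR; the other variant names, and returning the rejected count as the error, are assumptions.

```rust
use std::convert::TryFrom;

/// Sketch of a word-count enum; variant names other than `Words12` are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WordCount {
    Words12,
    Words15,
    Words18,
    Words21,
    Words24,
}

impl TryFrom<usize> for WordCount {
    // Hypothetical error type: hand back the count that was rejected.
    type Error = usize;

    fn try_from(n: usize) -> Result<Self, Self::Error> {
        match n {
            12 => Ok(WordCount::Words12),
            15 => Ok(WordCount::Words15),
            18 => Ok(WordCount::Words18),
            21 => Ok(WordCount::Words21),
            24 => Ok(WordCount::Words24),
            other => Err(other),
        }
    }
}
```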

danielabrozzoni · 2022-07-03T16:28:34Z

Should we avoid using heap memory and keep everything on the stack? Apparently, heap memory is more prone to exploits. Reference: github.com/shellphish/how2heap

We could look into that, but it's better if we do so in a new PR. This one is already quite big, and the bigger it gets, the more difficult it is to collect reviews :)

Is implementing std::fmt::Display and Debug a good idea, given that they may leak secrets into logs? We could remove these implementations completely, provide implementations with redacted secrets, or use a crate such as secrecy.

This, instead, I think should be tackled here: for now, avoiding Debug and Display (or manually implementing a really generic one) should be enough (with an appropriate comment on why we do so). I'd avoid adding YA dependency :)

evanlinjin · 2022-07-03T16:45:53Z

Another aspect I've been thinking about is that, the great majority of the time, people will be using English (which shouldn't require Unicode normalization). For the passphrase, we could just do a check (and fail if it is not normalized).

This is because normalization sometimes requires resizing the vector (a heap operation), and for most people it would also mean one less dependency.
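One way to read the "check only" idea is a conservative fast path: ASCII text is unchanged by NFKD, so an ASCII passphrase can be accepted and used as bytes without any normalization or allocation. The sketch below (hypothetical `check_passphrase` and `NotNormalized`) rejects all non-ASCII input, which is stricter than a real NFKD check; a full check would still need a Unicode normalization library.

```rust
/// Hypothetical error: the passphrase could not be confirmed as NFKD-normalized.
#[derive(Debug, PartialEq, Eq)]
struct NotNormalized;

/// Hypothetical "check only" policy: ASCII is always in NFKD form, so it is
/// accepted as-is and returned as bytes. Non-ASCII input is rejected here,
/// even if it happens to be normalized, because no NFKD check is performed.
fn check_passphrase(passphrase: &str) -> Result<&[u8], NotNormalized> {
    if passphrase.is_ascii() {
        Ok(passphrase.as_bytes())
    } else {
        Err(NotNormalized)
    }
}
```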

evanlinjin · 2022-07-04T03:35:29Z

This, instead, I think should be tackled here: for now, avoiding Debug and Display (or manually implementing a really generic one) should be enough (with an appropriate comment on why we do so). I'd avoid adding YA dependency :)

Addressed in 08e1cc9.

`Mnemonic` contains sensitive data, so we should ensure its internal fields are not easily leaked.
* Explicitly implement `fmt::Debug` and redact all fields.
* Explicitly implement `ToString` instead of `Display`.
* Remove various comparative `derive()`s.
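A minimal sketch of the redacted `Debug` approach, assuming a simplified `Mnemonic` with a hypothetical `word_indexes` field: the implementation prints a fixed placeholder, so `{:?}` in logs or panic messages never includes the secret.

```rust
use std::fmt;

/// Simplified stand-in for `Mnemonic`; the field layout is an assumption.
struct Mnemonic {
    word_indexes: Vec<u16>,
}

impl fmt::Debug for Mnemonic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Deliberately ignore every field: only a placeholder is printed.
        f.write_str("Mnemonic { .. }")
    }
}
```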

afilini · 2022-07-07T20:25:58Z

Should we avoid using heap memory, and keep everything on the stack?

One advantage of this (on top of the extra safety) is that it would be much easier to then port to embedded hardware. We have many features in bdk which I guess are not really fit for hardware wallets, but mnemonics are for sure one that we'll need to have.

Are you able to do a rough estimation of how much longer/how much harder it would be to implement in this way?
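To give a sense of what a heap-free design involves, here is a sketch of one piece of it: storing word indexes in a fixed-size stack array plus a length instead of a `Vec<u16>`. The `WordIndexes` type and its `Full` error are hypothetical; the capacity of 24 is the BIP39 maximum word count.

```rust
/// Maximum number of words in a BIP39 mnemonic.
const MAX_WORDS: usize = 24;

/// Hypothetical error returned when the fixed capacity is exhausted.
#[derive(Debug)]
struct Full;

/// Hypothetical stack-only storage for word indexes.
struct WordIndexes {
    indexes: [u16; MAX_WORDS],
    len: usize,
}

impl WordIndexes {
    fn new() -> Self {
        WordIndexes { indexes: [0; MAX_WORDS], len: 0 }
    }

    /// Append an index; fails instead of reallocating when full.
    fn push(&mut self, index: u16) -> Result<(), Full> {
        if self.len == MAX_WORDS {
            return Err(Full);
        }
        self.indexes[self.len] = index;
        self.len += 1;
        Ok(())
    }

    /// View only the slots that have been filled.
    fn as_slice(&self) -> &[u16] {
        &self.indexes[..self.len]
    }
}
```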

afilini

This is just a partial review; I still haven't looked at all the files.

I just wanted to post these comments so that you could start thinking about them and see if they make sense.

afilini · 2022-07-07T20:29:40Z

src/keys/bip39/pbkdf2.rs

+/// Password is the UTF8-NFKD-normalized result of mnemonic words separated by space.
+fn make_password<'a, W>(words: W) -> String
+where
+    W: Iterator<Item = &'a str>,


You could define Item as another generic that implements AsRef<str>. This should allow you to pass a Vec of Strings as well, if you want.
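A sketch of that suggestion, with the item type as a second generic parameter bounded by `AsRef<str>`. Unlike the PR's `make_password`, it performs no NFKD normalization and assumes the words are already normalized (which holds for the ASCII English wordlist).

```rust
/// Join mnemonic words with single spaces. `S: AsRef<str>` lets callers pass
/// iterators over `&str`, `&String`, `String`, and so on.
/// Note: no NFKD normalization is done here (assumed already normalized).
fn make_password<W, S>(words: W) -> String
where
    W: Iterator<Item = S>,
    S: AsRef<str>,
{
    let mut password = String::new();
    for (i, word) in words.enumerate() {
        if i > 0 {
            password.push(' '); // words are separated by a single space
        }
        password.push_str(word.as_ref());
    }
    password
}
```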

afilini · 2022-07-07T20:41:29Z

src/keys/bip39/pbkdf2.rs

+
+/// Salt is the UTF8-NFKD-normalized result of (SALT_PREFIX + passphrase).
+fn make_salt(passphrase: &str) -> Cow<'static, str> {
+    let mut salt = Cow::from(SALT_PREFIX);


I understand the small performance benefit of reusing the ref as-is, but I don't think it's worth using Cow here, especially considering that moving forward we'd like to avoid using the heap (even if we don't manage to finalize that transition in this PR).
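One heap-free alternative, sketched here under the assumption that the passphrase is already NFKD-normalized: return the salt as two borrowed byte slices (`SALT_PREFIX` and the passphrase) and let the caller feed them to the HMAC engine one after the other, which gives the same MAC as hashing their concatenation. `salt_parts` is a hypothetical name.

```rust
/// Fixed BIP39 salt prefix.
const SALT_PREFIX: &str = "mnemonic";

/// Hypothetical replacement for a `Cow`-returning `make_salt`: nothing is
/// copied or allocated; the two parts are meant to be fed to the PRF in order.
/// Assumes `passphrase` is already NFKD-normalized.
fn salt_parts(passphrase: &str) -> [&[u8]; 2] {
    [SALT_PREFIX.as_bytes(), passphrase.as_bytes()]
}
```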

afilini · 2022-07-07T20:51:30Z

src/keys/bip39/pbkdf2.rs

+/// Make hmac-sha512 engine from password.
+/// The hmac engine is used as the pseudo-random function.
+fn make_prf(password: &str) -> HmacPRF {
+    HmacEngine::new(password.as_bytes())


I was gonna comment here that for extra safety we should re-normalize the string, but then I realized: wouldn't it be better to convert strings to &[u8] immediately after normalization?

This would be like a marker for us, anything that's &str or similar is potentially not normalized, but as soon as we are done we just convert to bytes and forget about it.

With this change I guess you would make this function take a &[u8] directly, and do the conversion in the caller which as far as I can see is already normalizing correctly.

afilini · 2022-07-07T20:59:26Z

src/keys/bip39/pbkdf2.rs

+/// Generate block (of given block_index) by calculating xor sum of iterations of PRF.
+fn xor_sum(hmac_prf: &HmacPRF, salt: &str, iter_count: u32, block_index: u32, block: &mut [u8]) {
+    // for the first iteration, we concat: salt + block_index (as big-endian bytes)
+    let mut prev_u = Vec::with_capacity(salt.len() + 4);


Since the length is fixed, I guess this could also be an array with a static length (there should be a constant in bitcoin_hashes for this).

I think the code would probably still look decently good with copy_from_slice: https://doc.rust-lang.org/std/primitive.slice.html#method.copy_from_slice
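A sketch of `xor_sum` that keeps every intermediate value in a fixed-size array and fills the output with `copy_from_slice`, following PBKDF2's F function from RFC 2898. To stay self-contained it makes several assumptions: a hypothetical `Prf` trait stands in for `HmacEngine<sha512::Hash>`, a local `PRF_LEN` (64, the HMAC-SHA512 output length) replaces whatever constant bitcoin_hashes provides, and `ToyPrf` is a non-cryptographic toy used only to exercise the code.

```rust
/// HMAC-SHA512 output length in bytes (local constant for this sketch).
const PRF_LEN: usize = 64;

/// Hypothetical PRF interface: a keyed engine that absorbs input
/// incrementally and yields a fixed-size output.
trait Prf: Clone {
    fn input(&mut self, data: &[u8]);
    fn finalize(self) -> [u8; PRF_LEN];
}

/// PBKDF2 F function: T = U_1 ^ ... ^ U_c, with U_1 = PRF(salt || INT(i))
/// and U_j = PRF(U_{j-1}). No heap allocation is performed.
fn xor_sum<P: Prf>(prf: &P, salt: &[u8], iter_count: u32, block_index: u32, block: &mut [u8]) {
    assert!(block.len() <= PRF_LEN && iter_count >= 1);
    // U_1: feed salt and the big-endian block index separately (no concatenation).
    let mut engine = prf.clone();
    engine.input(salt);
    engine.input(&block_index.to_be_bytes());
    let mut prev_u = engine.finalize();
    block.copy_from_slice(&prev_u[..block.len()]);
    // U_2 .. U_c, each XORed into the output block.
    for _ in 1..iter_count {
        let mut engine = prf.clone();
        engine.input(&prev_u);
        prev_u = engine.finalize();
        for (b, u) in block.iter_mut().zip(prev_u.iter()) {
            *b ^= *u;
        }
    }
}

/// Toy, NON-cryptographic PRF for testing only. Never use it for real keys.
#[derive(Clone)]
struct ToyPrf {
    state: u64,
}

impl ToyPrf {
    fn new(key: &[u8]) -> Self {
        let mut prf = ToyPrf { state: 0xcbf2_9ce4_8422_2325 };
        prf.input(key);
        prf
    }
}

impl Prf for ToyPrf {
    fn input(&mut self, data: &[u8]) {
        // FNV-1a-style mixing: feeding data in pieces equals feeding it at once.
        for &byte in data {
            self.state ^= byte as u64;
            self.state = self.state.wrapping_mul(0x0100_0000_01b3);
        }
    }

    fn finalize(self) -> [u8; PRF_LEN] {
        let mut out = [0u8; PRF_LEN];
        for (i, chunk) in out.chunks_mut(8).enumerate() {
            chunk.copy_from_slice(&self.state.wrapping_add(i as u64).to_le_bytes());
        }
        out
    }
}
```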

evanlinjin · 2022-07-07T21:11:29Z

Should we avoid using heap memory, and keep everything on the stack?

One advantage of this (on top of the extra safety) is that it would be much easier to then port to embedded hardware. We have many features in bdk which I guess are not really fit for hardware wallets, but mnemonics are for sure one that we'll need to have.

Are you able to do a rough estimation of how much longer/how much harder it would be to implement in this way?

Less than a week. But I'm currently stuck on the multi-descriptor wallet work 😅😂

afilini · 2022-07-08T09:50:42Z

Yes, multi-descriptor is definitely the priority right now. We'll get back to this once you are done there :)

vladimirfomene

Thanks for helping us move this forward! Just a couple of questions and comments.

vladimirfomene · 2022-07-08T15:22:55Z

.gitignore

@@ -3,3 +3,6 @@ Cargo.lock

 *.swp
 .idea
+
+# IDE


Nit: .idea is for IntelliJ. I'm not sure that IDE comment is necessary.

vladimirfomene · 2022-07-08T15:31:32Z

src/keys/bip39/mod.rs

+        // parse word indexes and ENT+CS bits from mnemonic words
+        let parse_result = sentence_words
+            .iter()
+            .map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF))


Why not throw an invalid word error here if the word is not in the HashMap? What is the utility of having &utils::U11_EOF as the default value here?

vladimirfomene · 2022-07-08T15:52:55Z

src/keys/bip39/mod.rs

+        let mut word_indexes = Vec::with_capacity(MS_MAX); // word indexes
+        let mut ent_cs_bits = Vec::with_capacity(MS_MAX * utils::U11_BITS); // ENT+CS bits


Why not just use ms as the vector capacity instead of MS_MAX?

vladimirfomene · 2022-07-08T15:58:00Z

src/keys/bip39/mod.rs

+            .iter()
+            .map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF))
+            .try_for_each(|word_index| {
+                if *word_index > utils::U11_MAX {


Given that you are getting word_index from word_map, is there a scenario where word_index will be greater than utils::U11_MAX? I think if you throw an error for invalid words, there will be no need for this if/else logic.
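A sketch of the early-return approach suggested in these comments, which also sizes the vector by the actual word count rather than `MS_MAX`: an unknown word produces an error immediately, so no sentinel value or later range check is needed. `ParseError` and `parse_word_indexes` are hypothetical; the error records only the word's position, in line with keeping sensitive data out of `bip39::Error`.

```rust
use std::collections::HashMap;

/// Hypothetical error that carries only the position of the bad word,
/// never the word itself.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    UnknownWord { position: usize },
}

/// Map each word to its 11-bit index, failing on the first unknown word.
fn parse_word_indexes(
    words: &[&str],
    word_map: &HashMap<&str, u16>,
) -> Result<Vec<u16>, ParseError> {
    // Capacity is the sentence length, not a fixed maximum.
    let mut indexes = Vec::with_capacity(words.len());
    for (position, word) in words.iter().enumerate() {
        let index = word_map
            .get(word)
            .ok_or(ParseError::UnknownWord { position })?;
        indexes.push(*index);
    }
    Ok(indexes)
}
```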

vladimirfomene · 2022-07-08T16:07:53Z

examples/rpcwallet.rs

+    let mnemonic_with_passphrase: GeneratedKey<_, _> =
+        MnemonicWithPassphrase::generate((WordCount::Words12, Language::English, password))?;
+    Ok(mnemonic_with_passphrase)


Given that there is a change in the examples, I believe this will affect users. Is it possible to implement the BIP in a way that doesn't change anything for users?

vladimirfomene · 2022-07-08T16:12:22Z

src/keys/bip39/pbkdf2.rs

+// or http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your option.
+// You may not use this file except in accordance with one or both of these
+// licenses.


It would be great to have a reference to the PBKDF2 RFC as part of this module's documentation: https://datatracker.ietf.org/doc/html/rfc2898
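For reference, this is the PBKDF2 definition from RFC 2898 (section 5.2) that such module documentation could summarize. With BIP39's parameters, PRF is HMAC-SHA512, c = 2048 iterations, dkLen = 64 bytes (a single block, since hLen = 64), P is the NFKD-normalized mnemonic sentence, and S is the NFKD-normalized string "mnemonic" followed by the passphrase. INT(i) is the four-byte big-endian encoding of i.

```latex
\begin{aligned}
\mathrm{DK} &= T_1 \,\|\, T_2 \,\|\, \cdots \,\|\, T_\ell,
  \qquad \ell = \lceil \mathit{dkLen} / \mathit{hLen} \rceil \\
T_i &= U_1 \oplus U_2 \oplus \cdots \oplus U_c \\
U_1 &= \mathrm{PRF}\bigl(P,\; S \,\|\, \mathrm{INT}(i)\bigr),
  \qquad U_j = \mathrm{PRF}\bigl(P,\; U_{j-1}\bigr) \quad (2 \le j \le c)
\end{aligned}
```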

vladimirfomene · 2022-07-08T16:16:16Z

src/keys/bip39/wordlists/mod.rs

+    }
+
+    /// Generate word map for given language.
+    pub fn word_map(&self) -> HashMap<&str, u16> {


Can we write a test for this method?

What kind of test are you suggesting?

I was thinking of writing a test to make sure we have correct word to indices mapping in the hashmap.
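A sketch of such a test, using a hypothetical free function in place of `Language::word_map`: the map should have exactly one entry per word (no collisions), and every word should map back to its position in the list. Run against a real 2048-word list, `map.len()` would be 2048.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Language::word_map`: word -> index in the list.
fn word_map<'a>(wordlist: &[&'a str]) -> HashMap<&'a str, u16> {
    wordlist
        .iter()
        .enumerate()
        .map(|(index, &word)| (word, index as u16))
        .collect()
}

/// The suggested property: one entry per word, and each word maps to its index.
fn word_map_round_trips(wordlist: &[&str]) -> bool {
    let map = word_map(wordlist);
    map.len() == wordlist.len()
        && wordlist
            .iter()
            .enumerate()
            .all(|(index, word)| map.get(word) == Some(&(index as u16)))
}
```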

rajarshimaitra · 2022-10-15T07:54:30Z

Is it a good idea to have this in bdk_core eventually? Or do we want to do key generation separately, outside of core?

danielabrozzoni · 2023-03-16T17:11:22Z

We closed #561, let's close this one as well :)

evanlinjin mentioned this pull request Jun 29, 2022

BIP39 implementation #607

Closed

9 tasks

evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 05b53d1 to 61b7dea on June 29, 2022 14:24

evanlinjin changed the title ~~WIP: pbkfd2 implementation for BIP39~~ BIP-39 Implementation (with own PBKFD2) Jun 29, 2022

evanlinjin changed the title ~~BIP-39 Implementation (with own PBKFD2)~~ BIP39 Implementation (with own PBKFD2) Jun 29, 2022

evanlinjin force-pushed the bip-0039 branch 2 times, most recently from dce62db to 2136bd2 on June 29, 2022 14:34

evanlinjin marked this pull request as ready for review June 29, 2022 14:35

evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 82da908 to 29ae147 on June 29, 2022 15:02

MOVEONLY: Move bip39.rs to bip39/mod.rs

e4a1ccf

evanlinjin force-pushed the bip-0039 branch 3 times, most recently from d4f35e7 to 0b5c558 on June 29, 2022 17:25

danielabrozzoni reviewed Jun 30, 2022

evanlinjin force-pushed the bip-0039 branch 5 times, most recently from b003391 to 8952808 on July 1, 2022 12:43

evanlinjin requested a review from danielabrozzoni July 1, 2022 12:44

atalw and others added 5 commits July 1, 2022 20:46

BIP39: Add feature flags for mnemonic languages

94189cf

.gitignore for vscode

8415e0b

BIP39: Add own PBKFD2 implementation

59f41dc

Also removed unused dependencies

evanlinjin force-pushed the bip-0039 branch from 8952808 to dd42594 on July 1, 2022 12:51

evanlinjin changed the title ~~BIP39 Implementation (with own PBKFD2)~~ BIP39 Implementation Jul 1, 2022

evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 2399c6b to 75fa8f6 on July 2, 2022 02:36

evanlinjin force-pushed the bip-0039 branch 3 times, most recently from c4334a1 to cdb3e25 on July 2, 2022 18:23

evanlinjin force-pushed the bip-0039 branch from cdb3e25 to 464a729 on July 2, 2022 18:57

evanlinjin force-pushed the bip-0039 branch from 3594f2d to 08e1cc9 on July 4, 2022 03:46

notmandatory added the new feature New feature or request label Jul 4, 2022

notmandatory assigned evanlinjin Jul 4, 2022

notmandatory mentioned this pull request Jul 5, 2022

W27 BDK Library Team Call bitcoindevkit/.github#12

Closed

24 tasks

afilini reviewed Jul 7, 2022

vladimirfomene reviewed Jul 8, 2022

tnull mentioned this pull request Sep 7, 2022

Make wallet entropy source configurable lightningdevkit/ldk-node#14

Closed

ConorOkus mentioned this pull request Sep 9, 2022

Add ability to retrieve private master key from a BIP39 seed bitcoindevkit/bdk-ffi#188

Closed

thunderbiscuit mentioned this pull request Nov 2, 2022

Added Mnemonic Interface bitcoindevkit/bdk-ffi#219

Merged

5 tasks

notmandatory mentioned this pull request Dec 19, 2022

Write own BIP39 implementation #561

Closed

danielabrozzoni closed this Mar 16, 2023
