Skip to content

BIP39 Implementation #644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Conversation

evanlinjin
Copy link
Member

@evanlinjin evanlinjin commented Jun 29, 2022

Description

This is a continuation of PR #607 which closes #561

This PR includes commits for own implementation of PBKFD2.
I've also modified the .gitignore, I hope that is okay.

Notes to the reviewers

Although complete, I still have some security concerns for the current implementation (please check my comment below).

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • Add pbkfd2 implementation
  • I've added docs for the new feature
  • I've updated CHANGELOG.md

@evanlinjin evanlinjin mentioned this pull request Jun 29, 2022
9 tasks
@evanlinjin evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 05b53d1 to 61b7dea Compare June 29, 2022 14:24
@evanlinjin evanlinjin changed the title WIP: pbkfd2 implementation for BIP39 BIP-39 Implementation (with own PBKFD2) Jun 29, 2022
@evanlinjin evanlinjin changed the title BIP-39 Implementation (with own PBKFD2) BIP39 Implementation (with own PBKFD2) Jun 29, 2022
@evanlinjin evanlinjin force-pushed the bip-0039 branch 2 times, most recently from dce62db to 2136bd2 Compare June 29, 2022 14:34
@evanlinjin evanlinjin marked this pull request as ready for review June 29, 2022 14:35
@evanlinjin evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 82da908 to 29ae147 Compare June 29, 2022 15:02
@evanlinjin evanlinjin force-pushed the bip-0039 branch 3 times, most recently from d4f35e7 to 0b5c558 Compare June 29, 2022 17:25
Copy link
Member

@danielabrozzoni danielabrozzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasted some review comments from #607 that it seems still need to be addressed

}

/// Convert a mnemonic to a seed with an optional passphrase
fn to_seed(&self, passphrase: Option<String>) -> Seed {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #607:

@vladimirfomene: It might not be a good idea to change the type of passphrase from &str to option as that has the potential of breaking code which consumes this method.

@atalw: It makes sense to have the passphrase as Option as it really is optional, so if a breaking change is okay we can go ahead with this.


Personally, I agree that we should try not to break the API, and leave P: Into<Cow<'a, str> (https://docs.rs/bip39/latest/src/bip39/lib.rs.html#479-486) here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, having a P: Into<Cow<'a, str>> makes little sense based on our implementation. I propose just using a &str, this way for most people, the API shouldn't break.

Copy link
Member Author

@evanlinjin evanlinjin Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in dd42594. Let me know if it is sufficient!

bdk/src/keys/bip39/mod.rs

Lines 204 to 208 in dd42594

pub fn to_seed(&self, passphrase: &str) -> Seed {
let mut seed = [0_u8; SEED_LEN];
pbkdf2::generate_seed(self.word_iter(), passphrase, &mut seed);
seed
}

@evanlinjin evanlinjin force-pushed the bip-0039 branch 5 times, most recently from b003391 to 8952808 Compare July 1, 2022 12:43
atalw and others added 5 commits July 1, 2022 20:46
There are a couple of features that have been implemented in this
commit:
- Parse mnemonic string to Mnenomic type
- Generate Mnemonic from entropy
- Derive seed from Mnemonic (with and without passphrase)
- All language wordlists (with verification test to ensure they were
  untampered)
- Mnemonic test vectors from BIP39
- Error handling

Function names have mostly been kept the same to maintain backwards
compatability.

Co-authored-by: Vladimir Fomene <vladimirfomene@gmail.com>
* `Mnemonic::parse_in` now verifies the checksum against the entropy.

* Add test: Make sure `from_entropy_in` produces error if length of
  entropy bits is less than 128, greater than 256 and not a multipe of 32.

* Add test: invalid mnemonic sentence.Throw error if mnemonic sentence
  is less than 12 words or greater than 24 words or number of words is
  not a multiple of six or the contains a word not in wordlist or has
  invalid checksum.
Also removed unused dependencies
@evanlinjin evanlinjin changed the title BIP39 Implementation (with own PBKFD2) BIP39 Implementation Jul 1, 2022
@evanlinjin evanlinjin force-pushed the bip-0039 branch 2 times, most recently from 2399c6b to 75fa8f6 Compare July 2, 2022 02:36
* Introduce `Language::word_map` method for faster word index finding.
* Readability changes to various `Mnemonic` methods and tests.
* Re-introduce various methods back into `Mnemonic`.
* `bip39::Error` no longer includes sensitive information.
@evanlinjin evanlinjin force-pushed the bip-0039 branch 3 times, most recently from c4334a1 to cdb3e25 Compare July 2, 2022 18:23
@evanlinjin
Copy link
Member Author

evanlinjin commented Jul 2, 2022

I still have some security concerns regarding the current implementation (although, I am no security expert, just based on what I've read on the internet). I will list them here, and hopefully someone knowledgeable enough could provide some clarity.

  1. Should we avoid using heap memory, and keep everything on the stack? Apparently, heap memory is more prone to exploints. Reference: github.com/shellphish/how2heap

  2. Is implementing std::fmt::Display and Debug a good idea? As it may leak the secrets to logs. We can potentially remove these implementations completely, or provide implementations with redacted secrets, or use an a crate such as secrey.

Thank you all in advance!

P.S. Test blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20 seems to fail occasionally.

Fixes:
* Fixed implementations of `GeneratableKey` to work for all word lengths
* Fixed example in `rpcwallet`

Changes:
* Added various `derive`s for bip39 structures
* Added `Mnemonic::with_passphrase` method
* Added `TryFrom<uszie>` implementation for `WordCount`
* Introduced `Bip39TestVector` struct for more comprehensive testing
* Various refactoring

CI/CC Changes:
* Added `all-languages` feature to `[package.metadata.doc.rs]`
* Added `all-languages` feature to code coverage and CI tests
@danielabrozzoni
Copy link
Member

Should we avoid using heap memory, and keep everything on the stack? Apparently, heap memory is more prone to exploints. Reference: github.com/shellphish/how2heap

We could look into that, but it's better if we do so in a new PR. This one is already quite big, and the bigger it gets, the more difficult it is to collect reviews :)

Is implementing std::fmt::Display and Debug a good idea? As it may leak the secrets to logs. We can potentially remove these implementations completely, or provide implementations with redacted secrets, or use an a crate such as secrey.

This, instead, I think should be tackled here: for now, avoiding Debug and Display (or manually implementing a really generic one) should be enough (with an appropriate comment on why we do so). I'd avoid adding YA dependency :)

@evanlinjin
Copy link
Member Author

Another aspect I've been thinking about, is the great majority of the time people will be using English (which shouldn't require Unicode normalization). For the passphrase, we can do a check only (and fail if not normalized).

Since normalization sometimes requires resizing the vector (so it's a heap operation), and for most people, it also means one less dependably.

@evanlinjin
Copy link
Member Author

evanlinjin commented Jul 4, 2022

This, instead, I think should be tackled here: for now, avoiding Debug and Display (or manually implementing a really generic one) should be enough (with an appropriate comment on why we do so). I'd avoid adding YA dependency :)

Addressed in 08e1cc9.

`Mnemonic` contains sensitive data so we should ensure internal fields
are not easily leaked.

* Explicitly implement `fmt::Debug` and redact all fields.
* Explicitly implement `ToString` instead of `Display`.
* Remove various comparative `derive()`s.
@afilini
Copy link
Member

afilini commented Jul 7, 2022

Should we avoid using heap memory, and keep everything on the stack?

One advantage of this (on top of the extra safety) is that it would be much easier to then port to embedded hardware. We have many features in bdk which I guess are not really fit for hardware wallets, but mnemonics are for sure one that we'll need to have.

Are you able to do a rough estimation of how much longer/how much harder it would be to implement in this way?

Copy link
Member

@afilini afilini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a partial review, I still haven't looked at all the files.

I just wanted to post this comments so that you could start thinking about them and see if they make any sense.

/// Password is the UTF8-NFKD-normalized result of mnemonic words separated by space.
fn make_password<'a, W>(words: W) -> String
where
W: Iterator<Item = &'a str>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could define Item as another generic that implements AsRef<str>. This should allow you to pass a vec of strings as well if you want


/// Salt is the UTF8-NFKD-normalized result of (SALT_PREFIX + passphrase).
fn make_salt(passphrase: &str) -> Cow<'static, str> {
let mut salt = Cow::from(SALT_PREFIX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the small performance benefit of reusing the ref as-is, but I don't think it's worth using Cow here especially considering that moving forward we'd like to avoid using the heap (even if we don't manage to finalize that transition in this PR)

/// Make hmac-sha512 engine from password.
/// The hmac engine is used as the pseudo-random function.
fn make_prf(password: &str) -> HmacPRF {
HmacEngine::new(password.as_bytes())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was gonna comment here that for extra safety we should re-normalize the string, but then I realized: wouldn't it be better to immediately convert strings to &[u8] immediately after normalization?

This would be like a marker for us, anything that's &str or similar is potentially not normalized, but as soon as we are done we just convert to bytes and forget about it.

With this change I guess you would make this function take a &[u8] directly, and do the conversion in the caller which as far as I can see is already normalizing correctly.

/// Generate block (of given block_index) by calculating xor sum of iterations of PRF.
fn xor_sum(hmac_prf: &HmacPRF, salt: &str, iter_count: u32, block_index: u32, block: &mut [u8]) {
// for the first iteration, we concat: salt + block_index (as big-endian bytes)
let mut prev_u = Vec::with_capacity(salt.len() + 4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the length is fixed i guess this could also be an array with static length (there should be a constant in bitcoin_hashes for this).

I think the code would probably still look decently good with copy_from_slice: https://doc.rust-lang.org/std/primitive.slice.html#method.copy_from_slice

@evanlinjin
Copy link
Member Author

Should we avoid using heap memory, and keep everything on the stack?

One advantage of this (on top of the extra safety) is that it would be much easier to then port to embedded hardware. We have many features in bdk which I guess are not really fit for hardware wallets, but mnemonics are for sure one that we'll need to have.

Are you able to do a rough estimation of how much longer/how much harder it would be to implement in this way?

Less than a week. But I'm stuck into multi descriptor wallet business 😅😂

@afilini
Copy link
Member

afilini commented Jul 8, 2022

Yes, multi-descriptor is definitely the priority right now. We'll get back to this once you are done there :)

Copy link
Contributor

@vladimirfomene vladimirfomene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping us move this forward! Just a couple of questions and comments.

@@ -3,3 +3,6 @@ Cargo.lock

*.swp
.idea

# IDE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: .idea is for IntelliJ. I don't know if it is necessary to have that IDE comment.

// parse word indexes and ENT+CS bits from mnemonic words
let parse_result = sentence_words
.iter()
.map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw an invalid word error here if the word is not in the HashMap? What is the utility of having &utils::U11_EOF as the default value here?

Comment on lines +164 to +165
let mut word_indexes = Vec::with_capacity(MS_MAX); // word indexes
let mut ent_cs_bits = Vec::with_capacity(MS_MAX * utils::U11_BITS); // ENT+CS bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use ms as your vector size instead of MS_MAX?

.iter()
.map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF))
.try_for_each(|word_index| {
if *word_index > utils::U11_MAX {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you are getting the word_index from the word_map is there a scenario where the word_index will be greater than utils::U11_MAX. I'm thinking if you throw an error for invalid words there will be no need for this if/else logic.

Comment on lines +224 to +226
let mnemonic_with_passphrase: GeneratedKey<_, _> =
MnemonicWithPassphrase::generate((WordCount::Words12, Language::English, password))?;
Ok(mnemonic_with_passphrase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there is a change in the examples, I believe this will affect users. Is it possible to implement the BIP in such a way that it doesn't change anything for users.

// or http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your option.
// You may not use this file except in accordance with one or both of these
// licenses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be great to have a reference to the PBKDF2 RFC link as part of this module's documentation. https://datatracker.ietf.org/doc/html/rfc2898

}

/// Generate word map for given language.
pub fn word_map(&self) -> HashMap<&str, u16> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we write a test for this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of test are you suggesting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of writing a test to make sure we have correct word to indices mapping in the hashmap.

@rajarshimaitra
Copy link
Contributor

Is this a good idea to have it in bdk_core eventually? Or we wanna do key generation outside of core separately?

@danielabrozzoni
Copy link
Member

We closed #561, let's close this one as well :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Write own BIP39 implementation
7 participants