-
Notifications
You must be signed in to change notification settings - Fork 394
BIP39 Implementation #644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BIP39 Implementation #644
Conversation
05b53d1
to
61b7dea
Compare
dce62db
to
2136bd2
Compare
82da908
to
29ae147
Compare
d4f35e7
to
0b5c558
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-pasted some review comments from #607 that it seems still need to be addressed
src/keys/bip39/mod.rs
Outdated
} | ||
|
||
/// Convert a mnemonic to a seed with an optional passphrase | ||
fn to_seed(&self, passphrase: Option<String>) -> Seed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #607:
@vladimirfomene: It might not be a good idea to change the type of passphrase from &str to option as that has the potential of breaking code which consumes this method.
@atalw: It makes sense to have the passphrase as Option as it really is optional, so if a breaking change is okay we can go ahead with this.
Personally, I agree that we should try not to break the API, and leave P: Into<Cow<'a, str>
(https://docs.rs/bip39/latest/src/bip39/lib.rs.html#479-486) here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, having a P: Into<Cow<'a, str>>
makes little sense based on our implementation. I propose just using a &str
, this way for most people, the API shouldn't break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b003391
to
8952808
Compare
There are a couple of features that have been implemented in this commit: - Parse mnemonic string to Mnenomic type - Generate Mnemonic from entropy - Derive seed from Mnemonic (with and without passphrase) - All language wordlists (with verification test to ensure they were untampered) - Mnemonic test vectors from BIP39 - Error handling Function names have mostly been kept the same to maintain backwards compatability. Co-authored-by: Vladimir Fomene <vladimirfomene@gmail.com>
* `Mnemonic::parse_in` now verifies the checksum against the entropy. * Add test: Make sure `from_entropy_in` produces error if length of entropy bits is less than 128, greater than 256 and not a multipe of 32. * Add test: invalid mnemonic sentence.Throw error if mnemonic sentence is less than 12 words or greater than 24 words or number of words is not a multiple of six or the contains a word not in wordlist or has invalid checksum.
Also removed unused dependencies
2399c6b
to
75fa8f6
Compare
* Introduce `Language::word_map` method for faster word index finding. * Readability changes to various `Mnemonic` methods and tests. * Re-introduce various methods back into `Mnemonic`. * `bip39::Error` no longer includes sensitive information.
c4334a1
to
cdb3e25
Compare
I still have some security concerns regarding the current implementation (although, I am no security expert, just based on what I've read on the internet). I will list them here, and hopefully someone knowledgeable enough could provide some clarity.
Thank you all in advance! P.S. Test |
Fixes: * Fixed implementations of `GeneratableKey` to work for all word lengths * Fixed example in `rpcwallet` Changes: * Added various `derive`s for bip39 structures * Added `Mnemonic::with_passphrase` method * Added `TryFrom<uszie>` implementation for `WordCount` * Introduced `Bip39TestVector` struct for more comprehensive testing * Various refactoring CI/CC Changes: * Added `all-languages` feature to `[package.metadata.doc.rs]` * Added `all-languages` feature to code coverage and CI tests
We could look into that, but it's better if we do so in a new PR. This one is already quite big, and the bigger it gets, the more difficult it is to collect reviews :)
This, instead, I think should be tackled here: for now, avoiding Debug and Display (or manually implementing a really generic one) should be enough (with an appropriate comment on why we do so). I'd avoid adding YA dependency :) |
Another aspect I've been thinking about, is the great majority of the time people will be using English (which shouldn't require Unicode normalization). For the passphrase, we can do a check only (and fail if not normalized). Since normalization sometimes requires resizing the vector (so it's a heap operation), and for most people, it also means one less dependably. |
Addressed in 08e1cc9. |
`Mnemonic` contains sensitive data so we should ensure internal fields are not easily leaked. * Explicitly implement `fmt::Debug` and redact all fields. * Explicitly implement `ToString` instead of `Display`. * Remove various comparative `derive()`s.
One advantage of this (on top of the extra safety) is that it would be much easier to then port to embedded hardware. We have many features in bdk which I guess are not really fit for hardware wallets, but mnemonics are for sure one that we'll need to have. Are you able to do a rough estimation of how much longer/how much harder it would be to implement in this way? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a partial review, I still haven't looked at all the files.
I just wanted to post this comments so that you could start thinking about them and see if they make any sense.
/// Password is the UTF8-NFKD-normalized result of mnemonic words separated by space. | ||
fn make_password<'a, W>(words: W) -> String | ||
where | ||
W: Iterator<Item = &'a str>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could define Item
as another generic that implements AsRef<str>
. This should allow you to pass a vec of strings as well if you want
|
||
/// Salt is the UTF8-NFKD-normalized result of (SALT_PREFIX + passphrase). | ||
fn make_salt(passphrase: &str) -> Cow<'static, str> { | ||
let mut salt = Cow::from(SALT_PREFIX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the small performance benefit of reusing the ref as-is, but I don't think it's worth using Cow
here especially considering that moving forward we'd like to avoid using the heap (even if we don't manage to finalize that transition in this PR)
/// Make hmac-sha512 engine from password. | ||
/// The hmac engine is used as the pseudo-random function. | ||
fn make_prf(password: &str) -> HmacPRF { | ||
HmacEngine::new(password.as_bytes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was gonna comment here that for extra safety we should re-normalize the string, but then I realized: wouldn't it be better to immediately convert strings to &[u8]
immediately after normalization?
This would be like a marker for us, anything that's &str
or similar is potentially not normalized, but as soon as we are done we just convert to bytes and forget about it.
With this change I guess you would make this function take a &[u8]
directly, and do the conversion in the caller which as far as I can see is already normalizing correctly.
/// Generate block (of given block_index) by calculating xor sum of iterations of PRF. | ||
fn xor_sum(hmac_prf: &HmacPRF, salt: &str, iter_count: u32, block_index: u32, block: &mut [u8]) { | ||
// for the first iteration, we concat: salt + block_index (as big-endian bytes) | ||
let mut prev_u = Vec::with_capacity(salt.len() + 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the length is fixed i guess this could also be an array with static length (there should be a constant in bitcoin_hashes for this).
I think the code would probably still look decently good with copy_from_slice
: https://doc.rust-lang.org/std/primitive.slice.html#method.copy_from_slice
Less than a week. But I'm stuck into multi descriptor wallet business 😅😂 |
Yes, multi-descriptor is definitely the priority right now. We'll get back to this once you are done there :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping us move this forward! Just a couple of questions and comments.
@@ -3,3 +3,6 @@ Cargo.lock | |||
|
|||
*.swp | |||
.idea | |||
|
|||
# IDE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits: .idea
is for IntelliJ. I don't know if it is necessary to have that IDE
comment.
// parse word indexes and ENT+CS bits from mnemonic words | ||
let parse_result = sentence_words | ||
.iter() | ||
.map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not throw an invalid word error here if the word is not in the HashMap? What is the utility of having &utils::U11_EOF
as the default value here?
let mut word_indexes = Vec::with_capacity(MS_MAX); // word indexes | ||
let mut ent_cs_bits = Vec::with_capacity(MS_MAX * utils::U11_BITS); // ENT+CS bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just use ms
as your vector size instead of MS_MAX
?
.iter() | ||
.map(|&word| word_to_index_map.get(word).unwrap_or(&utils::U11_EOF)) | ||
.try_for_each(|word_index| { | ||
if *word_index > utils::U11_MAX { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you are getting the word_index
from the word_map
is there a scenario where the word_index
will be greater than utils::U11_MAX
. I'm thinking if you throw an error for invalid words there will be no need for this if/else logic.
let mnemonic_with_passphrase: GeneratedKey<_, _> = | ||
MnemonicWithPassphrase::generate((WordCount::Words12, Language::English, password))?; | ||
Ok(mnemonic_with_passphrase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there is a change in the examples, I believe this will affect users. Is it possible to implement the BIP in such a way that it doesn't change anything for users.
// or http://www.apache.org/licenses/LICENSE-2.0> or the MIT license | ||
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your option. | ||
// You may not use this file except in accordance with one or both of these | ||
// licenses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be great to have a reference to the PBKDF2 RFC link as part of this module's documentation. https://datatracker.ietf.org/doc/html/rfc2898
} | ||
|
||
/// Generate word map for given language. | ||
pub fn word_map(&self) -> HashMap<&str, u16> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we write a test for this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of test are you suggesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of writing a test to make sure we have correct word to indices mapping in the hashmap.
Is this a good idea to have it in bdk_core eventually? Or we wanna do key generation outside of core separately? |
We closed #561, let's close this one as well :) |
Description
This is a continuation of PR #607 which closes #561
This PR includes commits for own implementation of
PBKFD2
.I've also modified the
.gitignore
, I hope that is okay.Notes to the reviewers
Although complete, I still have some security concerns for the current implementation (please check my comment below).
Checklists
All Submissions:
cargo fmt
andcargo clippy
before committingNew Features:
pbkfd2
implementationCHANGELOG.md