Commit

Begin work on implementing BIP39 algorithm.
ctsrc committed Nov 20, 2024
1 parent c564d14 commit e62af12
Showing 6 changed files with 254 additions and 10 deletions.
106 changes: 106 additions & 0 deletions Cargo.lock

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -13,4 +13,6 @@ eff-lexical-data = { path = "crates/eff-lexical-data", version = "1.0.0" }
anyhow = { version = "1.0.93", features = ["backtrace"] }
clap = { version = "4.5.21", default-features = false, features = ["std", "derive", "help", "usage", "error-context"] }
rand = "0.8.5"
sha2 = "0.10.8"
test-case = "3.3.1"
thiserror = "2.0.3"
4 changes: 4 additions & 0 deletions crates/pgen/Cargo.toml
@@ -15,4 +15,8 @@ bip39-lexical-data = { workspace = true }
clap = { workspace = true }
eff-lexical-data = { workspace = true }
rand = { workspace = true }
sha2 = { workspace = true }
thiserror = { workspace = true }

[dev-dependencies]
test-case = { workspace = true }
69 changes: 59 additions & 10 deletions crates/pgen/README.md
@@ -87,6 +87,7 @@ pgen -V | --version
`-w` Specify wordlist to use.

* `eff-autocomplete` (default): Use *EFF's Short Wordlist #2*
(EFF's "short word list" with words that have unique three-character prefixes)

Features:
- Each word has a unique three-character prefix. This means that software could
@@ -99,6 +100,7 @@ pgen -V | --version
- <https://www.eff.org/dice>

* `eff-long`: Use *EFF's Long Wordlist*
(EFF's "long word list")

Recommended for the creation of memorable passphrases since the increased number of words,
as well as the greater effective word length, allows for good entropy with a lower amount
@@ -117,6 +119,7 @@ pgen -V | --version
- <https://www.eff.org/dice>

* `eff-short`: Use *EFF's Short Wordlist #1*
(EFF's "general short word list")

Features:
- Designed to include the 1,296 most memorable and distinct words.
@@ -125,7 +128,7 @@ pgen -V | --version
- [Deep Dive: EFF's New Wordlists for Random Passphrases][EFFWL] (2016)
- <https://www.eff.org/dice>

* `bip39`: Use *BIP39* wordlist
* `bip39`: Use *BIP39* English wordlist

Details:
- [BIP39][BIP39]
@@ -153,6 +156,59 @@ your computer to generate "sufficiently random" numbers.

`-V`, `--version` Print version information and exit.

## Calculation of entropy

When calculating the entropy of a password or a passphrase,
[one must assume that the password generation procedure is known to the attacker](https://crypto.stackexchange.com/a/376).
As such, the strength of the passphrases that `pgen` generates is not weakened
by the fact that the wordlists we use are publicly known.

### EFF wordlists

When one of the EFF wordlists is used, you have the following total number of distinct words
to pick from the respective list:

- 7776 words in EFF's "long word list" (`eff-long`)
- 1296 words in EFF's "general short word list" (`eff-short`)
- 1296 words in EFF's "short word list" with words that have unique three-character prefixes (`eff-autocomplete`)

The number of bits of entropy added by each randomly selected word from these EFF wordlists
depends on the total number of words that are in the list we are selecting the words from.

To calculate the entropy added by each word, we take the binary logarithm of the total number of words in the wordlist:

- log2(7776) ~= `12.92` bits of entropy added from each randomly selected word in the "long word list".
- log2(1296) ~= `10.34` bits of entropy added from each randomly selected word in one of EFF's short word lists.

Then:

- Creating a passphrase consisting of 10 randomly selected words from the "long word list" gives
a passphrase with log2(7776^10) ~= `129.25` bits of entropy.
- Creating a passphrase consisting of 12 randomly selected words from one of EFF's short word lists gives
a passphrase with log2(1296^12) ~= `124.08` bits of entropy.
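
The arithmetic above can be reproduced with a few lines of Rust. This is a minimal sketch, not code from `pgen`; the function names `bits_per_word` and `passphrase_entropy_bits` are hypothetical, and the calculation assumes that every word is selected independently and uniformly at random.

```rust
/// Bits of entropy contributed by one uniformly random word drawn
/// from a wordlist containing `wordlist_len` distinct words.
/// (Hypothetical helper, not part of `pgen`.)
fn bits_per_word(wordlist_len: u32) -> f64 {
    f64::from(wordlist_len).log2()
}

/// Total entropy in bits of a passphrase made of `num_words` words,
/// each selected independently and uniformly at random.
/// Uses the identity log2(N^k) = k * log2(N).
fn passphrase_entropy_bits(wordlist_len: u32, num_words: u32) -> f64 {
    f64::from(num_words) * bits_per_word(wordlist_len)
}
```

For example, `passphrase_entropy_bits(7776, 10)` and `passphrase_entropy_bits(1296, 12)` correspond to the two passphrase examples in the list above.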

### BIP39 English wordlist and BIP39 algorithm

When using the BIP39 algorithm, the passphrase is derived directly from a sequence of random bits
(the initial entropy), to which checksum bits are appended.

For example, a BIP39 mnemonic sentence consisting of 12 words is built from 128 random bits
followed by 4 checksum bits (132 bits in total, 11 bits per word).

The checksum bits do not add entropy, nor are any of the initial entropy bits discarded.

So the entropy of a BIP39 mnemonic sentence is simply the number of random bits
it was generated from in the first place.

Specifically, BIP39 has five different possible mnemonic sentence lengths, each with
the following corresponding number of bits of entropy:

- `128` bits of entropy for a BIP39 mnemonic sentence consisting of 12 words.
- `160` bits of entropy for a BIP39 mnemonic sentence consisting of 15 words.
- `192` bits of entropy for a BIP39 mnemonic sentence consisting of 18 words.
- `224` bits of entropy for a BIP39 mnemonic sentence consisting of 21 words.
- `256` bits of entropy for a BIP39 mnemonic sentence consisting of 24 words.
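
The relationship between entropy length, checksum length and sentence length follows the formulas given in BIP39: CS = ENT / 32 and MS = (ENT + CS) / 11. A minimal sketch of that arithmetic is shown here; the function names `checksum_bits` and `mnemonic_words` are hypothetical and are not part of `pgen`.

```rust
/// Checksum length in bits (CS = ENT / 32) for `ent_bits` bits of
/// initial entropy. Returns `None` for lengths BIP39 does not define.
/// (Hypothetical helper, not part of `pgen`.)
fn checksum_bits(ent_bits: u32) -> Option<u32> {
    match ent_bits {
        128 | 160 | 192 | 224 | 256 => Some(ent_bits / 32),
        _ => None,
    }
}

/// Number of words in the mnemonic sentence (MS = (ENT + CS) / 11),
/// since each word encodes 11 bits.
fn mnemonic_words(ent_bits: u32) -> Option<u32> {
    checksum_bits(ent_bits).map(|cs| (ent_bits + cs) / 11)
}
```

Calling `mnemonic_words` with each of the five supported entropy lengths yields the sentence lengths in the list above.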

## How many bits of entropy does your passphrase need?

How many bits of entropy should your passphrase consist of?
@@ -178,22 +234,15 @@ weak hashing algorithms such as MD5 were used, it is the opinion of the
author that the neighbourhood of 128 bits of entropy is in fact
an appropriate default for such use.

When calculating the entropy of a password or a passphrase,
[one must assume that the password generation procedure is known to the attacker](https://crypto.stackexchange.com/a/376).
Hence with 12 words from either of the short wordlists, each of which
consists of 1296 words, we get a password entropy of log2(1296^12) ~=
124.08 bits. Similarly, with 10 words from the long wordlist (7776 words),
we get a password entropy of log2(7776^10) ~= 129.25 bits.

## Is a CSPRNG really needed here?

Using a CSPRNG ensures uniform distribution of probability. This in turn
ensures that the password entropy calculations are correct. Hence it makes
ensures that the password entropy calculations are correct. Hence, it makes
sense to use a CSPRNG.
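
To illustrate why uniformity matters for the entropy figures, the following sketch computes the Shannon entropy of an arbitrary word-selection distribution. It is an illustration only, not code from `pgen`, and the name `shannon_entropy_bits` is hypothetical. For a uniform distribution over N words the result equals log2(N); any skew gives a lower value.

```rust
/// Shannon entropy in bits of a discrete probability distribution:
/// H = -sum(p * log2(p)). Outcomes with probability zero contribute nothing.
/// (Hypothetical helper, not part of `pgen`.)
fn shannon_entropy_bits(probabilities: &[f64]) -> f64 {
    probabilities
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}
```

With 1296 equally likely words this gives log2(1296) bits per word. If instead one word were chosen half of the time and the remaining 1295 words shared the other half equally, the same function would report only about 6.2 bits per word, so a calculation based on log2(1296) would overstate the strength of the passphrase.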

## See also

* `lastresort`(1) on [crates.io](https://crates.io/crates/base256) / [GitHub](https://github.com/ctsrc/Base256)
* `lastresort`(1) on [crates.io](https://crates.io/crates/base256) or [GitHub](https://github.com/ctsrc/Base256)

[EFFWL]: https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases

81 changes: 81 additions & 0 deletions crates/pgen/src/bip39_algorithm.rs
@@ -0,0 +1,81 @@
use sha2::{Digest, Sha256};

/// Calculate BIP39 checksum (CS) bits given entropy bits.
///
/// The checksum is the first `ENT / 32` bits of the SHA-256 hash of the entropy.
/// It is returned right-aligned in a `u8`, so the first byte of the hash is
/// shifted right by `8 - CS` bits.
fn calculate_cs_bits(ent: &[u8]) -> u8 {
    let mut hasher = Sha256::new();
    hasher.update(ent);
    let hash = hasher.finalize();
    let shift = match ent.len() {
        // 128 bits of entropy (16 bytes) needs 4 bits of checksum (shift by 8 - 4)
        16 => 4usize,
        // 160 bits of entropy (20 bytes) needs 5 bits of checksum (shift by 8 - 5)
        20 => 3,
        // 192 bits of entropy (24 bytes) needs 6 bits of checksum (shift by 8 - 6)
        24 => 2,
        // 224 bits of entropy (28 bytes) needs 7 bits of checksum (shift by 8 - 7)
        28 => 1,
        // 256 bits of entropy (32 bytes) needs 8 bits of checksum (no shift)
        32 => 0,
        // No other number of bits of entropy aside from the above is supported by BIP39.
        // And since this function is internal to our program, and we only intend to call it
        // with the supported number of bits of entropy, there really isn't much point in going
        // through the extra motions of returning an error since it would mean we have a fatal
        // (unrecoverable) error in the coding of our program anyway. So we may as well panic
        // via `unreachable!()` instead of returning details about the error.
        _ => unreachable!(),
    };
    hash[0] >> shift
}

#[cfg(test)]
mod test {
use crate::bip39_algorithm::calculate_cs_bits;
use test_case::test_case;

// From <https://github.com/trezor/python-mnemonic/blob/b57a5ad77a981e743f4167ab2f7927a55c1e82a8/vectors.json#L3-L8>:
//
// ```json
// [
// "00000000000000000000000000000000",
// "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about",
// "c55257c360c07c72029aebc1b53c05ed0362ada38ead3e3e9efa3708e53495531f09a6987599d18264c1e1c92f2cf141630c7a3c4ab7c81b2f001698e7463b04",
// "xprv9s21ZrQH143K3h3fDYiay8mocZ3afhfULfb5GX8kCBdno77K4HiA15Tg23wpbeF1pLfs1c5SPmYHrEpTuuRhxMwvKDwqdKiGJS9XFKzUsAF"
// ],
// ```
//
// - 128 bits of "entropy" (all zero in this case).
// - The 12th word in the mnemonic sentence is the 4th word (index 3) in the BIP39 English wordlist.
#[test_case(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3; "128 bits of all zeros")]
// From <https://github.com/trezor/python-mnemonic/blob/b57a5ad77a981e743f4167ab2f7927a55c1e82a8/vectors.json#L27-L32>:
//
// ```json
// [
// "000000000000000000000000000000000000000000000000",
// "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon agent",
// "035895f2f481b1b0f01fcf8c289c794660b289981a78f8106447707fdd9666ca06da5a9a565181599b79f53b844d8a71dd9f439c52a3d7b3e8a79c906ac845fa",
// "xprv9s21ZrQH143K3mEDrypcZ2usWqFgzKB6jBBx9B6GfC7fu26X6hPRzVjzkqkPvDqp6g5eypdk6cyhGnBngbjeHTe4LsuLG1cCmKJka5SMkmU"
// ],
// ```
//
// - 192 bits of "entropy" (all zero in this case).
// - The 18th word in the mnemonic sentence is the 40th word (index 39) in the BIP39 English wordlist.
#[test_case(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 39; "192 bits of all zeros")]
// From <https://github.com/trezor/python-mnemonic/blob/b57a5ad77a981e743f4167ab2f7927a55c1e82a8/vectors.json#L51-L56>:
//
// ```json
// [
// "0000000000000000000000000000000000000000000000000000000000000000",
// "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon art",
// "bda85446c68413707090a52022edd26a1c9462295029f2e60cd7c4f2bbd3097170af7a4d73245cafa9c3cca8d561a7c3de6f5d4a10be8ed2a5e608d68f92fcc8",
// "xprv9s21ZrQH143K32qBagUJAMU2LsHg3ka7jqMcV98Y7gVeVyNStwYS3U7yVVoDZ4btbRNf4h6ibWpY22iRmXq35qgLs79f312g2kj5539ebPM"
// ],
// ```
//
// - 256 bits of "entropy" (all zero in this case).
// - The 24th word in the mnemonic sentence is the 103rd word (index 102) in the BIP39 English wordlist.
#[test_case(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 102; "256 bits of all zeros")]
fn calculates_cs_bits_correctly(ent: &[u8], cs_expected: u8) {
let cs_actual = calculate_cs_bits(ent);
assert_eq!(cs_expected, cs_actual);
}
}
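
The expected values in the tests above are the checksum bits, which equal the index of the last word only because the entropy is all zeros. As a rough sketch of how entropy and checksum bits combine into word indices in general, the hypothetical helper `word_indices` below (not part of this commit) concatenates the entropy bits with a right-aligned checksum value, such as the one returned by `calculate_cs_bits`, and splits the result into 11-bit groups. It assumes one of the five entropy lengths supported by BIP39.

```rust
/// Split entropy followed by checksum bits into 11-bit wordlist indices.
/// `cs` holds the checksum right-aligned, i.e. in its lowest `ent.len() / 4` bits.
/// (Hypothetical helper for illustration; not part of this commit.)
fn word_indices(ent: &[u8], cs: u8) -> Vec<u16> {
    assert!(
        matches!(ent.len(), 16 | 20 | 24 | 28 | 32),
        "unsupported entropy length"
    );
    // CS = ENT / 32 bits, which is one checksum bit per 4 bytes of entropy.
    let cs_len = ent.len() / 4;
    // Collect all bits, most significant first: entropy, then checksum.
    let mut bits: Vec<bool> = Vec::with_capacity(ent.len() * 8 + cs_len);
    for &byte in ent {
        for shift in (0..8u32).rev() {
            bits.push((byte >> shift) & 1 == 1);
        }
    }
    for shift in (0..cs_len).rev() {
        bits.push((cs >> shift) & 1 == 1);
    }
    // Each group of 11 bits is one index into the 2048-word list.
    bits.chunks(11)
        .map(|chunk| chunk.iter().fold(0u16, |acc, &bit| (acc << 1) | u16::from(bit)))
        .collect()
}
```

For 16 zero bytes and a checksum value of 3, this yields eleven indices of 0 followed by a final index of 3, matching the first test vector ("abandon" eleven times, then "about").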
2 changes: 2 additions & 0 deletions crates/pgen/src/main.rs
@@ -16,6 +16,8 @@

#![forbid(unsafe_code)]

mod bip39_algorithm;

use bip39_lexical_data::WL_BIP39;
use clap::{Parser, ValueEnum};
use eff_lexical_data::{WL_AUTOCOMPLETE, WL_LONG, WL_SHORT};