BIP39 Manual Phrase Calculations - How are Multiple checksums valid?

3

1

Need some help understanding the math regarding why multiple checksums work for mnemonic phrase generation (BIP39).

Let's assume a 12-word phrase. If we divide the 2048-word list into blocks of 16, exactly 1 word in each 16-word block will be a valid checksum word for the 11 words already selected.

With a 24-word phrase, 1 word out of every 256 would be a valid checksum word.
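These ratios can be checked by counting bits. Here is a rough sketch, assuming the BIP39 layout of 11 bits per word and CS = ENT / 32. The function name `last_word_stats` is made up for illustration.

```python
def last_word_stats(num_words: int) -> dict:
    """Count how the final word's 11 bits split between entropy and checksum."""
    total_bits = num_words * 11      # each word encodes 11 bits (2048 = 2**11)
    cs_bits = total_bits // 33       # ENT + ENT/32 = total, so CS = total / 33
    ent_bits = total_bits - cs_bits
    free_bits = 11 - cs_bits         # entropy bits carried by the final word
    return {
        "ent_bits": ent_bits,
        "cs_bits": cs_bits,
        "valid_last_words": 2 ** free_bits,  # one per choice of the free bits
        "one_in": 2 ** cs_bits,              # 2048 / valid_last_words
    }

for n in (12, 24):
    print(n, last_word_stats(n))
```

For 12 words this gives 128 valid final words (1 in 16); for 24 words it gives 8 (1 in 256).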

When generating a phrase by hand, I know that the first ENT / 32 bits of the SHA-256 hash are appended to the entropy to generate the checksum word, but this generates one specific word.
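As a minimal sketch of that step (assuming the entropy is hashed as raw bytes and the result is cut into 11-bit groups, most significant bits first), the word indices could be computed like this. The function name is illustrative, and the hex string is simply the 128 bits from the example further down written as bytes.

```python
import hashlib

def entropy_to_indices(entropy: bytes) -> list[int]:
    """Return the 0-based 11-bit word indices for the given entropy."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()
    # Treat entropy and hash as big integers so bit slicing stays simple.
    ent_int = int.from_bytes(entropy, "big")
    checksum = int.from_bytes(digest, "big") >> (256 - cs_bits)
    combined = (ent_int << cs_bits) | checksum
    num_words = (ent_bits + cs_bits) // 11
    return [
        (combined >> (11 * (num_words - 1 - i))) & 0x7FF
        for i in range(num_words)
    ]

entropy = bytes.fromhex("d364025e53ec536e6a78d2d8ba23ecf2")
indices = entropy_to_indices(entropy)
print(indices)
```

Each index would then be looked up in the 2048-word list. The list itself is omitted here to keep the sketch short.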

So my long-winded question is: what is the math behind other checksum values being valid? Put differently, how is ENT + CS validated as legitimate?

See this example:

Entropy (128 bits): 11010011 01100100 00000010 01011110 01010011 11101100 01010011 01101110 01101010 01111000 11010010 11011000 10111010 00100011 11101100 11110010

SHA-256 Hash of entropy = 14 c5 8b c9 05 11 5e 08 27 49 61 1e 48 d6 04 c0 2a 70 8c 39 ad 6c dc 0c 91 2f 70 62 c3 24 71 23

First 4 bits of SHA-256 Hash = 1 (hex) Binary = 0001

Generated Recovery Phrase: square cactus nurse pond share rescue prepare bottom suffer speed will tomorrow

Another valid phrase (with same entropy but different checksum): square cactus nurse pond share rescue prepare bottom suffer speed will account

Account = 13 (on the BIP39 wordlist). Subtract 1 since the word index starts at 0 = 12. 12 to binary = 1100. Hex = C.

C (hex) != 1 (hex)

Another valid phrase (with same entropy but different checksum): square cactus nurse pond share rescue prepare bottom suffer speed will acoustic

Acoustic = 17 (on the BIP39 wordlist). Subtract 1 since the word index starts at 0 = 16. 16 to binary = 10000. Hex = 10.

10 (hex) != 1 (hex)

I guess I'm also confused how these checksums are valid when their values do not equal the first 4 bits of the SHA-256 Hash of ENT? I'm guessing it has to do with how the checksum is validated (and pointing back to my original question)?
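One detail that may matter here: in a 12-word phrase, only the low 4 bits of the final word's 11-bit index are checksum, while the high 7 bits are the tail of the entropy. Swapping the final word therefore also changes the entropy that gets hashed. Here is a small sketch of that split, using the 0-based indices 12 and 16 from above (the helper name is made up).

```python
def split_last_word(index: int, cs_bits: int = 4) -> tuple[int, int]:
    """Split a final-word index into (entropy tail, checksum bits)."""
    return index >> cs_bits, index & ((1 << cs_bits) - 1)

for word, index in (("account", 12), ("acoustic", 16)):
    tail, cs = split_last_word(index)
    print(f"{word}: entropy tail = {tail:07b}, checksum = {cs:04b}")
# account: entropy tail = 0000000, checksum = 1100
# acoustic: entropy tail = 0000001, checksum = 0000
```

Each checksum would then be compared against the hash of the 128 bits that end in that tail, not against the hash of the original entropy.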

Ryan Ellis

Posted 2018-02-01T01:42:01.887

Reputation: 33

Answers

2

A careful reading of BIP 39 shows that there are not multiple words which fit the checksum. Rather, the BIP is so broad that invalid checksums are allowed and should only be warned against.

In your example, the second phrase is actually invalid. But BIP 39 says that such invalid phrases should be allowed, and so the wallet software will allow it. The checksum is effectively ignored and not checked (which kind of defeats the purpose of a checksum).

Andrew Chow

Posted 2018-02-01T01:42:01.887

Reputation: 40 910

Thanks for this. This is interesting. Edits coming (hit enter too soon).

Ryan Ellis 2018-02-01T17:03:39.400

1

So really Ledger's tool for BIP39 generation, https://www.ledgerwallet.com/support/bip39-standalone.html (looks like a copy of Ian's tool), has a technically incorrect implementation. This is because BIP39 states that if the checksum is invalid, the software should say so.

The code at: https://github.com/bitcoinjs/bip39/blob/master/index.js#L78

Lines 76-78 split the ENT and CS from the binary form of the mnemonic.

Then line 87 calls the function to generate the checksum from the entropy, and compares it to the checksum taken from the mnemonic phrase. Line 88 should throw the invalid checksum error, but doesn't. Not sure why.

Ryan Ellis 2018-02-01T17:17:19.720
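The steps described in the comment above (split ENT and CS, recompute the checksum from the entropy, compare) might look roughly like this in Python. This is an independent sketch that works on 0-based word indices, not a port of the bitcoinjs code, and `is_valid` is an illustrative name.

```python
import hashlib

def is_valid(indices: list[int]) -> bool:
    """Check the checksum of a mnemonic given as 0-based word indices."""
    total_bits = len(indices) * 11
    cs_bits = total_bits // 33          # 4 for 12 words, up to 8 for 24 words
    ent_bits = total_bits - cs_bits
    combined = 0
    for idx in indices:                 # concatenate the 11-bit groups
        combined = (combined << 11) | idx
    entropy = (combined >> cs_bits).to_bytes(ent_bits // 8, "big")
    claimed = combined & ((1 << cs_bits) - 1)
    # cs_bits never exceeds 8, so the first hash byte is enough.
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return claimed == expected

# Hold 11 arbitrary words fixed and try every possible final word.
first_eleven = [0] * 11
valid_last = [w for w in range(2048) if is_valid(first_eleven + [w])]
print(len(valid_last))  # 128
```

Under this check, 128 of the 2048 candidate final words pass for any fixed first 11 words, one per block of 16, which matches the 1-in-16 figure in the question.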

I was actually using Ian's BIP39 tool, and not the code referenced above. I'll need to test this other code out. Thanks again!

Ryan Ellis 2018-02-01T17:25:06.797