BIP Draft: Multilingual mnemonic display and input conventions #2200
osem23 wants to merge 3 commits into
…tions A display wordlist is a 2048-entry list in a target language, index-parallel to the canonical English BIP-39 wordlist. PBKDF2 runs only on the canonical English mnemonic; native-language renderings are a display and input layer with no new cryptographic surface, and every seed produced under the convention is restorable in any BIP-39 wallet via its English form. Preamble follows the BIP 3 format. No BIP number self-assigned.
danielabrozzoni
left a comment
Only gave a first very quick pass, will do another one soon :)
| License: BSD-2-Clause | ||
| Discussion: 2026-06-13: https://groups.google.com/g/bitcoindev/c/Rwo7P5pTA0c | ||
| 2026-06-23: https://delvingbitcoin.org/t/bip39-native-language-display-wordlists-mapped-to-canonical-english/2637 | ||
| ``` |
The preamble should contain
Requires: 39
There was a problem hiding this comment.
Done, added Requires: 39 to the preamble.
| ``` | ||
| BIP: ? | ||
| Layer: Applications | ||
| Title: Multilingual mnemonic display and input conventions |
Unfortunately title should be at most 50 characters, and this is 51 😅
There was a problem hiding this comment.
Fixed. It's now "Multilingual mnemonic display and input rules" (45 chars).
| Title: Multilingual mnemonic display and input conventions | ||
| Authors: Daniel Osemberg <ceo@blocksight.live> | ||
| Status: Draft | ||
| Type: Informational |
I think this is a specification BIP. From BIP3:
Lines 175 to 185 in 861e235
There was a problem hiding this comment.
Agreed, set to Type: Specification.
There was a problem hiding this comment.
This draft appears to be mostly AI generated?
Edit: am looking at the document history in https://github.com/osem23/bip39-wordlists-tzur/commits/main/docs/BIP-multilingual-mnemonics.md
|
Yes, I used AI as a writing tool, and I'm not going to pretend otherwise. I'm proud of it.
Thanks for the pass, all three are good catches.
Per danielabrozzoni's review on PR bitcoin#2200:
- Title trimmed to 50-char limit (now 45): "...display and input rules"
- Type changed Informational -> Specification (BIP-3: implementable with compliant implementations; has validator, decoders, vectors)
- Add Requires: 39, placed after Discussion per BIP-3 field order
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
murchandamus
left a comment
I gave this a quick first read. I like the idea of normalizing to the English wordlist under the hood as it directly mitigates one of the worst issues with BIP39’s portability.
That said, the approach to the additional languages feels unappealing to me: producing initial lists by mechanically translating the English word list is bound to cause a number of issues such as the described concerns with terms composed of multiple words and diacritics, which would persist especially for wordlists that don’t get review before publication. As such wordlists would have room for improvement, it implies that there would soon be multiple wordlists for some languages which would cause even more confusion on top of BIP39 language lists vs display language lists. It seems worthwhile to try and pursue more stable higher quality lists from the get-go, so that more languages would only ever have a single wordlist to converge on.
Given the numerous pull requests we’ve had to the BIPs repository where people tried to add more wordlists to BIP39, I would like to suggest only shipping a framework for more languages to be added instead of shipping with placeholder language lists, and to leave the creation of wordlists to the respective language communities.
Since you are creating a new mnemonic scheme that is essentially a breaking change to BIP39 for every language but English, I would alternatively propose that you go further and create a new scheme that is not backwards compatible with BIP39 but instead addresses all issues with BIP39:
- use the indices of the words to generate the seed instead of hashing text
- encode a version
- use a better checksum
- if possible encode information about the output script pattern used
- maybe create a generic encoding of data with words that then is used to encode a seed in a second BIP
Preferably such a scheme would also use a different number of words so that it cannot be mixed up with BIP39.
| - **English BIP-39 remains canonical.** The English BIP-39 mnemonic is the only mnemonic fed to PBKDF2-HMAC-SHA512, and the only artifact that determines the derived seed and cross-wallet compatibility. This document does not alter BIP-39 entropy, checksum, Unicode normalization, or PBKDF2 rules. | ||
| - **Localized wordlists are a display and backup layer only.** A display wordlist is never the password input to PBKDF2. It exists so a user can read and write their backup in their own language. | ||
| - **The mapping is by word index.** The display token at index `i` corresponds to the English BIP-39 word at index `i`, and to nothing else. There is no per-language entropy, checksum, or key derivation. | ||
| - **The localized mnemonic is always reversible to the canonical English mnemonic.** The bidirectional mapping is bijective across all 2048 entries (§Display wordlist requirements), so a conformant display mnemonic resolves back to exactly one English BIP-39 mnemonic, deterministically. |
Was that supposed to be a link to the Display wordlist section?
There was a problem hiding this comment.
Yes, it points to the Display wordlist requirements section. I use the §section style throughout instead of anchor links. Happy to switch these to real anchors if the editors prefer.
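The index mapping described in the quoted bullets can be sketched as follows. This is a minimal illustration, not the reference decoder: the four-entry `ENGLISH` and `DISPLAY` lists are toy stand-ins for the real 2048-entry lists, the display tokens are made-up placeholders rather than translations, and the function names are hypothetical. The PBKDF2 parameters (NFKD input, salt `"mnemonic"` plus passphrase, 2048 iterations, 64-byte output) are those of BIP-39. Checksum verification is omitted.

```python
import hashlib
import unicodedata

# Toy stand-ins (assumption): real lists have 2048 entries each.
ENGLISH = ["abandon", "ability", "able", "about"]
DISPLAY = ["alfa", "bravo", "charlie", "delta"]  # placeholder tokens, not translations

# Index-parallel and bijective: one display token per English word.
DISPLAY_INDEX = {unicodedata.normalize("NFC", w): i for i, w in enumerate(DISPLAY)}
assert len(DISPLAY_INDEX) == len(DISPLAY) == len(ENGLISH)


def to_english(display_mnemonic: str) -> str:
    """Resolve a display mnemonic to the canonical English mnemonic by index."""
    tokens = (unicodedata.normalize("NFC", t) for t in display_mnemonic.split())
    return " ".join(ENGLISH[DISPLAY_INDEX[t]] for t in tokens)


def bip39_seed(english_mnemonic: str, passphrase: str = "") -> bytes:
    """BIP-39 seed derivation; only the English mnemonic is ever passed here."""
    password = unicodedata.normalize("NFKD", english_mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


def seed_from_display(display_mnemonic: str, passphrase: str = "") -> bytes:
    """Display mnemonic -> English mnemonic -> seed. The display text never reaches PBKDF2."""
    return bip39_seed(to_english(display_mnemonic), passphrase)
```

Because the display layer only translates indices, `seed_from_display("alfa delta bravo")` and `bip39_seed("abandon about ability")` yield the same bytes, which is the portability property the bullets claim.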
|
|
||
| 1. Tokenize on Unicode whitespace (characters with the Unicode `White_Space` property) plus the ideographic space (`U+3000`) used by the official Japanese BIP-39 mnemonic. | ||
| 2. Normalize every token and the display wordlist to the same Unicode form (NFC) before comparison. Mismatched normalization between input and wordlist causes silent lookup failures on precomposed/decomposed accent pairs. NFC, and the NFKD that BIP-39 applies before PBKDF2, are both safe: they never merge two distinct entries in a conformant wordlist (there are zero NFKD collisions across the reference wordlists). | ||
| 3. If a wallet applies any *lossy* fold to input as a convenience — stripping diacritics, case-folding, or similar — and that fold maps a token to more than one wordlist entry, the wallet MUST reject the token and ask the user to disambiguate. It MUST NOT silently pick one entry. Distinct entries can collapse under accent stripping (for example Vietnamese `được` and `đuốc`, or Swedish `läger` and `lager`), and an arbitrary pick selects the wrong index and derives the wrong seed. Lossy folds are not required by this convention; a wallet that performs none is always conformant. Per-language collision counts are reported by the reference validator and documented in `validation/encoding-notes.md`. |
Maybe you are implying that already, but would it be possible to enforce at word list creation time that no word matches another per list if diacritics were stripped, case was folded or similar? Has that been done for the proposed lists?
E.g., this was done for the French wordlist, where "special French characters "é-è" are considered equal to "e", for example "museau" and "musée" can not be together".
There was a problem hiding this comment.
Right now I enforce this at input time, not at construction time the way the French list did. §Input parsing MUST 3: if a lossy fold (diacritic strip, case fold) maps a token to more than one entry, the wallet must reject and ask the user, never auto-pick. The validator already reports per-language collision counts under those folds.
I didn't make it a construction-time MUST because a mechanically-seeded list can't always satisfy it without curation, which is the quality tension you raise in your top comment. I can add it as a construction-time SHOULD with the per-list collision report surfaced, and make it a MUST for any list that claims a curated tier. The input-time disambiguation MUST keeps wallets safe in the meantime.
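The three input-parsing steps quoted above, including the reject-on-ambiguity rule, can be sketched roughly like this. It is an illustration under assumptions, not the reference validator: `parse_tokens`, `lossy_fold`, and `AmbiguousToken` are hypothetical names, the three-word list is a toy, and the fold shown (strip combining marks, then case-fold) is only one possible convenience fold. Python's `str.split()` splits on U+3000 but its whitespace set differs slightly from the Unicode `White_Space` property, so a real implementation may need an explicit character set.

```python
import unicodedata


class AmbiguousToken(ValueError):
    """A lossy fold mapped the token to more than one wordlist entry."""


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def lossy_fold(text: str) -> str:
    # Optional convenience fold (assumption): drop combining marks, then case-fold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return nfc(stripped).casefold()


def parse_tokens(text, wordlist, allow_fold=False):
    """Return wordlist indices for each token, rejecting ambiguous folds."""
    key = lossy_fold if allow_fold else nfc
    index = {}
    for i, word in enumerate(wordlist):
        index.setdefault(key(word), []).append(i)

    result = []
    for raw in text.split():  # step 1: whitespace tokenization (includes U+3000)
        candidates = index.get(key(raw), [])  # step 2: same normalization on both sides
        if not candidates:
            raise ValueError(f"unknown token: {raw!r}")
        if len(candidates) > 1:  # step 3: never pick one silently
            raise AmbiguousToken(f"{raw!r} matches entries {candidates}; ask the user")
        result.append(candidates[0])
    return result
```

With the toy list `["lager", "läger", "zebra"]`, strict parsing resolves `läger` to index 1, while `parse_tokens("lager", wordlist, allow_fold=True)` raises `AmbiguousToken` because both Swedish entries collapse to the same folded key.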
|
|
||
| ### Multi-word native concepts | ||
|
|
||
| Some languages express a single BIP-39 concept only as a multi-word native term: Hebrew `רופא שיניים` (dentist), Turkish `hindistan cevizi` (coconut), Indonesian `kebun binatang` (zoo), Vietnamese multi-syllable words that use native word-spacing. Requirement 4 forbids embedded whitespace, so a conformant wordlist stores such entries as a single glued orthographic token (e.g., `רופאשיניים`, `hindistancevizi`, `kebunbinatang`). This is a structural consequence of the tokenization rule, not an independent requirement. |
I was wondering how so many languages had been created at inception. So the wordlists were created by translating the English words to the target languages?
There was a problem hiding this comment.
Yes. Generated by translation, then validated rather than trusted: structural checks, back-translation and forward-translation each with an LLM verdict, multilingual sentence-embedding similarity, Wiktionary cross-reference, and a blind LLM top-8 pass. Process and per-language results are in docs/CONSTRUCTION.md and docs/V2_VALIDATION.md. It isn't a substitute for native-speaker review, which is why the lists are explicitly supersedable.
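For the structural part of that validation, the no-embedded-whitespace rule and the gluing of multi-word terms can be sketched as below. This is a hypothetical illustration rather than the repository's validator: `glue` and `structural_problems` are invented names, and the NFKD duplicate check is included because the quoted input-parsing text states that conformant lists have no NFKD collisions.

```python
import unicodedata


def glue(term: str) -> str:
    """Join a multi-word native term into one token by removing all whitespace."""
    return "".join(term.split())


def structural_problems(wordlist, expected_size=2048):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if len(wordlist) != expected_size:
        problems.append(f"expected {expected_size} entries, found {len(wordlist)}")

    seen = {}
    for i, word in enumerate(wordlist):
        if not word or any(ch.isspace() for ch in word):
            problems.append(f"index {i}: empty entry or embedded whitespace")
        key = unicodedata.normalize("NFKD", word)
        if key in seen:
            problems.append(f"index {i}: NFKD duplicate of index {seen[key]}")
        else:
            seen[key] = i
    return problems
```

For example, `glue("kebun binatang")` gives `kebunbinatang`, and a list containing the unglued form is reported with one whitespace problem.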
|
|
||
| The 4-character prefix uniqueness recommendation from the original BIP-39 specification is achievable for English and most Latin-script languages but structurally infeasible for several scripts where word stems and limited short-prefix variety dominate. Requiring it would exclude those languages or force authorship of artificial vocabulary. Treating it as a SHOULD with informational reporting per language preserves the autocomplete benefit where feasible without excluding scripts where it is not. | ||
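Informational reporting of prefix uniqueness could look like the sketch below. It is an assumption-laden illustration: `prefix_collisions` is a hypothetical name, and prefixes are counted in NFC code points, which may not match user-perceived characters in every script.

```python
import unicodedata
from collections import defaultdict


def prefix_collisions(wordlist, prefix_len=4):
    """Group entries that share the same first `prefix_len` code points (after NFC)."""
    groups = defaultdict(list)
    for word in wordlist:
        normalized = unicodedata.normalize("NFC", word)
        groups[normalized[:prefix_len]].append(word)
    # Report only prefixes shared by more than one entry.
    return {prefix: words for prefix, words in groups.items() if len(words) > 1}
```

An empty result means the list satisfies the SHOULD; a non-empty result is reported per language and does not fail the list.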
|
|
||
| Native-speaker review is recommended (SHOULD) rather than required (MUST) because its absence is a UX risk, not a cryptographic risk. The worst case is a poorly-chosen native word that a future PR can correct; no funds are at stake. |
I don’t follow here. If people had started using the original native words to record their backup, changing the poorly-chosen word would invalidate their backup.
There was a problem hiding this comment.
You're right, that line was wrong and I removed it (6608dcb). Published lists are frozen. A correction is a new versioned list, never a mutation of a published one, so an existing backup is never invalidated: it resolves against the exact version that produced it, pinned by SHA-256, with the canonical English mnemonic as the safety net.
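Version pinning of that kind can be sketched as follows. The `ListId` record, the `load_pinned` helper, and the one-word-per-line UTF-8 file layout are assumptions for illustration; the registry's actual format may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ListId:
    """Hypothetical identifier for a frozen wordlist: (language, version, SHA-256)."""
    language: str
    version: str
    sha256: str


def digest(wordlist_bytes: bytes) -> str:
    return hashlib.sha256(wordlist_bytes).hexdigest()


def load_pinned(wordlist_bytes: bytes, pinned: ListId) -> list:
    """Return the entries only if the bytes match the pinned digest exactly."""
    actual = digest(wordlist_bytes)
    if actual != pinned.sha256:
        raise ValueError(
            f"{pinned.language} {pinned.version}: digest mismatch "
            f"(expected {pinned.sha256}, got {actual})"
        )
    # Assumed layout: UTF-8 text, one entry per line.
    return wordlist_bytes.decode("utf-8").strip("\n").split("\n")
```

A corrected list has different bytes and therefore a different digest, so it can only be published as a new `ListId`; a backup that names the old identifier keeps resolving against the old bytes.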
|
|
||
| The 9 non-English canonical BIP-39 wordlists are alphabetized independent word selections, not translations of the English list, so they cannot serve as a display layer over an English mnemonic without the user facing semantically unrelated tokens at each index. This convention does not replace those wordlists; it sits parallel to them and fills the role they do not fill. | ||
|
|
||
| This convention does not eliminate the cross-wallet restore problem for display-only backups; it bounds the problem and defines wallet-level obligations (§Backup and portability policy) that mitigate it. The user-facing safety net is the canonical English mnemonic, which every conformant wallet exposes in any flow that shows a display mnemonic. A backup that includes the canonical English mnemonic is restorable in any BIP-39 wallet without depending on the receiving wallet's wordlist support. |
If the users have to end up recording both the display-words and the English words, how does this solve the issues that non-English speakers are significantly more likely to make mistakes recording the English words?
There was a problem hiding this comment.
MUST 1 is an availability obligation on the wallet, not a requirement to record a second English copy. A user can back up in the display language only, and then there is no English transcription step and therefore no English transcription error, which is exactly the failure this removes. English stays viewable and exportable as the portability guarantee and safety net, surfaced and labeled. I clarified this in the text (6608dcb).
|
|
||
| - **PBKDF2 input is invariant under this convention.** Only the canonical English mnemonic reaches PBKDF2-HMAC-SHA512. An implementation that feeds the display mnemonic directly to PBKDF2 is non-conformant and produces incompatible seeds. The conformance test vectors in the reference registry exercise the resolve-to-English path for every supported language. | ||
| - **Strict single-wordlist tokenization.** On restore, every token in the display mnemonic MUST resolve within a single display wordlist. Wallets MUST NOT silently accept mnemonics whose tokens span multiple wordlists, partial-match across wordlists, or fall through to the canonical English wordlist when a display token is unrecognized. Mixed-wordlist input is malformed and is rejected. | ||
| - **Only the canonical English mnemonic guarantees cross-wallet recovery.** A user whose wallet supports a display wordlist can always recover the seed in any BIP-39 wallet by entering the canonical English mnemonic. A user who backs up only the display mnemonic and then needs to restore in a wallet that does not support the same display wordlist cannot recover without the mapping. The normative wallet-level obligations that follow from this property are defined in §Backup and portability policy above. |
I was somewhat excited by your idea at first, but this approach seems to undermine a big portion of the potential utility of this BIP. If the wordlists are not intended to be stable, I am not sure I see the point.
There was a problem hiding this comment.
Agreed, and they are stable. The registry pins v1.0 with the SHA-256 as the load-bearing identifier; lists are frozen per version and never mutated in place. The Rationale line that implied otherwise was the bug, and I fixed it (6608dcb). Stability is the point, the same way it is for BIP-39 itself.
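The strict single-wordlist rule from the quoted security bullets can be sketched as below. `resolve_strict` is a hypothetical helper and the two-word lists are toys. Rejecting input that fully resolves in more than one list is an assumption of this sketch; the quoted text does not say how that case is handled.

```python
import unicodedata


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def resolve_strict(mnemonic: str, wordlists: dict):
    """Return (list_name, indices) only if every token resolves in exactly one list."""
    tokens = [_nfc(t) for t in mnemonic.split()]
    if not tokens:
        raise ValueError("empty mnemonic")

    matches = {}
    for name, words in wordlists.items():
        index = {_nfc(w): i for i, w in enumerate(words)}
        # All tokens must resolve in this one list: no partial matches,
        # no fall-through to another list for unrecognized tokens.
        if all(t in index for t in tokens):
            matches[name] = [index[t] for t in tokens]

    if not matches:
        raise ValueError("tokens do not all resolve within a single wordlist")
    if len(matches) > 1:
        raise ValueError(f"input resolves in several wordlists: {sorted(matches)}")
    return next(iter(matches.items()))
```

Input such as `"alfa dos"`, where each token belongs to a different list, is rejected as malformed instead of being resolved token by token.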
Thank you for clarifying. My goal isn't to stigmatize, and I'm still trying to figure out the best way to handle LLM-generated submissions. I think it's mildly preferable to state upfront to readers and reviewers when the content is mostly LLM output, and to what extent, out of respect for their time. Some may indeed not see any issue. Others may not wish to spend scarce review cycles on human review of LLM output, or may prefer to delegate review of LLM output to LLMs, because human review is a scarce and expensive resource. The idea is to respect the community's time and help them allocate it well.
- Remove the incorrect "future PR can correct, no funds at stake" line. Corrections are new versioned lists; published lists are frozen; backups resolve against the pinned version (SHA-256).
- Clarify Backup MUST 1 is an availability obligation on the wallet, not a requirement that the user record a second English copy.
- State explicitly that the BIP specifies a framework and blesses no individual wordlist as canonical; list creation belongs to language communities.
|
Thanks for the careful read. We agree on the core: normalizing to English under the hood is the win.
On mechanical translation and "multiple lists per language", I think we're closer than it reads. The BIP ships no wordlists into this repo and blesses none as canonical. It specifies the framework: construction, mapping, and input rules, plus a conformance profile where every wordlist-level MUST maps to an executable check. The 30 lists live in a separate registry as a bootstrap corpus, supersedable by native-speaker review. I made that explicit in 6608dcb. So "ship a framework, leave creation to the communities" is the intended end state, not a conflict with it.
On why I shipped a starting corpus and not just an empty framework: it expands practical BIP-39 coverage from 10 languages to ~30 today. The 10 canonical lists cover roughly a third of people by native language. The other two thirds, about 5 billion native speakers, have no list at all. A working corpus, even one communities later refine, lets wallets onboard those users now instead of waiting for 20 separate community list efforts to each reach completion. That reach, opening Bitcoin self-custody to people in their own language, is the whole point of the proposal.
On "multiple lists cause confusion": that's what the (language, version, SHA-256) triple is for. Two lists for one language are two versions, and each backup names the one that produced it. BIP-39 today carries no version identifier at all, so this is strictly more robust, not less.
On stability: I'm committing to immutability-by-version. A published list is frozen, corrections are new versions, no existing backup is invalidated. (You caught a Rationale line that said the opposite; fixed in 6608dcb.)
On going further to a new, non-backwards-compatible scheme (indices to seed, version byte, stronger checksum, script-type encoding, distinct word count): I think that's worth doing, but it's a BIP-39 successor and a different document. This proposal's entire value is zero new cryptographic surface and universal restore in the installed base today, including English-only wallets. Folding a successor in forfeits exactly that, and helps none of those ~5 billion speakers now. I'd support a successor effort on its own track and would contribute, but I'd keep the two separate so this one stays deployable.
|
This is not something I originally set out to work on. My main work is BlockSight.Live, a free Bitcoin explorer. I have worked in the Bitcoin ATM industry in Israel for the last 3+ years and have seen thousands of regular users interact with Bitcoin. My main goal has always been to build useful tools for Bitcoiners. While building a Bitcoin wallet with the native explorer integrated into it, I encountered the BIP39 language issue directly, and it bothered me. Users in Israel still generally have to write down their seed words in English. For many people, that is not natural, and I think it creates a real backup and recovery risk. I am not trying to change BIP39 itself. I am trying to explore whether this can be made better while keeping English BIP39 as the canonical base.
This adds a Specification BIP draft, "Multilingual mnemonic display and input rules" (resubmission of the previously-closed #2192, updated).
A display wordlist is a 2048-entry list in a target language, index-parallel to the canonical English BIP-39 wordlist. PBKDF2 runs only on the canonical English mnemonic; native-language renderings are a display and input layer with no new cryptographic surface, and every seed produced under the convention is restorable in any BIP-39 wallet via its English form.
The preamble follows the BIP 3 format (`Authors`, `Assigned`, `Discussion`; no `Discussions-To`/`Comments-*`). I have not self-assigned a BIP number.
Discussion
Reference implementation (MIT): https://github.com/osem23/bip39-wordlists-tzur — 30 index-paired display wordlists with bidirectional mappings, the 10 canonical BIP-39 wordlists preserved byte-for-byte for spec comparison, a reference validator enforcing every MUST clause, reference decoders in Python, JavaScript, and Swift producing byte-identical seeds, and per-language conformance test vectors across the five BIP-39 entropy lengths.
Shipped in production by the TZUR Wallet suite (iPhone and Windows).
License: BSD-2-Clause (document), MIT (reference implementation).