PGP word list
Words for conveying data bytes in speech
From Wikipedia, the free encyclopedia
The PGP Word List ("Pretty Good Privacy word list", also called a biometric word list for reasons explained below) is a list of words for conveying data bytes in a clear, unambiguous way via a voice channel. It is analogous in purpose to the NATO phonetic alphabet, except that a longer list of words is used, each word corresponding to one of the 256 distinct numeric byte values.
History and structure
The PGP Word List was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP.[1][2] The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space. The candidate word lists were randomly drawn from Grady Ward's Moby Pronunciator list, which served as raw material for the search, and were then successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a DEC Alpha, a particularly fast machine in that era.
The Zimmermann–Juola list was originally designed to be used in PGPfone, a secure VoIP application, to allow the two parties to verbally compare a short authentication string to detect a man-in-the-middle attack (MiTM). It was called a biometric word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker with the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP public key fingerprints over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by Jon Callas. More recently, it has been used in Zfone and the ZRTP protocol, the successor to PGPfone.
The list is actually composed of two lists, each containing 256 phonetically distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of human words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is actually represented by two different words, depending on whether that byte appears at an even or an odd offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of syllables; the even list has words of two syllables, the odd list has three. The two lists have a maximum word length of 9 and 11 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart.
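The error detection described above can be sketched in a few lines of Python. This is only an illustration, not code from PGP or PGPfone: the function name `first_parity_error` is hypothetical, and the two sets hold only a handful of words taken from this article's own examples rather than the full 256-word lists.

```python
# Partial word sets: only a few words from the article's examples.
# The real lists contain 256 words each.
EVEN_WORDS = {"topmost", "Pluto", "treadmill", "brackish", "miser"}      # two syllables
ODD_WORDS = {"Istanbul", "vagabond", "Pacific", "dictator", "travesty"}  # three syllables


def first_parity_error(words):
    """Return the index of the first word drawn from the wrong list, or None."""
    for position, word in enumerate(words):
        # Even positions must use the even list, odd positions the odd list.
        expected = EVEN_WORDS if position % 2 == 0 else ODD_WORDS
        if word not in expected:
            return position
    return None


correct = ["topmost", "Istanbul", "Pluto", "vagabond"]
transposed = ["Istanbul", "topmost", "Pluto", "vagabond"]
duplicated = ["topmost", "Istanbul", "Istanbul", "Pluto", "vagabond"]
omitted = ["topmost", "Pluto", "vagabond"]

print(first_parity_error(correct))     # None
print(first_parity_error(transposed))  # 0
print(first_parity_error(duplicated))  # 2
print(first_parity_error(omitted))     # 1
```

Each of the three error kinds puts at least one word at a position of the wrong parity, which is what the check catches. A word dropped from the very end of the sequence leaves the alternation intact, so in this sketch it would have to be caught by comparing the expected length instead.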
Word lists
An editor has launched a copyright investigation involving this section. The text under investigation is currently hidden from public view, but is accessible in the page history. Please do not remove this notice or restore blanked content until the issue is resolved by an administrator, copyright clerk, or volunteer response agent.
Examples
Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in network byte order, from left to right. For example, the leftmost byte (i.e. byte 0) is considered "even" and is encoded using the PGP Even Word table. The next byte to the right (i.e. byte 1) is considered "odd" and is encoded using the PGP Odd Word table. This process repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty".
A PGP public key fingerprint that is displayed in hexadecimal as
E582
94F2
E9A2
2748
6E8B
061B
31CC
528F
D7FA
3F19
would display in PGP Words (the "biometric" fingerprint) as
topmost Istanbul
Pluto vagabond
treadmill Pacific
brackish dictator
goldfish Medusa
afflict bravado
chatter revolver
Dupont midsummer
stopwatch whimsical
cowbell bottomless
The order of bytes in a bytestring depends on endianness.
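The encoding rule in this section can be sketched as follows. This is a minimal illustration, not an implementation from PGP: `encode` and `decode` are hypothetical names, and the two tables are deliberately partial, containing only the byte-to-word pairs that can be read off the examples above. Any other byte value raises a `KeyError` in this sketch.

```python
# Partial tables: only byte values that appear in the article's examples.
# The full lists hold 256 words each and are not reproduced here.
EVEN_WORDS = {
    0xE5: "topmost", 0x82: "miser", 0x94: "Pluto", 0xE9: "treadmill",
    0x27: "brackish", 0x6E: "goldfish", 0x06: "afflict", 0x31: "chatter",
    0x52: "Dupont", 0xD7: "stopwatch", 0x3F: "cowbell",
}
ODD_WORDS = {
    0x82: "Istanbul", 0xE5: "travesty", 0xF2: "vagabond", 0xA2: "Pacific",
    0x48: "dictator", 0x8B: "Medusa", 0x1B: "bravado", 0xCC: "revolver",
    0x8F: "midsummer", 0xFA: "whimsical", 0x19: "bottomless",
}


def encode(data):
    """Map each byte to a word, alternating tables by byte offset."""
    tables = (EVEN_WORDS, ODD_WORDS)
    return [tables[offset % 2][byte] for offset, byte in enumerate(data)]


def decode(words):
    """Inverse of encode; raises ValueError on a word from the wrong table."""
    reverse = [{w: b for b, w in t.items()} for t in (EVEN_WORDS, ODD_WORDS)]
    out = bytearray()
    for offset, word in enumerate(words):
        if word not in reverse[offset % 2]:
            raise ValueError(f"unexpected word at offset {offset}: {word!r}")
        out.append(reverse[offset % 2][word])
    return bytes(out)


print(encode(bytes.fromhex("E582")))  # ['topmost', 'Istanbul']
print(encode(bytes.fromhex("82E5")))  # ['miser', 'travesty']
```

Because the table is chosen from the byte's offset alone, the same byte value yields different words at even and odd positions, which is why swapping the two bytes changes both words.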
Other word lists for data
There are several other word lists for conveying data in a clear, unambiguous way via a voice channel:
- the NATO phonetic alphabet maps individual letters and digits to individual words
- the S/KEY system maps 64-bit numbers to 6 short words of 1 to 4 characters each from a publicly accessible 2048-word dictionary. The same dictionary is used in RFC 1760 and RFC 2289.
- the Diceware system maps five base-6 random digits (almost 13 bits of entropy) to a word from a dictionary of 7,776 distinct words.
- the Electronic Frontier Foundation has published a set of improved word lists based on the same concept[3]
- FIPS 181: Automated Password Generator converts random numbers into somewhat pronounceable "words".
- mnemonic encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words.[4]
- what3words encodes geographic coordinates in 3 dictionary words.
- the BIP39 standard permits encoding a cryptographic key of fixed size (128 or 256 bits, usually the unencrypted master key of a cryptocurrency wallet) into a short sequence of readable words known as the seed phrase, for the purpose of storing the key offline. This is used in cryptocurrencies such as Bitcoin or Monero.
- Like the PGP word list, the Bytewords standard maps each possible byte to a word. There is only one list, rather than two. The words are uniformly four letters long and can be uniquely identified by their first and last letters.
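The capacity figures quoted in this list follow from the vocabulary sizes, since a word drawn from a list of N words carries log2(N) bits. The short Python check below works through that arithmetic using only the numbers given above. The helper name `bits_per_word` is a hypothetical name chosen for illustration.

```python
import math


def bits_per_word(vocabulary_size):
    """Information carried by one word chosen from a list of this size."""
    return math.log2(vocabulary_size)


# PGP word list: 256 words per list, so exactly one byte per word.
pgp_bits = bits_per_word(256)

# Diceware: five base-6 digits select one of 6**5 words.
diceware_words = 6 ** 5
diceware_bits = bits_per_word(diceware_words)

# S/KEY: six words from a 2048-word dictionary.
skey_bits = 6 * bits_per_word(2048)

# Mnemonic encoding: three words from 1626 must cover 32 bits.
mnemonic_fits = 1626 ** 3 >= 2 ** 32
smaller_fits = 1625 ** 3 >= 2 ** 32

print(pgp_bits)                 # 8.0
print(diceware_words)           # 7776
print(round(diceware_bits, 2))  # 12.92
print(skey_bits)                # 66.0
print(mnemonic_fits, smaller_fits)  # True False
```

The last line shows that 1626 is the smallest vocabulary for which three words can represent every 32-bit value, and the S/KEY result shows that six words from a 2048-word dictionary have room for slightly more than the 64 bits they carry.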