In the beginning were transposition and substitution. Transposition ciphers, which date back to at least the fifth century B.C., are giant anagrams. You scramble all the letters according to an agreed-upon pattern and put them back together the same way. A grade-school example is the rail-fence. Extract the odd-numbered characters from a message, write them in sequence, and follow them by the even-numbered characters, so
S H U F F L E O F F T H I S M O R T A L C O I L
becomes
S U F E F T I M R A C I H F L O F H S O T L O L
Another transposition cipher, beloved of the Spartans, is the scytale, which is a wooden staff of a certain diameter. Wind a ribbon of leather or parchment around it, and write your message on it. Unwrap the ribbon, and voilà , your message is scrambled, awaiting delivery to someone with another wooden stick of precisely the same diameter. A scytale, if it’s of a fixed diameter, is just another version of the rail-fence cipher. If it isn’t fixed, then you’ve bought yourself a major hardware distribution problem.
Substitution ciphers are sometimes called Caesar ciphers after Julius Caesar, who didn’t invent them but employed them frequently. If you’ve ever seen a decoder ring from a cereal box you know what a Caesar cipher is. Take the alphabet and shift it three places. A becomes D, Q becomes T, Y goes around the bend and becomes B, and there you have it. Obviously if you restrict yourself to a simple shift you have only 25 distinct ciphers. (A cipher is a code in which each symbol stands for a single letter of the alphabet.) But if you allow any rearrangement of the alphabet, then you have 2525 unique ciphers, which is a lot.
This enormous number of ciphers to choose from makes substitution an advance over transposition. Once you know you’re dealing with a transposition cipher you’re halfway home, because there are only so many plausible ways to transpose letters. A transposition cipher, in other words, depends on the secrecy of the algorithm. But with a substitution cipher, even if you know that’s what it is, you need to discover the precise letter mapping, and there are far, far too many to try them all. A substitution cipher depends on the secrecy of the key, and it’s a helluva lot easier to change the key than it is to change the algorithm. The first principle of cryptography, laid down by Kerckhoff in 1883, is that any cipher that relies on the secrecy of the algorithm, as opposed to the key, is inherently insecure, because, sooner or later, someone will steal your scytale, or your Enigma machine. To this day you will see supposed security experts touting their new, proprietary, double-secret algorithms, which is an infallible sign of snake oil. “Security by obscurity” is the derisory term among cryptanalysts.
A few Arab scientists finally got wise to substitution ciphers around the ninth century AD. They began with the simple observation that some letters are a lot more common than others. And not only letters, but digrams and trigrams, sequences of two and three letters. Once you realize that 13% of the letters in ordinary English prose are E’s, 10% are T’s and 8% are A’s, and that three consecutive vowels or consonants are pretty rare, breaking substitution ciphers becomes straightforward. A decent armchair cryptanalyst, given enough ciphertext, can usually crack a simple substitution cipher in about ten minutes. Here’s the letter frequency table in English, along with a list of the commonest digrams and trigrams. (Scrabble follows the table fairly accurately, although not exactly, since in Scrabble it matters only how many words contain a given letter, not how common it is. The highest scoring letter relative to its frequency is H, which scores 4 despite being the 9th most common letter in the alphabet, its frequency being largely due to the definite article. The lowest is U, which scores 1 despite being 15th.)
Code makers tried many ruses to foil frequency analysis. Null symbols, which represent no letter at all, were added. More insidious were operation symbols, which represented not a letter but an instruction, such as “delete the previous letter.” Since one of the standard techniques to break a substitution cipher is to look for common words like “and” and “the,” special symbols were substituted for them. Messages were deliberately spelled badly, with, say, k’s for c’s. The best code breakers defeated all of these techniques.
The code makers needed a breakthrough, and it was supplied by Blaise de Vigenère. around 1550. Vigenère realized that the fundamental weakness of substitution ciphers is consistency: the same symbol in the ciphertext always maps to the same letter in the plaintext. He proceeded to invent the first polyalphabetic cipher. (This is incorrect; see the update below.) You begin with what’s called a Vigenère square:
Reading across the table, you see a simple series of Caesar ciphers: A is unshifted, B is shifted one letter, and so forth. Now you choose a code word; this is the key. So suppose we want to encrypt SHUFFLE OFF THIS MORTAL COIL with the keyword HAMLET. You begin by repeating the keyword to the length of the message. Our message has 24 letters, so we have HAMLETHAMLETHAMLETHAMLET. Now you encrypt the first letter, S, using the H row of the Vigenère square, so it becomes a Z. The next letter, H, is encrypted with the A row, which is unshifted, so it remains an H. The U is encrypted with the M row, becoming a G. And so forth. The encrypted message reads:
Z H G Q J E L O R Q X A P S Y Z V M H L O Z M E
You can try it yourself here.
In the Vigenère square the key is very small, just a single word, which vastly simplified the distribution problem. Modified substitution ciphers had grown so large that passing around complex codebooks was required. But its principal advantage is that it stymies basic frequency analysis, because one-to-one mapping between plaintext and ciphertext symbols no longer holds. In our message, for example, Z appears three times, representing two different letters, S and O. M appears twice, representing H and I. Conversely, the plaintext F is represented by Q, J, and R. The result is so nasty to analyze that Vigenère’s cipher was nicknamed le chiffre indechiffrable, with considerable justice. It remained unbroken for 300 years, and it took a genius to do it.
In Part II we’ll break the Vigenère cipher, defeat the Nazis, and complete this little history.
(Update: Actually it was Leon Alberti, in 1466, who first proposed polyalphabetic substitution. The “Vigenère” square first appears in Trithemius, in 1499, and Belaso, in 1553, added the repeating keyword. Vigenère himself synthesized these results in 1585. Thanks to AC Douglas for pointing some of this out.)
I don’t think that scytales really qualify as a cipher. In my own experiments with scytales (in the dim distant past) I distinctly remember writing letters so that the area covered by each letter was broken in two when you unwound the paper. You then ended up with a long strip of paper with odd looking marks down one edge, none of them looking particularly like a letter.
Now that’s your own damn fault. You just have to take pains to make the ribbon exactly one letter wide, and not to break up your letters. Neatness counts, even in cryptography.
I dunno; I thought it was a feature rather than a bug. Also, I think it’s the way the book I was reading said to do it.
But even if you make the ribbon one letter wide, and don’t break up your letters, it still isn’t any kind of cipher because each letter stands for itself. You’re not scrambling the meaning of each letter, you’re simply scrambling the position of them in the message. It’s not the same thing.
Aaron wrote: \”The code makers needed a breakthrough, and it was supplied by Blaise de Vigenère around 1550. Vigenère realized that the fundamental weakness of substitution ciphers is consistency: the same symbol in the ciphertext always maps to the same letter in the plaintext. He proceeded to invent the first polyalphabetic cipher.\”
Actually not. He was preceded by Alberti (c. 1460, approx) whose invention it (i.e., polyalphabetic substitution cipher) was.
ACD
Will: Since a code is at the level of words or phrases, while a cipher is at the level of letters, I think the scytale qualifies as a cipher. What would you call it?
AC: Yes, you’re right, I should have given at least some credit to Alberti, although he never developed his idea into a full-fledged encryption system. But the idea was his, not Vigenère’s.
I’ve re-read your original post, and OK, using scytales the way you describe counts as a transposition cipher. Perhaps because transposition ciphers are so weak, I rarely see the word "cipher" used to mean anything but "substitution cipher".
Thanks for this post, Aaron. Now I had to go back and re-read "The Code Book" (Simon Singh), where I found again the scytale, the Vigenre cipher, as well as (of course) Enigma, the Cipher Challenge (I did a couple while recovering from an operation) and the philosophical comment that an "unbreakable" code is tantamount to a challenge…
Aaron wrote: "AC: Yes, you’re right, I should have given at least some credit to Alberti, although he never developed his idea into a full-fledged encryption system."
I’m afraid you’re in error again. He most decidedly did "develop[] his idea into a full-fledged encryption system." It’s called the Alberti Disc (or Wheel), and ciphertext created with its use was unbreakable at the time, and for decades after.
I think you owe the man considerably more than "some credit" for his seminal contribution to cryptography.
ACD
AC: Rereading the section in Kahn instead of writing off the top of my head, I have to agree with you again. I’ll correct the text to reflect this, crediting you of course. Alberti was a bit vague about when to shift the alphabet "after three or four words"; but still, his contributions were far more significant than Vigenre’s.
Thanks for this essay on a subject unfamiliar to me. It’s always a pleasure to learn something new.
I’ll bet all the cryptographers in Hades are watching with fascination and envy while their modern counterparts work on computer codes.
Thanks, Alan. Part II will be much better. Modern cryptography is akin to magic.
Don’t know nothin’ about codes and stuff, but this looked interesting.
http://www.economist.com/science/displayStory.cfm?story_id=1666547
PS: Remember the scene in "A Christmas Story" where the kid decodes the ad for Ovaltine. Priceless.
Interesting article. We’re still a long way from a quick technique for factorization, which most mathematicians are still convinced is a "hard" problem (i.e., you can’t do much better than brute force). But I’ll talk about this in Part II, which should be up today.
You count 2525 possible substitution ciphers. I think the number is 26! or 26!-1 excluding the identity cipher, there being n! permutations of n things.