In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.
Definition from: Wikipedia.org
In Cryptography 101, you learned about some of the most common and simple ciphers used to encipher messages. It was mentioned a number of times during the course that these ciphertexts can be cracked using a method called Frequency Analysis.
This course explains what Frequency Analysis is, and how to apply it to a ciphertext in order to discern the plaintext.
Frequency Analysis relies on the patterns inherent in the English language (and indeed in other languages). You need to have an understanding of which letters are most commonly used, common pairings of letters and common short words; all of these will be supplied within this course.
In order to decipher a ciphertext, you need to study the frequency with which letters appear and apply your knowledge of language patterns to what you discern. For example, given that "E" is the most common letter in the English language, if "X" is the most common letter in the ciphertext, it is reasonable to guess that the X's in the ciphertext stand for E's in the plaintext.
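As a rough sketch of this first step, the Python below ranks the letters of a ciphertext by how often they occur and pairs them with English letters in frequency order. The function name `first_guess` and the sample string are invented for illustration, and the letter order beyond E, T, A, O is an assumption that varies from one corpus to another, so the result is a set of starting guesses rather than a solution.

```python
from collections import Counter

# Assumed English frequency order. Only E, T, A, O come from this course;
# the remainder differs depending on the corpus that was measured.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def first_guess(ciphertext):
    """Pair cipher letters, most frequent first, with English letters."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

# Made-up sample in which X is the most common cipher letter.
print(first_guess("XXXX YYY ZZ W"))
```

On short messages the ranking is unreliable past the first few letters, which is why the guesses still need to be checked against digrams, trigrams and common words.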
The graph to the right shows the frequency with which letters occur in the English language. As you can see, E is by far the most common, followed (in order) by T, A, then O.
A digram is a pair of letters. The digrams listed below are the most common pairs of letters found in the English language.
Trigrams are like digrams, except that they are groups of three letters. The following are the most common in the English language.
The following is a list of letters that are commonly doubled, such as in the word "see."
The following is a list of letters that commonly end words.
The following list shows the most common English words.
The best way to learn is by doing; the next best option is to see it being done. There are a couple of ways that ciphertext can be given to you. The first, and easiest, is with spaces and/or punctuation from the plaintext retained. The second, and more difficult, is with spaces and punctuation removed.
This course covers the easier variety. In all cases, uppercase letters denote text that is still enciphered and lowercase letters denote plaintext.
Raken has sent an enciphered message to Muz Ashen. He didn't want others to be able to see what was contained in the message, but didn't have the opportunity to share the method with which to decipher it prior to transmission. Raken knows that if he leaves punctuation in the message, Muz will be able to easily understand the message.
The message Muz receives is:
GSVIV RH Z GRWV RM GSV ZUUZRIH LU NVM, DSRXS GZPVM ZG GSV UOLLW, OVZWH LM GL ULIGFMV. LNRGGVW, ZOO GSV ELBZTV LU GSVRI ORUV RH YLFMW RM HSZOOLDH ZMW RM NRHVIRVH. LM HFXS Z UFOO HVZ ZIV DV MLD ZUOLZG. ZMW DV NFHG GZPV GSV XFIIVMG DSVM RG HVIEVH, LI OLHV LFI EVMGFIVH.
Muz scans the ciphertext and notes that "V" is the most common single letter, "SV" the most common digram, and "GSV" the most common trigram. Using the information above, we know that, in plaintext English, E is the most common single letter, TH and HE are the two most common digrams, and THE is the most common trigram.
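Muz's scan can be reproduced with a short counting routine. The sketch below is one possible way to do it, not the method the course prescribes: the helper name `ngram_counts` is made up, and it assumes letter groups are counted inside each word and never across a space.

```python
import re
from collections import Counter

CIPHERTEXT = (
    "GSVIV RH Z GRWV RM GSV ZUUZRIH LU NVM, DSRXS GZPVM ZG GSV UOLLW, "
    "OVZWH LM GL ULIGFMV. LNRGGVW, ZOO GSV ELBZTV LU GSVRI ORUV RH YLFMW "
    "RM HSZOOLDH ZMW RM NRHVIRVH. LM HFXS Z UFOO HVZ ZIV DV MLD ZUOLZG. "
    "ZMW DV NFHG GZPV GSV XFIIVMG DSVM RG HVIEVH, LI OLHV LFI EVMGFIVH."
)

def ngram_counts(text, n):
    """Count groups of n letters inside each word (never across spaces)."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

# Show the three most common single letters, digrams and trigrams.
for n in (1, 2, 3):
    print(n, ngram_counts(CIPHERTEXT, n).most_common(3))
```

Counting inside words suits a message that keeps its spaces. For the harder variety, with spaces removed, the same loop would have to run over the whole text as a single word.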
There is a strong probability that, given what Muz has found, the following is true:
V ~ e
S ~ h
G ~ t
Therefore, GSV in the ciphertext is likely to be the.
Considering that "G" is already accounted for, the next most common letters are "Z" and "L". Muz knows that the next most common letters in English are A and O, and guesses that:
Z ~ a
L ~ o
N.B. In longer ciphertexts it should be less necessary to guess which letter represents "a".
Applying these assumptions to the ciphertext, the following is seen:
theIe RH a tRWe RM the aUUaRIH oU NeM, DhRXh taPeM at the UOooW, OeaWH oM to UoItFMe. oNRtteW, aOO the EoBaTe oU theRI ORUe RH YoFMW RM HhaOOoDH aMW RM NRHeIReH. oM HFXh a UFOO Hea aIe De MoD aUOoat. aMW De NFHt taPe the XFIIeMt DheM Rt HeIEeH, oI OoHe oFI EeMtFIeH.
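The substitution step itself is mechanical, so it can be sketched in a few lines of Python. The function name `apply_guesses` is invented for this example, and it assumes the ciphertext is uppercase and the guessed plaintext letters are lowercase, following the convention used in this course.

```python
def apply_guesses(ciphertext, guesses):
    """Swap each guessed cipher letter for its lowercase plaintext letter.

    Letters without a guess, spaces and punctuation pass through unchanged.
    """
    return "".join(guesses.get(ch, ch) for ch in ciphertext)

guesses = {"V": "e", "S": "h", "G": "t", "Z": "a", "L": "o"}
print(apply_guesses("GSVIV RH Z GRWV RM GSV ZUUZRIH LU NVM", guesses))
# prints: theIe RH a tRWe RM the aUUaRIH oU NeM
```

Adding a new guess is just another dictionary entry, for example `guesses["I"] = "r"`, after which the text can be printed again.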
Muz scans the text and notices patterns which could provide more information about the cipher.
theIe could be there, which would mean I ~ r
taPe could be take, which would mean P ~ k
there RH a tRWe RM the aUUaRrH oU NeM, DhRXh takeM at the UOooW, OeaWH oM to UortFMe. oNRtteW, aOO the EoBaTe oU theRr ORUe RH YoFMW RM HhaOOoDH aMW RM NRHerReH. oM HFXh a UFOO Hea are De MoD aUOoat. aMW De NFHt take the XFrreMt DheM Rt HerEeH, or OoHe oFr EeMtFreH.
Even more of the pattern is now apparent. Muz now realizes that:
takeM could be taken, which would mean M ~ n
theRr could be their, which would mean R ~ i
Given the assumption that M ~ n, aMW could be and, which would mean W ~ d
there iH a tide in the aUUairH oU Nen, DhiXh taken at the UOood, OeadH on to UortFne. oNitted, aOO the EoBaTe oU their OiUe iH YoFnd in HhaOOoDH and in NiHerieH. on HFXh a UFOO Hea are De noD aUOoat. and De NFHt take the XFrrent Dhen it HerEeH, or OoHe oFr EentFreH.
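The pattern-spotting Muz does by eye can be approximated by matching a partly deciphered word against a word list. The sketch below is only an illustration: `WORDS` is a tiny hypothetical list standing in for a real dictionary, and `candidates` is a made-up helper that treats each uppercase letter as one unknown letter, the same one wherever it repeats.

```python
import re

# Hypothetical word list; a real attempt would load a full dictionary.
WORDS = ["there", "these", "their", "take", "taken", "and", "affairs"]

def candidates(partial, words):
    """Return the words that fit a partly deciphered word."""
    seen = set()
    parts = []
    for ch in partial:
        if not ch.isupper():
            parts.append(re.escape(ch))        # known plaintext letter
        elif ch in seen:
            parts.append(f"(?P={ch})")         # same unknown letter again
        else:
            seen.add(ch)
            parts.append(f"(?P<{ch}>[a-z])")   # new unknown letter
    pattern = re.compile("".join(parts))
    return [w for w in words if pattern.fullmatch(w)]

print(candidates("theIe", WORDS))    # prints: ['there', 'these']
print(candidates("aUUaRIH", WORDS))  # prints: ['affairs']
```

The first result shows why these remain guesses: a word list alone cannot choose between "there" and "these", so each candidate has to be tested against the rest of the message.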
Each iteration of this process reveals more patterns on which Muz can base further assumptions, until he finally determines that the cipher is:
A ~ Z
B ~ Y
C ~ X
D ~ W
E ~ V
F ~ U
G ~ T
H ~ S
I ~ R
J ~ Q
K ~ P
L ~ O
M ~ N
N ~ M
O ~ L
P ~ K
Q ~ J
R ~ I
S ~ H
T ~ G
U ~ F
V ~ E
W ~ D
X ~ C
Y ~ B
Z ~ A
And the message from Raken is a quotation from William Shakespeare:
There is a tide in the affairs of men, Which taken at the flood, leads on to fortune. Omitted, all the voyage of their life is bound in shallows and in miseries. On such a full sea are we now afloat. And we must take the current when it serves, or lose our ventures.
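The finished key can be checked mechanically. It pairs each letter with its mirror image in the alphabet, a reversed-alphabet substitution often called Atbash, so a translation table is enough to decipher the message. This is a minimal sketch; the names `KEY` and `decipher` are chosen for illustration, and the output is left in uppercase.

```python
import string

# Each letter maps to its mirror image in the alphabet: A<->Z, B<->Y, ...
KEY = str.maketrans(string.ascii_uppercase, string.ascii_uppercase[::-1])

def decipher(ciphertext):
    """Apply the reversed-alphabet key; spaces and punctuation are kept."""
    return ciphertext.translate(KEY)

print(decipher("GSVIV RH Z GRWV RM GSV ZUUZRIH LU NVM"))
# prints: THERE IS A TIDE IN THE AFFAIRS OF MEN
```

Because the key is its own inverse, the same function also enciphers: applying it twice returns the original text.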
You should now have an understanding of Frequency Analysis and how to use it to decipher ciphertexts. Proceed to the examination and apply your knowledge.