Monoalphabetic encryption
A cipher system is monoalphabetic when each plaintext character is always replaced by the same fixed character of the ciphertext alphabet.
From ancient times to the present day, people have sent secret messages.
The need to communicate secretly has arisen in diplomacy and in military affairs.
With the advent of electronic communications, the interest in keeping messages unintelligible to everyone except the receiver has only increased.
To introduce a few terms before we begin: cryptology is the discipline dedicated to secret communication.
Cryptography is the part of cryptology that deals with the design and implementation of secret systems, and cryptanalysis is the part dedicated to breaking such systems.
I would like to start with a very simple system that can be explained mathematically using modular arithmetic.
Perhaps the first of these systems originated with Julius Caesar. Encryption simply replaced each letter by the one three places further along the alphabet; that is, A was transformed into D, B into E, and so on, until Z became C.
Throughout this article, for simplicity, I will use the standard English alphabet of 26 letters:
\tiny\begin{pmatrix} 0& 1& 2& 3& 4& 5& 6& 7& 8& 9& 10& 11& 12& 13& 14& 15& 16& 17& 18& 19& 20& 21& 22& 23& 24& 25 \\ A& B& C& D& E& F& G& H& I& J& K& L& M& N& O& P& Q& R& S& T& U& V& W& X& Y& Z \\ \end{pmatrix}
This is enough for most text-based ciphers and has the advantage that the letters occupy consecutive positions in the ASCII code, which makes it very convenient to program.
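As a small illustration of why consecutive ASCII codes are convenient, here is a minimal Python sketch of the letter-to-number table. The function names `letter_to_number` and `number_to_letter` are hypothetical helpers introduced only for this example, and the code assumes uppercase letters A to Z.

```python
def letter_to_number(letter: str) -> int:
    """Map 'A'..'Z' to 0..25 (assumes a single uppercase ASCII letter)."""
    return ord(letter) - ord("A")


def number_to_letter(number: int) -> str:
    """Map an integer back to a letter, wrapping around modulo 26."""
    return chr(number % 26 + ord("A"))


print(letter_to_number("A"), letter_to_number("Z"))  # 0 25
print(number_to_letter(3))                           # D
```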
The cipher of Julius Caesar can then be expressed as C\equiv P+3\pmod{26}, where we have assigned A the number 0, B the number 1, ... , and Z the number 25, and \pmod{26} indicates that we take the remainder of dividing by 26 (in the C language we use the % operator). Here C is the ciphertext letter and P the original letter.
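The formula can be sketched in a few lines of Python. This is only an illustration: the name `caesar_shift` is hypothetical, and the input is assumed to contain uppercase letters A to Z and nothing else.

```python
def caesar_shift(text: str, key: int = 3) -> str:
    """Apply C = P + key (mod 26) to every letter of an A-Z string."""
    result = []
    for ch in text:
        p = ord(ch) - ord("A")       # plaintext letter as a number 0..25
        c = (p + key) % 26           # shift and wrap around the alphabet
        result.append(chr(c + ord("A")))
    return "".join(result)


print(caesar_shift("XYZ"))       # ABC
print(caesar_shift("ABC", -3))   # XYZ (a negative key undoes the shift)
```

Decryption with a negative key works here because Python's % always returns a non-negative result for a positive modulus. In C the remainder of a negative number can be negative, so there one would add 26 before taking the remainder.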
Frequency of letters
In the cryptanalysis of some classical methods it is useful to know the frequency of letters, pairs of letters, and words in the language in which we assume the message is written.
Here are some useful data for the English language:
Letter | Frequency (%) |
---|---|
E | 12.70 |
T | 9.06 |
A | 8.17 |
O | 7.51 |
I | 6.97 |
N | 6.75 |
S | 6.33 |
H | 6.09 |

Letter | Frequency (%) |
---|---|
D | 4.25 |
L | 4.03 |
C | 2.78 |
U | 2.76 |
M | 2.41 |
W | 2.36 |
F | 2.23 |
G | 2.02 |

Letter | Frequency (%) |
---|---|
Y | 1.97 |
P | 1.93 |
B | 1.49 |
V | 0.98 |
K | 0.77 |
The letters J, Q, X and Z each have a frequency below 0.5% and can therefore be considered “rare”.
Summarizing the above data by groups of letters, we could say:
- The vowels make up about 38% of the text.
- Only E can be identified with relative reliability, because it stands out well above the others; T and A follow at some distance.
- The high-frequency letters (E, T, A, O, I, N, S, H) account for about 63% of the total.
- The most frequent consonants are T, N, S and H (around 28%).
- The least common letters are J, Q, X and Z (each below 0.5%).
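The group totals can be checked by simple addition over the letter table. The sketch below copies only the percentages needed for the three sums; the dictionary name `freq` and the helper `group_total` are hypothetical names chosen for this illustration.

```python
# Percentages taken from the letter-frequency table (only the letters needed here).
freq = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51,
    "I": 6.97, "N": 6.75, "S": 6.33, "H": 6.09, "U": 2.76,
}


def group_total(letters: str) -> float:
    """Add up the table percentages of the given letters."""
    return sum(freq[letter] for letter in letters)


print(f"vowels AEIOU:       {group_total('AEIOU'):.2f}%")     # 38.11%
print(f"high-frequency set: {group_total('ETAOINSH'):.2f}%")  # 63.58%
print(f"consonants TNSH:    {group_total('TNSH'):.2f}%")      # 28.23%
```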
Word | Frequency (per billion) |
---|---|
THE | 56271872 |
OF | 33950064 |
AND | 29944184 |
TO | 25956096 |
IN | 17420636 |
I | 11764797 |
THAT | 11073318 |
WAS | 10078245 |
HIS | 8799755 |
HE | 8397205 |
IT | 8058110 |

Two-letter word | Frequency (per billion) |
---|---|
OF | 33950064 |
TO | 25956096 |
IN | 17420636 |
HE | 8397205 |
IT | 8058110 |
IS | 7557477 |
AS | 7037543 |
BE | 5662527 |
ON | 5113263 |
AT | 5091841 |

Three-letter word | Frequency (per billion) |
---|---|
THE | 56271872 |
AND | 29944184 |
WAS | 10078245 |
HIS | 8799755 |
FOR | 7097981 |
HAD | 6139336 |
YOU | 6048903 |
NOT | 5741803 |
HER | 5202501 |

Four-letter word | Frequency (per billion) |
---|---|
THAT | 11073318 |
WITH | 7725512 |
HAVE | 4346500 |
FROM | 4108111 |
WERE | 3323884 |
SAID | 2637136 |
THEM | 2509917 |
BEEN | 2357654 |
WILL | 2320022 |
WHEN | 1980046 |
MORE | 1899787 |
Example of monoalphabetic encryption
Take the message MESSAGE SENT YESTERDAY. We break up the word structure of the message by deleting the spaces and punctuation marks, if any, writing for example MESSAGESENTYESTERDAY, and we get the numerical equivalents of these letters:
\tiny\begin{pmatrix} 12& 4& 18& 18& 0& 6& 4& 18& 4& 13& 19& 24& 4& 18& 19& 4& 17& 3& 0& 24 \\ M& E& S& S& A& G& E& S& E& N& T& Y& E& S& T& E& R& D& A& Y \\ \end{pmatrix}
Applying the transformation C\equiv P+3\pmod{26}, these become
\tiny\begin{pmatrix} 15& 7& 21& 21& 3& 9& 7& 21& 7& 16& 22& 1& 7& 21& 22& 7& 20& 6& 3& 1 \\ P& H& V& V& D& J& H& V& H& Q& W& B& H& V& W& H& U& G& D& B \\ \end{pmatrix}
That is to say, the encrypted message is now PHVVDJHVHQWBHVWHUGDB.
A cipher of this type is ridiculously easy to break (but remember that it was also very easy to apply): it is sufficient to test the 25 possible offsets, from P + 1 to P + 25, and at a glance we will recognize the message.
In this case we have used a form of cryptanalysis called “brute force”, because we test all the possible keys (here, the displacements).
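A brute-force attack on the shift cipher fits in a few lines. The following is a minimal sketch, assuming an uppercase A to Z ciphertext; the helper name `shift` is hypothetical. It undoes each of the 25 possible displacements and prints every candidate, leaving it to the reader to spot the one that makes sense.

```python
def shift(text: str, key: int) -> str:
    """Shift every A-Z letter by key positions, modulo 26."""
    return "".join(chr((ord(ch) - ord("A") + key) % 26 + ord("A")) for ch in text)


ciphertext = "PHVVDJHVHQWBHVWHUGDB"

# Try every possible displacement 1..25 by subtracting it again.
candidates = {key: shift(ciphertext, -key) for key in range(1, 26)}
for key, plain in candidates.items():
    print(f"{key:2d} {plain}")

# The line for key 3 reads MESSAGESENTYESTERDAY.
```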
There are some ways to improve this method without complicating it too much. The first is based on choosing a key word or phrase whose letters are all different; let's say that we choose VIRTUAL ZONE.
We then write the normal alphabet above the transformed one, which begins with the letters of the key followed by the remaining letters in alphabetical order:
\tiny\begin{pmatrix} A& B& C& D& E& F& G& H& I& J& K& L& M& N& O& P& Q& R& S& T& U& V& W& X& Y& Z \\ V& I& R& T& U& A& L& Z& O& N& E& B& C& D& F& G& H& J& K& M& P& Q& S& W& X& Y \\ \end{pmatrix}
and now the message along with its encryption would be
\tiny\begin{pmatrix} M& E& S& S& A& G& E& S& E& N& T& Y& E& S& T& E& R& D& A& Y \\ C& U& K& K& V& L& U& K& U& D& M& X& U& K& M& U& J& T& V& X \\ \end{pmatrix}
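The keyed alphabet and the resulting encryption can be reproduced with a short Python sketch. The function names `keyed_alphabet` and `substitute` are hypothetical, and the code assumes an uppercase A to Z message. Note that under this alphabet Y is replaced by X, so the ciphertext ends in X.

```python
import string


def keyed_alphabet(keyword: str) -> str:
    """Key letters first (without repeats), then the unused letters in order."""
    ordered = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in ordered:
            ordered.append(ch)
    return "".join(ordered)


def substitute(text: str, cipher_alphabet: str) -> str:
    """Replace each letter A..Z by the letter below it in the cipher alphabet."""
    table = str.maketrans(string.ascii_uppercase, cipher_alphabet)
    return text.translate(table)


alphabet = keyed_alphabet("VIRTUAL ZONE")
print(alphabet)                                      # VIRTUALZONEBCDFGHJKMPQSWXY
print(substitute("MESSAGESENTYESTERDAY", alphabet))  # CUKKVLUKUDMXUKMUJTVX
```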
Now a brute-force attack is “somewhat” more expensive, since you would have to try all the possible substitution alphabets, of which there are 26!=403291461126605635584000000, that is, a few more than the 25 from before.
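To get a feeling for the size of this number, the sketch below recomputes 26! and estimates how long an exhaustive search would take. The rate of one billion keys per second is purely an assumption made for illustration, not a figure from this article.

```python
import math

substitution_keys = math.factorial(26)   # ways to arrange the 26 letters
print(substitution_keys)                 # 403291461126605635584000000
print(len(str(substitution_keys)))       # 27 (digits)

# Hypothetical attacker speed, assumed only for this estimate.
keys_per_second = 10**9
seconds_per_year = 365 * 24 * 3600
years = substitution_keys / keys_per_second / seconds_per_year
print(f"about {years:.2e} years to try every alphabet")
```

Even at that assumed speed the search would take on the order of ten billion years, which is why a substitution cipher is attacked by other means than brute force.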
This method has the following weakness: with certain keys, the final letters of the alphabet are left unchanged, and this greatly facilitates the work of the cryptanalyst.
The key in our example was chosen to contain letters such as V, U and Z, which lie near the end of the alphabet and so produce greater “disorder” in the transformed alphabet.
In any case, a cipher like this is attacked with what is called frequency analysis. It consists of the following: knowing the frequency of letters in English (if you don't know what language the original is written in, it can cost you more work), you try to guess which plaintext letter corresponds to each ciphertext letter.
For example, in the last encrypted message, CUKKVLUKUDMXUKMUJTVX, the most repeated letter is U. Since the most frequent letter in English is E, we may conjecture that U corresponds to E, as is indeed the case, and by continuing in the same way with the other letters, enough can be worked out to read the original message.
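The first step of such an analysis, counting the letters, can be sketched with Python's `collections.Counter`. The ciphertext used is the encryption of MESSAGESENTYESTERDAY under the VIRTUAL ZONE alphabet; the variable names are hypothetical.

```python
from collections import Counter

ciphertext = "CUKKVLUKUDMXUKMUJTVX"

counts = Counter(ciphertext)
print(counts.most_common(2))   # [('U', 5), ('K', 4)]

# First conjecture: the most frequent ciphertext letter stands for E.
most_frequent = counts.most_common(1)[0][0]
guess = {most_frequent: "E"}
print(guess)                   # {'U': 'E'}
```

With a message this short the statistics are only a rough guide. Here the second most frequent letter, K, stands for S rather than T, so guesses beyond the first have to be checked against the words that start to appear.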