Monoalphabetic encryption

A cipher system is monoalphabetic when each plaintext character is always replaced by the same fixed character of the ciphertext alphabet.

From ancient times to the present day, people have sent secret messages.

The need to communicate secretly has arisen in diplomacy and in military affairs.

With the advent of electronic communications, the interest in keeping messages unintelligible to everyone except the intended receiver has only increased.

To introduce a few terms before we begin: cryptology is the discipline dedicated to secret communication.

Cryptography is the part of cryptology that deals with the design and implementation of secrecy systems, and cryptanalysis is the part dedicated to breaking such systems.

I would like to start with a very simple system that can be described mathematically using modular arithmetic.

Perhaps the first of these systems originated with Julius Caesar. Encryption consisted simply of replacing each letter by the one three places further along in the alphabet, that is, A was transformed into D, B into E, and so on, wrapping around at the end so that Z became C.

Throughout this article, for simplicity, I will use the standard 26-letter English alphabet:

\tiny\begin{pmatrix} 0& 1& 2& 3& 4& 5& 6& 7& 8& 9& 10& 11& 12& 13& 14& 15& 16& 17& 18& 19& 20& 21& 22& 23& 24& 25 \\ A& B& C& D& E& F& G& H& I& J& K& L& M& N& O& P& Q& R& S& T& U& V& W& X& Y& Z \\ \end{pmatrix}

This is enough for most text-based ciphers and has the advantage that the letters occupy consecutive positions in the ASCII code, which makes it very convenient to program.

The cipher of Julius Caesar can then be expressed as C\equiv P+3\pmod{26}, where we have assigned A the number 0, B the number 1, ..., and Z the number 25, and \pmod{26} indicates that we take the remainder of dividing by 26 (in the C language we use the % operator). C is the ciphertext letter and P the plaintext letter.
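As a minimal sketch of this formula, the following Python function applies C ≡ P + shift (mod 26) letter by letter. Python's % operator plays the same role here as the C operator mentioned above. The function name `caesar_encrypt` and the decision to drop anything that is not a letter are assumptions made for this illustration.

```python
def caesar_encrypt(plaintext, shift=3):
    """Apply C = P + shift (mod 26) to each letter A-Z; other characters are dropped."""
    result = []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            p = ord(ch) - ord("A")        # A -> 0, B -> 1, ..., Z -> 25
            c = (p + shift) % 26          # remainder of dividing by 26
            result.append(chr(c + ord("A")))
    return "".join(result)


print(caesar_encrypt("ABC XYZ"))  # DEFABC
```

The last three letters show the wrap-around: X, Y and Z become A, B and C.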

Frequency of letters

In the cryptanalysis of some classical methods it is useful to know the frequency of letters, pairs of letters, and words in the language in which we assume the message is written.

Here are some useful data for the English language:

High-frequency letters
Letter Frequency (%)
E 12.70
T 9.06
A 8.17
O 7.51
I 6.97
N 6.75
S 6.33
H 6.09

Medium-frequency letters
Letter Frequency (%)
D 4.25
L 4.03
C 2.78
U 2.76
M 2.41
W 2.36
F 2.23
G 2.02

Low-frequency letters
Letter Frequency (%)
Y 1.97
P 1.93
B 1.49
V 0.98
K 0.77

The remaining letters, J, Q, X and Z, each have a frequency below 0.5% and can be considered “rare”.

Summarizing the above data and applying them by groups of letters, we could say:

  • The vowels occupy about 38% of the text

  • Only E can be identified with relative reliability, because it stands out well above the others

  • The high-frequency letters account for about 63% of the total

  • The most frequent consonants are T, N, S and H (around 28%)

  • The least common letters are J, Q, X and Z (less than 1% combined)
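Figures like these can be obtained by counting letters in a large sample of text. The sketch below shows one way to compute the percentages in Python. The helper name `letter_frequencies` is hypothetical, and the short sample string is only for illustration; a realistic estimate needs a much longer text.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: percentage} for the letters A-Z in text, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    counts = Counter(letters)
    return {letter: 100 * n / len(letters) for letter, n in counts.most_common()}


sample = "ATTACK AT DAWN"  # illustrative sample only
for letter, percent in letter_frequencies(sample).items():
    print(f"{letter} {percent:.2f}")
```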

Most frequent words
Word Frequency (per billion)
THE 56271872
OF 33950064
AND 29944184
TO 25956096
IN 17420636
I 11764797
THAT 11073318
WAS 10078245
HIS 8799755
HE 8397205
IT 8058110

Two-letter words
Word Frequency (per billion)
OF 33950064
TO 25956096
IN 17420636
HE 8397205
IT 8058110
IS 7557477
AS 7037543
BE 5662527
ON 5113263
AT 5091841

Three-letter words
Word Frequency (per billion)
THE 56271872
AND 29944184
WAS 10078245
HIS 8799755
FOR 7097981
HAD 6139336
YOU 6048903
NOT 5741803
HER 5202501

Four-letter words
Word Frequency (per billion)
THAT 11073318
WITH 7725512
HAVE 4346500
FROM 4108111
WERE 3323884
SAID 2637136
THEM 2509917
BEEN 2357654
WILL 2320022
WHEN 1980046
MORE 1899787

Example of monoalphabetic encryption

Take the message MESSAGE SENT YESTERDAY. We first break up the word structure of the message by deleting spaces and punctuation marks, if any, which gives MESSAGESENTYESTERDAY, and then we take the numerical equivalents of these letters:

\tiny\begin{pmatrix} 12& 4& 18& 18& 0& 6& 4& 18& 4& 13& 19& 24& 4& 18& 19& 4& 17& 3& 0& 24 \\ M& E& S& S& A& G& E& S& E& N& T& Y& E& S& T& E& R& D& A& Y \\ \end{pmatrix}

Applying the transformation C\equiv P+3\pmod{26}, these become

\tiny\begin{pmatrix} 15& 7& 21& 21& 3& 9& 7& 21& 7& 16& 22& 1& 7& 21& 22& 7& 20& 6& 3& 1 \\ P& H& V& V& D& J& H& V& H& Q& W& B& H& V& W& H& U& G& D& B \\ \end{pmatrix}

That is, the encrypted message is now PHVVDJHVHQWBHVWHUGDB
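The receiver recovers the text by undoing the shift, that is, by computing P ≡ C − 3 (mod 26). A minimal Python sketch of this inverse step follows; the function name `caesar_decrypt` is an assumption for this illustration. Note that Python's % always returns a non-negative remainder for a positive modulus, so B (1) minus 3 correctly wraps around to Y (24); in C the % operator can return a negative value and would need an extra adjustment.

```python
def caesar_decrypt(ciphertext, shift=3):
    """Apply P = C - shift (mod 26) to each letter A-Z; other characters are dropped."""
    result = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            c = ord(ch) - ord("A")
            p = (c - shift) % 26          # non-negative in Python even when c < shift
            result.append(chr(p + ord("A")))
    return "".join(result)


print(caesar_decrypt("PHVVDJHVHQWBHVWHUGDB"))  # MESSAGESENTYESTERDAY
```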

A cipher of this type is ridiculously easy to break (though remember that it was also very easy to use): it is sufficient to test the 25 possible shifts, from P + 1 to P + 25, and at a glance we will see which one gives the message.

In this case we have used a form of cryptanalysis called “brute force”, because we test every possible key (here, every shift).
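The brute-force attack just described can be sketched in a few lines of Python. The names `shift_back` and `brute_force` are hypothetical; the sketch simply undoes each of the 25 possible shifts and leaves it to the reader to spot the candidate that makes sense.

```python
def shift_back(ciphertext, shift):
    """Undo a Caesar shift: P = C - shift (mod 26), for letters A-Z only."""
    return "".join(
        chr((ord(ch) - ord("A") - shift) % 26 + ord("A"))
        for ch in ciphertext.upper()
        if "A" <= ch <= "Z"
    )


def brute_force(ciphertext):
    """Return a dict {shift: candidate plaintext} for every shift from 1 to 25."""
    return {shift: shift_back(ciphertext, shift) for shift in range(1, 26)}


for shift, candidate in brute_force("PHVVDJHVHQWBHVWHUGDB").items():
    print(f"{shift:2d} {candidate}")
```

Among the 25 printed lines, only the one for shift 3 reads as English: MESSAGESENTYESTERDAY.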

There are some ways to improve this method without complicating it too much. The first is based on choosing a key word or phrase whose letters are all different; let's say we choose VIRTUAL ZONE.

We then write the normal alphabet above the transformed one, which starts with the letters of the key and continues with the unused letters in alphabetical order:

\tiny\begin{pmatrix} A& B& C& D& E& F& G& H& I& J& K& L& M& N& O& P& Q& R& S& T& U& V& W& X& Y& Z \\ V& I& R& T& U& A& L& Z& O& N& E& B& C& D& F& G& H& J& K& M& P& Q& S& W& X& Y \\ \end{pmatrix}
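The transformed alphabet in the second row starts with the letters of the key and continues with the letters not used in the key, in alphabetical order. A minimal Python sketch of that construction follows. The function name `keyword_alphabet` is an assumption, and skipping repeated letters is a choice made here so that the sketch also copes with keys that have repeated letters.

```python
import string


def keyword_alphabet(keyword):
    """Key letters first (spaces and repeats skipped), then the unused letters in order."""
    alphabet = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if "A" <= ch <= "Z" and ch not in alphabet:
            alphabet.append(ch)
    return "".join(alphabet)


print(keyword_alphabet("VIRTUAL ZONE"))  # VIRTUALZONEBCDFGHJKMPQSWXY
```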

The message and its encryption are now

\tiny\begin{pmatrix} M& E& S& S& A& G& E& S& E& N& T& Y& E& S& T& E& R& D& A& Y \\ C& U& K& K& V& L& U& K& U& D& M& X& U& K& M& U& J& T& V& X \\ \end{pmatrix}
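The letter-by-letter lookup can be sketched in Python with translation tables built by `str.maketrans`. The constant and function names are assumptions for this illustration; the transformed alphabet is the one derived from the key VIRTUAL ZONE.

```python
import string

PLAIN_ALPHABET = string.ascii_uppercase
CIPHER_ALPHABET = "VIRTUALZONEBCDFGHJKMPQSWXY"  # from the key VIRTUAL ZONE

ENCRYPT_TABLE = str.maketrans(PLAIN_ALPHABET, CIPHER_ALPHABET)
DECRYPT_TABLE = str.maketrans(CIPHER_ALPHABET, PLAIN_ALPHABET)


def substitute(text, table):
    """Keep only the letters A-Z of text and replace each one using the given table."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return letters.translate(table)


ciphertext = substitute("MESSAGE SENT YESTERDAY", ENCRYPT_TABLE)
print(ciphertext)                             # CUKKVLUKUDMXUKMUJTVX
print(substitute(ciphertext, DECRYPT_TABLE))  # MESSAGESENTYESTERDAY
```

Decryption uses the same table read in the opposite direction, which is why the two tables are built from the same pair of alphabets with the arguments swapped.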

Now a brute-force attack is “somewhat” more expensive, since we would have to try all the possible substitution alphabets, of which there are 26! = 403291461126605635584000000, that is, quite a few more than the 25 from before.
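The size of this key space can be checked with Python's `math.factorial`. The testing rate of one billion keys per second used below is purely a hypothetical figure, chosen only to give a feeling for the scale.

```python
import math

keyspace = math.factorial(26)   # number of possible substitution alphabets
print(keyspace)                 # 403291461126605635584000000
print(f"{keyspace:.3e}")        # 4.033e+26

# Hypothetical rate of one billion keys tested per second (an assumption)
KEYS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600
years = keyspace / KEYS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.1e} years")     # 1.3e+10 years
```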

This method has the following weakness: with certain keys, the final letters of the alphabet are left unchanged, which greatly facilitates the work of the cryptanalyst.

The key in our example was chosen so that it contains letters such as V, U and Z, which lie near the end of the alphabet and so produce greater “disorder” in the transformed alphabet.
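This weakness can be checked by listing the letters that a given key leaves unchanged. The sketch below is only an illustration: the helper names are assumptions, and the second key, CIPHER, is a hypothetical example of a key with no letters near the end of the alphabet.

```python
import string


def keyword_alphabet(keyword):
    """Key letters first (spaces and repeats skipped), then the unused letters in order."""
    alphabet = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if "A" <= ch <= "Z" and ch not in alphabet:
            alphabet.append(ch)
    return "".join(alphabet)


def unchanged_letters(keyword):
    """Return the letters that the keyword alphabet maps to themselves."""
    cipher = keyword_alphabet(keyword)
    return "".join(p for p, c in zip(string.ascii_uppercase, cipher) if p == c)


print(repr(unchanged_letters("VIRTUAL ZONE")))  # ''
print(repr(unchanged_letters("CIPHER")))        # 'ESTUVWXYZ'
```

With the key CIPHER, every letter after R is left as it is (and E happens to map to itself as well), whereas VIRTUAL ZONE leaves no letter in place.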

In any case, a cipher like this is attacked with what is called frequency analysis. It consists of the following: knowing the frequency of letters in English (if you do not know in which language the original is written, it can cost you more work), you try to guess which plaintext letter corresponds to each ciphertext letter.

For example, in the last encrypted message, CUKKVLUKUDMXUKMUJTVX, we note that the most repeated letter is U. Since the most frequent letter in English is E, we may conjecture that U corresponds to E, as is indeed the case. Continuing in the same way with the other letters, enough can be worked out to read the original message.
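The first step of this analysis can be sketched in Python with `collections.Counter`. The variable names are assumptions, and the guess that the most common ciphertext letter stands for E is only a conjecture to be checked against the rest of the text; with a message this short, the counts are a weak signal.

```python
from collections import Counter

ciphertext = "CUKKVLUKUDMXUKMUJTVX"

counts = Counter(ciphertext)
print(counts.most_common(2))  # [('U', 5), ('K', 4)]

# Conjecture: the most common ciphertext letter stands for E
most_common_letter = counts.most_common(1)[0][0]
guess = {most_common_letter: "E"}

# Show the letters guessed so far, with "_" for those still unknown
partial = "".join(guess.get(ch, "_") for ch in ciphertext)
print(partial)                # _E____E_E___E__E____
```

Each further guess, for example trying K as another high-frequency letter, adds an entry to `guess` and fills in more of the pattern.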