Part 2: Breaking Repeating Key XOR, the Theory

This is part two in a three part series on repeating key XOR encryption inspired by Cryptopals challenges five and six of set one. In part one I broke down what repeating key XOR is and how to implement it. I'd recommend you start there.

Part two (this part) describes the theory behind breaking said encryption. There's little coding in here beyond a few small Python sketches to ground the ideas, but this may be the hardest of the posts to wrap your mind around. It was for me anyway.

Part three will implement what we learn here in Python.

As with part one, to grok what follows you shouldn't need much. Basic encryption knowledge will help, but if you don't have it, read on anyway; Google is your friend.

The process

Before covering specifics, let's first state the general process... To break repeating key XOR we need to:

  1. Pick a range of possible key lengths between 2 and n to check. For the Cryptopals challenge they recommend 2 to 40.
  2. For each key length guess, split the ciphertext up into chunks of size KEYLENGTH and compute the normalized Hamming distance between adjacent chunks, averaged across all of them. The lowest average distance is probably produced by the real key length.
  3. Once we have our suspected key length, transpose the original ciphertext into chunks (again) where the first chunk consists of the first byte of every KEYLENGTH long block of text, the second chunk consists of the second byte of each KEYLENGTH block and so on.
  4. For each of these new chunks do frequency analysis to uncover the single byte that was used to encrypt the chunk of bytes.
  5. Chain the individual key bytes we discovered in the previous step together to get the likely key.
  6. Decrypt the text using the key and see if we've found our huckleberry.

Some of this may be confusing. Don't worry, we'll elaborate on each step below.

So, where should we start? Probably with...

Hamming Distance

Hamming distance is the number of positions at which two equal length strings differ:

abcde
abcef

Hamming Distance = 2

When we're talking about computers, usually we mean the distance between the binary representations of the strings (or hex values or bytes or whatever). Here's an example with a pair of meaningless octets:

0010 1101
0110 1011

Hamming distance = 3
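
Part three will build this out properly, but if you want a preview, here's a minimal Python sketch of counting those differing bits:

def hamming_distance(a: bytes, b: bytes) -> int:
    # XOR each pair of bytes; the 1 bits in the result mark the differences.
    assert len(a) == len(b), "Hamming distance needs equal length inputs"
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

As a sanity check, hamming_distance(b"this is a test", b"wokka wokka!!!") should return 37, the value the Cryptopals challenge tells you to expect.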

Why is this important?

The true key length should have a very low (possibly the lowest) Hamming distance, so by guessing key lengths and checking the Hamming distances between KEYLENGTH sized chunks of ciphertext, we can home in on the real key length: it's the one with the lowest average Hamming distance across all chunks. Knowing the key length is a "key" (heh heh) step towards determining the key.

Why does a low Hamming distance imply the real key length?

The answer on this Stack Exchange question explains it more succinctly than I could, but it helps if we assume that the characters in the plaintext are not uniformly random. The plaintext is likely something that makes sense to a person or a machine, so that "something" probably has a logical grouping of characters and isn't just random garbage.

A couple other assumptions that we're making in this specific case to narrow things down...

  1. The plaintext and key are probably in ASCII (which is a small subset of all possible data).
  2. The plaintext and key are possibly in English (obviously not always a safe bet).

Considering this, the Hamming distance between two random ASCII characters (or further, two random English characters) represented as bytes should be less than the Hamming distance between two totally random (evenly distributed) bytes.
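
If you'd rather not take that on faith, here's a quick and dirty simulation (the 10,000 pair sample size is arbitrary): random bytes should differ in about 4 bits on average, while random English letters differ in noticeably fewer.

import random

letters = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "
pairs = 10_000

def avg_distance(sample) -> float:
    # Average number of differing bits across random pairs drawn from sample.
    return sum(bin(random.choice(sample) ^ random.choice(sample)).count("1")
               for _ in range(pairs)) / pairs

print(avg_distance(range(256)))  # ~4.0 bits for uniformly random bytes
print(avg_distance(letters))     # lower, since letters share their high bits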

Still confused...

Yeah, it was hard for me to wrap my mind around too! Basically, if we guess key lengths and, for each guess, find the Hamming distance between adjacent KEYLENGTH sized chunks of ciphertext, then the lowest Hamming distance implies that the underlying chunks of text are not uniformly random, i.e. they're some kind of logical text, i.e. we've likely found our key length. Check this out...

Plaintext:           Meet me at midnight
Repeating Key:       KEYKEYKEYKEYKEYKEYK
Ciphertext (in hex): 06203c3f65342e65383f65342221372222313f
Note that the hex representation is twice as long as the plaintext because a single character is always represented by 8 bits and each hex value is 4 bits wide.
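
If you want to double check that hex string yourself, a couple of lines of Python (a bare bones version of the repeating key XOR from part one) will reproduce it:

def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    # XOR each byte against the key, cycling through the key as we go.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

ciphertext = repeating_key_xor(b"Meet me at midnight", b"KEY")
print(ciphertext.hex())  # 06203c3f65342e65383f65342221372222313f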

So if we guessed the KEYLENGTH is 3, our chunks would be...

Mee and t m (06203c and 3f6534 in the hex ciphertext)

e a and t m (2e6538 and 3f6534 in the hex ciphertext)

idn and igh (222137 and 222231 in the hex ciphertext)

t (3f in the hex ciphertext)

The Hamming distance between those hex ciphertext chunks, on average, should be the lowest when the guessed KEYLENGTH is the correct one. This is because each grouping of bytes is encrypted with the same key value (the string KEY in our case). If we guess that KEYLENGTH is 2 then we'd be comparing bytes encrypted with KE to bytes encrypted with YK, then bytes encrypted with EY to those encrypted with KE and so on, which gives us garbage. The same goes if we guessed a key length of 4 or 5 or 10,426.

Now that we understand Hamming distance and its purpose, we can find the distance between each chunk, normalize it by dividing by KEYLENGTH and compare that to the previous lowest distance.

"Normalize" here means we're making it so each key length we're using to calculate Hamming distances is comparable to every other key length. Dividing the distance by the current length accomplishes this.

If the normalized distance is lower than the previous lowest, then the new length has a higher chance of being correct. Typically you will do this with each key length for every chunk (not just the first one) and average the distances over all chunks to get a single "score."

Averaging over all chunks instead of just checking the first one gives us a broader picture. A low score on one chunk could be a fluke, but a low score averaged across all chunks is likely not.
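
Pulling the last few paragraphs together, here's a rough sketch of what that scoring could look like in Python, reusing the hamming_distance function from earlier (the 2 to 40 range is the Cryptopals recommendation):

def score_key_length(ciphertext: bytes, key_length: int) -> float:
    # Split into KEYLENGTH sized chunks, dropping any partial chunk at the end.
    chunks = [ciphertext[i:i + key_length]
              for i in range(0, len(ciphertext) - key_length + 1, key_length)]
    # Normalized distance between each adjacent pair, averaged over all pairs.
    distances = [hamming_distance(a, b) / key_length
                 for a, b in zip(chunks, chunks[1:])]
    return sum(distances) / len(distances)

# Only try lengths short enough to leave at least two full chunks to compare.
candidates = range(2, min(40, len(ciphertext) // 2) + 1)
best_length = min(candidates, key=lambda k: score_key_length(ciphertext, k))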

Grouping The Encrypted Data By Key Length

OK, so we've learned how to use Hamming distance to find key length. How does this help our cause?

Well, we now have enough information to do a frequency analysis attack. You may think that the fact that each plaintext character was encrypted against something different makes this tricky, and it does; that's where the security in this scheme lies. However, the key rotates, which means each character in the key was used to encrypt every KEYLENGTH-th character in the plaintext.

Knowing this, we can transpose the ciphertext into chunks corresponding to the position in the key. This is number 3 in "The Process" section at the beginning of this post.

Continuing with our ciphertext example (06203c3f65342e65383f65342221372222313f), which we highly suspect used a 3 character long key, our groupings would look like

Data encrypted with key position 1: 06 3f 2e 3f 22 22 3f
Data encrypted with key position 2: 20 65 65 65 21 22
Data encrypted with key position 3: 3c 34 38 34 37 31
Note I put spaces between the hex values just to make it clear that each two digit value represents a single character.

This groups the data such that every byte within a chunk must have been encrypted with the same specific character. We can then use frequency analysis on each chunk to determine the character used to encrypt that chunk. Combine all the characters we find and we've got the key.
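
In Python, that transposition can be as simple as slicing with a stride. Here's a sketch (assuming a recent Python 3) that reproduces the groupings above:

def transpose_blocks(ciphertext: bytes, key_length: int) -> list[bytes]:
    # Chunk i collects every byte that was XOR'd against key position i.
    return [ciphertext[i::key_length] for i in range(key_length)]

for i, chunk in enumerate(transpose_blocks(ciphertext, 3), start=1):
    print(f"Data encrypted with key position {i}: {chunk.hex(' ')}")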

So, how do we do "frequency analysis?"

ETAOIN SHRDLU

Remember how we said we're assuming this is in English? Well, as Wikipedia states, ETAOIN SHRDLU "is the approximate order of frequency of the 12 most commonly used letters in the English language."

The space character technically isn't included as a "most frequent character" but I have found it helpful since spaces are all over the place in written English.

Knowing this trick, we can...

  1. For every chunk, go through the list of all ASCII characters and XOR each one with the chunk's individual ciphertext bytes and then...
  2. Check how many "decrypted" characters are also in ETAOIN SHRDLU.

The key byte that produces the most overlap with our new mnemonic is likely the character that was used to XOR the bytes within our transposed chunks. We repeat this for every chunk, keeping track of each key byte we uncover. Once we've decrypted KEYLENGTH chunks, those bytes (combined sequentially) should make up the key that was used to encrypt the plaintext. Hot damn! Let's see it in action...

Continuing with our example, our first chunk of hex bytes is 06 3f 2e 3f 22 22 3f. As we loop through ASCII characters and XOR them with each of these bytes we find mostly garbage, but then we get to the ASCII character K (represented in hex as 4b). We do our usual steps...

4b ^ 06 -> 4d (M)
4b ^ 3f -> 74 (t)
4b ^ 2e -> 65 (e)
4b ^ 3f -> 74 (t)
4b ^ 22 -> 69 (i)
4b ^ 22 -> 69 (i)
4b ^ 3f -> 74 (t)

Hmm, wait a minute, that's interesting... The first noteworthy thing here is that all the XOR'd values, when converted to characters, turn out to be letters in the English alphabet. Secondly, all the letters except M are in ETAOIN SHRDLU.

We are on to something!

Repeat this process with the next chunk (20 65 65 65 21 22) and guess what, we'll see something similar when we do our XORing with the letter E. Do it with all 3 chunks and we'll have a good chance of rebuilding the real key. Pretty cool, eh?
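
Here's a sketch of that loop under the same assumptions as above (an ASCII key, and ETAOIN SHRDLU plus the space character as our frequency set):

FREQUENT = set(b"ETAOINSHRDLU etaoinshrdlu")  # 12 most common letters plus space

def best_key_byte(chunk: bytes) -> int:
    # Try every ASCII byte and keep the one whose "decryption" of the chunk
    # produces the most characters from our frequency set.
    return max(range(128), key=lambda k: sum((b ^ k) in FREQUENT for b in chunk))

key = bytes(best_key_byte(chunk) for chunk in transpose_blocks(ciphertext, 3))

One caveat: on a sample as short as our 19 byte example, a simple membership count like this can tie or even favor a wrong byte, so don't be surprised if this sketch doesn't spell out KEY perfectly. With a realistically sized ciphertext the statistics have room to work, and the troubleshooting list near the end of this post describes a stronger scoring approach.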

Decrypting

This is the easy part. We've got a key, now we simply need to reverse the original process. Let's XOR the repeating key with our ciphertext and if we've done everything correctly the resulting plaintext should be that secret we've been poking at for so long.
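
Since XOR is symmetric, the repeating_key_xor sketch from earlier works in both directions:

plaintext = repeating_key_xor(ciphertext, b"KEY")
print(plaintext.decode())  # Meet me at midnight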

What if I got this far and it's just nonsense?

If the decrypted text is unrecognizable, a few things could have happened to lead you down the wrong path.

  1. It could be that the best scoring key length isn't the correct one. A solution here would be to keep track of, say, the top three scoring key lengths, and try the process with all of them.
  2. You may need to experiment with longer key lengths during the Hamming distance phase.
  3. ETAOIN SHRDLU might not be sufficient for frequency analysis. A more thorough approach would be to provide a score to every letter in the English language and use that to build a stronger case (see the sketch after this list).
  4. It may not be in English. You could make guesses as to what language it's in and learn the frequencies for letters in other languages (having some context around the ciphertext is helpful here). Perhaps the plaintext is GPS coordinates which would require a different approach (i.e. numbers would be very frequent, letters not so much).
  5. You may be misunderstanding the plaintext. Maybe it's not actually nonsense, maybe it's just something you weren't expecting.
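
For point 3, a sketch of that stronger scoring might look like the following: weight each decrypted character by an approximate English letter frequency and prefer the key byte with the highest total. The percentages below are rough, commonly cited values, and weighting the space character as heavily as e is my own assumption.

ENGLISH_FREQ = {
    'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
    's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
    'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
    'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
    'q': 0.1, 'z': 0.07, ' ': 12.7,
}

def frequency_score(text: bytes) -> float:
    # Higher totals mean the text looks more like English.
    return sum(ENGLISH_FREQ.get(chr(b).lower(), 0.0) for b in text)

Swap this in for the membership count in best_key_byte and ties become much less likely.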

Conclusion

Whew, that was a lot of theory... this is admittedly a complex process and I hope it wasn't too confusing. I'd urge you to try coding it up yourself before moving on to the implementation details in the next post. You'll learn a lot from it and it's an incredibly satisfying feeling to crack the code on your own.

Next up: Cracking this thing programmatically.

Resources