Monday, December 8, 2014

Solving a crypto puzzle with Python

A beginners guide to Python programming, to solve a Caesar cipher.
This December, computer security firm Sophos has been running a "12 Days of Christmas" contest, with cyber-related quizzes each day. So far the quizzes have ranged from hoaxes to malware authors to abandoned operating systems. Each of the questions have touched on topics relevant to hackers (using the traditional, inquisitive sense of the word ... hacking is not in and of itself evil!), and each have required skills useful to a cyber security pro - often, simply paying attention to detail and noticing clues.

Friday's quiz was a simple cryptography task:

This quiz from the Sophos Naked Security blog asks you to unscramble a Caesar cipher.

The combination of letters didn't lend itself to merely unscrambling words, so I suspected this to be a basic Caesar cipher. A Caesar cipher, also known as a rotation or shift cipher, is one of the earliest methods of encoding a word or phrase to hide the message from prying eyes. Each letter in the plaintext (or original) message is replaced with the letter a certain number of places away in the alphabet to form the ciphertext. For example, A becomes Z, B becomes A, C becomes B, and so forth.

A Caesar cipher substitutes each letter with another letter a certain number of places to the right or left in the alphabet.

Using a shift of one character as above, "HELLO" would become "GDKKN." Caesar and other substitution ciphers are not particularly strong - a computer can try all possible substitutions in a matter of seconds as we will see below - but they will at least foil a casual glance.

There are numerous web sites that will quickly and easily decode a Caesar cipher message ... but this looked like a good excuse to learn a new computer language. I've written code in a great many forms, but never had a reason to learn Python, the language of choice for many system administrators and security pros. So I set out to write a program that would unscramble such a message.

As I began, my 11 year old daughter sat down beside me and said "teach me to hack." Wow, what timing! Never mind that this was the same girl that the same morning groaned when I signed her up for the Hour of Code program at her elementary school. After showing her the encoded message and explaining how a Caesar cipher works, she worked out the solution on her own using a pencil and paper, but also realized that it is a time-consuming way of breaking the code. So let's dig into a more elegant solution.

We had a starting point: from the quiz question, we knew the ciphertext we wanted to decode:

cipher = 'FGETARV-QPNA'

Python is a very intuitive language. What might have taken a half dozen lines of code in other languages sometimes can be done in one line. I asked Heather how she thought we should begin, and her idea was to start by writing out the alphabet and a rotated alphabet. When I reminded her that we wanted the computer to do the tedious part, she suggested we go through each letter in the ciphertext and let the computer change it somehow. Bingo :-)

In a traditional shell I would do:

For i = 0 to len(cipher)-1 do
  output[i] = convert(cipher[i])

Python makes that a breeze: I can do something to every letter in a string by merely saying:

for c in cipher:
  print(c)

F
G
E
T
.
.
.

That's nice, but so far all we've done is break the ciphertext into individual letters. We still need to perform a substitution. To the computer, letters, numbers, special characters, etc. are simply one large "alphabet" (more accurately, a character set). We can use Python's ord() function to find a particular letter's position in that alphabet, and conversely the chr() function to find the letter that is at a particular position in the alphabet. Python helpfully allows us to mix math with these functions, forming the beginnings of a substitution. plan. Assuming a shift one character to the right:

for c in cipher:
  position = ord(c)
  newposition = position + 1
  newletter = chr(newposition)
  print(newletter)

G
H
F
U
.
.
.

This replaces each letter in the cipher by the letter to its right. That doesn't look like a sensible word though, and we don't want to have to rewrite the entire code for every possible shift, so I introduced Heather to defining our own functions. Here we move the substitution code into a function:

def substitute(c):
  position = ord(c)
  newposition = position + 1
  newletter = chr(newposition)
  print(c, ' ==> ', position, ' ==> ', newposition, ' ==> ', newletter)

for c in cipher:
  substitute(c)

F  ==>  70  ==>  71  ==>  G
G  ==>  71  ==>  72  ==>  H
E  ==>  69  ==>  70  ==>  F
T  ==>  84  ==>  85  ==>  U
.
.
.

With this output, she could see how the original letter was at a certain position in the alphabet, we increased that position by one, and the result was the letter one place to the right in the alphabet. And in seeing that, she picked up on an input validation we needed to consider: what happens when you get to Z? Somehow we need to wrap around to A. That's my girl!

With some experimentation, we came up with a sequence of commands that would check to see if the original letter were uppercase, lowercase, or something other than a letter; if uppercase, we performed the shift, then checked to see if we were still within the range of uppercase letters (positions 65 through 90 in the standard ASCII character set). If the result were beyond Z, we subtracted 26, effectively wrapping around to A. If the result were before A, we added 26, wrapping around to Z. Along the way we also noticed the hyphen in the original ciphertext, and decided that any non-alphabet characters would simply be kept as they were. We also turned the shift factor into a variable r as well:

def substitute(c,r):
  if c.isalpha():
    if ord(c) in range(97,123):
      if ord(c)+r in range(97,123):
        return chr(ord(c)+r)
      elif ord(c)+r > 122:
        return chr(ord(c)+r-26)
      elif ord(c)+r < 97:
        return chr(ord(c)+r+26)
    elif ord(c) in range(65,91):
      if ord(c)+r in range(65,91):
        return chr(ord(c)+r)
      elif ord(c)+r > 90:
        return chr(ord(c)+r-26)
      elif ord(c)+r < 65:
        return chr(ord(c)+r+26)
  else:
    return(c)

Heather's next thought was, can we have the computer do all the possible rotations at one time? Why yes, we certainly can! Using "for <var> in range" we can give Python instructions to try every reasonable shift value. If we are going to print out all the possibilities though, we don't want to have the results keep running down the screen ... it makes more sense to reassemble the strings:

for r in range (-13, +14):
  output = ''
  for c in cipher:
    output += (substitute(c,r))

  print(r, " ==> ", output)

-1  ==>  EFDSZQU-POMZ
0  ==>  FGETARV-QPNA
1  ==>  GHFUBSW-RQOB
.
.
.

This looks a bit ugly though, so time to apply some string formatting:

for r in range (-13, +14):
  output = ''
  for c in cipher:
    output += (substitute(c,r))

  print('{0:+3} ==> {1}'.format(r, output))

-13 ==> STRGNEI-DCAN
-12 ==> TUSHOFJ-EDBO
-11 ==> UVTIPGK-FECP
-10 ==> VWUJQHL-GFDQ
 -9 ==> WXVKRIM-HGER
 -8 ==> XYWLSJN-IHFS
 -7 ==> YZXMTKO-JIGT
 -6 ==> ZAYNULP-KJHU
 -5 ==> ABZOVMQ-LKIV
 -4 ==> BCAPWNR-MLJW
 -3 ==> CDBQXOS-NMKX
 -2 ==> DECRYPT-ONLY
 -1 ==> EFDSZQU-POMZ
 +0 ==> FGETARV-QPNA
 +1 ==> GHFUBSW-RQOB
 +2 ==> HIGVCTX-SRPC
 +3 ==> IJHWDUY-TSQD
 +4 ==> JKIXEVZ-UTRE
 +5 ==> KLJYFWA-VUSF
 +6 ==> LMKZGXB-WVTG
 +7 ==> MNLAHYC-XWUH
 +8 ==> NOMBIZD-YXVI
 +9 ==> OPNCJAE-ZYWJ
+10 ==> PQODKBF-AZXK
+11 ==> QRPELCG-BAYL
+12 ==> RSQFMDH-CBZM
+13 ==> STRGNEI-DCAN

What do you know? One of these outputs makes sense!


So far the ciphertext is written into the code; the code would be much more usable if we could provide the ciphertext on the command line. In order to do that, we made use of the argparse module. In simple terms, argparse lets us define both required and optional parameters, and then easily use them in our code:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('cipher')
parser.add_argument('-r')
args=parser.parse_args()
cipher=args.cipher
r=args.r

Now we can run the program from a command line and see either a specific rotation (using the -r option), or all possible rotations:

/tmp/CaesarsHelper/$ caesar.py FGETARV-QPNA
-13 ==> STRGNEI-DCAN
-12 ==> TUSHOFJ-EDBO
-11 ==> UVTIPGK-FECP
-10 ==> VWUJQHL-GFDQ

And a great thing about Python is that it works equally well on Windows and Linux:


C:\CaesarsHelper\> caesar.py FGETARV-QPNA
-13 ==> STRGNEI-DCAN
-12 ==> TUSHOFJ-EDBO
-11 ==> UVTIPGK-FECP
-10 ==> VWUJQHL-GFDQ

Admittedly this is not a very sophisticated program, but it's a great start for a dad and his 11 year old daughter! If you'd like to make use of it, it is available from GitHub: