A Text class

We'll build up the tools we need to crack Monome-Dinome ciphers in pieces, from the bottom up. First, we'll write a program to encrypt plaintext using the Monome-Dinome cipher. Then we'll write a program to decrypt ciphertext.

When exploring a new cipher type, I almost always start by creating encrypting and decrypting routines. This lets me test whether I correctly understand how the cipher works. It also lets me create an unlimited supply of practice cryptograms of that type, by encrypting random selections of text.

Once the encrypting and decrypting routines are finished, we will continue with the tools we need to identify the row digits, and then to convert the ciphertext into a form that can be easily cracked.

Our first object

At last, we'll get down to some programming. To begin, we'll need some plaintext. How about this?

“As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.”

(This is Maurice Wilkes, the designer of EDSAC, describing his experience in 1949.)

Start up your Python interpreter and assign the plaintext to the variable plaintext:

Python 2.4.3 (#1, Jan 21 2009, 01:10:13)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> plaintext = '''As soon as we started programming, we found to our surprise that it wasn't
as easy to get programs right as we had thought. Debugging had to be
discovered. I can remember the exact instant when I realized that a large
part of my life from then on was going to be spent in finding mistakes in
my own programs.'''
>>> print plaintext
As soon as we started programming, we found to our surprise that it wasn't
as easy to get programs right as we had thought. Debugging had to be
discovered. I can remember the exact instant when I realized that a large
part of my life from then on was going to be spent in finding mistakes in
my own programs.
>>>

Like most programming languages, Python uses quotes to surround strings. Python accepts either single or double quotes, and it also provides triple quotes (""" or '''), which can surround strings that contain embedded newline characters. We now have our plaintext stored in a string object named plaintext. We can manipulate that string in various ways using Python's string methods. We could, for example, convert the string to upper case, or determine its length:

>>> print plaintext.upper()
AS SOON AS WE STARTED PROGRAMMING, WE FOUND TO OUR SURPRISE THAT IT WASN'T
AS EASY TO GET PROGRAMS RIGHT AS WE HAD THOUGHT. DEBUGGING HAD TO BE
DISCOVERED. I CAN REMEMBER THE EXACT INSTANT WHEN I REALIZED THAT A LARGE
PART OF MY LIFE FROM THEN ON WAS GOING TO BE SPENT IN FINDING MISTAKES IN
MY OWN PROGRAMS.
>>> len(plaintext)
309
>>>
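To make the triple-quote behaviour concrete, here is a small sketch showing that a triple-quoted string keeps its line breaks as '\n' characters, which count toward len() like any other character, and that upper() returns a new string rather than changing the original. It uses Python 3 syntax (print() is a function there, unlike in the Python 2.4 transcripts in this section); the variable name sample is just for illustration.

```python
# A triple-quoted string keeps its embedded line breaks as '\n' characters.
sample = '''As soon as
we started'''

print(repr(sample))          # repr() shows the '\n' escape explicitly
upper = sample.upper()       # upper() returns a NEW string; sample is unchanged
print(upper)
print(len(sample))           # the newline counts toward the length: 21
print(sample.count("\n"))    # number of embedded line breaks: 1
lines = sample.splitlines()  # split the string at its line breaks
print(lines)                 # ['As soon as', 'we started']
```

Note that evaluating a string expression at the >>> prompt without print displays its repr(), which shows line breaks as \n escapes on a single line; print shows the text itself.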

This is progress, of a sort, but only of a sort. Our plaintext isn't really just a string; it's a sample of plaintext. There are certainly many circumstances in which we will treat it as if it were a string, but there are going to be things we want to do to it that we would not want to do to just any string, and we'll want a place to keep the code that does those things. So we'll define a Text class and store our plaintext in it. We're calling it Text, rather than Plaintext, because at this point we have no reason to distinguish between an object that contains plaintext and an object that contains ciphertext. We may find a need to make that distinction as we go on, but it's generally a mistake to add complexity before it is needed. All too often, it turns out not to be needed. So, our first class:

>>> class Text:
...     def __init__(self, text):
...         self.text = text
...
>>> pt = Text(plaintext)
>>> pt
<__main__.Text instance at 0xb7f044cc>
>>> print pt.text
As soon as we started programming, we found to our surprise that it wasn't
as easy to get programs right as we had thought. Debugging had to be
discovered. I can remember the exact instant when I realized that a large
part of my life from then on was going to be spent in finding mistakes in
my own programs.
>>>

A class is a container for members, which can be either data or functions that operate on that data. In this case, our class Text contains one data member, text, and one member function, __init__(). __init__() is a special function that Python calls to initialize an object, so when you construct a new object of class Text by calling pt = Text(plaintext), it is the __init__() function in the Text class that is being called. You can construct multiple Text objects, with different names and different member data:

>>> pt2 = Text("This is another text")
>>> pt2.text
'This is another text'
>>> print pt.text
As soon as we started programming, we found to our surprise that it wasn't
as easy to get programs right as we had thought. Debugging had to be
discovered. I can remember the exact instant when I realized that a large
part of my life from then on was going to be spent in finding mistakes in
my own programs.
>>>
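As a further sketch, a class can hold member functions besides __init__(), and each instance keeps its own member data. This uses Python 3 syntax, and the length() method is a hypothetical addition for illustration, not part of the Text class we build in this section.

```python
class Text:
    def __init__(self, text):
        # __init__ runs automatically when Text(...) is called
        self.text = text

    def length(self):
        # Hypothetical helper: a member function working on this object's data
        return len(self.text)


pt = Text("As soon as we started programming")
pt2 = Text("This is another text")

print(pt.length())    # 33: answered from pt's own self.text
print(pt2.length())   # 20: answered from pt2's own self.text
pt2.text = "Changed"  # changing one object's data...
print(pt.text)        # ...does not affect the other object
```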

Next, exit the Python interpreter, then re-enter it. Try to access the objects and the classes you created.

Python 2.4.3 (#1, Jan 21 2009, 01:10:13)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> pt
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'pt' is not defined
>>> pt = Text("This is a test")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'Text' is not defined
>>>

We've revealed what is both the greatest strength and the greatest weakness of writing code directly into the interpreter: everything we do is lost when the interpreter closes. This is exactly what we want when we're just playing around, exploring ideas. It's exactly what we don't want when we're writing code we want to reuse. If we want to write a class that we will be able to reuse, we need to put it in a module.

Our first module

A module is a text file containing Python code. Most of the code in a module will be class and/or function definitions, but it is perfectly legal to put executable statements in a module, and we will see later that there are good reasons for doing so. To create a module containing our class, open your favorite text editor and enter the following code:

class Text:
    def __init__(self, text):
        self.text = text

Save it to the file crypto.py in the directory from which you've been running the Python interpreter. Start the interpreter, then load your new module with the command from crypto import *. This loads all of the definitions in crypto.py into the interpreter. You should now be able to create a Text object just as you did before.

>>> from crypto import *
>>> pt = Text("This is a test")
>>> print pt.text
This is a test
>>>

There is an alternative way of loading a module: the command import crypto. This differs from the former in that everything in crypto.py is loaded into the crypto namespace, so any reference to a class or function defined in the file must be prefixed with the namespace (crypto.Text rather than just Text). This is generally the preferred way to load a module, because it avoids the conflicts that can arise when different modules define functions with the same name.

>>> import crypto
>>> pt = crypto.Text("This is another test")
>>> print pt.text
This is another test
>>>
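To see the kind of conflict that import crypto avoids, here is a self-contained sketch in Python 3 syntax. It writes two tiny throwaway modules, hypothetically named ciphers_a and ciphers_b, into a temporary directory, each defining a function called convert(). With plain import, both functions stay reachable through their namespaces; with from ... import *, the second import silently replaces the first.

```python
import importlib
import os
import sys
import tempfile

# Write two throwaway modules, each defining its own convert(), to a temp dir.
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "ciphers_a.py"), "w") as f:
    f.write("def convert(s):\n    return s.upper()\n")
with open(os.path.join(moddir, "ciphers_b.py"), "w") as f:
    f.write("def convert(s):\n    return s.lower()\n")
sys.path.insert(0, moddir)      # make the directory importable
importlib.invalidate_caches()   # make sure the new files are noticed

# Namespaced imports: both convert() functions stay reachable.
import ciphers_a
import ciphers_b
print(ciphers_a.convert("Text"))  # TEXT
print(ciphers_b.convert("Text"))  # text

# Star imports: the second convert() silently replaces the first.
from ciphers_a import *
from ciphers_b import *
print(convert("Text"))            # text
```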



Loading from a file

We no longer need to type in our class definition every time we start the interpreter, but we still need to type in our plaintext. It would be convenient if we could store our plaintext in a file and load it into a Text object by passing it a filename. To make these changes, edit crypto.py so that it looks like this:

class Text:
    def __init__(self, filename):
        self.load(filename)

    def load(self, filename):
        fp = open(filename, "r")
        self.text = fp.read()
        fp.close()

    def __str__(self):
        return self.text

Note that we no longer pass the plaintext itself when we construct the object; instead, we pass a filename, and __init__() calls the new load() method, which reads the text file and stores its contents in self.text. We've also added a __str__() method, a special method that is called when Python needs to convert the object to a string. It's called, for example, when you type print pt in the interpreter. Create a text file containing the plaintext and name it plaintext.txt. Then reload the crypto module with reload(crypto) and try out your new module.

>>> reload(crypto)
<module 'crypto' from 'crypto.py'>
>>> pt = crypto.Text('plaintext.txt')
>>> print pt
As soon as we started programming, we found to our surprise that it wasn't
as easy to get programs right as we had thought. Debugging had to be
discovered. I can remember the exact instant when I realized that a large
part of my life from then on was going to be spent in finding mistakes in
my own programs.
>>>
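The load() method above closes the file explicitly. A variant worth sketching uses a with statement, which guarantees the file is closed even if reading raises an exception. The with statement is available by default from Python 2.6, so not in the 2.4 interpreter shown here; the sketch below uses Python 3 syntax, and the temporary file in the usage lines is only a stand-in for plaintext.txt.

```python
import os
import tempfile


class Text:
    def __init__(self, filename):
        self.load(filename)

    def load(self, filename):
        # 'with' closes the file automatically, even if read() fails
        with open(filename, "r") as fp:
            self.text = fp.read()

    def __str__(self):
        # Called by print() and str() to get the object's string form
        return self.text


# Usage: write a small stand-in for plaintext.txt, then load it.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("As soon as we started programming,\nwe found to our surprise\n")

pt = Text(path)
print(pt)        # print() calls __str__()
os.remove(path)  # clean up the temporary file
```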



Pre-processing the input

You'll notice that our plaintext file contains many characters that we will not be encrypting. Spaces, newlines, and punctuation will all be thrown away, and all lower-case letters will be converted to upper-case. We'll add a method that does this conversion after the plaintext has been loaded from the file. As we saw earlier, Python strings have a built-in upper() method that returns the string with all letters converted to upper-case. They also have an isalpha() method that checks whether a character is alphabetic. Using both of these, our class becomes:

class Text:
    def __init__(self, filename):
        self.load(filename)

    def load(self, filename):
        fp = open(filename, "r")
        self.rawtext = fp.read()
        fp.close()
        self.text = self.convert(self.rawtext)

    def convert(self, txt):
        # Keep only the alphabetic characters, converted to upper-case
        rval = ""
        for c in txt.upper():
            if c.isalpha():
                rval += c
        return rval

    def __str__(self):
        return self.text

Note that we are now storing the unmodified text in self.rawtext and putting the converted text in self.text. Our __str__() method now returns only the upper-case alphabetic characters.

>>> reload(crypto)
<module 'crypto' from 'crypto.py'>
>>> pt = crypto.Text('plaintext.txt')
>>> print pt
ASSOONASWESTARTEDPROGRAMMINGWEFOUNDTOOURSURPRISETHATITWASNTASEASYTOGETPROGRAMSRIGHTASWEHADTHOUGHTDEBUGGINGHADTOBEDISCOVEREDICANREMEMBERTHEEXACTINSTANTWHENIREALIZEDTHATALARGEPARTOFMYLIFEFROMTHENONWASGOINGTOBESPENTINFINDINGMISTAKESINMYOWNPROGRAMS
>>>
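The character-by-character loop in convert() works, but two refinements are worth sketching (Python 3 syntax; the function names convert_isalpha and convert_ascii are hypothetical). First, str.join() builds the result in one pass instead of repeated +=. Second, in Python 3 str.isalpha() is Unicode-aware, so accented letters such as 'É' survive the filter; if the cipher only handles the plain letters A to Z, testing membership in string.ascii_uppercase is stricter.

```python
import string


def convert_isalpha(txt):
    # Same rule as convert(): upper-case, keep whatever str.isalpha() accepts
    return "".join(c for c in txt.upper() if c.isalpha())


def convert_ascii(txt):
    # Stricter rule: keep only the plain Latin letters A-Z
    allowed = set(string.ascii_uppercase)
    return "".join(c for c in txt.upper() if c in allowed)


sample = "Debugging had to be discovered. Café!"
print(convert_isalpha(sample))  # DEBUGGINGHADTOBEDISCOVEREDCAFÉ
print(convert_ascii(sample))    # DEBUGGINGHADTOBEDISCOVEREDCAF
```

Which rule is right depends on the plaintext you feed in; for English text that is already plain ASCII, the two give identical results.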

That's starting to look a bit awkward, with that long, unformatted string. Let's change our __str__() to produce formatted output, with five-character groups, 12 groups to a line, as is common with cryptographic text:

    def __str__(self):
        rval = ""
        pos = 0
        for c in self.text:
            rval += c
            pos += 1
            if pos % 60 == 0:
                rval += '\n'   # end the line after 12 groups of 5
            elif pos % 5 == 0:
                rval += " "    # space between groups
        return rval

>>> pt = crypto.Text('plaintext.txt')
>>> reload(crypto)
<module 'crypto' from 'crypto.pyc'>
>>> pt = crypto.Text('plaintext.txt')
>>> print pt
ASSOO NASWE START EDPRO GRAMM INGWE FOUND TOOUR SURPR ISETH ATITW ASNTA
SEASY TOGET PROGR AMSRI GHTAS WEHAD THOUG HTDEB UGGIN GHADT OBEDI SCOVE
REDIC ANREM EMBER THEEX ACTIN STANT WHENI REALI ZEDTH ATALA RGEPA RTOFM
YLIFE FROMT HENON WASGO INGTO BESPE NTINF INDIN GMIST AKESI NMYOW NPROG
RAMS
>>>
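The loop in __str__() leaves a trailing space or newline whenever the text length is a multiple of 5 or 60. A sketch of an alternative (Python 3 syntax; group_text and its parameters are hypothetical names) slices the text into groups and joins them, so no trailing whitespace is produced and the group size and line width are easy to change:

```python
def group_text(text, group_size=5, groups_per_line=12):
    # Split the text into fixed-size groups; the last group may be shorter
    groups = [text[i:i + group_size] for i in range(0, len(text), group_size)]
    # Join groups with spaces, groups_per_line groups to a line
    lines = [" ".join(groups[i:i + groups_per_line])
             for i in range(0, len(groups), groups_per_line)]
    return "\n".join(lines)


print(group_text("ASSOONASWESTARTEDPROGRAMMING"))
# ASSOO NASWE START EDPRO GRAMM ING
```

Inside the Text class, __str__() could then simply return group_text(self.text).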

