Introduction

Writing software is among the most complicated endeavors a human can undertake. Brian Kernighan, co-author of the AWK programming language and “K and R C”, summed up the true nature of software development in the book Software Tools when he stated, “Controlling complexity is the essence of computer programming.” The harsh reality of real-world software development is that software is often created with intentional or unintentional complexity and with a disregard for maintainability, testability, and quality. The result is software that becomes increasingly difficult and expensive to maintain and that fails sporadically, and sometimes spectacularly.

The first step in writing high-quality code is to re-examine how an individual or team thinks about developing software. In failed or troubled projects, the software was often written in a reactive, stream-of-consciousness fashion, with the focus on getting the problem solved by any means possible. In a successful project, the developer thinks not only about how to solve the problem at hand, but also about the process used to solve it.

A successful software developer will devise a way to run the tests in an easily automated fashion, so they can continuously prove the software works. They are aware of the dangers of needless complexity. They are humble in their approach, seek critical review, and expect refactoring at every step of the way. They continuously think about how to keep their software testable, readable, and maintainable. Although Python the language, and Python the community, are heavily influenced by a desire to write clean, maintainable code that works, it is still quite easy to do the exact opposite. In this article, we will tackle this problem head on and explore how to write clean, testable, high-quality code in Python.

A clean code hypothetical problem

The best way to demonstrate this style of development is to solve a hypothetical problem. Suppose you are a back-end web developer at a company that lets users write reviews, and you need a way to show small snippets of those reviews with the search terms highlighted. One approach would be to write a large function that takes a piece of text and the query parameters, and returns a character-limited snippet with the query terms highlighted. All of the logic would live in that one “mega” function, and you would keep rerunning your script until you got the result you wanted. The code would probably look like the example below, and it would often be developed with a mix of print or logging statements and an interactive shell.

Listing 1. Messy code

def my_mega_function(snippet, query):
    """This takes a snippet of text, and a query parameter and returns
    a character-limited snippet with the query terms highlighted
    """
    # Logic goes here, and often runs on for several hundred lines
    # There are often deeply nested conditional statements and loops
    # Function could reach several hundred, if not thousands of lines
    return result

With a dynamic language like Python, Perl, or Ruby, it is easy to develop software by simply banging away at the problem, often interactively, until you get what seems to be the correct result and calling it a day. Unfortunately, this approach, while tempting, often leads to a false sense of accomplishment that is fraught with danger. Much of the danger lies in not designing a solution to be testable, and part lies in not properly controlling the complexity of the software written.

How can you say this function even works? You can have faith that it works because it worked the last time you ran it during development, but are you sure it doesn’t contain subtle errors of logic or syntax? What happens if you need to change the code? Would it still work, and how would you know it still worked? What if that code needed to be maintained by another developer, and he needed to make changes to it? How would he know his changes didn’t cause something subtle to break? How hard would it be for him to understand what the code does?

The short answer is: if you don’t have tests, you don’t know whether your software works. If you stack together enough guesses, you may eventually build something that appears to function, but no one could ever say with certainty that it works properly. This is a bad place to be; I have both written software this way and helped debug software written this way. Fortunately, this condition is easily avoidable. Writing tests before you write your logic, as in Test-Driven Development, or while you write it, actually shapes the way the code is written. It leads to modular, extensible code that is easy to test, understand, and maintain. To an experienced developer, it is immediately apparent whether software was developed with testing in mind; the code itself looks dramatically different to the trained eye.
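To make that concrete, here is a minimal sketch of the test-first habit, using a hypothetical `truncate_words` helper that is not part of the article’s code. The tests pin down the expected behavior (short text is unchanged, words are never split, empty input is handled), and the function only has to do enough to make them pass. Because the function does one small thing, each test is a single assertion.

```python
import unittest


def truncate_words(text, max_characters):
    """Return the longest run of whole words from text that fits in max_characters.

    Hypothetical helper used only to illustrate test-first design.
    """
    result = []
    total = 0
    for word in text.split():
        # Count the single space that joins this word to the previous one.
        added = len(word) + (1 if result else 0)
        if total + added > max_characters:
            break
        result.append(word)
        total += added
    return " ".join(result)


class TestTruncateWords(unittest.TestCase):
    """Tests written to pin down the behavior before (or while) coding it."""

    def test_short_text_is_unchanged(self):
        self.assertEqual(truncate_words("deep dish", 20), "deep dish")

    def test_never_splits_a_word(self):
        self.assertEqual(truncate_words("deep dish pizza", 12), "deep dish")

    def test_empty_text(self):
        self.assertEqual(truncate_words("", 10), "")
```

Saved in a file named, say, `test_truncate.py` (a hypothetical name), the tests can be run with `python -m unittest test_truncate`; nose would collect them as well.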

Rather than simply taking my word for it or visually inspecting code, you can measure the difference between these two styles objectively. The first way is to measure how many lines of code are actually exercised by tests. Nose is a popular extension of Python’s unittest framework that makes it easy to automatically run a batch of tests along with plug-ins such as code coverage. By measuring code coverage during development, it quickly becomes apparent that it is almost impossible to get 100 percent test coverage for code built ad hoc out of large functions with deeply nested logic.
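Coverage tools work by recording which lines actually execute while the tests run. The sketch below shows the idea with the standard library’s `sys.settrace` hook; `classify` and `executed_lines` are hypothetical names, and real tools such as coverage.py (which nose’s coverage plug-in builds on) do far more. A single test that exercises only one branch leaves the other `return` unexecuted, which is exactly the gap a coverage report exposes.

```python
import sys


def classify(score):
    """Toy function with two branches (hypothetical example)."""
    if score > 0:
        return "match"
    return "no match"


def executed_lines(func, *args):
    """Run func(*args) and return the set of line numbers executed inside func.

    A bare-bones illustration of line-coverage tracing, not a real coverage tool.
    """
    code = func.__code__
    lines = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            lines.add(frame.f_lineno)
        # Returning the tracer keeps tracing lines inside each new frame.
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return lines


first = executed_lines(classify, 5)
both = first | executed_lines(classify, -1)
# Lines that only show up once the second branch is exercised:
print(sorted(both - first))
```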

The second way to measure the difference is to use static analysis tools. Several popular Python tools measure various metrics, ranging from general code quality to specific properties such as duplicate code or complexity. You can measure the cyclomatic complexity of your code with either pygenie or pymetrics (see Resources).

Here is an example of what it looks like when we run pygenie on “clean” code that is relatively simple:

Listing 2. Pygenie output of cyclomatic complexity

% python pygenie.py complexity --verbose highlight.py
File: /Users/ngift/Documents/src/highlight.py
Type Name                                                      Complexity
-------------------------------------------------------------------------
M    HighlightDocumentOperations._create_snippit               3
M    HighlightDocumentOperations._reconstruct_document_string  3
M    HighlightDocumentOperations._doc_to_sentences             2
M    HighlightDocumentOperations._querystring_to_dict          2
M    HighlightDocumentOperations._word_frequency_sort          2
M    HighlightDocumentOperations.highlight_doc                 2
X    /Users/ngift/Documents/src/highlight.py                   1
C    HighlightDocumentOperations                               1
M    HighlightDocumentOperations.__init__                      1
M    HighlightDocumentOperations._custom_highlight_tag         1
M    HighlightDocumentOperations._score_sentences              1
M    HighlightDocumentOperations._multiple_string_replace      1

What is cyclomatic complexity? It is a software metric, developed by Thomas J. McCabe in 1976, for quantifying a program’s complexity. It counts the number of linearly independent paths through a piece of source code, so every branch adds to the score. According to McCabe, it is best to keep the complexity of a method below 10. One reason this matters is that research into human memory, most famously George Miller’s “magical number seven, plus or minus two,” suggests that people can hold only about 7 (plus or minus 2) items in short-term memory.
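Formally, McCabe defines the complexity of a control-flow graph as M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components; for a single function this works out to the number of decision points plus one. The sketch below applies that shortcut with Python’s `ast` module. Which node types count as decisions is an assumption here, and tools such as pygenie or pymetrics may score some constructs differently (for instance, this simplified version also counts nested functions inside their parent).

```python
import ast

# Branching statements counted as one decision point each (an assumption;
# real tools may weigh some constructs differently).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity for each function defined in source.

    Uses the shortcut M = decision points + 1. Nested functions are also
    counted inside their parent, which is a simplification.
    """
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.comprehension):
                    # Each 'for' clause plus each 'if' filter is a branch.
                    complexity += 1 + len(child.ifs)
                elif isinstance(child, ast.BoolOp):
                    # 'a and b and c' adds two extra paths.
                    complexity += len(child.values) - 1
            results[node.name] = complexity
    return results


SAMPLE = '''
def grade(score):
    if score > 90:
        return "A"
    elif score > 80 and score <= 90:
        return "B"
    for _ in range(3):
        pass
    return "C"
'''

print(cyclomatic_complexity(SAMPLE))  # {'grade': 5}
```

The `if`, the `elif`, the extra `and` operand, and the `for` loop each add one path to the base value of 1.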



If a developer is working on code that has 50 linearly independent paths, they are trying to keep track of roughly five times more than short-term memory can hold. Simpler methods that don’t tax a person’s short-term memory are easier to work with and tend to be less error-prone. A 2008 study by Enerjy found a strong correlation between cyclomatic complexity and faultiness: classes with a complexity of 11 had a 0.28 probability of being fault-prone, rising to 0.98 for classes with a complexity of 74.

As you can see from Listing 2, every method is extremely simple, with a complexity rating under 10, which is desirable according to McCabe. In my experience, I have seen “mega” functions written without tests that had complexity ratings over 140 and stretched over 1,200 lines. Suffice it to say, code like this is practically impossible to test: there is no reliable way to know it works, and refactoring it safely is nearly impossible. Had the author kept testing in mind and written the same logic with 100 percent test coverage, it is highly unlikely that it would have such a high complexity rating.
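Here is a small, hypothetical before-and-after (not taken from the article’s code) showing how keeping tests in mind tends to flatten code. The same sentence-selection logic is written first as one nested function, then split into small helpers. The behavior is identical, but `matches` and `fits` can each be tested in isolation, and the branching is spread across pieces instead of piling up in one place. Query terms are assumed to be lowercase.

```python
def pick_sentences_nested(sentences, terms, limit):
    """Before: one function that mixes filtering, matching, and budgeting."""
    picked = []
    total = 0
    for sentence in sentences:
        if sentence:
            words = sentence.lower().split()
            hit = False
            for term in terms:
                if term in words:
                    hit = True
            if hit:
                if total + len(sentence) <= limit:
                    picked.append(sentence)
                    total += len(sentence)
    return picked


def matches(sentence, terms):
    """True if any query term appears as a whitespace-separated word."""
    words = sentence.lower().split()
    return any(term in words for term in terms)


def fits(total, sentence, limit):
    """True if adding sentence keeps the running character total within limit."""
    return total + len(sentence) <= limit


def pick_sentences(sentences, terms, limit):
    """After: the same behavior, built from small, separately testable pieces."""
    picked, total = [], 0
    for sentence in sentences:
        if sentence and matches(sentence, terms) and fits(total, sentence, limit):
            picked.append(sentence)
            total += len(sentence)
    return picked
```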

A clean code hypothetical solution

Let’s now look at a complete source code example, with accompanying unit tests and functional tests, to see what it does and why this code is considered clean. One reasonable, strictly metric-based definition of clean code is that it meets the following requirements: it has close to 100 percent test coverage; every class and method has a cyclomatic complexity under 10; and it scores close to 10.0 with pylint. Here is an example of using nose to measure unit test and doctest coverage of the highlight module:
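Those three criteria can be written down as a simple check, roughly what a build script might enforce. This is a hypothetical sketch: the thresholds of 95 percent coverage and a pylint score of 9.0 are assumptions standing in for “close to 100 percent” and “close to 10.0,” and in practice the metric values would be read from the tools’ own reports.

```python
# Hypothetical thresholds: stand-ins for "close to 100 percent" coverage and
# "close to 10.0" from pylint. The complexity ceiling follows McCabe's 10.
MIN_COVERAGE = 95.0
MAX_COMPLEXITY = 10
MIN_PYLINT_SCORE = 9.0


def is_clean(coverage_percent, complexities, pylint_score):
    """Check the three metric criteria; return (passed, list of reasons)."""
    reasons = []
    if coverage_percent < MIN_COVERAGE:
        reasons.append("coverage %.1f%% is below %.1f%%"
                       % (coverage_percent, MIN_COVERAGE))
    # complexities maps function or method names to cyclomatic complexity.
    too_complex = sorted(name for name, value in complexities.items()
                         if value >= MAX_COMPLEXITY)
    if too_complex:
        reasons.append("complexity of 10 or more: " + ", ".join(too_complex))
    if pylint_score < MIN_PYLINT_SCORE:
        reasons.append("pylint score %.2f is below %.2f"
                       % (pylint_score, MIN_PYLINT_SCORE))
    return (not reasons, reasons)


passed, reasons = is_clean(100.0, {"_create_snippit": 3, "highlight_doc": 2}, 8.12)
print(passed, reasons)
# False ['pylint score 8.12 is below 9.00']
```

The sample call plugs in numbers reported for highlight.py elsewhere in this article (100 percent coverage in Listing 3, complexities from Listing 2, and the 8.12 pylint score in Listing 7); under these assumed thresholds, only the pylint criterion would need attention.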

Listing 3. Running nosetests with coverage reporting: 100 percent coverage

% nosetests -v --with-coverage --cover-package=highlight --with-doctest \
  --cover-erase --exe
Doctest: highlight.HighlightDocumentOperations._custom_highlight_tag ... ok
test_functional.test_snippit_algorithm ... ok
test_custom_highlight_tag (test_highlight.TestHighlight) ... ok
Consumes the generator, and then verifies the result[0] ... ok
Verifies highlighted text is what we expect ... ok
test_multi_string_replace (test_highlight.TestHighlight) ... ok
Verifies the yielded results are what is expected ... ok

Name        Stmts   Exec  Cover   Missing
-----------------------------------------
highlight      71     71   100%
-----------------------------------------
Ran 7 tests in 4.223s

OK

As you can see from the output above, the nosetests command was run with several options, and the highlight.py module has 100 percent test coverage. The only option really worth noting is --cover-package=highlight, which tells nose to report coverage only for the specified module. This is very useful for limiting the coverage report to the modules or packages you actually want to observe. One experiment you may want to try is to download the source code from this article and comment out some of the tests to see how the coverage report changes.
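The `--with-doctest` flag is why the `_custom_highlight_tag` docstring example appears in the test run above. As a hedged, standard-library-only sketch of the same idea outside nose, the code below uses `doctest.DocTestFinder` and `doctest.DocTestRunner` to run the examples embedded in a docstring. `custom_highlight_tag` is a standalone copy of the method made for this illustration, and `run_doctests` is a hypothetical helper.

```python
import doctest


def custom_highlight_tag(phrase, start="<strong>", end="</strong>"):
    """Wrap phrase in an opening and a closing tag.

    >>> custom_highlight_tag("foo")
    '<strong>foo</strong>'
    >>> custom_highlight_tag("foo", start="[", end="]")
    '[foo]'
    """
    return "{0}{1}{2}".format(start, phrase, end)


def run_doctests(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples a namespace in which the function name resolves.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


failed, attempted = run_doctests(custom_highlight_tag)
print(failed, attempted)
```

Because the examples double as documentation, a failing doctest signals that the docstring and the code have drifted apart.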

Listing 4. highlight.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
:mod:`highlight` -- Highlight Methods
=====================================

.. module:: highlight
   :platform: Unix, Windows
   :synopsis: highlight document snippets that match a query.
.. moduleauthor:: Noah Gift

Requirements::
    1.  You will need to install the nltk library to run this code.
        http://www.nltk.org/download
    2.  You will need to download the data for the nltk:
        See http://www.nltk.org/data::

            import nltk
            nltk.download()

"""

import re
import logging

import nltk

#Globals
logging.basicConfig()
LOG = logging.getLogger("highlight")
LOG.setLevel(logging.INFO)


class HighlightDocumentOperations(object):
    """Highlight Operations for a Document"""

    def __init__(self, document=None, query=None):
        """
        Kwargs:
            document (str):
            query (str):

        """
        self._document = document
        self._query = query

    @staticmethod
    def _custom_highlight_tag(phrase, start="<strong>", end="</strong>"):
        """Injects an open and close highlight tag after a word

        Args:
            phrase (str) - A word or phrase.
        Kwargs:
            start (str) - An opening tag.  Defaults to <strong>
            end (str) - A closing tag.  Defaults to </strong>
        Returns:
            (str) word or phrase with custom opening and closing tags

        >>> h = HighlightDocumentOperations()
        >>> h._custom_highlight_tag("foo")
        '<strong>foo</strong>'

        """
        tagged_phrase = "{0}{1}{2}".format(start, phrase, end)
        return tagged_phrase

    def _doc_to_sentences(self):
        """Takes a string document and converts it into a list of sentences

        Unfortunately, this approach might be a tad naive for production
        because some segments that are split on a period are really an
        abbreviation, and to make things even more complicated, an
        abbreviation can also be the end of a sentence::

            http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html

        Returns:
            (generator) A generator object of a tokenized sentence tuple,
            with the list position of sentence as the first portion of
            the tuple, such as:

            (0, "This was the first sentence")

        """
        tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
        sentences = tokenizer.tokenize(self._document)
        for sentence in enumerate(sentences):
            yield sentence

    @staticmethod
    def _score_sentences(sentence, querydict):
        """Creates a scoring system for each sentence by substitution analysis

        Tokenizes each sentence, counts characters in sentence, and pass it
        back as nested tuple

        Returns:
            (tuple) - (score (int), (count (int), position (int),
                      raw sentence (str))

        """
        position, sentence = sentence
        count = len(sentence)
        regex = re.compile('|'.join(map(re.escape, querydict)))
        score = len(re.findall(regex, sentence))
        processed_score = (score, (count, position, sentence))
        return processed_score

    def _querystring_to_dict(self, split_token="+"):
        """Converts query parameters into a dictionary

        Returns:
            (dict) - dparams, a dictionary of query parameters

        """
        params = self._query.split(split_token)
        dparams = dict([(key, self._custom_highlight_tag(key)) for
                        key in params])
        return dparams

    @staticmethod
    def _word_frequency_sort(sentences):
        """Sorts sentences by score frequency, yields sorted result

        This will yield the highest score count items first.

        Args:
            sentences (list) - a nested tuple inside of list

            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))

        """
        sentences.sort()
        while sentences:
            yield sentences.pop()

    def _create_snippit(self, sentences, max_characters=175):
        """Creates a snippet from a sentence while keeping it under max_chars

        Returns a sorted list with max characters.  The sort is an attempt
        to rebuild the original document structure as close as possible,
        with the new sorting by scoring and the limitation of max_chars.

        Args:
            sentences (list) - scored sentences to turn into a snippit
            max_characters (int) - optional max characters of snippit

        Returns:
            snippit (list) - returns a sorted list with a nested tuple that
            has the first index holding the original position of the list::

            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))

        """
        snippit = []
        total = 0
        for sentence in self._word_frequency_sort(sentences):
            LOG.debug("Creating snippit: %s", sentence)
            score, (count, position, raw_sentence) = sentence
            total += count
            if total < max_characters:
                #position now gets converted to index 0 for sorting later
                snippit.append(((position), score, count, raw_sentence))

        #try to reassemble document by original order by doing a simple sort
        snippit.sort()
        return snippit

    @staticmethod
    def _multiple_string_replace(string_to_replace, dict_patterns):
        """Performs a multiple replace in a string with dict pattern.

        Borrowed from Python Cookbook.

        Args:
            string_to_replace (str) - String to be multi-replaced
            dict_patterns (dict) - A dict full of patterns

        Returns:
            (str) - Multiple replaced string.

        """
        regex = re.compile('|'.join(map(re.escape, dict_patterns)))

        def one_xlat(match):
            """Closure that is called repeatedly during multi-substitution.

            Args:
                match (SRE_Match object)
            Returns:
                partial string substitution (str)

            """
            return dict_patterns[match.group(0)]

        return regex.sub(one_xlat, string_to_replace)

    def _reconstruct_document_string(self, snippit, querydict):
        """Reconstructs string snippit, build tags, and return string

        A helper function for highlight_doc.

        Args:
            snippit (list) - A list of nested tuples, containing this pattern::

            (0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))

            querydict (dict) - A dict full of patterns

        Returns:
            (str) The most relevant snippet with the query terms highlighted.

        """
        snip = []
        for entry in snippit:
            score = entry[1]
            sent = entry[3]
            #if we have matches, now do the multi-replace
            if score:
                sent = self._multiple_string_replace(sent, querydict)
            snip.append(sent)
        highlighted_snip = " ".join(snip)

        return highlighted_snip

    def highlight_doc(self):
        """Finds the most relevant snippit with the query terms highlighted

        Returns:
            (str) The most relevant snippet with the query terms highlighted.

        """
        #tokenize to sentences, and convert query to a dict
        sentences = self._doc_to_sentences()
        querydict = self._querystring_to_dict()

        #process and score sentences
        scored_sentences = []
        for sentence in sentences:
            scored = self._score_sentences(sentence, querydict)
            scored_sentences.append(scored)

        #fit into max characters, and sort by original position
        snippit = self._create_snippit(scored_sentences)
        #assemble back into string
        highlighted_snip = self._reconstruct_document_string(snippit,
                                                             querydict)

        return highlighted_snip
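Because Listing 4 needs NLTK and its data to run, here is a hedged, standard-library-only sketch of the same pipeline for experimenting: split the document into sentences, score each by query-term matches, keep the highest-scoring sentences while the running character total stays under `max_characters`, restore the original order, and wrap matched terms in tags. The function name `highlight_snippet` is hypothetical, and the regex sentence splitter is a deliberate simplification swapped in for NLTK’s Punkt tokenizer, so abbreviations such as “Dr.” will split incorrectly.

```python
import re


def highlight_snippet(document, query, max_characters=175,
                      start="<strong>", end="</strong>"):
    """Stdlib-only sketch of the scoring and highlighting steps in Listing 4.

    Assumption: sentences are split with a naive regex (end punctuation
    followed by whitespace) instead of NLTK's Punkt tokenizer.
    """
    terms = query.split("+")
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    # Score = number of query-term occurrences; keep length and position too.
    scored = [(len(pattern.findall(sentence)), len(sentence), position, sentence)
              for position, sentence in enumerate(sentences)]

    # Highest scores first; a sentence is kept only while the running
    # character total stays under the limit (as in _create_snippit).
    picked, total = [], 0
    for score, count, position, sentence in sorted(scored, reverse=True):
        total += count
        if total < max_characters:
            picked.append((position, score, sentence))

    def tag(match):
        """Wrap one matched query term in the opening and closing tags."""
        return start + match.group(0) + end

    # Restore original document order, highlighting only sentences that matched.
    picked.sort()
    return " ".join(pattern.sub(tag, sentence) if score else sentence
                    for position, score, sentence in picked)


doc = "I like thin crust. This deep dish pizza is great! Parking is hard."
print(highlight_snippet(doc, "deep+dish+pizza", max_characters=60))
# I like thin crust. This <strong>deep</strong> <strong>dish</strong> <strong>pizza</strong> is great!
```

Sorting the scored tuples in descending order yields them in the same order as sorting ascending and popping from the end, which is what `_word_frequency_sort` does.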

Listing 5. test_highlight.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Tests this query searches a document, highlights a snippit and returns it
http://www.example.com/search?finddesc=deep+dish+pizza&ns=1&rpp=10&findloc=\
San+Francisco%2C+CA

Contains both unit and functional tests.
"""

import unittest

from highlight import HighlightDocumentOperations


class TestHighlight(unittest.TestCase):

    def setUp(self):
        self.document = """
Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza = I've had better.
The crust/dough was just way too effin' dry for me.\
Yes, I know what 'cornmeal' is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings = spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all.
Service = friendly enough at the counters.
Decor = freakin' dark. I'm not sure how people can see their food.
Parking = a real pain. Good luck.
"""
        self.query = "deep+dish+pizza"
        self.hdo = HighlightDocumentOperations(self.document, self.query)

    def test_custom_highlight_tag(self):
        actual = self.hdo._custom_highlight_tag("foo",
                                                start="[BAR]",
                                                end="[ENDBAR]")
        expected = "[BAR]foo[ENDBAR]"
        self.assertEqual(actual, expected)

    def test_query_string_to_dict(self):
        """Verifies the yielded results are what is expected"""
        result = self.hdo._querystring_to_dict()
        expected = {"deep": "<strong>deep</strong>",
                    "dish": "<strong>dish</strong>",
                    "pizza": "<strong>pizza</strong>"}
        self.assertEqual(result, expected)

    def test_multi_string_replace(self):
        query = """pizza = I've had better"""
        expected = """<strong>pizza</strong> = I've had better"""
        query_dict = self.hdo._querystring_to_dict()
        result = self.hdo._multiple_string_replace(query, query_dict)
        self.assertEqual(expected, result)

    def test_doc_to_sentences(self):
        """Consumes the generator, and then verifies the result[0]"""
        results = []
        expected = (0, '\nReview for their take-out only.')
        for sentence in self.hdo._doc_to_sentences():
            results.append(sentence)

        self.assertEqual(results[0], expected)

    def test_highlight(self):
        """Verifies highlighted text is what we expect"""
        expected = """Tried their large Classic (sausage, mushroom, peppers and onions) \
<strong>deep</strong> <strong>dish</strong>;and their large Pesto Chicken thin crust \
<strong>pizza</strong>s."""
        actual = self.hdo.highlight_doc()
        self.assertEqual(expected, actual)

    def tearDown(self):
        del self.query
        del self.hdo
        del self.document

if __name__ == '__main__':
    unittest.main()

Listing 6. test_functional_highlight.py

"""Functional Test That Performs Some Basic Sanity Checks"""

from highlight import HighlightDocumentOperations


def test_snippit_algorithm():
    document1 = """
This place has awesome deep dish pizza. I have been getting delivery through
Waiters on wheels for years. It is classic, deep dish Chicago style pizza.
Now I found out they also have half-baked to pick-up and cook at home.
This is a great benefit. I am having it tonight. Yum.
"""

    document2 = """Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza = I've had better.
The crust/dough was just way too effin' dry for me.\
Yes, I know what 'cornmeal' is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings = spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all.
Service = friendly enough at the counters.
Decor = freakin' dark. I'm not sure how people can see their food.
Parking = a real pain. Good luck."""

    h1 = HighlightDocumentOperations(document1, "deep+dish+pizza")
    actual = h1.highlight_doc()
    print("Raw Document1: %s" % document1)
    print("Formatted Document1: %s" % actual)
    assert len(actual) < 500
    assert "<strong>" in actual

    h2 = HighlightDocumentOperations(document2, "deep+dish+pizza")
    actual = h2.highlight_doc()
    print("Raw Document2: %s" % document2)
    print("Formatted Document2: %s" % actual)
    assert len(actual) < 500
    assert "<strong>" in actual

if __name__ == "__main__":
    test_snippit_algorithm()

If you would like to run the code above, you will need to install the Natural Language Toolkit (NLTK) and download the NLTK data according to its instructions. Since this article is about how the code sample was created and tested, rather than about the code itself, I won’t go into detail explaining what it does. Instead, let’s finish by running the static code analysis tool pylint on our source code:

Listing 7. Pylint

% pylint highlight.py
No config file found, using default configuration
************* Module highlight
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'unicode' has no 'tokenize' member (but some types could not be inferred)
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'ContextFreeGrammar' has no 'tokenize' member (but some types could not be inferred)
W:108:HighlightDocumentOperations._score_sentences: Used builtin function 'map'
W:192:HighlightDocumentOperations._multiple_string_replace: Used builtin function 'map'
R: 34:HighlightDocumentOperations: Too few public methods (1/2)

Report
======
69 statements analysed.

Global evaluation
-----------------
Your code has been rated at 8.12/10 (previous run: 8.12/10)

The code scored 8.12 out of 10 and lost points for a few items. Pylint is configurable, so you will very likely need to tune it to meet the needs of your project; refer to the official pylint documentation (see Resources). In this specific example, the two errors on line 89 can be attributed to the external nltk library, and the two warnings could be silenced by changing pylint’s configuration. In general, you should not allow pylint errors in your source code, but sometimes, as in the example above, you may need to make an executive decision. Pylint isn’t a perfect tool, but I have found it very useful in the real world.
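Besides project-wide settings (the `disable=` option in a pylintrc file), pylint also honors inline `# pylint: disable=...` comments, which keep a suppression close to the code it excuses. The class below is a hypothetical stand-in, not the article’s code. R0903 (too-few-public-methods) and W0141 (bad-builtin, the “Used builtin function” warning, which newer pylint releases only emit through an optional extension) are, to my understanding, the IDs behind two of the messages in Listing 7; `pylint --list-msgs` shows the IDs for your installed version.

```python
import re


class QueryPattern(object):  # pylint: disable=R0903
    """Hypothetical class with only one public method, which pylint flags as R0903."""

    def __init__(self, terms):
        self.terms = terms

    def to_regex(self):
        """Build one alternation regex that matches any of the terms."""
        # pylint: disable=W0141
        return re.compile("|".join(map(re.escape, self.terms)))


print(QueryPattern(["deep", "dish"]).to_regex().findall("deep dish pizza"))
# ['deep', 'dish']
```

An alternative to suppressing W0141 is to rewrite the call as a generator expression, `"|".join(re.escape(term) for term in self.terms)`, which avoids the warning altogether.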

Conclusion

In this article, we explored how merely thinking about testing influences the structure of software, and how a lack of thought toward testing can prove fatally harmful to a project. We showed a complete code example that included both functional and unit tests, and we ran it through code coverage analysis with nose and two static analysis tools, pylint and pygenie. One thing we didn’t have time to cover was how to automate all of this with some form of continuous integration testing. Fortunately, this is quite simple with Hudson, the open source Java™ continuous integration system. I encourage you to consult the Hudson documentation (see Resources) and experiment with setting up an automated build for your project that runs all of your tests, including static code analysis.

Finally, testing isn’t a panacea, and neither are static analysis tools. Software development is hard work. To have any chance of success, we must always be mindful of the real goal: not only to solve a problem, but also to create something we can prove works. If you agree with this premise, then overly complex code, arrogance in design, and a lack of respect for the power of Python directly interfere with that goal.

Thanks to Kennedy Behrman, of Imagemovers Digital, for the technical review of this article.