Postcode validation is a requirement that comes up in a lot of my UK-based client projects. Parsing and linting UK postcodes is rife with edge cases.

Postcodes do change from time to time. Users can be unaware that their postcode has changed, or they could have submitted it before it was decommissioned. Mail might still be delivered to decommissioned postcodes, and geographic boundary information can exist for them, so they can still be useful.

Network requests to validate postcodes can add a lot of overhead when you're batch processing. New postcodes aren't guaranteed to be in every 3rd-party database either, so there is an error rate to take into account. Address lookup services, which turn a flat number or building name plus a postcode into a full address (saving time and avoiding spelling mistakes), often charge for querying their database, so it's worth minimising calls to them.

Perfect is the enemy of good, and sometimes just knowing that a postcode fits the format of a UK postcode is enough for certain postcode-related functionality. Linting a postcode before checking it against a 3rd-party database can reduce costs as well.
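As a rough illustration of linting before a paid lookup, the sketch below normalises user input and checks it against a deliberately loose pattern. The pattern and the names `normalise`, `looks_like_uk_postcode` and `split_postcode` are assumptions made for this example, not taken from any of the libraries discussed here. It only checks the general shape, so it will reject some codes that exist (an outward code made of three letters, for instance) and accept some that were never issued.

```python
import re

# Loose shape check (an assumption for illustration, not an official rule):
# outward code = 1-2 letters, a digit, then an optional digit or letter;
# inward code  = a digit followed by two letters.
POSTCODE_SHAPE = re.compile(r'^[A-Z]{1,2}[0-9][0-9A-Z]?[0-9][A-Z]{2}$')


def normalise(raw):
    """Upper-case the input and drop all whitespace."""
    return ''.join(raw.split()).upper()


def looks_like_uk_postcode(raw):
    """Return True when the input has the general shape of a UK postcode."""
    return POSTCODE_SHAPE.match(normalise(raw)) is not None


def split_postcode(raw):
    """Return (outward, inward); the inward code is the last 3 characters."""
    code = normalise(raw)
    if POSTCODE_SHAPE.match(code) is None:
        raise ValueError('Does not look like a UK postcode: %r' % raw)
    return code[:-3], code[-3:]
```

Only input that passes `looks_like_uk_postcode` would then be worth sending on to a lookup service that charges per call.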

There are a lot of regex snippets and libraries for parsing UK postcodes. I took a look at a couple of them to see if they could stand up to a database of 2.5 million current and past UK postcodes.

Rob Cowie's Postcode library

The first I looked at was postcode by Rob Cowie. I quickly found that two digits in the outward code (the first two to four characters of a UK postcode) caused the library to raise a TypeError:

    $ pip install -e git+https://github.com/robcowie/postcode.git#egg=postcode

    >>> from postcode import uk
    >>> uk.validate('s11 7ty')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../postcode/uk.py", line 82, in validate
        parts[0] = parts[0][0]
    TypeError: 'tuple' object does not support item assignment

I raised an issue and began looking around for another library.
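The message in the traceback points at the cause: `parts` was a tuple when the library tried to assign to its first element, and tuples cannot be modified in place. The snippet below reproduces that kind of failure outside the library. The values in `parts` are made-up stand-ins, and `first_char_of_first_part` is a hypothetical helper that only mimics the failing line.

```python
def first_char_of_first_part(parts):
    """Mimic the failing line: replace the first element with its first character."""
    parts[0] = parts[0][0]
    return parts


# Hypothetical stand-in values; what the library really holds in `parts`
# is not known here.
try:
    first_char_of_first_part(('S1', '1'))
except TypeError as error:
    message = str(error)  # tuples do not support item assignment

# The same assignment succeeds once the tuple is copied into a list.
as_list = first_char_of_first_part(list(('S1', '1')))
```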

Simon Hayward's UK Postcode Parser

The next library I looked at was a fork of ukpostcodeparser by Simon Hayward.

    $ pip install -e git+https://github.com/simonhayward/ukpostcodeparser.git#egg=ukpostcodeparser

I downloaded a list of UK postcodes and ran all of them through the parser to see if it raised any exceptions:

    $ curl -O http://www.doogal.co.uk/files/postcodes.zip
    $ unzip postcodes.zip

    from ukpostcodeparser import parse_uk_postcode

    """
    The layout of postcodes.csv looks like the following:

    AB1 0AD,57.10056,-2.248342,385053,...
    AB1 0AE,57.084447,-2.255708,384600,...
    AB1 0AF,57.096659,-2.258103,384460,...
    """
    with open('postcodes.csv') as csv_file:
        for line in csv_file:
            pieces = line.strip().split(',')

            # Skip lines without a postcode plus at least one more column
            if len(pieces) < 2:
                print('line invalid %s' % line)
                continue

            # Remove the space between the outward and inward codes
            postcode = pieces[0].replace(' ', '')

            try:
                _postcode = parse_uk_postcode(postcode)
            except Exception as error:
                print('%s %s' % (error, postcode))
                continue

            if _postcode is None:
                print('Invalid postcode %s' % postcode)

The CSV file contained 2,545,662 postcodes and of them 7,085 came back as invalid.

    $ wc -l postcodes.csv
    2545662 postcodes.csv
    $ python check.py > results
    $ wc -l results
    7085 results

I took a sampling of the invalid postcodes to see what they looked like:

    $ sort --random-sort results | head
    Invalid postcode NPT6ZE
    Invalid postcode W1R5HD
    Invalid postcode NPT7HS
    Invalid postcode W1X8NJ
    Invalid postcode NPT8AD
    Invalid postcode NPT5LU
    Invalid postcode W1M0BN
    Invalid postcode W1R0DS
    Invalid postcode NPT1JW
    Invalid postcode NPT2TW

The NPT outward code for Newport is no longer in use so I'm not so concerned with that one, but W1R covers part of central London. I experimented with a few combinations of postcodes and found that if a letter came after any digits in the outward code then the postcode would be seen as invalid by the library even though it is valid.
For example, "Golden Square, London, W1R 3AD":

    >>> parse_uk_postcode('w1r3ad')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../ukpostcodeparser/parser.py", line 129, in parse_uk_postcode
        raise ValueError('Invalid postcode')
    ValueError: Invalid postcode

But then I tried another postcode, this one for "216 Oxford Street, London, W1D 1LA" and it did work:

    >>> parse_uk_postcode('W1D1LA')
    ('W1D', '1LA')

Before looking to patch the library I wanted to see if there were any other obvious solutions.
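One way to see which outward codes the parser rejects, rather than relying on a random sample, is to tally the failures. The sketch below assumes each line of the results file ends with a space-free postcode, as in the sample output above, and relies on the inward code always being the last three characters. `tally_outward_codes` is a name made up for this example.

```python
from collections import Counter


def tally_outward_codes(lines):
    """Count rejected postcodes per outward code.

    Each line is expected to end with a postcode that has no internal
    space, e.g. 'Invalid postcode W1R5HD'.
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        postcode = tokens[-1]
        counts[postcode[:-3]] += 1  # drop the 3-character inward code
    return counts


# A few lines taken from the sample of invalid postcodes.
sample = [
    'Invalid postcode NPT6ZE',
    'Invalid postcode W1R5HD',
    'Invalid postcode NPT7HS',
    'Invalid postcode W1X8NJ',
]
print(tally_outward_codes(sample).most_common())
```

Passing the open results file instead of `sample` would give the counts for the full run.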