I’ve always wanted to add support for parsing exponentiation using superscript characters, instead of clunky infix operators like ‘^’ or ‘**’. If I want 2 raised to the tenth power, 2**10 just doesn’t look as good as 2¹⁰. Parsing and evaluating these exponents is simple enough with pyparsing, but first the digits need to be convertible to regular int values.

The regular digits 0 through 9 work nicely enough. They are contiguous and ordered in the ASCII sequence, and Python's int() converts them directly, so we can write int('9') and get the value 9. Here we look at all 10 numeric characters:

digits = "0123456789"
for i in sorted(digits):
    print(repr(i), ord(i), end=' ')
    try:
        print(int(i))
    except ValueError:
        print('-error-')

And we get a friendly predictable listing of characters, ordinal values, and converted int values:

'0' 48 0
'1' 49 1
'2' 50 2
'3' 51 3
'4' 52 4
'5' 53 5
'6' 54 6
'7' 55 7
'8' 56 8
'9' 57 9

But the superscript digits are more difficult. We use the same for loop, but with this changed line:

digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"

Now we get these results:

'²' 178 -error-
'³' 179 -error-
'¹' 185 -error-
'⁰' 8304 -error-
'⁴' 8308 -error-
'⁵' 8309 -error-
'⁶' 8310 -error-
'⁷' 8311 -error-
'⁸' 8312 -error-
'⁹' 8313 -error-

What the heck?! They aren’t even in sorted order! Only the 1, 2, and 3 superscripts are even in the 8-bit 0-255 range, and even they aren’t in order.
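To see where the scattering comes from, here is a small sketch using only the standard library. It prints each superscript's code point and Unicode name, along with the results of str.isdigit() and str.isdecimal(). The superscripts for 1, 2, and 3 sit in the Latin-1 range, while the rest live in a separate block much further up. As far as I can tell, int() only accepts characters that Unicode classifies as decimal digits, which the superscripts are not, and that would explain the errors.

```python
import unicodedata

superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"

for ch in superscript_digits:
    print(
        repr(ch),
        "U+{:04X}".format(ord(ch)),          # code point in hex
        unicodedata.name(ch),                # official Unicode name
        "isdigit={}".format(ch.isdigit()),   # has a digit value
        "isdecimal={}".format(ch.isdecimal()),  # usable by int()
    )
```

Every superscript reports isdigit=True but isdecimal=False, so Python knows these characters carry a digit value even though int() will not convert them.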

So looking ahead to when we eventually parse and gather these exponent strings, we need to roll our own int()-style function to convert them to usable Python ints.

I suggest we build a dict for mapping each superscript to its int value:

superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"
super_int_map = {digit: value for value, digit in enumerate(superscript_digits)}

Enumerating over the superscript digits in order gives us (int, digit) tuples, which make it easy to convert to a dict using a dict comprehension.

Now we just need a function that will convert a string made up of one or more superscript characters, and give back an int value, for example converting '⁷⁸⁹' to 789. This code is a typical exercise in beginning programming classes, so it should not be much of a surprise.

def super_to_int(superscript_str):
    ret = 0
    # iterate over each digit in the input, get its value (from 0-9)
    # and add it to the current running total, after multiplying
    # the running total by 10
    for chr in superscript_str:
        ret = ret * 10 + super_int_map[chr]
    return ret
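As an aside, the same conversion can be sketched without the explicit loop, by translating the superscripts to ordinary digits and handing the result to int(). This is only an alternative for comparison, not the approach used in the rest of this article, and the names to_ascii_digits and super_to_int_via_translate are made up for this example.

```python
superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"

# translation table mapping each superscript to its plain ASCII digit
to_ascii_digits = str.maketrans(superscript_digits, "0123456789")

def super_to_int_via_translate(superscript_str):
    # rewrite the superscripts as regular digits, then let int() do the work
    return int(superscript_str.translate(to_ascii_digits))

print(super_to_int_via_translate("⁷⁸⁹"))
```

One difference in behavior: characters that are not in the table pass through translate() unchanged, so bad input surfaces as a ValueError from int() instead of a KeyError from the dict lookup.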

With that in our back pocket, we can start writing a parser, knowing that when it comes time to convert superscript exponents to ints, our conversion function is all ready to go.

Pyparsing comes ready-made with a number of helpful classes and objects for building up parsers a piece at a time. We can begin by writing the expression that will parse an integer superscript, using pyparsing's Word class. A Word matches a group of one or more characters from a given set, in this case the superscript digits.

import pyparsing as pp

exponent = pp.Word(superscript_digits).setName("integer exponent")

The reason for naming this expression becomes clear when we run some simple tests on it, using the runTests() method that is provided on all pyparsing parse expressions:

exponent.runTests("""\
    ¹²³
    ¹⁰
    10
    """)

Giving

¹²³
['¹²³']

¹⁰
['¹⁰']

10
^
FAIL: Expected integer exponent (at char 0), (line:1, col:1)

Without the name, we would have gotten a more cryptic-looking message like “Expected W:(⁰¹²³…) (at char 0), (line:1, col:1)”.

We will actually want to parse and evaluate strings like "2¹⁰", so we also need an expression for a regular integer. Integer expressions are so common that pyparsing provides a standard expression in its pyparsing_common namespace class, so we can just use that.

integer = pp.pyparsing_common.integer
integer.runTests("""\
    2
    42
    4294967296
    """)

Giving

2
[2]

42
[42]

4294967296
[4294967296]

There is a subtle difference from the previous tests – these values have already been converted to ints! There are no quotation marks around the parsed values, as you see above when we tested out our exponent expression. So when we want to perform the final exponentiation operation, the base number will already be in int form.

pyparsing does this using a parse-time callback, called a parse action. Each expression in the parser can have one or more parse actions attached to it, for the purposes of conversion, validation, data structuring, or whatever. We will use this same feature to convert our superscript exponents to ints.

Pyparsing is pretty flexible in letting you define the arguments passed to parse actions. The most common is to just pass the list of parsed tokens. For this particular expression the list is always just 1 element long, since the expression we are acting on parses just a single word of superscript digits.

def convert_parsed_exponent(tokens):
    return super_to_int(tokens[0])

exponent.addParseAction(convert_parsed_exponent)

Rerunning our previous tests now gives:

¹²³
[123]

¹⁰
[10]

10
^
FAIL: Expected integer exponent (at char 0), (line:1, col:1)

Now the first two parse correctly, but we are still stumbling over plain old “10”.
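On the flexibility in parse action arguments mentioned earlier: besides the tokens-only form, pyparsing also accepts a three-argument form that receives the original string, the match location, and the tokens. Here is a minimal sketch of both forms. The function names count_digits and describe_match are invented for this illustration, and I attach them to copies of the expression so that they do not interfere with each other.

```python
import pyparsing as pp

superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"
exponent = pp.Word(superscript_digits)

# tokens-only form: the most common signature
def count_digits(tokens):
    return len(tokens[0])

# full form: original string, location of the match, and the tokens
def describe_match(s, loc, tokens):
    return "{!r} found at offset {} of {!r}".format(tokens[0], loc, s)

digit_counter = exponent.copy().setParseAction(count_digits)
describer = exponent.copy().setParseAction(describe_match)

print(digit_counter.parseString("¹²³")[0])
print(describer.parseString("¹⁰")[0])
```

Whatever a parse action returns replaces the parsed tokens, which is the same mechanism convert_parsed_exponent relies on.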

To handle our expressions like "2¹⁰", we define another expression, this time combining the expressions we already have. We'll allow for our parser to handle integers with or without exponents:

raised_number = integer + pp.Optional(exponent, default=1)

We simply use the ‘+’ operator to show that these two expressions should occur one after the next. We wrap our exponent expression using the pyparsing Optional class, so that pyparsing won’t complain when parsing values that have no exponent. In the event that there is no exponent, we would still like a default value to be given. In this case, a logical default exponent if none is explicitly given is 1, since any number raised to 1 is just that same number.

Testing out our combined expression shows that we are getting pretty close:

raised_number.runTests("""\
    2¹⁰
    10³
    2³²
    10⁰
    10
    """)

Gives

2¹⁰
[2, 10]

10³
[10, 3]

2³²
[2, 32]

10⁰
[10, 0]

10
[10, 1]

The last step will be to do the computation of the actual exponentiation operation. But now that we are used to using parse actions, a second one added to the raised_number expression does the job.

def raise_to_power(t):
    return t[0]**t[1]

raised_number.addParseAction(raise_to_power)
raised_number.runTests("""\
    2¹⁰
    10³
    2³²
    10⁰
    10
    """)

Gives the desired results:

2¹⁰
[1024]

10³
[1000]

2³²
[4294967296]

10⁰
[1]

10
[10]

Note that raise_to_power() has no code in it to convert the tokens to ints. This is because the two expressions that make up a raised_number have each already converted their strings to ints, so the resulting parsed tokens are no longer just strings, but ints. Similarly, the code that performs conversion from superscript string to int has no exception handling in case of bad input. Why? Because the input has already been screened in the parser so that only valid superscript strings are sent to the parse action.

Here are some ideas for other expressions that this parser does not currently handle:

-1³
-1²
(4-5)³
(1/2)³
1/2¹⁰
2⁻¹⁰
6.02×10²³
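As one possible direction for the 2⁻¹⁰ case, here is a rough sketch that allows an optional superscript minus sign in front of the exponent digits. It is only one way to go about it, and it leaves the other cases in the list untouched. The names superscript_minus, signed_exponent, and super_to_signed_int are my own inventions for this example.

```python
import pyparsing as pp

superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"
superscript_minus = "⁻"  # U+207B SUPERSCRIPT MINUS

def super_to_signed_int(tokens):
    text = tokens[0]
    sign = -1 if text.startswith(superscript_minus) else 1
    value = 0
    for ch in text.lstrip(superscript_minus):
        # a digit's position in superscript_digits is its numeric value
        value = value * 10 + superscript_digits.index(ch)
    return sign * value

# Combine requires the sign and digits to be adjacent, and returns one string
signed_exponent = pp.Combine(
    pp.Optional(superscript_minus) + pp.Word(superscript_digits)
).setName("signed integer exponent")
signed_exponent.addParseAction(super_to_signed_int)

raised_number = pp.pyparsing_common.integer + pp.Optional(signed_exponent, default=1)
raised_number.addParseAction(lambda t: t[0] ** t[1])

print(raised_number.parseString("2⁻¹⁰", parseAll=True)[0])
print(raised_number.parseString("2¹⁰", parseAll=True)[0])
```

Note that a negative exponent makes Python's ** operator return a float, so the results of this version are no longer always ints.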

Here is the full listing of code from this article:

# -*- coding: utf-8 -*-
import pyparsing as pp

def show_numeric_characters(digits):
    for i in sorted(digits):
        print(repr(i), ord(i), end=' ')
        try:
            print(int(i))
        except ValueError:
            print('-error-')

digits = "0123456789"
superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"
show_numeric_characters(digits)
show_numeric_characters(superscript_digits)

# a pyparsing expression to parse an exponent of superscript digits
exponent = pp.Word(superscript_digits).setName("integer exponent")
exponent.runTests("""\
    ¹²³
    ¹⁰
    10
    """)

# pyparsing-provided expression to parse plain old integers
integer = pp.pyparsing_common.integer
# alternate form, which could handle a leading '-' sign
# integer = pp.Regex(r'-?\d+').addParseAction(lambda t: int(t[0]))
integer.runTests("""\
    2
    42
    4294967296
    """)

# function to convert superscript string to int
super_int_map = dict((digit, value) for value, digit in enumerate(superscript_digits))

def super_to_int(superscript_str):
    ret = 0
    for chr in superscript_str:
        ret = ret * 10 + super_int_map[chr]
    return ret

# parse action to convert a string of superscript digits to an int
def convert_parsed_exponent(tokens):
    return super_to_int(tokens[0])

exponent.addParseAction(convert_parsed_exponent)
exponent.runTests("""\
    ¹²³
    ¹⁰
    10
    """)

# define an expression to parse an integer optionally raised
# to an exponent power
raised_number = integer + pp.Optional(exponent, default=1)

# add parse action to perform exponentiation
def raise_to_power(t):
    return t[0]**t[1]

raised_number.addParseAction(raise_to_power)

# take it for a spin!
raised_number.runTests("""\
    2¹⁰
    10³
    2³²
    10⁰
    10
    """)

Pyparsing is available through PyPI using pip install. The home for pyparsing has recently moved to GitHub: https://github.com/pyparsing/pyparsing

UPDATE: The unicodedata module in Python's standard library includes the digit() function, which will do the same job as our super_int_map, converting superscripts (and subscripts too!) to their corresponding int values. We still need the loop in super_to_int to handle multiple-digit superscripts, though.
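Following up on that update, here is a small sketch of how super_to_int might look with unicodedata.digit() standing in for the dict lookup. The loop is the same as before; only the per-character conversion changes.

```python
import unicodedata

def super_to_int(superscript_str):
    ret = 0
    for ch in superscript_str:
        # unicodedata.digit() returns the digit value of a single character
        ret = ret * 10 + unicodedata.digit(ch)
    return ret

print(super_to_int("⁷⁸⁹"))
print(unicodedata.digit("₃"))  # subscripts work too
```

Keep in mind that unicodedata.digit() also accepts ordinary digits and subscripts, so this version depends even more on the parser passing along only superscript strings.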