Now let’s add all kinds of weird expressions to show the power and flexibility of the algorithm. First, let’s add a high-priority, right associative function composition operator: . :

    fn infix_binding_power(op: char) -> (u8, u8) {
        match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            '.' => (6, 5),
            _ => panic!("bad op: {:?}", op),
        }
    }

Yup, it’s a single line! Note how the left side of the operator binds tighter, which gives us desired right associativity:

    let s = expr("f . g . h");
    assert_eq!(s.to_string(), "(. f (. g h))");

    let s = expr(" 1 + 2 + f . g . h * 3 * 4");
    assert_eq!(s.to_string(), "(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))");
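To see that the direction of the one-step bump is what decides associativity, here is a minimal sketch, separate from the article's parser. It assumes single-letter atoms and `.` as the only operator; the names `parse` and `group` and the string-based output are made up for this illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Toy Pratt loop: atoms are single characters, the only operator is '.'.
// `dot_bp` is the (left, right) binding power pair under test.
fn parse(tokens: &mut Peekable<Chars<'_>>, min_bp: u8, dot_bp: (u8, u8)) -> String {
    let mut lhs = tokens.next().expect("expected an atom").to_string();
    loop {
        // Stop at end of input (or at anything that is not '.').
        if tokens.peek() != Some(&'.') {
            break;
        }
        let (l_bp, r_bp) = dot_bp;
        if l_bp < min_bp {
            break;
        }
        tokens.next();
        let rhs = parse(tokens, r_bp, dot_bp);
        lhs = format!("(. {} {})", lhs, rhs);
    }
    lhs
}

// Hypothetical convenience wrapper: strips whitespace and parses from power 0.
fn group(input: &str, dot_bp: (u8, u8)) -> String {
    let compact: String = input.chars().filter(|c| !c.is_whitespace()).collect();
    let mut tokens = compact.chars().peekable();
    parse(&mut tokens, 0, dot_bp)
}

#[test]
fn bump_direction_decides_associativity() {
    // Left side binds tighter: right associative.
    assert_eq!(group("f . g . h", (6, 5)), "(. f (. g h))");
    // Right side binds tighter: left associative.
    assert_eq!(group("f . g . h", (5, 6)), "(. (. f g) h)");
}
```

Swapping the two numbers in the pair is the only change between the two assertions.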

Now, let’s add unary - , which binds tighter than binary arithmetic operators, but less tight than composition. This requires changes to how we start our loop, as we can no longer assume that the first token is an atom, and need to handle minus as well. But let the types drive us. First, we start with binding powers. As this is a unary operator, it really only has a right binding power, so, ahem, let’s just code this:

    fn prefix_binding_power(op: char) -> ((), u8) { // (1)
        match op {
            '+' | '-' => ((), 5),
            _ => panic!("bad op: {:?}", op),
        }
    }

    fn infix_binding_power(op: char) -> (u8, u8) {
        match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            '.' => (8, 7), // (2)
            _ => panic!("bad op: {:?}", op),
        }
    }

(1) Here, we return a dummy () to make it clear that this is a prefix, and not a postfix operator, and thus can only bind things to the right.

(2) Note, as we want to add unary - between . and * , we need to shift priorities of . by two. The general rule is that we use an odd priority as base, and bump it by one for associativity, if the operator is binary. For unary minus it doesn’t matter and we could have used either 5 or 6 , but sticking to odd is more consistent.
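The "odd base, bump by one" rule can be written down as a small helper. This is only a sketch of the numbering scheme, not something the parser in this article uses: `Assoc`, `binary_bp`, `prefix_bp` and the idea of a zero-based precedence `level` are all assumptions introduced for illustration.

```rust
#[derive(Clone, Copy)]
enum Assoc {
    Left,
    Right,
}

// Hypothetical helper: `level` counts precedence levels from 0 (loosest).
// Each level gets an odd base (1, 3, 5, ...) and the bump by one goes on
// the side that should bind tighter.
fn binary_bp(level: u8, assoc: Assoc) -> (u8, u8) {
    let base = 2 * level + 1;
    match assoc {
        Assoc::Left => (base, base + 1),
        Assoc::Right => (base + 1, base),
    }
}

// A prefix operator only has a right binding power; we stick to the odd base.
fn prefix_bp(level: u8) -> ((), u8) {
    ((), 2 * level + 1)
}

#[test]
fn reproduces_the_table_above() {
    assert_eq!(binary_bp(0, Assoc::Left), (1, 2)); // '+' and '-'
    assert_eq!(binary_bp(1, Assoc::Left), (3, 4)); // '*' and '/'
    assert_eq!(prefix_bp(2), ((), 5)); // unary '-'
    assert_eq!(binary_bp(3, Assoc::Right), (8, 7)); // '.'
}
```

With a helper like this, inserting a new level means renumbering levels rather than hand-editing every pair, at the cost of one more indirection.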

Plugging this into expr_bp , we get:

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                todo!()
            }
            t => panic!("bad token: {:?}", t),
        };
        ...
    }

Now, we only have r_bp and not l_bp , so let’s just copy-paste half of the code from the main loop? Remember, we use r_bp for recursive calls.

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                let rhs = expr_bp(lexer, r_bp);
                S::Cons(op, vec![rhs])
            }
            t => panic!("bad token: {:?}", t),
        };

        loop {
            let op = match lexer.peek() {
                Token::Eof => break,
                Token::Op(op) => op,
                t => panic!("bad token: {:?}", t),
            };

            let (l_bp, r_bp) = infix_binding_power(op);
            if l_bp < min_bp {
                break;
            }

            lexer.next();
            let rhs = expr_bp(lexer, r_bp);

            lhs = S::Cons(op, vec![lhs, rhs]);
        }

        lhs
    }

    #[test]
    fn tests() {
        ...

        let s = expr("--1 * 2");
        assert_eq!(s.to_string(), "(* (- (- 1)) 2)");

        let s = expr("--f . g");
        assert_eq!(s.to_string(), "(- (- (. f g)))");
    }

Amusingly, this purely mechanical, type-driven transformation works. You can also reason why it works, of course. The same argument applies; after we’ve consumed a prefix operator, the operand consists of operators that bind tighter, and we just so conveniently happen to have a function which can parse expressions tighter than the specified power.

Ok, this is getting stupid. If using ((), u8) “just worked” for prefix operators, can (u8, ()) deal with postfix ones? Well, let’s add ! for factorials. It should bind tighter than - , because -(92!) is obviously more useful than (-92)! . So, the familiar drill: new priority function, shifting priority of . (this bit is annoying in Pratt parsers), copy-pasting the code…

    let (l_bp, ()) = postfix_binding_power(op);
    if l_bp < min_bp {
        break;
    }

    let (l_bp, r_bp) = infix_binding_power(op);
    if l_bp < min_bp {
        break;
    }

Wait, something’s wrong here. After we’ve parsed the prefix expression, we can see either a postfix or an infix operator. But we bail on unrecognized operators, which is not going to work… So, let’s make postfix_binding_power return an Option, for the case where the operator is not postfix:

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                let rhs = expr_bp(lexer, r_bp);
                S::Cons(op, vec![rhs])
            }
            t => panic!("bad token: {:?}", t),
        };

        loop {
            let op = match lexer.peek() {
                Token::Eof => break,
                Token::Op(op) => op,
                t => panic!("bad token: {:?}", t),
            };

            if let Some((l_bp, ())) = postfix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }
                lexer.next();

                lhs = S::Cons(op, vec![lhs]);
                continue;
            }

            let (l_bp, r_bp) = infix_binding_power(op);
            if l_bp < min_bp {
                break;
            }

            lexer.next();
            let rhs = expr_bp(lexer, r_bp);

            lhs = S::Cons(op, vec![lhs, rhs]);
        }

        lhs
    }

    fn prefix_binding_power(op: char) -> ((), u8) {
        match op {
            '+' | '-' => ((), 5),
            _ => panic!("bad op: {:?}", op),
        }
    }

    fn postfix_binding_power(op: char) -> Option<(u8, ())> {
        let res = match op {
            '!' => (7, ()),
            _ => return None,
        };
        Some(res)
    }

    fn infix_binding_power(op: char) -> (u8, u8) {
        match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            '.' => (10, 9),
            _ => panic!("bad op: {:?}", op),
        }
    }

    #[test]
    fn tests() {
        let s = expr("-9!");
        assert_eq!(s.to_string(), "(- (! 9))");

        let s = expr("f . g !");
        assert_eq!(s.to_string(), "(! (. f g))");
    }

Amusingly, both the old and the new tests pass.

Now, we are ready to add a new kind of expression: the parenthesised expression. It is actually not that hard, and we could have done it from the start, but it makes sense to handle it here; you’ll see why in a moment. Parens are just a primary expression, and are handled similarly to atoms:

    let mut lhs = match lexer.next() {
        Token::Atom(it) => S::Atom(it),
        Token::Op('(') => {
            let lhs = expr_bp(lexer, 0);
            assert_eq!(lexer.next(), Token::Op(')'));
            lhs
        }
        Token::Op(op) => {
            let ((), r_bp) = prefix_binding_power(op);
            let rhs = expr_bp(lexer, r_bp);
            S::Cons(op, vec![rhs])
        }
        t => panic!("bad token: {:?}", t),
    };

Unfortunately, the following test fails:

    let s = expr("(((0)))");
    assert_eq!(s.to_string(), "0");

The panic comes from the loop below: the only termination condition we have is reaching eof, and ) is definitely not eof. The easiest way to fix that is to change infix_binding_power to return None on unrecognized operators. That way, it’ll become similar to postfix_binding_power again!

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op('(') => {
                let lhs = expr_bp(lexer, 0);
                assert_eq!(lexer.next(), Token::Op(')'));
                lhs
            }
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                let rhs = expr_bp(lexer, r_bp);
                S::Cons(op, vec![rhs])
            }
            t => panic!("bad token: {:?}", t),
        };

        loop {
            let op = match lexer.peek() {
                Token::Eof => break,
                Token::Op(op) => op,
                t => panic!("bad token: {:?}", t),
            };

            if let Some((l_bp, ())) = postfix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }
                lexer.next();

                lhs = S::Cons(op, vec![lhs]);
                continue;
            }

            if let Some((l_bp, r_bp)) = infix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }

                lexer.next();
                let rhs = expr_bp(lexer, r_bp);

                lhs = S::Cons(op, vec![lhs, rhs]);
                continue;
            }

            break;
        }

        lhs
    }

    fn prefix_binding_power(op: char) -> ((), u8) {
        match op {
            '+' | '-' => ((), 5),
            _ => panic!("bad op: {:?}", op),
        }
    }

    fn postfix_binding_power(op: char) -> Option<(u8, ())> {
        let res = match op {
            '!' => (7, ()),
            _ => return None,
        };
        Some(res)
    }

    fn infix_binding_power(op: char) -> Option<(u8, u8)> {
        let res = match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            '.' => (10, 9),
            _ => return None,
        };
        Some(res)
    }

And now let’s add the array indexing operator: a[i] . What kind of -fix is it? Around-fix? If it were just a[] , it would clearly be postfix. If it were just [i] , it would work exactly like parens. And that is the key: the i part doesn’t really participate in the whole power game, as it is unambiguously delimited. So, let’s do this:

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op('(') => {
                let lhs = expr_bp(lexer, 0);
                assert_eq!(lexer.next(), Token::Op(')'));
                lhs
            }
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                let rhs = expr_bp(lexer, r_bp);
                S::Cons(op, vec![rhs])
            }
            t => panic!("bad token: {:?}", t),
        };

        loop {
            let op = match lexer.peek() {
                Token::Eof => break,
                Token::Op(op) => op,
                t => panic!("bad token: {:?}", t),
            };

            if let Some((l_bp, ())) = postfix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }
                lexer.next();

                lhs = if op == '[' {
                    let rhs = expr_bp(lexer, 0);
                    assert_eq!(lexer.next(), Token::Op(']'));
                    S::Cons(op, vec![lhs, rhs])
                } else {
                    S::Cons(op, vec![lhs])
                };
                continue;
            }

            if let Some((l_bp, r_bp)) = infix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }

                lexer.next();
                let rhs = expr_bp(lexer, r_bp);

                lhs = S::Cons(op, vec![lhs, rhs]);
                continue;
            }

            break;
        }

        lhs
    }

    fn prefix_binding_power(op: char) -> ((), u8) {
        match op {
            '+' | '-' => ((), 5),
            _ => panic!("bad op: {:?}", op),
        }
    }

    fn postfix_binding_power(op: char) -> Option<(u8, ())> {
        let res = match op {
            '!' | '[' => (7, ()), // (1)
            _ => return None,
        };
        Some(res)
    }

    fn infix_binding_power(op: char) -> Option<(u8, u8)> {
        let res = match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            '.' => (10, 9),
            _ => return None,
        };
        Some(res)
    }

    #[test]
    fn tests() {
        ...

        let s = expr("x[0][1]");
        assert_eq!(s.to_string(), "([ ([ x 0) 1)");
    }

(1) Note that we use the same priority for ! as for [ . In general, for the correctness of our algorithm it’s pretty important that, when we make decisions, priorities are never equal. Otherwise, we might end up in a situation like the one before the tiny adjustment for associativity, where there were two equally good candidates for reduction. However, we only compare a right bp with a left bp! So for two postfix operators it’s OK to have the same priority, as each has only a left binding power and the two are never compared with each other.

Finally, the ultimate boss of all operators, the dreaded ternary:

    c ? e1 : e2

Is this … all-over-the-place-fix operator? Well, let’s change the syntax of ternary slightly:

    c [ e1 ] e2

And let’s recall that a[i] turned out to be a postfix operator + parentheses… So, yeah, ? and : are actually a weird pair of parens! And let’s handle it as such! Now, what about priority and associativity? What even is associativity in this case?

    a ? b : c ? d : e

To figure it out, we just squash the parens part:

    a ?: c ?: e

This can be parsed as

    (a ?: c) ?: e

or as

    a ?: (c ?: e)

What is more useful? For ? -chains like this:

    a ? b :
    c ? d :
    e

the right-associative reading is more useful. Priority-wise, the ternary is low priority. In C, only = and , have lower priority. While we are at it, let’s add C-style right associative = as well.
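One way to see why the right-associative reading is the useful one: it is exactly an if / else-if chain. The sketch below is a made-up example (the function `classify` and its strings are not from the parser), written in Rust, which has no ternary operator of its own.

```rust
// Hypothetical example: the C expression
//     n < 0 ? "negative" : n == 0 ? "zero" : "positive"
// under the right-associative reading  a ? b : (c ? d : e).
fn classify(n: i32) -> &'static str {
    if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    }
}

#[test]
fn chain_reads_like_else_if() {
    assert_eq!(classify(-5), "negative");
    assert_eq!(classify(0), "zero");
    assert_eq!(classify(7), "positive");
}
```

Under the left-associative reading, (a ? b : c) ? d : e , the result of the first ternary would be used as the condition of the second, which is rarely what a chain like this means.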

Here’s our most complete and perfect version of a simple Pratt parser:

    use std::{fmt, io::BufRead};

    enum S {
        Atom(char),
        Cons(char, Vec<S>),
    }

    impl fmt::Display for S {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                S::Atom(i) => write!(f, "{}", i),
                S::Cons(head, rest) => {
                    write!(f, "({}", head)?;
                    for s in rest {
                        write!(f, " {}", s)?
                    }
                    write!(f, ")")
                }
            }
        }
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Token {
        Atom(char),
        Op(char),
        Eof,
    }

    struct Lexer {
        tokens: Vec<Token>,
    }

    impl Lexer {
        fn new(input: &str) -> Lexer {
            let mut tokens = input
                .chars()
                .filter(|it| !it.is_ascii_whitespace())
                .map(|c| match c {
                    '0'..='9' | 'a'..='z' | 'A'..='Z' => Token::Atom(c),
                    _ => Token::Op(c),
                })
                .collect::<Vec<_>>();
            tokens.reverse();
            Lexer { tokens }
        }

        fn next(&mut self) -> Token {
            self.tokens.pop().unwrap_or(Token::Eof)
        }

        fn peek(&mut self) -> Token {
            self.tokens.last().copied().unwrap_or(Token::Eof)
        }
    }

    fn expr(input: &str) -> S {
        let mut lexer = Lexer::new(input);
        expr_bp(&mut lexer, 0)
    }

    fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
        let mut lhs = match lexer.next() {
            Token::Atom(it) => S::Atom(it),
            Token::Op('(') => {
                let lhs = expr_bp(lexer, 0);
                assert_eq!(lexer.next(), Token::Op(')'));
                lhs
            }
            Token::Op(op) => {
                let ((), r_bp) = prefix_binding_power(op);
                let rhs = expr_bp(lexer, r_bp);
                S::Cons(op, vec![rhs])
            }
            t => panic!("bad token: {:?}", t),
        };

        loop {
            let op = match lexer.peek() {
                Token::Eof => break,
                Token::Op(op) => op,
                t => panic!("bad token: {:?}", t),
            };

            if let Some((l_bp, ())) = postfix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }
                lexer.next();

                lhs = if op == '[' {
                    let rhs = expr_bp(lexer, 0);
                    assert_eq!(lexer.next(), Token::Op(']'));
                    S::Cons(op, vec![lhs, rhs])
                } else {
                    S::Cons(op, vec![lhs])
                };
                continue;
            }

            if let Some((l_bp, r_bp)) = infix_binding_power(op) {
                if l_bp < min_bp {
                    break;
                }
                lexer.next();

                lhs = if op == '?' {
                    let mhs = expr_bp(lexer, 0);
                    assert_eq!(lexer.next(), Token::Op(':'));
                    let rhs = expr_bp(lexer, r_bp);
                    S::Cons(op, vec![lhs, mhs, rhs])
                } else {
                    let rhs = expr_bp(lexer, r_bp);
                    S::Cons(op, vec![lhs, rhs])
                };
                continue;
            }

            break;
        }

        lhs
    }

    fn prefix_binding_power(op: char) -> ((), u8) {
        match op {
            '+' | '-' => ((), 9),
            _ => panic!("bad op: {:?}", op),
        }
    }

    fn postfix_binding_power(op: char) -> Option<(u8, ())> {
        let res = match op {
            '!' => (11, ()),
            '[' => (11, ()),
            _ => return None,
        };
        Some(res)
    }

    fn infix_binding_power(op: char) -> Option<(u8, u8)> {
        let res = match op {
            '=' => (2, 1),
            '?' => (4, 3),
            '+' | '-' => (5, 6),
            '*' | '/' => (7, 8),
            '.' => (14, 13),
            _ => return None,
        };
        Some(res)
    }

    #[test]
    fn tests() {
        let s = expr("1");
        assert_eq!(s.to_string(), "1");

        let s = expr("1 + 2 * 3");
        assert_eq!(s.to_string(), "(+ 1 (* 2 3))");

        let s = expr("a + b * c * d + e");
        assert_eq!(s.to_string(), "(+ (+ a (* (* b c) d)) e)");

        let s = expr("f . g . h");
        assert_eq!(s.to_string(), "(. f (. g h))");

        let s = expr(" 1 + 2 + f . g . h * 3 * 4");
        assert_eq!(
            s.to_string(),
            "(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))",
        );

        let s = expr("--1 * 2");
        assert_eq!(s.to_string(), "(* (- (- 1)) 2)");

        let s = expr("--f . g");
        assert_eq!(s.to_string(), "(- (- (. f g)))");

        let s = expr("-9!");
        assert_eq!(s.to_string(), "(- (! 9))");

        let s = expr("f . g !");
        assert_eq!(s.to_string(), "(! (. f g))");

        let s = expr("(((0)))");
        assert_eq!(s.to_string(), "0");

        let s = expr("x[0][1]");
        assert_eq!(s.to_string(), "([ ([ x 0) 1)");

        let s = expr("a ? b : c ? d : e");
        assert_eq!(s.to_string(), "(? a b (? c d e))");

        let s = expr("a = 0 ? b : c = d");
        assert_eq!(s.to_string(), "(= a (= (? 0 b c) d))")
    }

    fn main() {
        for line in std::io::stdin().lock().lines() {
            let line = line.unwrap();
            let s = expr(&line);
            println!("{}", s)
        }
    }