announcing a new op, OP_MULTICONCAT

From: Dave Mitchell

Date: October 25, 2017 12:15

Subject: announcing a new op, OP_MULTICONCAT

Message ID: 20171025121518.GP3083@iabyn.com

October 25, 2017 12:15announcing a new op, OP_MULTICONCAT

TL;DR: the branch I've just pushed for smoking, smoke-me/davem/mconcat allows multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can make things a *lot* faster: 4x or more. I'll merge it into blead in a few days, barring any issues. This work has taken up most of my time for the last 3 months; its been kindly funded by Booking.com. In more detail: it will optimise into a single OP_MULTICONCAT, most expressions of the form LHS RHS where LHS is one of (empty) my $lexical = $lexical = $lexical .= expression = expression .= and RHS is one of (A . B . C . ...) where A,B,C etc are expressions and/or string constants "aAbBc..." where a,A,b,B etc are expressions and/or string constants sprintf "..%s..%s..", A,B,.. where the format is a constant string containing only '%s' and '%%' elements, and A,B, etc are scalar expressions (so only a fixed, compile-time-known number of args: no arrays or list context function calls etc) It doesn't optimise other forms, such as ($a . $b) . ($c. $d) ((($a .= $b) .= $c) .= $d); (although sub-parts of those expressions might be converted to an OP_MULTICONCAT). This is partly because it would be hard to maintain the correct ordering of tie or overload calls. The compiler uses heuristics to determine when to convert: in general, expressions involving a single OP_CONCAT aren't converted, unless some other saving can be made, for example if an OP_CONST can be eliminated, or in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply OPpTARGET_MY to, but OP_CONST can't. The multiconcat op is of type UNOP_AUX, with the op_aux structure directly holding a pointer to a single constant char* string plus a list of segment lengths. So for "a=$a b=$b

"; the constant string is "a= b=

", and the segment lengths are (2,3,1). If the constant string has different non-utf8 and utf8 representations (such as "\x80") then both variants are pre-computed and stored in the aux struct, along with two sets of segment lengths. For all the above LHS types, any SASSIGN op is optimised away. For a LHS of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too. For example where $a and $b are lexical vars, this statement: my $c = "a=$a, b=$b

"; formerly compiled to const[PV "a="] s padsv[$a:1,3] s concat[t4] sK/2 const[PV ", b="] s concat[t5] sKS/2 padsv[$b:1,3] s concat[t6] sKS/2 const[PV "

"] s concat[t7] sKS/2 padsv[$c:2,3] sRM*/LVINTRO sassign vKS/2 and now compiles to: padsv[$a:1,3] s padsv[$b:1,3] s multiconcat("a=, b=

",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY In terms of how much faster it is, this code: my $a = "the quick brown fox jumps over the lazy dog"; my $b = "to be, or not to be; sorry, what was the question again?"; for my $i (1..10_000_000) { my $c = "a=$a, b=$b

"; } runs 2.7 times faster, and if you throw utf8 mixtures in it gets even better. This loop runs 4 times faster: my $s; my $a = "ab\x{100}cde"; my $b = "fghij"; my $c = "\x{101}klmn"; for my $i (1..10_000_000) { $s = "\x{100}wxyz"; $s .= "foo=$a bar=$b baz=$c"; } The main ways in which OP_MULTICONCAT gains its speed are: * any OP_CONSTs are eliminated, and the constant bits (already in the right encoding) are copied directly from the constant string attached to the op's aux structure. * It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in all cases; OP_CONCAT only did this in very limited circumstances. * Because it has a holistic view of the entire concatenation expression, it can do the whole thing in one efficient go, rather than creating and copying intermediate results. pp_multiconcat() goes to considerable efforts to avoid inefficiencies. For example it will only SvGROW() the target once, and to the exact size needed, no matter what mix of utf8 and non-utf8 appear on the LHS and RHS. It never allocates any temporary SVs except possibly in the case of tie or overloading. * It does all its own appending and utf8 handling rather than calling out to functions like sv_catsv(). * It's very good at handling the LHS appearing on the RHS; for example in $x = "abcd"; $x = "-$x-$x-"; It will do roughly the equivalent of the following (where targ is $x); SvPV_force(targ); SvGROW(targ, 11); p = SvPVX(targ); Move(p, p+1, 4, char); Copy("-", p, 1, char); Copy("-", p+5, 1, char); Copy(p+1, p+6, 4, char); Copy("-", p+10, 1, char); SvCUR(targ) = 11; p[11] = '\0'; Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to handle situations like that. The code is quite big; both S_maybe_multiconcat() and pp_multiconcat() (the main compile-time and runtime parts of the implementation) are over 700 lines each. It turns out that when you combine multiple ops, the number of edge cases grows exponentially ;-) Here are some benchmarks, sorted by improving conditional branch count. The numbers represent relative counts per loop iteration, compared to blead at 100.0%. Higher is better: for example, using half as many instructions gives 200%, while using twice as many gives 50%. Ir Dr Dw COND IND ------ ------ ------ ------ ------ 88.52 121.28 104.76 104.00 140.00 $lex1 .= $lex2 96.19 105.26 101.16 102.35 115.38 (($lex1 .= $lex2) .= $lex3) .= $lex4 100.00 100.00 100.00 100.00 100.00 $pkg .= $lex1 100.00 100.00 100.00 100.00 100.00 my $x = "abc" 100.00 100.00 100.00 100.00 100.00 $lex = $lex1 . $lex2 100.00 100.00 100.00 100.00 100.00 $lex1 . $lex2 100.00 100.00 100.00 100.00 100.00 $lex1 = $lex1 . $lex1 100.00 100.00 100.00 100.00 100.00 $lex = join "", $lex1, $lex2 100.00 100.00 100.00 100.00 100.00 $lex1 = $lex1 . $lex2 101.45 128.57 111.90 107.69 175.00 $lex . "foo" 101.45 128.57 111.90 107.69 175.00 "foo" . $lex 103.68 128.57 111.90 108.11 175.00 $lex = "const$lex1" 105.63 136.84 110.53 108.70 140.00 $pkg .= "foo" 128.33 176.67 161.54 123.81 175.00 $lex .= "foo" 128.65 145.74 131.91 155.10 155.56 $pkg = $lex1 . $pkg 142.32 183.75 168.42 165.00 162.50 $pkg .= $lex1 . $lex2 143.78 159.50 143.08 176.36 216.67 my $lex = $lex1 . $lex2 145.95 159.22 141.82 182.61 200.00 $pkg = $lex1 . $lex2 149.90 199.21 191.23 148.72 188.89 $s .= $a.$b.$c where all RH args are utf8 157.46 208.45 200.00 178.38 185.71 $lex1 .= $lex2 . $lex3 162.47 225.24 214.89 168.42 188.89 $s .= $a.$b.$c where all args are utf8 163.09 198.39 180.95 182.81 228.57 my $s = $a.$b.$c where all args are utf8 164.27 185.83 168.18 198.28 200.00 $pkg = $lex1 . $lex2 . $lex3 164.55 177.92 167.57 187.80 200.00 $pkg = $pkg . $pkg 164.79 173.42 177.14 200.00 200.00 $pkg = $pkg . $lex2 170.03 185.06 162.50 210.00 240.00 $pkg = "const$lex1" 175.12 202.56 185.00 212.73 228.57 $lex = $lex1 . $lex2 . $lex3 175.35 231.58 225.58 197.96 188.89 $lex1 .= $lex2 . $lex3 . $lex4 175.81 214.29 207.89 173.25 200.00 sprintf "%s", $lex1 184.59 216.55 201.32 222.39 187.50 my $lex = $lex1 . $lex2 . $lex3 194.78 232.12 218.29 229.57 218.18 $lex = $lex1 . $lex2 . $lex3 . $lex4 . $lex5 195.58 227.62 206.90 244.90 220.00 my $lex = "const$lex1" 214.41 244.90 229.63 248.89 300.00 my $lex = sprintf "%s", $lex1 224.89 261.14 209.52 255.95 277.78 my $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2

" where lex1/2 are 400 chars 230.62 275.76 236.36 227.78 300.00 $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2

" where lex1/2 are 400 chars 231.97 302.65 272.41 238.33 275.00 $lex = "foo=$lex1 bar=$lex2

" 233.23 244.21 289.80 259.57 220.00 sprintf "%s%s", $lex1, $lex2 238.61 261.90 273.17 288.89 187.50 $lex1 = $lex2 . $lex1 239.64 272.57 234.48 253.33 288.89 $pkg = "foooooooooo=$lex1 baaaaaaaaar=$lex2

" where lex1/2 are 400 chars 240.98 263.75 247.73 272.22 275.00 $pkg = sprintf "%s", $lex1 243.99 295.93 265.62 271.43 266.67 $pkg = "foo=$lex1 bar=$lex2

" 250.85 313.48 286.49 283.33 255.56 my $lex = "foo=$lex1 bar=$lex2

" 255.30 345.54 329.17 274.55 227.27 $pkg .= "-foo-$lex1-foo-$lex2-foo-" 260.09 340.35 351.85 263.33 200.00 $pkg .= sprintf "%s", $lex1 267.63 302.86 286.84 300.00 366.67 $lex = sprintf "%s", $lex1 270.29 272.00 315.62 317.91 211.11 $pkg = sprintf "foo=%s bar=%s

", $lex1, $lex2 272.19 273.86 293.75 321.43 260.00 $pkg = sprintf "foo=%s", $lex1 272.73 292.31 329.73 325.00 200.00 my $lex = sprintf "foo=%s bar=%s

", $lex1, $lex2 272.90 356.57 337.21 278.26 271.43 $lex = "foo1=$lex1 foo2=$lex2 foo3=$lex3 foo4=$lex4

" 273.48 379.35 373.81 292.31 250.00 $lex1 .= "-foo-$lex2-foo-$lex3-foo" 275.38 301.89 317.24 333.33 240.00 my $lex = sprintf "foo=%s", $lex1 279.21 279.05 312.73 330.00 266.67 $pkg = sprintf "%s%s", $lex1, $lex2 280.63 302.44 330.77 338.98 250.00 my $lex = sprintf "%s%s", $lex1, $lex2 287.32 296.52 348.28 332.81 237.50 $lex = sprintf "foo=%s bar=%s

", $lex1, $lex2 296.81 406.25 452.38 296.30 240.00 $lex .= sprintf "%s", $lex1 297.69 336.59 415.79 329.55 212.50 $pkg .= sprintf "%s%s", $lex1, $lex2 298.19 310.26 335.71 348.72 325.00 $lex = sprintf "foo=%s", $lex1 301.21 309.47 351.02 355.32 320.00 $lex = sprintf "%s%s", $lex1, $lex2 325.90 379.45 493.75 356.10 242.86 $lex .= sprintf "%s%s", $lex1, $lex2 333.93 419.48 454.55 359.60 291.67 $s .= "foo=$a bar=$b baz=$c" where $b,$c are utf8 337.37 406.58 406.94 383.72 320.00 my $s = "foo=$a bar=$b baz=$c" where $b,$c are utf8 375.00 478.63 521.43 427.85 291.67 $s .= "foo=$a bar=$b baz=$c" where $s,$b,$c are utf8 391.79 408.33 490.41 461.96 270.00 $s = sprintf("foo=%s bar=%s baz=%s", $a, $b, $c) where $a,$c are utf8 528.36 623.38 721.21 571.00 391.67 $s .= "foo=$a bar=$b baz=$c" where $a,$c are utf8 545.23 613.16 651.39 630.23 440.00 my $s = "foo=$a bar=$b baz=$c" where $a,$c are utf8 581.81 723.31 830.56 622.22 500.00 $s .= "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80 599.55 714.91 757.69 678.64 577.78 my $s = "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80 603.04 718.32 835.71 693.67 391.67 $s .= "foo=$a bar=$b baz=$c" where $s,$a,$c are utf8 654.05 828.57 951.61 727.84 500.00 $s .= "foo=$a bar=$b baz=$c" where $s,$a are utf8, $c has 0x80 184.58 211.78 204.43 201.60 202.52 AVERAGE -- You never really learn to swear until you learn to drive.



