2017-06-20 | 1715 words | Seq type and its caching mechanism

I vividly recall that my first steps in Perl 6 came just a couple of months before the first stable release of the language in December 2015. Around that time, Larry Wall was giving a presentation and showed a neat feature, the sequence operator, and I was amazed at just how powerful the language is:

    # First 12 even numbers:
    say (2, 4 … ∞)[^12];
    # OUTPUT: (2 4 6 8 10 12 14 16 18 20 22 24)

    # First 10 powers of 2:
    say (2, 2², 2³ … ∞)[^10];
    # OUTPUT: (2 4 8 16 32 64 128 256 512 1024)

    # First 13 Fibonacci numbers:
    say (1, 1, *+* … ∞)[^13];
    # OUTPUT: (1 1 2 3 5 8 13 21 34 55 89 144 233)

The ellipsis ( … ) is the sequence operator and the stuff it makes is the Seq object. And now, a year and a half after Perl 6's first release, I hope to pass on my amazement to a new batch of future Perl 6 programmers.

This is a 3-part series. In PART I of this article we'll talk about what Seqs are and how to make them without the sequence operator. In PART II, we'll look at the thing-behind-the-curtain of Seqs: the Iterator type and how to make Seqs from our own Iterators. Lastly, in PART III, we'll examine the sequence operator in all of its glory.

Note: I will be using all sorts of fancy Unicode operators and symbols in this article. If you don't like them, consult the Texas Equivalents page for the equivalent ASCII-only way to type those elements.

PART I: What the Seq is all this about?

Seq stands for Sequence, and a Seq object provides a one-shot way to iterate over a sequence of stuff. New values can be generated on demand (in fact, it's perfectly possible to create infinite sequences) and already-generated values are discarded, never to be seen again, although there's a way to cache them, as we'll see.

Sequences are driven by Iterator objects that are responsible for generating values. However, in many cases you don't have to create Iterators directly or use their methods while iterating a Seq. There are several ways to make a Seq and in this section, we'll talk about the gather/take construct.

I gather you'll take us to...

The gather statement and take routine are similar to "generators" and the "yield" statement in some other languages:

    my $seq-full-of-sunshine := gather {
        say  'And nobody cries';
        say  'there’s only butterflies';

        take 'me away';
        say  'A secret place';
        say  'A sweet escape';

        take 'meee awaaay';
        say  'To better days';

        take 'MEEE AWAAAAYYYY';
        say  'A hiding place';
    }

Above, we have a code block with lines of song lyrics, some of which we say (print to the screen) and others we take (to be gathered). Just as .say can be used as either a method or a subroutine, so can .take; there's no real difference, merely convenience.
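As a minimal sketch of the two call forms side by side (the variable name $both-ways is made up for illustration):

```perl6
my $both-ways := gather {
    take 'as a subroutine';   # subroutine form
    'as a method'.take;       # method form, called on the value itself
}

.say for $both-ways;

# OUTPUT:
# as a subroutine
# as a method
```

Both values end up in the same Seq, in the order they were taken.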

Now, let's iterate over $seq-full-of-sunshine and watch the output:

    for $seq-full-of-sunshine {
        ENTER say '▬▬▶ Entering';
        LEAVE say '◀▬▬ Leaving';

        say "❚❚ $_";
    }

    # OUTPUT:
    # And nobody cries
    # there’s only butterflies
    # ▬▬▶ Entering
    # ❚❚ me away
    # ◀▬▬ Leaving
    # A secret place
    # A sweet escape
    # ▬▬▶ Entering
    # ❚❚ meee awaaay
    # ◀▬▬ Leaving
    # To better days
    # ▬▬▶ Entering
    # ❚❚ MEEE AWAAAAYYYY
    # ◀▬▬ Leaving
    # A hiding place

Notice how the say statements we had inside the gather statement didn't actually get executed until we needed to iterate over a value that the take routines took after those particular say lines. The block got stopped and then continued only when more values from the Seq were requested. The last say call didn't have any more takes after it, and it got executed when the iterator was asked for more values after the last take.
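The same stop-and-continue behaviour is what lets a gather block describe a sequence that never ends. A small sketch, assuming we only ever ask for a finite slice (the name $evens is made up for illustration):

```perl6
# The loop inside the block is infinite, but the block only runs
# far enough to produce the values that are actually requested
my $evens := gather { take 2 * $_ for 1..∞ }

say $evens[^5];

# OUTPUT: (2 4 6 8 10)
```

Asking for all of the values, for example with .elems, would never finish, so this only makes sense when the consumer stops asking at some point.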

That's exceptional!

The take routine works by throwing a CX::Take control exception that will percolate up the call stack until something takes care of it. This means you can feed a gather not just from an immediate block, but from a bunch of different sources, such as routine calls:

    multi what's-that (42)                     { take 'The Answer'            }
    multi what's-that (Int $ where *.is-prime) { take 'Tis a prime!'          }
    multi what's-that (Numeric)                { take 'Some kind of a number' }

    multi what's-that   { how-good-is $^it                   }
    sub how-good-is ($) { take rand > ½ ?? 'Tis OK' !! 'Eww' }

    my $seq := gather map &what's-that, 1, 31337, 42, 'meows';

    .say for $seq;

    # OUTPUT:
    # Some kind of a number
    # Tis a prime!
    # The Answer
    # Eww

Once again, we iterated over our new Seq with a for loop, and you can see that take called from different multis and even nested sub calls still delivered the value to our gather successfully.

The only limitation is you can't gather takes done in another Promise or in code manually cued in the scheduler:

    gather await start take 42;
    # OUTPUT:
    # Tried to get the result of a broken Promise
    #   in block <unit> at test.p6 line 2
    #
    # Original exception:
    #     take without gather

    gather $*SCHEDULER.cue: { take 42 }
    await Promise.in: 2;
    # OUTPUT: Unhandled exception: take without gather

However, nothing's stopping you from using a Channel to proxy your data to be taken in a react block.

    my Channel $chan .= new;
    my $promise = start gather react whenever $chan { .take }

    say "Sending stuff to Channel to gather...";
    await start {
        $chan.send: $_ for <a b c>;
        $chan.close;
    }
    dd await $promise;

    # OUTPUT:
    # Sending stuff to Channel to gather...
    # ("a", "b", "c").Seq

Or gathering takes from within a Supply:

    my $supply = supply {
        take 42;
        emit 'Took 42!';
    }

    my $x := gather react whenever $supply { .say }
    say $x;

    # OUTPUT: Took 42!
    # (42)

Stash into the cache

I mentioned earlier that Seqs are one-shot Iterables that can be iterated only once. So what exactly happens when we try to iterate them the second time?

    my $seq := gather take 42;
    .say for $seq;
    .say for $seq;

    # OUTPUT:
    # 42
    # This Seq has already been iterated, and its values consumed
    # (you might solve this by adding .cache on usages of the Seq, or
    # by assigning the Seq into an array)

An X::Seq::Consumed exception gets thrown. In fact, Seqs do not even do the Positional role, which is why we didn't use the @ sigil that type-checks for Positional on the variables we stored Seqs in.

The Seq is deemed consumed whenever something asks it for its Iterator after another thing grabbed it, like the for loop would. For example, even if we had iterated over just 1 item in the first for loop above, we wouldn't be able to resume taking more items in the next for loop, as it'd try to ask for the Seq's iterator that was already taken by the first for loop.

As you can imagine, having Seqs always be one-shot would be somewhat of a pain in the butt. A lot of times you can afford to keep the entire sequence around, which is the price for being able to access its values more than once, and that's precisely what the Seq.cache method does:
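A small sketch of that rule, asking for the Iterator by hand instead of through a for loop (the .iterator and .pull-one calls are covered properly in PART II; the variable names are made up for illustration):

```perl6
my $seq := gather { take $_ for 1..3 };

my $first = $seq.iterator;  # the first consumer grabs the Iterator
say $first.pull-one;        # OUTPUT: 1

try $seq.iterator;          # a second request fails, even though
                            # only one value has been pulled so far
say $!.^name;               # OUTPUT: X::Seq::Consumed
```

The remaining values are still reachable through $first; it's only the Seq itself that refuses to hand out another Iterator.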

    my $seq := gather { take 42; take 70 };
    $seq.cache;

    .say for $seq;
    .say for $seq;

    # OUTPUT:
    # 42
    # 70
    # 42
    # 70

As long as you call .cache before you fetch the first item of the Seq, you're good to go iterating over it until the heat death of the Universe (or until its cache noms all of your RAM). However, often you do not even need to call .cache yourself.

Many methods will automatically .cache the Seq for you:

- .Str, .Stringy, .fmt, .gist, .perl methods always .cache
- .AT-POS and .EXISTS-POS methods, or in other words, Positional indexing like $seq[^10], always .cache
- .elems, .Numeric, and .Int will .cache the Seq, unless the underlying Iterator provides a .count-only method (we'll get to those in PART II)
- .Bool will .cache unless the underlying Iterator provides .bool-only or .count-only methods
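As a quick sketch of the indexing case from the list above, a single Positional lookup is enough to make the Seq safe to iterate more than once:

```perl6
my $seq := gather { take 42; take 70 };

say $seq[1];     # OUTPUT: 70 (indexing goes through .AT-POS, which caches)

.say for $seq;   # OUTPUT: 42, then 70
.say for $seq;   # OUTPUT: 42, then 70 again, with no X::Seq::Consumed
```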

There's one more nicety with Seqs losing their one-shotness that you may see referred to as PositionalBindFailover. It's a role that indicates to the parameter binder that the type can still be converted into a Positional, even when it doesn't do the Positional role. In plain English, it means you can do this:

    sub foo (@pos) { say @pos[1, 3, 5] }

    my $seq := 2, 4 … ∞;
    foo $seq; # OUTPUT: (4 8 12)

We have a sub that expects a Positional argument and we give it a Seq, which isn't Positional, yet it all works out: because Seq does the PositionalBindFailover role, the binder .caches our Seq and uses the List that the .cache method returns as the Positional.

Last, but not least, if you don't mind all of your Seq's values being generated and cached right there and then, you can simply assign it to a @-sigiled variable, which will reify the Seq and store it as an Array:
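The two roles can be checked directly with a smartmatch. A minimal sketch, where first-three is a hypothetical helper used only for this illustration:

```perl6
my $seq := (2, 4 … ∞);

say $seq ~~ Positional;              # OUTPUT: False
say $seq ~~ PositionalBindFailover;  # OUTPUT: True

# Hypothetical helper: the @ sigil asks for a Positional argument
sub first-three (@pos) { @pos[^3] }

say first-three $seq;                # OUTPUT: (2 4 6)
```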

    my @stuff = gather {
        take 42;
        say "meow";
        take 70;
    }

    say "Starting to iterate:";
    .say for @stuff;

    # OUTPUT:
    # meow
    # Starting to iterate:
    # 42
    # 70

From the output, we can see say "meow" was executed on assignment to @stuff and not when we actually iterated over the value in the for loop.

Conclusion

In Perl 6, Seqs are one-shot Iterables that don't keep their values around, which makes them very useful for iterating over huge, or even infinite, sequences. However, it's perfectly possible to cache Seq values and re-use them, if that is needed. In fact, many of the Seq's methods will automatically cache the Seq for you.

There are several ways to create Seqs, one of which is to use gather and take, where a gather block will stop its execution and continue only when more values are needed.

In parts II and III, we'll look at other, more exciting, ways of creating Seqs. Stay tuned!

-Ofun