Ackerman, URL and Raku

Perl 6 → Raku This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

[22] Published 21. July 2019.

This is my response to the Perl Weekly Challenge #17.

Challenge #17.1

Create a script to demonstrate Ackermann function. The Ackermann function is defined as below, m and n are positive number: A(m, n) = n + 1 if m = 0 A(m, n) = A(m - 1, 1) if m > 0 and n = 0 A(m, n) = A(m - 1, A(m, n - 1)) if m > 0 and n > 0 Example expansions as shown in wiki page. A(1, 2) = A(0, A(1, 1)) = A(0, A(0, A(1, 0))) = A(0, A(0, A(0, 1))) = A(0, A(0, 2)) = A(0, 3) = 4

This is almost the same problem as the Hofstadter sequences from Challenge 13.2, where I used multiple dispatch («multi sub»). So this is straighforward.

I have chosen to declare a custom type (with subset for positive integers (0 included), and use a «MAIN» wrapper procedure to get the arguments from the command line. multi followed by a name is short for multi sub .

subset PositiveIntZero of Int where * >= 0; sub MAIN(PositiveIntZero \m, PositiveIntZero

) { say A(m, n); } multi A(PositiveIntZero \m where m == 0, PositiveIntZero

) { return n + 1; } multi A(PositiveIntZero \m where m > 0, PositiveIntZero

where n == 0) { return A(m - 1, 1); } multi A(PositiveIntZero \m where m > 0, PositiveIntZero

where n > 0) { return A(m - 1, A(m, n - 1)); }

File: ackermann

See docs.raku.org/language/typesystem#index-entry-subset-subset for more information about subset .

See docs.raku.org/syntax/multi for more information about «multi» and Multiple dispatch.

Running it:

$ raku ackermann 1 2 4

We can get rid of all the «where» clauses in the constraints in the «multi»s if we set up another custom type:

subset PositiveInt of Int where * > 0; subset PositiveIntZero of Int where * >= 0; sub MAIN(PositiveIntZero \m, PositiveIntZero

) { say A(m, n); } multi A(0, PositiveIntZero

) { return n + 1; } multi A(PositiveInt \m, 0) { return A(m - 1, 1); } multi A(PositiveInt \m, PositiveInt

) { return A(m - 1, A(m, n - 1)); }

File: ackermann2

Note that « PositiveIntZero

where n == 0 » can be replaced with the short form « 0 », if it is ok to loose the named variable.

Recursive procedures are not very efficient when they do a lot of computations. Here is a non-recursive version that also caches the already computed values:

subset PositiveIntZero of Int where * >= 0; sub MAIN(PositiveIntZero \m, PositiveIntZero

) { say A(m, n); } my %cache; sub A(PositiveIntZero \m, PositiveIntZero

) { unless %cache{m}{n}.defined # [1] { if m == 0 # [3] { %cache{m}{n} = n + 1; # [2] } elsif n == 0 # [3] { %cache{m}{n} = A(m - 1, 1); # [2] } else # [3] { %cache{m}{n} = A(m - 1, A(m, n - 1)); # [2] } } return %cache{m}{n}; }

File: ackermann-cached

[1] If the value we need hasn't been computed already,

[2] do so, and save the value. Note the use of defined , as a check on the variable itself will return False if the value is zero.

[3] The «multi»s has been recplaced by a traditional «if»-block.

See docs.raku.org/routine/defined for more information about defined .

Let us take a look at the speedup gain:

Argument Result ackermann ackermann-cached 3 1 13 0.2s 0.2s 4 1 65533 > 5 hours 2.2s 5 1 ? unknown > 5 hours

The cached version is way better than the recursive version, but even that is pretty slow when we increase the «m» value.

Challenge #17.2

Create a script to parse URL and print the components of URL. According to Wiki page, the URL syntax is as below: scheme:[//[userinfo@]host[:port]]path[?query][#fragment] For example: jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1 scheme: jdbc:mysql userinfo: user:password host: localhost port: 3306 path: /pwc query: profile=true fragment: h1

This nice illustration, lifted from the wikipedia article, explains it all:

I'll dive stright in, and present a complete program, with a rather complex regex. Raku allows spaces (and newlines) in regexes, so I have added inline comments to show what the individual parts are doing.

sub MAIN ($url, :$verbose) { if $url ~~ /^ (*)\: # $0 scheme [\/\/ # // [(.*[\:.+]?)\@]? # $1 userinfo (optional) (*) # $2 host [\:(\d+)]? # $3 port (optional) (\/?) # $4 path separator ]? # $1-$4 are optional ([]+)? # $5 path (optional) [\?(*)]? # $6 query (optional) [\#(.*)]? # $7 fragment (optional) $/ { say $/ if $verbose; say "scheme: $0"; say "userinfo: $1" if $1; say "host: $2" if $2; say "port: $3" if $3; my $path = $4 if $4; $path ~= $5 if $5; say "path: $path"; say "query: $6" if $6; say "fragment: $7" if $7; } else { say "Invalid URL."; } }

File: url-parse

The only thing I'll point out here is the fact that the path is the combination of the two matches «$4» and «$5», and that $/ is the match variable.

See perlpilot's excellent Introduction to Perl 6 Regex article for an introduction to Raku Regexes.

See https://docs.raku.org/syntax/$/ for more information about the match variable $/ .

Running it:

$ raku url-parse jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1 scheme: jdbc:mysql userinfo: user:password host: localhost port: 3306 path: /pwc query: profile=true fragment: h1

Add the «verbose» switch if you want to see the match object:

$ raku url-parse --verbose jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1 ｢jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1｣ 0 => ｢jdbc:mysql｣ 1 => ｢user:password｣ 2 => ｢localhost｣ 3 => ｢3306｣ 4 => ｢/｣ 5 => ｢pwc｣ 6 => ｢profile=true｣ 7 => ｢h1｣ ...

IPv4 addresses (hosts) will be validated, but I have disregarded support for IPv6. It is possible to do a DNS lookup for a hostname to see that it actually exists (and reach out for the IP address to see if it is reachable).

Grammars

grammar URL { regex TOP { <SchemeW> <Hostinfo>? <Path>? <QueryW>? <FragmentW>? } regex SchemeW { <Scheme> <SchemeS> } regex SchemeS { ':' } regex Scheme { <[a..z]><[a..z 0..9 + . : \-]>* } regex Hostinfo { '//' <UserinfoW>? <Host> <PortW>? } regex UserinfoW { <Userinfo> <UserinfoS> } regex Userinfo { .*[\:.+]? } regex UserinfoS { '@' } regex Host { <[\w \. \-]>* } regex PortW { <PortS> <Port> } regex PortS { ':' } regex Port { \d+ } regex Path { '/'? <[\w \d -] - [#?]>+ } regex QueryW { <QueryS> <Query> } regex QueryS { '?' } regex Query { <[\w \d \- =]>* } regex FragmentW { <FragmentS> <Fragment> } regex FragmentS { '#' } regex Fragment { .+ } } sub MAIN ($url, :$verbose) { my $result = URL.parse($url); if $result { say $result if $verbose; say "scheme: $/<SchemeW><Scheme>"; say "userinfo: $/<Hostinfo><UserinfoW><Userinfo>" if $/<Hostinfo><UserinfoW><Userinfo>; say "host: $/<Hostinfo><Host>" if $/<Hostinfo><Host>; say "port: $/<Hostinfo><PortW><Port>" if $/<Hostinfo><PortW><Port>; say "path: $/<Path>" if $/<Path>; say "query: $/<QueryW><Query>" if $/<QueryW><Query>; say "fragment: $/<FragmentW><Fragment>" if $/<FragmentW><Fragment>; } else { say "Invalid URL."; } }

The regex isn't very reader friendly, even when sprinkled with comments. But we can use a Grammer instead. The «TOP» rule (regex) is the default starting point: File: url-parse-grammar

I have added a trailing «W» to regexes that are needed as wrappers, and «S» to get rid of an explicit symbol (either pre- of postfix) which shouldn't be part of the match.

See Andrew Shitow's Perl 6 Grammars, Part 1 for an introduction to Perl 6 Grammars. (There is no part 2.) The Grammar tutorial in the official documentation is also recommended.

Running it gives the same result, but the match object is quite different:

$ raku url-parse-grammar --verbose jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1 ｢jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1｣ SchemeW => ｢jdbc:mysql:｣ Scheme => ｢jdbc:mysql｣ SchemeS => ｢:｣ Hostinfo => ｢//user:password@localhost:3306｣ UserinfoW => ｢user:password@｣ Userinfo => ｢user:password｣ UserinfoS => ｢@｣ Host => ｢localhost｣ PortW => ｢:3306｣ PortS => ｢:｣ Port => ｢3306｣ Path => ｢/pwc｣ QueryW => ｢?profile=true｣ QueryS => ｢?｣ Query => ｢profile=true｣ FragmentW => ｢#h1｣ FragmentS => ｢#｣ Fragment => ｢h1｣ ...

We got a tree in the match object (actually an Abstract Syntax Tree, or AST), and had to follow the path through the structure to get to the nodes (as shown in the «say» statements).

Writing $/<Hostinfo><PortW><Port> to access one value may seem too much trouble, but that's the cost of using a tree.

It is possible to add code in the regexes. And let the grammar itself print the result. E.g.:

regex Userinfo { .*[\:.+]? { say "userinfo: $/" if $verbose } }

But this kicks in whenever the regex matches, and it does a lot of backtracking as I have started with a greedy regex ( .* ) in this case:

userinfo: user:password@localhost:3306?kkkkk#sksks userinfo: user:password@localhost:3306?kkkkk#sksk userinfo: user:password@localhost:3306?kkkkk#sks userinfo: user:password@localhost:3306?kkkkk#sk userinfo: user:password@localhost:3306?kkkkk#s userinfo: user:password@localhost:3306?kkkkk# userinfo: user:password@localhost:3306?kkkkk userinfo: user:password@localhost:3306?kkkk userinfo: user:password@localhost:3306?kkk userinfo: user:password@localhost:3306?kk userinfo: user:password@localhost:3306?k userinfo: user:password@localhost:3306? userinfo: user:password@localhost:3306 userinfo: user:password@localhost:330 userinfo: user:password@localhost:33 userinfo: user:password@localhost:3 userinfo: user:password@localhost userinfo: user:password@localhos userinfo: user:password@localho userinfo: user:password@localh userinfo: user:password@local userinfo: user:password@loca userinfo: user:password@loc userinfo: user:password@lo userinfo: user:password@l userinfo: user:password@ userinfo: user:password

This gives a lot of noise (false positives), and it doesn't answer the challenge as partial matches will be printed out - even if the URL as a whole is invalid.

It is possible to remedy both problems by storing the value in a (global) hash instead of printing it in the regexes, and just let the code rewrite the value again and again (until it gets it right, on the last try). In the «say»-block we can then access the hash instead of the match object. E.g:

my %values; ... regex Userinfo { .*[\:.+]? { %values = ~$/ } } ... say "userinfo: %values " if %values ;

The output shows that I shouldn't have used a greedy regex here in the first place, as it causes a lot of unnecessary searches (at a cost). I could have used a non-greedy one, and saved the Regex engine for some work. But I'll leave it at that, as the program works and it isn't that critical computational wise in this case.

I do hope you as a reader agree that the important lesson from this challenge is that grammars are actually easier to use than complex regexes. Do compare the two versions, and imagine the regex version as a one-liner, without comments (to make the regex even harder to understand). A grammar doesn't require comments (or at least not that many of them), if the rules have carefully chosen names.

Still not convinced? Scroll up to the «nice illustration, lifted from the wikipedia article» illustration, and compare it with the grammar.