Allow me to add my favorite Perl 6 solutions. The split version is not tested yet because Pugs::Compiler::Rule doesn't seem to implement <before> and <after> : my $s = "ZBBBCZZ"; my @list = $s.comb: rx/(.)$0*/; # or if you are very fond on split: my @list = $s.split(rx/<after (.)><!before $0>/); [download] The comb-Version seems to be the same as the ruby example, the split-Version shows off the new regex syntax ;-) Perl 6 in German

Given that haskell's group is a library routine defined (one flavour) as: group = groupBy (==) groupBy _ []= [] groupBy eq (x:xs)= (x:ys) : groupBy eq zs where (ys,zs) = span (eq x) xs span p [] = ([],[]) span p xs@(x:xs') | p x = (x:ys, zs) | otherwise = ([],xs) where (ys,zs) = span p xs' [download] it would be equivalent to hide the complexity of a perl equivalent in a function, and it might be defined something like this: sub group { map{ pack 'C*', @$_ } @{ +reduce { defined $a->[0] && $a->[-1][0] && $a->[-1][0] == $b ? push @{ $a->[-1] }, $b : push @{ $a }, [ $b ]; $a; } [], unpack 'C*', shift } } [download] Giving a roughly equivalent one line usage: print for group 'ZBBBCCZZ'; Z BBB CC ZZ [download] Of course, the equivalence is only rough because Haskell's syntactic sugar of allowing a list of chars to look like a string. The above won't work for an arbitrary list. I remember suggesting that it would be nice if Perl 6 would allow a scalar to be treated syntactically as a list of chars: $scalar[3] = 'x'; etc. It would allow a single routine to be polymorphic to arbitrary lists and scalars and simplify the above, but I guess it wouldn't work for many other cases. BTW, does anyone know of a CPAN module that can solve this problem directly? I just did a codesearch for lang:haskell 'group' , and without promising to have exhaustively searched every link, I couldn't find a single use of group in any of the first 5 or so pages of hits. Loads of definitions, mostly the same but some variations in different libraries, but no uses. I guess that's why you won't find it on cpan. Other than for essentially academic uses like this, there are very few uses for it. Of course, now someone will turn up a 300 lines haskell module with 250 uses of group, but on the basis of what I found, it's not exactly an essential function. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

If you don't mind the different order of the output then: $ perl -le'print for keys %{ { "ZBBBCZZ" =~ /((.)\2*)/g } }' Z ZZ BBB C [download] :-)

Excellent excellent excellent.

A solution to the order problem would be to use an order-remembering hash (e.g. Tie::IxHash). Or you could pluck out the even-numbered elements of the list, e.g. $_ = 'abbbccddde'; @a = /((.)\2*)/g; @a = @a[ grep { not $_ & 1 } 0 .. $#a ]; [download] A word spoken in Mind will reach its own level, in the objective world, by its own weight

If you don't mind the different order of the output... ... and also don't mind the possible loss of some intended output: # OP's solution, given a different input: $ perl -le'print $1 while "ZBBBCZZCZ" =~ /((.)\2*)/g' Z BBB C ZZ C Z # jwkrahn's solution for that input: $ perl -le'print for keys %{{"ZBBBCZZCZ" =~ /((.)\2*)/g}}' Z ZZ BBB C [download] (Not to mention that the OP approach wins at golfing. :)

If you don't mind the different order of the output then: Sorry if I miss something, but your solution based on a hash, like all hash based ones seems aimed at uniqueness... but that doesn't seem a requirement as of the root node...

As you might expect, the only solution that I found truly satisfying was the Haskell one :-) Anyone who loves to browse voluminous manuals that describe huge inventories of library functions would agree with you, I'm sure. :) ... though this seems to be more of a library issue than a language one. Exactly. If the implementation of Haskell's "group" function has the same parsimony and efficiency as the perl "while /regex/g" solution, and if I really need to use it often, then I would certainly prefer to use the function call (once I've discovered that it actually exists). But if the "standard operators" provided by the language yield a fairly concise idiom for the task, why bother with the overhead (and relative obscurity) of a function call, especially if the particular task doesn't come up that often?

Anyone who loves to browse voluminous manuals that describe huge inventories of library functions would agree with you, I'm sure. :) And just how many lines of documentation are on the CPAN?

Not enough ;-)



holli, /regexed monk/

eyepopslikeamosquito,

BTW, does anyone know of a CPAN module that can solve this problem directly? It is hard to imagine why this specific problem would have its own solution. The problem can be generalized into something worth solving. Assume we want to divide a list on break points. I will present an iterator solution. #!/usr/bin/perl use strict; use warnings; use Data::Dumper; sub gen_divider { my %opt = @_; my $is_break = $opt{detect_break}; my ($curr, @list); return sub { for (@_) { if (! defined $curr || $is_break->($_, $curr)) { push @list, [$_]; $curr = $_; } else { push @{$list[-1]}, $_; } } return \@list; }; } my $break_point = sub { my ($item_to_test, $prev_break_point) = @_; return $item_to_test ne $prev_break_point; }; my $divide = gen_divider(detect_break => $break_point); for (split //, "ZBBBCZZ") { $divide->($_); } my $list = $divide->(); print Dumper($list); [download] This leaves a lot to be desired but the idea jumped into my head and so I thought I would share. Cheers - L~R

Ruby solution x = [] "ZBBBCZZ".scan(/((.)\2*)/){ x << [$~[0]]; x.flatten!} [download] .flatten is needed else you get # => [["Z"], ["BBB"], ["C"], ["ZZ"]] [download] as result. But this is quite an ugly solution anyway... in fact, perl looks almost as readable in this example ;) not sure how to solve this any easier though, hmm i wonder if .each could be used and then another grouping way... I really dont like the regex magic of the ruby solution, cant you guys think of another solution :-)

I really dont like the regex magic of the ruby solution, cant you guys think of another solution :-) How about some inject magic? x = s.split(//).inject([]) {|a,e| (a.last && a.last[e] ? a.last : a) < + < e; a} [download]

A reply falls below the community's threshold of quality. You may see it by logging in.

Actually, this is a good illustration of Haskell's strengths: given a contrived problem involving lists, there is an obscurely-named library function to solve it in one line.

You like contrived examples huh? How about this. The Vatican just called and want to commission you to write the software for their soon to be released answer to the current vogue for more and more violent and Satanic video games--The God in a Box console. They antisipate huge sales amongst followers once the Pope edicts that all good Catholic homes should have one. There is one caveat. To ensure that there are no hidden sex scenes or other inappropriate Easter eggs, they have decreed that all games for this new omni-threaded processor console shall be reviewed, at the source level, by a senior member of the Vatican Council. To facilitate this review, the source code must be written in Latin. What you gonna do? Does Haskell have an appropriate parser available? Thanks to theDamian, Perl does! Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

I remember suggesting that it would be nice if Perl 6 would allow a scalar to be treated syntactically as a list of chars: $scalar3 = 'x'; etc. It would allow a single routine to be polymorphic to arbitrary lists and scalars and simplify the above, but I guess it wouldn't work for many other cases. And funny thing, ruby gives me idea on adding some for my handmade jewelry. lol

My favourite solution from … was given in the first response by tye: Seems like he has a boiler plate reply prepared for this questions ;-). It was also his reply when I brought up that question with my very first post at PM. And it is a good solution. Beginning with ysth's reply to my question there was some in-depth discussion of using ??{} to approach this problem. And for me this solution has some appeal: my @split_on_char_change = $string =~ /((??{'(.)\1*'}))/g;

In JavaScript it's pretty concise, but still understandable:

var arr = "ZBBBCZZ".match(/((.)\2*)/g);

I really like that Javascript version. You either like or hate regexps for this kind of thing, but as regexp-based solutions go "ZBBBCZZ".match(/((.)\2*)/g) is damn concise. I couldn't get the perl version to be that concise... pity. I hope perl6's Rules allow one to specify a sub-match that can be referenced and then thrown away. -David

Earlier in this thread we already had a Perl 6 solution resembling this: "ZBBBCZZ".comb(/(.)$0*/) [download] I think that does what you want here, though in this case it's the .comb method that's forgetting the $0 capture, not the regex, so you raise an interesting point for the general case. It's not specced yet, but the current standard grammar has two rules called "same" and "differ" that could be used for this, along with the "quantify via separator" form: "ZBBBCZZ" ~~ m:g/ . ** <?same> /) [download] Though that is also kinda cheating on your general point by not capturing anything in the first place... At the moment, forgetting a capture within Perl 6 regex involves forcefully rebinding the capture to nothing, or returning an alternate result via a closure. Perhaps there should be a way to declare temp bindings that are not intended for the final match. But maybe not, if it's just going to turn into an obscure feature. So more likely it'll be some kind of context that just happens to work that way when you expect it to, like .comb does, only inside regex syntax.

I couldn't get that javascript version to work. Sure, it looks nice. It looks very much like something many tried as a first stab in Perl. But I'm not convinced it works in javascript any better than it works in Perl. I was not able to get it to return anything (while in Perl it returns too much). Update: Thanks, eB. I was eventually able to get your example to demonstrate that the code works. - tye

Some notes below your chosen depth have not been shown here

My take on a slightly more concise Ruby version: s = "ZBBBCZZ" x = s.scan(/((.)\2*)/).map{|v|v[0]} [download]

Generic version in the D programming language

$ cat test4.d && gdc test4.d -o test4 && echo "----" && ./test4 import std.stdio; T[][] group(T)(T[] array) { T[][] res; foreach (index, elem; array) { if (index == 0 || elem != array[index-1]) res~=(T[]).init; res[$-1]~=elem; } return res; } void main() { writefln(group("ZBBBCZZ")); } ---- [Z,BBB,C,ZZ] [download]

My take on solution in Erlang: -module(rosetta). -compile(export_all). test() -> lists:reverse(lists:foldl(fun rosetta/2, [], "ZYEEEEEABBCCCDDDDF")). rosetta(H, [[H|T]|TT]) -> [[H,H|T]|TT]; rosetta(H, L) -> [[H]|L]. [download] Shahzad Bhatti

Another erlangish -module(rosetta). -compile(export_all). test() -> rosetta("ZYEEEEEABBCCCDDDDF"). rosetta([H|T]) -> rosetta([], [H], T). rosetta(L, [H|_]=Act, [H|T1]) -> rosetta(L, [H|Act], T1); rosetta(L, Act, [H|T]) -> rosetta([Act|L], [H], T); rosetta(L, Act, []) -> lists:reverse([Act|L]). [download] maciek

For cheap thrills, a Perl 5 version using reduce: use List::Util qw(reduce); my $s = "ZBBBCZZ"; my $x = reduce { if (@$a && index($a->[-1], $b) >= 0) { $a->[-1] .= $b + } else { push @$a, $b } $a } [], split //, $s; [download]

I like the scan/map solution of ruby but I must admit too, that the javascript solution comes out cleanest :) "ZBBBCZZ".match(/((.)\2*)/g)

That would be: #!/usr/bin/env python str = "ABBBCCDDZ" parts = [] for i in range(len(str)): if len(parts)>0: if parts[len(parts)-1][len(parts[len(parts)-1])-1]==str[i]: parts[len(parts)-1] += str[i] else: parts.append(str[i]) else: parts.append(str[i]) print parts [download] - Antonio Ognio, Lima-Peru

Another solution, much more verbose but certainly more readable: #!/usr/bin/env python str = "ABBBCCDDZ" parts = [] last_char = '' current_chunk = '' for i in range(len(str)): current_char = str[i] if (current_chunk == '') or (last_char == current_char): current_chunk = current_chunk + current_char else: parts.append(current_chunk) current_chunk = current_char last_char = current_char if len(current_chunk)>0: parts.append(current_chunk) print parts [download] Also available here: http://pastebin.com/f1213db97 - Antonio Ognio, Lima-Peru

I find them equivalently unreadable ;-)

Gnah. No No and No. The itertools version is quite lovely, but if you must do an iterative version please at least make it idiomatic. If you really needed an index, you could at least use enumerate() ;-P mystr = "ABBBCCDDZ" def get_iterparts(mystr): buf = "" for char in mystr: if not buf or buf.endswith(char): buf += char else: yield buf buf = char if buf: yield buf print list(get_iterparts(mystr)) def get_parts(mystr): parts = [] buf = "" for char in mystr: if not buf or buf.endswith(char): buf += char else: parts.append(buf) buf = char if buf: parts.append(buf) return parts print get_parts(mystr) [download]

Shorter Perl 5 variants: Destroys $s: $s="ZBBBCZZ"; $s=~s/((.)\2*)/push@x,$1/eg; Nondestructive: $s="ZBBBCZZ"; $s=~s/((.)\2*)/push@x,$1;$1/eg; Needs a counter $i: $s="ZBBBCZZ"; @x=grep++$i%2,$s=~/((.)\2*)/g;