[Tutor] Amazing power of Regular Expressions...

On Sunday 05 November 2006 15:02, Kent Johnson wrote:
...
> Regular expressions are an extremely powerful and useful tool that every
> programmer should master and then put away and not use when there is an
> alternative :-)

<eyebrow>

There's always an alternative to a regular expression, so are you really
suggesting *never* use a regex? (seriously though, I doubt you are, but taken
in this context, that's how it looks).

The most pathological example of regex avoidance I've seen in a while is this:

def isPlain(text):
    plaindict = {'-': True, '.': True, '1': True, '0': True, '3': True, '2': True,
                 '5': True, '4': True, '7': True, '6': True, '9': True, '8': True,
                 'A': True, 'C': True, 'B': True, 'E': True, 'D': True, 'G': True,
                 'F': True, 'I': True, 'H': True, 'K': True, 'J': True, 'M': True,
                 'L': True, 'O': True, 'N': True, 'Q': True, 'P': True, 'S': True,
                 'R': True, 'U': True, 'T': True, 'W': True, 'V': True, 'Y': True,
                 'X': True, 'Z': True, '_': True, 'a': True, 'c': True, 'b': True,
                 'e': True, 'd': True, 'g': True, 'f': True, 'i': True, 'h': True,
                 'k': True, 'j': True, 'm': True, 'l': True, 'o': True, 'n': True,
                 'q': True, 'p': True, 's': True, 'r': True, 'u': True, 't': True,
                 'w': True, 'v': True, 'y': True, 'x': True, 'z': True}
    for c in text:
        if plaindict.get(c, False) == False:
            return False
    return True

(sadly this is from real code - in defence of the person who wrote it, they
weren't even *aware* of regexes)

That's equivalent to the regular expression:

    ^[0-9A-Za-z_.-]*$

Now, which is clearer? If you learn to read & write regular expressions, then
the short regular expression is the clearest form. It's also quicker.

I'm not someone who advocates coding-by-regex, as happens rather heavily in
perl (I like perl about as much as python), but to say "don't use them if
there's an alternative" is a little strong.
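For anyone who wants to check the equivalence claimed above, here is a minimal sketch. The names is_plain_loop, is_plain_regex, PLAIN_CHARS and PLAIN_RE are made up for illustration, and re.fullmatch needs Python 3.4 or later (it did not exist when this thread was written). One caveat worth knowing: in Python, '$' also matches just before a trailing newline, so the pattern as written accepts "abc\n" while the loop rejects it. Using fullmatch (or ending the pattern with \Z) closes that gap.

```python
import re
import string

# Same character set as the hand-written dictionary:
# ASCII letters, digits, '_', '.' and '-'.
PLAIN_CHARS = frozenset(string.ascii_letters + string.digits + "_.-")

# No '^' or '$' here: fullmatch() below anchors both ends of the string.
PLAIN_RE = re.compile(r"[0-9A-Za-z_.-]*")


def is_plain_loop(text):
    """Character-by-character check with the same behaviour as isPlain."""
    return all(c in PLAIN_CHARS for c in text)


def is_plain_regex(text):
    """Regex check; True only if the whole string is made of plain characters."""
    return PLAIN_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["abc-1.2_X", "", "a b", "abc\n"]:
        # The two implementations should agree on every sample.
        print(repr(sample), is_plain_loop(sample), is_plain_regex(sample))
```

If in doubt about which form is quicker on your own data, timeit on both functions will settle it better than any claim in an email.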
Aside from the argument that "you now have two problems" (which always applies
if you think all problems can be hit with the same hammer), solving
*everything* with regexes is often slower, since people then apply one regex
after another, after another. The most pathological example I've seen applied
over 1000 regexes to a piece of text, one after another, and then the author
wondered why their code was slow...

JWZ's quote is aimed more at people who think about solving every problem with
regexes (and where you end up with 10 line monstrosities in perl with 5 levels
of backtracking).

Also, it's worth bearing in mind that there's more than one definition of what
regexes are (awk, perl, python, and various C libraries all have slightly
differing rules and syntax, even if they often share a common base).

Rather than say there's one true way, remember that regexes are little more
than a shorthand for structured parsing. With that in mind, it's worth
recasting JWZ's point as:

    If your reaction to seeing a problem is "this looks like it can be
    solved using a regex", you should think to yourself: has someone else
    already hit this problem and have they come up with a specialised
    pattern matcher for it already? If not, why not?

In this case that *should* have led the poster to the discovery of the
specialised parser:

    time.strptime(date, '%d/%m/%Y')

File globs are another good example of a specialised form of pattern matcher.

Using a regex when it's appropriate is good. Finding a more appropriate
specialised pattern matcher? Even better. Avoiding using regexes in the way
I've shown above, because it's an alternative to using a regex? Bad, it's slow
and unclear. :-)


Michael.
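P.S. A minimal sketch of the two specialised matchers mentioned above, time.strptime and file globs (via the standard fnmatch module). The helper name parse_date and the sample strings are invented for illustration; the original poster's actual input is not shown in this message.

```python
import fnmatch
import time


def parse_date(date):
    """Return a time.struct_time for a 'DD/MM/YYYY' string, or None if it does not parse."""
    try:
        return time.strptime(date, "%d/%m/%Y")
    except ValueError:
        # strptime raises ValueError when the text does not fit the format.
        return None


if __name__ == "__main__":
    parsed = parse_date("05/11/2006")
    print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)

    # Wrong layout and an out-of-range day are both rejected.
    print(parse_date("2006-11-05"))
    print(parse_date("32/01/2006"))

    # Glob-style matching without writing a regex by hand.
    # fnmatchcase compares case-sensitively on every platform.
    print(fnmatch.fnmatchcase("notes.txt", "*.txt"))
    print(fnmatch.fnmatchcase("notes.txt.bak", "*.txt"))
```

The parser also validates the field ranges, which a naive \d\d/\d\d/\d\d\d\d regex would not do.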