The Greatest Regex Trick Ever

<tag>.*?</tag>

[^"]+

Excluding certain Contexts while Matching or Replacing

This is a long page. It's sure to have typos and perhaps bugs. Will you do me a favor and report any typos or bugs you find? Thanks!

The Typical Solutions

(?<!")Tarzan(?!")

(?<!"(?=Tarzan"))Tarzan

Tarzan(?!"(?<="Tarzan"))

(?>(?<=(")|))Tarzan(?(1)(?!"))

(?<!")Tarzan|Tarzan(?!")

Tarzan(?!"(?:(?:[^"]*"){2})*[^"]*)

\K

(?:"Tarzan".*?)*\KTarzan

(?<!{[^}]*)Tarzan

(?<!{[^}]*?(?=Tarzan[^{}]*}))Tarzan

[i][/i]

[p][/p]

({[^{}]*?)(Tarzan)([^}]*})

\1\3

$1$3

\1T~a~r~z~a~n\3

$1T~a~r~z~a~n$3

\K

(?:(?>{[^}]*?})[^{}]*?)*\KTarzan

The Best Regex Trick Ever (at last!)

Match Tarzan but not "Tarzan"

((?<=")?)Tarzan(?(1)(?!"))

Tarzan(?!"(?:(?:[^"]*"){2})+[^"]*?(?:$|[\r\n]))

(?:"Tarzan".*?)*\KTarzan

"Tarzan"|(Tarzan)

|

Tarzania|--Tarzan--|"Tarzan"|(Tarzan)

How Does the Technique Work?

"Tarzan"|(Tarzan)

Now Tarzan says to Jane: "Tarzan".
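To watch the mechanics on this sentence, here is a minimal Python sketch using the standard `re` module. The variable names are my own and are only there for illustration. It runs `"Tarzan"|(Tarzan)` against the sample and separates the overall matches from the Group 1 captures.

```python
import re

pattern = re.compile(r'"Tarzan"|(Tarzan)')
subject = 'Now Tarzan says to Jane: "Tarzan".'

# Every overall match is either the unwanted quoted form (Group 1 unset)
# or the bare word we are after (Group 1 set).
overall = [m.group(0) for m in pattern.finditer(subject)]
wanted = [m.group(1) for m in pattern.finditer(subject) if m.group(1) is not None]

print(overall)
print(wanted)
```

The quoted form is still matched, but only by the left side of the alternation, so Group 1 stays unset and that match can simply be ignored.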

"Tarzan"|(Tarzan)

"Tarzan"|(Tarzan)

The Technique in Pseudo-Regex

NotThis|NotThat|GoAway|(WeWantThis)
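The pseudo-regex can be turned into a small reusable helper. The sketch below is only one way to do it in Python: `find_outside` is a hypothetical name of mine, and it assumes that neither the excluded patterns nor the wanted pattern contain capturing groups of their own, since Group 1 must be the wanted part.

```python
import re

def find_outside(excluded, wanted, subject):
    """Return matches of `wanted` that are not swallowed by an `excluded` pattern.

    Assumption: none of the patterns passed in contain capturing groups.
    """
    # Build NotThis|NotThat|(WeWantThis): exclusions first, target captured last.
    parts = ['(?:%s)' % e for e in excluded] + ['(%s)' % wanted]
    combined = re.compile('|'.join(parts))
    return [m.group(1) for m in combined.finditer(subject) if m.group(1) is not None]

subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}'
result = find_outside([r'\{[^}]+\}', r'"Tarzan\d+"'], r'Tarzan\d+', subject)
print(result)
```

Wrapping each exclusion in a non-capturing group keeps the numbering stable, so the wanted text is always in Group 1 however many exclusions are supplied.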

One small thing to look out for

(GetThis)

NotThis|(GetThis)

<img[^>]+>

<img[^>]+>|(.*)

<

<

.*

<img[^>]+>|(\w+)
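A short Python sketch makes the difference between the two patterns visible. The sample string is made up for the demonstration. With `(.*)` the capture starts before the `<` and runs straight over the tag, so the exclusion on the left never gets a chance; with `(\w+)` the capture cannot cross the `<`, and the tag is consumed by the left side as intended.

```python
import re

subject = 'see <img src="a.png"> here'

# Too greedy: at position 0 the left side fails (there is no "<" yet),
# so (.*) takes over and swallows the rest of the line, tag included.
greedy = [m.group(1) for m in re.finditer(r'<img[^>]+>|(.*)', subject) if m.group(1)]

# Safer: \w+ cannot run across "<", so the tag is matched by the left side.
safe = [m.group(1) for m in re.finditer(r'<img[^>]+>|(\w+)', subject) if m.group(1)]

print(greedy)
print(safe)
```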

More Applications of the Technique

Match Tarzan but not in {Tarzan's curly braces}

Not_this_context|(WeWantThis)

(Tarzan)

{[^}]*}

{[^}]*}|(Tarzan)

Tarzan\d+

{[^}]*?Tarzan[^}]*}

{[^}]*}

Match Tarzan but not in contexts A, B and C

\bBEGIN\b.*?\bEND\b|Therefore.*?[.!?]|{[^}]*}|(Tarzan)
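Here is how the three-context regex might be exercised in Python. This is a sketch under my own assumptions: the test sentence is invented, and I have escaped the curly braces, which Python's `re` does not require here but which keeps the pattern portable.

```python
import re

pattern = re.compile(
    r'\bBEGIN\b.*?\bEND\b'   # context A: BEGIN ... END blocks
    r'|Therefore.*?[.!?]'    # context B: sentences starting with "Therefore"
    r'|\{[^}]*\}'            # context C: anything between curly braces
    r'|(Tarzan)'             # what we actually want, captured in Group 1
)

subject = 'Tarzan BEGIN Tarzan END. Therefore Tarzan wins! {Tarzan} and Tarzan'

# Keep only the positions where Group 1 took part in the match.
starts = [m.start(1) for m in pattern.finditer(subject) if m.group(1) is not None]
print(starts)
```

Of the five occurrences of Tarzan in the sample, only the first and the last survive; the other three are consumed by one of the excluded contexts.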

BEGIN.*?END

\b

[.!?]

Match every word except Tarzan

Match X unless it is in contexts a, b and c.

Match every word except words a, b and c.

\bTarzan\b|(\w+)

\w+

\b

Match every word except those on a blacklist

\bTarzan\b|\bJane\b|\bSuperman\b|(\w+)

\b(?:Tarzan|Jane|Superman)\b|(\w+)
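As a quick check of the compact blacklist form, here is a small Python sketch. The sample sentence is my own; note that "Tarzania" is kept, because the `\b` after the blacklist stops `Tarzan` from matching inside a longer word.

```python
import re

pattern = re.compile(r'\b(?:Tarzan|Jane|Superman)\b|(\w+)')
subject = 'Tarzan and Jane met Superman in Tarzania'

# Blacklisted words are matched by the left side and leave Group 1 unset.
words = [m.group(1) for m in pattern.finditer(subject) if m.group(1) is not None]
print(words)
```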

Ignore Content of This Kind

I want to ignore A.

Match everything except A

<b>[^<]*</b>|([\w\s]+)

(.*)

(.*?)

A Variation: Deleting the Matches

NotThis|NotThat|GoAway|(WeWantThis)

(KeepThis|KeepThat|KeepTheOther)|DeleteThis

(KeepThis)|(KeepThat)|(KeepTheOther)|DeleteThis

\1\2\3

$1$2$3

m.group(1) + m.group(2) + m.group(3)
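The delete variation can be sketched in Python as follows. The pattern and sample string are assumptions of mine, chosen to keep quoted strings and brace blocks while deleting every other Tarzan. In Python a group that did not take part in the match comes back as `None`, so the callback falls back to an empty string before concatenating.

```python
import re

# Keep quoted strings and brace blocks (Groups 1 and 2); delete bare "Tarzan".
pattern = re.compile(r'("[^"]*")|(\{[^}]*\})|Tarzan')
subject = 'Tarzan "Tarzan" {Tarzan} Tarzan'

def keep_captures(m):
    # Unmatched groups are None in Python, so substitute '' for them.
    return (m.group(1) or '') + (m.group(2) or '')

cleaned = pattern.sub(keep_captures, subject)
print(repr(cleaned))
```

Whatever was captured is written back unchanged, and a match with no capture is replaced by nothing, which deletes it.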

Variation for Perl, PCRE and Python: (*SKIP)(*FAIL)

NotThis|NotThat

(*SKIP)(*FAIL)

(*SKIP)(*F)

(*SKIP)(?!)

(*FAIL)

(*F)

(?!)

(*SKIP)

Not_X|(GetThis)

Not_A(*SKIP)(*FAIL)|GetThis

Not_A(*SKIP)(*F)|GetThis

Not_A(*SKIP)(?!)|GetThis

(*SKIP)(*FAIL)

|

Not_A|Not_B|Not_C|(GetThis)

Not_A(*SKIP)(*FAIL)|Not_B(*SKIP)(*F)|Not_C(*SKIP)(?!)|GetThis

(?:Not_A|Not_B|Not_C)(*SKIP)(*FAIL)|(GetThis)

Code Samples

A Call to Help

May 2014. I'm calling for your help to translate the examples provided into languages in which you are fluent (see Code Translators Needed). Thank you in advance.

\d

\d

[0-9]

Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}

{[^}]+}|"Tarzan\d+"|(Tarzan\d+)

{[^}]+}

"Tarzan\d+"

"[^"]+"

"[^"]+"

"Tarzan\d+"

(Tarzan\d+)

PHP Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

<?php
$regex = '~{[^}]+}|"Tarzan\d+"|(Tarzan\d+)~';
$subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}';
$count = preg_match_all($regex, $subject, $m);
// build array of non-empty Group 1 captures
$matches = array_filter($m[1]);

///////// The six main tasks we're likely to have ////////

// Task 1: Is there a match?
echo "*** Is there a Match? ***<br />\n";
if (empty($matches)) echo "No<br />\n";
else echo "Yes<br />\n";

// Task 2: How many matches are there?
echo "\n<br />*** Number of Matches ***<br />\n";
echo count($matches)."<br />\n";

// Task 3: What is the first match?
echo "\n<br />*** First Match ***<br />\n";
if (!empty($matches)) echo array_values($matches)[0]."<br />\n";

// Task 4: What are all the matches?
echo "\n<br />*** Matches ***<br />\n";
if (!empty($matches)) {
    foreach ($matches as $match) echo $match."<br />\n";
}

// Task 5: Replace the matches
$replaced = preg_replace_callback(
    $regex,
    // in the callback function, if Group 1 is empty,
    // set the replacement to the whole match,
    // i.e. don't replace
    function($m) {
        if (empty($m[1])) return $m[0];
        else return "Superman";
    },
    $subject);
echo "\n<br />*** Replacements ***<br />\n";
echo $replaced."<br />\n";

// Task 6: Split
// Start by replacing by something distinctive,
// as in Step 5. Then split.
$splits = explode("Superman", $replaced);
echo "\n<br />*** Splits ***<br />\n";
echo "<pre>"; print_r($splits); echo "</pre>";
?>

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

<?php
$regex = '~(?:{[^}]+}|"Tarzan\d+")(*SKIP)(*F)|Tarzan\d+~';
$subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}';
$count = preg_match_all($regex, $subject, $matches);
// $matches[0] contains the matches, if any

///////// The six main tasks we're likely to have ////////

// Task 1: Is there a match?
echo "*** Is there a Match? ***<br />\n";
if ($count) echo "Yes<br />\n";
else echo "No<br />\n";

// Task 2: How many matches are there?
echo "\n<br />*** Number of Matches ***<br />\n";
if ($count) echo count($matches[0])."<br />\n";
else echo "0<br />\n";

// Task 3: What is the first match?
echo "\n<br />*** First Match ***<br />\n";
if ($count) echo $matches[0][0]."<br />\n";

// Task 4: What are all the matches?
echo "\n<br />*** Matches ***<br />\n";
if ($count) {
    foreach ($matches[0] as $match) echo $match."<br />\n";
}

// Task 5: Replace the matches
$replaced = preg_replace($regex, "Superman", $subject);
echo "\n<br />*** Replacements ***<br />\n";
echo $replaced."<br />\n";

// Task 6: Split
$splits = preg_split($regex, $subject);
echo "\n<br />*** Splits ***<br />\n";
echo "<pre>"; print_r($splits); echo "</pre>";
?>

C# Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

using System;
using System.Text.RegularExpressions;
using System.Linq;
using System.Collections.Generic;

class Program {
    static void Main() {
        string s1 = @"Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}";
        var myRegex = new Regex(@"{[^}]+}|""Tarzan\d+""|(Tarzan\d+)");
        var group1Caps = new List<string>();

        Match matchResult = myRegex.Match(s1);
        // put Group 1 captures in a list
        while (matchResult.Success) {
            if (matchResult.Groups[1].Value != "") {
                group1Caps.Add(matchResult.Groups[1].Value);
            }
            matchResult = matchResult.NextMatch();
        }

        ///////// The six main tasks we're likely to have ////////

        // Task 1: Is there a match?
        Console.WriteLine("*** Is there a Match? ***");
        if (group1Caps.Any()) Console.WriteLine("Yes");
        else Console.WriteLine("No");

        // Task 2: How many matches are there?
        Console.WriteLine("\n" + "*** Number of Matches ***");
        Console.WriteLine(group1Caps.Count);

        // Task 3: What is the first match?
        Console.WriteLine("\n" + "*** First Match ***");
        if (group1Caps.Any()) Console.WriteLine(group1Caps[0]);

        // Task 4: What are all the matches?
        Console.WriteLine("\n" + "*** Matches ***");
        if (group1Caps.Any()) {
            foreach (string match in group1Caps) Console.WriteLine(match);
        }

        // Task 5: Replace the matches
        string replaced = myRegex.Replace(s1, delegate(Match m) {
            // m.Value is the same as m.Groups[0].Value
            if (m.Groups[1].Value == "") return m.Value;
            else return "Superman";
        });
        Console.WriteLine("\n" + "*** Replacements ***");
        Console.WriteLine(replaced);

        // Task 6: Split
        // Start by replacing by something distinctive,
        // as in Step 5. Then split.
        string[] splits = Regex.Split(replaced, "Superman");
        Console.WriteLine("\n" + "*** Splits ***");
        foreach (string split in splits) Console.WriteLine(split);

        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program

Python Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

import re
# import regex # if you like good times
# intended to replace `re`, the regex module has many advanced
# features for regex lovers. http://pypi.python.org/pypi/regex

subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')
# put Group 1 captures in a list
matches = [group for group in regex.findall(subject) if group]

######## The six main tasks we're likely to have ########

# Task 1: Is there a match?
print("*** Is there a Match? ***")
if len(matches) > 0:
    print("Yes")
else:
    print("No")

# Task 2: How many matches are there?
print("\n" + "*** Number of Matches ***")
print(len(matches))

# Task 3: What is the first match?
print("\n" + "*** First Match ***")
if len(matches) > 0:
    print(matches[0])

# Task 4: What are all the matches?
print("\n" + "*** Matches ***")
if len(matches) > 0:
    for match in matches:
        print(match)

# Task 5: Replace the matches
def myreplacement(m):
    if m.group(1):
        return "Superman"
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print("\n" + "*** Replacements ***")
print(replaced)

# Task 6: Split
# Start by replacing by something distinctive,
# as in Step 5. Then split.
splits = replaced.split('Superman')
print("\n" + "*** Splits ***")
for split in splits:
    print(split)

Java Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;

class Program {
    public static void main (String[] args) throws java.lang.Exception {
        String subject = "Jane\" \"Tarzan12\" Tarzan11@Tarzan22 {4 Tarzan34}";
        Pattern regex = Pattern.compile("\\{[^}]+\\}|\"Tarzan\\d+\"|(Tarzan\\d+)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<String>();

        // put Group 1 captures in a list
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        } // end of building the list

        ///////// The six main tasks we're likely to have ////////

        // Task 1: Is there a match?
        System.out.println("*** Is there a Match? ***");
        if (group1Caps.size() > 0) System.out.println("Yes");
        else System.out.println("No");

        // Task 2: How many matches are there?
        System.out.println("\n" + "*** Number of Matches ***");
        System.out.println(group1Caps.size());

        // Task 3: What is the first match?
        System.out.println("\n" + "*** First Match ***");
        if (group1Caps.size() > 0) System.out.println(group1Caps.get(0));

        // Task 4: What are all the matches?
        System.out.println("\n" + "*** Matches ***");
        if (group1Caps.size() > 0) {
            for (String match : group1Caps) System.out.println(match);
        }

        // Task 5: Replace the matches
        // if only replacing, delete the line with the first matcher
        // also delete the section that creates the list of captures
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, "Superman");
            // quoteReplacement keeps any $ or \ in the matched text literal
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        System.out.println("\n" + "*** Replacements ***");
        System.out.println(replaced);

        // Task 6: Split
        // Start by replacing by something distinctive,
        // as in Step 5. Then split.
        String[] splits = replaced.split("Superman");
        System.out.println("\n" + "*** Splits ***");
        for (String split : splits) System.out.println(split);
    } // end main
} // end Program

JavaScript Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

<script>
var subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34} ';
var regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/g;
var group1Caps = [];
var key;
var match = regex.exec(subject);

// put Group 1 captures in an array
while (match != null) {
    if (match[1] != null) group1Caps.push(match[1]);
    match = regex.exec(subject);
}

///////// The six main tasks we're likely to have ////////

// Task 1: Is there a match?
document.write("*** Is there a Match? ***<br>");
if (group1Caps.length > 0) document.write("Yes<br>");
else document.write("No<br>");

// Task 2: How many matches are there?
document.write("<br>*** Number of Matches ***<br>");
document.write(group1Caps.length);

// Task 3: What is the first match?
document.write("<br><br>*** First Match ***<br>");
if (group1Caps.length > 0) document.write(group1Caps[0],"<br>");

// Task 4: What are all the matches?
document.write("<br>*** Matches ***<br>");
if (group1Caps.length > 0) {
    for (key in group1Caps) document.write(group1Caps[key],"<br>");
}

// Task 5: Replace the matches
// see callback parameters http://tinyurl.com/ocddsuk
var replaced = subject.replace(regex, function(m, group1) {
    // pick one of those two depending on JS version
    // if (group1 == "" ) return m;
    if (group1 == undefined ) return m;
    else return "Superman";
});
document.write("<br>*** Replacements ***<br>");
document.write(replaced);

// Task 6: Split
// Start by replacing by something distinctive,
// as in Step 5. Then split.
var splits = replaced.split("Superman");
document.write("<br><br>*** Splits ***<br>");
for (key in splits) document.write(splits[key],"<br>");
</script>

Ruby Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/
# put Group 1 captures in an array
group1Caps = []
subject.scan(regex) {|m| group1Caps << $1 if !$1.nil? }

######## The six main tasks we're likely to have ########

# Task 1: Is there a match?
puts("*** Is there a Match? ***")
if group1Caps.length > 0
    puts "Yes"
else
    puts "No"
end

# Task 2: How many matches are there?
puts "\n*** Number of Matches ***"
puts group1Caps.length

# Task 3: What is the first match?
puts "\n*** First Match ***"
if group1Caps.length > 0
    puts group1Caps[0]
end

# Task 4: What are all the matches?
puts "\n*** Matches ***"
if group1Caps.length > 0
    group1Caps.each { |x| puts x }
end

# Task 5: Replace the matches
replaced = subject.gsub(regex) {|m|
    if $1.nil?
        m
    else
        "Superman"
    end
}
puts "\n*** Replacements ***"
puts replaced

# Task 6: Split
# Start by replacing by something distinctive,
# as in Step 5. Then split.
splits = replaced.split(/Superman/)
puts "\n*** Splits ***"
splits.each { |x| puts x }

Perl Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.

#!/usr/bin/perl
$regex = '{[^}]+}|"Tarzan\d+"|(Tarzan\d+)';
$subject = 'Jane" "Tarzan12" Tarzan11@Tarzan22 {4 Tarzan34}';

# put Group 1 captures in an array
my @group1Caps = ();
while ($subject =~ m/$regex/g) {
    if (defined $1) { push(@group1Caps, $1); }
}

######## The six main tasks we're likely to have ########

# Task 1: Is there a match?
print "*** Is there a Match? ***\n";
if ( @group1Caps > 0) { print "Yes\n"; }
else { print ("No\n"); }

# Task 2: How many matches are there?
print "\n*** Number of Matches ***\n";
print scalar(@group1Caps);

# Task 3: What is the first match?
print "\n\n*** First Match ***\n";
if ( @group1Caps > 0) { print $group1Caps[0]; }

# Task 4: What are all the matches?
print "\n\n*** Matches ***\n";
if ( @group1Caps > 0) {
    foreach(@group1Caps) { print "$_\n"; }
}

# Task 5: Replace the matches
# or: s/$regex/$1? "Superman":$&/eg
($replaced = $subject) =~ s/$regex/
    if (defined $1) { "Superman"; } else {$&;} /eg;
print "\n*** Replacements ***\n";
print $replaced . "\n";

# Task 6: Split
# Start by replacing by something distinctive,
# as in Step 5. Then split.
@splits = split(/Superman/, $replaced);
print "\n*** Splits ***\n";
foreach(@splits) { print "$_\n"; }

VB.NET Code Sample

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.


(The code compiles perfectly in VS2015, but no online demo is supplied because the VB.NET in ideone chokes on anonymous functions.)

Imports System
Imports System.Linq
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim MyRegex As New Regex("{[^}]+}|""Tarzan\d+""|(Tarzan\d+)")
        Dim Subject As String = "Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34} "
        Dim Group1Caps As New List(Of String)()
        Dim MatchResult As Match = MyRegex.Match(Subject)

        ' put Group 1 captures in a list
        While MatchResult.Success
            If MatchResult.Groups(1).Value <> "" Then
                Group1Caps.Add(MatchResult.Groups(1).Value)
            End If
            MatchResult = MatchResult.NextMatch()
        End While

        '///////// The six main tasks we're likely to have ////////

        '// Task 1: Is there a match?
        Console.WriteLine("*** Is there a Match? ***")
        If (Group1Caps.Any()) Then
            Console.WriteLine("Yes")
        Else
            Console.WriteLine("No")
        End If

        '// Task 2: How many matches are there?
        Console.WriteLine(vbCrLf & "*** Number of Matches ***")
        Console.WriteLine(Group1Caps.Count)

        '// Task 3: What is the first match?
        Console.WriteLine(vbCrLf & "*** First Match ***")
        If (Group1Caps.Any()) Then Console.WriteLine(Group1Caps(0))

        '// Task 4: What are all the matches?
        Console.WriteLine(vbCrLf & "*** Matches ***")
        If (Group1Caps.Any()) Then
            For Each match As String In Group1Caps
                Console.WriteLine(match)
            Next
        End If

        '// Task 5: Replace the matches
        Dim Replaced As String = MyRegex.Replace(Subject,
            Function(m As Match)
                If (m.Groups(1).Value = "") Then
                    Return m.Groups(0).Value
                Else
                    Return "Superman"
                End If
            End Function)
        Console.WriteLine(vbCrLf & "*** Replacements ***")
        Console.WriteLine(Replaced)

        ' Task 6: Split
        ' Start by replacing by something distinctive,
        ' as in Step 5. Then split.
        Dim Splits As Array = Regex.Split(Replaced, "Superman")
        Console.WriteLine(vbCrLf & "*** Splits ***")
        For Each Split As String In Splits
            Console.WriteLine(Split)
        Next

        Console.WriteLine(vbCrLf & "Press Any Key to Exit.")
        Console.ReadKey()
    End Sub
End Module

Code Translators Needed