About two years ago I wrote about replacing text in the DOM, and how “it’s not that simple”. I revisited the problem a couple of days ago and found a novel solution.

What is the problem? The problem is very simple. You have a DOM element, like:

<p>This is a test. Testing is fun!</p>

You want to wrap all instances and variations of the word “test”, like so:

<p>This is a <span class="f">test</span>. <span class="f">Testing</span> is fun!</p>

In other words, we need to match that element’s text content against the regular expression /\btest\w*\b/gi (any word beginning with “test”). Not only do we need to match, we also need to replace. Replacing in the DOM can’t safely be done as a simple string operation like:

element.innerHTML = element.innerHTML.replace(/\btest\w*\b/gi, '<span class="f">$&</span>');

Actually, this will work, but it has the following caveats:

1. It doesn’t guarantee that you’re only replacing text between HTML tags. innerHTML contains all HTML, so you could be replacing the test123 in <br class="test123"/>, or any other text between < and >.
2. The replacement completely wipes any inner elements from existence. This means any prior references to those elements are useless and event listeners will be gone. There will still be elements, but they’ll be fresh elements, not the ones from before.
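Caveat #1 is easy to reproduce without a browser. Here is a minimal sketch, assuming a hypothetical markup string `html` that stands in for whatever element.innerHTML would return; it only uses String.prototype.replace, where `$&` inserts the matched text:

```javascript
// The regular expression from above: any word beginning with "test".
const pattern = /\btest\w*\b/gi;

// Hypothetical markup standing in for element.innerHTML.
const html = '<p>This is a test.<br class="test123"/>Testing is fun!</p>';

// "$&" is the replacement pattern for the matched substring.
const replaced = html.replace(pattern, '<span class="f">$&</span>');

console.log(replaced);
// The class attribute of the <br> gets wrapped along with the real text,
// leaving broken markup: <br class="<span class="f">test123</span>"/>
```

The regular expression has no idea which characters are markup and which are text, so the attribute value is treated like any other word.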

Caveat #1 can be avoided by only replacing what we get from innerText (or textContent):

element.innerHTML = element.innerText.replace(/\btest\w*\b/gi, '<span class="f">$&</span>');

Apart from the cross-browser issues innerText/textContent have, this solution also has a massive problem: it totally disregards any actual HTML that was previously in the element. Only the text survives, so every inner element is thrown away.

Current solutions

Current solutions tend to traverse all child text nodes, testing each one individually for matches, and then splitting the matching text node into separate parts and wrapping the matched part in a new element.

This is, without a doubt, a far better solution than the innerHTML/Text stuff above, but there are still caveats:

These solutions tend to assume that adjacent text nodes cannot exist. It’s true that they rarely do, but when text nodes have been added dynamically, the guilty developer will rarely have remembered to call Node#normalize.

Most importantly, what happens if a match spans multiple nodes? For example:

<p>This is a te<em>st</em>.</p>

I haven’t yet found a solution that takes these cases into account. A correct solution would transform the above into something like:

<p>This is a <span class="f">te<em>st</em></span>.</p>

Or, perhaps:

<p>This is a <span class="f">te</span><em><span class="f">st</span></em>.</p>

i.e. wrapping either the entire match, including intersecting elements, or wrapping the individual portions of the match.

Why is it tricky?

To match that initial regular expression we need a single chunk of text we can test against. If we test each individual text node then we won’t get any matches for the above. "te" is one text node, and "st" is another.
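A small sketch makes the point concrete. It assumes the text nodes of <p>This is a te<em>st</em>.</p> have already been pulled out into a plain array of strings (`textNodes` is a hypothetical stand-in, not a DOM API):

```javascript
// No "g" flag here, so RegExp#test is stateless between calls.
const pattern = /\btest\w*\b/i;

// Text nodes of <p>This is a te<em>st</em>.</p>, in document order.
const textNodes = ['This is a te', 'st', '.'];

// Testing each text node on its own finds nothing...
const perNodeHits = textNodes.filter((text) => pattern.test(text));

// ...while the aggregate text contains the match.
const aggregateHit = pattern.test(textNodes.join(''));

console.log(perNodeHits.length, aggregateHit); // 0 true
```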

Replacement is also a hassle, because you’d have to split the matched node(s) at the right place and wrap them in one or more replacement elements. It’s not a simple operation anymore, and probably costs more than it’s worth in developer time.

Here are the requirements for solving the “problem” correctly and fully:

1. Must accept and work correctly with any regular expression valid in JS.
2. Must be able to match across element boundaries. For example, it must be able to match apple in app<em>l</em>e and even in <em>What is app</em>le!
3. Must not be destructive to element nodes. Destroying/splitting/normalizing text nodes is permissible though.

After trying a few different variants, including one that injected tokens into innerHTML in order to locate the matching nodes, I landed on one which is relatively efficient and seems to work well!

How is it done?

targetElement = where we’re looking for our matches.

1. Collect the aggregate text of targetElement by using something like this (avoid innerText/textContent).
2. Match the text against the regular expression, collecting the start and end indexes of every match.
3. Traverse targetElement’s node tree, incrementing a counter to keep track of our text-index location. When we reach a match’s location, grab the start-node, the end-node and any intersecting nodes and send them to step #4.
4. With the custom DOM range details (start-node, intersecting-nodes, and end-node):
   - If the start-node is the same as the end-node, split the node into three parts: before-match, match, and after-match. Then wrap match in <span class="f">.
   - If the start-node is different from the end-node, split each of them into match and non-match parts, wrapping the matching parts in <span class="f">. Also wrap any intersecting text nodes.
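Steps 1 to 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind findAndReplaceDOMText: the function names (`collectTextNodes`, `findMatches`, `locatePortions`) are hypothetical, and the functions only read `nodeType`, `childNodes` and `data`, so the example runs against plain objects as well as real DOM nodes.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Step 1: gather non-empty text nodes in document order, recording where
// each one starts within the aggregate text.
function collectTextNodes(node, out = { text: '', nodes: [] }) {
  if (node.nodeType === TEXT_NODE) {
    if (node.data.length > 0) {
      out.nodes.push({ node, start: out.text.length });
      out.text += node.data;
    }
  } else {
    for (const child of node.childNodes) collectTextNodes(child, out);
  }
  return out;
}

// Step 2: record the [start, end) indexes of every match in the aggregate text.
function findMatches(text, regex) {
  // matchAll requires a global regex, so add the "g" flag if it is missing.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const matches = [];
  for (const m of text.matchAll(new RegExp(regex.source, flags))) {
    // Skip zero-length matches; there is nothing to wrap.
    if (m[0].length > 0) matches.push({ start: m.index, end: m.index + m[0].length });
  }
  return matches;
}

// Step 3: translate each match into per-node portions (node plus local offsets).
function locatePortions(root, regex) {
  const { text, nodes } = collectTextNodes(root);
  return findMatches(text, regex).map(({ start, end }) =>
    nodes
      // Keep only the text nodes that overlap the match.
      .filter(({ node, start: s }) => s < end && s + node.data.length > start)
      .map(({ node, start: s }) => ({
        node,
        from: Math.max(start, s) - s,
        to: Math.min(end, s + node.data.length) - s,
      }))
  );
}

// Plain-object stand-in for <p>This is a te<em>st</em>.</p>
const text = (data) => ({ nodeType: TEXT_NODE, data, childNodes: [] });
const el = (...childNodes) => ({ nodeType: 1, childNodes });
const p = el(text('This is a te'), el(text('st')), text('.'));

const [portions] = locatePortions(p, /\btest\w*\b/gi);
console.log(portions.map(({ node, from, to }) => node.data.slice(from, to)));
// → [ 'te', 'st' ]
```

Each portion says which text node to cut and at which offsets. In a browser, step 4 could then split each node at those offsets (Text#splitText is the standard DOM method for that) and wrap the middle piece in the replacement element.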

Spooky message in the first three highlighted words!! D:

Using the steps above I wrote findAndReplaceDOMText which allows you to wrap regular-expression matches found in DOM text in any element you want. If matches are split across multiple nodes it will wrap each portion individually. Please check out the demo!

Thanks for reading! Please share your thoughts with me on Twitter. Have a great day!