How?

Tom’s post describes the concept in great detail, but the basic idea relies on Unicode characters that have zero width.

That is, when you insert a zero-width character into a piece of text, it is not visible and usually does not affect how the rest of the text is displayed. A machine can still read these characters, though, so a series of them can encode invisible information into any piece of text (when an email is sent, for example), and that information can be read back later, when the text shows up on social media.
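To make that concrete, here is a minimal sketch of the encoding idea in JavaScript. The bit mapping (U+200B for 0, U+200C for 1), the 16-bits-per-character layout, and the function names toZeroWidth, fromZeroWidth and fingerprint are all assumptions made for illustration; the scheme in Tom’s post may well differ.

```javascript
// Assumed mapping for illustration only: ZWSP = bit 0, ZWNJ = bit 1.
const ZERO = '\u200B';
const ONE = '\u200C';

// Turn each UTF-16 code unit of the secret into 16 invisible "bits".
function toZeroWidth(secret) {
  let out = '';
  for (let i = 0; i < secret.length; i++) {
    const bits = secret.charCodeAt(i).toString(2).padStart(16, '0');
    for (const bit of bits) out += bit === '1' ? ONE : ZERO;
  }
  return out;
}

// Collect the invisible bits from any text and rebuild the secret.
function fromZeroWidth(text) {
  const hidden = text.match(/[\u200B\u200C]/g) || [];
  let out = '';
  for (let i = 0; i + 16 <= hidden.length; i += 16) {
    const bits = hidden
      .slice(i, i + 16)
      .map((c) => (c === ONE ? '1' : '0'))
      .join('');
    out += String.fromCharCode(parseInt(bits, 2));
  }
  return out;
}

// Tuck the invisible payload in after the first visible character.
function fingerprint(visible, secret) {
  return visible.slice(0, 1) + toZeroWidth(secret) + visible.slice(1);
}
```

Calling fingerprint('Hello world', 'bob') gives a string that still reads as "Hello world" on screen, while fromZeroWidth recovers 'bob' from whatever copy of it turns up later.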

What can I do to avoid this?

Most obvious approach: Simply take a screenshot of the text and share that. This way, zero-width characters cannot be read, and names can be blurred out. Sadly, zero-width characters aren’t the only way your name could be encoded.

Why that won’t work: As HN user boramalper suggests, other aspects such as ligatures, margins, fonts, kerning or even synonym replacement may be used to encode your identity, even in a screenshot.

A better approach: Don’t copy the hidden characters in the first place!

chpmrc’s Chrome extension at work.

About 12 hours after the first HN discussion, user chpmrc posted a Chrome extension to help do just that! It works rather simply: if you suspect there are hidden characters in the text, you open up the extension, click the button and boom! All zero-width characters are replaced by emojis.

This is great, except for one thing: There is no way I would remember to do that! And I’m not the only one.

A better solution

Many suggestions were made, including:

strip the non-printing characters and display a number of characters stripped, similar to how ad blockers display the number of ads blocked. -kerkeslager

scan your clipboard for zero-width characters, silently strip them, and then re-populate the clipboard. -Someone1234

and wryly:

read the web exclusively in ASCII with UTF-8 characters inserted using their codepoint. -flashman

Personally, I think automatically stripping characters might lead to some issues, particularly with emoji support, where zero-width joiners are used to change how certain emojis are displayed (combining several emojis into a single family or profession glyph, etc.).
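One concrete case of the emoji problem is the zero-width joiner (U+200D), which glues several emojis into a single glyph. The sketch below is only an illustration (the family sequence is my own choice of example), but it shows what blind stripping would do:

```javascript
// Man + ZWJ + woman + ZWJ + girl is a standard "family" sequence.
const ZWJ = '\u200D';
const family = ['\u{1F468}', '\u{1F469}', '\u{1F467}'].join(ZWJ);

// Strip zero-width characters the naive way.
const stripped = family.replace(/[\u200B-\u200D\uFEFF]/g, '');

console.log(family);   // usually drawn as one family glyph
console.log(stripped); // drawn as three separate emojis

// Counting code points: 3 emojis + 2 joiners before, 3 emojis after.
console.log([...family].length, [...stripped].length);
```

So an extension that silently removes every zero-width character would also quietly break apart emojis like this one.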

I also feel manipulating people’s clipboards would be a risky manoeuvre — particularly if people want to copy zero-width characters for legitimate purposes. Not to mind, it would be difficult to implement.

In my opinion, the ideal solution would be an extension that:

1. Quietly warns you of hidden characters, similar to how ad-blockers show how many ads have been blocked, but doesn’t strip them.

2. Loudly warns you when you have selected text with hidden characters. This way, even if you forgot you installed the extension, you are still protected.

Building the ideal solution

“Ok, what do we do when the existing solutions don’t suit our needs?”

“We look harder, make do with what we have, or give up!”

“No! We waste a Sunday afternoon (and most of Monday) building a custom solution!”

Functionally, we need the extension to:

Read the content of pages we visit.

Edit the content of pages we visit.

Show a popup UI when necessary.

Thankfully, an extension I have previously built does all of that: Squish makes tall tweets shorter. You can read about how I built Squish in a previous blog post, and see the code behind it on GitHub.

How to…

This will be super brief — leave a comment if you would like to read an in-depth post on how it’s made.

Starting with the Squish code, I am going to delete all the promo images then update the readme.md and the manifest to call our new project: zwBlocker: Find and notify the user of zero-width unicode characters.

manifest.json

Squish only worked on “twitter.com”, so we want to change the manifest to let our new extension access every page we visit. Under content_scripts, change “matches” to:

"matches": ["<all_urls>"],

To find zero-width characters on the page, we’ll use a content script which runs after the page loads: "run_at": "document_end".

Walking the DOM to find those pesky buggers

We’re going to use a TreeWalker to search the DOM for zero-width characters (zwc’s). It’s created with the factory method document.createTreeWalker, which accepts a filter function we can write to identify nodes we are interested in. Our filter function looks like this:

contentScript.js ...

function(node) {
  // Accept (and count) any node whose text contains a zero-width character.
  if (testForZeroWidthCharacters(node.nodeValue)) {
    totalCount++;
    return NodeFilter.FILTER_ACCEPT;
  }
  else {
    // FILTER_SKIP should keep the children of this node in consideration.
    return NodeFilter.FILTER_SKIP;
  }
}

...

NodeFilter.FILTER_ACCEPT and FILTER_SKIP are constants that tell the TreeWalker to include the current node, or skip it. There is a third possibility here, FILTER_REJECT, which would also ignore the children of the current node. This would make searching the tree much quicker if we could be sure no text children existed, but we can’t, so we won’t :).

totalCount is a global var I use to keep track of how many zwc’s we’ve found (note that, as shown, it increases by one per matching node, not per character).
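For context, here is a rough sketch of how a filter like this can be plugged into document.createTreeWalker. It is not zwBlocker’s actual code: the names ZW_PATTERN, countZeroWidth and findZeroWidthTextNodes are made up for this example, and restricting the walker to text nodes with NodeFilter.SHOW_TEXT is my assumption. Unlike the filter above, this version counts individual characters rather than matching nodes.

```javascript
const ZW_PATTERN = /[\u200B-\u200D\uFEFF]/g;

// Number of zero-width characters in a string (0 for null or empty input).
function countZeroWidth(text) {
  const matches = (text || '').match(ZW_PATTERN);
  return matches ? matches.length : 0;
}

// Walk every text node under `root` and collect the ones that
// contain at least one zero-width character.
function findZeroWidthTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      return countZeroWidth(node.nodeValue) > 0
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
    },
  });

  const nodes = [];
  let total = 0;
  while (walker.nextNode()) {
    nodes.push(walker.currentNode);
    total += countZeroWidth(walker.currentNode.nodeValue);
  }
  return { nodes, total };
}
```

In a content script, findZeroWidthTextNodes(document.body) would return the offending text nodes along with a character count that could be shown to the user.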

testForZeroWidthCharacters is my own function, heavily based off chpmrc’s code and looks like:

contentScript.js ...

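var zwMatches = /[\u200B-\u200D\uFEFF]/g;

function testForZeroWidthCharacters(text){
  // nodeValue is null for element nodes, so guard before calling match.
  // Returns an array of matches (truthy) or null (falsy).
  return text ? text.match(zwMatches) : null;
}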

...

The zwMatches regex is outside the function because I’ve also written functions to remove and replace zwc’s using the same regex. The regex ends with g meaning it is a global regex, and will match ALL occurrences — rather than just the first one. Pretty important when removing and replacing zwc’s!
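I haven’t shown the remove and replace functions, so here is a minimal sketch of what they could look like. The function names match the ones used in the selection code later in this post, but the bodies and the flag emoji placeholder are assumptions for illustration, not a copy of zwBlocker’s source.

```javascript
// Same character class as above; the g flag makes replace hit every match.
var zwMatches = /[\u200B-\u200D\uFEFF]/g;

// Assumed placeholder emoji (a red flag); the real extension may use another.
var PLACEHOLDER = '\u{1F6A9}';

// Drop every zero-width character from the text.
function removeZeroWidthCharacters(text) {
  return text.replace(zwMatches, '');
}

// Swap every zero-width character for a visible emoji.
function replaceZeroWidthCharacters(text) {
  return text.replace(zwMatches, PLACEHOLDER);
}
```

For example, removeZeroWidthCharacters('a\u200Bb\u200Cc') returns 'abc', while the replace version returns the same letters with a flag where each hidden character was. Without the g flag, only the first hidden character would be touched.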

Capturing selected text

I didn’t know this, but it turns out you can use document.onselectionchange to track every time the user highlights text on the page. selection.toString() got me the text, including any possible zwc’s — which I could then pass to my regex function, and send on to the background script to notify the user. Simple!

// Catch selected text and send a message to the background page.
document.onselectionchange = function() {
  var selection = document.getSelection();
  var text = selection.toString();

  if (testForZeroWidthCharacters(text)) {
    // Store the raw, cleaned and emoji-marked versions for later use.
    chrome.storage.sync.set({
      selectedText: text,
      selectedTextClean: removeZeroWidthCharacters(text),
      selectedTextEmoji: replaceZeroWidthCharacters(text),
    }, function(){});

    /* show message */
    chrome.runtime.sendMessage({message: "warnUserOfZeroWidthChars", value: text}, function(response) {
    });
  }
};

And the rest…

The rest of the code is very straightforward:

Passing messages between the content-script and the background page using chrome.runtime.sendMessage and onMessage.

Showing notifications to the user using chrome.notifications.create and chrome.notifications.onButtonClicked.

Creating a new tab with chrome.tabs.create and passing the sanitised text with chrome.storage.sync.

I’ll spare you the detail here.
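For readers who do want a rough picture, the sketch below shows one way those pieces could fit together in a background script. Treat it as an assumption-laden outline rather than zwBlocker’s real code: NOTIFICATION_ID, registerBackgroundHandlers, icon.png, clean.html and the notification wording are all hypothetical. The chrome object is passed in as a parameter purely so the wiring can be exercised outside the browser.

```javascript
// Hypothetical names: NOTIFICATION_ID, "icon.png" and "clean.html" are
// placeholders, not taken from zwBlocker's source.
const NOTIFICATION_ID = 'zw-warning';

function registerBackgroundHandlers(chromeApi) {
  // Content script -> background: warn the user about the selection.
  chromeApi.runtime.onMessage.addListener(function (request) {
    if (request.message !== 'warnUserOfZeroWidthChars') return;
    chromeApi.notifications.create(NOTIFICATION_ID, {
      type: 'basic',
      iconUrl: 'icon.png',
      title: 'Zero-width characters selected',
      message: 'The text you selected contains hidden characters.',
      buttons: [{ title: 'Show cleaned text' }],
    });
  });

  // Notification button -> open a page that can read the cleaned text
  // back out of chrome.storage.sync.
  chromeApi.notifications.onButtonClicked.addListener(function (id, buttonIndex) {
    if (id === NOTIFICATION_ID && buttonIndex === 0) {
      chromeApi.tabs.create({ url: chromeApi.runtime.getURL('clean.html') });
    }
  });
}
```

In the extension itself, the background script would call registerBackgroundHandlers(chrome) once at startup.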

Conclusion

zwBlocker in action!

Say hello to zwBlocker: A Chrome extension that:

Subtly shows the number of zero-width characters on the page.

Detects zero-width characters in selected text, and notifies the user.

Filters out zero-width characters when requested.

I’m pretty happy with the result: not bad for a weekend of work! Of course, improvements can be made. Feel free to comment here or on Twitter, or create an issue or pull request on GitHub.

I also want to say thanks to the creators who inspired this post. I love seeing active discussion and work on this kind of stuff.

I’m Aidan Breen, and I run a software consultancy in Dublin, Ireland. If you enjoyed this post, consider following me on Twitter, or sign up to my personal mailing list for less than monthly updates.