Series: practices

January 7, 2014 - 09:35:45 AM

It all started with a JavaScript error…

Uncaught SyntaxError : Unexpected token ILLEGAL

I was trying to catch up on my RSS items, but nothing was rendering on the page.

I dumped some debugging info and opened an issue on GitHub. I found a workaround, but it involved marking all my stories as read. No time to look into this issue now.

February 24, 2014 - 02:17:58 PM

Several other users have reported experiencing the same bug. A potential fix that involved removing unprintable characters ( .gsub(/[^[:print:]]/, '') ) was proposed but didn’t seem to completely address the issue.

March 27, 2014 - 10:00:15 PM

A comment on the [still unresolved] bug triggered an email notification from GitHub earlier this morning. I had some time to look into it after work.

I went back to my original bug report and tried to create a minimal test case that would reproduce the bug. I opened up the Chrome Dev console and started pasting in chunks of the large string I was trying to parse.

Using a primitive form of git bisect , I tried the first half of the string to see if the error happened again. Nope. I halved the remaining part of the string. I repeated until I had it narrowed down to a few characters.

The string in question was “QNk8n”. Nothing jumps out as being extraordinary about that string.

I pasted it into an irb session and found the likely culprit:

irb ( main ): 001 : 0 > "QNk8n \U +FFE2 \U +FFA8"

Some weird unicode characters were being tacked onto the end!

Googling for “unicode 2028 javascript” led me to a really excellent blog post explaining that JSON is not a true subset of JavaScript.

The long and short of it: u+2028 and u+2029 are valid JSON but not valid JavaScript. My app was trying to parse the JSON representation of the RSS articles into JavaScript (via backbone.js) to be rendered.

I wrote a failing test and then fixed the bug (confession: my first bug fix passed the test but created another, whoops).

Pushed. Deployed. Did a little dance.

March 30, 2014 - 06:56:19 PM

I wanted to get this fix upstream. In addition to wanting to give back, I didn’t want to have to implement this “hack” in my own app.

Next up the chain was feedjira — the gem I was using to parse RSS feeds. Ultimately, this code probably belonged in loofah — an HTML sanitization gem used by feedjira , but that library seemed to be dormant.

After a brief discussion with maintainer Jon Allured, we both agreed to try to get the fixes into loofah . If we couldn’t, we would patch it in feedjira .

April 6, 2014 - 06:00:01 PM

Finally got around to opening an issue with loofah . I proposed that we add code to deal with the Evil JSON Characters as part of loofah ’s sanitization process.

Project maintainer Mike Dalessio said this fix would be well received and pointed me toward the relevant sections of the codebase.

April 12, 2014 - 06:04:22 PM

Deep dive into the loofah codebase to add a new “scrubber”!

The loofah architecture was interesting; the scrubbers are basically parsers that operate on nokogiri nodes. You can make a top-down or a bottom-up parser and you can control when you break out of the tree as you walk the nodes.

With Mike’s initial direction guiding me, I got a working implementation and opened a pull request.

April 21, 2014 - 06:20:36 PM

A friendly ping to Mike and my PR gets merged.

May 9, 2014 - 06:49:54 PM

loofah version 2.0.0 is released (which includes my fix) and pushed to RubyGems.

Now that the fix has been applied upstream, we now have to update gem versions downstream.

May 13, 2014 - 04:19:17 PM

I open a new PR in feedjira to update the loofah version.

The PR is merged and feedjira version 1.3.0 is released.

May 14, 2014 - 11:44:35 AM

I can bump the versions of feedjira and loofah used in Stringer and I can finally replace the patch with scrub!(:unprintable) .

Victory!

So five months later, my two line of code bug fix has made it all the way upstream and then back again! It may not seem like much, but this is the magic of open source.