by Isoroku Yamamoto

Update: A newer version of the Chrome extension is available here.

The Wall Street Journal has closed the “paste a headline into Google News” paywall loophole. However, Google can still index the content.

Digital publications give search engines preferential access by inspecting HTTP request headers. The two relevant headers are Referer and User-Agent.

Referer identifies the address of the web page that linked to the resource. Previously, when you clicked a link in Google search results, the Referer would say https://www.google.com/, and that alone got you past the paywall. This is no longer enough.

More recently, websites started checking User-Agent, a string that identifies the browser or app that made the request. The Wall Street Journal wants to know that you not only came from Google, but also that you are an agent of Google.
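To make the two checks concrete, here is a rough sketch of what a server-side test might look like. This is a guess for illustration only: the function name looksLikeGoogle and the exact matching rules are my own assumptions, not any publisher's actual code.

```javascript
// Hypothetical server-side check. The name and the matching rules are
// assumptions for illustration, not any publisher's real implementation.
function looksLikeGoogle(headers) {
  // HTTP header names are case-insensitive, so normalize them first.
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value);
  }
  const referer = normalized["referer"] || "";
  const userAgent = normalized["user-agent"] || "";

  const cameFromGoogle = referer.startsWith("https://www.google.com/");
  const claimsToBeGooglebot = userAgent.includes("Googlebot");

  // Both values are supplied by the client, so both can be forged.
  return cameFromGoogle && claimsToBeGooglebot;
}

const crawlerLike = looksLikeGoogle({
  "Referer": "https://www.google.com/",
  "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
});
const refererOnly = looksLikeGoogle({ "Referer": "https://www.google.com/" });
console.log(crawlerLike, refererOnly);
```

A request carrying only the Google Referer fails this kind of check, which matches the behavior described above: the Referer alone is no longer enough.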

By providing this information in request headers, anyone can appear to be a Google web crawler. In fact, I will show you how to make a Chrome extension that does just that.

1. Create a file called manifest.json. Paste the following into the file. Add any sites you would like to read to the permissions list.

{
  "name": "Innocuous Chrome Extension",
  "version": "0.1",
  "description": "This is an innocuous chrome extension.",
  "permissions": [
    "webRequest",
    "webRequestBlocking",
    "http://www.ft.com/*",
    "http://www.wsj.com/*",
    "https://www.wsj.com/*",
    "http://www.economist.com/*",
    "http://www.nytimes.com/*",
    "https://hbr.org/*",
    "http://www.newyorker.com/*",
    "http://www.forbes.com/*",
    "http://online.barrons.com/*",
    "http://www.barrons.com/*",
    "http://www.investingdaily.com/*",
    "http://realmoney.thestreet.com/*",
    "http://www.washingtonpost.com/*"
  ],
  "background": {
    "scripts": ["background.js"]
  },
  "manifest_version": 2
}
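If you add your own sites, note that the scheme is part of each pattern, which is why wsj.com appears twice in the list. The sketch below is a deliberately simplified matcher that I wrote for illustration; matchesPattern is a hypothetical helper, and Chrome's real match-pattern rules cover more cases than this.

```javascript
// Simplified, hypothetical matcher: treats "*" as "any run of characters"
// and everything else literally. Chrome's real match patterns have more rules.
function matchesPattern(pattern, url) {
  // Escape regex metacharacters except "*", then expand "*" into ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$").test(url);
}

const httpPattern = "http://www.wsj.com/*";
const httpsPattern = "https://www.wsj.com/*";
const articleUrl = "https://www.wsj.com/articles/example";

// The http pattern does not cover an https URL; the https pattern does.
console.log(matchesPattern(httpPattern, articleUrl));
console.log(matchesPattern(httpsPattern, articleUrl));
```

So if a site you add has moved to https, list the https pattern as well as (or instead of) the http one.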

2. Create a file called background.js. Paste the following into the file:

// Sites whose cookies are passed through; cookies are stripped everywhere else.
var ALLOW_COOKIES = ["nytimes", "ft.com"];
var GOOGLE_REFERER = "https://www.google.com/";
var GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

function changeRefer(details) {
  var foundReferer = false;
  var foundUA = false;

  var reqHeaders = details.requestHeaders.filter(function(header) {
    // block cookies by default
    if (header.name.toLowerCase() !== "cookie") {
      return true;
    }
    // keep the Cookie header only for allow-listed sites
    return ALLOW_COOKIES.some(function(site) {
      return details.url.includes(site);
    });
  }).map(function(header) {
    // header names are case-insensitive, so compare in lowercase
    var name = header.name.toLowerCase();
    if (name === "referer") {
      header.value = GOOGLE_REFERER;
      foundReferer = true;
    }
    if (name === "user-agent") {
      header.value = GOOGLEBOT_UA;
      foundUA = true;
    }
    return header;
  });

  // append the headers if the request did not already carry them
  if (!foundReferer) {
    reqHeaders.push({ "name": "Referer", "value": GOOGLE_REFERER });
  }
  if (!foundUA) {
    reqHeaders.push({ "name": "User-Agent", "value": GOOGLEBOT_UA });
  }
  console.log(reqHeaders);
  return {requestHeaders: reqHeaders};
}

function blockCookies(details) {
  // drop every Set-Cookie header from the response
  var resHeaders = details.responseHeaders.filter(function(header) {
    return header.name.toLowerCase() !== "set-cookie";
  });
  return {responseHeaders: resHeaders};
}

chrome.webRequest.onBeforeSendHeaders.addListener(changeRefer, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["requestHeaders", "blocking"]);

chrome.webRequest.onHeadersReceived.addListener(blockCookies, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["responseHeaders", "blocking"]);
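If you want to see what the Referer and User-Agent substitution does without loading anything into Chrome, the core of it can be run on its own, for example in Node. The helper below, spoofHeaders, is a standalone illustration I made up for this purpose and is not part of the extension; it works on the same array of name/value objects that the webRequest listeners receive.

```javascript
// Illustrative helper, not part of the extension: applies the
// Referer/User-Agent substitution to a webRequest-style header list.
const GOOGLE_REFERER = "https://www.google.com/";
const GOOGLEBOT_UA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

function spoofHeaders(headers) {
  // Lowercased header name -> forced value.
  const overrides = new Map([
    ["referer", GOOGLE_REFERER],
    ["user-agent", GOOGLEBOT_UA],
  ]);
  const seen = new Set();

  // Overwrite the headers that are already present, without mutating the input.
  const result = headers.map(function (header) {
    const key = header.name.toLowerCase();
    if (overrides.has(key)) {
      seen.add(key);
      return { name: header.name, value: overrides.get(key) };
    }
    return header;
  });

  // Append whichever of the two headers the request did not carry.
  if (!seen.has("referer")) {
    result.push({ name: "Referer", value: GOOGLE_REFERER });
  }
  if (!seen.has("user-agent")) {
    result.push({ name: "User-Agent", value: GOOGLEBOT_UA });
  }
  return result;
}

const spoofed = spoofHeaders([
  { name: "Accept", value: "text/html" },
  { name: "user-agent", value: "Mozilla/5.0 (X11; Linux x86_64)" },
]);
console.log(spoofed);
```

In this example the existing user-agent entry is overwritten in place and a Referer entry is appended, so the request ends up with three headers.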

Save both files in one directory. These should be the only files in the directory. If you were too lazy to copy and paste, you can download the source code here.

Now type chrome://extensions/ in the browser address bar.

Click Load unpacked extension... (Make sure Developer Mode is checked in the upper right if you do not see the buttons.)

Select the directory where you saved the two files. Enable the Chrome extension and visit wsj.com.

Remember: Any time you introduce an access point for a trusted third party, you inevitably end up allowing access to anybody.
