Excerpted with permission and updated from the November 24 issue of MWJ, published by MacJournals.com. Copyright 2008, GCSF Incorporated. For more information on MWJ, visit www.macjournals.com.

The release of Safari 3.2 on November 13 displayed Apple’s penchant for cryptic release notes: the company describes all three versions as featuring “protection from fraudulent phishing Web sites.”

Let’s decode that for you: Safari 3.2 offers an entirely new anti-phishing feature, enabled by default in the “Security” pane of Safari’s preferences. When you try to visit a site that’s “known” to try to attack you, Safari stops and warns you about it with a pseudo-dialog box (in a Web page window or tab).

If the site is suspected of phishing for your personal or financial information, the text reads:

The website you are visiting has been reported as a “phishing” website. These websites are designed to trick you into disclosing personal or financial information, usually by creating a copy of a legitimate website, such as a bank.

On the other hand, if the page you’re visiting is a “known” distribution point for malware (viruses, trojan horses, other programs that can control your computer), the text reads:

The website you are visiting appears to contain malware. Malware is malicious software that may harm your computer or otherwise operate without your consent. Your computer can be infected just by browsing to a site with malware, without any further action on your part.

We are unaware of any current method for an attacker to actually download and run code on your system by visiting a Web page “without any further action on your part,” but that’s always the goal of attackers, so it seems sensible to err on the side of caution.

Nothing in Apple’s ridiculously minimal release notes suggested that this feature existed. But this time, the company’s intransigence in telling you what it has changed in the software you use may have further consequences. How Safari could “know” about these phishing and malware sites raises all kinds of interesting questions. Now we can tell you with reasonable confidence how it all works—but because Apple has not done the same thing, we cannot say with certainty that it is completely private, or that Safari is not sending information about the pages you visit to a third party.

We believe that Safari 3.2 is not doing that, but only Apple can say so—and you get one guess about whether the company has bothered to do so or not. Let’s begin with how these sites are “known.”

The basic problem

Phishing sites are designed to fool people into thinking they’re the real thing, so your computer doesn’t have much of a shot, on its own, of figuring out if a site that looks like eBay really is eBay. Remember, the people who designed the Internet (incorrectly) assumed that all computers on the network would be trustworthy, so the rules are pretty loose. There’s no rule that says an eBay Web page URL can’t start with a raw IP address. Even if eBay had a written policy that all of its URLs will be in an “ebay.com” domain name, your computer has no way of knowing that.

Google’s computers, however, have a better shot at deciphering such attacks. As the world’s leading search engine, Google has figured out where eBay is, and knows that a single IP address in China is probably not one of eBay’s servers. Google knows what banks, credit card providers, insurance companies, and other firms people try to find, and it therefore has a reasonable idea that if their images show up in a page in the wrong part of the world, it may be bogus. It also helps that Google has something like six umpteen-gazillion times the computing power of the entire Apollo space program. You may have eight cores, but Google is still slightly ahead of you.

About three years ago, Google Labs released the first test version of Google Safe Browsing for Firefox, an attempt to take advantage of some of this accumulated knowledge. The extension for the Firefox browser (which had then recently seen its 1.5 release) examined URLs as you visited them in Firefox, and warned you if any of them were on one of two lists that Google maintained: one for suspected phishing sites, and one for sites suspected of distributing malware.

Google and the Mozilla Foundation have long been partners, so it was no surprise when Firefox 2.0 included Google’s “Safe Browsing” technology directly in the browser. To no one’s surprise, Chrome includes it as well. We were surprised that Safari 3.2 includes the same technology, especially since Apple’s minuscule release notes did not mention the word “Google” once. But our investigation convinces us that Safari 3.2’s “protection from fraudulent phishing websites” is, in fact, Google’s Safe Browsing technology.

How it works

Even if Google has a list of malicious sites, your browser can’t check in with Google every time you visit a new page. On the technical side, it would be a drag on slow connections, and some decent percentage of the world still uses dial-up Internet access. On the personal side, it’s somewhat unconscionable to imagine that your Web browser would report every page you visit to a central authority, whether it’s one as well-known as Google or not.

The other alternative is for your browser to keep a list of malicious URLs itself, so it can compare each page you visit against the list. Such a list would need periodic updates from Google because phishers and other purveyors of malware often use compromised computers for their attacks—one computer may only be “good” for attackers for a few hours.

But that poses its own problems—if network administrators know about some of these malicious URLs, then transmitting them over the network could trigger firewall problems. On top of that, many of the URLs might have variants that also get you to the malicious page. (You can typically append a useless parameter onto any static page and get the same results; some phishers do that in their URLs to avoid simple detection.)
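To see why a plain list of URL strings is easy to sidestep, consider this minimal Python sketch. The blacklist entry, the variant URL, and the lookup_candidates helper are all hypothetical, and the normalization shown (checking the address both with and without its query string) is a simplification for illustration, not a description of what Safari or Google actually does.

```python
from urllib.parse import urlsplit


def lookup_candidates(url):
    """Return host/path strings to check, with and without the query string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    candidates = []
    if parts.query:
        candidates.append(f"{host}{path}?{parts.query}")
    candidates.append(f"{host}{path}")
    return candidates


# Hypothetical blacklist entry and a variant carrying a useless parameter.
blacklist = {"phish.example/login.html"}
variant = "http://phish.example/login.html?junk=12345"

# A literal string comparison misses the variant entirely.
exact_hit = variant in blacklist

# Checking the stripped-down forms of the address catches it.
normalized_hit = any(c in blacklist for c in lookup_candidates(variant))

print("exact match:", exact_hit)
print("normalized match:", normalized_hit)
```

The point of the sketch is only that a client has to test several forms of each address, which is one more reason a raw list of URLs is awkward to ship around.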

Google solves this problem by sending the browser a list of hashed (encoded) URLs known to be phishing sites or distributors of malware. When you first launch Safari 3.2, it connects to safebrowsing.clients.google.com and requests information on the two main blacklists that Google maintains: a list of known phishing sites, and a list of known malware sites. Google returns the list of hashed URLs to your computer in chunks, starting with the freshest information and gradually filling in older information. The updates are in a compact format that avoids resending hashes your browser already has, and the hashes are just prefixes of longer numbers to avoid sending huge amounts of data over the wire.

Hash prefixes aren’t full hashes, so a prefix match alone doesn’t prove that a page is on the list. If you’re visiting a page whose URL matches a hash prefix, Safari 3.2 goes back to Google and asks for the full hash for the prefix in question. Google responds, and if the full hash matches the hash for the URL in question, Safari knows that this page is on Google’s list of malicious sites.
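The two-step check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Safari’s code: we assume SHA-256 hashes and 4-byte (32-bit) prefixes, the FakeServer class is a hypothetical stand-in for Google’s service, and the example addresses are invented. The real protocol also canonicalizes URLs before hashing them, which we skip here.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumption: 32-bit prefixes of a SHA-256 hash


def full_hash(expr):
    """Hash an address string (assumption: SHA-256 over UTF-8 bytes)."""
    return hashlib.sha256(expr.encode("utf-8")).digest()


class FakeServer:
    """Hypothetical stand-in for the remote service; not a real API."""

    def __init__(self, listed):
        self._hashes = [full_hash(e) for e in listed]
        self.requests = 0  # counts full-hash requests received

    def prefixes(self):
        """What the browser downloads and stores: prefixes only."""
        return {h[:PREFIX_LEN] for h in self._hashes}

    def full_hashes_for(self, prefix):
        """What the browser asks for after a local prefix match."""
        self.requests += 1
        return [h for h in self._hashes if h[:PREFIX_LEN] == prefix]


def is_listed(expr, local_prefixes, server):
    h = full_hash(expr)
    prefix = h[:PREFIX_LEN]
    if prefix not in local_prefixes:
        return False  # no local match, so nothing is sent over the network
    # Prefix matched: confirm against the full hashes from the server.
    return h in server.full_hashes_for(prefix)


server = FakeServer(["phish.example/login.html"])
local_prefixes = server.prefixes()

print(is_listed("phish.example/login.html", local_prefixes, server))
print(is_listed("www.example.com/", local_prefixes, server))
print("full-hash requests sent:", server.requests)
```

Note that in this sketch the server hears from the browser only when a local prefix matches; pages that match nothing generate no request at all.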

Safari stores all of the data from Google using folder names that are difficult to decipher. If you look in the folder at /private/var/folders/ , you’ll see one or more folders with two-letter names. One of those that’s not named “zz” will contain another folder with a much longer random name. That folder contains a folder named “-Caches-”, and inside it, you’ll find a folder named “com.apple.Safari”. The full path is /private/var/folders/xx/yy/-Caches-/com.apple.Safari , where “xx” and “yy” are unique to your system.

Once you find that folder, you’ll see two files within it: Cache.db and SafeBrowsing.db. The former is indeed Safari’s cache. The latter file contains the blacklists from Google’s Safe Browsing initiative—you’ll notice that the file was most likely created right about the time you first launched Safari 3.2, and if you have the browser open, the file should have been modified within the past 30 minutes. (The November 24 issue of MWJ contains information on how you can look inside Safari’s safe browsing database and what the information you’ll find in it means.) Google’s rules do not allow clients like Safari or Firefox to warn you that a page may be malicious, even if its URL is on the list, unless the list has been updated within the past 30 minutes.
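If you would rather not hunt through those folders by hand, a short Python sketch can do the searching and also apply the 30-minute test. The function names are our own invention, the wildcard pattern simply mirrors the path layout described above, and the script only reads file modification times; it does not open the database.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 60  # the 30-minute freshness window described above


def find_safe_browsing_dbs(root="/private/var/folders"):
    """Look for SafeBrowsing.db under root/xx/yy/-Caches-/com.apple.Safari."""
    root = Path(root)
    if not root.is_dir():
        return []  # not a Mac, or the folder layout is different
    pattern = "*/*/-Caches-/com.apple.Safari/SafeBrowsing.db"
    return sorted(root.glob(pattern))


def is_fresh(db_path, now=None):
    """True if the file was modified within the last 30 minutes."""
    now = time.time() if now is None else now
    return (now - db_path.stat().st_mtime) <= MAX_AGE_SECONDS


for db in find_safe_browsing_dbs():
    print(db, "fresh" if is_fresh(db) else "stale")
```

On a system with no matching folders, the script prints nothing.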

We ran a network spy on Safari 3.2 while visiting suspected phishing sites, and we saw it sending information to safebrowsing.clients.google.com just before displaying the “Suspected Phishing Site” warning. When a site’s URL matches the first part of a hash on the list, Safari 3.2 apparently asks Google for the full 256-bit hash value for that URL alone and then compares the two, the same method that Safe Browsing uses in Firefox.

Why we’re telling you this

Networking must be transparent to earn your confidence. That’s the point behind EV SSL—verified authentication that the company you’re talking to is who they say they are. On a lower level, when you visit a page in your Web browser, you expect your computer to get the IP address from your DNS server, get the page from the Web server, and get all of the assets for the page (images, movies, and so on) from their locations as specified in the page’s source. You do not expect your browser to also tell Google, or even Apple itself, what you’re doing.

Safari is not the first browser with this feature, though. Although Safe Browsing is built into Firefox 2 and later, it was a Firefox extension before that, and later part of the Google Toolbar for Firefox as well. The Google Toolbar Privacy Notice says that, in some circumstances, Safe Browsing collects additional information from users. We know that Firefox itself does not, because the Mozilla Foundation spells out exactly what the browser’s “Phishing and Malware Protection” sends to Google, with links to both the Mozilla and Google privacy policies.

The Apple Customer Privacy Policy says nothing about Safari sending any information to places other than the Web sites you’re visiting—but as of Safari 3.2, it does exactly that: it fetches lots of information from Google, and sends (non-identifiable) requests back to Google when you encounter a page whose URL is on one of Google’s blacklists. (Update: A Macworld reader has found a reference to Google in the Safari licensing agreement. See the MacJournals response in our comment thread for more information.)

We must point out here that this system provides, indirectly, a way for Google to estimate what pages you’re visiting. If the URL of a page you want to visit matches the hash prefix of a known malicious page, Safari 3.2 appears to send that prefix to Google and ask for the entire 256-bit hash to make sure that this really is a malicious page (and also to verify that the page hasn’t been removed from Google’s lists since Safari’s last list update). Millions and millions of URLs could produce hashes that start with the same 32 bits, but if Google gets several requests for the same value, the company could reasonably infer that people were visiting the malicious page it had tracked. And since the request from Safari to Google comes from your IP address, Google might infer data from that as well. Mozilla’s privacy policy would forbid use of that data except to improve the service, but Apple’s privacy policy does not. Neither Apple nor Google states anywhere that it would only use such data to improve the phishing and malware protection features.
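The arithmetic behind a 32-bit prefix is easy to check for yourself. In the Python sketch below, the total number of URLs is a purely hypothetical figure chosen for illustration; the set of possible URLs is unbounded, so the number sharing any one prefix grows with whatever total you assume.

```python
PREFIX_BITS = 32

# Number of distinct values a 32-bit prefix can take.
possible_prefixes = 2 ** PREFIX_BITS


def expected_urls_per_prefix(total_urls):
    """Average number of URLs per prefix if hashes spread evenly."""
    return total_urls / possible_prefixes


# Hypothetical count of distinct URLs, for illustration only.
hypothetical_url_count = 10 ** 12

print("possible prefixes:", possible_prefixes)
print("average URLs per prefix:",
      round(expected_urls_per_prefix(hypothetical_url_count)))
```

A prefix on its own, then, does not identify a page. The inference described above comes from the fact that Google knows which malicious page caused each prefix to be on the list in the first place.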

We also note that the Safe Browsing v2.1 protocol mentions a third list beyond the list of phishing sites and the list of malware sites: a whitelist, “representing sites that are known to be trusted.” The spec continues, “Note that this list should only be used for ‘enhanced mode’ clients that do direct lookups to Google to determine which sites are phishy. In that case, if a site is on the whitelist there is no need to send the query to Google.”
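The logic the spec describes for such a client is simple enough to sketch in Python. Everything here is an assumption made for illustration: the function name is ours, the whitelist entry is invented, and the spec text quoted above does not say how whitelist entries are matched, so we assume a plain comparison of host names.

```python
from urllib.parse import urlsplit


def needs_remote_lookup(url, whitelist_hosts):
    """In a hypothetical 'enhanced mode', decide whether to query the server.

    Assumption: whitelist entries are bare host names matched exactly.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host not in whitelist_hosts


# Hypothetical whitelist of trusted hosts.
whitelist_hosts = {"www.example.com"}

for url in ("http://www.example.com/account",
            "http://unknown.example/login.html"):
    action = "query server" if needs_remote_lookup(url, whitelist_hosts) else "skip"
    print(url, "->", action)
```

In this sketch, every page not on the whitelist produces a query, which is exactly why an enhanced mode would reveal far more about your browsing than the prefix-list approach.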

Safari 3.2’s “SafeBrowsing.db” file does not appear to contain data for Google’s whitelist, but the specification confirms that some clients can, with Google’s permission, use an “enhanced mode” that looks up each page you visit rather than maintaining the list on the client computer. This would be a serious change for Safari. If it were implemented, users would need to be told about it and how it worked, so they could make informed and intelligent decisions about whether to use this feature or not.

The compulsion to hide

And yet, we cannot conclusively tell you that it’s not implemented today, because Apple refuses to document its changes. This time, it should come back to haunt Apple. Even phrased in terms as friendly to Apple as we can manage, the fact remains that after installing Safari 3.2, your computer is by default downloading lots of information from Google and sending information related to sites you visit back to Google, all without telling you, without Apple disclosing the methods, and without any privacy statement from Apple.