In a restricted intelligence document distributed to police, public safety, and security organizations in July, the Department of Homeland Security warned of a “malicious activity” that could expose secrets and security vulnerabilities in organizations’ information systems. The name of that activity: “Google dorking.”

“Malicious cyber actors are using advanced search techniques, referred to as ‘Google dorking,’ to locate information that organizations may not have intended to be discoverable by the public or to find website vulnerabilities for use in subsequent cyber attacks,” the for-official-use-only Roll Call Release warned. “By searching for specific file types and keywords, malicious cyber actors can locate information such as usernames and passwords, e-mail lists, sensitive documents, bank account details, and website vulnerabilities.”

That’s right, if you’re using advanced operators for search on Google, such as “site:arstechnica.com” or “filetype:xls,” you’re behaving like a “malicious cyber actor.” Some organizations will react to you accessing information they thought was hidden as if you were a cybercriminal, as reporters at Scripps found out last year. Those individuals were accused of “hacking” the website of free cellphone provider TerraCom after discovering sensitive customer data openly accessible from the Internet via a Google search and an “automated” hacking tool: GNU’s Wget.

But this warning from the DHS and the FBI was mostly intended to give law enforcement and other organizations a sense of urgency to take a hard look at their own websites’ security. Local police departments have increasingly become the target of “hacktivists.” Recent examples include attacks on the Albuquerque Police Department’s network in March following the shooting of a homeless man and attacks on St. Louis County police networks in response to the recent events in Ferguson, Missouri.

Bad queries

It’s true that Google hacking, or “dorking,” has been used by hackers and penetration testers for years. Just as the National Security Agency can use its XKeyscore surveillance data as a targeting system for more intrusive attacks on intelligence targets, hackers can use Google to find and target vulnerable sites, including ones where the work of hacking has already been done for them. A single query based on the signature of a common PHP-based “shell” malware, which can be used as a backdoor to access the operating system of affected websites, turns up a list of two dozen sites that have been hacked with the backdoor left open, most of them in Russia and Romania.

David Helkowski, the consultant who hacked the University of Maryland’s website and gained access to personal data in a university database, told Ars that he used Google advanced search to discover pages within UMD sites that allowed arbitrary Web executable files to be uploaded to them. Google searches allowed him to discover exploits that pre-existed on the site.

The DHS and the FBI called out two “dorking” incidents in particular to underscore the dire threat posed by not properly configuring robots.txt on websites. One of those was the October 2013 breach of more than 35,000 websites running vulnerable versions of the vBulletin Web bulletin board. The report says that a “dorking” query was used by hackers to identify websites that were still using an unpatched version of the software. The hackers could then attack them with open source exploit tools. Google was also allegedly used by attackers to target a vulnerable FTP server at Yale in 2011, exposing the Social Security numbers of 43,000 people.

There’s also a penetration testing tool called Diggity Project that can build automated queries against Google or Bing to locate files containing passwords, remote administration interfaces, and other vulnerabilities in Web-accessible computer systems. Diggity was called out specifically in the DHS/FBI intelligence report: “It contains both offensive and defensive tools and over 1,600 pre-made dork queries that leverage advanced search operators.”

Only you can prevent dorking

The Diggity Project is intended as a tool to help organizations secure their websites by finding the holes exposed by Google queries before someone with ill intent does. There’s also a vast database of tried-and-true Google queries in the Google Hacking Database hosted within Offensive Security’s Exploit Database site (though access to the site, ironically, may be blocked by application firewalls used by Federal agencies because it contains keywords associated with Web malware).
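As a rough illustration of that defensive use, the sketch below builds operator-based queries scoped to a single domain so that a site owner can check what a search engine has already indexed. It is not how Diggity works internally. The function names, the list of file types, and the `example.org` domain are all assumptions made for the example; only the `site:` and `filetype:` operators come from the article.

```python
from urllib.parse import quote_plus

# Hypothetical starter list of file types worth checking; adjust for your own site.
SENSITIVE_FILETYPES = ["xls", "sql", "log", "bak"]


def build_audit_queries(domain, filetypes=SENSITIVE_FILETYPES):
    """Return one search string per file type, each restricted to `domain`."""
    return [f"site:{domain} filetype:{ext}" for ext in filetypes]


def search_url(query):
    """URL-encode a query so it can be opened in a browser by hand."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # example.org is a placeholder; point this only at domains you administer.
    for query in build_audit_queries("example.org"):
        print(query)
        print("  " + search_url(query))
```

The script only prints queries and links for manual review; it does not send any requests itself.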

These tools expose what Google has already indexed. The best defense is to not have sensitive content indexed in the first place (or, if possible, to not have it on servers that face the public Internet to begin with, but let’s not get too far ahead of ourselves). The DHS and FBI recommended that organizations use Google’s Webmaster tools to remove content that shouldn’t have been indexed from Google’s cache; they also suggested the liberal application of robots.txt files to tell Google and Bing not to spider down particular directory paths.
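To make the robots.txt advice concrete, here is a minimal sketch that uses Python’s standard `urllib.robotparser` module to check which URLs a given set of rules would ask crawlers to skip. The rules, directory names, and URLs are invented for illustration and do not come from the DHS/FBI document.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
Disallow: /backups/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given rules tell `agent` not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.org/index.html",
        "https://example.org/internal/payroll.xls",
        "https://example.org/backups/db.sql",
    ]
    for url in blocked_urls(ROBOTS_TXT, candidates):
        print("disallowed:", url)
```

A check like this only confirms what the file requests of well-behaved crawlers. The file itself is publicly readable and does nothing to stop a direct request for a listed path.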

To seasoned Web hands, all of this sounds glaringly obvious. But considering the nature of the websites operated by many state, local, and regional agencies—and much of the Federal government for that matter—it’s worth stating the obvious. The same is true for thousands of private websites on the Internet operated by businesses and individuals. The sites may not seem important enough in themselves to secure, but they may inadvertently be connected to sensitive customer or employee information.