
How to Search the Invisible Web

by Wendy Boswell

September is in full swing, which means back to school, back to books, back to teacher's dirty looks - well, hopefully not that last part. School also means getting back to research papers, especially those super happy fun ten-pagers, complete with footnotes, on topics like the environmental impact of Chicago's wheat-exporting industry.

The Web has become a big part of most students' research processes; in fact, many people look on the Web for answers before checking any other reference. However, merely "Googling" an obscure topic, or one where you need targeted information with a particular focus, doesn't always turn up the best results. That's where the Invisible Web, or Deep Web, comes in.

What is the Deep Web?

The term "invisible web" or "deep web" refers to the vast repository of information that search engines and directories don't have direct access to, like databases at university libraries, sites that require passwords to view, or sites that for some reason don't want search engines to crawl them. Unlike pages on the visible Web (that is, the Web that you can access from search engines and directories), information in databases is generally inaccessible to the software spiders and crawlers that create search engine indexes.
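One common way a site tells crawlers to stay out is a robots.txt file. The article doesn't name that mechanism, so treat the following as an illustrative sketch only: the rules, the `ExampleBot` name, and the example.edu URLs are all made up for the demonstration. It uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would decide which pages it may index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary university site:
# every crawler is asked to stay out of the /catalog/ database pages.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def can_crawl(parser: RobotFileParser, url: str, bot_name: str = "ExampleBot") -> bool:
    """Return True if the given (hypothetical) crawler is allowed to fetch the URL."""
    return parser.can_fetch(bot_name, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
print(can_crawl(parser, "https://example.edu/about.html"))           # allowed page
print(can_crawl(parser, "https://example.edu/catalog/search?q=hog"))  # blocked page
```

Pages a crawler is told to skip never make it into the search engine's index, which is one reason they end up on the invisible side of the Web.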


If you didn't know already, the Web is big. Ginormous big. And while search engines and directories do a pretty good job of indexing a lot of that material, there is a huge chunk that they just cannot wrap their little software-y arms around, and this chunk is the Invisible, Deep, or Cloaked Web. It sounds pretty mysterious but it's really not.

Think of it this way: Google, considered by most people in the know to have the largest search database, has about eight billion pages in its index. Those eight billion pages seem like a lot until you consider that the Deep Web is estimated to be 500 times bigger than the searchable Web. Multiply 500 by the 8 billion in Google's index... plus add in the fact that Google is only indexing a fraction of the searchable Web (around 250 billion pages are on the Web today)... and you'll get a whole bunch of math that makes my head hurt. Suffice it to say that the Deep Web is worth looking into.
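For anyone whose head hurts less with a calculator, here is a minimal sketch of that arithmetic in Python. The figures are the estimates quoted above (an 8 billion page Google index, a 250 billion page searchable Web, and a Deep Web 500 times bigger); the constant and function names are just illustrative.

```python
# Estimates quoted in the article; none of these are measured values.
GOOGLE_INDEX_PAGES = 8_000_000_000        # about eight billion pages
SEARCHABLE_WEB_PAGES = 250_000_000_000    # around 250 billion pages
DEEP_WEB_MULTIPLIER = 500                 # "500 times bigger"


def deep_web_from_google_index() -> int:
    """500 times Google's index: the multiplication the article suggests."""
    return DEEP_WEB_MULTIPLIER * GOOGLE_INDEX_PAGES


def deep_web_from_searchable_web() -> int:
    """500 times the whole searchable Web, taking the estimate literally."""
    return DEEP_WEB_MULTIPLIER * SEARCHABLE_WEB_PAGES


def google_share_of_searchable_web() -> float:
    """Fraction of the searchable Web that Google's index covers."""
    return GOOGLE_INDEX_PAGES / SEARCHABLE_WEB_PAGES


print(f"{deep_web_from_google_index():,} pages")     # 4 trillion
print(f"{deep_web_from_searchable_web():,} pages")   # 125 trillion
print(f"{google_share_of_searchable_web():.1%} of the searchable Web")
```

Either way you read the estimate, the Deep Web comes out in the trillions of pages, while Google's index covers only a few percent of even the searchable part.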



So How Do You Get To The Deep Web?

There are a couple of ways you can get to the Deep Web. It's not that this information is necessarily closed off to users; it's just that it's a bit trickier to tap into. To demonstrate, I'll pick a topic: the life cycle of Botswana warthogs.


Search Engine Queries

You can use search engines, such as Google and Yahoo, to search the Invisible Web for database information, such as that from a college or university library. Think of these general search engines as the tool you'll use first to narrow down your search to Invisible Web databases. My query, "the life cycle of Botswana warthogs", returned way too many results. So I shortened it to "warthogs database" instead, and a Penn State database turned up in the results.

That's one of the tricks you can use to find Invisible Web content: just put the word "database" in your query and more often than not you'll come back lucky. This Penn State database has more searchable information regarding warthogs than I'll ever need; plus, it's an academic, accredited, footnote-able institution. Way more worthy of a citation than, say, Billy Bob's Guide To That Them There Hogs.
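The "add the word database" trick is simple enough to sketch in a few lines of Python. This is only an illustration: the function names are made up, and the Google search URL pattern is used here as an assumed, convenient way to turn a query into a link you can paste into a browser.

```python
from urllib.parse import urlencode

# Assumed base URL for building a clickable search link.
SEARCH_BASE_URL = "https://www.google.com/search"


def database_query(topic: str) -> str:
    """Turn a short topic into a database-focused query, e.g. 'warthogs database'."""
    return f"{topic.strip()} database"


def search_url(query: str) -> str:
    """Build a search URL for the query, encoding spaces and special characters."""
    return f"{SEARCH_BASE_URL}?{urlencode({'q': query})}"


for topic in ("warthogs", "aviation"):
    query = database_query(topic)
    print(query, "->", search_url(query))
```

Keeping the topic to a word or two matters as much as the extra keyword: a long, specific phrase tends to match individual pages, while a short topic plus "database" tends to surface the searchable collections themselves.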

Let's try another query: say you're doing an in-depth report on the past ten years of plane crashes in Argentina. Try a query for "plane crash Argentina" in Yahoo and you'll get mostly news items, which would take a long time to comb through. Let's try a different query, "aviation database", and then work our way down to the plane crashes.

The fifth site in the results is the winner: the NTSB Aviation Accident Database. It took a bit of work to get there, but with the depth of search and information that this particular database offers, it was worth it.


Invisible Web Gateways

Maybe you don't want to dink around with finding databases on the Web; you'd rather go straight to the databases themselves. There are sites that serve as Invisible Web "gateways" that will help you do this. Here are just a few:


The Invisible Web

Untapped Goldmine of Info

Once you start getting more comfortable with finding databases and searching for content, you'll wonder how you ever got along without knowing about the various poisonous plant databases out there, or perhaps you won't be able to get by without your daily dose of cheese.

Either way, once you understand that there's just so much out there, you'll be hooked.

More Resources for the Invisible Web


Wendy Boswell edits About.com's Web Search section.