In April I wrote about a local data accuracy study performed by a company called Implied Intelligence, which offers data services to publishers and developers. That study found that yellow pages site Superpages bested Google Maps in terms of the accuracy and completeness of its local business/POI database.

The following was the final ranking of sites according to the April study:

Implied Intelligence crawled and hand-checked 1,000 independent local business websites in the US (no chains or franchises were included) and compared that data to the data contained on leading local search sites. Now Implied Intelligence has essentially replicated the study with an expanded group of local sites.

The company looked at 19 leading local search providers and directories, including Yelp, Google Maps, Bing Local, Citysearch, Foursquare, Mapquest, Yahoo Local and major yellow pages sites. The methodology used was identical to the previous test: 1,000 US local business websites were crawled and hand-checked to create a master data set against which to compare the data found on the various directory and search sites.

Implied Intelligence used a range of criteria in evaluating and scoring the sites:

Record coverage

Number of duplicates

Phone errors

Address errors

Coverage with regard to homepage URL

Accuracy with regard to homepage URL

Number of records with opening hours

Number of records with additional information
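
To make a few of these criteria concrete, here is a rough Python sketch of how record coverage, duplicates and phone errors might be computed against a master data set. This is not Implied Intelligence's actual method, which the report doesn't spell out. The function names, the field names (`business_id`, `phone`) and the assumption that each directory listing has already been matched to a master record are all hypothetical.

```python
import re
from collections import Counter


def normalize_phone(phone):
    """Keep digits only and drop a leading US country code, if present."""
    digits = re.sub(r"\D", "", phone or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def score_directory(master, listings):
    """Score one directory against the master data set.

    master:   dict mapping a hypothetical business_id to {"phone": ...}
    listings: list of dicts with "business_id" and "phone" keys, assumed
              to be already matched to master records (the hard part,
              which this sketch skips).
    """
    # Count how many listings the directory holds for each master record.
    counts = Counter(
        l["business_id"] for l in listings if l["business_id"] in master
    )
    matched = len(counts)

    # Every listing beyond the first for a business counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())

    # Check the phone number on the first listing found for each business.
    first = {}
    for l in listings:
        if l["business_id"] in master:
            first.setdefault(l["business_id"], l)
    phone_errors = sum(
        1
        for bid, l in first.items()
        if normalize_phone(l.get("phone")) != normalize_phone(master[bid]["phone"])
    )

    return {
        "coverage": matched / len(master) if master else 0.0,
        "duplicates": duplicates,
        "phone_error_rate": phone_errors / matched if matched else 0.0,
    }
```

Phone numbers are normalized before comparison so that formatting differences, such as "(555) 010-0003" versus "555-010-0003", are not counted as errors. The remaining criteria (address errors, homepage URL, opening hours) could follow the same pattern with their own normalization rules.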

The 1,000 local business websites from which the “master data” were obtained were compared to the data on the following expanded list of sites:

I won’t reproduce all the detailed findings and scoring, but I’ll summarize what Implied Intelligence determined.

A notable finding involved Bing’s local data. The company said that Bing’s data had improved the most between its previous test in April and this one. In fact, Bing Local was one of three sites that tied for the top average score. The other two were Superpages and Yellowbook, the latter of which wasn’t part of the original test.

Below are the final, averaged scores for all the sites involved.

In particular categories the results varied somewhat. For example, Yellowpages/YP.com and Google had the most complete data sets (greatest number of matches with the 1,000 sites). At the other end, Foursquare had the most incomplete data set. However, that may in part be because Foursquare is focused on a relatively narrow selection of local business categories (e.g., restaurants vs. plumbers).

Yellowbot and Merchant Circle had the highest level of duplicate listings, while DexKnows had the lowest percentage of duplicates. In this case low is better than high. Each category featured a different ranking and slightly different winners and losers.

Implied Intelligence CEO Marc Brombert provided the following comments in his report, which I have edited for length:

No single data supplier or aggregator offers full coverage. There are important problems in user experience in terms of record duplication, errors, and gaps in rich attributes and even the best performing sites still have substantial room for improvement, both in terms of data accuracy and data coverage.
