Press Release

Cornell maps the world’s photos

FOR RELEASE: April 23, 2009

ITHACA, NY – Cornell University computer scientists used a supercomputer at the Cornell Center for Advanced Computing to download and analyze nearly 35 million Flickr photos taken by over 300,000 photographers from around the globe. Their main goal was to develop new methods to automatically organize and label large-scale collections of digital data. A secondary result of the research was the generation of statistics on the world's most photographed cities and landmarks, gleaned from the analysis of the multi-terabyte photo collection:

• The top 25 most photographed cities in the Flickr data are: (1) New York City (2) London (3) San Francisco (4) Paris (5) Los Angeles (6) Chicago (7) Washington, DC (8) Seattle (9) Rome (10) Amsterdam (11) Boston (12) Barcelona (13) San Diego (14) Berlin (15) Las Vegas (16) Florence (17) Toronto (18) Milan (19) Vancouver (20) Madrid (21) Venice (22) Philadelphia (23) Austin (24) Dublin (25) Portland.

• The top seven most photographed landmarks are: (1) Eiffel Tower - Paris (2) Trafalgar Square - London (3) Tate Modern museum - London (4) Big Ben - London (5) Notre Dame - Paris (6) The Eye - London (7) Empire State Building - New York City.

The study also identified the seven most photographed landmarks in each of the top 25 cities. Most of these landmarks are well-known tourist attractions, but some surprising results emerged. For example, one striking result in the Flickr data is that the Apple Store in midtown Manhattan is the 5th-most photographed place in New York City – and, in fact, the 28th-most photographed place in the world.

Cornell developed techniques to automatically identify places that people find interesting to photograph, showing results for thousands of locations at both city and landmark scales. "We developed classification methods for characterizing these locations from visual, textual and temporal features," says Daniel Huttenlocher, the John P. and Rilla Neafsey Professor of Computing, Information Science and Business and Stephen H. Weiss Fellow. "These methods reveal that both visual and temporal features improve the ability to estimate the location of a photo compared to using just textual tags."

Cornell's technique of finding representative images is a practical way of summarizing large collections of images. The scalability of the method allows for automatically mining the information latent in very large sets of images, raising the intriguing possibility of an online travel guidebook that could automatically identify the best sites to visit on your next vacation, as judged by the collective wisdom of the world's photographers.

To perform the data analysis, the researchers used a mean shift procedure and ran their application on the "Hadoop Cluster," a 480-core, Linux-based Dell PowerEdge 2950 supercomputer at the Cornell Center for Advanced Computing (CAC). Hadoop is a framework for running applications on large clusters of computers. It uses a computational paradigm called Map/Reduce to divide applications into small segments of work, each of which can be executed on any node of the cluster. "As the creation of digital data accelerates," says CAC Director David Lifka, "supercomputers and high-performance storage systems will be essential in order to quickly store, archive, preserve, and retrieve large-scale data collections."
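
For readers curious about what a mean shift procedure does, the following is a minimal illustrative sketch, not the researchers' code. It assumes photo locations are simple (x, y) pairs treated as points on a flat plane, uses a flat (uniform) kernel, and the function names `mean_shift_mode` and `find_hotspots` and the `radius` parameter are hypothetical, chosen only for this example. Each starting point is repeatedly moved to the average position of its neighbors until it settles on a local peak of photo density; peaks that attract many points would correspond to heavily photographed places.

```python
import math


def mean_shift_mode(points, start, radius, max_iter=100, tol=1e-6):
    """Shift `start` toward a local density peak of `points` (flat kernel)."""
    x, y = start
    for _ in range(max_iter):
        # Collect every point within `radius` of the current estimate.
        nearby = [(px, py) for px, py in points
                  if math.hypot(px - x, py - y) <= radius]
        if not nearby:
            break
        # Move the estimate to the centroid of those neighbors.
        nx = sum(p[0] for p in nearby) / len(nearby)
        ny = sum(p[1] for p in nearby) / len(nearby)
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return (x, y)


def find_hotspots(points, radius):
    """Run mean shift from every point and merge peaks that coincide.

    Returns a list of [peak_xy, point_count], most popular peak first.
    """
    modes = []
    for p in points:
        m = mean_shift_mode(points, p, radius)
        for entry in modes:
            # Assumption: peaks closer than radius / 2 are the same hotspot.
            if math.hypot(entry[0][0] - m[0], entry[0][1] - m[1]) < radius / 2:
                entry[1] += 1
                break
        else:
            modes.append([m, 1])
    return sorted(modes, key=lambda e: e[1], reverse=True)


if __name__ == "__main__":
    # Made-up coordinates: two small groups of photo locations.
    photos = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    for peak, count in find_hotspots(photos, radius=1.0):
        print(peak, count)
```

The `radius` value sets the scale of the hotspots found. In this sketch, a large radius would group points into city-sized clusters and a small one into landmark-sized clusters. An analysis on real geotags would also need to account for the curvature of the Earth, which this example ignores.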

The results of this research were presented in April 2009 at the 18th International World Wide Web Conference in Madrid. Details are available in the paper entitled "Mapping the World's Photos" by Cornell Computer Science researchers David Crandall, Lars Backstrom, Daniel Huttenlocher, and Jon Kleinberg. Visualizations from the project show how planetary-scale datasets can provide insight into different kinds of human activity – in this case those based on images; on locales, landmarks, and focal points scattered throughout the world; and on the ways in which people are drawn to them.

This research was supported in part by the National Science Foundation (NSF) and by funding from Google, Yahoo! and the John D. and Catherine T. MacArthur Foundation. The Cornell Center for Advanced Computing is supported by Cornell University, the NSF, DOD, USDA, and members of its corporate program.