Building Rome in a Day

The Colosseum, 2,106 images, 819,242 points, Full resolution video

Entering the search term Rome on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.

In this project, we consider the problem of reconstructing entire cities from images harvested from the web. Our aim is to build a parallel distributed system that downloads all the images associated with a city, say Rome, from Flickr.com. After downloading, it matches these images to find common points and uses this information to compute the three dimensional structure of the city and the pose of the cameras that captured these images. The goal is to do all of this in a day.

This poses new challenges for every stage of the 3D reconstruction pipeline, from image matching to large scale optimization. The key contributions of our work are a new, parallel distributed matching system that can match massive collections of images very quickly and new bundle adjustment software that can solve the extremely large non-linear least squares problems encountered in three dimensional reconstruction.
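For readers unfamiliar with the term, bundle adjustment is commonly posed as minimizing reprojection error. The formulation below is the standard textbook form, given as a sketch and not necessarily the exact objective used in this system. All symbols are notation introduced here for illustration: X_i is the i-th 3D point, C_j the parameters of the j-th camera, x_ij the observed image position of point i in camera j, \pi the projection function, and \mathcal{V} the set of (point, camera) pairs in which the point is visible.

```latex
\min_{\{C_j\},\,\{X_i\}} \;
  \sum_{(i,j) \in \mathcal{V}}
  \left\| x_{ij} - \pi(C_j, X_i) \right\|^2
```

Each term is non-linear in the unknowns because of the perspective projection, and the number of unknowns grows with both the number of cameras and the number of points, which is why the problem becomes very large at city scale.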

The project is a work in progress, and over the next few months we hope to have full scale results on data sets consisting of 1 million images and more. Shown below are some preliminary results of running our system on three city data sets downloaded from Flickr: Dubrovnik, Croatia; Rome and Venice, Italy. The static images were rendered from viewpoints chosen using the Canonical Views algorithm. Our current results are sparse point clouds. In collaboration with Yasutaka Furukawa, we are also working on producing dense mesh models.

This research is part of the Community Photo Collections project at the University of Washington GRAIL Lab, which explores the use of large scale internet image collections for furthering research in computer vision and graphics. Our work uses and builds upon a number of previous works, in particular Photo Tourism and Skeletal Sets.

Team

Papers

Building Rome in a Day

Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski

International Conference on Computer Vision, 2009, Kyoto, Japan.



Reconstructing Rome

Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Brian Curless, Steven M. Seitz and Richard Szeliski

IEEE Computer, pp. 40-47, June, 2010



Building Rome in a Day

Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz and Richard Szeliski

Communications of the ACM, Vol. 54, No. 10, Pages 105-112, October 2011.

with a Technical Perspective by Prof. Carlo Tomasi



Software

The structure from motion code underlying our system has been released as the Bundler toolkit. We plan to release other parts of our software as well; please check back here for periodic updates.

Press

University of Washington Press Release

National Geographic

Popular Science

Slashdot

Seattle Times

The Telegraph

The New York Times

Science Nation

US News

See Also

Rome

The data set consists of 150,000 images from Flickr.com associated with the tags "Rome" or "Roma". Matching and reconstruction took a total of 21 hours on a cluster with 496 compute cores. Upon matching, the images organized themselves into a number of groups corresponding to the major landmarks in the city of Rome. Amongst these clusters can be found the Colosseum, St. Peter's Basilica, Trevi Fountain and the Pantheon. One of the advantages of using community photo collections is the rich variety of view points that these photographs are taken from. A striking example of this is the reconstruction of the interior of St. Peter's Basilica shown below.

# Images   # Cores   Match Time   Reconstruction Time   Largest Component
150,000    496       13 Hours     8 Hours               2,106
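The grouping of images into landmarks can be pictured as finding connected components in a graph whose nodes are images and whose edges are successful pairwise matches. The following is a minimal sketch of that idea using a union-find structure. It is an illustration only, not the system's actual code; the function name connected_components and the toy match list are hypothetical.

```python
from collections import defaultdict


def connected_components(num_images, matches):
    """Group image indices into connected components of the match graph.

    `matches` is an iterable of (i, j) pairs, one per matched image pair.
    Returns a list of components, largest first.
    """
    parent = list(range(num_images))

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in matches:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(list)
    for image in range(num_images):
        groups[find(image)].append(image)
    return sorted(groups.values(), key=len, reverse=True)


# Toy example (made-up data): seven images, two landmarks, one unmatched photo.
toy_matches = [(0, 1), (1, 2), (2, 3), (4, 5)]
components = connected_components(7, toy_matches)
print([len(c) for c in components])  # [4, 2, 1]
```

In this picture, the "Largest Component" column corresponds to the size of the first group returned.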

Venice

The Venice data set is the largest image collection that we have experimented with so far. Matching on this data set took 27 hours, and the 3D reconstruction took 38 hours on 496 compute cores. The matching process gave rise to three major components: the Grand Canal, San Marco square and the Doge's Palace. The first two are illustrated with video fly-throughs below. San Marco square is also our largest reconstruction to date, with about 14,000 images and over 4.5 million 3D points.

# Images   # Cores   Match Time   Reconstruction Time   Largest Component
250,000    496       27 Hours     38 Hours              14,079

Dubrovnik

At the time of our experiments, there were only 58,000 images of Dubrovnik on Flickr. For this city we were able to experiment with the entire collection. Matching took only 5 hours on 352 compute cores. The largest and most interesting component corresponds to the old city. It is interesting that the reconstruction time for Dubrovnik is so much longer than that for Rome. The reason lies in how the data sets are structured. The Rome data set is essentially a collection of landmarks, which at large scale have a simple geometry and visibility structure. The largest connected component in Dubrovnik, on the other hand, captures the entire old city. With its narrow alleyways, complex visibility and widely varying viewpoints, it is a much more complicated reconstruction problem, and this is reflected in the time it took to solve it.

Also worth noting is that the reconstruction is not restricted to the city itself. As can be seen in the video below, it also contains the hills surrounding the city and part of Lokrum island, which is southeast of the city.

# Images   # Cores   Match Time   Reconstruction Time   Largest Component
57,845     352       5 Hours      17.5 Hours            4,585

The old city of Dubrovnik, 4,619 images, 3,485,717 points, Full resolution video

Acknowledgements

This work is supported by,