Illustration of the 5-point problem with two views taken from reference [10] (see section "References")

I have been investing some time in reading about the relative pose problem during the last weeks. My aim is to develop a monocular SLAM system for mobile devices, as part of my Perceptive Portable Device project and, among other things, finding the relative pose between camera views is something that the system must address. The problem goes like this:

We have two images of the same 3D scene from two different viewpoints. There is a set of scene points that appears in both images and we know which point in image 1 corresponds to which point in image 2. How can a we find out how much the camera was rotated and translated from viewpoint 1 to viewpoint 2?

Whereas this problem might look pretty modern in the first instance —a product of the digital age, the era of 3D graphics and augmented reality—, it turns out to be century-old. As I found out going through the literature, the first contribution related to the topic dates back to 1908.

I like chronologies. They help me remember things. So, the first goal of this post is to serve as a reminder of the chronological order of contributions (with references) that lead to the current state of the art of the relative pose problem. Please, comment below if you see something missing, or any mistakes or inaccuracies. Its second purpose is to express my gratitude to all the scientists that worked on the topic throughout the years, whose papers helped me understanding it. Forgive me for the cliché but, yes, we are standing on the shoulders of giants.

The chronology

1908 : Von Sanden, in his PhD thesis about photogrammetry [1], uses a matrix that is analogous to the essential matrix used today in most solutions to the problem.

1913 : Kruppa [2] proves that the minimum number of points to solve the problem is 5, which leads to 11 solutions at most. Whereas he proposes a method, it involves finding the intersection between two sextic curves, which is not easily realizable.

1981 : Longuet-Higgins [3] introduces the concept of essential matrix to the computer vision community and presents a method to linearly solve the problem with 8 points that gives a single solution. The essential matrix assumes a calibrated camera with known intrinsic parameters.

1990 : Faugeras and Maybank [4] show that the 5-point problem actually has a maximum of 10 solutions, instead of 11, according to Kruppa. This work relies on the essential matrix and does not mention the fundamental matrix yet.

1994 : Luong and Faugeras [5] introduce the concept of fundamental matrix that, unlike the essential matrix, includes intrinsic parameters of the camera. Finding this matrix means, not only finding the relative pose between the viewpoints, but also the focal length and image center of the camera for each viewpoint (distortions not taken into account yet). They also study how to estimate the fundamental matrix and they analyze and compare different methods.

1996 : J. Philip [6] presents a non-iterative method to determine all essential matrices for the 5-point relative pose problem and a simpler linear 6-point method that yields a single solution. Actually, I was not able to access the full text, so this is has been taken from the abstract. However, [11] mentions that this method yields up to 6 solutions. This mismatch is still not clear.

2003 : Pizarro, Eustice and Singh [7] present an improved 6-point method that, unlike the linear one, does not fail on planar scenes. This method is still simpler than the solution to the 5-point problem known to that date.

2004 : Hartley and Zisserman, in their book "Multiple View Geometry in Computer Vision" [12], introduce an algorithm to solve the problem with 7 points for uncalibrated cameras (with the fundamental matrix) that gives up to 3 solutions.

2003 : Nistér [8] publishes an efficient algorithm that solves the problem with a minimal set of 5 points —reaching the limit theorized by Kruppa— and uses it as hypothesis generator within a preemptive RANSAC scheme to removes outliers in more dense point sets.

2004 : An extended version of Nistér's work is published [9], in which comparative test results between the 5-point algorithm and other methods to date (6, 7 and 8 points and 7-point with RANSAC) is provided. The tests include both synthetic and real scenes and the results prove that Nistér's approach has superior overall performance, despite the 8-point algorithm is better for forward motion, as he admits.

2006 : Stewenius, Engels and Nistér [10] introduce an improved version of the 5-point algorithm. The improvement comes from the use of a Gröbner basis. It makes the algorithm clearer and numerically more stable. They publish a Matlab implementation.

2011 : Scaramuzza [11] presents a solution to the problem for wheeled vehicles (Ackermann or differential) by using only 1 point. This is achieved by assuming planar motion and exploiting the non-holonomic constraints of these vehicles. Outliers are removed by using RANSAC. Despite that the inlier count provided by this method is only comparable to the 5-point algorithm for planar paths and reduced angle between the two views (< 10 degrees), it greatly reduces the number of RANSAC iterations in comparison to methods with more points, which makes it computationally cheap.

I am still researching, so new milestones may be added to this list eventually. Do you know about an important one that should be also listed? Is there some publication on this topic that should be highlighted? If so, please leave a comment below.

References