I recently started working with the Kinect V2 on Linux with pylibfreenect2.

When I was first able to show the depth frame data in a scatter plot, I was disappointed to see that none of the depth pixels seemed to be in the correct location.

Side view of a room (notice that the ceiling is curved).

I did some research and realized there's some simple trig involved to do the conversions.

To test, I started with a pre-written function in pylibfreenect2 which accepts the undistorted depth frame plus a row and a column, then returns that pixel's actual position:

    X, Y, Z = registration.getPointXYZ(undistorted, row, col)

This does a surprisingly good job at correcting the positions:

The only drawback to using getPointXYZ() or getPointXYZRGB() is that they work on only one pixel at a time. This can take a while in Python as it requires the use of nested for-loops like so:

    n_rows = d.shape[0]
    n_columns = d.shape[1]
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z = registration.getPointXYZ(undistorted, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])

I tried to better understand how getPointXYZ() was calculating a coordinate. To the best of my knowledge it looks similar to this OpenKinect-for-Processing function: depthToPointCloudPos(). Though I suspect libfreenect2's version has more going on under the hood.

Using that GitHub source code as an example, I then tried to re-write it in Python for my own experimentation and came up with the following:

    # Camera information based on the Kinect v2 hardware
    CameraParams = {
        "cx": 254.878,
        "cy": 205.395,
        "fx": 365.456,
        "fy": 365.456,
        "k1": 0.0905474,
        "k2": -0.26819,
        "k3": 0.0950862,
        "p1": 0.0,
        "p2": 0.0,
    }

    def depthToPointCloudPos(x_d, y_d, z, scale=1000):
        # Calculate the xyz camera position based on the depth data
        x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
        y = (y_d - CameraParams['cy']) * z / CameraParams['fy']
        return x / scale, y / scale, z / scale
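As a quick sanity check of the formula above, here is a minimal sketch that evaluates it at two easy-to-reason-about pixels. It assumes the depth value is in millimetres (which the default `scale=1000` suggests) and only copies the four intrinsics the function actually uses; the variable names `center` and `right` are just for illustration.

```python
# Intrinsics copied from the CameraParams dictionary above (distortion terms omitted,
# since depthToPointCloudPos() does not use them).
CameraParams = {"cx": 254.878, "cy": 205.395, "fx": 365.456, "fy": 365.456}


def depthToPointCloudPos(x_d, y_d, z, scale=1000):
    """Back-project one depth pixel to camera coordinates (assumed mm in, metres out)."""
    x = (x_d - CameraParams["cx"]) * z / CameraParams["fx"]
    y = (y_d - CameraParams["cy"]) * z / CameraParams["fy"]
    return x / scale, y / scale, z / scale


# A pixel exactly at the principal point (cx, cy) lies on the optical axis,
# so only its depth component is non-zero.
center = depthToPointCloudPos(CameraParams["cx"], CameraParams["cy"], 1000.0)

# A pixel fx columns to the right of cx is offset sideways by its own depth.
right = depthToPointCloudPos(
    CameraParams["cx"] + CameraParams["fx"], CameraParams["cy"], 2000.0
)

print(center)
print(right)
```

Note that k1, k2, k3, p1 and p2 are defined in CameraParams but never used by this function.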

This is a comparison between the traditional getPointXYZ and my custom function:

They look very similar, but there are apparent differences. The left comparison shows straighter edges and also a somewhat sinusoidal shape on the flat ceiling. I suspect that additional math is involved.
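To put a number on those differences rather than judging them by eye, something like the following could be used. It is only a sketch: `compare_point_clouds` is a hypothetical helper (not part of libfreenect2 or pylibfreenect2), it assumes both clouds are (N, 3) arrays in the same pixel order and axis convention, and it skips rows containing non-finite values. The two small arrays are made-up data for illustration.

```python
import numpy as np


def compare_point_clouds(a, b):
    """Per-point Euclidean distance between two (N, 3) clouds, ignoring non-finite rows."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Keep only rows where every coordinate of both clouds is a finite number.
    valid = np.isfinite(a).all(axis=1) & np.isfinite(b).all(axis=1)
    err = np.linalg.norm(a[valid] - b[valid], axis=1)
    return {
        "n_valid": int(valid.sum()),
        "mean": float(err.mean()),
        "max": float(err.max()),
    }


# Made-up example data: the third reference point is invalid and gets skipped.
reference = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [np.nan, 0.0, 0.0]])
custom = np.array([[1.0, 0.0, 0.003], [2.0, 0.004, 0.0], [3.0, 0.0, 0.0]])

stats = compare_point_clouds(reference, custom)
print(stats)
```

Plotting the per-point error against pixel position might also show whether the mismatch grows toward the image edges.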

I would be very interested to hear if anyone has ideas as to what might differ between my function and libfreenect2's getPointXYZ.

However the main reason I've posted here is to ask about attempting to vectorize the above function to work on an entire array instead of looping through each element.

Applying what I learned from the above I was able to write a function which appears to be a vectorized alternative to depthToPointCloudPos:

[EDIT]

Thanks to Benjamin for helping make this function even more efficient!

    def depthMatrixToPointCloudPos(z, scale=1000):
        # Basically this is a vectorized version of depthToPointCloudPos().
        # Note: np.indices() returns row indices first, so C holds the row (y_d)
        # indices and R holds the column (x_d) indices.
        C, R = np.indices(z.shape)

        R = np.subtract(R, CameraParams['cx'])
        R = np.multiply(R, z)
        R = np.divide(R, CameraParams['fx'] * scale)

        C = np.subtract(C, CameraParams['cy'])
        C = np.multiply(C, z)
        C = np.divide(C, CameraParams['fy'] * scale)

        return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

This works and produces the same pointcloud results as the previous function depthToPointCloudPos(). The only difference is that my processing rate went from ~1 FPS to 5-10 FPS (WhooHoo!). I believe this eliminates a bottleneck caused by Python looping over every pixel to do the calculations. So my scatter plot now runs smoothly again with the semi-real-world coordinates being calculated.
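The claim that both versions give the same result can be checked on synthetic data without a Kinect attached. The sketch below repeats both functions so it runs on its own; `loop_reference` is a hypothetical helper that mimics the nested-loop ordering used earlier (`[Z, X, -Y]` per pixel, row-major), and the small random depth array is made up for the test.

```python
import numpy as np

CameraParams = {"cx": 254.878, "cy": 205.395, "fx": 365.456, "fy": 365.456}


def depthToPointCloudPos(x_d, y_d, z, scale=1000):
    x = (x_d - CameraParams["cx"]) * z / CameraParams["fx"]
    y = (y_d - CameraParams["cy"]) * z / CameraParams["fy"]
    return x / scale, y / scale, z / scale


def depthMatrixToPointCloudPos(z, scale=1000):
    # np.indices() returns (row indices, column indices), so C is y_d and R is x_d.
    C, R = np.indices(z.shape)
    R = np.divide(np.multiply(np.subtract(R, CameraParams["cx"]), z), CameraParams["fx"] * scale)
    C = np.divide(np.multiply(np.subtract(C, CameraParams["cy"]), z), CameraParams["fy"] * scale)
    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))


def loop_reference(z, scale=1000):
    """Per-pixel version (hypothetical helper) using the same [Z, X, -Y] ordering."""
    n_rows, n_columns = z.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z = depthToPointCloudPos(col, row, z[row, col], scale)
            out[row * n_columns + col] = (Z, X, -Y)
    return out


# Small synthetic depth image (values in mm) so the slow loop finishes instantly.
rng = np.random.default_rng(0)
depth = rng.uniform(500.0, 4500.0, size=(6, 8))

vectorized = depthMatrixToPointCloudPos(depth)
looped = loop_reference(depth)
same = np.allclose(vectorized, looped)
print(vectorized.shape, same)
```

The comparison uses `np.allclose` rather than exact equality because the two versions divide in a slightly different order, which can change the last floating-point bits.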

Now that I have an efficient function for retrieving the 3D coordinates from the depth frame, I would really like to apply this approach to mapping the color camera data to my depth pixels as well. However, I am not sure what math or variables are involved in doing that, and I could not find much about how to calculate it when searching on Google.

Alternatively I was able to use libfreenect2 to map the colors to my depth pixels using getPointXYZRGB:

    # Format undistorted and registered data to real-world coordinates with mapped colors
    # (don't forget to pass color=colors to setData).
    n_rows = d.shape[0]
    n_columns = d.shape[1]
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
            out[row * n_columns + col] = np.array([X, Y, Z])
            colors[row * n_columns + col] = np.divide([R, G, B], 255)
    sp2.setData(pos=np.array(out, dtype=np.float64), color=colors, size=2)

This produces a pointcloud with colored vertexes (very slow, <1 FPS):

In summary my two questions basically are:

What additional steps would be required so that the real-world 3D coordinates returned from my depthToPointCloudPos() function (and the vectorized implementation) more closely resemble the data returned by getPointXYZ() from libfreenect2?

And, what would be involved in creating a (possibly vectorized) way to generate the depth to color registration map in my own application? Please see the update as this has been solved.

[UPDATE]

I managed to map the color data to each pixel using the registered frame. It was very simple and only required adding these lines prior to calling setData():

    colors = registered.asarray(np.uint8)
    colors = np.divide(colors, 255)
    colors = colors.reshape(colors.shape[0] * colors.shape[1], 4)
    colors = colors[:, :3:]  # BGRA to BGR (slices out the alpha channel)
    colors = colors[..., ::-1]  # BGR to RGB

This allows Python to quickly process the color data and gives smooth results. I have updated/added them to the functional example below.
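To see what those color lines do to the data, here is a minimal sketch that runs the same steps on a tiny made-up array instead of a real frame. The `bgra` array is a hypothetical stand-in for `registered.asarray(np.uint8)`, assumed to have shape (rows, columns, 4) with bytes in BGRA order as described above.

```python
import numpy as np

# Hypothetical stand-in for registered.asarray(np.uint8): a 2 x 3 image, BGRA bytes.
bgra = np.zeros((2, 3, 4), dtype=np.uint8)
bgra[..., 3] = 255              # alpha channel
bgra[0, 0] = (255, 0, 0, 255)   # pure blue, written in BGRA order
bgra[1, 2] = (0, 0, 255, 255)   # pure red, written in BGRA order

colors = np.divide(bgra, 255)                                  # uint8 0..255 -> float 0..1
colors = colors.reshape(colors.shape[0] * colors.shape[1], 4)  # one row per pixel
colors = colors[:, :3]                                         # drop alpha: BGRA -> BGR
colors = colors[..., ::-1]                                     # reverse channels: BGR -> RGB

print(colors.shape)
print(colors[0])   # first pixel (row 0, col 0)
print(colors[5])   # last pixel (row 1, col 2)
```

Because the reshape is row-major, row `i` of `colors` lines up with row `i` of the position array produced by the vectorized depth function, as long as both come from frames of the same shape.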

Real-world coordinate processing with color registration running real-time in Python!

(GIF image resolution has been greatly reduced)

[UPDATE]

After spending a little more time with the application I have added some additional parameters and tuned their values with hopes to improve the visual quality of the scatter plot and possibly make things more intuitive for this example/question.

Most importantly I have set the vertexes to be opaque:

    sp2 = gl.GLScatterPlotItem(pos=pos)
    sp2.setGLOptions('opaque')  # Ensures that vertexes located behind other vertexes cannot be seen.

I then noticed whenever zooming very close to surfaces, the distance between adjacent verts would appear to expand until all of what was visible was mostly empty space. This was partially a result of the point size of the vertexes not changing.

To help create a "zoom-friendly" viewport full of colored vertexes, I added these lines, which calculate the vertex point size based on the current zoom level (for each update):

    # Calculate a dynamic vertex size based on window dimensions and camera's position,
    # to become the "size" input for the scatterplot's setData() function.
    v_rate = 8.0  # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
    v_scale = np.float32(v_rate) / gl_widget.opts['distance']  # Vertex size increases as the camera is "zoomed" towards center of view.
    v_offset = (gl_widget.geometry().width() / 1000)**2  # Vertex size is offset based on actual width of the viewport.
    v_size = v_scale + v_offset

And lo and behold:

(Again, GIF image resolution has been greatly reduced)

Maybe not quite as good as skinning a pointcloud, but it does seem to help make things easier when trying to understand what you're actually looking at.

All mentioned modifications have been included in the functional example.
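For anyone wanting to tune `v_rate` without running the viewer, the vertex-size arithmetic can be pulled into a plain function. This is only a sketch: `vertex_size` is a hypothetical helper, it takes the camera distance and viewport width as ordinary numbers instead of reading them from `gl_widget`, and it uses plain Python floats rather than `np.float32`.

```python
def vertex_size(distance, viewport_width_px, v_rate=8.0):
    """Point size that grows as the camera gets closer and as the viewport gets wider."""
    v_scale = v_rate / distance                 # larger when zoomed in (small distance)
    v_offset = (viewport_width_px / 1000) ** 2  # constant offset from the viewport width
    return v_scale + v_offset


# Same 1000 px wide viewport, two zoom levels.
near = vertex_size(2.0, 1000)    # 8.0 / 2.0  + 1.0
far = vertex_size(10.0, 1000)    # 8.0 / 10.0 + 1.0

# Same zoom level, narrower viewport.
narrow = vertex_size(10.0, 500)  # 8.0 / 10.0 + 0.25

print(near, far, narrow)
```

The size grows without bound as the distance approaches zero, so clamping the result to a maximum might be worth considering.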

[UPDATE]

As seen in the previous two animations, it is clear that the pointcloud of real-world coordinates has a skewed orientation compared to the grid axes. This is because I was not compensating for the Kinect's actual orientation in the real world!

Thus I have implemented an additional vectorized trig function which calculates a new (rotated and offset) coordinate for each vertex. This orients them correctly relative to the Kinect's actual position in real space, and it is necessary when using tripods that tilt (it could also be used to connect the output of an INU or gyro/accelerometer for real-time feedback).

    def applyCameraMatrixOrientation(pt):
        # Kinect sensor orientation compensation.
        # Basically this is a vectorized version of applyCameraOrientation();
        # it uses the same trig to rotate a vertex around a gimbal.
        def rotatePoints(ax1, ax2, deg):
            # Math to rotate vertexes around a center point on a plane.
            # Length of the hypotenuse of the real-world coordinate from the center of rotation: this is the radius!
            hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2)
            # Calculate the vertexes' current angle (arctan2 returns radians from -pi to pi).
            d_tan = np.arctan2(pt[:, ax2], pt[:, ax1])
            # Convert radians to degrees and use modulo to adjust the range to 0..360.
            cur_angle = np.degrees(d_tan) % 360
            # The new angle (in radians) of the vertexes after being rotated by the value of deg.
            new_angle = np.radians((cur_angle + deg) % 360)
            pt[:, ax1] = hyp * np.cos(new_angle)  # Calculate the rotated coordinate for this axis.
            pt[:, ax2] = hyp * np.sin(new_angle)  # Calculate the rotated coordinate for this axis.

        # Roll is disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used.
        # rotatePoints(1, 2, CameraPosition['roll'])  # rotate on the Y&Z plane
        rotatePoints(0, 2, CameraPosition['elevation'])  # rotate on the X&Z plane
        rotatePoints(0, 1, CameraPosition['azimuth'])  # rotate on the X&Y plane

        # Apply offsets for height and linear position of the sensor (from viewport's center).
        # np.array(..., dtype=np.float64) replaces np.float_(...), which was removed in NumPy 2.0.
        pt[:] += np.array([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']], dtype=np.float64)
        return pt

Just a note: rotatePoints() is only being called for 'elevation' and 'azimuth'. This is because most tripods don't support roll and to save on CPU cycles it has been disabled by default. If you intend on doing something fancy then definitely feel free to un-comment it!!
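As a possible way to make the rotation more compact, the polar-coordinate steps in rotatePoints() (radius, current angle, new angle) are mathematically equivalent to multiplying by a 2D rotation matrix, which needs only one sine and one cosine per call. The sketch below compares the two on random points; `rotate_points_polar` and `rotate_points_matrix` are hypothetical standalone names, written to take `pt` as an argument so they can be tested outside the closure.

```python
import numpy as np


def rotate_points_polar(pt, ax1, ax2, deg):
    """Same steps as rotatePoints() above, as a standalone function (modifies pt in place)."""
    hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2)
    cur_angle = np.degrees(np.arctan2(pt[:, ax2], pt[:, ax1])) % 360
    new_angle = np.radians((cur_angle + deg) % 360)
    pt[:, ax1] = hyp * np.cos(new_angle)
    pt[:, ax2] = hyp * np.sin(new_angle)
    return pt


def rotate_points_matrix(pt, ax1, ax2, deg):
    """Rotation-matrix form: [a', b'] = [[c, -s], [s, c]] @ [a, b] (modifies pt in place)."""
    c = np.cos(np.radians(deg))
    s = np.sin(np.radians(deg))
    a = pt[:, ax1].copy()  # copy first, because pt[:, ax1] is overwritten below
    b = pt[:, ax2].copy()
    pt[:, ax1] = c * a - s * b
    pt[:, ax2] = s * a + c * b
    return pt


# Random test points; -15 degrees matches the default 'elevation' used in this post.
rng = np.random.default_rng(1)
points = rng.normal(size=(1000, 3))

polar = rotate_points_polar(points.copy(), 0, 2, -15)
matrix = rotate_points_matrix(points.copy(), 0, 2, -15)
agree = np.allclose(polar, matrix)
print(agree)
```

Whether this is noticeably faster on a full 512 x 424 frame has not been measured here; it mainly avoids the `sqrt`, `arctan2` and degree conversions on every vertex.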

Notice the grid floor is level in this image but the left pointcloud is not aligned with it:

The parameters to set the Kinect's orientation:

    CameraPosition = {
        "x": 0,  # actual position in meters of kinect sensor relative to the viewport's center.
        "y": 0,  # actual position in meters of kinect sensor relative to the viewport's center.
        "z": 1.7,  # height in meters of actual kinect sensor from the floor.
        "roll": 0,  # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
        "azimuth": 0,  # sensor's yaw angle in degrees.
        "elevation": -15,  # sensor's pitch angle in degrees.
    }

You should update these according to your sensor's actual position and orientation:

The two most important parameters are the theta (elevation) angle and the height from the floor. A simple measuring tape and a calibrated eye are all I used; however, I intend to someday feed encoder or INU data into these parameters to update them in real time (as the sensor is moved around).

Again, all changes have been reflected in the functional example.

If anyone is successful in making improvements to this example or has suggestions on ways to make things more compact I would be very appreciative if you could leave a comment explaining the details.

Here is the fully functional example for this project: