Research

We’ll tackle this experiment with a first principles approach. We’ll identify what we know and build from there.

3d models

All 3d models start with vertices. A vertex is a point in 3d space. That is to say, each point has an x, y, and z coordinate that denotes its location. Connecting two vertices gives you an edge, and connecting edges renders faces. These faces define the shape of your 3d model. However, you must be careful: depending on how you connect your vertices, you'll get different models.

If we can collect vertices and tell them how to connect to each other, we should be able to generate a 3d model. To connect them, we'll need an algorithm that takes vertices as input and outputs faces. We should then be able to plot these faces into our surroundings.
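To make that representation concrete, here's a minimal sketch in plain Swift (no SceneKit yet) of a tetrahedron described by vertices and index-based faces. The shape and the values are purely illustrative, but this is exactly the kind of output we want our algorithm to produce:

```swift
// Four vertices, each an (x, y, z) coordinate, are enough to
// describe a tetrahedron.
let vertices: [[Float]] = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]

// Each face is a triple of indices into `vertices`.
// Connecting the same points differently would yield a different model.
let faces: [[Int]] = [
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3]
]
```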

Algorithm time

Let’s take a simple algorithm to connect our points, such as Quickhull. Given a set of points, Quickhull will compute the smallest convex polygon containing the given points. At its core, the algorithm takes a divide and conquer approach similar to quicksort.

By Maonus (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
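To get a feel for the divide-and-conquer idea, here is a 2d sketch of Quickhull in plain Swift. This is an illustration, not the implementation we'll use later: the real 3d version follows the same recipe, but splits the point set on planes instead of lines.

```swift
struct Point { let x, y: Double }

// Cross-product measure: > 0 when p lies to the left of the line a -> b.
// Its magnitude grows with p's distance from the line.
func cross(_ a: Point, _ b: Point, _ p: Point) -> Double {
    (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// Recursively find hull points on the left side of a -> b.
func hullSide(_ a: Point, _ b: Point, _ points: [Point]) -> [Point] {
    let left = points.filter { cross(a, b, $0) > 0 }
    guard let farthest = left.max(by: { cross(a, b, $0) < cross(a, b, $1) }) else {
        return [] // no points outside this edge, so a -> b is a hull edge
    }
    // Divide: recurse on the points outside a -> farthest and farthest -> b
    return hullSide(a, farthest, left) + [farthest] + hullSide(farthest, b, left)
}

func quickHull(_ points: [Point]) -> [Point] {
    guard points.count > 2,
          let leftmost = points.min(by: { $0.x < $1.x }),
          let rightmost = points.max(by: { $0.x < $1.x }) else { return points }
    // The extreme points are guaranteed to be on the hull;
    // conquer the points above and below the line between them.
    return [leftmost] + hullSide(leftmost, rightmost, points)
         + [rightmost] + hullSide(rightmost, leftmost, points)
}
```

Feeding it the corners of a square plus an interior point returns only the four corners, which is the behavior we want: interior points never make it into the hull.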

Let’s be real, applying a generic algorithm to our points will, at best, render decent results. We’ll want to feed the algorithm the most accurate vertices we can. The more accurate the vertices, the better the result.

3d modeling software such as MeshLab uses a combination of algorithms to create models from vertices. However, Quickhull should be enough to give us a proof of concept for this experiment.

ARKit

The reason ARKit works with existing Apple hardware going back to the iPhone SE is that it uses black magic… eh, it uses existing technology to process your surroundings. Before ARKit, most augmented reality frameworks required multiple cameras to achieve depth perception. ARKit works differently. It uses visual-inertial odometry, which determines position and orientation by combining camera images with the device’s motion-sensor data.

To scan your scene, ARKit requires you to point your camera around the room. Meanwhile, ARKit is scanning and capturing frames. By combining the frame data with your device’s motion detection hardware, such as the accelerometer, ARKit starts to identify where in space your frames exist.
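As a sketch, kicking off that scan looks something like the following. The class name and the outlet are assumptions for illustration; the `ARWorldTrackingConfiguration` and `session.run` calls are the standard ARKit setup.

```swift
import ARKit

class ScanViewController: UIViewController {

    // Assumed to be wired up in the storyboard
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // World tracking drives the visual-inertial odometry described above
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal

        // Start capturing frames and fusing them with motion data
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }
}
```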

For ARKit to interact with your scene, it needs to understand its contents. It does so through hit tests. A hit test analyzes a specific point on the screen and returns a vector coordinate (x, y, z) of where it believes that point lies in your scene. The hit test may be computed multiple times, each time with a more accurate result. The more hit tests ARKit computes, the more of the scene it understands. Once ARKit starts understanding the contents of your scene, it will begin to identify planes with which it can interact and on which it can place objects.
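A single hit test against feature points can be sketched like this. The helper function is ours, not ARKit’s; the `hitTest(_:types:)` call and the `worldTransform` layout are from the framework.

```swift
import ARKit

// Probe a screen point and return where ARKit believes it lies in the scene.
func featurePointPosition(at screenPoint: CGPoint, in sceneView: ARSCNView) -> SCNVector3? {
    let results = sceneView.hitTest(screenPoint, types: .featurePoint)
    guard let closest = results.first else { return nil }

    // The last column of worldTransform holds the (x, y, z) position
    let column = closest.worldTransform.columns.3
    return SCNVector3(column.x, column.y, column.z)
}
```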

The process

If we want ARKit to create a 3d model of an object, we’ll need it to understand the scene first. It can then start performing hit tests to gather our vertices. From there, we’ll be able to run our algorithm on these vertices to create a 3d model. To summarize:

1. Scan the scene

2. Collect vertices

3. Run Quickhull on our vertices

4. Use the output to create our 3d model

5. Profit!

The code

We’ll be using `Xcode 9.0` together with `Swift 4.0`. It is important to note that ARKit requires iOS 11 or later. We’ll go through a high-level overview of the code.

Scan the Scene

SceneKit and ARKit are highly intertwined and work well together. Models created in SceneKit can be rendered in ARKit; these models are `SCNGeometry` objects in SceneKit. The basis for everything ARKit related lies in your `ARSCNView`. We’ll call ours `sceneView`. Through our `sceneView`, we’ll analyze our surroundings, gather hit test results, and place our models within the scene after they have been generated.

On launch, our `sceneView` will start scanning and processing the frames of our surroundings. It’ll take a moment for ARKit to get its bearings. Once it does, we can start interacting with our scene and begin our hit tests. Successful hit tests will return feature points.

Let’s take a look at the feature points our scene is gathering by activating our `sceneView` debug options.

// Show feature points
self.sceneView.debugOptions = ARSCNDebugOptions.showFeaturePoints

// Hide feature points (really, reset the debug options)
self.sceneView.debugOptions = []

As you move your phone around, you’ll start to see a lot of yellow points being drawn! You’ll also notice that they disappear. These points are drawn for a given frame. Once our `sceneView` decides that the new set of frames is no longer applicable to the past ones, it releases them from memory. Also notice that at the bottom of the screen, ARKit is providing us with debugging tools. Take note of the frame rate (fps): drawing too many of these feature points will drop your fps.

After playing with the feature points, you’ll start to notice that detection works better on textured surfaces. Overly shiny surfaces will confuse ARKit. One trick is to spray shiny surfaces with water, but this is cheating :)

Collect Vertices

These feature points could do the trick for our collection of vertices. However, we wouldn’t want to collect all of them; there would be too many stray points that do not correctly represent our model. We only want the points that ARKit finds directly on our object.

Stray points up the wazoo!

If we could select which points we want, say by swiping over them, we’d be able to be pretty selective about the points we collect. We can do so by overriding `touchesMoved`, which our view controller inherits from `UIResponder`. Let’s store the touch location in `currentPoint`. We’ll guard against a bad touch by returning early.

override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let currentPoint = touch.location(in: sceneView)
    ...
}

Our `currentPoint` is of type `CGPoint`. Wait, but we need 3d points and `CGPoint` only has x and y! Don’t fret, Apple really thought this one out. The z coordinate gives us depth perception. While we can’t directly compute a z coordinate from a touch, we can identify where a feature point appears on our screen. If we can identify this 2d location on the screen, we can compare it to the feature points our `sceneView` has already drawn.

Our feature points contain x, y, and z coordinates that correlate to the space in our `sceneView`. These points are of type `SCNVector3`. Our `sceneView` has a method `projectPoint()`, which translates vector coordinates that pertain to our `sceneView` into coordinates that pertain to our device’s screen. Therefore, we’ll be able to check whether any feature point exists where our touch fell on the screen.

...

// Get all the feature points in the current frame
guard let fp = self.sceneView.session.currentFrame?.rawFeaturePoints else { return }

// Create a material for the spheres we're about to draw
let material = createMaterial()

// Loop over the feature points and check if any exist near our touch location.
// If a point falls within our range, draw a sphere at that feature point.
for index in 0..<fp.points.count {
    let point = SCNVector3(fp.points[index].x, fp.points[index].y, fp.points[index].z)
    let projection = self.sceneView.projectPoint(point)

    let xRange: ClosedRange<Float> = Float(currentPoint.x) - 100.0...Float(currentPoint.x) + 100.0
    let yRange: ClosedRange<Float> = Float(currentPoint.y) - 100.0...Float(currentPoint.y) + 100.0

    if xRange ~= projection.x && yRange ~= projection.y {
        let ballShape = SCNSphere(radius: 0.001)
        ballShape.materials = [material]

        let ballNode = SCNNode(geometry: ballShape)
        ballNode.position = point
        self.sceneView.scene.rootNode.addChildNode(ballNode)

        // We'll also save the point in our [SCNVector3] for later use
        self.pointCloud.append(point)
    }
}

Take a look at how we are creating an `SCNNode`. `SCNNode` takes a geometry object as input. `SCNSphere` subclasses `SCNGeometry`, which allows us to create a sphere in our scene, like the yellow spheres we have previously seen. We’ll also save our point in a `pointCloud` for later use.

To give our geometry a texture, we can add materials. Our `createMaterial()` method returns a blue material with a bit of transparency.

func createMaterial() -> SCNMaterial {
    let clearMaterial = SCNMaterial()
    clearMaterial.diffuse.contents = UIColor(red: 0.12, green: 0.61, blue: 1.00, alpha: 1.0)
    clearMaterial.locksAmbientWithDiffuse = true
    clearMaterial.transparency = 0.2
    return clearMaterial
}

After some good swiping, we’ll have a pretty nice point cloud drawn out. You’ll also notice that the more points you draw, the lower your fps gets.

Algorithm

We’ve collected our vertices and are ready to feed our algorithm. We decided to borrow Mauricio Poppe’s version of the algorithm. Yes, it is written in JavaScript. No, it’s not a big deal :) If React Native does it, why can’t we?

public func quickHull3d(vertices: [Array<Float>]) -> [Array<Int32>]? {
    let frameworkBundle = Bundle(identifier: "com.research.arkit")

    if let quickHullModulePath = frameworkBundle?.path(forResource: "quickhull3d", ofType: "js") {
        let quickHullModule = try! String(contentsOfFile: quickHullModulePath)
        let jsSource = "var window = this; \(quickHullModule)"

        let context = JSContext()!
        context.evaluateScript(jsSource)

        let algo = context.objectForKeyedSubscript("quickhull3d")!
        let result = algo.call(withArguments: [vertices])
        return result!.toArray() as? [Array<Int32>]
    }

    return nil
}

We used browserify to bundle the algorithm. Once bundled, we can access its methods globally by calling `objectForKeyedSubscript` on our `JSContext`. This context is a JavaScript environment where our JavaScript will run (very similar to a virtual machine). Notice that we are passing in `[Array<Float>]`, not `[SCNVector3]`. Our JavaScript algorithm doesn’t understand what an `SCNVector3` is, so we’ll need to transform our `pointCloud` before passing it through.

func reduceVectorToPoints(given vertices: [SCNVector3]) -> [Array<Float>] {
    // Flatten each SCNVector3 into a plain [x, y, z] array of Floats
    return vertices.map { [$0.x, $0.y, $0.z] }
}

We now have an `[Array<Int32>]` which contains our faces! The output of our algorithm will give us something along the lines of:

let faces = [ [ 2, 0, 3 ], [ 0, 1, 3 ], [ 2, 1, 0 ], [ 2, 3, 1 ] ]

Whoa, what do these numbers mean? They are indices into our points. Three points make a face. For example, `faces[0]` tells us to connect `pointCloud[2]` to `pointCloud[0]` to `pointCloud[3]`… and so on.

We’ve done it! We can create our `SCNGeometry` objects from our `pointCloud` and `faces`.
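One way to assemble that geometry can be sketched as follows. The helper function is an assumption of ours; the `SCNGeometrySource` and `SCNGeometryElement` initializers are the standard SceneKit API for index-based meshes.

```swift
import SceneKit

// Turn our saved point cloud and Quickhull's faces into renderable geometry.
func makeGeometry(pointCloud: [SCNVector3], faces: [[Int32]]) -> SCNGeometry {
    // Vertex source: the raw positions
    let source = SCNGeometrySource(vertices: pointCloud)

    // Element: flatten the index triples describing each triangular face
    let indices = faces.flatMap { $0 }
    let element = SCNGeometryElement(indices: indices, primitiveType: .triangles)

    return SCNGeometry(sources: [source], elements: [element])
}

// Wrap it in a node and place it in the scene:
// let node = SCNNode(geometry: makeGeometry(pointCloud: pointCloud, faces: faces))
// node.geometry?.materials = [createMaterial()]
// sceneView.scene.rootNode.addChildNode(node)
```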

The blue surfaces are the faces that were rendered from the point cloud. Stray points confused Quickhull.

Results

Well, this is where the 💩 hit the ☢ . Unfortunately, our results were less than ideal. But why? Where did we go wrong!?

The short and sweet:

1. Our algorithm wasn’t able to recognize stray points and filter them out.

One solution would be to clean up our point data before running Quickhull against it. Doing so would allow us to remove any outliers.

2. Convex vs Concave objects, understanding holes in our objects

Quickhull computes convex hulls; it cannot handle concave shapes, so any holes or indentations in our object are lost. Therefore, the items we scan that are mostly convex will come out more accurate.
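The cleanup idea from point 1 above can be sketched in plain Swift: drop any point farther than a couple of standard deviations from the cloud’s centroid. The `Vertex` type and the threshold are illustrative assumptions, not part of the original code; a real point-cloud cleanup would likely use a smarter filter.

```swift
struct Vertex { let x, y, z: Float }

// Remove points more than k standard deviations from the centroid.
func removeOutliers(_ points: [Vertex], deviations k: Float = 2.0) -> [Vertex] {
    guard points.count > 1 else { return points }

    // Centroid of the cloud
    let n = Float(points.count)
    let cx = points.map { $0.x }.reduce(0, +) / n
    let cy = points.map { $0.y }.reduce(0, +) / n
    let cz = points.map { $0.z }.reduce(0, +) / n

    // Distance of each point from the centroid
    let distances = points.map { p -> Float in
        let (dx, dy, dz) = (p.x - cx, p.y - cy, p.z - cz)
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }

    // Mean and standard deviation of those distances
    let mean = distances.reduce(0, +) / n
    let variance = distances.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
    let threshold = mean + k * variance.squareRoot()

    // Keep only points within the threshold
    return zip(points, distances).filter { $0.1 <= threshold }.map { $0.0 }
}
```

Running Quickhull on the filtered cloud instead of the raw one would keep the stray points from dragging faces far outside the object.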

Back to the Point Cloud

The point cloud by itself ended up rendering nicely. Amazingly, the scale was accurate as well.