Setting Up Our Vision Request

Now it’s time to set up our Vision rectangle detection request. In the following function, detectRectangle, we set up our VNDetectRectanglesRequest and pass it to the image request handler to start processing:
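The original listing isn’t reproduced here, so the following is a minimal sketch of what detectRectangle might look like. The surrounding CardScannerViewController class and the drawBoundingBox stub are assumptions standing in for the article’s view controller; the request properties are real VNDetectRectanglesRequest APIs:

```swift
import AVFoundation
import UIKit
import Vision

// A minimal sketch; `CardScannerViewController` is an assumption
// standing in for the article's view controller.
final class CardScannerViewController: UIViewController {

    func detectRectangle(in image: CVPixelBuffer) {
        // Configure the rectangle-detection request with a completion
        // handler that runs on the main queue so we can touch the UI.
        let request = VNDetectRectanglesRequest { request, error in
            DispatchQueue.main.async {
                guard let results = request.results as? [VNRectangleObservation],
                      let rect = results.first else { return }
                // Draw the detected rectangle over the camera preview.
                self.drawBoundingBox(rect: rect)
            }
        }

        // Most credit and business cards fall in this aspect-ratio range.
        request.minimumAspectRatio = VNAspectRatio(1.3)
        request.maximumAspectRatio = VNAspectRatio(1.7)
        request.minimumSize = Float(0.5)
        request.maximumObservations = 1

        // Hand the frame to an image request handler for processing.
        let handler = VNImageRequestHandler(cvPixelBuffer: image, options: [:])
        try? handler.perform([request])
    }

    func drawBoundingBox(rect: VNRectangleObservation) {
        // Defined in the next section of the article.
    }
}
```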

A few things to note in the above code:

We’ve set the minimumAspectRatio and maximumAspectRatio to 1.3 and 1.7, respectively, since most credit and business cards fall in that range.

The above function is invoked in the following function:

func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection) {

    guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        debugPrint("unable to get image from sample buffer")
        return
    }

    self.detectRectangle(in: frame)
}

The result returned by the Vision request in the completion handler is of type VNRectangleObservation, which consists of the boundingBox and the confidence value.

Using the boundingBox property, we’ll draw a layer on top of the camera where the rectangle is detected.

The doPerspectiveCorrection function is used to fix the image in case it’s distorted. We’ll take a closer look at it shortly. This function is invoked when the user taps the “Scan” button to extract the fully cropped card from the camera feed.

Drawing Bounding Boxes on the Camera View

Vision’s bounding box coordinates belong to the normalized coordinate system, whose origin is the lower-left corner of the screen.

Hence, we need to transform Vision’s bounding box CGRect into the preview layer’s coordinate system, as shown in the code below:

func drawBoundingBox(rect: VNRectangleObservation) {
    // Flip the y-axis: Vision's origin is at the lower left,
    // while UIKit's is at the upper left.
    let transform = CGAffineTransform(scaleX: 1, y: -1)
        .translatedBy(x: 0, y: -self.previewLayer.frame.height)

    // Scale the normalized coordinates up to the preview layer's size.
    let scale = CGAffineTransform.identity
        .scaledBy(x: self.previewLayer.frame.width, y: self.previewLayer.frame.height)

    let bounds = rect.boundingBox.applying(scale).applying(transform)
    createLayer(in: bounds)
}

private func createLayer(in rect: CGRect) {
    maskLayer = CAShapeLayer()
    maskLayer.frame = rect
    maskLayer.cornerRadius = 10
    maskLayer.opacity = 0.75
    maskLayer.borderColor = UIColor.red.cgColor
    maskLayer.borderWidth = 5.0

    previewLayer.insertSublayer(maskLayer, at: 1)
}

Instead of applying a CGAffineTransform to transform the bounding box into the preview’s coordinate space, we can use the following built-in method available in the Vision framework, which converts a normalized rectangle into image coordinates:

func VNImageRectForNormalizedRect(_ normalizedRect: CGRect,
                                  _ imageWidth: Int,
                                  _ imageHeight: Int) -> CGRect

Note that this helper only scales the rectangle — it doesn’t flip the y-axis, so the flip transform is still needed when drawing in UIKit’s top-left-origin coordinate system.
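As a quick illustration, here’s that helper converting a hypothetical normalized rect (the 400×200 image size is an assumption chosen for the example):

```swift
import CoreGraphics
import Vision

// A hypothetical normalized bounding box, e.g. from a VNRectangleObservation.
let normalized = CGRect(x: 0.5, y: 0.5, width: 0.25, height: 0.25)

// Scale it up to a 400×200-pixel image. This only scales —
// it does not flip the y-axis into UIKit's top-left-origin system.
let imageRect = VNImageRectForNormalizedRect(normalized, 400, 200)
// imageRect is (200.0, 100.0, 100.0, 50.0)
```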

When the maskLayer is set on the detected rectangle in the camera feed, you’ll end up with something like this:

The job is only half done! Our next step involves extracting the image within the bounding box. Let’s see how to do that.

Extracting the Image from the Bounding Box

The doPerspectiveCorrection function takes the Core Image created from the buffer, converts the observation’s corners from normalized space to image space, and applies a perspective correction filter to them to produce the rectified card image. The code is given below:
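The original listing isn’t included in this extract, so here is a minimal sketch of what doPerspectiveCorrection might look like, assuming the observation and pixel buffer are passed in. The scaled(_:to:) helper name is an assumption; CIPerspectiveCorrection and its input keys are real Core Image APIs:

```swift
import CoreImage
import UIKit
import Vision

// Scale a normalized point into image coordinates.
// (Helper name is an assumption, not from the original article.)
func scaled(_ point: CGPoint, to size: CGSize) -> CGPoint {
    CGPoint(x: point.x * size.width, y: point.y * size.height)
}

func doPerspectiveCorrection(_ observation: VNRectangleObservation,
                             from buffer: CVImageBuffer) -> UIImage? {
    var ciImage = CIImage(cvImageBuffer: buffer)
    let size = ciImage.extent.size

    // Convert the observation's corners from normalized to image space.
    let topLeft = scaled(observation.topLeft, to: size)
    let topRight = scaled(observation.topRight, to: size)
    let bottomLeft = scaled(observation.bottomLeft, to: size)
    let bottomRight = scaled(observation.bottomRight, to: size)

    // Crop and straighten the card with Core Image's perspective correction.
    ciImage = ciImage.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": CIVector(cgPoint: topLeft),
        "inputTopRight": CIVector(cgPoint: topRight),
        "inputBottomLeft": CIVector(cgPoint: bottomLeft),
        "inputBottomRight": CIVector(cgPoint: bottomRight)
    ])

    // Render to a CGImage first; a CIImage-backed UIImage
    // won't save to the Photos library.
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
```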

The UIImageWriteToSavedPhotosAlbum function is used to save the image in the Photos library of a user’s device.

The image won’t show up in the album if you pass the CIImage directly into the UIImage initializer. Hence, it’s crucial to convert the CIImage to a CGImage first, and then create the UIImage from that.
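That conversion-and-save step can be sketched as follows; UIImageWriteToSavedPhotosAlbum is the real UIKit function, while the surrounding function name is an assumption:

```swift
import CoreImage
import UIKit

func saveToPhotos(_ ciImage: CIImage) {
    // Render the CIImage into a bitmap-backed CGImage first;
    // a UIImage created directly from a CIImage won't write
    // to the Photos library.
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return }
    let uiImage = UIImage(cgImage: cgImage)
    UIImageWriteToSavedPhotosAlbum(uiImage, nil, nil, nil)
}
```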

Let’s look at the extracted image with perspective correction applied:

As the above illustration shows, applying the perspective correction filter on the Core Image straightens the distorted card image.

Next, let’s look at extracting only the desired text from the scanned image.