Machine Learning - Image Content Analysis

Computer Vision is a field that studies how to process, analyze, and understand the contents of visual data. In image content analysis, we use many Computer Vision algorithms to build our understanding of the objects in an image. Computer Vision covers various aspects of image analysis, such as object recognition, shape analysis, pose estimation, 3D modeling, visual search, and so on. Humans are really good at identifying and recognizing things around them, and the ultimate goal of Computer Vision is to accurately model the human vision system using computers.

Computer Vision consists of various levels of analysis. In low-level vision, we deal with pixel processing tasks, such as edge detection, morphological processing, and optical flow. In middle-level and high-level vision, we deal with tasks such as object recognition, 3D modeling, and motion analysis. As we go higher, we delve deeper into the conceptual aspects of the visual system and try to extract a description of the visual data based on activities and intentions. Note that the higher levels tend to rely on the outputs of the lower levels for their analysis.

One of the most common questions here is, "How is Computer Vision different from Image Processing?" Image Processing studies image transformations at the pixel level. Both the input and the output of an Image Processing system are images. Some common examples are edge detection, histogram equalization, and image compression. Computer Vision algorithms rely heavily on Image Processing algorithms to do their work. In Computer Vision, we deal with more complex tasks that involve understanding the visual data at a conceptual level, because we want to construct meaningful descriptions of the objects in the images. The output of a Computer Vision system is an interpretation of the 3D scene in the given image. This interpretation can come in various forms, depending on the task at hand.

In this article, we will use a library called OpenCV to analyze images. OpenCV is one of the most widely used libraries for Computer Vision. It has been highly optimized for many different platforms and is often treated as a de facto standard in the industry. Before you proceed, make sure that you install the library with Python support. You can download OpenCV from http://opencv.org. For detailed installation instructions on various operating systems, refer to the documentation section on the website.

Let's take a look at how to operate on images using OpenCV-Python. In this recipe, we will see how to load and display an image. We will also look at how to crop, resize, and save an image to an output file.

How to do it…

Create a new Python file, and import the following packages:

    import os
    import sys

    import cv2
    import numpy as np

Specify the input image as the first command-line argument to the script, and read it using the image read function. We will use forest.jpg. Note that cv2.imread() returns None instead of raising an error when the file cannot be read, so check the result:

    # Load an image, for example 'forest.jpg'
    input_file = sys.argv[1]
    img = cv2.imread(input_file)
    if img is None:
        sys.exit('Could not read image: ' + input_file)

Display the input image, as follows:

    cv2.imshow('Original', img)

We will now crop this image. Extract the height and width of the input image, and then specify the boundaries as fractions of those dimensions:

    # Cropping an image
    h, w = img.shape[:2]
    start_row, end_row = int(0.21 * h), int(0.73 * h)
    start_col, end_col = int(0.37 * w), int(0.92 * w)

Crop the image using NumPy-style slicing (rows first, then columns) and display it:

    img_cropped = img[start_row:end_row, start_col:end_col]
    cv2.imshow('Cropped', img_cropped)

Resize the image to 1.3 times its original size and display it:

    # Resizing an image
    scaling_factor = 1.3
    img_scaled = cv2.resize(img, None, fx=scaling_factor,
                            fy=scaling_factor,
                            interpolation=cv2.INTER_LINEAR)
    cv2.imshow('Uniform resizing', img_scaled)

The previous method scales the image uniformly along both dimensions. Let's assume that we instead want to resize the image to specific output dimensions without preserving its aspect ratio, which stretches or squashes the content. The size is given as (width, height):

    img_scaled = cv2.resize(img, (250, 400),
                            interpolation=cv2.INTER_AREA)
    cv2.imshow('Skewed resizing', img_scaled)

Save the cropped image to an output file. Using os.path.splitext() to strip the extension avoids assuming that it is exactly three characters long:

    # Save an image
    output_file = os.path.splitext(input_file)[0] + '_cropped.jpg'
    cv2.imwrite(output_file, img_cropped)

    cv2.waitKey()

The waitKey() function keeps the image windows open until you press a key on the keyboard.

The full code is given in the operating_on_images.py file that is already provided to you. If you run the code, four windows open: the first shows the input image, the second the cropped image, the third the uniformly resized image, and the fourth the image resized to fixed dimensions.