We started by learning the basics of OpenCV, then performed some basic image processing and manipulation on images, followed by image segmentation and many other operations using OpenCV and the Python language. Here, in this section, we will perform some simple object detection techniques using template matching. We will find an object in an image and then describe its features. Features are common attributes of an image such as corners, edges etc. We will also take a look at some common and popular object detection algorithms such as SIFT, SURF, FAST, BRIEF & ORB.
As mentioned in the previous tutorials, OpenCV is an Open Source Computer Vision Library which has C++, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. So it can be easily installed on a Raspberry Pi with a Python and Linux environment. A Raspberry Pi with OpenCV and an attached camera can be used to create many real-time image processing applications like face detection, face lock, object tracking, car number plate detection, home security systems etc.
Object detection and recognition form the most important use cases for computer vision; they are used to do powerful things such as
- Labelling scenes
- Robot Navigation
- Self-driving cars
- Body recognition (Microsoft Kinect)
- Disease and cancer detection
- Facial recognition
- Handwriting recognition
- Identifying objects in satellite images
Object Detection VS Recognition
Object recognition is the second level of object detection, in which the computer is able to recognize an object from multiple objects in an image and may be able to identify what it is.
Now, we will perform some image processing functions to find an object from an image.
Finding an Object from an Image
Here we will use template matching to find a character/object in an image, using OpenCV’s cv2.matchTemplate() function.
import cv2
import numpy as np
Load the input image and convert it to grayscale
image = cv2.imread('WaldoBeach.jpg')
cv2.imshow('people', image)
cv2.waitKey(0)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Load the template image
template = cv2.imread('waldo.jpg', 0)

# Result of template matching of the object over the image
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
Create bounding box
top_left = max_loc

# Increase the size of the bounding rectangle by 50 pixels
bottom_right = (top_left[0] + 50, top_left[1] + 50)
cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 5)
cv2.imshow('object found', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
In cv2.matchTemplate(gray, template, cv2.TM_CCOEFF), we pass the grayscale image in which to find the object, along with the template. The template matching method is then applied to find the object in the image; here cv2.TM_CCOEFF is used.
The function returns an array, stored in result, which holds the outcome of the template matching procedure: a match score for every location the template was compared against.
We then use cv2.minMaxLoc(result), which gives the coordinates of the location where the object was best matched in the image. Once we have those coordinates, we draw a rectangle over it, stretching the box’s dimensions a little so the object fits easily inside.
There are a variety of methods to perform template matching, and in this case we are using cv2.TM_CCOEFF, which stands for correlation coefficient.
cv2.matchTemplate takes a “sliding window” of the object and slides it over the image from left to right and top to bottom, one pixel at a time. Then for each location, we compute the correlation coefficient to determine how “good” or “bad” the match is.
Regions with a sufficiently high correlation can be considered matches; from there, all we need is a call to cv2.minMaxLoc to find where the best match lies.
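As a quick, hedged illustration of this idea, the sketch below uses the normalized variant cv2.TM_CCOEFF_NORMED so that a fixed threshold can be applied, and every location scoring above it is boxed. The 0.8 threshold is illustrative and will need tuning per image; the file names are simply reused from the example above.

import cv2
import numpy as np

image = cv2.imread('WaldoBeach.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
template = cv2.imread('waldo.jpg', 0)
h, w = template.shape

# Normalized correlation gives scores roughly between -1 and 1,
# so a fixed threshold can be applied across images
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
locations = np.where(result >= 0.8)

for pt in zip(*locations[::-1]):            # np.where gives (rows, cols); reverse to (x, y)
    top_left = (int(pt[0]), int(pt[1]))
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 2)

cv2.imshow('matches above threshold', image)
cv2.waitKey(0)
cv2.destroyAllWindows()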
Feature Description Theory
In template matching we slide a template image across a source image until a match is found. But it is not the best method for object recognition, as it has severe limitations. This method isn’t very resilient.
The following factors make template matching a bad choice for object detection.
- Rotation renders this method ineffective.
- Size (known as scaling) affects this as well.
- Photometric changes (e.g. brightness, contrast, hue etc.)
- Distortion from viewpoint changes (affine transformations).
One solution to these problems is image features.
Image features are interesting areas of an image that are somewhat unique to that specific image. They are also called key point features or interest points.
The sky is an uninteresting feature, whereas certain keypoints (marked in red circles) can be used for detection in the above image (interesting features). The image shown above clearly shows the difference between interesting and uninteresting features.
Importance of feature detection
Features are important as they can be used to analyze, describe and match images. They have extensive use in:
- Image alignment – e.g. panorama stitching (finding corresponding matches so we can stitch images together)
- 3D reconstruction
- Robot navigation
- Object recognition
- Motion tracking
- And more!
What defines the interest points?
Interesting areas carry a lot of distinct and unique information about a region of the image. Typically, they are areas of a high change of intensity, such as corners or edges. But always be careful, as noise can appear “informative” when it is not! So try to blur the image to reduce noise.
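For instance, a light Gaussian blur applied before feature detection is a common way to suppress that pixel-level noise; the kernel size below is just an illustrative choice, and the file name is borrowed from the corner example later in this section.

import cv2

gray = cv2.cvtColor(cv2.imread('chess.jpg'), cv2.COLOR_BGR2GRAY)

# A small Gaussian blur smooths pixel-level noise before feature detection
blurred = cv2.GaussianBlur(gray, (5, 5), 0)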
Characteristic of Good or Interesting Features
Repeatable – They can be found in multiple pictures of the same scene.
Distinctive – Each feature is somewhat unique and different to other features of the same scene.
Compactness/Efficiency – Significantly less features than pixels in the image.
Locality – Feature occupies a small area of the image and is robust to clutter and occlusion.
Corners as features
Corners are identified when shifting a window in any direction over that point gives a large change in intensity.
Corners are not the best features for every kind of image, but they certainly have good use cases that make them handy.
So, to identify corners in your image, imagine the green window we are looking at, with the black one being the image we want to find corners in. When we move the window within a flat area of the image, there is no change in intensity, so the region is flat, i.e. no corner is identified.
Now when we move the window along an edge, there is a change of intensity in one direction only, hence it’s an edge, not a corner.
When we place the window over a corner, there is a change in intensity no matter in which direction we move the window, and this is identified as a corner.
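The following toy sketch (not OpenCV’s implementation, just an illustration of the idea) measures how much a small window changes when shifted by one pixel in each direction over a tiny synthetic image; only at the corner does every shift direction produce a change.

import numpy as np

# Toy 10x10 "image": a bright square in the lower-right corner, so it
# contains a flat region, straight edges and one corner
I = np.zeros((10, 10), dtype=np.float32)
I[5:, 5:] = 1.0

def min_shift_response(img, cx, cy, win=1):
    """Smallest sum-of-squared-differences between the window around
    (cx, cy) and the same window shifted by one pixel in any direction."""
    patch = img[cy - win:cy + win + 1, cx - win:cx + win + 1]
    responses = []
    for u in (-1, 0, 1):
        for v in (-1, 0, 1):
            if u == 0 and v == 0:
                continue
            shifted = img[cy + v - win:cy + v + win + 1,
                          cx + u - win:cx + u + win + 1]
            responses.append(((shifted - patch) ** 2).sum())
    return min(responses)

print("flat  :", min_shift_response(I, 2, 2))   # 0.0 - no change in any direction
print("edge  :", min_shift_response(I, 5, 7))   # 0.0 - no change along the edge direction
print("corner:", min_shift_response(I, 5, 5))   # > 0 - change in every direction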
So let’s identify corners with the help of the Harris Corner Detection algorithm, developed in 1988, which works fairly well for corner detection.
The following OpenCV function is used for the detection of the corners.
cv2.cornerHarris(input image, block size, ksize, k)
Input image - Should be grayscale and float32 type.
blockSize - The size of neighborhood considered for corner detection
ksize - Aperture parameter of Sobel derivative used.
k - Harris detector free parameter in the equation
Output – a corner response map of the same size as the image; strong responses indicate corners
Also, an important thing to note is that the Harris corner detection algorithm requires a float32 array datatype, i.e. the image should be a grayscale image of type float32.
import cv2
import numpy as np
Load image then grayscale
image = cv2.imread('chess.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
The cornerHarris function requires the array datatype to be float32
gray = np.float32(gray)
harris_corners = cv2.cornerHarris(gray, 3, 3, 0.05)
We use dilation of the corner points to enlarge them
kernel = np.ones((7, 7), np.uint8)
harris_corners = cv2.dilate(harris_corners, kernel, iterations=2)
Threshold for an optimal value, it may vary depending on the image
image[harris_corners > 0.025 * harris_corners.max()] = [255, 127, 127]
cv2.imshow('Harris Corners', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.cornerHarris returns a corner response map, so to visualize these tiny responses we use dilation, which adds pixels around the detected corners. To enlarge the corners, we run the dilation twice. We then threshold the response map and change the color of the pixels that exceed the threshold, marking the corners.
Another corner detector, cv2.goodFeaturesToTrack, can be used for the same purpose with the parameters mentioned below.
cv2.goodFeaturesToTrack(input image, maxCorners, qualityLevel, minDistance)
- Input Image - 8-bit or floating-point 32-bit, single-channel image.
- maxCorners – Maximum number of corners to return. If more corners than this are found, only the strongest of them are returned.
- qualityLevel – Parameter characterizing the minimal accepted quality of image corners. The parameter value is multiplied by the best corner quality measure (smallest eigenvalue). The corners with the quality measure less than the product are rejected. For example, if the best corner has the quality measure = 1500, and the qualityLevel=0.01 , then all the corners with the quality measured less than 15 are rejected.
- minDistance – Minimum possible Euclidean distance between the returned corners.
import cv2
import numpy as np

img = cv2.imread('chess.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
We specify the top 100 corners
corners = cv2.goodFeaturesToTrack(gray, 100, 0.01, 15)

for corner in corners:
    x, y = corner[0]
    x = int(x)
    y = int(y)
    cv2.rectangle(img, (x - 10, y - 10), (x + 10, y + 10), (0, 255, 0), 2)

cv2.imshow("Corners Found", img)
cv2.waitKey()
cv2.destroyAllWindows()
Unlike cv2.cornerHarris, this function directly returns an array of corner locations, so we iterate through each corner position and draw a rectangle around it.
Problems with corners as features
Corner matching is tolerant of:
• Rotation
• Translation (i.e. shifts in the image)
• Slight photometric changes (e.g. brightness or affine intensity changes)
However, it is intolerant of:
• Large changes in intensity or photometric changes
• Scaling (i.e. enlarging or shrinking)
SIFT, SURF, FAST, BRIEF & ORB Algorithms
Scale Invariant Feature Transform (SIFT)
Corner detectors like the Harris corner detection algorithm are rotation invariant, which means that even if the image is rotated we can still detect the same corners. This is expected, since corners remain corners in a rotated image as well. But when we scale the image, a corner may no longer be a corner, as shown in the above image.
SIFT is used to detect interesting keypoints in an image using the difference of Gaussian method. These are areas of the image where the variation exceeds a certain threshold, and they are better descriptors than simple edges.
We then create a vector descriptor for these interesting areas. Scale invariance is achieved via the following process:
i. Interesting points are scanned at several different scales.
ii. The scale at which a specific stability criterion is met is then selected and encoded by the vector descriptor. Therefore, regardless of the initial size, the most stable scale is found, which makes the method scale invariant.
Rotation invariance is achieved by obtaining the Orientation Assignment of the key point using image gradient magnitudes. Once we know the 2D direction, we can normalize this direction.
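As a rough illustration of the difference-of-Gaussian idea (not the full SIFT pipeline), the sketch below blurs an image at a few increasing scales and subtracts adjacent levels; extrema in these difference maps are the candidate keypoints that SIFT then tests for stability across scales. The sigma values and file name are illustrative.

import cv2
import numpy as np

image = cv2.imread('paris.jpg', 0).astype(np.float32)

# Blur the image at a few increasing scales (sigma values are illustrative)
sigmas = [1.0, 1.6, 2.56, 4.1]
blurred = [cv2.GaussianBlur(image, (0, 0), s) for s in sigmas]

# Difference of Gaussians: subtract adjacent scales; extrema in these
# maps are candidate keypoints at that scale
dogs = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]

for i, dog in enumerate(dogs):
    view = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imshow('DoG level %d' % i, view)

cv2.waitKey(0)
cv2.destroyAllWindows()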
A full paper on SIFT can be read here:
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf.
And you can also find a tutorial on the official OpenCV link.
Speeded Up Robust Features (SURF)
SURF is the speeded up version of SIFT, as SIFT is quite computationally expensive.
SURF was developed to improve the speed of a scale invariant feature detector. Instead of using the Difference of Gaussian approach, SURF uses Hessian matrix approximation to detect interesting points and uses the sum of Haar wavelet responses for orientation assignment.
A full paper on SURF can be read here: http://www.vision.ee.ethz.ch/~surf/eccv06.pdf
Alternatives of SIFT and SURF
As SIFT and SURF are patented, they are not freely available for commercial use; however, there are alternatives to these algorithms, which are explained in brief here.
Features from Accelerated Segment Test (FAST)
• Key point detection only (no descriptor, we can use SIFT or SURF to compute that)
• Used in real time applications
Here you can find the papers on FAST
https://www.edwardrosten.com/work/rosten_2006_machine.pdf
Binary Robust Independent Elementary Features (BRIEF)
• Computes descriptors quickly (instead of using SIFT or SURF)
• It is quite fast
Here you can find the paper on BRIEF
http://cvlabwww.epfl.ch/~lepetit/papers/calonder_pami11.pdf
Oriented FAST and Rotated BRIEF (ORB)
- Developed out of OpenCV Labs (not patented so free to use!)
- Combines both FAST and BRIEF
Here you can find the paper on ORB
http://www.willowgarage.com/sites/default/files/orb_final.pdf
Using SIFT, SURF, FAST, BRIEF & ORB in OpenCV
Feature Detection implementation
The SIFT & SURF algorithms are patented by their respective creators, and while they are free to use in academic and research settings, you should technically be obtaining a license/permission from the creators if you are using them in a commercial (i.e. for-profit) application.
Below are programming examples of all the algorithms mentioned above.
SIFT
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create SIFT Feature Detector object
sift = cv2.xfeatures2d.SIFT_create()

# Detect key points
keypoints = sift.detect(gray, None)
print("Number of keypoints Detected: ", len(keypoints))
Draw rich key points on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Feature Method - SIFT', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 1893
Here the keypoints are (x, y) coordinates extracted using the SIFT detector and drawn over the image using the cv2.drawKeypoints function.
SURF
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Create SURF Feature Detector object, here we set hessian threshold to 500
# Only features whose hessian is larger than hessianThreshold are retained by the detector
# You can increase the hessian threshold value to decrease the number of keypoints
surf = cv2.xfeatures2d.SURF_create(500)
keypoints, descriptors = surf.detectAndCompute(gray, None)
print("Number of keypoints Detected: ", len(keypoints))
Draw rich key points on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Feature Method - SURF', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 1548
FAST
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Create FAST Detector object
fast = cv2.FastFeatureDetector_create()

# Obtain key points; by default non-max suppression is on
# To turn it off, use fast.setNonmaxSuppression(False)
keypoints = fast.detect(gray, None)
print("Number of keypoints Detected: ", len(keypoints))
Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Feature Method - FAST', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 8960
BRIEF
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Create FAST detector object
fast = cv2.FastFeatureDetector_create()
Create BRIEF extractor object
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

# Determine key points using FAST
keypoints = fast.detect(gray, None)
Obtain descriptors and new final keypoints using BRIEF
keypoints, descriptors = brief.compute(gray, keypoints)
print("Number of keypoints Detected: ", len(keypoints))
Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Feature Method - BRIEF', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 8735
ORB
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Create ORB object, we can specify the number of key points we desire
orb = cv2.ORB_create()

# Determine key points
keypoints = orb.detect(gray, None)
Obtain the descriptors
keypoints, descriptors = orb.compute(gray, keypoints)
print("Number of keypoints Detected: ", len(keypoints))
Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Feature Method - ORB', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 500
We can specify the number of keypoints when creating the detector (e.g. cv2.ORB_create(nfeatures=1000)); the default value is 500, i.e. ORB automatically detects the best 500 keypoints if no value is specified.
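Since the descriptors computed above are what make features useful for matching images (as mentioned earlier for panorama stitching), here is a minimal sketch of matching ORB descriptors between two overlapping photos; 'paris_2.jpg' is a hypothetical second image of the same scene, and the choice of 30 matches to draw is arbitrary.

import cv2

# Hypothetical file names; substitute two overlapping photos of the same scene
img1 = cv2.imread('paris.jpg', 0)
img2 = cv2.imread('paris_2.jpg', 0)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# ORB produces binary descriptors, so Hamming distance is the right metric
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Keep the best matches (smallest descriptor distance) and draw them
matches = sorted(matches, key=lambda m: m.distance)
matched_image = cv2.drawMatches(img1, kp1, img2, kp2, matches[:30], None, flags=2)

cv2.imshow('ORB Matches', matched_image)
cv2.waitKey(0)
cv2.destroyAllWindows()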
So this is how object detection takes place in OpenCV. The same programs can also be run on a Raspberry Pi with OpenCV installed, turning it into a portable device much like a smartphone running Google Lens.
This article is based on the Master Computer Vision™ OpenCV4 in Python with Deep Learning course on Udemy, created by Rajeev Ratan; subscribe to it to learn more about Computer Vision and Python.