I have recently been playing around with tracking objects using my computer’s webcam. In this article I am going to talk about how to track objects in video streams using OpenCV and cvMatchTemplate.
I have been interested in computer vision stuff for a while now, but only recently have I really started playing around with it. Trying to make a computer “see” in any sense of the word seems like a daunting task. Fortunately, there is a great open source project out there that makes computer vision accessible to those of us who don’t understand Greek. I speak, of course, of OpenCV. Really smart people have come up with really clever algorithms, and the nice people at OpenCV have encapsulated them into a nice, easy-to-use framework. When I say easy to use, I mean use, not install. I had a rather difficult time trying to get OpenCV set up and working on my machine, so I ended up using openFrameworks and its OpenCV addon instead.
With OpenCV there are more than a few ways to approach object tracking. Here I will be discussing a relatively simple method that uses template matching to do the tracking. The idea behind template matching is to take a picture of the thing you want to track and then try to find it in the webcam’s video frames. So let’s say we are trying to track my face in the video: we would take a picture of my face and give it to the cvMatchTemplate method, which slides that image across the video frame pixel by pixel, figuring out at each position how close of a match it is. This outputs a grayscale image where the brightness of each pixel corresponds to how close of a match that location is. This means that all you have to do is find the brightest spot in the resulting image to find where your face is in that frame (don’t worry though, OpenCV has a handy little method for finding this bright spot).
Enough with the theory. Let’s see how to actually use this. I am going to be working with the openFrameworks toolkit, so you will need to get that set up before any of the rest of this will work. If you need help and are going to be using Visual Studio 2010 to do your programming, I have a small write-up to help here. If you’re not using Visual Studio, check out their download page for instructions.
Basic webcam access
To access the webcam in the first place we are going to use an ofVideoGrabber. To do OpenCV stuff on the images we are going to use an ofxCvColorImage object to store the current video frame. So in your app’s .h file, add these variables to the public section:
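The header snippet isn’t reproduced inline here, but based on the names used later in the article (vidGrabber, colorImg, subjectImg, subjectFrame, subjectLocation), the public section would look something like this (a sketch; your app class may be named testApp or ofApp depending on your openFrameworks version):

```cpp
// In your app's .h, inside the class's public: section
ofVideoGrabber  vidGrabber;      // live webcam feed
ofxCvColorImage colorImg;        // current frame, wrapped for OpenCV
ofxCvColorImage subjectImg;      // the template image we will track
ofRectangle     subjectFrame;    // region the user selected as the template
ofPoint         subjectLocation; // last known location of the template
```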
and these constants to the top:
const int camWidth = 320;
const int camHeight = 240;
In your app’s setup() method we need to initialize the video grabber and allocate some space for the images, so add this code:
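The setup code itself isn’t shown here; a minimal sketch, assuming the variable and constant names above, would be:

```cpp
void testApp::setup() {
    vidGrabber.initGrabber(camWidth, camHeight); // open the webcam
    colorImg.allocate(camWidth, camHeight);      // room for each video frame
}
```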
Then in your app’s update() method we need to get the latest frame data from the vidGrabber and push it into colorImg. So let’s update your update() method to look like this:
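A sketch of what that update() might look like (note: older openFrameworks releases use vidGrabber.grabFrame() instead of vidGrabber.update() to poll the camera):

```cpp
void testApp::update() {
    vidGrabber.update();          // ask the camera for a new frame
    if (vidGrabber.isFrameNew()) {
        // push the latest pixels into the OpenCV image
        colorImg.setFromPixels(vidGrabber.getPixels(), camWidth, camHeight);
    }
}
```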
Once we have the image data we need to draw it to the screen so we know what we are dealing with. So in the app’s draw() method add this:
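Something like this (a sketch, using the colorImg variable from above):

```cpp
void testApp::draw() {
    colorImg.draw(0, 0); // show the current webcam frame at the top left
}
```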
That covers basic webcam access. If you run your project now you should see the feed from your webcam on the screen. The fact that we can get access to the webcam’s frames and draw them so easily is amazing. I have used DirectShow in the past to do this, and I can’t explain how much simpler this is.
Defining a template image
Where you get your template image (the thing that we want to track) is up to you, but for our purposes here, I am going to let the user select it out of the current video feed by just highlighting it with the mouse. The code for selecting a rectangular region on the screen is boring and long-winded, so I am not going to show it here, but if you need it, it is in the example project that you can download at the bottom of the page. The fun bit that I am going to show is how to pull a crop out of the video feed.
subjectImg.allocate(subjectFrame.width, subjectFrame.height); //Allocate space for the template
colorImg.setROI(subjectFrame); //Set region of interest (ROI)
subjectImg = colorImg; //Copy the specific area to the subject image
colorImg.resetROI(); //Reset the ROI or everything downstream will go crazy
The fun bits
Alright, now that we have the webcam streaming video and we have a template image to look for, all that is left is to actually run the fun bits to find the template image in the current video frame. If you recall the explanation of how cvMatchTemplate works, you will probably remember that it outputs a grayscale image where each pixel is basically a value indicating how likely it is that that location is where the template is. This means we need to allocate an image to put its data into. In your update() method add this:
IplImage *result = cvCreateImage(cvSize(camWidth - subjectImg.width + 1, camHeight - subjectImg.height + 1), IPL_DEPTH_32F, 1);
This gives us a spot to store the results (just remember to cvReleaseImage(&result) when you are done with it each frame, or you will leak memory). Directly under this line, add this:
cvMatchTemplate(colorImg.getCvImage(), subjectImg.getCvImage(), result, CV_TM_SQDIFF);
This is the magic line of code. It is what finds what we are after. A quick note about that CV_TM_SQDIFF bit: this is the method that we are telling cvMatchTemplate to use to calculate how similar our template is to each portion of the video frame. There are a few other options that you can read about in the OpenCV documentation.
cvMatchTemplate has found the location of our template for us. Unfortunately, like a long-winded tech blogger, it has also said a whole bunch of extra stuff. To pick out the actual screen coordinates we need to find the brightest spot in the image. As it happens, OpenCV has a handy little function, cvMinMaxLoc, that will look at a grayscale image and give us back the locations and brightness values of the brightest and darkest spots in the image. We can use it like this:
double minVal, maxVal;
CvPoint minLoc, maxLoc;
cvMinMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, 0);
Remember when I said that we were looking for the brightest spot in the result image to tell us the location of our tracked object? When you use CV_TM_SQDIFF it is actually the darkest spot, since lower values mean a better match. So the location of our tracked object is actually in the variable minLoc, which we can transfer over into our subjectLocation variable like so:
subjectLocation.x = minLoc.x;
subjectLocation.y = minLoc.y;
Now that we have the location of our object, let’s just draw a box around it so we can see that it’s working. So in your draw() method just add this:
ofNoFill(); // outline only, so the box doesn't cover the tracked object
ofRect(subjectLocation.x, subjectLocation.y, subjectFrame.width, subjectFrame.height);
Would you look at that? Now the computer can see our smiling face and actually do something with it. This is cool, but there are some things to know about it. You have to give it an image to start with, and that image has to match really closely. It isn’t very robust. If you are trying to track your face and the lighting changes dramatically, it will probably lose tracking. If the size of the face changes much, like when you lean in or out from the camera, it will lose you. It also doesn’t handle rotation very well.
The performance of this algorithm isn’t bad, but if you have it search the whole video frame for your face you might notice a slowdown. To combat this you can limit the search area by calling colorImg.setROI() before calling cvMatchTemplate. If you do this, don’t forget to reset the ROI afterwards and to adjust the amount of space you allocate for the result image. Also, the resulting location will need to be offset to account for the ROI’s position. The general idea is to only search the area around where you last saw the object. So after each frame, set the search window to a rectangle that is two or three times larger than your template image and center it on the tracked location. This will speed up processing as well as possibly prevent false positives.
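Putting that together, a hypothetical sketch of the restricted search (using the same variables as above, and clamping the window so it stays inside the frame) might look like this:

```cpp
// Only search a window a few times the template size, centered on the
// last known location.
ofRectangle searchWin(subjectLocation.x - subjectFrame.width,
                      subjectLocation.y - subjectFrame.height,
                      subjectFrame.width * 3,
                      subjectFrame.height * 3);
// Clamp the window so it stays inside the video frame
searchWin.x      = MAX(0.0f, searchWin.x);
searchWin.y      = MAX(0.0f, searchWin.y);
searchWin.width  = MIN(searchWin.width,  camWidth  - searchWin.x);
searchWin.height = MIN(searchWin.height, camHeight - searchWin.y);

colorImg.setROI(searchWin);
// The result image only needs to cover the (smaller) search window now
IplImage *result = cvCreateImage(
    cvSize((int)searchWin.width  - subjectImg.width  + 1,
           (int)searchWin.height - subjectImg.height + 1),
    IPL_DEPTH_32F, 1);
cvMatchTemplate(colorImg.getCvImage(), subjectImg.getCvImage(), result, CV_TM_SQDIFF);
cvMinMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, 0);
colorImg.resetROI(); // reset or everything downstream will go crazy

// minLoc is relative to the search window, so offset it back into
// full-frame coordinates
subjectLocation.x = searchWin.x + minLoc.x;
subjectLocation.y = searchWin.y + minLoc.y;
cvReleaseImage(&result); // don't leak the result image every frame
```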
The source code for this project is available at: http://project-greengiant.googlecode.com/svn/trunk/Blog/TrackingWithTemplateMatching
Please note that to run it you will need to install it in the right location relative to your instance of openFrameworks. More information about that can be found here: