Masking video frames using the Kinect’s depth data

Since the Xbox Kinect has that nice video camera on top of that nifty depth camera, I figured it would be fun to see if I could use the depth data to combine the color data from two different frames. Specifically, I want to take a snapshot of the current video feed and then combine it with the ongoing video feed. You know. That way you can clone yourself.

Check out this video:

The idea is to compare the two frames and combine them into a new one by looking at the depth of each pixel and taking the closer one. Doing that actually works fairly decently, but there are a few problems with it. One problem is that, while it may look like the grayscale depth data and the color data line up exactly, they do not. This is because the Kinect has two separate cameras for the depth and color images, and they are an inch or so apart, so the images they produce are slightly out of alignment. Also, the lens distortion from the two lenses is a bit different. Another problem is that the color camera captures images at 640 x 480 pixels, but the depth camera only produces 320 x 240 pixel images. This means that you do not actually have a depth pixel for every color pixel, so using the depth data as a simple mask is a little less straightforward than it otherwise would be.
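To make the basic idea concrete, here is a minimal sketch of the naive version: pick, pixel by pixel, whichever frame has the closer valid depth reading. It assumes the depth and color buffers are already the same size and perfectly aligned, which is exactly what the Kinect does not give you. The class and method names are made up for illustration and are not part of the Kinect SDK.

```csharp
using System;

public static class NaiveDepthComposite
{
    // Hypothetical helper, not part of the Kinect SDK. Assumes every buffer has
    // one entry per pixel and that depth and color are already aligned.
    public static int[] Composite(int[] colorA, int[] depthA,
                                  int[] colorB, int[] depthB,
                                  int minDistance, int maxDistance)
    {
        if (colorA.Length != depthA.Length ||
            colorB.Length != depthB.Length ||
            colorA.Length != colorB.Length)
        {
            throw new ArgumentException("All buffers must have the same length.");
        }

        var result = new int[colorA.Length];
        for (int i = 0; i < result.Length; i++)
        {
            // A reading only counts if it falls strictly inside the usable range.
            bool aIsValid = depthA[i] > minDistance && depthA[i] < maxDistance;
            bool bIsValid = depthB[i] > minDistance && depthB[i] < maxDistance;

            // Take A when it is valid and either closer than B or B is unusable;
            // in every other case fall back to B.
            bool takeA = aIsValid && (!bIsValid || depthA[i] < depthB[i]);
            result[i] = takeA ? colorA[i] : colorB[i];
        }
        return result;
    }
}
```

With real Kinect data this would produce visible fringes around objects, because the depth and color pixels do not line up one to one.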

To help fix these problems, Microsoft’s SDK exposes a handy method: GetColorPixelCoordinatesFromDepthPixel on the NuiCamera object. Using this method we can figure out where those depth pixels lie in the color image. For my purposes it would have been nicer to be able to get the depth coordinates from the color image, but it will work. As it turns out, this method is a bit too slow to use in real time, so I opted to build a map of these points once, up front, that we can use to map the color points back to the depth points.

Here’s how I make the map:

        private void InitializeDepthToColorMapping()
        {
            var viewArea = new ImageViewArea();

            colorToDepthPoints = new Point[colorWidth * colorHeight];
            for (int dy = 0; dy < depthHeight; dy++)
            {
                for (int dx = 0; dx < depthWidth; dx++)
                {
                    int cx, cy;
                    nui.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(ImageResolution.Resolution640x480, viewArea,
                        dx, dy, 0, out cx, out cy);

                    //The -1 leaves room for the (cx+1, cy+1) neighbors set below.
                    if (cx >= 0 && cx < colorWidth-1 && cy >= 0 && cy < colorHeight-1)
                    {
                        var cord = new Point();
                        cord.X = dx;
                        cord.Y = dy;

                        //Set four pixels because the depth data is half the size of the color data. Don't look at me like that. I know this isn't a perfect solution, but who really has the time to interpolate the missing values.
                        setMapPoint(cx, cy, cord);
                        setMapPoint(cx+1, cy, cord);
                        setMapPoint(cx+1, cy+1, cord);
                        setMapPoint(cx, cy+1, cord);
                    }
                }
            }
        }

        private void setMapPoint(int x, int y, Point cord)
        {
            //Same row-major layout that CompositeImage uses to read the map.
            colorToDepthPoints[y * colorWidth + x] = cord;
        }

So now that we have a map to correct for the size difference and distortion between the images, we can combine the images like this:

        public Frame CompositeImage(Frame A, Frame B)
        {
            if (colorToDepthPoints == null) InitializeDepthToColorMapping();

            int depthIndex = 0;
            int colorIndex = 0;
            int distanceA = 0, distanceB = 0;
            bool AisValid = false, BisValid = false;
            Point depthCord;

            for (int y = 0; y < colorHeight; y++)
            {
                for (int x = 0; x < colorWidth; x++)
                {
                    depthCord = colorToDepthPoints[y * colorWidth + x];

                    if (depthCord != null)
                    {
                        depthIndex = (depthCord.Y * frame.DepthImage.Width + (A.DepthImage.Width - depthCord.X)) * 2;
                        //Each depth value is stored as two bytes, low byte first.
                        distanceA = A.DepthImage.Bits[depthIndex] | A.DepthImage.Bits[depthIndex + 1] << 8;
                        distanceB = B.DepthImage.Bits[depthIndex] | B.DepthImage.Bits[depthIndex + 1] << 8;
                        AisValid = distanceA > minDistance && distanceA < maxDistance;
                        BisValid = distanceB > minDistance && distanceB < maxDistance;
                    }

                    //Use A's pixel when A is valid and closer (or B is not valid); otherwise use B's.
                    if ((distanceA < distanceB || !BisValid) && AisValid)
                    {
                        frame.ColorImage.Bits[colorIndex + RedIndex] = A.ColorImage.Bits[colorIndex + RedIndex];
                        frame.ColorImage.Bits[colorIndex + GreenIndex] = A.ColorImage.Bits[colorIndex + GreenIndex];
                        frame.ColorImage.Bits[colorIndex + BlueIndex] = A.ColorImage.Bits[colorIndex + BlueIndex];
                        frame.DepthImage.Bits[depthIndex] = A.DepthImage.Bits[depthIndex];
                        frame.DepthImage.Bits[depthIndex + 1] = A.DepthImage.Bits[depthIndex+1];
                    }
                    else
                    {
                        frame.ColorImage.Bits[colorIndex + RedIndex] = B.ColorImage.Bits[colorIndex + RedIndex];
                        frame.ColorImage.Bits[colorIndex + GreenIndex] = B.ColorImage.Bits[colorIndex + GreenIndex];
                        frame.ColorImage.Bits[colorIndex + BlueIndex] = B.ColorImage.Bits[colorIndex + BlueIndex];
                        frame.DepthImage.Bits[depthIndex] = B.DepthImage.Bits[depthIndex];
                        frame.DepthImage.Bits[depthIndex + 1] = B.DepthImage.Bits[depthIndex+1];
                    }

                    colorIndex += 4;
                }
            }

            return frame;
        }
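The loop above reads each depth value out of the byte buffer as two bytes, low byte first. As a small illustration, that arithmetic can be pulled out into helpers like the ones below. The names DepthBytes, ByteIndex and ReadDepth are hypothetical and only here to show the indexing; they are not from the SDK or from my project.

```csharp
public static class DepthBytes
{
    // Byte offset of depth pixel (x, y) in a row-major buffer that stores
    // two bytes per pixel. Hypothetical helper for illustration only.
    public static int ByteIndex(int x, int y, int width)
    {
        return (y * width + x) * 2;
    }

    // Combines two consecutive bytes, low byte first, into one integer.
    // The parentheses are optional in C# because << binds tighter than |,
    // but they make the intent obvious.
    public static int ReadDepth(byte[] bits, int byteIndex)
    {
        return bits[byteIndex] | (bits[byteIndex + 1] << 8);
    }
}
```

For example, the bytes 0x34, 0x12 decode to 0x1234, and pixel (3, 2) in a 320 pixel wide depth image starts at byte offset (2 * 320 + 3) * 2 = 1286.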

You can find the source code in svn here:

8 thoughts on “Masking video frames using the Kinect’s depth data”

  1. daniel says:

    this is c++ language ?

  2. daniel says:

    it is possible to save your body into a new bitmap image using this method ?

    • Matt Bell says:

      Yes and no. This method specifically compares two images’ pixels and draws the closer one. You could look at all the pixels, compare them to a reference background image, take all of the ones that are a certain amount different, and put them into a new image to achieve what I think you are after.

  4. Matt Bell says:

    A friend of mine pointed this out to me. I am also named Matt Bell, and I have also done multiple 3D video merges with the Kinect. Small world. 🙂
