6DOF Positional Tracking with the Wiimote

In this post I am going to talk about how to track the absolute position and orientation of a wiimote in 3D space.

This is certainly nothing new and was done over seven years ago. Oliver Kreylos, the creator of the video I just linked to, wrote a very good writeup of how he did the motion tracking, along with source code. But we will see how to reimplement the same effect with just one function from a computer vision library: OpenCV!

To see what we can do with the motion tracking, here’s a simple demo that plots the position of the wiimote as I trace letters in the air (also shown is a white square representing the 4 LEDs, along with its projected image on the wiimote’s camera):

And another that shows my hand:

The Wiimote

First a bit of background on the wiimote. In terms of sensing capabilities it has an infrared camera, an accelerometer, and, with Motion Plus attached, a gyroscope as well. If you want the exact specs you can check out WiiBrew’s wiki.

For our purposes we only need the IR camera, which is normally used for tracking the IR LEDs on the Wii’s sensor bar. The IR camera doesn’t capture traditional images; instead it has on-board image processing that detects up to 4 blobs at 1024×768 resolution and reports them back at up to 100Hz. This is pretty awesome: the report rate is relatively high, and we only have to deal with 4 coordinates rather than 1024×768 pixels like with a regular camera.

To use this data we need to connect the wiimote via bluetooth. There are a lot of libraries (most of which are abandoned) for interfacing with the wiimote. It doesn’t matter which one we choose since we just need very basic functionality for connecting to the device and reading raw IR values. The one I ended up using was wiiuse.

Camera Pose Estimation / Camera Resectioning

Once we have the wiimote talking to our computer we can capture “images” of up to 4 points like this with the IR camera:

This might not seem like much information but if I told you that the picture above was of a square, you might guess that the wiimote camera is looking at something like this:

Or to draw it a different way, it might be looking at it from this angle:

This process of guessing the camera's position/orientation is all we need for motion tracking! In other words, if we know the real shape of the object, we can just iteratively guess a location of the camera and see if it will take the same “photo” as the one we saw. In code this can be implemented with some variant of gradient descent that minimizes the reprojection error.

But you don’t need to implement it yourself. OpenCV has a powerful function that can do exactly that for you called solvePnP (where PnP stands for the perspective-n-point problem). The signature for it looks like this:

bool solvePnP(InputArray objectPoints,
              InputArray imagePoints,
              InputArray cameraMatrix,
              InputArray distCoeffs,
              OutputArray rvec,
              OutputArray tvec,
              bool useExtrinsicGuess=false,
              int flags=ITERATIVE)

Arguments

  • objectPoints: These are the points of the object that the camera is looking at in world coordinates. For example if we are looking at LEDs in the shape of a square this might be [(1, 1, 0), (2, 1, 0), (1, 2, 0), (2, 2, 0)].
  • imagePoints: These are the 2D points representing the image that we got from the camera.
  • cameraMatrix/distCoeffs: The camera matrix is also known as the intrinsic matrix and describes how your camera captures an image; it consists of the focal lengths, the skew, and the projection center. Usually you would run a camera calibration to get exact values for these, but I wasn’t getting consistent measurements so I picked values close to the physical specs instead. distCoeffs holds the lens distortion coefficients, which calibration can also give you, but I just assumed zero distortion.
  • rvec / tvec: This is the output camera rotation/translation! (rvec is a rotation in axis-angle form; cv::Rodrigues converts it into the 3×3 rotation matrix R.) Together they build what is known as the extrinsic matrix, which gives us the rigid transform from the world frame to the camera frame. So if we have a point pw in world coordinates and want to know where it is in camera coordinates, we compute pc = R pw + t; and if we want to know where a point in camera coordinates is in the world, we compute pw = R^T (pc - t) (noting that the inverse of a rotation matrix is its transpose). For example, the camera itself is at (0, 0, 0) in the camera frame, so it sits at -R^T t in the world frame.
  • flags: The algorithm to use. The default, CV_ITERATIVE, is a Levenberg-Marquardt optimization, which is the same approach that Kreylos implemented.

Or in short: you give it objectPoints (the 3D coordinates defining the shape you’re looking at), imagePoints (the captured 2D points from the image), and the camera intrinsics (which tell it how to reproject the object into a new image for calculating the error), and it spits the camera position/rotation back out.

Here’s a short snippet showing how it is used:

// Intrinsic
double fx = 1700;
double fy = 1700;
double cx = image_width / 2;
double cy = image_height / 2;
cv::Mat intrinsic = (cv::Mat_<double>(3, 3) <<
    fx,  0, cx,
     0, fy, cy,
     0,  0,  1
);

// Solve for rvec and tvec
cv::Mat rvec, tvec;
solvePnP(object_points, image_points, intrinsic,
         cv::noArray(), rvec, tvec);

// Extrinsic
cv::Mat R;
Rodrigues(rvec, R);
cv::Mat extrinsic = cv::Mat::eye(4, 4, CV_64F);
R.copyTo(extrinsic.rowRange(0, 3).colRange(0, 3));
tvec.copyTo(extrinsic.rowRange(0, 3).col(3));

// Extrinsic inverse
cv::Mat extrinsic_inv = extrinsic.inv();

// Setup where to view from
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(blah blah blah);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(blah blah blah);

/*** Insert code here to draw in world frame ***/

// OpenGL expects column-major matrices, hence the transpose
glMultMatrixd(cv::Mat(extrinsic_inv.t()).ptr<double>(0));

/*** Insert code here to draw in camera frame ***/

Side note: the solution sometimes isn’t unique. Especially for a symmetric shape like a square, there are inherently multiple camera poses that will explain the image you captured:

But the wiimote can only “see” 4 points so there’s not much you can do about it.

Infrared Marker

But the worst shape is probably the default wiimote sensor bar, whose LEDs sit in a straight line:

If you just see a line of points, there are an infinite number of camera poses that can explain your picture. To get around that, we have to build our own IR LED marker!

It is easy to build one since all we are doing is making a circuit with four 940nm infrared LEDs. We will also need wires, resistors and batteries. I used this calculator to figure out what resistors I needed.

To make the LEDs into a square, I poked them through a piece of cardboard (using a thumbtack to make the holes first) and then taped the pins of the LEDs together (I don’t have a soldering iron). I only chose a square because it was easy; it could be whatever shape you want. LEDs are diodes, which means direction matters, so make sure you hook them up the right way around (the longer pin is positive). Here is what mine looks like:

If you are human you probably can’t see infrared, but digital cameras can (as long as they don’t have an IR filter), so you can use one for debugging. Here is an image taken with my Android phone that shows the LEDs glowing purple:

Final thoughts

You can get the code on github if you want to experiment with it. I only tested it on a Mac and with a Wii Remote Plus (which didn’t work with wiiuse at first without a small patch, so it might be safer to go with an original wiimote if you don’t need the gyro for anything else). I am not sure I would recommend using this for anything serious, but it was cheap and fun to play with!

A few notes on limitations:

  • The narrow field of view makes this very awkward to use since you have to keep the target visible to the camera at all times.
  • LEDs have a narrow beamwidth, so the wiimote might not detect them even when they are inside its field of view. Retroreflective markers might work better here.
  • The square shape for the LEDs was not a great idea because you can’t automatically tell which of the 4 directions (or 8, if you count looking from the back) you are viewing it from. I dealt with this by having one of the buttons cycle through the 4 orderings so the right one can be selected manually, but it is annoying: whenever the LEDs go out of view you have to do it all over again. It is probably better to use something asymmetric.
  • You can scale up the physical size of the marker if you want to track from a farther distance. But regardless of physical scale, the pose will usually start getting unusably jittery/jumpy once the projected image is smaller than, say, about 1/10 of the imaging plane. This is probably because the camera’s real resolution is only 128×96 (it does subpixel analysis to get to 1024×768). I didn’t do any filtering for the demos, but maybe that would help (OpenCV actually has an implementation of a Kalman filter).