Computer Vision with OpenCV

By Andy Zhang

This is part two of my series on my summer REU experience at Harvey Mudd College. Check out part one here.

The second half of my summer project revolved around the computer science and programming aspect of robotics. Since my main programming language at the time was Python (go script kiddies!), I mainly (read: exclusively) used Python to handle communication with the robot, path-planning, and computer vision.

Python does have its perks, though. It is a high-level language, so it is quick to develop in and quick to deploy. The entire framework for the behavior-based robotics system was written in the span of a few days. That portion of the code is not so interesting; mostly, it was turning a graph of our robot’s behavior into a state-handling function, which then determined which behavior to execute. It also included hours of debugging to figure out which edge cases we were not handling correctly…
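To give a flavor of what a state-handling function like that looks like, here is a minimal sketch; the behavior names and sensor flags are entirely hypothetical, and the actual framework was more involved.

```python
# Minimal sketch of a behavior-based state handler. All behavior names and
# sensor flags here are made up for illustration.

def handle_state(state, sensors):
    """Pick the next behavior based on the current state and sensor readings."""
    if sensors.get("obstacle_close"):
        return "avoid_obstacle"
    if state == "searching" and sensors.get("goal_visible"):
        return "go_to_goal"
    if state == "go_to_goal" and sensors.get("at_goal"):
        return "idle"
    return state  # no transition: keep executing the current behavior

behaviors = {
    "searching": lambda: print("wandering..."),
    "go_to_goal": lambda: print("driving toward goal..."),
    "avoid_obstacle": lambda: print("backing up..."),
    "idle": lambda: print("done."),
}

state = "searching"
sensors = {"obstacle_close": False, "goal_visible": True, "at_goal": False}
state = handle_state(state, sensors)
behaviors[state]()  # -> driving toward goal...
```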

More interesting was the computer vision aspect, and here I must say I owe a great debt to Adrian Rosebrock and his superior tutorials on PyImageSearch. I learned most of what I needed to know about computer vision from his tutorials.

First came the ordeal of installing OpenCV, an arduous journey that took over 3 hours of staring at the terminal waiting for dependencies to install. Fun stuff!

But actually, OpenCV is an amazingly powerful suite of image processing and manipulation functions, written in C++ and exposed to Python through wrappers.

Now that we have all of the ingredients gathered, it’s time to bake our proverbial cake!

First is the question of localization. Localization, which is essentially how a robot estimates its current location, is extremely important in robotics. You will almost always want to know where you are, no matter what your robot is doing. The question is how exactly to determine this location. A multitude of algorithms exist, but the one we utilized is called Monte Carlo Localization.

The gist of it can be found in the Wikipedia article. Essentially, the robot starts with a set of particles scattered randomly across a map of its surroundings, all weighted equally, indicating that it has no idea where it is. Next, it takes a sensor update and, based on that reading, adjusts the probability assigned to each particle, thinning out the improbable locations. The robot then performs a motion update, shifting all of the particles according to its own motion, and takes another sensor update to reweight the probabilities. This cycle repeats until the particles converge on the robot’s actual location.
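For concreteness, here is a bare-bones particle filter in the spirit of MCL, assuming a hypothetical one-dimensional world and a made-up sensor model; our actual sensor model was the image matching described below.

```python
import random

NUM_PARTICLES = 200
WORLD_SIZE = 10.0  # hypothetical 1-D world for illustration

# 1. Initialize: particles scattered uniformly across the map, equal weights.
particles = [random.uniform(0, WORLD_SIZE) for _ in range(NUM_PARTICLES)]
weights = [1.0 / NUM_PARTICLES] * NUM_PARTICLES

def sensor_likelihood(particle, measurement):
    """Made-up sensor model: higher weight when the particle agrees with the measurement."""
    return 1.0 / (1.0 + abs(particle - measurement))

def mcl_step(particles, weights, motion, measurement):
    # 2. Motion update: shift every particle by the commanded motion, plus a little noise.
    particles = [(p + motion + random.gauss(0, 0.1)) % WORLD_SIZE for p in particles]
    # 3. Sensor update: reweight particles by how well they explain the measurement.
    weights = [w * sensor_likelihood(p, measurement) for p, w in zip(particles, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 4. Resample: draw a new particle set in proportion to the weights.
    particles = random.choices(particles, weights=weights, k=NUM_PARTICLES)
    weights = [1.0 / NUM_PARTICLES] * NUM_PARTICLES
    return particles, weights

particles, weights = mcl_step(particles, weights, motion=0.5, measurement=3.0)
```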

Now, Monte Carlo Localization (MCL) is usually performed using laser rangefinders, which can be expensive and hard to utilize. We wanted to take our own twist on MCL by using monocular vision instead.

Our map was a series of panorama images taken at different locations in our workspace, which is visualized below.

We later extended our map to encompass most of the workspace, which can be seen in part one.

Thus, each of our particles consisted of a position on the map and a direction vector. For our sensor updates, we read in an image from the camera and performed a feature match, weighting the probabilities based on the number of matches between the particle image and the query image. We experimented with different image matching algorithms, eventually settling on a combination of Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF). More info on image matching and how feature detection works can be found in the OpenCV tutorials.
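Here is a sketch of that sensor update using SIFT, assuming OpenCV 4.x where cv2.SIFT_create() is available (SURF requires the opencv-contrib xfeatures2d module and is not shown); the file names and ratio-test threshold are placeholders rather than our actual values.

```python
import cv2

sift = cv2.SIFT_create()

def match_score(query_path, particle_path):
    """Count 'good' keypoint matches between the camera image and a particle's map image."""
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    particle = cv2.imread(particle_path, cv2.IMREAD_GRAYSCALE)

    _, query_desc = sift.detectAndCompute(query, None)
    _, particle_desc = sift.detectAndCompute(particle, None)

    # Brute-force matcher with Lowe's ratio test to discard ambiguous matches.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(query_desc, particle_desc, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    return len(good)

# Particle weights are then proportional to the match counts, e.g.:
# weights = [match_score("frame.png", path) for path in particle_image_paths]
```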

Using pure image matching to compute our probabilities, we got the following result for a sequence of images.

Obviously, the algorithm is horribly inaccurate, but that is because we are missing key elements. First is the motion update, which we handled by rotating particles left or right depending on the robot’s motion, and reallocating probabilities between locations for lateral motion. Next, we also ‘remembered’ the previous generation of probabilities by assigning it a non-zero weight and blending it with the current probabilities, as sketched below.
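As a sketch, that blending of old and new probabilities might look like the following, where the mixing factor ALPHA is purely illustrative and not the value we actually used.

```python
ALPHA = 0.3  # weight given to the previous generation; illustrative only

def blend(previous, current):
    """Combine the previous and current particle probabilities, then renormalize."""
    combined = [ALPHA * p + (1.0 - ALPHA) * c for p, c in zip(previous, current)]
    total = sum(combined)
    return [w / total for w in combined]

print(blend([0.1, 0.6, 0.3], [0.5, 0.2, 0.3]))
```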

Lastly, we had to account for a hardware limitation of our camera. Because our robot turned faster than the camera could capture images, more than half the image sequence had significant motion blur. To account for this, we found a way to quantify the blurriness of an image (also thanks to PyImageSearch). Essentially, you convolve the image with the Laplacian operator, the divergence of the gradient, which responds strongly to edges. Yay math! Taking the variance of the Laplacian response gives you a metric for how many sharp edges the image contains: a high variance means lots of strong edges, while a low variance means few. Blurry images therefore have a low variance of the Laplacian, and it was a simple matter of weighting the current sensor update by a factor proportional to that variance to account for blurriness.
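In OpenCV this boils down to a couple of lines, following the PyImageSearch recipe; the file name and threshold below are placeholders.

```python
import cv2

def blur_score(image_path):
    """Variance of the Laplacian: low values indicate a blurry image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

score = blur_score("frame.png")  # placeholder file name
# Down-weight the sensor update for blurry frames, e.g. proportionally:
# sensor_weight = min(1.0, score / BLUR_THRESHOLD)  # BLUR_THRESHOLD is hypothetical
```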

Combined together, we had a robust and functional Monte Carlo Localization algorithm with monocular vision!

Check back next time for optimization techniques and navigation!