Computer Vision

Computer vision is one of the fundamental tasks for modern robotics. For four years I was primarily responsible for designing and maintaining the software used for our FIRA and RoboCup teams, including the vision processing.

Given this experience with applied computer vision, I was hired by Lumo Interactive (formerly Po-Mo), a Winnipeg interactive art and multimedia company that specializes in interactive projection installations. I was hired specifically as a computer vision specialist to work on the Lumo projector toy's motion-tracking system.


Robot Vision

See also: libdarwin

The DARwIn-OP robots I used for FIRA and RoboCup are relatively low-powered in terms of hardware. They use a single-core, 1 GHz Atom CPU (newer models feature a dual-core CPU), with relatively little RAM compared to contemporary smartphones. This single CPU needed to manage all of the forward- and inverse-kinematics calculations for motions, decision-making, balancing calculations, and the vision. Of these tasks, vision is by far the most computationally intensive, and the most likely to become a bottleneck.

This possibility of a bottleneck necessitated the use of robust-yet-cheap vision algorithms in order to track objects of interest during competitions.

Jimmy practicing the ladder-climbing event, with the robot's PoV

I decided to categorize objects of interest into four categories:

  1. single-blob: a unique, closed shape in the environment (e.g. soccer ball),
  2. multi-blob: a collection of similar but separate blobs (e.g. barriers in the obstacle course),
  3. single-line: a single, linear object in the environment (e.g. marathon tape), and
  4. multi-line: a collection of linear objects (e.g. ladder rungs, soccer field lines).
In practice, single-line was rarely used; I developed a special-purpose, blob-based vision system for the marathon tape.

When I was working on the robotics competitions the environments were tailored to be high-visibility, with generally-uniform colours for most objects. Therefore, the vision system was designed primarily around colour recognition, with shape as a secondary consideration. Multi-coloured objects (such as the sprint target) were defined as a set of distinct blobs, with post-processing to assemble them into the larger target.

Blob Detection

The first year I worked on the robots (2011) we used very simple colour thresholding and blob detection, implemented with OpenCV. We would define maximum and minimum YUV thresholds for the colour of interest and create a binary image based on those values. We would then use OpenCV's built-in contour detection to determine the bounding boxes of objects in the scene.

This system was trivial to implement, which gave us more time to focus on the rest of the robot's software. 2011 was the first time the DARwIn-OP had been used at a competition, so there was a steep learning curve for all of us involved.

Unfortunately this system was also very prone to failure. Because of the strict thresholding we had to be very careful when calibrating the colours. Changes in lighting would break the vision.

Given these difficulties I made the decision to completely rewrite the vision system using a more complicated, but ultimately more robust algorithm.


Scanline Flood-Fill

The scanline algorithm had been used on some of the AA Lab's older robots (e.g. Bioloids with Nokia cellphones as the camera/brain), and had been shown to be effective for humanoid robots in competitions.

The algorithm uses a combination of horizontal scanline segmentation and flood-fills. Optionally, the segmentation can be subsampled to increase throughput.

Like the blob detection, the algorithm uses a pre-configured max/min YUV range for the colour of interest. Unlike the blob detection algorithm, this range can safely be made over-broad without significantly degrading performance.

Step one involves walking across each pixel row, looking for \(n\) contiguous pixels whose YUV values are within the defined range and with a maximum per-channel difference between pixels that is less than or equal to \(\epsilon_1\). Pixels that have been filled by the following step are skipped during this search.

If such a line of pixels is found, the average colour \(\bar{c}\) of the line is recorded, and we perform a 4-connected (or 8-connected, depending on preference) flood-fill of the region, colouring pixels whose value is within \(\epsilon_2\) of \(\bar{c}\).

From the flood-filled region we can extract the bounding box/aspect ratio of the object as well as its compactness (\(\frac{\mbox{number of filled pixels}}{\mbox{area of bounding box}}\)). These pieces of information are further used in post-processing for filtering out false-positives (e.g. a round ball should have a roughly square aspect ratio and a known compactness).
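The steps above can be sketched roughly as follows. This is a simplified, pure-NumPy illustration rather than the original implementation; `n`, `eps1`, and `eps2` correspond to \(n\), \(\epsilon_1\), and \(\epsilon_2\) in the text, and all names are assumptions:

```python
from collections import deque
import numpy as np

def scanline_detect(yuv, lo, hi, n=4, eps1=10, eps2=20):
    """Seed on n contiguous in-range pixels per row, then 4-connected flood-fill."""
    h, w, _ = yuv.shape
    yuv = yuv.astype(np.int16)                  # avoid uint8 wrap-around in differences
    filled = np.zeros((h, w), bool)
    in_range = np.all((yuv >= lo) & (yuv <= hi), axis=2)
    blobs = []
    for y in range(h):
        run = 0
        for x in range(w):
            if filled[y, x] or not in_range[y, x]:
                run = 0
                continue
            # Contiguity check: per-channel difference to the previous run pixel
            if run > 0 and np.abs(yuv[y, x] - yuv[y, x - 1]).max() > eps1:
                run = 1                         # too big a jump: start a new run here
            else:
                run += 1
            if run < n:
                continue
            seed = yuv[y, x - n + 1:x + 1].mean(axis=0)   # average run colour (c-bar)
            # 4-connected flood-fill of pixels within eps2 of the seed colour
            queue, pixels = deque([(y, x)]), []
            filled[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not filled[ny, nx] \
                            and np.abs(yuv[ny, nx] - seed).max() <= eps2:
                        filled[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            bbox = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
            # Compactness = filled pixels / bounding-box area
            blobs.append({"bbox": bbox,
                          "compactness": len(pixels) / (bbox[2] * bbox[3])})
            run = 0
    return blobs
```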

In practice this algorithm worked very well for detecting single- and multi-blob objects in the scene. The algorithm was so successful that it remained our default object detection system from 2012 through to 2015 when I stopped working with the robots. (The algorithm may still be in use -- I am unaware of the specifics of the robots' current software.)

Most importantly, because the algorithm used dynamic thresholds based on an initial, over-calibrated range, the vision could compensate for lighting changes that occur outdoors; while working on the hockey project, the robot was able to identify a red ball indoors in the lab and the same ball in direct sunlight on an outdoor skating rink without the need for recalibration. Moving from direct sunlight to deep shadow likewise necessitated no re-calibration.

Pictures, video examples coming eventually

Line Detection

For detecting linear targets (defined as long, thin, straight segments) we needed something better than the scanline detector. While linear targets could be identified as blobs using some combination of aspect ratio and compactness, for a line we really want just the two endpoints of the segment, not an entire bounding box.

The best solution we found for this problem was OpenCV's Probabilistic Hough Line Detector.

This algorithm was relatively simple to implement: as with blob detection, we define the max and min YUV values for the colour of interest, threshold the image based on those values, apply a simple edge detector (e.g. Canny), and pass the binary image to the OpenCV function. The output is simply the endpoints of all candidate line segments that meet our minimum length and angle criteria.

This output tends to be over-numerous, so as a final step I use a bucket-based algorithm to group similarly-inclined lines in the same general area together, averaging their endpoints.
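One possible form of that grouping step is sketched below. This is illustrative, not the original code: segments are assumed to have consistently ordered endpoints (e.g. left to right), and the angle wraparound near 0°/180° is ignored for brevity.

```python
import math
from collections import defaultdict

def merge_segments(segments, angle_step=15.0, pos_step=40):
    """segments: list of (x1, y1, x2, y2). Buckets by quantised angle and
    midpoint, then averages each bucket's endpoints into one segment."""
    buckets = defaultdict(list)
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0  # undirected angle
        key = (round(angle / angle_step),            # similar inclination...
               round((x1 + x2) / 2 / pos_step),      # ...in the same general area
               round((y1 + y2) / 2 / pos_step))
        buckets[key].append((x1, y1, x2, y2))
    merged = []
    for group in buckets.values():
        n = len(group)
        merged.append(tuple(sum(seg[i] for seg in group) / n for i in range(4)))
    return merged
```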

Sprint Targets

For the sprint event at FIRA, teams are allowed to place a small, coloured marker at the end of their lane to assist in navigation.

Between 2011 and 2014 we went through three different marker designs.

In 2011 our target was a simple, flat sheet of card. This was effective for the forward leg of the sprint, but failed when used for the reverse leg. Without any depth cues the robot was unable to determine its left-right position within the lane, and frequently walked diagonally out of its lane.

In 2012 I designed a new target made of two diagonal panels in contrasting colours with a two-coloured strip across the front.

Our 2012 sprint target in-use. Still some bugs to work out.

The idea was that the robot would be able to identify the edge between the two colours on the front strip and the edge between the background colours. By shuffling sideways such that these edges are lined up the robot would stay in its lane.

In practice, this did not work as planned. The theory was sound, but the four coloured areas (six if we consider that the top and bottom panels are cut in half by the centre strip) were very small and fuzzy. Add to this the poor lighting at the venue in Bristol, plus the fact that we had to re-colour one of the panels from green to red (which had low contrast with the pink), and the execution was, in general, not what we'd hoped for.

That said, we learned a lot from the experience in Bristol, and designed a new, more robust target for the 2013 competition.

Our 2013-2014 target was based on a design from Plymouth University. The design is a simple chair-shape, with two large, single-colour panels. The chair shape ensures one panel is located above and behind the other.

Our 2013-2014 sprint target in use. We finished the sprint in 4th place in 2013.

We keep the robot centred in its lane by ensuring that the top and bottom panels' centres are within a narrow tolerance of each other.
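As a toy illustration of that rule (hypothetical names and tolerance; which offset direction maps to which shuffle is an assumption on my part):

```python
def lane_correction(top_bbox, bottom_bbox, tolerance=8):
    """bboxes are (x, y, w, h); returns which way to shuffle, if any."""
    top_cx = top_bbox[0] + top_bbox[2] / 2       # horizontal centre of the rear panel
    bot_cx = bottom_bbox[0] + bottom_bbox[2] / 2  # horizontal centre of the front panel
    offset = top_cx - bot_cx
    if abs(offset) <= tolerance:
        return "centred"
    # The rear panel sits above and behind the front one, so any misalignment
    # between their centres is a parallax cue for sideways drift.
    return "left" if offset < 0 else "right"
```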

In practice this design proved to be much more reliable than the 2012 design. The simplified colour calibration (two colours instead of four), combined with larger panels and less occlusion, led to a winning design. Using this target we took 4th place in the sprint at the 2013 HuroCup competition, helping us secure first place overall at the event.


Marathon

The marathon event requires the robot to follow a line of tape the entire length of the track. The track length increases every year, and recent years have added breaks in the tape. This section describes the work I did on the marathon from 2011 to 2013 -- before breaks were added to the tape.

The marathon course is not a single linear track. It contains many curves and hard corners (up to 90-degree corners, and a minimum curve radius of 1 m). While line-detection algorithms like HoughLines could be used, our implementation used hard thresholding, which, as previously discussed, is not suitable for dynamic lighting. Since the marathon event happens outdoors we needed a vision system that would work in variable lighting without the need for recalibration.

Given these requirements, I developed a blob-based system for tracking the marathon tape. My algorithm divides the entire image into thin slices and treats each slice as an image, looking for a multi-blob object in that slice whose colour matches our calibration colours.

Once blobs in each slice have been detected, I connect blobs in adjacent slices to build a multi-segment line with minimal deviation in the angle of adjacent segments. (This is a fancy way of saying I try to build the straightest line possible out of the blobs that were detected.)
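A greedy version of that chaining step might look like this (an illustrative sketch under assumed names; the real implementation may differ):

```python
import math

def chain_blobs(slices):
    """slices: list of lists of (x, y) blob centroids, one list per image slice,
    ordered along the track. Greedily picks, from each slice, the candidate that
    deviates least in angle from the previous segment."""
    slices = [s for s in slices if s]           # drop slices with no detections
    if not slices:
        return []
    line = [slices[0][0]]                       # arbitrary start: first blob found
    prev_angle = None
    for candidates in slices[1:]:
        def deviation(p):
            ang = math.atan2(p[1] - line[-1][1], p[0] - line[-1][0])
            if prev_angle is None:
                return abs(p[0] - line[-1][0])  # first step: prefer the nearest blob
            return abs(ang - prev_angle)        # otherwise: smallest bend in the line
        best = min(candidates, key=deviation)
        prev_angle = math.atan2(best[1] - line[-1][1], best[0] - line[-1][0])
        line.append(best)
    return line
```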

In order to detect hard corners I divide the image not only into horizontal slices (which will build a vertical line), but also into vertical slices in the left and right halves of the frame. These vertical slices will produce horizontal lines going to the left or right.

This gives us at most three multi-segment candidate lines. Averaging these lines we can calculate the average angle of and distance to the line. These values are fed into a set of PID controllers to adjust the robot's stride length/speed, bearing, and lateral stride amount.
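For reference, a textbook PID controller of the sort that could drive those gait parameters (the gains and the error-to-parameter mapping here are placeholders, not our tuned values):

```python
class PID:
    """Standard proportional-integral-derivative controller."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt=1.0):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# One controller per gait parameter; the line's angle would feed the bearing
# controller and its lateral offset the side-stepping controller.
bearing_pid = PID(kp=0.8, ki=0.0, kd=0.1)
lateral_pid = PID(kp=0.5, ki=0.0, kd=0.05)
```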

In general, the robot will walk faster when the tape is straight and we do not register any corners. The robot will take corners more slowly, as turning at speed can cause the robot to fall over more easily.

Practicing the marathon in the hallways of UofM (2013)

This vision system worked very well in practice; we took 4th place in the 2012 marathon (we would have done better, but the robot's battery went dead about halfway through the race). In 2011 we took second place with a similar algorithm that relied on hard thresholds and blob detection instead of scanline/flood-fill.

Robot POV images/video coming eventually...