
FIRST FRC Autonomous robotic driving

The problem

In the FIRST FRC 2017 Steamworks game there are two main ways to score points: throwing 5 inch diameter whiffle balls at a target and delivering 11 inch diameter yellow gears onto pegs. Each match consists of a 15 second autonomous period followed by a 2 minute and 15 second remote controlled period. My robotics team decided to focus on building a robot to deliver gears. As a software engineer, I decided to design software to have our robot deliver a gear during the autonomous period.

Robots begin the match touching one of two walls (the gray sections on the left and right of the diagram below), each of which is about 20 feet wide. Centered on the field, beginning about 9 feet away from each wall, are regular hexagonal towers. On the three adjacent faces closest to the walls, pegs extend normal to the tower; robots deliver gears onto these pegs. Below, in black, are six paths a robot could follow to deliver a gear to the pegs.

On either side of each of these pegs there is a piece of retroreflective tape, so each peg is flanked by a pair of tape strips.

Depending on its initial position, the robot will have to travel between 7 and 15 feet to deliver a gear. This motion must reliably place the peg within a 6 inch wide area on the front of our robot, or the gear will not be delivered.

The solution

There are a number of different sensors we could use on our robot; however, many of them cannot be used for dead reckoning because they measure derivatives of position, and thus accumulate offset error when integrated. Because of the need for high precision and high accuracy, I chose to use an image stream to guide the robot to these pegs. Cameras have a major benefit compared to other sensors in that all of their data is directly tied to position, so it does not accumulate error over time due to integration. Most cameras provide relatively low frequency data, which means they are best used for planning a path and are not well suited for executing one. I will go into more detail about this later. Although there are sensor systems that are much more efficient than cameras, we only had six weeks to create a fully working autonomous navigation system. Given this time limit, I decided it was not feasible to make the delivery system both fast and reliable, so I chose to make it reliable at the cost of speed.

My solution had two main aspects: target detection in each frame of the video, and executing robot motion from metadata about the target's location. Because computer vision tends to be computationally expensive, I chose to run the target identification software on a separate Linux computer and use UDP over Ethernet to send the data to the roboRIO (the robot's motion control computer).

Target identification

Computer vision feature detection is always easier when the features to be detected stand out in an image. One of the simplest ways to make them stand out is to make them bright. FIRST FRC encourages teams to use vision identification by making all of their targets out of retroreflective tape, which reflects incoming light rays back towards their source. We chose to place a green ring light around our camera, so that the light emitted by the ring is selectively reflected by the target back towards the camera. This makes the target appear much brighter than the background. We chose the color green for the same reason it is used in television studios: it is not commonly found indoors.

My vision analysis uses OpenCV, an open-source library for image and video analysis, with Python 3. It starts by obtaining an image frame from our camera and thresholding it for the color green. Thresholding is helpful for isolating the target but cannot identify it on its own: it only provides position data for bright green pixels, not the target's height and width, and it has a high rate of false positives.
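
A minimal sketch of this thresholding step is below. The HSV bounds and camera index are placeholders, not the values we tuned on the field.

```python
import cv2
import numpy as np

# Placeholder HSV bounds for the green ring-light reflection; the real
# values were tuned against our camera and lighting.
GREEN_LOW = np.array([50, 100, 100])
GREEN_HIGH = np.array([90, 255, 255])

cap = cv2.VideoCapture(0)  # camera index is an assumption
ok, frame = cap.read()
if ok:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Binary image: pixels inside the green range become white, everything else black.
    mask = cv2.inRange(hsv, GREEN_LOW, GREEN_HIGH)
cap.release()
```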

In the first few days after the game was announced, I realized that I needed to identify the actual target at a more abstract level. The target is made up of two rectangles. My higher level target identification can be broken down into two parts: identifying each piece of target tape, and determining which pair of them is the target. OpenCV provides a function that walks the boundary between black and white in a binary image and returns a set of contours. For each contour I decide whether it is a piece of target tape using two boolean tests: the contour must enclose an area greater than 70% of the area of its bounding rectangle, and the bounding rectangle must have an aspect ratio between 1.5/5 and 2.5/5. If the contour passes both tests, I save the location and size of the bounding rectangle in an array. In perfect conditions the contour would fill 100% of its bounding rectangle, and the bounding rectangle would have an aspect ratio of exactly 2/5. I chose looser thresholds for a few reasons. Camera lenses always have distortion, and I still need to detect the target at the edge of the frame, where distortion is highest. If the robot is not normal to the target, the target appears horizontally compressed even in the center of the frame. Each playing field is used hundreds of times, and FIRST FRC does not always replace damaged field elements, so in order not to lose tracking of a damaged target I could not require 100% infill.
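
A sketch of this first layer is below, assuming the binary mask from the thresholding step; the function name is mine, and the constants match the tests described above.

```python
import cv2

def find_tape_candidates(mask):
    """Return bounding boxes (x, y, w, h) of contours that look like a piece of target tape."""
    # cv2.findContours returns (contours, hierarchy) in OpenCV 2/4 and
    # (image, contours, hierarchy) in OpenCV 3; grab the contour list either way.
    result = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = result[0] if len(result) == 2 else result[1]

    candidates = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w == 0 or h == 0:
            continue
        fill = cv2.contourArea(contour) / float(w * h)  # fraction of the box the contour fills
        aspect = w / float(h)                           # tape is nominally 2 wide by 5 tall
        if fill > 0.70 and (1.5 / 5) <= aspect <= (2.5 / 5):
            candidates.append((x, y, w, h))
    return candidates
```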

This first level of abstraction does not, by itself, identify a target. Because the three pegs sit on adjacent sides of the tower, it is feasible that our camera could see two targets at once, so we cannot determine the target's location simply by averaging the positions of the two largest pieces of target tape. This initial layer is still very useful from an efficiency standpoint: transitioning from analyzing pixels to doing calculations on sets of four numbers, each representing a piece of target tape, saves valuable CPU time.

For every pair of rectangles recorded by the first level of target identification, the initial version of the second level checked whether the two pieces of target tape were within 20% of each other's size. From their size it calculated the expected distance between them; for the pair to be considered a target, their actual distance had to be within 20% of that number. These initial tests suffered from two problems: a high rate of false negatives and a high rate of false positives. The software was occasionally picking up two targets simultaneously. I decided that I needed to introduce a scoring metric so that only one target would be identified, and to find and fix the test or tests that were causing the false negatives.

Below is a screenshot of my initial target identification identifying a mock target.

Scoring potential targets proved to be difficult. I quickly realized that each test needed to return a score independent of the camera's distance to the target, so that the software could detect close and far targets equally well. After this modification, I realized that every test was returning a wide range of scores for real targets. After using Mathematica to try to find a function that would return high values for false positives and low values for real targets, I settled on a sigmoid function. Composing it with a linear function let me return low values for probably real targets and high values for probably false targets. Targets that are more than 30% away from their expected results are immediately rejected. The gentle slope of this function allows a target to almost fail one test due to distortion without causing a false negative.

Below is a plot of the function:

In the end I used three tests: checking whether the two pieces are of similar height, checking whether they are the expected distance apart on the X axis, and checking whether they are the expected distance apart on the Y axis. Each test generates a score, and the scores are summed to give a total for each potential target. The lowest scoring potential target with a score under 50 is considered to be the real target; if no potential target meets this requirement, then no target is detected for that frame. Below is a screenshot of this tracking.
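
The sketch below shows roughly how this scoring could fit together, combining the sigmoid idea above with the three tests. The function names, the separation constant, and the sigmoid coefficients are my assumptions here; the real thresholds were tuned empirically.

```python
import math

REJECT_SCORE = 50  # potential targets at or above this total are not considered

# Expected center-to-center separation of the two strips, expressed in tape
# widths; this constant is an assumption taken from the field drawings.
SEPARATION_IN_TAPE_WIDTHS = 8.25 / 2.0

def score_error(error):
    """Map a normalized error (0 = perfect match) onto a score with a gentle sigmoid slope."""
    if error > 0.30:                # more than 30% off: immediate rejection
        return REJECT_SCORE + 1
    return REJECT_SCORE / (1.0 + math.exp(-20.0 * (error - 0.15)))

def target_score(a, b):
    """Score a candidate pair of tape boxes (x, y, w, h); lower is better."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    expected_dx = max(aw, bw) * SEPARATION_IN_TAPE_WIDTHS   # based on the larger piece
    height_err = abs(ah - bh) / float(max(ah, bh))           # similar heights
    dx_err = abs(abs(ax - bx) - expected_dx) / expected_dx   # expected X separation
    dy_err = abs(ay - by) / float(max(ah, bh))               # strips should sit at the same Y
    return score_error(height_err) + score_error(dx_err) + score_error(dy_err)

def pick_target(candidates):
    """Return the lowest scoring pair under the cutoff, or None if no pair qualifies."""
    best, best_score = None, REJECT_SCORE
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            s = target_score(candidates[i], candidates[j])
            if s < best_score:
                best, best_score = (candidates[i], candidates[j]), s
    return best
```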

When we put the camera on the robot and drove to the peg running target tracking, we realized that the peg can sometimes cover part of one of the target's strips, breaking it into two squares. This was quickly resolved by modifying the test for similar size and changing the expected X axis distance between the two target pieces to be based on the larger of the two pieces rather than their average.

Communications

Our communications use UDP to send the X and Y position of the center of the target, along with its height and width, to the roboRIO, which controls the motion of the robot. The only problem we had with this was when our robot radio was programmed for competition. FRC switched to a new radio this year, and it still has some bugs. We were able to quickly resolve the issue by switching back to last year's radio.
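
The actual wire format lives in the RustyComms repo; the sketch below just illustrates the idea of packing the four values into a UDP datagram. The address, port, and packet layout here are assumptions.

```python
import socket
import struct

# Placeholder address; FRC robots use 10.TE.AM.2 style addressing for the roboRIO.
ROBORIO_ADDR = ("10.0.0.2", 5800)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_target(center_x, center_y, width, height):
    """Send one target observation to the roboRIO as four big-endian 32-bit floats."""
    packet = struct.pack(">ffff", center_x, center_y, width, height)
    sock.sendto(packet, ROBORIO_ADDR)
```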

Robot motion

See the RustyComms repo for the code.

Of the six paths the robot can take to deliver a gear to a peg during the autonomous period, only three are unique; the others are mirror images. Given that all the paths are worth the same number of points, I decided to start by writing software to deliver a gear along the center path.

Initially, I had a two-zone autonomous driving system: the robot would turn towards the target if the target was outside the center 15% of the image, and once the target was inside the center 15% it would drive forward at a constant rate. In doing this we discovered that there was a high latency, about 200 ms, between events in reality and the roboRIO being updated over UDP from the vision analysis. By benchmarking the vision code, migrating the vision software to an Intel i7 laptop, and dropping 66% of the frames, we were able to reduce the latency to around 125 ms.
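
In code form, the two-zone logic looked roughly like the sketch below, assuming the target's center X is normalized to the range -1 to 1; the names and power constants are placeholders, not our actual roboRIO code.

```python
# Two-zone drive logic sketch; target_x is the target center normalized to [-1, 1].
CENTER_ZONE = 0.15   # the center 15% of the image
TURN_POWER = 0.4     # placeholder motor powers
DRIVE_POWER = 0.5

def two_zone_command(target_x):
    """Return (left_power, right_power) for a tank drive."""
    if abs(target_x) > CENTER_ZONE:
        # Target is off-center: rotate in place towards it.
        turn = TURN_POWER if target_x > 0 else -TURN_POWER
        return (turn, -turn)
    # Target is centered: drive straight ahead at a constant rate.
    return (DRIVE_POWER, DRIVE_POWER)
```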

With the latency still fairly high, I decided to use non-linear response curves to help prevent overreaction, and to merge rotational error correction into forward motion. Using non-linear curves, I was able to make the robot quickly correct for rotational error and reliably stop just in front of the target. During this time we did not have bumpers on our robot and thus were unable to test the actual delivery of the gear; we could only test how well we lined the peg up with the robot.

After I tuned the power and turning functions to make the center auto reliably line up the robot, we calculated what paths the robot would have to drive for the side auto. From this we expected that driving straight at the peg from the edge of the field would get the peg into our gear. As it turned out, vision was unable to reliably place the peg in the gear when 30˚ away from normal to the tower. Two modifications fixed this: enlarging the opening on the robot so that delivering a gear required less accuracy and precision, and changing the side auto to first drive forward until the robot is normal to the peg. With no encoders, only a camera and a clock, I first tried having the robot drive forward for a fixed amount of time to become normal to the peg, but the robot drives at drastically different speeds depending on battery voltage and driving surface. We then noticed that as the robot drives forward, the target moves towards the outer edge of the frame, so I decided to have the robot drive forward until the target passes a threshold near the edge of the frame, then act as if it is at the center auto station and finalize the delivery of the gear.
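
A rough sketch of a merged, non-linear response is below. The curves and constants are illustrative assumptions; the real power and turning functions were tuned on the robot.

```python
def drive_command(target_x, target_height, frame_height):
    """Blend forward power and turning from the target's position and apparent size.

    target_x is the target center normalized to [-1, 1]; target_height / frame_height
    grows as the robot approaches, so forward power tapers off near the peg.
    """
    closeness = min(target_height / float(frame_height), 1.0)
    forward = 0.6 * (1.0 - closeness) ** 2     # slow down non-linearly as the target grows
    turn = 0.5 * target_x * abs(target_x)      # gentle correction near center, stronger at the edges
    return (forward + turn, forward - turn)    # (left, right) tank-drive powers
```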

At our first competition, we mainly used the center auto station because our robot was more reliable than other robots at delivering a gear to the center peg, and our edge software was minimally tested. At our second competition, we had some time in the practice arena and were able to tune the thresholds, making us able to reliably deliver a gear to any of the six pegs during the auto period of a match. Delivering a gear to the center peg takes about four and a half seconds, and to the edge pegs about six and a half seconds. This gives our teammates in the tower seven to ten seconds to lift the gear out of the robot and put it in an ‘engine’ to score 60 points. Delivering a gear to the side pegs was a critical part of our team reaching the semifinals at the second competition.

Limitations of the solution and next steps

Accuracy, precision, and time are critical to scoring points in the autonomous period of the game. Vision analysis is very effective for accuracy and precision; however, it provides low frequency data with high latency, which prevents us from moving the robot at high speed. Tools like Kalman filters and timestamped vision frames can be used to create a high accuracy, high precision estimate of what the robot is doing and where it is at any given time, by using statistics to reduce noise in the data and physics to predict the robot's state between sensor measurements.

Driving a more optimal path quickly requires not only knowing where the robot is, but also calculating where it can physically be in the future and what path it should take. By experimentally determining the limits of our robot's derivatives of position, we can determine what position and state it can be in 100 ms from its current position and state. From this we can calculate a path and effectively follow it.

If we can get our autonomous driving to push the robot to its limits and take optimal paths, it could become relevant not just to the first 15 seconds of the match, but could also take over at times during the remote controlled part of the match.