1. Some statistics

During the CARLA AD challenge, we ran more than 5,700 hours of simulated driving for a total of 6,582 km across both public and sequestered towns and routes. The challenge attracted 211 participants, organized into 69 teams. These teams made a total of 525 submissions across the different phases of the four tracks. To evaluate these submissions, we used 10 AWS compute nodes with 8 K80 GPUs each.
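As a quick sanity check, a few derived figures can be computed from the numbers quoted above. This is only a back-of-the-envelope sketch; the variable names are illustrative and the per-team values are simple averages, not figures reported by the organizers.

```python
# Figures quoted in the text above.
participants = 211
teams = 69
submissions = 525
nodes = 10
gpus_per_node = 8

# Simple averages (assumption: teams are treated as equal-weight units).
participants_per_team = participants / teams
submissions_per_team = submissions / teams
total_gpus = nodes * gpus_per_node

print(f"{participants_per_team:.2f} participants per team on average")
print(f"{submissions_per_team:.2f} submissions per team on average")
print(f"{total_gpus} GPUs in total")
```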

2. Results in context

When analyzing the results, it is important to bear in mind the fundamental differences between the tracks used in the CARLA AD challenge. The four tracks can be conceptually divided into two major categories: perception-heavy tracks and map-based tracks.

Perception-heavy tracks (tracks 1 and 2) force agents to navigate without any prior knowledge of their surroundings, i.e., without a map. Teams competing in these tracks must rely on sensor information to understand the road and the different traffic situations. In terms of difficulty, these are the two most challenging tracks of the contest.

Map-based tracks (tracks 3 and 4) assist agents by providing additional information about the environment. Track 3 includes full HD maps containing the 3D representation of the scene along with semantic information about lanes, lane directions, the position of traffic signs and traffic lights, etc. This track is intended to accommodate existing AV stacks, such as AutoWare. For track 4, this information is extended with the position of dynamic actors, such as vehicles, pedestrians, and obstacles. The purpose of these simplifications is to allow participants to focus on the driving logic in “ideal” perception conditions.

Under ideal perception, the winning team of track 4 achieved an average score of 79.12 points (computed as the points obtained for route completion minus the points discounted for infractions). This result is obtained by averaging over the 10 routes, each repeated 3 times under different weather conditions. Notably, even with ideal perception, current driving stacks cannot handle all traffic situations successfully.
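The scoring rule described above (completion points minus infraction points, averaged over routes and repetitions) can be sketched as follows. This is a minimal illustration, not the official evaluation code: the function names and the sample values are hypothetical, and the sketch assumes a plain arithmetic mean with no clamping of negative scores.

```python
from statistics import mean


def route_score(completion_points: float, infraction_points: float) -> float:
    """Score of a single run: completion points minus discounted infraction points."""
    return completion_points - infraction_points


def track_score(runs: list[tuple[float, float]]) -> float:
    """Average score over all runs (one entry per route and repetition)."""
    return mean(route_score(c, p) for c, p in runs)


# Hypothetical data: 2 routes x 3 repetitions (the challenge used 10 routes x 3).
runs = [
    (100.0, 10.0), (100.0, 30.0), (80.0, 0.0),   # route A, three weather conditions
    (60.0, 5.0), (100.0, 20.0), (90.0, 15.0),    # route B, three weather conditions
]

print(track_score(runs))  # 75.0
```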

For track 3, we observe a considerable gap (-12.29 points) with respect to track 4, with a maximum score of 66.83 points. This drop is readily explained by the lack of certain sources of privileged information, such as the positions of other vehicles and pedestrians. In this case, agents have to rely on sensor data to detect and predict the state of dynamic elements.

A more drastic gap is observed between track 3 and track 2 (-37.66 points), where the maximum score is 29.17 points. A natural explanation is the added challenge of understanding the layout of the scene, including lane geometry, topology, boundaries, etc. Furthermore, agents need to understand traffic signs, all using only cameras. One of the major challenges of this track is identifying which of the multiple traffic lights applies to the agent.

From the graph, we observe that the score of track 1 is lower than that reported for track 2 (-2.44 points), reaching a maximum of 26.73 points. This may seem counterintuitive at first, given that in track 1 teams can use additional sensors, such as LIDAR. We think this is due to the limited time available to teams during the challenge, which prevented them from implementing LIDAR-based algorithms for the different tasks. Under such conditions, track 1 and track 2 become equivalent, which explains the results observed during the challenge.
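The gaps quoted in this section follow directly from the best score reported for each track. The short sketch below recomputes them, ordering the tracks from most to least privileged information; the dictionary and variable names are illustrative.

```python
# Best score per track, as reported in the text.
best_scores = {
    "track 4": 79.12,
    "track 3": 66.83,
    "track 2": 29.17,
    "track 1": 26.73,
}

# Tracks ordered from most to least privileged information.
order = ["track 4", "track 3", "track 2", "track 1"]

# Gap of each track with respect to the previous one in the ordering.
gaps = {}
for previous, current in zip(order, order[1:]):
    gaps[current] = round(best_scores[current] - best_scores[previous], 2)

for track, gap in gaps.items():
    print(f"{track}: {gap:+.2f} points")
```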

3. Infractions count

The previous results represent an encouraging start in the process of mastering driving and traffic situations. However, it is important to look at the distribution of infractions in order to understand the reality behind the proposed methods. In the plot above we show the average number of infractions per track for the top-5 teams over a total of 60 km driven. There are certain things to pay attention to.

Even in idealized situations (track 4), the number of infractions remains very high for the distance driven, including 47 red-light infractions, 22 collisions with other vehicles and 2.5 collisions with very unlucky pedestrians.

When the amount of privileged information is reduced (track 3), these infractions rise significantly: 19.3 collisions with pedestrians and 28.7 collisions with other vehicles. Furthermore, we now observe 9.3 invasions of the opposite lane (wrong way) and 1 sidewalk invasion. Agents also seem to experience additional problems understanding the provided route, leading to an average of 5 route detours (and the consequent termination of the episode).

Moving on to the perception-heavy tracks, we see that the challenge of understanding the scene layout becomes more relevant. This results in an increase of lane invasions to an average of 13.5 (track 1) and 14.7 (track 2). The number of sidewalk invasions also increases, to an average of 2.5 (track 1). The same pattern holds for detours, which increase to an average of 8 (track 1) and 10.7 (track 2). Collisions with other vehicles also increase, reaching averages of 34.5 (track 1) and 29.7 (track 2). This may be due to the lack of context caused by the absence of a map, which makes reacting to dynamic objects harder.
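To make the counts easier to compare with real-world driving, they can be normalized by distance. The sketch below does this for collisions with other vehicles, assuming that the averages quoted in this section are totals over the 60 km driven; that reading of the plot is an assumption, and the names used are illustrative.

```python
# Assumption: each average count refers to the full 60 km driven per track.
DISTANCE_KM = 60.0

# Average collisions with other vehicles, as quoted in the text.
vehicle_collisions = {
    "track 4": 22.0,
    "track 3": 28.7,
    "track 2": 29.7,
    "track 1": 34.5,
}

# Collisions per kilometer for each track.
collisions_per_km = {
    track: count / DISTANCE_KM for track, count in vehicle_collisions.items()
}

for track, rate in collisions_per_km.items():
    print(f"{track}: {rate:.2f} vehicle collisions per km")
```

Under this assumption, even the track with ideal perception averages roughly one vehicle collision every three kilometers.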

Infraction analysis for different agents on the validation routes

Overall, the analysis of infractions is a clear indicator of the amount of work that is still required in order to produce reliable and safe driving agents.