This code all lives in the function model_eval_frame, which is called from selfdrive/visiond/visiond.cc:

From the code in visiond.cc, it can be seen that the output of the driving model is published over ZMQ on port 8009.
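To see what actually comes out of that socket, a quick way is to attach a throwaway subscriber to port 8009. The snippet below is only a minimal sketch using pyzmq; the localhost address and the idea of treating each message as an opaque blob are my assumptions (in openpilot the payload is a capnp-encoded event, so check the cereal schema for real decoding):

```python
import zmq

# Minimal sketch of a ZMQ subscriber on the model output port (8009).
# Address and message handling are assumptions -- the real payload is a
# serialized capnp "model" event defined in openpilot's cereal package.
ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.connect("tcp://127.0.0.1:8009")
sock.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to everything on this port

while True:
    raw = sock.recv()                 # one serialized model message per frame
    print(f"received {len(raw)} bytes of model output")
```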

So I searched for ‘8009’ in the code and found all the subscribers:

Source code for the subscribers of port 8009

The first subscriber is in ui.c, which is mainly used to display detection results on the EON Devkit.

The second subscriber is defined in service_list.yaml, so I needed to find the underlying functions that reference this file. Fortunately, after several rounds of cross-searching (thanks to VS Code), I found all the subscribers of the driving_model output:

Modules where lead car information and drivable path information from the driving_model are used

Modules where all the lane information from the driving_model is used

These subscribers are all modules written in Python in the controls folder. The radar module receives the lead car information from the driving model and fuses it with radar data to produce a more accurate lead detection. The planner module receives the drivable path information and implements Model Predictive Control (MPC) for the driving speed. The lane_planner module receives the drivable path, left lane and right lane information and outputs the result to the path_planner module.
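As a rough mental model of what lane_planner does with those three signals, here is a hypothetical sketch: it blends the left lane, right lane and predicted path into a single target path, weighting each by the model's confidence. The function name, variable names and blending rule below are my assumptions for illustration, not the actual openpilot code:

```python
import numpy as np

def blend_path(path_poly, l_poly, r_poly, l_prob, r_prob):
    """Hypothetical sketch of combining the model's path and lane outputs.

    path_poly, l_poly, r_poly: polynomial coefficients for the predicted path,
    left lane and right lane; l_prob, r_prob: the model's confidence in each lane.
    """
    # Center line implied by the two lane lines
    lane_center = (np.asarray(l_poly) + np.asarray(r_poly)) / 2.0
    # Trust the lane lines when both are confident, otherwise fall back to the path
    lane_trust = l_prob * r_prob
    return lane_trust * lane_center + (1.0 - lane_trust) * np.asarray(path_poly)
```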

Also, I found that service_list.yaml lists all the ZMQ pubs/subs and the communication between them:
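If you want to dump that mapping programmatically, something like the following works. I am assuming here that each entry in service_list.yaml maps a service name to a list whose first element is the ZMQ port number, which is roughly what the file looked like in the openpilot version I inspected; adjust the parsing if your version's schema differs:

```python
import yaml

# Assumed layout: each service maps to a list starting with its ZMQ port,
# e.g. "model: [8009, ...]". Verify against your openpilot checkout.
with open("selfdrive/service_list.yaml") as f:
    services = yaml.safe_load(f)

for name, fields in services.items():
    port = fields[0] if isinstance(fields, list) else fields
    print(f"{name}: port {port}")
```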

Combining all this information, I drew a draft diagram to show the general interfaces of the driving model:

Visualization

However, code analysis didn't provide much useful information for inferring the architecture of the deep neural network. So I planned to analyze driving_model.dlc by building an isolated testing environment using the Qualcomm Snapdragon Neural Processing Engine (SNPE) SDK.

Coincidentally, while exploring the reference guide of the SNPE SDK, I found that there is actually a visualization tool for DLC files: snpe-dlc-viewer.

All the tools in the SNPE SDK only run in an Ubuntu environment. So I quickly spun up an Ubuntu Docker container on my MacBook Pro, installed the necessary Python libraries, set the PYTHONPATH, and typed the snpe-dlc-viewer command to convert driving_model.dlc to an HTML file (I was so excited when this moment came). The HTML file provides a fantastic interface for visualizing the model:

From the visualization, it can be easily observed that the feature extraction CNN has a ResNet-like structure stacking 4 layers (conv2 to conv5):
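For readers unfamiliar with the pattern, a residual block simply adds the input of a small convolution stack back to its output, which is what gives the visualization its characteristic skip connections. The Keras sketch below is a generic residual block with illustrative filter counts, not the exact layer sizes of driving_model:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Generic ResNet-style block: two 3x3 convolutions plus a skip connection.

    Filter counts and strides are illustrative, not those of driving_model.
    """
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if stride != 1 or shortcut.shape[-1] != filters:
        # Match the shortcut's shape with a 1x1 convolution when needed
        shortcut = layers.Conv2D(filters, 1, strides=stride)(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))
```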

After the CNN, the 8x16x4 feature map is reshaped into a 1x512 vector, which is fed into an RNN-like structure:

This structure is clearly a modified version of the GRU (Gated Recurrent Unit):

GRU visualization from What is a Recurrent Neural Networks (RNNS) and Gated Recurrent Unit (GRUS)
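For reference, a standard GRU cell computes its new hidden state $h_t$ from the previous state $h_{t-1}$ and the current input $x_t$ as follows; the structure visible in driving_model appears to follow this recipe with some modifications to the gating:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$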

After the RNN stage, the 1x512 output is forked into 5 channels. The first 4 channels are connected to 4-layer MLPs that finally output the path (1x384), left lane (1x385), right lane (1x385) and lead (1x58) information:

The 5th channel is directly concatenated to the output and fed back to the GRU's input (rnn_state:0) in the code. For more details on the model, you can visualize it yourself using snpe-dlc-viewer, or simply download the HTML file below and open it in your browser:
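Putting the last two observations together, inference over a video stream would conceptually look like the sketch below: slice the flat output vector into its five segments, then feed the recurrent segment back in on the next frame. The segment order, the 512-wide recurrent state, and the model_fn interface are my assumptions for illustration; only the segment sizes come from the visualization:

```python
import numpy as np

# Segment sizes read off the visualization; their order in the flat output
# vector is an assumption.
SEGMENTS = {"path": 384, "left_lane": 385, "right_lane": 385, "lead": 58, "rnn_state": 512}

def split_outputs(flat):
    """Slice the model's flat output vector into named segments."""
    out, i = {}, 0
    for name, size in SEGMENTS.items():
        out[name] = flat[i:i + size]
        i += size
    return out

def run_stream(model_fn, frames):
    """Hypothetical inference loop: model_fn(frame, rnn_state) -> flat output."""
    rnn_state = np.zeros(512, dtype=np.float32)
    for frame in frames:
        outputs = split_outputs(model_fn(frame, rnn_state))
        rnn_state = outputs["rnn_state"]   # fed back into rnn_state:0 next frame
        yield outputs
```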

Another noteworthy thing is that, starting from openpilot 0.6.5, comma.ai changed the stage between the CNN and the GRU from a simple 1x1 convolution structure to an inception-like structure:

Visualization of the stage between CNN and GRUs in 0.6.4 driving_model

Visualization of the stage between CNN and GRUs in 0.6.6 driving_model
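To make the difference concrete: the old stage was essentially a single 1x1 convolution, while an inception-like stage runs several parallel convolution branches of different kernel sizes over the same input and concatenates their outputs. Below is a generic Keras sketch of such a block with made-up filter counts, not the exact branches used in driving_model:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_like_block(x, filters=32):
    """Generic inception-style block: parallel 1x1, 3x3 and 5x5 branches
    concatenated along the channel axis. Filter counts are illustrative only.
    """
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(b5)
    return layers.Concatenate()([b1, b3, b5])
```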

I believe this change is the key to the improvements in path prediction and lead detection mentioned in RELEASES.md.