As a follow-up: this doesn't feel like the way you'd build the NN if you were starting from scratch. The output isn't in the most digestible format, so this approach will rely heavily on further processing in a subsequent subsystem. And I'm told this is the only block running on the GPU, so that 'subsequent subsystem' is a much lower-power CPU.

Two reasons occur to me for why you would want to rely on a CPU this way. One is that CPUs are easier to program and debug than GPUs. Generally you want to keep complicated, non-performance-sensitive code on a CPU because it's much easier to manage there. The other is that you might already have had all the CPU code working and were just using the GPU work to replace something else, like the Mobileye subsystem from AP1.