im.thatoneguy: I think step 5 needs to be GOTO 1. One of the strongest uses I see for something like ChauffeurNet isn’t necessarily driving, it’s seeing when ChauffeurNet fails. Inevitably the Net will fail, and you can start to bin failures into categories: some of those categories are solvable through further training, but some will require a return to the fundamentals (perception), namely when the expert driver is reacting to some detail in the real world that doesn’t exist in the mid-level data set. For instance, if drivers are reacting to blinkers, you need to Solve Perception by adding blinker metadata for every vehicle. If a driver sometimes departs the roadway to go around a stopped vehicle but sometimes doesn’t, you have a good data set of “departing roadway” to start adding metadata for road surface type: “dirt, gravel, requires human intervention (uneven terrain with rocks)”.

Woah. This feels like a very deep insight: we don’t know a priori what self-driving cars need to perceive.

If this sounds counterintuitive to anyone, think about this: we don’t know how humans drive. We just do it. What we think we know about how humans drive — beyond the explicit knowledge we learn from driver’s ed — is mostly a posthoc reconstruction of our implicit knowledge. For all we know, we might be wrong in many parts of that reconstruction.

Or consider that, in general, neural networks are good at doing things that we have no idea how to tell them to do. We assume — or I assume — that we know how to tell a robotic system to drive. But why? Maybe we don’t know how to tell a robot to drive any more than we know how to tell a robot to walk, or to see. Maybe driving involves an array of subtasks that are cognitively impenetrable and opaque to introspection.

im.thatoneguy, I don’t know who you are or what your background is, but it seems like you have really good instincts because you proposed months ago that Tesla could just upload mid-level representations instead of sensor data. When I said above:

strangecosmos: Perhaps a path planning neural network … is being trained not with sensor data as input, but with the metadata outputted by the perception neural networks.

I think it was your post on TMC that planted the seed in my mind. It’s pretty cool that your hunch has turned into a Waymo research paper and some reporting that suggests Tesla might actually be trying this approach.
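To make the idea concrete, here’s a minimal sketch of a planner whose input is a mid-level representation (object positions and velocities, lane offset, ego speed) rather than raw sensor data. Everything here — the feature layout, the dimensions, the toy MLP — is my own hypothetical illustration, not anything from an actual Tesla or Waymo system:

```python
import numpy as np

def encode_scene(objects, lane_offset, speed):
    """Flatten perception metadata into a fixed-size planner input vector.

    objects: list of (x, y, vx, vy) tuples for up to 4 nearby vehicles
    lane_offset: lateral distance from lane centre (metres)
    speed: ego speed (m/s)
    """
    feats = np.zeros(4 * 4 + 2)  # 4 object slots of 4 features, plus 2 ego features
    for i, obj in enumerate(objects[:4]):
        feats[i * 4:(i + 1) * 4] = obj
    feats[-2] = lane_offset
    feats[-1] = speed
    return feats

# A toy two-layer MLP standing in for the path-planning network.
# (Weights are random here; a real system would train them on expert driving.)
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(18, 32))
W2 = rng.normal(scale=0.1, size=(32, 2))  # outputs: (steering, acceleration)

def plan(scene_vector):
    hidden = np.tanh(scene_vector @ W1)
    return hidden @ W2

scene = encode_scene([(10.0, 0.5, -2.0, 0.0)], lane_offset=0.2, speed=15.0)
steering, accel = plan(scene)
```

The point of the sketch is the interface: the planner never sees pixels, only the compact metadata the perception stack emits — which is exactly why a missing attribute (like blinker state) is invisible to it until a failure reveals the gap.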

What you said about using path planning failures to notice perception failures jibes with what Karpathy said in this talk about Tesla’s “data engine”:

Perhaps the development process is a loop. Get far enough with perception to deploy a path planning feature (e.g. Navigate on Autopilot), then notice failures with that feature and identify them as either failures in perception or path planning, and then go back and work on perception some more or work on path planning some more. At the same time, keep working on new perception features (e.g. stop sign recognition) to enable new path planning features (e.g. automatic stopping for stop signs). Repeat the loop with those features.
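The loop above can be sketched as code. The failure categories and the routing rules here are purely illustrative (in practice the triage is presumably a human labelling step), not Tesla’s actual process:

```python
def triage_failures(failures):
    """Bin failure reports by root cause.

    failures: list of dicts with a 'cause' key, e.g.
      {'cause': 'missed_blinker'} or {'cause': 'late_lane_change'}
    Returns a dict mapping subsystem -> list of failures.
    """
    # Illustrative mapping from cause to subsystem.
    perception_causes = {'missed_blinker', 'unknown_road_surface'}
    bins = {'perception': [], 'path_planning': []}
    for f in failures:
        key = 'perception' if f['cause'] in perception_causes else 'path_planning'
        bins[key].append(f)
    return bins

def development_loop(deploy, collect_failures, improve, max_iterations=10):
    """Repeat: deploy a feature, observe its failures, route work back (GOTO 1)."""
    for _ in range(max_iterations):
        deploy()
        failures = collect_failures()
        if not failures:
            break
        for subsystem, cases in triage_failures(failures).items():
            if cases:
                improve(subsystem, cases)
```

The key property is that perception work isn’t planned up front: it’s demanded by whichever failure bins turn out to be unsolvable by more training alone.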

I think the way I have been thinking about autonomous car development may be wrong, because I have been assuming that we know what we need to solve: we know what all the parts of the problem are, we can solve those parts independently, and when we put all the parts together, that will be a complete solution. But this overlooks the fact that we have no idea why features will fail. The behaviour of the overall system emerges from complex interactions within the system and with the environment, and it’s often unexpected.

Neural networks are black boxes, and even hand-coded software which is in theory transparent and deterministic often fails in ways we don’t expect.

If you try to build something without testing it in wild and varied conditions as quickly as possible, you run the risk that your posthoc reconstruction of what needs to be solved will diverge more and more over time from what actually needs to be solved.

My mental model has largely been “feed neural networks lots and lots of data and eventually they might solve the problem”. But this implies you already know a priori the problem that needs to be solved. And that knowledge of what needs to be solved comes from a posthoc reconstruction which is fallible. You need to test your whole system in the wild as early as possible to narrow the gap between your posthoc reconstruction and real driving.

To use an analogy, it won’t do to move closer and closer to hitting a target. You also have to keep checking whether that’s the right target to hit. You can’t just keep making progress on solving a problem. You have to make sure that’s the right problem to solve.

What follows is a made-up example just to illustrate the point. I can’t think of a real example, and I think that is itself the point: real examples are hard to think of precisely because they live in the gap between our explicit knowledge (via posthoc reconstruction) and how humans really drive using implicit knowledge.

Say that figuring out speed limits was a really hard problem for self-driving car engineers. And say that engineers thought this was a vital problem to solve because human drivers follow speed limits.

But say that, in reality, it turned out that human drivers completely ignore speed limits and just follow the natural flow of traffic, which emerges organically. (There might be a grain of truth in this; it’s inspired by a theory I read but only half-remember and can’t find now. I think some people argue it’s safer to increase speed limits because driving is safest when the traffic flows at an organic speed.)

You wouldn’t notice that until you deployed your self-driving car and found that it was getting into trouble because it was going a different speed than all the other vehicles (either driving too fast or too slow). You would be operating on a false theory about how driving is done, and you might put a lot of work into developing a solution to the speed limit problem before finally deploying and realizing that you solved the wrong problem. Not only is the solution you built unnecessary, it’s also insufficient.

To get a self-driving car working in the real world, you need to solve it feature by feature, and test the smallest possible features (atomic features?) as quickly as possible in the real world with the whole system running. If you don’t, you might solve problems that don’t need to be solved (like detecting speed limits, in the made-up example), and you might not solve problems that need to be solved (like how to follow the flow of traffic).

This is a whole new way of thinking for me. I will have to think about it more and revisit some of my old assumptions.

It’s a super exciting conceptual revelation. What’s particularly interesting to me here on a meta level is that you can derive an engineering approach from epistemology, i.e. thinking carefully about what you know and how you know it, about how human knowledge is created (especially with regard to complex systems), what humans can and can’t know in different contexts (e.g. you can’t predict the discovery of a failure mode without making that discovery), and the difference between human competence and human comprehension (implicit knowledge and explicit knowledge).

Epistemology, either explicit or implicit (or a combination of both), is arguably behind the success of science and engineering as cultures of problem-solving. I’m always excited when really abstract, dreamy concepts unexpectedly collide with nitty-gritty technical concepts. It’s a reminder that thinking dreamy thoughts isn’t a waste of time and actually impacts the physical world in big ways.