I'm not a pro on the software side, and I only dabble in neural networks, so all I can offer is observations and opinions. But here are a few items:



I put 26k miles on AP1 before switching to AP2, and now have about 7k miles on AP2. As of right now it's not hard to find situations where one or the other clearly wins, so I can't really make the case that one is superior. Depending on what you care about, and how and where you use it, you may well have a strong preference for one or the other. Personally I'm pretty happy with AP2 right now, but I can certainly understand people who are deeply unsatisfied with it.



From what I've been able to dig up about the vision features in Mobileye's design for AP1, and what I can work out about how AP2's vision works from its neural network architecture, I think it's fair to describe these two systems as fundamentally different approaches. When Mobileye (ME) started developing the vision side of their system, neural networks for vision weren't a thing yet. So of course they designed their silicon to be an efficient accelerator for conventional vision heuristics, and they programmed it conventionally. From comments their CEO made back in 2014 and 2015, I believe they hand-tuned all of the vision kernels for the system that went into AP1. This approach has advantages and disadvantages. On the upside, the kernels are very computationally efficient, so you can get by with less hardware, which was really important when they started. But the more important difference is that the kernels are "designed".



When something is "designed", it tends to be fairly well understood. If you look at any particular situation where it isn't working, you can figure out why and how to fix it. If you take a well-defined use case and design a solution for it, you can come up with something that works reliably within that use case. I think it's fair to say that ME defined their use case and designed a machine that behaves predictably within it. That's great, but it means you need a well-defined use case, and you have to explicitly design the machine for that use case. And within that use case you'll get predictable behavior. (As an aside, I think Tesla pushed AP1 outside of ME's use case, and it's not hard to see why that would upset ME. They didn't want any accidents in Tesla's closely watched vehicles to be unfairly attributed to a failure of ME's vision system.)



Now, for driving in the real world a single overall use case isn't feasible, so you break the problem down into elements and scenarios, design a solution for each one, and then combine them all. It's labor intensive: VERY intensive use of VERY expensive expert labor. Google started with similar limitations and a similar approach, has been throwing enormous resources at the problem for over a decade, and still doesn't have a production system. They might have one soon, or they might not. Rodney Brooks, one of the pioneering luminaries in this field, has predicted that they are still 15 years away (his prediction is that it won't be a real thing for real people any sooner than 2032).



So the rapid advance of neural networks, which were almost entirely ignored until the last 5 years, allows for a different approach. Instead of "designing" the vision system, you give it lots of data and create a process that lets the vision system "learn" what it needs to do. This has some downsides compared to explicitly designed systems. For one thing, when it's not working you don't explicitly know why. Just as there isn't one neuron in a crazy person's brain causing the problem, there isn't one line of code in a neural network that's responsible for a particular sign not being recognized in a particular use case. The system's knowledge wasn't created by the designers, and it isn't organized in ways that let the designers tease out the causes of particular behaviors. This 'black box' aspect of neural networks is a major challenge for the people who work with them.



So why use neural networks if they have this really ugly flaw? In short, because they scale well. If I need 50 designers for 5 years to design a system that works well in 1 use case, and I have 10,000 use cases, then I need something like 500,000 designers for 5 years to do all 10,000 use cases. Or, more likely, 50,000 people for 50 years. With neural networks, the cost per use case is data and computers rather than people and years. So I need 10,000x as much data rather than 10,000x as many people. And to the extent that this simplistic analogy holds, the second approach is feasible within 5 or 10 years and the first one is not.
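To make the arithmetic explicit, here's a back-of-envelope sketch in Python. It assumes, as the analogy does, that design effort grows linearly with the number of use cases; the constants are just the illustrative numbers from the paragraph above, not real staffing data, and `person_years` is a made-up helper for illustration.

```python
# Back-of-envelope model of the "designed" approach.
# Assumption: every use case needs its own design team, so effort scales linearly.
DESIGNERS_PER_CASE = 50   # illustrative: 50 designers...
YEARS_PER_CASE = 5        # ...working for 5 years per use case
USE_CASES = 10_000


def person_years(designers: int, years: int, cases: int) -> int:
    """Total expert effort (hypothetical helper) if effort is linear in use cases."""
    return designers * years * cases


total = person_years(DESIGNERS_PER_CASE, YEARS_PER_CASE, USE_CASES)
print(f"{total:,} person-years total")

# The same budget can be spent as a bigger team or a longer schedule.
for headcount in (500_000, 50_000):
    print(f"{headcount:,} people for {total // headcount} years")
```

Both staffing plans come out to the same 2.5 million person-years, so "50,000 people for 50 years" isn't a different estimate, just the same budget spread over a smaller pool of experts.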



In this way of thinking, Tesla Vision in AP2 is more of a 'learned' system than a 'designed' one, whereas the ME parts of AP1 are in the 'designed' category. And AP2 is still an immature 'learned' system at that; the training isn't properly sorted out yet. But the promise is that once they have the training process worked out, it will scale up to handle the enormous variety of the real world much faster than a 'designed' system could scale up its workforce to deal with those thousands of use cases.



Ok, so this is a very roundabout answer to your question of why AP2 can't do simple things that AP1 could do years ago. My grossly oversimplified response is that AP1 and AP2 are built in different ways, and those methods have very different strengths and weaknesses. Tesla started over with a different approach because they need to be able to scale up AP's capabilities without having to hire a vast army of people who don't even exist yet. As his public pronouncements have consistently shown, Elon clearly believes this tech is going to scale very rapidly once they have the formula worked out.



And in the meantime there are situations that AP2 doesn't handle that AP1 does.



As an aside, I prefer AP2's lane change over AP1's. Maybe this is geography or a matter of taste rather than code? As for the speed limit signs, I agree that reading signs is not a particularly hard problem. My guess is that they decided not to rely on reading signs, rather than that they can't do it. Maybe that's because of a focus on using map annotations instead, or perhaps there's some subtle failure mode that relying on speed signs can lead to.



I recently sat through a lecture by Waymo's head of development, and he described all sorts of crazy and kind of scary things they run into. One example was an overhead sign reflected in the rear window of the car ahead. It only happens in rare situations, but since the light both lidar and cameras rely on reflects off glass, both sensors see a big street sign lying in the road ahead, and the car wants to swerve or brake to avoid the 'sign'. It's a really obscure but serious failure, and it's much harder to deal with than it first seems. They get similar weird events when driving past glass-fronted buildings and big shiny buses. Even standing water on the road can do crazy things in the right situation. They have all these cases that they have to carefully test for, write code to fix, and then go out and test again. Heuristic approaches like the ones Waymo uses work perfectly when they're working, but they are brittle: they fail suddenly and spectacularly, and the designers have to compensate for that. They make it easy to be overconfident because you can't see the failure coming, which I'm sure is one of the things that led Chris Urmson to commit to nothing less than full Level 5, because ordinary users can't be relied upon to respect limitations they don't experience until it's too late.



I note that their initial test deployment is going into Chandler, AZ, a suburban development with few overhead signs, glass-fronted buildings, or big shiny buses roaming the streets. And not a lot of standing water. I wonder if that's a coincidence.



Neural networks get wonky as they approach a failure point, and if you use them much you'll find that you can see a failure coming. My sense of AP1 and AP2 mirrors this. AP1 gives me perfect confidence even in places where it might be driving right along the edge of a gross failure. That makes AP1 more 'comfortable' because, in a sense, it's hiding its limitations. AP2 conveys its lack of confidence to me by getting wobbly or drifting away from the center of my comfort zone. So depending on what you expect, that can make you not want to use it. I like it, but I understand why other people don't.
