How data is changing the car game for Ford

When most people think about how cars are built, they probably think about assembly lines, manufacturing robots, and batteries of safety and performance simulations on massive supercomputers. But at Ford, big data is having a significant impact on the parts and features of those cars before they're ever part of a design file. From the cars in stock at the dealership to the performance of the engine in a rainstorm, big data is infiltrating nearly every aspect of the Ford experience and the company itself.

Obviously, data is nothing new to the automotive industry (companies have been trying to optimize supply chains and analyze sales numbers for decades), but the advent of big data, as well as related technologies such as sensors and smartphones, is changing how companies think about data. Ford isn't alone in its quest to take advantage of these new technologies, either. For example, General Motors collects data from its OnStar system to help lower drivers' insurance premiums, and also collects lots of data on its Chevrolet Volt electric car that it feeds to drivers via a mobile app. We recently noted how a luxury automobile company used big data software from Aster Data Systems to determine the relationships between malfunctions so it could provide a more thorough and beneficial service-department experience.

But in an industry notoriously unwilling to talk about information technology, Ford's experiences might shed a lot of light on what other companies are thinking and doing, as well.

Building a better experience through data

According to John Ginder, manager for systems analytics with Ford Research & Innovation, the company has been doing advanced business modeling for about 20 years, but big data is something else. Today's technologies are allowing Ford to handle larger, more-diverse datasets than ever before possible, and its efforts are already beginning to bear fruit in numerous places — including in the cars themselves.

The most obvious example of data influencing the driving experience might be the types of data car companies are actually giving back to drivers. Ford's Energi line of plug-in hybrid cars generates 25 gigabytes of data per hour, which is then processed and given back to drivers via a mobile app. The app tells them about battery life, the nearest charging stations and other data about the vehicle's performance.

Ginder said all that data is the result of a "convergence of need and opportunity." The opportunity is a way to experiment with collecting and presenting vehicle data on a group of early adopters that's probably more interested in this type of advanced technology. The need has to do with what Ginder calls "range anxiety": when drivers are getting used to electric vehicles, they need reassurance they're not going to run out of juice.

However, Ginder said, the company is just scratching the surface of what's possible, because there aren't that many of the electric vehicles on the road yet. The goal is to better understand how drivers are using the vehicles and use that information to continuously improve the vehicles and the overall experience. Ford's Super Duty line of pickup trucks also offers a "crew chief" package that lets bosses monitor the fuel consumption, engine performance and other data about their fleets of vehicles.

Mike Cavaretta, technical leader for predictive analytics and data mining with Ford Research & Innovation, added that Ford is really interested in collecting more data from more vehicles, but noted there's also a privacy concern that could come into play. The potential of someone knowing where and how you're driving might not appeal to the mainstream just yet (just look at all that data Tesla collects about its cars and can present if it really wants to), but as with the Energi, data does present some opportunities to improve the customer experience.

The test cars in Ford's research labs are collecting about 250 gigabytes of data per hour from high-resolution cameras and an array of sensors, Cavaretta noted, and the company is trying to find out what data is most useful and how it might be rolled into production vehicles.

Building better cars through data

Of course, sometimes the best data isn't the stuff you see, but the stuff that just makes your car better. Cavaretta said Ford analyzes a lot of social media and other external data in order to figure out, for example, what customers are saying about their vehicles compared with other makes and what problems they're having.

In one recent case, the product development team was curious as to whether the Ford Escape sport-utility vehicle should have a standard liftgate (i.e., it opens manually and the rear window can flip open) or a power liftgate in which the glass and the gate are one piece. In the latter option, the gate opens automatically when you tap under the rear bumper with your foot, but the window doesn't open at all. Regular surveys hadn't addressed the question, so Cavaretta and his team took to social media, where people were actually talking about it quite a bit and seemed to heavily favor the power liftgate in most cases. It's now a feature.

Back in 2004, Ford built a self-learning neural network system for its Aston Martin luxury brand that maintains proper engine function by recognizing engine misfires and particular driving conditions and adjusting warnings and performance accordingly.

Ginder said his team has been improving on that technology ever since and actually expanded its use into a system called the Smart Inventory Management System (SIMS), which lets dealers ensure they have the optimal stock of vehicles and features on their lots. Historically, he said, some dealers were very sophisticated about inventory management, while others were more reactive ("They just sold a red Mustang," he joked, "so they think they need to go order another red Mustang"). With SIMS, all sorts of data about vehicle sales and other locally relevant data from across the country are aggregated in Ford's big data platform, and the neural network algorithms learn the current patterns so Ford can make better recommendations, whether or not dealers choose to heed the advice.

Selling big data internally

Cavaretta characterizes the division in which he and Ginder work as "an Ernst & Young, but just for Ford," an internal consultancy (as opposed to Ford's more-traditional research and development division) in charge of solving business problems via analytics. About 80 percent of those problems come directly from Ford's lines of business, while about 20 percent are the research division's own ideas. However, although Cavaretta is excited about how big data can help his team answer these questions in novel ways, it's not always an easy sell with other parts of the company.

Mashing up data sources such as social and sales in order to find insights is a pretty easy sell, Cavaretta explained, but getting people to put sensors in everything and collect data every second or with every transaction can still be a bit challenging. In part, this is just a lingering effect of the constraints that legacy technologies imposed on the company. It wasn't possible to store all this data, so people just got accustomed to the status quo of summarizing data hourly, for example.
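To make the tradeoff concrete, here is a minimal sketch of what hourly summarization can hide. The readings, the spike and the threshold are all made up for illustration; none of this is Ford data or Ford code.

```python
from statistics import mean

# Hypothetical per-second sensor readings for one hour (3,600 samples).
# All values are invented for illustration; they are not Ford data.
readings = [70.0] * 3600
for second in range(1800, 1810):  # a 10-second spike in the middle of the hour
    readings[second] = 120.0

# Legacy approach: store a single summary value per hour.
hourly_mean = mean(readings)

# Fine-grained approach: keep every sample, so short-lived events stay visible.
THRESHOLD = 100.0  # assumed alert level, chosen only for this example
spike_seconds = [t for t, value in enumerate(readings) if value > THRESHOLD]

print(f"hourly mean: {hourly_mean:.2f}")
print(f"seconds above threshold: {len(spike_seconds)}")
```

In this toy case the hourly average barely moves from the baseline of 70, while the per-second data shows exactly when the spike occurred and how long it lasted.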

Now, however, he's pushing them to "dial it down" and collect data at the lowest level possible and as often as possible. In manufacturing alone, he explained, there are between 20,000 and 25,000 parts in any given vehicle, and there's a supply chain that spans from parts suppliers all the way up to dealerships. Getting a complete view of this process could help drive serious efficiencies and, Cavaretta said, "We don't see anything but big data technologies that can get us there."

Other areas where Ford is collecting, or wants to collect, more real-time data include websites, call centers and the company's credit-processing arm, he added.

Building big data internally

In order to accomplish their lofty goals, the Research & Innovation analytics team relies heavily on open source technologies, most prominently Hadoop. However, Cavaretta said, they've been experimenting with a variety of natural-language processing tools, too, and even did a proof-of-concept with SAP's HANA in-memory analytic database. The NLP tools were first turned on text analysis of internal surveys and dealer network documents, but now are used pretty heavily on social media and other web data.

The team has some systems numbering in the dozens of nodes in its own building, but on weekends it's able to borrow high-performance computing cycles from Ford's Numerically Intensive Computing Center next door to model recommendation engines and run other tasks that demand serious computing power.

But as a part of a specialized research division, the work that Ginder, Cavaretta and their team do on everything from Hadoop to visualization with tools like Tableau isn't automatically ready for primetime. In fact, Cavaretta said, the team looks at "what's the art of the possible" and tries to show the value of it. It's like a vanguard, he added, going out and seeing what's ahead and then reporting back.

At that point, projects are often handed off to Ford's central IT team that actually puts the technologies into production. A system that took the research team weeks to deploy and start deriving insights from might take IT months to make production-ready. However, Ginder added, his team can't just throw stuff over the wall and abandon it — it has to collaborate with the IT team and individual departments throughout the project's lifecycle.

An important part of this cross-company relationship (and something many CIOs have likely heard before) is having data scientists on board who can see the world through the eyes of both technologists and businesspeople, two groups that often have different concerns and goals in mind. "We look for people who can bridge those worlds," Ginder said. "It's hard to find these people, but they're hugely important to organizations."