During my visit to General Electric's Global Research Centers in San Ramon, California, and Niskayuna, New York, last month, I got what amounts to an end-to-end tour of what GE calls the "Industrial Internet." The phrase refers to the technologies of cloud computing and the "Internet of Things" applied across a broad swath of GE's businesses in an effort to squeeze better performance and efficiency from the operations of everything from computer-controlled manufacturing equipment to gas turbine engines and power plants. It's an ambitious effort that GE is hoping to eventually sell to other companies as a cloud service—branded as Predix.

GE is not alone in trying to harness cloud computing and apply it to the rapidly growing universe of networked systems in energy, manufacturing, health care, and aviation. IBM has its own Internet of Things cloud strategy, and other companies, including SAP, Siemens, and startups such as MachineShop, are hoping to tie their business analytics capabilities to the vast volumes of data generated by machines and sensors. That data could fuel what some have called the next industrial revolution: manufacturing that isn't just automated, but is driven by data in a way that fundamentally changes how factories work.

Eventually, analytical systems could make decisions about logistics, plant configuration, and other operational details, with humans contributing little beyond creativity, intuition, and fine motor skills. And even in industries that have no production plant, analytics could make workers more efficient by getting them where they need to be at the right time with the right tools.

Creating that world demands rigorous data management, along with models of the physical systems and processes that generate the data, to give that data meaningful shape. In other words, analytics of industrial operations requires both a schema for all the things and enough computing power to interpret real-time data streams while also discovering trends in deep lakes of historical data.

Such data comes in many forms, and it can come from many places. In manufacturing and other traditional industrial environments, many systems have already been instrumented for computer control through SCADA (supervisory control and data acquisition) systems that feed a computer-based HMI (human-machine interface) console. In these cases, tapping into those systems' telemetry is relatively straightforward.

But systems that have never been connected to SCADA can also be important sources of analytic data. For example, Arnie Lund, technology lead at GE's Connected Experience Lab, demonstrated for Ars an analytic system built for Hydro-Québec. The system pulled in not just information from the power grid, but also weather sensor data and even historic and projected tree growth around power lines, to help predict where wind or fallen branches might cause outages. Similar analytic systems combined geospatial data on railroad lines with track surveys to prioritize track maintenance and prevent derailments and other incidents. In both cases, much of the data came from devices that weren't networked live; instead, they connected to networks only occasionally or opportunistically.
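To make the idea of fusing grid, weather, and vegetation data more concrete, here is a minimal Python sketch that ranks power-line segments by outage risk. It is not the utility's or GE's model: the `LineSegment` fields, the multiplicative `outage_risk` score, and the 6 m safe-clearance and 80 km/h reference-gust defaults are all hypothetical, chosen only to show how separate data sources can feed a single priority list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineSegment:
    # All fields are hypothetical stand-ins for separately collected data.
    segment_id: str
    clearance_m: float         # surveyed gap between conductor and nearest tree
    growth_m_per_year: float   # projected vegetation growth toward the line
    forecast_gust_kmh: float   # peak gust from weather forecasts/sensors


def outage_risk(seg: LineSegment, years_ahead: float = 1.0,
                safe_clearance_m: float = 6.0,
                gust_ref_kmh: float = 80.0) -> float:
    """Hypothetical risk score in [0, 1] combining vegetation and wind data."""
    # Project how much clearance will remain after the given horizon.
    projected = seg.clearance_m - seg.growth_m_per_year * years_ahead
    # Vegetation term: 0 at or beyond the safe clearance, 1 at zero clearance.
    veg = min(1.0, max(0.0, 1.0 - projected / safe_clearance_m))
    # Wind term: scales linearly up to the reference gust, then saturates.
    wind = min(1.0, seg.forecast_gust_kmh / gust_ref_kmh)
    # Both conditions must be present for a high score.
    return veg * wind


def rank_segments(segments: List[LineSegment]) -> List[LineSegment]:
    """Order segments from highest to lowest estimated outage risk."""
    return sorted(segments, key=outage_risk, reverse=True)


segments = [
    LineSegment("A", clearance_m=6.0, growth_m_per_year=0.5, forecast_gust_kmh=90.0),
    LineSegment("B", clearance_m=2.0, growth_m_per_year=0.5, forecast_gust_kmh=60.0),
    LineSegment("C", clearance_m=1.0, growth_m_per_year=0.2, forecast_gust_kmh=20.0),
]
print([s.segment_id for s in rank_segments(segments)])  # ['B', 'C', 'A']
```

Segment `B` comes out on top even though the tree survey alone would flag `C` (least clearance) and the forecast alone would flag `A` (strongest gusts), which is the point of combining sources.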

It's when data from networked sensors is fused with other sources (like the tree survey) that it becomes valuable. On the lower end of the analytics space, there are tools like Wolfram's Data Drop, which can take in data from anything that can send it via HTTP and add semantic structure to it for analysis. For larger systems, like GE's Predix, it all goes into a "data lake"—a giant cloud storage pool of structured and unstructured data that can be programmatically accessed by analytics tools.

But all that data is useless without good analytics, and simply matching raw sensor numbers by timestamp isn't enough to understand what's going on, either historically or in real time. That's where data science comes in. "Data science is all about building models on any kind of data that represents physical phenomena," Christina Brasco, a data scientist at GE Software in San Ramon, told Ars. "We're building analytic engines that might be working on numbers a mathematical model produced, and not on raw data."
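Matching readings by timestamp is usually the first step in fusing sources, even though it isn't enough on its own. A common way to do it is an "as-of" join: each reading from one stream is paired with the most recent reading from another stream, provided that reading is recent enough. The minimal Python sketch below assumes both streams are sorted lists of `(timestamp, value)` pairs; the stream names, units, and the `tolerance_s` parameter are illustrative assumptions, not part of Predix or any GE tool.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, value)


def asof_join(left: List[Reading], right: List[Reading],
              tolerance_s: float) -> List[Tuple[float, float, Optional[float]]]:
    """Pair each left reading with the latest right reading at or before it.

    Both lists must be sorted by timestamp. If the latest earlier right
    reading is older than tolerance_s, None is attached instead.
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, value in left:
        # Index of the last right reading whose timestamp is <= t.
        i = bisect_right(right_times, t) - 1
        if i >= 0 and t - right_times[i] <= tolerance_s:
            joined.append((t, value, right[i][1]))
        else:
            joined.append((t, value, None))
    return joined


# Hypothetical streams: vibration (mm/s) and exhaust temperature (deg C).
vibration = [(0.0, 0.12), (10.0, 0.15), (20.0, 0.31)]
temperature = [(0.0, 410.0), (15.0, 455.0)]
print(asof_join(vibration, temperature, tolerance_s=8.0))
# [(0.0, 0.12, 410.0), (10.0, 0.15, None), (20.0, 0.31, 455.0)]
```

The 10-second vibration reading gets no temperature because the nearest earlier sample is 10 seconds old, beyond the 8-second tolerance. Aligned rows like these are only raw material; interpreting them is where physical models come in.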

Brasco is focused on aviation systems right now, specifically the tens of thousands of GE gas turbine engines that the company manages in air fleets around the world. "GE is trying to move toward predictive maintenance, so data science comes in as we try to build predictive models that replace things that are more hands on," she said. "I'm producing one of many apps that try to do predictive scheduling of maintenance at the fleet level so we never have unscheduled downtime."

That means creating models based on the thousands of terabytes of maintenance history data and remote diagnostic data recorded from every jet engine in GE's managed fleet—data periodically dumped into the cloud during between-flight maintenance. Models can then calculate projected wear on turbine blades and other components over time, figuring out when it's time to pull them. Other models being built by data scientists at GE for other lines of business could also make calculations based on live streams of data to determine whether systems are configured efficiently or are edging toward dangerous conditions.
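The wear-projection idea can be illustrated with a deliberately simple sketch: fit a straight line to a component's measured wear against operating cycles, then extrapolate to the cycle count at which wear would reach a removal limit. This is not GE's method, which isn't detailed here; the linear-wear assumption, the hypothetical `cycles_until_limit` function, and the micrometre figures in the example are all illustrative.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need measurements at two or more distinct cycle counts")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_until_limit(history: List[Tuple[float, float]],
                       wear_limit: float) -> Optional[float]:
    """Estimate remaining cycles before wear reaches wear_limit.

    history holds (cycle_count, measured_wear) pairs sorted by cycle count.
    Returns None if the fitted trend shows no increasing wear.
    """
    cycles = [c for c, _ in history]
    wear = [w for _, w in history]
    slope, intercept = fit_line(cycles, wear)
    if slope <= 0:
        return None
    # Cycle count at which the fitted line crosses the limit.
    limit_cycle = (wear_limit - intercept) / slope
    # Zero means the component is already at or past the projected limit.
    return max(0.0, limit_cycle - cycles[-1])


# Hypothetical blade wear in micrometres, measured at maintenance checks.
history = [(0, 0.0), (1000, 100.0), (2000, 200.0), (3000, 300.0)]
print(cycles_until_limit(history, wear_limit=500.0))  # about 2000 remaining cycles
```

Run across a fleet, estimates like this could be sorted to schedule removals before any part reaches its limit, which is the "predictive scheduling" Brasco describes, albeit with far simpler models than a production system would use.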

Harel Kodesh, GE Software's chief technology officer, told Ars that the goal of Predix is to essentially create "a cloud operating system" for industrial analytic apps based on sensor data. He envisions the system as a "digital Switzerland" where companies can control access to their industrial data while being able to leverage analytic software written by third-party developers.

The advantage of using a cloud platform to deliver the analytics as well as the data, Kodesh explained, was that "doing it in the cloud makes it much faster to get new systems out." Cloud APIs, such as those for Predix and other platforms, take the work of building a data store or provisioning more computing power out of the picture. "Developers shouldn't be worried about how to access data," he said. "The idea is that we want developers to be able to build analytics software capable of solving the problems they set out for themselves."

Kodesh is betting that many of GE's customers will buy into the Predix platform because it will already have models for the equipment they use, and it will offer a platform for customers to build their own models for other systems. Potentially, Kodesh even foresees an "app store" with models from third-party developers and equipment manufacturers.

GE won't be alone in that game, though. IBM, Amazon, and other cloud providers are likely to use their existing services to court developers of their own for "Internet of Things" analytics and other cloud-based processing of industrial data. IBM is also looking to bring its Watson "cognitive cloud" service to help people understand data from IoT devices, according to Alexa Swainson-Barreveld, IBM's vice president of Watson products and services. "We've actually been having a fairly in depth conversation about IoT recently," she told Ars. "It's a place where we think we can help deal with the massive amounts of content and separate the signal from the noise. We see the industrial data space as a major opportunity area."

For now, models that feed back from the cloud into control systems to modify their operations aren't in play. But some GE systems, such as the company's wind turbines, already use local analytics to change their configuration based on sensor data: some turbines change the pitch of their blades when they detect wind gusts, reacting faster than a human could to prevent damage. Given GE's ambitions and its progress so far, "closed loop" analytic systems for more of the industry are likely not that far off.
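The kind of local, gust-triggered response described above can be sketched as a small rule that runs on the turbine itself rather than in the cloud. This Python example is a hypothetical illustration, not GE's control logic: the `GustPitchController` class, the three-sample window, the 5 m/s gust threshold, and the fixed 10-degree pitch offset are all assumptions made for clarity. Real pitch controllers also regulate rotor speed and power continuously, which this sketch ignores.

```python
from collections import deque


class GustPitchController:
    """Hypothetical local rule: feather the blades when wind speed jumps."""

    def __init__(self, window: int = 3, gust_rise_ms: float = 5.0,
                 base_pitch_deg: float = 0.0, gust_pitch_deg: float = 10.0):
        # Keep only the most recent `window` anemometer samples.
        self.recent = deque(maxlen=window)
        self.gust_rise_ms = gust_rise_ms
        self.base_pitch_deg = base_pitch_deg
        self.gust_pitch_deg = gust_pitch_deg

    def update(self, wind_speed_ms: float) -> float:
        """Record a new sample and return the commanded blade pitch in degrees."""
        self.recent.append(wind_speed_ms)
        # A gust shows up as a sharp rise above the recent minimum.
        rise = wind_speed_ms - min(self.recent)
        if rise >= self.gust_rise_ms:
            return self.base_pitch_deg + self.gust_pitch_deg
        return self.base_pitch_deg


controller = GustPitchController()
samples = [8.0, 8.5, 9.0, 15.0, 15.5, 16.0, 16.0]
print([controller.update(v) for v in samples])
# [0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 0.0]
```

Every decision here happens on the device from its own sensor readings; the "closed loop" systems GE anticipates would let cloud-hosted models influence decisions like this one.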