Hybrid Machine Learning Is the Best Model to Fuel Business Improvement, Revenues

Until recently, purely data-driven AI, most notably machine learning, has been seen as the most attractive enabling technology for digitalization across industries. That includes the digital twins deployed by heavy-asset industries such as oil and gas and shale, which is getting more attention recently in light of the attacks on key oil facilities in Saudi Arabia and oil production continuing to hit new lows.

More established, though much less hyped, physics-based modeling has rarely enjoyed the spotlight in recent years. However, artificial intelligence has an inherent black box nature. Black box models expose the outcome — such as how many times the model correctly identifies a cat in a series of photos — but provide no clue as to why the model arrived at the outcome it did.

Shedding Light on the Black Box Problem

The most commonly used practical example of this black box problem is a cat identification model trained on 1,000 pictures in which every picture labeled as containing a cat (say, 135 of them) was taken in sunny conditions with a blue sky visible. The algorithm may then ignore the cat as a signal altogether and instead look for a sufficient surface area of sky blue and an overall light color tone in the picture.

From the output alone, this inner working of the model is impossible to see. When validated against similar cat pictures taken outdoors in daylight, the model will show very good accuracy. However, it will have no chance of identifying a cat, however clearly it appears in the picture, if the picture was taken inside a building with no sun and a darker color tone.
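The shortcut described above can be reproduced with a toy model. The sketch below is illustrative only: the two features (sky_blue, cat_shape), the hand-written data and the one-rule "decision stump" classifier are all assumptions made for this example, not part of any real image pipeline.

```python
# Each picture is reduced to two made-up features (both hypothetical):
#   sky_blue  - fraction of the image that is sky blue (0..1)
#   cat_shape - score from an imperfect cat-shape detector (0..1)
TRAIN = [
    # (sky_blue, cat_shape, is_cat)
    (0.62, 0.55, True),
    (0.70, 0.80, True),
    (0.58, 0.35, True),   # cat partly hidden, so the shape score is weak
    (0.66, 0.90, True),
    (0.05, 0.45, False),  # indoor dog that partly triggers the detector
    (0.10, 0.10, False),
    (0.02, 0.20, False),
    (0.08, 0.40, False),
]

# Cats photographed indoors: clear cat shape, almost no sky blue.
INDOOR_CATS = [(0.03, 0.85, True), (0.06, 0.92, True)]


def fit_stump(rows):
    """Return (accuracy, feature index, threshold) of the best single rule."""
    best = None
    for feat in (0, 1):
        values = sorted({r[feat] for r in rows})
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            correct = sum((r[feat] > thr) == r[2] for r in rows)
            acc = correct / len(rows)
            if best is None or acc > best[0]:
                best = (acc, feat, thr)
    return best


def predict(stump, row):
    """Apply the learned rule: 'cat' if the chosen feature exceeds the threshold."""
    _, feat, thr = stump
    return row[feat] > thr


stump = fit_stump(TRAIN)
train_accuracy, chosen_feature, threshold = stump
indoor_predictions = [predict(stump, r) for r in INDOOR_CATS]

print("feature used:", ["sky_blue", "cat_shape"][chosen_feature])
print("training accuracy:", train_accuracy)
print("indoor cats recognized:", indoor_predictions)
```

On this data the rule with the best training accuracy keys on sky_blue, because sky color separates the training set perfectly while the cat-shape score does not. The accuracy figure alone gives no hint of this, and both indoor cats are missed.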

The black box nature of AI is why pure AI-based approaches are failing to gain full acceptance with field operations, which have a culture rooted in engineering sciences with zero risk tolerance for critical systems. In addition, mounting empirical evidence from hundreds of proofs of concept that O&G industry leaders have run with promising AI startups is debunking the idea that AI alone can solve production optimization and predictive maintenance use cases.

Providing a Better Approach Using a Glass Box Model

This more informed view of AI is driving the future of hybrid machine learning, a blend of physics and AI analytics that combines the glass box interpretability and robust mathematical foundation of physics-based modeling with the scalability and pattern recognition capabilities of AI. Glass box models shed some light on how they arrive at the outcome, in addition to the outcome itself.

Both physics-based models and machine learning (the most common form of AI applications) can be used to make future predictions. The answer as to which one to use for what and when depends upon the problem you are trying to solve.

Problem classes fall into two primary categories:

1. Systems with lots of experimental data about historical behavior but no theoretical knowledge framework
2. Systems with a good mathematical theory framework in place (commonly matched with equally robust empirical behavior data)

One advantage of a physics simulator is that it can predict with a certain confidence even when no historical data exists. That means it works from first oil, that is, from the start of production, and it also works during the design phase. Historical data is used to increase accuracy and estimate uncertainty.

Identifying Limitations and What Works Best in Which Situations

For systems in the first category, a physics-based model isn’t workable, because it’s not possible to formulate a robust mathematical model to describe the system. Machine learning does not suffer from the same limitation. AI’s black box nature is an advantage here: because it needs no explicit theory, machine learning can be used in such scenarios too, assuming enough contextualized training data is available. With this condition met, a machine learning model should be able to learn the underlying patterns between the system and its outcomes and ultimately make predictions.

Two caveats remain, however. The first is the questionable confidence level in resulting predictions (i.e., the precision and recall challenge), which could render an otherwise functioning AI approach unfit for many critical manufacturing processes. The second is that training examples of true failures in critical systems are often missing, because traditional scheduled equipment maintenance is designed to prevent such costly failures above all else.
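To make the precision and recall challenge concrete, here is a small worked example. The counts are invented purely for illustration: a hypothetical failure-prediction model watches equipment in which 5 real failures occur, raises 20 alarms, and 4 of those alarms correspond to real failures.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of all alarms raised, what share were real failures?
    precision = true_pos / (true_pos + false_pos)
    # Recall: of all real failures, what share did the model catch?
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Invented counts: 20 alarms (4 real, 16 false) and 1 missed failure.
precision, recall = precision_recall(true_pos=4, false_pos=16, false_neg=1)

print(f"precision = {precision:.2f}")  # share of alarms worth acting on
print(f"recall    = {recall:.2f}")     # share of real failures caught
```

With these numbers the model catches most failures (recall 0.80) but four out of five alarms are false (precision 0.20). With so few true failures to learn from, both figures also rest on very small counts, which is the second caveat in practice.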

For systems in the second category, a physics-based model can offer a good solution. Physics-based modeling is tried, tested and validated for even the most critical of simulations, such as space flight orbits. But it, too, has limitations. The most notable is the computational cost of running physics-based models continuously in runtime environments with live data, especially across computationally heavy IoT use cases. This is where hybrid machine learning offers an attractive solution.

Describing the system in detail using a physics-based model produces physically accurate, rich and fully interpretable synthetic data, such as virtual sensor data and equipment breakpoint data. This data is then used to train a machine learning model for subsequent analysis of live operational data in predictive maintenance and production optimization use cases. The approach leverages the fact that once a machine learning model is trained, using it to make predictions on new data, even large, high-velocity data, is very cost-efficient. It also applies to subsurface resource development challenges such as finding the optimal completion strategies, where hybrid models can provide smart advice based on physics-based simulations and historical data.
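The workflow just described (simulate with physics, train on the synthetic data, then score live data cheaply) can be sketched in a few lines. Everything specific here is an assumption chosen for illustration: the physics model is the Darcy-Weisbach pressure drop for a straight pipe with a fixed friction factor, the learned model is a one-parameter least-squares fit, and the pipe dimensions, fluid density, live readings and 10 percent tolerance are invented.

```python
def physics_pressure_drop(velocity_m_s, friction=0.02, length_m=100.0,
                          diameter_m=0.1, density_kg_m3=850.0):
    """Darcy-Weisbach pressure drop (Pa) for a straight pipe.

    A fixed friction factor is a simplifying assumption for this sketch.
    """
    return (friction * (length_m / diameter_m)
            * density_kg_m3 * velocity_m_s ** 2 / 2)


# 1. Use the physics model to generate synthetic "virtual sensor" data.
train_v = [0.5 + 0.25 * i for i in range(19)]          # 0.5 .. 5.0 m/s
train_dp = [physics_pressure_drop(v) for v in train_v]


# 2. Train a cheap data-driven surrogate: least squares for dp = a * v**2.
def fit_quadratic_through_origin(velocities, pressure_drops):
    xs = [v ** 2 for v in velocities]
    return (sum(x * y for x, y in zip(xs, pressure_drops))
            / sum(x * x for x in xs))


a = fit_quadratic_through_origin(train_v, train_dp)


def surrogate(velocity_m_s):
    """Learned stand-in for the physics model; trivially cheap to evaluate."""
    return a * velocity_m_s ** 2


# 3. Score live readings: flag any that deviate from the surrogate's
#    expectation by more than the tolerance (for example, a partial blockage).
def flag_anomalies(readings, tolerance=0.10):
    flags = []
    for velocity, measured_dp in readings:
        expected = surrogate(velocity)
        flags.append(abs(measured_dp - expected) > tolerance * expected)
    return flags


# Invented live readings: (velocity in m/s, measured pressure drop in Pa).
live = [(2.0, 34500.0), (3.0, 76000.0), (3.0, 95000.0)]
flags = flag_anomalies(live)
print(flags)
```

In this toy case the surrogate recovers the physics exactly, so the example shows only the shape of the workflow. In a realistic setting the simulator would be far more expensive to run than the trained model, which is where the cost advantage comes from.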

To give the production algorithm its cognitive edge, subject matter experts supervise such hybrid machine learning models so that the models truly understand (hence the term “cognitive”) the physical boundary conditions of the systems. This greatly enhances the algorithm’s ability to produce meaningful outcomes.

To conclude, hybrid models are best suited for complex industrial process problems where a mathematical theory framework exists that can be used to teach a machine learning model, which is then applied to real-time data for predictions. The result is a high-confidence, tailored hybrid model that combines strong domain knowledge (physics) with machine learning for cost efficiency and scalability. Hybrid analytics is showing great potential, especially in the proliferating space of digital twins.

About the author: Francois Laborie is President of Cognite North America. Visit www.cognite.com for more information.