

When Google Flu Trends was launched in 2009, Google’s chief economist, Hal Varian, explained that search trends could be used to “predict the present.” At the time, the notion that useful patterns and insights could be extracted from large-scale search query data made perfect sense. After all, many users’ digital journeys begin with a search query — including 8 out of 10 people seeking health-related information. So what could possibly go wrong? The answer is infamous in the business and data science communities. Google Flu Trends was shut down in 2015 after the tool’s forecasts overestimated flu levels by nearly 100% relative to data provided by the Centers for Disease Control. Critics were quick to point to the project as the poster child for big data hubris — the fallacy that inductive reasoning fueled by copious amounts of data can supplant traditional, deductive analysis guided by human hypotheses.

More recently, organizations have shifted towards amplifying predictive power by coupling big data with complex, automated machine learning (autoML). AutoML, which uses machine learning to generate better machine learning, is advertised as an opportunity to “democratize machine learning” by allowing firms with limited data science expertise to develop analytical pipelines capable of solving sophisticated business problems. In a recent Kaggle prediction competition, an autoML engine pitted against some of the world’s best data scientists finished second after leading most of the way. However, these advancements have raised concerns about AI hubris. By commoditizing machine learning for process improvement, autoML once again raises questions about what the interplay between data, models, and human experts should look like. What does all this mean for managing in an AI-enabled world?


In our federally funded project (with Rick Netemeyer and Donald Adjeroh), we are examining the efficacy of detecting adverse events from large quantities of digital user-generated content. It is critical for companies in many settings to monitor for adverse events related to their products or services — for instance, unknown drug side effects, children’s toy hazards, or issues leading to automobile recalls. The project’s goal is somewhat analogous to Google Flu’s original objective — use machine learning to generate accurate and timely signals for enhanced awareness of these potential adverse events. For instance, if a given drug has severe unforeseen side effects, or a car is malfunctioning due to a potential defect, various stakeholder groups, including product manufacturers, regulatory agencies, and consumer advocacy groups, might be interested in receiving these signals as soon as possible. Our deep learning models analyze millions of web search queries to see how disproportionately certain product and adverse event pairings appear, relative to underlying noise. The models output a ranked list of potential adverse event signals to be further investigated. For example, the pairing of “Prius” and “sticky pedal” might be flagged as a signal.
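The article does not spell out how the disproportionality scoring works. As a minimal illustration only — not the authors’ actual model — one could score a product–event pairing against the event’s baseline rate, in the spirit of a proportional reporting ratio. All query data below is invented:

```python
def disproportionality(queries, product, event):
    """Score how disproportionately a product co-occurs with an
    adverse-event term, relative to that term's baseline rate in
    all other queries (a simplified, illustrative measure)."""
    with_product = [q for q in queries if product in q]
    without_product = [q for q in queries if product not in q]
    # Rate of the event term among queries mentioning the product
    p_event_given_product = sum(event in q for q in with_product) / max(len(with_product), 1)
    # Baseline rate of the event term among all remaining queries
    p_event_baseline = sum(event in q for q in without_product) / max(len(without_product), 1)
    if p_event_baseline == 0:
        return float("inf") if p_event_given_product > 0 else 0.0
    return p_event_given_product / p_event_baseline

# Toy query log: "sticky pedal" appears unusually often alongside "prius"
queries = [
    "prius sticky pedal", "prius sticky pedal fix", "prius mpg",
    "civic oil change", "civic sticky pedal", "corolla tires",
    "camry sticky pedal", "prius sticky pedal recall",
]
score = disproportionality(queries, "prius", "sticky pedal")
```

A score well above 1 suggests the pairing appears more often than background noise would predict; a real system would also model seasonality, volume, and query intent.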

In light of the possible pitfalls associated with applying advanced machine learning methods to large-scale search query data, we employed what we term an augmented machine learning approach (augML). Adapting the augmentation versus automation idea, augML enriches the autoML concept by underscoring the importance of experts, context, and complementary data. Managers and data scientists looking to enhance their machine learning capabilities in this way should consider the following:

1. Semi-automate the model development process.

Advancements in machine learning have increased both the complexity of predictive-modeling tasks and the opportunities for automating them. Feature construction, tweaking representations in model architectures, and parameter tuning are examples of tasks that can often be automated — each is well defined and can be guided by modeling KPIs. As the complexity of these tasks grows, automation can add rigor by examining alternatives and combinations more comprehensively. But, at least for the foreseeable future, automation cannot replace expert knowledge. Rather, automation is better used as a complement to human experts, freeing them to engage in higher-value activities that leverage their combination of domain and technical knowledge. In short, rather than bridging the expertise gap, we believe automation is best suited to augmenting the experts.
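As a toy sketch of this division of labor — not any specific autoML tool — an expert might bound the search space with domain knowledge while the automated part exhaustively scores each candidate configuration against a validation KPI. The KPI below is a stand-in surface rather than a real training run:

```python
import itertools

# Expert-defined search space: domain knowledge bounds the options,
# keeping the automated sweep focused on plausible configurations.
search_space = {
    "learning_rate": [0.01, 0.1],
    "hidden_units": [16, 64],
}

def validation_kpi(config):
    # Stand-in for training plus validation; a real pipeline would fit
    # a model and score held-out data. Here: a deterministic toy surface.
    return -(config["learning_rate"] - 0.1) ** 2 - (config["hidden_units"] - 64) ** 2 / 1e4

def semi_automated_search(space, kpi):
    """Automated part: evaluate every combination and keep the best."""
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*space.values()):
        config = dict(zip(space.keys(), values))
        score = kpi(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config

best = semi_automated_search(search_space, validation_kpi)
```

The point of the sketch is the split: the expert curates what gets searched, the machine does the exhaustive searching.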

The research community’s post-mortem of Google Flu revealed the importance of considering the broader user journey when analyzing search data. Many searches containing “flu” aren’t really relevant to predicting outbreaks. Users might query how bad the flu might be this year, or be influenced to search for information on flu vaccines because of media hype. Based on these insights, in our adverse event detection project, we consciously designed our models to take into account query intent, be it product discovery, pre-purchase research, post-purchase inquiry, or something else. Each type of search provides a different level of signal regarding adverse events. The process for arriving at the best structure for our deep learning architecture was a semi-automated one: guided by expert consideration of the underlying behaviors being analyzed, and augmented by machine learning.

2. Contextualize the machine learning with representation engineering.

Machine learning at a basic level can be described as algorithms learning patterns from data — something that advanced methods such as deep learning are highly adept at doing in increasingly complex, unstructured data environments. However, the way the data is represented to the algorithm is an often overlooked but critical factor in building high-performing machine learning models. Representational richness comes from incorporating highly contextualized constructs tailored to the problem at hand. This is achieved through representation engineering: the intentional mapping of structured and unstructured data into a meaningful, custom data architecture.

We developed a custom user representation to account for individualized user characteristics in our adverse event detection models. For example, when looking for adverse drug events, users who are hypochondriacs and search for information about certain drugs and reactions very frequently might provide relatively less reliable signals for adverse event detection. This may not be readily apparent to machine learning algorithms without carefully constructed representations that add context to individual searches (this person searches for these terms an unusual amount). Representation engineering was also used to incorporate the context of user search intentions discussed earlier. Embedding these custom representations allowed our models to better calibrate for search context, nearly doubling the number of true adverse events detected and reducing false-positive rates by 30%. In other words, even for complex deep learning models like the ones we were using, there was still room for experts to guide how the data was represented for the model to learn from.
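One hypothetical way to encode the “unusually frequent searcher” context as a user-level feature — assuming nothing more than a per-user query log, and not reflecting the authors’ actual representation — is to down-weight users whose search volume is a statistical outlier:

```python
from collections import Counter
from statistics import mean, pstdev

def user_weights(user_queries):
    """Down-weight users whose search volume is an outlier, so a few
    hyperactive searchers don't dominate the adverse-event signal.
    Weight = 1 / (1 + max(z, 0)), where z is the user's volume z-score."""
    counts = Counter(user for user, _ in user_queries)
    mu, sigma = mean(counts.values()), pstdev(counts.values())
    weights = {}
    for user, n in counts.items():
        z = (n - mu) / sigma if sigma else 0.0
        weights[user] = 1.0 / (1.0 + max(z, 0.0))
    return weights

# Invented log: u1 searches the same drug-reaction pair 20 times,
# while u2 and u3 search only a couple of times each.
log = ([("u1", "drug x rash")] * 20
       + [("u2", "drug x rash")] * 2
       + [("u3", "drug x dosage")] * 2)
w = user_weights(log)
```

Here the hyperactive user’s queries would count for less than half as much as anyone else’s, which is the kind of context a raw query stream does not carry on its own.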

3. Balance depth with breadth through data triangulation.

Advanced machine learning methods are highly adept at uncovering patterns from every nook and cranny of a particular data set. AutoML methods have further enhanced this benefit by using transfer learning — a machine learning technique where patterns from a target data set of interest are refined using insights from similar external data. For example, if a firm wants to infer customer sentiments using NLP applied to their call-center transcripts, autoML can leverage comparable transcript data from other firms. However, reliance on a single type of data can lead to under- or overfitting. In one of our other projects on enterprise machine learning, we show that rather than wringing every last drop of insight out of their primary data, firms can achieve significantly better predictive power by integrating complementary sources. For the call center problem, complementary data might include things like audio recordings, product reviews, and satisfaction surveys. In some respects, advancements in machine learning have given us a false sense that large volumes of data coupled with machine learning magic are all that’s needed — the variety and complementarity of the data matters, too.
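A bare-bones sketch of what triangulation could mean mechanically — using invented customer IDs and feature names for the call-center example — is simply joining per-customer features from each complementary source into a single feature vector:

```python
# Hypothetical per-customer features derived from three complementary
# sources; a real pipeline would compute these with NLP, audio
# analysis, and survey processing rather than hard-code them.
transcripts = {"c1": {"call_sentiment": -0.4}, "c2": {"call_sentiment": 0.2}}
reviews = {"c1": {"review_stars": 2}, "c2": {"review_stars": 5}}
surveys = {"c1": {"csat": 3}}  # not every source covers every customer

def triangulate(*sources):
    """Merge feature dictionaries from multiple sources, keyed by
    customer, into one feature vector per customer."""
    merged = {}
    for source in sources:
        for customer, feats in source.items():
            merged.setdefault(customer, {}).update(feats)
    return merged

features = triangulate(transcripts, reviews, surveys)
```

Even this trivial join shows the two practical issues triangulation raises: aligning entities across sources and handling customers that some sources miss.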

In our project, we recognized that beyond a point, collecting data on more queries over a longer period provided rapidly diminishing returns. Our major breakthrough came when we partnered with a firm that maintains large online user panels with anonymized data at the intersection of user characteristics, searches, and web browsing activity. Seeing which searches resulted in actual visits to certain types of websites was critical to understanding individual query intent. Analyzing the interplay between users’ search and browsing activities over time enabled us to account for user diversity. Relative to existing detection capabilities, these breakthroughs stemming from data triangulation resulted in models that detected three times as many true adverse events, with three times the precision.
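The search-to-browse linkage could, illustratively, be framed as a time-windowed join. The events, site categories, and labeling rule below are invented for the sketch and far simpler than anything a production system would use:

```python
from datetime import datetime, timedelta

# Hypothetical panel events: (user, timestamp, kind, payload)
events = [
    ("u1", datetime(2020, 1, 1, 9, 0), "search", "drug x side effects"),
    ("u1", datetime(2020, 1, 1, 9, 2), "visit", "fda.gov"),
    ("u2", datetime(2020, 1, 1, 9, 0), "search", "drug x coupon"),
    ("u2", datetime(2020, 1, 1, 9, 1), "visit", "pharmacy-deals.example"),
]

def label_intent(events, window=timedelta(minutes=5)):
    """Label each search by the sites visited shortly afterwards:
    a regulatory/medical site suggests safety research, any other
    follow-up visit suggests purchase intent (illustrative rule only)."""
    medical_sites = {"fda.gov", "nih.gov"}
    searches = [e for e in events if e[2] == "search"]
    visits = [e for e in events if e[2] == "visit"]
    labels = {}
    for user, t, _, query in searches:
        followups = [site for u, tv, _, site in visits
                     if u == user and t <= tv <= t + window]
        if any(site in medical_sites for site in followups):
            labels[(user, query)] = "safety_research"
        elif followups:
            labels[(user, query)] = "purchase"
        else:
            labels[(user, query)] = "unknown"
    return labels

labels = label_intent(events)
```

Identical query text can thus earn different intent labels depending on what the user did next, which is exactly the signal search logs alone cannot provide.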

Concepts such as explainable AI, algorithmic bias, and privacy by design have provided lessons on the limitations of artificial intelligence. In the same vein, there needs to be more discussion of how to maximize value from automated machine learning. Haphazardly applying it without proper expertise, contextualization, and data complementarity is unlikely to produce the desired results. In our project, incorporating these concepts enabled us to detect hundreds of adverse product events that had previously been difficult to identify, three to four years earlier on average and with markedly fewer false positives. Following an augmentation approach to machine learning allowed us to develop a robust AI capability that was less artificial and more intelligent.