

Here’s a scenario common to organizations applying machine learning in their business processes. The business and technical teams have aligned on a general problem statement for a machine learning task. Everyone is excited, and the technical team goes off for a few months and experiments with different algorithms on available data, eventually converging on an algorithm they believe achieves the highest performance on the agreed-upon metrics. Proud of their work, they bring results back to the business to integrate into a business process or implement as a feature in a software product.

Before deployment, however, their algorithmic model must be reviewed by a governance team to ensure it satisfies risk-management requirements. The governance team seeks rigorous documentation concerning requirements that the technical team never considered: Can we explain why the algorithm derives its outputs from its inputs? What controls does the system use to protect client privacy? How stable are the input data over long periods of time, especially if we only plan to retrain the model once a month, or even once a year? Can we ensure the algorithm produces fair results across the population of affected clients?

The technical team retorts that no one told them about these requirements, so they didn’t consider them during development. Frustrated, they start again, this time constraining choices about possible algorithms and input data to ensure that the newly articulated risk management requirements are satisfied. Time and effort are wasted. Timelines stretch. Executives wonder why things are taking so long and become anxious they will lag behind competitors.


This lack of communication and coordination results from gaps in the knowledge that business and technical stakeholders have about what it takes to make machine learning work in real-world applications, as well as residual waterfall approaches to technical project management. As the field is still young, many machine-learning developers lack experience in building enterprise applications, and many business stakeholders have insufficient knowledge of machine learning to know what questions to ask as they scope and manage projects. To innovate effectively, project owners need to know what trade-offs and decisions they’ll face while building a machine learning system, and when they should assess these trade-offs to minimize frustration and wasted effort.

Let’s start with the what. Making machine learning work in a business context often requires a series of decisions and trade-offs that can impact model performance. At the heart of the matter lies the structure of machine-learning algorithms, which use data to learn approximate mappings between inputs and outputs that are useful for a business. With standard software, programmers write specific instructions that execute the same operations every time; the trade-off is that these instructions are limited to what can be explicitly articulated in code. With machine learning, by contrast, programmers specify the goal of the program and write an algorithm that helps the system efficiently learn the best input-output mapping from available data to achieve this goal, rather than selecting a particular mapping from the get-go. This approach enables us to tackle scenarios where it’s much harder to write the precise rules (e.g., image recognition, textual analysis, generating video sequences), but the trade-off is that the system’s output can be unpredictable or unintuitive. From this inductive foundation arise many of the nuances of applying machine learning systems, especially for business and technology teams who are used to rules-based computer systems.
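The contrast between explicit instructions and a learned mapping can be made concrete with a toy sketch. This is an invented illustration, not from the article: the scenario, names, and numbers are hypothetical, and the "learning" here is just an exhaustive threshold search standing in for a real training algorithm.

```python
# Rules-based: the programmer states the input-output mapping explicitly.
def approve_rule(income: float) -> bool:
    return income >= 50_000  # threshold chosen by a human, fixed in code

# Learning-based: the programmer states the goal (minimize misclassifications)
# and the program searches for the mapping that best fits the available data.
def fit_threshold(examples):
    """Pick the income threshold that misclassifies the fewest examples."""
    best_t, best_errors = None, float("inf")
    for t in sorted(x for x, _ in examples):
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical labeled data: (income, was the outcome positive?)
data = [(30_000, False), (45_000, False), (52_000, True), (80_000, True)]
learned_t = fit_threshold(data)  # threshold inferred from data, not hand-picked
```

The learned threshold depends entirely on the data it sees, which is exactly why its behavior can shift in ways a hand-written rule never would.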

Often these nuances come to life as trade-offs business teams need to make during system development. For instance, teams may find that the objective the business seeks to predict or optimize cannot be easily measured with available data, so they resort to a measurable proxy. The concern here is that the system will learn to predict this and only this proxy, not the true objective; the further the proxy is from the objective, the less useful the system’s output will be for the business.

Other trade-offs weigh model accuracy against concerns like explainability, fairness, and privacy. If the business is hesitant to adopt a model without an explanation of how and why it maps inputs to outputs, the technical team could constrain the set of potential solutions to algorithms that afford better explainability, but this may come at the cost of reducing model performance. Similarly, if the business is concerned that the system could propagate unfair outcomes to certain client segments, or that the system could expose sensitive user data, the technical team could restrict their attention to algorithms that ensure fairness and privacy, which could also impact performance.

Sometimes inherent error rates mean it’s best to cut losses early. This occurs particularly in contexts where the cost of making a mistake is high (e.g., user trust could be eroded, or users expect certainty). The technical team could invest heavily in system design to improve accuracy or implement human-in-the-loop decision-making, but if the costs of these investments exceed potential benefits, cutting losses early may be the best solution.

These considerations are too nuanced to be managed in a single working session. Instead, project owners should engage business, end-user, technical, and governance teams in iterative dialogue throughout the system development process.

At Borealis AI, we break down this process into the following lifecycle:

1. Design: Define the problem and articulate the business case. Determine the business’ tolerance for error and ascertain which regulations, if any, could impact the solution.

2. Exploration: Conduct a feasibility study on the available data. Determine whether the data are biased or imbalanced, and discuss the business’ need for explainability. This phase may require iterating back to the design phase, depending on the answers to these questions.

3. Refinement: Train and test the model (or several potential model variants). Gauge the impact of fairness and privacy enhancements on accuracy.

4. Build and ship: Implement a production-grade version of the model. Determine how frequently the model must be retrained and whether its output must be stored, and how these requirements affect infrastructure needs.

5. Measure: Document and learn from the model’s ongoing performance. Scale it to new contexts and incorporate new features. Discuss how to manage model errors and unexpected outcomes. This phase may require iterating back to the build-and-ship phase, depending on the answers to these questions.

During the design phase, the first part of the process, we seek to define the business goal, translate that goal into a measurable objective, and determine whether there will be sufficient business impact to justify development effort. A key part of this phase is to assess the gap between what we want to measure and what we can measure, and ensure the business is clear on the consequences of the gap between a desired outcome and the available proxy. Most often, businesses struggle to define the precise metrics an algorithm must hit to meet business or user needs; this frustrates technical teams, who seek a clear baseline to determine whether a task is feasible. Teams can manage this by frequently communicating their results to the business during the design and exploration phases, and, where necessary, cutting projects early when anticipated error rates or insufficient economic impact warrant it. For example, an auditing firm or a bank seeking to automate a portion of a compliance process may decide early on that a 70% prediction accuracy rate is too risky for the certainty they require in their business processes.
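The go/no-go call in the compliance example can be framed as a back-of-envelope expected-cost check. All figures below are invented for illustration; the point is only that when each error is far costlier than each correct prediction is valuable, even a seemingly decent accuracy rate can fail the business case.

```python
# Hypothetical expected-value check: does automation at a given accuracy
# clear the business' error tolerance? (All numbers are made up.)
def expected_net_benefit(accuracy, cases, saving_per_correct, cost_per_error):
    correct = accuracy * cases
    errors = (1 - accuracy) * cases
    return correct * saving_per_correct - errors * cost_per_error

# A 70%-accurate compliance model where each mistake costs 10x what each
# correct prediction saves: the expected net benefit is deeply negative,
# supporting the decision to cut losses early.
net = expected_net_benefit(accuracy=0.70, cases=10_000,
                           saving_per_correct=5.0, cost_per_error=50.0)
```

Running the same check with a higher accuracy or a lower error cost shows where the break-even point sits, which is a useful artifact to bring into the design-phase conversation.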

Questions around explainability, fairness, and privacy should be addressed during the exploration and refinement phases, when technical teams experiment with different model variations to determine the best approach to solving the business problem. Key to note is that not all algorithms are created equal (at least today, as this may change with future research): some are more explainable than others, and some admit privacy-preserving variants more easily than others. As such, what matters most is that technical teams know the constraints in advance, not after they’ve already invested time in building an algorithm that fails to meet business requirements. Governance teams should engage in dialogue during the feasibility and model refinement phases; this doesn’t mean pre-production validation should disappear, but that teams shouldn’t wait until the end of the process to start the dialogue. Finally, it’s sometimes hard to make a call on whether to trade off accuracy for privacy, fairness, or explainability until the impact on the system’s performance can be measured. As such, technical teams can build a few candidate solutions and present options to a business leader to make the call on which system to advance to production.
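Presenting candidate solutions for a business decision can be as simple as a shortlist filtered by the governance constraint and ranked on the agreed metric. The model names and accuracy figures below are invented; the sketch only shows how a constraint such as explainability changes which candidate comes out on top.

```python
# Hypothetical candidates with their measured trade-offs (numbers invented).
candidates = [
    {"name": "logistic_regression", "accuracy": 0.86, "explainable": True},
    {"name": "gradient_boosting",   "accuracy": 0.91, "explainable": False},
    {"name": "decision_tree",       "accuracy": 0.84, "explainable": True},
]

def shortlist(candidates, require_explainable):
    """Filter by the governance constraint, then rank by accuracy."""
    eligible = [c for c in candidates
                if c["explainable"] or not require_explainable]
    return sorted(eligible, key=lambda c: c["accuracy"], reverse=True)

# If the business insists on explainability, the top eligible model changes,
# and the accuracy gap between the two shortlists is the price of the constraint.
best_constrained = shortlist(candidates, require_explainable=True)[0]
best_unconstrained = shortlist(candidates, require_explainable=False)[0]
```

Framing the options this way gives the business leader a concrete number for what the constraint costs, rather than an abstract trade-off.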

Much of this will feel familiar to teams accustomed to applying agile methods in software development projects. The inductive nature of machine learning systems poses a few new, nuanced questions, but the principles of iterative, cross-functional communication remain more important than ever. Most of the time, making a solution work requires compromise from business, technical, and governance teams. Contrary to popular narratives, we might conclude that the practice of applying machine learning in a way that upholds an organization’s principles and values is a fertile ground for reasoning and communicating. Doing it well requires us to exercise the skills that make us human.