In October 2019, researchers from several universities published a damning finding: A commercial algorithm widely used by health organizations was biased against black patients.

The algorithm, later identified as being provided by health-services company Optum, helped providers determine which patients were eligible for extra care. According to the researchers' findings, the algorithm gave higher priority to white patients when it came to treating complex conditions, including diabetes and kidney problems.

This is one of several recent stories involving algorithmic bias: the tendency of artificial intelligence to make decisions that give an unfair advantage to a certain group or demographic. Algorithmic bias can manifest in many fields, but in medicine it can be deadly.

Your Data Is Biased

Most recent improvements in AI systems stem from advances in machine learning and deep learning. Unlike traditional AI systems, which were built from manually crafted software rules, deep-learning systems develop their behavior by examining many examples.

For instance, to develop a deep-learning system that predicts breast cancer, AI engineers created a base algorithm and fed it mammograms annotated with the patient outcome—cancer or no cancer. The algorithm processed the examples and found common patterns that characterize cancerous and non-cancerous scans. It then used this information to make predictions for mammograms it hadn't seen before.
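
To make the process concrete, here's a minimal sketch of that kind of training loop in PyTorch. The tiny network and random tensors are placeholders for a real architecture and a real annotated dataset, not the system described above.

```python
# Minimal sketch of supervised training for a binary image classifier.
# The toy CNN and random tensors are hypothetical stand-ins for a real
# architecture and annotated mammograms.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 1),
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for a labeled dataset: grayscale scans plus cancer/no-cancer labels.
images = torch.randn(64, 1, 128, 128)
labels = torch.randint(0, 2, (64, 1)).float()

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(images)
    loss = loss_fn(logits, labels)  # compare predictions to known outcomes
    loss.backward()                 # nudge weights toward the common patterns
    optimizer.step()

# At inference time, the trained model scores a scan it has never seen.
with torch.no_grad():
    new_scan = torch.randn(1, 1, 128, 128)
    probability = torch.sigmoid(model(new_scan)).item()
```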

In a few areas, such as radiology and medical-image analysis, AI algorithms have surpassed human performance. But deep-learning algorithms suffer from a fundamental problem: They often adopt unwanted biases found in the data on which they're trained. If the training data is limited to a certain group of people, the resulting model will perform less accurately for other demographics.
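
A simple first line of defense is to report a model's accuracy per demographic group rather than as a single aggregate number, which can hide a large gap. A sketch using hypothetical arrays:

```python
# Sketch: disaggregate a model's accuracy by demographic group instead of
# reporting one aggregate figure. The arrays are hypothetical stand-ins.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                 # true outcomes
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])                 # model predictions
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # demographic label

for g in np.unique(group):
    mask = group == g
    acc = (y_true[mask] == y_pred[mask]).mean()
    print(f"group {g}: accuracy {acc:.2f} (n={mask.sum()})")
```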

"Datasets collected in North America are purely reflective and lead to lower performance in different parts of Africa and Asia, and vice versa, as certain genetic conditions are more common in certain groups than others," says Alexander Wong, co-founder and chief scientist at DarwinAI.

For instance, several studies have found skin cancer–detection algorithms to be less accurate when used on dark-skinned patients, in part because AI models were trained mostly on images of light-skinned patients.

When possible, engineers of AI systems take steps to reduce and remove bias. But machine-learning algorithms often latch onto proxy features: variables that indirectly encode the very biases engineers tried to remove.

The developers of the health-management system mentioned earlier had removed race information from the data the AI used to make decisions. But the algorithm selected past health-care spending as one of the factors determining its output. Because black patients historically incur lower health-care costs for socioeconomic reasons, spending effectively became a proxy for race and disadvantaged black patients.
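
Removing the sensitive column isn't enough if another feature still encodes it. One simplified check (a sketch on synthetic data, not a description of Optum's pipeline) is to test how well a retained feature predicts the removed attribute:

```python
# Sketch: check whether a retained feature (e.g., past spending) acts as a
# proxy for a removed sensitive attribute. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)  # 0/1 sensitive attribute, removed from model inputs
# In this toy setup, spending is confounded with race for socioeconomic reasons.
spending = rng.normal(loc=5000 - 1500 * race, scale=800, size=n)

# If spending alone predicts race well above chance, it is a likely proxy.
clf = LogisticRegression().fit(spending.reshape(-1, 1), race)
auc = roc_auc_score(race, clf.predict_proba(spending.reshape(-1, 1))[:, 1])
print(f"AUC of spending predicting race: {auc:.2f} (0.5 = no proxy signal)")
```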

Lack of Transparency

While most studies on algorithmic bias focus on known factors such as gender, race, and age, several show that machine-learning algorithms can also pick up hidden biases that are difficult to identify but can be equally damaging. The problem is that machine-learning models are often black boxes, offering very little visibility into their inner workings, so it's difficult even for their creators to find and fix problematic biases.

For example, skin-cancer-detection algorithms are usually trained on images of malignant moles and healthy skin. But photos of skin cancer often include a ruler to indicate the size of the mole, while photos of healthy skin rarely contain such objects. An AI system trained on these images might learn to detect rulers instead of malignant moles. Without visibility into which features the model considers salient, it's hard to tell whether it has tuned in to the right ones.
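
One basic auditing technique is a gradient saliency map: take the gradient of the model's output score with respect to the input pixels to see which regions drive the prediction. A minimal sketch, with a placeholder model and image rather than a real diagnostic network:

```python
# Sketch: gradient saliency to see which pixels drive a classifier's output.
# Saliency concentrated on a ruler, rather than the lesion, would be a red
# flag. Model and image here are hypothetical placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in photo
score = model(image).sum()
score.backward()                                          # d(score)/d(pixels)

# Per-pixel saliency: gradient magnitude, maxed over color channels.
saliency = image.grad.abs().max(dim=1).values.squeeze()   # shape (224, 224)
print("most influential pixel (row, col):", divmod(saliency.argmax().item(), 224))
```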

Machine-learning algorithms can also become sensitive to irrelevant correlations in health data. In one case, a hospital readmission algorithm gave lower risk scores to patients with asthma. The program, touted to outperform expert doctors, would recommend hospitalizing a patient with pneumonia but would clear the same person if they had both pneumonia and asthma. The likely reason: in the historical data, asthmatic pneumonia patients were routinely given aggressive care, so their recorded outcomes were better, and the algorithm mistook the effect of that treatment for lower risk.
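
A toy simulation shows how such a confound plays out: if asthmatic patients received aggressive, unrecorded care and therefore had better outcomes, a model trained on those outcomes learns a negative risk weight for asthma. This is a hypothetical reconstruction of the effect, not the actual hospital's data:

```python
# Sketch: how a confound makes asthma look protective. In this synthetic
# data, asthmatic patients get intensive (unrecorded) care that lowers their
# death rate, so a model trained on recorded outcomes alone learns a
# negative "risk" weight for asthma.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
asthma = rng.integers(0, 2, n)
severity = rng.normal(size=n)
# Aggressive unrecorded care cuts mortality for asthmatics in this toy world.
p_death = 1 / (1 + np.exp(-(severity - 2.0 * asthma)))
died = rng.random(n) < p_death

X = np.column_stack([severity, asthma])
clf = LogisticRegression().fit(X, died)
print("learned asthma weight:", clf.coef_[0][1])  # negative: "lower risk"
```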

"One must understand how and why decisions are made the way they are made by the AI algorithm in order to identify biases and devise strategies for addressing them," says Wong, whose company specializes in creating explainable AI models. "Explainability also allows us to build trust in the AI algorithm, which is key in the healthcare system."

Can It Be Fixed?

"As an AI community, we need to come together to share best practices, processes, and tools that will ensure fairness, inclusivity, reliability, and transparency while maintaining privacy and driving accountability across development and deployment," says Shantanu Nigam, CEO and co-founder of Jvion, a healthcare AI company.

Some efforts are underway to address bias and fairness in AI-based healthcare systems. Last year's NeurIPS conference ran a workshop on fairness in machine learning for health applications, with several papers exploring how to assess algorithmic fairness, discover proxies, and calibrate algorithms for subpopulations. And the Alliance for Artificial Intelligence in Healthcare, a nonprofit organization founded in December 2018, brings together developers, device manufacturers, researchers, and other professionals to advance the safe and fair use of AI in medicine.

Some organizations have started baking inclusivity and fairness into the data gathering, training, and testing of their AI algorithms. For instance, Google recently tested its AI breast-cancer-screening tool on data from different geographical regions to verify that it performs equally well across them.

Kush R. Varshney, principal research staff member and manager at IBM Research AI, believes increasing transparency and cooperation in the process of developing and releasing healthcare AI systems can help improve fairness. "The best practices and governance of AI in healthcare should include the release of factsheets containing fairness test results and should involve multi-stakeholder participation on validating the entire AI lifecycle and also the organizational/human processes that surround the AI system," he says.

"We know that machine-learning models are, by their very nature, meant to statistically discriminate on all sorts of features in order to generalize to new, unseen patients," Varshney says. We just have to make sure they don't discriminate in other ways.
