Between 2014 and 2017, Amazon tried to build an algorithmic system to analyze resumes and suggest the best hires. One Amazon employee, speaking anonymously, said it would have been the "holy grail" if it had actually worked.

But it didn't. After the company trained the algorithm on 10 years of its own hiring data, the algorithm reportedly became biased against female applicants. Resumes containing the word "women," as in "women's sports," were specifically ranked lower. Amazon engineers attempted to fix that problem, but the algorithm still wasn't up to snuff, and the project was ended.

Amazon's story has been a wake-up call about the harm machine learning systems can cause when they are deployed without fully considering their social and legal implications. But Amazon wasn't the only company working on this technology, and companies that want to embrace it without the proper safeguards could face legal action over an algorithm they can't explain.

Numbers don’t always tell the truth

Mark J. Girouard, an employment attorney at Nilan Johnson Lewis, says one of his clients was vetting a company that sold a resume screening tool, but didn't want to make a decision until it knew what the algorithm was prioritizing in a person's CV.

When the resume screening company audited its algorithm, it found that the algorithm had settled on two factors as most indicative of job performance: whether the applicant was named Jared, and whether they had played lacrosse in high school. Girouard's client did not use the tool.

“It’s a really great representation of part of the problem with these systems, that your results are only as good as your training data,” Girouard said. “There was probably a hugely statistically significant correlation between those two data points and performance, but you’d be hard-pressed to argue that those were actually important to performance.”

The community of researchers and technologists studying artificial intelligence has warned that this can happen with any similar AI algorithm that learns about people from historical data.

In 2016, Pinboard creator Maciej Cegłowski called machine learning “money laundering for bias.”

“It’s a clean, mathematical apparatus that gives the status quo the aura of logical inevitability. The numbers don’t lie,” Cegłowski said.

It's only natural that machine learning would be applied to this problem. Reading through dozens or hundreds of resumes is a tedious, complicated task, in which workers have to pick up on subtle clues to tell whether a candidate would be both qualified and a fit with the company's culture.

People tend to assume that work done by a machine is better than the same work done by a human, a well-studied phenomenon called "automation bias." In hiring, it helps explain why many companies are pitching their AI-based tools as a solution to human bias.

“The basic premise on which this technology is based is that humans are flawed, computers can do things better,” says Raymond Berti, an employment attorney at Akerman LLP. “Obviously things aren’t that simple. We’re not at a point where employers can sit back and let computers do all the work.”

Girouard and Berti both say that under US regulations, companies are held accountable for the hiring decisions they ultimately make, which leaves them responsible for the tools they use. Equal Employment Opportunity Commission regulations even require that the data used to reach a hiring decision be kept in case of a bias claim, meaning a company could be liable even if it doesn't know why an algorithm chose one candidate over another.

To avoid dealing with biased resume data, a slate of new companies is turning to organizational psychology, the field that has informed the best practices human recruiters follow when evaluating job candidates.

Startups like Plum and pymetrics add a step to the application process that includes surveys, digital tasks, and games meant to build a profile of the candidate’s personality, like how detail-oriented or risk-averse they might be.

An algorithm then analyzes the person's results and compares them with the profiles of high-performing employees currently doing the job.
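Neither company spells out its model here, so what follows is only a minimal sketch of the general idea. It assumes each candidate's surveys, tasks, and games produce a vector of trait scores between 0 and 1 (the trait names are hypothetical), and that "compared to high performers" means measuring how close a candidate sits to the average profile of current top employees. The similarity formula is an illustrative choice, not Plum's or pymetrics' method.

```python
import math
from statistics import mean

# Hypothetical trait scores (0-1) derived from surveys, tasks, and games.
TRAITS = ["detail_orientation", "risk_tolerance", "persistence"]


def benchmark_profile(top_performers: list[dict[str, float]]) -> dict[str, float]:
    """Average trait profile of current high-performing employees."""
    return {t: mean(p[t] for p in top_performers) for t in TRAITS}


def fit_score(candidate: dict[str, float], benchmark: dict[str, float]) -> float:
    """Similarity in (0, 1]; 1.0 means the candidate matches the benchmark exactly.

    Uses 1 / (1 + Euclidean distance), an illustrative choice only.
    """
    dist = math.sqrt(sum((candidate[t] - benchmark[t]) ** 2 for t in TRAITS))
    return 1.0 / (1.0 + dist)


# Toy data for illustration: two current top performers.
top = [
    {"detail_orientation": 0.8, "risk_tolerance": 0.4, "persistence": 0.9},
    {"detail_orientation": 0.6, "risk_tolerance": 0.6, "persistence": 0.7},
]
bench = benchmark_profile(top)

close = {"detail_orientation": 0.7, "risk_tolerance": 0.5, "persistence": 0.75}
far = {"detail_orientation": 0.1, "risk_tolerance": 0.9, "persistence": 0.2}
```

One consequence of this design: if current top performers happen to share traits that have nothing to do with the job, a benchmark built from them will reward those traits too, the same training-data problem Girouard describes.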

“It blew my mind that there are 10,000 industrial organization psychologists in the world, they go to school, they get Ph.Ds in this, there’s a whole body of predictive science out there that tells us what predicts a top performer and what doesn’t, but yet 98% of the world is using poor quality, crap data that does not predict that, and only introduces a boatload of bias,” says Caitlin MacGregor, CEO of Plum.

However, this is a departure from the norms societies have accepted as part of the hiring process, and novel approaches have turned out to be duds in the past. Google's infamous brainteasers, meant to make candidates demonstrate creative problem-solving, were later called "a complete waste of time" by Laszlo Bock, Google's SVP of people operations.

Pymetrics, a startup that promises bias-free results in bold letters on its website's homepage, also validates each algorithm before it goes into use. CEO Frida Polli tells Quartz that pymetrics maintains a dataset of 50,000 previous candidates, including their race and gender, and runs every algorithm on that test set first. That way, if an algorithm favors a certain group by gender or race, the company can figure out what's wrong and correct it.

"It's relatively straightforward. Some people say you should audit the algorithm or comb through the code, but that doesn't necessarily tell you whether it's going to give unbiased results," Polli said. "But if you're pretesting it, that's probably the most straightforward solution."
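Polli doesn't say what threshold counts as favoring a group. One common yardstick in US hiring is the four-fifths rule from the EEOC's Uniform Guidelines: a group whose selection rate is less than 80% of the highest group's rate is generally regarded as showing adverse impact. Below is a minimal sketch of a pretest built on that rule, assuming a labeled test set of past candidates and a model exposed as a simple yes/no function. The function names, toy model, and data are hypothetical; this is not pymetrics' code.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical record: (candidate features, demographic group label).
Candidate = tuple[dict[str, float], str]


def selection_rates(model: Callable[[dict[str, float]], bool],
                    test_set: Iterable[Candidate]) -> dict[str, float]:
    """Fraction of each group that the model would advance."""
    selected: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for features, group in test_set:
        total[group] += 1
        if model(features):
            selected[group] += 1
    return {g: selected[g] / total[g] for g in total}


def adverse_impact(rates: dict[str, float], threshold: float = 0.8) -> dict[str, float]:
    """Return groups whose rate is below `threshold` times the highest rate,
    mapped to their impact ratio (group rate / highest rate)."""
    best = max(rates.values())
    if best == 0:
        return {}  # nobody selected: no ratio to compare
    return {g: r / best for g, r in rates.items() if r / best < threshold}


# Toy model and test set, for illustration only.
def toy_model(features: dict[str, float]) -> bool:
    return features["score"] >= 0.5


test_set = [
    ({"score": 0.9}, "group_a"), ({"score": 0.7}, "group_a"),
    ({"score": 0.6}, "group_a"), ({"score": 0.2}, "group_a"),
    ({"score": 0.8}, "group_b"), ({"score": 0.3}, "group_b"),
    ({"score": 0.4}, "group_b"), ({"score": 0.1}, "group_b"),
]
rates = selection_rates(toy_model, test_set)
flagged = adverse_impact(rates)
```

Here the toy model advances 75% of group_a but only 25% of group_b, an impact ratio of one-third, so the check flags group_b before the model is ever used on real applicants. A flag like this doesn't explain why the model is skewed; it only tells engineers where to look, which is the appeal of testing outcomes rather than combing through code.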