It has been a little over a year since the death of 6-year-old Zymere Perkins.

The boy, who died in Harlem at the hands of his mother's boyfriend, had been smacked as many as 20 times in a row in front of witnesses, beaten with a belt, placed under cold showers, and denied food as a punishment. In addition to bruises and broken bones, he was missing all of his front teeth. But apparently all his mother had to do was tell the city social workers that he had fallen—down the stairs, off a scooter, whatever—and they would close the case.

According to a report released by the New York State Office of Children and Family Services in December 2016, 10 children died in the 12 weeks before Perkins, despite each being the subject of at least four abuse or maltreatment complaints.

The New York City Administration for Children's Services (ACS) has since undergone an overhaul, installing a new commissioner and instituting greater measures of accountability for its employees. But to anyone who has been keeping track of such cases, the outrage and the plans for reform will sound sadly familiar.

Ten years before Zymere Perkins, there was Nixzmary Brown, a 7-year-old girl from the Bedford-Stuyvesant section of Brooklyn who was tortured and murdered by her stepfather. In that case, too, ACS had been made aware of her situation at least twice before the fatal beating. Her school had reported her absent for weeks at a time. Neighbors said she sustained unexplained injuries, including a black eye, and that she seemed undernourished, weighing less than 45 pounds at the time of her death.

In the wake of that case, Mayor Michael Bloomberg ordered the ACS to reopen numerous files. New York state legislators stiffened penalties to allow parents in such cases to be charged with first-degree murder. Efforts to publicize the city's child abuse hotline were expanded.

Most metropolitan child welfare bureaucracies have been through such a process at least once in the past two decades. From Gabriel Fernandez, the 8-year-old boy killed by his mother and her boyfriend in Los Angeles in 2013, to Danieal Kelly, the cerebral palsy–stricken girl who died in 2006 at age 14 after almost a decade of investigations by Philadelphia case workers into her mother's failure to feed or bathe her, to 2-year-old Tariji Gordon of Sanford, Florida, who had been sent back to live with her mother after her twin brother suffocated, only to be found in 2014 dead and buried in a suitcase, there is always a shocking case that leads to a public outcry and then reform.

But whether it's pumping more money into the system or simply installing a no-nonsense leader at the top, few of these changes seem to make a difference. All too soon, things go back to the way they were.

In Out of Harm's Way (Oxford University Press), published last year to much less fanfare than it deserved, sociologist Richard Gelles offers a devastating account of how little effect bureaucratic reforms usually have. More money, more staff, more training, more lawsuits brought against child protective services (CPS), or the ever-popular convening of more "blue-ribbon" committees—nothing has really moved the needle on protecting children in recent years. In some cases, reform amounts to little more than changing the name of the agency.

Some 3 million children are the subject of maltreatment investigations each year, and 700,000 of those cases are substantiated. There are about 2,500 child fatalities due to abuse or neglect by a parent or caregiver in the U.S. annually, and about half of those are cases child welfare agencies were aware of beforehand. "If CPS agencies cannot protect the children they already know to be at risk," Gelles asks in his book, "whom can they protect?"

For too long, we've been asking undertrained social workers to make high-stakes decisions about children and families based on patchy data and gut intuition. The result is a system riddled with the biases, inattention, bad incentives, error, and malice that plague all human endeavors, but especially massive government bureaucracies. Every day, some kids are forcibly taken from their parents for the wrong reasons while others are left to suffer despite copious warning signs. The children the system is failing are disproportionately poor and members of racial minority groups. In many cases, their families have been devastated by generations of family breakdown, unemployment, drug abuse, and crime. But these cannot be excuses for leaving their fates to a system with such deep and abiding flaws.

Conservatives have too often thrown up their hands, arguing that government cannot replace the family and there is not much to be done until the institution of marriage is repaired. Liberals, meanwhile, have suggested that these problems can't be fixed until we end poverty and racism. In the meantime, bureaucrats are bumbling into the lives of too many families just trying to do their best while leaving some of the most vulnerable children in society unprotected.

We can do better.

2

Gelles is a Boston native, and Red Sox paraphernalia line the shelves and windowsills of his office at the School of Social Policy and Practice at the University of Pennsylvania. He also seems to have a thing for old-fashioned typewriters, which must look like ancient artifacts to his students. But his work is forward-looking, not nostalgic. Gelles is one of the biggest boosters of predictive analytics in the child welfare field.

Predictive analytics refers to the use of historical data to forecast the likelihood of future events. Thanks to powerful computers, statisticians in recent years have been able to develop models for understanding, for instance, the probability that a criminal out on parole might commit a crime again. In the area of child welfare in particular, proponents are enthusiastic about the prospect of getting better information about children at risk. There is so much we don't, and can't, know about what goes on inside of families that child welfare workers are largely flying blind. But big data has the potential to tell the likelihood that a child will be subject to neglect, abuse—or worse.

Civil libertarians have serious concerns about the collection and use of these data. Studies have suggested that the models used to forecast criminal behavior overpredict crime by blacks and Latinos. There are also questions about what kinds of data are being collected and how. And then there are researchers who question the accuracy of such models altogether. But at least in the current pilot programs, there is no new intrusion into people's lives. The goal is to do a better job of crunching the numbers that are already being collected, whether by the reporters of abuse or by welfare agencies and school systems. In this field, predictive analytics are being used to determine how urgently problems should be investigated and how resources like preventive services are allocated. For people who spend every day studying the tragic details of the kids who fall through the cracks, this new method is the most dramatic development in decades.

It's easy to see how spending so much time reading about these cases—let alone working directly with victims of the worst sorts of abuse—could turn someone hypervigilant. But while Gelles is a self-described "safety hawk" when it comes to protecting kids, he isn't some Law & Order: Special Victims Unit hysteric who sees predators lurking around every corner. He simply believes that if social workers want to get better at their jobs, they should study math and statistics.

"There are patterns, and if you get enough data and you run it through enough iterations, you will find the pattern in what appears to be chaos," he says. This formulation will be familiar to those who have read Moneyball, journalist Michael Lewis' account of how big data allowed the Oakland Athletics baseball club to sign better players for less money and win championships against much wealthier teams.

In fact, at the beginning of his 2016 book The Undoing Project (Norton), Lewis lightly mocks all the other uses people have found for predictive analytics in the years since Moneyball was published. "In the past decade or so, a lot of people have taken the Oakland A's as their role model and set out to use better data and better analysis of those data to find market inefficiencies," he writes. "I've read articles about Moneyball for Education, Moneyball for Movie Studios, Moneyball for Medicare, Moneyball for Golf, Moneyball for Farming, Moneyball for Book Publishing (!), Moneyball for Presidential Campaigns, Moneyball for Government, Moneyball for Bankers, and so on."

And maybe we have overinvested in the promise of big data. In an October article in Slate, Will Oremus criticized our society's "fetishization" of these efforts and suggested they ultimately "went bust."

But before we write off sophisticated number crunching as yesterday's news, Gelles hopes we'll at least consider whether it could improve a field whose success or failure has life-altering, and even life-saving, implications for children—one that has experienced very little improvement in the past few decades.

3

Proponents of predictive analytics believe social workers still have an important role to perform in the system. Gelles has spent the better part of four decades training them. But he worries that we're not giving these actors the tools they need to do their jobs well. (He has also made his share of enemies among social workers, to judge by the responses I've received when I've written columns that quote him.)

With a Bostonian inflection, Gelles says it's preposterous to send out "23-year-old ahhht history majors" to determine which children are at risk of abuse. "Even the state-of-the-art assessment tools being used in New York are no better at predicting risk for a child than if you flipped a coin." He describes social workers' "expertise" as "simply inadequate" to the task of knowing when children should be forcibly removed from their homes, writing that these decisions inevitably "are influenced by the worker's personal characteristics, biases and experiences, which lead to a variety of problems concerning the reliability and validity of predicted risk."

But until the 1990s, case workers' clinical judgment was the only thing they had to go on.

One such case worker was Kim Dennis. After graduating from Bowdoin College with a degree in sociology in 1979, Dennis went to work for Maine's department of child services in the town of Caribou. "My mother told me I was good with people," she says. "I thought I could help them solve their problems."

Dennis recalls receiving almost no training. Mostly, she visited the trailer homes of pregnant teenagers, instructed them in proper breathing techniques for when they gave birth, and told them about the public services they were entitled to receive.

Every once in a while, though, she would be sent out to investigate a claim of child abuse. "I would knock on the door and say, 'What are you doing to your kid?' They would say, 'No, I'm not doing anything.' And I would leave," she says. "My feeling was that we were prying into people's lives. If it had been something serious, I wouldn't have known. And if I had known, I wouldn't have known what to do." In retrospect, "the only thing I did know was that I had no idea what I was doing." She quit a few months later.

Today, Dennis is president of the Searle Freedom Trust. Her first job—the one where she was, in Irving Kristol's famous words, mugged by reality—made her skeptical of the power of government to solve people's problems. At Searle, she has helped to support projects that improve people's lives through private means. (Searle is also a supporter of Reason Foundation, the nonprofit that publishes this magazine.)

In the late '90s, child welfare agencies started using actuarial risk assessment models to determine which children were at a heightened risk of abuse. "The argument being, of course, actuary risk assessment is used in many, many industries that manage to make a profit in making the right decisions," Gelles explains, "so why wouldn't I use actuary decision making in these kinds of decisions?"

Structured Decision Making (SDM), which is based on these models, did improve matters to a limited extent. For one thing, it imposed some uniformity on the factors that social workers took into consideration when making decisions. According to the Children's Research Center, SDM was supposed to offer "clearly defined and consistently applied decision-making criteria" as well as "readily measurable practice standards, with expectations of staff clearly identified and reinforced."

Unfortunately, it did not produce the magnitude of improvement that Gelles and others had hoped for.

"The actuarial decisions are only as good as the data you have, and the child welfare system had a limited portfolio of data," he recalls. States were collecting information on families through various agencies—child welfare, education, criminal justice—but it was difficult if not impossible for the different entities to share this information with one another, or for social workers to access it.

"The problem is that each agency would hire its own [information technology] person who did the software," he says, "and the software would come from six different companies." It wasn't until 2010 that two places—Mecklenburg County in North Carolina and Montgomery County in Maryland—began implementing what was called "interoperability."

In many ways, predictive analytics weren't a revolution so much as one more step in the long slog toward improving a faltering system.

Bill Baccaglini, president of New York Foundling—the oldest foster care agency in New York City—says data and "regression modeling" have "been around for a long, long time." So what's new today? "The algorithms, they're just much richer. There are just many more variables in the model." And that one step has made all the difference.

4

Emily Putnam-Hornstein has been steeped in data from the beginning of her career—which wasn't actually that long ago. In 2011, the same year she received her doctorate from Berkeley, she published a paper on predictors of child welfare in the journal Children and Youth Services Review. The question, which at the time seemed to Putnam-Hornstein like an interesting exercise but not anything with practical implications, was whether you could predict on the day a child is born the likelihood that he or she would eventually enter child protective services.

Putnam-Hornstein was surprised to find that the answer was "yes," and "with a high degree of accuracy" to boot. Shortly thereafter, she was approached by New Zealand's Ministry of Social Development.

When word about her work got around, there was an outcry. Citizens feared that the government's goal was to take children from their parents before the first report of abuse had been made. In fact, policy makers were trying to figure out how they could best deploy their home visiting services, and they were interested in developing a proof of concept.

In 2012, working with lead researcher Rhema Vaithianathan, a professor of economics at Auckland University of Technology, Putnam-Hornstein developed a predictive risk modeling (PRM) tool for children in families that receive public benefits. The data used included the age of mothers and the date of their first benefit payment, along with more than a hundred other factors.

As Vaithianathan explained in The Chronicle of Social Change, "We see this train wreck in slow motion and no one is doing anything about it until that first call comes in. The question is do we have an obligation to do something?" The ministry officials suggested that their data could be used to offer parenting programs to families with children at a high risk for abuse.

The backlash to the idea was immediate, however. Richard Wexler, executive director of the National Coalition for Child Protection Reform, suggested that using big data to determine which kids might be in danger resembles the plot of the movie Minority Report, in which police use psychic technology to figure out who is going to commit a murder and then arrest and convict the perpetrator beforehand. In an article on his blog called "Big Data is Watching You," Wexler, whose work has been cited by The New York Times and The Washington Post, wrote: "They are not proposing to rely on psychics in a bathtub. Instead they're proposing something even less reliable: using 'predictive analytics' to decide when to tear apart a family and consign the children to the chaos of foster care."

Wexler cites a study by Emily Keddell of the University of Otago, which notes that 32–48 percent of the children in New Zealand who were identified as being at the very highest risk by the software turned out to be "substantiated" victims of child abuse. He decries what he sees as the unacceptably high "false positive" rate of such a system.

"Think about that for a moment," he wrote. "A computer tells a caseworker that he or she is about to investigate a case in which the children are at the very highest level of risk. What caseworker is going to defy the computer and leave these children in their homes, even though the computer is wrong more than half the time?"
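The arithmetic behind Wexler's "wrong more than half the time" can be sketched in a few lines. This is only an illustration built on the 32–48 percent substantiation range quoted above; the function name is hypothetical, and it treats "not substantiated" as the complement of "substantiated," which is the simplification Wexler's argument makes.

```python
def share_not_substantiated(substantiated_share: float) -> float:
    """Return the fraction of flagged cases that were NOT later substantiated.

    Assumption (not from the study itself): every flagged case is either
    substantiated or not, so the two shares sum to 1.
    """
    if not 0.0 <= substantiated_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return 1.0 - substantiated_share


# The range reported for children flagged at the very highest risk level.
LOW_SUBSTANTIATED, HIGH_SUBSTANTIATED = 0.32, 0.48

for share in (LOW_SUBSTANTIATED, HIGH_SUBSTANTIATED):
    missed = share_not_substantiated(share)
    print(f"substantiated {share:.0%} -> not substantiated {missed:.0%}")
```

On those figures, somewhere between roughly half and two-thirds of the highest-risk flags were not followed by a substantiated finding, which is the basis for the "false positive" objection.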

Keddell, the paper's author, argues that the substantiation rate may not even be as high as it seems, because child abuse is often ill-defined, and defined differently in different localities. Plus, case workers are subject to bias, including from pressure by an algorithm. "Substantiation data as a reflection of incidence have long been criticized by researchers in the child protection field," she wrote. "The primary problem is that many cases go [unreported], while some populations are subject to hypersurveillance, so that even minor incidents of abuse are identified and reported in some groups."

In 2015, the New Zealand program was shut down entirely. "Not on my watch," wrote Minister of Social Development Anne Tolley. "These children are not lab rats." She told the local press, "I could not believe they were actually even considering that. Whether it would have gotten through the ethics committee—I hoped it wouldn't."

Putnam-Hornstein and her colleagues continued their work, developing new models informed by the pushback.

In 2014, she began working on a proof of concept with the Department of Children and Family Services (DCFS) in Los Angeles County. Project AURA—the initials stand for Approach to Understanding Risk Assessment—looked at child deaths, near fatalities, and "critical incidents" in 2011 and 2012. Using data that were already being collected, including previous child abuse referrals, arrests, and substance abuse histories, statisticians at a company called SAS created an algorithm and produced in each case a risk score on a scale from 1 to 1,000.

"In Los Angeles, we are data rich but analytically poor," says Genie Chough, the director of government affairs and legislation at DCFS. AURA offered Chough and her colleagues the chance to make use of some of the vast quantity of information that was being collected on the families they served. According to Chough, when SAS compared the predictions with actual outcomes, the algorithm was 76 percent accurate. While not outstanding, that is significantly better than flipping a coin, and better than the findings from the New Zealand studies as well.

Nonetheless, Chough and her colleagues ultimately concluded that AURA was "fatally flawed." Perhaps because they realize how sensitive the program is, and how much relying on the wrong model could have undermined public confidence, they are self-reflective and critical in a way you might not expect from government bureaucrats. Since Los Angeles County used the private firm SAS, as opposed to a public university or some other open-source entity, to create the algorithm, it was not as transparent as they felt it should have been. The county attempted to validate the performance of the model, but researchers could not replicate it. Importantly, the data also could not be updated in real time. And with big data, the bigger and more current the inputs, the more accurate the model.

Putnam-Hornstein says this fact is not necessarily intuitive, even for someone trained to analyze data. "As a research scientist, I had always been taught to think about models that are developed for descriptive purposes and causal relationships," she says. But predictive risk modeling "is a different approach, a kitchen-sink approach. This is all the information we have at the time a call comes in. What can it tell us about future differences in outcomes? We are not looking at any one specific factor."

Los Angeles is ready to learn from its mistakes and start using predictive analytics to help screen the calls that come into its child abuse hotline. There were 219,000 last year, according to Jennie Feria, the DCFS regional administrator who oversees it. These can come from "mandated reporters," such as teachers and doctors, or from anyone who comes upon something she considers an instance of child abuse or neglect.

About a third of these calls result in an in-person investigation by a social worker. The rest don't meet the standard of "reasonable suspicion." But this determination is often fraught. The operators at the hotline frequently don't have all the information they need in front of them to make a call, and when they do, it's easy to overlook some important factor or to underestimate what might be a significant warning sign.

What kinds of signs are we talking about? The answer can be surprising. Pennsylvania has separate systems for reporting abuse and neglect. Child Protective Services monitors the former while General Protective Services looks at the latter. That separation long resulted in useful information being lost.

Once they started to look at the data, Gelles says, "what you find is there are a series of neglect reports—four, five, six neglect reports—that predate a fatality." Until this information was made available, he and most other experts and child welfare workers assumed that there would be "a progression of physical violence up to a fatal incident. That isn't the case. There are dysfunctions in the family that come to public attention," but they sometimes stop short of abuse.

Red flags include a home in disrepair and multiple cases of utilities being shut off. Social services will help the family find better housing or get the power turned back on. "But when a family keeps coming to your attention and isn't changing, that's a serious sign that there's something we missed here," Gelles says. When parents could get running water and heat for their children but either repeatedly choose not to or simply forget about it, there's something very wrong. Yet the current system does not pay any special attention to those families.

There is also information in big data that would make call screeners less likely to request an investigation of cases that don't merit one. According to Putnam-Hornstein, one in three children in America will have contact with child services before the age of 18. That number suggests that agencies are wasting resources investigating cases that are highly unlikely to be substantiated. These false alarms significantly increase the caseload of child welfare workers and make it difficult to focus sufficient time and energy on the cases that are most likely to need them. They also often justify further intrusions into family privacy. (In this sense, America's child welfare system is broken in much the same way its immigration system is: Law enforcement officers spend so much effort trying to catch people who want to garden illegally that they run out of resources with which to track down people who want to blow up buildings.)

The use of PRM at a child abuse hotline is primarily a matter of triage. Not only might an algorithm be able to sort out likely from unlikely cases of abuse, Feria explains, it could also help ensure that those cases at the highest risk—say, the top 5 percent—get identified and investigated immediately.
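The triage idea Feria describes can be sketched as a simple ranking step. Everything here is hypothetical: the `HotlineCall` record, the `risk_score` field, and the 5 percent cutoff (taken from her "say, the top 5 percent" example) are stand-ins, not a description of any county's actual software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotlineCall:
    call_id: str
    risk_score: float  # hypothetical model output; higher means higher risk


def flag_top_percent(calls: list[HotlineCall], top_percent: int = 5) -> list[HotlineCall]:
    """Return the highest-scoring calls, rounded up to at least one call."""
    if not 1 <= top_percent <= 100:
        raise ValueError("top_percent must be between 1 and 100")
    if not calls:
        return []
    # Integer ceiling division avoids floating-point surprises at the cutoff.
    n_flagged = max(1, (len(calls) * top_percent + 99) // 100)
    ranked = sorted(calls, key=lambda call: call.risk_score, reverse=True)
    return ranked[:n_flagged]


# Illustrative data only: 100 made-up calls with made-up scores.
sample_calls = [HotlineCall(f"c{i}", i / 100) for i in range(100)]
urgent = flag_top_percent(sample_calls)
print([call.call_id for call in urgent])
```

The point of the sketch is that the algorithm orders the queue; deciding what happens to a flagged call remains a human judgment.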

While critics like Wexler worry about false positives, Gelles argues that the current system already produces plenty of the same. But right now, they're based almost entirely on the gut instinct of the case worker visiting a home or the person answering the phone at a hotline.

The next step in Los Angeles is stalled for the time being. The state of California is considering implementing predictive analytics at all of its county hotlines, since it is updating and revamping the technology it uses to field calls and collect data anyway. But doing so will be a long and bureaucratic process, Chough tells me. "L.A. County is anxious to get out of the gate. But now we have to wait for 57 counties to get on board, so we are in a sort of holding pattern."

5

There is one place in the U.S. where predictive analytics have made their official child welfare debut: Allegheny County, Pennsylvania. The area, whose largest city is Pittsburgh, implemented the Allegheny Family Screening Tool in August 2016. Some 17 teams had answered a request for proposals put out by the county two years earlier, and Putnam-Hornstein and Vaithianathan's was selected. In March 2017, the county published its first report on their project.

The Department of Human Services "decided that the most promising, ethical and readily implemented use of [predictive risk modeling] within the Allegheny County child protection context was one in which a model would be deployed at the time an allegation of maltreatment was received at the hotline," the report explains. The county receives roughly 15,000 calls to that line per year. Each one is currently given a score from 1 to 20 that determines its urgency and severity.

"Allegheny is one of the most forward-thinking counties in the country," says New York Foundling's Baccaglini, echoing a common sentiment among the child welfare reformers I interviewed. For one thing, Marc Cherna, the Allegheny County Department of Human Services director, has served longer in that position than anyone else in the United States—since 1996. His agency is one of the few that haven't followed the pattern of a big child fatality scandal followed by an ineffective cleaning of the bureaucratic house.

Not to say things have always been great. As Cherna acknowledges, for the first several years he was there, "we were a national disgrace. We were written up nonstop. Kids were getting hurt."

In 2014, Cherna received an award from Casey Family Programs, the nation's largest foster child foundation, recognizing his accomplishments in the position. As the Pittsburgh Post-Gazette noted, "the department has reduced the number of children in and out of home care by 48 percent in the past eight years. More than 80 percent of children who left the county system in 2012 ended up in permanent homes, with the vast majority returning to their families."

But Cherna believes there's still plenty of room for improvement in keeping children safe, within their own families as well as in foster homes.

In 2016, Allegheny County was in a particularly good position to experiment with predictive analytics, because it had a lot of data and because its data were accessible across different government bureaucracies. In 1997, Cherna consolidated several bureaus into one Department of Human Services, and the Richard King Mellon Foundation and 17 other nonprofits created the Human Services Integration Fund "for projects that foster integration and support innovations in technology, research and programming that would be difficult or impossible to accomplish with public sector dollars." Several of the chief information officers of those groups agreed to help Cherna with a project that ended up merging 32 different data sets.

But there was another reason Allegheny County was the right place for this project. The Department of Human Services had built up a "high degree of trust" with the community, says Cherna. Scott Hollander, head of the legal advocacy group ChildVoice in Pittsburgh, confirms that assessment.

Hollander, who has done similar work in Colorado and Washington, says what's different in Pittsburgh is "the willingness to try new things and not be afraid of failing. And there is a lot of discussion with stakeholders that doesn't happen elsewhere."

For instance, when Cherna wanted schools to share data with child welfare services and vice versa, Hollander and a number of his colleagues objected. "Hey, this doesn't seem right. The parents don't seem to know what's going on," Hollander says he told the agency. So Cherna asked Hollander and some of his colleagues, who work for a parents' rights group, to design a consent form that would be clear and fair.

Before the Department of Human Services put out a request for proposals for the predictive analytics project, it met with legal aid and civil liberties groups to discuss the implications for families. According to Hollander, the response "was a mixed bag." He recalls "people thinking this sounds interesting and you're trying to protect kids, but the factors you're describing seem like they could create biased decisions."

"If you're looking at poverty and crime," he continues, "the police don't treat black and white people the same. You will echo the disproportionality that already exists in the current environment if you are looking at drug use, criminal activity, and whether there is a single parent as risk factors."

Hollander is not alone in his concern about the disparate racial impact of using predictive analytics. Just as members of low-income and minority communities are more likely to come into contact with law enforcement, Wexler argues, "the same, of course, is true when it comes to 'reports' alleging child abuse—some communities are much more heavily 'policed' by child protective services."

There's no doubt that black and Hispanic families make up a disproportionate number of those who end up encountering child welfare agencies. In New York City, that has been true for decades. According to public records, African-American children made up 31.5 percent of the population of kids in the city in 1987 but accounted for 63.1 percent of children in foster care. In 2012, they made up 25.9 percent of the population and accounted for 59.8 percent of those in foster care.
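One way to make those New York City figures comparable across years is to divide the foster care share by the population share. The short sketch below does only that, using the four percentages just cited; the function name is a hypothetical label, and the ratio is a descriptive measure, not an explanation of why the gap exists.

```python
def representation_ratio(share_in_foster_care: float, share_of_child_population: float) -> float:
    """Ratio of a group's share of foster care to its share of all children.

    A value of 1.0 would mean the group is represented in proportion to its
    share of the child population; values above 1.0 mean overrepresentation.
    """
    if share_of_child_population <= 0:
        raise ValueError("population share must be positive")
    return share_in_foster_care / share_of_child_population


# Percentages for African-American children in New York City, as cited above:
# (share of foster care, share of child population).
nyc_figures = {1987: (63.1, 31.5), 2012: (59.8, 25.9)}

for year, (foster_share, population_share) in nyc_figures.items():
    ratio = representation_ratio(foster_share, population_share)
    print(f"{year}: {ratio:.2f} times the population share")
```

By this measure the overrepresentation was roughly twofold in 1987 and somewhat higher in 2012, even though both underlying percentages fell.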

There is a bit of a chicken-and-egg problem here. Poverty is highly correlated with abuse. There are a variety of reasons for that, which can be difficult to untangle. Poverty causes stress in marriages and other relationships, and sometimes that stress is taken out on kids. Child abuse is also more likely in homes with younger mothers or, as has been discussed, when a man besides the child's father is present. And those two factors are more common among impoverished and minority families.

But the fear that CPS workers target minority families is becoming increasingly common. Citing lawyers working on behalf of parents who have been investigated by the ACS, The New York Times suggested earlier this year that the agency is engaged in the "criminalization of their parenting choices," a practice the paper called "Jane Crow."

In a story for The New Yorker last summer, Larissa MacFarquhar quotes an attorney from the Bronx Defenders, a group that represents parents in family court who have been charged with abuse or neglect by child services, as saying: "We are members of this system which we all strongly believe is racist and classist and doing harm to the families it claims to serve."

Baccaglini and many others who work in the child welfare system are deeply upset by these accusations, not least because most of the child welfare workers are themselves members of minority racial groups. According to the most recent statistics from New York, 65 percent of these employees were black and 15 percent were Hispanic. "When you piece together all the data points, you are hard-pressed to advance the argument that the Times is making," he says. "God, that space could've been used to really tell the story."

"You couldn't even consider race a variable," Baccaglini adds dejectedly. "It's a constant. All the kids who come into this system, unfortunately, are nonwhite. The racial discriminatory aspects of the system happened well before with our opportunity structure—the 'tale of two cities,'" as Mayor Bill de Blasio has called it. "The fact that the mom in the South Bronx cannot get decent medical care; the fact that the mom in the South Bronx cannot get a good job; the fact that the mom was put into an [individualized educational program] and never got a degree and then had a child."

But what about that child?

The Allegheny Family Screening Tool was rolled out in August 2016. There are more than 100 factors that go into the algorithm—and race is not one of them. The system gives the call screener a score between 1 and 20 that describes the likelihood that the child in question will be re-referred or removed from the home within two years, based on what is known from historical data. The screeners then assign an investigator to the cases with the highest risk scores.
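
To make the scoring step concrete, here is a minimal sketch of how a predicted probability could be turned into a 1-to-20 score and how referrals could then be ordered for screeners. It is an illustration only, not the county's actual method: the equal-width bins, the function names `risk_score` and `rank_referrals`, and the sample call IDs are all assumptions introduced for this example.

```python
def risk_score(probability: float, n_bins: int = 20) -> int:
    """Map a predicted probability in [0, 1] to an integer score from 1 to n_bins.

    Assumption: equal-width bins. The real tool may bin differently.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # A probability of exactly 1.0 is clamped into the top bin.
    return min(n_bins, int(probability * n_bins) + 1)


def rank_referrals(probabilities: dict[str, float]) -> list[tuple[str, int]]:
    """Return (call_id, score) pairs, highest score first."""
    scored = [(call_id, risk_score(p)) for call_id, p in probabilities.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical referrals with made-up model probabilities.
    calls = {"call_a": 0.12, "call_b": 0.92, "call_c": 0.52}
    for call_id, score in rank_referrals(calls):
        print(call_id, score)
```

In this sketch race never appears as an input, which mirrors the article's description, though it says nothing about whether other inputs act as proxies for it.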

More to the point, the algorithm is not just about removing kids from problem homes, a step nearly everyone agrees should be taken only as a last resort. "If we can figure out which children are the most likely to be placed in foster care when a call alleging maltreatment first comes in, the hope is that we can tailor and target more expansive and intensive services to that family at the outset, so as to prevent the need for placement," Putnam-Hornstein says. "Alternatively, it may be that there is nothing we can do to safely keep the child in the home. But at least we will make sure to investigate so that we can remove."

To avoid confirmation bias, investigators are not told what score a family received when they go to visit the home. The results have been similar to those found in California: The model predicted with 76 percent accuracy whether a child would be placed in foster care within two years, and with 73 percent accuracy whether the child would be re-referred—that is, whether child services would be alerted about that child again.

The county also looked at other measures to evaluate the accuracy of the tool. "We are keenly aware," Putnam-Hornstein acknowledges, "that by predicting future system involvement, we may simply be modeling (and inadvertently reinforcing) past decisions that were biased and/or wrong—placing children who could have stayed at home safely, and simultaneously failing to remove other children who are still being abused."

So they also collected data from a local children's hospital. If the system is working correctly, you would expect that many of the children who come to the emergency room for cases of attempted suicide or self-harm—or as a result of other kinds of trauma that are associated with abuse—would be those who were previously given higher risk scores. And indeed they are.

"Yes, the model is predicting placement in foster care among children referred for maltreatment," Putnam-Hornstein notes. "But the model is sensitive to children experiencing the most severe forms of more objective harm."

There will be more reports next year about Allegheny's PRM system. In the meantime, Douglas County in Colorado is developing a model for its child welfare agency as well. Vaithianathan, who is helping on the project, tells me her team was particularly enthusiastic about the county's plan to test the model against a control group in real time. Some of the calls that come in will be scored based on predictive analytics models, and some will not. At the end of a certain period, the results for the two groups will be compared.

It is possible that the county could cut the experiment short if the results are lopsided in one direction or the other—if children who are assigned a score have much better outcomes than those who don't, or vice versa. But at the outset, this effort could provide the best possible kind of evidence about whether predictive analytics really affect outcomes for children.
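
The design described above, in which some calls are scored and others are not, can be sketched in a few lines. This is a rough illustration under stated assumptions rather than Douglas County's protocol: the coin-flip assignment, the group labels, the helper names `assign_groups` and `outcome_rates`, and the idea of a single "bad outcome" flag per call are all hypothetical.

```python
import random


def assign_groups(call_ids, seed=0):
    """Randomly place each call in the 'scored' or 'control' group.

    Assumption: a simple 50/50 coin flip per call, seeded for repeatability.
    """
    rng = random.Random(seed)
    return {call_id: rng.choice(["scored", "control"]) for call_id in call_ids}


def outcome_rates(assignments, bad_outcomes):
    """Return the share of calls in each group that ended in a bad outcome.

    `bad_outcomes` is a set of call IDs; what counts as 'bad' is left open here.
    """
    totals = {"scored": 0, "control": 0}
    bad = {"scored": 0, "control": 0}
    for call_id, group in assignments.items():
        totals[group] += 1
        if call_id in bad_outcomes:
            bad[group] += 1
    # A group with no calls has no defined rate.
    return {g: (bad[g] / totals[g] if totals[g] else None) for g in totals}


if __name__ == "__main__":
    groups = assign_groups([f"call_{i}" for i in range(10)], seed=42)
    print(outcome_rates(groups, bad_outcomes={"call_1", "call_7"}))
```

A real evaluation would also need a rule for stopping early and a test of whether any gap between the two rates is larger than chance would produce; neither is shown here.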

The future of PRM in child welfare is still an open question. The advent of big data has a mixed record at best when it comes to improving government bureaucracies. Take No Child Left Behind, a George W. Bush–era program that required states to more rigorously measure K–12 student outcomes via standardized tests. Despite high hopes, it hardly moved the needle in most public school systems. Data alone aren't enough; what we do with the data matters.

Putnam-Hornstein acknowledges you could see a similar problem with the child welfare system. "Two years down the road, it could really just be one more piece of paperwork they print up and attach to their case files. Maybe they don't trust the score and so they just ignore it. Or maybe the child welfare workers don't do anything different in terms of interventions with the families."

But Gelles suggests things may go the opposite direction—that the score will come to be relied on more than it is in these pilot programs, impacting decisions even after the "hotline" stage. "If you're using this (a) to determine whether to remove a child from a home, or (b) to determine whether it's time to return a child to a home," he asks, "how are you not going to share that with the judge" tasked with making that call?

It could also be used to screen foster parents. That might begin to address the concerns of people like Wexler, who worry about the safety and well-being of children taken from their families by the state.

But none of the other researchers I spoke to were willing to discuss additional uses for the data yet. For now, they are focused on how to make PRM algorithms into the best call-screening tool they can be. The promise there is large enough. As Vaithianathan points out, "it's implementable anywhere. The whole country could be using it because the data is already there. In California we are just using child protection data. In Colorado we are using data from the child welfare system plus the public benefits system."

Putnam-Hornstein agrees. Though it may be necessary to make tweaks based on the kinds of data available and what the system is aiming to do in a given county or state, it would not be that hard to create similar algorithms for other places.

But just because the data are there doesn't mean everyone will be happy about using them in this way. In a 2013 piece for Popular Science, Viktor Mayer-Schönberger and Kenneth Cukier, the authors of Big Data: A Revolution That Will Transform How We Live, Work, and Think (Eamon Dolan), warned about the dangers of using data to predict crime and to help determine which interventions may be necessary. "A teenager visited by a social worker for having the propensity to shoplift will feel stigmatized in the eyes of others—and his own," they wrote. In other words, simply using big data could produce negative unintended consequences.

Jay Stanley, a senior policy analyst for the Speech, Privacy, and Technology Project at the American Civil Liberties Union (ACLU), says that the issue has come to the attention of some of his organization's state affiliates.

"It could be like Moneyball, that human intuition in this area is just inaccurate," he says. "And that even if the algorithms are also inaccurate, they might be less inaccurate than human intuition." But, he adds, "It could be that's not true. And that not only is the algorithm inaccurate but it could be worse than human inaccuracy [and] by making it seem like the judgment is coming from a computer, it is also hiding racial bias."

"The adage 'garbage in, garbage out' never holds truer than in the field of predictive analytics," writes researcher Kelly Capatosto in a report for the Kirwan Institute for the Study of Race and Ethnicity at Ohio State. "Subtle biases can emerge when seemingly 'race neutral' data acts as a proxy for social categories." The result is that "data that is ostensibly used to rate risk to child well-being can serve as a proxy for race or other past oppression, thereby over-representing those who have suffered from past marginalization as more risky."

Putnam-Hornstein and her colleagues are sensitive to these concerns. When they began this work in New Zealand, they consulted extensively with leaders in the indigenous Maori community. Because native peoples are more likely to be caught up in the child welfare system, they wanted to make sure this new tool was not having an adverse effect on them.

But Putnam-Hornstein argues that even—and perhaps especially—if you believe the deck is stacked against racial minorities, predictive modeling could be helpful. Let's say, for instance, that you believe the criminal justice system is biased so that the threshold for being arrested is lower for black men than it is for white men. That's actually a reason to include race in your algorithm: to help account for that difference.

This might sound like the soft bigotry of low expectations—are we really going to say a complaint is less serious if it's against a black man than if it's against a white one? But Putnam-Hornstein is right: Numbers can potentially correct for bias.

For those worried about the over-involvement of child services in the lives of low-income and minority families, predictive analytics really can start to solve the problem. "It is absolutely crazy that more than one-third of American children experience an investigation for alleged maltreatment between birth and age 18," Putnam-Hornstein says. That is a "huge intrusion into the lives of too many families." Worse, she argues, "it means we are flooding our systems to such an extent that we cannot identify the children who really do need protection."

She thinks big data may be the key not only to catching kids who are falling through the cracks of the current system, but also to protecting families who are being wrongly targeted. For the sake of all the children out there like Zymere Perkins, let's hope she's right.

*CORRECTION: The original version of this story misidentified Rhema Vaithianathan's institutional affiliation.