More than 30 years ago, Congress identified what it said was a grave threat to the American promise of equal justice for all: Federal judges were giving wildly different punishments to defendants who had committed the same crimes.

The worries were many. Some lawmakers feared lenient judges were giving criminals too little time in prison. Others suspected African-American defendants were being unfairly sentenced to steeper prison terms than white defendants.

In 1984, Congress created the U.S. Sentencing Commission with remarkable bipartisan support. The commission would set firm punishment rules, called “guidelines,” for every offense. The measure, signed by President Ronald Reagan, largely stripped federal judges of their sentencing powers; they were now to use a chart to decide penalties for each conviction, with few exceptions.

Five years later, a legal challenge to the sentencing commission wound up before the U.S. Supreme Court. In a case titled Mistretta v. U.S., the court was asked to consider whether Congress had overreached by taking on what seemed to be a role for the judiciary. In an 8-1 decision, the justices determined that the sentencing commission was constitutional. And they took care to say that the commission was also needed — to end the widespread and “shameful” sentencing disparities produced by the biases of individual judges.

Mistretta was a momentous decision, but it’s now clear the high court relied on evidence that was flimsy and even flat-out wrong.

In the 1989 decision, the justices cited a single congressional report in concluding that there were disturbing and unacceptable sentencing disparities that needed to be addressed. That report, in turn, was based primarily on two studies conducted in the early 1970s, both deeply flawed.

One of the studies was an experiment that surveyed federal judges about sentences they might give in hypothetical cases. Asked what sentences they would give in a specific tax fraud case, for example, the prison terms recommended by the judges ranged from three to 20 years.

That sounds significant. But the experiment ignored a basic fact about the real-life workings of the federal courts: Judges acted as the sole arbiter of sentences in a tiny fraction of cases. The vast majority of sentences were the result of plea bargains negotiated by prosecutors and defense lawyers, deals that were subject to a judge’s approval but that were not his or her handiwork. In fact, no evidence was offered that judges around the country were signing off on vastly different plea bargain terms. And later research would debunk the claim.

The other study compared average sentences in federal district courts and indicated prison terms for identical crimes were often years longer or shorter depending on where judges presided. But an examination by ProPublica shows that the study was riddled with sample size errors that should have rendered much of the data unusable.

For example, the study said the average prison term for larceny in federal courts nationwide was three years and four months. In Maine, the average was listed as 12 years, more than three times as harsh. But our examination of the underlying data shows that only one person was sentenced to prison for larceny in Maine’s lone federal court that year; the “average” used by the congressional study was based on a single conviction.

The study — done for a Senate committee working on sentencing reform and using data from the Administrative Office of the U.S. Courts — was also distorted by an outright mistake. The study claimed Kentucky’s eastern district court sentenced burglary convicts to, on average, nearly 14 years in prison in 1972, which appears remarkably punitive for a property crime and out of whack with courts in the rest of the country. The study’s data, not the judges, was the problem, it turns out.

ProPublica’s review shows Kentucky’s eastern district had only four prison sentences for burglary that year. Three of the convicts were adults who received an average prison term of three years and four months. The fourth case involved a juvenile defendant supposedly sentenced to 550 months — more than 45 years — in a cell. The maximum sentence for burglary was 15 years, making that an impossibility, but the mistake wound up in the report that helped shape the thinking of the nation’s highest court. Court officials in Kentucky told ProPublica they could not determine what the juvenile’s sentence had actually been.

In recent months, ProPublica has fact-checked a sampling of the Supreme Court’s majority opinions from 2011 through 2015, and found a number of errors or glaring inaccuracies. The errors came from legal filings, from government records and from the independent research by the justices themselves.

In the case of the sentencing commission, the court’s decision proved enormously consequential. The federal sentencing guidelines produced by the commission helped remake the nation’s prisons. During the following decade, far more people went to prisons to serve far longer sentences.

“The guidelines did increase severity, pretty much across the board,” said Kate Stith, a Yale University law professor and expert on the federal sentencing commission.

Individual states followed the federal lead and instituted sentencing guidelines for state offenses, similarly lengthening prison terms and inflating prison populations.

Yet ample scholarship done over 30 years has only made clearer that the central rationale for the commission’s creation — large and pervasive discrepancies in sentencing imposed by judges — never existed.

Multiple sophisticated analyses of court data have found that judges, when left to deliver sentences on their own, do not differ greatly. Before the guidelines took effect, the average difference between judges was roughly five to eight months — not years, as the congressional report claimed and the nation’s highest court believed. After the guidelines were instituted, additional analysis has shown, the difference between sentences shrank by roughly a month.

Today, the sentencing commission’s guidelines are merely “advisory,” not mandatory, as a result of several subsequent Supreme Court decisions on whether aspects of the guidelines violated a defendant’s right to a jury trial. Judges can now largely disregard the rules so long as they explain their reasoning.

The purported sentencing disparities that spurred the guidelines in the first place were not considered in the later rulings that restored federal judges’ discretion.

ProPublica requested comment from Chief Justice John Roberts and the other justices. As they had when presented with the earlier errors in opinions uncovered by ProPublica, the justices declined to respond.

To a significant degree, the drive for sentencing reform started with Marvin Frankel, a veteran federal judge in New York in the early 1970s. Frankel believed he and his colleagues were too easily swayed by biases and passions (in short, judges were too human) to be trusted with handing out criminal penalties on their own. Frankel’s book, “Criminal Sentences: Law Without Order,” published in 1973, became a rallying cry for a movement.

“The almost wholly unchecked and sweeping powers we give to judges in the fashioning of sentences are terrifying and intolerable for a society that professes devotion to the rule of law,” Frankel wrote. The federal courts lacked strict rules for the appropriate punishment for each crime and individual judges reached wildly different sentences in similar cases, he argued. “The result is chaos.”

Stith, the Yale professor and expert on the sentencing commission, said Frankel’s argument found an eager audience. Republicans and Democrats, she said, had for years been promoting their own narratives — of the too-punitive “hanging judge” or the too-lenient “bleeding heart judge” — to suit their agendas.

Frankel wasn’t done after his book was published. He next helped to oversee an experiment that further fed such narratives. The U.S. Court of Appeals for the Second Circuit, which includes New York, Connecticut and Vermont, sent 20 hypothetical criminal case files to its judges and asked them to choose sentences. The files described an array of convictions, including many white-collar and illegal drug trade crimes.

The judges answered quite differently. On a financial fraud and tax evasion case, one judge chose a three-year prison term for the convict while two colleagues gave 20 years. For a case of theft involving interstate transport, the chosen punishments ranged from more than seven years in prison to mere probation.

Ilene Nagel, then a law and sociology professor at Indiana University, said the conclusion to be drawn from the experiment was clear: Judges produced unacceptably divergent sentences.

“It wasn’t a contested issue,” said Nagel, who was one of the first appointees to the sentencing commission.

The calls for reform were bipartisan to a degree unthinkable today.

The resulting legislation proved sweeping. It eliminated the federal parole system and formed a sentencing commission. The new agency would have seven members, appointed by the president and confirmed by the Senate to six-year terms. The commissioners would include judges, academics, prosecutors and defense lawyers. They’d write sentencing guidelines based on the criminal offenses, the defendants’ criminal histories and other factors related to the seriousness of the crime. Their word was binding.

Sen. Strom Thurmond, the longtime Republican leader from South Carolina, introduced the bill, which included the sentencing overhaul among several changes in the criminal justice system. Sens. Edward Kennedy and Joe Biden, pillars of the Democratic Party, were among the bill’s earliest co-sponsors. Only one senator voted against the measure when it went to the floor in 1984. The House of Representatives passed it easily, with almost 100 “yes” votes to spare.

The Second Circuit survey went all but unquestioned for nearly 20 years, until researchers began examining the sentencing guidelines’ effect and gave Frankel’s arguments fresh scrutiny. Most notably, a 1993 U.S. Department of Justice study of racial disparities in sentencing served as a takedown of the long-heralded experiment.

“Defendants in many courts plead guilty only after various kinds of agreements are reached regarding charges, sentence recommendations, and even ‘sentence promises,’” the Justice Department report stated, evaluating the Second Circuit results.

In real court cases, “it is likely that these dynamics constrained judges in their sentencing decisions,” the Justice Department report added.

Federal judges were not, in general, sentencing erratically, said Douglas McDonald, an expert in health and criminal justice analysis at the global research firm Abt Associates. That became clear “when you looked at real data,” McDonald said, “not the made-up, simulated thing that Judge Frankel sent around.”

The second study the Supreme Court relied on in Mistretta doesn’t fare well under scrutiny, either.

It looked at sentencing data from actual federal criminal cases. A consultant to the Senate’s Judiciary Committee, which was drafting the sentencing reform legislation in the late 1970s, and a pair of legal scholars from Yale and the University of Texas compiled the numbers from the federal courts system.

The effort was intended to further prove that sentencing was chaotic, varying greatly by judge and just as significantly by region. The study compared average sentences for certain crime categories in individual district courts with averages for the rest of the country. The authors set out the numbers in easy-to-digest charts, which gave the stark impression that geography dictated the degree of punishment.

A prominent chart showed California’s northern district judges sentenced burglary convicts to an average of 10 years in prison in 1972, about twice as long as the national average.

However, the California district’s “average” was not an average at all. Only one burglary convict was sentenced to prison there during that year, data from the study’s appendix shows.

That problem with sample size was just one of many in the study, which ProPublica scrutinized as part of its reporting on misinformation in the Supreme Court’s majority opinions. ProPublica was fact-checking a 2013 ruling, Peugh v. U.S., that relied on information from the Mistretta opinion about the sentencing commission.

ProPublica unearthed hundreds of pages of four-decade-old federal court data to test the sentencing averages study. It appears to be the first time the research has been rigorously checked.

The study’s first criminal category, homicide and assault, is so riddled with sample size flaws that its figures are useless. The averages for the Maryland and New Jersey judges are each based on just two cases. (State courts generally handle violent crimes, so it makes sense that federal courts did not hand out many such sentences.)

While the Supreme Court's Mistretta opinion and lawmakers repeatedly said that judges’ sentencing disparities could not be explained by defendants’ criminal histories, the analyses of federal data they depended on did not address the question.

For instance, the congressional report singled out the Illinois northern district court as unjustifiably lenient on robbery convicts; it averaged prison terms of less than seven years in 1972 compared to the national average of 10 years.

But it turns out robbery convicts in the Illinois court differed considerably from those across the country. A majority had no criminal history at all, and only 28 percent had previously served time in prison. The shorter sentences, then, were easier to understand.

Nationwide, most federal robbery convicts had substantial criminal histories, and 50 percent of them had previously served prison time.

Further illustrating the point, judges in the Missouri eastern district averaged 15-year prison sentences for robbery, several years longer than the national average. But 85 percent of that court’s convicts had been in prison before. The underlying data suggests that defendants with bad criminal histories often received longer prison sentences.

The sentencing guidelines took effect in November 1987. One month later, John Mistretta was indicted on drug trafficking charges in Missouri, for which he pleaded guilty to a single count of conspiracy to distribute cocaine and received 18 months in prison under the new rules. He appealed his punishment, arguing that the commission and its work violated the separation of powers required in the Constitution.

Lawyers for the federal government defending the guidelines provided the congressional report, containing the judge experiment and district sentencing averages, to the Supreme Court. The case centered on legal arguments, not questions about the statistical evidence.

Rather, the justices and lawyers treated the crisis of disparate sentencing by judges as established fact. “Congress wanted to fetter the power that individual judges had been exercising, because they are the ones who created the problem,” Justice Anthony Kennedy said during oral arguments in Mistretta. “They are the ones who gave the disparate sentences all around the country.”

Paul Bator, who represented the sentencing commission, described the era when federal judges controlled criminal punishment as “very ugly days of discriminatory and arbitrary sentencing.”

In the majority opinion, Justice Harry Blackmun referred to sentencing disparities among judges as “a serious impediment to an evenhanded and effective operation of the criminal justice system.”

The consequences of the court’s ruling were considerable and have helped fuel sharp debates about issues such as mass incarceration.

In 1984, the average prison term handed down by federal courts was two years. That more than doubled over the decade that followed, according to figures from the sentencing commission. About half of federal convicts received probation when judges controlled most of the sentencing. Only 7 percent got probation last year.

Stith, the Yale professor, said the sentencing commission’s guidelines came to include sentencing “enhancements,” such as whether a defendant attempted to destroy evidence or had access to a weapon. They became a major part of the new punishment formula and had the effect of lengthening sentences.

Those “enhancements” eventually became the focus of another case that wound up before the Supreme Court. Lawyers for defendants argued that the sentencing enhancements were akin to convictions for additional crimes, crimes that had not been proven as part of the criminal prosecutions. The Supreme Court eventually barred judges from imposing the enhanced penalties as mandatory.


But the creation and legal endorsement of the commission preceded a variety of developments that led to harsher penalties in court. Congress, for instance, made the punishment for crack cocaine much more severe than for powder cocaine. In 1985, the federal prison population consisted of just over 40,000 inmates, according to U.S. Bureau of Prisons data. During the decade that followed, with fixed prison sentences and tough anti-drug laws, the prison rolls grew by 150 percent and topped 100,000 inmates in 1995.

Nagel, the former commissioner and law professor, said the anti-drug laws are to blame for most of the severe penalties that swelled the federal prison population. Commissioners tried to prevent, or at least minimize, the harshest sentencing changes, she said.

“On several occasions, the commission tried to get Congress to back away,” Nagel said. She and her colleagues argued to lawmakers that violent crimes should be the priority for long sentences, not drugs. Their lobbying failed.

Drug prosecutions, in the end, produced their own set of disparities — disproportionate numbers of minority defendants. In 1996, 73 percent of those convicted for drug trafficking were black or Hispanic, roughly triple their share of the nation’s population. Researchers had struggled for decades to demonstrate racial disparities in federal sentencing. Suddenly, the gaps were glaring and repeatedly proven.

Indeed, sentencing disparities by race remain as prevalent as ever, according to an analysis the commission released last month. Over the past two years, black men’s prison terms were 19 percent longer, on average, than those received by white men, the study shows. The difference cannot be explained by the crimes or the defendants’ criminal histories. The sentencing commission says there is no sign the racial gap for incarceration is getting smaller.