Look up California State Athletic Commission Executive Director Andy Foster on Sherdog’s Fight Finder, and you’ll see a man who knows his business when it comes to MMA.

Look past the suit and tie in his profile photo, past the expression that seems more befitting of a man trying to get your vote for city council than a man who could choke you out, and you’ll see that Foster, 36, was 9-2 as a professional fighter. His only two losses came against respected MMA veterans Brian Ebersole and Amar Suloev, the latter of whom scored a knockout victory over Foster in Foster’s last fight as a pro in 2007.

So when Foster took over as head of the California commission in 2012, he knew all too well that among the many persistent gripes fans and fighters had with the state of the sport, concerns about judging were near the top of the list.

Why, people asked, weren’t problem judges removed? Why weren’t commissions doing more to evaluate their own judges? Why wasn’t there any accountability, or any effort to improve a flawed system that was so often responsible for determining whole career arcs, not to mention the size of the paychecks fighters took home?

Foster had wondered the same thing, which is why, after taking the job in California after four years with the Georgia Athletic and Entertainment Commission, he paid special attention to a presentation from former amateur boxer and boxing judge Matt Podgorski at a recent meeting of the Association of Boxing Commissions.

“I wanted something where I could gauge where my judges were at,” Foster told MMAjunkie. “You know, am I putting the right people in there? Just because you’ve been doing it a long time, that doesn’t necessarily mean that you’re right.”

Podgorski, who now works full-time as a statistical analyst for a Chicago-area food company, suggested a system to collect all the scoring data from all the fights, and then compare the judges against one another while also examining the numbers to expose certain scoring patterns and biases. He called it “The Pod Index,” and all he needed was a commission willing to put it into place.

To Foster, it sounded like exactly the kind of thing he’d been looking for.

“I wanted something to back me up other than just my thoughts,” Foster said. “I wanted some math or statistical thought process to back up what I’m doing with these assignments.”

What The Pod Index promised was a system that logs every judge’s score for every fight in the state of California. With a large enough sample size, Podgorski said, it would provide a statistical picture of which judges were consistently at odds with their peers, especially in split decisions.

“That doesn’t necessarily mean the judge is wrong,” Podgorski said. “It just means we need to investigate further. If we see a judge is the odd man out 20 percent of the time, we can go back and look at five or six fights in which they were in the minority and see what a greater number of different judges say.”
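The Pod Index’s actual implementation isn’t public, but the “odd man out” tally Podgorski describes is simple enough to sketch. The code below is purely illustrative — the function name, data layout and sample data are all assumptions, not anything drawn from the real system:

```python
from collections import defaultdict

def minority_rates(fights):
    """For each judge, compute how often they were the lone dissenter
    in a split decision (two judges for one fighter, one for the other).

    `fights` is a list of scorecards, each a dict of {judge: winner_picked}.
    Returns {judge: fraction of their split decisions spent in the minority}.
    """
    in_minority = defaultdict(int)   # times the judge was the odd man out
    splits_worked = defaultdict(int) # split decisions the judge sat on
    for scorecards in fights:
        picks = list(scorecards.values())
        winners = set(picks)
        if len(winners) != 2:
            continue  # unanimous decision -- not a split, skip it
        minority_pick = min(winners, key=picks.count)
        for judge, pick in scorecards.items():
            splits_worked[judge] += 1
            if pick == minority_pick:
                in_minority[judge] += 1
    return {j: in_minority[j] / splits_worked[j] for j in splits_worked}

# Hypothetical data: judge C dissents in two of three splits, judge D in one.
fights = [
    {"A": "red", "B": "red", "C": "blue"},
    {"A": "red", "B": "red", "C": "blue"},
    {"A": "blue", "D": "red", "C": "blue"},
]
rates = minority_rates(fights)
```

A rate like the 20 percent Podgorski mentions would then flag a judge for the closer review he describes — not as a verdict, but as a reason to look.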

To complete that process, The Pod Index engages the services of five more “anonymous” judges who watch the same fight with the commentary turned off, then turn in their own scorecards for comparison. Getting a larger pool of judges lends clarity to the overall picture, since, as Foster put it, a judge who ends up in the minority of a split decision “could just be with two dummies.”

“But even if you’re with two dopes, the data will start to tell us who are the most consistent judges,” Foster said. “You could be right several times with the other two judges being wrong, but it’s probably not going to happen that way over and over again.”

Podgorski stressed that his sample size is still relatively small since he’s only been working with the California commission since July, and in a more limited capacity with the Nevada State Athletic Commission since November, but already that broader analysis has yielded some surprises.

For instance, those judges who wind up in the minority on split decisions? Once the field of judges is expanded to include the five additional anonymous judges, those lone outliers are vindicated more often than you might expect.

“As much as a third of the time, when there’s a close split decision, the judge in the minority might have three or more of the five anonymous judges agreeing with him,” Podgorski said. “It just goes to show that some of those split decisions really are tough calls.”
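That vindication check — did at least three of the five anonymous judges side with the dissenter? — also reduces to a few lines. Again, this is a sketch of the idea as Podgorski describes it, with invented names and data, not the system’s real code:

```python
def vindication_rate(cases):
    """Share of minority judges 'vindicated' by the anonymous panel,
    meaning at least 3 of the 5 re-scores match the minority pick.

    `cases` is a list of (minority_pick, [five anonymous picks]).
    """
    if not cases:
        return 0.0
    vindicated = sum(1 for pick, panel in cases if panel.count(pick) >= 3)
    return vindicated / len(cases)

# Hypothetical split decisions: the dissenter is backed by the panel once.
cases = [
    ("blue", ["blue", "blue", "red", "blue", "red"]),   # 3 of 5 agree
    ("red",  ["blue", "blue", "red", "blue", "blue"]),  # only 1 of 5
    ("red",  ["red", "blue", "blue", "blue", "red"]),   # only 2 of 5
]
```

Run on data like this, a vindication rate around the one-third Podgorski cites would support his point that many split decisions are genuinely close calls rather than one judge’s mistake.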

Podgorski knows all about that from his own experiences as a boxing judge. For more than 15 years, he scored bouts at ringside, he said, quitting only when he began The Pod Index as a side project, in order to eliminate any potential conflicts of interest. He still recalls a championship bout he scored, one that resulted in what he describes as “a wild split draw.”

“And I mean really wild,” Podgorski said. “I had it nine rounds to three for one guy, and another judge had it 10 rounds to two for the other guy. The third judge had it even. I was like, ‘What the hell?’”

Recently Podgorski tracked down video of the fight, just to see if his own system would offer any insight. He brought it to a World Boxing Council conference in Las Vegas and asked nine other judges to score it.

“And it was the same thing,” Podgorski said. “The scores were all over the place. That really opened my eyes. Sometimes you just get into your mindset. You can’t possibly see how anyone could see it another way, and then they do.”

But that raises the question of just how much a system like this can really change when it comes to the difficult task of scoring a fight. With a large enough sample size, it can identify consistent outliers among judges, maybe even those who need further training (or fewer assignments). But can it actually make judging better?

Maybe, if it’s implemented correctly. In fact, Podgorski said, his initial motivation in creating it was to use it more as a teaching tool than as a means of judge evaluation.

“The program is not designed to call judges out or embarrass them,” Podgorski said. “It’s not meant to say, ‘You’re the best judge, and you’re the worst judge.’ It’s not a ranking system. It’s a diagnostic tool for the judges to get better, and that’s where the recommendations come in.”

The recommendations, in some cases, are surprisingly specific and thorough. In a recent scoring review of a high-profile boxing match in California, Podgorski’s report identified one judge as having “a strong work-rate preference” in his scoring, which seemed to be altering his perception of close rounds. It recommended a specific bout for him to watch and score, in order to help the commission determine whether he’s too rigidly set in his ways, and whether that “strong preference can either be accepted or used as a catalyst to provide him some additional coaching.”

That additional coaching, according to Foster, is the whole point.

“There are some judges who like doing this, and it’s a fun thing for them, and there are other judges who really take this stuff seriously,” Foster said. “They sit at home and watch fights and take notes and do trainings. They focus on it. And these fighters who spend all this time in the gym, spend six or eight weeks in a training camp, they diet and live by this strict discipline just to spend 15 or 25 minutes in a cage. I want the same level of dedication from my refs and judges.”

According to Podgorski, the data is helpful not so much in determining who’s right and who’s wrong on any given decision, but more in identifying trends. That’s especially important in MMA scoring, which both Foster and Podgorski said has too often followed a boxing model, making it reluctant to use the full range of possibilities within the 10-point must system.

“The perfect example in MMA is, you can squeak out the first two rounds, just barely win them, 10-9 on all the cards,” Podgorski said. “Now you’ve essentially sealed the victory as long as you don’t get submitted or knocked out. It’s a problem especially in MMA because there’s no knockdowns. In boxing it’s easier. There’s a knockdown? It’s 10-8. It’s clear. Everyone agrees. In MMA you can pummel a guy and still get a 10-9 on half the judges’ cards. That’s something that definitely needs to be addressed.”

With this data to draw on, Podgorski said, commissions can go to certain judges and point out their recurring tendencies.

“We can do analytics to see how often judges are using 10-8 rounds, how often they’re split, to where one judge has it 10-9 and another has it 10-8, and that really gives us some firepower to say, ‘Hey, you scored 100 rounds in the last year, but you didn’t have one 10-8 round, whereas the average judge has five percent,’ or something like that,” Podgorski said. “It helps us encourage judges to think about it more broadly and not be so dead set on boxing-style scoring.”
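The 10-8 analytics Podgorski describes amount to a frequency count per judge. Here is one minimal, illustrative way to compute it — the function and the sample judges are assumptions for the sake of the example:

```python
def ten_eight_rate(rounds):
    """Fraction of a judge's scored rounds given 10-8 or wider.

    `rounds` is a list of (winner_score, loser_score) tuples
    under the 10-point must system.
    """
    if not rounds:
        return 0.0
    wide = sum(1 for winner, loser in rounds if winner - loser >= 2)
    return wide / len(rounds)

# Hypothetical judges: one scored 100 straight 10-9 rounds, the other
# matches the roughly five percent 10-8 rate Podgorski uses as a benchmark.
flat_judge = [(10, 9)] * 100
typical_judge = [(10, 8)] * 5 + [(10, 9)] * 95
```

Comparing a judge’s rate against the pool average is exactly the “firepower” Podgorski describes: a zero next to a peer norm of five percent makes the conversation concrete.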

Reactions from the judges have been mixed, Foster said. At first, few were terribly enthusiastic about a system that could potentially challenge their status and hold them up for more public criticism. But since the system was developed by a former boxing judge and first implemented in a commission run by a former MMA fighter, both men say that judges have gradually come around on the idea, for the most part.

“I know a lot of the judges personally, especially on the boxing side but also somewhat on the MMA side, so it makes it a lot easier coming from me,” Podgorski said. “It’s not some intern crunching numbers. It’s somebody who’s been there, who gets it, and can understand what the data means and what it doesn’t mean.”

Longtime MMA referee Herb Dean, who has also worked as a judge and helped train others in that capacity, said he’s encouraged to see some effort being made toward evaluating existing judges, but he doesn’t see the problem being fixed by statistical analysis alone.

“I think it’s useful to know, especially on a 10-8 round, who’s on the outside, but crunching the numbers can’t be the end,” Dean said. “We still have a mixed pool of judges who have a lot of different backgrounds. Some have more experience in this than others, some have actually been competitive in some of the sports that make up mixed martial arts, and some of them haven’t. Until it’s mandatory that everyone has a detailed understanding of the position, we’re going to have problems. But I do think it’s a good thing, a good start.”

At this point, Foster said, it’s still too soon to make any broad generalizations based on the data. The program has been in effect for less than a year, though the CSAC has worked to retroactively load the last few years’ worth of judging data in the state into The Pod Index’s database. In order to even begin to draw conclusions, Foster said, they need at least 60 rounds of scoring from every judge. Even then the data doesn’t tell them everything, but it does give them a place to start.

“When you’ve got five years’ worth of data that you can look at objectively, you can see someone who’s got 400 rounds’ worth of boxing scored and he’s only with the majority 71 percent of the time,” Foster said. “Then you’ve got someone else with the same number of rounds who’s with the majority 92 percent of the time. You know, I’m not saying that’s the only thing you should make a decision on, but it’s something to think about. It’s a tool. I think we have to be doing something to try and improve judging.

“And I think it is improving,” Foster added. “It’s a lot better than where it was, even if we’re not yet where we need to be.”