Computerized simulations have revolutionized games like backgammon. Can they do the same for America's favorite, and most statistically backward, sport?

It's early in the third quarter of a tie game and your favorite NFL team has the ball, facing fourth and goal from the two-yard line. Should it kick the field goal or go for the touchdown? In the past, this question would have been almost metaphysical in nature. Depending on their various philosophies, some coaches would elect to kick the field goal and others would try to ram the ball into the end zone. Actually, scratch that. Almost to a man, they would try the field goal, as it's the safer bet to put points on the board.



Courtesy Frank Frigo Frank Frigo (right) and Chuck Bower, the creators of ZEUS.

But, almost to a man, they are wrong. Or so says a computer-modeling program called ZEUS, which replaces philosophy with statistics. ZEUS's founders claim that the advice coming from their little robot can help most NFL teams win at least one more game every year.

ZEUS, first designed in 2001, works not by analyzing historical data but by creating its own. Its architects, backgammon experts Frank Frigo and Chuck Bower, based ZEUS on the neural network programs that have come to dominate their board game of choice. But instead of creating a bot that modeled dice rolls and checker movements, Frigo and Bower set out to develop one that mimics all of the action of an NFL game, from runs to passes to kicks to penalties, and keeps detailed stats of the simulated games it plays.

The driving force behind ZEUS is an idea called match equity. In a game between equally matched opponents, the thinking goes, each play changes the likelihood that a given team will win. Take the example above. The team on offense (let's call it the Bears) already has a game-winning chance (GWC) above 50% by the time it reaches the two-yard line, because it is in position to score, and even if it doesn't, the team on defense (let's call it the Jets) will probably have crappy field position. At the fork in the road that is the coach's choice between kicking and going for a touchdown, ZEUS can play out hundreds of thousands of games based on that decision. As it turns out, teams that kick the field goal win almost 4% fewer games than those that go for the touchdown. (Bower and Frigo have several other examples of how play calling affects GWC at their barebones website, Pigskinrevolution.com.)
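To make the fork in the road concrete, here is a minimal sketch in Python of how two branches could be compared in GWC terms. It is not ZEUS's method (ZEUS simulates whole games); it only weights each branch's outcomes by how likely they are. The function name `branch_gwc` and every number below are invented placeholders, not ZEUS output or NFL statistics.

```python
def branch_gwc(p_success: float, gwc_if_success: float, gwc_if_failure: float) -> float:
    """Expected game-winning chance (GWC) for one branch of the fork."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return p_success * gwc_if_success + (1.0 - p_success) * gwc_if_failure


# Invented placeholder inputs for fourth and goal from the two in a tie game.
# A failed attempt still leaves the other team pinned near its own goal line,
# which is why gwc_if_failure stays above 50% in this made-up example.
go_for_it = branch_gwc(p_success=0.45, gwc_if_success=0.76, gwc_if_failure=0.55)
kick = branch_gwc(p_success=0.98, gwc_if_success=0.62, gwc_if_failure=0.50)

print(f"go for it: {go_for_it:.4f}")
print(f"kick:      {kick:.4f}")
print("better call:", "go for it" if go_for_it > kick else "kick")
```

The point of the sketch is the shape of the comparison: the "safe" branch succeeds far more often, but the risky branch can still come out ahead once the payoff for success and the field position after failure are both counted.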

Frigo and Bower have pitched rights to their idea to several different NFL teams, but so far, none are biting. Part of the reason could be the cost of an exclusive license, which Frigo tells Gelf is in the low six figures. Another factor could be that, as in backgammon, the modeling program can only be used as a tutor: NFL rules stipulate that any such coaching aid has to be turned off during games. But the major reason, Frigo says, is the backward culture of the NFL. "It's not real sophisticated," he says.

Gelf talked to Frigo, a former world backgammon champ, about how he was able to get ZEUS to create real NFL stats, how it can be used to settle bar debates, and why conservative coaches are like smokers. The following interview has been edited for clarity.



Courtesy Wikipedia GNU backgammon is one of the neural-net-based software programs that have revolutionized the game.

GM: What do football and backgammon have in common?

Frank Frigo: A lot. Which is why we started the project in the first place. Backgammon went through a real technological revolution in the very late 1970s. TD-Gammon was the first program that was ever able to beat a world champion at any board game of skill [Editor's note: The technical description of the program is here]. The part that most people didn’t realize is that they played a very short match and the robot got somewhat lucky. It didn’t really play better than a human at this point. But nonetheless, it was quite a programming achievement. So that was an early version of a neural net that they had developed for gaming, well before Deep Thought or Deep Blue had been developed for chess. A later version came out that was called "Snowie," which is still today considered to be the best backgammon playing program in the world. It plays every bit as well as a top human expert.

GM: So how have computers changed the game?

FF: It's changed the game in the sense that the learning curve is a lot different. I had been playing for about 15 years pretty seriously by 1994 when I won the championship. Nowadays, somebody who gets a really good bot like Snowie 4 and locks themselves in a room for two years and studies like crazy can become a very, very good player.

It's also revolutionized the game in the sense that it's overturned some strategies. Back in the 1970s, you would argue forever about what was the right or wrong play and never really resolve it. Now, if there's a play that's in question that somebody made, we stick it in the bot, we do an extended rollout (a simulation of 20,000 games or whatever) and whatever that answer is, it's pretty much the end of discussion.

The model came up with really unconventional ideas that were overturning some of the human evolution of the game. Basically, the human evolution of the game is, there's some guy who has some decent tournament success, he writes a book professing some ideas, people anchor on those notions and just run with them without ever testing them.

GM: That sounds pretty similar to football.

FF: Time goes by and no one really challenges it because it's almost like the dogma of the game. People don’t want to challenge the experts. Along come these neural nets that suggest that in certain situations where humans were making these really romantic plays and taking risks where they didn’t need to, something simple was far superior. We realized that these human subtleties of the game, the psychology of the game, were highly overrated. The proof in the pudding was that the models could perform every bit as good or better than a human, and we could test it. We could have them play, and sure enough they could compete. Some of the things from the human evolution of the game got confirmed; others were overturned.

We witnessed this in backgammon, and it occurred to me early on that there were other games that we could apply these types of strategies to. The one thing that really jumped out at me about football was the situational nature of the game. There was a stop-action element to it. You set up for a play. There were these multi-variables: how much time is on the clock, what's the score differential, what's the ball position, how many timeouts, etc. All of these things affect the decisions and these directional choices were really similar to backgammon situations where you're at a fork in the road. It’s a fourth down and you're deciding between a very deliberate path of punting or kicking a field goal or going for it; or you're deciding between a one- and a two-point conversion after a touchdown; or accepting a penalty or not. These types of thing are pretty transparent decisions. You line up and are choosing a particular path. That's very similar to backgammon and the multivariable nature is very similar to backgammon.



Courtesy Wikipedia ZEUS often disapproves of Bears Coach Lovie Smith's decisions.

GM: Why football and not, say, baseball?

FF: Baseball is a very good fit and we thought very hard about it, but the difference is that in baseball, the sabermetricians had already penetrated the game. We met with Paul DePodesta, who was then the GM of the Dodgers; he's looked at ZEUS. They've done some pretty sophisticated things in baseball much earlier. But the NFL is a very curious culture, I've got to tell you. From meeting with a lot of these guys, it's not real sophisticated. You'll hear stories that these teams are really into technology. Don't believe what you hear. They're not.

GM: How well does the data that ZEUS creates match up with historical data?

FF: We built this model that could mimic NFL teams. The only guy who had ever done real historical work was Peter Palmer in The Hidden Game of Football. We went back and got his data where he had looked at very common situations and discovered how often a typical team wins in given situations, given point differentials, given field positions for some very common types of situations. We then laid ZEUS results against the Palmer results and the correlation was staggeringly on. That was really the final stamp of approval that we had created a core model that was behaving pretty darn close to a real game.

Now we have this core model, so we can feed it any unique situation and say, "Okay, against two equal teams, play this to conclusion, with the first play being a punt and the first play being a short run." Which fork in the road wins more often?

People say, "That's all well and good, but can you really replicate how real humans behave? Can you capture all the subtleties of the game?" One of the other things that we learned from programming achievements of backgammon is that this whole notion of skill differentials and emotion and momentum don’t make a whole hell of a lot of difference when you were comparing play A versus play B. We suspected this might be the case in football. I can have the model play it out 500,000 times for the first play being a punt and the first play being a run.
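The idea of playing a fork out hundreds of thousands of times can be sketched as a small Monte Carlo rollout in Python. This is a toy under loud assumptions, not ZEUS: the rest of the game after the first play is collapsed into a single win-or-lose draw, the function name `rollout` is hypothetical, and all probabilities are invented. The punt is modeled as a play that always "succeeds" and leads to one fixed GWC.

```python
import random


def rollout(p_success, gwc_if_success, gwc_if_failure, n_games, rng):
    """Estimate a branch's win rate by playing n_games simulated games."""
    wins = 0
    for _ in range(n_games):
        # The first play of the branch either works or it doesn't.
        succeeded = rng.random() < p_success
        # Toy assumption: the remainder of the game is one weighted coin flip.
        p_win = gwc_if_success if succeeded else gwc_if_failure
        if rng.random() < p_win:
            wins += 1
    rate = wins / n_games
    # Standard error of a proportion: how much noise is left in the estimate.
    std_err = (rate * (1.0 - rate) / n_games) ** 0.5
    return rate, std_err


N_GAMES = 100_000  # the interview mentions 500,000; fewer keeps the sketch quick
rng = random.Random(2006)  # fixed seed so reruns give the same numbers

run_rate, run_err = rollout(0.60, 0.55, 0.38, N_GAMES, rng)   # invented inputs
punt_rate, punt_err = rollout(1.00, 0.46, 0.46, N_GAMES, rng)  # invented inputs

print(f"run:  {run_rate:.4f} +/- {run_err:.4f}")
print(f"punt: {punt_rate:.4f} +/- {punt_err:.4f}")
```

The standard error is the reason for the large game counts: when two plays differ by only a couple of percentage points, the noise in each estimate has to be well below that gap before the comparison means anything.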

GM: What if one team is much better than the other one?

FF: We have a customization input. We have the ability to program in the unique aspects of offense and defense and special teams for each team. The model can now behave like any two teams in the NFL. By mid-season, when the statistics start to become significant, I can put in the fact that this is a very good rushing team, this is a very poor rushing defense that they are playing. I can put in the custom traits where hard statistics exist to further customize the two teams.

GM: Does customization change the outcome significantly?

FF: It can change it, but a lot of times, it doesn’t necessarily overturn the fact that play A was better than play B. Between two clones, it might be that play A was 3% better than play B. When I now put in the custom features of both teams, it might be that play A is still better than play B, but now it's only 2.4% better. All the coach really cares about is choosing the right play.

GM: What about up until the middle of the season when we don’t yet have statistically significant numbers?

FF: The big question that everybody asks is, "How do you know that your customizations are really accurate?" Sometimes, there are hard statistics and sometimes it's early in the season and maybe we're not 100% sure how good those customization inputs are.

The model will now go back, after it does its initial valuation, and it tries to overturn its own recommendation. It will go back after it says the run was better than the punt. Let's make the rushing team the worst rushing team in the NFL. Let's make the punting team the best punting team in the NFL. Let's make the punt return the worst punt return in the NFL. Let's make the rushing defense the best rushing defense in the NFL. It now sets up parameters that would be the worst-case scenario, by far the most likely scenario to overturn the model's recommendation. It's far worse than anything you could contrive in reality. Now the model re-runs the simulation under those conditions. It then compares the change that was made. Surely that will make a change in the results in terms of magnitude of the error. The fascinating thing is, often it does not flip the decision.
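The stress test Frigo describes can be sketched in Python as follows. This is a simplified stand-in that uses a closed-form expected value where ZEUS would re-run its simulation; the `Matchup` fields, the function names, and all numbers are hypothetical and chosen only to illustrate a case where the edge shrinks but the decision holds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Matchup:
    p_convert: float    # chance the run gains the first down
    gwc_convert: float  # offense's GWC after converting
    gwc_stuffed: float  # offense's GWC after being stopped
    gwc_punt: float     # offense's GWC after punting


def edge_for_run(m: Matchup) -> float:
    """GWC of running minus GWC of punting; positive favors the run."""
    run = m.p_convert * m.gwc_convert + (1.0 - m.p_convert) * m.gwc_stuffed
    return run - m.gwc_punt


def stress_test(core: Matchup, worst: Matchup):
    """Compare the core result with a deliberately hostile re-run."""
    core_edge = edge_for_run(core)
    worst_edge = edge_for_run(worst)
    flipped = (core_edge > 0) != (worst_edge > 0)
    return core_edge, worst_edge, flipped


# Invented inputs for two evenly matched teams.
core = Matchup(p_convert=0.65, gwc_convert=0.56, gwc_stuffed=0.40, gwc_punt=0.47)
# Worst case for the run: weakest rushing offense, strongest rushing defense,
# best punter, worst punt return (all expressed as invented parameter shifts).
worst = replace(core, p_convert=0.55, gwc_punt=0.485)

core_edge, worst_edge, flipped = stress_test(core, worst)
print(f"core edge:  {core_edge:+.3f}")
print(f"worst edge: {worst_edge:+.3f}")
print("decision flipped:", flipped)
```

With these made-up inputs the run's advantage shrinks sharply under the hostile settings but stays positive, which is the pattern described above: the magnitude changes while the recommendation survives.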

We then built an algorithm that, after the sensitivity analysis is done, you compare the core simulation result with the worst-case simulation result and the probability that the decision gets overturned and the magnitude by which it gets overturned results in a confidence factor. If the confidence factor is 10, that means that the worst-case custom simulation didn’t overturn the original decision, it only changed the magnitude. In other words, the coach made a blunder.

Now, if it overturns it by a little bit, the confidence factor gets watered down. The way ZEUS works, it gives a recommendation. Some of the decisions that [New York Giants' Coach Tom] Coughlin made in our ESPN analysis were confidence-level-10 blunders. Some of them were lower. What we tell coaches is, if the confidence level is zero to three, ZEUS is saying something that you need to pay attention to, but if there are extraneous factors (some gut feel you've got about the way your guys are performing), it's not unreasonable to overturn the ZEUS recommendation. If you get into the midrange of four to six or seven, it gets a lot less likely that ZEUS is wrong; when you get up into seven, eight, nine, almost nothing should overturn it, and when you get to 10, forget about it. It's irrefutable.
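The confidence bands Frigo lays out can be written down as a small lookup, sketched here in Python. The interview does not give the formula that produces the confidence factor, so this covers only how a finished score might be read. Two assumptions are mine: the score is a whole number from 0 to 10, and a 7, which Frigo mentions in both the middle and upper ranges, is placed in the upper one. The function name `how_firm` is hypothetical.

```python
def how_firm(confidence: int) -> str:
    """Translate a 0-10 confidence factor into the advice given to coaches."""
    if not 0 <= confidence <= 10:
        raise ValueError("confidence must be between 0 and 10")
    if confidence == 10:
        return "irrefutable"
    if confidence >= 7:  # assumption: 7 belongs to the upper band
        return "almost nothing should overturn it"
    if confidence >= 4:
        return "much less likely that ZEUS is wrong"
    return "pay attention, but a strong gut feel may reasonably overturn it"


for score in (2, 5, 8, 10):
    print(score, "->", how_firm(score))
```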

GM: What do coaches that you talk to think about two backgammon players coming and telling them how to coach?

FF: Well, they put more stock into the artistry of the game, what they view to be momentum and the emotional aspect of the game. In fact, we had one meeting with the St. Louis Rams a couple years back. We had basically the whole coaching staff in front of us with the exception of [head coach Mike] Martz. One of the guys, who was an ex-football player, said to me, "Here's the part you don't understand." He was talking down to me a little bit. He said, "When I put my guys out on the field, when I put them out on a fourth and short, there's a momentum and emotional aspect to this that your model can't take into account." He goes, "I've got to look those guys in the eye when they come to the sidelines. And if I put their butts on the line to go for it on a fourth and short based on your model and we fail, they're going to be dejected." What I didn't say to him at the time was that I think he was underestimating his players' ability to handle emotions. The argument I did make to him was, "OK, I don’t disagree with you that there's such a thing as negative emotionally-induced momentum. But if you're going to make that argument, then you would agree that if they succeed on the fourth and short, that they should get a boost. Is that fair?" He agreed. I said, "Would you also agree that you were more likely to succeed on that fourth and short than fail?"

"Yeah."

So the net momentum emotional effect was a positive in going for it. And he kind of shut up. It's a typical example of these coaches. They only see the negative side of the equation.



Courtesy Wikipedia Bengals Coach Marvin Lewis knows coaching isn't just about winning.

GM: Almost everything that ZEUS tells us is that these coaches are too risk-averse. Is it because they are concerned with something other than just winning?

FF: Absolutely. We met with Marvin Lewis a couple years ago in Cincinnati and he looked me in the eye and said to me, "I don’t disagree with what ZEUS is saying. You guys might very well be right that we're calling something too conservative in that situation. But what you don’t understand is that if I make a call that's viewed to be controversial by the fans and by the owner, and I fail, I lose my job."

He was brutally honest about it. That's what's going on. You've got this culture where nobody wants to stick their head out. You're a head coach, you're making $1.5 million a year, and if you operate within the status quo or improve a little bit, you're going to lock up your five year contract. You can't really blame these guys for having this individual risk profile. The real shame is that it's misaligned with what should be the owner's profile. There's no question there's a risk aversion.

GM: Has ZEUS ever found a coach to be too aggressive?

FF: There haven't been a lot of examples. There have been cases where guys have gone for it on a fourth down and shouldn’t have or we didn’t agree with their onside-kick attempt. There was one this season, where the Jets opened the second half against the Bears with an onside kick and the Bears recovered and went on to make the first score of the game. The press just ripped the Jets for it. [Editor's note: See this piece by USA Today columnist Ian O'Connor.] ZEUS actually said that it was not a terrible decision; it said that it was a little bit of an error. The funny part about it was that the Bears recover, drive down the field, and settle for a field goal when they should have gone for the TD according to ZEUS, which was a huge, multi-percent error [Editor's note: You can see the Bears' drive chart at ESPN]. But the onside kick that the Jets did was like a fraction of a percent. Meanwhile, the press is just focused on the fact that the Jets did the most ridiculous thing ever and they don’t realize that in the same sequence of play, the Bears did something far worse. The Bears won the game, though, so everyone forgets about it.

GM: You've called out Joe Buck and Troy Aikman for not saying anything about a similar error in the Bears-Patriots game.

FF: They don’t even notice it, which is really common. If you win, all is forgiven. Everybody has a short memory. You lose, they'll go dig through everything you did and second-guess it.

GM: Are the announcers and the media generally as misinformed as the coaches?

FF: They're worse. John Madden is painful most of the time. He says some things that are complete psychobabble. I think the media is misinformed. I think some coaches get it, but they're just afraid to implement it. Some guys refuse to believe it or don’t want to.

GM: You guys have gotten a lot of media coverage recently (from Wired News to Esquire). Has it been accurate? Are you happy with it?

FF: They all kind of want to dumb it down a bit. One thing they always say is that we can't get too technical. That's really a challenge for us. You can't really explain what our model is unless you get into the methodology behind it. Some people get this idea that we look at a bunch of NFL historical statistics and then we give an opinion on if it was right or wrong. I keep trying to tell people that ZEUS is not our opinion, it’s a vehicle.

GM: What do you think has sparked all of this media interest?

FF: I think part of it was the Super Bowl analysis we did last year. People really love this whole idea of looking at all the critical plays in a table. I think the fan base is really fascinated with it. These are the types of things that people debate over a beer and they talk about over the water cooler on Monday morning. For somebody to come out and be able to settle a wager and say, "Look, this was a boneheaded play or this wasn't a boneheaded play," that's really interesting. I think a lot of people have suspected that there are some strategies that aren't being fully implemented, like onside kicks, and for somebody to come out and actually put some science to it and say, "Lo and behold it is underutilized." It seems that it's really gotten some people pumped up out there.

GM: In the Esquire article you say that coaches should almost always go for it on fourth and short.

FF: It's this whole risk-aversion bias where they don’t want to put it on the line now. If they're going to give up that touchdown, then they'd rather give it up over several minutes than give it up in 10 seconds.

GM: It seems to me that that's why many people have such trouble coming to terms with these ideas that stats suggest. A lot of it is about delayed gratification…

FF: I think it's totally the case. I can't remember which behavioral psychologists gave this example, but I found it fascinating. They were talking about how people view risk and they were saying that people will smoke cigarettes their whole lives and won't comprehend the gradual risk because it doesn’t affect them on a day-to-day basis. But if one in so many cigarettes has an explosive in it that will blow your head off, people will stop smoking a lot quicker. They are viewing this catastrophic result in the short term which they have a total aversion to. If it’s a gradual process, they're willing to weather it.

I think it's a similar thing to football. All that most coaches can think about is, "This could be a disaster right here and now. If I punt the ball, it's going to play out for a bit. It's going to dilute the effect."

GM: Can you rank your coaches by their decisions?

FF: One of the things we're planning to post at the end of the season is we're planning to look at all 16 regular season games for each team. And we'll come up with a coaching index with their aggregate error rates.



Courtesy Wikipedia Cowboys Coach Bill Parcells makes some of the best decisions of any NFL coach, according to ZEUS.

GM: But you'd still have plenty of data to rank the coaches, right? Can you give us a tease of who a couple of the top coaches would be and who a couple of the bottom ones would be?

FF: My suspicion is that you will see Belichick and Parcells perform a bit better. And you'll probably see guys like Nolan in San Francisco, Coughlin, and, it almost sounds shameful to say it because they're having such a good season, but Lovie Smith has done some really wacko stuff. Even the Chicago press said, "What the heck was he thinking when he did that?"

There's still quite a bit of parity. If I had to speculate on what the distribution curve is going to look like, the best performers will have given up about a half-game in expectation and the worst performers will probably have given up a game-and-a-quarter to a game-and-a-half.

GM: What else can ZEUS tell us about football besides which coaches are boneheads?

FF: We have a really interesting application for doing player evaluations, which is still going to get launched. The really cool thing about this is that we have always had this goal of converting everything in this game to a common currency of GWC: the cost of a blunder, the cost of a turnover, the cost of a suboptimal coaching decision, and the cost of having somebody who's better or worse at a given position in your lineup.

When they're analyzing game film, coaches break every play down in terms of who was participating, who was an integral part of that play and who was a bystander. Every play, a team's GWC goes up and down. Every increment you can see how much GWC was gained or lost. We're building an algorithm so that when they're analyzing the game film, they'll be able to assign the level of participation either positive or negative that impacted the given play on a given player. If someone was on the sideline or not involved in the play, then he is assigned a zero. If somebody missed a block or made a terrific pass, he could have a positive or negative relative weight on a given play. You could have a positive or negative GWC on every given increment. In aggregate for game or season, every player will get a positive or negative rating based upon their overall GWC impact. It can apply to a lot of the secondary positions that don’t have hard statistics. When you're breaking down game film, you can see how much impact this guy actually had. Over the course of the season, it might really add up. Maybe in a lot of plays with high GWC impact, this guy was playing a role in that play or he was goofing up, and it was costing us. The idea would be to, at the end of the season, be able to line up each roster and have a positive or negative rating on a given player and be able to line up every center in the league and say, who are the guys who are really contributors and who aren't. And it's being done in the same objective unit that you evaluate blunders, that you evaluate coaching decisions. [Editor's note: Sabermetricians have done similar work with win probability added in baseball (Wikipedia).]
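The player-rating idea can be sketched in Python as a running total of GWC credit per player. The interview describes the goal, not the algorithm (which Frigo says is still being built), so the scoring rule here is my assumption: each involved player gets a weight between -1 and 1 (sign for helped or hurt, size for how involved), multiplied by the size of the play's GWC swing. Players not involved are simply left out, which is the same as a weight of zero. The names `player_ratings` and `film`, and all the numbers, are hypothetical.

```python
from collections import defaultdict


def player_ratings(plays):
    """Sum each player's GWC credit over a list of graded plays.

    plays: iterable of (gwc_swing, {player: weight}) pairs, where weight is
    in [-1, 1] and uninvolved players are omitted (treated as zero).
    """
    totals = defaultdict(float)
    for gwc_swing, weights in plays:
        for player, weight in weights.items():
            if not -1.0 <= weight <= 1.0:
                raise ValueError(f"weight for {player} must be in [-1, 1]")
            # Assumed rule: signed involvement times the size of the swing.
            totals[player] += weight * abs(gwc_swing)
    return dict(totals)


# Invented film grades for three plays, all from one team's point of view.
film = [
    (+0.04, {"QB": 0.6, "WR": 0.4}),    # terrific pass and catch
    (-0.03, {"LT": -0.8, "QB": -0.2}),  # missed block leads to a sack
    (-0.02, {"CB": -1.0}),              # blown coverage
]

ratings = player_ratings(film)
for player, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player}: {rating:+.3f}")
```

Because every credit is expressed in GWC, totals for different positions land on the same scale, which is the "common currency" Frigo is after.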

You can really compare how everybody is performing and how much is at stake in their performance. You can start to make decisions within the salary cap about how you want to distribute salary and get the best return on your investments.

GM: If every team followed ZEUS principles, kickers would be kicking a lot less, right?

FF: Punters would be punting less. Kickers would be kicking fewer field goals.



Courtesy Wikipedia If the NFL followed ZEUS guidelines, there would be far fewer field goal attempts.

GM: Any idea how much kicking incidence would decline overall?

FF: I could take a stab at it. Just looking at the games that we've analyzed, it's not unusual for there to be at least, per team, two to three situations where they punt where ZEUS says that they shouldn't have. I could see there being a magnitude of 20 to 30 fewer punts per team in a season; that's probably reasonable. I think field goals would decrease as well. The very biggest magnitude errors are deep in opposing territory, short yardage, like settling for the "guaranteed" points of a field goal versus going for it. The thing that most teams don’t appreciate which ZEUS captures is that when you fail [on a field goal] the resulting field position for the opposing team is pretty significant.

GM: A lot of it seems to go against what seems practical. The idea that sometimes going for two at the end of the game when you're down by one instead of trying to send it into overtime seemed crazy to me.

FF: It depends what you think your chances are on a two-point conversion and what you think your chances are in overtime. When we met with Belichick, we gave the example of the last play of the game about going into OT or going for a two-point conversion to win the game to give him the basic decision tree. For whatever reason, he latched onto it. He said that he didn’t know how often he made a two-point conversion. He said something like, "It's subject to the conditions on the field, what I think about my team. Sometimes I think I might make every time and sometimes I think I might not make it at all." It was a very strange thing to say. I'm not trying to disparage Belichick, I think a lot of him. I said to him, "Surely you've got some expectation in mind. I'm not telling you what your expectation has to be but you've got to have some criteria for your decision."
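The basic decision tree Frigo mentions, for a team that scores a touchdown on the last play and trails by one, can be sketched in Python. Kicking the extra point only ties the game, so winning also requires winning overtime; going for two wins outright if it succeeds. The function names and the sample probabilities are hypothetical inputs, the kind of expectation Frigo is asking a coach to state.

```python
def win_prob_kick(p_extra_point: float, p_win_overtime: float) -> float:
    """Win by making the extra point and then winning overtime."""
    return p_extra_point * p_win_overtime


def win_prob_go_for_two(p_two_point: float) -> float:
    """Win outright if the two-point conversion succeeds."""
    return p_two_point


def break_even_two_point(p_extra_point: float, p_win_overtime: float) -> float:
    """Two-point success rate above which going for two is the better call."""
    return win_prob_kick(p_extra_point, p_win_overtime)


# Invented expectations: a near-automatic extra point, a coin-flip overtime.
threshold = break_even_two_point(p_extra_point=0.98, p_win_overtime=0.50)
print(f"go for two if you expect to convert more than {threshold:.0%} of the time")

for p_two in (0.40, 0.55):
    call = "go for two" if win_prob_go_for_two(p_two) > threshold else "kick"
    print(f"expected conversion rate {p_two:.0%}: {call}")
```

The sketch does not say what the right expectation is. It only shows that once a coach names one, the choice follows from it.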

We're not giving our opinion. We're providing science; we’re being pretty transparent about our methodology and our criteria. It's just like we've witnessed in backgammon. When you go into something very objective and you just input the basic rules, you start to see some really counterintuitive things.