It is my sense of the field that AIC (the Akaike information criterion) has moved past bandwagon status into a fundamental and still increasingly used paradigm in how ecologists do statistics. For some quick and dirty evidence, I looked at how often different core words were used at least once in an article in Ecology Letters in 2004 and 2014. Regression was used in 41% and 46% respectively. Significance was used in 40% and 35%. Richness was 41% and 33%. And competition was 46% and 49%. Perhaps a trend or two in there, but all pretty steady. AIC, by contrast, has gone from appearing in 6% of the articles in 2004 to 19% in 2014. So in summary: AIC has roughly tripled in usage, is now found in nearly 20% of all articles, and is used more than half as often as the most widely used statistical term on the list, significance.

I have a theory about why this has happened which does not reflect favorably on how AIC is used. Please note the qualification “how AIC is used”. AIC is a perfectly valid tool. And like so many tools, its original proponents made reasonable and accurate claims about it. But over time, the community takes ownership of a concept and uses it how they want, not how it was intended.

And I would suggest that how people want to use AIC appeals to two low instincts of ecologists (and all humans for that matter). First, humans love rankings. Most newspapers print the standings of every team in your favorite sport every day. We pay more attention to the ranking of a journal's impact factor than to its absolute value. Any number of newspapers produce rankings of universities. It is ridiculous to think that something as complex as journal quality or university quality can be reduced to one dimension (which is implicit in ranking – you can't rank in two dimensions). But we force it on systems all the time. Second, we like to have our cake and eat it too. Statistics has multiple modes or goals. These include: estimation of parameters, testing of hypotheses, exploration of covariation, prediction into new conditions, selection among choices (e.g. models), etc. Conventional wisdom is that you need to be clearly based in one goal for a given analysis. But we hate to commit.

You can probably already see where I'm headed. The essence of what AIC delivers is to boil choices down to a single dimension (precisely, it provides one specific weighting of the two dimensions of likelihood and number of parameters to give a single dimension) and then rank models. And comparing AIC scores is so squishy. It manages to look like all five statistical goals at once. It certainly does selection (that is its claim to fame). But if you've ever assessed whether ΔAIC>2 you have done something that is mathematically close to testing p<0.05.
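That correspondence is easy to check. For two nested models differing by one parameter, ΔAIC = LR − 2, where LR is the likelihood ratio statistic, so ΔAIC > 2 is the same as LR > 4. A quick sketch of the implied p-value (Python used here just for the arithmetic):

```python
# dAIC > 2 for nested models differing by one parameter is equivalent
# to a likelihood ratio statistic LR > 4 (since dAIC = LR - 2), and
# LR is asymptotically chi-squared with 1 degree of freedom.
from scipy.stats import chi2

p = chi2.sf(4.0, df=1)   # tail probability at the LR cutoff implied by dAIC = 2
print(f"P(chi2_1 > 4) = {p:.3f}")  # ~0.046, i.e. just under 0.05
```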

Just to be clear, likelihood can also be used towards all those goals. But likelihood presents much more divergent paths. If you're doing hypothesis testing, you're doing likelihood ratios. If you're doing estimation, you're maximizing. If you're doing selection, you can't proceed unless you specify what criterion to use in addition to likelihood. You have to actually slow down and choose what mode of inference you're doing. And you have to make more choices. With AIC you present that classic table of ΔAIC and weights and voila! You've sort of implied doing all five statistical goals at once.
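To make that concrete, here is a minimal sketch (on made-up data; all variable names are hypothetical) of what committing to hypothesis testing via a likelihood ratio test actually looks like:

```python
# A likelihood ratio test: does seas add anything beyond prod?
# Made-up data; the point is that you must commit to testing as your goal.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 200
dat = pd.DataFrame({"prod": rng.normal(size=n), "seas": rng.normal(size=n)})
dat["S"] = 2 + 0.5 * dat["prod"] + rng.normal(size=n)  # seas has no true effect

small = smf.ols("S ~ prod", data=dat).fit()
big = smf.ols("S ~ prod + seas", data=dat).fit()

lr = 2 * (big.llf - small.llf)  # likelihood ratio statistic
p = chi2.sf(lr, df=1)           # one extra parameter
print(f"LR = {lr:.2f}, p = {p:.3f}")
```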

I want to return to my qualification of “how AIC is used”. The following is a simple example to illustrate how I perceive AIC being used these days. Take the example of species richness (hereafter S). Some people think that productivity is a good predictor (hereafter prod). Some people think seasonality is a better predictor (hereafter seas). Some people suggest energy is the true cause (hereafter energ). And most people recognize that you probably need to control for area sampled (area).

Now you could do full-blown variable selection where you try all 16 models of every possible combination of the four variables and use AIC to pick the best (a runnable sketch of this appears below). That would be a pretty defensible example of exploratory statistics. You could also pursue the same goal with an analysis of variable importance, by scaling all four variables and throwing them into one model and comparing coefficients, or by doing some form of variance partitioning. These would also be true exploratory statistics. You could also use AIC to do variable importance ranking (compare AIC of S~prod, S~seas, S~energ). This is at least close to what Burnham and Anderson suggested in comparing models. You could even throw in S~area, at which point you would basically be doing hypothesis testing against a null, although few would acknowledge this. But my sense is that what most people do is some flavor of what Crawley and Zuur advocate, which is a fairly loose mix of model selection and variable selection. This might result in a table that looks like this*:

Model               ΔAIC   weight
S~prod+seas+area     0.0      31%
S~prod+energ+area    0.5      22%
S~prod+energ         1.1      15%
S~energ+seas         3.2       9%
S~energ              5.0       2%
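(For reference, in a real table the weights column would come from the Akaike weight formula w_i = exp(−ΔAIC_i/2) / Σ_j exp(−ΔAIC_j/2). A quick sketch applied to the ΔAIC column above, which, as the footnote notes, gives different numbers than the invented weights:

```python
# Akaike weights implied by the dAIC column above:
#   w_i = exp(-dAIC_i / 2) / sum_j exp(-dAIC_j / 2)
import numpy as np

delta_aic = np.array([0.0, 0.5, 1.1, 3.2, 5.0])
w = np.exp(-delta_aic / 2)
w /= w.sum()
print(np.round(w * 100, 1))  # [37.9 29.5 21.9  7.6  3.1] percent
```
)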

There are a couple of key aspects of this approach. It seems to be blending model selection and variable selection (indeed it is not really clear that there are distinct models to select from here, but it is not a very clear-headed variable selection approach either). It's a shame nobody ever competes genuinely distinct models with AIC, as that was one of the original claims for the benefit of AIC (e.g. Wright's area-energy hypothesis S~energ*area vs. the more-individuals hypothesis, an SEM with two equations: S~numindiv and numindiv~prod). But I don't encounter it too often. Also note that more complicated models came out ranked better (a near universal feature of AIC). And I doubt anybody could tell me how science has advanced from producing this table.
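For concreteness, here is a minimal sketch of the defensible "all 16 models" exploratory variable selection mentioned above (made-up data; all names hypothetical):

```python
# All-subsets variable selection over prod, seas, energ, area (2^4 = 16
# candidate models including the intercept-only model), ranked by AIC.
# Made-up data; a sketch of the exploratory approach, not a recommendation.
from itertools import combinations
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
dat = pd.DataFrame({v: rng.normal(size=n)
                    for v in ["prod", "seas", "energ", "area"]})
dat["S"] = 2 + 0.5 * dat["prod"] + 0.3 * dat["area"] + rng.normal(size=n)

fits = []
for k in range(5):
    for subset in combinations(["prod", "seas", "energ", "area"], k):
        formula = "S ~ " + (" + ".join(subset) if subset else "1")
        fits.append((formula, smf.ols(formula, data=dat).fit().aic))

best = min(aic for _, aic in fits)
for formula, aic in sorted(fits, key=lambda f: f[1]):
    print(f"{formula:35s} dAIC = {aic - best:6.2f}")
```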

Which brings me to the nub of my complaint against AIC. AIC as practiced appeals to base human instincts to rank and to be wishy-washy about inferential frameworks. There is NO philosophy of science that says ranking models is important. It's barely better than useless to science. And there is no philosophy of science that says you don't have to be clear what your goal is.

There is plenty of good debate to have about which inferential approach advances science the best (a lot has happened on this blog!). I am partial to Lakatos and his idea of risky predictions (e.g. here). Jeremy is partial to Mayo's severe tests, which often favor hypothesis testing done well (e.g. here). And I've argued before that there are times in science when exploratory statistics are really important (here). Many ecologists are enamored with Platt's strong inference (two posts on this) where you compare models and decisively select one. Burnham and Anderson cite Platt frequently as an advantage of AIC. But it is key to note that Platt argued for decisive tests where only one theory survives. And arguably still the most mainstream view in ecology is Popperian falsification and hypothesis testing. I can have a good conversation with proponents of any of these approaches (and indeed can argue for any of these approaches as advancing science). But nowhere in any of these approaches does it say keeping all theories around but ranking them is helpful. And nowhere does it say having a muddled view of your inferential approach is helpful. That's because these two practices are not helpful. They're incredibly detrimental to the advance of science! Yet I believe that AIC has been adopted precisely because it ranks without going all the way to eliminating theories and because it lets you have a muddled approach to inference.

What do you think? Has AIC been good for the advance of science (and ecology)? Am I too cynical about why hordes are embracing AIC? Would the world be better off if only we went back to using AIC as intended (and if so, how was it intended)?

UPDATE – just wanted to say be sure to read the comments. I know a lot of readers usually skip them, but there has been an amazing discussion with over 100 comments down below, and I've learned a lot.

*NB: this table is made up. In particular, I haven't run the ΔAIC values through the formula to get the weights, and the weights don't add to 100%. I just wanted to show the type of output produced.