One of the most invigorating streams of research in the field of cognitive science is the practice of cognitive modeling. We’re far from understanding how the mind works, but by building cognitive models of representations or processes, we can simulate aspects of cognitive functioning to better understand them.

We’re currently in a position where almost every aspect of cognitive functioning, including perception, language, and motor control, has a wide range of cognitive models that are successful in capturing a range of relevant phenomena.

A recent publication in Computational Brain & Behavior by Michael Lee and a number of colleagues makes a range of suggestions for how we can improve the robustness of cognitive models. That is, according to the authors, a good model goes far beyond merely providing a good explanation for the data at hand. Instead, a model is considered robust when it both generalizes to novel datasets and provides consistent answers despite variations in its implementation or parameterization.

The authors also recommended a number of practices from the Open Science movement. In order for models to be useful, it’s best if they’re made easily available to other researchers. And, in what is perhaps the most controversial point of all, the authors recommended pre-registration when cognitive models are used in experiments.

Pre-registration is a practice that has become quite popular in the Open Science movement as a remedy for the reported poor replication rate in psychology. In traditional psychology experiments, a researcher is often free to analyze the data however they please after the data have been collected. Pre-registration deviates from this by specifying the entire analysis plan in advance of the experiment. This is designed to minimize questionable practices such as “p-hacking”, where researchers continually analyze the data using different methods or criteria until they achieve a desired result.

Lee et al. argued that pre-registration can be just as easily applied to cognitive models. That is, in advance of an experiment, researchers can specify the models that they will be applying as well as the way in which the models make contact with data. They were careful to note that this will not apply in all settings; they specifically recommend it for confirmatory settings, where predictions are made very clearly in advance of the data being collected.

The considerable range of disagreement about these recommendations was evident from the sheer number of submitted commentaries. While some controversial articles may attract two or three published replies, this article had a whopping 25 commentaries, which made it a special issue in and of itself. And as if there weren’t enough star power already in the list of authors, the complete list of commentators nearly constitutes a “who’s who” of cognitive modeling. To top it off, the authors submitted a response to each of the commentaries, closing the loop as a discussion between the authors and the commentators.

Is preregistration really necessary for models?

Preregistration might have been the most controversial point in the article. A number of commentaries (Shiffrin; MacEachern & Van Zandt; De Boeck et al.; Lilburn et al.; Palmeri; Szollosi & Donkin) heavily criticized the utility of preregistration in a modeling context. While each commentary made its own points, the consensus was that cognitive models are often applied in completely novel contexts, and it is therefore often difficult, if not impossible, to specify what will be required in advance. In fact, many modelers will attest that the failure of a particular model to provide a satisfying account of a dataset is often the most interesting and provocative part of an investigation, as it leads to thinking about what made the model fail and how it could be improved.

This is often the part where some might say “Hold on a minute! Isn’t that just another form of p-hacking?” That is, if a model fails to account for a particular phenomenon in a dataset, isn’t it problematic to modify the model to be able to account for that previously missing pattern of data?

It can be, in some cases. If a modeler tries out a large number of ad hoc assumptions and augmented mechanisms, it’s possible that they will eventually fit the data but at the cost of overfitting. However, there are a number of safeguards in place that many members of the cognitive modeling community already practice.

First, if the model in question is a theoretical model that makes a wide range of predictions, introducing an additional assumption may capture the particular pattern in the data at hand, but it often comes with a number of unpredictable costs. That is, the model may be able to fit the challenging pattern, but in doing so it may compromise its initial strengths and misfit other patterns of data it was designed to explain. These misfits can be immediately apparent to the modeler, and they illustrate what is so challenging about modeling – a modified model still has to exhibit some generality in order to be useful.

Still, this example assumes that the dataset the modeler is working with is constraining. The data may not be, and the modeler may end up overfitting them. In this case, the modeler can use a range of model selection methods that evaluate the fit to the data against a measure of model complexity. Additional assumptions often come at the cost of higher complexity penalties, which may prevent the modified model from being preferred even if it can capture the pattern. Such techniques were in fact recommended in the Lee et al. article as good practice for cognitive modelers.
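To make the idea concrete, here is a minimal sketch (with my own toy numbers, not from the article) of how criteria such as AIC and BIC trade goodness of fit against the number of free parameters, so that an augmented model that fits slightly better can still lose out:

```python
import numpy as np

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: fit penalized by parameter count."""
    return -2 * log_likelihood + 2 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: heavier penalty for larger samples."""
    return -2 * log_likelihood + n_params * np.log(n_obs)

# Hypothetical example: the augmented model fits slightly better (higher
# log-likelihood) but pays a complexity penalty for its extra parameters.
base_ll, base_k = -520.0, 4   # original model
aug_ll, aug_k = -517.5, 7     # model with added assumptions
n = 200                       # number of observations

print(aic(base_ll, base_k), aic(aug_ll, aug_k))        # lower is better
print(bic(base_ll, base_k, n), bic(aug_ll, aug_k, n))  # base model wins here
```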

Nonetheless, in line with Lee et al.’s points, not all models are used in situations where their critical assumptions are being tested or generalized. Sometimes the modeler’s goal is merely to estimate the parameters of a model such as signal detection theory or the diffusion model and compare them across groups of participants or experimental conditions.

In these cases, the researcher is performing something akin to statistical analysis, but instead of asking how the data differ across conditions or groups, they are asking how discriminability, speed-accuracy thresholds, or some other psychological construct is affected by the experimental context. These may be situations where pre-registration is useful and should be encouraged – modelers can preregister details such as the specific model variant they will employ and the data exclusions they will apply, all in advance of the modeling being performed. This can prevent modelers from making post-hoc modifications when the results do not favor a preferred hypothesis, such as exploring different model variants or parameterizations until they stumble upon a result that they prefer.
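For instance, in an equal-variance signal detection analysis – a standard textbook computation, shown here only as an illustration with hypothetical counts – discriminability and response bias are estimated directly from hit and false-alarm rates:

```python
from scipy.stats import norm

def sdt_params(hits, misses, false_alarms, correct_rejections):
    """Equal-variance signal detection estimates from response counts."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)              # discriminability
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))   # response bias
    return d_prime, criterion

# Hypothetical counts for two conditions; the preregistered question is
# whether d' differs between them.
print(sdt_params(80, 20, 30, 70))   # condition A
print(sdt_params(70, 30, 40, 60))   # condition B
```

A preregistration in this spirit would name the model variant (here, the equal-variance version), the estimation method, and the exclusion criteria before any data come in.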

What we talk about when we talk about “modeling”

Speaking of the diversity of modeling approaches, much of the discussion around preregistering models presupposed that modeling means fitting models to data, albeit in service of different goals. However, there is an enormous diversity of modeling approaches out there, some of which don’t involve fitting models to data at all.

David Kellen’s excellent commentary made the point that a wider range of modeling approaches needs to be considered when we discuss computational models, making reference to Suppes’ (1966) hierarchy of models. According to Suppes, a theory never makes direct contact with data; it does so through a range of models at different levels, including models of the data, of the experiment, and of the theory itself.

This broader scope is something many modelers miss when they discuss models. The Lee et al. article, for instance, stated: “the only difference between statistical analysis and psychological modeling lies in the emphasis that psychological models place on substantive interpretation.” This statement presupposes that cognitive models are always used to explain data from experimental paradigms, which is not always the case. There are many models that do not make any direct contact with data whatsoever.

Take a model like latent semantic analysis (LSA; Landauer & Dumais, 1997), for instance, which is a model of how semantic representations of words are built in an unsupervised fashion from a large collection of natural text. LSA’s representations are incredibly useful: they exhibit similarity relations that strongly resemble those in natural language, and they have been used in a wide range of applications, including automatic essay grading and even the diagnosis of clinical disorders.

However, LSA does not make any direct contact with data. LSA is a theory of representations, but it contains no specification of how a cognitive task is performed. This means that if one has data from language production or comprehension tasks, one would need to combine the LSA representations with a model of the task itself in order for the representations to make contact with data.
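To make this concrete, here is a toy sketch of the LSA pipeline (a simplified illustration, not Landauer and Dumais’s original implementation, which also applies an entropy-based weighting to the counts): a term-by-document count matrix is factored with a truncated SVD, and word similarity is read off as proximity in the reduced space. Note that there is still no task model here – only representations.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Tiny toy corpus standing in for a large collection of natural text.
docs = [
    "the doctor examined the patient",
    "the nurse helped the patient",
    "the pilot flew the plane",
    "the plane landed at the airport",
]

counts = CountVectorizer().fit(docs)
X = counts.transform(docs).T.toarray()              # term-by-document matrix
U = TruncatedSVD(n_components=2).fit_transform(X)   # reduced word vectors

# Words that occur in similar contexts tend to end up near one another,
# even if they never co-occur in the same document.
vocab = counts.vocabulary_
print(cosine_similarity(U[[vocab["doctor"]]], U[[vocab["nurse"]]]))
```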

LSA is just one example – there are a number of other models that require additional assumptions, mechanisms, or even other models in order to make direct contact with data – to “fit” the model, so to speak. It is in this respect that Kellen is correct – we can most certainly benefit from a broader taxonomy of both the specifications and the goals of cognitive models. Much of the disagreement between the commentators and the original article may stem from the fact that the authors are actually discussing different levels or goals.

What else is useful for cognitive modelers to consider?

A number of the other commentaries made excellent contributions emphasizing other procedures or methodologies that modelers should be considering. One example, discussed by Leslie Blaha, is the importance of visualizing models and data to inspect systematic misses. While this is already common practice, she emphasizes that it is in danger of being sidelined by the increasing automation of model fitting and selection, to the point where, in some cases, models are not visually inspected at all.

This is also an excellent point when one considers the importance of visualization for understanding models. Speaking from personal experience, models are often quite difficult to both comprehend and communicate. But a highly effective visualization of a model’s predictions or mechanisms can often make a dramatic difference in simplifying the job.

Pitt and Myung and Heck and Erdfelder discuss methods for improving the diagnosticity of data. Even an excellent modeler can expend a lot of labor collecting data that have little diagnostic utility for deciding between a set of models. One technique for improving this situation is adaptive design optimization, an iterative procedure that can be integrated into an experimental design, with models fit to the data on the fly. As the models are fit, the procedure searches for regions of the design space that maximally distinguish between the candidate models, and experimental parameters are then selected to collect data in those regions.
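The core idea can be sketched in a few lines (my own toy illustration, not the full adaptive design optimization machinery, which works with utility functions and posterior distributions): given the current parameter estimates, choose the next design point where the candidate models disagree most.

```python
import numpy as np

# Two hypothetical candidate models of accuracy as a function of a design
# variable (say, stimulus duration): exponential vs. power-law learning.
def model_exp(x, rate):
    return 1 - 0.5 * np.exp(-rate * x)

def model_pow(x, rate):
    return 1 - 0.5 * (1 + x) ** (-rate)

designs = np.linspace(0.1, 10, 100)   # candidate stimulus durations
rate_exp, rate_pow = 0.4, 0.9         # current best-fitting parameters

# Pick the design where the models' predictions diverge the most;
# data collected there are maximally diagnostic between them.
disagreement = np.abs(model_exp(designs, rate_exp) -
                      model_pow(designs, rate_pow))
print(designs[np.argmax(disagreement)])
```

In the real procedure, this selection step alternates with data collection and model refitting, so the most diagnostic design shifts as the parameter estimates sharpen.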

Kennedy et al. gave a wonderful case study on exploring the predictions of models in advance of fitting them to data, using prior predictive checks within Bayesian methods. They demonstrated that this technique can reveal that an improperly specified model makes strange or counter-intuitive predictions, and even that a very large amount of data may be required for a model to behave sensibly.
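In outline (a toy sketch with a hypothetical model, not Kennedy et al.’s actual case study), a prior predictive check draws parameters from the prior, simulates data from each draw, and asks whether the simulated data are remotely plausible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: response times are lognormal with a mean parameter mu,
# and the prior on mu is deliberately vague: mu ~ Normal(0, 10).
n_sims, n_trials = 1000, 50
mu = rng.normal(0, 10, size=n_sims)    # draws from the prior
sim_rts = rng.lognormal(mean=mu[:, None], sigma=0.5,
                        size=(n_sims, n_trials))

# Inspect the prior predictive distribution of mean RTs. The vague prior
# produces absurd values (microseconds to years), signaling that the
# model specification needs tightening before any data are collected.
print(np.percentile(sim_rts.mean(axis=1), [1, 50, 99]))
```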

In closing

The Lee et al. article and its commentaries resulted in a wonderful and stimulating discussion about the merits of various methods in computational modeling. There are many other excellent commentaries (again, there are twenty-five of them!) that I simply did not have time or space to summarize here, but I highly recommend giving them a read.