Appendices

Appendix 1

Note that any mathematical model can be defined as a collection of mathematical functions, with one function for every experiment and dependent variable for which the model makes a prediction. Each function in this collection assigns a specific value of the dependent variable to specific values of all the model’s parameters. For example, consider a model with $r$ free parameters, $\theta_1, \theta_2, \dots, \theta_r$. Then the function $f_i$ might predict the performance of the model in task $\mathrm{T}_i$ on the dependent variable of interest, which can be denoted as $P(\mathrm{T}_i)$. In other words,

$$ P(\mathrm{T}_i) = f_i(\theta_1, \theta_2, \dots, \theta_r). $$

Note that the only assumption incorporated into this definition of a model is that each $f_i$ is a function; that is, each $f_i$ assigns only one value of $P(\mathrm{T}_i)$ to each combination of $\theta_1, \theta_2, \dots, \theta_r$.
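To make this definition concrete, the following sketch (in Python; the particular function bodies are hypothetical placeholders, not any model from the text) represents a model as a collection of functions, one per task:

```python
# A model as a collection of functions, one per task. Each function
# maps the full parameter vector to a single predicted value of the
# dependent variable, so each f_i assigns exactly one P(T_i) to each
# parameter combination. Function bodies are hypothetical.

def f1(theta1, theta2):
    return theta1 * theta2          # hypothetical prediction for T1

def f2(theta1, theta2):
    return theta1 + theta2          # hypothetical prediction for T2

model = {"T1": f1, "T2": f2}

# One parameter setting yields exactly one prediction per task.
print(model["T1"](0.3, 0.7), model["T2"](0.3, 0.7))
```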

A state-trace analysis plots $P(\mathrm{T}_2)$ (e.g., on the ordinate) against $P(\mathrm{T}_1)$ (on the abscissa). The question is what can be learned about the underlying model from examining such plots. For example, one might ask under what conditions $P(\mathrm{T}_2)$ is a function of $P(\mathrm{T}_1)$ [i.e., so that each value of $P(\mathrm{T}_1)$ occurs with only one value of $P(\mathrm{T}_2)$]. In other words, under what conditions does there exist a function $F$ such that

$$ P(\mathrm{T}_2) = F[P(\mathrm{T}_1)]? $$

And, for example, under what conditions is $F$ strictly increasing?
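For illustration, a state-trace plot is built by pairing the two task measures, one point per condition. The sketch below uses made-up accuracy values, since the construction does not depend on any particular model:

```python
import numpy as np
import matplotlib.pyplot as plt

# State-trace plot: one point per experimental condition, with P(T1)
# on the abscissa and P(T2) on the ordinate. Values are hypothetical.
p_t1 = np.array([0.55, 0.62, 0.70, 0.78, 0.85])   # task T1 performance
p_t2 = np.array([0.52, 0.58, 0.66, 0.71, 0.80])   # task T2 performance

plt.scatter(p_t1, p_t2)
plt.xlabel("P(T1)")
plt.ylabel("P(T2)")
plt.title("State-trace plot (hypothetical data)")
plt.show()
```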

A general solution to this problem does not appear possible, but some strong sufficient conditions are easily derived (Dunn & Kirsner, 1988). Consider the case where all parameters are fixed save one. Assume that the parameter that varies is $\theta_j$. Thus, $P(\mathrm{T}_1) = f_1(\theta_j)$ and $P(\mathrm{T}_2) = f_2(\theta_j)$. Next assume that $f_1(\theta_j)$ is strictly monotonic; that is, $f_1$ is either a strictly increasing or a strictly decreasing function of $\theta_j$. Under these conditions, $f_1$ has an inverse $f_1^{-1}$, and the inverse is itself a function. Therefore,

$$ \theta_j = f_1^{-1}[P(\mathrm{T}_1)], $$

which implies that

$$ P(\mathrm{T}_2) = f_2\{f_1^{-1}[P(\mathrm{T}_1)]\}. $$

A function of a function is itself a function (i.e., $f_2 \circ f_1^{-1}$ is a function). Therefore, under these conditions, the state-trace plot is a function (a type 2 state-trace). Even so, the state-trace plot might not be strictly monotonic (i.e., type 1). To guarantee a state-trace plot in which all points fall on one strictly monotonic curve, it suffices to add the extra assumption that $f_2$ is also strictly monotonic (so that $f_1$, $f_1^{-1}$, and $f_2$ are all strictly monotonic). For example, if $f_1$ and $f_2$ are both strictly increasing functions, then $f_2 \circ f_1^{-1}$ is a strictly increasing function, and all points on the state-trace plot must therefore fall on a single strictly increasing curve.
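The argument can be checked numerically. In the sketch below, $f_1$ and $f_2$ are arbitrary strictly increasing functions of a single varying parameter (both functional forms are assumptions chosen only for their monotonicity); the resulting points necessarily fall on one increasing curve:

```python
import numpy as np

# Single varying parameter theta_j; f1 and f2 are both strictly
# increasing (hypothetical forms), so P(T2) = f2(f1^{-1}(P(T1)))
# is strictly increasing.
theta_j = np.linspace(0.1, 3.0, 20)       # the one varying parameter
p_t1 = 1 - np.exp(-theta_j)               # f1: strictly increasing
p_t2 = theta_j / (1 + theta_j)            # f2: strictly increasing

# Ordering the points by P(T1) must leave P(T2) ordered as well,
# i.e., all points lie on a single strictly increasing curve.
order = np.argsort(p_t1)
assert np.all(np.diff(p_t2[order]) > 0)
```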

If two parameters $\theta_j$ and $\theta_k$ are both varying, then $P(\mathrm{T}_1) = f_1(\theta_j, \theta_k)$ and $P(\mathrm{T}_2) = f_2(\theta_j, \theta_k)$, for two functions $f_1$ and $f_2$. In this case, there are no conditions under which $f_1$ has an inverse (since the inverse would have to map $\Re \rightarrow \Re^2$). Thus, no model in which two or more parameters are varying can meet these sufficient conditions.
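A toy counterexample (with hypothetical functional forms) makes the point: once two parameters vary freely, two settings can match on $P(\mathrm{T}_1)$ while differing on $P(\mathrm{T}_2)$, so the plot is no longer a function:

```python
# Two freely varying parameters break the inverse-function argument.
# f1 and f2 below are hypothetical.
def f1(theta_j, theta_k):
    return theta_j + theta_k

def f2(theta_j, theta_k):
    return theta_j * theta_k

print(f1(0.2, 0.8), f2(0.2, 0.8))   # -> 1.0 0.16
print(f1(0.5, 0.5), f2(0.5, 0.5))   # -> 1.0 0.25
# Same P(T1), different P(T2): P(T2) is not a function of P(T1).
```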

Multiple-systems models assume, by definition, that different cognitive systems are used in different tasks and conditions. For example, a dual-systems model might assume that one system dominates in task $\mathrm{T}_1$ and a different system dominates in task $\mathrm{T}_2$. Thus, a multiple-systems model could be defined as a special type of mathematical model (i.e., as defined above) in which at least two of the following sets are nonempty: set 1, model parameters that affect performance only in task $\mathrm{T}_1$ (denoted $\{\alpha_1, \alpha_2, \dots\}$); set 2, parameters that affect performance only in task $\mathrm{T}_2$ (denoted $\{\beta_1, \beta_2, \dots\}$); and set 3, parameters that affect performance in both tasks (denoted $\{\theta_1, \theta_2, \dots\}$). By the same logic, a single-system model should include only parameters in set 3, since such models predict that all tasks are performed in the same way.

Thus, for any dual-systems model, there must exist functions f 1 and f 2 such that

$$ P(\mathrm{T}_1) = f_1(\alpha_1, \alpha_2, \dots, \theta_1, \theta_2, \dots) \quad \mathrm{and} \quad P(\mathrm{T}_2) = f_2(\beta_1, \beta_2, \dots, \theta_1, \theta_2, \dots). $$

Such a model predicts a (type 1) monotonically increasing state-trace plot if the following conditions are met: (1) only one parameter varies across tasks $\mathrm{T}_1$ and $\mathrm{T}_2$, and that parameter is a member of the set $\{\theta_1, \theta_2, \dots\}$; and (2) if the single varying parameter is $\theta_j$, then $P(\mathrm{T}_1)$ and $P(\mathrm{T}_2)$ both monotonically increase with increases in $\theta_j$. The proof is identical to the proof given above. Note that if only one parameter varies but it is a member of the set $\{\alpha_1, \alpha_2, \dots\}$, the state-trace plot must be a single horizontal line (because under these conditions, $P(\mathrm{T}_2)$ must be a constant). In contrast, if the single varying parameter is a member of the set $\{\beta_1, \beta_2, \dots\}$, the state-trace plot must be a single vertical line (because under these conditions, $P(\mathrm{T}_1)$ is a constant).
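The three cases can be illustrated with hypothetical functional forms (the specific equations below are placeholders; only the parameter-set membership matters):

```python
import numpy as np

# alpha affects only T1, beta affects only T2, theta affects both.
# Both functions increase in every argument (hypothetical forms).
def p_t1(alpha, theta):
    return 1 - np.exp(-alpha * theta)

def p_t2(beta, theta):
    return theta / (theta + 1 / beta)

grid = np.linspace(0.1, 2.0, 10)

# Vary only theta: both measures increase -> one increasing curve.
trace_theta = list(zip(p_t1(1.0, grid), p_t2(1.0, grid)))

# Vary only alpha: P(T2) is constant -> a single horizontal line.
trace_alpha = list(zip(p_t1(grid, 1.0), np.full(10, p_t2(1.0, 1.0))))

# Vary only beta: P(T1) is constant -> a single vertical line.
trace_beta = list(zip(np.full(10, p_t1(1.0, 1.0)), p_t2(grid, 1.0)))
```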

Appendix 2

Category structures

In all panels of Figs. 1 and 2, both tasks are categorization tasks with two categories, A and B, each composed of three exemplars that vary on two perceptual dimensions. In the RB conditions (task $\mathrm{T}_2$ in Fig. 1a, c; task $\mathrm{T}_1$ in Fig. 1b; and the ordinate in all panels of Fig. 2), the A exemplars have coordinates (0,0), (0,1), and (0,2), whereas the B exemplars have coordinates (1,0), (1,1), and (1,2). In all II conditions in Fig. 1 (except task $\mathrm{T}_2$ in Fig. 1d) and Fig. 2, the A exemplars have coordinates (0,1), (1,2), and (2,3), whereas the B exemplars have coordinates (1,0), (2,1), and (3,2). Thus, the category bound has a slope of 1, so the optimal strategy is to allocate equal attention to the two perceptual dimensions. Task $\mathrm{T}_2$ of Fig. 1d is also an II task, but in this case the category bound has a slope of 2, so both dimensions are relevant but the optimal strategy is to allocate more attention to dimension 1 than to dimension 2. The stimuli in this condition were created by rotating the coordinates of the stimuli from the RB condition. Such a rotation guarantees that between- and within-category similarity are identical in all categorization conditions.
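The exemplar coordinates below are transcribed from the description above; the rotation used to construct the slope-2 II condition is reconstructed as an assumption (the text specifies a rotation but not its angle or center):

```python
import numpy as np

# Category structures from the text (three exemplars per category).
rb_A = np.array([(0, 0), (0, 1), (0, 2)], dtype=float)  # RB, category A
rb_B = np.array([(1, 0), (1, 1), (1, 2)], dtype=float)  # RB, category B
ii_A = np.array([(0, 1), (1, 2), (2, 3)], dtype=float)  # II, slope-1 bound
ii_B = np.array([(1, 0), (2, 1), (3, 2)], dtype=float)

def rotate(points, radians, center=(0.5, 1.0)):
    """Rigid rotation about a center; the center here is an assumption."""
    r = np.array([[np.cos(radians), -np.sin(radians)],
                  [np.sin(radians),  np.cos(radians)]])
    return (points - center) @ r.T + center

# Rotating the RB structure clockwise by arctan(1/2) maps its vertical
# bound onto a slope-2 bound while preserving all pairwise distances
# (hence between- and within-category similarity).
slope2_A = rotate(rb_A, -np.arctan(0.5))
slope2_B = rotate(rb_B, -np.arctan(0.5))
```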

Single-system model

The single-system model was the GCM (Nosofsky, 1986). The model had three parameters: (1) c, which could be interpreted as the total amount of attention allocated to the task; (2) an attention weight w, which is the proportion of total attention allocated to dimension 1; and (3) γ, which is a measure of response determinism. The γ parameter, introduced by Ashby and Maddox (1993), is an exponent on each summed similarity. When γ = 1, the model is the same as the original GCM; when γ > 1, it responds more deterministically; and when γ < 1, it responds more probabilistically. In all applications except Fig. 2a, γ = 1. The model assumed no response bias. In Fig. 1a, w was fixed to .5, and the only parameter that varied across tasks was c. In Figs. 1b, 1c, and 1d, c was fixed at .05, and only w was allowed to vary. In Fig. 2a, the dual task was assumed to decrease γ from 3.8 to .45 and to increase c. The attention weight w was set to .5 in all conditions except the RB task under dual-task conditions, where it was set to .91. In Fig. 2c, the dual task was assumed to decrease c and to impair attentional learning. Specifically, under single-task conditions, w was set to the optimal values of 1 in the RB task and .5 in the II task. Under dual-task conditions, w was set to .5 in both tasks.
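A minimal sketch of this GCM variant is given below. The text specifies the roles of c, w, and γ and the absence of response bias; the weighted Euclidean distance and exponential similarity function are assumptions about the remaining details:

```python
import numpy as np

def gcm_prob_A(stimulus, exemplars_A, exemplars_B, c, w, gamma):
    """P('A' response) under a GCM with sensitivity c, attention
    weight w on dimension 1, and response-determinism exponent gamma
    (Ashby & Maddox, 1993). Metric/similarity choices are assumed."""
    weights = np.array([w, 1.0 - w])          # attention to dims 1, 2

    def summed_similarity(exemplars):
        diffs = np.asarray(exemplars, dtype=float) - stimulus
        d = np.sqrt((diffs ** 2) @ weights)   # weighted Euclidean
        return np.exp(-c * d).sum()           # exponential similarity

    sA = summed_similarity(exemplars_A)
    sB = summed_similarity(exemplars_B)
    # gamma = 1 recovers the original GCM response rule (no bias).
    return sA ** gamma / (sA ** gamma + sB ** gamma)

# e.g., the RB structure of Appendix 2 with the Fig. 1b-d settings
A = [(0, 0), (0, 1), (0, 2)]
B = [(1, 0), (1, 1), (1, 2)]
print(gcm_prob_A(np.array([0.0, 1.0]), A, B, c=0.05, w=0.5, gamma=1.0))
```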

Dual-systems model

The dual-systems model used to generate Fig. 2b, d was a simplification of COVIS (Ashby et al., 1998). In the RB task, the model switched back and forth between a horizontal and a vertical decision bound. As training progressed, use of the incorrect horizontal bound decreased exponentially, and use of the correct vertical bound increased exponentially. In the II task, the model switched back and forth between guessing and a vertical decision bound. As training progressed, the frequency of guessing decreased exponentially, and use of the vertical bound increased exponentially. The best one-dimensional rule (either a vertical or a horizontal bound) yields an accuracy of 67% correct in the II task, so during this phase of training, the model could not exceed 67% correct. After persisting for some time at this reduced accuracy, the model switched to its procedural system. After the switch trial, accuracy increased exponentially toward 100%. The model has three parameters: a learning rate in each system (i.e., the exponential rate) and a threshold for switching from the explicit system to the procedural system (i.e., a tolerance for poor performance). In Fig. 2b, it was assumed that the only effect of the dual task was to reduce the learning rate in the explicit system. In Fig. 2d, the dual task was again assumed to slow the explicit system's learning rate, but now it was also assumed to reduce the threshold on poor performance. Reducing this threshold causes the model to switch to the procedural system on an earlier trial.
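The learning dynamics described above can be sketched as follows for the II task. The exponential approach curves and the 67% rule ceiling come from the text; the specific rate constants are placeholders, and the switch trial is treated as a fixed input rather than derived from the performance-tolerance threshold:

```python
import numpy as np

def ii_accuracy(n_trials, explicit_rate, procedural_rate, switch_trial):
    """Trial-by-trial II accuracy for the simplified dual-systems model:
    guessing gives way to the best one-dimensional rule (67% ceiling),
    then the procedural system drives accuracy toward 100%."""
    acc = np.empty(n_trials)
    for t in range(n_trials):
        if t < switch_trial:
            p_rule = 1 - np.exp(-explicit_rate * t)   # rule use grows
            acc[t] = 0.5 * (1 - p_rule) + 0.67 * p_rule
        else:
            k = t - switch_trial                      # post-switch trials
            acc[t] = 0.67 + 0.33 * (1 - np.exp(-procedural_rate * k))
    return acc

# Fig. 2b: the dual task only slows the explicit learning rate.
single = ii_accuracy(300, explicit_rate=0.05, procedural_rate=0.03,
                     switch_trial=150)
dual = ii_accuracy(300, explicit_rate=0.02, procedural_rate=0.03,
                   switch_trial=150)
# Fig. 2d would additionally lower the switch threshold, which in this
# sketch corresponds to an earlier switch_trial.
```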