A total of 542 participants were included in the dataset, as follows: Anorexia Nervosa (AN), n = 171; Bulimia Nervosa (BN), n = 82; recovered AN, n = 90; healthy controls (HC), n = 199. All completed the Wisconsin Card Sorting Task (WCST), an assessment that integrates multiple measures of executive processes concerned with problem solving and cognitive flexibility. The AN and BN groups performed poorly in most domains of the WCST. Recovered AN participants performed better than currently ill participants; however, they made more perseverative errors than HC participants.

Funding: This work was supported by the Psychiatry Research Trust, by the National Institute of Health Research Biomedical Research Centre for Mental Health and by the Maudsley NHS Foundation Trust and Institute of Psychiatry. KT is grateful to the Swiss Anorexia Foundation for financial support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This brief report explores WCST performance and other clinical outcomes in a large dataset collated from several published studies that used this task in samples of people with a lifetime history of an ED (AN and BN) and people who have recovered from an ED (recAN), with HC comparisons.

A systematic review of set-shifting in ED, based on all available studies before December 2005, included five experimental studies using the WCST [9]. It analysed the performance of 73 patients diagnosed with anorexia nervosa (AN) compared to 80 healthy controls (HC), and the results suggested that AN patients have prominent difficulties in set-shifting. Subsequent studies have confirmed these findings [4], [5], [10]. However, data remain limited for other categories of ED, such as bulimia nervosa (BN) [11].

In this context, the assessment of cognitive flexibility using standardised tests to measure set-shifting and problem solving is relevant to clinical practice. One commonly used neuropsychological measure of cognitive flexibility (or set-shifting ability) is the Wisconsin Card Sorting Task (WCST) [8]. This procedure integrates multiple measurements of executive processes and is one of the most widely reported neuropsychological tasks, despite some acknowledged weaknesses in the interpretation of its profiles (e.g. difficulties in task performance could be caused by problems with set-shifting, poor abstraction and conceptualisation, or attentional problems). One of the main outcomes of the WCST is the measurement of perseveration, defined as repetitive responses to a stimulus or rule that continue despite a shift in the stimulus requiring a different response.

There is a growing literature on the neurobiological underpinnings of the psychopathology of eating disorders (ED) [1]. A salient feature is that people with ED frequently present with inflexible behaviours around eating-related issues (e.g. counting calories, exercising), have rigid rituals around the daily routine (e.g. cleaning, housekeeping, homework) and experience difficulties in seeing alternative ways of coping with problems. Consistent with this, difficulties with cognitive flexibility have been shown to be an important risk and maintenance factor, for example, in anorexia nervosa (AN) [2], [3]. The neurobiological basis for this impairment is not established, but some evidence from studies assessing first-degree relatives suggests that it could be a candidate endophenotype [3]–[5]. Recently, this trait has been found in adolescents with AN [6] and in small-scale studies of people who have recovered from AN [5], [7].

The data were inspected using histograms and Kolmogorov-Smirnov tests to assess the assumption of normal distribution. A one-way ANOVA was applied to analyse between-group differences for each measure. Alpha was set at 0.05 unless Bonferroni's correction for multiple comparisons was applied, as indicated below. Cohen's d, calculated as (mean 1 − mean 2)/pooled standard deviation, was used to provide effect sizes for normally distributed data, with an effect size of <0.2 defined as small, <0.5 defined as medium and >0.8 defined as large [12].
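The effect size and multiple-comparison steps described above can be illustrated with a minimal sketch. The text does not state how the pooled standard deviation was computed, so the sample-size-weighted formula used here is an assumption, and the function names `cohens_d` and `bonferroni_alpha` are hypothetical rather than taken from the original analysis.

```python
import math
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Cohen's d: (mean 1 - mean 2) / pooled standard deviation.

    Assumption: the pooled SD weights each group's sample variance by its
    degrees of freedom; the source does not specify the pooling formula.
    """
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(
        ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    )
    return (mean(group1) - mean(group2)) / pooled_sd


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after Bonferroni's correction."""
    return alpha / n_comparisons
```

Under these assumptions, `cohens_d` takes two lists of scores (for example, perseverative errors in two groups) and `bonferroni_alpha` divides the nominal alpha by the number of comparisons made.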

For the measure of the number of trials to complete the first category, a score of 128 was given if no category was achieved. As shown in Table 2, percentages of responses or errors were calculated where appropriate by dividing the scores by the total number of trials administered.
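As a small illustration of these two scoring conventions, the sketch below converts a raw count to a percentage of trials administered and applies the default score of 128 when no category is achieved. The function names and the use of `None` to represent "no category achieved" are assumptions made for the example.

```python
TOTAL_CARDS = 128  # score assigned when no category is achieved


def percent_of_trials(score, trials_administered):
    """Express a raw count (responses or errors) as a percentage of trials."""
    if trials_administered <= 0:
        raise ValueError("trials_administered must be positive")
    return 100.0 * score / trials_administered


def trials_to_first_category(trials_used):
    """Return the trials needed for the first category, or 128 if none.

    Assumption: `trials_used` is None when no category was completed.
    """
    return TOTAL_CARDS if trials_used is None else trials_used
```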

Many studies using this test present two or three scores as indices of performance, but given the relatively large sample size, this was increased to 11 principal measures grouped into four main types, as follows: A) General Performance: the number of trials administered, the total correct responses, the total response errors and the number of categories completed; B) Perseveration: perseverative responses (any response that fitted the criteria for perseveration), perseverative errors (only perseverative responses that are also errors) and non-perseverative errors; C) Conceptual Ability: the number of trials needed to complete the first category and the percentage of conceptual level responses; and D) Response Consistency: failure to maintain set, computed as the number of times the participant makes 5 to 9 correct responses in a row, reflecting efficiency during the test; and learning to learn, a measure of the decrement in the number of responses needed to achieve each successive category.

The Wisconsin Card Sorting Test [8] (WCST Computerised version 4: CV4) was used in all the studies, presenting the task graphically on a computer screen. The WCST entails matching stimulus cards with one of four category cards, in which the stimuli are multidimensional according to colour (C), shape (S) and number (N), each dimension defining a sorting rule. By trial and error, the participant has to work out a predetermined sorting rule given only the feedback (“Right” or “Wrong”) shown on the screen after each sort. After 10 consecutive correct sorts the rule changes. There are up to six attempts to derive a rule, providing five rule shifts in the following sequence (C-S-N-C-S-N), with each rule attainment referred to as ‘completing a category.’ Participants are not informed of the correct sorting principle or that the sorting principle shifts during the test. Testing continues until all 128 cards are sorted, irrespective of whether the participant completes all the rule shifts. Two types of errors are possible: perseverative errors, in which the participant persists with a wrong sorting rule, and non-perseverative errors.
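The rule-shift logic described above can be sketched as follows. This is an illustrative simplification and not the scoring algorithm of the computerised WCST: it assumes a hypothetical input format in which each trial is already coded as correct or incorrect against the rule in force, and it only tracks which categories are completed.

```python
RULE_SEQUENCE = ["C", "S", "N", "C", "S", "N"]  # colour, shape, number
CONSECUTIVE_CORRECT_TO_SHIFT = 10
TOTAL_CARDS = 128


def completed_categories(correct_flags):
    """Return the rules completed, in order, for one session.

    Assumption: `correct_flags` is a list of booleans, one per sort, stating
    whether the sort matched the rule in force on that trial.
    """
    completed = []
    run_length = 0
    for is_correct in correct_flags[:TOTAL_CARDS]:
        if len(completed) == len(RULE_SEQUENCE):
            break  # nothing further to score once all six are complete
        # An error resets the run of consecutive correct sorts.
        run_length = run_length + 1 if is_correct else 0
        if run_length == CONSECUTIVE_CORRECT_TO_SHIFT:
            completed.append(RULE_SEQUENCE[len(completed)])
            run_length = 0  # the rule shifts and a new run starts
    return completed
```

The length of the returned list corresponds to the number of categories completed; an empty list means no category was achieved within the 128 cards.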

Cases were excluded from the final dataset if age, current BMI, length of illness or original WCST data were missing. Additionally, ED cases were excluded if BMI>25, and HCs were excluded if BMI<19. Of those with AN, 8 cases were excluded because their BMI was above the diagnostic cut-off point (>18). In the patient sample, BN participants were outpatients or from a community sample (n = 82); AN participants were inpatients (n = 90) or outpatients and community participants (n = 81) (Table 1).

All participants were recruited between 2006 and 2011 in our department (n = 542 in total). In line with the ethical standards laid down in the 1964 Declaration of Helsinki, all studies received approval from the ethical committee of the South London and Maudsley (SLaM) NHS Foundation Trust. All participants provided written informed consent prior to their inclusion in the study. ED patients were recruited from the SLaM Eating Disorders inpatient or outpatient units and had been diagnosed by experienced ED clinicians as fulfilling DSM-IV criteria for AN or BN. HC and recAN participants were recruited via advertisements in the local community and through circular emails sent to King's College London students and staff. The inclusion and exclusion criteria for the recAN group were based on Bardone-Cone et al. 2010, who state that the definition of this group should include physical, behavioural, and psychological components, such that recovered participants were included if they: a) had a body mass index (BMI; weight/height²) above 18.5; b) had restored menstruation for at least a year prior to recruitment; and c) had an absence of ED behaviours such as restriction or binge-purge symptoms during this one-year period. HC participants were excluded if they had any history of EDs, head injury or psychiatric illness. All participants were female and aged between 18 and 55 years old.

An additional analysis was conducted to examine whether inpatients (IP) and outpatients (OP) with AN differed in WCST performance, given the possibility that inpatients would be more severely affected. Table 3 shows the variables and their respective effect sizes between IP and OP. The outpatient and community sample performed significantly better than inpatients with AN on non-perseverative errors (p<0.05) and categories completed (p<0.05).

Table 2 compares between-group performance on the WCST. All domains showed significant differences and moderate effect sizes between groups, except WCST total correct and failure to maintain set. The AN and BN groups performed significantly worse than HC in the majority of the task aspects. The BN group showed poorer performance in the areas of: 1) conceptual level responses (reflecting insight into the correct sorting principle, i.e. 3 correct responses in a row would not usually occur by chance alone); and 2) WCST learning to learn (this score can be calculated only for participants who complete 3 or more categories/stages of the test). The AN group took significantly more trials to complete the first category than the HC group. The recovered AN group performed significantly worse than HCs in the domain of perseverative errors, but significantly better than the AN group.

Table 1 below provides clinical and demographic information for the participant groups. The groups were broadly matched for age, except that the recAN group was significantly older than the other groups. As expected, there was a significant main effect of group for participants' BMI, with the AN group having a significantly lower BMI than the other groups (p≤0.001).

Discussion

This study explored WCST performance as a measure of cognitive flexibility, reporting various outcomes in a large dataset of actively ill, recovered and healthy control participants collated from several studies conducted within our department. Patients with AN and BN performed poorly in almost all domains of the WCST. In terms of flexibility (perseverative errors), there was no significant difference between inpatients and outpatients with AN; rather, inpatients made significantly more non-perseverative errors, a finding which may reflect problems with attention and poor nutritional status. People with a past history of AN performed better than actively ill participants; however, perseverative errors, conceptual level responses, and number of categories completed (the main flexibility outcomes) were significantly impaired compared to HC participants. The effect sizes, however, were smaller between HC and people with a past history of AN, suggesting that flexibility can improve relative to the active illness state.

In general, our results replicate studies that report poor cognitive flexibility in AN [4], [13], [14]. This study extends knowledge about set-shifting in BN, showing that patients with BN perform as poorly on the WCST as those with AN. People in a stage of good recovery from AN still had problems with perseverations and conceptual strategy use in this task. A recent report on a similarly large dataset using a different measure of set-shifting, the Brixton Spatial Anticipation task [15], found that patients with AN performed significantly worse than HC and BN participant groups. In contrast, in the current study, we found that both actively ill ED patient groups (AN and BN) performed worse than HCs, and in some aspects of the task (e.g. perseverative responses and conceptual level of responses) people who had recovered also showed poor performance. One possible explanation for this finding is that although both the Brixton task and the WCST measure flexibility, they differ in complexity. In the Brixton task, participants are told explicitly that the sorting principle will change and are therefore alert to future rule changes. In the WCST, participants must identify the rule in order to respond correctly, and this rule is subject to change without warning. Thus, the WCST may involve greater ambiguity. The WCST also depends on cognitive operations such as searching for a new category and consolidating the correct classification category. The tasks also differ in how feedback is provided. Whereas the WCST provides feedback in which participants are told they are “right” or “wrong,” in the Brixton task the correct answer is revealed only when the next array is shown. This type of feedback is arguably less pronounced and requires the additional process of remembering the immediately previous response and matching it for correctness. In summary, the instructions are more explicit in the Brixton task and feedback is arguably more pronounced in the WCST.

Therefore, one explanation for this finding could be that patients with BN failed to learn from the feedback on the WCST but did well in a relatively simple switching task. The current study shows that people with ED were not able to learn from the feedback as efficiently as HCs. The recovered AN group showed intermediate scores between AN and HC for the Brixton task, and their performance was not significantly different from HCs. In the WCST, people who were recovered made significantly more perseverative errors than HC and showed difficulties in performing strategically, suggesting that they performed poorly on more complex tasks. This finding supports previous research proposing set-shifting as an endophenotype/biomarker or trait characteristic for AN [3]–[5].

Regarding poor learning from feedback, the findings are similar to previous studies which report poor learning in a decision-making task [16]–[18], where both patient groups (AN and BN) failed to improve and shift from a disadvantageous strategy (picking risky cards rather than safe cards) to an advantageous strategy (picking safe cards with small wins and small losses); again, feedback (behavioural and physiological) did not facilitate an improvement in the ED groups' cognitive approach to the task. Both the WCST results and the decision-making studies demonstrate that people with AN and BN have difficulty learning from previous experiences, evidenced by little improvement over time in these neurocognitive tasks. Interestingly, people recovered from AN were still performing less efficiently than HCs. This supports the evidence that individuals recovered from AN have difficulties in differentiating positive and negative feedback [19].

To our knowledge, this study has used the largest available cross-sectional dataset reporting WCST performance in people with EDs assessed with the electronic version of the task. The strengths of the study are the large sample size and the cross-sectional design including currently ill ED groups and a recovered group, as well as an HC comparison group. Experimenter error was reduced to a minimum because all the studies included here used the computerised version of the WCST. A further strength of the present study is that it reports all outcomes of the WCST in addition to perseverative errors. This has important clinical implications because psychological therapy is focused on learning (unlearning maladaptive behaviours and learning new strategies) and is largely based on feedback. Therefore, understanding the mechanism by which feedback is used by patients is informative.

This study provides an important message about cognitive shifting ability in AN, BN and recovered groups. This line of research is potentially useful for better understanding EDs in terms of cognition as well as comorbidity, lifetime diagnosis and personality characteristics, and may also help us develop improved treatment approaches. In terms of therapeutic interventions, the recently developed Cognitive Remediation Therapy (CRT) for AN [20]–[21] allows therapists to address flexibility of thinking within therapy sessions and helps individuals develop an awareness of thinking styles and apply this knowledge in their daily lives. Initial feasibility studies [21]–[22] show positive results, suggesting that a rational approach (“cold cognitive route”), raising awareness of thinking styles, might be beneficial for patients with AN. A recent systematic review [23] suggests that people with AN have good cognitive reserves in terms of higher than average IQ (110). This was replicated in our dataset when we analysed the available data on IQ. These strengths can be used as a cognitive resource in therapy. Previous reports were unclear about whether poor set-shifting forms part of the neurocognitive profile in BN because of the small scale of the available studies [13]. The present study highlights that further work and clinical adaptations to support patients with BN may be needed.

In the broader context, WCST performance impairment is not specific to ED. Difficulties with cognitive flexibility are reported in almost every psychiatric disorder; however, studies of this kind can help us to think about the relative neuropsychological impairments. In general, the WCST allows us to assess abstraction ability, but clinically it can also inform therapists of specific areas of difficulty, e.g. perseveration, conceptualisation, maintaining set or learning through the task over time. Compared with patients with brain lesions or schizophrenia, people with ED perform better (e.g. a meta-analysis on schizophrenia [24] reported effect sizes of 1.00 on WCST categories completed and 0.8 on perseverative errors, which are greater than those for the ED groups presented in this study). A meta-analysis of obsessive compulsive disorder patients [25] reported small to medium effect sizes relative to controls' performance on WCST categories completed (d = 0.23) and perseverative errors (d = 0.25), which are lower than the effect sizes reported in the current study.

It is of note that the data presented here were merged from different studies available within our department. Therefore, information about medication, illness duration, IQ and ED subtypes was not included in the analysis due to missing data, although it should be noted that 70% of patients had IQ measured using the NART and the predicted IQ was 110 (s.d. = 8.9). In future research, it would be desirable to measure these variables in relation to WCST performance. It was difficult to recruit recovered BN participants, and future work would benefit from including this group.

The WCST is one of the most widely reported neuropsychological measures of executive function and is viewed as an excellent indicator of prefrontal function. There is growing interest in the diagnostic and treatment implications of cognitive flexibility, and reporting all the available outcomes of this widely disseminated task will be useful to researchers and clinicians alike, working both inside and outside of the ED field.