All methods were carried out in accordance with the approved guidelines.

Participants

Thirty-three subjects were recruited from the University of Sussex student population. During an initial briefing, participants were asked whether they had any letter-color synesthetic phenomenology and were excluded from the experiment if they reported any. In addition, any participant whose color consistency scores (see below) were comparable to those of synesthetes was excluded from the experiment. Thus, no participant was a grapheme-color synesthete at the outset. Participants were also initially assessed using two measures of visual imagery: the Vividness of Visual Imagery Questionnaire (VVIQ)31 and the Spontaneous Use of Imagery Scale (SUIS)32. Given that previous findings based on self-report questionnaires have suggested that grapheme-color synesthetes score highly on visual imagery33,34, a subset of participants with no evidence of initial grapheme-color phenomenology and the greatest combined visual imagery scores on the VVIQ and SUIS was selected, to increase the chances of successful training. Note that the connection between visual imagery ability and grapheme-color synesthesia (GCS) should be taken as provisional, given the absence of independent behavioral evidence for this association. Motivation levels to complete the study were also considered during selection. The final sample consisted of 14 subjects, 2 male and 12 female, mean age 19.35 (SD = 1.78); this sample size was based on that used in previous synesthesia training studies. Mean VVIQ scores for this sample were 38.75 (SD = 10.27) and mean SUIS scores were 38.79 (SD = 7.65). Participants were reimbursed for their time in all the test and training sessions. The study was approved by the University of Sussex Life Sciences and Psychology Cluster Research Ethics Committee. Informed consent was obtained from all subjects.

Behavioral procedure

A test session lasting ~3 hours was administered before and after training. This included working memory, long-term memory, IQ, perceptual and phenomenological assessments. At the midpoint of training, in the 5th week, a subset of these tests was also administered. Three months after the final testing session, a follow-up session was administered using another subset of tests (Table s1).

Training consisted of ~30-minute sessions including 4–5 tasks per day, 5 days per week, for 9 weeks, with one or two new tasks each week replacing older tasks (Table s3). In addition, “homework” was assigned on each training day, which involved reading an e-book at home, with colored letters to match the training tasks (see below), using a similar paradigm to Colizoli and colleagues22. At the end of each week, participants were paid an extra £1 for each training task on which they scored higher than at the end of the previous week. For full details of the training tasks, see supplementary information.

Test tasks

For details of which tasks were administered at what stage, see Table s1.

Color Naming Stroop

The Color Naming Stroop procedure was adapted from previous studies35,36. Stimulus presentation was controlled by E-Prime 1.2 (Psychology Software Tools, Pittsburgh, PA, USA). Participants were presented with 130 trials. For half of the trials, one of the 13 trained letters was presented in a color congruent with the trained association and for the other half the color was incongruent. The order of stimulus presentation was randomized. Each trial started with a fixation cross presented at the center of the screen for 1000 ms, followed by the grapheme, which covered a visual angle of 0.64°. The color of the background was set to 196,188,150 (RGB). The onset of each grapheme was accompanied by a beep sound (used for manual coding of reaction times). The stimulus remained on the screen until the participant made a response into a microphone. Participants were required to name the veridical color as fast as they could, while ignoring the trained color. Participants' voice response reaction times were manually coded using Audacity audio editing software (http://audacity.sourceforge.net). The time was measured for each trial from the peak of the beep (which coincided with the presentation of the stimulus) to the onset of the participant's voice response. Only correct trials were included (average accuracy pre-training: 94%; mid-training: 97%; post-training: 93%). Responses that fell outside of ±2 SDs of the mean (calculated per participant) were removed (on average per participant, 7/130 pre-training, 8/130 mid-training and 8/130 post-training).
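The per-participant trimming rule described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function name, the example reaction times and the use of the sample standard deviation are all assumptions, since the text only states that responses outside ±2 SDs were removed.

```python
from statistics import mean, stdev

def trim_reaction_times(rts, n_sd=2.0):
    """Keep reaction times within mean +/- n_sd * SD for one participant.

    `rts` should already contain correct trials only.
    Assumption: the sample SD is used; the source does not say which.
    """
    m = mean(rts)
    sd = stdev(rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]

# Hypothetical reaction times (ms) for one participant's correct trials
rts = [612, 640, 598, 655, 630, 1480, 620, 605, 645, 615]
kept = trim_reaction_times(rts)
print(len(rts) - len(kept), "trial(s) removed")
```

Because the bounds are computed separately for each participant, a slow but consistent responder is not penalized relative to a fast one.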

Synesthetic Stroop

This was identical to the Color Naming Stroop task, except that participants were required to name the trained color for the letter shown and ignore the veridical color presented. Again, only correct trials were included (average accuracy 93%) and responses that fell outside of +/−2 SDs (calculated per participant) were removed (9/130 on average per participant). As can be seen from this and the previous related test, accuracy was high and very few trials were excluded.

Color Consistency Test

Participants completed the internet-based standardized grapheme-color consistency test (www.synesthete.org)37. In this test, each participant was presented with the graphemes A–Z and 0–9 three times in randomized order (108 trials). Participants were instructed to select a color that best fit with each grapheme and to use their first instinct and to always pick a color for each grapheme. Colors were represented on a plane varying in lightness along the vertical and in saturation along the horizontal axis with a separate bar to adjust hue. Each participant was given a demonstration of how the selection procedure worked before they began.

Analysis was based on a recent method optimized to maximize sensitivity and specificity38. For the purpose of this study, we were interested in the consistency of trained and untrained letters before and after training. Trained and untrained letters were analyzed separately for each of the testing sessions.

Consistency was calculated on the basis of Euclidean distances in CIELUV color space (i.e., L*u*v*: the L* axis represents perceived lightness, the u* axis contrasts green (negative values) against red (positive values) and the v* axis contrasts blue (negative values) against yellow (positive values)). Lower scores indicate more consistent color choices; consistency is typically below 135 for genuine GCS38.
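As an illustration of a distance-based consistency score, the following Python sketch sums the pairwise Euclidean distances between the three color choices made for each grapheme and averages that sum across graphemes. This aggregation is an assumption made for the example; the text states only that Euclidean distances in CIELUV space were used. The function names and the L*u*v* coordinates are hypothetical, and conversion from the RGB values picked on screen to L*u*v* is omitted.

```python
from itertools import combinations
from math import dist

def grapheme_consistency(colors_luv):
    """Sum of pairwise Euclidean distances between the (L*, u*, v*)
    choices made for a single grapheme."""
    return sum(dist(a, b) for a, b in combinations(colors_luv, 2))

def consistency_score(choices):
    """Mean per-grapheme consistency; lower values mean more consistent.

    Assumption: the score is averaged over graphemes.
    """
    per_grapheme = [grapheme_consistency(c) for c in choices.values()]
    return sum(per_grapheme) / len(per_grapheme)

# Hypothetical L*u*v* choices: three presentations per grapheme
choices = {
    "A": [(50.0, 100.0, 20.0), (50.0, 100.0, 20.0), (50.0, 100.0, 20.0)],
    "B": [(60.0, 0.0, 0.0), (60.0, 3.0, 4.0), (60.0, 0.0, 0.0)],
}
score = consistency_score(choices)
print(score)
```

To mirror the analysis described above, such a function would be applied separately to the trained and untrained letters at each testing session.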

Synesthetic conditioning

The procedure for the synesthetic conditioning task was adapted from previous studies14,27,39,40.

Apparatus. Stimulus presentation was controlled by E-Prime 1.2 (Psychology Software Tools, Pittsburgh, PA, USA). Auditory materials were presented at 100 dB via headphones by a stereo integrated amplifier. Skin Conductance (SC) was measured with two non-shielded disposable electrodes (Biopac System Inc, Goleta, CA, USA), pre-gelled with an isotonic 0.5% NaCl solution. SC data were acquired with a Biopac MP36 skin conductance level meter (Biopac System Inc, Goleta, CA, USA) and SC data were recorded using Biopac Student Lab software version 3.7.7 (Biopac System Inc, Goleta, CA, USA).

Procedure. Skin conductance response (SCR) was continuously measured to assess autonomic arousal. It was sampled at 1000 Hz with two electrodes, attached to the thenar and hypothenar eminences of the non-dominant hand. Participants were seated 60 cm in front of a computer screen. They were asked to relax, remain silent and to attend to squares that would appear on the screen. Five possible colored squares (red, green, blue, yellow and white), covering a visual angle of ~10.6°, could be shown centrally, superimposed on a peripheral light beige background set to 196,188,150 (RGB). The white square included a letter which was associated with a color during training. No motor or verbal response was required. Each square was shown for 2 s and the inter-trial-interval (ITI) was ~10 s. In the habituation phase, stimuli were presented in a random order 12 times for a total of 60 trials.
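The habituation sequence can be sketched in Python as follows. The experiment itself was run in E-Prime, so this only illustrates the trial structure (5 stimuli × 12 repetitions = 60 trials). A full shuffle of all 60 trials is assumed; the text does not say whether randomization was constrained in any way, and the names used here are hypothetical.

```python
import random

# The white square carries the trained letter
STIMULI = ["red", "green", "blue", "yellow", "white"]

def habituation_sequence(repetitions=12, seed=None):
    """Return a randomly ordered list with each stimulus repeated
    `repetitions` times (assumption: unconstrained shuffle)."""
    rng = random.Random(seed)
    trials = STIMULI * repetitions
    rng.shuffle(trials)
    return trials

trials = habituation_sequence(seed=0)
print(len(trials))
```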

In the conditioning phase, a total of 29 trials were presented in a fixed pseudo-random order. Seven squares of the same color were followed by a loud startling sound, which acted as the unconditioned stimulus (UCS). Six white squares, including a letter associated with the UCS's color during the training, were used as conditioned stimuli (CS). The letter stimulus was selected on the basis of the strongest self-reported letter-color association for each individual. None of the CS stimuli were followed by the UCS. An additional 16 squares showing the other three colors were used as neutral filler stimuli. Neutral stimuli were never followed by the startling sound and were only considered for the analysis if the preceding trial was not a UCS or CS trial.
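The inclusion rule for neutral trials can be made concrete with a short Python sketch. The trial labels and the example sequence are hypothetical, and the sketch assumes that a neutral trial with no preceding trial counts as usable, which the text does not address.

```python
# Trial counts in the conditioning phase, as given in the text
PHASE_COUNTS = {"UCS": 7, "CS": 6, "neutral": 16}
assert sum(PHASE_COUNTS.values()) == 29

def usable_neutral_trials(sequence):
    """Return indices of neutral trials whose preceding trial
    was neither a UCS trial nor a CS trial."""
    usable = []
    for i, kind in enumerate(sequence):
        if kind != "neutral":
            continue
        if i > 0 and sequence[i - 1] in ("UCS", "CS"):
            continue  # possible carry-over arousal from the previous trial
        usable.append(i)
    return usable

# Hypothetical fragment of a conditioning sequence
sequence = ["neutral", "UCS", "neutral", "neutral", "CS", "neutral", "neutral"]
usable = usable_neutral_trials(sequence)
print(usable)
```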

In the extinction phase, two white squares including the same letter as previous trials and two squares of the letter's associated color were presented in alternating order for a total of four trials. These were never followed by the UCS. These trials were included to extinguish the conditioned response and were not considered for the analysis.

Analysis. Using Ledalab (version 3.4.3) for continuous decomposition, SC data were down-sampled to 20 Hz and separated into phasic and tonic activity41. For analysis of phasic SCR a response window of 2 s was used. The starting point was defined as the offset of the stimuli (i.e., colored squares). Note that this rather short response window was used to minimize the likelihood of physiological random noise. SCRs were defined as the average phasic driver in the response window with higher SCR indicating higher autonomic arousal. This score represents phasic activity most accurately41. One participant was excluded from the analysis due to a previous injury to her non-dominant hand.
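The scoring step, averaging the phasic driver over a 2 s window that starts at stimulus offset, can be sketched in Python as below. The decomposition itself was carried out in Ledalab and is not reproduced here; the sketch assumes the phasic driver is already available as a sequence sampled at 20 Hz. The function name, parameter names and example signal are hypothetical.

```python
def mean_phasic_driver(driver, stimulus_offset_s, fs_hz=20, window_s=2.0):
    """Average the phasic driver in a window starting at stimulus offset.

    driver            : phasic driver samples (already down-sampled to fs_hz)
    stimulus_offset_s : time of stimulus offset, in seconds from recording start
    """
    start = int(round(stimulus_offset_s * fs_hz))
    stop = start + int(round(window_s * fs_hz))
    window = driver[start:stop]
    if not window:
        raise ValueError("response window lies outside the recording")
    return sum(window) / len(window)

# Hypothetical 5 s driver at 20 Hz: flat, then a response, then flat again
driver = [0.0] * 40 + [1.0] * 20 + [3.0] * 20 + [0.0] * 20
scr = mean_phasic_driver(driver, stimulus_offset_s=2.0)
print(scr)
```

With a 20 Hz signal, the 2 s window covers 40 samples, so each trial's score is the mean of 40 driver values.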

Cattell Culture Fair IQ

In order to assess the potential of the synesthetic training regime for general cognitive enhancement, the Cattell Culture Fair form 2a was given to participants both before and after training. A control group (n = 9, mean age 22.5, 2 females and 7 males) carried out the same test 9 weeks apart with no training component in between, in order to assess the potential confound of improvement due to practice on the test.

Phenomenology questionnaire

Participants were interviewed about the effects of training during training week 5, immediately after training was complete and then again three months later (two participants were unable to attend the three-month follow-up session). Participants were invited to describe whether they used any mnemonic strategies to aid in letter-color associations and, if so, what they were. The extent of color phenomenology was assessed for each letter using two methods. First, participants were asked to respond to the question, “Which statement characterizes your grapheme-color associations best: a) Whenever I see a letter there is only that letter, but no color at all; b) I can't even think of an associated color, no matter how hard I try; c) I know the associated color, but I never see it; d) I see the color in front of my mind's eye; e) I see the color outside my head (e.g., a few inches away); f) I see the color floating on the surface where the letter/number is.” Second, participants were shown a black letter and asked to describe any associated color phenomenology, both inside the lab and during their daily lives. Finally, participants were given a chance to report any additional effects of their training.