Appendix A: ERP recording

Recording was performed with a 64-channel \(\hbox {H}_{2}\)O-cap (Electrical Geodesics, Inc. system) with Cz as the reference electrode. The signal was amplified, digitized at a sampling rate of 500 Hz, and stored using Net Station software (Electrical Geodesics, Inc.). Pre-processing was performed with the EEGLab software (Delorme and Makeig 2004) with the ERPLab plugin (Lopez-Calderon and Luck 2014). Before analysis, a high-pass filter (1 Hz, to remove signal drift) and a notch filter (50 Hz, to remove powerline noise) were applied. Independent Component Analysis (ICA) was used to remove eye artifacts and high-frequency noise (Hyvärinen and Oja 2000). Grand-average ERPs were created separately for every sentence type across all participants. A time window of 300 ms to 500 ms was used in the statistical analysis.
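As an illustration of how the 300 ms to 500 ms analysis window maps onto samples at the 500 Hz sampling rate, the following minimal Python sketch computes a mean amplitude over that window. It is not the EEGLab/ERPLab pipeline used in the study: the function names, the 200 ms pre-stimulus baseline (`epoch_start_ms = -200`), and the toy epoch are assumptions introduced only for this example.

```python
def window_indices(start_ms, end_ms, sfreq_hz, epoch_start_ms):
    """Map a time window in ms to (first, last_exclusive) sample indices.

    epoch_start_ms is the time of the first sample relative to target-word
    onset (hypothetical: -200 means a 200 ms pre-stimulus baseline).
    """
    ms_per_sample = 1000.0 / sfreq_hz          # 2 ms per sample at 500 Hz
    first = round((start_ms - epoch_start_ms) / ms_per_sample)
    last = round((end_ms - epoch_start_ms) / ms_per_sample)
    return first, last + 1                     # +1 so the end point is included


def mean_amplitude(epoch_uv, start_ms, end_ms, sfreq_hz, epoch_start_ms):
    """Mean voltage (in microvolts) of one averaged epoch inside the window."""
    first, last = window_indices(start_ms, end_ms, sfreq_hz, epoch_start_ms)
    window = epoch_uv[first:last]
    return sum(window) / len(window)


# Toy epoch (made-up values, not study data): flat signal with a
# -4 microvolt plateau between 300 ms and 500 ms after word onset.
toy_epoch = [0.0] * 250 + [-4.0] * 101 + [0.0] * 100

n400_mean = mean_amplitude(toy_epoch, 300, 500, sfreq_hz=500, epoch_start_ms=-200)
print(n400_mean)  # -4.0
```

With a 2 ms sampling interval, the window spans 101 samples when both end points are included; whether the end point is included is a convention that should match the analysis software.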

Appendix B: Visual inspection

The largest effects for self-referential sentences were found at frontal and midline electrodes (Fp2, AFz, Fz, Fp1, AF3, F5). For normal sentences, the largest effects were found at right frontal electrodes (Fp2, F8). For paradoxical sentences, the negative deflection marking the beginning of the N400 component started around 100 ms after the onset of the target word. In false sentences (both self-referential and normal) this latency was longer, and the negative deflection started around 190 ms. Semantic errors (both self-referential and normal) showed a much higher amplitude of the P2 complex than other sentence types, with the negative deflection starting at the same time as in false sentences (190 ms) but lasting much longer and peaking around 600 ms. True sentences (both self-referential and normal) showed no N1-P2 complex; instead, the waveform dropped steadily until 300 ms, with the lowest amplitude of all sentence types. The highest N400 amplitude was elicited by paradoxical sentences, and the second highest by false sentences. True sentences and the Truthteller sentences did not differ in N400 amplitude, and both were lower than paradoxes and falsehoods. Because the latency of the negative deflection for semantic errors differed so markedly, their N400 component cannot be assessed accurately. Additionally, following the N400 time window, the waveform was more negative for false and paradoxical sentences.

Appendix C: Statistical analysis

A repeated measures ANOVA was performed to test the hypothesis that paradoxical sentences are processed like false sentences. This test was selected because the experiment followed a within-subjects design, so comparing the N400 across sentence types required a dependent-samples test. Table 2 reports the mean N400 amplitudes for the different types of sentences. Because ANOVA assumes that the variables are normally distributed, Table 2 also reports the results of the Shapiro–Wilk test of normality. All of the study variables were found to be normally distributed.

The detailed ANOVA results are reported in Table 3.
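To make the structure of a dependent-samples comparison concrete, the sketch below computes the F statistic of a one-way repeated measures ANOVA by partitioning the total sum of squares into condition, participant, and error terms. It is a minimal illustration, not the software or model specification used in the study: the function name and the small data set are hypothetical, and no p-value is computed because the Python standard library does not provide the F distribution.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data: list of rows, one row per participant, one value per condition
    (e.g. mean N400 amplitude in microvolts for each sentence type).
    Returns (F, df_condition, df_error).
    """
    n = len(data)       # number of participants
    k = len(data[0])    # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)

    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Between-participant variability is removed from the error term;
    # this is what distinguishes the within-subjects analysis.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Made-up amplitudes for 4 hypothetical participants and 2 conditions
# (columns: "true", "false"); these are NOT values from the study.
toy = [[-1.0, -3.0],
       [-2.0, -3.0],
       [0.0, -3.0],
       [-1.0, -2.0]]

f_value, df1, df2 = rm_anova_oneway(toy)
print(f"F({df1},{df2}) = {f_value:.2f}")  # F(1,3) = 13.36
```

With only two conditions, as in each pairwise comparison of sentence types, this F statistic equals the square of the paired-samples t statistic, which is why a dependent-samples test and a two-level repeated measures ANOVA lead to the same conclusion.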

First, to confirm the validity of our signal recording, we checked whether normal false sentences elicited a different N400 than normal true sentences. Consistent with previous literature, we found that normal false sentences elicited a higher N400 than normal true sentences.

Next, we performed the analysis for the main hypotheses of the study. It revealed that the N400 component was significantly higher for paradoxes than for true self-referential sentences, and that false self-referential sentences also elicited a higher N400 than true self-referential sentences.

We also checked whether the Liar sentences elicited a different N400 than false self-referential sentences and found that they did not. In fact, the mean N400 for the Liar sentences was identical to that of false self-referential sentences to two decimal places (see Table 2).

To make sure that the effect is specific to the Liar sentence itself, and not merely due to the fact that it addresses its own truth value, we also performed the analysis for the Truthteller sentences. It revealed that the Truthteller sentences did not elicit an N400 different from that of true self-referential sentences.

Table 2 Descriptive statistics of the study variables and tests of the normality of distribution

Table 3 Repeated measures ANOVA results for comparing N400 of different types of sentences

An additional, exploratory analysis was performed to check whether participants’ opinions about the truth value of the Liar sentences and their previous knowledge of the paradox had any impact on their ERPs. Eighteen participants believed the Liar sentence to be false, 5 believed it to be true, and 7 answered “neither”. Twenty-one participants indicated that they did not know what the Liar paradox is, while 9 indicated that they did.

There was no effect of the participants’ opinion about the truth value of the Liar sentence on their ERPs: \(F(2,27) = 0.8, p= 0.46, \eta ^2 = 0.06\).

We also checked whether including previous knowledge of the Liar paradox in the model would diminish the effect of the Liar sentence on the N400 (i.e. whether excluding the people who knew about the paradox would affect the result). The interaction between previous knowledge and the effect of the Liar sentence was not significant: \(F(1,28) = 2.3, p = 0.14, \eta ^2 = 0.08\), while the main effect of the Liar remained significant: \(F(1,28) = 4.2, p = 0.05, \eta ^2 = 0.13\), even though the sample size was reduced by 9 people (almost a third of the original sample). Participants who had previous knowledge of the Liar paradox showed a visible trend towards a lower N400 component in response to the Liar sentence, although this trend is not statistically significant. It is present only for the Liar sentence and not for other types of sentences (see Fig. 5).
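The effect sizes reported in this appendix can be cross-checked against the F statistics and their degrees of freedom. The sketch below assumes that \(\eta ^2\) was computed as the effect sum of squares divided by the sum of the effect and error sums of squares, which can be rewritten in terms of F; the text does not state the formula, so this is an assumption, and the function name is hypothetical. Because the F values are reported to one decimal place, the check is only approximate.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Effect size recovered from an F test.

    Assumes eta^2 = SS_effect / (SS_effect + SS_error), which equals
    F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom as reported in the text.
reported = [
    ("opinion about truth value", 0.8, 2, 27),
    ("knowledge x Liar interaction", 2.3, 1, 28),
    ("main effect of the Liar", 4.2, 1, 28),
]

for label, f_value, df1, df2 in reported:
    print(label, round(eta_squared_from_f(f_value, df1, df2), 2))
# opinion about truth value 0.06
# knowledge x Liar interaction 0.08
# main effect of the Liar 0.13
```

Under this assumption, all three recomputed values agree with the reported \(\eta ^2\) values after rounding to two decimal places.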