Subjects

We studied 33 healthy female subjects51 (19–39 years, mean age 26 years; one left-handed; laterality index of the right-handed subjects 84.5%). None of the subjects reported any history of neurological or psychiatric disorders. When asked, all subjects reported either normal vision or vision corrected to normal with contact lenses. Three subjects were excluded due to discomfort in the scanner, so the final analysis included 30 subjects. Of these, 27 were native Finnish speakers and three were native Russian speakers. All subjects were sufficiently proficient in English to follow the dialogue in the movie without subtitles. The experimental protocols were approved by the research ethics committee of Aalto University and the study was carried out with their permission (Lausunto 9 2013 Sosiaalisen kognition aivomekanismit, 8.10.2013) and in accordance with the guidelines of the Declaration of Helsinki51. Written informed consent was obtained from each subject prior to participation.

Stimuli and Procedure

The study consisted of two experiments. In the first experiment, the feature film “My Sister’s Keeper” (dir. Nick Cassavetes, 2009, Curmudgeon Films) was shown to the subjects during fMRI. The film was edited to 23 min 44 s with the main story line retained; 14 min 17 s (60%) of this version portray the theme of refusal of the organ donation. This shortened version of the movie focuses on the moral dilemma of the protagonist Anna, who must decide whether to donate one of her kidneys to her sister Kate, who is fatally ill with cancer. In the course of the movie, Anna refuses to donate and Kate dies. The reason for Anna's refusal to donate the kidney was not revealed to the subjects until after the experiment. The movie was shown to the subjects in the scanner four times in two separate scanning sessions on two different days. For each viewing of the movie, the instructions varied regarding the information about the sisters’ relationship and the perspective to take in that viewing (Fig. 6). Each subject thus watched the movie assuming either that the sisters were genetic sisters or that the younger sister Anna had been adopted as a newborn. In addition, each subject was asked to take either the perspective of the potential donor (Anna) or the perspective of the potential recipient (Kate) on separate viewings, each under both the genetic and the non-genetic assumption. The order of the viewing conditions was counterbalanced between the subjects.

Figure 6 Experimental procedure and ISC analysis in the movie watching task. (A) Every subject watched the movie four times, in a 2 × 2 design, assuming that the movie characters were either genetic sisters or not genetically related and taking the perspective of either the to-be-donor sister Anna or the to-be-recipient sister Kate. The order of the conditions was counterbalanced. (B) Time series from each voxel from all the fMRI recordings are compared across subjects in pairwise correlations to obtain the mean inter-subject correlation (ISC).

In the second experiment, each subject carried out a moral-dilemma decision task during fMRI in order to localize brain regions that are related to moral decision making. For this purpose, a modified version of the classical trolley dilemma35,53,54 was shown to the subjects. Each subject had to choose between rescuing different individuals, including unknown individuals, their sister and a best female friend. A presentation showing text and pictures told a story about civil unrest in a fictional distant country. This country had two parts: one very dangerous and the other much less dangerous. Different people were in both parts of the country. Subjects were also told that, as they were very rich and owned an airplane, they could go there and rescue some of the people. However, due to the circumstances in the country, they had to decide which group of people to rescue. The two choices were always a group of five individuals on one side and a single person on the other. In seven runs the identity of the involved individual(s) was varied using the real names of the subject’s sister and best female friend. The seven runs were: 1. All persons are unknown; 2. Sister is with four others in the dangerous part of the country, the single person is unknown; 3. Five persons are in the dangerous part of the country, the single person is the sister; 4. Five persons are in the dangerous part of the country, the single person is the friend; 5. Sister is with four others in the dangerous part of the country, the single person is the friend; 6. Friend is with four others in the dangerous part of the country, the single person is the sister; 7. Sister is with four others in the less dangerous part of the country, the single person is unknown. Responses in the moral dilemma decision task were recorded with a button press on a LUMItouch keypad (Photon Control Inc.8363, Canada).
For all questions, we calculated the percentage of decisions in which the sister was chosen over the friend and the stranger(s); statistical significance was tested with a χ² test.

fMRI acquisition

Before each scan the subjects were informed about the scanning procedures and asked to avoid bodily movements during the scans. All stimuli were presented to the subject with the Presentation software (Neurobehavioral Systems Inc., Albany, CA, USA), which synchronized the onset of the stimuli with the beginning of the functional scans. The movie was back-projected on a semitransparent screen using a data projector (PT-DZ8700/DZ110X Series, Panasonic, Osaka, Japan). The subjects viewed the screen at a distance of 33–35 cm via a mirror located above their eyes. The audio track of the movie was played to the subjects with a Sensimetrics S14 audio system (Sensimetrics Corporation, Malden, USA). The intensity of the auditory stimulation was individually adjusted to be loud enough to be heard over the scanner noise. The brain-imaging data were acquired with a 3T Siemens MAGNETOM Skyra (Siemens Healthcare, Erlangen, Germany) at the Advanced Magnetic Imaging center, Aalto University, using a standard 20-channel receiving head-neck coil. Anatomical images were acquired using a T1-weighted MPRAGE pulse sequence (TR 2530 ms, TE 3.3 ms, TI 1100 ms, flip angle 7°, 256 × 256 matrix, 176 sagittal slices, 1-mm³ resolution). Whole-brain functional data were acquired with a T2*-weighted EPI sequence sensitive to the BOLD contrast (TR 2000 ms, TE 30 ms, flip angle 90°, 64 × 64 matrix, 35 axial slices, slice thickness 4 mm, 3 × 3 mm in-plane resolution).

A total of 712 whole-brain EPI volumes were thus acquired for each movie viewing. The number of whole-brain EPI volumes for the moral dilemma decision task varied individually depending on the decisions made by each subject (median 267 whole-brain EPI volumes). Heart pulse and respiration were monitored with the Biopac system (Biopac Systems Inc., Isla Vista, California, USA) during fMRI. Instantaneous values of heart rate and breathing rate were estimated with the Drifter software package55 (http://becs.aalto.fi/en/research/bayes/drifter/).

fMRI preprocessing

Standard fMRI preprocessing steps were applied using the FSL software (www.fmrib.ox.ac.uk) and custom MATLAB code (available at https://version.aalto.fi/gitlab/BML/bramila/). Briefly, EPI images were corrected for head motion using MCFLIRT.

Then they were coregistered to the Montreal Neurological Institute 152 2 mm template in a two-step registration procedure using FLIRT: from EPI to the subject’s anatomical image after brain extraction (9 degrees of freedom) and from the anatomical image to the standard template (12 degrees of freedom). Further, spatial smoothing was applied with a Gaussian kernel of 6 mm full width at half maximum. A high-pass temporal filter with a cut-off frequency of 0.01 Hz was used to remove scanner drift. To further control for motion and physiological artefacts, BOLD time series were cleaned using 24 motion-related regressors and signals from deep white matter, ventricles and cerebrospinal fluid locations (see56 for details; the cerebrospinal fluid mask was taken from the SPM8 file csf.nii, and the white matter and ventricle masks from the Harvard-Oxford atlas included with FSL). As a measure of quality control we computed framewise displacement to quantify instantaneous head motion. Out of all the 120 runs (30 subjects, 4 sessions each), 97.5% of the runs (117 runs) had at least 640 of the 712 time points (90%) with framewise displacement under the 0.5 mm threshold suggested in57. For the remaining three runs, the numbers of time points under 0.5 mm were 639 (89.7%), 633 (88.9%) and 489 (68.7%), i.e. only one session had a considerable amount of head motion. Head motion is a concern in connectivity studies because it can spuriously increase correlations between BOLD time series that are affected by the same instantaneous head motion. For time series correlated across brains, however, head motion is expected to reduce the signal-to-noise ratio. Nevertheless, to make sure that head motion similarity did not explain any group difference, we applied the same permutation test used for the ISC to average framewise displacement, estimating the similarity of two subjects as the distance between their average framewise displacement values.
We found that similarity in average head motion was not different between the two viewing conditions (t-value = 0.255; p = 0.398 obtained with 5000 permutations).
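The quality-control step can be illustrated with a minimal sketch. It assumes the commonly used definition of framewise displacement (the sum of absolute frame-to-frame changes in the six realignment parameters, with rotations converted to millimetres on a sphere of 50 mm radius); the exact formula used in this study is not stated here, and the input layout, function names and toy values are hypothetical.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume.

    motion: one [tx, ty, tz, rx, ry, rz] entry per volume, with translations
    in mm and rotations in radians (assumed layout, not tied to any tool).
    radius_mm: assumed sphere radius used to convert rotations to mm.
    """
    fd = [0.0]  # the first volume has no predecessor
    for prev, cur in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        rot = sum(abs(c - p) * radius_mm for c, p in zip(cur[3:], prev[3:]))
        fd.append(trans + rot)
    return fd


def fraction_below(fd, threshold_mm=0.5):
    """Fraction of time points with displacement under the threshold."""
    return sum(value < threshold_mm for value in fd) / len(fd)


# Toy realignment parameters for four volumes (hypothetical values).
motion = [
    [0.0, 0.0, 0.0, 0.000, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.000, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.002, 0.0, 0.0],
    [0.9, 0.2, 0.0, 0.002, 0.0, 0.0],
]
fd = framewise_displacement(motion)
clean_fraction = fraction_below(fd)
```

A run would pass the criterion described above when `fraction_below` returns roughly 0.9 or more for its 712 volumes.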

Inter-subject correlation (ISC) analysis of brain activity during movie watching

To investigate how similar the brain activity was across subjects in the different experimental conditions, we performed inter-subject correlation (ISC) analysis using the isc-toolbox (https://www.nitrc.org/projects/isc-toolbox/)58. For each voxel the toolbox computes a similarity matrix between subject pairs and within the same subject in all conditions, with the conditions being: (i) shared assumption that the movie’s sisters are genetically related, (ii) shared assumption that the younger sister was adopted, (iii) shared perspective of the to-be-organ-donor, and (iv) shared perspective of the to-be-organ-recipient. The total size of the similarity matrix is then 120 × 120 (4 viewings × 30 subjects), with each subject having two viewings for the genetic and two viewings for the non-genetic condition. The comparison between the genetic and the non-genetic sisterhood conditions thus involves a total of 1740 pairs per condition, as the BOLD time series of the two viewings of each subject (in either the genetic or the non-genetic condition) are compared with the two respective viewings of the other N − 1 subjects. As the order of subjects does not matter, the final number of pairs within the same condition is 2 × 2 × N(N − 1)/2 = 1740 with N = 30. Each value of the correlation matrix is the correlation between the BOLD time series of the considered pair of subjects for the selected voxel. We computed differences between experimental conditions by first transforming the correlation values into z-scores with the Fisher Z transform and then computing t-values and corresponding p-values using a permutation-based approach59.
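The pair count and the per-voxel computation can be made concrete with a short sketch. It is a plain-Python illustration under simplifying assumptions (one voxel, one viewing per subject in the toy data), not the isc-toolbox implementation; all function names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def same_condition_pairs(n_subjects, viewings_per_subject=2):
    """Between-subject pairs of viewings within one condition."""
    subject_pairs = n_subjects * (n_subjects - 1) // 2
    return viewings_per_subject ** 2 * subject_pairs


def mean_isc(series_by_subject):
    """Mean pairwise correlation for one voxel, averaged in Fisher Z space."""
    z_values = [
        math.atanh(pearson(series_by_subject[a], series_by_subject[b]))
        for a, b in combinations(sorted(series_by_subject), 2)
    ]
    return math.tanh(sum(z_values) / len(z_values))


n_pairs = same_condition_pairs(30)  # 2 * 2 * 30 * 29 / 2

# Toy BOLD time series for one voxel in three subjects (hypothetical values).
voxel = {
    "s01": [0.1, 0.5, 0.3, 0.9, 0.4],
    "s02": [0.2, 0.4, 0.35, 0.8, 0.5],
    "s03": [0.0, 0.6, 0.2, 0.7, 0.6],
}
isc = mean_isc(voxel)
```

Averaging in Fisher Z space and transforming back is one common convention; the statistical comparison between conditions described above operates on the Z-transformed pairwise values themselves.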

The Fisher-Z transformed correlations of the two perspectives were pooled for either the genetic or the non-genetic sisterhood.

Correction for multiple comparisons was performed with the Benjamini-Hochberg false discovery rate (BH-FDR) procedure at q < 0.05, corresponding to a t-value threshold of 2.133. For visualization purposes, all results were also cluster corrected by removing any significant cluster smaller than 4 × 4 × 4 voxels. Summary tables were generated with an increased t-value threshold of 3. For the conjunction or “intersection–union test”64, the p-values of the ISC and GLM results were combined by taking the maximum p-value at each voxel. Multiple comparisons correction was then performed with the Benjamini-Hochberg false discovery rate procedure at q < 0.05.
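A minimal sketch of the two steps named above, the maximum-p conjunction and the Benjamini-Hochberg step-up procedure, is given below. The voxel-wise p-values are toy numbers and the function names are hypothetical; this is an illustration, not the code used for the reported maps.

```python
def conjunction_p(p_first, p_second):
    """Intersection-union conjunction: keep the larger p-value per voxel."""
    return [max(a, b) for a, b in zip(p_first, p_second)]


def bh_fdr_mask(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; True marks a significant test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            largest_rank = rank  # keep the largest rank passing its threshold
    significant = [False] * m
    for index in order[:largest_rank]:
        significant[index] = True
    return significant


# Toy p-values for five voxels (hypothetical values).
p_isc = [0.001, 0.20, 0.004, 0.03, 0.0005]
p_glm = [0.002, 0.01, 0.30, 0.04, 0.001]

p_conj = conjunction_p(p_isc, p_glm)
mask = bh_fdr_mask(p_conj, q=0.05)
```

In this toy case only the first and last voxels survive, because a voxel must have a small p-value in both analyses before the conjunction p-value can pass the FDR threshold.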

Unthresholded statistical parametric maps can be found in NeuroVault: http://neurovault.org/collections/WGSQZWPH/.

Perspective taking

In the movie-viewing experiment, in addition to having the subjects watch the movie under the assumption that the sisters were related by birth or by adoption, we asked them to take a perspective. There were altogether four runs: on two of the runs the subjects were asked to view the movie from the perspective of the sister who was expected to donate the organ, and on two of the runs from the perspective of the to-be-recipient sister. Thus, there was one run in which the subjects viewed the movie from the perspective of the to-be-donor thinking that the sisters were genetic, one run from the perspective of the to-be-donor thinking that the sisters were non-genetic, one run from the perspective of the to-be-recipient thinking that the sisters were genetic, and one run from the perspective of the to-be-recipient thinking that the sisters were non-genetic.

As the results of the perspective-taking manipulation address a separate aspect of the experiment, with various findings whose discussion would go beyond the scope and space limitations of this article, they will be reported elsewhere. The conditions are mentioned here in order to describe the experimental procedures thoroughly, so that others can replicate the study should they wish to do so.

General linear model analysis of the fMRI data acquired during the control task

A moral dilemma decision task was performed by all subjects to localize regions involved in moral processing. The task was analyzed with a general linear model approach using the SPM12 software (www.fil.ion.ucl.ac.uk/spm). To distinguish between moments of decision in the moral dilemma and the simple perception of the presentation, we created a temporal model of the occurrence of decision moments during the experiment. The decision regressor included the time points from the revelation of the identity of the involved individuals to the moment of decision indicated by button press. The activity during these time points was compared to the activity at all other time points of the task, including the telling of the background story of the moral dilemma in the presentation. Regressors were convolved with the canonical hemodynamic response function to account for hemodynamic lag. Low-frequency signal drifts were removed from the preprocessed input data (see above) by high-pass filtering (cutoff 128 s). First, individual contrast images were generated for the main effects of the regressors; these first-level results were then subjected to a second-level analysis in MATLAB, using a one-sample t-test over subjects to test which brain areas showed significant activation in decision vs. no-decision moments. The statistical threshold was set at p < 0.05 (cluster-corrected using the threshold-free cluster enhancement approach implemented in FSL randomise with 5000 permutations).
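How such a decision regressor can be built is sketched below. The sketch assumes a double-gamma response function with commonly used default shape parameters (peak around 6 s, undershoot around 16 s, ratio 1/6) sampled once per TR; it approximates, and is not, the SPM12 implementation, and the decision intervals and function names are hypothetical.

```python
import math

TR = 2.0  # repetition time in seconds, as in the acquisition described above


def double_gamma_hrf(t):
    """Assumed double-gamma response at time t (seconds)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def boxcar(n_volumes, intervals, tr=TR):
    """1 for volumes whose start time lies in an [onset, button press) interval."""
    box = [0.0] * n_volumes
    for onset, press in intervals:
        for volume in range(n_volumes):
            if onset <= volume * tr < press:
                box[volume] = 1.0
    return box


def convolve_with_hrf(box, tr=TR, hrf_length_s=32.0):
    """Causal discrete convolution of the boxcar with the sampled response."""
    kernel = [double_gamma_hrf(i * tr) for i in range(int(hrf_length_s / tr) + 1)]
    return [
        sum(box[n - k] * kernel[k] for k in range(len(kernel)) if n - k >= 0)
        for n in range(len(box))
    ]


# Hypothetical decision periods: identity revealed at 10 s and 40 s,
# button presses at 16 s and 44 s.
intervals = [(10.0, 16.0), (40.0, 44.0)]
box = boxcar(40, intervals)
regressor = convolve_with_hrf(box)
```

The convolved regressor rises a few seconds after each decision period begins and peaks after it ends, which is the hemodynamic lag the model accounts for.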

Recording of eye-movements

Eye movements were recorded during fMRI scanning from all subjects with an EyeLink 1000 eye tracker (SR Research, Mississauga, Ontario, Canada; sampling rate 1000 Hz, spatial accuracy better than 0.5°, with a 0.01° resolution in the pupil-tracking mode). Four subjects had to be excluded from the final data analysis because of technical problems (the inclusion criteria were that blinks accounted for at most 10% of the scan duration and that the majority of blinks and saccades lasted less than 1 second). In addition, parts of the recordings from some further subjects had to be discarded based on the same criteria, resulting in 61 recorded files of sufficient quality: 35 files in the genetic condition and 26 files in the non-genetic condition. Prior to the experiment the eye tracker was calibrated once with a nine-point calibration.

Saccade detection was performed using a velocity threshold of 30°/s and an acceleration threshold of 4000°/s². Because the experiment was relatively long and no intermediate drift correction was performed, we retrospectively corrected the mean effect of the drift. We first calculated the mean of all fixation locations over the entire experiment for each subject, and then rigidly shifted the fixation distributions so that the mean fixation location coincided with the grand mean fixation location over all subjects.
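The retrospective drift correction amounts to a rigid translation per subject, sketched below. The sketch assumes that the grand mean is the mean of the subject-wise mean fixation locations (pooling all fixations would be an alternative reading); the coordinates and names are hypothetical.

```python
def mean_point(points):
    """Mean (x, y) location of a list of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def correct_drift(fixations_by_subject):
    """Shift each subject's fixations so that their mean equals the grand mean."""
    subject_means = {s: mean_point(f) for s, f in fixations_by_subject.items()}
    grand_x, grand_y = mean_point(list(subject_means.values()))
    corrected = {}
    for subject, fixations in fixations_by_subject.items():
        dx = grand_x - subject_means[subject][0]
        dy = grand_y - subject_means[subject][1]
        corrected[subject] = [(x + dx, y + dy) for x, y in fixations]
    return corrected


# Toy fixation locations in screen pixels (hypothetical values).
fixations = {
    "s01": [(100.0, 100.0), (120.0, 110.0)],
    "s02": [(200.0, 190.0), (220.0, 210.0), (210.0, 200.0)],
}
corrected = correct_drift(fixations)
```

Because the shift is rigid, the spatial pattern of each subject's fixations is preserved; only a constant offset is removed.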

Eye-movement analysis

Subject-wise gaze fixation distributions were compared across the genetic vs. non-genetic conditions in the movie viewing task. Individual heat maps were generated by modelling each fixation with a Gaussian kernel with a standard deviation of 1 degree of visual angle and a radius of 3 standard deviations. The heat maps were generated in time windows of 2 seconds, corresponding to the TR used in the fMRI measurements. Spatial similarities between each pair of heat maps across the eye-tracking sessions were calculated using Pearson’s product-moment correlation coefficient (inter-subject correlation of eye gaze, eyeISC60). In the end, a similarity matrix was obtained with correlations between each pair for each of the 712 time windows.
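The heat-map construction and the spatial correlation for a single time window can be sketched as follows. The grid size, the pixels-per-degree conversion and the unweighted summation of fixations are assumptions made for the illustration; all names and values are hypothetical.

```python
import math


def fixation_heatmap(fixations, width, height, px_per_deg):
    """Sum of truncated Gaussians (SD 1 degree, radius 3 SD) around fixations."""
    sigma = 1.0 * px_per_deg
    radius = 3.0 * sigma
    heat = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(max(0, int(fy - radius)), min(height, int(fy + radius) + 1)):
            for x in range(max(0, int(fx - radius)), min(width, int(fx + radius) + 1)):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                if d2 <= radius ** 2:
                    heat[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return heat


def pearson_maps(map_a, map_b):
    """Pearson correlation between two heat maps of identical size."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Toy fixations within one 2-second window on a 40 x 30 grid,
# assuming 4 pixels per degree of visual angle (hypothetical values).
h1 = fixation_heatmap([(20, 15), (10, 10)], 40, 30, 4)
h2 = fixation_heatmap([(21, 15), (11, 9)], 40, 30, 4)
h3 = fixation_heatmap([(35, 25)], 40, 30, 4)

r_similar = pearson_maps(h1, h2)
r_different = pearson_maps(h1, h3)
```

Repeating the correlation for every pair of recordings and every time window yields the similarity matrix described above.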

First, the mean eyeISC scores over all 712 time windows were examined. These mean scores were obtained by averaging the Fisher Z-transformed correlation scores and then transforming the mean values back to the correlation scale before the statistical analysis. The statistical significance of the group differences was analysed by contrasting pairs in which both subjects assumed a genetic relationship with pairs in which both subjects assumed the younger sister to be adopted. Non-parametric permutation tests with a total of 100000 permutations were used to avoid making assumptions about the data distribution. In this procedure the data were shuffled randomly to change the groupings, and the differences between the resulting randomised groups were used to form an estimated null distribution. The final p-values were obtained as the proportion of permuted random partitions that yielded a more extreme group mean difference than the one observed with the original grouping.
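The logic of the permutation test can be sketched as follows. The sketch assumes that condition labels are shuffled across recordings and that the test statistic is the difference between the mean within-group similarities; the similarity values, the two-sided comparison and all names are assumptions for illustration.

```python
import random
from itertools import combinations


def within_group_mean(sim, members):
    """Mean similarity over all pairs of recordings within one group."""
    pairs = list(combinations(members, 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def group_difference(sim, labels):
    """Mean within-genetic similarity minus mean within-non-genetic similarity."""
    genetic = [i for i, label in enumerate(labels) if label == "genetic"]
    other = [i for i, label in enumerate(labels) if label != "genetic"]
    return within_group_mean(sim, genetic) - within_group_mean(sim, other)


def permutation_p(sim, labels, n_perm=100_000, seed=0):
    """Two-sided p-value from random relabelling of the recordings."""
    rng = random.Random(seed)
    observed = group_difference(sim, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(group_difference(sim, shuffled)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy similarity matrix for six recordings (hypothetical values): high
# similarity within the first three, lower within the last three.
labels = ["genetic"] * 3 + ["non-genetic"] * 3
sim = [[0.5] * 6 for _ in range(6)]
for i, j in combinations(range(6), 2):
    if labels[i] == labels[j]:
        sim[i][j] = sim[j][i] = 0.75 if labels[i] == "genetic" else 0.25

observed, p_value = permutation_p(sim, labels, n_perm=2000)
```

With only six recordings there are few distinct partitions, so the smallest attainable p-value is large; with 61 recordings, as in the study, the permutation distribution is far finer.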

Behavioral Measurements and Self-reports

Valence and Arousal measurements

The subjects self-reported the emotions they had experienced during movie viewing. This was carried out after the fMRI experiment by viewing the movie again (full procedures have been described in an earlier publication61). Two aspects of emotional experience were rated on separate runs: emotional valence (positive-negative scale) and arousal. While watching the movie in the middle of the screen, the subjects could move a small cursor up and down on a scale on the right side of the screen using the computer mouse to report their current state of valence or arousal. The ratings were collected with a web tool (https://version.aalto.fi/gitlab/eglerean/dynamicannotations)60 at a 5 Hz sampling rate.

Behavioral questionnaires

After the first fMRI session, the subjects were asked five short freeform questions about their perception of the movie, specifically about how easy it was to take one or the other perspective, and whether they would have donated their kidney had they been in the place of the movie protagonist. After the second fMRI session all subjects were debriefed by showing them the ending of the original movie, where it is revealed that the sick sister had wished for the healthy sister to refuse to donate her kidney. Afterwards they were asked whether seeing the real ending changed their opinion on the roles of the two movie protagonists.

As an additional self-report measure, the subjects’ disposition for catching emotions from others was assessed with two emotional empathy questionnaires: Hatfield’s Emotional Contagion Scale62 and the BIS/BAS scale63. Every subject also filled in a questionnaire quantifying their social network2, including their emotional closeness to their sister and best friend. The names of the sister and best friend were obtained from this questionnaire for the moral dilemma task.

Analysis of behavioral measurements

Valence and arousal measurements

To test whether dynamic valence and arousal differed between the genetic and non-genetic conditions, we first computed inter-subject similarity matrices using the valence and arousal rating time series. These were compared against a similarity matrix for the experimental condition of the viewing preceding the valence/arousal rating, i.e. the model tests for the case where individuals are more similar within the same condition (genetic or non-genetic) but dissimilar between conditions. Tests were performed using a Mantel test with 5000 permutations. We also tested whether subjects who rated arousal and valence for the genetic condition had a stronger group similarity than subjects who rated arousal and valence for the non-genetic condition. These tests were performed using permutation-based t-tests. As dynamic ratings can also differ at specific time points, we additionally performed a permutation-based t-test on valence and arousal values at each time point, corrected for multiple comparisons across time.
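A Mantel test of this kind can be sketched as follows: the upper triangles of the rating-similarity matrix and of a binary model matrix (1 for pairs in the same condition, 0 otherwise) are correlated, and the null distribution is built by jointly permuting the rows and columns of one matrix. The one-sided p-value, the toy similarity values and all names are assumptions for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel_test(similarity, model, n_perm=5000, seed=0):
    """Correlation between two matrices with a one-sided permutation p-value."""
    identity = list(range(len(similarity)))
    model_vec = upper_triangle(model, identity)
    observed = pearson(upper_triangle(similarity, identity), model_vec)
    rng = random.Random(seed)
    order = identity[:]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute subjects, i.e. rows and columns together
        if pearson(upper_triangle(similarity, order), model_vec) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy data for six subjects (hypothetical values): rating time series are
# more similar for subjects who rated after the same condition.
conditions = ["genetic"] * 3 + ["non-genetic"] * 3
n = len(conditions)
model = [[1.0 if conditions[i] == conditions[j] else 0.0 for j in range(n)] for i in range(n)]
similarity = [
    [(0.6 if conditions[i] == conditions[j] else 0.2) + 0.01 * ((i + j) % 3) for j in range(n)]
    for i in range(n)
]

r_observed, p_mantel = mantel_test(similarity, model, n_perm=2000)
```

Permuting rows and columns together preserves the dependence among matrix entries that share a subject, which is why an ordinary correlation test on the matrix entries would not be appropriate.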

Heart rate and breathing rate analysis

Differences between experimental conditions were computed in the same way as in the ISC analysis: correlation values were first transformed into z-scores with Fisher’s Z transform and then a permutation-based approach was used to compute t-values and corresponding p-values52. Correction for multiple comparisons was performed with the Benjamini-Hochberg false discovery rate (BH-FDR) procedure at q < 0.05, corresponding to a t-value threshold of 2.133.

Behavioral measurements with a new group of subjects

Subsequent to the fMRI experiments, a new group of 30 subjects (all female, right-handed and having a sister; 18–33 years, mean age 25.5 years) was recruited for two further behavioral measurements. The subjects first performed an implicit association test (IAT). The IAT measures attitudes and beliefs that might not be consciously recognized by the subject, or attitudes that the subject is unwilling to report. By asking the subjects to sort, as quickly as possible, positively and negatively connoted words into categories, the IAT can measure the reaction times of the association process between the categories and the evaluations (e.g., good, bad). Previous studies have shown that making a response is easier, and thus faster, if the category matches the implicit evaluation the subject bears in mind23. In this study the two categories were “genetic sister” (sisko) and “adopted sister” (adoptiosisko). The two categories were paired in different randomized runs with positive or negative words. The experiment thus comprised separate runs asking the subjects either to match positive words with the category “genetic sister” and negative words with the category “adopted sister” or, vice versa, to match positive words with the category “adopted sister” and negative words with the category “genetic sister”. The order in which the runs were presented was counterbalanced across subjects, and the categories switched their location between the left and right side of the screen equally often across runs. Subjects were asked to press a key with either the right or the left hand and thus assign the evaluation word to one category on either the left or right side of the computer screen. As the experiment progressed, the number of trials in this part of the IAT was increased in order to minimize the effects of practice.
The IAT score is based on how long it takes a person to sort the words in the condition associating positive words with genetic (and negative words with adopted) compared with the condition associating negative words with genetic (and positive words with adopted). If an implicit preference exists for one of the categories, subjects would be faster to match positive words to that category than the reverse. Data were analysed using MATLAB. Similarity between subjects’ scores was examined with TOST testing34. As a second task, reaction times for the moral decision task were measured with the same group of subjects that underwent the IAT. In contrast to the decision task performed during fMRI scanning, the order of the decisions was randomized (with easy decisions including only strangers and difficult decisions including the sister on one side and the friend on the other). Reaction times were measured as the time between the onset of the slide revealing the identity of the involved individuals and the button press with which the subject indicated her decision.

Data availability

The data that support the findings of this study are available on request from the corresponding author MBT. The data are not publicly available because of restrictions under Finnish law, which prevent public access to data recorded from human individuals, whether anonymized or not. As the consent given by the subjects applies only to the specific study reported in our manuscript, no portion of the collected data can be used or released for use by third parties.