Participants

A total of 76 undergraduate students participated in the research, which involved having a varying number of one-on-one face-to-face conversations with other participants. There were 176 conversations overall, and as each conversation could produce two sets of gaze data (i.e., one set for each of the two participants in a conversation), there were hypothetically 352 sets of gaze data obtainable. This study only reports data from conversations in which an individual was accurately tracked with the Tobii Glasses (we explain how this was determined in the Procedure section). We acquired 107 sets of accurately tracked gaze data, collected from 49 participants (mean age = 32 years, SD = 14, range = 17–67). The reasons our final dataset was substantially smaller than what could have been acquired all relate to the fact that this was our first experience using the Tobii Glasses: (1) We did not screen out participants who needed corrective lenses, as we were recruiting participants for another research project running in parallel to the study reported here. Corrective lenses can be purchased for use with the Tobii Glasses, but this option was not available to us during data collection. (2) For some participants we did not compensate for a slight downward tilt of the Tobii Glasses camera, which meant that in some instances the visual recording captured only half the head of the person they were conversing with, so these instances had to be discarded. (3) We were conservative in our criteria for accepting the tracking as accurate. Prior to commencement, the research was approved by the Edith Cowan University Human Research Ethics Committee. All participants provided informed consent. This research was performed in accordance with the National Statement on Ethical Conduct in Human Research as outlined by the Australian Government National Health and Medical Research Council (NHMRC).

Equipment

To simultaneously record the eye gaze patterns of two participants we used two pairs of Tobii Pro Glasses 2 (50 Hz model: http://www.tobiipro.com/product-listing/tobii-pro-glasses-2/). The eye tracking glasses contain a high-definition camera and microphone embedded in the middle of the glasses for recording the audiovisual scene in front of the wearer (resolution of 1920 × 1080 pixels, at 25 frames per second). The audiovisual recording is stored on a recording unit that connects to the glasses. The wearer’s eye gaze behaviour is tracked and recorded via four sensors at a 50 Hz sampling rate. The manufacturer has reported a spatial accuracy of 0.63 degrees at a distance of 1.5 metres56. This manufacturer testing is conducted under ideal conditions, so it likely overestimates the spatial accuracy achieved in our study57. In our study participants were seated directly across from each other at a distance of 1 metre, and spent much of the conversation looking straight ahead. Compared to other mobile tracking studies in which participants move around an environment with large head movements17, our study arguably represents a context relatively close to the manufacturer’s testing benchmark. The glasses require dedicated Tobii software to operate, and analysis software to export the recordings as .mp4 files. We also used specialised behavioural coding software, Mangold INTERACT (https://www.mangold-international.com/en/software/interact), which allows synchronous playback and manual coding of two audiovisual files.
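As a rough guide to what an angular accuracy figure implies at conversational distance, an angular error can be converted into an approximate displacement on the partner's face using simple trigonometry. The short sketch below (Python; the function name is ours and only the distances and accuracy value stated above come from the text) applies this conversion to the manufacturer's 1.5 metre benchmark and to our 1 metre seating distance.

```python
import math

def angular_error_to_cm(error_deg: float, viewing_distance_m: float) -> float:
    """Convert an angular tracking error (degrees) into the approximate
    displacement (cm) on the target plane at a given viewing distance."""
    return math.tan(math.radians(error_deg)) * viewing_distance_m * 100

# Manufacturer benchmark: 0.63 degrees at 1.5 metres
print(round(angular_error_to_cm(0.63, 1.5), 2))  # ~1.65 cm on the target plane

# Our seating distance of 1 metre
print(round(angular_error_to_cm(0.63, 1.0), 2))  # ~1.10 cm on the partner's face
```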

Procedure

Participants were recruited to visit the ECU Cognition Research Group lab and engage in a short round robin of 4-minute getting-acquainted conversations. Participants were not instructed to speak about any specific topics; instead they were left to their own devices so that the conversations would be as natural as possible. The ideal was four people attending a session, with each person having three conversations. This was, however, difficult to achieve because participants sometimes did not show up or had to leave the session early. In addition, a handful of participants returned for a second or third session. We analyse eye tracking data from the 49 participants for whom we successfully and accurately recorded gaze patterns. These participants engaged in a varying number of conversations, ranging from one to seven.

During conversation, participants sat directly across from one another, 1 metre apart. To provide an assessment of the accuracy of gaze tracking, we used a makeshift test board made by taping a piece of paper over a clapper-board, with a 1, 2, and 3 drawn at the top-left, middle, and bottom-right of the board, respectively. The board was held up in front of the face of the conversational partner, and the participant was instructed to gaze at the 1, 2, and 3 as the numbers were spoken aloud by the experimenter (each digit followed by a pause of about 2 seconds). This was then repeated for the other participant (i.e., the board was held up in front of the face of their conversational partner and the numbers were again counted). For examples of accurate and inaccurate tracking, see section 1 of the online supplement document. After the accuracy check, the clapper-board was used to signal the start of the conversation by performing the clap in between the two participants. After each conversation, participants rated their perceptions of mutual eye contact via the statement ‘There was mutual eye contact between myself and my partner’, rated on a 6-point scale: Never, Very rarely, Rarely, Sometimes, Often, Very often.

Coding eye gaze fixations

To examine participant eye gaze patterns, we exported the Tobii glasses footage as .mp4 files using Tobii analysis software. These audiovisual files contain an eye tracking overlay in the form of a red circle that represents a person’s attentional focus at any point in time. The Tobii software allows the tracking circle to be exported at varying sizes. We chose a size of 15 pixels because it balanced the ability to distinguish the different coding regions of interest against comfort for the coder during playback; smaller sizes would require the coder to squint, causing discomfort over the many hours of coding. The Tobii analysis software provides an option for a fixation filter to be applied to the tracking data and overlay. The Tobii I-VT fixation filter classifies gaze into fixations where the tracking is stable within a small window, based on parameters of space, velocity, and time. In our research we used the default parameters set by the Tobii Analyzer software prior to exporting videos as .mp4 files. Specifically, the default settings were: gap fill-in (interpolation), max gap length = 75 ms; noise reduction, moving median with a window size of 3 samples; velocity calculator, window length = 20 ms; I-VT classifier, velocity threshold = 30°/s; merge adjacent fixations, max time between fixations = 75 ms and max angle between fixations = 0.5°; discard short fixations, minimum fixation duration = 60 ms. For a full explanation of these parameters see the Tobii Technology online White Paper58. The sensitivity of the filter means that it does not remove blinks.
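The I-VT filter itself is part of the Tobii software, but its core logic (velocity-threshold classification of samples, followed by merging of nearby fixations and pruning of very short ones) can be illustrated with a simplified, single-axis sketch using the default parameter values listed above. This sketch is for illustration only: it omits the gap fill-in, median noise reduction, and 20 ms velocity window of the actual filter, and all function and variable names are our own.

```python
import numpy as np

# Parameters mirroring the Tobii I-VT defaults listed above
SAMPLE_RATE_HZ = 50           # Tobii Pro Glasses 2 (50 Hz model)
VELOCITY_THRESHOLD = 30.0     # deg/s; below this a sample counts as fixation
MAX_MERGE_GAP_MS = 75         # merge adjacent fixations separated by <= 75 ms
MAX_MERGE_ANGLE_DEG = 0.5     # ... and by <= 0.5 degrees
MIN_FIXATION_MS = 60          # discard fixations shorter than 60 ms

def classify_fixations(angles_deg: np.ndarray) -> list:
    """Return (start, end) sample indices of fixations from a 1-D array of
    gaze directions expressed as visual angle (degrees) along a single axis.
    A simplified illustration of the I-VT idea, not the Tobii filter itself."""
    dt = 1.0 / SAMPLE_RATE_HZ
    # Sample-to-sample angular velocity (deg/s)
    velocity = np.abs(np.diff(angles_deg, prepend=angles_deg[0])) / dt
    is_fix = velocity < VELOCITY_THRESHOLD

    # Collect runs of consecutive below-threshold samples as candidate fixations
    fixations, start = [], None
    for i, flag in enumerate(is_fix):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            fixations.append((start, i - 1))
            start = None
    if start is not None:
        fixations.append((start, len(is_fix) - 1))

    # Merge adjacent fixations that are close in both time and angle
    merged = []
    for fix in fixations:
        if merged:
            prev = merged[-1]
            gap_ms = (fix[0] - prev[1]) * dt * 1000
            angle_gap = abs(angles_deg[fix[0]] - angles_deg[prev[1]])
            if gap_ms <= MAX_MERGE_GAP_MS and angle_gap <= MAX_MERGE_ANGLE_DEG:
                merged[-1] = (prev[0], fix[1])
                continue
        merged.append(fix)

    # Discard fixations shorter than the minimum duration
    min_samples = MIN_FIXATION_MS / 1000 * SAMPLE_RATE_HZ
    return [f for f in merged if (f[1] - f[0] + 1) >= min_samples]
```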

Pairs of videos were opened using the Mangold INTERACT software. The use of a clapper-board enabled the recordings to be played back in sync after adjusting start times appropriately. A participant’s eye gaze behaviour was manually coded using INTERACT according to the location being fixated at any point in time, for several on-face locations (forehead, eyes, nose, mouth, and other-face) and off-face locations (up, down, off-left, and off-right). ‘Other-face’ refers to the cheek and jaw areas of the face, essentially any spot not covered by the other locations. ‘Eyes’ included both eyes and the area between the eyes. When a participant blinked, the blink was included in the preceding code. For example, if a participant was gazing at an eye of their partner this would be coded as ‘eyes’, and if the participant blinked and then shifted to ‘off-left’, that blink would have been coded as part of the preceding ‘eyes’ code. Our methods also allowed us to quantify the extent and timing of mutual face gaze (i.e., both partners simultaneously looking at each other’s face) and eye contact (i.e., both partners simultaneously looking at each other’s eyes). In general, videos were played back frame by frame, but for some participants with less variable gaze movements the video could be played back faster. Coding of gaze behaviour took approximately 35 minutes per person within a conversation (ranging from about 20 to 60 minutes). Coding the 107 sets of gaze data therefore took roughly 62 hours.
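To illustrate how the mutual measures follow from the two coded streams once the recordings are aligned, the sketch below intersects two sets of coded gaze intervals. The example codes, timings, and helper function are invented for illustration; the actual quantification was carried out on the INTERACT coding output.

```python
# On-face codes as defined above; any overlap of on-face codes across the two
# participants counts as mutual face gaze, and overlap of 'eyes' codes as eye contact.
ON_FACE = {"forehead", "eyes", "nose", "mouth", "other-face"}

def overlap(intervals_a, intervals_b):
    """Return the intervals (start, end, in seconds) where two sets of coded
    intervals overlap in time."""
    out = []
    for a_start, a_end in intervals_a:
        for b_start, b_end in intervals_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                out.append((start, end))
    return out

# (start_s, end_s, code) tuples for each participant -- invented example data
coded_p1 = [(0.0, 2.4, "eyes"), (2.4, 3.1, "off-left"), (3.1, 6.0, "nose")]
coded_p2 = [(0.0, 1.8, "mouth"), (1.8, 4.5, "eyes"), (4.5, 6.0, "down")]

face_p1 = [(s, e) for s, e, c in coded_p1 if c in ON_FACE]
face_p2 = [(s, e) for s, e, c in coded_p2 if c in ON_FACE]
eyes_p1 = [(s, e) for s, e, c in coded_p1 if c == "eyes"]
eyes_p2 = [(s, e) for s, e, c in coded_p2 if c == "eyes"]

mutual_face_gaze = overlap(face_p1, face_p2)    # both looking at each other's face
mutual_eye_contact = overlap(eyes_p1, eyes_p2)  # both looking at each other's eyes

print(mutual_eye_contact)                                        # [(1.8, 2.4)]
print(round(sum(e - s for s, e in mutual_eye_contact), 2))       # 0.6 seconds in this example
```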

For each conversation, each participant was coded separately; however, the ability to watch simultaneous video playback from both participants using INTERACT was useful for coding purposes. Playing the two video recordings synchronously side by side allows identification of off-face gaze behaviour that is otherwise difficult to determine, because off-face glances can at times fall outside the tracking capability of the glasses. When this occurred, the dual video footage allowed the coder to determine where the large off-face glance was directed (e.g., off-left), as large sidelong glances were clearly visible in the footage recorded from the point of view of the conversational partner. Using the visualisation capability of INTERACT we can provide a detailed picture of an individual’s eye gaze pattern (see Fig. 2). All eye gaze pattern visualisations are provided in section 2 of the online supplement document.

Figure 2 A visualisation of eye gaze patterns. (a) Snapshots of participants D1 (left) and I1 (right) taken from the footage captured by the Tobii glasses. The small red circle represents attentional focus as captured by the Tobii glasses. Note that this snapshot of conversation represents an instance of mutual eye contact between the two participants. The participant images shown are used with permission granted by the participants. (b) Gaze heatmap overlay of fixation duration for participant I1 gazing at D1 (left), and D1 gazing at I1 (right), for the entire 4-minute conversation. Warmer colours (red/orange/yellow) indicate more gazing time, and cooler colours (green) indicate less gazing time. Using Tobii Pro Analyzer software, the head movement of participants was accounted for in the heatmap images. (c) Gaze pattern, and speaking/listening time, for participant D1 (looking at I1) for the first minute and a half of conversation. (d) Gaze pattern, and speaking/listening time, for participant I1 (looking at D1) for the first minute and a half of conversation. (e) On- and off-face gaze pattern, for the first minute and a half of conversation, for participant D1 (labelled ‘GAZE’) looking at I1, and for participant I1 (labelled ‘gaze2’) looking at D1. The two blue patterns at the bottom of the image represent periods of mutual face gaze and mutual eye contact, respectively. The full 4-minute representations of gaze patterns are provided in section 2 of the online supplement document.

We also coded all interactions for speaking and listening turns. Back-channel verbal responses such as “Mmm” and “Yeah” were not treated as a speaking turn and were instead coded as part of a listening turn. Speaking turns therefore specifically represent verbal acts of questioning or self-disclosure. Coding of speaking/listening turns took approximately 15 minutes per person (ranging from about 10 to 40 minutes). Coding the 107 sets of speaking/listening data therefore took roughly 27 hours. Inter-rater reliability was assessed by having a separate, independent coder code the full 4-minute conversation eye gaze patterns of participants D1 and I1 shown in Fig. 2. Cohen’s Kappa was calculated using the INTERACT software and was satisfactory for both participant D1 (Kappa = 0.68) and I1 (Kappa = 0.70) for eye fixation coding. This was also the case for D1 (Kappa = 0.78) and I1 (Kappa = 0.92) for speaking/listening coding. In this article we adhere to guidelines for interpreting Cohen’s Kappa and the intra-class correlation: poor (less than 0.40), fair (0.40–0.59), good (0.60–0.74), and excellent (0.75–1)59.
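Although Cohen’s Kappa was computed within INTERACT, the statistic itself is simply chance-corrected agreement, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance. A minimal sketch, assuming two equally long frame-by-frame label sequences (the example codes and the function name are invented):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' label sequences of equal length
    (e.g., frame-by-frame gaze codes)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: proportion of frames with identical codes
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each coder's marginal code frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Invented frame-level codes from two coders
coder1 = ["eyes", "eyes", "nose", "nose", "mouth", "off-left", "eyes", "eyes"]
coder2 = ["eyes", "eyes", "nose", "mouth", "mouth", "off-left", "eyes", "nose"]
print(round(cohens_kappa(coder1, coder2), 2))  # ~0.64 for this invented example
```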

Data availability

Please see the supplementary material associated with this article.