1. Introduction

Advances in imaging technology and natural user interfaces in the last decade have allowed the enhancement of public spaces with interactive installations that inform, educate, or entertain visitors using rich media content [1]. These digital applications are usually displayed on large surfaces placed indoors or outdoors, e.g., on walls, screens, or tables using projection or touch screen displays, and users can explore and interact with them in a natural, intuitive, and playful way. In many cases, these installations include 3D content to be explored, such as complicated geometric models or even large virtual environments. Most installations displaying 3D content are currently found in museums and cultural institutions [2], where the use of detailed geometry is necessary for the presentation of cultural heritage, e.g., by exhibiting virtual restorations of ancient settlements, buildings, or monuments. Other application areas of installations based on 3D content include interacting with works of art [3], exploring geographical [4] or historical [5] information, performing presentations [6], interacting with scientific visualizations [7], navigating in virtual city models for urban planning [8], etc.

The design of 3D interaction techniques for installations requires special attention in order to be intuitive and easy to perform by the general public. 3D environments introduce a different metaphor and extra degrees of freedom, and new users can easily get frustrated by repeated ineffective interactions [9]. The most fundamental, yet complicated, interaction technique for any type of environment is user navigation. 3D spaces naturally require frequent movements and viewpoint changes in order to browse the content from different angles, to uncover occluded parts of the scene, to travel to distant parts, and to interact with objects from close proximity. Navigation is a mentally demanding process for inexperienced users, because it involves continuous steering of the virtual body, as well as wayfinding abilities. As far as steering is concerned, a difficult challenge for designers is the meaningful translation of input device actions into respective movements in the 3D world [10]. In most public installations with 3D content, user navigation takes place from a first-person point of view and involves the virtual walkthrough of interior and exterior spaces.

Typical desktop or multitouch approaches are not the most appropriate means of navigating and interacting with 3D content in public installations. A common setup in such systems is to present the content on a usually large vertical surface and to let visitors interact in a standing position at some distance, being able to look at the whole screen. The use of a keyboard or mouse is not very helpful for a standing user, whilst touch or multitouch gestures cannot be performed if visitors are interacting from a distance. To overcome these issues, public installations have been using solutions ‘beyond the desktop,’ usually based on natural user interfaces. Initially, the interaction techniques involved handheld controllers, such as the WiiMote, or other custom devices, e.g., [11]. However, the use of handheld devices in public settings raises concerns about security and maintenance. More recently, users have been able to interact with 3D content in public settings using body gestures, without the need for any additional handheld or wearable device. Developers have taken advantage of low-cost vision and depth sensing technology and have created interactive applications, in which users can navigate or manipulate objects of a 3D scene using body movements and arm gestures in mid-air.

A variety of sensors have been used for mid-air interactions in public installations, the most popular one being Microsoft Kinect. Kinect can detect the body motion of up to four users in real time and translate it into respective actions. As such, it is appropriate for standing users navigating and interacting with a 3D scene from a distance, and has already been deployed in public museum environments, e.g., [12,13]. A secondary, less common option is the Leap Motion controller, which is considered faster and more accurate but is limited to hands-only interaction. The controller has to be close to the users’ hands and is therefore more appropriate for seated users, which is somewhat limiting for public installations. Numerous techniques for first-person navigation have been implemented using the Kinect sensor, such as leaning the body forwards or backwards to move [14,15], rotating the shoulders to change direction [14,16], walking in place [15,16], using hands to indicate navigation speed and direction [17,18], using both hands to steer an invisible bike [18], etc. Two comparative studies have also been set up to assess the effectiveness and usability of Kinect navigation techniques in field or laboratory settings [15,16], identifying preferences and drawbacks of the aforementioned techniques.
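
The lean-to-move and shoulder-rotation techniques mentioned above can be illustrated with a minimal sketch. This is not the implementation of any cited system: the joint layout (x, y, z positions in metres, with z the distance from the sensor), the dead-zone thresholds, and the speed limits are all illustrative assumptions.

```python
import math

LEAN_DEAD_ZONE = 0.05   # metres of lean ignored as postural sway (hypothetical)
LEAN_FULL = 0.25        # lean offset that maps to full speed (hypothetical)
MAX_SPEED = 1.5         # forward speed in m/s (hypothetical)
TURN_DEAD_ZONE = 10.0   # degrees of shoulder rotation ignored (hypothetical)
TURN_FULL = 45.0        # shoulder angle that maps to maximum yaw rate
MAX_TURN_RATE = 60.0    # yaw rate in deg/s (hypothetical)

def lean_speed(shoulder_center, hip_center):
    """Forward/backward speed from upper-body lean along the depth (z) axis.

    Leaning toward the sensor brings the shoulders closer than the hips,
    which we map to positive (forward) speed.
    """
    dz = hip_center[2] - shoulder_center[2]
    mag = abs(dz)
    if mag < LEAN_DEAD_ZONE:
        return 0.0
    s = min((mag - LEAN_DEAD_ZONE) / (LEAN_FULL - LEAN_DEAD_ZONE), 1.0)
    return math.copysign(s * MAX_SPEED, dz)

def turn_rate(left_shoulder, right_shoulder):
    """Yaw rate from shoulder rotation in the horizontal (x-z) plane."""
    yaw = math.degrees(math.atan2(left_shoulder[2] - right_shoulder[2],
                                  right_shoulder[0] - left_shoulder[0]))
    mag = abs(yaw)
    if mag < TURN_DEAD_ZONE:
        return 0.0
    s = min((mag - TURN_DEAD_ZONE) / (TURN_FULL - TURN_DEAD_ZONE), 1.0)
    return math.copysign(s * MAX_TURN_RATE, yaw)
```

The dead zones are the key design choice here: without them, natural body sway would cause the camera to drift continuously, which is a common source of frustration for novice users.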

Most evaluations of interactive 3D installations using Kinect have concluded that it is a motivating and playful approach but not without problems. Some people feel embarrassed to make awkward body postures or gestures in public [19]. Also, there are users who find the interactions tiring after a while because of the fatigue caused by some gestures, e.g., having to hold the arms up for a long time. Finally, the presence of other people near the installation may cause interference to the sensor, and therefore most of these installations require that an area near the user is kept clear of visitors. An alternative to mid-air interactions for navigating in public installations that has recently been proposed is the use of a mobile device as a controller [20]. Most people carry a modern mobile device (smartphone or tablet) with them, with satisfactory processing and graphics capabilities and equipped with various sensors. Following the recent trend of “bring your own device” (BYOD) in museums and public institutions [21], where visitors use their own devices to access public services offered by the place, one could easily use their device for interacting with a public installation. For example, using the public WiFi, one could download and run a dedicated app or visit a page that turns their device into a navigation controller. The use of mobile devices as controllers has already been tested in other settings, e.g., games and virtual environments, with quite promising results [22,24]. This alternative may have some possible advantages compared to mid-air interactions. It can be more customizable, it can lead to more personalized experiences by tracking and remembering individual users, and it could also deliver custom content on their devices, e.g., a kind of ‘reward’ for completing a challenge.

The aims of this work are to examine whether a mobile device used as a controller can be a reliable solution for first-person 3D navigation in public installations, and to determine the main design features of such a controller. We carried out two successive studies for this purpose.

In the first study, we sought to explore whether a smartphone controller can perform at least as well as Kinect-based navigation, which is the most common approach today. We set up a comparative study between mid-air bodily interactions using Kinect and tilt-based interactions using a smartphone in two environments and respective scenarios: A small museum interior, in which the user has to closely observe the exhibits, and a large scene with buildings, rooms, and corridors, in which the user has to effectively navigate to selected targets. The interaction techniques used in this study have been selected and adapted based on the results of previous research, i.e., we used a mid-air interaction involving the leaning and rotation of the upper body, which generated the highest outcomes in [15] and was also one of the prevalent methods in [16], and a technique based on the tilting and rotation of the handheld device, which was also found to be usable in [7,26]. A testbed environment developed for the study automatically measured the time taken to complete each scenario, the path travelled, the number of collisions, and the total collision time. Furthermore, subjective ratings and comments for each interaction technique were collected from the users through questionnaires and follow-up discussions. The results of the first study indicated that the smartphone performed at least as well as Kinect in terms of usability and performance, and it was the preferred interaction method for most of the participants.
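
A tilt-based controller of the kind used in the study can be sketched as a mapping from the device's pitch and roll angles to forward speed and turn rate. This is a minimal illustration, not the study's implementation; the dead-zone threshold, full-tilt angle, and speed limits are hypothetical values chosen for the example.

```python
import math

DEAD_ZONE_DEG = 5.0   # ignore small unintentional tilts (hypothetical)
FULL_TILT_DEG = 30.0  # tilt angle that maps to maximum output (hypothetical)
MAX_SPEED = 2.0       # forward speed in m/s at full pitch (hypothetical)
MAX_TURN_RATE = 90.0  # yaw rate in deg/s at full roll (hypothetical)

def _scaled(angle_deg):
    """Map a tilt angle to [-1, 1], with a dead zone around zero."""
    magnitude = abs(angle_deg)
    if magnitude < DEAD_ZONE_DEG:
        return 0.0
    s = (magnitude - DEAD_ZONE_DEG) / (FULL_TILT_DEG - DEAD_ZONE_DEG)
    return math.copysign(min(s, 1.0), angle_deg)

def tilt_to_motion(pitch_deg, roll_deg):
    """Translate device pitch/roll into (forward speed, turn rate).

    Tilting the device forward/backward moves the viewpoint,
    tilting it sideways rotates the camera.
    """
    forward = _scaled(pitch_deg) * MAX_SPEED
    turn = _scaled(roll_deg) * MAX_TURN_RATE
    return forward, turn
```

In a deployed controller the resulting (speed, turn rate) pair would be sent over the network (e.g., via the public WiFi mentioned above) to the installation, which applies it to the first-person camera each frame.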

Following the encouraging results of the first study, we aimed to look in more depth at the interaction techniques to be used for the design of a mobile controller. For this purpose, we set up a gesture elicitation study to collect preferred gestures from users and improve the guessability of the designed interactions [27]. We had our participants propose their own gestures for a series of navigation actions: Walking forward and backwards, rotating to the left or right, looking up or down, and walking sideways. They were free to select between (multi-)touch actions, rotating or moving the whole device, or a combination of them, and they could propose any visual interface on the device. Whenever they proposed a gesture, we tested it in the museum environment of the first study using a Wizard of Oz technique and had our users reflect on it. The results of the study led to interesting observations regarding the preferred gestures of users and the different ways in which users mapped mobile actions to 3D movements in the projected environment.

We present the results of our studies and a discussion about their implications for the design of novel interaction techniques for virtual reality applications presented on public displays.