Motivation behind this project
While I was a researcher at the Austrian Institute of Technology, I was intrigued by the idea that we now use our eyes to navigate our environment in very different ways than before. When looking for our way, we no longer need to rely on other pedestrians to guide us; instead, we need a smartphone and the know-how to use a map app. Our eyes, moving with us, perform two main tasks: guiding our route in real time to coordinate our body movements AND following the simulated route on our smartphone. The eyes thus have to shift between near and far all the time, guiding both our locomotion and our interactions on the smartphone. It seems like a crazy cognitive switch! I wanted to see how the interplay between the user, her phone and the environment changes depending on the user's spatial skills, and whether I could identify different types of navigators based on their eye behavior.
Overall, it was an exciting project: I learnt R, learnt how to use eye tracking, and read a lot of papers on the cognitive and visual aspects of smartphone use.
Abstract
We asked 22 young adults to perform a pedestrian wayfinding task with the Google Maps app while wearing eye-tracking glasses. Analysing the eye-tracking data, we developed a two-plane coding scheme and visualisation. The environment plane contains the time spans of the user's attention on the environment and the recorded events taking place within the user's visual field (whether the user directly attends to them or not). The smartphone plane contains the user's attention spans on the Google Maps app and the micro-actions (e.g. zooming) the user performs on it. With the goal of shedding light on the interplay between the user, the environment and the Google Maps app, we applied exploratory statistics to derive interaction styles and explored how attention allocation changes from the beginning to the end of the task and during events happening in the users' visual field. Finally, we investigated correlations with perceived workload and spatial ability.
Introduction
Since the beginning of the 2000s, we have been living amidst a digital world that is expanding and reinventing itself continuously and a physical world that has not undergone major changes. The pedestrian experience in a European city is made up of the same components - buildings, roads, traffic lights, signage - which have remained astoundingly similar in appearance over time. Yet pedestrians nowadays have a different experience when moving through a city, one strongly connected with the use of their smartphones. This multitasking experience is known to challenge pedestrians in cognitive, physical and locomotive ways. Engaging in smartphone use (e.g. texting or calling) while on the move has been shown to reduce walking speed, impair the ability to recall objects from the route, and make street crossing less safe for pedestrians.
According to Gärling, pedestrian navigation can be defined as the set of interactions between a wayfinder, a travel planning tool and the environment. Montello describes navigation as a process consisting of two components: wayfinding and locomotion. Wayfinding refers to spatial understanding and decision making, such as route planning and searching for cues, while locomotion refers to the coordinated movement of the body through the environment.
In this study, we examine the case of pedestrian navigation to an unknown destination with the Google Maps app as a function of attention (to the environment and to the smartphone) and the set of micro-interactions users perform on the phone during the route.
Method
We recruited 11 female and 11 male participants, aged between 21 and 34 years. They were university students from various backgrounds: IT, law, music, medicine, archaeology, etc. They were selected from a pool of potential participants after filling in a questionnaire examining their familiarity with Android smartphones and Google Maps, their physical mobility and vision, and their knowledge of the district of the study (they had to be unfamiliar with it). At the beginning of the session, participants were briefed about the study and signed the informed consent form.
After administering the two measures of spatial ability, the instructor calibrated the eye-tracking goggles (SMI-ETG) on the participant. The instructor then demonstrated the Android phone and its basic functions and allowed the participant to explore Google Maps on it for a few minutes. Next, participant and instructor walked down to the exit of the building, where the instructor announced the destination and selected one of the three routes that appeared on Google Maps. The route went through a wide industrial office area, a park, and a construction site. Participants were free to take their time checking the route on the phone before starting to move away from the lab.
The automatic screen lock was deactivated to avoid delays when participants wanted to check the application. Participants were told to walk to the final destination at the pace they normally would and to use the dot-on-the-map navigation mode instead of the turn-by-turn instructions. During the navigation, the instructor walked a few metres behind the participant, without actively communicating with him/her.
Eye tracking data coding scheme
The eye-tracking footage allowed us to observe everything that was in the visual field of the participants at any time: cars, pedestrians, traffic lights, and the smartphone screen.
The first annotation layer was binary, indicating gaze switches between the smartphone and the environment, and was produced with the video annotation software Advene [35]. The second annotation layer was created by two independent researchers who noted down their observations of significant events, each on a different set of 5 videos, and later combined their observations into a broader coding scheme. Some of the observed event categories were grouped under broader categories (e.g. moving people) and some were omitted due to low importance (e.g. the participant doing an unrelated activity like putting on gloves or blowing their nose), low frequency and/or high ambiguity (what does “abrupt loud sound” exactly mean?). All codes refer to time intervals; a sketch of how such interval data can be organised is given after the list below.
Micro-interactions: Actions the participant performs on the GM app in order to orient herself.
Zoom in: when the participant moves two fingers on the GM app making a gesture from ‘closed’ to ‘open’.
Zoom out: the opposite gesture of zoom in.
Tilting the smartphone: when the participant tilts the smartphone, the arrow showing the direction changes, so the participant can understand which direction to move towards.
Tilting the map: the participant tilts the map on the app with a hand gesture to align the direction of the map with the environment. This is interesting both as an orientation activity and as an interaction pattern.
Environment matching while moving (path1, path2, smartphone): a pattern where the participant checks the environment (e.g. at a crossroad) and the smartphone to orient herself while walking.
Environment matching while stopped (path1, path2, smartphone): similar to the above, but while standing still.
Events in the environment realm: events, such as people moving, traffic lights, or the participant crossing the street, which demand a minimum amount of attention to the environment.
Red traffic lights
Participant crossing the street.
Noticeable standing person(s): standing people along the way who get noticed by the participant.
Noticeable moving person(s).
Noticeable moving vehicle(s).
Mistake recognition and change of route: when participants recognise that they have made a mistake, they take time to process both environmental and app data to correct it; this process was either vocalised or directly observed.
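To make the output of this coding scheme concrete, here is a minimal sketch of how such interval annotations could be organised in R. The data frame layout, column names and example codes are hypothetical illustrations, not the exact export format of our annotation workflow.

```r
# Hypothetical layout of the coded data: one row per annotated time span,
# with the plane it belongs to and the code from the scheme above.
annotations <- data.frame(
  participant = c("P01", "P01", "P01", "P01"),
  plane       = c("smartphone", "smartphone", "environment", "environment"),
  code        = c("attention_on_app", "zoom_in", "red_traffic_light", "crossing_street"),
  start_sec   = c(12.4, 15.1, 40.0, 47.3),  # interval start, in seconds from task onset
  end_sec     = c(18.9, 15.8, 46.2, 53.0)   # interval end
)

# Duration of each coded interval
annotations$duration_sec <- annotations$end_sec - annotations$start_sec
```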
Eye-tracking Data Visualisation and Observations
Visual analytics play a crucial role in unveiling possibly hidden patterns in the data. Most eye-tracking visualisations address static stimuli (for example, a user looking at a map while wearing eye-tracking glasses) and cluster the gazes according to Areas of Interest (AOIs). Visual methodologies for the analysis of dynamic stimuli (e.g. a person wearing a mobile eye tracker while moving around, or watching a movie) are not extensively documented.
In our visual analysis, we do not focus on the actual navigation performance. We are not interested in how participants actually match the map to the environment to find the next turn, nor do we have insights into the cognitive processes taking place in the participant's mind. We are more interested in the users' attention allocation during the navigation task, the events that might divert their attention, and their interactions with their phones. This is one reason why our visual analytics method does not take AOIs into account. Another reason is technical: the users were allowed to take any path to reach the destination, so AOIs would not be consistent across users.
The output of the video annotation was a set of time intervals for the different events and actions. Data visualisation was the obvious next step to get an overview of the 'texture' of the data. The number of attention switches, events and micro-interactions on the smartphone in a single route did not allow the whole coding scheme to be represented in one timeline that would fit on a standard A4 page. Thus, we came up with the idea of separating the data into three different categories: the environment interactions plane, the mobile interactions plane and the in-between plane. In this way, one participant's route is depicted in three adjacent scarf plots. Each separate activity is color coded. The visualisation as well as the statistical analysis were performed with R in the RStudio environment.
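As an illustration of the scarf-plot idea (not our exact plotting code), the sketch below shows how such interval data could be drawn with ggplot2 in R, assuming the hypothetical annotations data frame from the earlier sketch, with one facet per plane and one colour per coded activity.

```r
library(ggplot2)

# One horizontal 'scarf' per plane; each coded interval becomes a colour-coded segment.
ggplot(annotations,
       aes(xmin = start_sec, xmax = end_sec, ymin = 0, ymax = 1, fill = code)) +
  geom_rect() +
  facet_grid(plane ~ .) +
  labs(x = "Time (s)", fill = "Activity") +
  theme_minimal() +
  theme(axis.text.y  = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank())
```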
Visualisations:
Full routes as it’s done
Diagram of rhythms of attention switches
Microinteractions profile
Attention profile
The footage of the eye-tracking device was annotated with the video annotation software Advene [25] in a binary manner: looking at the environment (“eyes up”) vs. looking at the smartphone (“eyes down”). All calculations of the variables were performed with the R statistical software. We computed the correlation matrix of all variables and fitted simple linear regression models [Table 1]. In our calculations, we included NASA-TLX as one unweighted overall score and each of its subscales as separate variables. The highest correlations between the NASA-TLX subscales and the independent variables were found for the mental demand subscale, which is why we considered it relevant as a separate measure.
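For readers who want to see what this exploratory step looks like in practice, here is a minimal sketch of the corresponding R calls. The variable names and values are hypothetical placeholders, not our actual dataset or models.

```r
# Hypothetical per-participant summary table
vars <- data.frame(
  attention_switches = c(52, 101, 74),
  dwell_time_screen  = c(0.28, 0.41, 0.30),  # relative dwell time on screen
  task_time_sec      = c(950, 1320, 1100),
  nasa_tlx           = c(35, 58, 42),         # unweighted overall score
  mental_demand      = c(40, 70, 50)          # NASA-TLX subscale
)

# Correlation matrix of all variables
round(cor(vars), 2)

# Example of one simple linear regression: mental demand as a function of dwell time
summary(lm(mental_demand ~ dwell_time_screen, data = vars))
```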
Attention Switches, Relative Dwell Time on Screen and Total Task Time
According to the GM estimate, the walking duration was 16 minutes, while participants took on average 19 minutes (Mdn = 1020.6 sec, SD = 278.4 sec).
Participants switched their attention between the smartphone and the environment 86.23 times on average (Mdn = 74.5, SD = 44.36). The mean relative dwell time on the screen was 32.7% (Mdn = 30%, SD = 12.1), which means that on average participants spent almost one third of the total task time looking at the screen. The results are comparable with [30], where attention to the environment takes up 46 to 80% of the time during web page loading when outdoors.
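As a closing note on how these two measures can be derived from the binary annotation, here is a small sketch in R; the gaze data frame is a toy example, assuming one row per uninterrupted gaze interval on either the screen or the environment.

```r
# Toy example of the binary gaze annotation: consecutive, non-overlapping intervals
gaze <- data.frame(
  target    = c("environment", "screen", "environment", "screen", "environment"),
  start_sec = c(0, 20.5, 27.0, 61.2, 66.8),
  end_sec   = c(20.5, 27.0, 61.2, 66.8, 90.0)
)

# Number of attention switches = number of target changes between consecutive intervals
attention_switches <- sum(gaze$target[-1] != gaze$target[-nrow(gaze)])

# Relative dwell time on screen = time on screen / total task time
durations      <- gaze$end_sec - gaze$start_sec
relative_dwell <- sum(durations[gaze$target == "screen"]) / sum(durations)

attention_switches  # 4 switches in this toy example
relative_dwell      # ~0.13, i.e. about 13% of the task time spent on the screen
```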