Seeing the World Through Digital Prisms: Psychological Implications of Passthrough Video Usage in Mixed Reality
Volume 5, Issue 2. DOI: 10.1037/tmb0000129
by Jeremy N. Bailenson, Brian Beams, James Brown, Cyan DeVeaux, Eugy Han, Anna C. M. Queiroz, Rabindra Ratan, Monique Santoso, Tara Srirangarajan, Yujie Tao, and Portia Wang
Published on Jun 24, 2024
Abstract
Millions of people will soon be spending hours each day relying on cameras and screens to show them the surrounding world. Apple, Meta, and other companies are mass-producing headsets that block out light from the real world and instead rely on passthrough video as an enabling technology for mixed reality. The 11 authors of this article each spent a number of hours wearing these headsets in public and in private, with the goal of documenting experiences in passthrough to then organize and review previous research that will help research scholars, industry leaders, and other organizations better understand psychological consequences over time. First, we describe why passthrough will become an essential component of the media landscape. Next, we summarize the technological specifications that make new passthrough headsets stand out from previous ones, but still have lower fidelity compared to human vision on parameters such as field of view, distortion, latency, and resolution. Next, we review relevant previous psychological research. We conclude that the passthrough experience can inspire awe and lend itself to many applications but will also likely cause visual aftereffects, lapses in judgments of distance, induce simulator sickness, and interfere with social connection. We recommend caution and restraint for companies lobbying for daily use of these headsets and urge scholars to rigorously and longitudinally study this phenomenon.
Acknowledgments: The authors thank Sun Joo Ahn, Jakki Bailey, Gerd Bruder, Max Foxman, David Jeong, Jeffrey Hancock, Mark Roman Miller, Philip Rosedale, Gregory Welch, Gordon Wetzstein, and Andrea Stevenson Won for incredibly helpful feedback on an earlier draft of this article.
Funding: This work is not funded.
Disclosures: The authors declare no conflicts of interest.
Author Contributions: Authorship order after the first author is alphabetical.
Data Availability: There are no quantitative data to have used prior or to share.
Open Access License: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND). This license permits copying and redistributing the work in any medium or format for noncommercial use provided the original authors and source are credited and a link to the license is included in attribution. No derivative works are permitted under this license.
Correspondence concerning this article should be addressed to Jeremy N. Bailenson, Department of Communication, Stanford University, Stanford, CA 94305, United States. Email:[email protected]
Mixed reality (MR) headsets create immersive experiences designed to spatially integrate virtual content into the physical world (see Milgram & Kishino, 1994, for a discussion of the spectrum of virtuality, and Lanier, 2017; Rauschnabel et al., 2022; Skarbez et al., 2021, for more recent discussions of terminology). The newest headsets rely on passthrough video. While using passthrough, a person does not see light from the real world but instead relies on stereoscopic, color, high resolution, low latency, real-time video of the world, which is displayed on small screens inside a headset. There have been thousands of studies in psychology, communication, and human–computer interaction that study human behavior in MR, but that research has tended to focus specifically on the virtual content. In this article, we focus on the use of passthrough video itself, as opposed to the augmented virtual content (see Rolland et al., 1995, for an early example of psychological experimentation on passthrough). Simply put, there is a dearth of research focusing on passthrough as a medium one uses to perform everyday activities while viewing and navigating the real world. Because the newest headsets are light, the cameras and screens are high quality, and the overall system latency is low, passthrough can now be easily used for hours at a time, indoors and outdoors. This change in the temporal nature of passthrough usage necessitates careful consideration of this technology’s implications.
Before writing this article, the 11 authors had ample research, work, and demonstration experience across many passthrough headsets, including the Apple Vision Pro, the Quest Pro, the Quest 3, the Varjo XR-3, and various night vision goggles. As a group (eight female, three male, identifying sometimes in multiple categories as six Asian or Asian American, two Black or African, one Hispanic or Latino/a, one Native Hawaiian and Pacific Islander, five White or European), we designed an institutional review board-approved protocol to develop a systematic and shared understanding of this technology. We each spent approximately 140 min over two or three sessions in passthrough using the Meta Quest 3, resulting in 30 total sessions. Some activities were performed by most participants, such as estimating the distance between themselves and another person in the room, but for the most part, participants chose from a menu of suggested activities, including walking outdoors, playing games, engaging in a conversation, eating, or cooking (Figure 1). In order to ensure safety, there was always a chaperone present who was not wearing a headset.
Our goal was to leverage the expertise of this group in order to explore the trials and tribulations of this technology. Because we have access to a wide variety of up-to-date hardware, understand how to implement protocols that ensure physical and psychological safety, and have the technical expertise to quickly find the edge cases in which technology fails, we are a unique group to explore this medium. We draw on these informal field notes (which are not quantitatively proven scientific evidence) throughout the article as we organize and review past literature that relates to the psychological implications of experiencing passthrough via headsets for hours each day.
We first describe why passthrough will become an essential part of the media landscape as the basic architecture of future MR. Next, we summarize the technological specifications that, on the one hand, make the Apple Vision Pro and Meta Quest 3 headsets stand out from past implementations but, on the other hand, fall short of typical human vision. Next, we review previous research that has explored passthrough and similar technologies. Finally, we make recommendations to scholars for areas of future research, to consumers for considering downsides of this technology, and to technology companies for adjusting their user guidelines.
An Inflection Point for Passthrough
Technology companies in Silicon Valley are investing heavily in MR. Leading the push is Meta, whose products are used by almost half of the global population (Dixon, 2023) and Apple, the first company to hit a market cap of 3 trillion dollars and whose products are also used by billions (Leswing, 2023). Meta has sold approximately 20 million MR headsets (Lang, 2023) and has invested around 50 billion dollars on MR (Hoium, 2023). Apple is set to release its first headset in February 2024. Both companies are positioning MR as a medium for entertainment, communication, and work, eventually replacing phones and computers. For example, Apple states on its Vision Pro webpage that the headset “is designed for all-day use” (Apple, 2023), and when announcing the release date, Tim Cook posted on social media that “the era of spatial computing has arrived” (Cook, 2024). Meta literally changed their company name from Facebook to celebrate this product and is following a similar strategy for daily use, partnering with Microsoft to offer hours-long work and productivity-related experiences in MR (Sutrich, 2023).
While the companies have different visions (Apple is focusing more on seated computing done in headset, while Meta has focused on active gaming and fantastical virtual worlds), there is little doubt they are competing to be the leader of MR (Egliston & Carter, 2022). Whether or not MR becomes ubiquitous or a commercial success remains to be seen, but there is little doubt that millions of people will be involved in this real-time experiment. To date, as shown in Figure 2, two canonical types of MR displays have emerged: video see-through display and optical see-through display.
With most previous MR headsets, users could see digital content juxtaposed onto the real world. Those devices used optical see-through technology, where content was projected onto a transparent lens that allowed them to see light from the physical world. In other words, it was as though they were wearing normal reading glasses, but a small portion of the lens showed digital content. But passthrough is fundamentally different, as the left panel in Figure 2 demonstrates. A person wearing a modern passthrough MR headset must instead rely on stereoscopic, color, high resolution, low latency, real-time video of the world.
Up until 2023, optical see-through displays have dominated the commercial augmented reality (AR) and MR market, featuring products such as Hololens 1 (2016), Hololens 2 (2019), Google Glass 2 (2019), Magic Leap 1 (2018), and Magic Leap 2 (2022). However, optical see-through displays suffer from several limitations that prevent them from being usable for everyday tasks. First of all, they tend to have small fields of view for digital content (Doughty et al., 2022), akin to holding a piece of printer paper horizontally at arm’s length and viewing the virtual layer of the world only through that small window. Hence, the interactive experiences are constrained and unnatural. Second, optical see-through displays generate digital images by integrating rendered light with ambient light from the real world, resulting in altered color perception. Notably, white appears more luminous, while dark colors seem transparent (Kruijff et al., 2010; Microsoft, 2022), posing particular challenges to scenes that include people with dark skin (Peck et al., 2022). Third, the real world is much richer (more colorful, detailed, and spatially nuanced) than virtual content. Hence, it is difficult to seamlessly combine the real and virtual content with optical see-through.
To solve these technological challenges, many technology companies have pivoted away from optical see-through in favor of passthrough. This strategy has been successful. For example, the passthrough capability of Meta Quest 3 (Oliver, 2023) has a much larger field of view than current commercial optical see-through headsets. Similarly, passthrough improves the integration of digital and real-world content, both of which are digital in these systems and can be combined at the pixel level (Li et al., 2022; Zheng et al., 2014). As an example of an experience that would not be possible with previous optical see-through systems, passthrough MR headsets can delete large areas of the real world (as opposed to augment it, a concept explicated by Cheng et al., 2022). One of the most popular games on the Meta Quest 3 encourages users to slowly dismantle the walls in their physical room by using a gun that replaces passthrough video data with renderings of an “outdoor” virtual scene beyond the walls, one blast at a time (Meta, 2023a; Figure 3).
Technological Features of Passthrough
Headsets that utilize passthrough do their best to replicate the sights from the real world, but of course, none are equal to actual human vision. In this section, we discuss various features one can use to contrast passthrough with actual vision and use two headsets as examples—the Meta Quest 3 and a monocular night vision system used by the military. In general, no single headset can maximize every parameter due to cost, weight, and optical physics, and we have chosen these two as a way for readers to understand the necessary trade-offs among the technological features in passthrough systems.
Many readers have heard the phrase “tunnel vision,” which means seeing only a small portion of the surrounding world. Field of view is the term used to quantify the observable world an individual can see when using a head-mounted display without head motion. As Figure 4 demonstrates, both headsets have narrower fields of view than natural vision. Reducing the field of view can produce negative psychological outcomes, for example, impeding spatial understanding of a scene (Masnadi et al., 2021) or decreasing how present people feel in an experience (see Cummings & Bailenson, 2016, for a review of early research). Seeing the world through a clipped window can be challenging.
Sometimes headsets intentionally sacrifice one dimension in order to maximize another (Warburton et al., 2023). Night vision goggles used by the military often reduce field of view in order to provide images to the eye as fast as possible. Latency is typically operationalized as the amount of time required for a digital display to update given a user’s head motion. When latency is high, the world seems a step behind. Under optimal conditions, the passthrough on the Quest 3 has a latency of about 12 ms, and the night vision goggles are a few milliseconds less than that. These values are just above what is noticeable to the human eye (Ellis et al., 2004; Ng et al., 2012).
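One way to make the phrase "a step behind" concrete is to convert latency into the angle the head turns before the display catches up. The sketch below is only an illustration of that arithmetic: the function name and the head rotation speeds are hypothetical values chosen for the example, and the 12 ms figure is simply the latency quoted in the paragraph above.

```python
def angular_lag_deg(head_speed_deg_per_s: float, latency_ms: float) -> float:
    """Angle (in degrees) the head rotates during one latency interval.

    Assumes a constant head rotation speed over the interval, which is a
    simplification made for illustration only.
    """
    return head_speed_deg_per_s * latency_ms / 1000.0


# Hypothetical head rotation speeds (degrees per second), not taken from the article.
example_head_speeds = [30.0, 100.0, 300.0]

for speed in example_head_speeds:
    lag = angular_lag_deg(speed, latency_ms=12.0)
    print(f"head speed {speed:5.0f} deg/s -> image lags by {lag:.2f} deg")
```

Under these assumptions, the lag grows linearly with head speed, which is consistent with distortion being more noticeable during fast head turns than when standing still.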
Similarly, headsets fall short of vision on resolution. While, of course, real-world objects are not made of pixels, pixels can be used to measure physical distance (Jeong et al., 2021), and vision scientists use pixels per degree (PPD) to measure visual acuity. A person with 20/20 vision has a value of 60 at the fovea (Kalloniatis & Luu, 2007; Tan et al., 2018). The night vision goggles were designed to maximize resolution, with a value just under 50, while the limitations of the cameras on Meta Quest 3 force it to be much lower, about 18 PPD. Other headsets, such as the Apple Vision Pro, have chosen to increase PPD at the expense of other features such as field of view, given that many of the applications featured involve reading small text.
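The PPD values above can be related to one another with simple arithmetic. The following sketch rests on the standard convention that 20/20 acuity corresponds to resolving detail of about 1 arcminute, which gives the value of 60 PPD. The function names are hypothetical, the display dimensions in the last example are invented for illustration, and dividing pixels by field of view gives only an average because real lenses do not spread pixels uniformly.

```python
ARCMIN_PER_DEGREE = 60.0


def ppd_from_acuity(min_resolvable_arcmin: float) -> float:
    """Pixels per degree needed so that one pixel spans the smallest resolvable detail."""
    return ARCMIN_PER_DEGREE / min_resolvable_arcmin


def average_ppd(pixels_across: float, fov_deg: float) -> float:
    """Average pixels per degree, assuming pixels are spread evenly across the field of view."""
    return pixels_across / fov_deg


def fraction_of_foveal_acuity(ppd: float, reference_ppd: float = 60.0) -> float:
    """Ratio of a device's PPD to the 20/20 foveal reference."""
    return ppd / reference_ppd


# 20/20 vision: about 1 arcminute of resolvable detail.
print(ppd_from_acuity(1.0))

# Values quoted in the text: about 18 PPD for Quest 3 passthrough,
# just under 50 PPD for the night vision goggles (50 used as an upper bound).
print(round(fraction_of_foveal_acuity(18.0), 2))
print(round(fraction_of_foveal_acuity(50.0), 2))

# Hypothetical display: 2000 pixels across a 100 degree field of view.
print(average_ppd(2000.0, 100.0))
```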
But one of the most critical issues with passthrough headsets is distortion. Anyone who has spent time in a museum’s hall of mirrors that make people appear taller, thinner, or curvier understands this concept. Passthrough distortion occurs for a number of reasons, including the curvature of the small screens inside the headsets, the algorithmic process of integrating multiple camera streams, and dynamic adjustments of lighting and focus. When distortion occurs, straight edges appear curved, and the distance between objects appears compressed or expanded. Because wearing passthrough technology involves seeing the world through a small number of cameras, there is often a discrepancy between the location of a user’s real eyes and the location of the camera display (Rolland et al., 1995). Of course, there are algorithms that minimize distortion (e.g., Chaurasia et al., 2020; Xiao et al., 2022). However, like all dimensions discussed in this section, there are trade-offs. In order to maintain a low latency, the distortion correction must be quick and efficient, and hence imperfect (Figure 5).
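The eye-to-camera discrepancy described above can be illustrated with basic geometry. The sketch below assumes a camera displaced straight forward from the eye and video shown without any corrective reprojection, which is a deliberate simplification and not a description of how any particular headset works. The function names, the 5 cm offset, and the object sizes are hypothetical.

```python
import math


def angular_size_deg(object_size_m: float, distance_m: float) -> float:
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2.0 * math.atan(object_size_m / (2.0 * distance_m)))


def apparent_magnification(object_size_m: float,
                           eye_distance_m: float,
                           camera_forward_offset_m: float) -> float:
    """Ratio of the angle seen by the camera to the angle the eye would see.

    Assumes the camera sits camera_forward_offset_m in front of the eye
    and that the video is displayed uncorrected (an illustrative assumption).
    """
    camera_distance_m = eye_distance_m - camera_forward_offset_m
    if camera_distance_m <= 0:
        raise ValueError("object must be in front of the camera")
    return (angular_size_deg(object_size_m, camera_distance_m)
            / angular_size_deg(object_size_m, eye_distance_m))


# Hypothetical numbers: a 3 cm object and a camera 5 cm in front of the eye.
near = apparent_magnification(0.03, eye_distance_m=0.15, camera_forward_offset_m=0.05)
far = apparent_magnification(0.03, eye_distance_m=2.0, camera_forward_offset_m=0.05)
print(f"near object magnification: {near:.2f}")
print(f"far object magnification:  {far:.2f}")
```

In this simplified model the same offset barely matters for distant objects but inflates objects close to the face, which is the kind of trade-off that correction algorithms have to handle quickly.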
In summary, while the technology improves with every new headset and software update, passthrough headsets fall far short of the human visual system: they are slower, grainier, and distorted, and cut off a large chunk of one’s field of view. In the following section, we discuss parallels between the rich history of perceptual studies using prism glasses and critical considerations related to passthrough.
Summary of Research That Directly Tests Passthrough
There have been many technical implementations of passthrough over the past 3 decades and a few dozen user studies, most of which tend to be exploratory in scope with small samples. The Appendix summarizes the bulk of these studies. Our search procedure consisted of completing keyword searches in research databases such as Google Scholar and the Association for Computing Machinery Digital Library. Search terms consisted of a combination of words associated with passthrough, including “passthrough,” “video see-through,” “VST,” “video passthrough,” “augmented reality,” and “mixed-reality.” As seen in the Appendix table, the studies largely focus on human factors configurations, perceptual judgments such as distance estimation, and judgments of realism of objects and self-avatars. In this section, we selectively focus on aspects of previous work that were salient based on our field notes from using passthrough.
Distortion
In our field notes, we consistently experienced video distortion, although it was rare when standing perfectly still. But when rotating our heads or moving our bodies, stationary objects regularly appeared to move, sometimes stretching by about fifteen percent of their actual size. Walls appeared inflated or deflated. Objects placed very close to a user’s face (for example, a fork coming toward one’s mouth) became particularly oversized. When objects moved or people were passing by on a bicycle, they would sometimes disappear and seem to teleport from one location to the next. This was particularly taxing when participants performed activities that required concentration, such as drawing, due to warping of lines, edges, shapes, and sizes. We also often experienced distortion in colors and lighting. Sometimes colors seemed more muted, less vivid, and less saturated, with a lower contrast, while at other times colors appeared more distinct and vibrant than usual. These dynamic changes in color often resulted from head movements. Abrupt changes in lighting sometimes carried over to distortion of objects; for example, flipping a room light on and off could cause a box to appear to stretch in size.
Previous work has documented how these artifacts hinder the user’s ability to compute spatial information such as path trajectories, depth, speed, and accuracy in passthrough environments (Lee & Park, 2020). Park et al. (2008) compared users’ hand–eye coordination over a range of fifteen passthrough camera positions that varied in depth and height displacements (i.e., the cameras outside the headset were intentionally displaced). Subjects performed four different tasks, such as tracing lines on a touch screen and screwing wingnuts on an assembly board, while using video passthrough that varied in terms of the camera’s height and depth displacement from the user’s natural eye position. Results showed that when the cameras were higher than the natural eye position, task performance suffered. However, mismatches in depth had the opposite effect; by expanding the natural distance between the eyes by 35 mm and inducing an exaggerated stereo, participants were able to improve motor task performance. Distortion has also been shown to produce nausea, oculomotor discomfort, and disorientation (Moss et al., 2011). Moreover, this distortion from passthrough can impact the perception of one’s own body by inducing the feeling that certain body parts are mislocated (Lee et al., 2013).
The distortion from passthrough causes the user to adapt to the system (i.e., sensory rearrangement; Biocca & Rolland, 1998; Rolland et al., 1995). This adaptation also leads to aftereffects, which carry over into the real world. In an early study, Biocca and Rolland (1998) explored whether people adapted to visual displacement in passthrough and what the aftereffects were after removing their headsets. Participants completed multiple trials of a pointing accuracy task and a pegboard task, both with and without their assigned headset type, and repeated certain tasks before and after wearing their headsets. Results indicated that performing these hand–eye coordination tasks took 43% longer when wearing the passthrough headset compared to the control condition. Moreover, motor aftereffects emerged afterward, such that participants continued to overshoot finger positions in a pointing task after they had taken the headsets off.
Distance Estimation
In our field notes, people struggled to accurately judge distances, especially during spatial tasks such as catching a ball or placing pieces into a jigsaw puzzle. These effects were more pronounced when trying to understand the position of moving people, such as navigating through crowds. Eating was particularly difficult, given how food near one’s face appeared larger and closer in passthrough than ground truth in the world, with centimeters mattering when navigating a fork to the mouth. Similarly, a number of us struggled to hit buttons in elevators, another context in which small distances make a difference. A common adaptation strategy was to inspect objects at an unnaturally close distance or to use more force to touch objects than one would normally (i.e., pressing the elevator button harder than intended). Chaperones reported that people tended to move tentatively and slowly while walking.
These anecdotal findings resonate with past research, as one of the most robust psychological findings in the history of AR and virtual reality (VR) headsets is distance underestimation, meaning people perceive objects as closer than they actually are (Loomis & Philbeck, 2008). Possible causes include restricted field of view, weight on the head, imperfect depth cues, and rendering quality (see Creem-Regehr et al., 2023; Kelly, 2023, for recent reviews).
Errors in distance judgment also occur with passthrough. In a virtual distance estimation task (Gagnon et al., 2020), underestimation increased with distance. Similarly, Vaziri et al. (2017) found that people tend to underestimate distances in passthrough compared to when they were viewing the physical world without any cameras. Pfeil et al. (2021) had participants either wear no headset, a passthrough headset, or a stripped-down headset that emulated the reduced field of view of a passthrough headset. Participants engaged in a blind-throwing task in which they first tossed bean bags at targets located 3, 4, and 5 m away, then repeated the task with their eyes closed. Results showed that participants underestimated distance in passthrough and were less accurate when the target distance increased.
Simulator Sickness
Our field notes showed that a majority of passthrough sessions caused simulator sickness symptoms, ranging from symptoms of eye strain to nausea, dizziness, and headache. In general, the 11 authors spend a lot of time each week in various MR headsets. For over half of us, who typically do not easily succumb to simulator sickness, to do so is quite notable, especially given that individual sessions were typically less than an hour.
One of the most accepted theories of simulator sickness in head-mounted displays is the sensory conflict theory (Reason & Brand, 1975), under which scholars argue that users may experience sensations of nausea, dizziness, stomach awareness, head fullness, and sweating as a result of mismatches between the visual system, vestibular system, and nonvestibular proprioceptors. There are several factors that contribute to simulator sickness related to the user, such as age and gender; related to the experience, such as the locomotion type and duration of the content; and related to the system, such as field of view, latency, and resolution (Saredakis et al., 2020). Simulator sickness changes with experience (for a review, see Adhanom et al., 2022); short exposure to the same VR application on two separate days can reduce simulator sickness by 35%–40% over time (Palmisano & Constable, 2022; Risi & Palmisano, 2019), with continued reduction over several subsequent exposures (Howarth & Hodder, 2008).
Scholars have attempted to understand and address simulator sickness in passthrough, for example by compensating for latency through novel reprojection techniques (Freiwald et al., 2018), using a fisheye lens to expand peripheral view while allowing for an undistorted central field of view (Orlosky et al., 2014), and evaluating how people adapt to simulator sickness over time (Kim et al., 2014).
In sum, as seen in the Appendix, researchers have explored how various features of passthrough impact people’s perception and cognition, largely focusing on low-level, perceptual, and motor outcomes such as reaching, pointing, and throwing. In the next section, we review other areas of research that are peripherally related to passthrough.
Summary of Related Research That Informs Passthrough
Social Presence
In the context of MR, scholars often study social presence, which was initially conceptualized as the experience of emotional and psychological closeness between people during mediated communication (Short et al., 1976), or more recently as the level of perceptual salience of other social actors as “real” (for a review, see Cummings & Wertz, 2023; Oh et al., 2018). Many scholars have studied how technological features of media technology have impacted social presence (Bailenson et al., 2001; Biocca, 1997; Biocca et al., 2003; Lee, 2004; Moser et al., 2020).
But presence in MR can trade off with feelings of social connection to people physically colocated with those wearing the headsets, which we describe as social absence. Miller et al. (2019) published an article showing how using optical see-through via the Microsoft Hololens impacted social interaction. Dyads who had not previously met interacted face-to-face, with one of them wearing the Hololens, an optical see-through headset. We intended to study the “glasshole” effect (Due, 2015), in which people have negative reactions to others wearing headsets in public. But in that study, instead of negative backlash toward headset users, we found something unexpected: across multiple dependent variables, participants within each dyad who wore the AR headset during the social interaction reported feeling significantly less connected to their partners than participants who were not wearing the headset.
In our field notes, social absence was common—people in the real world simply felt less real. Especially for strangers, people appeared distant and blended into the background. Moreover, the limited field of view literally caused people around us to be invisible, which is disconcerting when out in public as we are used to seeing people when they are in our periphery. Being in public could sometimes feel more like watching TV than interacting face-to-face. It was often embarrassing to interact with strangers while wearing a headset.
Moreover, because a user’s eyes are not visible while wearing the Meta Quest 3 headset, others in the room have less reason to look at a user’s face. This lack of eye gaze can be especially disconcerting when a user is speaking. Some of us developed compensation strategies by placing a greater emphasis on verbal communication and nonverbal cues such as nodding (similar to findings in VR by Moustafa & Steed, 2018). Note that both Meta and Apple have proposed technological solutions by allowing the eyes to be visible through the headset (e.g., Matsuda et al., 2021).
Longitudinal Headset Research
The explicit goal of Meta and Apple is to have people using passthrough headsets for hours per session on a daily basis. In the studies listed in the Appendix, not a single participant wore a passthrough headset for even 1 hr during experimental sessions. The psychological impacts of wearing passthrough for months are unquestionably different than wearing it for minutes. While there is a lack of research focusing on passthrough longitudinally, some scholars have examined behavior over time in other implementations of MR. This work underscores that observations gathered from a single or few sessions are not representative of real media use. Once people adapt to new systems and are no longer uncomfortable with the novelty of the technology, scholars can gain a more accurate picture of how the system shapes people’s behaviors and attitudes (see Han et al., 2023, for a review).
Starting in the 1990s, Thad Starner, a professor at Georgia Tech, designed and wore an AR headset on a daily basis for over a decade. He used it to check the internet and to take notes during face-to-face conversations. Starner used AR as a research tool, but it also became a tool he relied on in his daily life (Stevens, 2013). Steve Mann, a professor at the University of Toronto, similarly augmented his vision with computing for decades and discovered some of the hazards of AR. He often received negative backlash and even was assaulted by people who tried to rip the headset off his head (Buchanan, 2013). These hazards were compounded by the physiological aftereffects that occurred when he was forced to remove the headset. For example, after one occasion where his device was forcibly removed at an airport, he fell down twice and ended up needing a wheelchair. Studying early adopters in MR allows scholars to gain insights into future use at scale by the general public (e.g., Foxman, 2018). And some early adopters are already becoming “superusers”; a 2023 survey of 5,600 U.S. teens revealed that 4% of headset-owning teens use VR every day (Sandler, 2023).
A handful of scholars have engaged in similar research strategies in VR by placing themselves in headsets over time, though due to the constraints on running experiments, the time range tends to be days, not years. Steinicke and Bruder (2014) conducted a 24-hr VR session with breaks, noting simulator sickness and presence did not diminish over time, with extensive movement contributing to sickness. In addition, Nordahl et al. (2019) exposed two participants to 12 hr of VR use, finding inconsistent simulator sickness patterns but a notable spike after 7 hr. However, in a review of the literature, Dużmańska et al. (2018) found that the persistence of symptoms after leaving VR varied from 10 min to 4 hr. Researchers also found that visual fatigue symptoms, objective pupil size, and relative accommodation responses varied over time when participants were exposed to VR for 8 hr (Guo et al., 2020).
Longitudinal studies have also explored the effects of VR use over time on depth perception, body offsets, and social interactions. Kohm, Babu, et al. (2022) conducted a study of VR use over the course of 12 weeks on depth perception and demonstrated adaptation. Results showed that the underestimation of distances diminished with increasing time spent in VR experiences. In another study by Kohm, Porter, and Robb (2022), participants became more effective at object manipulation using proprioception rather than just visual perception over 4 weeks. Additionally, Bailenson and Yee (2006) found that there were increased task performance abilities and decreased simulator sickness symptoms over 10 weeks. In a recent study, Han et al. (2023) demonstrated that over the course of 8 weeks of VR use, group cohesion, presence, enjoyment, and realism increased significantly, and the data measured during the first week were not representative of the final pattern that emerged over time.
To our knowledge, only one study has explored passthrough in a longitudinal manner by allowing participants to sometimes toggle from full VR to passthrough. Biener et al. (2022) aimed to quantify the effects of working in a VR environment for 5 days, 8 hr every day, compared to working in a physical desktop condition. Subjects reported more simulator sickness, negative affect, and frustration when working in VR compared to a desktop computer. They also showed adaptation; participants adjusted to the shortcomings of the headset over time. The consequences of that adaptation remain to be studied—how does passthrough impact people’s perceptual and motor systems, both while wearing passthrough for hours per day and, just as importantly, after they take the headsets off?
Visual Adaptation and Aftereffects
Methodologically, wearing special glasses that alter vision for days at a time is not a new research technique. In 1897, George Stratton spent 87 hr wearing special glasses that literally turned his visual world upside-down. For 8 days, he was either sleeping, blindfolded to keep all light out from the real world, or wearing the inversion glasses (Stratton, 1897). Despite the initially disruptive effects of wearing the glasses, within a few days, his visual system adjusted to the distortion and developed a new perceptual “normal,” such that the world looked upright again. He also noticed that this adaptation process was quickly reversible, though removing the glasses resulted in a short period of readjustment.
Consider the more recent work by Fernández-Ruiz and Díaz (1999). Healthy subjects were instructed to throw clay balls at a small target while wearing prism glasses that induced varying degrees of optical shift. Results revealed that the process of adaptation (i.e., learning how to accurately throw the ball at the target while wearing glasses that made everything look shifted to one side) depended on the number of actual interactions between the visual and motor systems (i.e., the number of times throwing the balls).
But the brain must then readjust once the adapting stimulus is removed. For example, looking at a very bright light often results in small spots appearing in the visual field. Temporary visual aftereffects have been documented in a variety of contexts (Anstis et al., 1998). Imagine wearing prism glasses that make everything look shifted to the side. At first, when trying to reach for something, you might miss it because your brain is used to your eyes and hands working together in a certain way for your entire life. However, if you keep wearing the glasses, your brain starts figuring out how to make your hands move just right so you can grab things accurately, even though everything looks a bit off. When you take off the glasses, the brain still wants to move your hands as if everything is shifted, resulting in reaching in the wrong direction for a short time.
Prior findings have shown that the motor aftereffect tends to be short-lived and that its magnitude depends on the degree of optical shift induced by the glasses. The strength of the aftereffect is typically around three quarters of the amount of the original shift induced by prism glasses (Facchin et al., 2019). Moreover, neuroscientific evidence has demonstrated that such visuomotor changes can lead to functional reorganization in the brain, such that cells in the visual cortex that normally only respond to one side of the visual field start to respond to the other side of the visual field too, following left–right inverted prism adaptation (Miyauchi et al., 2004; Sugita, 1996). Overall, experimental work spanning decades has converged on the finding that prism adaptation unfolds at a significantly slower pace compared to the process of returning to the native state and that the magnitude of the sensorimotor mismatch dictates the dynamics of this timecourse (Efstathiou, 1969; Wähnert & Gerhards, 2022).
Investigations of the adaptive capabilities of the visual system have extended beyond the use of prism glasses. Haak et al. (2014) conducted a study in which subjects wore an altered reality system that removed almost all vertical visual input for 4 days, such that any vertically oriented information appeared in much lower contrast than horizontal input. Since weak visual input causes neurons to increase their sensitivity to the deprived orientation specifically, contrast adaptation was measured by two tasks in which subjects either had to match the contrast of two patterns or adjust the orientations of two patterns to achieve a desired third pattern. Results showed an unexpected decline in adaptation strength after the initial increase, challenging traditional notions that adaptation typically strengthens or is maintained over time (Haak et al., 2014). Other research has shown longer lasting aftereffects. For example, it is possible to balance out the strength of both eyes in patients with amblyopia (or “lazy eye”) through the daily use of an altered reality system, with significant improvement of vision persisting even months after the training intervention (Bao et al., 2018).
Similarly, a study by Bao and Engel (2012) investigated the visual aftereffects of long-term contrast adaptation. In this study, researchers examined the strength and duration of contrast adaptation using a head-mounted display system that eliminated most vertical visual information for 1, 4, or 8 hr. Depriving subjects of vertical information produced a positive tilt aftereffect, indicating that the component gratings of the test pattern appeared to be tilted toward vertical to the individual, and adaptation and aftereffects both increased with longer exposure. When vision was degraded with lower contrast than the natural environment, people became more sensitive to contrasts. These findings are particularly relevant given our field notes on degraded contrast in passthrough.
A study by Pesudovs and Brennan (1993) explored how people with myopia experienced a decrease in uncorrected vision after two 90-min sessions of wearing spectacles while focusing on objects at a specific distance. Their findings suggested a sensory adaptation to blur and a complex interplay between visual acuity and refractive error. The intricate dynamics of sensory adaptation and plasticity in the human visual system underscore the importance of understanding both short-term and extended adaptive processes.
Technology companies are not intending for passthrough to introduce visual distortions in the way prism glasses do. However, given all the discrepancies in technological optical features discussed above, there will likely be consequences from temporary adjustments in spatial awareness and hand–eye coordination. In the context of passthrough video in mixed reality, the adaptation and readaptation processes could similarly be dynamic in nature as users alternate between wearing and removing the headset and are eventually able to more easily switch between visual contexts (for a related study using a traditional prism glass approach, see Welch et al., 1993). Scholars need to understand how these perceptual adaptations while wearing passthrough, as well as the resulting aftereffects, will impact walking, talking, gesturing, driving, socializing, and just about every other behavior that involves seeing the world and moving through it.
Looking Forward
Passthrough video will be the norm for MR headsets over the next few years. Whether this technology becomes pervasive or proves to be just a flash in the pan, similar to three-dimensional television, remains to be seen. Passthrough will enable a number of useful MR experiences and will allow fully immersed virtual reality users to quickly check in with the real world without having to remove their headsets. Researchers have also made compelling arguments for clinical use. For example, passthrough should be more effective than corrective eyewear, given that the entire depth range of the real world is displayed on the same focal plane, especially for presbyopes who struggle to focus on nearby objects (IS&T Electronic Imaging Symposium, 2017).
On the other hand, we urge caution to the companies pioneering this industry and invite researchers to examine topics that will help guide development and use of the technology in safe and responsible ways (e.g., Slater et al., 2020). Previous research on prism glasses and long-term headset use suggests that there will be consequences to using passthrough as an everyday medium, and simply put, there has not been any direct research on this topic. Scholars should focus not only on how passthrough changes affect, cognition, and behavior during use, but also on its aftereffects.
For example, there are likely to be developmental issues. According to a 2021 survey, 17% of children between the ages of 8 and 18 own a VR headset (Reed & Joseff, 2022), and about one in 25 child users are donning headsets every day (Sandler, 2023). While scholars have previously examined developmental issues surrounding MR headset use in children (Bailey & Bailenson, 2017; Pimentel & Kalyanaraman, 2022), to our knowledge, there is no research on children’s use of passthrough. Meta has recently reduced their minimum age requirements to 10 years old, down from 13.
Apple is explicitly advertising that people can use their headsets “all day long,” and their safety guidelines will remain unknown until the official release of the headset. Meta has clear and useful health and safety guidelines, but they are designed for problems that might occur during use. In other words, they offer specific strategies to avoid collisions and manage simulator sickness but do not offer insights regarding long-term passthrough usage. Even the safety guidelines encourage extended use, urging users to start with 30-min sessions but then to “increase the amount of time using your Meta Quest gradually as you grow accustomed to the experience.” Although Meta acknowledges the distortions in color and space perception in passthrough mode (Meta, 2023b), it is unclear how a consumer should take action on this advice.
One constructive suggestion is to create guidelines for the amount of time people use passthrough each day and to create schedules that incorporate breaks, take context and location into account, and put other guardrails in place. Given that these strategies have failed epically with smartphones, we are not optimistic. If Apple and Meta create fantastic MR content that utilizes passthrough, people will most likely use it often.
A more modest suggestion is to provide thorough training and onboarding (see Chauvergne et al., 2023, for a recent review of existing MR protocols). Currently, users only undergo training with the Meta Quest 3 before using it for the first time. More detailed, repeated training would be helpful, especially if designed for people intending to use MR daily. For example, soldiers spend dozens of hours learning how to use night vision goggles before putting them to use in the field, not just one time when they first use the goggles (Fitzgerald, 1996). This training also needs to be refreshed each year and adjusted to desired tasks (e.g., simple movement and communication vs. large-scale coordinated operations). Similarly, neurologists and other clinicians who employ prism adaptation tasks as part of their therapeutic practice have specific protocols for how best to minimize motor aftereffects, such as ensuring that patients can see their entire movement from start to finish as they engage in tasks designed to manage visuomotor adaptation (Redding & Wallace, 1996).
We are confident that tech companies will continue to mitigate many of the technical problems raised in this article, such as distortion, narrow field of view, and increased latency. For example, the Apple Vision Pro, expected to reach consumers in February 2024, has improved upon some of the problematic features we discussed in relation to the Meta Quest 3. But it is going to take time, and even then, no headset will be perfect. In the meantime, millions of people will walk around, cut off from light from the real world, instead seeing family, friends, cars, pets, and sharp knives through imperfect video.
While physical safety is undoubtedly critical, scholars must also focus on social absence, the phenomenon of MR passthrough users feeling socially disconnected from physically copresent others. Based on past research as well as our field notes, one should not assume that the social presence of other people beamed in via passthrough is equivalent to face-to-face interaction. Reduced social presence has potentially concerning consequences, such as invoking distrust or causing people to become “non-people,” to paraphrase the words of Goffman, who asserted that even in the unmediated real world, not all people are perceived as equally present (Goffman, 1959). Researchers should examine these issues carefully but also note that running longitudinal studies in VR requires a particular expertise and a thorough understanding of the technology’s perceptual nuances before experimentation.
This article raises many questions but does not offer many answers. We do provide field notes from our own experiences, but our observations are based on a small sample of passthrough headsets and participants, namely, experts in VR research who possess a unique ability to find bugs and perceptual inconsistencies, and likely do not generalize. Moreover, in this article, we focused on specific areas of previous research inspired by the field notes, but future work should conduct a formal systematic review of passthrough video research.
We believe that passthrough adoption for near-term MR use is very likely, but of course, there is always uncertainty in predicting the future. It may be difficult to imagine the world portrayed by the movie Ready Player One, where everyone emulates George Stratton, Thad Starner, and Steve Mann, wearing headsets all day long in their public and private lives. Few people can even fathom a norm in which face-to-face interaction becomes largely mediated by passthrough headsets. But the largest technology companies are telling us, very transparently, that they are building this world. We should listen to them.
Appendix: Summary of Prior Experimental Research on Passthrough
This appendix presents a thorough sampling of past studies that examine passthrough as a system (i.e., its fundamental properties) and how it can be manipulated (e.g., through applying filters or manipulating perception), ordered by research focus. Beyond system performance, these works include user studies and experiments that evaluate behavioral and cognitive responses. Our search procedure consisted of completing keyword searches in research databases such as Google Scholar and the ACM Digital Library. Search terms consisted of a combination of words associated with passthrough, including “passthrough,” “video see-through,” “VST,” “video passthrough,” “augmented reality,” and “mixed-reality.”
| Reference | Research focus | Outcome(s) | Task | Main finding |
| --- | --- | --- | --- | --- |
| Adams et al. (2022) | AR display type | Depth perception | Estimating distance of virtual target | Distance judgments were underestimated more when using passthrough than optical see-through, and adding a virtual shadow increased accuracy. |
| Ahn et al. (2019) | AR display type | Size perception | Scale-matching task | Object size estimation was more accurate when using passthrough than using optical see-through or handheld, mobile AR displays. |
| Ballestin et al. (2018) | AR display type | Depth perception | Precision-reaching task | Depth estimation was more accurate, and eye strain was less intense, when using optical see-through than passthrough. |
| Debernardis et al. (2014) | AR display type | Text readability | Text identification task | Readability was quicker when using optical see-through than passthrough. |
| Freiwald et al. (2018) | AR display type | Latency | Move hand at different speeds and move object from one physical place to a virtual place | Simulator sickness was reduced by compensating for latency discrepancy and reducing registration error between virtual and physical world images. |
| Gattullo et al. (2015) | AR display type | Text readability | Counting target letters and rating visibility of text blocks | In high background illuminances, readability performance was better when using passthrough than optical see-through. |
| Juan and Calatrava (2011) | AR display type | Presence | Placing hand on table and viewing cockroaches and spiders walk over it | Presence was greater when using passthrough than optical see-through. |
| Marques et al. (2020) | AR display type | Assembly task performance | Assembly task using LEGO bricks | Task completion time was quicker, and cybersickness was greater, when using passthrough and controllers than a mobile device with touch gestures or movement as input. |
| Medeiros et al. (2016) | AR display type | Depth perception | Reaching task and depth drawing task | Depth perception was more accurate, task performance was quicker, and immersion was higher using passthrough than optical see-through. |
| Wilmott et al. (2022) | AR display type | Jitter perception | Report which interval contained the object that jittered after watching two presentations of virtual content | Jitter perceptibility increased as viewing distance increased and decreased as background luminance increased (i.e., more detectable at dim, compared to brighter background luminance). |
| Feuchtner and Müller (2017) | Body representation | Body ownership | Interacting with virtual and physical objects using an altered hand representation | Body ownership over a virtual arm stretched more than twice a real arm’s actual length was experienced in passthrough. |
| Rosa et al. (2019) | Body representation | Body ownership | Watching a virtual knife motion toward a virtual hand and watching the knife and virtual hand disappear | Body ownership and agency over a virtual hand was experienced in both the single (one virtual, one real hand visible) and supernumerary (one virtual and both real hands visible) conditions. |
| Rudolph et al. (2023) | Body representation | Body ownership | Interacting with virtual objects with a virtual arm prosthesis | Body ownership over a virtual bionic prosthesis that replaced an arm was experienced. |
| Gruen et al. (2020) | Latency measurement | Latency | Rapid response task similar to the Eriksen flanker task | Estimates of a system’s visual latency from an inferred method (via reaction time with and without a headset) and a measured method (via accurate sensors) were comparable. |
| Ehrsson (2007) | Perspective | Perceptual illusion | Viewing the perspective of a camera sitting behind them and experiencing correlated visual and tactile information | The illusion of an out-of-body experience was induced through a combination of indirect visual information and correlated tactile and visual feedback on the body. |
| Kawasaki et al. (2010) | Perspective | Skill transmission | Drawing target motions while viewing their own or their partner’s first-person perspective | View sharing helped improve velocity following. |
| Nishida et al. (2019) | Perspective | Social behavior | Social task involving handshakes | Personal distance was greater, hands were raised higher, and childlike behavior was more frequent when wearing a device that altered eyesight to waist level. |
| Ueyama and Harada (2022) | Perspective | Task performance | Dart throwing | Task performance was poorer after practicing in passthrough from a first-person perspective and unaffected after practicing in passthrough from a third-person perspective. |
| Abbey et al. (2021) | Reality | Presence | Quickly selecting one of two boxes with touch based on visual stimuli | When there were no breaks in presence, presence scores were lower in VR than in passthrough with virtual elements. |
| Blissing et al. (2019) | Reality | Driving task performance | Driving tasks at low speed | Driving in passthrough was more difficult than driving in VR. |
| Cheng et al. (2022) | Reality | Attitudes, qualitative observations | Think aloud, block construction, and video tasks while applying different diminished reality filters to various scenarios and environments | Acceptability of diminished reality filters depends on the likelihood of physical interferences from the diminished elements, their interaction requirements and behaviors, and the level of social presence. |
| Maruhn et al. (2020) | Reality | Crossing acceptance, cross initiation time | Crossing a street after a first car passes and before a second car passes | Although there were lower acceptance rates and later crossing initiation when using passthrough than in the real world, results were similar enough to demonstrate the potential of AR for pedestrian research. |
| Pfeil et al. (2021) | Reality | Depth perception | Blind throwing task | Distance judgments were more underestimated when using passthrough than without a headset. |
| Gagnon et al. (2020) | Reality | Depth perception | Verbal distance estimation of a target at different locations | Shorter distances (25–200 m) were overestimated, and larger distances were underestimated (greater than 200 m) in passthrough. |
| Wolf et al. (2020) | Reality | Body ownership | Moving body in front of a virtual mirror | The influence of the system used (passthrough vs. VR) on body weight perception, presence, and embodiment was small. |
| Fischer et al. (2006) | Stylization | Object discernibility | Pressing a key in response to stimuli | Discerning differences between physical and virtual objects was more difficult when using stylized passthrough than nonstylized passthrough. |
| Koshi et al. (2019) | Stylization | Task performance | Determining if math expression on left monitor is equal to the value on the right monitor | Visual noise reduction helped reduce the amount of time to complete a math task. |
| Steptoe et al. (2014) | Stylization | Object discernibility | Object discernibility task and ambulatory behavior task | Stylized AR was associated with chance-level discernibility judgments between physical and virtual objects, conventional AR was associated with more correct judgments, and virtualized AR (extreme stylization) was associated with more incorrect judgments. |
| Vaziri et al. (2017) | Stylization | Depth perception | Blind walking to make distance estimates | Degrading visual realism did not significantly decrease the accuracy of distance perception. |
| Vaziri et al. (2021) | Stylization | Depth perception | Blind walking to a target | Severely degrading the detail of a scene did not significantly decrease the accuracy of distance perception. |
| Knierim et al. (2020) | Temporal resolution | Height estimation | Estimating the jump height of an experimenter | Temporally altering people’s view (slow motion) in passthrough did not affect height estimation. |
| Kytö et al. (2014) | Visual cues | Depth perception | Aligning position of physical pointer with position of an augmented object | Binocular disparity and relative size cues improved the accuracy of depth judgments. |
| Lu et al. (2012) | Visual cues | Visual search performance | Searching for a target in an outdoor scene | Contrast works as a subtle cue in passthrough. |
| Lu et al. (2014) | Visual cues | Visual search performance | Searching for a target in video background | Decreased feature congestion and increased cue size improved visual search. |
| Biocca and Rolland (1998) | Visual displacement | Hand–eye coordination | Pointing accuracy task and pegboard task | Visual displacement initially impacted hand–eye coordination. Over time, perceptual adaptation occurred. |
| Lee et al. (2013) | Visual displacement | Task performance | Foot placement and finger touch task | Perceptual adaptation occurred across multiple visual displacements, and task performance did not significantly differ. |
| Park et al. (2008) | Visual displacement | Hand–eye coordination | Tracing lines on a touch screen, placing a stylus over a dot on a touch screen, tracing the edge of a metal sheet, screwing wingnuts on an assembly board, and tracing a predefined path on a skull model | Height displacement impacted hand–eye coordination, and task performance was less accurate when using a headset than when not using a headset. |
Note. AR = augmented reality; VR = virtual reality.