We're learning more about why Zoom meetings — and videoconferences on similar platforms — can leave us drained.
For decades, scholars have predicted that videoconference technology will disrupt the practice of commuting daily to and from work and will change the way people socialize. In 2020, the Covid-19 pandemic forced a drastic increase in the number of videoconference meetings, and Zoom became the leading software package because it was free, robust, and easy to use. While the software has been an essential tool for productivity, learning, and social interaction, something about being on videoconference all day seems particularly exhausting, and the term “Zoom Fatigue” caught on quickly. In this article, I focus on nonverbal overload as a potential cause for fatigue, and provide four arguments outlining how various aspects of the current Zoom interface likely lead to psychological consequences. The arguments are based on academic theory and research, but also have yet to be directly tested in the context of Zoom, and require future experimentation to confirm. Instead of indicting the medium, my goal is to point out these design flaws to isolate research areas for social scientists and to suggest design improvements for technologists.
Keywords: videoconferencing, nonverbal behavior, mutual gaze, interpersonal distance, computer-mediated communication
Conflict of interest: I have no conflicts of interest to report.
Acknowledgements: Thank you to Jeff Hancock who provided feedback throughout the writing of this piece, and also to Tobin Asher, Norah Dunbar, Eugy Han, Geraldine Fauville, Robert Lynch, Marijn Mado, Anna Queiroz, and Janine Zacharia for feedback. This research was supported by two National Science Foundation grants (IIS-1800922 and CMMI-1840131).
Interactive Content: Juxtaposed frame comparison — eye gaze at close distance, conference room with nine face-to-face speakers (figure 2); BBC Podcast Interview with Jeremy Bailenson, Meeting in the virtual world.
Correspondence concerning this article should be addressed to Jeremy N. Bailenson, Department of Communication, Stanford University, Stanford, United States. Email: email@example.com
In 2020, the Covid-19 pandemic forced a drastic increase in the number of videoconference meetings. Videoconferencing was a critical tool that allowed schools and many businesses to continue working during shelter-in-place. Zoom in particular helped hundreds of millions of people by making videoconferencing free and easy to use. Moreover, if the practice of taking meetings virtually endures postpandemic, fossil fuel consumption should decrease due to a reduction in physical commuting. For example, one study found that videoconferencing uses less than 10% of the energy required for an in-person meeting (Ong et al., 2014), and a recent review reports that a majority of studies find that telecommuting saves energy (O’Brien & Yazdani Aliabadi, 2020).
On the other hand, something about being on videoconference all day seems particularly exhausting, and the term “Zoom Fatigue” caught on quickly, with major news outlets covering the construct. In April 2020, I published an opinion piece (Bailenson, 2020) outlining nonverbal overload as a possible explanation for Zoom Fatigue in both work and social life. Of course, in a newspaper article one does not have the space or the editorial permission to explicate arguments with academic sourcing, so I am writing this piece to expand on it and provide evidence.
While there are dozens of empirical studies in psychology, human–computer interaction, and communication that examine behavior during videoconferencing, there have yet to be rigorous studies that examine the psychological consequences of spending hours per day on this particular medium. Hence this piece outlines a theoretical explanation, one based on previous work, as to why the current implementation of videoconferencing is so exhausting. As opposed to discussing videoconferencing generally, I focus on Zoom in particular. I don’t do this to vilify the company; I am a frequent Zoom user, and I am thankful for the product, which has helped my research group stay productive and allowed friends and family to stay connected. But given that it has become the default platform for many in academia, and readers of this article are likely familiar with its affordances, it makes sense to focus on Zoom, which jumped from about 10 million users in December 2019 to more than 300 million users 5 months later (Iqbal, 2020). Also, the ubiquity of the software has resulted in genericization, with many using the word “Zoom” as a verb to replace videoconferencing, similar to “Googling.” Hence, I feel warranted in writing about “Zoom Fatigue” as the brand name is getting traction as the semantic label for the product category.
I focus on four possible explanations for Zoom Fatigue: Excessive amounts of close-up eye gaze, cognitive load, increased self-evaluation from staring at video of oneself, and constraints on physical mobility. All are based on academic research, but readers should consider these claims to be arguments, not yet scientific findings. I point out these design flaws in Zoom with the goal of improving its interface, as opposed to indicting the medium. Moreover, it is my hope that this article motivates scholars to engage in research on the topics, and I fully believe that the data collected will show more nuance than I provide in these arguments.
For those who teach about nonverbal behavior, the elevator is always a great example to discuss theories and findings. In an elevator, people are forced to violate a nonverbal norm—they must stand very close to strangers. This exceeds the typical amounts of intimacy people tend to display with strangers and causes discomfort. As a result, people in an elevator tend to look away from the faces of others by looking down or otherwise averting their gaze in order to minimize eye contact with others. People decrease one cue to compensate for a context-driven increase in another.
Early research in nonverbal behavior documented this trade-off between eye gaze and interpersonal distance (Argyle & Dean, 1965), and my own work has replicated these findings with virtual faces, in that people will give more interpersonal distance when approaching virtual humans who are maintaining virtual gaze compared to ones who do not (Bailenson et al., 2001).
On Zoom, behavior ordinarily reserved for close relationships—such as long stretches of direct eye gaze and faces seen close up—has suddenly become the way we interact with casual acquaintances, coworkers, and even strangers. There are two separate components to unpack here—the size of faces on the screen, and the amount of time the viewer is seeing the front-on view of another person’s face which simulates eye contact. I discuss each of these independently.
The size of faces on a screen will of course depend on the size of the computer monitor, how far one sits away from the monitor, the view configuration one chooses on Zoom, and how many faces are in the grid. Let’s start with a one-on-one conversation. Here is a quick experiment one can run at home if possible given Covid-19. Set up a Zoom call using a typical laptop configuration, with the laptop on a desk and each person sitting on a chair in front of the computer. In my setup, in the “speaker” view configuration, that is, when my face is smaller and on top of the large image of the other user, the length from chin to the top of the head of the other person on the screen was about 13 cm. Then, meet the same person face-to-face, and move back and forth in order to get the person’s head to the same length (it is important to keep the distance between your eyes and the ruler the same in both measurements). In my test, I needed to be about 50 cm away when standing face-to-face. In Hall’s foundational work on personal space (1966), anything below about 60 cm is classified as “intimate,” the type of interpersonal distance patterns reserved for families and loved ones. Think about that—in one-on-one meetings conducted over Zoom, coworkers and friends are maintaining an interpersonal distance reserved for loved ones.
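The comparison above rests on simple visual-angle geometry. As a minimal sketch, and not the measurement procedure itself, the Python below computes the face-to-face distance at which a real head would subtend the same visual angle as a head shown on a screen. The 13 cm on-screen height and the roughly 60 cm intimate-distance boundary come from the text; the 30 cm viewing distance, the 23 cm real head height, and all function names are illustrative assumptions.

```python
import math

# Approximate boundary of "intimate" distance cited in the text (Hall, 1966).
INTIMATE_BOUNDARY_CM = 60.0


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle, in degrees, subtended by an object of a given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def equivalent_distance_cm(on_screen_cm: float, viewing_cm: float,
                           real_cm: float) -> float:
    """Distance at which a real head matches the on-screen head's visual angle.

    Equal visual angles imply equal size-to-distance ratios:
    real_cm / distance = on_screen_cm / viewing_cm.
    """
    return viewing_cm * real_cm / on_screen_cm


on_screen_head_cm = 13.0  # chin-to-crown length on screen, from the text
viewing_distance_cm = 30.0  # ASSUMPTION: eye-to-screen distance
real_head_cm = 23.0  # ASSUMPTION: chin-to-crown length of a real head

distance = equivalent_distance_cm(on_screen_head_cm, viewing_distance_cm,
                                  real_head_cm)
print(f"Equivalent face-to-face distance: {distance:.1f} cm")
print(f"Within intimate distance: {distance < INTIMATE_BOUNDARY_CM}")
```

Under these assumed values the equivalent distance comes out at roughly 53 cm, which falls inside the intimate zone; different screens, seating positions, and head sizes will shift the number.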
I have done similar calculations with group interactions, and while these measurements remain informal and an area I hope to study more rigorously, this pattern does not appear to change as the group size gets larger. On Zoom grids, faces are bigger in one’s field of view than they are face-to-face when one accounts for how groups naturally space in physical conference rooms.
In the elevator, when faces are larger (that is, when people are closer), riders can solve this by looking down. Everyone reduces the amount of mutual gaze to a minimum. On Zoom, the opposite happens. Consider a Zoom meeting with nine people in a three-by-three grid. As Figure 1 demonstrates, in a typical group Zoom meeting, regardless of who is speaking, each person is looking directly at the eyes of the other eight people for the duration of the meeting (assuming one is looking at the screen).
Anyone who speaks for a living understands the intensity of being stared at for hours at a time. Even when speakers see virtual faces instead of real ones, research has shown that being stared at while speaking causes physiological arousal (Takac et al., 2019). But Zoom’s interface design constantly beams faces to everyone, regardless of who is speaking. From a perceptual standpoint, Zoom effectively transforms listeners into speakers and smothers everyone with eye gaze.
Compare this to a real conference room with nine face-to-face speakers, where each person speaks for roughly an equal amount of time. It is quite rare for one listener to stare at another listener, and even rarer for this nonspeaker directed gaze to last for the duration of a meeting. So, assuming all listeners are always looking at the speaker in the conference room, the amount of eye gaze on Zoom is eight times higher. But it turns out the multiplier effect is larger, because face-to-face, listeners don’t stare at speakers nonstop. Instead, direct eye contact is used sparingly (see Kleinke, 1986, for a review that is early but still remains a useful resource). Even in one-on-one meetings that do not feature a third object to look at, for example, a chalkboard or a projection screen, two conversationalists will spend substantial portions of the interaction averting the gaze of one another (Andrist et al., 2013). If the one-on-one meeting features a third object, then people look at the other person’s face for less than half of the time (Hanna & Brennan, 2007). And the amount of gaze in a face-to-face social interaction depends on a myriad of contextual features, for example, the structural features of the room as well as power dynamics among people (Dunbar & Burgoon, 2005). Figure 2 is a photograph taken from a recent meeting of the Stanford Board of Trustees.
Notice that a majority of the people in the room are not looking at the speaker, and other than the two sidebar conversations, people who are close to one another do not look at one another’s eyes.
But with Zoom, all people get the front-on views of all other people nonstop. This is similar to being in a crowded subway car while being forced to stare at the person you are standing very close to, instead of looking down or at your phone. On top of this, it is as if everyone in the subway car rotated their bodies such that their faces were oriented toward your eyes. And then, instead of being scattered around your peripheral vision, all those people somehow were crowded into your fovea where stimuli are particularly arousing (Reeves et al., 1999). For many Zoom users, this happens for hours consecutively.
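The eight-fold comparison made earlier for the nine-person meeting can be written out as arithmetic. The sketch below is only an illustration of that reasoning: it counts the front-on faces a participant sees on a Zoom grid and divides by the single speaker a listener looks at in a physical room, weighted by the fraction of time the listener actually looks. The function names are hypothetical, and the 0.5 fraction is an illustrative assumption borrowed from the "less than half of the time" finding for one-on-one meetings, not a measured value for groups.

```python
def zoom_front_on_faces(group_size: int) -> int:
    """Faces shown front-on to each participant in a grid view."""
    return group_size - 1  # everyone except oneself


def room_front_on_faces(fraction_looking_at_speaker: float) -> float:
    """Average faces a listener gazes at in a physical room.

    Simplifying assumption from the text: a listener looks only at the
    one current speaker, for some fraction of the meeting.
    """
    return 1.0 * fraction_looking_at_speaker


def gaze_multiplier(group_size: int,
                    fraction_looking_at_speaker: float) -> float:
    """Ratio of front-on faces on Zoom to those in a physical room."""
    if not 0.0 < fraction_looking_at_speaker <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return (zoom_front_on_faces(group_size)
            / room_front_on_faces(fraction_looking_at_speaker))


# Listeners stare at the speaker nonstop: the eight-fold case in the text.
print(gaze_multiplier(9, 1.0))
# ASSUMPTION: listeners look at the speaker half of the time.
print(gaze_multiplier(9, 0.5))
```

With nonstop gaze the ratio is 8; with the assumed 0.5 fraction it doubles to 16, which is the sense in which the multiplier grows once realistic gaze aversion is taken into account.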
In face-to-face interaction, nonverbal communication flows naturally, to the point where we are rarely consciously attending to our own gestures and other nonverbal cues. One of the remarkable aspects of early work on nonverbal synchrony (e.g., Kendon, 1970) is how nonverbal behavior is simultaneously effortless and incredibly complex. On Zoom, nonverbal behavior remains complex, but users need to work harder to send and receive signals.
For example, consider the work by Hinds (1999). She compared videoconferencing to audio-only interaction, while dyads performed the main task—a guessing game—and a secondary recognition task which is a common way to measure cognitive load. Participants in the video condition made more mistakes on the secondary task than in the audio condition. In explaining the reason for the increased load from video, Hinds argues that dedicating cognitive resources to managing the various technological aspects of a videoconference is a likely cause, for example, image and audio latency.
On Zoom, one source of load relates to sending extra cues. Users are forced to consciously monitor nonverbal behavior and to send cues to others that are intentionally generated. Examples include centering oneself in the camera’s field of view, nodding in an exaggerated way for a few extra seconds to signal agreement, or looking directly into the camera (as opposed to the faces on the screen) to try and make direct eye contact when speaking. This constant monitoring of behavior adds up. Even the way we vocalize on video takes effort. Croes et al. (2019) compared face-to-face interaction to videoconferences and demonstrated people speak 15% louder when interacting on video. Consider the effects of raising one’s voice substantially for an entire workday. It is important to acknowledge that Zoom does allow people in some ways to reduce the amount of monitoring; for example, people need not worry about leg movements given they are not on camera.
Another source of load relates to receiving cues. In a face-to-face conversation, people draw great meaning from head and eye movements, which help to signal turn-taking, agreement, and a host of affective cues (Kleinke, 1986). What happens when these cues are present and perceived by other conversationalists but are not tied to the intention of the person making the gesture? In 2005, my colleagues and I built and tested an avatar communication system in which three people—one presenter and two listeners—were networked into a shared conference room while donning virtual reality headsets (Bailenson et al., 2005). One of the conditions we tested was an “augmented gaze” condition, which redirected the head movements of the speaker in each of the two listeners’ network feeds. Instead of getting the natural head movements of the speakers who would typically scan the room, look down at their notes, and make eye contact when appropriate, both of the listeners perceived direct and unwavering eye gaze from the speaker for 8 min straight. In many ways, this condition simulates Zoom: Gaze is perceptually realistic, but not socially realistic. In our study, users rated the augmented gaze condition with the lowest levels of social presence. For example, participants did not feel “in tune” with the speakers and did not feel the interaction was smooth.
Zoom users face this disconnect often. For example, in a face-to-face meeting, a quick, sidelong glance where one person darts their eyes to another has a social meaning, and a third person watching this exchange likely encodes this meaning. In Zoom, a user might see a pattern in which on their grid it seems like one person glanced at another. However, that is not what actually happened, since people often don’t have the same grids. Even if the grids were kept constant, it is far more likely the glancing person just got a calendar reminder on their screen or a chat message. Users are constantly receiving nonverbal cues that would have a specific meaning in a face-to-face context but have different meanings on Zoom. While of course people do adapt to media over time (Walther, 2002), it is often difficult to overcome automatic reactions to nonverbal cues.
Moreover, in Zoom, receivers are provided fewer cues than they typically get in face-to-face conversations. Most people focus their cameras toward their heads; indeed one of the more celebrated aspects of Zoom meetings is freedom from worrying about how one is dressed below the waist. But as a result, the influences of facial expressions, eye gaze, and size of the heads within a screen are likely magnified on Zoom, compared to face-to-face meetings, which also provide cues about body size and height, leg movements, posture, and other cues. In general, when there are fewer communication cues presented, those particular cues have a larger impact than when there are many cues available (Walther, 1996, or see Walther et al., 2015, for a review). But it is important to note that most of the work on the number of cues in computer-mediated communication has examined linguistic cues, not video (though see Nowak et al., 2005, for a notable exception). Future work should examine how the number of cues impacts person perception during real-time video.
Finally, it is important to point out that despite the arguments raised above regarding Zoom contributing to cognitive load, those who attend conference calls frequently realize that audio-only conversations suffer as groups become larger. Inferring the attention of others is nearly impossible once there are more than a handful of people on a conference call, and conversational moves such as turn-taking become difficult to manage. Very few psychology studies on mediated interaction examine groups larger than two or three people, and future work should examine the psychological costs and benefits of video compared to audio in larger groups.
Imagine in the physical workplace, for the entirety of an 8-hr workday, an assistant followed you around with a handheld mirror, and for every single task you did and every conversation you had, they made sure you could see your own face in that mirror. This sounds ridiculous, but in essence this is what happens on Zoom calls. Even though one can change the settings to “hide self view,” the default is that we see our own real-time camera feed, and we stare at ourselves throughout hours of meetings per day. Of all the strange design decisions from Zoom, this one looms large, even if previous platforms had similar features. Zoom users are seeing reflections of themselves at a frequency and duration that hasn’t been seen before in the history of media and likely the history of people (with exceptions for people who work in dance studios and other places that are full of mirrors).
The effect of seeing oneself in a mirror has been studied for decades, starting with the pioneering work of Duval and Wicklund (1972) demonstrating that people are more likely to evaluate themselves when seeing a mirror image (see Gonzales & Hancock, 2011, for a review). While this can lead to more prosocial behavior, the self-evaluation can be stressful. A meta-analysis conducted by Fejfar and Hoyle (2000) reports a small effect size when evaluating the studies that link mirror image viewing to negative affect. While these studies showing distress outcomes utilized analog mirrors, a handful of studies have specifically examined the effect of seeing oneself via real-time video feed as well.
For example, a study by Ingram et al. (1988) shows interaction effects, where seeing a video of oneself has a larger impact on women than men across three experiments. Study 2 from that article demonstrated that women are more likely than men to direct attention internally in response to seeing themselves via live video. Study 3 demonstrates the consequences of that self-focus. Men and women both experienced a negative affect event, specifically taking a test and receiving feedback that they had performed poorly. Then, they were taken to another room where they either saw a real-time video of themselves or not. Women who saw videos of themselves responded with greater levels of self-focused attention and negative affect compared to the other three conditions. The authors argue that the tendency to self-focus might prime women to experience depression.
These studies typically are short and show participants a mirror image for less than an hour. There is no data on the effects of viewing oneself for many hours per day. Given past work, it is likely that a constant “mirror” on Zoom causes self-evaluation and negative affect. But how this changes longitudinally is an important question moving forward.
Cameras have a field of view, an area they can see. Close up to the camera, the field of view is small, while farther away from the camera the area is larger. The conical region that the camera sees is called a frustum. On a Zoom call, people need to stay within the frustum in order to be seen by others. Moreover, because many Zoom calls are done via computer, people tend to stay close enough to reach the keyboard, which typically means their faces are between a half-meter and a meter away from the camera (assuming the camera is embedded in the laptop or on top of the monitor). Even in situations where one is not tied to the keyboard, the cultural norms are to stay centered within the camera’s view frustum and to keep one’s face large enough for others to see. In essence users are stuck in a very small physical cone, and most of the time this equates to sitting down and staring straight ahead.
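To give a rough sense of how small that viewing cone is at keyboard distance, the sketch below computes the width of the region a camera can see at a given distance from its horizontal field-of-view angle. The half-meter and one-meter distances come from the text; the 70-degree angle is a hypothetical value chosen for illustration and is not the specification of any particular webcam. The function name is likewise an assumption.

```python
import math


def visible_width_m(distance_m: float, horizontal_fov_deg: float) -> float:
    """Width of the region a camera sees at a given distance from the lens."""
    half_angle = math.radians(horizontal_fov_deg) / 2.0
    return 2.0 * distance_m * math.tan(half_angle)


ASSUMED_FOV_DEG = 70.0  # ASSUMPTION: illustrative horizontal field of view

for distance in (0.5, 1.0):  # typical face-to-camera distances from the text
    width = visible_width_m(distance, ASSUMED_FOV_DEG)
    print(f"At {distance:.1f} m the camera sees about {width:.2f} m across")
```

Because the visible width scales linearly with distance, halving the distance to the camera halves the side-to-side room a person has before leaving the frame.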
During face-to-face meetings people move. They pace, stand up, and stretch, doodle on a notepad, get up to use a chalkboard, even walk over to the water cooler to refill their glass. There are a number of studies showing that locomotion and other movements cause better performance in meetings. For example, people who are walking, even when it is indoors, come up with more creative ideas than people who are sitting (Oppezzo & Schwartz, 2014). Dozens of studies by Goldin-Meadow examine the role of gesture in thinking and learning (see her 2003 book for a review). Much of that work shows a causal relationship; for example, children who are required to gesture with their hands while learning math showed more learning retention compared to a control group (Cook et al., 2008). While Zoom doesn’t technically prevent one from using gestures during speech, being forced to sit in view of the camera certainly tamps down movement.
There is a wonderful illusion that occurs during phone calls. When I call someone, I have a vision that they are dedicating 100% of their attention to my voice. But meanwhile, throughout a 30-min phone call, I myself will do all sorts of activities, for example, stretch my lower back, cook pasta for my kids, even have a nonverbal conversation with my wife. But I still maintain the image of the other person as a tunnel-visioned listener. The videoconference destroys this illusion, as we actually see what the other person is doing while we converse. Those familiar with the writings of Wallace (1996) will recognize this argument—he predicts the loss of this illusion will be the reason that videoconferences eventually fall out of favor. People like to do minor physical activities while they talk, and it doesn’t interfere with talking and listening.
While we should not blame Zoom for making a great videoconferencing product that works robustly, we should evaluate why we are choosing video for so many calls that previously would never have warranted a face-to-face meeting, or perhaps any synchronous meeting at all. Phone calls have driven productivity and social connection for many decades, and only a minority of calls require staring at another person’s face to successfully communicate.
It is important to reiterate what an amazing tool Zoom has been. Families, friends, students, teachers, and employees have benefited immensely from such a robust and available communication tool during Covid-19. This article frames a number of issues with Zoom’s current interface design that are likely causing psychological consequences and fatigue. Most of the arguments in this article are hypothetical. While they are based on previous research findings, almost none of them have been directly tested. It is my hope that others will see many research opportunities here, and will run studies that test these ideas.
Astute readers likely have already figured out that many of these problems could be solved with trivial changes to the design of the Zoom interface. For example, the default setting should be hiding the self-window instead of showing it, or at least hiding it automatically after a few seconds once users know they are framed properly. Likewise, there can simply be a limit to how large Zoom displays any given head; this problem is simple technologically given they have already figured out how to detect the outline of the head with the virtual background feature. Outside of software, people can also solve the problems outlined above with changes in hardware and culture. Use an external webcam and external keyboard that allow more flexibility and control over various seating arrangements. Make “audio only” Zoom meetings the default, or better yet, insist on taking some calls via telephone to free your body from the frustum.
The meteoric rise of Zoom has been fascinating to watch as a media psychologist. In less than a year, many people have seamlessly integrated Zoom into their work and social lives, and affordances such as screen sharing have become critical tools. In this article, I have emphasized the differences between Zoom meetings and face-to-face ones. But if one were to count the similarities between the two, they would far outnumber the differences. Indeed, the success of this medium, like many technologies, revolves around its ability to seamlessly mimic face-to-face conversations (Reeves & Nass, 1996). Moreover, regardless of the medium, it is important to acknowledge that meetings in general can be fairly tiring, as can commuting from one location to another, which Zoom eliminates. Perhaps a driver of Zoom fatigue is simply that we are taking more meetings than we would be doing face-to-face.
For decades, scholars have predicted that communication technology will disrupt the practice of commuting to and from work ten times per week. Even when face-to-face meetings become safe again, it is likely the culture has finally shifted enough to remove some of the previously held stigmas against virtual meetings. With slight changes to the interface, Zoom has the potential to continue to drive productivity and reduce carbon emissions by replacing the commute. Videoconferencing is here to stay, and as media psychologists it is our job to study this medium to help technologists build better interfaces and users to develop better use practices.
Copyright © the Author(s) 2021
Received December 1, 2020
Revision received January 5, 2021
Accepted January 7, 2021