Volume 2, Issue 3. DOI: https://doi.org/10.1037/tmb0000049
Human–robot collaborations that operate in shared spaces are anticipated to become increasingly common in coming years. Decades of social psychological research have revealed that human observers tend to improve people’s performance in dominant tasks and impair it in nondominant tasks. While studies indicate moderate support for social facilitation/inhibition effects with robot observers, this evidence is hotly debated. Addressing known methodological criticism, this study investigates how a copresent robot observer affects Stroop task performance and whether perceptions of that robot’s mental capacities have explanatory value. Results reveal limitations in transferring social facilitation/inhibition theory to robots. Since participants reported high task attention levels across conditions, emerging flow states may have helped them circumvent social facilitation/inhibition mechanisms. It may thus be advisable for future research to consider flow dynamics when investigating social performance effects.
Keywords: mere presence, evaluation apprehension, attention, arousal, theory of mind
Acknowledgment: The authors gratefully acknowledge Joshua Cloudy, Saydie French, Shane Garcia, Kailey Itri, and Miranda Land for their assistance in data collection, and Kristina McCravey for technical and logistical support. This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-19-1-0006.
Conflicts of Interest: The authors have no conflicts of interest to disclose.
Data Availability: All study materials, data, and preregistered hypotheses are available in the online materials for this project: https://osf.io/9xry5.
Open Science Disclosures
The data are available at https://osf.io/9xry5.
The experiment materials are available at https://osf.io/9xry5.
The preregistered design is available at https://osf.io/9xry5.
Correspondence concerning this article should be addressed to Jaime Banks, College of Media and Communication, Texas Tech University, P. O. Box 43082, Lubbock, TX 79409, United States. Email: [email protected]
Social robots are increasingly integrated into industrial production and social professions, engaging in social processes and performing behaviors aligned with the norms of their social environment (Duffy, 2003). Futurists and economists expect human–robot collaborations to continue growing across myriad fields (cf., Demir et al., 2019), such that people increasingly face various communicative challenges that may arise from sharing a workplace with social robots. For instance, humans may need to establish common ground (Kiesler, 2005), coordinate their activities (Gervits et al., 2020), and develop trust with their robot coworkers (de Visser et al., 2020). Human–robot collaborations may further pose significant psychological challenges that are not (primarily) grounded in how humans and robots may overtly create shared meaning. They instead emerge from social robots triggering psychological mechanisms that may influence critical cognitive processes in ways that hinder or improve performance.
The present study examines one such psychological mechanism: Humans’ tendency to perform better or worse depending on the copresence of an observer. This study investigates (a) whether a social robot’s copresence influences performance in a cognitive-processing task and (b) whether perceptions of that robot’s mental capacities explain any such effect.
In one of the first social psychological experiments, Triplett (1898) documented that children performed better in a game of winding fishing reels when paired with another child than when playing alone. This so-called social facilitation effect (SFE) has received considerable attention and is widely accepted as a well-founded phenomenon within and beyond the scope of social psychology (Steinmetz & Pfattheicher, 2017). However, there is disagreement as to why the SFE unfolds (cf., Aiello & Douthitt, 2001). The four most prominent theorized mechanisms focus on mere presence, evaluation apprehension, effort, and attentional conflicts.
Zajonc (1965) formulated a generalized drive hypothesis suggesting that the mere copresence of others enhances people’s arousal levels. As people become more stimulated, they tend to perform better in tasks that rely on autonomous behaviors (i.e., dominant tasks) and worse when their automatized responses are ineffective (i.e., nondominant tasks). This is due to an evolved alertness reaction which leads them to show dominant responses more easily than nondominant responses. Accordingly, SFE occurs in situations where the dominant response is appropriate and results in improved performance. In other situations, where the dominant response is not appropriate, the same mechanism causes the opposite effect as people’s performance deteriorates in others’ copresence (i.e., social inhibition effects; SIE). Mere presence effects have been subject to numerous studies, many of which suggest that the mechanism relies on people’s innate aversion to unpredictable others (Guerin, 1986) that can be triggered even when the copresent other cannot evaluate them (Schmitt et al., 1986).
Evaluation apprehension theory (Cottrell, 1972) refines the generalized drive hypothesis, positing that the mere presence of other people is neither sufficient nor necessary for eliciting SFE/SIE (Cohen, 1979). Instead, people must perceive observers as evaluators (Cohen, 1980). While Zajonc’s explanation was based on an innate drive, evaluation apprehension theory argues for a learned response. The learned response is driven by people’s anticipation of an (un)favorable evaluation by others, which then induces higher arousal (Sanna & Shotland, 1990). Despite these differences regarding the nature of people’s drive, both the mere-presence and evaluation-apprehension accounts rely on a similar psychological mechanism: Heightened arousal leads to quicker dominant responses.
Ferris et al. (1978) proposed a nondrive explanation (i.e., an explanation that is not based on situationally enhanced arousal) for SFE/SIE, suggesting that effort is driven by anticipated evaluations. According to their cognitive-motivational approach, people make rational decisions about exerting effort in a task. This decision takes into account the presence of an evaluative audience (cf., Harkins, 1987; Paulus, 1983). While increased effort leads to better performance in dominant tasks, it results in poorer performance for nondominant tasks as people tend to process more task-irrelevant information, for instance, about anticipated negative consequences (Griffith et al., 1989). People’s perceptions of being evaluated may therefore operate via drive and nondrive mechanisms.
Another explanation is offered via distraction-conflict theory (Baron, 1986). A copresent audience is task-irrelevant information that can distract from the primary task, creating attentional conflict. This attentional conflict either facilitates or impairs performance through one of two routes. Attentional conflict may elevate arousal such that people are more inclined to show dominant responses (analogous to drive-based approaches; Baron et al., 1978). Alternately, attentional conflicts may lead people to prioritize essential task aspects and demote attention to noncritical information (Cohen, 1978). This information-processing mechanism may improve people’s performance in dominant tasks as extraneous information is suppressed. For nondominant tasks, however, the essential aspects are too vast to benefit from such prioritization, so focusing on one aspect leaves too few resources for the others, resulting in SIE. Importantly, the drive and nondrive routes are not mutually exclusive and may even interact (Sanders, 1981). More recent approaches further emphasized potential moderators (e.g., the observer’s standing as inferior or superior; Muller et al., 2004) and necessary preconditions (e.g., adequate preparation time to activate attentional control; Sharma et al., 2010) for attention-conflict effects to arise.
All of these theoretical approaches have garnered empirical support. An early meta-analysis by Bond and Titus (1983) found small effects in favor of mere presence and against evaluation apprehension. A critical experiment by Feinberg and Aiello (2006) came to a different conclusion: Physical copresence does not trigger SFE, but evaluation apprehension and attentional conflict do so, particularly when they co-occur. Others, such as Schmitt et al. (1986), assumed that rival mechanisms may be “perfectly harmonious” (p. 246). A more recent meta-analysis indicates that differential reactions may be a function of personality dispositions (Uziel, 2007). Therefore, it is plausible that each theoretical approach may be relevant for different people. A decision as to which mechanisms may have priority over the others is still pending and has not yet been explored within more specific applications, such as human–robot interactions.
To meet the communicative challenges that surface once humans and social robots are collaborating, technological advances have fostered conditions (e.g., rich social nonverbals, natural movements suggesting animacy, naturalistic speech) by which artificial agents may be perceived as a socially relevant audience. Several studies reported changes in people’s performance once a robot was copresent; those changes can be considered largely consistent with theory. Riether et al. (2012) showed across varied domains (i.e., logical, arithmetic, motor) that the difference between performance in dominant and nondominant tasks was greater with an observant robot compared to completing the tasks alone. In fact, said differences were comparable with those of a human observer. Within a board game scenario, Cruz-Maya et al. (2015) found SFE/SIE when a social robot was present during the completion of both easy and difficult levels, respectively. However, other studies could only demonstrate SFE/SIE under certain circumstances—when the robot had a cartoonish appearance (Wechsung et al., 2014), when it was introduced as unlikeable (Spatola et al., 2018), or when it demonstrated social capacities (Spatola et al., 2019, 2020)—indicating boundary conditions.
Although these findings suggest that SFE/SIE may manifest for robot observers, Irfan et al. (2018) advocated against such a conceptual transfer. They argued that (a) people’s participation in a study in which researchers (copresent or not) intend to evaluate them defies a clear attribution of any effects to the observing agent and (b) previous studies failed to replicate SFE/SIE with digital agents (e.g., Hertz & Wiese, 2017), alongside (c) general inconsistency in such effects (Sterna et al., 2019). Beyond that general conceptual challenge, we also note that the supportive studies have been subject to methodological limitations. In the studies by Riether et al. (2012) and Spatola et al. (2018, 2019, 2020), stimulus robots were physically positioned to some degree facing participants such that the robot would not be able to directly observe the participant’s performance—at least not in a way that would be evident to participants. So, although these designs were well suited for testing the mere-presence mechanism, they preclude identifying any effects that could emerge as a function of evaluation apprehension. In both Cruz-Maya et al. (2015) and Wechsung et al. (2014), the stimulus robot took a much more active evaluative role that included direct evaluations during task completion. Rather than limiting which psychological mechanisms might be in action, Sterna et al. (2019) argued that such direct feedback alters the phenomenon of interest. Given these shortcomings, a proper test of whether robots may induce SFE/SIE must include conditions under which any of those mechanisms may play out.
This study seeks to satisfy that multimechanism requirement by avoiding previously stated methodological limitations and, on statistical grounds, allowing for disentangling which mechanism emerges as most influential. More specifically, this investigation uses a well-tested stimulus task (suitable to elicit SFE/SIE; Huguet et al., 1999) and robot (suitable to elicit mind perceptions; Banks, 2021), controls for observer behavior (evaluative but nonreactive) and positioning (physically oriented toward both participant and task), and removes the experimenter from the task space (to isolate the observer’s influence). These conditions satisfied and extant literature’s mixed results acknowledged, we ask a foundational research question:

RQ1: (How) does the presence of a social robot observer influence performance in (a) dominant tasks and (b) nondominant tasks?
If a robot’s copresence elicits SFE/SIE, such an impact would likely result from similar mechanisms as with human observers. Following Irfan et al.’s (2018) argument that a human-subjects study inherently involves the presence of observers—such that mere presence cannot be disentangled from the study context—we assume that mere presence is operating across all examined contexts. Thus, toward evaluating the relative activation of the three remaining potential mechanisms, it is hypothesized:

H1: SFE/SIE for social robot observers are positively connected to (a) evaluation apprehension, (b) effort, and (c) attentional conflict.
Evidence suggests that people tend to attribute humanlike characteristics to social robots (Złotowski et al., 2018). Spatola et al. (2019, 2020), in particular, revealed that perceptions of a copresent robot’s humanlikeness mediated SFE/SIE. These findings suggest that people’s inference that there is someone there might be a necessary precondition. Since SFE/SIE may rely on the perception that the observer is socially relevant, mind perception emerges as a pivotal process that may determine whether drive and nondrive mechanisms get activated with an observant robot.
Building upon established conceptualizations that distinguish between agentic and experiential capacities (i.e., abilities to think and do and to sense and feel; Gray et al., 2007), current understandings of mind perception differentiate among three capabilities with which different agents can be adequately described (Malle, 2019; Weisman et al., 2017). Malle (2019) argued for splitting agentic capacities into a perceptual-cognitive dimension comprising basic abilities such as perception, knowledge, and communication (i.e., reality-interaction capacity) and a social-cognitive dimension that includes higher-order cognitive skills such as social reasoning, moral cognition, and executive control (i.e., social-moral capacity). Like the original experience dimension, affective capacities to sense and feel constitute the third dimension. Based on these three perceptions, people determine for themselves the mindful nature of a given robot (Banks, 2021).
Following this framework, we argue that affective, social-moral, and reality-interaction capacities may be affiliated with SFE/SIE as a function of observation by a socially relevant being. Accordingly, it was hypothesized:

H2: SFE/SIE are positively connected to perceptions of an observer’s (a) affective, (b) social-moral, and (c) reality-interaction mental capacities.
The present study implemented three observer conditions (no observer vs. human vs. robot). While the no-observer condition served as experimental control, the human-observer condition was intended to control whether human-induced SFE/SIE can be replicated by our scenario (cf., Irfan et al., 2018). Participants were randomly assigned to one of these conditions, resulting in n = 39 in the no-observer condition, n = 36 in the human condition, and n = 36 in the robot condition. Randomization was not conducted individually (i.e., randomly assigning conditions per participant) but block-wise (i.e., randomly assigning conditions for each four-session block before participants signed up) to (a) avoid lab reconstructions between sessions, (b) ensure consistency in leveraging only female confederates as human observers, and (c) reduce error from confederates constantly switching protocols.
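For illustration, a minimal sketch of such block-wise assignment (condition labels, the batch-balancing scheme, and the seed are our assumptions; the article does not detail the scheduling procedure):

```python
import random

CONDITIONS = ["no_observer", "human_observer", "robot_observer"]

def assign_blocks(n_blocks: int, seed: int = 1) -> list[str]:
    """Assign one condition per four-session block, shuffling conditions
    within consecutive batches of three blocks to keep cell sizes balanced."""
    rng = random.Random(seed)
    schedule: list[str] = []
    while len(schedule) < n_blocks:
        batch = CONDITIONS[:]  # one copy of each condition per batch
        rng.shuffle(batch)
        schedule.extend(batch)
    return schedule[:n_blocks]

# Example: nine blocks cover 36 sessions, three blocks per condition.
for block, condition in enumerate(assign_blocks(9), start=1):
    print(f"Block {block} (sessions {4 * block - 3}-{4 * block}): {condition}")
```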
Undergraduate students were invited to participate in a study on “task performance under different conditions” for course credit and U.S.$5 compensation. Prior to analysis, 14 participants were excluded from the final sample (five for misunderstanding the task, five due to robot malfunction or confederate error, one for color-blindness, and one due to discomfort with the robot). Additionally, two were excluded due to very poor task performance. The final sample comprised N = 111 participants (age: M = 19.65, SD = 2.46, range: 18–35 years). Most identified as female (n = 84), others as male (n = 27), none as nonbinary (see online materials for more information on sample demographics; Koban et al., 2021).
The study’s procedure included two data collections. First, participants completed an online survey capturing technology attitudes and demographics; second, they participated in an in-person lab session. Before entering the lab, all participants completed a COVID-19 screening in accordance with university protocols. All humans wore face masks: The experimenter had a black fabric mask, human confederates wore transparent masks, and participants used a mask of their choice (for the reasoning behind the mask choice, see online materials; Koban et al., 2021).
After giving consent, participants were given instructions by an experimenter and led into a separate testing area. Participants in the observer conditions were briefly introduced to either a human confederate (acting as a human observer) or a humanoid robot (serving as a robot observer). Both observers introduced themselves (via standardized script) as a lab assistant who had learned “how people solve a particular kind of task” (indicating ability to evaluate) and who will be present during task completion “to observe [the participant’s] performance” (signaling surveillance). Participants were instructed not to talk to the observer during task completion as this would interfere with their performance; if they spoke, the observer would not respond (removing interaction expectations). The no-observer condition did not include an introduction. Participants were then led to a desk and asked to sit in front of a set of colored buttons (see Figure 1) used to complete a Stroop task (Stroop, 1935). The experimenter explained how to complete the task, emphasizing that “the goal is to respond as quickly as you can without making errors.” As tutorials, the experimenter guided them through three brief practice rounds (four trials each) of (a) congruent stimuli, (b) incongruent stimuli, and (c) attention-control stimuli. The experimenter then left the testing area (to mitigate experimenter-observation impressions), leaving the participant either alone or in the presence of their assigned observer. The observer was visibly positioned on the left or right side of the participant with a clear line-of-sight to the participant’s screen-based performance. During task completion, observers were scripted to appear attentive (e.g., with both participant and screen simultaneously in their field of vision) without communicating with the participant. At the task’s completion (indicated by a distinct sound), the experimenter returned and led the participant out of the testing area to complete a posttask survey.
A computerized version of the Stroop task (Stroop, 1935) was implemented as stimulus task. Among other cognitive tests, the Stroop task has been established as the standard for investigating SFE/SIE (Huguet et al., 1999) and is adaptable to human–robot interactions (Spatola et al., 2018).
In the standard Stroop task (used here), participants are exposed to a series of stimuli that are color-name words (i.e., “red,” “blue,” “green,” or “yellow”) displayed in different colors (i.e., the words are written in red, blue, green, or yellow). Participants are instructed to press the colored button that corresponds to the color in which the word is displayed, not to the meaning of the word itself. The task consists of words where meaning and color match (i.e., congruent stimuli) and words where meaning and color differ (i.e., incongruent stimuli). Both congruent and incongruent stimuli tap into a dominant response: Reading the word. However, while this dominant response facilitates color naming for congruent stimuli, it interferes with task performance for incongruent stimuli (cf., MacLeod, 1991). In this way, performance on both dominant (RQ1a) and nondominant (RQ1b) tasks can be assessed.
Additionally, a white dot was displayed in the upper-right corner of the screen at random intervals to evaluate attentional conflict. For these attention-control stimuli, participants were instructed to ignore the word stimulus and to react only to the presence of the dot by pressing the white button. Per run, the dot was added to two congruent and two incongruent stimuli, each randomly selected from the complete stimulus list.
After being guided through tutorials, participants completed three runs with 28 trials each (total of 84 trials). Each run consisted of 12 congruent stimuli (i.e., three iterations of each word with its matching color), 12 incongruent stimuli (i.e., each possible permutation of mismatched word and color), as well as four attention-control stimuli. All stimuli were presented against a light gray background on a 32” Liquid Crystal Display (LCD) monitor with participants being seated at a distance of approximately 4 ft. Response-to-stimulus intervals were set to 1,000 ms (cf., Sharma et al., 2010).
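For illustration, a sketch of how one such 28-trial run could be assembled under the composition just described (the stimulus-presentation software and exact randomization scheme are not reported; the dictionary-based trial encoding is our assumption):

```python
import random

COLORS = ["red", "blue", "green", "yellow"]

def build_run(rng: random.Random) -> list[dict]:
    """Assemble one 28-trial run: 12 congruent, 12 incongruent, and 4
    attention-control trials (white dot overlaid on word stimuli)."""
    congruent = [{"word": c, "color": c, "dot": False}
                 for c in COLORS for _ in range(3)]            # 4 words x 3 iterations
    incongruent = [{"word": w, "color": c, "dot": False}
                   for w in COLORS for c in COLORS if w != c]  # all 12 mismatches
    attention = [dict(t, dot=True)                             # dot on 2 congruent + 2 incongruent
                 for t in rng.sample(congruent, 2) + rng.sample(incongruent, 2)]
    run = congruent + incongruent + attention
    rng.shuffle(run)
    return run

run = build_run(random.Random(7))
assert len(run) == 28 and sum(t["dot"] for t in run) == 4
```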
Both observer conditions were designed to be as similar as possible. For the robot-observer condition, a human-sized android was used (Robothespian with Socibot head, Engineered Arts; see Figure 2), featuring an underlit white body shell, a projected female face (i.e., default “Pris” guise), and an American-English accent (i.e., “Heather” voice); it was introduced as “Ray.” The robot was controlled via a Wizard of Oz protocol by one of four student assistants; each was trained for 4 hr on the study procedure and robot functionalities and completed multiple runs with training participants. For the human-observer condition, a similarly trained confederate acted as the observer. This confederate was one of three young-adult white women, dressed in a white shirt, and introduced under her real name.
All self-report scales were presented in a randomized order on 7-point Likert scales, unless otherwise indicated.
Task performance was evaluated by averaging each participant’s response time (RT) separately for (a) congruent and (b) incongruent stimuli. Following standard data cleaning procedures (see Spatola et al., 2018), RTs that deviated more than three standard deviations from the sample mean (n = 85 out of 8,856 trials), as well as incorrect trials (n = 202), were excluded prior to averaging.
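A minimal sketch of this cleaning-and-averaging step (the column names and the pandas-based layout are our assumptions, not the authors’ pipeline):

```python
import pandas as pd

def clean_and_average(trials: pd.DataFrame) -> pd.DataFrame:
    """trials: one row per trial with (hypothetical) columns 'participant',
    'stimulus_type' ('congruent'/'incongruent'), 'rt' in seconds, 'correct'."""
    correct = trials[trials["correct"]]                     # drop incorrect trials
    mean, sd = correct["rt"].mean(), correct["rt"].std()
    kept = correct[(correct["rt"] - mean).abs() <= 3 * sd]  # drop RT outliers (>3 SD)
    # Average per participant, separately for congruent and incongruent trials.
    return kept.groupby(["participant", "stimulus_type"])["rt"].mean().unstack()
```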
Differences in task anxiety were accounted for using the five-item evaluation apprehension scale (Spencer et al., 1999). Participants indicated the extent to which they had maladaptive thoughts during task completion (e.g., “Someone would look down on me if I didn’t do well.”). Due to a low corrected item-total correlation (r = .30), Item 5 was excluded from the scale (“I feel self-confident.”). The resulting four-item scale demonstrated excellent internal reliability (α = .96).
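This scale analysis and those that follow rely on corrected item-total correlations and Cronbach’s α. A minimal sketch of these standard computations (the article does not report its analysis software; NumPy is our choice):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of scale responses."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def corrected_item_total(items: np.ndarray, item: int) -> float:
    """Correlate one item with the sum of all remaining items."""
    rest = np.delete(items, item, axis=1).sum(axis=1)
    return float(np.corrcoef(items[:, item], rest)[0, 1])
```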
To measure perceived effort-exertion during the task, the six-item NASA-Task Load Index (NASA-TLX; Hart, 2006) was used. Several questions (e.g., “How hard did you have to work to accomplish your level of performance?”) queried the nature of their personal experience during task completion. Due to low corrected item-total correlation (r = −.28), the fourth item (“How successful were you in accomplishing what you were asked to do?”) was not included in the index (α = .67).
Participants’ averaged accuracy (i.e., hit/miss ratio) in completing attention-control items was used as an indicator of attentional conflict, as lower accuracy scores suggest that participants prioritized the primary Stroop task over responding to the attention-control stimuli.
Mind perception was measured via the 20-item version of the multidimensional mental capacity scale (Malle, 2019). Participants were asked to evaluate how much their observer was capable of different mental capacities. The scale consists of three dimensions: Reality-interaction capacity (e.g., “inferring a person’s thinking”; α = .84), affective capacity (e.g., “feeling happy”; α = .98), and social-moral capacity (e.g., “telling right from wrong”; α = .92).
To account for individual differences in participants’ attitude toward robots, general attitudes toward novel technology were assessed through technophilia and technophobia scales (Martínez-Córcoles et al., 2017). In both scales, participants indicate how much they agree or disagree with various statements involving new technologies. The technophilia scale consists of eight items focusing on technological enthusiasm and potential benefits (e.g., “I get excited about new technology”; α = .90) while the technophobia scale has five items regarding discomfort with modern technology (e.g., “I feel uncomfortable when I use new technology”; α = .68).
We also controlled for how much participants liked their respective observer using the five-item likability subscale of the Godspeed questionnaire (Bartneck et al., 2009), specifying evaluations via 7-point semantic differentials (e.g., unfriendly vs. friendly or dislike vs. like; α = .91). Participants also indicated previous experience interacting with social robots and with other humans via two face-valid single-item measures (“How much experience do you have interacting with social robots/other people like your observer?”).
Dichotomous items were used to verify whether the assigned observer was familiar to participants, in another study or elsewhere (n = 5 answered with “yes”), as well as whether they had ever seen the Stroop task before (n = 39 answered with “yes”).
In response to Irfan et al.’s (2018) concerns, we measured how much participants felt observed using a single item (“During the task, how much did you feel like you were observed by someone?”). To control for whether participants took the task seriously, another single item was used (“During the task, how much attention did you pay to what was on screen?”).
All study materials, data, and preregistered hypotheses are available in the online materials for this project: https://osf.io/9xry5 (Koban et al., 2021).
Descriptive information is presented in Table 1; zero-order correlations are included in the online materials (Koban et al., 2021). Each analysis originally controlled for the influence of different covariates. Unless otherwise indicated, these covariates had little impact and were disregarded for parsimony (see online materials for complete analyses; Koban et al., 2021).
Table 1
Descriptive Information and Cronbach’s α Scores for All Relevant Variables

| Variable | α | Full sample | No observer | Human observer | Robot observer |
|---|---|---|---|---|---|
| Feeling observed | | 2.79 (1.70) | 2.23 (1.55) | 3.33 (1.74) | 2.86 (1.68) |
| Task attention | | 6.60 (0.77) | 6.51 (0.94) | 6.58 (0.73) | 6.72 (0.57) |
| RT congruent trials | | 0.85 (0.14) | 0.86 (0.17) | 0.84 (0.11) | 0.86 (0.12) |
| RT incongruent trials | | 0.94 (0.17) | 0.95 (0.17) | 0.95 (0.23) | 0.91 (0.11) |
| Evaluation apprehension | .96 | 3.04 (1.69) | 2.63 (1.69) | 3.38 (1.45) | 3.15 (1.88) |
| Exerted effort | .67 | 3.25 (1.04) | 3.15 (0.89) | 3.42 (1.21) | 3.17 (1.02) |
| RT attentional conflict trials | | 0.90 (0.10) | 0.88 (0.11) | 0.91 (0.09) | 0.91 (0.09) |
| Reality-interaction capacity | .84 | | | 6.14 (0.85) | 4.85 (1.04) |
| Affective capacity | .98 | | | 5.88 (1.35) | 2.34 (1.26) |
| Social-moral capacity | .92 | | | 5.85 (1.06) | 3.60 (1.56) |

Note. Values are M (SD); RT = response time (in seconds). α scores (as reported in the Measures section) apply to multi-item scales only. Mind-perception capacities were rated only by participants in the two observer conditions.
Before addressing our research question and hypotheses, we checked whether participants felt observed by the mere presence of someone. Analysis of variance (ANOVA) demonstrated a significant effect, F(2, 108) = 4.21, p = .017, η2 = .073. Participants felt the most observed in the human condition (M = 3.33, SD = 1.74), followed by the robot condition (M = 2.86, SD = 1.68), and the no-observer condition (M = 2.23, SD = 1.55). Bonferroni-corrected post hoc testing validated a significant difference between the human and the no-observer condition (p = .014, Cohen’s d = 0.67). No difference was found between the human and robot conditions (p = .685, d = 0.28), nor between the robot and no-observer conditions (p = .306, d = 0.39). This result suggests that a dedicated observer only moderately affects participants’ feeling of being observed, and even less so when that observer is a robot. The pairwise difference replicates known findings with human observers, validating that this study’s scenario is capable of producing feelings of being watched that differ from an unobserved baseline. Participants did not differ with regard to perceived attention paid to the task, F(2, 108) = 0.71, p = .492, η2 = .013, with very high scores across all groups (Ms = 6.51–6.72, SDs = 0.57–0.94). Given these results, we considered our observer manipulation partially successful; an observer only weakly to moderately affects participants’ feeling of being observed.
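For transparency, a minimal sketch of this omnibus-plus-post-hoc computation (the article does not report its analysis software; this is a SciPy-based approximation with a manual Bonferroni correction, and the group labels are our assumptions):

```python
from itertools import combinations
from scipy import stats

def observer_check(groups: dict[str, list[float]]) -> None:
    """One-way ANOVA across observer conditions, then Bonferroni-corrected
    pairwise t tests on the same groups."""
    f, p = stats.f_oneway(*groups.values())
    print(f"ANOVA: F = {f:.2f}, p = {p:.3f}")
    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        t, p_raw = stats.ttest_ind(groups[a], groups[b])
        p_adj = min(p_raw * len(pairs), 1.0)  # Bonferroni: multiply by number of tests
        print(f"{a} vs. {b}: t = {t:.2f}, adjusted p = {p_adj:.3f}")
```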
Linear mixed models (using participants as a random factor) were conducted to test whether the observer manipulation (no observer vs. human vs. robot) influences participants’ performance in accordance with SFE and SIE. Descriptive information is reported in Table 1. Compared to the no-observer condition, results showed no significant impact of the robot observer (SFE: β = −.007, 95% CI [−.15, .14], t(110.53) = −0.09, p = .929; SIE: β = −.01, 95% CI [−.16, .13], t(109.09) = −0.16, p = .874) nor of the human observer (SFE: β = −.04, 95% CI [−.18, .11], t(110.52) = −0.51, p = .610; SIE: β = −.04, 95% CI [−.19, .11], t(108.97) = −0.54, p = .590). An exploratory linear regression model using the difference of incongruent and congruent stimuli as criterion resulted in a similar nonsignificant effect (see online materials; Koban et al., 2021).
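A sketch of the mixed-model specification used here and in the analyses below (assuming long-format trial-level data; the column names and the statsmodels implementation are our choices, not necessarily the authors’):

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_observer_model(df: pd.DataFrame):
    """df: trial-level data with (hypothetical) columns 'rt', 'condition',
    and 'participant'; participants enter as the random factor."""
    model = smf.mixedlm(
        # Treatment coding with the no-observer condition as reference
        # mirrors the contrasts reported above.
        "rt ~ C(condition, Treatment(reference='no_observer'))",
        data=df,
        groups=df["participant"],
    )
    return model.fit()
```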
To gain potential insights into the lack of difference, we conducted an exploratory analysis of whether the observer conditions (no observer vs. human vs. robot) differed from one another with respect to each of the three explanatory mechanisms: (a) evaluation apprehension, (b) exerted effort, or (c) attentional conflict. Again, no significant effects were found (see online materials, for details; Koban et al., 2021). Irrespective of the presence of any kind of observer, participants felt similar apprehension toward being evaluated, exerted similar effort, and responded to attention-control items with similar accuracy.
To investigate whether SFE/SIE for social robots can be predicted by theorized psychological mechanisms, we performed—for those in the robot-observer condition only—linear mixed models using evaluation apprehension, exerted effort, and attentional conflict as predictors, either congruent RTs (for SFE) or incongruent RTs (for SIE) as criterion, and participants as a random factor.
For SFE, regression analysis resulted in no significant effects. None among evaluation apprehension, β = −.04, 95% CI [−.26, .17], t(35.96) = −0.38, p = .704, exerted effort, β = .08, 95% CI [−.13, .29], t(35.96) = 0.73, p = .470, or attention conflict, β = .10, 95% CI [−.10, .31], t(35.96) = 0.97, p = .341, predicted participants’ RT in congruent trials. Similarly, no significant predictions arose for incongruent trials among evaluation apprehension, β = −.05, 95% CI [−.29, .19], t(34.78) = −0.39, p = .698, exerted effort, β = .13, 95% CI [−.11, .36], t(34.80) = 1.04, p = .306, or attention conflict, β = .15, 95% CI [−.07, .38], t(34.81) = 1.32, p = .195. Exploratory analysis for the human-observer group as well as linear regression analysis using the difference scores of incongruent and congruent stimuli resulted in similar nonsignificant effects (see online materials; Koban et al., 2021). H1 must be rejected.
To examine whether forms of mind perception positively predict SFE/SIE, linear mixed models were conducted using perceptions of the robot observer’s reality-interaction capacity, affective capacity, and social-moral capacity as predictors for either congruent RT (for SFE) or incongruent RT (for SIE). Again, participants were included as a random factor. For SFE, a significant prediction was found for reality-interaction capacity, β = −.28, 95% CI [−.53, −.02], t(35.97) = −2.10, p = .043. No significant effects were detected for affective capacity, β = .10, 95% CI [−.16, .37], t(36.00) = 0.78, p = .443, or social-moral capacity, β = .13, 95% CI [−.18, .45], t(35.96) = 0.83, p = .411. A significant result also emerged for incongruent trials: Reality-interaction capacity again predicted faster RT, β = −.33, 95% CI [−.61, −.05], t(34.90) = −2.33, p = .026, which is counter to the predicted SIE (i.e., slower RT). In other words, the more participants perceived the robot to be able to perceive its surroundings, think, and communicate, the faster they responded to incongruent trials. Neither affective capacity, β = .23, 95% CI [−.04, .50], t(35.19) = 1.68, p = .102, nor social-moral capacity, β = .12, 95% CI [−.21, .45], t(34.84) = 0.70, p = .487, had a similar predictive value. Notably, both significant effects lost statistical relevance after controlling for covariates. Exploratory linear regression analyses using the difference scores of incongruent and congruent stimuli showed no significant effects (see online materials; Koban et al., 2021). Thus, H2 must be rejected.
Drawing from social facilitation/inhibition theory, the present study investigated whether task performance is influenced by the physical copresence of a social robot. The relevant theory postulates that people tend to perform better in dominant tasks but worse in nondominant tasks when observed by other people; recent research extended this mechanism to artificial observers. This investigation expands the latter work by testing for previously validated SFE/SIE mechanisms (i.e., evaluation apprehension, mere effort, and attentional conflict), as well as people’s mind perception in robots as an ostensible precondition. Acknowledging mixed findings for digital observers (Sterna et al., 2019) and methodological criticism (Irfan et al., 2018), this study followed suggested procedures for SFE/SIE studies to minimize the influence of confounds (i.e., using a well-tested task and a pretested robot, controlling for observer behavior and positioning, isolating the observer’s influence) and, in doing so, addressed why previous findings may have been inconsistent. Our findings, overall, indicated no significant differences in SFE/SIE across observer conditions, no significant differences in SFE/SIE mechanisms, and only a single statistically meaningful effect of mind perception (i.e., improved performance corresponding with perceptions of basic agentic capacities).
We interpret these findings to suggest that extant literature’s inconsistency regarding artificial agent-induced SFE/SIE might stem from a challenge in meeting the conditions for SFE/SIE to emerge. A robot’s copresence caused only a weakly enhanced feeling of being observed—too weak perhaps to trigger the established mechanisms to a sufficient degree. However, none of the SFE/SIE mechanisms were significantly different for the human-observer group either, even though participants expressed moderately increased feelings of being observed in this condition. In addition, no mechanism emerged as a significant predictor of SFE/SIE, irrespective of whether participants were observed by a robot or a human. In short, effects that may have happened with robot observers did not emerge to a statistically meaningful degree, questioning the transferability of SFE/SIE; effects that should have happened with human observers did not turn out significant either. Participants’ perceptions of reality-interaction capacity predicted a better performance in both the dominant and nondominant stimuli, which stands in contrast to what is assumed in relevant theory. Overall, these findings accentuate recent criticism of SFE/SIE.
Flawed preconditions and methodological shortcomings were at the forefront of Irfan et al.’s (2018) critique of the SFE/SIE paradigm. They noted that people may feel observed when participating in a study, irrespective of whether a dedicated observer was present with them. They even found that participants feel less observed with another person in the room than without, which indicates that an imagined and potentially evaluative audience (cf., Litt, 2012) might be more salient than a physically present observer. They questioned the theory’s standing in general, referring to methodological issues and an exaggerated overall narrative that has brushed off inconclusive results (cf., Glaser, 1982).
Our findings second these concerns—both regarding the transferability of SFE/SIE to robots and the validity of its theoretical propositions. The present study’s results could not support Irfan et al.’s (2018) conclusion that participants without an observer may feel similarly (or even more strongly) observed than observed participants (although the effect we found was only moderate, which is itself notable). Instead, results showed that robot observers were perceived as an “in-between” evaluator (corresponding with their commonly assigned ontological status; Kahn & Shen, 2017), resulting in weakly increased feelings of being observed. The present study can still comport with Irfan et al.’s (2018) general concern: It might align with theory that participants did not exhibit significant SFE/SIE with robot observers because the robot failed to induce a strong enough feeling of being observed, but the copresence of another human did increase observation feelings significantly without triggering meaningful SFE/SIE or any of the well-established mechanisms in a sufficient way.
In addition to those experimental outcomes, participants’ variability in evaluation apprehension, exerted effort, and attentional conflict did not significantly predict SFE/SIE in this study. These general noneffects are critical above and beyond the lack of SFE/SIE because they indicate that previously established explanatory processes might not be easily replicable for human audiences, nor transferable when an audience consists of social robots. This unexpected across-the-board lack of meaningful differences could be explained by contextual factors. Participants may not have been motivated enough to care about their observer. However, participants’ extraordinarily high attention scores suggest that, on the contrary, they may have been so engaged in the task that they slipped into a flow state in which they forgot about their surroundings, including their audience (cf., Csikszentmihalyi, 1990). Our methodological rigor may have made it easier for participants to tune out the audience, since we intentionally trained confederates and controllers to be noncommunicative during task completion to avoid atheoretical cues. Integrating flow theory may help unravel why many studies fail to replicate SFE/SIE (see, e.g., studies using digital games as task stimuli; Emmerich & Masuch, 2018; Watts et al., 2021); controlling for flow states may help future inquiries avoid ineffective observer manipulations.
Together, findings suggest that SFE/SIE may be challenging to replicate in a task setting because participants may not feel a sufficiently enhanced sense of being observed, particularly when their observer is a social robot; and even if they feel observed, moderate feelings may not be enough to activate psychological mechanisms that are thought to have an impact. These findings also challenge significant findings from previous human–robot interaction studies, indicating that they may have resulted because of unknown boundary conditions.
Assuming, for the sake of argument, that the general assumptions of SFE/SIE are valid, these effects should depend on the perception of a robot as a socially relevant agent. If people understand robots as mindless objects, SFE/SIE should not emerge (i.e., there is no one there); if they are understood as mindful agents, effects may surface (i.e., there is someone there) via mere presence (as mindful agents may activate arousal), evaluation apprehension (as mindful agents may be evaluative), mere effort (as mindful agents may award effort), or attentional conflict (as mindful agents may be more salient). Accordingly, Spatola et al. (2019, 2020) found that SFE/SIE with social robots were mediated by anthropomorphic attributions (which were more strongly triggered when participants could interact with them prior to the task), suggesting that perceived sociality narrows people’s attentional spotlight favoring the primary task. The present study revealed limited association between mind perception and performance, with a link between perception of basic agentic capacities and RT: Participants who understood the robot as capable of perceiving and interacting with its surroundings performed better in both the dominant and the nondominant task. This effect qualifies Spatola et al.’s (2019, 2020) findings such that perceptions of basic agentic capacities, even though not displayed during interaction that actively involved participants, may be the gateway for improved performances when humans team with robots. However, said improvement in incongruent trials does not align with social facilitation/inhibition theory, which leaves room to speculate whether it may not be easily transferable to robot observers or whether other mechanisms (e.g., arousing challenge/threat perceptions by social robots; Blascovich et al., 1999) may be at work.
The present study is subject to several limitations. First, data collection took place between September and October 2020, during a peak in the COVID-19 pandemic. Logistical demands inherent to the procedure (e.g., distanced recruitment, screening procedure, and safety precautions during the experiment) and participants’ psychological states may have been confounding factors. While we worked to mitigate the latter concerns by explaining the reasons for protocols and using clear face masks for confederates, we cannot rule out their impact. Second, we recruited a convenience sample of undergraduate students that ended up predominantly young, white, and female. Thus, results may not generalize to other populations or to the general population. Third, most measures were self-reports referring back to participants’ experience during task completion. Such measures are vulnerable to response bias. Fourth, participants were invited to a relatively sterile lab room that may have increased the perceived artificiality of the situation, imposing similar feelings of being observed across conditions. This lab room also has a small security camera installed that blends in with a nearby smoke alarm sensor so that participants could not see the actual camera; the stimulus robot also has an undetectable built-in camera. Despite their inconspicuousness, it is possible that participants may have felt observed by the presence of cameras. Fifth, we utilized a human-sized humanoid as the stimulus robot, which may have triggered a sense of threat (although likability was very high, M = 6.10, SD = 0.81), three student assistants as human observers whose perceptions may have differed due to individual idiosyncrasies (although participants felt similarly observed by them, p = .436), and five student assistants as robot controllers who may have delivered Ray’s prompts with varying timings (although participants also felt similarly observed by each of them, p = .733). Finally, in contrast to previous work (e.g., Spatola et al., 2019), we did not include a baseline measurement without a dedicated observer or color-neutral stimulus words in the Stroop task to control for interindividual variation in motoric response initiation, but instead relied on randomization to protect against systematic differences across experimental groups. We nevertheless acknowledge that such procedures would minimize potentially confounding prior differences across groups (which may be present in medium-sized samples despite randomization) and would enhance statistical power, because they allow using within-person difference scores (i.e., Stroop conflict scores) instead of raw RTs as criteria.
Human–robot teams that operate in a shared space may become increasingly common (Demir et al., 2019), necessitating robust empirical understandings of basic social–psychological processes, including nonconscious factors that may influence the effectiveness of interactions. Addressing known methodological issues, the present study investigated whether a copresent robot affects performance in a cognitive-processing task similar to known effects for human observers. Results indicate limitations in applying social facilitation/inhibition dynamics to robots, as participants experienced only weakly enhanced feelings of being observed and, more importantly, did not differ in their performance. In the wider context of human–robot collaborations, these findings suggest that people’s performance may not be influenced substantially by sharing their space with an observant robot. Unexpectedly, however, the same noneffects were found for participants who were observed (and who had moderately enhanced feelings of being observed) by another human, indicating the relevance of additional contextual factors. Given high reported task attention, we conjecture that flow states may have inhibited participants’ awareness of observers during task completion, effectively circumventing known mechanics of SFE/SIE. In unpacking the role of copresent interactions, it may thus be fruitful to consider flow theory dynamics (i.e., attentional states induced by a balance of task difficulty and performer skill) as a relevant process in social performance studies.
https://doi.org/10.1037/tmb0000049.supp