Volume 3, Issue 1, Spring 2022. DOI: 10.1037/tmb0000053
Human-like robots and other systems with artificial intelligence are increasingly capable of recognizing and interpreting the mental processes of their human users. The present research examines how people evaluate these seemingly mind-reading machines based on the well-established distinction of human mind into agency (i.e., thoughts and plans) and experience (i.e., emotions and desires). Theory and research that applied this distinction to human–robot interaction showed that machines with experience were accepted less and were perceived to be eerier than those with agency. Considering that humans are not yet used to having their thoughts read by other entities and might feel uneasy about this notion, we proposed that thought-detecting robots are perceived to be eerier and are generally evaluated more negatively than emotion-detecting robots. Across two pre-registered experiments (N₁ = 335, N₂ = 536) based on text vignettes about different kinds of mind-detecting robots, we find support for our hypothesis. Furthermore, the effect remained independent of the six HEXACO personality dimensions, except for an unexpected interaction with conscientiousness. Implications and directions for future research are discussed.
Keywords: uncanny valley, mind perception, detector robots, personality, human–robot interaction
Disclosures: No conflicts of interest.
Data Availability: Data, materials, and supplemental materials are publicly available under https://doi.org/10.17605/OSF.IO/U52KM.
Open Science Disclosures: The data are available at https://osf.io/u52km/?view_only=b2de1348101b4335a0b95cc95267a2a3
The experimental materials are available at https://osf.io/u52km/?view_only=b2de1348101b4335a0b95cc95267a2a3
The preregistered design and analysis plan is accessible at https://osf.io/u52km/?view_only=b2de1348101b4335a0b95cc95267a2a3
Correspondence concerning this article should be addressed to Andrea Grundke, Psychology of Communication and New Media, Julius-Maximilians-Universität Würzburg, Oswald-Külpe-Weg 82, 97074 Würzburg, Germany. Email: firstname.lastname@example.org
Thoughts are free, who can guess them?
They fly by like nocturnal shadows.
No person can know them, no hunter can shoot them
with powder and lead: Thoughts are free!
First verse of the German folk song
The thoughts are free [Die Gedanken sind frei]
Since antiquity, humans have found relief in knowing that our cognitions cannot be accessed by anyone but ourselves (e.g., Cicero, ca. 52 B.C.E./1977). Due to the constantly advancing development of artificial intelligence, however, this freedom of thoughts (as expressed in the German folk song Die Gedanken sind frei) is in peril. Likewise, artificial intelligence is increasingly used to evaluate human emotions. How do humans respond to these mind-reading technologies?
Human (and non-human) mind can be distinguished into agency (thoughts and plans) and experience (emotions and desires, Gray et al., 2007), a distinction that has recently been applied to human–machine interaction (Appel et al., 2020; Gray & Wegner, 2012; Taylor et al., 2020). The respective studies show that machines with experience are less well-accepted and often perceived to be eerier than those with agency. Yet, it remains unclear how people react to robots who do not express their own mental states but instead detect the mind of the human user. In two pre-registered experiments, we apply the agency–experience distinction to juxtapose robots that can detect thoughts (thought detectors) with those that can detect emotions (emotion detectors).
Contrary to the effects for self-expressing machines, we propose an opposite effect for mind detection: Thought-detecting robots are expected to be eerier than emotion-detecting robots. Additionally, our second experiment applies the HEXACO model of personality (Honesty-Humility, Emotionality, eXtraversion, Agreeableness, Conscientiousness, Openness to experience) in order to examine whether individual differences moderate this effect.
The production and diversification of service robots is on the rise. The COVID-19 pandemic led to an increased demand for cleaning and disinfection robots, food and medication delivery robots, and edutainment and interaction robots (International Federation of Robotics, 2020). At the same time, a multi-wave international study showed that attitudes towards robots have become more negative over the last years (Gnambs & Appel, 2019). Faced with observations such as these, people may turn to scientific evidence to look for explanations.
A popular framework underlying negative responses to robots is the uncanny valley model (Mori, 1970; Mori et al., 2012; for reviews see Kätsyri et al., 2015; Wang et al., 2015; Złotowski et al., 2015). It states that responses to human-like entities such as robots or digital animations get more positive with increasing human likeness until a steep drop is observed for highly (but not perfectly) human-like entities. Whereas traditional uncanny valley research manipulated the human likeness of entities such as robots by changing their visual appearance (MacDorman & Ishiguro, 2006; Mathur & Reichling, 2009, 2016; Seyama & Nagayama, 2007), more recent research focused on functional features of the respective technologies, as well as user variables and context factors (e.g., Broadbent, 2017; Lischetzke et al., 2017; MacDorman & Entezari, 2015; Mara & Appel, 2015; Piwek et al., 2014; Rosenthal-von der Pütten & Weiss, 2015; Tu et al., 2020). Also, adhering to a psychological viewpoint rather than merely focusing on visuals, the ascribed mind of robots could be a key to understanding negative responses to robots (e.g., Gray et al., 2007).
Theory and research suggest that negative responses to human-like robots may depend strongly on the perception of a human-like mind in a machine (Gray & Wegner, 2012; Hegel et al., 2008; Stein & Ohler, 2017; Wegner & Gray, 2016). Indeed, at the age of nine, children already classify robots as more or less scary depending on whether they attribute a human-like mind to them (Brink et al., 2019).
As an underlying framework for this line of research, the mind perception dichotomy by Gray et al. (2007) has gained a lot of attention in recent years. In their initial research, Gray and colleagues asked participants to describe the extent to which different types of people, animals, God, and a robot possessed specific mental capacities. Based on these data, a principal component factor analysis revealed that mental capacities might be categorized into experience (i.e., the ability to feel emotions, have a personality, and a consciousness) and agency (i.e., self-control, morality, memory, planning, communication, and thought). According to further research, it is especially experience that seems to be a fundamental part of how people conceptualize the human mind and therefore humanness in general (Gray et al., 2011; Haslam et al., 2005; Knobe & Prinz, 2008).
Considering this paradigm, as well as some alternative theoretical approaches (e.g., Malle, 2019; Weisman et al., 2017), the notion of mind perception has become increasingly relevant in the field of human–robot interaction. For instance, Gray and Wegner (2012) combined the uncanny valley hypothesis with the mind perception dichotomy and showed that machines equipped with experience were rated as much more discomforting and uncannier than those demonstrating agency. In a similar vein, it has been shown that participants assigned agency characteristics rather than experience characteristics to robots (Brink et al., 2019; Gray et al., 2007; Wegner & Gray, 2016). Further building upon the work by Gray and Wegner (2012), Appel et al. (2020) presented evidence that a robot with experience was perceived to be eerier than a robot with agency, which in turn was perceived to be eerier than a robot that merely served as a tool. Indicating notable generalizability, this finding was conceptually replicated for smart speakers in a recent study (Taylor et al., 2020).
The mind perception literature has profoundly advanced the scholarly understanding of how people evaluate autonomous technology. However, we note that the scholarly interest in this regard has mainly revolved around the perception of (artificial) minds in machines—yet hardly looked at the other direction, that is, user evaluations of machines analyzing the human mind. Arguably, while this idea might have been dismissed as technically impossible a couple of decades ago, recent technological advancements have turned mind detection by robots into an imminent reality.
By now, advanced software that allows social robots and other technical devices to recognize the emotions of human users can reach impressive levels of accuracy (e.g., Affectiva, 2018; Alonso-Martín et al., 2013; Chen et al., 2020; Microsoft Azure, 2018), leading to an increased scientific interest in digital forms of emotional recognition and mind perception (Banks, 2019; Bianco & Ognibene, 2019; Dissing & Bolander, 2020; Gray & Wegner, 2012; Kang & Sundar, 2019; Stein et al., 2020). Along these lines, it has been suggested that machines might even become able to detect not only human emotions but also human thoughts in the future—a feat that would reach clearly beyond the capabilities of their human creators. In fact, current-day technology already heralds the rise of these possibilities, as machines have been able to deduce internal thought from eye movements (Huang et al., 2019), create their own theory of mind for humans via computational models (Breazeal et al., 2009; Brooks & Szafir, 2019; Dissing & Bolander, 2020), or use language processing to identify political views (Colleoni et al., 2014), and suicidal intentions (Walsh et al., 2018).
At the same time, it remains unclear how people react to these emerging technologies. Human behavior, appearance, and skills are often used as a reference point when designing modern-day technology (e.g., Eyssel et al., 2012; Huang & Mutlu, 2013; Niculescu et al., 2013; Salem et al., 2011), but users do not always appreciate impressions of humanness in their machines. Indeed, several studies showed that once new technologies threaten human uniqueness, they are typically met with strong aversion (e.g., Müller et al., 2020; Złotowski et al., 2017). Even more so, social cognitive abilities such as mind-reading might play a particular role in this regard (Stein & Ohler, 2017), as our ability to infer and analyze the emotions of those around us has long served as a distinct advantage to our species (Darwin, 2009; Nesse, 1990). Considering this fear of losing our distinctiveness to machines, it appears likely that people might be wary of robots that detect others’ emotions—or even surpass this ability with the possibility to “read” cognitions as well.
To this day, however, only a few psychological studies have actually examined user responses to mind-detecting technology in an empirical manner. Kang and Sundar (2019) found that a robot was evaluated more negatively if it correctly interpreted humans’ sarcasm than if it failed to recognize this aspect of human behavior. Similarly, research by Stein et al. (2019) suggested that an artificial intelligence capable of analyzing participants’ personality traits might be seen as threatening. Yet, previous efforts such as these were clearly limited by the fact that they either focused only on emotional aspects of mind or kept the scope of the detection abilities ambiguous (e.g., Kang & Sundar, 2019; Stein & Ohler, 2017; Stein et al., 2019). Therefore, a structured exploration of user reactions to distinct forms of mind detection by machines is needed to close an important research gap in the field of human–computer interaction.
We assumed that—unlike the previously documented responses to robotic agency versus experience (e.g., Appel et al., 2020; Gray & Wegner, 2012)—user evaluations might turn out quite differently for the detection of human agency versus experience by social robots. More specifically, we expected a reversed effect: A robot’s ability to analyze human experience should be perceived as less threatening and less uncanny than a robot’s ability to analyze users’ agency.
In their daily life, humans are generally quite used to other communicators detecting their emotions (Darwin, 2009; Nesse, 1990), whereas precise thought detection is an ability largely unknown from the realm of human-to-human interaction. In turn, people are used to controlling their emotional displays and they have learned to deal with the unintentional communication of emotions (Tamir, 2016), yet they are much less experienced in controlling their thoughts or in coping with the unintentional communication of thoughts and plans. To illustrate this argument, one may consider the embarrassment that people tend to experience when human communication partners detect and interpret a Freudian slip, revealing supposedly true yet hidden thoughts and plans. Based on the large number of studies that have emphasized perceived control as a fundamental prerequisite of positive human–machine interactions (Kang, 2009; Roubroeks et al., 2010; Stein et al., 2019; Sundar, 2020; Zafari & Koeszegi, 2020; Złotowski et al., 2017), we therefore expected a clear advantage of emotion-detecting over thought-detecting machines in participants’ evaluations.
Apart from our main outcome variable eeriness (Gray & Wegner, 2012), which remains one of the most well-established ways of operationalizing robot acceptance (Diel et al., 2022), we used two additional dependent variables to get a more general overview of participants’ assessment of this type of robotic technology. First, we focused on concerns about human identity, which emerged as a meaningful predictor of technology-related experience in previous research (Stein et al., 2019). More specifically, this variable assesses the extent to which users consider a machine as a symbolic threat to the distinctiveness of the human species (i.e., their uniquely human identity)—an impression that has, in turn, been linked to the unwillingness to further interact with technology (e.g., Kang & Sundar, 2019; Stein et al., 2019; Złotowski et al., 2017). As we presented emotion detectors (which have the same abilities as humans) and thought detectors, whose capabilities even exceed those of humans, we assumed that traditional human–machine boundaries could become blurred, resulting in a meaningful effect expressed by this variable. Second, the general evaluation of the new technology was assessed (Appel et al., 2019), in order to observe reactions towards the presented robots in a more generalizable way.
To implement the desired manipulation of robot characteristics, we used vignette texts, as previous work in the field of mind perception (Appel et al., 2020; Gray & Wegner, 2012; Swiderska & Küster, 2020; Ward et al., 2013) showed that this method can be an internally valid and efficient means to convey specific technological possibilities. In our first experiment, descriptions of an innovative robot able to analyze humans’ agency or to analyze humans’ experience were presented. As a control group, we presented a description of a robot who merely served as a tool without any sophisticated analysis abilities. Based on the theory and research outlined above, the following hypotheses guided Experiment 1:
H1: The thought detector robot will evoke higher eeriness than the emotion detector robot (H1a), whereas the robot without analysis abilities will evoke the least eeriness (H1b).
H2: The thought detector robot will evoke stronger concerns about human identity than the emotion detector robot (H2a), whereas the robot without analysis abilities will evoke the least concerns (H2b).
H3: The thought detector robot will yield a more negative general evaluation than the emotion detector robot (H3a), whereas the robot without analysis abilities will yield the most positive general evaluation (H3b).
In addition to providing a replication of the effects tested in Experiment 1 (by using the same vignette texts), the second experiment examined the influence of users’ individual differences on the acceptance of detector robots using the well-established HEXACO model of personality (Ashton et al., 2004). The hypotheses addressing the role of the users’ personality will be introduced after the discussion of Experiment 1. Both experiments were pre-registered, with changes in the hypothesis numbering and exclusion criteria being documented in the online supplement. The pre-registrations, data, codes, and an online supplement can be found at https://osf.io/u52km.
A power analysis with G*Power (Faul et al., 2007) recommended at least 200 participants assuming a small to medium effect size of f = .20 (with α-error probability = .05, and power = .80) for the two-group fixed effect expected in Hypothesis 1a. Another 100 participants constituted the control condition, resulting in 300 participants. We invited 450 U.S.-American residents from the MTurk online participant pool (hit approval rate > 97%, hits > 1,000), in order to have a buffer if careless responding occurred. Of the 443 completions, 44 participants did not have sufficient English skills, as indicated by two control questions, and were therefore not included in our statistical analyses (Kennedy et al., 2020). One additional participant failed an included attention check item and another three participants had large (> ±3 years) deviations when asked twice about their age. Moreover, 21 participants were excluded because their participation time was lower than 100 s (n = 4) or higher than 920 s (n = 17). Another 39 participants interchanged the thought detector robot and the emotion detector robot in the manipulation check and were excluded (see online supplement for additional information). As such, the final sample consisted of 335 participants (154 female, 176 male, 5 non-binary or no answer) with an average age of 39.33 years (SD = 12.00, ranging from 21 to 75 years). Exploratory analyses revealed that age and gender did not moderate the influence of the robot manipulation on the dependent variables (see additional analyses on gender and age for both experiments in the online supplement).
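As a rough plausibility check on the sample size target for the two-group comparison, the calculation can be approximated with a normal-theory formula. This is only a sketch and not the G*Power procedure, which works with the noncentral F distribution. It assumes two equally sized groups, so that Cohen's f = .20 corresponds to d = 2f = .40; the function name is hypothetical and chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(f: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-group comparison.

    Assumption: two equally sized groups, so Cohen's d = 2 * f.
    """
    d = 2 * f
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


n_per_group = approx_n_per_group(0.20)
total_n = 2 * n_per_group
print(n_per_group, total_n)
```

The approximation lands just under 200 participants in total. A slightly lower value than the exact result is expected, because the normal approximation ignores the additional uncertainty that comes from estimating the variance.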
We asked participants to give informed consent before starting the online experiment. Following their random assignment to one of the three conditions, participants were presented with the respective vignette text matching their group. Subsequently, we asked them to fill in the chosen user evaluation questionnaires. Sociodemographic information and questions to identify careless responding and low English proficiency followed (Kennedy et al., 2020; Meade & Craig, 2012; see online supplement for details), before participants were debriefed about the background of the experiment. Participants took on average 290.61 s (SD = 156.00) to complete the questionnaire, with a mean time of 42.67 s (SD = 49.78) spent on the page that presented the experimental stimulus. We complied with American Psychological Association (APA) ethical standards in the treatment of our sample.
Participants read a short text about an innovative robot named Ellix. Based on our between-subject design, three versions of this vignette text were prepared. In the first condition, Ellix was introduced as a thought detector robot. In the second condition, Ellix was supposedly able to detect humans’ emotions. In the third condition, the robot did not have any advanced analysis abilities, merely serving as a daily life tool. The descriptions were based on extracts of the mind perception classification by Gray et al. (2007); however, we made sure to highlight that the robot was not able to feel or think itself, as was the focus of previous work (Appel et al., 2020; Gray & Wegner, 2012), but to recognize thinking or feeling on the human users’ side. The stimulus texts were as follows.
Thought detector condition: “Ellix, a robot that can read your thoughts”
“Ellix is a social robot, i.e., a robot that is meant to interact with humans. Ellix is equipped with over 100 sensors and an advanced artificial intelligence system to make sense of the data it receives from its surroundings. It observes the human iris, facial expressions, voice patterns, and micro-movements of the head. It further studies the posture and movement of all other parts of the body. With decades worth of psychological insight stored in its algorithms, as well as machine learning procedures that make the system smarter with each use, Ellix is able to analyze human interaction partners. More specifically, Ellix possesses the constantly advancing ability to detect what humans think, for example which actions they wish to execute and whether or not they know the answer to a question.”
Emotion detector condition: “Ellix, a robot that can read your emotions”
“Ellix is a social robot, i.e., a robot that is meant to interact with humans. Ellix is equipped with over 100 sensors and an advanced artificial intelligence system to make sense of the data it receives from its surroundings. It observes the human iris, facial expressions, voice patterns, and micro-movements of the head. It further studies the posture and movement of all other parts of the body. With decades worth of psychological insight stored in its algorithms, as well as machine learning procedures that make the system smarter with each use, Ellix is able to analyze human interaction partners. More specifically, Ellix possesses the constantly advancing ability to detect what humans feel, for example which feelings they wish to act upon and whether or not they feel anxious when they answer a question.”
Control condition: “Ellix, a robot with 100 sensors”
“Ellix is a social robot, i.e., a robot that is meant to interact with humans. Ellix is equipped with over 100 sensors and an advanced artificial intelligence system to make sense of the data it receives from its surroundings. It observes the human iris, facial expressions, voice patterns, and micro-movements of the head. It further studies the posture and movement of all other parts of the body. By these means, the system is equipped with the most recent technology to be useful as a daily-life tool.”
The first dependent variable asked about users’ feelings of eeriness in response to the robot and was measured with the help of three items (“uneasy,” “unnerved,” “creeped out”) based on previous research (Gray & Wegner, 2012). A 7-point scale ranging from 1 (not at all) to 7 (extremely) was provided (α = .90, M = 3.61, SD = 1.83).
The second dependent variable, concerns about human identity, was a composite of the repulsion scale (Kamide et al., 2012, two items) and three items of the concerns about human identity scale by Stein et al. (2019). These five items (e.g., “I think that humans will be dominated by this robot before long”) were presented on a 7-point scale ranging from 1 (strongly disagree) to 7 (strongly agree), α = .91, M = 2.93 (SD = 1.59).
The third dependent variable consisted of three bipolar items (“hate it—love it”; “negative—positive”; “repulsive—attractive,” Appel et al., 2019), which were presented on a 7-point scale ranging from –3 to +3, α = .97, M = 0.43 (SD = 1.67).
We asked participants to select the robot’s ability that was introduced in the text describing the robot Ellix. Participants had to choose one of three options reflecting the description of the robot (see online supplement for details).
All p-values in this manuscript are based on two-tailed testing. Omnibus tests for the effects of the experimental manipulation on the three outcome variables were conducted. Pillai’s Trace showed that the general linear model combining all three dependent variables did not reach statistical significance, V = 0.03, F(6, 662) = 1.89, p = .081, ηp² = .02. On closer inspection, between-subject tests showed a significant group difference for the dependent variable eeriness, F(2, 332) = 3.60, p = .028, ηp² = .021. Concerns about human identity, F(2, 332) = 1.27, p = .282, ηp² = .008, and participants’ general evaluation of the robots, F(2, 332) = 2.56, p = .079, ηp² = .015, on the other hand, appeared to be unaffected by the treatment (see Table 1).
Table 1. Descriptive Results of Experiment 1
Note. Sample sizes: Thought Detector: n = 101, Emotion Detector: n = 105, Tool Robot: n = 129.
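The partial eta squared values reported for the univariate omnibus tests can be recovered from the F statistics and their degrees of freedom. The following sketch uses the standard conversion, partial eta squared = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical, and the input values are the ones reported in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for Experiment 1.
reported = {
    "eeriness": (3.60, 2, 332),
    "concerns about human identity": (1.27, 2, 332),
    "general evaluation": (2.56, 2, 332),
}

for outcome, (f_value, df1, df2) in reported.items():
    print(f"{outcome}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

Rounded to three decimals, the results match the reported effect sizes of .021, .008, and .015.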
To test our specific hypotheses, planned contrasts were performed. As expected in Hypothesis 1a, the thought detector robot evoked higher eeriness than the emotion detector robot, t(332) = –2.53, p = .012, d = 0.35. The eeriness scores in response to the robot without analysis abilities (tool robot) were lower than the eeriness scores in response to the thought detector, t(332) = 2.10, p = .036, d = 0.28, but they did not differ significantly from those for the emotion detector robot, t(332) = –0.56, p = .576, d = 0.07. Thus, the findings provide mixed support for Hypothesis 1b. An analysis contrasting the thought detector with both other conditions, t(332) = –2.65, p = .008, d = 0.31, underscores this pattern of results, indicating that the thought detector robot was perceived to be particularly eerie whereas the difference between the emotion detector robot and the control condition remained negligible.
As indicated by the omnibus analysis of variance (ANOVA), concerns about human identity were not affected by the experimental manipulation. The largest difference between the groups—which emerged between thought detector and emotion detector robot—did not reach statistical significance, t(332) = −1.44, p = .150, d = 0.20. Thus, no support was found for Hypotheses 2a and 2b.
Similarly, we note that the general evaluation of the thought detector robot did not differ significantly from that of the emotion detector robot, t(332) = 1.66, p = .097, d = 0.23 (Hypothesis 3a). While the robot without analysis abilities was evaluated more positively than the thought detector robot, t(332) = –2.19, p = .030, d = 0.29, its evaluation did not differ significantly from that of the emotion detector robot, t(332) = –0.45, p = .657, d = 0.06. As such, our results offer mixed support for Hypothesis 3b. When contrasting the general evaluation of the thought detector with both other conditions, a significant effect emerged, t(332) = 2.19, p = .029, d = 0.26.
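The relation between the reported contrast t values and the effect sizes d can be illustrated with a short calculation. The text does not state how d was computed; the sketch below assumes that each pairwise d is the mean difference standardized by the pooled error term of the three-group model, in which case |d| = |t| × sqrt(1/n_a + 1/n_b). The function name is hypothetical, and the group sizes are taken from the note to Table 1.

```python
from math import sqrt


def d_from_contrast_t(t_value: float, n_a: int, n_b: int) -> float:
    """Approximate |d| for a pairwise contrast from its t value and the two group sizes.

    Assumption: d is standardized by the pooled error term used for the contrast.
    """
    return abs(t_value) * sqrt(1 / n_a + 1 / n_b)


# Group sizes from the note to Table 1.
n = {"thought": 101, "emotion": 105, "tool": 129}

# Eeriness contrasts as reported in the text.
print(round(d_from_contrast_t(-2.53, n["thought"], n["emotion"]), 2))
print(round(d_from_contrast_t(2.10, n["thought"], n["tool"]), 2))
print(round(d_from_contrast_t(-0.56, n["emotion"], n["tool"]), 2))
```

Under this assumption, the three eeriness contrasts yield 0.35, 0.28, and 0.07, in line with the reported values.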
The results of this experiment show that a thought detector robot evokes less favorable responses than a robot that can detect human emotions or serves as a simple tool, particularly in terms of higher eeriness. Eeriness has been described as a reaction to something that seems unfamiliar, an entity that eludes the world we know and feel comfortable with (e.g., Jentsch, 1906/1997; Mori, 1970). As humans are not yet used to the notion of having their thoughts and plans read, this detection ability might indeed push a machine right into the uncanny valley. In contrast, an emotion-detecting robot was perceived to be as harmless as a simple tool in our study; participants felt mostly at ease with this hypothetical machine. In our interpretation, this may be explained by people’s familiarity with the respective recognition processes—as well as participants’ confidence that emotional displays can be regulated and coped with and, thus, remain fully under their control.
In a critical reflection on our study, we note that the manipulation check—despite being successful—indicated that several members of the control group had experienced difficulties identifying their condition. Furthermore, more than three dozen participants interchanged the description of the thought detector robot with the description of the emotion detector robot. As a takeaway from these observations, we adapted the materials for our follow-up research by highlighting the important parts of the descriptions in a bold font (see online supplement). Since the evaluation of the emotion detector robot had not differed significantly from the tool robot, we further omitted the tool condition in our second study. Moreover, we advanced the current project by focusing on interindividual differences as an important influence on users’ reactions to mind-reading machines.
The first aim of Experiment 2 was to replicate our main result of Experiment 1: We expected that a thought detector robot would again be perceived to be eerier than an emotion detector robot. Additionally, we decided to focus on the potential influence of dispositional factors regarding user responses to mind-reading robots. Previous work showed that stable individual differences can explain eeriness as a response to humanoid robots (e.g., Lischetzke et al., 2017; MacDorman & Entezari, 2015; Rosenthal-von der Pütten & Weiss, 2015). Therefore, we developed several hypotheses based on the HEXACO model of personality—one of the most often used models of basic personality structure (Moshagen et al., 2019), which consists of the factors honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness to experience.
Extraverted people feel positive about themselves, enjoy leading groups and social interactions, and they experience positive feelings of enthusiasm and energy (Lee & Ashton, 2009). Prior research showed that high extraversion predicted positive responses to robots (Esterwood & Robert, 2020; Mou et al., 2020; Santamaria & Nathan-Roberts, 2017). Given these results, we assumed that extraversion predicted more positive responses to detector robots as well. No differences between thought detector and emotion detector robots were formulated. H4: Being extraverted is associated with weaker feelings of eeriness evoked by mind-detecting robots.
People who are open to experience take an interest in unusual ideas, become absorbed in the beauty of art and nature, and are interested in various domains of knowledge (Lee & Ashton, 2009). Openness was a predictor for the acceptance of new technologies in general (Korukonda, 2007; Nov & Ye, 2008), and some research showed that this trait predicted positive responses to robots (Conti et al., 2017; Morsünbül, 2019; Rossi et al., 2018, 2020, but see Müller & Richert, 2018). We therefore hypothesize that openness to experience predicts more positive responses to detection robots. No differences between thought detector and emotion detector robots were formulated. H5: Being open to experience is associated with weaker feelings of eeriness evoked by mind-detecting robots.
Emotionality is described by the extent to which people experience fear of physical danger, experience anxiety in potentially stressful situations, need emotional support from others and feel empathy for others (Lee & Ashton, 2009). Some research in the context of social robotics has dealt with the conceptually related factor of neuroticism. Neuroticism correlated with a more negative attitude towards a robot (Müller & Richert, 2018). These findings suggest that emotionality would predict higher aversion against supposedly mind-reading robots. No differences between thought detector and emotion detector robots were formulated. H6: Being emotional is associated with stronger feelings of eeriness evoked by mind-detecting robots.
People scoring high on agreeableness tend to forgive wrongs that they suffered, are able to control their temper, and are willing to compromise and cooperate with others (Lee & Ashton, 2009). Agreeableness was a predictor of trust in an autonomous security robot (Lyons et al., 2020) and was associated with higher trust in machines in general (Chien et al., 2016). Moreover, a higher score on agreeableness correlated with keeping a lower interpersonal distance to robots (Takayama & Pantofaru, 2009). Based on these results, a negative relationship with eeriness was expected for both detector robots. No differences between thought detector and emotion detector robots were formulated. H7: Being agreeable is associated with weaker feelings of eeriness evoked by mind-detecting robots.
Conscientious persons organize their surroundings, are disciplined, and strive for perfection in their tasks (Lee & Ashton, 2009). No correlation between conscientiousness and the attitude towards robots was found in previous research (Müller & Richert, 2018). However, more conscientious people rated robot motion more negatively than less conscientious persons (Bodala et al., 2020) and preferred a text interface compared to a virtual character (Looije et al., 2010). Given these few and mixed findings, we formulated no formal hypothesis and also no assumptions regarding differences between thought detector and emotion detector robots.
The dimension Honesty–Humility is pronounced for people who avoid manipulating others for personal gain, who do not enjoy breaking rules, and who are uninterested in luxuries (Lee & Ashton, 2009). Special focus was put on the moderating role of the trait honesty–humility in our study. We assumed that people scoring high in the honesty–humility dimension would be less opposed to thought detection, as their overt behavior tends to be in line with their thoughts and plans. The latter is shown by negative correlations between honesty–humility and cheating behavior (Hilbig & Zettler, 2015; Kleinlogel et al., 2018; Moshagen et al., 2018; Pfattheicher et al., 2019). In human–robot interaction, cheating was negatively correlated with honesty–humility when a robot gave instructions (Petisca et al., 2019). Based on this line of argumentation, an interaction hypothesis was put forward. H8: Scoring low in the honesty–humility dimension increases the difference in eeriness evoked by the thought detector robot and the emotion detector robot.
An a priori power analysis with G*Power and considerations regarding the power of moderation effects (Giner-Sorolla, 2018; Simonsohn, 2014) yielded a target sample size of 500 participants. We invited 600 people from the MTurk participant pool (U.S. residence, HIT approval rate > 98%, HITs > 1,000) to participate in our online experiment to have a buffer in case careless responding occurred. Of the 602 completions, 20 participants did not have sufficient English skills and were therefore not included in the analyses (Kennedy et al., 2020). Five additional participants failed at least one attention check item and another eight participants gave answers that deviated by more than 3 years when asked twice about their age. Moreover, 16 participants were excluded because their participation time was lower than 200 s (n = 10) or higher than 2,800 s (n = 6). Seventeen participants confused the thought detector robot with the emotion detector robot, failing the manipulation check. The remaining sample consisted of 536 participants (238 female, 291 male, and 7 non-binary or no answer) with an average age of 40.35 years (SD = 11.96, ranging from 19 to 79 years). Exploratory analyses revealed that age and gender did not moderate the influence of the robot manipulation on eeriness (see online supplement).
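The participant flow described above can be retraced with simple arithmetic. The following sketch uses only the counts reported in the text and assumes that the exclusion criteria were applied to non-overlapping groups of participants, which the wording ("additional", "another") suggests but does not state outright.

```python
# Participant flow for Experiment 2, using only counts reported in the text.
completions = 602

# ASSUMPTION: each participant is counted under exactly one exclusion reason.
exclusions = {
    "insufficient English skills": 20,
    "failed at least one attention check": 5,
    "inconsistent age answers (more than 3 years apart)": 8,
    "participation time below 200 s": 10,
    "participation time above 2,800 s": 6,
    "failed manipulation check": 17,
}

total_excluded = sum(exclusions.values())
final_n = completions - total_excluded
print(f"excluded: {total_excluded}, final sample: {final_n}")

# The reported gender breakdown should add up to the final sample size.
gender_counts = {"female": 238, "male": 291, "non-binary or no answer": 7}
assert sum(gender_counts.values()) == final_n
```

Under this assumption, the exclusions sum to the difference between the 602 completions and the reported final sample.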
Again, we asked participants to give informed consent before starting the online experiment. Questions that allow conclusions to be drawn about data quality were included in a similar manner as in the first experiment (see online supplement). Participants were randomly assigned to read a text about one of two robots: a thought detector robot or an emotion detector robot. The same stimuli as in Experiment 1 were used, with one slight variation: We highlighted the manipulated parts of the descriptions in bold font (see online supplement). As an improved manipulation check, participants had to select the abilities of the robot about which they had been informed immediately after reading the robot descriptions. Subsequently, the participants filled in the eeriness and HEXACO measures, followed by a measure of negative attitudes towards robots (Nomura et al., 2006), which was used in an exploratory analysis (see online supplement). The survey ended with sociodemographic questions, an opportunity to leave comments, and a debriefing. It took participants an average of 662.09 s (SD = 1063.97) to complete the questionnaire, including a mean duration of 65.24 s (SD = 106.44) spent on the page that presented the experimental stimulus. Again, we complied with APA ethical standards in the treatment of our sample.
Eeriness was measured with the three items used in Experiment 1, resulting in a mean of M = 3.72 (SD = 1.93), α = .91.
We used the HEXACO-60 questionnaire (Ashton & Lee, 2009), consisting of 60 items. Each dimension was measured through 10 items on a 5-point scale ranging from 1 (strongly disagree) to 5 (strongly agree). All Cronbach’s αs reached values of .72 or above. For detailed descriptive statistics see Supplement S6.
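For readers unfamiliar with the reliability coefficient reported here, the following is a minimal sketch of how Cronbach's α is computed from item-level responses. The function name `cronbach_alpha` and the toy responses are hypothetical and serve illustration only; they are not the study's data or analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of items in the scale
    items = list(zip(*rows))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# HYPOTHETICAL responses of five people to four items on a 1-5 scale.
toy = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

In the study itself, each HEXACO dimension comprises 10 items, so the same formula would be applied with k = 10 per dimension.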
In support of Hypothesis 1a and replicating the results of Experiment 1, the thought detector robot (M = 4.08, SD = 1.87) was perceived to be significantly eerier than the emotion detector robot (M = 3.38, SD = 1.92), t(534) = 4.23, p < .001, d = 0.37 (see Figure 1 for the eeriness results in both experiments).
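The reported effect size can be approximately retraced from the summary statistics. The sketch below assumes equal group sizes, because the per-condition n is not given in this section; with unequal groups the pooled standard deviation would be weighted slightly differently.

```python
import math

# Summary statistics reported in the text (eeriness ratings).
m_thought, sd_thought = 4.08, 1.87
m_emotion, sd_emotion = 3.38, 1.92

# ASSUMPTION: equal group sizes, so the pooled SD is the root mean square
# of the two condition SDs.
sd_pooled = math.sqrt((sd_thought**2 + sd_emotion**2) / 2)

# Cohen's d: standardized mean difference between the two conditions.
cohens_d = (m_thought - m_emotion) / sd_pooled
print(round(cohens_d, 2))
```

Rounded to two decimals, the result agrees with the reported d = 0.37.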
The main effects and interactions of robot condition and HEXACO dimensions were analyzed by a hierarchical two-step regression. The results of the regression model are depicted in Table 2.
Table 2. Results of the Hierarchical Regression Analysis
[Table body not recoverable. Surviving column header: 95% CI for B. Surviving row labels: Openness to experience; Extraversion × Condition; Openness to Experience × Condition; Emotionality × Condition; Agreeableness × Condition; Conscientiousness × Condition; Honesty–Humility × Condition.]
Note. All continuous predictors were z-standardized; N = 536; CI = Confidence Interval; LL = lower limit; UL = upper limit.
In the first step of the hierarchical regression, all six HEXACO traits and the experimental factor were entered. In addition to the main effect of the experimental factor, a significant effect was found for agreeableness, t(530) = −3.74, p < .001. As expected in Hypothesis 7, being agreeable was associated with a lower level of eeriness evoked by the detector robots. None of the assumed remaining HEXACO effects reached statistical significance, so Hypotheses 4, 5, and 6 had to be rejected.
The second regression step, which also included interaction terms between the HEXACO dimensions and the assigned condition, revealed no interaction effect for honesty–humility, which led to a rejection of Hypothesis 8. However, unexpectedly, we observed a significant interaction between participants’ conscientiousness and the robot condition, B = 0.48, SE = 0.19, p = .014, ΔR² = .01 (see Figure 2), which was further examined using the SPSS macro PROCESS (Hayes, 2012). Follow-up analyses (Aiken & West, 1991) revealed that participants who were low in conscientiousness (−1 SD) perceived the thought detector robot to be significantly eerier than the emotion detector, B = −1.11, SE = 0.25, t(524) = −4.45, p < .001, 95% CI [−1.61, −0.62]. In contrast, the detector condition had no impact on participants who were high in conscientiousness (+1 SD), B = −0.16, SE = 0.25, t(524) = −0.64, p = .519, 95% CI [−0.66, 0.33]. According to the Johnson–Neyman technique, the manipulation of detecting abilities had a significant effect on participants’ perceived eeriness for z-standardized conscientiousness values ≤ 0.54. About 69.59% of our participants fell into this region of significance.
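The simple slopes reported above follow from the general form of a moderated regression, in which the effect of the condition depends linearly on the z-standardized moderator. The sketch below back-calculates the two coefficients from the reported simple slopes rather than taking them from Table 2, and the direction of the condition coding is not recoverable from this section, so the signs should be read relative to the reported values only.

```python
# Simple slopes reported in the text: effect of robot condition on eeriness.
slope_low = -1.11    # conscientiousness at -1 SD
slope_high = -0.16   # conscientiousness at +1 SD

# With a z-standardized moderator, the conditional effect of condition is
#   effect(z) = b_condition + b_interaction * z.
# ASSUMPTION: both coefficients are back-calculated from the two reported
# slopes; they are not read from the regression table.
b_interaction = (slope_high - slope_low) / 2
b_condition = (slope_high + slope_low) / 2


def conditional_effect(z):
    """Estimated effect of condition at moderator value z (in SD units)."""
    return b_condition + b_interaction * z


print(round(b_interaction, 3))             # compare with the reported B = 0.48
print(round(conditional_effect(0.54), 2))  # effect at the Johnson-Neyman bound

# Number of participants implied by the reported share in the significant region.
n_total = 536
n_in_region = round(0.6959 * n_total)
print(n_in_region)
```

The back-calculated interaction coefficient matches the reported value within rounding. Locating the Johnson–Neyman boundary itself additionally requires the coefficient covariance matrix, which is not reported here, so the sketch only evaluates the conditional effect at the reported boundary.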
Corroborating our results from Experiment 1, the thought detector robot was perceived as significantly eerier than the emotion detector robot. Moreover, a significant effect of agreeableness was found: Higher levels in this basic personality dimension were associated with less eeriness ascribed to the detector robots, matching the way this trait had affected user responses in prior human–robot studies (e.g., Chien et al., 2016; Lyons et al., 2020; Takayama & Pantofaru, 2009). As people high in agreeableness typically react in a tolerant and kind-mannered way to outside influences, it comes as little surprise that they also responded more positively to the presented detection robots. At the same time, we were surprised by a lack of noteworthy effects for the remaining HEXACO dimensions. Also, contrary to our expectations, our data did not reveal a significant interaction of the dimension honesty–humility and the robot condition in our moderated regression analysis. Instead, the thought detector robot was generally evaluated as eerier than the emotion detector robot, regardless of participants’ honesty–humility scores.
As a main result of our second experiment, we therefore note that people’s evaluation of detector robots appears to be mostly unaffected by their fundamental personality traits. Arguably, this implies that the notion of sophisticated analysis robots may cause unease in a rather universal way, emerging as a strong challenge to people’s idea of a good, unthreatening machine.
It should be noted, however, that our data yielded an unexpected interaction effect regarding another HEXACO trait: The higher participants scored in conscientiousness, the smaller was the difference between the eeriness ratings for the two detector robots. In our interpretation, this might be explained by the specific characteristics of highly conscientious individuals, who tend to put a strong emphasis on (cognitive) achievement and performance, while considering overt emotions as detrimental for success (Witteman et al., 2009; for an overview of the interplay of conscientiousness and negative affect see Fayard et al., 2012; Javaras et al., 2012). Further research is needed to find out how human conscientiousness influences interactions with robots—and to scrutinize the robustness of the uncovered interaction effect.
Robots and artificial intelligence are considered key technologies for the societies of today—even if not all prophecies made in science fiction have materialized (yet). User responses to these advanced technologies are of basic and applied relevance. Connecting the mind perception literature (Gray et al., 2007) and the uncanny valley hypothesis (Jentsch, 1906/1997; Mori, 1970), research on human–machine interactions has demonstrated that robots who are ascribed human mind elicit negative responses such as eeriness (e.g., Stein & Ohler, 2017). Importantly, machines with emotions (experience) were found to be more aversive (Appel et al., 2020; Gray & Wegner, 2012; Taylor et al., 2020) than machines with thoughts and plans (agency). Unlike previous research that was primarily focused on user responses to mind in a machine, we focused on a reversed perspective—the evaluation of machines capable of reading the human mind. Following our data analysis, we report that our main assumption held true across two experiments: In the realm of mind-reading machines, a thought detector is perceived as eerier than an emotion detector. With this fascinating outcome, we suggest that our results clearly advance the investigation of the uncanny valley of mind (Kang & Sundar, 2019; Stein & Ohler, 2017), both by shifting its overarching perspective and by introducing an important cognitive component. Offering further support for this main result, our second experiment showed that the stronger aversion against thought-detecting machines remained independent of several basic HEXACO personality dimensions. To us, this suggests that being apprehensive towards the concept of thought detection connects most humans regardless of their personality dispositions.
Proceeding to a psychological interpretation of our findings, we suggest that the need to perceive oneself as being in control is as important for human–robot interactions as it is for human–human interactions; potentially even more so. This desire for control, however, may be harmed by robots that appear able to look into the human mind. While we are used to sharing (and hiding) our emotions during many daily life interactions, it turns into a much more delicate matter if robots or other Artificial Intelligence (AI)-based systems start to correctly infer what their user is thinking; in a dystopian scenario, this information could quickly be used against the human user in question, for instance in a job assessment or law-related context. Considering that the fear of artificial intelligence turning against humans has been named as a central caveat of human–computer interaction research (Cave & Dihal, 2019), even the most pessimistic imaginations should probably be kept in mind when designing detector robots. Based on our findings, we recommend that developers of robotic and AI systems strive for absolute transparency regarding the capabilities of their created products and machines. Privacy guidelines should always be incorporated to make sure that the detecting entity does not share the results of its analysis with third parties; in all likelihood, this will help to alleviate the apprehension among potential users.
We note several limitations of the current experiments, which might also offer inspiration for future work. First, the observed mean eeriness ratings ranged between 3 and 4 on a 7-point scale, implying that the robot descriptions did not elicit particularly strong eeriness among participants. We assume that the online survey methodology paired with written text manipulations increased participants’ psychological distance to our stimuli, thus preventing stronger emotional reactions. Similarly, since we (purposely) did not offer any information about the robots’ appearance, some participants might have imagined a very friendly looking or cute machine, which might have “softened” the eeriness evoked by our mind manipulation.
Second, we did not specify which emotions or thoughts could be detected by the robot. Emphasizing the detection of negative feelings or cognitions, for example, could have increased eeriness ratings in a notable manner, as participants might see it as more discomforting to have their sadness, anxiety, or anger discovered. A similar notion concerns the reading of thoughts, as it appears highly likely that some cognitions might be more sensitive or confidential for us than others. Hence, future research is encouraged to examine differences in users’ experience and evaluations in response to robots detecting different thoughts and emotions.
Lastly, we believe that the methodological approach of using written vignette texts as stimuli deserves particular attention. While we still consider it a very useful way of putting the mental abilities of a machine front and center, it might be worthwhile to also show pictures or even focus on live interactions with real robots in order to advance the discussed line of research. In doing so, fascinating interaction effects between the robots’ mental capacities and their specific embodiment could be found, as suggested by another recent study (Stein et al., 2020). Building upon this, the influence of thought detection or emotion detection could also be explored in very different contexts: For instance, we strongly believe that a robot’s capability to detect aspects of human mind will be evaluated differently in court cases, a therapeutic setting, nursing scenarios, or smart homes (Thakur & Han, 2018). This way, evidence on the generalizability of the reported main effect could be gathered. Along the same lines, it should be explored whether the stronger aversion against a thought-detecting machine also persists in other cultures, as all participants taking part in our online experiments were recruited in the United States. Specifically, it might make sense to focus on participants from more collectivistic societies in future efforts, as the stronger social interdependence in the respective countries might also modulate the desire to avoid having one’s mind read by another entity.
As cherished in the German folk song mentioned at the beginning of this paper (Die Gedanken sind frei), humans seem to truly appreciate the fact that their thoughts may roam free, without the risk of insulting others or having to admit one’s secret desires. Accordingly, we found that the concept of thought-detecting machines—a hypothetical notion that does not seem so far removed from reality anymore, considering current technical developments—elicits significantly more unease than the concurrent idea of a robot analyzing human feelings. While this psychological observation may give developers pause or make them question the ethical boundaries of their innovations, it may also be possible to pave a path for well-accepted thought detectors; as long as control perceptions are kept in mind, people might get used to this novel experience after all.