Abstract

Virtual technology provides educators a unique opportunity to offer students racial embodiment experiences, in which they operate an avatar of a different racialized experience than their own. A few studies using headset-based virtual technology demonstrate such experiences can be beneficial for reducing implicit bias; we extend this line of inquiry to personal computer–based virtual environments (PVEs), which would be ideal for teaching in a distance learning capacity given their relative affordability and accessibility. In two previous pilot studies, we found only minimal support for the positive impact of racial embodiment experiences in PVEs. Here, we conducted a third study using a more tightly controlled experimental design, with higher power, and dependent measures (i.e., Implicit Association Tests) that are less vulnerable to social desirability effects. Overall, our hypotheses were not supported; White participants (N = 170) who operated an Asian or Black avatar did not demonstrate significantly lower implicit bias, higher ethnocultural empathy, or higher awareness of racism compared to those who operated White avatars. Peripheral to the tests of hypotheses, there was preliminary support for our newly developed brief Implicit Association Tests that may be of interest to future researchers. Null findings are discussed in relation to the broader literature on racial embodiment.

Keywords: virtual worlds, racial embodiment, Black/African American, Asian/Asian American, social justice education

Acknowledgments: The authors would like to thank the following research assistants: Jared Block, Anushka Chakrabarti, Eunice Chen, Elisabeth Eappen, Mumtaz Fatima, Jessica Fossum, Erika Garcia, Fiona Hinds, Yuqing Jiang, Sophia Jung, Kathleen Lamarque-Navarrete, Chrissy Liang, Shula Mathew, Yuhan Mei, Deana Moghaddas, Carolyn Moor, Jamie Murray, Emma Snowden, Tiffany Sun, Amy Tang, Jocelyn Walker, Maryam Ware, Nickie Yang, and Petra Yang.

Disclosures: Questions regarding this article should be addressed to John Tawa ([email protected]).

The author has no conflicting interests that might be interpreted as influencing the research, and American Psychological Association ethical standards were followed in the conduct of this study.

Data Availability: Materials related to this study, including data files, syntax, and study stimuli are publicly available on the Open Science Framework at the following link: https://osf.io/3d6ba/ (Tawa & Montoya, 2021).

Correspondence concerning this article should be addressed to John Tawa, Department of Psychology and Education, Mount Holyoke College, 50 College Street, South Hadley, MA 01075, United States. Email: [email protected]

Social justice educators frequently employ class exercises designed to help students access experiences of oppression from a more subjective perspective (Plous, 2000). For example, students may be asked to imagine themselves from the perspective of a person who experiences racism during a job interview. Indeed, psychological experiments have demonstrated that imagining oneself from the first-person perspective of another can increase empathy (Galinsky & Moskowitz, 2000); for example, college students who were instructed to write about a day in the life of an elderly man in a photograph from a first-person perspective demonstrated more empathy and less prejudice toward elderly individuals than those who were not instructed to write from a first-person perspective (Galinsky & Moskowitz, 2000). Yet, first-person perspective-taking is cognitively demanding and not all participants may be motivated to make this cognitive effort (Bailenson, 2018). Virtual technology gives users access to computer-simulated social environments and offers educators a unique opportunity to provide students with interactive first-person social experiences that do not rely on students’ imagination. Virtual technology includes both personal computer virtual environments (PVE), which are viewed directly on the computer screen, and immersive virtual environments (IVE), which are viewed through a headset that encompasses the user’s entire field of vision. PVEs may be particularly appealing to educators because they are more accessible to students, and online platforms such as Second Life (https://www.secondlife.com) can be accessed for free and remotely in the case of distance learning. Research using PVEs has found that people’s behavior in virtual environments is similar to their behavior in real-life environments (Eastwick & Gardner, 2009; Friedman et al., 2007; Hasler & Friedman, 2012; Tawa et al., 2015, 2020; Vang & Fox, 2014).
For example, avatars (i.e., first-person perspective characters operated by the user) tended to hold realistic interpersonal distances, initiating interactions at approximately 5 virtual meters, and moving closer for more personal engagement (Friedman et al., 2007). Relevant to our study is research finding that avatars’ racial appearance impacts social interactions (Eastwick & Gardner, 2009; Tawa et al., 2015, 2020; Vang & Fox, 2014); for example, avatars were more likely to comply with a request for a favor when approached by White compared to Black avatars (Eastwick & Gardner, 2009). Social justice educators may be particularly interested in the potential for virtual environments to offer students embodiment experiences, or interactive social experiences while operating an avatar with a visible identity that is different from their own (e.g., a White person piloting a Black avatar).

Virtual Embodiment for Prejudice Reduction

Thus far, empirical research on avatar embodiment for prejudice reduction has yielded mixed results. Participants who embodied a Black rather than White avatar while following the movement of a Tai Chi instructor demonstrated reductions in implicit bias which were sustained in a 1-week follow-up (Banakou et al., 2016). Decreases in implicit bias were also supported in a study in which participants embodying a dark-skinned avatar were approached by passersby (Peck et al., 2013). Another study, however, found that participants who embodied a Black compared to White avatar during a mock job interview had higher levels of implicit bias toward Blacks (Groom et al., 2009). One possibility for this discrepant finding is the relatively older technology used in Groom et al.’s study, which may have resulted in a less realistic and immersive experience for participants through which embodiment could occur (Peck et al., 2013). Alternatively, Groom et al.’s (2009) rationale for their findings is that by embodying a Black avatar, participants’ negative stereotypes of Blacks may have been activated, resulting in higher prejudice, particularly given that participants in this study were interacting in a setting in which Blacks may experience stereotyped threat (Steele, 1997).1 Stereotyped threat occurs when people become worried that they will confirm a stereotype of their racial group and is prone to be experienced when the social environment heightens people’s awareness of the stereotyped attribute (Steele, 1997). In the absence of proper postexperience processing, negative experiences in a stereotype-threatening context may simply reinforce stereotypes of the group.
Yet, guided by a social justice education framework, we speculate that proper postexperience learning and processing about the concept of stereotype threat may provide an opportunity for embodiment experiences—particularly those within stereotype-threatening contexts—to increase participants’ awareness of racism and empathic response. We intend to provide such explicit learning in our study proposed below.

Given these contradictory findings, more research is needed to evaluate the effectiveness of virtual embodiment for prejudice reduction, specifically with PVEs. While the research discussed above offers some encouraging findings from IVEs (Banakou et al., 2016; Peck et al., 2013), these technologies are cost prohibitive and limit accessibility for students, particularly in the context of distance learning. PVEs can be easily downloaded and operated at home and would be ideal for social justice education in a distance learning capacity. To this end, we have previously conducted two pilot studies on racial embodiment using PVEs; we use the minimally supportive findings from these studies to inform the development of the current proposed study.

PVE Embodiment for Social Justice Education: Two Pilot Studies

In the first study, using a within-subject design, White participants sequentially operated Asian, Black, and White racialized avatars in live social settings in Second Life. In the second study, using a between-subject design, Asian and White participants were randomly assigned an Asian, Black, or White avatar and were given a chance to reflect on their experience in an open-ended survey question (see Appendix for details on both studies). In both studies, the social environments in which the avatars interacted were attended by predominantly White-appearing avatars. A script was attached to each avatar and was used to measure social interaction, operationalized as the average number of avatars they interacted with at each 5-s interval over the duration of the trial. Conceptually, social interaction is a proxy measure of behavioral discomfort during intergroup interactions; previous research has found fewer avatar interactions during social events in Second Life to be correlated with intergroup anxiety (Tawa et al., 2020).

We examined to what extent social interactions while operating different race avatars predicted our outcome variables of interest: awareness of racism (Neville et al., 2000) in Studies 1 and 2, and empathy toward people of a different race (i.e., ethnocultural empathy; Wang et al., 2003) in Study 2. To briefly summarize our findings, in Study 1, for White participants, having more social interactions as a White avatar relative to an Asian avatar was related to higher awareness of racism. Consistently in Study 2, for White participants only, less social interaction as an Asian avatar was related to higher awareness of racism. Notably, in both studies, hypotheses for our pilot studies were only supported for White participants’ awareness of racism in relation to operating an Asian avatar. In neither study did operation of a Black avatar impact awareness of racism.

One possible reason for the effect of Asian avatar interactions is that in both studies, the context in which avatars interacted was a social environment (a bar and a jazz club) that could have evoked stereotype threat of Asians as socially awkward and asocial (Lin et al., 2005). These findings, however, stand in opposition to Groom et al.’s (2009) study in which Black stereotype-threatening environments led to increases in implicit prejudice toward Blacks. We speculate two reasons for this discrepancy: First, in our studies, participants operated multiple avatars, which allowed for immediate comparison of differences in social experiences by different race avatars; second, we offered (in Study 2) an opportunity for participants to reflect on their experiences before completing the awareness of racism survey. In the present study, we focus on the latter possibility and emphasize providing participants with postexperience processing and education around stereotype threat. In Study 2, none of the hypotheses related to ethnocultural empathy were supported; in fact, for Asian participants specifically, more social interaction as a Black or White avatar corresponded to lower empathy. In this case, perhaps Asian participants’ recognition of the relatively lesser stereotyped threat in these social environments (i.e., bar and jazz club) as a Black or White avatar led to decreases in empathy toward these groups.

Overall, findings from Studies 1 and 2 provide only minimal support for our hypotheses regarding the capacity for embodiment experiences to lead to prejudice reduction. Clearly, more research is needed before determining whether adopting PVE embodiment applications in classrooms would be beneficial. We suspect that one reason for our minimally supported findings is that we did not provide an affectively impactful enough experience, particularly evidenced by our lack of findings in relation to ethnocultural empathy. The role of affect can be pivotal for attitude change; indeed, Pettigrew’s (1998) intergroup process model speculates on the importance of emotions experienced during intergroup contact encounters for attitude change. We feel the impact of our design was particularly challenged by our inability to control the other avatars in the study with whom participants interacted, given that they were operated by other live users who may or may not have acted prejudicially toward our participants.

Based on our evaluation of previous research and our pilot studies, we proposed a third study seeking to address the following: First, we addressed the issue of our compromised experimental control that resulted from relying on social interactions with other live avatars as a proxy for stereotyped threat. Second, previous studies have primarily used implicit biases as an outcome variable (Banakou et al., 2016; Groom et al., 2009; Peck et al., 2013), which may minimize socially desirable responding. Third, although in Study 2 we did offer a chance for participants to reflect on their experiences, we did not provide participants with any direct postexperiment learning and processing specifically related to stereotype threat, which we hypothesized earlier would lead to successful prejudice reduction and ultimately increases in empathy. In what follows, we proposed a third iteration of our study, which aimed to address these challenges by having participants interact in small group settings in which other avatars are controlled by confederate research assistants (RAs), by employing an implicit bias test as an outcome variable, and by providing postexperiment learning experiences.

In the third study, the small group settings were designed to evoke stereotyped threat experiences specific to Black and Asian experiences. In the Black stereotype threat (BST) condition, the group was tasked with determining how a small college should allocate seats to applicants, evoking stereotypes of Blacks as lacking in intellectual merit (Steele, 1997). In the Asian stereotyped threat (AST) condition, the working group was tasked with planning social distance protocols for a small community in the midst of a pandemic, evoking stereotypes of Asians as associated with disease (Rzymski & Nowicki, 2020). All participants also engaged in a postexperience processing phase, guided by social justice education models that emphasize the importance of educational processing to help students transform affectively laden experiences into prosocial attitudes and behaviors (Suyemoto et al., 2009).

In the third study, we aimed to test the following hypotheses:

1. White participants who operate Asian and Black avatars will show the following outcomes relative to participants who operate a White avatar, particularly in the matched threat condition: (a) higher ethnocultural empathy; (b) higher awareness of racism; and (c) lower implicit bias (anti-Black when compared to White participants operating Black avatars and anti-Asian when compared to White participants operating Asian avatars).

2. Among those in the stereotyped threat condition, the relationships between avatar race and implicit bias and awareness of racism will be mediated by ethnocultural empathy. Specifically: (a) Among those in the Black stereotyped threat condition, the relationship of Black avatar assignment and anti-Black implicit bias and awareness of racism will be mediated by ethnocultural empathy, where being assigned a Black avatar results in greater ethnocultural empathy, which then predicts lower anti-Black implicit bias and higher awareness of racism. (b) Among those in the AST condition, the relationship of Asian avatar assignment and anti-Asian implicit bias and awareness of racism will be mediated by ethnocultural empathy, where being assigned an Asian avatar results in greater ethnocultural empathy, which then predicts lower anti-Asian implicit bias and higher awareness of racism.

Method

Overview of the Method

Each participant operated one avatar (i.e., Asian, Black, or White) within one of two social conditions, comprising an incomplete 2 × 3 between-subject design. The two social conditions were intended to evoke stereotype threat specific to Asians and Blacks. In the AST condition, the working group was tasked with planning social distance protocols for a small community in the midst of a pandemic, evoking stereotypes of Asians as associated with disease (Rzymski & Nowicki, 2020). In the AST condition, participants were assigned to either an Asian or White avatar; no participants were assigned to Black avatars. In the BST condition, the group was tasked with determining how a small college should allocate seats to applicants, evoking stereotypes of Blacks as lacking in intellectual merit (Steele, 1997). In the BST condition, participants were assigned to either a Black or White avatar; no participants were assigned to Asian avatars. The task lasted 10 min. Participants interacted in a small “working group” composed of four avatars (two White, one Black, and one Asian). Participants were told that they would work in a group setting with other participants; in actuality, the three other avatars were operated by RAs.
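The incomplete 2 × 3 structure described above can be made concrete with a short sketch. The Python below is only an illustration of which cells are populated and of random avatar assignment within a threat condition; the names `DESIGN`, `design_cells`, and `assign_avatar` are hypothetical and do not come from the study materials.

```python
import random

# Cells of the incomplete 2 x 3 design: each threat condition pairs the
# matching minority avatar with a White avatar (names here are hypothetical).
DESIGN = {
    "AST": ("Asian", "White"),  # Asian stereotype threat: no Black avatars
    "BST": ("Black", "White"),  # Black stereotype threat: no Asian avatars
}


def design_cells():
    """List the four populated (condition, avatar race) cells."""
    return [(cond, race) for cond, races in DESIGN.items() for race in races]


def assign_avatar(condition, rng):
    """Randomly pick one of the two avatar races allowed in a condition."""
    return rng.choice(DESIGN[condition])


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
print(design_cells())
print(assign_avatar("AST", rng))
```

Because two of the six possible cells are empty by design, only four cells are ever populated, which is why the analysis can later be treated as a 2 × 2.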

Avatar and Condition Development

Avatar Development

Each study trial was composed of one Asian, one Black, and two White avatars. During any one trial, participants operated the Asian, Black, or one of the White avatars and the remaining avatars were operated by RAs. At the sign-up stage, participants indicated which gender avatar they most closely identified with; if it was a male avatar, then the study trial was composed of all male avatars; if it was a female avatar, then the study trial was composed of all female avatars. Thus, to conduct this study, a total of six participant avatars (a male and female version of an Asian, Black, and White avatar; see “Avatar Headshots” file at https://osf.io/3d6ba/) and eight confederate avatars (male and female versions of “sets” of confederate avatars including two White, one Black, and one Asian) were created.

Avatars were created by shopping in Second Life for “skins,” avatars that have been developed for sale by third-party Second Life developers and are typically marketed with specific racial descriptors (e.g., “Asian skins”). RAs visited locations in which skins were sold and took screenshots of possible avatars. The total pool of prospective avatars included 16 Asian female, 15 Asian male, 25 Black female, 28 Black male, 36 White female, and 24 White male. Through discussion among the principal investigator and RAs, we narrowed down finalists to 28 avatars (7 Asian, 8 Black, and 13 White); these primarily comprised the avatars that we felt were most realistic and that minimized any exaggerated racialized features or cartoonish appearances. We then created an online survey which enabled lab members and volunteers to rate each of the remaining 28 avatars on their racial prototypicality (i.e., the extent to which they were perceived as Asian, Black, and White) as well as the extent to which they were perceived as “angry,” “sad,” and “happy.” This survey was completed by 13 lab members and volunteers. Decisions about the final 14 avatars were made by selecting those for each race and gender with relatively high scores on racial prototypicality for their race and low scores for being perceived as “angry,” “sad,” and “happy.” Among the Asian avatars, average scores (on a scale from 1 to 6 with 6 indicating greater endorsement) were 5.23 (SD = .51) for racial prototypicality, 2.04 (SD = .99) for angry, 2.04 (SD = .99) for happy, and 2.09 (SD = .91) for sad. Among the Black avatars, average scores were 5.56 (SD = .48) for racial prototypicality, 1.85 (SD = .76) for angry, 2.07 (SD = .91) for happy, and 2.16 (SD = 1.03) for sad. Among the White avatars, average scores (from 1 to 6) were 5.08 (SD = .47) for racial prototypicality, 2.09 (SD = .86) for angry, 2.18 (SD = .91) for happy, and 2.23 (SD = 1.04) for sad.
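As a rough illustration of the selection logic just described (high racial prototypicality, low perceived emotion), the following Python sketch ranks candidate avatars. The exact decision rule is not stated in the text, so the difference score used here is an assumption, and the ratings are invented placeholders rather than study data.

```python
from statistics import mean


def rank_avatars(ratings):
    """Order candidates by high prototypicality and low mean emotion rating.

    `ratings` maps an avatar label to a dict of 1-6 ratings for
    'prototypicality', 'angry', 'sad', and 'happy'. The difference score
    is a hypothetical rule, not the authors' stated criterion.
    """
    def score(r):
        emotion = mean([r["angry"], r["sad"], r["happy"]])
        return r["prototypicality"] - emotion

    return sorted(ratings, key=lambda name: score(ratings[name]), reverse=True)


# Invented ratings for three placeholder candidates (not study data).
candidates = {
    "avatar_a": {"prototypicality": 5.4, "angry": 1.9, "sad": 2.0, "happy": 2.1},
    "avatar_b": {"prototypicality": 4.1, "angry": 2.0, "sad": 2.2, "happy": 2.0},
    "avatar_c": {"prototypicality": 5.5, "angry": 3.8, "sad": 3.5, "happy": 1.5},
}
print(rank_avatars(candidates))
```

In practice such a ranking would be applied separately within each race and gender group, since the final set required a fixed number of avatars per group.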

Task instructions were provided to participants through the chat function in Second Life and were delivered by a “ghost” avatar who attended the trial wearing an invisibility cloak so that they were not visible to participants. This avatar also monitored the trial and made any notes of occurrences that might invalidate data, for example, if participant avatars wandered away from the social interaction group.

Condition Development

In order to develop the BST and AST conditions, the principal investigator and RAs developed a group task and related scripts for the confederate avatars that could evoke a sense of stereotyped threat for participant avatars if they were assigned Black and Asian avatars. In order to make the social interactions as realistic as possible, the scripted portions of the social interactions generally only comprised the “small talk” at the beginning of the trial, and it was during this small talk that statements were made by confederate avatars that could evoke stereotyped threat particularly within the context of the given group task. For example, in the BST condition, the small group was tasked with making decisions about how to allocate a limited number of seats to White and racial and ethnic minority students. As part of the script, one of the White confederate avatars (Sophie) states: “Uggghhh so basically ur talking about affirmative action, which is bull I didn’t get into my top three colleges of choice bc of it I had high gpa clubs etc … and I know ppl who got in just bc they were black and I know for a fact didn’t work as hard as me in hs.” In the AST condition, the small group was tasked with developing a safety protocol for a community facing an impending viral pandemic. As part of the script, in this condition, Sophie states: “r we not going to tlk about closing borders ?? i mean thats what we should’ve done from the beginning at least from china.” Following the initial scripted portions, RAs were instructed to more organically continue the conversation while taking on the general demeanor of their assigned avatar. Sophie was instructed to continue to make some arguments (if prompted to) against affirmative action in the BST or the closing of borders in the AST but was encouraged to try and become less vocal in the conversation.

RAs were blind to which avatars the participant and their colleagues were operating. The blinding of the participant avatar assignment to RAs was to ensure that RAs did not make subtle changes in their delivery of the script and follow-up conversations to support the hypothesized impact; for example, an RA operating Sophie could be more vocal against affirmative action if they knew the participant was assigned a Black avatar in order to have a greater impact on the participant. In order for the RA to truly be blind to participant assignment, the RA also needed to be blind to which confederate avatars were present in the scenario. For example, if the RA knew that one of their colleagues was assigned a confederate Asian avatar, then they could deduce that the participant was assigned a White avatar (since there is only one Asian avatar present in the condition). Thus, blinding RAs to confederate avatar assignment was achieved in two ways: First, each RA had their own Google Sheet containing their daily avatar assignments (rather than sharing a single sheet across RAs); second, all confederate avatar accounts were configured such that when the user logged in, the camera faced the avatar itself rather than the default view, which faced behind the avatar and in which all other present avatars would be visible. For the participant accounts, the default view was retained, so that they could see all members of the small group that they interacted with. A headshot of each avatar was also placed in the upper right corner of each participant avatar account’s heads-up display, so that participants would be constantly aware of how they were perceived by others (see “Avatar Headshots” file at https://osf.io/3d6ba/).

Participants

Participants included White adults aged 18 years and older recruited through Prolific.co and from a university psychology subject pool. Our target sample size was 240 participants comprising approximately 40 participants in each cell of our 2 × 3 design (2 threat conditions and 3 avatar conditions). This number was determined by a power analysis for a 2 × 3 design with a minimum effect size (η_p²) of .07 and an α level of .05. The minimum effect size was determined based on findings from a previous study using a between-group design for examining avatar conditions on the Implicit Association Test (IAT; Groom et al., 2009). The power analysis yielded a recommendation of 215 participants; 240 was selected to be divisible by 6 and allow for possible dropout or exclusion. Based on previous research with Second Life, participant exclusion may occur if data are missing or partial due to technical problems, if participants alter the avatar’s appearance in any way, or if participants do not follow task instructions.
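Power-analysis software for ANOVA designs commonly takes Cohen's f rather than partial eta squared as its effect-size input. The text does not say which tool or input the authors used, so the following Python sketch only shows the standard conversion, f = sqrt(η_p² / (1 − η_p²)), applied to the reported minimum effect size of .07; the function name `cohens_f` is a hypothetical helper.

```python
import math


def cohens_f(partial_eta_squared):
    """Convert partial eta squared to Cohen's f: sqrt(eta2 / (1 - eta2))."""
    if not 0 <= partial_eta_squared < 1:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(partial_eta_squared / (1 - partial_eta_squared))


f = cohens_f(0.07)
print(round(f, 3))  # 0.274
```

By Cohen's conventional benchmarks (f = .25 medium, f = .40 large), an f of about .27 is a medium effect.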

From our initial sample of 240 participants, 70 were excluded. One participant (0.4%) was excluded for being too late to the study trial, 27 participants (11.3%) were excluded because of a protocol error made by the principal investigator or an RA (e.g., assigning the wrong avatars, failing to provide log-in information to participants, etc.), five participants (2.1%) were excluded because they wandered away from the social interaction during the trial, two participants (0.8%) were excluded because of a technical malfunction (e.g., one participant was repeatedly logged out of Second Life during the trial), and 35 participants (14.6%) were excluded for failing to complete the survey and IAT following the study trial. Survey data and IAT data were also examined for careless or inattentive response patterns (see planned analyses on page 20; Huang et al., 2012; Nosek et al., 2014; Tawa, 2021); no cases were removed on these bases. Our remaining sample was 170 participants. To determine the potential loss of power, we conducted an effect-size sensitivity analysis to determine what effect size we were well powered (80%) to detect. Ultimately, we conducted the analysis as a 2 × 2 rather than an incomplete 2 × 3 (see below), and so with the 2 × 2 design with 170 participants, we had 80% power to detect an effect size (η_p²) of .07. This suggests that although we did not reach our target of 240 usable participants, because of the incomplete design, there was no loss of statistical power.
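The exclusion counts reported above can be tallied in a few lines as a consistency check. This Python sketch uses only the numbers given in the text; the dictionary labels and the half-up rounding helper `pct` are illustrative choices, with half-up rounding assumed because it reproduces the reported percentages.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts as reported in the text; the labels are paraphrases.
recruited = 240
exclusions = {
    "late to trial": 1,
    "protocol error": 27,
    "wandered away": 5,
    "technical malfunction": 2,
    "did not complete survey and IAT": 35,
}


def pct(n, total):
    """Percentage of total, rounded half-up to one decimal place."""
    value = Decimal(100 * n) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


total_excluded = sum(exclusions.values())
remaining = recruited - total_excluded

for reason, n in exclusions.items():
    print(f"{reason}: {n} ({pct(n, recruited)}%)")
print(total_excluded, remaining)  # 70 170
```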

Among the final 170 participants, 45 (26.5%) were assigned to an Asian avatar in the Asian threat condition, 29 (17.1%) were assigned to a White avatar in the Asian threat condition, 52 (30.6%) were assigned a Black avatar in the Black threat condition, and 44 (25.9%) were assigned a White avatar in the Black threat condition. One hundred forty-two (83.5%) of the participants were recruited from Prolific.co and 28 (16.5%) were recruited from the university. Fifty-five (32.4%) described their gender as male, 100 (58.8%) as female, eight (4.7%) as nonbinary, and one (0.6%) as other. Fifty-eight (34.1%) participants operated a male avatar, and 112 (65.9%) participants operated a female avatar. One hundred fifty-eight (92.9%) of the participants were born in the United States.

Procedure

Recruitment Procedure

An advertisement for the study was posted in Prolific.co and was made visible only to Prolific members who described themselves as racially White and as living in the United States. Interested prospective participants were given access to a Google calendar to sign up for available study times, during which RAs were available to operate confederate avatars. Prior to the scheduled time, participants were asked to download Second Life and complete some basic operations to ensure basic competency with Second Life. From June 14, 2021, to August 3, 2021, we recruited participants from Prolific for the Asian threat conditions, with interested participants being randomly assigned to receive either an Asian or White avatar. Although a purer random assignment strategy would be to randomly assign both threat and avatar conditions, at the time, we only had one Inquisit account (i.e., the platform used to collect online IAT data), which permitted only one active IAT at a time. Following a discussion in early August of 2021, we implemented two changes in our recruitment strategy. First, given very high no-show rates among prospective participants recruited from Prolific (see percentage below), we decided to supplement our Prolific sample with student participants from one of the authors’ universities. Second, we purchased an additional Inquisit account, so that recruitment for both threat conditions could occur simultaneously. Beginning August 3, 2021, we recruited participants from Prolific for the Black threat conditions, with interested participants being randomly assigned to receive either a Black or White avatar. Our recruitment aim at this point was to match the 79 participants who were recruited from Prolific who had attended the Asian threat condition with approximately the same number of Prolific participants for the Black threat condition.

Simultaneously, in August of 2021, we began recruiting at the university; for this sample, participants were randomly assigned to both threat and avatar conditions. The psychology subject pool was used to recruit participants from a pool of students who receive credit or extra credit for participation in studies in their psychology classes. As such, all participants were university students. The study was only available to participants who indicated on a presurvey, completed by the entire participant pool, that they were racially White. Potential participants were able to sign up for a study time, and presurvey data were used to assign avatar gender.2 Prior to the scheduled time, participants were asked to download Second Life and complete some basic operations to ensure basic competency with Second Life. Participants from the university completed the study between October 22, 2021, and March 10, 2022.

Among a total of 653 prospective participants recruited from Prolific who reserved a time on the Google Calendar, only 165 (25.3%) showed up at the study trial. Among the 68 prospective participants recruited from the university who reserved a timeslot, 50 (73.5%) showed up at the study trial.

General Procedure

As explained to prospective participants during recruitment, participants were provided with the log-in information for their assigned avatar and a picture of their assigned avatar approximately 15 min prior to the start of their study trial. Confederate avatars were asked to log in between 5 min early and exactly on time (the variability in log-in times among confederates was purposeful as simultaneous and early log in by confederates might appear suspicious to participants who arrived early).

Participants were then directed to an online survey which included a video lesson about stereotyped threat theory, an open-ended question asking about a time in which they felt they may have experienced stereotype threat, an online administration of one of two assigned IATs consistent with their stereotype condition, and the survey measures. Completion codes were provided to participants only at the completion of the stereotyped threat video and IAT task to ensure that both were completed before continuing on to the remainder of the survey.

Measures

Awareness of Racism

Awareness of racism was measured using the 20-item Colorblind Racial Attitudes Scale (Neville et al., 2000), which assesses people’s denial or awareness that racism creates a system of advantages for Whites and disadvantages for racial and ethnic minorities. A sample item is “Racial and ethnic minorities do not have the same opportunities as White people in the United States.” In the intended use of this scale, nine of the 20 items should be reverse-scored, so that higher scores indicate greater colorblind racial attitudes. In our use of this scale, we reverse-scored the opposite 11 of 20 items such that higher scores indicate greater awareness of racism (and less colorblind racial attitudes). A Cronbach’s α internal reliability estimate with our sample was .93 (ω = 0.95). Scores on this scale ranged from 1.65 to 6 (M = 4.6, SD = 1.05). If participants did not respond to any items on the scale, they were treated as missing: six participants were missing on awareness of racism. If participants responded to some items, the average of the answered items was used.
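The scoring rules described above (reverse-score a subset of items, average whatever items were answered, and treat a participant as missing only when no items were answered) can be sketched as follows. This is a minimal Python illustration, not the authors' syntax: the 1 to 6 response range is assumed from the reported score range, and the item numbers and reverse-keyed set are placeholders rather than the actual scale key.

```python
def score_scale(responses, reverse_items, scale_min=1, scale_max=6):
    """Mean of answered items after reverse-scoring; None if nothing answered.

    `responses` maps item number -> rating, with None for skipped items.
    The 1-6 range is an assumption based on the reported score range.
    """
    scored = []
    for item, value in responses.items():
        if value is None:
            continue  # skipped item: excluded from the average
        if item in reverse_items:
            value = scale_min + scale_max - value  # e.g. 6 becomes 1
        scored.append(value)
    if not scored:
        return None  # no items answered: treat the participant as missing
    return sum(scored) / len(scored)


# Hypothetical 4-item example; item numbers and the reverse-keyed set are
# placeholders, not the actual scale key.
answers = {1: 6, 2: 1, 3: None, 4: 5}
print(score_scale(answers, reverse_items={2}))
```

Averaging only the answered items keeps partially complete responses in the analysis, at the cost of weighting each answered item slightly more for those participants.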

Ethnocultural Empathy

Ethnocultural empathy was measured using the 31-item Ethnocultural Empathy Scale (Wang et al., 2003), which measures empathy toward people of racial and ethnic backgrounds that are different from one’s own. A sample item is “I share the anger of those who face injustice because of their racial or ethnic backgrounds.” A higher score indicates greater ethnocultural empathy. A Cronbach’s α internal reliability estimate with our sample was .90 (ω = 0.92). Scores on this scale ranged from 3.03 to 5.84 (M = 4.57, SD = .64). If participants did not respond to any items on the scale, they were treated as missing: six participants were missing on ethnocultural empathy. If participants responded to some items, the average of the answered items was used.

Implicit Association Tests

We developed two modified versions of the Brief Implicit Association Test (BIAT; Sriram & Greenwald, 2009) to measure participants’ implicit racism toward Blacks and Asians. The BIAT is distinguished from the more traditionally used IAT in that it is briefer and “focal” categories for attribute (words or images) sorting are positioned in the top middle of the screen rather than on the left and right side of the screen, as in the more traditional IAT. The BIAT offers comparable psychometrics to the IAT (Sriram & Greenwald, 2009).

In the Black IAT test, participants sorted images of Black and White faces into the categories BLACK and WHITE, and word attributes into the adjectival categories GOOD and BAD. In the Asian IAT test, participants sorted images of Asian and White faces into the categories ASIAN and WHITE and image attributes into the adjectival categories DISEASE/SICKNESS and NONDISEASE. Picture attributes for the ASIAN, BLACK, and WHITE categories were images of faces selected from the Chicago Face Database (Ma et al., 2015). This data set is comprised of 597 high-definition photos of Asian, Black, Latino, and White models’ headshots. In addition, each headshot has been rated by an average of 43.75 independent raters for multiple variables, including but not limited to expressiveness (e.g., angry, happy, etc.) and racial prototypicality. Although models were instructed to project neutral expressions, there is inevitably subtle variability in expressiveness. For our test stimuli, we selected four male and four female faces from each racial category (i.e., Asian, Black, and White). We sought faces that would have the highest rating in expressive neutrality and racial prototypicality to maximize the likelihood that participants’ associations of faces with attribute words (e.g., lazy, studious) reflect racial associations rather than unique characteristics of the models. Thus, our selection process proceeded as follows: First, we created an “expressiveness” variable by averaging ratings on afraid, angry, happy, disgusted, sad, threatened, and surprised; lower scores indicated less expressiveness. A descriptive analysis of this variable revealed an average expressiveness score of 2.14 (SD = .27); a score of 1.85 comprised the 15th percentile. 
Then, selecting only from those rated in the lowest 15% in expressiveness, we selected the four males and four females within each racial group who were rated highest with regard to racial prototypicality; this was measured by the percent of raters endorsing the correct racial classification. All of our final 24 selected models had a minimum of a 95% rater endorsement of the correct racial classification.
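The two-step selection procedure (compute a composite expressiveness score, keep the least expressive 15%, then take the most prototypical faces within each race and gender cell) can be sketched as follows. This is an illustrative sketch only: the dictionary keys (`race`, `gender`, `prototypicality`, and the emotion names) are hypothetical field names and are not the actual column names of the Chicago Face Database norming file.

```python
from collections import defaultdict
from statistics import mean

# Emotion ratings averaged into the composite "expressiveness" variable.
EMOTIONS = ("afraid", "angry", "happy", "disgusted", "sad", "threatened", "surprised")


def expressiveness(face):
    """Average of the seven emotion ratings; lower means more neutral."""
    return mean(face[emotion] for emotion in EMOTIONS)


def select_faces(faces, cutoff_percent=15, per_cell=4):
    """Pick the most prototypical faces among the least expressive ones.

    Step 1: keep the lowest `cutoff_percent` of faces by expressiveness.
    Step 2: within each (race, gender) cell, keep the `per_cell` faces
            with the highest prototypicality rating.
    """
    ranked = sorted(faces, key=expressiveness)
    n_keep = max(1, len(ranked) * cutoff_percent // 100)
    neutral = ranked[:n_keep]

    cells = defaultdict(list)
    for face in neutral:
        cells[(face["race"], face["gender"])].append(face)

    return {
        cell: sorted(members, key=lambda f: f["prototypicality"], reverse=True)[:per_cell]
        for cell, members in cells.items()
    }
```

Filtering on expressiveness before ranking on prototypicality mirrors the order described in the text; reversing the two steps could yield a different stimulus set.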

For the Black IAT, word attributes for each adjectival category were generated by the principal investigator and RAs and included GOOD (marvelous, superb, pleasure, beautiful, joyful, glorious, lovely, wonderful) and BAD (tragic, horrible, agony, painful, terrible, awful, humiliate, nasty). For the Asian IAT, image attributes for each adjectival category were collected by the principal investigator (images for both DISEASE/SICKNESS and NONDISEASE). Both the Black and Asian IATs and the images needed to run them are available for download at the following link: https://osf.io/3d6ba/. These scripts can be run using the Inquisit interface (https://www.millisecond.com/download).

IAT scores could range from 2.0 to −2.0. As scores approach 2, they indicate greater associations between the categories in the hypothesized direction (i.e., Black and Bad, Asian and Disease/Sickness); as scores approach −2, they indicate greater associations between the categories in the opposite of the hypothesized direction (i.e., Black and Good, Asian and Nondisease); and as scores approach 0, they indicate little to no association between any of the categories. In our sample, for participants completing the Black IAT (n = 83), scores ranged from −.82 to 1.09, with an average IAT score of .20 (SD = .40). For participants completing the Asian IAT (n = 69), scores ranged from −.71 to 1.08, with an average IAT score of .33 (SD = .37). On average, participants in our sample made slight associations in the hypothesized directions on both tests. In our sample, no participants had 10% of trials lasting greater than 10,000 ms or less than 300 ms, thus following standard IAT procedures, no participants’ IAT were removed (Nosek et al., 2014).
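The latency-based screening rule can be expressed as a short check. This is a sketch under stated assumptions: it reads the rule as flagging a protocol when more than 10% of its trials are slower than 10,000 ms or faster than 300 ms, and the function name and the strict "more than" comparison are choices made for the example rather than details confirmed by the text.

```python
def should_exclude(latencies_ms, slow_ms=10_000, fast_ms=300, max_fraction=0.10):
    """Return True when too many trials have implausible latencies.

    A trial counts as extreme when it is slower than `slow_ms` or
    faster than `fast_ms`. The protocol is flagged when the share of
    extreme trials exceeds `max_fraction` (assumed strict inequality).
    """
    if not latencies_ms:
        raise ValueError("at least one trial latency is required")
    extreme = sum(1 for t in latencies_ms if t > slow_ms or t < fast_ms)
    return extreme / len(latencies_ms) > max_fraction
```

Under this reading, a protocol with exactly 10% extreme trials is retained and one with 11% is flagged.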

Registered Data Analyses Plan

Initial data exploration steps were taken to examine normality and homogeneity of variance; appropriate transformations or adjustments to statistical methods were made if normality or homogeneity of variance was violated. Improbable response patterns on surveys were flagged (Huang et al., 2012; Tawa, 2021), manually examined, and subjected to removal. We planned to remove IAT protocols with more than 10% of trials lasting greater than 10,000 ms (Nosek et al., 2014). Hypotheses 1a and 1b were planned to be tested using a 2 (condition) × 3 (avatar race) analysis of variance (ANOVA) on each outcome: ethnocultural empathy (Hypothesis 1a) and awareness of racism (Hypothesis 1b). A contrast comparing White versus Asian and Black avatars indicated whether there are average differences across the two groups, while accounting for differences in condition. A significant contrast would provide support for Hypothesis 1 only if the Asian/Black mean is significantly higher than the White mean. Pairwise comparisons of White versus Asian and White versus Black were examined to determine whether each group differs from White and may provide partial support for Hypothesis 1. Hypothesis 1c was tested using two one-way ANOVAs by avatar race with IAT score as the outcome, one for those participants who completed the anti-Black IAT and one for those participants who completed the anti-Asian IAT. For the anti-Black IAT, a significant contrast of means comparing Black versus White and Asian provided evidence for Hypothesis 1c only if the Black mean was lower than the White/Asian mean. For the anti-Asian IAT, Hypothesis 1c was tested identically, but using participants who completed the anti-Asian IAT, and a contrast comparing Asian versus Black and White. Hypothesis 2 was tested using regression-based moderated mediation analyses (Model 8 using the PROCESS macro for SPSS) and bootstrap confidence intervals for the conditional indirect effects (see Figure 1). 
Avatar race was the independent variable, Helmert coded to include a Black versus White/Asian indirect effect (2a) and an Asian versus Black/White indirect effect (2b). Hypothesis 2a was supported if there is a significant conditional indirect effect of Black versus White/Asian avatar on anti-Black implicit bias and awareness of racism through ethnocultural empathy, for those in the BST condition. Hypothesis 2b was supported if there is a significant conditional indirect effect of Asian versus White/Black avatar on anti-Asian implicit bias and awareness of racism through ethnocultural empathy.

**Figure 1**
Moderated Mediation Model Testing Hypothesis 2

Results

For ease of analysis, we created two variables to represent the experimental condition: (a) avatar race, which was either “White” or “non-White” and (b) threat condition, which was either BST or AST. This deviates slightly from our Stage 1 submission, as we originally planned to have an avatar race factor which was “White,” “Black,” or “Asian”; however, this original plan led to a partial factorial design which was needlessly complex. By including avatar race, threat condition, and their interaction, we maintain a full factorial design while still modeling the differences between Black and Asian avatar ratings through the interaction.

Modeling assumptions of normality and homogeneity of variance were examined by fitting the proposed statistical models and examining descriptive summaries, plots, and statistical tests for the assumptions (Shapiro–Wilk test for normality and Levene’s test for homogeneity of variance). Deviations from normality were observed in the ethnocultural empathy (p = 0.01) and awareness of racism variables (p ≤ .01), but not the IAT score (p = 0.32). Due to reasonable sample size, we continued with analysis as planned (see exploratory analyses for alternative analysis). Violations of homogeneity were not observed for ethnocultural empathy (p = 0.08), awareness of racism (p = 0.08), or IAT (p = 0.65).

The Effects of Avatar Race on Ethnocultural Empathy and Awareness of Racism

To test Hypothesis 1, we used a 2 (avatar race) × 2 (threat condition) ANOVA on the three outcomes: (a) ethnocultural empathy, (b) awareness of racism, and (c) IAT scores. We were predicting simple effects of avatar race in each threat condition (i.e., White vs. Asian in AST, White vs. Black in BST). We tested this using planned contrasts for each comparison. An α level of 0.05 was used for all tests.

For ethnocultural empathy, the model showed a nonsignificant main effect of avatar race, F(1, 160) = 0.02, p = 0.89, η² < .01, a nonsignificant main effect of threat condition, F(1, 160) = 0.09, p = 0.76, η² < .01, and a nonsignificant interaction, F(1, 160) = 0.17, p = 0.68, η² ≤ .01. In the AST condition, participants operating Asian avatars did not significantly differ on levels of ethnocultural empathy (M = 4.55, SD = 0.69) compared to those operating White avatars (M = 4.52, SD = 0.52); F(1, 160) = 0.03, p = 0.86, d = −0.06. In the BST condition, participants operating Black avatars did not significantly differ on ethnocultural empathy (M = 4.55, SD = 0.73) compared to those operating White avatars (M = 4.60, SD = 0.59); F(1, 160) = 0.12, p = 0.73, d = 0.07. Overall, this suggests no support for Hypothesis 1a.

For awareness of racism, the model showed a nonsignificant main effect of avatar race, F(1, 160) = 0.03, p = 0.86, η² ≤ .01, a nonsignificant main effect of threat condition, F(1, 160) = 0.59, p = 0.44, η² ≤ .01, and a nonsignificant interaction, F(1, 160) = 0.79, p = 0.38, η² ≤ .01. In the AST condition, participants operating Asian avatars had no different levels of awareness of racism (M = 4.75, SD = 1.03) compared to those operating White avatars (M = 4.56, SD = 0.74); F(1, 160) = 0.75, p = 0.39, d = −0.21. In the BST condition, participants operating Black avatars had no different levels of awareness of racism (M = 4.49, SD = 1.16) compared to those operating White avatars (M = 4.6, SD = 1.15); F(1, 160) = 0.27, p = 0.6, d = 0.09. Overall, this suggests no support for Hypothesis 1b.

The Effects of Avatar Race on Implicit Bias and Awareness of Racism and the Mediation of Ethnocultural Empathy

For implicit bias, the model showed a nonsignificant main effect of avatar race, F(1, 148) = 2.23, p = 0.14, η² = 0.02, a significant main effect of threat condition, F(1, 148) = 4.54, p = 0.03, η² = 0.03, and a nonsignificant interaction, F(1, 148) = 0.3, p = 0.59, η² ≤ .01. In the AST condition, participants operating Asian avatars had no different levels of implicit bias toward Asians (M = 0.3, SD = 0.33) compared to those operating White avatars (M = 0.36, SD = 0.41); F(1, 148) = 0.11, p = 0.74, d = 0.16. In the BST condition, participants operating Black avatars had no different levels of implicit bias toward Blacks (M = 0.14, SD = 0.4) compared to those operating White avatars (M = 0.27, SD = 0.38); F(1, 148) = 2.89, p = 0.09, d = 0.33. Overall, this suggests no support for Hypothesis 1c.

To test Hypothesis 2, we fit a moderated mediation model where threat condition moderated the indirect effect of avatar race on awareness of racism and implicit bias through ethnocultural empathy (see Figure 1). These were fit as separate models to accommodate two outcomes (implicit bias and awareness of racism), but the same seed was used for the bootstrapping, as fitting these models separately is equivalent to fitting them simultaneously while allowing for correlated residuals. The conditional effects of avatar race on ethnocultural empathy are the same as those reported for Hypothesis 1a. In the model for implicit bias, there was a significant effect of ethnocultural empathy on implicit bias controlling for avatar race and threat condition, such that higher scores on ethnocultural empathy corresponded to lower scores on implicit bias (b = −0.11), t(144) = −2.33, p = 0.02. The direct effect of avatar race was not significantly moderated by threat condition (b = −0.10), t(144) = −0.76, p = 0.45. In the AST condition, the direct effect was nonsignificant such that operating an Asian avatar led to no different levels of implicit bias compared to a White avatar after controlling for ethnocultural empathy (b = −0.05), t(144) = −0.49, p = 0.62. In the BST condition, the direct effect was nonsignificant such that operating a Black avatar led to no different levels of implicit bias compared to a White avatar after controlling for ethnocultural empathy (b = −0.14), t(144) = −1.67, p = 0.1. The conditional indirect effect in the AST was not significantly different from zero (ab = 0, 95% bootstrap CI [−0.04, 0.03]). These results do not support Hypothesis 2a. The conditional indirect effect in the BST was not significantly different from zero (ab = 0, 95% bootstrap CI [−0.03, 0.04]). These results do not support Hypothesis 2b. 
The index of moderated mediation was not significant suggesting the indirect effect of avatar race on implicit bias through ethnocultural empathy was no different in the AST condition compared to the BST condition (ab = 0, 95% bootstrap CI [−0.04, 0.06]).

Next, we examined the results from the model of awareness of racism. There was a significant effect of ethnocultural empathy on awareness of racism controlling for avatar race and threat condition, such that higher scores on ethnocultural empathy corresponded to higher scores on awareness of racism (b = 1.17), t(159) = 13.05, p ≤ .01. The direct effect of avatar race was not significantly moderated by threat condition (b = −0.2), t(159) = −0.85, p = 0.4. In the AST condition, the direct effect was nonsignificant such that operating an Asian avatar led to no different levels of awareness of racism compared to a White avatar after controlling for ethnocultural empathy (b = 0.15), t(159) = 0.84, p = 0.4. In the BST condition, the direct effect was nonsignificant such that operating a Black avatar led to no different levels of awareness of racism compared to a White avatar after controlling for ethnocultural empathy (b = −0.05), t(159) = −0.33, p = 0.74. The conditional indirect effect in the AST was not significantly different from zero (ab = 0.04, 95% bootstrap CI [−0.3, 0.35]). These results do not support Hypothesis 2a. The conditional indirect effect in the BST was not significantly different from zero (ab = −0.06, 95% bootstrap CI [−0.38, 0.25]). These results do not support Hypothesis 2b. The index of moderated mediation was not significant suggesting the indirect effect of avatar race on awareness of racism through ethnocultural empathy was no different in the AST condition compared to the BST condition (ab = −0.1, 95% bootstrap CI [−0.55, 0.37]).
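As a rough arithmetic check on the point estimates above, each conditional indirect effect is the product of the effect of avatar race on ethnocultural empathy in that condition (the a path) and the effect of ethnocultural empathy on the outcome (the b path), and the index of moderated mediation is the difference between the two conditional indirect effects. The sketch below approximates the a paths by the rounded cell-mean differences reported for Hypothesis 1a, coded as non-White minus White; that coding and the use of rounded means are assumptions, so the output only approximates the reported values and says nothing about the bootstrap intervals.

```python
# a paths: ethnocultural empathy, non-White avatar mean minus White avatar mean
# (rounded cell means as reported for Hypothesis 1a; coding direction assumed).
A_PATH_AST = 4.55 - 4.52  # Asian vs. White avatars in the AST condition
A_PATH_BST = 4.55 - 4.60  # Black vs. White avatars in the BST condition


def conditional_indirect_effects(b_path):
    """Return (AST effect, BST effect, index of moderated mediation).

    Each conditional indirect effect is a * b; the index is the BST
    effect minus the AST effect (sign convention assumed).
    """
    ast = A_PATH_AST * b_path
    bst = A_PATH_BST * b_path
    return ast, bst, bst - ast


# Awareness of racism model: b = 1.17
ast_ar, bst_ar, index_ar = conditional_indirect_effects(1.17)
print(round(ast_ar, 2), round(bst_ar, 2), round(index_ar, 1))

# Implicit bias model: b = -0.11 (all three products are close to zero)
ast_ib, bst_ib, index_ib = conditional_indirect_effects(-0.11)
print(round(ast_ib, 3), round(bst_ib, 3), round(index_ib, 3))
```

With b = 1.17 the products land near the reported 0.04, −0.06, and −0.1, and with b = −0.11 they are all close to zero, in line with the reported ab = 0.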

Discussion

Overall, our hypotheses were not supported. Operation of an Asian or Black avatar in the AST or BST conditions did not significantly impact levels of ethnocultural empathy, awareness of racism, or implicit bias. In our two previous pilot studies, we asked White participants to operate Black, Asian, and White avatars in social environments in Second Life with other live avatars. In both pilot studies, we found that the fewer interactions participants had while operating an Asian avatar (determined by counting chats made by the participant) was related to greater awareness of racism. Given that participants in these studies interacted with other live avatars, we reasoned that the varying levels of chats made by participants may have been a result of variance in the degree of prejudice participants experienced across study trials; some participants may have been actively excluded from interactions thus resulting in very few chats, others may have been more welcome resulting in greater numbers of chats. This line of reasoning led to the development of the present study in which we attempted to standardize the social interactions as participants interacted with research confederates in scripted social interactions. However, our overall lack of support for our hypotheses suggests that there might simply be too much within-person variance in PVEs (even when the experiences are standardized) to support the claim that embodiment experiences in virtual platforms such as Second Life will likely lead to positive gains in developing students’ awareness of racism. Indeed, one notable result of Tawa’s (2017) qualitative exploration of students’ reflection articles after completing virtual embodiment experiences in Second Life was that students had a tremendous variation in endorsement of the project as facilitating their understanding of racism. 
One explanation for this may be that the perception of increased understanding, as measured in Tawa (2017), does not align with improvements on objective measures, as measured in the present study. This result would align with previous research in interleaved studying (Semani & Pan, 2021). An alternative explanation would be that there are various individual-level factors that may allow embodiment experiences to lead to positive gains in understanding racism. For example, perhaps embodiment experiences only work for participants who have some willingness to fully “play the part” of their assigned avatar and who already have some level of openness to feeling the impact of the stereotyped threat context. Perhaps then, this is not an ideal project for students who have only recently been introduced to social justice education or students who are working through resistance to seeing or acknowledging racism. If these developmental distinctions were to be established empirically, this project may be more ideal for upper-level seminar social justice courses. Identifying and establishing such factors would be important before educators can confidently turn to incorporating embodiment experiences in their social justice curricula.

For the Black stereotyped condition, we did see lower means for implicit anti-Black bias among those who were assigned Black avatars compared to those who were assigned White avatars. While these means were not significantly different from each other (p = .096), they are perhaps worth mentioning here, particularly given that they deviate from Groom et al.’s (2009) study, which found White participants operating a Black avatar to have higher anti-Black implicit bias compared to participants operating a White avatar. Groom et al. reasoned that operating a Black avatar may have activated anti-Black stereotypes. One difference in these study protocols that may have accounted for the differences in findings was that we provided education about stereotyped threat and space for reflection and processing following the social experience in Second Life. At a minimum then, we might suggest that postexperience education and processing can buffer the negative effects of embodiment experiences found by Groom and colleagues; however, this hypothesis should be examined experimentally.

In addition to our primary hypotheses, we conducted a few exploratory analyses. We did find an overall main effect of threat condition on implicit bias scores; participants enrolled in the Black stereotyped threat condition had lower implicit bias scores than those in the AST condition. This finding, however, may reflect overall differences in participants’ Black IAT scores compared to Asian IAT scores and not necessarily the conditions of threat. Thus, we ran some post hoc exploratory analyses to examine correlates of the Black IAT and Asian IAT separately. Both higher levels of ethnocultural empathy and awareness of racism were related to lower levels of anti-Black implicit bias. Neither ethnocultural empathy nor awareness of racism was related to anti-Asian implicit bias.

Thus, our study does make one peripheral contribution related to the development of IAT measurements. For this study, we developed brief versions of both an anti-Asian and anti-Black IAT based on Sriram and Greenwald’s (2009) brief IAT structure. These tests differ from traditional IATs in that the “focal” categories for attribute (words or images) sorting are positioned in the top middle of the screen rather than on the left and right side of the screen and allow the IAT to be completed faster and more efficiently. Brief IATs may be ideal for researchers who are looking to minimize participants’ burden in their studies. Our findings do offer some very preliminary construct validity for these new measures. The overall positive mean d-score of the Asian IAT (d = .33) and Black IAT (d = .20) suggests that our sample is making slight associations between Asian and Disease and also between Black and negative words. Moreover, for the Black IAT, higher scores were related to less ethnocultural empathy and less awareness of racism. From a validity standpoint, these are the directions one would expect for these construct relations if the Black IAT was valid. That ethnocultural empathy and awareness of racism did not relate to the Asian IAT does give us some pause regarding the validity of this particular test. In both cases, more research with both of these measures is needed before they can be deemed valid. Stimuli and Inquisit code for both IATs are available at https://osf.io/3d6ba/.

Methodological Limitations and Future Directions

While our findings overall do not support the positive impact of embodiment experiences, we want to remind readers that this body of research is still relatively young and further research should be undertaken before we abandon the idea. As we suggested earlier, there may be a host of individual-level variables that may impact the effectiveness of embodiment experiences. In addition to uncovering these individual-level variables, researchers should also continue to examine how variations in technologies or applications of these technologies for providing embodiment experiences may also have differing impacts. Our hope for this study was to replicate the positive impact of IVEs for embodiment on prejudice reduction (Banakou et al., 2016; Peck et al., 2013) to more accessible PVEs. One possibility is that PVEs, at least in the way we applied them, are simply not immersive enough to impact prejudice reduction. Yet, perhaps creative educators and researchers can develop protocols for PVEs that retain the accessibility of desktop applications and simultaneously increase the immersion in the embodiment experience. Additionally, it is important to replicate the previous research using IVEs in a preregistered study or ideally as a registered report, as these findings may be influenced by publication bias. In the following, we would like to make transparent a few limitations in our study design; should future researchers attempt a similar study, they may use these reflections to avoid making similar mistakes in their own studies.

First, during this study, we suffered from a severe level of attrition. For example, only about 25% of participants recruited from Prolific who reserved a time on the Google calendar showed up at the study trial. Given that our study design required our confederate RAs to be available online for any time a participant signed up, we believe this 75% no-show rate took a toll on our RAs, impacting morale and occasionally impacting RAs’ own compliance with the protocol. During our recruitment phase, we attempted a number of strategies to minimize attrition, including the following: We posted the study only a couple of days before the actual study times (to reduce the length of time between participant commitment and actual participation); we made explicit warnings in the study ad stating to participants that if they signed up for a study but did not attend, it took away a spot for other participants; we made explicit in the study ad that participation involved downloading the program Second Life and recommended participants do so as soon as they committed to the study. Nonetheless, our attrition rate persisted. We expect that despite our forewarnings, participants who frequent studies on Prolific are used to completing studies immediately and being paid shortly after, and our extended time to completion was atypical for Prolific participants. Our college sample had considerably less attrition (approximately a 50% no-show rate) and might be a better option for future researchers. Future researchers may also consider recruiting within Second Life (or other virtual world platforms), thus obtaining a sample that is more comfortable with the technology and its requirements.

A second possible limitation of our study was how stereotyped threat and implicit bias were operationalized, particularly within the AST condition. Our study was developed at the height of the COVID-19 pandemic and during a time in which anti-Asian hate crimes were on the rise as a result of people blaming Asians for bringing COVID-19 to the United States (Wong-Padoongpatt et al., 2022). In order to evoke stereotyped threat in the Asian condition, we asked group members to develop a community protocol for minimizing the spread of a pandemic virus, during which one confederate used the opportunity to state that U.S. borders should have been closed to Asia. Because this was a relatively new racial dynamic in the United States, we had no previous model for how to evoke this sense of stereotyped threat, and no previous understanding of how enduring or pervasive these stereotypes are, and how deeply internalized they may be among the general public. By contrast, we have seen previous studies successfully model Black stereotype threat using a similar resource competition task as ours (Tawa et al., 2015) and have plenty of empirical support for the endurance and pervasiveness of anti-Black stereotypes related to being perceived as lacking merit (e.g., unintelligence, laziness; Steele, 1997). We would encourage future researchers to consider other ways of operationalizing stereotyped threat related to Asian and disease associations.

A final limitation is that despite our efforts to standardize participants’ experiences across trials, they were not truly standardized. One challenge we faced with the development of the scripts for the confederate avatars was balancing the need for confederates to be able to respond to participants’ chat organically so that it would be convincingly realistic for participants, while providing some structure and scripting so that participants would have some similarity in experience. Our strategy of scripting only the “small talk” at the beginning of the scenarios was a good effort at finding that balance, but nonetheless left considerable time for each trial to vary significantly in text and chat content. Moreover, unexpected technical glitches (e.g., a participant avatar or confederate avatar freezing or being momentarily logged out) would also lead to variation in trial experiences. We would encourage future researchers attempting a similar study design to opt for a simpler design that does not rely on four accounts (the participant’s and three confederates) working seamlessly. Despite our lack of support for our hypotheses, we do hope future researchers will consider our recommendations for future studies and continue to pursue the question of the positive impact of embodiment experiences in virtual environments.

Appendix: Pilot Studies

Study 1

Six racialized avatars were created for this study, including a male and a female version for each race: Black, Asian, and White. Within each gender, all avatars were created identically with the same height, body shape, and clothes. Only their faces and their racial appearance differed. Participants included 22 White college students. Male participants piloted the male avatars, and female participants piloted the female avatars; the Asian, Black, and White avatars were operated for 5 min each in a randomized order. A laminated headshot of the avatar being piloted was placed in the participants’ view directly to the left of the keyboard, so that the participant was constantly aware of how they appeared to others. Participants interacted with live Second Life users in the Blarney Stone Irish Pub, a bar with ethnically Irish themed décor populated by predominantly White appearing avatars. Participants were instructed to socialize as they normally would in a bar setting. A script attached to each avatar recorded the number of other avatars within a 5-m radius. This script triggered at 5-s intervals and was used to construct a social interaction measure, operationalized as the average number of avatars they interacted with at each 5-s interval over the duration of the trial. Upon completion of the virtual component of the study, participants completed survey measures including an awareness of racism measure (Neville et al., 2000). Participants were paid $10 and entered into a drawing for a $200 Amazon.com gift card.
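The social interaction measure described above reduces to an average over periodic snapshots. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to be a list holding one count of nearby avatars per 5-s snapshot, which is not necessarily how the in-world script stored its output.

```python
def expected_snapshots(duration_s=300, interval_s=5):
    """Number of snapshots in one trial (a 5-min trial sampled every 5 s)."""
    return duration_s // interval_s


def social_interaction_score(snapshot_counts):
    """Average number of avatars within range across all snapshots."""
    if not snapshot_counts:
        raise ValueError("at least one snapshot is required")
    return sum(snapshot_counts) / len(snapshot_counts)
```

A 5-min trial sampled every 5 s yields 60 snapshots, so each avatar's score is the mean of roughly 60 proximity counts.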

Hypotheses

Black avatars will have lower social interaction scores than White avatars.
Asian avatars will have lower social interaction scores than White avatars.
Fewer interactions with Black avatars will be related to higher awareness of racism scores.
Fewer interactions with Asian avatars will be related to higher awareness of racism scores.
Greater difference between Black avatar social interaction and White avatar social interaction will be related to higher awareness of racism.
Greater difference between Asian avatar social interaction and White avatar social interaction will be related to higher awareness of racism.

Results

A repeated-measures ANOVA determined that social interaction scores did not differ by the order in which avatars were operated, F(2, 20) = .26, p > .05, η_p² = .01. Two separate repeated-measures ANOVAs were run to compare Asian to White and Black to White social interaction scores. Neither social interactions as a Black avatar, F(1, 21) = 1.55, p > .05, η_p² = .07, nor social interactions as an Asian avatar, F(1, 21) = .02, p > .05, η_p² = .001, were significantly different from social interactions as a White avatar. Social interactions as Asian (r = −.36, p = .09), Black (r = −.21, p = .35), and White avatars (r = .02, p = .91) were unrelated to awareness of racism. Two difference scores were created by subtracting Asian interaction scores from White interaction scores (WA) and then by subtracting Black interaction scores from White interaction scores (WB). WB was unrelated to awareness of racism (r = .14, p = .54). WA was positively related to awareness of racism (r = .43, p = .04). When White avatars had more interactions than Asian avatars, participants reported higher levels of awareness of racism.
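The difference scores and their correlation with awareness of racism can be sketched as follows. This is a minimal illustration with hypothetical function names; it computes a plain Pearson correlation and does not reproduce the significance tests reported above.

```python
from math import sqrt


def difference_scores(white_scores, other_scores):
    """White minus other-avatar interaction score, one value per participant."""
    return [w - o for w, o in zip(white_scores, other_scores)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A positive correlation between WA and awareness of racism means that participants whose White avatar drew more interaction than their Asian avatar tended to report higher awareness.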

Study 2

In our second study, we expanded our sample to include both White (n = 30) and Asian (n = 40) participants (all female), increased the duration of time in which they operated each avatar (10 min), and included an additional outcome variable of interest: empathy toward people of a different race (i.e., ethnocultural empathy; Wang et al., 2003). In order to allow for longer avatar duration without increasing participant burden, each participant was randomly assigned their first avatar and then operated a same-race avatar as their second (i.e., a White participant operated a White avatar). In this study, participants’ social interactions occurred in “Frank’s Jazz Club,” also a predominantly White environment. As in Study 1, social interaction data were collected with our script at 5-s intervals. After completing both avatar trials, participants completed survey measures including an awareness of racism measure (Neville et al., 2000) and ethnocultural empathy (Wang et al., 2003). In addition, in this study, participants were given an open-ended dialogue box to reflect on their experience in Second Life, to encourage postexperience processing. Participants were paid $10.

Hypotheses

Black and Asian avatars will have lower social interaction scores than White avatars.
The relationship between first avatar (i.e., randomly assigned) social interaction scores and awareness of racism will depend on first avatar assignment and participant race, such that:
   a. Among Asian participants assigned a Black avatar, lower social interaction with their Black avatar will be related to higher awareness of racism.
   b. Among White participants assigned an Asian or Black avatar, lower social interaction with their first avatar will be related to higher awareness of racism.
The relationship between first avatar (i.e., randomly assigned) social interaction scores and ethnocultural empathy will depend on first avatar assignment and participant race, such that:
   a. Among Asian participants assigned a Black avatar, lower social interaction with their Black avatar will be related to higher ethnocultural empathy.
   b. Among White participants assigned a Black avatar, lower social interaction with their first avatar will be related to higher ethnocultural empathy.

Results

An ANOVA determined that neither participant race, F(1, 58) = 1.64, p > .05, η_p² = .03; first avatar race, F(2, 58) = .45, p > .05, η_p² = .02; nor the interaction of participant race and first avatar race, F(2, 58) = .18, p > .05, η_p² = .01, was related to social interaction scores with the first avatar. First avatar interaction scores were unrelated to awareness of racism (r = .01, p > .10) and ethnocultural empathy (r = .06, p > .10). A two-way interaction examined how participant race and first avatar race impacted the relationship between first avatar social interaction and awareness of racism. The interaction between participant race and first avatar interaction for awareness of racism was marginal, t(1, 58) = −1.77, p = .08; among White participants who used an Asian avatar, less social interaction with the Asian avatar was marginally related (t = −1.92, p = .06) to more awareness of racism. The analysis was repeated for the outcome of ethnocultural empathy. The interaction between participant race and first avatar interaction scores on ethnocultural empathy was significant, t(1, 58) = −2.01, p < .05; among Asian participants who used a Black avatar first, more social interaction with the Black avatar was marginally related (t = 1.93, p = .06) to more ethnocultural empathy. Also, among Asian participants only who used a White avatar first, more social interaction with the White avatar was related (t = 2.58, p = .01) to more ethnocultural empathy.
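
As a reading aid for the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three F tests in this paragraph. The function name is hypothetical, and the snippet only illustrates the arithmetic; it is not the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F tests reported in the Study 2 results: (label, F, df_effect, df_error)
reported_tests = [
    ("participant race", 1.64, 1, 58),
    ("first avatar race", 0.45, 2, 58),
    ("participant race x first avatar race", 0.18, 2, 58),
]

effect_sizes = {
    label: round(partial_eta_squared(f, df1, df2), 2)
    for label, f, df1, df2 in reported_tests
}

for label, eta in effect_sizes.items():
    print(f"{label}: partial eta squared = {eta:.2f}")
```

Rounded to two decimals, these values agree with the effect sizes reported for the three tests (.03, .02, and .01).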

Copyright © 2023 The Author(s)

Received June 2, 2020
Revision received September 26, 2022
Accepted September 28, 2022
