Volume 3, Issue 3: Fall 2022. DOI: 10.1037/tmb0000076
Can a virtual reality (VR) simulation promote acquisition of scientific skills with real-life practicability? To answer this question, we conducted (I) an online study (N = 126) and (II) a field study at a high school (N = 47). Study I focused on the instructional design of VR by comparing the effects of different pedagogical agents on acquiring pipetting skills. We found no significant differences between the conditions, that is, it did not seem to make a difference whether the pedagogical agent was present or not, or if it demonstrated the procedure or not. Study II focused on transfer of skills learned in VR to real life with the addition of a control group who were taught by a real-life instructor. The results indicated that performance in VR can predict performance on a real-life transfer test. However, comparisons between the two groups showed that the students who received virtual training made more errors, experienced more extraneous cognitive load, and learned less compared to the students who were taught by the real-life instructor. Across both studies, all students experienced an increase in self-efficacy from prior to after the intervention, although the students taught by the real-life instructor experienced the largest increases in Study II. Hence, VR should not replace traditional ways of teaching scientific procedures. Rather, it can be a complement to traditional teaching that can increase accessibility.
Keywords: virtual reality, pedagogical agents, learning, transfer, science education
Special Collection Editors: Jeremy N. Bailenson and Richard E. Mayer.
Action Editor: Richard Mayer was the action editor for this article.
Acknowledgments: The authors would like to thank Michael Atchapero for leading the development of the software and Rokoko for lending us their equipment to create the animations. The authors would also like to thank Per Størup Lauridsen and Claus Scheuer-Larsen for their involvement in the project, as well as Elisabeth Sejbak, Adéla Plechatá, and Zuzanna Bald for their efforts in Study II. Lastly, the authors would like to thank all the people who participated in the study.
Disclosures: The authors have no conflicts of interest to disclose.
Data Availability: The data that support the findings of this study are available via Open Science Framework (Klingenberg & Petersen, 2022; Petersen & Klingenberg, 2022). Study I: https://osf.io/ruxab/ and Study II: https://osf.io/e87fp/.
Correspondence concerning this article should be addressed to Gustav Bøg Petersen, Department of Psychology, University of Copenhagen, Øster Farimagsgade 2A, 1353 Copenhagen, Denmark. Email: email@example.com
Consider learning about scientific procedures in a virtual environment with no limit on the number of attempts and no hazardous consequences if you make a mistake. Now add to that your own private virtual instructor who can demonstrate correct performance and assist you if you require help. This research article describes two experiments on learning how to use a pipette in virtual reality (VR). It investigates the following research questions: (1) Do virtual instructors (hereinafter referred to as pedagogical agents) influence the acquisition of pipetting skills in VR? (2) Do pipetting skills learned in VR transfer to real life?
In the following sections, we present the theoretical and empirical background to this investigation. First, we provide an overview of the latest research on the usefulness of VR in education. This is followed by a section on simulation-based learning and transfer. Finally, we provide an introduction to pedagogical agents and two underlying learning theories that can describe their effectiveness: social cognitive theory and cognitive load theory.
VR technologies are currently used across a variety of disciplines for training and instructional purposes (Radianti et al., 2020). In general, simulation-based learning in VR is associated with large positive effects on learning outcomes, as indicated by a recent meta-analysis, Hedges' g = 0.85 (Chernikova et al., 2020). Another meta-analysis suggests that learning through immersive VR is more effective than learning through nonimmersive approaches, with a small positive effect, Hedges' g = 0.24 (Wu et al., 2020). Focusing on extended reality (XR) training experiences (i.e., training via VR, augmented reality, or mixed reality), Kaplan et al. (2021) performed a meta-analysis and concluded that XR is as effective as traditional training approaches in terms of performance in the real world. This is valuable to VR training, as it indicates that we can profit from known VR affordances, such as practicing tasks that are expensive, dangerous, or risky (Dalgarno & Lee, 2010), without a cost to performance outcomes. Yet, research is still needed to determine exactly when, how, and where the implementation of VR simulations benefits learning (Parong & Mayer, 2021).
Recent reviews of VR applications for education indicate that VR is especially useful for teaching procedural–practical knowledge (Jensen & Konradsen, 2018; Radianti et al., 2020) and for immediate transfer of behavioral skills (Checa & Bustillo, 2020). One explanation for this is that VR allows for contextualized learning experiences that facilitate transfer of skills to real-life situations (Dalgarno & Lee, 2010; Di Natale et al., 2020). A systematic review by Concannon et al. (2019) recommends using VR in postsecondary school education for experiential learning. This is consistent with the notion presented in Bailenson et al. (2008) that one of the primary learning affordances of three-dimensional learning environments is to facilitate experiential learning, especially of tasks that are too expensive or dangerous to perform in the real world. This makes VR an obvious choice for teaching scientific procedures, such as the skill of pipetting, which is the focus of the present study.
It can be argued that most simulation-based learning aspires to facilitate transfer—that is, the ability to apply what one has learned in different contexts (Bossard et al., 2008). The notion of transfer of training is particularly relevant for the present investigation, as it is measured by performance of a target task (Bossard et al., 2008).
Virtual simulations can facilitate transfer (Makransky, Borre-Gude, et al., 2019). Nevertheless, one relevant question is how identical the simulated behavior has to be to the actual target behavior to enable transfer of training, especially when dealing with a technique such as pipetting that requires dexterity. According to Levac et al. (2019), the fidelity of virtual environments can enhance the extent of transfer. Fidelity refers to “the precision with which a virtual environment imitates interactions in the natural environment” (Levac et al., 2019, p. 11). Hochmitz and Yuviler-Gavish (2011) distinguish between two types of fidelity in simulations for procedural skills acquisition: physical fidelity and cognitive fidelity. Physical fidelity refers to the physical similarity between the simulator and the real world, whereas cognitive fidelity refers to similarity of cognitive activities (Hochmitz & Yuviler-Gavish, 2011). An ideal VR system (Slater, 2009) could provide extremely high levels of both physical and cognitive fidelity by means of visual, auditory, and haptic displays (and possibly also taste and smell displays), and thereby provide ideal transfer conditions. Unfortunately, most consumer VR systems today, such as the Oculus Quest used in the present investigation, mostly cater to the visual and auditory senses. Study II examines transfer of pipetting skills learned in VR to a real-life task.
In VR where head-mounted displays may shield learners from the real world, real-life adult or peer models may be replaced by virtual instructors. Virtual instructors used for pedagogical reasons are often referred to as pedagogical agents: human-like characters used in electronic learning environments to serve various instructional goals (Veletsianos et al., 2009). It is commonly accepted that a pedagogical agent needs to be physically present on screen as opposed to being represented by just a voice (Heidig & Clarebout, 2011). However, current evidence indicates that pedagogical agents need to display high embodiment (as opposed to being a static image) to improve learning (Wang et al., 2021). Study I examines the influence of pedagogical agents on the acquisition of procedural knowledge and skills in VR. What follows is an overview of central theoretical perspectives on using pedagogical agents for virtual learning.
Social agency theory (SAT) is one of the most influential theories related to pedagogical agents. As an addition to the cognitive theory of multimedia learning, SAT represents the idea that social cues in multimedia lessons can lead to better learning outcomes by inciting social presence (Mayer, 2014). Social presence can be defined as a psychological state where virtual social actors are experienced as actual social actors (Lee, 2004). According to SAT, pedagogical agents can function as a social cue in virtual lessons that enhance learning by activating basic human rules of cooperation; in other words, the learner will expend energy to understand the agent’s message (Mayer, 2014). This phenomenon was coined the media equation by Reeves and Nass (1996). A recently published meta-analysis finds that learning with pedagogical agents is more effective than learning without them, thereby corroborating SAT (Castro-Alonso et al., 2021). Furthermore, a recent study finds that pedagogical agents lead to enhanced brain activity in the social areas of the brain, thereby providing neuroscientific support for SAT (Li et al., 2022).
Ever since the dawn of pedagogical agents as a research topic, scholars have been interested in agent design—that is, how they should look and function (Heidig & Clarebout, 2011). McDonnell and Mutlu (2021) recently wrote a book chapter on the appearance of agents. Here, they claim that most agents follow what can be called a metaphoric design, meaning that the design is inspired by an existing entity such as a person in a particular profession (McDonnell & Mutlu, 2021). Metaphoric design involves two dimensions, appearance and behavior, and its power lies in triggering users’ mental models of an agent’s abilities (McDonnell & Mutlu, 2021). In terms of pedagogical agents, it follows that an agent with an expert-like appearance will prepare the learner for receiving information (Baylor & Kim, 2005).
VR constitutes an ideal way of harnessing the educational powers of pedagogical agents. By immersing learners in virtual environments with realistic and life-size agents, it is possible to maximize the feeling of interacting with an actual human being. Indeed, the closer an agent is to human scale, the higher the chance that it will support human communication mechanisms (McDonnell & Mutlu, 2021). Makransky, Wismer, et al. (2019) examined the effect of delivering laboratory safety instruction in VR via two pedagogical agents: a robot-like drone and a young female scientist. They demonstrated that boys learned better with the drone, and that girls learned better with the female scientist. Petersen et al. (2021) investigated the effect of differently designed pedagogical agents on the acquisition of knowledge about viral diseases in VR. Contrary to what was expected, it was reported that learning with agents led to lower factual knowledge acquisition compared to learning without one. This finding indicates that agents may also have adverse effects on learning, supposedly due to the limitations of the human cognitive system.
There are many roles that pedagogical agents can assume during virtual lessons. Baylor and Kim (2005) designed three different agents who functioned as either expert, motivator, or mentor and found that each role impacted learning differently. Taking a slightly different approach, Schroeder and Gotch (2015) defined four roles that agents can assume: demonstrating/modeling, coaching/scaffolding, being a source of information, and testing. Of particular relevance to the topic of this study is the role of demonstrating/modeling, where the agent physically demonstrates how to successfully accomplish a certain task (Schroeder & Gotch, 2015). Although several studies of agents gesturing have been conducted (Davis, 2018), less is known about agents demonstrating how to carry out a procedure (Schroeder & Gotch, 2015). The potential instructive benefits of agents demonstrating tasks involving human movements have also been discussed under the umbrella term dynamic visualizations in the literature and linked to the activation of the mirror neuron system (van Gog et al., 2009). In essence, the mirror neuron system is the mirroring capacity of the human motor system, which is activated by observing motor actions performed by other people (van Gog et al., 2009). According to van Gog et al. (2009, p. 23), “the mirror neuron system mediates imitation, by priming (i.e., preparing the brain for) execution of the same action.” This system has been shown to respond to both human and robotic body parts (Gazzola et al., 2007). A meta-analysis by Höffler and Leutner (2007) examined the influence of dynamic versus static visualizations on learning outcomes. Overall, their results indicated an advantage of dynamic visualizations over static visualizations (d = 0.37). Furthermore, moderator analyses indicated an even stronger advantage of dynamic visualizations (d = 1.06) when procedural-motor knowledge was considered.
In their book chapter concerning design principles for virtual humans in educational technology environments, Craig and Schroeder (2018) introduce a type of agent-based learning environment which draws on similar principles of learning by observation described above: vicarious learning environments. In these types of environments, the learner observes a virtual student interacting with a mentor agent to learn a particular skill, displaying both learning as well as the metacognitive skills required for learning (Craig & Schroeder, 2018).
Learning by observing pedagogical agents can be understood theoretically using social cognitive theory. In the following, social cognitive theory is introduced along with another relevant theoretical perspective: cognitive load theory.
Learning skills and procedures does not need to happen in psychological isolation. According to social cognitive theory (formerly known as social learning theory), there are numerous situations where people observe others directly and integrate the obtained knowledge into their long-term memory (Bandura, 1999). Bandura proposes that humans have evolved the capacity for observational learning as a way of adapting to the constraints of time, resources, and mobility that can limit acquisition of new information. Thus, learning from information conveyed by models in our surroundings is an essential component of knowledge acquisition (Bandura, 2008).
It is commonly accepted that there are four component processes to observational learning: attention, retention, production, and motivation (Bandura, 1999; Schunk, 2012). Attentional processes determine what people observe and how they perceive it. They are affected by task features (e.g., colors, oversized, and interactive features) as well as the learner’s beliefs about the value of the observed behavior (Schunk, 2012). Representational processes, also referred to as retention, concern the cognitive processes that are involved when integrating the observed behavior with knowledge in long-term memory (Schunk, 2012). Production processes involve generating new behaviors based on the stored information (Schunk, 2012). Finally, motivational processes determine which of the observed behaviors are reproduced (Schunk, 2012).
Bandura (1977) also coined the term self-efficacy, which can be defined as learners’ belief about their capabilities to perform certain actions or behaviors. According to Bandura (1977), high levels of self-efficacy can positively influence performance outcomes. Bandura (1977) highlights four sources of self-efficacy: personal performance accomplishments, vicarious experiences, verbal persuasion, and emotional arousal. In a virtual environment such as VR, the learner can gain personal performance accomplishments through mastery experiences in the virtual environment (Makransky & Petersen, 2021). Vicarious influences can be achieved through observing a virtual instructor perform the desired behavior. Verbal persuasion can be achieved through encouraging comments from the instructor; and lastly, the technological affordances of the system can create a feeling of presence which may have a positive effect on emotional arousal (Parong & Mayer, 2021). To conclude, social cognitive theory and Bandura’s (1977) notion of self-efficacy highlight the importance of the social aspects of learning experiences. In a virtual environment, this is typically facilitated by the use of pedagogical agents.
While the sections above highlight the potential advantages of using agents for learning, an opposing view related to the limitations of the human cognitive system also exists. van Gog and Rummel (2010) discussed agents in their review of example-based learning and noted the risk that learners could get distracted by irrelevant details of the agent rather than focusing on the task. Such cautions can be related to cognitive load theory, which is the focus of the following section.
Cognitive load theory is based on our knowledge of human cognition. It predicts that learning is impaired when the cognitive load required to process the learning material exceeds working memory capacity (Sweller, 1994, 2011). Learners experiencing cognitive overload might lose sight of the learning objectives and have trouble gaining deep knowledge of the learning material (Schnotz & Heiß, 2009). With this in mind, cognitive load theory aims to generate optimal instructional design that fosters learning by taking the human cognitive architecture into account. Several media comparison studies have linked poorer learning outcomes when learning via VR to heightened levels of extraneous cognitive load (Makransky, Terkildsen, et al., 2019; Parong & Mayer, 2018, 2021). Hence, cognitive load is an important factor to consider when designing VR for education.
According to Sweller (2010a, 2011), cognitive load is a multifaceted aspect that can be divided into intrinsic, extraneous, and germane load. Intrinsic cognitive load is influenced by the nature of the learning material and the expertise of the learner (Sweller, 2011). Therefore, intrinsic load cannot be changed if the learning material is unaltered and the level of expertise remains the same. Extraneous cognitive load is dependent on how the learning material is presented—that is, the instructional design (Sweller, 2010b; van Merrienboer & Sweller, 2005). Therefore, extraneous load is malleable and should always be reduced. Researchers have previously argued that pedagogical agents in multimedia learning environments might cause extraneous cognitive load and therefore decrease learning (Clark & Choi, 2007). These adverse effects could be particularly pronounced in VR, which often involves a lot of additional visual material and a new interaction system. Germane cognitive load refers to the working memory resources devoted to dealing with essential learning material (Sweller, 2010b). The aforementioned SAT predicts that pedagogical agents can promote generative processing (analogous to germane cognitive load) in the learner (Mayer, 2014).
The theory and research cited above were used to formulate the hypotheses examined in this two-study investigation, which was conducted in compliance with an institutional review board. The following section introduces Study I.
Study I focuses on the influence of pedagogical agents on skills acquisition in VR, with the specific skill in question being the scientific procedure of pipetting. The online format was chosen because it allows data to be collected from a large number of participants in different countries in a short amount of time without jeopardizing the validity of the conclusions drawn (Mottelson et al., 2021).
Based on social cognitive theory, we expected that observing an agent demonstrating pipetting would lead to better pipetting performance compared to hearing about it from a passive agent. To control for the conflicting perspectives regarding pedagogical agents as facilitators of learning (SAT) or merely unnecessary distractions (cognitive load theory), two types of demonstrating agents were employed: one being a full-bodied agent and the other being merely a hand.
Hypothesis 1: Observing an agent demonstrating pipetting will lead to better pipetting performance compared to hearing about it from a passive agent (Study I).
An open research question dealt with potential differences between the two types of demonstrating agents. Based on cognitive load theory, a hand demonstrating pipetting could lead to better pipetting performance compared to a fully visualized pedagogical agent due to lower levels of extraneous processing. However, the reverse scenario could be predicted by SAT, as social presence (caused by the presence of a virtual human) could lead to deeper cognitive processing and better learning outcomes.
Study I also hypothesized that a virtual pipetting lesson would lead to enhanced self-efficacy due to possibilities for personal mastery experiences.
Hypothesis 2: There will be an increase in self-efficacy from pre- to posttest for all participants (Study I).
The hypotheses were preregistered via Open Science Framework (https://osf.io/8udsb).
Guided by recommendations on conducting unsupervised VR studies online (Mottelson et al., 2021), a total of 140 participants took part in the experiment from their own homes. Out of these, 10 participants were discarded due to having used the same Oculus Quest device, and three participants were discarded due to incomplete data. Data were considered incomplete when participants immediately indicated that they were done when entering the transfer test, that is, they did not demonstrate any procedure or skill that they were taught during the tutorial. Although this could imply that they did not learn anything during the tutorial, it could also signify that the participants were only interested in receiving the promised gift card. Furthermore, data from one participant were excluded based on unusually slow completion time in the transfer test as defined in the preregistration (above 3 × SD + M). This left us with a total of 126 participants whose data were used in the analyses. The next section provides a description of the sample based on a number of parameters assessed in the pretest.
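The completion-time criterion above (above 3 × SD + M) can be illustrated with a minimal sketch. It assumes the sample standard deviation and a cutoff computed on the full set of times including the candidate outlier; the preregistration may specify otherwise. The function name and the completion times are hypothetical and are not the study's data.

```python
from statistics import mean, stdev

def exclude_slow(times, k=3.0):
    """Split completion times at the M + k * SD cutoff.

    Assumption: sample SD (n - 1 denominator), cutoff computed on all times.
    """
    cutoff = mean(times) + k * stdev(times)
    kept = [t for t in times if t <= cutoff]
    excluded = [t for t in times if t > cutoff]
    return kept, excluded, cutoff

# Hypothetical transfer-test completion times in seconds (illustration only).
times = [300.0] * 20 + [2000.0]
kept, excluded, cutoff = exclude_slow(times)

# Participant flow as reported in the text: 140 recruited, 10 shared devices,
# 3 incomplete data sets, 1 slow completion time.
remaining = 140 - 10 - 3 - 1
```

In small samples a single extreme value inflates both M and SD, so a rule of this kind is conservative; that is a general property of the cutoff rather than a statement about this data set.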
Participants were mostly men (107 men, 9 women, 5 nonbinary, and 5 did not wish to answer). Over half of the participants were between 18 and 29 (78); the rest were 30–39 (27), 40–49 (15), 50–59 (5), and 60+ (1). Roughly half of the participants had used VR more than 100 times before (60). The remaining had tried it between 51 and 100 times (33), 21 and 50 times (19), 11 and 20 times (9), 4 and 10 times (4), or 1 and 3 times (1). In other words, the participants were mostly VR expert users, which is natural considering the fact that they had to have their own VR headsets to participate. On the basis of IP address, the participants were located in 35 different countries, the most common being the United States (44), United Kingdom (10), Canada (10), Germany (8), and Poland (5). Participants had widely different educational backgrounds. The largest group reported that their highest completed education was high school or general educational development (GED; 42). The remaining answered bachelor’s (31), associate/2-year college (28), master’s (14), PhD (5), primary/middle school (3), and professional degree (3). Finally, the participants were highly proficient in the English language: most spoke English fluently (81); the rest spoke English very well (28) or well (17). By their own assessment, most of the participants possessed relatively little prior knowledge about the topic of the simulation. When asked to rate their knowledge of using a micropipette, the majority answered very low (38), low (47), or moderate (31). This was also the case in terms of their knowledge of conducting a serial dilution: very low (67), low (35), and moderate (19).
The participants were randomly assigned to receive instruction from an active instructor (N = 47), a passive instructor (N = 41), or a hand only (N = 38). We note that the number of participants in each condition was slightly uneven due to our initial discarding of certain participants based on the mentioned exclusion criteria.
Participants answered the questionnaires in-game. The items and their sources are available in the codebook accompanying this investigation, which can be accessed via the link provided under the Data Availability section.
The pretest assessed basic demographic information about the participants, including age, gender, VR experience, highest completed education, English proficiency, and level of previously completed science courses. It also collected self-reported knowledge of using a micropipette and conducting a serial dilution. The self-report format was used to avoid potential pretesting effects (Hartley, 1973). Furthermore, a series of items assessed participants’ general interest in and experience with science. These pretest items were adapted from Parong and Mayer (2018) and amounted to a total of 18 items. Finally, a measure of their self-efficacy for pipetting was collected. The self-efficacy scale was adapted from Pintrich (1991) and consisted of four items.
We collected data on three pipetting performance indices during a virtual transfer test: dexterity with the pipette, safety behavior, and accuracy of a serial dilution. Dexterity was operationalized as number of mistakes in five different areas: (a) tilting the pipette, (b) aspirating after placing the pipette in liquid, (c) not using the second stop on the plunger when dispensing liquid, (d) using a contaminated tip, and (e) mixing the original beakers with other liquid. Correct safety behavior was operationalized as remembering to put the laboratory coat and gloves on before beginning the transfer test. Accuracy of the serial dilution was assessed by comparing the participants’ results in the transfer test to the correct values of the serial dilution. The performance measures corresponded to standard performance measures when using a real-life pipette.
The posttest consisted of five different measures. A series of 15 true/false and multiple-choice questions assessed declarative knowledge about the learning material. Additionally, a sorting task required the participant to sort the correct order of steps associated with transferring a liquid from one container to another. These were designed in cooperation with an expert in psychometrics and a content matter expert. Additionally, the participants’ self-efficacy was assessed again, using the same items as in the pretest. Measures of social presence and extraneous cognitive load were also incorporated in order to explain potential differences between the conditions. The social presence scale consisted of five items from Makransky et al. (2017). The extraneous cognitive load scale was related to the instructions used and consisted of three items from Andersen and Makransky (2021). Finally, scores on four uncanny valley indices were collected, as is advised when studying artificial human beings. These were eeriness, warmth, attractiveness, and humanness (Ho & MacDorman, 2017). The uncanny valley indices consisted of a total of 18 items from Ho and MacDorman (2010, 2017). Recent research finds that the uncanny valley effect is particularly pronounced in VR (Hepperle et al., 2022), which necessitates measuring it.
The simulation employed in the present study was designed to teach the scientific procedure of pipetting, a technique commonly used in science where a pipette is used to measure and transfer specific volumes of liquid. The virtual environment was developed using Unity 2021 and targeted Oculus Quest 1 and 2 (immersive VR) devices. The environment was modeled after an authentic laboratory and featured animated three-dimensional models related to the topic of pipetting; these three-dimensional models were found on the Unity Asset Store (see Figure 1).
The simulation is divided into two sections. The first section functions as a practical tutorial that teaches laboratory safety procedures, which includes wearing a lab coat and gloves, and introduces various steps in pipetting. These include holding the pipette at a correct angle, choosing the correct dial setting on the pipette, correctly drawing up and dispensing liquid, attaching and discarding a pipette tip, and handling chemical waste. During the tutorial, the learners are required to attempt these procedures in VR, using their handheld controllers. The tutorial is identical to a standard, real-life, introductory pipetting session for adults or young adults. Furthermore, the virtual pipette is modeled after how a real pipette functions. This includes elements such as a plunger with two stops and a dial function that enables presetting the volume of microliters. Guided by the signaling principle (van Gog, 2014), the environment featured cues that directed the learners’ attention toward relevant objects during learning. Please see Figure 2 for an illustration of how the Oculus Quest controller and a real pipette are used for aspirating liquid (a demonstration video can be accessed via the following link: https://vimeo.com/672240481).
The second section of the simulation functions as a transfer test in which learners are prompted to attempt a serial dilution task in the laboratory without help or assistance from the virtual instructor. The instructor merely encourages the learners to use their newly acquired skills to conduct the serial dilution task by following instructions on the computer monitor. At this point, a schematic diagram representing the serial dilution task is displayed on the computer monitor. This diagram is equivalent to illustrations in standard laboratory manuals used for teaching purposes.
Instructions on pipetting techniques were provided by a specific pedagogical agent depending on the condition (see Figure 3). However, the narration was identical for all three conditions. In two of the conditions, the pedagogical agent looked like a female laboratory teacher featuring eye contact, idle animations, as well as speech and lip synchronization. To investigate the impact of learning through observing an agent demonstrating procedures, the instructor in one of these conditions used gesturing, movement by walking and demonstration of pipetting technique, whereas the instructor in the other condition remained statically in one position during the entire simulation. In the third condition, only a virtual hand was depicted, which demonstrated the procedures in pipetting. In the development of the active condition, Smartsuit Pro and Smartgloves from an animation company, Rokoko, were used to animate the instructor. A female voice actor with a British accent recorded the manuscript, which is identical in all three conditions.
The study was initiated on August 5, 2021, and terminated on September 5, 2021. Thus, it ran for the duration of 1 month. Recruitment was done via online communities such as Reddit, Facebook, and Twitter. Here, we advertised the experiment, briefly explaining that it was an educational VR study for participants above the age of 18, designed for Oculus Quest. If interested, the participants were directed to an official app listing on the open app store SideQuest. The app listing described the topic of the simulation and provided screenshots; it also linked to a consent form, which participants were required to read and understand. To start the experiment, participants were instructed to download the app and launch it. Following the in-game pretest, the virtual lesson proceeded as a tutorial on pipetting with an inbuilt transfer test. Condition was assigned randomly at run time on the device. Hereafter, an in-game posttest collected various learning experience measures. Finally, participants voluntarily entered their email to receive a gift card worth 10 USD. It was not possible to skip parts of the simulation. The mean duration of the simulation was 906.4 s or approximately 15 min (9 min in the tutorial and 6 min in the transfer test on average).
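Random assignment at run time can be sketched as follows. This is an illustration in Python, not the application's Unity code; the condition labels and function names are hypothetical, and simple (unrestricted) randomization is an assumption, since the text does not state which randomization scheme was used.

```python
import random
from collections import Counter

# Hypothetical labels for the three instructional conditions.
CONDITIONS = ("active_agent", "passive_agent", "hand_only")

def assign_condition(rng):
    """Draw one condition with equal probability (simple randomization)."""
    return rng.choice(CONDITIONS)

def simulate_enrollment(n_participants, seed):
    """Count how many simulated participants land in each condition."""
    rng = random.Random(seed)
    return Counter(assign_condition(rng) for _ in range(n_participants))

counts = simulate_enrollment(140, seed=1)
```

Under simple randomization, each device draws independently, so group sizes are not guaranteed to be equal even before any exclusions are applied.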
Statistical analyses were performed in R Version 4.0.5. A p value less than .05 was considered statistically significant.
A one-way analysis of variance (ANOVA) indicated no significant differences between groups in pretest levels of self-efficacy: F(2, 123) = 0.58, p = .564. Further, Kruskal–Wallis tests indicated no significant differences between groups in self-reported knowledge of using a micropipette, H(2) = 0.97, p = .6147, and conducting a serial dilution, H(2) = 2.85, p = .2405; general interest in and experience with science, H(2) = 1.90, p = .3866; and level of previously completed science courses, H(2) = 3.03, p = .2193.
As a manipulation check, the time spent looking at the instructor (in seconds) was compared between the two agent conditions (passive vs. active): H(1) = 4.50, p = .03394. Students in the active condition (Mdn = 72 s) looked at the instructor significantly longer than students in the passive condition (Mdn = 55 s). This indicates that learners with a demonstrating agent were more observant of its demeanor.
As a measure of performance on the transfer test, we used an aggregated measure of the percentage of correct solutions across test tubes A, B, C, and D, which was divided into three levels. Thus, participants’ performances on the transfer test could either be low, medium, or high. A chi-square test showed no significant differences: χ2(4, N = 126) = 5.46, p = .2433.
As a measure of dexterity, we looked at the number of mistakes in five different areas: (a) tilting the pipette, (b) aspirating after placing the pipette in liquid, (c) not using the second stop on the plunger while dispensing liquid, (d) using a contaminated tip, and (e) mixing the original beakers with other liquid. We allowed a maximum of five mistakes for areas (a), (c), and (d). Due to a large number of mistakes in (b), we allowed a maximum of 20 mistakes. As close to zero mistakes were observed for (e), this measure was omitted from the following analysis. According to Shapiro–Wilk tests for normality, none of the dexterity measures were normally distributed. Thus, Kruskal–Wallis tests were used to test for differences between groups. Bar plots can be viewed in Figure 4.
The results showed no significant differences between groups with regard to tilting the pipette, H(2) = 0.83, p = .6593. The same was observed with regard to aspirating after placing the pipette in liquid, H(2) = 2.07, p = .3555; as well as neglecting to use the second stop on the plunger when dispensing, H(2) = 1.47, p = .4797; and using a contaminated tip, H(2) = 0.62, p = .7339.
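The following is a minimal sketch of how a Kruskal–Wallis H statistic of the kind reported above can be computed, and of how the reported p values follow from H. It assumes the usual chi-square approximation with k − 1 degrees of freedom (here 2, for three conditions); the function names and the toy data are illustrative and are not taken from the authors' analysis scripts. For 2 degrees of freedom, the chi-square survival function reduces to exp(−H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        rank_term += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


def p_value_df2(h):
    """Chi-square survival function for 2 degrees of freedom."""
    return math.exp(-h / 2)


# Hypothetical error counts for three conditions (not the study data).
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_h = kruskal_wallis_h(toy_groups)  # about 7.2

# Reported H values for the four dexterity measures (df = 2).
for h in (0.83, 2.07, 1.47, 0.62):
    print(h, round(p_value_df2(h), 3))
```

Because the reported H values are rounded to two decimals, the recomputed p values agree with the reported ones only to about the third decimal place.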
Safety was measured as remembering to put both coat and gloves on in that order. A chi-square test showed no significant differences: χ2(2, N = 126) = 1.66, p = .437.
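As an illustration of the chi-square tests reported for transfer test performance and safety, the sketch below computes a Pearson chi-square statistic for a contingency table and converts a statistic into a p value. The 3 × 2 table is hypothetical (the cell counts per condition are not given in the text), and the helper names are assumptions. The degrees of freedom follow (rows − 1) × (columns − 1): 2 for three conditions by two safety outcomes, and 4 for three conditions by three performance levels. For even degrees of freedom the chi-square survival function has a closed form, which is used here.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Chi-square survival function, valid only for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires even, positive df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical counts: rows = conditions, columns = (correct, incorrect).
toy_table = [[30, 12], [26, 16], [28, 14]]
toy_stat, toy_df = chi_square_independence(toy_table)
toy_p = chi2_sf_even_df(toy_stat, toy_df)

# p values recomputed from the reported statistics.
p_safety = chi2_sf_even_df(1.66, 2)    # reported p = .437
p_transfer = chi2_sf_even_df(5.46, 4)  # reported p = .2433
```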
To summarize, Hypothesis 1 (H1) stated that an agent demonstrating pipetting would lead to better pipetting performance compared to hearing about it from a passive agent. The results indicated no significant differences between the conditions on any of the performance parameters. Thus, we failed to reject the null hypothesis.
A paired-samples t test showed a significant increase in self-efficacy for all participants from pretest (M = 3.03, SD = 0.89) to posttest (M = 3.87, SD = 0.73), t(125) = 10.22, p < .0001. Consequently, Hypothesis 2 (H2), which stated that there would be an increase in self-efficacy from pre- to posttest for all participants, was accepted.
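For readers less familiar with the paired-samples t test, the sketch below shows the computation on hypothetical pre- and posttest scores: the statistic is the mean of the within-person differences divided by its standard error, with N − 1 degrees of freedom (hence t(125) for N = 126). The function name and the five data points are illustrative only.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical self-efficacy ratings for five participants.
pre_scores = [3, 2, 4, 3, 3]
post_scores = [4, 3, 4, 4, 5]
t_stat, df = paired_t(pre_scores, post_scores)
print(round(t_stat, 2), df)  # 3.16 4
```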
In addition to hypothesis testing as described above, we performed a number of exploratory analyses. We report the results of these in the following.
A Kruskal–Wallis test indicated that there were no significant differences between conditions in overall declarative knowledge score: H(2) = 3.20, p = .2018. Note that the overall median was 13 out of a possible 16 points. In other words, knowledge scores were relatively high across groups.
A one-way ANOVA indicated that the groups did not differ in their levels of social presence, F(2, 123) = 0.60, p = .553. The mean social presence score was 2.91 across groups (SD = 0.91).
A Kruskal–Wallis test indicated that there were no significant differences between conditions in extraneous cognitive load, H(2) = 0.53, p = .7675. The median extraneous cognitive load score was 2.
Regarding pre- to posttest increases in self-efficacy for the three groups, we ran a mixed ANOVA which turned out nonsignificant for the interaction between time and condition: F(2, 123) = 0.18, p = .835. Hence, the increase in self-efficacy did not differ significantly between the three groups. Furthermore, Spearman’s rank-order correlations suggested that posttest self-efficacy correlated significantly with total dexterity errors on the virtual transfer test: r(124) = −0.25, p = .004; this was also true for knowledge: r(124) = 0.28, p = .001.
Finally, in terms of the uncanny valley indices, Kruskal–Wallis tests indicated no significant differences between groups on any of the measures: eeriness, H(2) = 4.48, p = .1065; attractiveness, H(2) = 1.01, p = .6045; humanness, H(2) = 0.68, p = .7102; and warmth, H(2) = 0.72, p = .6971. Median scores across groups were 2.25 for eeriness, 3.5 for attractiveness, 2.6 for humanness, and 3.8 for warmth.
Over 20 years ago when pedagogical agents were a novel paradigm for interactive learning, they were envisaged as an entity that could make learning more engaging by way of human-like interaction, and thereby improve learning (Johnson & Lester, 2018). In the context of a VR lesson on how to use a pipette, we investigated the impact of a pedagogical agent on learning. Specifically, we were interested in the impact of using the agent as a demonstrator of how to successfully operate a pipette. We theorized that a potential positive influence on learning could be attributed to the human mirror neuron system. However, there were no significant differences across groups; watching a pedagogical agent demonstrating how to use the pipette made no difference in terms of pipetting performance on a virtual transfer test when compared to listening to a passive instructor. It should be noted that this does not preclude that a pedagogical agent potentially could enhance learning in VR. One explanation for the lack of effect could be that there was not enough added value of having an agent demonstrating: participants in the passive instructor group were equally capable of learning the technique by means of the narration alone. This finding could still be understood in light of the human mirror neuron system, as research has found that the mirror neuron system also responds to action-related sentences (Tettamanti et al., 2005).
Furthermore, there was no evidence to conclude that having an agent depicted was more effective than only watching a hand, as otherwise suggested by the literature. According to SAT, adding social cues, such as an on-screen pedagogical agent or a human voice, to a multimedia lesson is posited to enhance learning by increasing the experience of social presence (Mayer, 2014). However, the experimental groups displayed equal amounts of social presence, which suggests that what matters when learning in VR is in fact the voice. This is consistent with Rzayev et al. (2019), who found that a pedagogical agent results in levels of social presence comparable to those of a narration.
Looking at self-efficacy, there was a significant increase from pre- to posttest across conditions. This suggests that hands-on experience in VR enhances one’s sense of efficacy regarding the topic in question and corroborates previous VR research (Makransky & Petersen, 2021; Meyer et al., 2019). Exploratory analyses indicated that self-efficacy was negatively correlated with dexterity errors and positively correlated with knowledge, meaning that learners’ perception of their own abilities correlated with their actual abilities.
The fact that Study I was conducted as an online experiment had a number of advantages. For instance, most of the participants could be characterized as expert VR users. In other words, their experience with VR might have lowered the odds of seeing a novelty effect whereby the experience of dealing with a novel technology impacts the results (Clark, 1983). Further, the diversity of participant demographics could extend the external validity of the findings. However, there were also important limitations in Study I. For instance, we had no control over what the participants did during the experiment, as they completed the experiment remotely. This potentially lowers the internal validity of the study. Moreover, many of the participants did not have English as their first language, although the learning material was in English. Statistical analyses, however, showed that English proficiency was not significantly associated with any of the learning outcomes. The main limitation of Study I, however, was that there was no control group or real-life assessment of how well the virtual skills transferred to the real world. Hence, we conducted Study II as a follow-up study where the VR simulation was tested against a real-life lesson on pipetting. Additionally, we employed a real-life transfer test 2 days after the intervention to examine participants’ skills at using an actual pipette. It can be argued that in order for VR to become widespread within education, it must be proven that skills practiced virtually can transfer to the real world (Bossard et al., 2008).
Study II focuses on transfer of pipetting skills learned in VR to a real-life task, using a lesson conducted by a real-life instructor as control.
Based on prior VR research related to learning skills, we expected that participants’ pipetting performance in a VR simulation could positively predict performance on a real-life transfer test.
Hypothesis 3: Pipetting performance in a VR simulation can positively predict performance on a real-life transfer test (Study II).
Furthermore, since research shows that VR can lead to similar achievement as traditional training, we expected that people learning in VR would display equivalent performance to people learning from a real-life instructor on a real-life transfer test.
Hypothesis 4: People learning in VR will display equivalent performance to people learning from real-life instruction on a real-life transfer test (Study II).
Finally, as in Study I, it was hypothesized that possibilities for personal mastery experiences, virtually as well as in real-life, would lead to enhanced self-efficacy.
Hypothesis 5: There will be a significant increase in self-efficacy from pre- to posttest for participants in both conditions (Study II).
The hypotheses were preregistered via Open Science Framework (https://osf.io/qanwx).
The study was conducted as an experiment at a Danish technical high school with 47 first-year students. Participants were randomly assigned to one of two conditions in which they received instructions on pipetting from an animated pedagogical agent in VR (N = 24) or from a real-life instructor in a chemistry laboratory at the school (N = 23). Pictures from the intervention can be seen in Figure 5.
The study was conducted in the course of 2 days. On Day 1, participants received the learning intervention, and 2 days later, they participated in a transfer test, leaving 1 day in between the intervention and the transfer test. This was done intentionally to assess whether the skills learned on Day 1 could transfer to a real-life situation when (a short period of) time passed in between. We note that two students were absent on the day of the transfer test and only participated in the intervention. Conversely, two students only had their data collected at the transfer test and not on the day of the intervention; they dropped out at the end of the VR lesson, just before the data were sent to the server. Therefore, they were allowed to participate in the transfer test.
The participants were Danish students between the ages of 15 and 18. Similarly to Study I, most of the participants identified as male (34 men, 10 women, 1 nonbinary). The students in the VR group had mixed VR experience. Few had never tried it before (3). The rest had tried it 1–3 times (7), 4–10 times (4), 11–20 times (1), 21–50 times (2), 51–100 times (1), or more than 100 times (4). When asked how well they spoke English, the participants answered: not well (3), well (14), very well (16), fluent (12). In other words, most of the students were adequate English speakers according to their own accounts. When asked to rate their knowledge of using a micropipette, the distribution of answers was very low (5), low (14), moderate (25), and high (1). When asked the same thing regarding conducting a serial dilution, most were new to the area: very low (19), low (18), and moderate (8).
The VR lesson used for the experiment was identical to the VR condition with an active pedagogical agent used in Study I, that is, it was a two-section tutorial on pipetting taught by an instructor, who gestured, walked around the virtual laboratory, and demonstrated pipetting techniques. Pre- and posttest questionnaires were administered in-game.
The setup was designed to be identical to the virtual environment. Thus, the lesson took place in a chemistry laboratory at the school where the students received instructions on pipetting. The instructions were given in English by a female experimenter, who has passed a basic pipetting course and has scientific experience. No harmful substances were used.
Similar to the VR lesson, the real-life lesson was divided into two parts. The first part functioned as a tutorial in which all students gathered around one workstation where the instructor explained and demonstrated laboratory safety procedures and pipetting techniques. This included wearing a lab coat and gloves as well as the various steps in pipetting. To ensure consistency between groups, the instructions followed a manuscript.
In the second part of the lesson, the students were encouraged to attempt these procedures themselves, using a real pipette. At this point, the instructor did not provide further assistance. However, she ensured that the students were divided into smaller groups, and that each group had a working station with all the necessary assets. These included a micropipette, pipette tips, a box of gloves, test tubes in a rack, a chemical waste container, a sharps disposal container, as well as two beakers containing water and solution, respectively. All lab coats were hanging on the same rack by the entrance of the laboratory. Thus, students worked in pairs or groups of three around one workstation, where they had to complete a serial dilution task. A schematic diagram representing the serial dilution task was depicted on the blackboard. Students were instructed that all members of the group had to try the micropipette. They could talk with members of their own group, but conversations between groups were not allowed.
Pre- and posttest questionnaires were administered prior to and after the lesson, respectively, via the survey framework formr. Students completed the two questionnaires individually on their own laptops.
To ensure similarity between the two experimental conditions, both setups were designed to be as similar as possible. Thus, both lessons took place in laboratory settings. Students in the real-life condition were presented with a workstation containing identical assets to the workstation in the virtual laboratory. In the second part of the lesson, the same schematic diagram representing the serial dilution task was used in the classroom as in the virtual environment. Furthermore, the instructions given during the real-life lesson were identical to the narration used in the VR lesson. The experimenter who acted as the female laboratory science teacher during the real-life lesson was also the voice actor who recorded the manuscript for the VR condition. Thus, students in both conditions received identical information in English from a female laboratory instructor with a British accent. Furthermore, the same female experimenter modeled the gestures used in the active VR condition, that is, the animations made for the virtual instructor were modeled after the real-life instructor using Smartsuit Pro and Smartgloves from the animation company, Rokoko. This was done to ensure as much consistency across conditions as possible.
The same measures as in Study I were used, with the exception that we did not collect data on social presence, the uncanny valley, or pipetting performance (during intervention) for students in the real-life group. The transfer test performance was rated with regard to three predefined categories: safety, dexterity with the pipette, and serial dilution. Specifically, error counts in these areas were recorded. The measures are available in the codebook accompanying this investigation, which can be accessed via the link provided under the Data Availability section.
The experimental intervention, including pre- and posttest questionnaires, was conducted over the course of 1 day with two different classes of students. A transfer test was administered 2 days after the intervention. Participants were recruited via collaboration with high school teachers who used the experiment as part of their own teaching practice. To ensure that students’ choice to participate was not influenced by their relation to their own teacher, recruitment was organized by a teacher with no connection to the class.
On the day of the experimental intervention, the students were initially assembled in a classroom, where they received an oral introduction to the experiment. All participating students were required to read a letter of information about the experiment and sign a consent form. Informed consent was collected from the students as they were all above the age of 15. Then, each student received a randomized ID number, which assigned them to one of the two conditions. Students in the VR condition were led into a different room, where they engaged in the pipetting simulation with assistance from two experimenters. The simulation was administered on Oculus Quest devices. Following the simulation, students received a common demonstration of certain functions, which are operated differently on a real micropipette compared to the virtual micropipette in the simulation (they did not hold the pipette themselves). This was to ensure that students in the VR condition had the necessary knowledge to complete the Day 2 transfer test. It is generally recommended that differences between the performance of a procedure in a simulator and the real world should be brought to light to avoid negative transfer (Maran & Glavin, 2003). Students in the real-life condition were led into the chemistry laboratory at the school, where they received a lesson on pipetting from a female experimenter. Ultimately, all students returned to the classroom and a short debriefing was given by the experimenters. The total duration of the main intervention was approximately 1.5 hr.
Two days after the experimental intervention, all participating students completed an individual transfer test in which they had to demonstrate their knowledge of pipetting. The test took place in a laboratory at the school where three screened workstations were set up. Each student had to attempt a serial dilution task similar to the one they had practiced either in VR or in real life. Students performed the task one at a time and were rated individually on their performance by an observer. A total of three observers, who were blind to the students’ experimental condition and not present during the intervention, rated students’ performance during the transfer test. Thus, three students could engage in the test at the same time. The duration of the transfer test was 12 min on average per group.
Statistical analyses were performed in R Version 4.0.5. A p value less than .05 was considered statistically significant.
A one-way ANOVA showed no significant differences between groups in terms of pretest levels of self-efficacy: F(1, 43) = 0.00, p = .968. Additionally, Kruskal–Wallis tests indicated no significant differences between groups in self-reported knowledge of using a micropipette, H(1) = 0.68, p = .4086, and conducting a serial dilution, H(1) = 1.92, p = .1655; and general interest in and experience with science, H(1) = 1.75, p = .1855.
Regression analyses were performed to assess the predictive value of the pipetting performance measures in VR on real-life transfer test performance. A Poisson regression was run to predict the number of serial dilution errors on the real-life transfer test based on the total number of dexterity errors made in VR: B = 0.078, p < .0001, resulting in an exponentiated value of 1.081. In other words, there was an 8.1% increase in the number of serial dilution errors made on the real-life transfer test for each extra dexterity error made in VR.
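The step from the regression coefficient to the percentage increase can be made explicit. Assuming the Poisson model used the conventional log link, exponentiating B gives the multiplicative change in the expected error count per additional dexterity error in VR. The helper names below are illustrative and not part of the original analysis.

```python
import math


def rate_ratio(coefficient, delta=1.0):
    """Multiplicative change in the expected count for a delta-unit increase."""
    return math.exp(coefficient * delta)


def percent_change(coefficient, delta=1.0):
    """The same change expressed as a percentage increase."""
    return (rate_ratio(coefficient, delta) - 1) * 100


b = 0.078  # reported coefficient for total dexterity errors in VR

print(round(rate_ratio(b), 3))      # 1.081
print(round(percent_change(b), 1))  # 8.1

# The effect compounds multiplicatively: five extra VR errors correspond to
# exp(5 * 0.078), i.e., roughly a 48% higher expected error count.
five_errors = rate_ratio(b, delta=5)
```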
None of the other regression models were significant.
Therefore, Hypothesis 3 (H3), which stated that pipetting performance in a VR simulation would positively predict performance on a real-life transfer test, was partially accepted.
Bar plots of transfer test performance measures for students who received VR or real-life instruction can be viewed in Figure 6.
Kruskal–Wallis tests were employed to test for significant differences between the two conditions on the transfer test performance measures. With regard to their safety performance, the groups did not significantly differ: H(1) = 0.59, p = .4409. Conversely, in terms of dexterity with the pipette, there was a significant difference between the groups (favoring the real-life condition): H(1) = 12.14, p < .001. Finally, there was a close-to-significant difference between the groups in terms of their serial dilutions: H(1) = 3.45, p = .06344. Thus, considering our predetermined cutoff for significance, we could not reject the null hypothesis. In order to interpret the null results, we employed the Hodges–Lehmann estimator of location shift, which is used to assess the median of differences between two groups. Please note that this deviates from the plan to calculate Bayes factors originally proposed in the preregistration; however, we deemed it necessary to employ the former method due to the nature of the data. In terms of safety performance, the Hodges–Lehmann estimator (i.e., the median of differences in errors) was 0. Looking at success at the serial dilution, the Hodges–Lehmann estimator was 0. For comparison, the Hodges–Lehmann estimator for the dexterity measure was 6.
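To make the Hodges–Lehmann estimator concrete, the sketch below computes the two-sample version as the median of all pairwise differences between observations in the two groups. The data are hypothetical error counts, and the direction of subtraction (VR minus real-life) is an assumption chosen so that positive values mean more errors in the VR group.

```python
import statistics


def hodges_lehmann_shift(group_a, group_b):
    """Median of all pairwise differences a - b (two-sample location shift)."""
    return statistics.median(a - b for a in group_a for b in group_b)


# Hypothetical dexterity error counts (not the study data).
vr_errors = [8, 10, 12]
real_life_errors = [3, 4, 6]
print(hodges_lehmann_shift(vr_errors, real_life_errors))  # 6

# When most observations are zero in both groups, most pairwise differences
# are zero as well, so the estimator is 0.
vr_safety = [0, 0, 1, 0]
real_life_safety = [0, 0, 0, 1]
shift = hodges_lehmann_shift(vr_safety, real_life_safety)
```

An estimate of 0 therefore indicates that the typical pairwise difference between a VR student and a real-life student is nil, which is how the safety and serial dilution results above can be read.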
Therefore, Hypothesis 4 (H4), which stated that people learning in VR would display equivalent performance to people learning from real-life instruction on a real-life transfer test, was partially accepted: The groups performed equivalently with regard to safety and their serial dilutions. However, students in the real-life condition performed significantly better when looking at dexterity with the pipette.
A paired-samples t test showed a significant increase in self-efficacy for all participants from pretest (M = 3.45, SD = 0.73) to posttest (M = 4.01, SD = 0.56), t(44) = 4.83, p < .0001.
Consequently, Hypothesis 5 (H5), that is, there will be an increase in self-efficacy from pre- to posttest for all participants, was accepted.
A Kruskal–Wallis test indicated that there were significant differences between conditions in overall declarative knowledge score: H(1) = 13.29, p = .0003. The median score was 11 for the VR group and 14 for the real-life group.
A one-way ANOVA indicated that there was a significant difference between the conditions in extraneous cognitive load, F(1, 43) = 6.77, p = .0127. Students in the VR condition (M = 2.71, SD = 0.66) experienced higher levels of cognitive load than students in the real-life condition (M = 2.16, SD = 0.76).
Lastly, a closer examination of the pre- to posttest increases in self-efficacy showed that students in the real-life condition (M = 3.45, SD = 0.91 at pre; M = 4.27, SD = 0.45 at post) gained more self-efficacy than students in the VR condition (M = 3.45, SD = 0.49 at pre; M = 3.73, SD = 0.53 at post). Furthermore, Spearman’s rank-order correlations suggested that posttest self-efficacy did not correlate significantly with the learning outcomes.
In their review, Concannon et al. (2019) contend that VR is not on its way to replace traditional ways of learning and refer to the fact that 34% of studies show no effect or negative effects of learning through VR. Rather, the authors argue that VR is a complement to learning. Similarly, Parong and Mayer (2021) note that it is important to research when, how, and where VR simulations may benefit learning. We carried out Study II in order to assess whether VR training benefited performance in the real world, and if the benefit would be comparable to traditional teaching. In other words, we aspired to examine whether VR could be a valuable instructive tool in pipetting education. One of the key findings was that objective performance in VR was able to predict real-life performance, which is evidence of transfer of training from VR. Specifically, dexterity errors with the pipette in VR predicted whether errors were made on a real-life serial dilution. However, when looking at dexterity errors with the pipette on the real-life transfer test, students in the VR condition made far more errors than those who received real-life instruction. Such a disparity could be understood with reference to the previously mentioned distinction between physical versus cognitive fidelity of simulators (Hochmitz & Yuviler-Gavish, 2011); dexterity with an actual pipette seems to be contingent upon the physical fidelity of the training. Along these lines, Maran and Glavin (2003) advise that when the focus is on development of fine motor skills, a simulator should precisely replicate the required movements to avoid negative transfer. Turning to safety performance, however, the data did not provide evidence for a significant difference between the groups. This result is consistent with prior research that VR is an effective medium for training laboratory safety behavior (Makransky, Borre-Gude, et al., 2019).
A close-to-significant difference was found in terms of serial dilution errors in the transfer test—however, the median of differences between the two groups was zero.
Another noticeable finding in Study II concerns the group differences in knowledge and extraneous cognitive load. Students in the VR condition retained significantly less knowledge and experienced significantly higher extraneous cognitive load. A similar finding is reported in Parong and Mayer (2021), who compared a VR and a PowerPoint lesson and found that VR was associated with higher extraneous cognitive load, which led to lower retention scores. The Cognitive Affective Model of Immersive Learning also explains this link theoretically and empirically (Makransky & Petersen, 2021). Additionally, the students in the VR condition experienced a smaller increase in self-efficacy compared to the students in the real-life condition.
A central limitation in Study II was that the sample consisted of students who did not have English as their first language. Seeing that the intervention was in English, this could potentially have impacted the findings negatively. It should be noted, however, that Danes in general are highly skilled in English. Moreover, English proficiency was not significantly associated with any of our learning outcomes. Another notable limitation was that students receiving real-life instructions were allowed to interact with their group during the learning lesson, whereas students in VR experienced the simulation individually and did not interact. It is also important to highlight that a day passed between the intervention and the transfer test, during which students could interact, enhance their pipetting knowledge by other means, or practice their pipetting skills elsewhere (although given the laboratory asset requirements, this seems unlikely). Finally, the relatively small sample size and resulting low statistical power is a limitation, as it undermines the reliability of research (Button et al., 2013). Button et al. (2013) highlight three problems in studies with low power: (a) low chance of discovering true effects, (b) lower chance that an observed effect actually signifies a true effect, and (c) exaggeration of the magnitude of discovered true effects. Therefore, further research on acquiring procedural skills via traditional instruction versus VR instruction with larger sample sizes is necessary.
In this investigation, we conducted two different experiments to answer our research questions. Study I found no significant differences between conditions, suggesting that the design of the pedagogical agent did not affect learning outcomes. However, it lacked an authentic control condition as well as a real-life transfer test, which we employed in Study II with students from a technical high school. Results from Study II indicated that the VR group performed worse than the group receiving real-life instructions on many parameters, suggesting VR instruction might not benefit the acquisition of scientific procedural skills. However, it did indicate that performance in VR can predict performance on a real-life transfer test. Across all conditions in both studies, the participants’ self-efficacy increased from pre- to posttest. In the following, we delineate the practical and theoretical implications of the present investigation.
There are numerous practical implications that can be drawn from the present investigation. First, the results can be used to inform decisions on when to use VR simulations for training. While previous research within the biopharma industry indicates that VR is a cost-effective replacement to real-life training (Wismer et al., 2021), our results show that VR’s usefulness depends on the instructional goal in question.
When compared with traditional training, the results suggest that VR simulations may not be an optimal solution for teaching learners how to use a pipette dexterously. This finding could be explained by the lack of similarity between the functioning of a VR controller and an actual pipette. Conversely, VR training was as effective as real-life instruction in terms of safety behavior and performing serial dilutions. In other words, VR training can be useful when the target behavior does not rely on fine motor skills. However, it is likely that haptic gloves with force feedback will become mainstream VR accessories in the future, which could revolutionize VR simulation training, even of fine motor skills (Meta, 2021). These findings corroborate the notion that VR is a useful complement to traditional education when it comes to trying things that would be impossible, dangerous, or expensive to experience in real life (Bailenson, 2018). On a related note, the results also underscore the value of using VR for remote learning in the sense that anyone, no matter their location, can access science education without an expensive lab at hand. This could be valuable when circumstances such as the recent COVID-19 pandemic force students to stay at home (Petersen et al., 2021).
On a methodological level, this investigation shows that remote VR studies can be a useful addition to the traditional lab and field experiments that dominate the field. As highlighted by Henrich et al. (2010), 96% of psychology samples are from Western, educated, industrialized, rich, and democratic (WEIRD) countries, although WEIRD populations only constitute 12% of the world’s population. One of the main advantages of remote studies is that they provide a diverse study population, which could potentially strengthen the generalizability of results. Furthermore, as demonstrated by recent research conducted during the COVID-19 pandemic, experiments in different instructional contexts are necessary, as educational technology use differs in the classroom versus at home (McLaren et al., 2022).
Contrary to what was expected, the pedagogical agents employed in Study I did not have a significant effect on learning. Even though such null findings should be interpreted with caution, a number of theoretical implications for the study of pedagogical agents can be drawn from these results.
First, the results do not provide compelling evidence in favor of using pedagogical agents to demonstrate performance of tasks. This suggests that social cognitive theory might not be a suitable theory for describing the benefits of using pedagogical agents in VR. As mentioned in Wang et al. (2021), a comprehensive theoretical model that explains the influence of pedagogical agents on learning is urgently needed. Such a model could build on existing theories, such as SAT, as well as empirical research on effective pedagogical agents. For instance, research suggests that intelligent tutoring systems (i.e., computer-based systems that track the user’s psychological state and respond with appropriate instructional activities) are highly effective (Graesser et al., 2012; Nye et al., 2014).
Another theoretical implication of this investigation concerns the psychological constructs involved when learning with pedagogical agents. As previously mentioned, SAT theorizes that pedagogical agents promote social presence, which in turn promotes generative processing (Mayer, 2014). In the present investigation, however, we did not find a significant effect of pedagogical agents on social presence. This suggests that social presence might not fully capture the effect of being in the presence of a virtual entity in VR, and that it could be useful to examine other constructs in future studies. For instance, Ryan et al. (2019) use the term social influence to describe the effect of interacting with virtual representations. Social influence refers to changes in a person’s cognitions, attitudes, physiological states, and behaviors resulting from the perception that another person is present (Fox et al., 2015). Kyrlitsias and Michael-Grigoriou (2022) propose numerous methods for evaluating the effectiveness of social interactions with virtual humans, including objective measures suitable for VR.
This article illustrates how online and field research can provide added value to an investigation. Study I focused on the instructional design of a VR application by assessing the influence of differently designed pedagogical agents on acquiring pipetting skills in VR. There were no significant differences between the conditions: In other words, it did not seem to make a difference whether the pedagogical agent was present or not, or whether it demonstrated the procedure or not. Study II extended the investigation to a high school setting and focused on the transferability of skills learned in VR to real life, with the addition of a control group. As such, it provided ecological validity to the investigation. Performance in the VR simulation predicted performance on a real-life transfer test. In addition, comparisons between the VR condition and the control condition, which received traditional teaching, showed that students in the VR condition displayed worse dexterity with the pipette. A similar trend appeared for knowledge and extraneous cognitive load: The VR group experienced more extraneous cognitive load and learned less. One finding that was consistent across studies was that all students experienced an increase in self-efficacy from before to after the intervention. Hence, VR may not constitute an ideal way of training when it comes to learning a scientific procedure such as pipetting. Rather, it may be a complement to traditional teaching methods that can increase accessibility.