Abstract
Although the idea of learning engineering dates back to the 1960s, there has been an explosion of interest in the area in the last decade. This interest has been driven by an expansion in the computational methods available both for scaled data analysis and for much faster experimentation and iteration on student learning experiences. This article describes the findings of a virtual convening that brought stakeholders together to discuss the potential of learning engineering and the key opportunities available to the field over the coming decades. We distill the many possibilities into ten key opportunities for the field, which in turn group into three broad areas of opportunity. We discuss the current state of the art for these ten opportunities and the key points of leverage where a relatively modest shift in the field’s priorities and work may have an outsized impact.
Keywords: learning engineering, learning, mastery, feedback
Acknowledgments: The authors wish to acknowledge the support of Nidhi Nasiar, Kayla Meyers, and Rina Madhani. The authors would like to thank Tom Kalil and Kumar Garg, who provided invaluable help and support. Many colleagues gave thoughtful suggestions and feedback, including David Porcaro, Bror Saxberg, Gouri Gupta, Caitlin Mills, Chris Dede, Scott Crossley, Ben Motz, John Whitmer, Piotr Mitros, and all of the participants in the 2020 Asynchronous Virtual Convening on Learning Engineering.
Funding: The preparation of this document and the asynchronous virtual convening were supported by a grant from Schmidt Futures, a philanthropic effort founded by Eric and Wendy Schmidt.
Disclosures: The authors declare no conflicts of interest in this work.
Correspondence concerning this article should be addressed to Ryan S. Baker, Graduate School of Education, University of Pennsylvania, 3700 Walnut Street, Philadelphia, PA 19104, United States or Ulrich Boser, The Learning Agency, Washington, DC 20001, United States. Email: [email protected] or [email protected]
There is an increasing interest in developing the discipline and practice of learning engineering to improve student outcomes. Learning engineering combines scientific knowledge and theory on learning with a rigorous use of data and analysis to develop and improve educational systems and methodologies so that they produce enduring, high-quality learning. In terms of academic disciplines, learning engineering brings together a combination of computer science, data science, cognitive psychology, behavioral science, education, and instructional design. The emerging learning engineering community meets in an increasing number of places, including annual conferences on Adaptive Instructional Systems, the user meetings of the Generalized Intelligent Framework for Tutoring, the Learner Data Institute, the Advanced Distributed Learning community, IEEE ICICLE (standards meetings and one past conference), as well as at more established conferences such as Artificial Intelligence in Education, Learning Analytics and Knowledge, Educational Data Mining, ACM Learning @ Scale, Learning Sciences, and the International Conference on Computers in Education.
Though the idea of learning engineering was first introduced by Herbert Simon in the 1960s (Simon, 1967), the uptake of learning engineering by the broader field of education has been slow among both researchers and educational technology developers. However, today the use of learning engineering has increased considerably. This trend has been supported by better tools for authoring and experimenting with virtual and blended learning environments, a leap forward in the instrumentation of these learning technologies, the advent and low cost of cloud computing, and advances in both data science and educational research and design methods. As will be discussed below, there are now many examples of how a data-intensive learning engineering approach has the potential to benefit learners. However, the benefit of this approach has not yet extended to the full range of student learning experiences nor to a broad range of learning software developers and educational organizations.
In this article, we discuss the potential of learning engineering to bring the theory and practice of learning forward, including both key areas where learning engineering can bring benefits and key steps towards making those benefits a reality. We also discuss some of the challenges that have slowed the uptake of learning engineering and how these challenges can be addressed, with an eye towards the research still needed for learning engineering to reach its full potential. To summarize, this article attempts to identify key areas within the science and engineering of learning that could lead to major improvements in educational outcomes. We also discuss which enabling advances could be particularly important to speeding progress in educational research and development.
The structure of the remainder of the article is as follows: We first discuss the convening that formed the source of our recommendations. Second, we give an executive summary of the convening’s recommendations along with a table summarizing the recommendations further. We then discuss definitions of learning engineering and some key early successes. The bulk of the paper follows—a detailed exploration of opportunities within each of the ten categories of recommendation. The paper concludes with a discussion of challenges that cut across the recommendations.
Source of Recommendations
This article’s recommendations came about through the 2020 Asynchronous Virtual Convening on Learning Engineering, a community convening in which around 30 stakeholders in the field were invited to come together to discuss the potential of the emerging field of learning engineering and its key challenges and opportunities. The convening was intended to build on past convenings bringing together researchers and practitioners to discuss more specific aspects of learning engineering, such as the annual convenings organized by the Generalized Intelligent Framework for Tutoring (GIFT) community, which have resulted in several books on key considerations in learning engineering (available at www.gifttutoring.org).
Due to the COVID-19 pandemic, and the impossibility of traveling to hash out ideas together in person, the organizing team chose an unusual structure. This structure lacked face-to-face and large-group discussion but enabled the organizers to solicit the opinions of a range of individuals at times of their convenience. It also allowed for multiple rounds of input, which in many cases revealed that a key perspective was missing so that it could then be sought out. In this convening, a set of questions was posed to a group of researchers, learning system developers, policy makers, and thought leaders. Some of these stakeholders were supported in assembling into small groups (taking both interests and time zones into account) and met virtually to propose ideas on the future of the field. Other stakeholders met one-on-one with the organizers of the convening or offered written comments. The organizers followed up with clarifying questions on some of the most intriguing ideas and put together this report summarizing the findings. This report attempts to represent the perspectives of many stakeholders while bringing the ideas into a single voice and a coherent set of recommendations. The participants represented a range of different backgrounds and perspectives, coming from the places where learning engineering is conducted, studied, and supported: academia (education, computer science, information science, linguistics, and other fields), large and established educational technology companies and publishers, nonprofit R&D centers, start-up companies, government, and philanthropy. Participants were recruited on the basis of leadership and expertise in learning engineering or closely related fields. Multiple rounds of recruitment occurred—in several cases, participants in the convening noted that specific expertise was missing and recommended another participant for inclusion. A full list of participants is given in Table 2 (in Appendix).
Though a range of stakeholders were involved in the convening, teachers, school leaders, parents, and other members of broader society were not represented in the convening (except to the degree that many of the participants were themselves parents and some were former teachers). This article represents the perspectives of 30 stakeholders working in learning engineering and related areas or working with learning engineers in various fashions. Although we attempted to include a cross-section of perspectives, both disciplinary and organizational, there are inherent limitations within an intensive but small-scale effort of this nature. The choice was made to focus on stakeholders who are already familiar with learning engineering in some fashion and therefore are more familiar with its potentials and opportunities. This choice almost certainly omitted key ideas from teachers and school leaders who will be impacted by learning engineering projects. Making fuller efforts to include these voices (e.g., Holstein et al., 2019) will expand the potential of the field to solve the right problems. In addition, due to the substantial differences in educational systems and needs for learning engineering around the world, the decision was made early in the process to focus on the educational system in the United States. As such, most (though not all) of the stakeholders were based in the United States, a limitation to the scope of our findings.
High-Level Findings Summary
The 2020 Asynchronous Virtual Convening on Learning Engineering identified ten key areas of opportunity for learning engineering, which in turn were divided into three broad areas: Better Learning Engineering, Supporting Human Processes, and Better Learning Technologies. Each of these opportunities represented areas where progress had already occurred and successful examples existed. The goal and challenge are to take this work forward and increase its scale of application. The recommendations around this are summarized in Table 1, and will be discussed in detail through the remainder of the article.
Table 1
Recommendations Summary

| Top 10 opportunities | Examples |
| --- | --- |
| Better Learning Engineering | |
| 1. Enhance R&D Infrastructure in Widely Deployed Platforms | Make high-quality data available for a broader range of platforms; Develop an ecosystem where researchers can more easily build on each others’ findings and research code; Develop general purpose software components for identifying how effective content is; Extend experimentation infrastructure to a broader range of learning platforms, along with good tools for authoring content for studies; Extend experimentation infrastructure to study the effectiveness of combined interventions; Develop general purpose software components for reinforcement learning; Embed measures of student affect, self-regulated learning, and engagement into learning platforms; Support the development of easier processes and technologies for IRB and privacy compliance for learning platforms |
| 2. Build Components to Create Next-Generation Learning Technologies Faster | Create production-grade components for student modeling that can be integrated into different learning systems and used at scale; Support research on the data needs and practical limitations of modern student modeling algorithms; Create reusable components for interventions such as mindset interventions; Develop production-grade toolkits to facilitate modeling complex student behavior; Develop toolkits for natural language processing in education |
| 3. Learning Engineering to Support Diversity and Enhance Equity | Require that projects collect more complete data on learner identity and characteristics; Require that projects check models and findings for algorithmic bias and differential impact; Encourage participatory and inclusive design, involving members of the communities impacted |
| 4. Bring Learning Engineering to Domain-Based Education Research | Create a network to incentivize and scaffold widespread sharing and collaboration on domain knowledge gaps; Fund support for hybrid AI/human methods for knowledge graph discovery; Support infrastructure for discovering and remedying student misconceptions |
| Supporting Human Processes | |
| 5. Enhance Human–Computer Systems | Increase the richness of data given to teachers while maintaining usability and comprehensibility; Provide teachers with real-time recommendations about when to provide additional support to students and what kind of support to provide; Support research on integration of computer tutoring and human tutoring |
| 6. Better Engineer Learning System Implementation in Schools | Improve integration of data between classroom practices, students’ learning experiences, and teacher professional development to study which practices around the use of learning technology are effective and scalable; Develop a taxonomy of teacher practices around the use of learning technology, and use it to study which practices and forms of professional development are effective and scalable; Develop automated and semiautomated methods to encourage teachers to use the right practice at the right time |
| 7. Improve Recommendation, Assignment, and Advising Systems | Develop advising and recommendation systems that support better advising practices; Design explainable AI methods for repurposing prediction models into easy-to-understand recommendations for advisors and students; Fund infrastructure that enables experimentation around prediction and recommendation and connects it with outcome data |
| Better Learning Technologies | |
| 8. Optimize for Robust Learning and Long-Term Achievement | Increase awareness of existing cognitive science findings around robust learning; Incentivize and plan for longer-term follow-up for A/B studies |
| 9. Support Learning 21st-Century Skills and Collaboration | Develop data science challenges to drive competition to create reliable and valid measures of 21st-century skills, including collaboration, using new technologies and data collection methods; Develop data science challenges to drive competition to create learning systems that scaffold collaboration and support the development of 21st-century skills |
| 10. Improve Support for Student Engagement | Examine which engagement/affective interventions (both teacher-driven and automated) are effective for which students, in which situations; Create a competition where engagement/affective interventions are combined and compared in a sample large enough to also study individual differences; Develop better understanding of teacher and student preferences and comfort for engagement/affective interventions |

Note. R&D = research and development; AI = artificial intelligence; IRB = Institutional Review Board.
Better Learning Engineering involves improving the process of learning engineering itself to support faster progress across the field. Within the area of Better Learning Engineering, four types of opportunity were identified. First: Enhance R&D Infrastructure in Widely Deployed Platforms, extending automated experimentation and educational data mining (EDM) to a wider range of learning platforms. Second: Build Components to Create Next-Generation Learning Technologies Faster. Developing a new learning platform with advanced adaptivity currently takes years of effort, and there is substantial duplication of effort across learning platforms. These challenges could be addressed by creating reusable components for generally applicable development tasks, such as student modeling, modeling complex competencies, mindset interventions, and the educational applications of natural language processing.
Third: Use Learning Engineering to Support Diversity and Enhance Equity, moving from research findings on learning technologies and machine-learning models obtained on convenience samples to equitable approaches demonstrated to generalize to broader and more diverse groups of learners. The importance of learning engineering repairing rather than sustaining or amplifying inequities in our society cannot be overstated. Fourth: Bring Learning Engineering to Domain-Based Education Research, enabling this large community to work with higher quality rapid research tools and more efficiently share their results.
Supporting Human Processes consists of better engineering the processes that rely upon human judgment, decision making, and activity in education. Within the area of Supporting Human Processes, three types of opportunities were also identified. First: Enhance Human–Computer Systems, using computers for routine and repetitive parts of instruction, empowering teachers and tutors with more complete information from the computer, and developing technology that knows when to loop in a tutor or teacher when the learner is not making progress. Second: Better Engineer Learning System Implementation in Schools. Many learning systems and curricula work well with motivated teachers, supportive administrations, and ongoing support from developers but fail when extended to a broader range of classrooms. Learning engineering can play a role in determining which practices around the use of learning technology are both effective and scalable and using learning analytics to design automated scaffolds for effective practices. Third: Improve Recommendation, Assignment, and Advising Systems, using data science to develop systems that proactively analyze student trajectories and make recommendations that increase the likelihood of successful graduation and life outcomes.
Finally, opportunities in Better Learning Technologies involve improving student-facing learning technologies to improve the effectiveness and overall quality of learning experiences. Within the area of Better Learning Technologies, three types of opportunities were identified. First: Optimize for Robust Learning and Long-Term Achievement, switching from rapid innovation cycles that focus on short-term learning gains to longitudinal work that verifies that designs and algorithms benefit students over a longer period of time and prepare them to learn in new situations. Second: Support Learning 21st-Century Skills and Collaboration, using learning analytics to develop reliable and valid measures of complex skills such as communication, critical thinking, and collaboration and produce learning experiences that support their development. Finally: Improve Support for Student Engagement, leveraging the advances in automatically detecting student affect and behavioral disengagement and embedding them in engagement/affective interventions that are both effective and acceptable to teachers, school leaders, parents, and students.
What Is Learning Engineering?
Teachers, policy makers, researchers, and parents have wanted to know which practices are best for understanding and supporting learning since before there was an established science of learning. However, even as the science of learning has emerged over the last decades (Hoadley, 2018), there is still a considerable gap between theories and findings on learning and the curricula and approaches that are used in the real world (Uncapher & Cheng, 2019).
There is an increasing interest in building the discipline of learning engineering (Simon, 1967). There exist several different definitions and perspectives on what learning engineering is or should be, and which disciplinary roots are most essential to this emerging interdisciplinary field. Nonetheless, there is general agreement that learning engineering involves combining scientific knowledge on learning with rigorous practice and data, to develop and improve educational systems and methodologies to produce enduring, high quality, and efficient learning.
Learning engineering is about design (Means, 2018). It leverages not only the learning sciences and cognitive science (Rosé et al., 2019), but also computer science and the rapid developments in the data available and the learning analytics methods available to analyze it (Shimmei & Matsuda, 2020). Both the science and the engineering of learning have often been treated as subsumed within the broader umbrella of the learning sciences (Nathan & Alibali, 2010). The emergence of learning engineering as an area in its own right creates an opportunity to distinguish learning engineering from learning science. Although learning scientists have long conducted their research through designing artifacts and learning experiences (Cobb et al., 2003), the foremost goal of learning science is the investigation of basic research questions and the development of theory. By contrast, the foremost goal of learning engineering is the enhancement of learning outcomes and related outcomes at scale. If we think of learning engineering as applying the ideas of engineering to learning and ultimately developing a practice of engineering learning, then learning engineering becomes about both effectiveness and efficiency (Dede et al., 2018).
In these terms, our field’s efforts to transform learning into an engineering discipline are in their beginnings. Indeed, we are only recently starting to see the emergence of organizations dedicated to applying the rigor of engineering practice to learning (discussed above). Ultimately, the metaphor of engineering suggests a possible future for education that is reliable, predictable, scalable, and based on known and standardized toolkits and specifications. We remain far from this future, and considerable development and research are needed to reach this point. Developing the tools needed to truly make learning an engineering discipline is a goal that underpins several of the recommendations within this manuscript.
Learning Engineering: Past Successes
While the field of learning engineering is still emerging and being codified, recent work and previous work that could be classified as learning engineering have already begun to contribute significantly to our understanding of how to best design, measure, and implement learning systems. Learning engineering’s blend of theoretical and algorithmic innovation, combined with a focus on developing and testing systems in a real-world setting with the goal of making a real impact on learning, has already led to advances in learning science and technologies.
One such contribution is the development and implementation of theoretical frameworks that provide systematic and structured guides on how to best implement research findings in an applied setting. These frameworks are designed to help guide instruction and/or instructional design, providing instructional principles to practitioners and researchers that are based upon empirical evidence. Perhaps the most widely used framework is the Knowledge-Learning-Instruction (KLI) framework (Koedinger et al., 2012), which provides researchers and practitioners a systematic, rigorous framework for examining the interaction between changes in instructional events and students’ transfer, retention, and preparation for future learning. Work within the KLI framework has investigated how practices interact and which combinations of practices are most appropriate (Koedinger et al., 2013), as well as applying KLI principles to the study of new environments such as massive open online courses (Koedinger et al., 2016). Work using KLI has extended well beyond the original team that developed KLI (e.g., Bergamin & Hirt, 2018; Borracci et al., 2020).
Learning engineering and its antecedent work have also led to the development of paradigms for learning system and content development, which provide an overall approach to learning design. For instance, cognitive task analysis breaks down tasks into components to better understand and identify what skills, knowledge, and actions are needed to complete the task at an acceptable performance level (Lovett, 1998). Constraint-Based Modeling is used in systems like SQL-Tutor to support students in learning material where there may be multiple correct answers by representing the features of a correct answer rather than the process of finding it. This approach then bridges naturally into giving students feedback on the errors that they make during problem-solving (Mitrovic & Ohlsson, 2016). Knowledge graphs/spaces provide an ontology or model of the knowledge and skills needed for a specific task or domain knowledge (Doignon & Falmagne, 2012) and are used to select what content a student should work on next. Instruction modeling, used in Reasoning Mind, is the practice of engineering automated replications of the instructional practices of expert teachers (Khachatryan, 2020).
Example: A Practical Improvement: Improved Feedback
One of the major successes of learning engineering and the learning platforms it supports has been the provision of useful feedback to learners. Providing timely, accurate feedback is crucial for student success (Hattie, 2012) but is often prohibitively time-consuming for teachers. By using learning engineering to iteratively improve automated feedback, it is possible to scalably give students timely feedback that is adaptive to each student’s current state of knowledge in a domain and topic (Van Der Kleij et al., 2017).
Multiple approaches to automated feedback have found success in learning environments. Technology can provide teachers with recommendations on how to guide students based on student performance (e.g., Ingebrand & Connor, 2016). Automated student feedback and adaptive scaffolding have led to learning gains for students (e.g., Kim et al., 2018; Kroeze et al., 2019; Zhu et al., 2020). Dashboards used by students are one location where such feedback can be given. For example, one study explored the impact of a dashboard that provided students with real-time feedback on their digital engagement with course material, giving alerts when engagement was detected as low (Khan & Pardo, 2016).
There are a number of dimensions along which the design of feedback can vary. The best approach to feedback and the best combination of design choices may vary by context (Koedinger et al., 2013). Numerous factors have been studied to determine what forms of feedback are effective for different learners and situations. For example, one study determined that the nuanced revising behaviors of writers (i.e., patterns of deletion, addition, and keystrokes) can inform adaptive feedback to promote better writing output (Roscoe et al., 2015). Another study found that responses to feedback varied significantly across grade levels but that superficial changes to feedback messages were not impactful (Howell et al., 2018). Feedback systems that account for students’ affective states can enhance engagement and learning by providing personalized prompts and activity recommendations (Grawemeyer et al., 2017).
Example: EDM and Learning Analytics: Better Measurement of Learning as It Occurs
Data, and the models built from those data, have played a major role both in refining learning systems and in creating algorithms that can be used to underpin personalization and adaptivity. Traditionally, learning and student progress—and a system’s degree of success in supporting these—were measured using delayed, distal, and loosely aligned information such as grades and standardized test scores. The move towards measuring learning and other forms of progress using log data has allowed the development of measures which are immediate, proximal, and tightly aligned to the learning experience, with emerging interest by both data scientists (Fischer et al., 2020) and psychometricians (Bergner & von Davier, 2019).
Simply developing the ability to measure learning as it changed (i.e., Pelánek, 2017) was a step that enabled mastery learning, the underpinning of many modern adaptive learning systems. Going beyond that to measuring complex learning and performance in real-time (Gobert et al., 2013; Henderson et al., 2020; Rowe et al., 2017) enabled learning systems such as Inq-ITS (Li et al., 2018) to provide feedback and support on complex skills such as scientific inquiry. Going further still, recent experimental systems measure and attempt to support students in learning to self-regulate their strategy (Duffy & Azevedo, 2015; Roll et al., 2018) and affect (DeFalco et al., 2018; Karumbaiah et al., 2017). Better measurement has also supported efforts to iteratively engineer learning systems (Aleven et al., 2017; Huang et al., 2020), for instance by identifying where skills are mis-specified (Moore et al., 2020) or by systematically searching for less-effective learning content (Baker et al., 2018; Peddycord-Liu et al., 2018). This work has also led to the development of interventions and modifications to pedagogy that improve outcomes. While almost all of these interventions are relatively small in scope, they show the potential for the field. One study showed how automated content selection improved outcomes (Wilson & Nichols, 2015); others found that providing support for engagement led to better learning (Baker et al., 2006; Karumbaiah et al., 2017); and work in the Learnta system has shown how providing recommendations to instructors can help them select material for their students to work on (Zou et al., 2019). The same type of approaches can show what does not work, also key to iteration. Using UpGrade from Carnegie Learning, the Playpower Labs team found that adding a specific set of “gamification” features to an intelligent tutor actually reduced learner engagement by 15% in one context (Lomas et al., 2020).
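As a concrete illustration of what measuring learning from log data can look like, the minimal sketch below computes an empirical learning curve (error rate as a function of practice opportunity) from an interaction log. The column names are assumptions made for illustration, and real analyses (e.g., Pelánek, 2017) rely on far more sophisticated models than this.

```python
import pandas as pd

# Minimal sketch of measuring learning as it occurs from log data: an empirical
# learning curve (error rate by practice opportunity). The columns
# 'student_id', 'skill', and 'correct' are assumed for illustration, and the
# log is assumed to be sorted chronologically within each student.
def learning_curve(log: pd.DataFrame, skill: str) -> pd.Series:
    rows = log[log["skill"] == skill].copy()
    # Number each student's attempts on this skill in order of occurrence.
    rows["opportunity"] = rows.groupby("student_id").cumcount() + 1
    # If students are learning, the error rate should decline with practice.
    return 1 - rows.groupby("opportunity")["correct"].mean()
```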
Top High-Leverage Opportunities
In the following sections, we discuss the findings of the 2020 Asynchronous Virtual Convening on Learning Engineering in terms of ten high-level recommendations as to where the high-leverage opportunities are for learning engineering (see Table 1 for a summary). As discussed above, these recommendations are grouped into three broad categories: Better Learning Engineering, Supporting Human Processes, and Better Learning Technologies.
Better Learning Engineering: Enhance R&D Infrastructure in Widely Deployed Platforms
One major step towards increasing the positive impact of learning engineering is to improve the infrastructure of the field, the support available for conducting learning engineering. In this sense, we consider the field’s R&D infrastructure to be all of the enabling tools and technologies that make it easier and faster to conduct high-quality R&D. This includes software used to build, study, and iterate learning systems. It also includes software and processes enabling automated experimentation, software that can be quickly embedded in new learning systems to enhance their functionality, and tools and processes for data security, privacy, and compliance. It furthermore includes shared data sets to bring in a wider community of researchers who can help iterate design.
If high-quality learning engineering research can be conducted faster and with less effort, then the benefits of learning engineering can become available faster and more widely. There has already been considerable investment in research and development infrastructure for learning engineering, and the impacts are seen in the rapid deployment of studies through the ASSISTments platform (Ostrow & Heffernan, 2016) compared to the previous more intensive LearnLab model (Koedinger et al., 2012). The LearnLab model was in turn already much faster and easier for researchers than developing all infrastructure and arranging each research study one by one. Developing infrastructure and tools so that the millions of educational practitioners and researchers in the United States can use better, faster methods to study educational improvement with the thousands of scaled learning systems is a key opportunity for learning engineering. One could argue that all of the other opportunities for learning engineering that this document will discuss will be supported by improving the infrastructure for learning engineering. There are several opportunities in this area.
First, there are opportunities around increasing the scope of educational data available and the tools for collaboration among researchers. One of the key developments facilitating the uptake of EDM methods has been the wide availability of high-quality data. However, these benefits have been spread unevenly, with the majority of research conducted on data from a small number of platforms.
Simply making data available for a broader range of platforms would amplify the positive impact on the field. Going further, the creation of an ecosystem where researchers can more easily build on each others’ findings and research code would speed work compared to today. Currently, even when researchers share their code, it is often difficult to get it to run correctly (Boettiger, 2015). Both LearnSphere (Liu et al., 2017) and the MORF platform (Gardner et al., 2018) have attempted to create ecosystems that function in this fashion. Although LearnSphere itself is widely used, neither platform’s code-sharing ecosystem has achieved widespread use. This may be due both to constraints on what kinds of research are possible in these platforms and to a lack of incentives for researchers to share their code in this fashion.
Part of the benefit of large datasets is that they can help find the complex answers that are needed to resolve long-standing “dilemmas” in education, such as the assistance dilemma (Koedinger & Aleven, 2007). In this instance, large-scale data with variation in what assistance students are offered could increase understanding of what support is helpful to which students and under what conditions. Does the assistance needed differ for students who generally have very low prior knowledge (as in Tuovinen & Sweller, 1999)? When is a worked example better than a hint (see McLaren et al., 2014, for instance)? Do students from different cultural backgrounds respond differently and benefit differently from the same seemingly cognitive learning support (as in Ogan et al., 2015)? Can we develop generative and broadly applicable theory that works across learning contexts? Large and diverse data sets can help.
In addition, these datasets can also catalyze “benchmark” challenges that researchers and technologists compete on, incentivizing advancements in both fundamental and domain-specific aspects of education. Data competitions around automated essay scoring and future performance prediction have attracted a large number of competitors and produced scientific contributions and technological advancements (Stamper & Pardos, 2016; Taghipour & Ng, 2016); many areas appear ripe for such competitions. For instance, recognition of handwritten mathematics seems ready for a competition, given developments in optical character recognition (OCR) and the amount of handwritten work in mathematics. Similarly, the use of voice recognition to identify students struggling to learn to read could benefit from the availability of a benchmark dataset. In this instance, the field needs audio files of younger readers as well as “ground truth” data on future reading issues.
EDM methods can also help to drive progressive refinement and improvement of learning technologies. For example, several online learning platforms now automatically distill information from their student models to determine if specific skills are ill-specified (Agarwal et al., 2018) or if specific learning content is more effective or less effective than the average (Baker et al., 2018). However, these types of approaches are still seen only in a small number of learning platforms. Creating infrastructure that can be widely adopted—such as packages that take data formatted in a standard fashion and generate automated reports—may speed the adoption of this type of practice.
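To make the idea of such a package concrete, the sketch below flags content items whose follow-up performance trails the rest of their skill. This is a deliberately simplified proxy rather than the published methods (e.g., Baker et al., 2018), and the column names and thresholds are assumptions.

```python
import pandas as pd

# Sketch of an automated report that flags potentially less-effective content.
# Simple proxy: for each content item within a skill, compare correctness on
# the *next* problem students attempt after seeing that item against the
# skill-wide average. Assumes hypothetical columns 'student_id', 'skill',
# 'item_id', and 'correct', sorted chronologically within each student.
def flag_content(log: pd.DataFrame, min_gap: float = 0.10) -> pd.DataFrame:
    log = log.copy()
    log["next_correct"] = log.groupby(["student_id", "skill"])["correct"].shift(-1)
    log = log.dropna(subset=["next_correct"])
    by_item = log.groupby(["skill", "item_id"])["next_correct"].agg(["mean", "count"])
    by_skill = log.groupby("skill")["next_correct"].mean().rename("skill_mean")
    report = by_item.join(by_skill, on="skill")
    report["gap"] = report["skill_mean"] - report["mean"]
    # Flag items whose follow-up performance trails the skill average by min_gap
    # or more, requiring a minimum amount of data per item.
    return report[(report["gap"] >= min_gap) & (report["count"] >= 50)].sort_values(
        "gap", ascending=False
    )
```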
Beyond this, A/B testing and other forms of rapid automated or semiautomated experimentation (see review in Motz et al., 2018) make it possible to quickly ask questions about learning. Currently, this type of technology is only used within a small number of platforms and studies (see review in Savi et al., 2017), although its scope is expanding over time. Extending this infrastructure to a broader range of learning platforms, along with good tools for authoring content for studies, would facilitate research on a variety of questions important to learning engineering. The same experimental infrastructure that supports scientific discovery may also hold benefits for refining and progressively improving a learning system. When doing so, attention must be paid to making sure that a comprehensive set of measures (engagement and longer term outcomes) are applied when evaluating the results of A/B tests to understand why one condition works better than another.
This infrastructure may help the field to understand not only whether a specific intervention works, but which interventions work in combination. Over the last decades, there has been a great deal of work investigating single interventions, but considerably fewer studies of whether two interventions have an additive effect or are in fact counterproductive when combined (Cromley et al., 2020; Koedinger, et al, 2013). Answering this type of question needs considerable data. In these cases, strategies such as reinforcement learning (Zhou et al., 2020) may help a learning system decide what intervention to offer which student, in which situation. Creating software packages that implement these reinforcement learning algorithms and can be plugged into a variety of learning platforms will speed the process of improving learning platforms. Some variants on reinforcement learning can also help avoid “local maxima,” where iteratively making small improvements leads to an improved system, but a different part of the solution space would have been better still.
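As a minimal illustration of this family of methods, the sketch below uses Thompson sampling over a set of candidate interventions, a simple bandit stand-in for the fuller reinforcement learning setups cited above. The intervention names are hypothetical, and the reward is assumed to be binary (e.g., next-problem correctness); a real deployment would also track engagement and longer-term outcomes.

```python
import random

# Minimal sketch of Thompson sampling over candidate interventions (a simple
# bandit, not a full reinforcement learning system). Reward is assumed binary.
class InterventionChooser:
    def __init__(self, interventions):
        # Beta(1, 1) prior on each intervention's success rate: [alpha, beta].
        self.stats = {name: [1, 1] for name in interventions}

    def choose(self) -> str:
        # Sample a plausible success rate per intervention; pick the best draw.
        draws = {name: random.betavariate(a, b) for name, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, name: str, reward: int) -> None:
        # reward = 1 on success, 0 otherwise.
        self.stats[name][0] += reward
        self.stats[name][1] += 1 - reward

# Example usage with hypothetical intervention names:
# chooser = InterventionChooser(["worked_example", "hint", "mindset_message"])
# arm = chooser.choose(); ...deliver the intervention...; chooser.update(arm, reward=1)
```

Because Thompson sampling continues to explore less-chosen interventions with some probability, it is also one simple way to reduce the risk of settling prematurely on a local maximum.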
This sort of experimentation should not be limited to learning scientists and engineers. Teachers can improve their teaching practice by conducting more systematic investigations of their own teaching, a practice seen for decades in Japan (Stigler & Hiebert, 1999). Partnering teachers with learning engineers in these endeavors will benefit both groups.
More broadly, infrastructure should encourage more open science. This means more sharing of data as well as more sharing of tools. This could take the form of a crowdsourced platform where data could be stored, processed, and analyzed, similar to emerging platforms in other fields focused on nurturing innovation. One idea would be the creation of a “data-processing” platform for learning engineering where data can be stored, processed, and analyzed. Similar platforms are emerging in other fields (i.e., Galaxy used for biomedical research; Blankenberg et al., 2014) to support increased open innovation.
This infrastructure can be both bottom-up and top-down. For instance, many researchers will develop their own ideas of what to test to further their own work, relying on their intuition for important research questions. But at the same time, the field should use the infrastructure to explicitly test some of the ideas laid out in survey papers. Similarly, there are longstanding debates over issues such as the timeliness of feedback that are ripe for further testing within such infrastructures.
Beyond this, support for embedding a greater number of measures—measures of student affect, self-regulated learning, and engagement, among other constructs—into these platforms would help us understand the full impact of an intervention. For example, if an A/B test only considers student correctness or the time taken to master a skill, it may miss important impacts on student engagement (see Lomas et al., 2013, for instance).
However, this type of infrastructure presents challenges. The first challenge is funding to create “research as a service” opportunities, incentivizing companies to leverage their data for research. Practically speaking, this means two things. Funding organizations should support the development of research infrastructure. Instead of funding individual researchers who must create their own tools for experimentation, supporting the creation of open access tools will enable a broader range of educators and researchers to perform experiments on large populations. Broadening the field in this fashion will accelerate the rate of discoveries beneficial to learning engineering.
When it comes to funding, there should also be greater support for private companies opening up their data. A number of organizations currently argue that they cannot offer research as a service because there is not yet a market. Additionally, data are often considered proprietary and therefore not made available to researchers. But private industry could be incentivized to open their data and systems to researchers. Increasing data openness would not only add to the science of learning’s body of knowledge, but also benefit the platform’s continued improvement. The success of such an effort will also depend on data being high enough quality to support learning engineering. Encouraging better data standards will support this effort, especially if it involves identifying and scaling best practices in data capture rather than adopting “lowest common denominator” data schemas that work in all platforms but discard the most useful data. For example, it may be beneficial to identify which variables have been useful in past research efforts and begin to capture them more systematically.
Data standardization efforts, such as xAPI, IMS Caliper, and the Pittsburgh Science of Learning Center DataShop formats, may remove some of the barriers to learning engineering. However, there have been significant challenges using data collected in the practice of educational research and development, even when using these standards, primarily caused by the lack of consideration of the ultimate uses of the data collected. Support to extend these standards for use in a broader range of learning sciences and learning engineering applications could serve to improve the quality of data and reduce the data engineering efforts currently needed. For instance, these standards could be extended with fuller representation of the context of learner behavior and more in-depth representation of inferences made by the system about learners.
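As an illustration of what such an extension might look like, the hypothetical statement below follows the basic xAPI actor/verb/object/result structure while using context extensions to carry richer information about learner context and the system’s own inferences. The extension URIs and values are invented for this example rather than drawn from any existing profile.

```python
# A hypothetical xAPI-style statement. The actor/verb/object/result structure
# follows the xAPI specification; the context extension URIs and values are
# invented here to illustrate recording richer context and system inferences.
statement = {
    "actor": {"objectType": "Agent",
              "account": {"homePage": "https://example-lms.org", "name": "student-4821"}},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/answered",
             "display": {"en-US": "answered"}},
    "object": {"id": "https://example-tutor.org/problems/fraction-addition/17"},
    "result": {"success": False, "response": "3/5"},
    "context": {
        "extensions": {
            # Hypothetical extensions carrying learner context and model inferences.
            "https://example-tutor.org/xapi/hints-requested": 2,
            "https://example-tutor.org/xapi/estimated-mastery": 0.41,
            "https://example-tutor.org/xapi/detected-affect": "confusion",
        }
    },
    "timestamp": "2021-03-04T15:22:10Z",
}
```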
Beyond funding and technical capacity development, there are also key challenges around ethics, legal compliance, and student privacy. Currently, for many developers, the easiest way to comprehensively protect learner privacy (or at least themselves) is to avoid sharing any data at all, or to share extremely limited data such as global summaries or highly redacted data sets. Often, measures taken to avoid holding personally identifying information inhibit the feasibility of longitudinal follow-up or of checking for algorithmic bias. If data limitations make it so that most learning scientists and learning engineers can only ask certain questions, then largely those are the questions that will be asked.
Learning engineering can be part of the solution to this problem, providing frameworks and research around best practices for data security and privacy within educational technologies: methods for automatically redacting forum post data (Bosch et al., 2020); obfuscation and blurring methods that retain but reduce the specificity of demographic information or introduce a small amount of error into data to prevent confident reidentification (Bettini & Riboni, 2015); platforms that allow trusted organizations to hold personally identifying information for longitudinal follow-up; and platforms that allow analysis using full data but do not allow querying or output in terms of specific data values (Gardner et al., 2018). Support for these technologies will enable a broader range of research, helping to achieve many of the other recommendations in this report. For example, efforts to ensure equity in learning systems (recommendation 3) will become infeasible if demographic data is discarded in the name of privacy.
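As a small illustration of the blurring and noise-based approaches mentioned above (and not of the specific cited methods), the sketch below coarsens a quasi-identifier and applies Laplace noise to an aggregate count before it is shared; the age bands and noise scale are arbitrary choices for the example.

```python
import numpy as np

# Illustrative sketch of two simple disclosure-limitation steps (not the cited
# methods): coarsening a quasi-identifier and adding Laplace noise to an
# aggregate count before it is shared.
def coarsen_age(age: int) -> str:
    # Report an age band rather than an exact age to reduce reidentification risk.
    low = (age // 5) * 5
    return f"{low}-{low + 4}"

def noisy_count(true_count: int, epsilon: float = 1.0) -> int:
    # Laplace mechanism for a counting query (sensitivity 1, noise scale 1/epsilon).
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return max(0, int(round(true_count + noise)))
```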
In terms of compliance, the ASSISTments platform has done important work to streamline Institutional Review Board (IRB) processes at Worcester Polytechnic Institute (WPI) and create standard procedures for the WPI IRB to work with other IRBs. Having to obtain approval for each study from a different IRB with different “house rules” is a major delaying and complicating factor for learning engineering research; replicating this work across platforms therefore has considerable potential benefits. Compliance issues become even more challenging when dealing with cross-border research, where different countries often have different expectations and rules for human subjects protections and privacy. Ideally, processes should be designed so that there is both standardization (limited bureaucratic effort to bring in new research or school partners) and flexibility (ability to easily accommodate different rules and practices).
Better Learning Engineering: Build Components to Create Next-Generation Learning Technologies Faster
Another key area of enabling research and development for learning engineering is in creating software components that make it possible to build next-generation learning technologies more quickly. Currently, developing a new learning platform with advanced adaptivity takes years of effort, limiting entrants to the field and leading to a proliferation of lower quality platforms. There are a range of potential tools and components that could be engineered for general purpose use.
At the moment, while several adaptive learning systems support various forms of adaptivity, their designs are largely one-off. The same functionality has been re-created several times for different learning systems. Occasionally, the same infrastructure will be used within a company or a team for multiple curricula (see, for example, Cognitive Tutors for Algebra, Geometry, and Middle School Mathematics; Algebra Nation content for Algebra and Geometry; ALEKS content for Mathematics and Chemistry), but different teams generally build adaptive functionality from scratch.
Ideally, an effort to create general components for adaptivity would be modular in nature, so a new developer could integrate specific components into their own architecture. General purpose architectures such as the GIFT architecture (Sottilare et al., 2017) offer another reasonable approach to sharing components, but require developers to adopt these architectures wholesale. Similarly, if there is an intervention with benefits thought to be at least somewhat general across learning domains and populations (such as a values affirmation intervention; Borman, 2017), it could be placed into a component so it can be reused across systems and does not need to be reimplemented.
Perhaps the largest opportunity for creating reusable components is in the area of student modeling. There has been a great deal of research in the last two decades into how to model a range of aspects of the student. However, little of this research has found its way into actual use in scaled learning systems. Currently, no education-specific algorithm (not even the widely used Bayesian Knowledge Tracing; Corbett & Anderson, 1995) has good “plug and play” infrastructure for building models, continually refitting them, and deploying the algorithm into a running system for its most common use, mastery learning.
Some already widely used algorithms, like Bayesian Knowledge Tracing (BKT; Corbett & Anderson, 1995) and ELO (Klinkenberg et al., 2011), could quickly and relatively easily be built into robust, full-featured, implementation-quality toolkits. Beyond that, there are more recently developed algorithms that offer benefits such as the ability to fit complex relationships between items (Zhang et al., 2017), the ability to represent partial overlap between skills (Pavlik et al., 2020), and consideration of memory and forgetting (Mozer & Lindsey, 2016; Pavlik et al., 2020; Settles & Meeder, 2016). However, work is still ongoing to understand and address these algorithms’ limitations for real-world use (Yeung & Yeung, 2018). Thus, developing implementation-quality toolkits will also involve research into practical questions such as how much data is needed for these algorithms to function effectively, work already conducted for older algorithms such as BKT (e.g., Slater & Baker, 2018).
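To illustrate the kind of logic such a toolkit would wrap, the sketch below implements the standard BKT posterior update and a simple mastery check. The parameter values are illustrative placeholders rather than fitted estimates, and a production component would add parameter fitting, persistence, and integration APIs around this core.

```python
# Minimal sketch of the standard Bayesian Knowledge Tracing update, the core
# logic an implementation-quality mastery-learning component would wrap.
# Parameter values below are illustrative, not fitted.
def bkt_update(p_know: float, correct: bool,
               p_transit: float = 0.1, p_slip: float = 0.1,
               p_guess: float = 0.2) -> float:
    # Posterior probability the skill was already known, given this response.
    if correct:
        posterior = (p_know * (1 - p_slip)) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = (p_know * p_slip) / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    # Account for the chance of learning from this practice opportunity.
    return posterior + (1 - posterior) * p_transit

# Typical mastery-learning use: keep offering practice until p_know >= 0.95.
p_know = 0.3  # illustrative prior P(L0)
for response in [True, False, True, True]:
    p_know = bkt_update(p_know, response)
```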
Going further, current student knowledge modeling typically captures relatively straightforward knowledge such as specific algebraic skills or factual knowledge rather than deeper conceptual understanding or complex generalizable skills. While a small number of projects have attempted to model and infer conceptual understanding (e.g., Almeda et al., 2019; Kim et al., 2016; Rowe et al., 2017) or inquiry skill (e.g., Gobert et al., 2013), these efforts largely represent “one-off” research projects. There is a lack of production-grade toolkits for this type of modeling that can be quickly leveraged by practitioners.
One additional area where reusable software components would speed progress is in natural language processing. Much more should be done to develop toolkits, and funders should support the creation of toolkits that support a greater variety of activities. Natural language processing can be used for a wide variety of educational applications, from automated feedback on essays (Roscoe et al., 2013), to detecting the depth and quality of student participation within online discussion forums (Farrow et al., 2020), to the creation of automated conversational agents in systems such as Jill Watson (Ventura et al., 2018), to adapting content for differing student reading levels, as is seen in the Newsela platform (Rand, 2020). Systems exist today which use each of these types of functionality, but the engineering process is highly intensive. Text classification still relies mostly on general purpose tools rather than tools tailored to educational domains. Tools exist for the creation of automated conversational agents, but either require extensive expertise (e.g., Cai et al., 2015) or offer only a limited range of functionality (Wolfe et al., 2013).
Three broad areas of natural language processing software components could be particularly valuable to learning engineering. First, development to make it easier to integrate existing algorithms smoothly and seamlessly into educational technologies via APIs, including algorithms for measuring readability (Crossley et al., 2017b), text cohesion and sophistication (Crossley et al., 2016), and algorithms for sentiment analysis (Crossley et al., 2017a). Second, tools for integrating speech-to-text software into educational technologies would make it possible to automatically translate students’ spoken responses into text that NLP tools can process. Third, work to develop linguistic analysis resources such as keyword lists and topic models for specific target domains such as mathematics. These steps would considerably facilitate the integration of natural language processing into a broader range of learning systems.
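As one small example of the first category, the sketch below exposes a basic readability estimate, the Flesch-Kincaid grade level with a crude syllable heuristic, as a single function that could sit behind an API. The cited tools compute far richer and better-validated indices; this is only meant to show the shape of such a component.

```python
import re

# Sketch of a tiny readability component of the kind that could sit behind an
# API. Uses the Flesch-Kincaid grade-level formula with a crude vowel-group
# syllable heuristic; real toolkits compute much richer indices.
def _syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```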
Better Learning Engineering: Learning Engineering to Support Diversity and Enhance Equity
Promoting equity in education and closing achievement gaps is a long-standing goal for educators and researchers but has remained elusive (Hanushek et al., 2019). The importance of learning engineering repairing rather than sustaining or amplifying inequities in our society cannot be overstated. The need to support diversity and enhance equity interacts in key ways with all of the other recommendations in this report, and learning engineering will not reach its full potential if it does not play a key role in addressing this vital challenge for society.
One approach to making instruction more equitable is making learning more individualized. As several convening participants noted, it is essential to get beyond one-size-fits-all interventions and create interventions that are sensitive to differences between learners and promote equity. Learning engineering is well-suited to help educators identify these needs and provide for them, in principle producing a more equitable learning experience through technology-enhanced innovation (Aguilar, 2018).
However, it is not given that individualized learning will steer instruction and assessment towards equity. Developers must be mindful to avoid algorithmic biases in analytics and recommendations (Gardner et al., 2019; Holstein & Doroudi, 2019), which can lead to models and interventions being less effective for specific (often historically underserved) groups of learners. Research has suggested that models fit on convenience samples can be less effective for specific groups of learners (Ocumpaugh et al., 2014). Building models that are verified to function correctly for all of the groups of learners using them remains a challenge for the field, although tools such as The Generalizer (Tipton, 2014) can help identify schools to sample to achieve a representative population.
However, there is still limited understanding of which differences between learners matter in specific situations and how these differences impact the effectiveness of learning technologies. As demand increases within school districts for evidence of equity as well as evidence of broader effectiveness (Rauf, 2020), it will become essential for learning engineers to fill this gap. This limitation can be addressed fairly quickly if research funders require that projects work with diverse and representative populations of learners, collect more complete data on learner diversity, and check models and findings for algorithmic bias using these variables. Race, ethnicity, studying in a second language, gender, neurodiversity, disability status, urbanicity, and military-connected status can all impact algorithm effectiveness (Baker & Hawn, 2021). However, data on these variables is currently seldom even collected (Paquette et al., 2020), a key first step that needs to be taken for the field to move forward on increasing equity. See recommendation 1 for a discussion of privacy and security issues surrounding the collection and use of this type of data.
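As one simple example of such a check, the sketch below compares a model’s discrimination (AUC) across demographic groups. The column names are assumptions, and a fuller audit would also examine calibration, false positive and false negative rates, and the impact of any downstream intervention.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Sketch of a basic algorithmic-bias check: compare a model's AUC across
# demographic groups. Columns 'label' (ground truth), 'prediction' (model
# probability), and 'group' (e.g., a race/ethnicity category) are assumed.
def auc_by_group(df: pd.DataFrame) -> pd.Series:
    return df.groupby("group").apply(
        lambda g: roc_auc_score(g["label"], g["prediction"])
        if g["label"].nunique() > 1 else float("nan")
    )
```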
Similarly, the designs that work in one learning context may not work as well in other contexts due to differences in school culture, students’ prior learning experiences and curricula, and differences in the national culture and background of learners. For example, learning science findings derived from nondiverse populations sometimes do not apply to other learners (Karumbaiah et al., 2019). Similarly, learning science findings obtained in one national culture sometimes do not apply within other cultures (Ogan et al., 2015). Improving the degree to which learning experiences are culturally relevant and build on material that students are already familiar with can also have significant benefits (e.g., Lipka et al., 2005; Pinkard, 2001). Developing designs that function well where they are applied is much easier when developers conduct participatory and inclusive design, involving members of the communities impacted (Tuhiwai Smith, 2013).
A final concern for equity through learning engineering is that not all students have equal experience with or access to technology. Remote learning or opportunities to acquire additional support are limited by a student’s access. For example, despite massive open online courses (MOOCs) being thought of as an educational equalizer, they are not equally available to all learners (Park et al., 2019). MOOC outcomes are also generally poorer for learners from less privileged socioeconomic backgrounds (Kizilcec et al., 2017).
The recent pandemic has shown that technology access can also be much lower in underfunded school districts than elsewhere (Wolfman-Arent, 2020) and for undergraduates coming from lower income backgrounds (Jaggars et al., 2021). Learning engineering clearly cannot solve all inequities that lead to differences in access to technologies or how it is used. But the field needs to consider how effective results will be in practice given these constraints. Realistic estimations of effectiveness will encourage transparency around who will benefit from learning engineering advances while also shining a spotlight on the inequities that exist for students and the need to address them.
Better Learning Engineering: Bring Learning Engineering to Domain-Based Education Research
Designing a high-quality learning system or curricula for a domain depends on having a high-quality model of the content and learning needs in that domain. This need is magnified for adaptive learning systems, which often need to make real-time decisions based on such a model. Ideally, such a model represents the content of a domain, the structure of how that content interrelates (which skills are prerequisite to other skills) and the concepts with which a disproportionate number of students struggle (including misconceptions and preconceptions).
However, the current state of domain modeling in learning engineering is highly uneven between domains, particularly in terms of modeling the higher level relationships between content (such as prerequisite graphs). There has been considerable work on modeling mathematical domains, with some areas of mathematics having full prerequisite knowledge graphs developed separately by multiple companies or curricular providers. Here, there is a clear opportunity for the field to create incentives for these knowledge graphs to be shared more broadly—there is considerable inefficiency in having multiple organizations (possibly even funded by the same funders) spend hundreds of thousands of dollars to create highly similar knowledge graphs.
By contrast, there has been considerably less focus on domain modeling in other domains. In science, language learning, and reading, models of student learning progressions focus on the development of specific concepts (e.g., Bailey & Heritage, 2014; Van Rijn et al., 2014), and there are Bayesian Network models that compose metaskills but do not model skill prerequisites formally (Shute & Rahimi, 2021). Work in biology has also represented how different concepts interrelate but in a way that supports fine-grained question generation rather than in a coarser fashion that supports curricular sequencing (Olney, 2010; Olney et al., 2012). In other domains, there has been even less work on domain modeling; indeed, existing ways of representing domain structure in adaptive learning systems may not even be appropriate in domains such as history or social studies, where it may not be feasible to identify clear prerequisite relationships.
However, recent approaches have been proposed that may be able to capture prerequisites for knowledge graphs more generally, from data sources such as student interaction data, the textual content of learning materials, and even university course catalogs (Chen et al., 2018; Liang et al., 2017; Pan et al., 2017). Providing funding for these approaches has the potential to speed the development and deployment of knowledge graphs to a broader range of domains and grain sizes, particularly if developers keep humans in the loop to identify spurious correlations. It may also be possible to distill draft versions of knowledge graphs from curricular documents developed by educational agencies, schools, and standards organizations (e.g., Pruitt, 2014; Sahami & Roach, 2014).
One way to speed efforts in this area would be to create a network of R&D teams who receive funding for their work to create knowledge graphs, under the agreement that they will open-source and share the knowledge graphs and findings they produce. This network could include machine-learning researchers working on automated approaches to distill knowledge graphs, and a central hub whose work is to integrate across all of the work being conducted into single, crowdsourced, shared knowledge graphs. These shared knowledge graphs would represent commonalities (including translation across different ontologies) as well as the different ways knowledge can be represented based on how skills and concepts are taught.
In addition, the infrastructure improvements discussed above in the “Enhance R&D Infrastructure in Widely Deployed Tools” section can be used to support discoveries about student misconceptions that impact their learning (Elmadani et al., 2012), and about pedagogical strategies for helping students learn these difficult skills and concepts (Lomas et al., 2016). The PhET platform in particular has attracted strong attention from domain-based educational researchers for studying learning in their domains (e.g., Correia et al., 2019; Wieman et al., 2010; Yuliati et al., 2018). By bringing domain-based educational researchers together in platforms like this one, we can crowdsource the development of prerequisite graphs and other domain structure models that can then be used at scale.
Supporting Human Processes: Enhance Human–Computer Systems
One of the biggest opportunities for learning engineering is in enhancing the systems that emerge when humans (teachers, guidance counselors, school leaders) and learning technologies work together to better support students. In the best cases, it is possible to combine what computers are good at (rapid measurement and simple inference at scale) and what humans are good at (understanding why a problem is occurring and adjusting teaching accordingly; Baker, 2016). Examples of this include practices such as proactive remediation, where a teacher obtains information from a learning platform on a specific student’s progress and reaches out to them to offer assistance (Miller et al., 2015), and redesign of classroom activities based on the previous night’s homework data (Feng & Heffernan, 2006).
Perhaps the greatest immediate opportunity for enhancing human–computer systems in education is in enhancing the data provided to classroom teachers through dashboards. Dashboards have become a key part of learning technologies (Bodily & Verbert, 2017), and they are used by instructors in a considerable variety of ways (Bull, 2020, p. 440), from identifying students in need of immediate learning support (e.g., Miller et al., 2015), to selecting and redesigning content (Lazarinis & Retalis, 2007), to identifying common mistakes across students (Yacef, 2005), to supporting practices around classroom orchestration (Tissenbaum & Slotta, 2019). Dashboards support teachers in providing students with a broader range of types of feedback and support (Knoop-van Campen et al., 2021). Different dashboards provide a range of different types of information, from dropout prediction/SIS dashboards with data on attendance and assessments (Singh, 2018) to fine-grained learning technology dashboards that provide data on in-the-moment student performance (Feng & Heffernan, 2006). In some cases, these two types of dashboard are being integrated. For example, MATHia LiveLab predicts if a student will fail to reach mastery and provides these predictions to teachers (Fancsali et al., 2020).
Thus far, most dashboards for teachers do not provide in-depth data on student cognition or affect, although exceptions exist, such as Inq-ITS’s dashboard presenting data on student inquiry skill and supporting teachers in providing scaffolds to struggling students (Adair et al., 2020). Here, the challenge and opportunity are to increase the richness of data given to teachers while maintaining usability and comprehensibility.
Ethnographic research has suggested that teachers do not just want raw data. Instead, they want real-time recommendations about when to provide additional support to students and what kind of support to provide (Holstein et al., 2019; Martinez-Maldonado et al., 2014). In tandem, it will be essential to design dashboards—or other ways of informing teachers (e.g., Alavi & Dillenbourg, 2012; Holstein et al., 2018)—that support and encourage effective pedagogies for classroom data use. In other words, data dashboards are not just about communicating information; they are about supporting and changing practices, and they will succeed to the extent that they support teachers’ (or students’ or school leaders’) goals. Going further, a key topic for future research will be to identify whether providing a dashboard actually improves student outcomes, an under-studied area (but see Xhakaj et al., 2017).
An additional avenue for enhancing human–computer systems is through enhancing the integration of computer tutoring experiences and human tutoring experiences. Many online learning platforms today offer access to human tutors as a complement to their otherwise digital offerings, from home credit recovery platforms such as Edgenuity (Eddy, 2013) to blended learning systems such as Carnegie Learning (Fancsali et al., 2018) and Reasoning Mind (Khachatryan et al., 2014). An industry of companies, such as Tutor.com, has grown to offer these services. However, there has been relatively limited formal research on this practice and how to use learning engineering to enhance it. In one of the few examples of research, Carnegie Learning analyzed the factors that predicted that a student would seek human tutoring through a linked platform (Fancsali et al., 2018). Considerably more research is needed to support learning engineering efforts in this area. Specific questions to address include:
Can we understand what leads to tutors being more or less effective in this blended context?
Is the earlier research on what makes human tutors effective relevant in this context?
When should students seek help from the computer, and when should they seek help from a human tutor?
At the moment, human tutoring embedded into computer tutors is on demand and depends on student metacognition, and is therefore used unevenly (Fancsali et al., 2018). Current integration of human tutoring also requires tutors to get up to speed quickly based only on information directly provided by the student. Eventually, through learning engineering, the field may be able to develop a more sophisticated blend of approaches: using computers for routine and repetitive parts of instruction, empowering teachers and tutors with more complete information from the computer, and developing technology that chooses to loop in a tutor or teacher when the learner is not making progress.
Supporting Human Processes: Better Engineer Learning System Implementation in Schools
Many learning systems and curricula work well under favorable conditions: motivated teachers and supportive administration, with significant developer involvement in teacher professional development and ongoing support during use of the system. However, these same learning systems and curricula often fail when extended to a broader range of classrooms, where teachers may be unfamiliar or uncomfortable with new methods and technologies and may attempt to assimilate new technologies back into traditional teaching practices (Wilson et al., 2018). These challenges in implementation are ultimately challenges for learning engineering. Can we design learning technologies that are easier for teachers to incorporate into their practice, while maintaining the benefits and advantages of these technologies?
Mastery learning is one example of a success in learning engineering implementation. Mastery learning, the practice of advancing students between topics only when they demonstrate they know their current topic, proved both effective and difficult to scale in traditional classrooms (Guskey & Gates, 1986; Kulik et al., 1990). Adaptive learning systems such as Cognitive Tutor/MATHia, ALEKS, Algebra Nation, and Dreambox made mastery learning a key part of their approach and scaled to hundreds of thousands of learners. However, even in these cases, some teachers work around the mastery learning aspects of the system’s design, overriding the system to advance students who have not fully learned the current topic (Ritter et al., 2016). These decisions lead students to struggle more and make more errors (Ritter et al., 2016). Understanding why teachers make these decisions will be key to developing learning systems, and strategies for implementing them, that work in the real world at scale. We cannot change teacher decisions, or improve our systems to better meet teacher goals, without understanding those decisions. In these cases, both the design of learning systems and the design of professional support and training become areas where learning engineering is key.
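To illustrate the mechanism such systems implement, the sketch below shows a mastery-advancement rule in the spirit of Bayesian Knowledge Tracing, one common approach to estimating mastery in adaptive systems; the parameter values and the 0.95 threshold are illustrative conventions, not the settings of any particular product.

```python
# Sketch of a Bayesian Knowledge Tracing mastery rule (illustrative parameters).
P_INIT, P_TRANSIT, P_SLIP, P_GUESS = 0.2, 0.15, 0.1, 0.2
MASTERY_THRESHOLD = 0.95   # a common convention; actual systems vary

def update_mastery(p_know, correct):
    """Update the estimated probability the student knows the skill after one response."""
    if correct:
        p_given_obs = p_know * (1 - P_SLIP) / (
            p_know * (1 - P_SLIP) + (1 - p_know) * P_GUESS)
    else:
        p_given_obs = p_know * P_SLIP / (
            p_know * P_SLIP + (1 - p_know) * (1 - P_GUESS))
    # Account for the chance the student learned the skill on this opportunity.
    return p_given_obs + (1 - p_given_obs) * P_TRANSIT

p_know = P_INIT
for response in [True, True, False, True, True, True]:
    p_know = update_mastery(p_know, response)
    if p_know >= MASTERY_THRESHOLD:
        print("Advance to the next skill")
        break
```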
But what are the right practices to help teachers develop? Many teachers adopt practices that are less effective than the practices designers intend, but some teachers may adopt practices around learning technologies that work better than what the designers intended (e.g., Schofield, 1995). There has been increasing effort to learn from and scale the strategies that exemplary teachers use (Perkins et al., 2012). Improved integration of data between classroom practices and students’ learning experiences can be used to study which practices around the use of learning technology are effective and scalable, and in which contexts and situations these practices work best. The data can then be used to analyze and detect whether best practices are being used, and to develop automated and semiautomated methods to encourage teachers to use the right practice at the right time. For example, a pop-up message in a dashboard might encourage a teacher to speak with a student who has been struggling for the last 30 min. By also integrating data on teachers’ professional development experiences, learning engineers can study which professional development experiences lead to changes in practice and better outcomes for learners.
Hence, creating data systems that connect data on teachers’ professional development, their classroom practices, and students’ learning experiences will act as a key enabling factor for research and development to improve implementation. Once this infrastructure is in place, support for work to develop a taxonomy of teacher practices in classrooms using learning technology will facilitate the study of which practices benefit learners and how to engineer systems and professional development that produce those practices. These efforts should acknowledge that there may not always be a single set of practices that are optimal in all situations. The effectiveness of a given strategy may be impacted by local conditions and the attributes of both students and teachers.
What’s more, a data-infused approach could help determine if some teachers do a better job than their peers in helping a student master a given concept, a longstanding question in education. Next-generation approaches will leverage far more information than previous, highly aggregated value-added approaches (compare to Konstantopoulos, 2014) and instead identify specific instructional approaches that help students learn specific material. It may even become possible to identify how the best teachers customize and adapt learning environments for their classrooms, feeding back into the design of both professional development and adaptive support.
Supporting Human Processes: Improve Recommendation, Assignment, and Advising Systems
Over the last decade, applications that incorporate models that can predict if a student will fail a course or drop out have become a common part of K-12 and higher education (Bowers et al., 2012; Milliron et al., 2014). Today, these models are used to provide reports to school leaders and advisors (Milliron et al., 2014; Singh, 2018) or to drive simple automated interventions (Whitehill et al., 2015). These approaches have been successful at improving student outcomes in a variety of contexts (Arnold & Pistilli, 2012; Milliron et al., 2014; Whitehill et al., 2015).
However, predictive models are generally not yet built into advising or recommender systems used to support students in selecting courses. For example, course selection and registration processes at many institutions involve limited advising, leaving it to students to identify and select courses with minimal support. This leads to many students taking “excess” credits in college or community college that do not count towards degree requirements and use up financial aid (Zeidenberg, 2015).
There is an opportunity to use learning engineering to develop advising systems that proactively analyze student trajectories and make recommendations to advisors or to the students themselves. These recommendations may be able to increase the likelihood that students achieve their goals in high school (graduation and enrollment in college), in college (completion of a degree or transfer to a 4-year college), and in the workforce (success in obtaining a job and in job performance). There are systems in wide use that make course recommendations to students (Bramucci & Gaston, 2012); there are already models that can make this type of longitudinal prediction (see, for instance, Makhlouf & Mine, 2020; San Pedro et al., 2013); and there are models for how to optimally deliver recommendations of this nature (e.g., Castleman & Meyer, 2020). But there are only a few examples of integrating all three of these components together (e.g., Jiang et al., 2019).
The key challenge is to take models developed with the purpose of prediction and repurpose them for a different use: recommendation. Next, learning engineering is needed to make the recommendation and proposed intervention maximally effective at achieving its goals, going beyond just improving the algorithms to re-engineering the practices of counselors and advisors and shaping their practices with the technology. A key part of this will be conducting iterative design that builds on relevant research literatures, such as the extensive work on nudge interventions (Damgaard & Nielsen, 2018; Hansen & Jespersen, 2013), to develop recommendations that students and instructors follow and that achieve their desired goals of improved outcomes.
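As one minimal illustration of this repurposing, the sketch below assumes a completion-risk classifier trained on historical records and uses it to rank hypothetical course plans for a student rather than simply reporting a risk score; the features, toy training data, and course plans are hypothetical, and a real system would require far richer data and validation before informing advising.

```python
# Sketch: repurposing a trained risk model to rank hypothetical course plans.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical records: [GPA, credits earned, credit load, avg course difficulty]
X_hist = np.array([[3.5, 60, 12, 2.0], [2.1, 30, 18, 3.5], [3.0, 45, 15, 2.5],
                   [1.8, 20, 16, 3.8], [3.8, 75, 12, 2.2], [2.4, 40, 17, 3.2]])
y_hist = np.array([0, 1, 0, 1, 0, 1])   # 1 = did not complete the following term

risk_model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

def plan_features(student, plan):
    """Combine a student's history with a candidate course plan (hypothetical features)."""
    return np.array([[student["gpa"], student["credits_earned"],
                      plan["credit_load"], plan["avg_difficulty"]]])

def rank_plans(student, candidate_plans):
    """Repurpose the prediction model: order candidate plans from lowest to highest risk."""
    scored = [(plan["name"], float(risk_model.predict_proba(
        plan_features(student, plan))[0, 1])) for plan in candidate_plans]
    return sorted(scored, key=lambda pair: pair[1])

student = {"gpa": 2.6, "credits_earned": 35}
plans = [{"name": "lighter load", "credit_load": 13, "avg_difficulty": 2.4},
         {"name": "heavier load", "credit_load": 18, "avg_difficulty": 3.4}]
print(rank_plans(student, plans))   # lower-risk plan listed first
```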
One challenge to achieving this goal is the current low degree of explanation provided along with recommendations. Many of the algorithms currently being used for advising and recommendation do not provide details on why recommendations are made, making it difficult for practitioners to understand and trust the recommendations and raising concerns of unknown algorithmic bias. There are also concerns that current approaches offer recommendations for students “on the bubble” of success and failure while leaving students at very high risk unsupported.
Recommendation and advising systems can be improved through increased support for research and development on repurposing prediction models in this space for use in recommendation. There are two key steps to this. First, research and development on how to distill human-interpretable and actionable recommendations out of complex prediction models. Adapting explainable AI methods to the problem of actionability—so that models are not just explainable, but explainable in ways that enable action (Kay et al., 2020)—will require projects that bring together experts in machine learning, human–computer interaction, and education.
Second, educational recommenders should, like modern recommender systems in other domains, improve their own performance by studying whether their recommendations are followed and what the results are. This goal can be achieved by building a laboratory at a specific institution such as a community college (or several such institutions) that brings together an infrastructure that enables experimentation around prediction and recommendation and connects it with outcome data. In taking these steps, it will be essential to support and fund solutions that are inspectable, understandable, trustworthy, and beneficial to the full range of learners.
A related problem is matching students to schools in large school districts or local educational agencies. Many school districts still use cumbersome multistage enrollment processes in which many schools leave seats unfilled and many students end up in schools they do not prefer and eventually leave. Work over the last decade in several cities has shown that even relatively simple matching algorithms can lead to much better matching outcomes (e.g., Pathak, 2017). Extending this work with the use of sophisticated AI-driven recommender systems has the potential to guide students to make better choices about which schools they list as their preferences, choices that are more realistic and more likely to lead to personal and career success.
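As an illustration of the kind of relatively simple matching algorithm referenced above, the sketch below implements student-proposing deferred acceptance, the mechanism underlying many of these district assignment reforms; the students, schools, priorities, and capacities are, of course, illustrative.

```python
def deferred_acceptance(student_prefs, school_priorities, capacities):
    """Student-proposing deferred acceptance (Gale & Shapley), as used in school choice."""
    rank = {school: {stu: i for i, stu in enumerate(order)}
            for school, order in school_priorities.items()}
    next_choice = {stu: 0 for stu in student_prefs}    # index of next school to propose to
    held = {school: [] for school in school_priorities}
    free = list(student_prefs)
    while free:
        stu = free.pop()
        if next_choice[stu] >= len(student_prefs[stu]):
            continue                                    # student has exhausted their list
        school = student_prefs[stu][next_choice[stu]]
        next_choice[stu] += 1
        held[school].append(stu)
        held[school].sort(key=lambda s: rank[school][s])
        while len(held[school]) > capacities[school]:
            free.append(held[school].pop())             # reject the lowest-priority student
    return held

# Illustrative example with two schools and three students.
matching = deferred_acceptance(
    student_prefs={"Ana": ["North", "South"], "Ben": ["North", "South"], "Cam": ["North"]},
    school_priorities={"North": ["Cam", "Ana", "Ben"], "South": ["Ana", "Ben", "Cam"]},
    capacities={"North": 1, "South": 2},
)
print(matching)   # {'North': ['Cam'], 'South': ['Ana', 'Ben']}
```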
Better Learning Technologies: Optimize for Robust Learning and Long-Term Achievement
It is important to design learning experiences that support students in developing robust learning: learning that is retained over time, transfers to new situations, and prepares students to learn in the future (Koedinger et al., 2012). Learning design has often emphasized short-term learning outcomes, as they can be easier to measure. As Caitlin Mills noted in the convening, “The things that may be most easy to immediately measure (number of problems done or such) may not be the things we wish to optimize.”
There is a risk that this problem will be amplified by learning engineering, despite its significant benefits overall. The common practice of improving a product using rapid innovation cycles risks focusing on measures that can be applied at that rapid time scale. Similarly, the practice of using learning system data to assess the effects of an innovation risks focusing energy on improvements that are easy to measure. It is easy to measure immediate performance improvement on an exact well-defined skill, and there are now hundreds of examples of this type of work. It is significantly harder to quickly measure transfer across skills or preparation for future learning, although many examples still exist (Koedinger et al., 2012). Taking this step is nonetheless essential to guarantee that learning engineering produces learning that is active and useful to learners.
However, the field has a poor understanding of which interventions’ effects persist over time and the dynamics of different interventions across multiple time scales (i.e., dosage, appropriate repetition, half-life). Measuring long-term impacts requires researchers to plan ahead and maintain continuity of follow-up on students. Relatively few researchers even look at retention of knowledge over the span of a few weeks or months. It becomes even more difficult over a period of several years, a span of time during which students move to new schools, learning systems may change their designs in significant ways, and research team composition is likely to change. Unlike research on much coarser-grained interventions, such as charter schools (Sass et al., 2016), we are only aware of one example where students who used an adaptive learning system were followed up over the span of several years (San Pedro et al., 2013, 2015; Almeda & Baker, 2020).
Given that many pedagogies and learning strategies work in the short term and for the exact material studied, but can lead to poor recall and transfer (Braithwaite & Goldstone, 2015; Rawson et al., 2013; Rohrer, 2015; Schmidt & Bjork, 1992), this limitation in current practice carries risks of worsening outcomes for students rather than improving them. We need to better understand the longer term impacts of learning engineering decisions. This concern can be addressed in multiple ways.
First, the field should be made more aware of designs and approaches that are already known to lead to worse outcomes in the long term, such as cramming (Rawson et al., 2013), massed practice (Rohrer, 2015), and the lack of interleaving of related skills (Braithwaite & Goldstone, 2015). Many learning systems still use massed practice, with skills taught in a block, particularly in mathematics, although an increasing number of learning systems are using learning engineering and data to optimize spaced practice (Butler et al., 2014; Settles & Meeder, 2016). However, in doing so, it is important to take into account specific cases where initial massed practice is beneficial, such as the learning of characteristic perceptual features (Carvalho & Goldstone, 2017).
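To make the data-driven spacing concrete, the sketch below uses the exponential-forgetting form popularized by half-life regression (Settles & Meeder, 2016): predicted recall decays with the time elapsed since the last practice, and an item is scheduled for review once predicted recall falls below a threshold. The half-life values and the 0.8 threshold are illustrative; in practice they would be estimated per learner and item from data.

```python
# Sketch of recall-based review scheduling (illustrative parameters).
REVIEW_THRESHOLD = 0.8     # schedule review once predicted recall drops below this

def predicted_recall(days_since_practice, half_life_days):
    """Exponential forgetting: recall probability halves every `half_life_days`."""
    return 2 ** (-days_since_practice / half_life_days)

def items_due_for_review(items, today):
    """Return the items whose predicted recall has fallen below the threshold.

    `items` maps an item name to (last_practice_day, estimated_half_life_in_days);
    in a real system the half-life would be estimated per learner and item from data.
    """
    due = []
    for name, (last_practice, half_life) in items.items():
        if predicted_recall(today - last_practice, half_life) < REVIEW_THRESHOLD:
            due.append(name)
    return due

print(items_due_for_review({"fraction_addition": (0, 3.0), "slope": (4, 10.0)}, today=5))
# -> ['fraction_addition']   (recall is about 0.31 after 5 days with a 3-day half-life)
```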
Second, the field needs to go beyond conducting short-term studies on interventions and designs. Short-term A/B tests are convenient to run but may favor approaches whose benefits do not sustain. As such, we recommend additional work in this space. Specifically, explicit plans should be put in place to follow up promising studies, systems, and pedagogies, to see if the apparent benefits sustain over a longer term. This should be applied to a range of types of intervention, from learning interventions to persistence/retention interventions. A range of benefits may be possible, from greater educational attainment to career and even health benefits.
This type of follow-up research is easier if it is planned for in advance, by saving key follow-up information (in ways that respect students’ choices of whether to be followed longitudinally) and deciding in advance how and when follow-up will occur. This trend is already occurring and simply needs further encouragement, perhaps by setting aside funding for follow-up at the start of a project.
Similarly, there should be greater emphasis on preparation for future learning. What areas of mastery, or ways of learning a topic, improve a student’s ability to learn the next topic faster or more effectively (e.g., Bransford & Schwartz, 1999; Chi & Vanlehn, 2007)? Much of the work seen in this area so far is specific to a single learning domain, such as a specific mathematical skill being key to the development of other later skills (Booth & Newton, 2012). For example, work by NWEA and EDC involving a data set of over 200,000 students found that student performance in each of four 6th-grade mathematical domains was independently predictive of achievement in 8th-grade algebra (Almeda et al., 2020). Specifically, a one standard deviation increase in Real and Complex Number Systems was related to one-third of a standard deviation improvement in math overall 2 years later. This applies beyond just mathematics. Emerging work suggests, for instance, that scientific inquiry skills learned in physics can be applied in biology as well (Sao Pedro et al., 2014). By finding which learning activities are most essential for future progress, we can focus instructional and design effort where it has the highest potential impact. More should be done to look at educational impact and the acceleration of future learning further downstream from the intervention, even months or years later.
Better Learning Technologies: Support Learning 21st-Century Skills and Collaboration
Much of the learning technology currently in use focuses on relatively narrow academic skills, but more complex skills such as collaboration, communication, and critical thinking (often referred to as “21st-century skills”) will be key to career and life success in the coming decades (Dede, 2010). These skills are often hard to measure, as they do not have purely right or wrong answers that can be classified easily. Using new technologies, new data collection tools, analytics, and psychometrics, learning engineering can focus on the development of reliable and valid measures of these hard-to-measure constructs and produce learning experiences that support their development.
For example, game-based assessments and simulations appear to have promise for measuring a range of 21st-century skills, from inquiry skills (Gobert et al., 2013; Sparks & Deane, 2015), to cognitive flexibility and conscientiousness (Shute et al., 2015), to collaborative problem-solving (Chopade et al., 2018; San Pedro et al., 2019). Intelligent tutors have also proven to be useful environments for studying self-regulated learning skills such as help-seeking and strategies for improving these skills (Aleven et al., 2016). One of the largest challenges to developing these types of measurements is obtaining reliable and agreed-upon human judgments of 21st-century skills that can be used to train machine-learned detectors or to inform evidence-centered design approaches to developing these measures. One path to collecting this data may be to improve tools for visualizing and annotating student log data (Rodrigo et al., 2012; Gobert et al., 2013; Rowe et al., 2019), to support discussion and refinement of coding schemes, comparison between human coders and analysis of their differences, and data-driven discussion around measurement design.
An area of particular importance is 21st-century skills around collaborative learning and collaborative performance. Collaborative work is an integral part of our society both at the academic level and in the workforce. Learning engineering is uniquely positioned to help practitioners, employers, and students better understand collaboration through advanced technology and methodologies. While there has been initial work using evidence-centered design to assess collaboration (Andrews-Todd & Kerr, 2019; Nouri et al., 2017), this work is still in its beginnings.
Collaboration is an important strategy for learning, but current learning tools and systems for collaboration are less advanced than tools and systems for individual learning. Learning engineering can begin to shed more light on best practices for evaluating collaborative work, teams, communication, and other skills directly related to 21st-century skills. This challenge becomes more tractable as learning shifts increasingly online. Collaboration taking place completely in person is difficult to measure without complex multimodal approaches (Järvelä et al., 2019; Noel et al., 2018) or sophisticated equipment (Martinez-Maldonado et al., 2013) that are difficult to deploy in real classrooms. By contrast, collaboration taking place fully online can be considerably easier to measure. Discussion forum data, for instance, is quite easy to analyze, leading to research that integrates across grain sizes, from textual cohesion to social networks (Joksimović et al., 2015). Even Zoom recordings, while not collected with data analysis in mind, provide direct images of participants’ faces and a view of the document being shared, which will be easier to work with than cameras deployed in classrooms where students are moving around as they work together.
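As a simple illustration of how such online traces become analyzable, the sketch below builds a reply network from discussion forum posts and computes degree centrality, one of the coarser-grained social network measures referenced above; the post data are invented and the use of the networkx library is simply one convenient choice.

```python
# Sketch: building a reply network from forum posts (illustrative data).
import networkx as nx

posts = [
    {"id": 1, "author": "ana", "reply_to": None},
    {"id": 2, "author": "ben", "reply_to": 1},
    {"id": 3, "author": "cam", "reply_to": 1},
    {"id": 4, "author": "ana", "reply_to": 2},
]

author_of = {p["id"]: p["author"] for p in posts}
graph = nx.DiGraph()
for post in posts:
    graph.add_node(post["author"])
    if post["reply_to"] is not None:
        # Edge from the replier to the author of the post being replied to.
        graph.add_edge(post["author"], author_of[post["reply_to"]])

# Degree centrality as a first, coarse indicator of who interacts with whom.
print(nx.degree_centrality(graph))
```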
Learning engineering can help develop tools that scaffold collaboration and better assess collaborative skill, to help learners learn to collaborate and learn more effectively while collaborating. As an example, research and development could further examine the viability of wearable devices that measure social interactions (Montanari et al., 2017). Such tools could better measure 21st-century skills like collaboration and social engagement by examining which students engage with others and how it shapes performance and learning (Evans et al., 2016; Martinez-Maldonado et al., 2012). In general, data will enable us to study which learner behaviors and strategies lead to effective collaboration and learning outcomes through collaboration. Given the unique nature of collaborative learning, social relationships and the behaviors that support their development may play a key role (i.e., Gašević et al., 2013; Kreijns, 2004).
Learning engineering has a role to play in creating learning experiences that can measure and support the development of 21st-century skills. This work will involve the creation of better measures and better methods for developing measures. Evidence-centered design and educational data mining (EDM) have each been successful at measuring specific 21st-century skills (e.g., Gobert et al., 2013; Kantar et al., 2018; Shute & Torres, 2012; Snow et al., 2015); however, there is still insufficient work to understand when each method is best and how to use these methods together (but see Mislevy et al., 2012; Rupp et al., 2012).
Support for formalizing methods for measuring 21st-century skills, including collaboration, may expand the use of these methods, particularly if the field can successfully articulate and systematize how evidence-centered design and EDM should be used together. In addition, work to enhance students’ 21st-century skills, including collaboration, has not sufficiently looked into the long-term retention of what is learned and the translation of those skills to new contexts (a more general problem; see previous recommendation).
These areas of learning engineering are currently moving forward, and these goals are on track to be achieved … but not quickly. Hence, the major challenge here is to speed research on key goals such as developing better measures of 21st-century competencies and better methods for developing them. Although considerable effort goes into this problem today, different research and development teams are working on different aspects of this problem. As such, there is limited scope for the type of competition that often accelerates progress. Attempts to bring together large numbers of researchers and developers to discuss these problems have led to committee solutions that do not seem to kick-start the field (e.g., Graesser et al., 2018; Krumm et al., 2016). Instead, the field may benefit from explicit competition, such as is seen in the ongoing competition to develop better measures of student knowledge or in other domains such as natural language processing and image processing.
Existing competitions in educational data have been too brief in duration for this type of challenge. Instead, it may be worthwhile to establish challenges like the Loebner Prize that attach funding to demonstrating specific types of functionality in measurement or skill development. For instance, a prize could be given to the first team to produce an automated measurement of collaboration associated with better workplace outcomes in cross-cultural workplaces, or the first team to produce an intervention based upon automated measurement of conscientiousness that led to higher conscientiousness in real-world tasks.
Better Learning Technologies: Improve Support for Student Engagement
There is growing acknowledgment that there is more to learning than just what is learned. Student engagement can make a big difference, for immediate learning (Cocea et al., 2009; Craig et al., 2004), course performance (Cheng et al., 2011; Fincham et al., 2019), and longer term interest and participation in a subject (Almeda & Baker, 2020; San Pedro et al., 2015). Developing technologies that take student engagement and affect into account has therefore become an important goal for many in the learning engineering field. Engagement and affect have been measured both from sensors and from logs of student interactions with learning systems (see reviews in Baker & Ocumpaugh, 2014; Baker & Rossi, 2013; Calvo & D’Mello, 2010; Kovanovic et al., 2016). However, though the technology exists to measure engagement and affect, the technology is not yet in place to reliably use these measurements to improve student experiences. Though a small number of approaches have been effective at improving engagement and learning, these technologies have not scaled.
Investments in infrastructure in this area may assist in the scaling of this type of technology. Currently, three approaches have been used to collect data on engagement and affect for developing automated measurements: classroom observations, video data, and self-report. The classroom observation path to developing automated measurements has been used in over a dozen systems, is supported by a widely used Android app for data collection (Baker et al., 2020), and has even had its financial costs systematically studied (Hollands & Bakir, 2015). However, it is not feasible in remote learning contexts. Several self-report instruments exist for capturing affect and engagement in real time or near real time (Hutt et al., 2019; Sabourin, Mott, & Lester, 2011; Smith et al., 2018). Developing a software plug-in that implements a standard self-report instrument for engagement and affect that has been validated across learner populations will increase the feasibility of collecting large-scale remote data. This self-report data can then be used to develop detectors that recognize student engagement and affect in real time from interaction data. For video, it may be possible to develop a single suite of engagement/affect detectors validated to work across populations, much as has been done for basic emotions by several commercial vendors. The largest challenge to doing this will be the collection of a large-scale and diverse corpus of video learning data, annotated in terms of key individual differences such as age, gender, race/ethnicity, and type of camera/webcam, labeled by culturally appropriate trained coders (Okur et al., 2018).
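As a sketch of the detector-building step that such self-report infrastructure would enable, the example below trains a simple interaction-based detector from self-report labels; the features, toy data, and use of scikit-learn's logistic regression are illustrative choices, not a description of any deployed detector.

```python
# Sketch: training an interaction-log detector of disengagement from self-report labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical features computed per time window from interaction logs:
# [seconds per action, hint requests, errors, idle seconds]
X = np.array([
    [4.2, 0, 1, 10],
    [1.1, 3, 4, 2],
    [6.0, 0, 0, 45],
    [2.5, 1, 2, 5],
    [0.9, 4, 5, 1],
    [5.1, 0, 1, 30],
])
# 1 = student self-reported being disengaged in that window, 0 = engaged.
y = np.array([0, 1, 0, 0, 1, 1])

detector = LogisticRegression(max_iter=1000)
# In practice, cross-validation should be done at the student level to test
# generalization to new learners; a simple 3-fold split is shown here.
print(cross_val_score(detector, X, y, cv=3).mean())
```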
Moving forward, learning engineering has the opportunity to examine which engagement/affective interventions (both teacher-driven and automated) are effective for which students, in which situations. A range of possible types of interventions has been developed, from empathic and conversational agents (D’Mello et al., 2009; Karumbaiah et al., 2017; Robison et al., 2009), to visualizations of student engagement (Arroyo et al., 2007; Xia et al., 2020), to automated feedback on the quality of discussion forum posts (Butcher et al., 2020), to messages provided between learning activities (DeFalco et al., 2018) or sent to students who have ceased participating in a learning activity (Whitehill et al., 2015).
However, relatively little work has studied how students’ individual differences impact the effectiveness of these interventions (but see Arroyo et al., 2013; D’Mello et al., 2009), and insufficient work has compared different intervention types to each other or attempted to integrate multiple interventions. Learning engineering can help to answer these questions. This limitation in the research so far can be addressed through creating a competition where different interventions are compared and integrated, in a large sample of students where individual difference measures are also collected.
In parallel, work is needed to figure out how to design these interventions in ways that teachers, school leaders, parents, and students are comfortable with. Many interventions that have been successful at reducing disengagement or improving affect are not acceptable to students or teachers, reducing their potential to scale. Greater support for work to understand these individuals’ needs and desires and to design in accordance with these needs (e.g., Holstein et al., 2019) would increase the potential for uptake and scaling.
The role of parents is particularly important. As a long-standing body of research shows, parents, home life, and other factors external to the classroom have a considerable impact and can exacerbate achievement gaps (Hara & Burke, 1998). A number of programs show that engaging parents can have substantial impacts (Berkowitz et al., 2015; Mayer et al., 2015). One of the challenges in empowering parents to support their children is the numerous gaps that currently exist in the communication between school administrations, students, and parents. Simple interventions like providing parents with login information for school learning management systems can lead to improvements in student achievement (Bergman, 2020), as can providing parents with automated text nudges (Bergman & Chan, 2019). Hence, it is becoming clear that opportunities can be created if learning engineering focuses on parents as a lever.
Conclusion
This article summarizes the findings of the 2020 Asynchronous Virtual Convening on Learning Engineering. In this article, we have outlined ten key areas of opportunity for research and development in learning engineering and, within those areas, proposed 33 high-potential directions for work, summarized in Table 1. There are overlaps between many of these potential directions. For example, better algorithms for equity can and should be pursued in projects focused on other topics, and enhanced R&D infrastructure will support all the other areas of opportunity in this article. Encouraging R&D teams to pursue several of these opportunities in a single project will help to expand coverage and move the field forward. As this article demonstrates, learning engineering has the potential to have huge impacts across a variety of areas. A considerable number of successful examples of learning engineering exist. However, scaling and dissemination remain challenges.
Despite considerable thought around how to scale learning technologies (Clarke & Dede, 2009), scaling up remains a challenge for many key innovations that have been developed or enhanced through learning engineering. Too many of the most sophisticated technological approaches and pedagogical approaches remain in research classrooms, either as wholly academic projects or as demonstrations and pilots by platforms that have broader use. The move towards making learning platforms into platforms for research has led to a proliferation of papers on how to engineer learning better, but many of those innovations have not scaled even in the systems being studied. This situation underpins the recommendation around better engineering of implementation, a key step towards scale.
Dissemination of ideas remains as much a challenge as scaling up specific learning systems. Many of the findings of learning engineering could apply in new learning platforms and in nontechnological learning situations, but often remain confined to a single platform. Even when shared, most learning engineering findings are disseminated in academic journals or at academic conferences. While this is effective at engaging other scientists, sharing ideas, and promoting collaboration, these venues are not optimal for putting work into practice at scale.
Teachers, parents, policy makers, and even many learning system developers are unlikely to read (or have access to) academic journals and conferences and thus are often unaware of new results and findings that directly impact their classrooms and learners. Until work is more widely disseminated, there will remain a disconnect between the large volumes of high-quality R&D work being generated both in academia and industry and educational practice. Thus, in addition to recommending the research and development directions within this document, we also recommend continued efforts to enhance connections between research and practice.
Overall, the process of bringing together the 2020 Asynchronous Virtual Convening on Learning Engineering and compiling this report has indicated that learning engineering has considerable promise for enhancing learning experiences, enriching learning, and supporting better long-term achievements by learners. Strides have already been made; we are at the beginning of the field’s journey towards transforming education.
Appendix
Participants
Each of these participants took part in the 2020 Asynchronous Virtual Convening process in some fashion.
Participants do not necessarily agree with all aspects of this report.
Table 2. Table of Participants
Elizabeth Albro
Pathrikrit Banerjee
Michael Binger
Gautam Biswas
Anthony Botelho
Christopher Brooks
Emma Brunskill
Paulo Carvalho
Catherine Cavanaugh
Samuel Crane
Scott Crossley
Kristen DiCerbo
Phillip Grimaldi
Sunil Gunderia
Neil Heffernan
Andrew Jones
Rene Kizilcec
Janet Kolodner
Diane Litman
Bruce McLaren
Benjamin Motz
Phil Poekert
Steve Ritter
Erica Snow
Jim Stigler
Anne Trumbore
Melina Uncapher
John Whitmer
Anonymous Participants
Copyright © 2022 The Authors
Received May 2, 2021
Revision received September 27, 2021
Accepted October 1, 2021