Positive feedback has known benefits for improving task performance, but it is not clear why. On the one hand, positive feedback may direct attention to the task and one’s motivations for having performed the task. On the other hand, positive feedback may direct attention to the task’s value for achieving a future goal. This ambiguity presents a challenge for the design of automated feedback interventions. Specifically, it is unclear whether positive feedback will more effectively influence behavior when it praises the recipient for having performed an action, or when it highlights the action’s value toward a goal. In the present study, we test these competing approaches in a large-scale field experiment (n = 1,766). Using a mobile app, we assigned college students either to a no-treatment control group or to receive occasional notifications immediately upon submitting online assignments, which either praised them for having submitted their coursework or highlighted the value of submitting coursework for academic success. We find that only praise messages improved submission rates and course performance, suggesting that drawing attention to the feedback-eliciting task is necessary and sufficient for influencing behavior at scale.
Keywords: feedback, praise, value intervention, mobile notification, student support
This research was made possible by support from the Indiana University Office of the Vice President for Information Technology’s Learning Technology Division. We appreciate the contributions and assistance of Stacy Morrone, Joanna Ray, Jamie Israel, Justin Zemlyak, and John Gosney.
Declaration of Interest: Since this study was conducted, Indiana University has licensed the Boost app to a commercial startup (https://boost.education) so that it can be made available to other institutions. As inventors of the app, authors Benjamin A. Motz and Matthew G. Mallon have a partial financial interest in licensing fees collected from the company, and in the company itself.
Disclaimer: Interactive content is included in the online version of this article.
Correspondence concerning this article should be addressed to Benjamin A. Motz, Department of Psychological and Brain Sciences, Indiana University, 1101 East 10th Street, Bloomington, IN 47405, United States firstname.lastname@example.org
If you want someone to perform a behavior more often, it is generally a good idea to provide feedback or encouragement when the behavior is performed, by acknowledging the effort or simply saying “Good job.” Such feedback has long been shown to facilitate behavior change across a variety of domains, including classroom behavior (Hall et al., 1968), retail customer service (Crowell et al., 1988), healthy eating (Campbell et al., 1994), and exercise (Guyatt et al., 1984; Weber & Wertheim, 1989). Traditionally, such encouragement would be spoken in person by a teacher, boss, physician, or coach, but more recently, personal mobile devices have made it possible to deploy positive feedback immediately upon the eliciting behavior, at very large scales (Hermsen et al., 2016; Lathia et al., 2013; Schembre et al., 2018; Villalobos-Zúñiga & Cherubini, 2020). For example, medical monitoring systems trigger encouraging text messages acknowledging when diabetic patients report their blood glucose levels (Yoo et al., 2009), and many smartwatches and fitness trackers vibrate and display congratulatory animations when people achieve personal activity goals (Sullivan & Lachman, 2017).
The prevalence and effectiveness of computer-generated positive feedback interventions raise questions about their mechanism of influence on behavior. When presented by a teacher or other authority figure, positive feedback may be socially rewarding, but there is nothing naturally rewarding about an automatically triggered text message or a vibrating wristwatch (i.e., unless it is conditionally associated with meaningful information). On the contrary, these may even be viewed as disruptive (Heitmayer & Lahlou, 2021; Mehrotra et al., 2016), so it is unlikely that these alerts provide reinforcement in operant conditioning terms, or that they directly incentivize people to change their behaviors on their own. Furthermore, long-standing research in education settings has found little evidence that positive feedback acts as a reward, as the recipient ultimately renders their own personal interpretation of any information presented in the feedback (Kulhavy, 1977; Mory, 2004).
Rather than viewing positive feedback as rewarding, theorists suggest two possible explanations: positive feedback affects human behavior either (a) by drawing attention to the preceding action that elicited the feedback and, by extension, to one’s motivations for performing the task (Kluger & DeNisi, 1996), or (b) by providing information that reduces the perceived gap between a behavior and an intended goal state (Carver & Scheier, 2012; Hattie & Timperley, 2007). In other words, feedback causes a recipient to direct attention either toward task performance and the effort invested in the behavior, or toward the goal state (the consequence of the behavior) as having become closer and more attainable.
These differing accounts of how feedback affects behavior present an obstacle to the effective design of behavioral interventions. Most notably, it remains ambiguous whether positive feedback is most effective when it highlights the performance of the behavior itself, or instead when it highlights progress toward a goal. If the former, interventions that praise task performance might be more effective at facilitating behavior change. If the latter, positive feedback that emphasizes the value of task performance for advancement toward a goal might be more effective.
The goal of the present study is to experimentally evaluate these different messaging approaches in an automated feedback intervention, comparing positive feedback messages that praise an individual for having completed a task, or that emphasize the task’s value toward a goal. No past studies have made this direct comparison.
We examine these approaches in an education context, with an automated intervention designed to improve assignment submission rates among college students. In general, college students are completing and submitting an increasing volume of their assignments online, a trend broadly driven by higher education’s increasing adoption of online learning management systems (Pomerantz & Brooks, 2017), and recently accelerated by the transition to remote instruction due to the coronavirus disease 2019 (COVID-19) pandemic (Motz, Quick, et al., 2021). Unsurprisingly, submitting these assignments is a dominant factor in student success (Kauffman, 2015) and engagement (Motz et al., 2019), and is moderated by students’ self-regulation and motivation (Bembenutty & White, 2013; Kitsantas & Zimmerman, 2009).
Given that students’ work is largely submitted online in centralized digital platforms, records of this online activity can be used to trigger real-time automated interventions, providing student support at a population scale. In the present study, we examine such an intervention aimed at improving submission rates through positive feedback, motivated by past findings that feedback and confirmation of students’ work improve homework completion and student performance (Núñez et al., 2015; Rosário et al., 2015). The current intervention is a push notification in a mobile app, deployed when students submit online coursework, either providing praise for having submitted an assignment, or emphasizing the value of having submitted an assignment.
There is a general need for reconsideration and extension of the theory that informs behavioral science, as conventional methods for facilitating behavior change may not naturally apply to new technological innovations (Orji & Moffatt, 2018; Patrick et al., 2016; Riley et al., 2011). As institutions leverage online platforms to reach very wide audiences, research is needed to examine the alignment of behavioral science with these new frontiers of technology-mediated interventions. Feedback, in particular, is open to interpretation by the feedback’s recipient, and considering the range of possible individual states and environmental settings where automated feedback might be delivered, it is particularly important to chart theoretical limits to the scale and ecological validity of feedback’s benefits when deployed automatically (Nahum-Shani et al., 2018; Wrzus & Mehl, 2015). This is important because the perceived source of feedback can affect recipients’ interpretation of provided feedback (Bannister, 1986). Effective deployment of feedback in a mobile app is relevant to a wide range of public interests, including medication adherence (Demonceau et al., 2013), healthy eating (Mummah et al., 2017), and, as examined in the present study, education.
Praise has a mixed track record, particularly in education. A principal distinction is that praise can either be directed at the person (e.g., “Good boy” or “Good girl”) or the task (e.g., “Good work” or “Good job”). Praise directed at the person has been shown to have potentially negative consequences for student learning, as it can be perceived as limiting a student’s autonomy, directing attention away from the task, or providing unwelcome sympathy following a substandard performance (Dweck, 1999; Kamins & Dweck, 1999; Mueller & Dweck, 1998; Skipper & Douglas, 2012).
Praise directed at task performance, on the other hand, can be beneficial for motivation, performance, and learning outcomes (Bareket-Bojmel et al., 2017; Deci et al., 1999; Hattie & Timperley, 2007; Hewett & Conway, 2016), particularly when it is not expected (Harackiewicz et al., 1984). And while the information value of feedback is believed to increase its efficacy, evidence suggests that even generic praise can be an effective motivator when the individual has difficulty evaluating their own performance (Ilgen et al., 1979). In this vein, praise may be an effective scalable intervention when directed specifically at behaviors and tasks that are perceived as being weakly or distantly related to a goal state.
These benefits align with recent research on homework adherence. The receipt of performance feedback on homework is associated with increased homework completion (Núñez et al., 2015); however, there is only a weak correlation between the amount of information in the feedback and homework completion (Xu, 2016). Considering that the completion of any single homework assignment may be difficult to evaluate for its benefits toward long-range academic success, one might hypothesize that praise deployed by a mobile device—as automatic, generic, positive, task-contingent feedback—may have positive effects on homework completion. If so, this synthetic praise might be viewed as roughly analogous to the motivational benefits of verbal encouragement from social robots in educational settings (Brown & Howard, 2014), or of points allocated to activities in gamified learning platforms (Subhash & Cudney, 2018). Such forms of feedback may generally signal that a task carries some unspecific benefit, even while the goal-state itself may be vague.
Despite the known efficacy of praise directed at task performance, it has also long been believed that any motivational benefit of positive feedback is fundamentally confounded by the task’s value to a goal, as feedback without any goal has no effect on motivation (Locke et al., 1968). It stands to reason, then, that effective positive feedback perhaps should highlight the utility of the antecedent task toward the accomplishment of a specific goal (e.g., “You’re closer to getting a good grade”), rather than merely providing generic praise. Such an intervention may more directly achieve feedback’s function of reducing the perceived gap between a behavior and a goal state (Carver & Scheier, 2012; Hattie & Timperley, 2007).
Emphasis on progress toward a goal has long been asserted as a critical element of task feedback in educational settings (National Academies of Sciences, Engineering, and Medicine, 2018). Interventions that highlight a task’s value represent a central theme of self-regulated learning approaches (Schraw, 1998), an effective strategy for improving student success (Harackiewicz & Priniski, 2018), and a common practice for improving motivation in educational (Soicher & Becker-Blease, 2020) and organizational (Latham & Locke, 1979) settings. Emphasis on a task’s value has also been shown to moderate the efficacy of performance feedback (Erez, 1977), even under conditions of generic and automated goal information (Earley et al., 1990; Seijts & Latham, 2001).
But emphasis on a task’s value toward a future goal is not always beneficial. For example, when a person has low interest in a task, highlighting the purpose of the task does not improve the efficacy of feedback (Frost & Mahoney, 1976). Additionally, if an intervention highlights a task’s utility to goals that are inauthentic to one’s personal goals or that are perceived as unattainable, the intervention can backfire (Canning et al., 2019; Canning & Harackiewicz, 2015; Durik et al., 2015). So despite the theoretical benefits of feedback focused on a task’s value, these benefits may be difficult to achieve in practice at scale, with automated interventions in mobile devices.
Here, we examine the benefits of automated feedback at improving a specific behavior: adherence to homework assignments in college. Specifically, we investigate the effects of these interventions on students’ assignment submission rates and course performance. If positive feedback has broad unspecific benefits, we should expect that both treatment conditions (Praise and Value interventions) will improve assignment submission rates and course performance relative to no treatment (Control). However, if automated positive feedback works by directing attention at task performance, we would expect that students receiving Praise messages will show evidence of improved outcomes; and contrariwise, if positive feedback works by directing attention to a goal state, we would expect improved outcomes among students receiving Value messages.
All study materials, data, and analysis scripts are publicly available on the Open Science Framework (OSF) at https://osf.io/8x34a/ (Motz, Canning, et al., 2021). The study protocol was approved by the Indiana University Institutional Review Board (IRB).
Instructors of 747 credit-bearing courses with published sites on the Canvas learning management system, distributed across eight Indiana University campuses, volunteered to allow their students to use the Boost app during the Spring 2019 semester. These courses had an average enrollment of 50.9 students, as measured at the start of the semester. Among these, 1,766 unique students downloaded the app, activated it with their university credentials, and agreed to participate in the study. Upon activating the app, participants were sequentially assigned to one of three conditions: Control (n = 588), Praise (n = 591), or Value (n = 587); see Procedure, below. On average, these students were enrolled in 1.49 of the 747 courses where the app was active, and the average number of app users in each course was 3.52 students; thus, roughly 7% of the enrolled students agreed to use the app, and to participate in and release their student data for this study. Of the participating courses, 86.3% were undergraduate and 13.7% were graduate courses. Among the undergraduate courses, 28.3% were 100-level, 24.8% were 200-level, 31.3% were 300-level, and 15.6% were 400-level. 31.0% of courses were offered from the Bloomington campus, 32.1% from the Indianapolis campus, and the remainder were roughly evenly distributed among six regional campuses. We did not collect demographic information about app users, nor did we request to collect this information from study participants.
Student participants self-selected to download a mobile app to help them keep up with online coursework, and while this sample should exhibit differences from the full population of all Indiana University students, it should be representative of the population who would opt-in for these kinds of automated student support services (Andrews et al., 2020; Motz, 2019). Condition assignment was at the individual level; participants received the same assigned treatment in each of their enrolled courses.
Students downloaded the free Boost app from either the Android or iOS app store. The app was built using the Expo framework, an open-source mobile development toolkit for React Native. See Figure 1 for screenshots.
We composed a set of messages that would be deployed to app users via push notification, shortly after they submitted an assignment on time (prior to the deadline). These messages either praised the student with a short exclamation (e.g., “Outstanding!” “Way to go!” or “Nice job!”), or highlighted the value of the assignment submission for degree completion (e.g., “You just got a little closer to a graduation party.” or “Hard work and success in college go hand in hand.”). Each message was followed by the general statement: “Canvas has received your recent assignment submission.” There were 20 unique messages in each condition (Praise and Value), deployed in a fixed sequence; for students who received all 20 messages prior to the end of the semester, the app would begin again at the first message in their assigned condition. The full text of all messages is available at https://osf.io/t9zx8/.
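The fixed-sequence rotation described above can be sketched as follows. This is an illustrative sketch only, not the app’s actual implementation; the message texts and function names here are hypothetical placeholders (the real message texts are available at https://osf.io/t9zx8/).

```python
# Illustrative placeholders for the 20 messages in each condition.
PRAISE_MESSAGES = [f"praise message {i}" for i in range(20)]
VALUE_MESSAGES = [f"value message {i}" for i in range(20)]

def next_message(condition_messages, messages_sent_so_far):
    """Fixed-sequence rotation: after the 20th message, wrap to the first."""
    return condition_messages[messages_sent_so_far % len(condition_messages)]
```

Under this scheme, a student who has already received all 20 messages simply restarts the sequence, matching the wrap-around behavior described in the text.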
One week before the start of the Spring 2019 semester, we posted a dashboard announcement in Canvas to all users who had “Teacher” roles in at least one Canvas site at Indiana University (all campuses). This announcement appeared at the top of the dashboard page within Canvas, inviting teachers to opt-in (in a simple web form) for the mobile app to be made available to their students. We also advertised this opportunity in a variety of email distribution lists for instructional faculty. When a teacher opted-in, we enabled the app for all Canvas course sites where the user had the “Teacher” role (both undergraduate and graduate courses).
Teachers who opted-in were provided an IRB-approved recruitment script to recite to their students, and an IRB-approved recruitment email that they could send to their class, announcing the opportunity for students to pilot the app and take part in the present study. Additionally, we sent all students enrolled in these courses up to three IRB-approved recruitment emails (one each week, filtering those who had already activated the app), inviting them to participate. All recruitment messages are available on OSF.
Students joined the study by downloading the app on a mobile device, logging-in to the app with their university credentials, providing informed consent, and then agreeing to release their class data for the purposes of the current research study. When students first logged-in, they were sequentially assigned to one of three conditions: Control (n = 588), Praise (n = 591), or Value (n = 587). Sample sizes are uneven because some students declined to consent after being assigned to a treatment condition (and were thus not included). At the time of this study, all students had to provide informed consent and agree to release their student data in order to activate the app.
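Sequential assignment of this kind can be sketched as a simple round-robin rotation. This is one plausible reading of “sequentially assigned” (a strict rotation through the three conditions in activation order), offered only for illustration; it is not the app’s actual enrollment code.

```python
from itertools import cycle

# Hypothetical round-robin rotation through the three study conditions.
_conditions = cycle(["Control", "Praise", "Value"])

def assign_next_participant():
    """Give the next activating participant the next condition in rotation."""
    return next(_conditions)
```

Note that under strict rotation, group sizes would differ by at most one; the slightly uneven final sample sizes arose because some students declined consent after assignment, as described above.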
Once the app was activated, students could set their notification preferences for each course where the app was enabled, and the app would then operate in the background, deploying push notifications according to the students’ coursework and their notification preferences. On their preferences page within the app, students who were assigned to the Praise and Value groups saw a “Submission received” notification type, which could be enabled or disabled on a course-by-course basis. When enabled, students received occasional push notifications after submitting an assignment prior to the deadline. Students assigned to the Control group received no such notifications after submitting assignments, and the “Submission received” option was hidden within their preferences. Throughout the Spring 2019 semester, regardless of condition assignment for the present study, all app users received notifications about imminent deadlines, recent instructor announcements, and upcoming calendar events according to their preferences within the app (for additional details, see Motz, Mallon, et al., 2021).
To prevent students from receiving an excess of submission confirmation notifications (some students in this study had multiple assignments due each day), submission received notifications were delivered on a fixed-interval schedule, so that a student could only receive a maximum of one submission confirmation each week. Specifically, when the mobile app detected a new on-time submission for a qualifying assignment (graded with a deadline), it evaluated whether a submission notification had been deployed for that user within the previous week. If so, the new notification was withheld, but if there had not been a notification within the past week, the submission confirmation notification was deployed.
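The throttling logic described above can be sketched as a small predicate. This is a hypothetical sketch; the function and variable names are illustrative, not the app’s actual code.

```python
from datetime import datetime, timedelta

def should_notify(user_id, now, last_notified, window=timedelta(weeks=1)):
    """Fixed-interval throttle: at most one submission confirmation per week.

    last_notified maps user_id -> datetime of that user's most recent
    confirmation; it is updated in place when a notification is deployed.
    """
    previous = last_notified.get(user_id)
    if previous is not None and now - previous < window:
        return False  # withhold: a confirmation already went out this week
    last_notified[user_id] = now  # record this deployment
    return True
```

A withheld notification is simply dropped rather than queued, consistent with the description above: only the next qualifying submission after the one-week window elapses triggers a new confirmation.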
Students participated in their classes according to the normal conduct of each class, without deviation from normal learning activities or instructional practices. Teachers did not know which students were using the app, nor could teachers see app users’ condition assignments. The present study only manipulates the text of the “Submission received” notifications within the app (Praise or Value, see Materials, above), or withholds this notification type (Control).
Study data were extracted from our institution’s Canvas data warehouse, and from data within the app. For each course that was participating in this study, we identified all graded assignments that had a deadline and required a submission, and then for each of these assignments, we identified each enrolled app user’s submission status, assignment weight, whether they received a submission confirmation notification, the timestamp of the notification, whether they tapped on the notification, and their helpfulness rating (if provided) for the notification. We also identified each participant’s final percent score in Canvas, which was the best estimate of students’ cumulative performance on course assessments, taking into account instructors’ grading schemes, point values, and assignment weights.
Data were analyzed using Bayesian estimation methods. Bayesian estimation has many advantages over frequentist statistical inference (Kruschke & Liddell, 2018a, 2018b), including the ability to define an analytical model that is appropriate to the structure of the observed data, and this flexibility obviates frequentist statistical assumptions about the data (e.g., analytical models do not require normally distributed data). Considering that many of the variables under analysis in the present study are nonnormal (submission rates, percent scores on assignments), Bayesian methods are uniquely well suited to the present study because they accommodate more flexible analytical models. Additionally, unlike frequentist methods (e.g., analysis of variance, ANOVA) that tend to emphasize p values (Wasserstein et al., 2019), Bayesian estimation produces an informative posterior distribution of the most credible estimates of an experimental effect. A posterior distribution describes the relative credibility of different values for a parameter in an analytical model, such as the typical course scores for students in each treatment group. The posterior distribution can be summarized by its highest density interval (HDI), which describes both the tendency and uncertainty of a model parameter (analogous to but distinct from a confidence interval). An introductory overview of our analysis approach is available on OSF at https://osf.io/ea43f/.
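As a concrete illustration of the HDI summary described above: given MCMC samples from a posterior, the HDI is the narrowest interval containing the requested probability mass. The authors’ analyses used Kruschke’s R/JAGS toolkits; the Python sketch below shows only the general idea.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Highest density interval: the narrowest interval containing
    `mass` of the posterior samples."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(mass * n))           # samples the interval must contain
    widths = s[k - 1:] - s[: n - k + 1]  # width of each candidate interval
    i = int(np.argmin(widths))           # narrowest candidate wins
    return float(s[i]), float(s[i + k - 1])
```

For a unimodal posterior this matches the usual intuition: the HDI brackets the region of highest credibility, and for a symmetric posterior it coincides with the equal-tailed interval.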
Binary response variables (whether a student submits an assignment, whether a student taps on a notification), were modeled with a beta-binomial distribution, where the probability of a response in a binomial distribution (commonly, a “success”) is drawn from a beta distribution (beta distributions are bounded on the interval [0, 1]). To make the analysis more interpretable, the beta distribution’s parameters (α and β) were reparameterized into the mode (ω) and concentration (κ) of the beta’s probability distribution (Kruschke, 2014). Beta distributions were modeled separately for each treatment condition, with vague and diffuse priors: ω priors were uniform beta distributions (α = 1, β = 1), and κ priors were γ distributions (mode = 10, SD = 100), identical for each treatment condition. Because the mode of these probability distributions is known to be 1 (many students submit all of their assignments), the analyses will focus on the concentration parameter (κ), which provides a measure of how concentrated the estimated probability distributions are to the mode.
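The mode/concentration reparameterization above follows Kruschke (2014): for κ > 2, the standard beta shape parameters are α = ω(κ − 2) + 1 and β = (1 − ω)(κ − 2) + 1. The minimal Python sketch below illustrates the conversion (the study’s models were fit in JAGS, not Python).

```python
def beta_from_mode_concentration(omega, kappa):
    """Convert a beta distribution's mode (omega) and concentration (kappa)
    to its standard shape parameters (alpha, beta); requires kappa > 2."""
    if kappa <= 2:
        raise ValueError("mode parameterization requires kappa > 2")
    return omega * (kappa - 2) + 1, (1 - omega) * (kappa - 2) + 1
```

A quick check of the conversion: the mode of a Beta(α, β) distribution with α, β > 1 is (α − 1)/(α + β − 2), which recovers ω exactly.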
Students’ cumulative percent scores in each course enrollment were rounded to the nearest full percent; rare scores over 100% were censored at 100% (to prevent deductive disclosure of participants from outliers in the open data, and so that the maximum observed percentage was 100%), yielding an integer in the range [0, 100] out of 100 possible. These values were then aggregated to the student level by summing percentage points earned, and percentage points possible, over each of a student’s enrollments where the app was enabled. For example, if a student earned 75.41% in one course where the app was enabled, and 80.75% in another, this student’s total percentage points earned is 75 + 81 = 156 out of a possible 200. These values were also analyzed with the same beta-binomial model, where the percentage points earned were represented as the number of successes, out of the total percentage points possible represented as the number of trials. The overall distribution of course percentages has roughly the same structure as students’ assignment submission rates (nonnormal, large negative skew, modal response near 100%), and analyses of these outcome variables will similarly focus on the concentration parameter, κ. We assert that this approach (focusing our analyses on the data’s concentration around the mode, rather than the mean) is particularly reasonable considering that the mean is a poor parameter estimate for highly skewed data (e.g., Arthurs et al., 2019).
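The aggregation step above, including the worked example in the text, can be sketched in a few lines. This is an illustrative sketch of the described procedure, not the authors’ analysis script.

```python
def aggregate_scores(course_percentages):
    """Round each course percent to an integer (censoring above 100),
    then sum points earned and points possible across a student's courses."""
    earned = sum(round(min(p, 100.0)) for p in course_percentages)
    possible = 100 * len(course_percentages)
    return earned, possible
```

Applied to the example in the text, a student with 75.41% and 80.75% in two app-enabled courses yields 156 percentage points earned out of a possible 200.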
Parameters in the analytical models described above were estimated using Markov Chain Monte Carlo (MCMC) in JAGS (Just Another Gibbs Sampler; Plummer, 2003), using runjags (Denwood, 2016) for R (R Core Team, 2020), and toolkits provided by Kruschke (2014). For each model, posterior distributions were estimated with 150,000 steps across 4 chains, thinned to every 10th step (15,000 saved steps), with 500 adaptation steps and 1,000 burn-in steps. The effective sample size (ESS) for all parameters of interest was no less than 15,000, well above the 10,000 recommended by Kruschke (2014). Additionally, to provide more interpretable accounts of the practical differences between experimental treatments, we also compare the proportions between treatment groups that are above a threshold where group differences are maximal, purely for descriptive purposes. All model specifications and analytical scripts are available on this study’s OSF project site at https://osf.io/8x34a/.
During the study period, participants were assigned an average of 29.9 assignments (SD = 22.4) in participating courses where the app was enabled that required a submission, had a deadline, and were graded for course credit (average of 21.5 assignments per course). Participants earned nonzero credit for an average of 25.8 such assignments (SD = 19.9) in these courses, corresponding to an average submission rate of 86.3%.
Among all graded assignments with deadlines, an average of 15.4 (SD = 18.0) were eligible to receive submission confirmation notifications because submissions for these assignments were made online within the Canvas learning management system (this excludes paper submissions, or submissions made in third-party e-learning platforms, where it was not possible for the notification system to identify, in real time, when a student had made an assignment submission). Students had similar numbers of treatment-eligible assignments across the three conditions: Control (M = 15.6, SD = 17.9), Praise (M = 15.3, SD = 17.5), and Value (M = 15.4, SD = 18.8). On average, students made submissions to 12.3 (SD = 14.8) of these eligible assignments, corresponding to an average submission rate of 79.9% for assignments that were specifically eligible for triggering the feedback intervention.
When a student made an online submission in Canvas prior to the due date, depending on the student’s assigned treatment condition and their preferences within the app (students had the ability to temporarily or permanently mute notifications at their discretion), the app would occasionally deploy a push notification containing the manipulated feedback text (Praise or Value), and a confirmation of the student’s submission (“Canvas has received your recent assignment submission”). The frequency of these notifications was throttled to prevent students from receiving more than one notification in a single week, and so that feedback was not anticipated for every submission (Harackiewicz et al., 1984). Accordingly, excluding students in the Control group, the app delivered an average of 6.19 submission confirmation notifications (SD = 5.68) to each student during the study period. Students rarely tapped on the push notification, an average of 0.46 times (SD = 0.99) per user (tapping would open the message within the app, though the notification text could be viewed without opening it). However, students in the Value condition were more likely to tap on a submission confirmation notification (8.95%) than students in the Praise condition (5.97%), as reflected by a credibly smaller κ parameter in the Value condition (κ difference mode = −5.94; 95% HDI: −10.05 to −1.88), indicating more dispersion away from the modal 0% tap rate in the Value condition. Students may have been more likely to tap on notifications in the Value condition because those messages were longer and more complex than simple exclamations of praise.
The arithmetic mean submission rate, including all graded assignments in participating courses that required a submission and had a deadline, is 87.6% for the Control group, 88.8% for the Praise group, and 87.4% for the Value group. This includes both treatment-eligible assignments (where students made submissions within Canvas, eligible to receive automated feedback in the mobile app; M = 15.4 per student) and other assignments (where students made submissions outside Canvas, e.g., paper submissions or submissions in third-party apps; M = 4.7 per student). As such, this outcome provides a more comprehensive measure of the effect of the intervention on students’ adherence to class assignments in general. Figure 2 shows these assignment submission rates split by the three conditions, with semitransparent blue lines highlighting the arithmetic means.
As is evident from the left panel of Figure 2, submission rates tend toward 100%. Accordingly, the beta distributions used in our analysis model estimate the mode (ω) as 1.0 for all three groups (with identical 95% HDIs for the three groups: 0.997–1.0), and thus, the current analysis focuses on the concentration parameter (κ), which describes how closely the individual data tend to cluster around the mode of 1; larger κ values indicate more concentration. The posterior distribution of κ estimates has a mode of 6.77 for the Control group (95% HDI: 6.21–7.30), 7.70 for the Praise group (95% HDI: 7.07–8.35), and 6.75 for the Value group (95% HDI: 6.18–7.27). The Praise group’s κ estimates are credibly larger than the Control group’s (difference mode = 1.0; 95% HDI: 0.15–1.81) and larger than the Value group’s (difference mode = 0.97; 95% HDI: 0.14–1.79). There was no credible difference between Value and Control (difference mode = −0.09; 95% HDI: −0.79 to 0.75). The Praise condition’s higher concentration of submission rates near to the modal value of 1 is evident in the dot plot in Figure 2, where the Praise group has a visibly higher density of observations above the 75% gridline than either of the other two groups.
While the estimated concentration parameter (κ) of a beta distribution is useful for statistical contrasts with the current data, these values can be difficult to conceptualize in practical terms. Moreover, considering that these data are highly skewed toward 100%, contrasting mean submission rates between conditions (e.g., 1.2% higher submission rates in the Praise group compared with the Control group) also fails to provide an intuitive understanding of the magnitude of group differences. To provide a more straightforward account of these differences, purely for descriptive purposes, we identified the submission rate such that the proportion of students above this submission rate was maximally different between treatment groups. This submission rate was 77%; 81.6% of students in the Control group had a submission rate above 77%, 86.7% of students in the Praise group had a submission rate above 77%, and 81.4% of students in the Value group had a submission rate above 77%. The difference of proportions between the Praise group and the other groups was significant by conventional standards, χ2(1, 1762) = 7.2, p = .007, two-tailed. We can infer that the Praise treatment caused a roughly 5 percentage-point increase in the proportion of students with submission rates above 77%.
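For readers who want to run this style of descriptive analysis on their own data, the procedure amounts to scanning candidate thresholds for the one that maximizes the between-group difference in the proportion of observations above it, then testing that difference with a 2 × 2 chi-square. A minimal sketch; the submission rates below are made up for illustration and are not the study's data:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (df = 1, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def prop_above(rates, threshold):
    """Proportion of values strictly above a threshold."""
    return sum(r > threshold for r in rates) / len(rates)

def most_separating_threshold(group_a, group_b, candidates):
    """Candidate threshold at which the two groups' proportions-above differ most."""
    return max(candidates,
               key=lambda t: abs(prop_above(group_a, t) - prop_above(group_b, t)))

# Illustrative data only
praise = [0.95, 0.90, 0.85, 0.80, 0.99, 0.70]
others = [0.95, 0.60, 0.85, 0.50, 0.75, 0.65]

t = most_separating_threshold(praise, others, [i / 100 for i in range(50, 100)])
above_praise = sum(r > t for r in praise)
above_others = sum(r > t for r in others)
stat = chi2_2x2(above_praise, len(praise) - above_praise,
                above_others, len(others) - above_others)
print(f"threshold={t:.2f}, chi2={stat:.2f}")
```

Because the threshold is chosen to maximize the group difference, the chi-square here is descriptive rather than confirmatory, which matches the paper's framing of this analysis as "purely for descriptive purposes."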
Submitting assigned coursework is generally a good thing for students to do. But an increase in assignment submission rates may not entail measurable improvements in performance outcomes, for example, if students are submitting additional assignments that have scant value to the course objectives, or if students’ goal orientation is to merely submit classwork and not to do well or learn from it. For this reason, we also evaluate students’ cumulative course performance, estimated in the learning management system as the final course percent score, combining all grades as entered into the Canvas gradebook, according to the instructors’ point systems, weights, and grading rules. While this is not necessarily an official grade, nor does it provide an objective measure of student learning, it nevertheless provides a more granular estimate of a student’s performance than the officially recorded letter grades, which are coarse and ordinal, and provides a more general measure of student performance beyond assignment submission rates.
In courses where the app was enabled, students’ estimated cumulative percent score had an arithmetic mean of 82.9% (SD = 19.1). Figure 3 shows these scores split by the three conditions. The average among students in the Control group was 82.5% (SD = 19.9), in the Praise group 83.7% (SD = 17.7), and in the Value group 82.5% (SD = 19.8). Again, the beta distributions used in our analysis model estimate the mode (ω) as 100% for all three groups (Control and Value 95% HDIs: .993–1.00; Praise 95% HDI: .991–1.00), so we focus on the concentration parameter (κ). The posterior distribution of κ estimates has a mode of 4.82 (95% HDI: 4.50–5.16) for the Control group, 5.48 (95% HDI: 5.10–5.88) for the Praise group, and 4.83 (95% HDI: 4.53–5.19) for the Value group. As shown by the posterior distributions of pairwise group differences in Figure 3, the Praise group’s κ estimates were credibly larger than the Control group’s (difference mode = 0.72; 95% HDI: 0.15–1.18) and the Value group’s (difference mode = 0.64; 95% HDI: 0.12–1.15), and there was no difference between the Value and Control groups (difference mode = 0.045; 95% HDI: −0.46 to 0.48). The Praise condition’s increased concentration relative to the other conditions is visibly apparent in the higher density of dots just above 60% in the left panel of Figure 3.
As with our analysis of submission rates, considering that the practical magnitude of κ values is difficult to interpret, we conducted the same descriptive analysis, this time examining how the proportion of students above a given grade threshold differed between conditions. The estimated cumulative grade where these group differences were maximal was 59% (which, coincidentally, is the typical threshold between passing and failing grades); 90.3% of students in the Control group had a grade above 59%, 93.3% of students in the Praise group had a grade above 59%, and 90.4% of students in the Value group had a grade above 59%. The higher proportion of students above 59% in the Praise group was significant by conventional standards, χ2(1, 1762) = 4.02, p = .045, two-tailed, and reflects a roughly 3 percentage-point increase in the proportion of students receiving cumulative final scores above 59%.
Contemporary theory suggests that task feedback works either by drawing attention to one’s motivations for performing a behavior (Kluger & DeNisi, 1996) or by reducing the perceived gap between a behavior and a goal state (Hattie & Timperley, 2007). Accordingly, a positive feedback intervention might praise the individual for having performed a task, or might highlight the value of a task for having incrementally advanced toward a goal state. However, no past studies have directly compared these two approaches in a field experiment. In an education setting, the present study finds that simple, generic, automated exclamations of praise improved assignment submission rates and course grades among college students, compared with no treatment, and compared with messages that emphasized the value of assignment submission toward degree completion. We find no measurable advantage of highlighting a task’s value at scale; instead, we find advantages when automated praise highlights unspecified benefits of task completion. This difference was not merely theoretical: automated praise had measurable benefits for outcomes in practice, and resulted in an estimated 3 percentage-point increase in the proportion of students receiving a passing grade in college courses.
Perhaps due to their simplicity, immediacy, and clear emphasis on a task, short exclamations of praise when students submitted their assignments on time caused measurable increases in assignment submission rates and course grades. This extends previous research demonstrating the efficacy of positive task feedback, showing here that it is effective when automated on mobile devices at an average frequency of once every 2 or 3 weeks. This may be particularly relevant in the current domain of homework adherence. Past research has suggested that praise can be an effective motivator when the individual has difficulty evaluating their own performance (Ilgen et al., 1979). Considering that submitted work may not be immediately graded, and that time invested in homework is not necessarily directly related to achievement (Trautwein, 2007), some students may have difficulty judging the quality of their own effort. By providing an occasional signal praising these efforts, the intervention may better motivate recipients to perform this task in the future, and to incrementally develop improved habits (Stawarz et al., 2015). Such an effect would be expected to be applicable to other domains where people may be unable to perceive the practical benefits of individual discrete tasks, such as medication adherence (Brown & Bussell, 2011) and exercise adherence (Schutzer & Graves, 2004).
We suggest that automated praise works by directing the recipient’s attention to the completion of an antecedent task that might otherwise be difficult to judge for its value. Signaling a task’s relevance is among the “first principles” for improving student motivation (Keller, 2008; perhaps standing preeminent among them, Means et al., 1997), and is a common component of motivational interventions in education (Lazowski & Hulleman, 2016; Li & Keller, 2018). While we did not measure student motivation or students’ motivational orientations in the present study, other technology-based interventions have demonstrated the benefits of personalized task encouragement on learner motivation (e.g., Huett et al., 2008). In an automated feedback intervention, praise may operate on a similar pathway, whereas emphasizing value alone may not draw attention to task performance.
Nevertheless, past research has shown that broader interventions (not feedback, specifically) highlighting a task’s value toward a goal are beneficial, so why is this not observed in the present study? A simple explanation might be that the value statements used in the present study (e.g., “You just got a little closer to a graduation party”) had low fidelity to the target audience’s personal goals, were otherwise poorly written, or failed to attract students’ attention. However, we observed that students were more likely to tap on push notifications containing these value statements, and that they gave these messages high ratings (82% rated 5 out of 5 stars), no different from the ratings of praise messages (for additional analysis, see https://osf.io/x45bq/), which would be unlikely if these treatments were simply inapplicable or low quality.
Instead, we believe that value interventions rendered within automated feedback suffer from at least three challenges. First, attainable proximal goals are effective messaging targets for improving task performance (Latham & Seijts, 1999; Stock & Cervone, 1990), but when deployed at scale without individually personalized messages, feedback would necessarily refer to generic goals that apply broadly (across the full target population, and across the full range of eliciting tasks), and that are perhaps distal from the feedback-eliciting task. The perceived gap between a discrete task (such as submitting a practice quiz) and a life goal (such as graduation) may be so wide that emphasis on the goal is too distant to be motivationally relevant (Kauffman & Husman, 2004), to be seen as personally applicable (Canning & Harackiewicz, 2015), or to be perceived as accomplishable (Oettingen, 2012). Second, the relationship between task completion and a future goal state is fundamentally more complicated to convey than simple praise for task completion, as the former aims to connect two concepts (task + goal) whereas the latter highlights only one concept (task). Even when it is informative and directly relevant, more complexity in feedback will have diminishing returns on the effectiveness of the feedback (Kulhavy et al., 1985). Indeed, one might dismiss the present pattern of results simply by asserting that shorter messages are more effective than longer messages in this medium (see also the concept of saturation; Song & Keller, 2001), and yet, this interpretation would still point toward benefits of praise statements and difficulties of value statements in automated feedback. And third, positive feedback suggesting that a goal state has moved incrementally closer may inadvertently be perceived as suggesting that the individual need not change their behavior. Without directly considering the relationship between a behavior and a goal state, learners do not spontaneously adopt new strategies (Pressley et al., 1984), and more detailed instruction on effective strategy use is often favored (Hattie et al., 1996).
Just as value interventions’ requisite nuance may be lost in translation to scripted mobile push notifications at scale, the current findings should not be overgeneralized to imply that praise messages are universally superior for all students in all contexts. Interventions that engage an individual in reflection on a behavior’s value to one’s own personal goals may still represent an effective strategy, as has been observed in the literature (e.g., Harackiewicz & Priniski, 2018). But evidently these benefits do not translate to notifications that are automatically deployed en masse by a mobile app. Of course, as mentioned above, it might be possible to identify an individual’s own unique goals for performing tasks through prompting and personalization, and to spotlight these goals when delivering positive feedback. However, the hypothetical benefits of such an approach, if observed, would be entangled with the combined influences of goal-setting, planning, and personalization, and these contingencies may present difficulties for scaling an intervention’s benefits to a full population in a mobile app.
When triggered by a student’s behavior, we assert that scripted praise presented on a mobile device is a form of task feedback, but the word “feedback” clearly has a broader meaning. In education, feedback also refers to the presentation of information that is contingent on the substance of a student’s response to a prompt or assignment, such as validation of whether a response was correct, what the intended response should have been, the quality of the provided response, an evaluation of the perceived effort invested in the response, and so on. The interventions tested in the present study are agnostic to the contents of student work, but these other forms of feedback in education, including grades and instructors’ personalized comments, commonly hinge on how a student responds rather than whether they respond. Past research has not emphasized this particular distinction (for reviews, see Evans, 2013; Shute, 2008; on praise specifically, see Willingham, 2005); however, education research has clearly observed that praise given during performance evaluations, particularly with older students, can be interpreted as a judgment of low ability (Meyer, 1992), a signal that additional effort would not yield improved performance. For this reason, we explicitly caution against generalizing the present study’s inferences to positive feedback administered during an assessment of student performance.
Regarding the present study’s specificity, it is also interesting to consider how broader theory in education should inform our interpretation of the current results, and the mechanisms of automated praise’s influence on student behavior. For example, we have suggested that praise operates by drawing students’ attention to their successful completion of the antecedent task. This may also affect how a student feels, perhaps eliciting a prideful emotional response, more so than the value statements, which were somewhat frank by comparison. Emotions clearly interact with academic effort and performance (Pekrun et al., 2002; Pekrun & Linnenbrink-Garcia, 2012), and pride is a known motivator in education settings (e.g., Webster et al., 2003; Williams & DeSteno, 2008). We did not assess students’ affective response to push notifications, but we consider this to be a promising candidate explanation and an important direction for future research, particularly considering the potentially negative effects of mobile alerts on feelings of stress and anxiety (Kushlev et al., 2016). Such measures might also help determine the scope of impact of automated feedback, such as whether it is specific to a particular task or generalizes across time and context.
The slim bandwidth of push notification text, which very few students tapped on, presents both a practical constraint on the design of mobile interventions and a theoretical constraint on the generalizability of inferences drawn from field tests such as the present study. Praise that draws attention to the feedback-eliciting behavior is necessary and sufficient for influencing behavior and performance at population scales, but it may not be an effective strategy at the scale of an individual classroom, for example, where a teacher can provide more hands-on support. What works is surely context dependent, and this context dependency should motivate researchers to test behavioral science theory at different scales and with different implementation strategies.
However, scale does not necessarily come at the cost of sensitivity to context. When behavioral interventions are implemented on personal mobile devices, it becomes technically feasible to personalize these interventions to individual users, their identities, and their situations. Here, we see a tradeoff between the possible benefits of personalized services and the possible risks of personal data collection. In the present study, in an abundance of caution with a newly developed tool, we collected no personal information about student app users, nor did we request permission from app users to annotate their data with personal information from university systems (other than what was necessary to measure target outcomes). Whereas we aimed to test a coarse theoretical contrast in a broad sample, future research might test the generalizability of the practical benefits of behavioral interventions to different subpopulations and settings (de Leeuw et al., in press).
One way to be sensitive to this variability, from a purely analytical perspective, is to consider summary measures other than the mean of a sample when measuring intervention effects at a population scale. In the present study, it is visibly evident (in the left panels of Figure 2 and Figure 3) that students’ assignment submission rates and course scores are generally very high, tending toward the ceiling. It might be unreasonable to expect an intervention to significantly affect the mean tendency of such a sample, given that so many individuals have such little room for improvement (Šimkovic & Träuble, 2019). Instead, we estimated a concentration parameter (κ) and found credible differences in the data’s concentration toward the ceiling, with correspondingly meaningful effects in the study sample, despite minimal change in the arithmetic mean. Setting aside any concerns related to normality assumptions of statistical tests, it is important for researchers to choose estimators that are meaningful and sensitive (Lumley et al., 2002). Particularly when deployed to large and diverse samples, statistical analyses should consider that the benefits of an intervention may not apply equally across the full sample.
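The point about estimator sensitivity can be illustrated with a quick simulation: two ceiling-skewed samples can have nearly identical means while differing much more visibly in how much of their mass sits above a threshold. The sketch below, a rough analogy rather than a fit to the study data, draws from Beta(α, 1) distributions (mode at 1; larger α concentrates more density near the ceiling); the specific shape values and thresholds are illustrative assumptions:

```python
import random

random.seed(1)

# Two ceiling-skewed samples: Beta(alpha, 1) has its mode at 1, and a
# larger alpha concentrates more of the distribution near the ceiling.
less_concentrated = [random.betavariate(5.8, 1.0) for _ in range(20_000)]
more_concentrated = [random.betavariate(6.8, 1.0) for _ in range(20_000)]

def mean(xs):
    return sum(xs) / len(xs)

def prop_above(xs, threshold):
    """Proportion of values strictly above a threshold."""
    return sum(x > threshold for x in xs) / len(xs)

# The means barely move...
mean_gap = mean(more_concentrated) - mean(less_concentrated)
# ...but the share of observations above a 77%-style cutoff shifts more.
pass_gap = prop_above(more_concentrated, 0.77) - prop_above(less_concentrated, 0.77)
print(f"mean gap: {mean_gap:.3f}, proportion-above-0.77 gap: {pass_gap:.3f}")
```

With these shapes, the theoretical mean gap is under 2 percentage points while the gap in the proportion above the cutoff is roughly 5 points, qualitatively echoing the pattern reported here of a small mean shift alongside a larger shift in the proportion of students above a threshold.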
When field studies on automated feedback interventions are carried out, the research commonly favors testing an intervention’s global efficacy rather than testing theory (Orji & Moffatt, 2018; Patrick et al., 2016). On the other hand, research that directly tests theory on feedback is frequently carried out with contrived tasks in artificial contexts that may have limited relevance to practice (Gerhart & Fang, 2014). The current field experiment addresses both shortcomings, demonstrating theoretically relevant differences in a practical automated feedback application at scale. While this study has clear limitations (notably, we did not collect personal data on individual participants, and the intervention messages were not personalized to the user or their unique context), the results nevertheless highlight potentially fruitful directions for future research (e.g., measuring why, where, and for whom automated praise is an effective motivator) and practice.
Supplemental materials: https://doi.org/10.1037/tmb0000042.supp