Volume 5, Issue 1. DOI: 10.1037/tmb0000124
Recent years have seen intense research, media, and policy debate on whether amount of time spent playing video games (“playtime”) affects players’ well-being. Existing research has used cross-sectional designs with easy-to-obtain but unreliable self-report measures of playtime or, in rare instances, obtained industry data on objectively tracked playtime but only for individual games, not a player’s total playtime across games. Further, researchers have raised concerns that publication bias and a lack of differentiation between exploratory and confirmatory research have undermined the credibility of the evidence base. As a result, we still do not know whether well-being affects playtime, playtime affects well-being, both, or neither. To track people’s playtime across multiple games, we developed a method to log playtime on the Xbox platform. In a 12-week, six-wave panel study of adult U.S./U.K. Xbox-predominant players (414 players, 2,036 completed surveys), we investigated within-person temporal relations between objectively measured playtime and well-being. Across multiple preregistered model specifications, we found that the within-person prospective relationships between playtime and well-being, or vice versa, were not practically significant—even the largest associations were unlikely to register a perceptible impact on a player’s well-being. These results support the growing body of evidence that playtime is not the primary factor in the relationship between gaming and mental health for the majority of players and that research should instead focus on the context and quality of gameplay.
Keywords: video games, well-being, longitudinal, objective measures, digital trace data
Funding: This work was part-funded by the Wellcome Trust [Ref: 204829] through the Centre for Future Health (CFH) at the University of York, awarded to Sebastian Deterding. Further support came from the Digital Creativity Labs, jointly funded by the Engineering and Physical Sciences Research Council (EPSRC)/Arts and Humanities Research Council (AHRC)/InnovateUK [EP/M023265/1]; the EPSRC Centre for Doctoral Training in Intelligent Games and Games Intelligence (IGGI) [EP/S022325/1], which funded Nick Ballou and offered additional study funding; and the EPSRC/AHRC Centre for Doctoral Training in Media and Arts Technology (EP/L01632X/1). Craig J. R. Sewall received research support from the National Institute of Mental Health (T32 MH18951). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.
Disclosures: Sebastian Deterding has been employed part-time as an Amazon Scholar with Amazon Services U.K. Ltd. during parts of the study period; he declares that the presented work does not relate in any capacity to his position or work at Amazon. David Zendle has worked as a paid consultant for the Federal Trade Commission, the Australian government, and the Omidyar Research Network with reference to gathering knowledge regarding video gaming and/or gambling. This work was not related to the work undertaken within this article, and these bodies were in no way involved in the work reported in this article.
Author Contributions: Nick Ballou contributed to conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, software, visualization, writing–original draft, and writing–review and editing. Craig J. R. Sewall contributed to conceptualization, formal analysis, methodology, validation, and writing–review and editing. Jack Ratcliffe contributed to software and writing–review and editing. David Zendle contributed to formal analysis, supervision, and writing–review and editing. Laurissa Tokarchuk contributed to supervision and writing–review and editing. Sebastian Deterding contributed to conceptualization, funding acquisition, methodology, supervision, and writing–review and editing.
Data Availability: All data, code, and materials associated with this project are available on the Open Science Framework at https://osf.io/edtwn; Ballou et al., 2023.
Open Science Disclosures: The data are available at https://doi.org/10.17605/OSF.IO/EDTWN. The experimental materials are available at https://doi.org/10.17605/OSF.IO/EDTWN. The preregistered design and analysis plan (transparent changes notation) are accessible at https://doi.org/10.17605/OSF.IO/EDTWN.
Correspondence concerning this article should be addressed to Sebastian Deterding, Dyson School of Design Engineering, Imperial College London, Imperial College Road, London, SW7 9EG, United Kingdom. Email: [email protected]
With the rise of digital gaming as a dominant leisure activity for billions around the globe, media, parents, policymakers, and scholars have intensely debated the impact of time spent playing video games (“playtime”) on players’ well-being. Multiple governments have considered or enforced limits on people’s playtime out of concern for negative impacts that video games may have on well-being (Király et al., 2017). These concerns have only intensified since the World Health Organization’s inclusion of “gaming disorder” in the 11th edition of the International Classification of Diseases manual in 2018 (World Health Organization, 2018) and in the wake of increases in average playtime during the COVID-19 pandemic (Vuorre, Zendle, et al., 2021).
Meanwhile, research on the links between playtime and well-being paints a decidedly mixed picture. On the one hand, several studies report that playtime is negatively associated with well-being (Boxer et al., 2015; Burke & Lucier-Greer, 2021; Wenzel et al., 2009), with some noting that this may be mediated by problematic or “disordered” play (Ballou & Zendle, 2022). Others find associations only for certain ages or well-being constructs such as anxiety (Loton et al., 2016) or small and nuanced negative associations among highly engaged players (Allahverdipour et al., 2010; Colder Carras et al., 2017; Przybylski & Weinstein, 2017). Yet others have found null or negligible associations between playtime and depression, academic achievement, affective well-being, and more (Brunborg et al., 2014; Johannes, Vuorre, et al., 2021; Vuorre, Johannes, et al., 2021).
On the other hand, there is also evidence that playing video games can be beneficial for players: Gaming is successfully used as a strategy to recover from or cope with day-to-day stressors (Iacovides & Mekler, 2019; A. Tyack et al., 2020), to compensate for lacking or thwarted opportunities to satisfy basic psychological needs in everyday life (Ballou et al., 2022; Formosa et al., 2022), or even to contribute to personal psychological growth and increased resilience (Daneels et al., 2021; Reinecke & Rieger, 2021).
While much of the variation in observed associations between playtime and well-being may be due to differences in demographics, contexts, and experiential quality of gaming (e.g., Allen & Anderson, 2018; Koban et al., 2022; Mandryk et al., 2020), we have good reason to believe a substantial portion is methodological—namely, due to data quality issues. One key shortcoming has been that most research to date has used self-report measures of playtime. Evidence suggests that self-report estimates are poor proxies for objective playtime (Johannes et al., 2020; Kahn et al., 2014), consistent with meta-analytic results showing that self-report estimates of other types of technology use, such as social media and phone use, are only weakly correlated with logged data (Parry et al., 2021).
Game telemetry—which describes behavioral data automatically logged by gaming devices, services, or individual games—offers one solution to this problem. This automatically logged data can provide a more accurate objective measure of playtime (and thus, its relation to well-being), yet it is usually only accessible to the games’ developers, publishers, or distribution platforms, who have been hesitant to share it with researchers (Seif El-Nasr et al., 2013). Thus far, only a few studies have investigated links between such objective playtime measures and well-being (Billieux et al., 2013; Brühlmann et al., 2020; Johannes, Vuorre, et al., 2021; Vuorre, Johannes, et al., 2021). One prominent study found a significant but likely negligible positive correlation between well-being and objective playtime in Animal Crossing: New Horizons and Plants Versus Zombies: Battle for Neighborville (Johannes, Vuorre, et al., 2021). An expanded follow-up study measuring playtime and well-being for seven games and three time points over 6 weeks found no meaningful within- or between-person relationship between playtime and well-being (Vuorre, Johannes, et al., 2021).
While improving on previous designs, these studies have their own limitations. By necessity, they relied either on publicly available data or on data-sharing agreements with industry partners. In both cases, this restricts analysis to the small number of games for which companies have made data accessible—and importantly, to data on players’ playtime for an individual game. Although little is known about how varied gaming “diets” are (Orben, 2022), a large-scale database of Steam users indicates that many players log time on multiple games in a given 2-week period (O’Neill et al., 2016). This is corroborated in community posts, where players discuss regularly switching between games over the course of a day or week based on mood, available time, and social context (see, e.g., u/LyzbietCorwi, 2017). Furthermore, any given game for which we have data (such as Animal Crossing) may not be the predominantly played game for any given player. Thus, playtime in particular games may not tell us much about the overall playtime of a particular player—and hence, how overall playtime affects well-being.
In addition to issues with self-reports, the majority of the literature (including the studies referenced above, with the notable exception of Vuorre, Johannes, et al., 2021) has been cross-sectional, and thus requires considerably stronger assumptions to support causal inferences (Rohrer & Murayama, 2021). Here, it is vital to distinguish among possible causal explanations for observed correlations between playtime and well-being: players who play more might experience changes in well-being as a result, but players who are feeling poorly might equally seek out games as a coping mechanism, leading to increased playtime (e.g., Iacovides & Mekler, 2019; A. Tyack et al., 2020).
Further, these effects, if they exist, may operate on a range of time scales, with no strong guidance by prior theory or evidence on which time scales to expect (and test) effects of well-being on playtime or vice versa. Some studies assess short-term momentary effects (e.g., positive and negative affect; Petralito et al., 2017; Przybylski et al., 2014), while others ask about feelings over the previous week, 2 weeks, or longer (Allen & Anderson, 2018; Brunborg et al., 2014). This issue is compounded by a structural limitation of cross-sectional self-report playtime data: players can only self-report playtime retrospectively, and thus studies without multiple time points are unable to investigate any effect well-being might have on subsequent playtime, focusing instead on the effects of playtime on well-being.
In investigating any such links, recent research has emphasized the importance of distinguishing different aspects of well-being. Associations between technology use and mental health can differ significantly for different well-being constructs and measures (Ballou & Zendle, 2022; Orben & Przybylski, 2019), highlighting the importance of considering multiple facets of mental health. One area of contrast in the literature concerns measures of positive well-being (e.g., life satisfaction or general psychological health) versus negative well-being (ill-being, e.g., depression; Brunborg et al., 2014; Loton et al., 2016): The absence of flourishing may not have the same relationship with gaming as the presence of psychopathological symptoms (Vella & Johnson, 2012).
In short, the current evidence base for the existence of positive and negative effects is not just mixed but also limited and complicated by methodological issues, namely, reliance on self-reported playtime, cross-sectional designs that do not allow causal tests of any observed correlation, and a wide range of possible specifications in timescales and well-being constructs. Although research in the field is trending toward more rigorous studies using longitudinal designs and logged play data, conflicting results indicate that debates around video game effects are far from over (Kowert & Quandt, 2021). To advance our understanding of the temporal relations between playtime and well-being, we need better data—comprehensive objective playtime data across multiple games, and differentiated well-being data, tracked over time at a within-person level.
To address these issues, we use a panel study design to collect data on objective playtime alongside mental well-being in adult Xbox players in the United States and United Kingdom at six time points over 12 weeks. This study design differs from prior work in four main regards: First, rather than relying on potentially limited data-sharing agreements with individual industry partners, we collect data on any and all games played on a gaming platform (the Xbox Network), sourced directly from players. Second, we focus sampling on players whose gaming activity occurs exclusively or near-exclusively on this platform, allowing us to obtain better approximations of a player’s total playtime. Third, we track data over a longer time scale (12 weeks), enabling the comparison of short- and long-term effects. Fourth, we investigate both the effect of playtime on well-being and the effect of well-being on playtime.
We operationalize subjective well-being with three distinct measures operating on different time scales to cover a broad range of potential gaming-related effects. We include a short-term construct (positive and negative affect in the moment), a medium-term ill-being construct (depressive symptoms during the previous week), and a longer-term positive well-being construct (general mental well-being during the previous 2 weeks). We expect any causal relationships between gaming and well-being to operate on approximately matching timescales (e.g., well-being over the course of 2 weeks should be affected by gaming over the course of the 2 weeks prior). In the case of positive affect, which is a state measure addressing feelings in the moment, we expect it to influence and be influenced by playtime over the course of the preceding or following day.
Given that higher quality evidence in the current literature converges on smaller effects in limited subpopulations, we hypothesize the absence of practically significant effects in either direction. By “practically significant,” we refer to effects large enough for players to describe them as having a meaningful impact on their mental well-being. Our approach is therefore ultimately theory-agnostic, as theories tend to predict specific effects, not a lack thereof; instead, our goal is to provide data that constrains which effects occur, at what timescale, and which direction of causality (if any) holds, which would then serve as explananda for future theory.
To test for the absence of such practically significant effects of playtime on well-being, we use equivalence tests (Schuirmann, 1987) with the following conservative smallest effect size of interest (see the Method section): each additional hour of daily play leading to a .06-scale point change in well-being on a 1–5 scale.
We therefore predict the absence of a practically significant within-person effect of:
Hypothesis 1a: Playtime during the last 24 hr on subsequent positive affect
Hypothesis 1b: Playtime during the last 7 days on subsequent depressive symptoms
Hypothesis 1c: Playtime during the last 14 days on subsequent general mental well-being
We similarly predict the absence of effects in the opposite direction. Using a smallest effect size of interest of a 1-scale point change in well-being leading to a 16% change in daily playtime (see the Method section), we predict the absence of a practically significant within-person effect of:
Hypothesis 2a: Positive affect on playtime during the following 24 hr
Hypothesis 2b: Depressive symptoms on playtime during the following 7 days
Hypothesis 2c: General mental well-being on playtime during the following 14 days
Before proceeding, we would like to draw the reader’s attention to the fact that our research interest, and by extension our hypotheses, are causal in nature. However, as we are not able to randomize people to spend more or less time playing video games, interpreting the statistical parameters we estimate below as causal relies on assumptions: specifically, that there are no time-varying confounders, that there is no selection bias, and that we have modeled the correct time lag.
First, an unknown time-varying confounder (e.g., a change in disposable income) might increase well-being now and playtime later, creating the appearance of a positive relationship. We are unable to control for all variables that might create spurious relations between playtime and well-being or obscure true relations, due in large part to a lack of theory identifying such confounders (see Vuorre et al., 2022). To inform future research, we included an exploratory open-ended question asking players to report any events they felt affected both their well-being and their play.
Second, there is potential for self-selection bias, wherein well-being and playtime together impact the likelihood of study participation and/or attrition. For example, if people who are feeling guilty about their high playtime are more likely to sign up for a study, this would bias our results toward a negative effect. Relatedly, if participants who later feel poorly and play more games than usual tend not to complete questionnaires, their attrition would mask a true negative effect. Self-selection is closely related to the potential generalizability of our results, which we return to in the discussion.
Finally, building upon previous work that investigated one potential time lag (Vuorre et al., 2022), we selected three potential lags of 1 day, 1 week, and 2 weeks, each corresponding roughly symmetrically to the scope of one of our well-being variables. However, any actual effects may be too short-lived to be detected with our design or accumulate over longer periods of time (Dormann & Griffin, 2015). Thus, the potential causal effects discussed in this article refer to an effect carried over the specified time scale (e.g., the previous 1 week of play on the subsequent 1 week of depressive symptoms).
Throughout the rest of the article, we use primarily correlational language when describing our results to reflect the likelihood that some or all of our assumptions do not hold. However, we believe these estimates nonetheless offer some information about potential causal effects and return to this interpretation in the discussion.
We conducted a 12-week study, during which Xbox play was tracked continuously and linked with six biweekly survey waves. Participant recruitment began on February 8, 2023, and data collection ended on May 23, 2023. To participate, players were required to be (a) U.S./U.K. residents, (b) at least 18 years old, and (c) active video game players playing (nearly) exclusively on Xbox, defined as playing at least 1 hr of games per week, of which at least 75% take place on any Xbox console (Xbox 360/Xbox One/Xbox Series S|X).
To reach players, we used a combination of (a) paid advertisements on Reddit, targeting Xbox- and gaming-related subreddits (n = 260), (b) convenience and snowball sampling via the research team’s Twitter accounts and university mailing lists (n = 38), and (c) Prolific screening questionnaires (n = 116). We selected Reddit as the platform for advertising because Reddit is home to a large segment of our population (moderately to highly engaged adult Xbox users) and has been found to yield comparable data quality to other commonly used participant pools, such as MTurk and undergraduate students (Jamnik & Lane, 2019; Luong & Lomanowska, 2022).
The study design is summarized in Figure 1. At each time point, players completed a survey in Qualtrics. At Time 1, players completed baseline demographic and well-being measures and provided access to their playtime data by adding researcher accounts as friends on the Xbox network. At each subsequent time point, they completed a 9-min survey with the well-being measures specified below, with the order of both blocks and items randomized. Surveys at Times 2–6 were distributed via email in 2-week intervals based on when the player joined the study. Reminders were sent after 24 and 48 hr.
The design was informed by an abbreviated three-wave pilot (n = 37) to test the stability of the Xbox trackers, payment system, survey design, and retention rate. It was not designed to estimate power; hence, we did not use any observed effect sizes to inform our power analyses (Albers & Lakens, 2018). Full pilot details are available in the Supplemental Materials (https://osf.io/edtwn; Ballou et al., 2023).
Ethical approval was granted by Queen Mary University of London Ethics of Research Committee (No. 20.383). All players provided informed consent prior to beginning the study. Players were paid in Amazon gift cards, £3.00 (or equivalent in USD) for the Wave 1 survey, £1.50 for each subsequent wave, and a £5 bonus for completing all six waves, for a maximum total of £15.50.
To determine the absence of practically significant effects using equivalence testing (Schuirmann, 1987), we first need to determine a smallest effect size of interest (SESOI). By practically significant, we mean the smallest degree of change in well-being that a player would consider noticeable or minimally important.
For Hypotheses 1a–c, our SESOI was a .06 scale point change in a well-being measure per hour of play, which we derived from previous estimates of practically significant change in the measures we used. One study found that a practically significant within-person change in the Patient-Reported Outcomes Measurement Information System (PROMIS) depression measure was 3–4 points, which when rescaled to 1–5 equates to .38 scale points (Kroenke et al., 2020). This aligned with estimates for the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS; Maheswaran et al., 2012) and the Positive and Negative Affect Schedule (PANAS) positive affect subscale (Anvari & Lakens, 2021), both approximately .3 scale points. Next, we needed to determine how large a change in playtime should predict a change in well-being of that magnitude. Here, we based our estimate on the average amount of daily leisure time available to U.S. and U.K. adults, approximately 5 hr (Office for National Statistics, 2017; Sturm & Cohen, 2019). As a highly conservative threshold, we specified that a 5-hr change in playtime should predict at least a .3-scale point difference in well-being; effects smaller than this indicate that the average person does not have enough time in the day to modulate their play to an extent that would meaningfully affect their well-being. The final equivalence bounds for Hypotheses 1a–c are therefore .3-scale point/5 hr = .06-scale point change per additional hour of daily playtime.
For Hypotheses 2a–c, we specified our SESOI as a change in playtime of 16% per scale point change in well-being. A 16% change equates to a 20-min change in playtime for the median player in our sample (median playtime ≈ 2.1 hr/day); 20 min corresponds to the shortest amount of time U.K. adults report devoting to one continuous activity (e.g., cooking, online shopping, or socializing with household members; Payne, 2018). Changes in playtime therefore need to be at least this large to potentially displace or make space for another activity. We anchored this to a 1-point well-being change for interpretability and to roughly align with the well-being measure in terms of standardized effect sizes (and thereby statistical power). We thus aimed to establish the absence of effects equal to or larger than a 1-scale point change in well-being leading to a 16% change in playtime.
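Both thresholds reduce to simple arithmetic. As a quick illustration using the values from the derivations above (this is not part of the registered analysis code):

```r
# SESOI for Hypotheses 1a-c: a noticeable well-being change (~.3 scale points)
# spread over the average daily leisure budget (~5 hr)
sesoi_h1 <- 0.3 / 5            # = 0.06 scale points per additional hour of daily play

# SESOI for Hypotheses 2a-c: the shortest continuous activity block (~20 min)
# relative to the median player's daily playtime (~2.1 hr = 126 min)
sesoi_h2 <- 20 / (2.1 * 60)    # ~= 0.16, i.e., a 16% change in daily playtime
```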
We determined our intended sample size and time points via simulation based on the above SESOIs. Assuming a true null effect (Lakens, 2017), simulation results showed that with 400 subjects and six time points, we have >95% power to declare equivalence within our specified SESOIs. Power simulation details are available in the Supplemental Materials.
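The full simulation code is available in the Supplemental Materials. As a minimal sketch of the logic for one iteration of a Hypothesis 1-type simulation—assuming a simplified linear REWB model, a true null within-person effect, and illustrative variable names rather than the registered code:

```r
library(glmmTMB)

# One iteration: simulate data with a true null within-person effect, fit a
# simplified REWB model, and check whether the 90% CI for the within-person
# playtime coefficient falls entirely inside +/- SESOI.
one_iteration <- function(n_players = 400, n_waves = 6, sesoi = 0.06) {
  d <- expand.grid(id = factor(seq_len(n_players)), wave = factor(seq_len(n_waves)))
  d$play   <- rgamma(nrow(d), shape = 1, scale = 2.5)            # hours/day, gamma-like
  d$wb     <- 3 + rnorm(n_players, 0, 0.6)[as.integer(d$id)] +
              rnorm(nrow(d), 0, 0.8)                             # well-being, null effect
  d$play_b <- ave(d$play, d$id)                                  # person mean (between)
  d$play_w <- d$play - d$play_b                                  # person-centered (within)
  fit <- glmmTMB(wb ~ play_w + play_b + wave + (1 | id), data = d)
  co  <- summary(fit)$coefficients$cond["play_w", ]
  ci  <- co["Estimate"] + c(-1, 1) * qnorm(.95) * co["Std. Error"]  # Wald 90% CI
  ci[1] > -sesoi && ci[2] < sesoi                                 # equivalence declared?
}

# Power to declare equivalence = proportion of iterations inside the bounds
mean(replicate(200, one_iteration()))
```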
A total of 414 adult U.S./U.K. Xbox users completed the Time 1 survey and successfully linked their Xbox account. Demographic information is available in Table 1.
Sample Characteristics | | | | | | | | | |
Country | n | Age, M (SD) | Gender: men | Gender: women | Gender: preferred to specify | Employment: full time | Employment: part time | Employment: student | Employment: other |
---|---|---|---|---|---|---|---|---|---|
United States | 170 | 33.0 (8.4) | 127 | 35 | 8 | 81 | 21 | 12 | 56 |
United Kingdom | 244 | 31.1 (8.2) | 201 | 33 | 10 | 156 | 18 | 16 | 54 |
We received a total of 2,036 survey responses across the six waves (82% response rate). Missing responses were more likely to come from younger players (p < .001), but did not differ across well-being, gender, or playtime (p > .15). As preregistered, we excluded all survey waves from 33 players who did not log any time on Xbox for at least 4 weeks, indicating that they are not active Xbox players, and from two players who self-reported at Time 6 that their data should not be included (see questionnaire items). Of 1,894 remaining responses, 117 were excluded due to potential carelessness, as indicated either by implausibly fast survey completion or preregistered item-by-item variability indices (calculated using the R package careless; Yentes & Wilhelm, 2021; see analysis code). We were therefore left with 1,777 eligible responses from 379 players and 497 missing or careless responses to be imputed.
Missing data were imputed using multiple imputation (Sterne et al., 2009) with the mice package (van Buuren & Groothuis-Oudshoorn, 2011), assuming a mechanism of missing at random (MAR). Sensitivity analyses indicated that results using imputed data differed slightly from complete case analysis of players who completed all waves but led to the same inferences in the equivalence testing. We therefore report only the results with imputation below due to their additional precision and refer readers to the Supplemental Materials for the complete case analysis.
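For readers unfamiliar with the mice workflow, a minimal sketch of the imputation-and-pooling logic is below; the variable names and the placeholder model are illustrative rather than our actual analysis code (see the Supplemental Materials for the latter):

```r
library(mice)

# Impute missing/careless player-wave responses under MAR
# (long format, one row per player-wave; column names are illustrative)
imp <- mice(long_data, m = 20, seed = 2023)

# Fit a model to each completed dataset and pool estimates via Rubin's rules;
# lm() here is only a placeholder for the REWB models reported below
fits <- with(imp, lm(depression ~ play_w + play_b))
summary(pool(fits))
```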
Descriptive statistics and reliability for each measure are shown in Table 2.
Correlations and Descriptive Statistics for Well-Being and Playtime Measures | | | | | | | | | |
Variable | Positive affect | Depressive symptoms | General mental well-being | Playtime (prev. day) | Playtime (prev. week) | Playtime (prev. 2 weeks) | Self-report playtime (prev. day) | Self-report playtime (prev. week) | Self-report playtime (prev. 2 weeks) |
---|---|---|---|---|---|---|---|---|---|
Depressive symptoms | −.41*** | — | |||||||
General mental well-being | .6*** | −.77*** | — | ||||||
Playtime (prev. day) | −.071** | .041 | −.044 | — | |||||
Playtime (prev. week) | −.073** | .073** | −.055* | .74*** | — | ||||
Playtime (prev. 2 weeks) | −.084** | .076** | −.06* | .63*** | .88*** | — | |||
Self-report playtime (prev. day) | −.058* | .072** | −.062** | .64*** | .57*** | .52*** | — | ||
Self-report playtime (prev. week) | −.062** | .079*** | −.071** | .53*** | .60*** | .59*** | .75*** | — | |
Self-report playtime (prev. 2 weeks) | −.058* | .093*** | −.075** | .52*** | .61*** | .60*** | .71*** | .93*** | — |
M | 2.82 | 2.68 | 3.25 | 2.76 | 2.83 | 2.81 | 3.40 | 2.53 | 2.52 |
Mdn | 2.87 | 2.67 | 3.33 | 1.51 | 2.09 | 2.06 | 2.75 | 1.82 | 1.83 |
Within-person SD | 0.84 | 0.87 | 0.72 | 3.45 | 2.81 | 2.69 | 3.48 | 2.46 | 2.36 |
Reliability (ω) | .83 | .96 | .93 | ||||||
Note. Playtime measures refer to hours per day. prev. = previous; Mdn = median. |
Positive affect in the present moment was measured with the positive affect subscale of the International Positive and Negative Affect Schedule-Short Form scale (Thompson, 2007). Players were asked to rate the extent to which a list of 10 adjectives (e.g., “alert” and “determined”) describe how they feel at that moment, on a 5-point Likert scale from 1 (“very slightly or not at all”) to 5 (“extremely”). Scores were calculated by taking the mean of all items, and therefore range from 1 to 5.
Depressive mood in the previous week was measured with the PROMIS eight-item Adult Depression Scale (Cella et al., 2010; Pilkonis et al., 2011). Players rated eight statements about how they felt in the past 7 days such as “I felt hopeless” and “I felt I had nothing to look forward to” on a 5-point scale from 1 (never) to 5 (always). Scores were calculated using item-level calibrations through the HealthMeasures Scoring Service (https://www.assessmentcenter.net/ac_scoringservice) and are normalized to a mean of 50 and an SD of 10. To match the other well-being variables and ease interpretation, we rescaled PROMIS depression scores to range from 1 to 5.
General mental well-being during the previous 2 weeks was measured with the Warwick–Edinburgh Mental Well-Being Scale (WEMWBS; Tennant et al., 2007), which has shown good psychometric properties and sensitivity to within-person change (Maheswaran et al., 2012). Players rated 14 statements about how they felt during the past 2 weeks such as “I’ve been dealing with problems well” and “I’ve been feeling good about myself” on a 5-point scale from 1 (none of the time) to 5 (all of the time). Scores were calculated by taking the mean of all items, and therefore range from 1 to 5.
Playtime data were collected using two redundant Python scripts that tracked players’ online status on the Xbox Network. All players were required to add three researcher accounts as friends on the Xbox Network for the duration of the study and to ensure that their privacy settings allowed friends to see their play activity. Using the Xbox web interface (using Chrome 89 on Ubuntu 20.04) and the Xbox Android app (using an Android 6.0 Virtual Device on Windows Server 2019), each script independently recorded the status of each player (i.e., whether they were online and, if so, which game or application they were running) at 5-min intervals throughout the study.
We followed data protection and privacy-by-design principles in setting up our tracking: Identifiable playtime data were stored only on the remote, password-protected, and encrypted machines where data were initially collected (hosted at the University of York and on DigitalOcean’s UK GDPR-compliant London servers, respectively). Identifiable information was replaced with random numeric identifiers prior to analysis and sharing. All players provided informed consent about the tracking procedure and could opt out at any point.
Playtime variables were calculated as hours of play per day during the specified time window, a continuous variable. We took the timestamp of survey completion for each player at each wave and summed the time spent online in games during the 24-hr/7-day/14-day time windows preceding and following that survey completion timestamp, then divided by the number of days in the time window (excluding nongame activities on Xbox such as streaming TV series, shopping, or browsing the internet). Due to technical problems, there were 7.6 hr spread throughout the study where both playtime trackers were nonoperational. As preregistered, we weighted each player’s playtime estimates based on the proportion of missing playtime data in that window—in virtually all cases, playtime data were missing for <1% of the window.
Final playtime values therefore correspond to the mean time (in hours) spent playing Xbox games per day during the 24 hr/7 days/14 days before or after completing that survey wave.
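As an illustration of this aggregation step, the sketch below assumes a data frame of 5-min status polls with illustrative column names; it is a simplified outline rather than the deployed tracking pipeline (which is available in the Supplemental Materials):

```r
library(dplyr)

# Sketch: derive mean hours of play per day in a window preceding a survey timestamp.
# `polls` is assumed to hold one row per 5-min status check, with columns
# player_id, timestamp (POSIXct), and in_game (TRUE if a game was running).
playtime_in_window <- function(polls, player, survey_time, days) {
  window_start <- survey_time - days * 24 * 3600
  w <- polls %>%
    filter(player_id == player,
           timestamp >= window_start, timestamp < survey_time)
  observed <- nrow(w) * 5 / 60          # hours for which the trackers were running
  played   <- sum(w$in_game) * 5 / 60   # hours observed in a game
  coverage <- observed / (days * 24)    # proportion of the window with tracker data
  (played / coverage) / days            # weight up for tracker gaps, express per day
}
```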
Players estimated the number of hours and minutes they spent playing games on Xbox over the past 24 hr/7 days/14 days.
The study also included measures of personality (Big Five Inventory-2 Short Form; Soto & John, 2017) and dysregulated gaming (Internet Gaming Disorder Scale–Short-Form; Pontes & Griffiths, 2015) at Time 1, and measures of need satisfaction/frustration in daily life (Basic Psychological Need Satisfaction and Frustration Scale [BPNSFS]; Chen et al., 2015) and in video game play at Times 1–6. These constructs were included for separate confirmatory and exploratory analyses not reported here.
We tested each hypothesis using a random-effects within-between (REWB) model (Bell et al., 2019), a mixed-effects model that disaggregates within- and between-person sources of variation and that has been successfully applied in previous digital mental health research (Schemer et al., 2021). The REWB model has several benefits: it maintains high power for within-person effects, is able to handle our data structure wherein temporal precedence exists within each wave (not simply from one wave to the next), and is easily interpretable. Models were fit using the glmmTMB package (Brooks et al., 2017) in R Version 4.3.1 (R Core Team, 2023).
In total, we fit six REWB models. Hypotheses 1a–c (playtime predicting subsequent well-being) are analyzed with linear REWB models, while Hypotheses 2a–c (well-being predicting subsequent playtime) are analyzed with generalized linear mixed-effects models. In our preregistration, we planned to use a zero-inflated γ distribution with log link to account for the γ-like distribution of playtime (see Figure 2) alongside the possibility of zeros (i.e., no playtime in a given window); due to misfit, however, we instead elected to use a Tweedie distribution with log link (see the Deviations From Preregistration section).
In each model, the person’s mean for the predictor of interest (i.e., playtime for Hypotheses 1a–c, one of the three well-being variables for Hypotheses 2a–c) was entered as a between-person predictor, while their person-centered predictor value was entered as a within-person predictor. We include correlated random intercepts and slopes for the within-person predictor, allowing them to vary by player. We include age and gender as covariates, given evidence that they are exogenously related to both playtime (Padilla-Walker et al., 2010; Ream et al., 2013) and well-being (Girgus & Yang, 2015) and are therefore potential confounds (Rohrer, 2018). We include an AR(1) autocorrelation term to avoid artificially small standard errors and wave (time) as a categorical covariate to detrend the outcome variable (Wang & Maxwell, 2015).
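A minimal sketch of how such a model can be specified in glmmTMB is shown below, using Hypothesis 1b as an example; variable names are illustrative and the actual fitted models are in the shared analysis code:

```r
library(glmmTMB)

# Sketch of an H1b-style linear REWB model (illustrative variable names).
# play_w = person-centered playtime, play_b = person mean playtime,
# wave = categorical time (factor), ar1() = residual autocorrelation across waves.
m_h1b <- glmmTMB(
  depression ~ play_w + play_b + age + gender + wave +
    (1 + play_w | id) +        # correlated random intercepts and slopes
    ar1(wave + 0 | id),        # AR(1) autocorrelation term
  data = dat
)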
If the 90% CI of the predictor of interest is fully within the lower and upper equivalence bounds, the data provide evidence for the absence of a practically meaningful effect. Because each model tests a different hypothesis and we interpret these separately, we do not correct for multiple comparisons (Rubin, 2021). Because our theoretical focus is within-person effects and because statistical power for the between-person effect is lower, we do not interpret the between-person effect—its inclusion is for obtaining unbiased estimates of the within-person effect (Schunck, 2013). For full details of the between-person results, please see the Supplemental Materials.
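Continuing the sketch above, the equivalence decision for the within-person coefficient can be illustrated as follows (Wald 90% CI, with the ±.06 bound used here purely for illustration):

```r
# Declare equivalence if the 90% CI of the within-person playtime coefficient
# lies entirely inside the equivalence bounds (+/- .06 scale points per hour)
co <- summary(m_h1b)$coefficients$cond["play_w", ]
ci <- co["Estimate"] + c(-1, 1) * qnorm(.95) * co["Std. Error"]
all(ci > -0.06, ci < 0.06)
```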
We originally planned to recruit players using Reddit advertisements only. The advertising costs per recruited player were higher than anticipated, however, and we therefore transitioned to using Prolific screening questionnaires partway through the study, which proved a lower cost option.
When fitting models for Hypotheses 2a–c, we experienced three problems: First, glmmTMB produced false convergence warnings when including the autocorrelation term. This is likely the result of the high number of parameters estimated in a zero-inflated mixed-effects model, combined with the relatively small number of time points over which autocorrelation might occur. To address this, we dropped the autocorrelation term from these models. Second, we experienced singular fit warnings when including the random slopes term, indicating that our data did not support the inclusion of varying relationships between well-being and subsequent playtime across players. We therefore dropped the random slope term, retaining the random intercept. The simplified models converged without issue and differed only minimally in terms of estimates and standard errors from the models with convergence warnings, suggesting that these changes did not meaningfully affect the inferences drawn here.
Finally, diagnostics for these simplified models (particularly Hypothesis 2c) indicated a substantial degree of left-skew in the residuals, and thus, poor model fit. To address this, we fit alternative models in glmmTMB using the Tweedie distribution, a family of exponential dispersion models of which the γ distribution is a special case (Bonat & Kokonendji, 2017). The Tweedie distribution estimates an additional parameter p, or power, which allows it to handle zero-inflated data in a unified way, and by virtue of using a log link function yields regression coefficients that can be interpreted in the same way as those from the γ regression (see, e.g., Andersen et al., 2019). Diagnostics of the Tweedie models showed substantially improved fit with regard to dispersion and residual quantiles, and unlike the preregistered models, converged while retaining the autocorrelation term.
We therefore elect to report the results of the Tweedie models below, as these have the least degree of bias stemming from model misfit. Results from the originally preregistered zero-inflated γ models are available in the Supplemental Materials. The estimates from the Tweedie and zero-inflated γ regression models diverged little, with only one difference affecting inference: the preregistered model for Hypothesis 2c was inconclusive, with the confidence interval overlapping both 0 and the upper SESOI, whereas the Tweedie estimate for Hypothesis 2c was within the equivalence bounds. We discuss this divergence in the results below. For both Hypotheses 2a and 2b, the preregistered and modified models equally supported the absence of practically significant within-person effects of well-being on subsequent playtime.
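A minimal sketch of the modified Hypothesis 2-type model is shown below, again with illustrative variable names; the exact specifications are in the shared analysis code:

```r
# Sketch of an H2b-style model after the deviation: Tweedie family with log link.
# wellbeing_w / wellbeing_b = person-centered and person-mean depressive symptoms.
m_h2b <- glmmTMB(
  playtime_next7 ~ wellbeing_w + wellbeing_b + age + gender + wave +
    (1 | id) +                 # random slopes dropped due to singular fit
    ar1(wave + 0 | id),        # autocorrelation retained under the Tweedie fit
  family = tweedie(link = "log"),
  data = dat
)

# With a log link, exp(coef) - 1 gives the proportional change in playtime per
# 1-scale-point change in well-being, directly comparable to the 16% SESOI
exp(fixef(m_h2b)$cond["wellbeing_w"]) - 1
```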
In total, we recorded 100,000 hr of playtime throughout the study. Descriptive information is shown in Table 3, which lists the most played games, and Figure 2, which shows how playtime and session length were distributed. The games played in our sample broadly resembled those played by the Xbox population as a whole during the study period: 15 of the 20 top games in our sample were present in the global top 40 (Albigés, 2023).
Top 20 Most Popular Games Across Players | |||
Game | Total time (hours) | Unique sessions | Average session length (hours) |
---|---|---|---|
Call of Duty: Modern Warfare II | 6,196 | 5,773 | 1.07 |
Fortnite | 5,114 | 4,361 | 1.17 |
Destiny 2 | 5,104 | 3,284 | 1.55 |
Hogwarts Legacy | 3,030 | 2,089 | 1.45 |
Forza Horizon 5 | 2,413 | 2,083 | 1.16 |
Overwatch 2 | 2,371 | 1,856 | 1.28 |
Tom Clancy’s The Division 2 | 1,872 | 1,572 | 1.19 |
Warframe | 1,640 | 1,330 | 1.23 |
Minecraft | 1,476 | 1,422 | 1.04 |
FIFA 23 | 1,454 | 1,502 | 0.97 |
Apex Legends | 1,353 | 1,418 | 0.95 |
Grand Theft Auto V | 1,348 | 1,268 | 1.06 |
Dead by Daylight: Special Edition | 1,328 | 841 | 1.58 |
Disney Dreamlight Valley | 1,303 | | 1.05 |
ROBLOX | 1,217 | 1,113 | 1.09 |
MLB The Show 23 | 1,177 | 889 | 1.32 |
Atomic Heart | 991 | 835 | 1.19 |
HITMAN 3 | 985 | 907 | 1.09 |
Rocket League | 960 | 1,613 | 0.60 |
Age of Empires Definitive Edition | 947 | 1,592 | 0.59 |
Data quality was high; the well-being measures were moderately correlated with one another at each wave, and playtime consistently followed the expected γ-like distribution. There was no evidence for significant floor or ceiling effects in our well-being variables. Average well-being scores in our sample closely track those of comparable general populations in reference studies: mean positive affect was 2.82 (compared to 2.9 in a U.K. and cross-cultural sample; Thompson, 2007, p. 238); general mental well-being was 3.25 (compared to 3.4 in U.K. adults aged 25–36; Ng Fat et al., 2017); and depressive symptoms averaged 54 on the original scale, or 2.68 when rescaled (compared to the PROMIS general population average of 50; Kroenke et al., 2020), indicating slightly elevated depressive symptoms that nonetheless remain below the recommended threshold for mild depression.
After adopting the Tweedie regression (as specified in the Deviations From Preregistration section), diagnostics for each multilevel model indicated no or only minor violations of the assumptions regarding homoskedasticity, the distribution of residuals, and linearity.
Results showed support for Hypotheses 1a–c (Figure 3, top): There is strong evidence to reject a practically significant relationship between playtime and subsequent well-being. This held true at all three time scales we investigated. There were no meaningful relationships between playtime in the last day and current positive affect (Hypothesis 1a), playtime in the last week and current depressive symptoms (Hypothesis 1b), or playtime in the last 2 weeks and current general mental well-being (Hypothesis 1c). Estimated relationships were all within the equivalence bounds, and all three overlapped 0 (p > .24).
Based on our estimates, in which 1 hr of additional playtime was associated with a less than .02 change in all three of the well-being variables, players who would increase their playtime by 5 hr—the average total leisure time available to U.S./U.K. adults—would be predicted to show a less than .10 scale point change in well-being, or just a third of our estimate for a practically significant, noticeable effect of .3.
As preregistered, we ran identical models for Hypotheses 1a–c using self-report playtime instead of logged playtime to explore whether findings differ. Though not the focus of the current article, self-report data were moderately correlated with logged playtime, with shorter time periods showing slightly greater concordance (r = .64 over the previous day, r = .60 over the previous 2 weeks). Results aligned with those of logged playtime: no estimates of self-report play were significantly related to subsequent well-being (p > .18), and all 90% CIs were within the equivalence bounds. Complete findings are available in the Supplemental Materials.
Results similarly supported Hypotheses 2a–c (Figure 4, top): There is evidence to reject any practically significant within-person relationships between positive affect (Hypothesis 2a), depressive symptoms (Hypothesis 2b), or general well-being (Hypothesis 2c) and subsequent playtime. Tweedie estimates indicate that a 1-scale point increase in depression would predict a 4.3% decrease in playtime during the following 24 hr, or 5 min for the average player. This means that a player who reported a change from mild depression to severe depression—a 20-point change on the original PROMIS scale, or a 1.8-point change when rescaled to 1–5 (Kroenke et al., 2020)—would be predicted to play just 10 min less per day, half the size of our specified SESOI. Effect estimates were only marginally larger for general well-being, and even smaller for positive affect.
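The arithmetic behind this illustration can be reproduced on the multiplicative (log-link) scale as follows, using the values reported above and approximating daily playtime by the sample median of roughly 2.1 hr:

```r
# Predicted playtime change for a 1.8-point increase in depressive symptoms,
# assuming a 4.3% decrease per scale point and ~2.1 hr (126 min) of daily play
change_per_point <- 1 - 0.043                 # multiplicative factor per scale point
(1 - change_per_point^1.8) * 2.1 * 60         # ~= 10 min less play per day
```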
As noted above, results from the preregistered Hypothesis 2c model differed somewhat: though there was no statistically significant relationship between general mental well-being and subsequent playtime, the 90% CI of general well-being overlapped with the smallest effect size of interest, and the results were therefore inconclusive (see Plots/H2_ZeroInflatedGamma in the Supplemental Materials). While the point estimates between the two models differed only marginally, the Tweedie regression had slightly smaller errors—this improved precision resulted in an estimate that was within the equivalence bounds. Given the degree of model misfit in the preregistered model, we believe the Tweedie estimate to be the more trustworthy one.
In line with previous research (Johannes, Vuorre, et al., 2021; Vuorre, Johannes, et al., 2021), our results strongly indicate that there is no practically significant within-person relationship between playtime and well-being at the population level. This finding improves on prior evidence in several ways: First, we closely approximate players’ total actual playtime by logging Xbox playtime of Xbox-predominant players. Second, we assess short- and long-term effects by tracking data over a longer time scale (12 weeks). Third, we investigate relationships in both directions. Fourth, to our knowledge, we offer the first registered report on the topic, minimizing the potential for publication bias and questionable research practices.
Finally, we expand on previous studies (which show an absence of evidence) by providing evidence of absence for playtime-well-being effects: the relationships we find are too small to be practically significant based on easily interpretable effect sizes. For depressive symptoms, the well-being measure with the “strongest” correlations, a player who increased their playtime by the total average daily leisure time of 5 hr would be predicted to show only a .06 point change in well-being—well short of the .3 point change identified as practically significant. Such extreme variation occasionally occurs, but is exceedingly rare even in our population, which skews toward more involved gamers and can therefore be expected to display large playtime swings in response to, for example, increased leisure time from holiday periods: just 14 players in our sample of 414 recorded a change in daily playtime of 4 hr or more between 2-week waves; just three did so more than once. The relationship in the opposite direction was similarly weak, if not weaker: the average player’s depression score would need to change from the minimum score (1, no depression symptoms) to the maximum (5, clinically severe depression) for us to predict a 20-min decrease in subsequent daily playtime.
The literature is dominated by effect studies; there is a marked absence of theory specifying testable mechanisms on how playtime impacts well-being, positively or negatively (Vuorre et al., 2022). This in part motivated our methodological choice to test for the absence of an effect, as opposed to severely testing hypotheses derived from theory.
This lack of prior theory has prompted researchers to conduct careful descriptive, qualitative work to develop grounded theoretical constructs and relations. Their work evidences that there are individuals who experience significant distress from intense gaming interfering with their life and well-being (Karhulahti et al., 2022), as well as individuals for whom gaming provides psychological recovery, meaning and resonance, and a mechanism for coping with adverse life events (Iacovides & Mekler, 2019; Reinecke & Eden, 2017; A. Tyack et al., 2020). Thus, the main question arising from our robust null finding vis-à-vis prior work is: How do we square it with qualitative case reports showing strong positive and negative impacts of gaming on well-being?
The most commonly articulated mechanisms for case studies showing a negative relation between playtime and well-being (higher playtime:lower well-being) are playtime displacing other activities essential for healthy psychosocial functioning (Kowert et al., 2014; Williams et al., 2008) or people using games to cope with an underlying condition such as depression (van Rooij et al., 2018). One way of explaining these case studies is that either pathway may be true, but simply too rare in the population to show in our aggregate measures. People may suddenly fall into very intense play interfering with their life or become depressed and start playing very intensely to cope, but this occurs so seldom that it does not register at a population level. This fits recent arguments and evidence around person-specific media effects (Johannes, Masur, et al., 2021; Valkenburg et al., 2021). Another likely explanation is that problematic gaming effects arise and stabilize more slowly and gradually than over a 2-week window—that is, our sample may have included people with stable low well-being originally produced by stably high playtime (or vice versa), but this relation does not show in day-by-day, week-by-week, or 2-weekly fluctuations. Here, future qualitative work on the etiology of problematic play can guide inference toward a better explanation—and help identify whether high playtime-low well-being correlations may be jointly caused by third variables like environmental stressors.
Moving to predicted positive impacts of playtime (higher playtime:higher well-being), the most well-developed theory to our knowledge is Reinecke and Rieger’s (2021) recovery and resilience in entertaining media use (R2EM) model. It suggests that when people experience a depletion of psychological resources or need for recovery, they selectively expose themselves to entertainment media like games to both regulate their mood state with hedonic experiences (increasing positive affect) and replenish or even grow resources with eudaimonic experiences (satisfying psychological needs, building self-efficacy beliefs, etc.). How do we fit the substantive evidence this model draws upon with our null finding? Given that most evidence underpinning R2EM comes from short laboratory studies or cross-sectional self-report surveys, the most plausible explanation we see is that any recovery effects are so small, exchangeable with other recovery tactics, and/or short-lived that they no longer show in affect over the subsequent day, let alone depressive symptoms or overall well-being over the following 7/14 days. People may successfully recover from stress or frustration by playing games, but this does not markedly differ in effectiveness from other recovery mechanisms such as talking with friends, sleeping, or affect quickly restabilizing around a setpoint even without targeted recovery. Relatedly, reported “peak” experiences of deep meaning and other eudaimonic gaming moments may be highly memorable, easy to report, and attractive to study, but too transient in their impact on well-being, or too rare during everyday gaming to show in our data. Most “ordinary player experience” is routine, familiar, and “emotionally moderate” (A. Tyack & Mekler, 2021). Switching ends of the time scale, case reports of gaming as prolonged successful coping with prolonged adverse life circumstances (e.g., players mentally “escaping” from the loss of a loved one into a video game; Iacovides & Mekler, 2019) would fit our data if they are too rare to show meaningful effects at a population level or so gradual and stable that they do not manifest within 2-week windows.
Both person-specific effects and the R2EM model point to content and context specifics (on their own and in interaction with player specifics) as factors that likely do not show in raw playtime at a population level. That is, positive and negative impacts are moderated or coconstituted by content and quality of play (e.g., experiences of bullying or harassment in online play), whether play fits or interferes with a person’s life context (e.g., playing despite having to study for an imminent exam), or how these interact with the person’s dispositions (e.g., resonance of an in-game character’s story with one’s own life experience). At the idiographic level, these player–content–context interactions (see Elson et al., 2014) could explain when and why a given gaming experience might positively or negatively impact well-being. At the nomothetic level of “raw” playtime–well-being links, they wash out as unpatterned variance.
Overall, the gap between our evidence of absence for playtime–well-being relations and case reports of positive and negative playtime impacts can be bridged by appeals to (a) real-but-very-low-prevalence effects, (b) person–content–context-specific effects that do not manifest at aggregate levels, or (c) real-but-too-transient/slow effects that do not show at time scales of 24 hr to 14 days. The former two invite future research into causes and conditions of specific effects. The latter highlights, in our view, an area of general underspecification in games and wider media effects research: time scales. Different methodological paradigms can trace effects at very different time scales, yet theory (where it exists) too seldom specifies claimed scopes of generality for time scales. This lack of specificity also extends to potential temporal dynamics, for example, whether we predict “lossless” or “decaying” accrual of (good or bad) playtime toward well-being, or expect fixed thresholds or dynamic tipping points. Here, again, careful qualitative and descriptive work seems in order to construct (better) empirically grounded theory.
A key strength of our method is the ability to capture play data across an entire platform, the Xbox network, rather than a single game. This was a limitation of recent prominent studies, which were limited to objectively logged playtime for one game per player, not reliably capturing total objective playtime (Johannes, Vuorre, et al., 2021; Vuorre, Johannes, et al., 2021). Our data demonstrate that highly engaged players tend to play many games over the course of a week or month; capturing playtime in a single game therefore severely limits our understanding of how gaming relates to mental health.
We hope that this article can serve as an example of creative digital trace data collection. Games researchers have made regular calls for higher quality behavioral data, often emphasizing collaboration with industry (e.g., Griffiths & Pontes, 2020). However, this article helps demonstrate that this is by no means the only method of accessing such data (Ballou, 2023). While our player data donation method has important limitations (e.g., laborious setup and reliance on user interfaces that are subject to change), we hope to see others improve upon it or develop alternatives. Together, this can help generate data infrastructure that is accessible to the whole research community, rather than simply a subset with the resources and technical expertise to develop these from scratch.
The data presented here are rich and ripe with opportunities for further investigation (using, e.g., our measures of personality, time of day, internet gaming disorder, and more). All data, materials, and tracking software are available in our Supplemental Materials (https://osf.io/edtwn/; Ballou et al., 2023) under a permissive license, and we encourage readers to explore whether their own research questions might be addressable with these.
Our largest limitation is the inability to account for potential time-varying confounders, as discussed in our initial causal assumptions. Various other factors may have impacted both gaming and well-being (e.g., demanding care responsibilities causing a player to play less and feel worse), and these have the potential to suppress effects that may otherwise have surfaced. While this remains a key limitation, our data include 1,221 responses to an open-ended question about what events might have affected both players’ well-being and their gaming. These offer a valuable opportunity to identify potential confounders for future research to account for.
Other limitations include limited and coarse timescales and possible nonlinear relationships. While we assess three different timescales of playtime and dimensions of well-being, this is by no means exhaustive: games may affect various aspects of well-being over shorter or longer time periods than were investigated here. Similarly, as we suggested, such effects may not be linear, an assumption of our analysis approach—for example, players may experience a meaningful increase in well-being only after an initial small amount of playtime, after which no further benefits occur, or there may be a different relationship for extreme high engagement. We hope our results can catalyze the generation of better-specified theory that can predict such nonlinear relationships and timescales during which effects may occur.
Our results have potential to generalize in some regards, but not others. On well-being measures, our sample mirrors adult U.K./U.S. populations. In demographics and play behavior, our sample is fairly representative of adult U.S./U.K. highly engaged console players, who tend to be majority male, between 25 and 34 years old, and play approximately 1.5 hr per day (Newzoo, 2023), characteristics that are broadly reflected in our sample. Similarly, the most popular games in our sample overlapped considerably with the most popular games across the Xbox platform during the same period (Albigés, 2023), suggesting that our findings may generalize to other western adult Xbox players outside the sample. We believe this population to be an important one to study: given existing evidence that effects of games, where present, are small and contextual, highly engaged players may be most likely to experience accumulative effects over long periods of sustained high engagement.
However, our sample is not representative of many other populations of people who play games. We did not look at younger players (especially minors), were limited to two culturally similar western countries, and examined only one gaming platform. While our sample included a range of players from casual to extremely high engagement, the findings are primarily reflective of the moderately to highly engaged group. We look forward to assessing the generalizability of our findings in future research on children and adolescents from a wider range of countries, platforms, and levels of gaming engagement.
Public debate has often framed the relation between gaming and well-being as a simple universal and monotonic effect: the more time individuals tend to spend playing video games, the worse their well-being tends to be (Feiner & Kharpal, 2021; Twenge & Campbell, 2018). Our results contradict this narrative: for a general adult gaming population, and at time scales ranging from 1 day to 2 weeks, even variations of 4–5 additional hours of daily video game play are unlikely to have a practically significant impact on well-being. This leads us to conclude that at a population level, the typical range of observed playtime and playtime variation for adult gamers—ignoring content, context, and player specifics moving us outside “ordinary” player experience—has no practically significant well-being impact, positive or negative. By conducting the first registered report on the topic and the first approximating a player’s actual total objective playtime (tracking all Xbox play from Xbox-predominant players), our study significantly strengthens the evidence base on this topic.
Given our findings, research (or interventions) targeting raw, decontextualized playtime alone as a cause of well-being is bound to miss the mark—the majority of well-being impacts arise from the interaction of specific player, content, and context circumstances. Unpacking these impacts will involve descriptively tracing and theoretically specifying temporal scales and dynamics far more carefully. In other words: To understand how games affect us, we should pay more attention to time—just not playtime.
https://doi.org/10.1037/tmb0000124.supp