Previous studies showing links between smartphone use and poor mental health are based on flimsy evidence.
Volume 1, Issue 2, DOI: 10.1037/tmb0000022
Problematic smartphone scales and duration estimates of use dominate research that considers the impact of smartphones on people and society. However, issues with conceptualization and subsequent measurement can obscure genuine associations between technology use and health. Here, we consider whether different ways of measuring “smartphone use,” notably through problematic smartphone use (PSU) scales, subjective estimates, or objective logs, lead to contrasting associations with mental and physical health. Across two samples including iPhone (n = 199) and Android (n = 46) users, we observed that measuring smartphone interactions with PSU scales produced larger associations with mental health than subjective estimates or objective logs. Notably, the relationship was four times as large in Study 1, and almost three times as large in Study 2, when relying on a PSU scale that measured smartphone “addiction” instead of objective use. Further, in regression models, only smartphone “addiction” scores predicted mental health outcomes, whereas objective logs or estimates were not significant predictors. We conclude that addressing people’s appraisals, including worries about their technology usage, is likely to have greater mental health benefits than reducing their overall smartphone use. Reducing general smartphone use should therefore not be a priority for public health interventions at this time.
Keywords: Smartphones, Technology, Mental Health, Sedentary Behaviors, Screen Time, Methods, Digital Health
Funding: This work was part funded by the Centre of Research and Evidence on Security Threats (ESRC Award: ES/N009614/1).
Conflict of interest: None of the authors have any financial, personal, or organizational conflicts of interest.
Acknowledgements: The authors would like to thank Adam Birkinshaw and Foivos Vantzos for useful comments and suggestions throughout the project. The authors would also like to thank Neil Shaw, Lee Shaw, and Dr. Flora Ioannidou for helping pilot the smartphone applications. Finally, the authors are grateful to Dr. Jean-François Stich, Dr. Niklas Johannes, and Dr. Linda Kaye for their useful comments and proposed revisions, which greatly improved the article.
Open Science Disclosures:
The data are available at https://osf.io/sw38c/
The experiment materials are available at https://osf.io/a4p78/
The preregistered design and analysis plan is accessible at https://osf.io/92ebz
Correspondence concerning this article should be addressed to Heather Shaw, Department of Psychology, Lancaster University, Bailrigg, Lancaster, United Kingdom, LA1 4YW. Email: email@example.com
Smartphones are primarily used for connecting people in a variety of personal and occupational settings. Although the benefits of interpersonal communication are well established (Berkman et al., 2000), most research concerning the relationship between communication, technology, and health has focused on the “negative consequences” of smartphone use and screen time with a strong focus on mental health (Elhai et al., 2017), and sedentary behaviors (Zagalaz-Sánchez et al., 2019). Often referred to as “problematic smartphone use” (PSU) or “smartphone addiction” (Elhai et al., 2017), these refer to the perceived undesirable side-effects of use, which are mirrored in public discourse (Genc, 2014; Yang et al., 2019). However, there is a growing acknowledgement that the majority of research linking any screen time behaviors to health outcomes are themselves problematic (Science and Technology Committee, U.K. Gov., 2019). For example, a growing number of academics have argued that research needs to address issues with measurement (Ellis, 2019), theory (Orben, 2018; Shaw, Ellis, & Ziegler, 2018), and analysis choices (Orben & Przybylski, 2019), by prioritizing high-quality designs to better understand genuine benefits or harms (Coyne et al., 2020; Heffer et al., 2019). This may, in part, explain the lack of a coherent academic position regarding the impact of smartphone use on well-being. This remains troublesome when it comes to justifying the existence or effectiveness of interventions that aim to reduce usage. In this article, we specifically investigate whether the relationship between smartphone use and health changes noticeably as a result of how smartphone use is conceptualized and measured.
Survey research has repeatedly linked increased smartphone screen time to lower psychological well-being (Twenge et al., 2018). However, many have noted that smartphone use is rarely measured directly, despite objective data being readily available from devices themselves (Ellis et al., 2019; Twenge, 2019). Moreover, in recent years, concerns regarding “overuse” have led to an abundance of usage scales being created to measure new constructs, including “addiction,” “nomophobia,” and “problematic use” (Ellis, 2019; Thomée, 2018). Specifically, when using problematic smartphone use scales, research consistently links higher scores with greater mental health symptomology; however, these relationships seem to either dissipate or lessen when collecting duration estimates of use or objective logs (Elhai et al., 2017; Harwood et al., 2014; Katevas et al., 2018; Rozgonjuk et al., 2018; Vahedi & Saiphoo, 2018). Thus, understanding when and why these inconsistencies occur remains essential.
Beyond psychological impacts associated with usage, research has also linked greater smartphone use with increased sedentary behaviors (Lepp et al., 2013; Zagalaz-Sánchez et al., 2019). Accordingly, people report that 87% of all phone use occurs while seated (Barkley & Lepp, 2016), and similarly, 90.9% of users report that they typically are sitting when using their smartphone (Xiang et al., 2020). Thus, it has been proposed that increased smartphone use lowers energy expenditure due to sedentary behaviors, and it is this mechanism, which results in greater body fat and higher rates of obesity (Hamilton et al., 2007; Kim et al., 2015). However, although 9 out of 14 articles in a recent systematic review showed a negative relationship between smartphone use and physical activity, none of the articles measured smartphone use objectively via logs from the device itself (Zagalaz-Sánchez et al., 2019). Instead, people self-reported the duration and frequency of their smartphone behaviors, which is widely documented to only have moderate correlations with actual usage (Andrews et al., 2015; Boase & Ling, 2013; Ellis et al., 2019; Kobayashi & Boase, 2012; Lee et al., 2017; Parslow et al., 2003; Vrijheid et al., 2006). Therefore, research linking physical activity or sedentary behavior to smartphone use is also scarce and yet to be examined precisely using objective logs.
When documenting links between smartphone use and health, nuanced approaches suggest that how users think about and appraise their own smartphone usage is uniquely related to well-being and can be considered separately from objective use of the device itself. For example, a recent study found no evidence linking objective use of social applications to momentary well-being (Johannes et al., 2019). However, they did observe that the more positively people felt about their technology-mediated interactions in the past half hour, the better they felt in the current moment (Johannes et al., 2019). In addition, when assessing email use in occupational settings, stress levels increase when a person perceives their usage to be greater or lower than desired (Stich et al., 2019). This suggests that people aim to regulate technology usage as they would with other everyday behaviors including, for example, social affiliation (O’Connor & Rosenblood, 1996). Negative or positive appraisals may be dependent on whether a person has been able to achieve their preferred amount of usage (O’Connor & Rosenblood, 1996; Stich et al., 2019). Thus, the way people perceive their smartphone usage behaviors (e.g., a belief that their use is excessive) may drive relationships with mental health that are independent from actual usage.
Although there is no consensus regarding how smartphone usage or screen time should be conceptualized or measured, documenting “usage” is of interest to many (Ellis, 2019). Researchers, however, continue to conflate the measurement of smartphone usage with assessing an individual’s appraisal of use. For example, defining or measuring PSU in relation to “overuse” or “excessive use” is prevalent in many articles (Elhai & Contractor, 2018; Elhai et al., 2020; Kim, 2017; Yang et al., 2019). This has foundations in the Behavioral Addictions framework, where tolerance is a key component (e.g., the need to increase use over time to get the same “fix”) (Billieux, Maurage et al., 2015; Elhai et al., 2017; Kim, 2017). Hence, it is not surprising to find questions such as; Using my smartphone longer than I had intended and Having tried time and again to shorten my smartphone use time but failing all the time in problematic use scales (Kwon et al., 2013). However, agreeing with these statements only shows that a person is negatively appraising their smartphone use, and is not a measure of frequency or screen time in itself. Correspondingly, research that has attempted to quantify the relationship between problematic use scales and objective logs reports many small effect sizes (Ellis et al., 2019), and exploratory factor analysis research shows that PSU scores do not cross-load with factors representing actual usage (Davidson et al., 2020). This evidence already suggests that people’s appraisals of their smartphone use and actual usage should be considered separately.
In light of this unclear conceptualization, it is important to distinguish between PSU as a psychological construct that appraises use, and smartphone usage as a behavioral variable, because it has implications for theory and any proposed treatment. For example, if negative associations with physical and mental health are driven entirely by usage appraisals, then providing interventions that focus on usage behaviors alone may not deliver any benefits (Loid et al., 2020).
Measuring the associations between health and smartphone use in different ways could generate radically different results when relying on different operationalizations: subjective estimates, objective logs, and psychometric scales. This article aims to understand this issue by collecting all three measures from the same participants. We therefore asked the question as follows:
Do problematic use scale scores generate larger associations with health when compared with estimates of usage or objective behavior from the same users?
Furthermore, we examined if increased smartphone use, when measured objectively, could account for variability in physical or mental health. Therefore, we also ask the following:
Can objective smartphone use (pickups and screen time) account for differences in mental health symptomatology or physical health?
These questions were first investigated during exploratory analysis of 46 adults who completed all three measurements, alongside an assessment of their body composition and anxiety, depression, and stress symptomology. The results were then used to generate hypotheses regarding the influence of different usage measurements on effect sizes. A second study then acted as a replication and provided increased statistical power. All materials for both studies are located on the Open Science Framework (see Shaw, Ellis, Geyer, et al., 2018).
The sample consisted of 46 (12 male) participants who were staff and students from the University of Lincoln, U.K. This deviates from our preregistered sample size of 84 due to laboratory access and technical issues. However, posterior calculations determined that a total sample size of 44 was adequate to investigate two-tailed medium-to-large effect sizes (r > .4) with a power of .8 when α = .05. Age was skewed, as we tested predominately younger adults (M = 23.54, SD = 8.25). All participants were Android smartphone users and stated they exercised less than 10 hr per week.
The study was advertised around a university campus using posters, leaflets, subject pool systems, and social media channels during term time and during public engagement events. Therefore, the sample consisted of those who emailed the researcher in response to these advertisements. Participants were told they would receive a graph of their phone use and a printout of their health analysis as incentives to take part. Those recruited through subject pool systems received course credit in compensation for their time.
Study 1 collected numerous variables to explore the relationships between individual differences and objective smartphone use. For brevity, the focus of this article is to describe the body composition and mental health relationships with general smartphone use. Therefore, only the variables and data collection procedures related to this aim are described here. For further information on the additional variables collected, see the Supplementary material.
Objective smartphone data were collected using an application developed specifically for the project called Activity Logger (Geyer, 2018). This ran on Android devices and collected data to the resolution of 1 s. Activity logger was set up to listen to three events: the phone being turned on, the screen being activated, and the screen being turned off. Background operations then took this information, retrieved the current time stamp, and stored this in internal memory. This data file was then exported via the application and contained a list of records where a UNIX time stamp was paired with an event stating whether the screen was turning “ON” or “OFF.” Source code for the application is available to download (https://osf.io/a4p78/).
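As a rough illustration of how an export of this kind could be turned into daily usage figures, the sketch below pairs each "ON" event with the next "OFF" event and tallies pickups and screen-on seconds per day. The record layout (a time-ordered list of timestamp and event pairs), the function name, and the choice to attribute a session to the UTC day on which it started are assumptions made for the example; it does not describe the Activity Logger source code or the authors' analysis scripts.

```python
from collections import defaultdict
from datetime import datetime, timezone


def summarize_sessions(records):
    """Return {date: (pickups, screen_seconds)} from (unix_ts, event) records.

    Assumes records are sorted by time and events are "ON" or "OFF".
    A session is attributed to the UTC day on which the screen turned on.
    """
    pickups = defaultdict(int)
    seconds = defaultdict(int)
    on_ts = None  # timestamp of the currently open "ON" event, if any
    for ts, event in records:
        if event == "ON":
            on_ts = ts  # a repeated "ON" restarts the open session
        elif event == "OFF" and on_ts is not None:
            day = datetime.fromtimestamp(on_ts, tz=timezone.utc).date()
            pickups[day] += 1
            seconds[day] += ts - on_ts
            on_ts = None
    return {day: (pickups[day], seconds[day]) for day in pickups}


# Hypothetical log: two short sessions on the same day.
log = [(1_600_000_000, "ON"), (1_600_000_020, "OFF"),
       (1_600_000_100, "ON"), (1_600_000_145, "OFF")]
summary = summarize_sessions(log)
```

An "ON" event with no matching "OFF" is simply dropped here, which is one possible way of handling sessions cut short by a depleted battery.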
To gather estimates of daily smartphone screen time, participants were asked one question: Think back to days 2–8 of the study. On average, how many hours a day did you spend on your smartphone? Participants responded in hours and minutes. To measure people’s estimates of how many times a day they “picked up” their device, participants were asked the following: Think back to days 2–8 of the study. On average, how many individual times did you use your smartphone a day? Think of these as individual pickups.
PSU was measured using the smartphone addiction scale (SAS), which contained 33 items (Kwon et al., 2013). Participants rated the extent to which they agreed with several statements, for example, Feeling pleasant or excited while using a smartphone. Participants responded on a 6-point Likert scale ranging from “Strongly Agree” (1) to “Strongly Disagree” (6). Higher scores indicated greater addiction risk. This scale was chosen because it is widely cited and correlates highly with a variety of other PSU measures, which all appear to measure the same construct (Davidson et al., 2020; Ellis et al., 2019; Thomée, 2018).
Symptoms of anxiety were measured using the GAD-7 (Spitzer et al., 2006), which included seven items. Participants were asked how often in the last two weeks have you been bothered by… and responded on a 4-point scale, whereby 0 = “Not at all” and 3 = “Nearly every day.” Using ≥10 as a cutoff point, the GAD-7 has been shown to have 89% sensitivity and 82% specificity with a diagnosis of generalized anxiety disorder (Kroenke et al., 2007).
Severity of depression was measured using the PHQ-9 (Kroenke et al., 2001). Each of the nine questions related to a criterion mentioned in the DSM-IV for depression. Participants were asked how often in the last two weeks have you been bothered by… and responded on a 4-point scale, whereby 0 = “Not at all” and 3 = “Nearly every day.” Using ≥10 as a cutoff point, the PHQ-9 has been shown to have 88% sensitivity and 88% specificity with a diagnosis of major depression (Kroenke et al., 2001).
The Perceived Stress Scale (Cohen et al., 1983) had 14 items which measured “the degree to which situations in one’s life are appraised as stressful.” Participants responded how often they felt a certain way on a 5-point Likert scale, whereby 0 = “Never” and 4 = “Very Often.” Participants were asked questions such as In the last month, how often have you felt that you were on top of things? Higher scores indicated greater perceived stress.
Height was measured using a meter stick, with age and gender captured via self-report questions. These data were inputted as controls in subsequent bioimpedance analysis. Body composition was measured using the eight-electrode Tanita MC-780MA body composition monitor. This provided an estimate of a person’s body fat percentage, body mass index (BMI), and skeletal muscle mass percentage, using bioelectrical impedance measures. Bioelectrical impedance assessment using the Tanita MC-780MA is a good alternative to magnetic resonance imaging and dual-energy X-ray absorptiometry (DEXA), which are costly and time consuming (Verney et al., 2015). Notably, the Tanita MC-780MA produces body fat assessments that correlate highly with DEXA assessment (r = .85), providing concurrent validity (Verney et al., 2015).
The study lasted 9 days (see Figure S.1 for an infographic of the itinerary). On day 1, a lab session provided participants with study information, including example data, followed by a consent form, and an online questionnaire. Participants answered questions, including date of birth, gender, and other psychometric scales that were beyond the scope of this article (see the Supplementary material). Once completed, participants were guided through the installation of the Activity Logger, and the researchers documented the smartphone brand and operating system. All screen savers were set to turn off after 30 s, and the application was “white listed” in the smartphones’ battery settings, ensuring that the phone would not limit the applications’ functionality if the smartphone battery was low. Participants were then asked to keep their phone switched on for the duration of the study, and to not close the application. Although the application should re-start independently, as a precaution, if a participant’s phone was switched off during the week, or they closed it, participants were instructed to re-open the application. Participants were then provided with information detailing how to prepare for the body composition assessment on day 9. To control for factors influencing body composition results, participants were asked to refrain from intense exercise and alcohol up to 12 hr prior to the assessment. They were also asked to remain hydrated and book a time in the afternoon that was 3 hr after lunch. All participants were asked to go to the toilet before this session.
Participants were requested to use their phone as normal and carry on with their everyday activities across days 2–8 of the study. This ensured that seven full days’ worth of smartphone data was collected for each participant. On day 9, they returned to the lab and upon arrival, emailed data from the application to a researcher. Next, participants completed a questionnaire containing scales that measured stress, anxiety, depression, smartphone addiction, and other variables not reported in this article (see Supplementary material). They were then asked to provide an estimate of how much they picked up their phone, and the amount of time they spent on their phone, on average each day, across days 2–8.
Height was measured as part of the bioimpedance assessment. Participants were instructed to remove any jewelry, items in pockets, and metal accessories, and were then asked to stand barefoot on the Tanita MC-780MA body composition monitor while holding the hand electrodes by either side of their body, without touching their legs. A 0.5 kg clothing allowance was inputted into the Tanita software if participants were wearing light clothing (gym gear), and a 1 kg clothing allowance was inputted for heavy clothing (jumpers and jeans). Upon completion, participants were given a printout of their body composition, a graph of their application use, and a graph of their screen time across the week. Finally, participants were debriefed and thanked for their time.
All procedures received ethical clearance by the School of Psychology Research Ethics Committee at the University of Lincoln and complied with British Psychological Society Guidelines (British Psychological Society, 2018). In the debrief, participants were told that the study would not offer any clinical diagnosis of any disorders and were provided with information about charities and services if they needed further support. The study also underwent a data protection plan. Participants had full control of their data as phone logs were stored solely on their devices and could be deleted by the participant at any point during the study by simply uninstalling the application.
Data and analysis scripts for Study 1 can be found on the Open Science Framework (https://osf.io/a4p78/). The median daily hours-of-use was calculated across days 2–8 for each person to remove the influence of any extreme “Screen On” events that occurred if the phone battery depleted and the application did not log a “Screen Off” event. Daily pickups (frequency of use) were averaged across days 2–8, in accordance with recent work (see Ellis et al., 2019). For the SAS, GAD-7, and PHQ-9, the responses were summed to create a total score for each scale. Specific questions within the Perceived Stress Scale required reverse coding before an overall sum was created per person. See Table 1 for a list of the variables used in the analysis and their descriptives.
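The scoring steps in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the 0-based item positions, and all response values are hypothetical, and the example does not specify which Perceived Stress Scale items are reverse coded. The second half shows why a median is less sensitive than a mean to a single day inflated by an unlogged "Screen Off" event.

```python
from statistics import mean, median


def score_scale(responses, reverse_items=(), max_value=4):
    """Sum item responses, reverse-coding the 0-based positions in reverse_items.

    Assumes every item is scored from 0 to max_value.
    """
    return sum(max_value - r if i in reverse_items else r
               for i, r in enumerate(responses))


# Hypothetical participant: four items scored 0-4, items 1 and 3 reverse-coded.
total = score_scale([3, 1, 2, 4], reverse_items={1, 3})

# Hypothetical week: one runaway session inflates a single day.
daily_hours = [3.5, 4.0, 2.5, 19.0, 3.0, 4.5, 3.5]
typical_hours = median(daily_hours)  # robust to the 19-hour day
mean_hours = mean(daily_hours)       # pulled upward by it
```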
When collating all 46 participants’ data together, smartphone use was highly skewed, as 54.44% of uses were under 30 s in duration, and 43.54% of uses were under 15 s in duration. Due to this skew, we followed Bishara and Hittner’s (2017) recommendations and conducted Spearman rank order correlations with Fieller et al. (1957) variance when calculating 95% CI, as these are robust against non-normality. To explore how differences in smartphone measurement may influence associations with health, Spearman correlations were conducted between all the health and smartphone variables (see Table 2). Notably, anxiety, depression, and stress had significant positive correlations with smartphone addiction scores (all p < .01), which did not occur with any other smartphone measure (see Figure 1 for objective screen time specifically). In terms of effect sizes, smartphone addiction scores generated rs equal to or larger than .39 with mental health variables, whereas coefficients for estimates and objective variables were lower (all rs < .2) (see Table 2; Figure 3).
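To make the interval calculation concrete, the sketch below builds an approximate confidence interval for a Spearman coefficient on the Fisher z scale, using the Fieller et al. (1957) variance estimate of 1.06 / (n − 3) discussed by Bishara and Hittner (2017). The function name is hypothetical, and the inputs (rs = .39, n = 46) are taken from the text purely as an illustration; the output is not claimed to match the intervals in Table 2.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def spearman_ci_fieller(rho, n, level=0.95):
    """Approximate CI for a Spearman coefficient via Fisher's z.

    Uses the Fieller et al. (1957) variance estimate 1.06 / (n - 3).
    """
    z = atanh(rho)                                # Fisher z transform
    se = sqrt(1.06 / (n - 3))                     # Fieller standard error
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # two-tailed critical value
    return tanh(z - crit * se), tanh(z + crit * se)


lower, upper = spearman_ci_fieller(0.39, 46)
```

With these inputs the interval runs from roughly .10 to .62, which shows how wide the uncertainty around a single coefficient is at this sample size.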
In Study 1, smartphone addiction was positively correlated with anxiety, depression, and stress measures. Pertinently, effect sizes quadrupled when measuring smartphone usage with a problematic use (addiction) scale in comparison to objective screen time and pickup measures. In line with prior work, people’s appraisals of their smartphone usage had stronger relationships with mental health than self-reported frequencies of use (Vahedi & Saiphoo, 2018) or objective logs (Rozgonjuk et al., 2018). This suggests people’s appraisals of their smartphone use (e.g., worries) are more pertinent to mental health symptomatology than actual usage. Therefore, even within the same participants, a researcher could draw different conclusions based on the measurement tool adopted. This is especially problematic when confounding the construct of problematic smartphone use with actual usage. Interestingly, we found that BMI reduced as daily screen time and pickups increased. Although pointing in the same direction, the effect size was smaller for correlations between actual usage and body fat percentage. Nevertheless, neither suggested the presence of any adverse effects of daily smartphone screen time and pickups on these measures of physical health.
We marked these findings as tentative until they could be replicated in a larger sample. This was examined in Study 2, where we collected the same mental health and smartphone measures as in Study 1. We also re-assessed BMI and took advantage of retrospective data collected on a user’s device, including daily logs of steps, and daily logs of “walking and running” distance. Based on our previous findings, we predicted that effect sizes of rs > .3 would be found when comparing mental health relationships with problematic use scales, and that lower effect sizes of rs < .2 would be found when examining estimates of use and objective logs.
A total of 199 (137 women) participants were recruited via Prolific Academic, from a subject pool of 24,117 iPhone owners. This pool contained predominately citizens from the United Kingdom and the United States. Participants had a mean age of 30.18 (SD = 9.46) and were paid £1.25 for their time. In total, 42.71% of the sample were overweight or obese, and the average BMI across all participants was slightly higher than the recommended range (M = 25.17, SD = 5.38). This was to be expected in a representative sample, as 52% of people have a BMI over 25 worldwide (World Health Organization, 2018). An a priori power calculation showed that, for two-tailed analysis, a sample size of 192 participants was enough to detect small effect sizes of rs ≥ .2 with a power of .8 when α = .05.
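For readers who want to see where sample sizes of this order come from, the sketch below uses the common Fisher z approximation for the sample size needed to detect a correlation. This is an assumed method chosen for illustration; the software and exact procedure behind the reported figures are not stated in the text, so small discrepancies are to be expected. The function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, power=0.8, alpha=0.05):
    """Approximate n to detect correlation r (two-tailed) via Fisher's z."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = norm.inv_cdf(power)          # quantile for the desired power
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


n_small = n_for_correlation(0.2)         # effect size targeted in Study 2
n_medium_large = n_for_correlation(0.4)  # effect size discussed for Study 1
```

Under this approximation the two values come out at about 194 and 47, close to but not identical with the 192 and 44 reported for the two studies.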
After clicking the link to access the online questionnaire, participants were presented with study information and a digital consent form. If participants agreed to take part, they were then asked the following: Please estimate how many hours and minutes you spend on your phone each day. They answered in hours and minutes. In addition, participants were asked the following: Please estimate how many times a day you pick up and use your phone. Afterwards, smartphone addiction, anxiety, depression, and stress were measured using the same scales as in Study 1.
Objective smartphone usage data were retrieved by utilizing the Apple Screen Time feature that resides in modern iPhones. We used the same methodology as reported in the study by Ellis et al. (2019) and extracted data retrospectively from the previous 7 days. In short, participants were prompted to find the “Screen Time” graph and the “Pickups” graph in Apple Screen Time settings and record for each day the number of pickups and screen time (in hours and minutes). For more details, see Ellis et al. (2019).
After obtaining objective smartphone use data, the questionnaire asked people to input their health data. The Apple Health App automatically tracks users’ steps, and their combined “walking and running” distances. These historic data are accessible on a user’s iPhone for the entire time they have owned their iPhone. When clicking on the “Today” tab, participants had access to a calendar where they could view their activity for any past day. Daily steps were collected by asking participants to click on the calendar pages for dates in the past week and enter for each day the number of steps displayed. Daily “walking and running” distances were collected by asking people to click on the calendar pages for dates in the past week and report the documented distance in either kilometers or miles. Participants were also asked if they owned a fitness tracker or a smartwatch and specified whether this device was synced to the Apple Health App. Finally, participants were asked to report their age, gender, weight, and height. They were given the option to answer in either metric (meters and centimeters/kilograms) or imperial measures (feet and inches/stones and pounds). At the end of the questionnaire, participants were debriefed, thanked for their time, and were then re-directed back to the Prolific Academic website.
All procedures received ethical clearance by the School of Psychology Research Ethics Committee at the University of Lincoln and complied with British Psychological Society ethical guidelines for internet-mediated research (Hewson et al., 2013). Akin to Study 1, the debrief provided websites where participants could access guidance regarding their mental health and were provided with details of 24-hr support lines. Participants could withdraw at any time before, during, or up to 2 weeks after they completed the study by emailing the researcher.
The following videos were presented to participants so they could easily locate objective data. However, the user interface of Apple Screen Time and Apple Health App may change in line with future iOS software updates.
Data and analysis scripts for Study 2 can also be found on the Open Science Framework (https://osf.io/a4p78/). The survey received 263 respondents. However, this became 207 after removing those who did not have iOS12 installed, did not have an iPhone 5 or later, did not have 7 days of screen time data on their smartphone, or did not complete the survey or health questions. Another person was removed after being identified as an outlier when plotting data; they reported weight and BMI values more than three standard deviations from the mean. Finally, seven people were removed due to input errors (typos) in their health data. This left 199 participants for analysis. This was greater than the sample size derived from our preregistered a priori power analysis (192).
Table 3 shows the descriptive statistics for all variables. Average daily screen time and average daily pickups scores were computed per person by taking the daily amount of screen time/pickups from the first 6 days and then calculating the mean. Six rather than 7 days were used to compute this mean, as data from the seventh day did not represent a full day. Raw estimated numbers of daily pickups and estimated average daily screen time (in hours) were used in the analysis. Smartphone addiction, anxiety, stress, and depression scales were all scored in the same way as Study 1.
The daily physical activity variables; average daily steps and average daily “walking and running” distance (km) were created by selecting the 6 days of data which corresponded to the same 6 days aggregated in the smartphone variables. The daily activity statistics from these 6 days were then averaged for each measure. If a participant reported their daily “walking and running” distance in miles, this was converted to kilometers by multiplying the value by 1.60 before computing this average.
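The averaging and unit conversion described here can be sketched as below. The function name and the example values are hypothetical, and the sketch assumes the daily values are ordered so that the partial seventh day comes last. The conversion factor of 1.60 is the one stated in the text.

```python
from statistics import mean

MILES_TO_KM = 1.60  # factor stated in the text; the exact value is 1.609344


def average_daily_distance_km(daily_distances, unit):
    """Mean of the first six daily distances, converted to kilometers."""
    first_six = daily_distances[:6]  # the seventh day was a partial day
    if unit == "miles":
        first_six = [d * MILES_TO_KM for d in first_six]
    return mean(first_six)


# Hypothetical participant who reported distances in miles.
avg_km = average_daily_distance_km([2.0, 3.0, 1.0, 4.0, 2.5, 2.5, 0.4], "miles")
```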
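Finally, BMI was calculated per person. Imperial height and weight responses were first converted to metric units (centimeters and kilograms, respectively). BMI was then calculated from these values using the following formula, with height expressed in meters:

$$\mathrm{BMI} = \frac{\text{weight (kg)}}{\left[\text{height (m)}\right]^{2}}$$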
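A minimal sketch of this step is given below, assuming the conventional definition of BMI as weight in kilograms divided by squared height in meters. The function names and the example participant are hypothetical; the conversion constants are the standard definitions of the pound and the inch.

```python
LB_TO_KG = 0.45359237  # exact definition of the pound
INCH_TO_CM = 2.54      # exact definition of the inch


def imperial_to_metric(stones, pounds, feet, inches):
    """Convert stones/pounds and feet/inches to (kilograms, centimeters)."""
    weight_kg = (stones * 14 + pounds) * LB_TO_KG
    height_cm = (feet * 12 + inches) * INCH_TO_CM
    return weight_kg, height_cm


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in meters."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical participant: 11 stone 0 lb, 5 ft 9 in.
weight_kg, height_cm = imperial_to_metric(stones=11, pounds=0, feet=5, inches=9)
value = bmi(weight_kg, height_cm)
```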
Following Study 1, to explore if differences in smartphone measurement influenced the size of the relationships with health, Spearman correlations were conducted between all the health and smartphone variables using Fieller et al. (1957) variance when calculating 95% CI (see Table 5; Figure 2). Spearman correlations were also conducted between all the smartphone measures to document differences between them (see Table 4). Alpha levels remain uncorrected for multiple comparisons.
Mirroring Study 1, smartphone addiction scores consistently had effect sizes of .36 or larger when correlated with mental health variables. Estimates and objective variables were lower (all rs ≤ .21) (see Figure 3 or Table 5). This prompted an additional analysis that assessed whether this effect size deviation across measures was statistically significant. To compare differences in the magnitude between the coefficients, we adopted Hittner et al.’s (2003) modification of Dunn and Clark’s (1969) z test using the R package “cocor” (Diedenhofen & Musch, 2015). This is suitable for the comparison of coefficients that are calculated from two dependent groups and share a variable in common (Diedenhofen & Musch, 2015). For example, it was possible using this method to compare whether the relationship between smartphone addiction and anxiety (rs = .43) was significantly larger than the relationship between average daily screen time and anxiety (rs = .16). We also calculated Zou (2007) confidence intervals, which reject the null hypothesis if the interval does not include zero (Diedenhofen & Musch, 2015; Zou, 2007). Findings showed that, when assessing relationships with anxiety, depression, and stress, associations with smartphone addiction (PSU) were all significantly higher than the associations with estimates and objective logs (all p < .05) (see Table 6). The size of coefficients was not significantly different when using estimates or average daily screen time to determine associations with any mental health metric (all p > .05). However, there was a significant difference in effect sizes for mental health associations depending on whether an estimated or objective measure of pickups was used, with correlations running in the opposite direction (all p < .05) (see Table 6 and Figure 3).
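To illustrate the logic of comparing two dependent correlations that share a variable, the sketch below computes a Zou (2007) style confidence interval for the difference between them. It is a hand-written approximation for illustration, not the cocor implementation, and the function names are hypothetical. The two coefficients (.43 and .16) and n = 199 come from the text; the correlation between smartphone addiction and screen time (0.20 below) is a made-up placeholder, since the actual value is reported in Table 4. The method was derived for Pearson correlations, so applying it to Spearman coefficients is itself an approximation.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Fisher z confidence interval for a single correlation."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = crit / sqrt(n - 3)
    return tanh(atanh(r) - half), tanh(atanh(r) + half)


def zou_ci_overlapping(r_jk, r_jh, r_kh, n, level=0.95):
    """Zou (2007) CI for r_jk - r_jh, two correlations sharing variable j."""
    l1, u1 = fisher_ci(r_jk, n, level)
    l2, u2 = fisher_ci(r_jh, n, level)
    # Correlation between the two dependent coefficients.
    c = ((r_kh - 0.5 * r_jk * r_jh) * (1 - r_jk**2 - r_jh**2 - r_kh**2)
         + r_kh**3) / ((1 - r_jk**2) * (1 - r_jh**2))
    diff = r_jk - r_jh
    lower = diff - sqrt((r_jk - l1)**2 + (u2 - r_jh)**2
                        - 2 * c * (r_jk - l1) * (u2 - r_jh))
    upper = diff + sqrt((u1 - r_jk)**2 + (r_jh - l2)**2
                        - 2 * c * (u1 - r_jk) * (r_jh - l2))
    return lower, upper


# j = anxiety, k = smartphone addiction, h = average daily screen time.
# r_kh = 0.20 is a made-up value; the real coefficient is in Table 4.
lower, upper = zou_ci_overlapping(0.43, 0.16, 0.20, 199)
```

If the resulting interval excludes zero, the two coefficients are judged to differ, which is the decision rule described in the paragraph above.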
Measuring “percentage variance explained” through the exploration of effect sizes has been the subject of some criticism, with some authors advocating that significance testing between groups is a better indicator of whether screen time impacts mental health (e.g., Twenge, 2019). Although this approach contradicts many other statistical recommendations (Cumming, 2014), it was of interest to explore whether our conclusions would differ if we adopted this type of analysis. Consequently, as the GAD-7 and PHQ-9 have “cutoff points” (≥10) that indicate if people are at risk of having a disorder, we used these to create two groups: “low risk” and “high risk.” These measures have high sensitivity and specificity (both > .80) when diagnosing depression and anxiety disorders (Kroenke et al., 2001, 2007). However, due to the lack of further psychological assessment, we considered those who exceeded the defined cutoff points for each disorder to be at a higher risk, rather than defining an individual as having the disorder. We then examined whether levels of daily smartphone use and PSU differed depending on group allocation.
To create groups for the analysis, participants who were considered at “high risk” for both anxiety and depression were collated (n = 50). This group used their phone for an average of 4.72 hr a day (SD = 2.27) and picked up their phone on an average of 84.20 times a day (SD = 37.98). Those who did not exceed the cutoff values for either condition (scored less than 10 on both scales) were placed in a “low risk” group (n = 124). This group used their phone for an average of 4.41 hr a day (SD = 2.25) and picked up their phone on an average of 84.07 times a day (SD = 42.55). Wilcoxon rank sum tests showed that the two groups did not significantly differ in their amounts of average daily screen time (W = 3357, p = .39) or average daily pickups (W = 3216, p = .70). This was mirrored when exploring differences in estimated daily screen time (W = 3489.5, p = .19) and estimated daily pickups (W = 2721, p = .20). Therefore, those who were at “high risk” of having both generalized anxiety disorder and major depression did not use their smartphones differently to those who were at “low risk” for both conditions. However, a significant difference was found between the two groups on the levels of smartphone addiction (W = 4505.5, p < .001). Specifically, the “high risk” group had higher smartphone addiction scores (M = 116, SD = 23.67) than the “low risk” group (M = 98.91, SD = 21.91). Consequently, if smartphone use is measured with subjective estimates or objective logs, we find no difference between the “high risk” and “low risk” groups in terms of usage. However, if usage and PSU are conflated and “usage” is measured via the SAS, one would conclude the opposite, incorrectly positing that those with mental health symptomatology have higher usage.
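The grouping rule and the rank sum comparison described above can be sketched as follows. This is a simplified illustration, not the analysis code: the function names are hypothetical, and the p value uses a plain normal approximation without the tie or continuity corrections that statistical packages typically apply.

```python
import math
from statistics import NormalDist

CUTOFF = 10  # GAD-7 and PHQ-9 cutoff point given in the text


def risk_group(gad7, phq9):
    """Assign a participant to a group from their GAD-7 and PHQ-9 totals."""
    if gad7 >= CUTOFF and phq9 >= CUTOFF:
        return "high"
    if gad7 < CUTOFF and phq9 < CUTOFF:
        return "low"
    return None  # at or above the cutoff on one scale only: in neither group


def rank_sum_w(group_a, group_b):
    """W statistic: pairs where a exceeds b, counting ties as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )


def approx_p(w, n_a, n_b):
    """Two-sided p value from the normal approximation (no corrections)."""
    mean = n_a * n_b / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The reported screen time statistic with the reported group sizes.
p_screen_time = approx_p(3357, 50, 124)
```

With W = 3357 and group sizes of 50 and 124, this approximation gives a p value close to the reported .39, which suggests the sketch is at least consistent with the figures in the text.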
Many researchers build predictive models to investigate if there is a linear or logarithmic relationship between health and smartphone usage (Csibi et al., 2018; David et al., 2018; Kim et al., 2016; Regan et al., 2020; Richardson et al., 2018). Following suit, we developed linear models that aimed to predict mental health symptomatology based on various smartphone variables. Notably, when including all five smartphone measures in models, only smartphone addiction scores significantly predicted mental health scores (see Table 7). Furthermore, models that only contained objective smartphone measures were not significant (all R² ≤ .02, all p > .05). Finally, average daily pickups significantly predicted average daily steps and average daily “walking and running” distance across models (see Table 7).
This article considered whether different conceptualizations and measurements pertaining to “smartphone use” can generate contrasting associations with health. Across two samples including iPhone (n = 199) and Android (n = 46) users, we observed that PSU scales produced larger associations with mental health when compared with subjective estimates or objective logs. Notably, the size of the relationship was fourfold in Study 1, and almost three times as large in Study 2. Specifically, rs ≤ .17 were repeatedly found between objective smartphone use (daily pickups and screen time) and mental health symptomatology (anxiety, depression, and stress), whereas larger effects were observed when relying on a problematic use scale (all rs ≥ .36). This was further supported with statistical models, which demonstrated that average daily pickups and average daily screen time did not significantly predict anxiety, depression, or stress, and explained less than 2% of the variance. In addition, those who exceeded clinical “cutoff points” for both generalized anxiety and major depressive disorder did not use their phone significantly more than those who scored below a standard threshold. Finally, in terms of physical health, although previous research has observed associations between higher smartphone addiction scores and lower muscle mass (Kim et al., 2015), our findings derived from objective logs are less clear-cut.
Generally speaking, conflating an individual’s appraisal of their smartphone use with actual usage generates vastly different relationships with well-being. This is problematic given a recent review confirmed that 70% of studies in this area adopt PSU scales (Thomée, 2018). The same review concluded that intense or frequent mobile use was associated with greater mental health symptomatology, yet this conclusion was based primarily on findings derived from PSU scales. Our findings alternatively suggest that helping people manage their appraisals of use (e.g., worries) is more likely to provide a benefit to well-being than reducing use of the device itself. Consequently, one might question whether reducing actual smartphone use should be a priority for any intervention development at this time.
Recent research has arrived at broadly similar conclusions. For example, “intense” general smartphone use did not predict negative well-being from objective logs (Katevas et al., 2018). Another study that measured objective smartphone screen time over a weeklong period observed that average daily depressive mood positively correlated with smartphone addiction scores, yet objective screen time minutes were not related to depression and anxiety (Rozgonjuk et al., 2018). In terms of studies that rely on duration estimates, large-scale designs that follow Open Science practices have also reported small effect sizes. In a large sample of New Zealand adults (n = 19,075), associations between social media use and well-being were weak (Stronge et al., 2019). When using specification curve analysis to examine self-reports from a large sample of adolescents (n = 355,358), the association between digital technology use and well-being was again found to be small, explaining only 0.4% of the variance (Orben & Przybylski, 2019). In our sample, objective screen time and pickups explain less than 2% of the variance in mental health.
Placing our findings in a broader context, the relationship between objective use and mental health (all rs ≤ .17) is lower than the average effect size found across many psychology studies (r = .21). In comparison, this is slightly less than the relationship between nicotine patch (vs. placebo) and smoking abstinence (r = .18), and about the same size as the relationship between post-high school grades and job performance (r = .16) (Funder & Ozer, 2019; Meyer et al., 2001). When adjusting for new recommendations that “small,” “typical,” and “relatively large” effects fall around r coefficients of ∼.10, ∼.20, and ∼.30, respectively (Gignac & Szodorai, 2016), the suggestion that social media has, for example, destroyed our lives would warrant moderate to large effects (r > .20) (Appel et al., 2020, p. 62). Using this benchmark, our findings show that general smartphone use does not have extreme or profound effects on well-being, contrary to repeated claims suggesting otherwise (e.g., Twenge, 2017). At the same time, large effects of r ≥ .40 in psychology studies are likely to overestimate a genuine effect and, as a result, warrant additional skepticism (Funder & Ozer, 2019). For example, the relationship between anxiety and smartphone addiction in Study 2 was equivalent to the relationship between height and weight (both rs = .43).
Scores from PSU scales may generate larger associations with mental health for several reasons. First, one could argue that negative appraisals of smartphone use (or technology use more generally) are based around issues that pertain to the regulation of everyday behavior. Specifically, although people would like to perhaps regulate technology usage as they would with any other everyday behavior, this is not always possible and this discrepancy between actual and desired use can lead to negative or positive appraisals (O’Connor & Rosenblood, 1996; Stich et al., 2019). Second, both overall scores derived from the SAS and individual items have latent relationships with stress and depression scales (but not with objective smartphone measures) (Davidson et al., 2020). Hence, cross-loadings between PSU and mental health could artificially inflate relationships due to a lack of independence. Third, “method bias” may be influencing the size of correlation coefficients due to linguistic similarities between items across mental health and PSU scales (Podsakoff et al., 2012). Every question in the SAS (and the majority of related scales) assesses a perceived problem, echoing mental health scales (Kroenke et al., 2001; Kwon et al., 2013; Spitzer et al., 2006). However, negative wording alone could be a further source of bias. For example, it has been shown that correlations between role conflict, role ambiguity, and other constructs reduced by 238% when controlling for wording effects, by balancing the number of positively and negatively slanted questions (Harris & Bladen, 1994; Podsakoff et al., 2012).
It is beyond the scope of this article to discuss every issue pertaining to how technology use is conceptualized, measured, and analyzed. However, future research that aims to specifically consider the impact of smartphone use should, where possible, adopt a more nuanced approach to understand both the costs and benefits of specific smartphone applications that can be monitored remotely (Geyer et al., 2020). Recent work has shown that although total time spent using smartphones showed effect sizes of r = .16 with anxiety and depression (matching our work), certain categories of applications have beneficial relationships (e.g., time spent reading books) (David et al., 2018). Therefore, claiming general smartphone use as negative or positive oversimplifies a very complex and multifaceted phenomenon. For example, the relationships observed between BMI and objective smartphone use were incoherent across our two studies. However, there appears to be a positive relationship between physical activity and objectively measured pickups. These results further question whether all smartphone behaviors should be considered sedentary when deliberating the relationship between usage and physical activity. Arora et al. (2013) found that computer use, TV viewing, and video gaming were associated with increased BMI, but conversely, did not find the same for mobile phone use. They stated that the portable nature of a mobile telephone does not require the user to remain in one place during use, thus allowing movement (Arora et al., 2013, p. 1258). In line with recent discussions, screen time is often conceptualized without acknowledging “exergaming” and other activities which involve physical activity while engaging with the device (Kaye et al., 2020).
Therefore, given that objective measures of technology use and exercise can be recorded by the same device, or in conjunction with a wearable tracker, future research should consider associations between specific patterns of usage and physical activity in greater detail. A variety of ecological momentary assessments including measures of cognitive functioning, mood, or anxiety could extend these investigations further (Ellis, 2020).
We acknowledge that it remains difficult to objectively measure the use of a specific application across many devices (e.g., documenting time spent on Netflix across smartphones, televisions, and tablets) (Kaye et al., 2020), and researchers may still have to rely on estimates of use. However, our findings remain important as they confirm consistent discrepancies between objective logs and subjective estimates (see Table 4) (Andrews et al., 2015; Boase & Ling, 2013; Ellis et al., 2019; Kobayashi & Boase, 2012; Lee et al., 2017; Parslow et al., 2003; Vrijheid et al., 2006). In Study 2, and as observed previously, estimated frequency of “pickups” had greater deviation from its objective counterpart than screen time estimates (Andrews et al., 2015; Ellis et al., 2019). Thus, if subjective estimates are to be collected, it is advised that researchers start including this measurement error into statistical models, which we have now quantified (Ellis, 2020; van Smeden et al., 2019).
Both studies were cross-sectional; therefore, we cannot make any causal claims regarding the impact of smartphone use on mental health. However, by using a quasi-experimental approach in the exploratory analysis of Study 2 and through analyzing the naturally occurring levels of mental health symptomatology in our sample, our findings cast doubt on the presence of any causal relationships that have been proposed previously, as those in a high symptomatology group did not have increased general smartphone usage. It is further possible that participants may have received feedback from Apple Screen Time prior to the study, which would have influenced their estimation of use. The size of the relationship between estimated screen time and actual screen time is larger in Study 2 than in previous work, and may explain why associations between mental health and these two measures of usage did not significantly differ (Andrews et al., 2015; Ellis et al., 2019). However, this does not mitigate the need to control for errors between actual and self-reported screen time as part of any future analysis.
In addition, by moving our second study to an online platform, we achieved a larger and more representative sample. However, this meant losing some of the precision obtained with laboratory-based bioimpedance measures when examining physical health. Nonetheless, as BMI scores in Study 1 had large correlations with body fat percentage (rs = .70) and skeletal muscle mass percentage (rs = −.73), we accepted this as a relatively good proxy in Study 2. Furthermore, as self-reports of height and weight may also have measurement error, we analyzed the ranges of BMI values. Our sample in Study 2 specifically had BMI values that were in line with what might be expected in a representative sample (WHO, 2018). However, future research would benefit from exploring how body composition (including body fat percentage) could be collected objectively when relying on remote data collection.
To conclude, choosing between measurement tools and accepting the benefits and limitations of that choice is an unavoidable facet of all research. However, when understanding or making claims regarding the effects of a particular behavior on health, the cost of any error can be considerable. Here, we demonstrate that PSU scales have significantly larger relationships with mental health when contrasted with objective logs of use. These relationships are nearly three times as large in a large sample and fourfold in a small sample. Thus, if a research question concerns technology usage, then objective measurement should remain the preferred approach. The notion of “problematic use” requires stringent examination because it is frequently conflated with behavior despite a general acceptance that “excessive” smartphone usage does not necessarily equate to “problematic use” (Billieux, Philippot et al., 2015; Panova & Carbonell, 2018). Consequently, PSU scales may only capture people’s appraisals of their smartphone use, rather than an underlying pathology or behavior. Finally, our findings would favor addressing people’s appraisals about their usage rather than reducing their overall screen time, as the former relates more strongly to mental health symptomatology. Even if specific worries in relation to mobile technology are widespread, limiting general smartphone use or engaging with any form of “digital detox” is unlikely to have any demonstrable benefits and should not be a priority for public health interventions at this time (Wilcockson et al., 2019).
App usage logger worked in a similar way to activity logger but had an additional response when the screen was turned on and off. When the screen was turned on, a function would repeat every 200 milliseconds. The function would query a database (UsageStats, 2019) generated by Android and independent of the application. This database stored a record of which applications had been used over the past two years on an Android device. The query would only ask which application had been running in the foreground during the past second. If this was the first time the function had run since the screen was turned on, or if it identified a different application from the previous time the function ran, then the name of the application would be documented in the internal memory along with a UNIX timestamp. However, running this function repeatedly would require extensive amounts of processing power; therefore, to save battery, the app would stop calling this function while the phone screen was off. App usage logger would also document metadata including installed apps, the deletion and installation of applications across the week, and smartphone unlocking. Source code for both apps is available to download (Shaw, Ellis, & Ziegler, 2018).
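The change-detection logic described above can be sketched in a few lines. This is a simplified Python illustration and not the published Android source code: the function name is hypothetical, the polls are supplied as a prepared sequence instead of being read from the Android usage database, and a value of None is an assumed way of marking that the screen was turned off.

```python
from typing import Iterable, List, Optional, Tuple

POLL_INTERVAL_MS = 200  # polling period described in the text


def log_foreground_changes(
    polls: Iterable[Tuple[float, Optional[str]]]
) -> List[Tuple[float, str]]:
    """Record an app name and timestamp only when the foreground app changes.

    Each poll is (unix_timestamp, foreground_app). A foreground_app of None
    marks the screen being off, during which no polling takes place.
    """
    log: List[Tuple[float, str]] = []
    previous: Optional[str] = None
    for timestamp, app in polls:
        if app is None:
            # Screen off: forget the last app so that the first poll after
            # the screen turns back on is always recorded.
            previous = None
            continue
        if app != previous:
            log.append((timestamp, app))
            previous = app
    return log


# Hypothetical polls: two apps, a screen-off gap, then the same app again.
example_polls = [
    (0.0, "mail"),
    (0.2, "mail"),
    (0.4, "maps"),
    (0.6, None),
    (5.0, "maps"),
    (5.2, "maps"),
]
example_log = log_foreground_changes(example_polls)
```

Storing only the changes keeps the log small, since most consecutive polls return the same foreground application.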
The 60-item HEXACO was used to analyse a user’s personality across six domain-level traits: honesty-humility, emotionality, extraversion, agreeableness, conscientiousness and openness-to-experience (see chapter 3 for a detailed description of each trait) (Ashton & Lee, 2009). Only single factors were examined rather than facet-level personality scores as these have previously been shown to have higher predictive performance when predicting personality from phone logs (Stachl et al., 2017). For each trait, ten questions were answered concerning how much a participant agreed or disagreed with a statement about themselves. Answers were recorded on a five-point Likert-style scale, whereby five = ‘strongly agree’ and one = ‘strongly disagree’. Reliability analysis showed that all trait scales had good internal reliability, apart from honesty-humility (α = .65) and openness-to-experience (α = .78), which fell below the .8 threshold.
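The reliability coefficients above appear to be Cronbach’s alpha values. As a minimal sketch of how such a coefficient is obtained, assuming that reverse-keyed items have already been recoded, the standard formula can be written as follows. The function name and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores holds one list per participant with one score per item.
    Reverse-keyed items are assumed to have been recoded beforehand.
    """
    k = len(item_scores[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of participants' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four participants to a three-item scale.
example_alpha = cronbach_alpha([[1, 2, 3], [2, 3, 3], [4, 4, 5], [5, 5, 4]])
meets_threshold = example_alpha >= 0.8  # the .8 threshold used in the text
```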
Self-report questions were used to measure gender, date of birth, marital status, highest qualification and job sector. Gender was measured through the selection of six options on a multiple-choice question: ‘Female’, ‘Male’, ‘Transgender’, ‘Gender Variant/Non-Conforming’, ‘Not Listed (please specify)’, and ‘Prefer Not to Say’. Participants selected one of five options for marital status: ‘Single’, ‘Married/Civil Partnership’, ‘Widowed’, ‘Divorced’, and ‘Separated’. Job sectors were taken from the UK Prospects website and included 24 categories ranging from accounting to transport and logistics (Prospects, 2019). Highest qualification was measured through a multiple-choice question containing each of the eight UK levels with examples of which qualifications constitute each level (UK-Government, 2019).
The MacArthur Ladder of Subjective Social Status was used to measure a person’s perceived socioeconomic status in society (Adler, Epel, Castellazzo, & Ickovics, 2000). In this scale, participants viewed a picture of a ten-step ladder. The instructions were: “There are 10 steps on this ladder. At the top of the ladder are the people who are the best off, those who have the most money, most education, and best jobs. At the bottom are the people who are the worst off, those who have the least money, least education, and worst jobs or no job. On the multiple-choice options below, please select a step which best represents where you stand on this ladder”. As in chapter three, this was used instead of collecting details about pay or household income.
Happiness was measured using the Oxford Happiness Questionnaire (Hills & Argyle, 2002). Participants were asked to indicate how much they agreed or disagreed with 29 statements about happiness on a six-point Likert scale, whereby 1 = “Strongly Disagree” and 6 = “Strongly Agree”. Example questions included “I laugh a lot” and “I often experience joy and elation”. Higher scores indicated greater amounts of happiness.
The UCLA loneliness scale (Russell, Peplau, & Cutrona, 1980) was used to measure loneliness and consisted of 20 questions such as “I feel left out” and “There are people I can turn to”. Participants responded how often they felt a particular way on a 4-point Likert scale, whereby 1 = “Never” and 4 = “Often”. Higher scores indicated greater levels of loneliness.
The Personal Wellbeing Index included eight questions which indicated how satisfied a person was with several life factors: standard of living, health, life achievement, personal relationships, safety, community, future security, and religion (International Wellbeing Group, 2013). Questions were asked on an 11-point Likert scale whereby 0 = “no satisfaction at all” and 10 = “completely satisfied”. Higher total scores indicated higher overall wellbeing.
The PHQ-15 (Kroenke, Spitzer, & Williams, 2002) consisted of 15 questions relating to somatic symptoms such as stomach pain, back pain, menstrual cramps, and headaches. Participants were asked “Over the last two weeks, how much have you been bothered by any of the following problems” and responded on a three-point scale whereby 0 = “not bothered at all” and 2 = “bothered a lot”. Scores of 5, 10 and 15 are cutoff points for low, medium and high somatic symptom severity.
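The scoring rule above can be expressed as a short function. This is a minimal sketch, assuming that each item is scored 0, 1, or 2 and that the total is the simple sum of the 15 items. The function name and the “below cutoff” label for totals under 5 are illustrative choices, not terms taken from the scale.

```python
def phq15_severity(responses):
    """Return the PHQ-15 total and a severity band based on the cutoffs.

    responses must contain 15 item scores, each 0, 1, or 2.
    """
    if len(responses) != 15 or any(r not in (0, 1, 2) for r in responses):
        raise ValueError("expected 15 item scores, each 0, 1, or 2")
    total = sum(responses)
    # Cutoff points of 5, 10, and 15, as stated in the text.
    if total >= 15:
        band = "high"
    elif total >= 10:
        band = "medium"
    elif total >= 5:
        band = "low"
    else:
        band = "below cutoff"  # illustrative label for totals under 5
    return total, band


# Hypothetical participant who is "bothered a lot" by five symptoms.
example_total, example_band = phq15_severity([2] * 5 + [0] * 10)
```

The same pattern could be reused for the adapted ocular and musculoskeletal items, although the cutoffs would then need separate justification because those adaptations have a different number of items.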
Ocular symptoms were measured by adapting the PHQ-15 to seven new ocular symptoms including vision blurring, eye redness, visual disturbance, secretion, inflammation, lacrimation, and eye dryness. Similarly, musculoskeletal symptoms were measured by adapting the PHQ-15 to six new symptoms including pain in the neck, spine, shoulders, hands, wrists, and fingers.
The Pittsburgh Sleep Quality Index measured sleep quality and disturbances by asking participants to reflect on the past month (Buysee et al., 1988). Nineteen items were used to generate several component scores: subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleeping medication, and daytime dysfunction. These were used to calculate a global score, which has been shown to have a diagnostic sensitivity of 89.6% and a specificity of 86.5% when distinguishing good from poor sleepers (Buysee et al., 1988). Higher scores indicate lower sleep quality.
Copyright © the Author(s) 2020
Received June 23, 2020
Revision received September 17, 2020
Accepted September 18, 2020