
Using Browser Data to Understand Desires to Spend Time Online

Volume 4, Issue 1. DOI: 10.1037/tmb0000095

Published on Mar 15, 2023

Abstract

There is growing recognition that many people feel the need to regulate their use of the internet and other digital technologies to support their well-being. In this study, we used Mozilla Firefox browser telemetry to investigate the role played by various usage factors in desires to regulate time spent online. In particular, we investigated how six metrics pertaining to time spent on the internet, and the diversity and intensity of use, predict participants’ (n = 8,094) desires to spend more or less time online. Across all six metrics, we did not find evidence for a relationship between browser usage metrics and participants wanting to spend more or less time online. This finding was robust across various analytical pathways. The study highlights a number of considerations and concerns that need to be addressed in future industry–academia collaborations that draw on trace data or usage telemetry.

Keywords: browser telemetry, log data, digital well-being, Mozilla Firefox, trace data

Acknowledgments: The authors acknowledge the Mozilla Corporation for its support of the initial phases of this research and the provision of the data for the study.

Funding: Amy Orben is supported by the U.K. Medical Research Council (SWAG/076.G101400), the U.K. Economic and Social Research Council (ES/T008709/1), and a Research Fellowship from Emmanuel College, University of Cambridge. Craig J. R. Sewall is supported by the National Institute of Mental Health (T32MH018951).

Disclosures: The authors declare that there is no conflict of interest.

Author Contributions: Jesse D. McCrosky contributed to data curation and investigation. Jesse D. McCrosky and Douglas A. Parry contributed to formal analysis. Jesse D. McCrosky, Douglas A. Parry, Craig J. R. Sewall, and Amy Orben contributed to methodology, writing—original draft, and writing—review & editing. Jesse D. McCrosky, Douglas A. Parry, and Amy Orben contributed to project administration.

Data Availability: The data collection materials and analysis code are available on the Open Science Framework (https://osf.io/p3j7v/). Requests to access the synthetic data can be made via https://www.mrc-cbu.cam.ac.uk/bibliography/opendata/request/8782/. The study was preregistered prior to data analysis (https://osf.io/6wduk/).

Open Science Disclosures:

The data for protected access are available at https://www.mrc-cbu.cam.ac.uk/bibliography/opendata/request/8782/

The experimental materials are available at https://osf.io/p3j7v/

The preregistration (data exist) is available at https://osf.io/6wduk/

Correspondence concerning this article should be addressed to Douglas A. Parry, Department of Information Science, Stellenbosch University, Arts and Social Sciences Building Room 438, Stellenbosch University Private Bag X1, Matieland, Stellenbosch 7602, South Africa. Email: [email protected]


Spurred by increasing media attention, the past decade has seen considerable debate among academics, policymakers, and the general public about how digital technologies impact our society and our well-being (Orben & Przybylski, 2020; Parks, 2021; Twenge et al., 2020). Some have expressed concerns that high levels of digital technology use are causing addiction (Allcott et al., 2021; Rogers, 2021), decreases in cognitive function (Gazzaley, 2018), an oppressive attention economy (Harris, 2016), or decreases in well-being (Meier & Reinecke, 2021; Twenge, 2018). From one perspective, such concern seems the natural response to new technologies changing our everyday behaviors (Orben, 2020b). Similar apprehension followed when other technologies, such as the radio or video games, gained widespread popularity. Empirical evidence, however, does not support most concerns about the negative effects associated with digital technologies (Dienlin & Johannes, 2020; Meier & Reinecke, 2021; Orben, 2020a; Valkenburg et al., 2022).

Recent qualitative research has drawn attention to individuals who have expressed desires to regulate their digital technology use (Grady et al., 2022; Marder et al., 2016; Orhan et al., 2021; Reinecke et al., 2022) and, for some, disconnect from modern digital technologies like the internet or social media (Chib et al., 2021; Jorge, 2019; Nguyen, 2021; Portwood-Stacer, 2013). Considering this, a growing body of research now focuses on the practices, motivations for, and effects of deliberate reduced or nonuse of digital technologies (Hardey & Atkinson, 2018; Kuntsman & Miyake, 2019; Lomborg & Ytre-Arne, 2021; Natale & Treré, 2020). Such studies highlight various perceived negative effects associated with digital technology use (e.g., addiction, stress, anxiety). They also describe how these perceptions motivate attempts to regulate digital technology usage to improve users’ well-being and avoid unwanted outcomes (e.g., privacy violations, distraction, anxiety).

Disconnection from digital technologies is rarely total. As ever more aspects of life become digitized—education, entertainment, health care, commerce, and communication—complete disconnection becomes increasingly difficult to achieve (Hesselberth, 2018; Kuntsman & Miyake, 2019). Rather, like many other behaviors requiring self-regulation (Duckworth et al., 2019; Inzlicht et al., 2021), people are tasked with regulating their behavior to align their digital technology use with their general goals and desires, to ultimately minimize any negative effects that they may perceive to be associated with use. Recent work has drawn attention to the ambivalent and, at times, contradictory feelings people experience about their digital technology use—that is, viewing time spent on digital technology as simultaneously rewarding and wasteful (Ytre-Arne et al., 2020). On the one hand, users are aware that digital technologies are sources of social interaction, entertainment, self-expression, and information, among many other uses and gratifications. On the other hand, users are also cognizant of the ways in which digital technologies can interfere with other goals or responsibilities (Nguyen, 2021; Parry et al., 2020). In support of achieving a balance between the perceived harms and the perceived benefits of digital technology use, self-regulation may be achieved through various preventive or interventive strategies (Duckworth et al., 2016). Such steps, however, first require an individual to be aware of their own technology usage and the effects, if any, that this may have on other desired goals or experiences (Parry et al., 2020).

Against this backdrop, scholars have developed frameworks to understand the balance between the benefits and drawbacks of digital technologies and have focused on the dualities inherent in the complex array of uses and effects they enable (e.g., see Büchi, 2020; Vanden Abeele, 2021). Acknowledging that the use of digital technologies can be both beneficial and detrimental, Vanden Abeele (2021) proposes the concept of “digital well-being” to focus on users’ subjective evaluations of the optimal balance between the perceived negative and perceived positive effects associated with digital technology use. According to Vanden Abeele (2021, p. 13) “people achieve digital wellbeing when experiencing maximal controlled pleasure and functional support, together with minimal loss of control and functional impairment.” In line with this interpretation, individuals may seek to regulate or reduce their usage of a digital technology when they experience an imbalance between the perceived benefits and the perceived negative effects associated with its use.

To inform such theoretical approaches, in addition to focusing on the motivations and strategies that people use to regulate their digital technology usage (Chib et al., 2021; Nguyen, 2021; Parry et al., 2020), researchers have examined empirical evidence for positive or negative links between digital technology use and well-being (Dickson et al., 2018; Meier & Reinecke, 2021). Progress in this regard has, however, been stymied by the absence of clear results and by debate over the meaning of study findings. While some argue that divergent results could be due to influential individual differences (Beyens et al., 2020; Valkenburg, Beyens, et al., 2021), others argue that extant research has suffered from substantial measurement issues (Ellis, 2019; Kaye et al., 2020; Parry et al., 2021; Sewall et al., 2020). Specifically, most research on the uses and effects of digital technologies has relied on self-reported estimates of the time spent on, for instance, social media or the internet. However, due to several well-established cognitive and perceptual limitations (Schwarz & Oyserman, 2001; Tourangeau, 1984), users have trouble providing accurate estimates of both their media use in general (Parry et al., 2021) and their internet use specifically (Araujo et al., 2017; Festic et al., 2021; Scharkow, 2016).

In addition to concerns about the accuracy and ecological validity of self-report measures, researchers have also expressed concerns about the validity of the “screen time” construct—the aggregate total usage time on a platform, device, or application—that large proportions of the literature focus on (Kaye et al., 2020; Orben, 2020a; Parry et al., 2022; Valkenburg et al., 2022). Extant research tends to treat digital technology use as a uniform construct and simply focuses on the aggregate amount of time spent with digital technologies (Granic et al., 2020; Griffioen et al., 2020; Kaye et al., 2020). This approach neglects the many uses and gratifications that digital technologies enable and ignores the important role played by various subjective, contextual, and content-specific aspects of usage (Parry et al., 2022). It is likely that such aggregate usage measures do not hold sufficient nuance to account for potential relations with well-being or desires to regulate time spent online.

The focus on screen time, coupled with a reliance on self-reported usage measures collected via surveys, has likely contributed to the slow development of suitable theories accounting for effects potentially associated with digital technologies and desires to regulate time spent online (Orben, 2020b). These foci may have also precluded the consideration of alternative measures of digital technology usage. One potential way to kickstart a new phase of digital technology research is the collaboration and sharing of data and expertise between software companies and researchers. While challenging (e.g., Johannes, Vuorre, & Przybylski, 2021; Przybylski et al., 2021), such collaborations are key for improving measurement accuracy and diversifying the measures of digital technology use available to researchers (Orben et al., 2020).

Digital trace data (i.e., data that are produced and logged as a byproduct of digital technology use; Freelon, 2014) or telemetry (i.e., the collection of in situ data for transmission to a receiver) collected by software companies can provide behavioral information about the time spent with digital technologies with a greater level of ecological validity compared to retrospective self-reports or simulation studies in laboratory environments. Such data may also enhance researchers’ abilities to capture the diversity and intensity of use and move away from crude time-based measures. Diversity of use provides a proxy for the amount and variety of material accessed and may be approximated by, for instance, the number of unique domains an individual accesses. Similarly, the intensity of use—which indicates the amount of material accessed in a given time period—may be measured by the number of Uniform Resource Identifiers (URIs) accessed per active hour in a specific period. Metrics like these, computed from in situ use of digital technologies, may prove useful for understanding digital well-being and intentions to regulate digital technology usage. They provide insight into the nature of use and may relate to an individual’s sense that their time has been “well-spent” and that their behavior aligns with some self-defined ideal or, in the case of usage diversity, to a sense of cognitive overload and fatigue (Fu et al., 2020; Lin et al., 2020; Pelet et al., 2017).

The Present Study

Given the aforementioned measurement and conceptual issues, the present study draws on data collected directly from real-world, in situ browser usage sessions and considers a range of usage metrics beyond simply the duration of use. Our aim is to advance the understanding of digital well-being in general and, in particular, to study, with an ecologically valid approach, the factors that may be associated with desires to self-regulate time spent online. Specifically, we use such telemetry and survey data collected from a large sample in collaboration with the Mozilla Corporation to address the following research question: How does wanting to spend time online correlate with different quantifications of internet use devised from telemetry data?

Method

To address our research question, we used a mixed-method data set collected from users of the Firefox browser. This data set includes survey data about perceptions of time spent online, internet usage, and basic demographics, as well as linked in situ telemetry that allows for the computation of various browser usage metrics.

To enable others to critically examine our work, while constrained by the proprietary nature of our data set, we aimed to be as transparent as possible in our workflow. To guard against specification searching and to distinguish between a priori and post hoc analytical decisions, we specified a preanalysis plan and registered it on the Open Science Framework (OSF) prior to data analysis (https://osf.io/6wduk/). At the time of registration, JDMC was already familiar with the data and, for this reason, was not involved in the analytical choices; he only provided guidance about the nature of the telemetry. AO had some exposure to the data (descriptive statistics) but had not had access to the data themselves. DP and CS did not have access to any data or to any descriptive or inferential results and were only provided with details of the study design. In addition to registering our analysis plan, although we are unable to share the original raw data for legal and ethical reasons, we have produced a synthetic data set using the SynthPop package (Nowok et al., 2016; Reiter, 2005) that mimics the original data set while protecting the privacy of participants. Synthetic data preserve the statistical properties of variables and the relationships between variables, but no record represents a real participant (Quintana, 2020).1 The data collection materials and analysis code are available on the OSF (https://osf.io/p3j7v/); requests to access the synthetic data can be made via https://www.mrc-cbu.cam.ac.uk/bibliography/opendata/request/8782/.
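As an illustration of this step, the sketch below shows how a synthetic copy of a data set can be generated with the synthpop R package; the data frame and its columns are hypothetical stand-ins rather than the actual study variables, and the actual synthesis code is available on the OSF.

```r
library(synthpop)

# Toy stand-in for the linked survey + telemetry data frame (hypothetical columns).
linked_data <- data.frame(
  active_hours = rexp(200, rate = 0.1),
  domain_count = rpois(200, lambda = 70),
  usage_desire = sample(0:4, 200, replace = TRUE)
)

syn_obj <- syn(linked_data, seed = 2019)  # default CART-based synthesis
synthetic_data <- syn_obj$syn             # synthetic records; no row represents a real participant

compare(syn_obj, linked_data)             # compare original and synthetic distributions
```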

Participants and Procedure

In September 2019, the Mozilla Corporation sent in-browser invitations to a randomly selected sample of users2 asking them to complete a survey hosted on SurveyGizmo. To be eligible to receive an invitation, users had to use the Firefox web browser and have telemetry turned on. The survey included an identifier that, with the approval of the trust and legal departments at the Mozilla Corporation, enabled the survey responses to be linked to participants’ browser telemetry. The invitation received 20,042 responses from 19,961 unique users; after deduplication and the removal of responses for which telemetry linkage was impossible or for which usage metrics could not be computed, this resulted in an initial sample of n = 15,311 participants with matched browser telemetry.

From this sample, following our preanalysis plan, we excluded those who indicated that Firefox was not their primary browser (n = 2,680) or that their mobile internet usage was greater than their desktop usage (n = 3,724). This resulted in an eligible sample of n = 8,907. Upon receiving the data, two further unplanned exclusion steps were required to produce the final sample. To calculate 7-day usage metrics, we needed to remove participants who created their profiles less than 7 days before data collection (n = 334), as well as those with implausible activity metrics3 (n = 479). These procedures resulted in a final sample of n = 8,094.

Measures

Telemetry Data

The Firefox Telemetry system (Mozilla, 2017) provides nonpersonal browser performance and usage data to Mozilla through a “ping” sent approximately daily. Using these data, the following metrics are recorded: (a) the number of active days in the last week (i.e., days that the Firefox installation was running and connected to the internet; Mozilla, 2021b); (b) the number of active hours in the last week (i.e., an aggregate of the number of 5-s “ticks” in which the browser received keyboard or mouse activity; Mozilla, 2021a); (c) the number of URIs4 loaded in the last week (this metric can be interpreted as a proxy for the number of webpages loaded); and (d) the number of unique domains loaded in the last week (this metric can be interpreted as a proxy for the number of specific hosts visited, e.g., https://www.Facebook.com, https://www.Google.com, https://www.Wikipedia.org/).5 In addition to these four raw metrics, two additional derived variables were computed for each participant: (a) the number of URIs per active hour loaded in the last week and (b) the number of unique domains per active hour loaded in the last week.
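For concreteness, the two derived variables are simple ratios of the weekly aggregates; the sketch below uses hypothetical column names and toy values rather than the actual telemetry schema.

```r
# Toy example; each row is one participant's 7-day telemetry aggregates
# (hypothetical column names).
telemetry <- data.frame(
  active_hours = c(8.5, 12.0, 3.2),
  uri_count    = c(1500, 2400, 300),
  domain_count = c(70, 120, 25)
)

# Intensity of use: URIs loaded per active hour.
telemetry$uris_per_active_hour <- telemetry$uri_count / telemetry$active_hours

# Diversity of use: unique domains loaded per active hour.
telemetry$domains_per_active_hour <- telemetry$domain_count / telemetry$active_hours
```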

Survey Data

Three versions of the survey were designed (see the materials on the OSF for the full survey), with each sent to approximately one third of the participants. In the first version (completed by 2,702 participants in the final sample), to assess participants’ feelings about their internet use, a single item was used to elicit their desire to spend more time online (“If I could, I would spend more time online”). In the second version of the survey (completed by 2,670 participants in the final sample), a single item was used to elicit participants’ desires to spend less time online (“If I could, I would spend less time online”). In the third version of the survey (completed by 2,722 participants in the final sample), both versions of the item were presented. In all three versions, responses were provided through a Likert scale ranging from 1 = strongly disagree to 5 = strongly agree. Other than these differences, the remainder of the items were identical across the three survey versions.

Participants were asked to indicate from a list of options which browser was their primary browser on nonmobile devices. Two optional demographic items were used to elicit participants’ age group and geographic region (i.e., continent). Finally, participants used a slider to indicate the proportion of their time online spent on a desktop device versus on a mobile device. In addition to these items, the survey assessed several measures not included in any of the preregistered analyses reported in this study (see the Supplemental Methods, for further information on these additional measures).

Analytic Approach

In addition to the application of the eligibility criteria and the removal of implausible usage data previously described, our analysis began by producing three calculated variables. The first two—URIs per active hour and unique domains per active hour—were calculated by dividing the respective variables by the total number of active browser usage hours in the observation period for each participant. Next, we used the responses for the items concerning desires to either spend more or less time online to calculate an overall variable that represented their “internet usage desire.” Responses to the initial items were negatively correlated (rs = −0.55), and thus, following our preregistered protocol, we considered them to be measuring the same construct but on opposite scales.6 Therefore, to produce an overall variable for internet usage desire, we created an integer-scaled variable for each ordinal measure and reverse-coded the “desire to spend less time online” item. For the “desire to spend more time online” question, we assigned 0 = strongly disagree and 4 = strongly agree, and so forth. For the “desire to spend less time online” question, we assigned 4 = strongly disagree and 0 = strongly agree, and so forth.

For those who received both items, we used the mean of these values, while responses for those who only received a single item were added as-is after reverse coding. We considered the resulting value as an ordinal variable with nine possible levels: 0, 0.5, 1, …, 3.5, and 4, with higher values indicating greater agreement with the desire to spend more time online. This variable formed the basis of our primary preregistered analysis. However, as an additional sensitivity analysis, we also analyzed each original response direction (i.e., “desire to spend more time online” and “desire to spend less time online”) separately.
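The recoding and combination logic can be expressed compactly; the sketch below uses hypothetical column names and toy responses, and the actual analysis code is available on the OSF.

```r
# Toy responses (hypothetical column names); Likert codes 1 = strongly disagree
# to 5 = strongly agree, with NA when a participant did not receive the item.
survey <- data.frame(
  more_time = c(4, NA, 2),  # "If I could, I would spend more time online"
  less_time = c(2, 5, NA)   # "If I could, I would spend less time online"
)

survey$more_scaled <- survey$more_time - 1  # 0 to 4; higher = wants more time online
survey$less_scaled <- 5 - survey$less_time  # reverse-coded: 4 = strongly disagree, 0 = strongly agree

# Internet usage desire: mean of the two recoded items where both were shown,
# otherwise the single available recoded item.
survey$usage_desire <- rowMeans(survey[, c("more_scaled", "less_scaled")], na.rm = TRUE)

# Treat the result as an ordered factor with nine possible levels (0, 0.5, ..., 4).
survey$usage_desire <- factor(survey$usage_desire, levels = seq(0, 4, 0.5), ordered = TRUE)
```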

We used ordinal logistic regression to examine the relationships between internet usage metrics and internet usage desire. Specifically, we regressed internet usage desire on each of the six telemetry variables (i.e., the number of active days, the number of active hours, the URI count, the domain count, URIs per hour, and domains per hour) in separate models while controlling for age category and region of residence. Due to very low rates of missing data for the demographic variables (>97% of participants provided data on their region of residence and age), we did not perform any imputation for missing data as originally planned. To account for multiple testing, we used Bonferroni-adjusted confidence intervals.7
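A minimal sketch of one such model, fitted with MASS::polr on simulated data and using a Bonferroni-adjusted confidence level across the six predictors, is shown below; all variable names are hypothetical, and the actual analysis code is available on the OSF.

```r
library(MASS)

# Simulated stand-in data with hypothetical variable names.
set.seed(1)
dat <- data.frame(
  usage_desire   = factor(sample(seq(0, 4, 0.5), 500, replace = TRUE),
                          levels = seq(0, 4, 0.5), ordered = TRUE),
  active_hours_z = rnorm(500),  # standardized telemetry predictor
  age_group      = factor(sample(c("18-24", "25-29", "30+"), 500, replace = TRUE)),
  region         = factor(sample(c("Europe", "North America", "Other"), 500, replace = TRUE))
)

# Bonferroni-adjusted confidence level across the six telemetry predictors.
adj_level <- 1 - 0.05 / 6

# One ordinal logistic regression per telemetry metric, controlling for
# age group and region of residence.
fit <- polr(usage_desire ~ active_hours_z + age_group + region, data = dat, Hess = TRUE)

summary(fit)
confint(fit, level = adj_level)  # profile-likelihood CIs for the coefficients
```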

Following the primary analysis, we conducted several sensitivity analyses to determine the robustness of our results. Three of these were preregistered and a fourth was conducted as a post hoc exploratory analysis. The first preregistered sensitivity analysis accounted for the decision to remove participants who indicated that their mobile internet usage was greater than their desktop usage (n = 3,724). For this analysis, the six ordinal logistic regression models were recalculated with the full sample (n = 11,476). The second preregistered sensitivity analysis involved analyzing the three versions of the survey separately (i.e., separate models were produced for those who received the “desire to spend less time online” item, the “desire to spend more time online” item, or both items). Our third preregistered sensitivity analysis involved analyzing the internet usage desire variables as two separate, directional variables. Notably, in our preanalysis plan, we specified that if the correlation between these two variables was greater than or equal to abs(rs) = 0.5, we would combine them (using reverse scoring if necessary). However, although the threshold was passed (rs = −0.55), given the proximity of the correlation coefficient to the threshold and the relatively low bound we specified, we ran a sensitivity analysis considering these as two distinct variables: “desire to spend less time online” and “desire to spend more time online.” Our final sensitivity analysis was not preregistered. Rather, given the skewness of the usage metrics produced from the telemetry data, and the presence of outliers that may indicate nonhuman usage (see Table 1), we conducted a further exploratory analysis in which these outliers were removed using the median absolute deviation (MAD) method.8
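The exploratory outlier screen can be sketched as follows; the cut-off of 3 MADs and the toy values illustrate the general approach rather than the exact rule described in the footnote.

```r
# Minimal sketch of a MAD-based outlier screen on one usage metric (toy values).
uri_count <- c(1500, 2400, 300, 62000, 900, 1100)

med <- median(uri_count)
dev <- mad(uri_count)  # scaled median absolute deviation (constant = 1.4826 by default)

keep <- abs(uri_count - med) <= 3 * dev  # retain values within 3 MADs of the median
uri_count_trimmed <- uri_count[keep]
```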

Table 1

Descriptive Statistics for Key Study Variables

| Variable | M (SD) | Min | Max | Skewness | Kurtosis |
| --- | --- | --- | --- | --- | --- |
| Active days | 5.21 (1.94) | 1.00 | 7.00 | −0.79 | −0.66 |
| Active hours | 8.57 (8.95) | 0.01 | 156.30 | 3.08 | 23.25 |
| URI count | 1564.83 (2144.78) | 1.00 | 62529.00 | 6.96 | 114.83 |
| Domain count | 68.40 (59.76) | 1.00 | 700.00 | 1.96 | 7.31 |
| URIs per active hour | 196.76 (208.51) | 0.05 | 9577.99 | 25.21 | 947.73 |
| Domains per active hour | 11.80 (9.20) | 0.04 | 144.00 | 3.66 | 25.49 |
| Internet usage desire | 1.87 (0.95) | 0.00 | 4.00 | 0.18 | −0.20 |

Note. URI = Uniform Resource Identifier. Active hours = total active hours in the last week.

Results

The final sample included 8,094 participants and represented individuals who reported residence in seven continents. Of the participants who provided data on their region of residence, a majority indicated that they reside in either Europe (49.10%) or North America (33.51%), with the remaining 17.39% residing in other regions. In terms of age group, for those participants who provided data on their age, the largest group in the sample included those aged between 25 and 29 (13.82%), followed by those aged 65 or greater (12.15%). The majority of participants were aged below 40. The distributions for these two demographic covariates are provided in the Supplemental Materials. Descriptive statistics for the remaining study variables are presented in Table 1. Notably, several variables, particularly URI count and URIs per hour, are extremely right-skewed. The maximum values of these variables are illustrative, suggesting that at least one respondent visited more than 60,000 URIs in a week and another visited an average of almost 10,000 URIs per hour of active browser use. It is possible that these extreme values represent automated use (however, all participants completed the survey in-browser, suggesting at least partial organic use), or other browsing patterns that represent a distinct population from most of our participants. After presenting our preregistered analyses, we address these possibilities in a sensitivity analysis.

Figure 1 summarizes the responses to the two questions regarding desires to spend more or less time online. Across both versions of the survey, the most common response was “neither agree nor disagree,” and slightly more participants indicated that they wanted to spend less time online rather than more time online. Histograms for the six independent variables are available in the Supplemental Materials. Table 2 provides a zero-order bivariate correlation matrix for the main study variables. All six of the use metrics are mutually correlated, with the raw metrics showing larger effect sizes than the two calculated metrics (domains per active hour and URIs per active hour). While the two “internet usage desire” items were negatively related to each other with a moderate effect size (rs = −0.55, p < .001), the magnitude of associations between these variables and all six usage metrics was small and, in some cases, not statistically significant.

Figure 1

Distribution of Responses for the Questions About Spending More or Less Time Online

Table 2

Correlation Matrix Depicting Spearman Correlation Coefficients for Key Study Variables

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Active days | | | | | | | | |
| 2. Active hours | 0.68*** | | | | | | | |
| 3. URI count | 0.61*** | 0.87*** | | | | | | |
| 4. Domain count | 0.63*** | 0.80*** | 0.85*** | | | | | |
| 5. URIs per active hour | −0.06*** | −0.12*** | 0.32*** | 0.16*** | | | | |
| 6. Domains per active hour | −0.30*** | −0.57*** | −0.34*** | −0.04*** | 0.41*** | | | |
| 7. Age group | −0.03* | −0.04*** | −0.10*** | −0.07*** | −0.15*** | −0.03** | | |
| 8. More time online | 0.06*** | 0.02 | −0.02 | −0.05*** | −0.07*** | −0.09*** | −0.08 | |
| 9. Less time online | −0.06*** | −0.02 | 0.00 | 0.00 | 0.03* | 0.03* | −0.02 | −0.55*** |

Note. URI = Uniform Resource Identifier.
* p < .05. ** p < .01. *** p < .001.

As described in our preanalysis plan, for each of our independent variables, we fitted an ordinal logistic regression model with age group and region included as covariates. Figure 2 depicts the outcomes of these models for each standardized independent variable with a Bonferroni-adjusted confidence interval (indicated in bold red).9 In Table 3, we report the specific effect sizes and Bonferroni-adjusted confidence intervals for each variable for the five modeling scenarios. Based on Figure 2 and the values reported in Table 3, it is evident that, for all six variables, effect sizes are very small and, for most variables, the confidence intervals include zero.

Figure 2

Results of the Separate Ordinal Logistic Regression Models for Each of the Six Independent Variables for the Preregistered and Sensitivity Analyses
Note. CI = confidence interval; URI = Uniform Resource Identifier; IV = independent variable.

Table 3

Effect Sizes and Bonferroni-Adjusted Confidence Intervals for Each Variable in Each Modeling Scenario

| Variable | Preregistered model | Including mobile users | Excluding outliers | Survey with more time item | Survey with less time item |
| --- | --- | --- | --- | --- | --- |
| Days active | 0.10 [0.06, 0.14] | 0.09 [0.06, 0.13] | 0.10 [0.06, 0.14] | 0.06 [−0.01, 0.13] | 0.12 [0.05, 0.19] |
| Hours active | 0.05 [0.01, 0.09] | | 0.02 [−0.02, 0.06] | 0.02 [−0.05, 0.09] | 0.08 [0.01, 0.15] |
| URI count | 0.01 [−0.03, 0.05] | 0.01 [−0.03, 0.04] | −0.02 [−0.06, 0.02] | −0.02 [−0.09, 0.05] | 0.07 [0.00, 0.14] |
| Domain count | −0.01 [−0.05, 0.04] | 0.01 [−0.02, 0.05] | −0.02 [−0.06, 0.02] | −0.05 [−0.12, 0.02] | 0.08 [0.01, 0.15] |
| URIs per active hour | −0.02 [−0.06, 0.02] | −0.03 [−0.06, 0.00] | −0.06 [−0.10, −0.01] | −0.01 [−0.08, 0.06] | 0.01 [−0.06, 0.07] |
| Domains per active hour | −0.05 [−0.09, −0.01] | −0.04 [−0.08, −0.01] | −0.06 [−0.10, −0.02] | −0.04 [−0.11, 0.03] | 0.02 [−0.05, 0.10] |

Note. CI = confidence interval; URI = Uniform Resource Identifier.

Figure 2 also depicts the outcomes for the sensitivity analyses including mobile users, excluding outliers, and analyzing the three different versions of the survey separately. Results for these analyses are comparable to the preregistered models, with all effect sizes falling in a similar range. Notably, while the confidence intervals were comparable to the preregistered models, and still tended to include zero, effects were slightly larger and always positive for those who only received the “less time” question (reverse coded and depicted in light blue in Figure 2).

In our final sensitivity analysis, we considered the original response directions separately. As depicted in Figure 3, the pattern of results was comparable to that of the original prespecified models, with generally small effect sizes and confidence intervals including zero. Notably, effects were in the opposite direction for the “less time” outcome due to the reverse coding in the prespecified model. While the direction of effects differed between outcomes, the same pattern of statistical significance was found: except for days active, the confidence intervals for all of the remaining independent variables included zero.

Figure 3

Results of the Separate Ordinal Logistic Regression Models for the Sensitivity Analysis Comparing the “More Time Online” Outcome With the “Less Time Online” Outcome
Note. CI = confidence interval; URI = Uniform Resource Identifier; IV = independent variable.

Discussion

There is growing recognition that many people feel the need to modulate their use of the internet and other digital technologies in support of their well-being (Grady et al., 2022; Hardey & Atkinson, 2018; Kuntsman & Miyake, 2019; Lomborg & Ytre-Arne, 2021; Natale & Treré, 2020; Nguyen, 2021; Parry et al., 2020). In this preregistered study, we aimed to advance our understanding of digital well-being in general and, in particular, to study whether browser usage metrics collected by industry partners are associated with desires to regulate time spent online.

To address this aim, we used Mozilla Firefox browser telemetry to investigate how six metrics pertaining to time spent on the internet and the diversity and intensity of use predict participants’ desires to spend more or less time online. We found that the associations between browser usage and participants’ desire to spend more or less time online were very small, and, except for active days, we could not reject the null hypothesis of no association. As evidenced by the precision of the confidence intervals around the effect sizes (i.e., tightly bound around the point estimate), our study was well-powered to detect very small effect sizes. Our findings were robust across several different analytical pathways. Browser usage behavior, as indicated by our six metrics, did not appear to be useful for understanding people’s desires to spend more or less time online.

Moving beyond high-level browser usage metrics, future research into desires to regulate internet usage should consider the specific content users engage with, the purposes for which they use the internet, and the contexts in which their use occurs, in addition to various individual differences like the user’s occupation, experience with the internet, or mental health in general. Additionally, as people’s mindsets toward digital technology usage (e.g., whether they think such usage is “good” or “bad”) may relate to how they behave and to their desires to regulate this behavior (Ernala et al., 2022), there is a need to understand these mindsets and the endogenous (e.g., personal reflections on past experiences online) and exogenous (e.g., media narratives) factors that drive them. Importantly, our measures for desires to spend more or less time online did not attempt to assess any antecedents of these desires. It is likely that various negative perceptions of digital technology use (e.g., distraction, addiction, or other forms of problematic outcomes) or perceptions about one’s time allocation (e.g., a lack of discretionary time) account for some of the responses to the survey. Future work investigating the drivers of people’s mindsets toward digital technology use should consider these elements, in addition to further validating our single-item measures.

This study addressed concerns about the use of self-report questionnaires to quantify the amount of time spent online in research on digital technology uses and effects and digital well-being (Parry et al., 2021). We know that these measures are not accurate, as people cannot reliably estimate the amount of time that they spend using various digital functions (Parry et al., 2021), and while there remain gaps in our understanding of measurement error (Johannes, Nguyen, et al., 2021), it is likely that the degree of (in)accuracy depends on individual differences that are often fundamental to the effects under investigation (Sewall et al., 2020; Sewall & Parry, 2021; Shaw et al., 2020). Additionally, many have expressed their dissatisfaction with the reliance on time-based metrics like the overall amount of time an individual uses digital media (i.e., “screen time”). Although we were limited in the extent to which we could delve into the content of participants’ browser usage, our collaboration involving scientific and industry partners enabled us to, first, use data recorded directly from browser usage sessions and, second, extend the range of ways to measure how people use their browsers. This enabled us to investigate effects associated with actual browser usage intensity and diversity alongside other time-based metrics. Notwithstanding these contributions, while we used objective measures for browser usage, our dependent variables—desires to spend more or less time online—were assessed with single-item measures. This is a limitation and future work is needed to investigate the validity of these measures.

In addition to addressing the primary research question, the findings provide an indication of the proportion of individuals who feel the need to self-regulate their internet use. Using a relatively large, albeit nonrepresentative, sample of Firefox users, the results show that most participants did not indicate strong desires to spend either more or less time online. Only a small proportion of participants indicated strong agreement with either the desire to spend less time online or the desire to spend more time online. This finding suggests that most people are generally satisfied with, or indifferent toward, the amount of time that they spend online and, therefore, that they do not feel a need to regulate this behavior in pursuit of some other goal or desired state. Alternatively, the findings may also be reflective of successful self-regulation—people are satisfied with the current state of their behavior and may already be using various preventive or interventive strategies to achieve this state. Either way, given that our sample was relatively young and included users of only a single web browser, further research is needed to determine the extent to which these findings generalize to older populations and users of other web browsers.

Irrespective of the mechanism, the findings suggest that, just as digital technology effects are likely to be highly heterogeneous, with only a small proportion of users experiencing either positive or negative outcomes (Beyens et al., 2020; Valkenburg, Beyens, et al., 2021), desires to regulate digital technology usage follow a similar pattern, with most users satisfied or indifferent to the amount of time they spend online and only a small proportion wanting change. However, to extend our correlational, between-person findings and further understand how aspects of browsing behavior relate to desires to regulate digital technology use, there is a need to investigate intraindividual consistency in browsing behaviors (e.g., via intensive longitudinal studies or repertoire approaches; Horvát & Hargittai, 2021; Parry & Sewall, 2021; Valkenburg, Pouwels, et al., 2021) and determine how the variability in desires to regulate behavior relates, first, to the consistency of browsing behavior and, second, to particular “types” of browsing. Such work will not only further our understanding of digital well-being but, more generally, enable us to learn about the stability of everyday behaviors (e.g., O’Connor & Rosenblood, 1996) and the frames with which individuals view these behaviors (Hofmann et al., 2012).

Challenges Associated With the Use of Telemetry

The study also highlights several challenges inherent in using telemetry or trace data to draw inferences about user desires, characteristics, outcomes, or perspectives. New forms of measurement provide new opportunities for understanding human behavior, but they also present new challenges and require careful calibration for these opportunities to be realized.

The first challenge concerns the distinction between “readymade” data and “custommade” data. Salganik (2019) draws on the “readymades,” a series of artworks by the French artist Marcel Duchamp, to distinguish between data that were produced specifically for research purposes in a preplanned manner (i.e., “custommade” data) and data that were generated for one purpose and are then repurposed to address a research objective for which they were not originally collected (i.e., “readymade” data). This distinction highlights the fact that repurposed, “readymade” data may not always fit the characteristics needed for a study and that, while such data can address research questions in ways “custommade” data cannot, one needs to be aware of, and account for, the data-generating processes inherent to the source.

Behavioral data on digital technology use, whether acquired through industry–academia collaborations (as in this study), custom trackers, or third-party application programming interfaces or trackers, will invariably draw on telemetry systems not originally produced for research purposes. Not only does this mean that such data sources may be biased by various individual and technological factors (Jürgens et al., 2020; Parry et al., 2021; Scharkow, 2016), but it also implies that there may be a mismatch between the existing data and the behavior that researchers want to measure based on a theoretical model (i.e., usage content, diversity, or intensity for specific applications or services). In a related manner, analyses may be driven by the data that are available rather than the data that are truly needed to address a research question. For instance, in this study, we were limited in that we could only access and study desktop browser usage. It is likely that a substantial proportion of browser usage takes place on a smartphone or via dedicated applications. Such behavior was out of reach to us. Assessment of digital technology usage across hardware devices or even across various platforms or modalities remains a key challenge in this domain.

A second challenge associated with the use of telemetry or trace data concerns the calibration and validation of the measures. In this context, calibration refers to the extent to which a measure captures the intended actions (i.e., whether a measure records only intended human actions or also captures automated or background processes). While it is well established that self-reported estimates of usage do not correspond closely to usage logs (Parry et al., 2021), little research has focused on the validation of telemetric mechanisms for recording digital technology usage (cf. Elhai et al., 2018; Geyer et al., 2022). In the context of smartphone usage logs, concerns have been raised that such systems can misrecord background functions as active usage, or that some forms of active usage (e.g., interactions via “lock-screens”) may not be logged at all (Jürgens et al., 2020). It is reasonable to assume that telemetry-based metrics for internet usage may also suffer from various biases and technical challenges. As more researchers leverage telemetry to measure behavior with digital technologies, the validation of the tools used to produce these measures becomes a key concern.

Telemetry and trace data can be of variable quality, as the data are typically captured with minimal researcher involvement. It can be difficult to differentiate data that are “natural” (i.e., data that represent typical use patterns by a user on the platform) from data that are “unnatural” (i.e., data that represent a use pattern that is impossible or highly improbable). In this study, several browser usage metrics had substantially skewed distributions, with several participants’ usage far exceeding the averages. This presented a challenge for understanding whether these values represented measurement error, natural human usage, or unnatural usage by automated agents. Future studies leveraging telemetry systems to measure digital technology use will likely face similar challenges. In addition to the need for research into the accuracy and calibration of these tracking systems, variability in data quality and poorly understood usage patterns also signal the need for preregistration to guard against p-hacking in search of statistically significant results (Dienlin et al., 2021). In-depth preregistration and sensitivity-checking procedures, based on a more complete understanding of what actual human digital technology usage looks like, can provide guardrails to help researchers process behavioral data without succumbing to the garden of forking paths. It will also be increasingly important for researchers to calibrate their measures so that they truly capture the constructs they are intended to measure, and to account for the inevitable imperfections.

Conclusion

To advance our understanding of digital well-being and study the usage factors that may be associated with desires to regulate time spent online, this study drew on browser telemetry collected in collaboration with the Mozilla Corporation, the developers of the Firefox browser. In addition to addressing our primary research question and showing that actual browser usage metrics, at a high level, do not relate to desires to self-regulate the amount of time spent online, the study also highlights a number of considerations and concerns that need to be addressed in future industry–academia collaborations that draw on trace data or usage telemetry. Despite these challenges, we are optimistic about the potential inherent in telemetry and trace data for enhancing our understanding of the role of digital technologies in human behavior and well-being (Lazer et al., 2021). Central to realizing this potential will be increased collaboration between academia and the technology companies that develop and maintain the platforms and services through which large parts of our lives are mediated.

Supplemental Materials


https://doi.org/10.1037/tmb0000095.supp


Received February 15, 2022
Revision received July 26, 2022
Accepted July 29, 2022