
The Effects of Unproctored Internet Testing on Applicant Pool Size and Diversity: Using Interrupted Time Series to Improve Causal Inference

Volume 3, Issue 3: Autumn 2022. DOI: 10.1037/tmb0000079

Published on Aug 18, 2022

Abstract

Although it is commonly assumed that implementing unproctored internet testing (UIT) in employee selection systems can result in increased applicant pool diversity, this assumption has not been explicitly tested. Thus, we analyzed the applicant pool composition of a major U.S.-based manufacturing organization across the span of over 8 years (N = 24,963) using an interrupted time series analytic approach. This allowed us to evaluate changes before and after the implementation of unproctored testing as well as changes following subsequent mobile device blocking. The results of this analysis suggested that although adding a UIT option appeared to increase the size of the applicant pool, the magnitude of this effect did not appear to differ between Black and White applicants. Furthermore, removing the option to apply on a mobile device dampened this general effect, but similarly, there were no differences in the magnitude of applicant pool reduction between Black and White applicants. This evidence contradicts the common notion that UITs result in more diverse applicant pools, suggesting UIT's primary value in this regard is increasing access to the application process across groups. Additionally, this study demonstrates the use of interrupted time series analysis as a powerful framework to understand longitudinal effects in real-world employee selection data.

Keywords: unproctored internet testing, interrupted time series, employee selection, longitudinal design, diversity

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

Disclosures: There are no perceived conflicts of interest.

Data Availability: The data and analytic code that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Correspondence concerning this article should be addressed to Elena M. Auer, Department of Psychology, University of Minnesota, Twin Cities, Minneapolis, MN 55455, United States. Email: auer0027@umn.edu


Attracting and hiring diverse job applicants is often one of the core levers for increasing the diversity of and alleviating inequality within organizations (e.g., Newman & Lyon, 2009). As online job applications and employee selection systems have become increasingly common, the opportunity for technology-based interventions designed to support diversity and equality initiatives has widened. Unproctored internet-based tests (UITs; Tippins et al., 2006), which are preemployment tests taken from an unproctored computing device at the applicant's convenience (Coyne & Bartram, 2006), have often been referenced as a potential tool for increasing the diversity of an applicant pool (Landers & Sackett, 2012; Lawrence & Kinney, 2017; Naglieri et al., 2004; Scott & Lezotte, 2012; Tippins, 2009). Two-thirds of employers who use preemployment selection tests now offer unproctored testing options (Fallaw et al., 2009). Commonly listed and tested advantages of UITs over proctored tests include reduced cost, increased convenience, increased data collection speed, and measurement equivalence (Beaty et al., 2011; Delgado et al., 2009; Do et al., 2005; Templer & Lange, 2008). However, no published research to our knowledge has empirically tested a direct effect of UIT deployment on applicant pool diversity (Tippins et al., 2006).

Given this gap, we posed the present study's core causal research question: if an organization adds UIT to an employee selection system without UIT, will the diversity of its applicant pool increase? Because random assignment of authentic job applicants to experience the availability of UIT or not is likely not feasible due to issues of fairness and logistics, a quasiexperimental design is required instead, one that carefully identifies threats to causal inference and addresses each through statistical or methodological means. Unfortunately, the employee selection research literature is not particularly informative as to the best approach in this situation, given the complex temporal confounds common to such data. Specifically, application rates in most organizations are highly temporally variant, combining linear trends, cyclical variation, and seasonal variation. Despite numerous calls in the selection literature for increased exploration of high-complexity longitudinal trends, we could identify no studies in the selection literature that actually incorporated or demonstrated an analytic approach appropriate to such data (Ployhart et al., 2017; Ryan & Ployhart, 2014).

Thus, a secondary purpose of the present article became to introduce the selection literature to a more powerful longitudinal analytic framework appropriate for examining long-term changes in application rates and to encourage its adoption in more selection research. Among available options, we identified interrupted time series analysis as the most powerful framework (cf. Marcantonio & Cook, 1994; Pickup, 2014; West et al., 2000) for addressing such research questions and focus upon it here. Despite its value, time series analysis has only recently begun to gain popularity in organizational science more broadly (see Jebb & Tay, 2017); in the present article, we introduce a less common variation, one that focuses upon interruptions, that is particularly useful in selection research. In an interrupted time series design, applicant pool diversity is observed repeatedly for an extended time before and after the introduction of UIT. If a change in outcome is observed at the time of the intervention while simultaneously modeling and controlling for a variety of theoretically suggested temporal effects, it is reasonable to infer some degree of causal influence (Pearl, 2000).

In the present study, we analyzed over 8 years of change in applicant pool diversity using an interrupted time series design and analytic approach, evaluating changes before and after an organization’s switch from proctored-only tests to proctored-plus-unproctored testing. Additionally, although the original UITs could be completed from any mobile or nonmobile device, mobile blocking was implemented approximately 1 year later. Thus, in addition to assessing change due to the implementation of UIT, we also examined the effect of mobile device blocking on applicant pool diversity as a secondary research question. Utilizing an interrupted time series framework to analyze these two events, we were thus able to provide some support for the causal impacts of selection system policy changes on applicant diversity, and in describing our study, we also introduce interrupted time series analysis as a powerful analytic framework for understanding the longitudinal diversity-related effects of employee selection system changes, in real-world samples where researcher control is limited by practical and ethical constraints. Additionally, although race, ethnicity, gender, and other subgroups are often considered as focal comparison groups in diversity research, we focused this study specifically on Black–White applicant comparisons given both the depth and breadth of previous research on Black–White applicant comparisons in United States-based employee selection research (e.g., Dahlke & Sackett, 2017; Sackett et al., 2021). In the context of employee selection, inequity between Black and White applicants is the dominant focus of diversity research given both commonly observed mean score differences on traditional selection assessments and the large population of Black people in the United States relative to other subgroups (e.g., Bobko & Roth, 2013).

Unproctored Internet Testing and Diversity

Theoretically speaking, UITs can affect diversity through two types of causal effects: explicit and implicit (Tippins et al., 2006). Explicit effects of UITs refer to causal effects of the technology on test performance, particularly those associated with subgroup differences. For example, if a test had an explicit effect on organizational diversity, it would be because subgroup differences in test scores used for selection resulted in subgroup differences in selection rates. The second type, implicit effects, refers to changes in the makeup of the applicant pool caused by differences in access and restrictions associated with a switch to UIT (Tippins et al., 2006) and is the focus of the present study. Although existing research is limited, we focus here on two implicit effects likely to be of larger magnitude than others: differences in access to on-site tests and differences in test environment perceptions.

First, in the United States, the deployment of UITs can theoretically reduce access barriers for individuals who would not be able to attend an on-site proctored test. In the earlier days of UIT, when socioeconomic status was more strongly positively correlated with access to the internet (e.g., in 2000, 53% of White Americans had home internet access compared to 38% of Black Americans; Pew Research Center, 2021a, 2021b), a switch to UIT risked systematically excluding racial and ethnic minorities from applying to jobs (Redl, 2018; Tippins, 2015). As internet access has improved nationally, these differences have become smaller. As of 2021, 90% of Americans used the internet, with relatively small racial differences in overall access (93% of White and 91% of Black Americans) and in broadband access (80% of White and 71% of Black Americans; Pew Research Center, 2021a, 2021b). Assuming this trend continues, it will be increasingly unlikely that offering a UIT prevents minority applicants from completing tests. Instead, concerns have shifted: the ability to travel to proctored testing sites, including access to transportation, long distances from testing sites, the ability to take time off of work, and the ability to obtain childcare (e.g., Malik et al., n.d.; National Equity Atlas, 2015a, 2015b, 2015c), is now considered the more important and problematic impediment to completing a proctored test. By making a UIT available, an organization potentially removes such access barriers, implicitly enabling a more diverse pool of applicants.

Another potential contributor to the implicit effects of UIT implementation on applicant pool diversity relates to differences in perceptions of unproctored testing experiences, although both theory and empirical evidence are limited. In one study, Weiner and Morrison (2009) investigated racial differences in applicant and incumbent reactions to unproctored testing environments. In their study, applicants completed an unproctored noncognitive selection assessment either remotely or on-site and then reported on the favorability of their testing environment. Black applicants rated several aspects of their remote unproctored testing environment directionally (but not significantly) more favorably than did White applicants and significantly more favorably on the noise dimension, suggesting some potential favorability in perceptions of UIT environments by Black applicants. Although the study's authors do not speculate on the cause of these differences, they do question their potential impacts on the validity and fairness of the assessment across environments, suggesting that differential perceptions of UITs across demographic subgroups could be related to applicant pool diversity.

Given the commonly cited assumption that UITs are related to increased applicant pool diversity (Lawrence & Kinney, 2017; Naglieri et al., 2004; Scott & Lezotte, 2012; Tippins, 2009), despite limited evidence regarding differences in access and perceptions, we hypothesized:

Hypothesis 1: The provision of unproctored test availability will increase the number of minority (Black) applicants overall.

Hypothesis 2: The provision of unproctored test availability will increase the average number of minority (Black) applicants over time.

Mobile Devices and Diversity

A side effect of offering UITs is the ability of test takers to take the test from a range of possible devices, resulting in differences in structural characteristics related to operating systems, screen size, and internet connection (Arthur et al., 2018). One major difference observed between UIT test takers is the use of mobile (e.g., phone or tablet) versus nonmobile (i.e., desktop or laptop) devices (King et al., 2015). Mobile assessments have been growing in popularity (Illingworth et al., 2015) and are thought to be more convenient and provide test accessibility to those without access to nonmobile devices (King et al., 2015; Walker & Moretti, 2018). However, concerns about measurement inequivalence due to differences in screen resolution, display size, assessment adaptation, and other technology and individual difference influences have also been raised. Similar to broader research on UITs, researchers have primarily focused on measurement equivalence (i.e., explicit effects) across tests taken on mobile and nonmobile devices. Generally, evidence has supported measurement equivalence for many noncognitive measures but is mixed for cognitive tests (Arthur et al., 2014, 2018; Morelli et al., 2014).

Researchers have started to explore the potential subgroup-related consequences of UITs taken using mobile and nonmobile devices. For example, Lawrence et al. (2013) found no evidence of adverse impact (i.e., a disproportionately negative impact of a hiring practice on subgroups within legally protected classes; Hough et al., 2001) related to the completion of assessments on mobile devices, and several studies have found evidence that Black, Hispanic, and female applicants were more likely to use mobile devices for testing (Arthur et al., 2014; McClure Johnson & Boyce, 2015). Similarly, Golubovich and Boyce (2013) found that when given the choice between on-site, home computer, and mobile device testing, ethnic and racial minorities were more likely to use mobile devices to complete those assessments. Given this evidence, along with data suggesting that minorities and younger people are generally more dependent on internet-enabled smartphones (Smith, 2015), we concluded that implementing mobile device blocking would likely reduce the number of minority applicants.

Hypothesis 3: The implementation of mobile blocking will reduce the number of minority (Black) applicants overall.

Hypothesis 4: The implementation of mobile blocking will reduce the average number of weekly minority (Black) applicants over time.

Supporting Causal Inference With Observational Data

For organizational decision-making, the most central research questions of interest are causal in nature; practitioners need to know what will be different in their organization if an organizational policy is changed. Although randomized control trials have long been considered a gold standard for causal inference, there are many research questions for which such a research design is highly impractical. Offering unproctored testing options to some applicants and not others within an organization with a goal of increasing applicant pool diversity is impractical from an organizational perspective and potentially harmful to applicant perceptions of fairness (Hausknecht et al., 2004). Identifying numerous organizations that use the same employment test and randomly assigning the option of an unproctored test would likewise be practically difficult. When randomization is not possible or impractical, quasiexperimental designs can be used to strengthen the validity of causal inferences (Marcantonio & Cook, 1994; West et al., 2000). Although strong causal claims cannot be made based on evidence from a quasiexperimental design, stronger claims about causal inference can be made in comparison to claims made using cross-sectional or correlational evidence, which today makes up all published UIT research sampling authentic applicants.

Interrupted time series analysis is particularly valuable in the context of the present study because it enables the examination of the explanatory effects of an intervention (i.e., introducing UITs or mobile blocking). This is done by controlling for historical effects as well as isolating the immediate postinterruption effects, based on the assumption that another causal influence is unlikely to occur at the exact same time as the interruption of interest. In contrast with traditional longitudinal analytic approaches (e.g., latent growth modeling, random coefficients modeling), in which data are collected cross-sectionally or at only a few points in time, interrupted time series analysis involves collecting data at many points in time, generally at least 20 discrete measurements but up to and including continuous measurement (McCleary et al., 1980), before and after an intervention. Most critically, time series analysis does not require a priori specification of time series components, including trend, seasonal, cyclical, and random variation effects. Traditional longitudinal modeling approaches do not allow for the modeling of theory-driven effects simultaneously with atheoretical time effects; instead, all effects must be specified a priori or ignored, as is presently common. As a result, researchers relying on these approaches typically propose simplistic longitudinal models with few time points to simplify model decision-making. Unfortunately, simplifying models for researcher convenience harms model accuracy, particularly when the complexity of time-related effects is large, as is common with observations of authentic real-world phenomena. In such situations, a modeling approach combining theory-based components with the empirical modeling of time enables better isolation of theoretically meaningful effects from the effects of time (Jebb et al., 2015; Jebb & Tay, 2017; McCleary et al., 1980) and is preferred. Thus, in the present study, we demonstrate the value of interrupted time series for testing hypotheses in high-complexity longitudinal selection data.

The analytic goal in the present study is to decompose the longitudinal data patterns to estimate or control for the various forces shaping the data so that the force of interest (i.e., the applicant count trend before and after the interruptions) can be examined more clearly. Time series data can be decomposed into a variety of components, including trends, seasonal effects, and irregular variation. The ideal decomposition process is driven by one of two possible research goals in time series design and analysis: forecasting and explanation. Differentiating between these goals is critical for deciding the optimal time series analysis approach. For example, for predictive modeling approaches, a certain family of time series analysis (i.e., autoregressive moving average modeling) is almost always most appropriate (Jebb et al., 2015). For descriptive or explanatory modeling, as in the case of the present study, identifying trends is the primary interest, and trends are modeled such that sample parameter estimates are unbiased estimates of population values (Jebb et al., 2015).

Method

Participants

Participants were 24,963 applicants for entry-level production operator and maintenance positions in a major U.S.-based manufacturing organization between December 2010 and March 2019 who identified as either Black or White. Although data were available prior to December 2010, they were not consistently recorded and exhibited difficult-to-diagnose missingness. Additionally, the overall base rate of application in that earlier period was much lower than in the later sample due to the organization's evolving personnel needs over time, creating heterogeneity concerns with the full sample. Thirty-nine percent of applicants self-identified as Black and 13% of applicants self-identified as women. Table 1 summarizes these and other applicant demographic characteristics before and after each intervention.

Table 1
Applicant Counts by Race, Proctored Versus Unproctored, and Mobile Versus Nonmobile

| Group | Before UITs | After UITs & before mobile blocking | After mobile blocking | Total |
| --- | --- | --- | --- | --- |
| Proctored | | | | |
|  Black | 4,200 | 317 | 154 | 4,671 |
|  White | 7,638 | 248 | 149 | 8,035 |
|  Total | 11,838 | 565 | 303 | 12,706 |
| Unproctored | | | | |
|  Black | 0 | 847 | 4,282 | 5,129 |
|  White | 0 | 1,633 | 5,495 | 7,128 |
|  Total | 0 | 2,480 | 9,777 | 12,257 |
| Mobile | | | | |
|  Black | 0 | 90 | 0 | 90 |
|  White | 0 | 130 | 0 | 130 |
|  Total | 0 | 220 | 0 | 220 |
| Nonmobile | | | | |
|  Black | 4,895 | 1,074 | 4,435 | 10,404 |
|  White | 9,429 | 1,751 | 5,644 | 16,824 |
|  Total | 14,324 | 2,825 | 10,079 | 27,228 |

Note. UITs = unproctored internet tests.

Design

As part of the employee selection process for a manufacturing facility, applicants completed a manufacturing assessment battery consisting of measures of 10 competencies across several methods of measurement (e.g., personality items, situational judgment, simulations, problem-solving) administered by an international management consulting firm. The battery took about an hour to complete. Applicants who passed an initial online application and upfront screening were invited to schedule an on-site assessment administration, which was proctored by HR administrators.

Data were examined across 430 weekly periods. This unit of time (weekly) was chosen as a practical balance between highly frequent periods (i.e., days), which can be difficult to conceptualize over an 8-year span, and low-frequency periods (i.e., months, years), which do not allow for as much granularity or statistical power (Jebb & Tay, 2017). Until June 2015, the assessment battery was administered only on-site and proctored at manufacturing facilities. However, in June 2015 (Week 234), UIT was made available to applicants. Over the course of the entire study, applicants who passed the initial online application and upfront screening stage were invited via email to participate further; after Week 234, they could choose to do so via UIT or by appearing for an on-site proctored administration, whereas before Week 234, their only option was to continue in a proctored setting. In May 2016 (Week 284), mobile blocking was put in place, preventing applicants from completing the UIT on a mobile device; applicants who attempted to do so encountered an error. Applicants applied for positions at two separate manufacturing sites; however, the test and intervention timelines were identical across sites. Site-related effects were controlled for in all models.

Tests identical in content were given to applicants regardless of administration method, and assessment competency coverage was the same across time periods, though there were slight technical enhancements to the assessment between the proctored and UIT time periods; for example, the original platform utilized Flash to deliver dynamic web content and was ported to HTML5 for use in UIT. The organization indicated that no recruiting efforts specifically targeting an increase in Black applicants were employed over the course of the entire study period.

Results

To test each hypothesis, a series of steps typical of time series analysis was conducted prior to fitting the final model. First, the raw time series data were visualized (Figure 1) to identify complex patterns that might be useful when modeling. No clear patterns emerged. Second, the time series was decomposed into its components (trend, seasonality, error; Figure 2, Figure 3, Figure 4, Figure 5) using stl() from the stats R package (R Core Team, 2019). Additive decomposition was conducted separately for Black and White applicants by site, because seasonal effects can only be removed at the subgroup level: it is not possible to know the race or site of the applicants who would be removed through seasonal adjustment of the pooled series. The proportion of variance explained by each component was then calculated. For White applicants at Site 1, the trend explained 25% of the total variance, seasonal effects explained 16%, and error explained 47%. For White applicants at Site 2, the trend explained 57% of the total variance, seasonal effects explained 14%, and error explained 27%. For Black applicants at Site 1, the trend explained 43% of the total variance, seasonal effects explained 11%, and error explained 33%. For Black applicants at Site 2, the trend explained 47% of the total variance, seasonal effects explained 12%, and error explained 34%. Because seasonal effects were not of interest, they were removed from both sets of applicants, and the seasonally adjusted applicant count was used in subsequent analyses. Figure 6 graphs the seasonally adjusted time series data.

Figure 1

Raw Applicants Over Time by Group
Note. UIT offered in Week 234; mobile blocking began Week 284. UIT = unproctored internet testing.

Figure 2

Time Series Decomposition for White Applicants at Site 1
Note. Time series decomposition for White applicants at Site 1 including raw applicant count data, seasonal trend in applicant count data, estimated trend, and residuals.

Figure 3

Time Series Decomposition for Black Applicants at Site 1
Note. Time series decomposition for Black applicants at Site 1 including raw applicant count data, seasonal trend in applicant count data, estimated trend, and residuals.

Figure 4

Time Series Decomposition for White Applicants at Site 2
Note. Time series decomposition for White applicants at Site 2 including raw applicant count data, seasonal trend in applicant count data, estimated trend, and residuals.

Figure 5

Time Series Decomposition for Black Applicants at Site 2
Note. Time series decomposition for Black applicants at Site 2 including raw applicant count data, seasonal trend in applicant count data, estimated trend, and residuals.

Figure 6

Seasonally Adjusted Applicant Counts Over Time by Group
Note. UIT offered in Week 234; mobile blocking began Week 284. UIT = unproctored internet testing.

Next, the model was estimated using interrupted time series regression. In addition to modeling each component, interrupted time series regression allows for the examination of the effects of an event or intervention on the level and slope of the trend component (Glass et al., 1975). Modeling the interruption statistically involves segmented regression, such that separate pre- and postevent segments are estimated (Wagner et al., 2002). The basic interrupted time series model is:

y_{t} = b_{0} + b_{1} \times t + b_{2} \times \text{event}_{t} + b_{3} \times t_{\text{after event}} + \varepsilon_{t},

where y_t is the mean of the outcome per period, b_0 is the baseline level of the outcome at time zero, b_1 is the change in the outcome mean per time period before the intervention, t is the time that occurred prior to the intervention, b_2 is the level change in the outcome mean after the intervention, event_t is the indicator of the intervention (i.e., time before intervention = 0, time after intervention = 1), b_3 is the estimated change in the outcome trend after the intervention compared to the initial trend, t_after event is the number of periods that occurred after the intervention (i.e., 1, 2, 3, 4, etc.), and ε_t is the remaining unexplained variance (Jebb et al., 2015). For each estimated level and trend change, baseline level and trend are controlled for. The present study uses this approach to examine the effects of two interventions on immediate and long-term changes in applicant pool diversity. Additionally, site terms were added as control variables to control for site-related differences over time and after interventions.
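To make the segmented design concrete, the following Python sketch builds the three time-related predictors for a single interruption and recovers the level change (b_2) and slope change (b_3) from simulated data with ordinary least squares. The interruption at week 234 mirrors the study's timeline, but the outcome data and true coefficients are invented for illustration, and the study itself used negative binomial rather than ordinary least squares regression.

```python
import numpy as np

# Segmented-regression design for one interruption, following
# y_t = b0 + b1*t + b2*event_t + b3*(t after event) + e_t.
rng = np.random.default_rng(1)
n, interruption = 430, 234                    # 430 weeks; intervention at week 234
t = np.arange(n)
event = (t >= interruption).astype(float)     # 0 before, 1 after the intervention
t_after = np.where(event == 1, t - interruption + 1, 0.0)  # 1, 2, 3, ... post-event

# Hypothetical true parameters: slight upward trend, an immediate level drop,
# and a steeper slope after the intervention.
b0, b1, b2, b3 = 10.0, 0.01, -4.0, 0.05
y = b0 + b1 * t + b2 * event + b3 * t_after + rng.normal(0, 1, n)

# Ordinary least squares on the segmented design matrix.
X = np.column_stack([np.ones(n), t, event, t_after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef[2] estimates the immediate level change; coef[3] the change in slope.
```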

We implemented negative binomial regression because the outcome was a count of applicants and there was evidence of overdispersion but not a substantial number of zero values (dispersion = 4.9; proportion of zero values = 7%; Blevins et al., 2015). There were two interruptions (UIT implementation and mobile blocking), a moderator (race), and a control (site). Race was dummy coded (0 = White, 1 = Black), and site was coded as −1 and 1. Wagner et al. (2002) and Hartung et al. (2015) demonstrated extensions of traditional segmented regression in interrupted time series analysis, including adding a moderator, a control, and multiple interruptions. We modified their approach to fit our study parameters to create the study's initial model, the results of which can be found in Table 2.
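The overdispersion check that motivates a negative binomial specification over Poisson can be sketched as follows. The counts here are simulated, so the resulting values only illustrate the kind of evidence the article reports (variance well above the mean, with few zero-count weeks).

```python
import numpy as np

# Simulated overdispersed weekly counts (negative binomial draws, so the
# variance exceeds the mean by construction); parameters are illustrative.
rng = np.random.default_rng(2)
counts = rng.negative_binomial(n=3, p=0.2, size=430)

mean, var = counts.mean(), counts.var(ddof=1)
dispersion_ratio = var / mean      # ~1 for Poisson; >> 1 indicates overdispersion
zero_prop = (counts == 0).mean()   # share of zero-count weeks; if large, a
                                   # zero-inflated model might be needed instead
```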

Table 2
Results of Negative Binomial Segmented Regression Interrupted Time Series Analysis Predicting Applicant Count Before and After Each Intervention

| Predictor | B | SE | z | p | 95% CI LB | 95% CI UB |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 2.57 | 0.05 | 53.71 | .000 | 2.47 | 2.67 |
| Time (weeks) | 0.00 | 0.00 | 5.61 | .000 | 0.00 | 0.00 |
| Site (control) | −0.19 | 0.03 | −6.54 | .000 | −0.25 | −0.13 |
| Race | −0.58 | 0.04 | −13.47 | .000 | −0.66 | −0.49 |
| Intervention 1 | −0.44 | 0.14 | −3.15 | .002 | −0.72 | −0.16 |
| Time after Intervention 1 | 0.01 | 0.00 | 2.80 | .005 | 0.00 | 0.02 |
| Intervention 2 | −0.15 | 0.15 | −0.99 | .323 | −0.45 | 0.15 |
| Time after Intervention 2 | −0.02 | 0.00 | −4.27 | .000 | −0.03 | −0.01 |
| Site × Race | 0.08 | 0.04 | 1.84 | .066 | −0.01 | 0.16 |
| Race × Intervention 1 | 0.27 | 0.19 | 1.41 | .159 | −0.11 | 0.66 |
| Race × Time after Intervention 1 | −0.01 | 0.01 | −0.98 | .326 | −0.02 | 0.01 |
| Race × Intervention 2 | 0.11 | 0.21 | 0.49 | .623 | −0.32 | 0.54 |
| Race × Time after Intervention 2 | 0.01 | 0.01 | 0.94 | .348 | −0.01 | 0.02 |
| Site × Intervention 1 | 0.77 | 0.14 | 5.67 | .000 | 0.50 | 1.04 |
| Site × Time after Intervention 1 | 0.01 | 0.00 | −2.79 | .005 | −0.02 | 0.00 |
| Site × Intervention 2 | −0.02 | 0.15 | −0.12 | .904 | −0.31 | 0.28 |
| Site × Time after Intervention 2 | 0.01 | 0.00 | 1.09 | .275 | 0.00 | 0.01 |
| Site × Race × Intervention 1 | −0.23 | 0.19 | −1.20 | .231 | −0.62 | 0.15 |
| Site × Race × Time after Intervention 1 | 0.00 | 0.01 | 0.73 | .463 | −0.01 | 0.02 |
| Site × Race × Intervention 2 | 0.01 | 0.21 | 0.05 | .961 | −0.42 | 0.44 |
| Site × Race × Time after Intervention 2 | −0.01 | 0.01 | −1.69 | .090 | −0.02 | 0.00 |

Note. Race is coded as 1 = Black, 0 = White. Site is included as a control variable, coded as −1 and 1 so that terms of interest reflect an average across sites. SE = standard error; CI = confidence interval; LB = lower bound; UB = upper bound.

Lastly, model residuals were assessed for autocorrelation. Autocorrelation occurs when a variable correlates with itself, that is, when observations are serially dependent (Pickup, 2014), a common issue in time series data. Because autocorrelation violates the standard assumption of independent, uncorrelated errors (McCleary et al., 1980), standard errors become biased and predictive accuracy is reduced (Jebb et al., 2015). In an autocorrelation function (ACF) plot, used to diagnose this issue, the y axis shows the strength of the autocorrelation and the x axis shows the length of the lag. Partial autocorrelation functions (PACFs) can also be calculated by estimating the correlation of the time series with previous points while controlling for the linear dependence of the lags in between (McCleary et al., 1980). The ACF and PACF plots can be viewed in Figure 7 and Figure 8. Based on the plots, there was some evidence of significant autocorrelation; in other words, the correlation between the applicant count at one time point and applicant counts at prior time points was statistically significant, indicating that autocorrelation was potentially a significant problem in this model. Autocorrelation can also be tested statistically using the Durbin–Watson statistic, which tests for serial autocorrelation of the error terms (Durbin & Watson, 1951). We conducted a Durbin–Watson test, which suggested the presence of autocorrelation in the model (DW = 1.23, p < .01). In combination, the ACF/PACF plots and the Durbin–Watson test suggested the presence of problematic autocorrelation in this time series. Because autocorrelation can bias standard errors by violating the assumption of independent, uncorrelated residuals, we estimated new standard errors using heteroskedasticity and autocorrelation consistent (HAC) estimation of the covariance matrix via the vcovHAC() function in the sandwich R package (Hardin, 1998; Newey & West, 1987, 1994; Zeileis, 2004).
This approach estimates a new covariance matrix from the fitted model to account for autocorrelation and heteroskedasticity in time series data. For the interpretation of hypothesis tests, we thus relied upon inferences drawn from HAC estimates rather than from theoretical negative binomial standard error distributions. Results of this model can be found in Table 3 and are plotted in Figure 9.
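Both diagnostics described above are straightforward to compute from a residual series. The following Python sketch is illustrative only (the study itself used R); it implements the Durbin–Watson statistic and the lag-k sample autocorrelation that an ACF plot displays.

```python
import math

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the residual sum of squares. Values near 2 suggest no
    first-order autocorrelation; values well below 2 (like the DW = 1.23
    reported above) suggest positive autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x ** 2 for x in e)

def acf(e, lag):
    """Sample autocorrelation at the given lag (the quantity plotted on
    the y axis of an ACF plot)."""
    n = len(e)
    m = sum(e) / n
    c0 = sum((x - m) ** 2 for x in e)
    ck = sum((e[t] - m) * (e[t + lag] - m) for t in range(n - lag))
    return ck / c0

# A slowly varying series is strongly positively autocorrelated:
smooth = [math.sin(t / 5) for t in range(100)]
dw_smooth = durbin_watson(smooth)  # far below 2
r1_smooth = acf(smooth, 1)         # close to +1
```

A rapidly alternating series would instead yield a Durbin–Watson value near 4 and a lag-1 autocorrelation near −1, the signature of negative autocorrelation.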

Figure 7

Autocorrelation Function Plot
Note. ACF = autocorrelation function.

Figure 8

Partial Autocorrelation Function Plot
Note. PACF = partial autocorrelation function.

Table 3
Results of Negative Binomial Segmented Regression Interrupted Time Series Analysis With Sandwich Estimators Predicting Applicant Count Before and After Each Intervention

| Predictor | B | SE | z | p | 95% CI LB | 95% CI UB |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 2.57 | 0.141 | 8.18 | .000 | 2.29 | 2.85 |
| Time (weeks) | 0.00 | 0.00 | 1.90 | .057 | 0.00 | 0.00 |
| Site (control) | −0.19 | 0.07 | −2.64 | .008 | −0.34 | −0.05 |
| Race | −0.58 | 0.11 | −5.47 | .000 | −0.79 | −0.37 |
| Intervention 1 | −0.44 | 0.17 | −2.64 | .008 | −0.77 | −0.11 |
| Time after Intervention 1 | 0.01 | 0.00 | 3.65 | .000 | 0.01 | 0.02 |
| Intervention 2 | −0.15 | 0.10 | −1.40 | .162 | −0.35 | 0.06 |
| Time after Intervention 2 | −0.02 | 0.00 | −5.54 | .000 | −0.03 | −0.01 |
| Site × Race | 0.08 | 0.11 | 0.75 | .455 | −0.13 | 0.29 |
| Race × Intervention 1 | 0.27 | 0.18 | 1.54 | .123 | −0.07 | 0.62 |
| Race × Time after Intervention 1 | −0.01 | 0.01 | −1.23 | .218 |  | 0.00 |
| Race × Intervention 2 | 0.11 | 0.19 | 0.55 | .583 | −0.27 | 0.48 |
| Race × Time after Intervention 2 | 0.01 | 0.01 | 1.10 | .271 | 0.00 | 0.02 |
| Site × Intervention 1 | 0.77 | 0.13 | 5.75 | .000 | 0.51 | 1.03 |
| Site × Time after Intervention 1 | −0.01 | 0.00 | −3.74 | .000 | −0.02 | −0.01 |
| Site × Intervention 2 | −0.02 | 0.10 | −0.17 | .864 | −0.22 | 0.19 |
| Site × Time after Intervention 2 | 0.01 | 0.00 | 1.41 | .157 | 0.00 | 0.01 |
| Site × Race × Intervention 1 | −0.23 | 0.18 | −1.31 | .190 | −0.58 | 0.12 |
| Site × Race × Time after Intervention 1 | 0.00 | 0.01 | 0.92 | .357 | −0.01 | 0.01 |
| Site × Race × Intervention 2 | 0.01 | 0.19 | 0.06 | .956 | −0.37 | 0.39 |
| Site × Race × Time after Intervention 2 | −0.01 | 0.01 | −1.98 | .048 | −0.02 | 0.00 |

Note. Race is coded as 1 = Black, 0 = White. Site is included as a control variable, coded as −1 and 1 so that terms of interest reflect an average across sites. Coefficients estimated using heteroskedasticity and autocorrelation consistent (HAC) estimation of the covariance matrix. SE = standard error; CI = confidence interval; UB = upper bound; LB = lower bound.
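Because the model is a negative binomial regression, the coefficients in Table 3 are on the log scale of the expected weekly applicant count, so exponentiating a coefficient yields a multiplicative rate ratio. The short Python sketch below is an illustrative conversion of a few Table 3 estimates, not part of the original R analysis.

```python
import math

def rate_ratio(b):
    """Exponentiate a log-scale negative binomial coefficient to get the
    multiplicative change in the expected weekly applicant count."""
    return math.exp(b)

# Selected coefficients from Table 3:
irr_uit_level = rate_ratio(-0.44)  # immediate level change when UIT was added
irr_race = rate_ratio(-0.58)       # Black relative to White weekly counts
irr_uit_slope = rate_ratio(0.01)   # per-week multiplicative trend change after UIT
```

For example, the Intervention 1 estimate implies that expected counts dropped to roughly 64% of their prior level immediately after UIT was introduced, before the steeper upward trend described in the exploratory analyses took hold.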

Figure 9

Predicted Applicant Counts Over Time Using Interrupted Time Series Negative Binomial Regression
Note. UIT offered in Week 234; mobile blocking began Week 284. UIT = unproctored internet testing.

Hypothesis Tests

To test Hypothesis 1, that the provision of unproctored test availability would increase the number of Black applicants overall, we assessed the interaction between race and the intercept change after the first intervention. There were no significant racial differences in applicant count changes immediately after the first intervention; thus, Hypothesis 1 was not supported. To test Hypothesis 2, that the provision of unproctored test availability would increase the average number of minority (Black) applicants over time, we examined the interaction between race and time elapsed after the first intervention. There were no significant racial differences in applicant count changes over time after the first intervention; thus, Hypothesis 2 was also not supported. To test Hypothesis 3, that the implementation of mobile blocking would reduce the number of Black applicants overall, we assessed the interaction between race and the intercept change after mobile blocking was implemented. There were no significant racial differences in applicant count changes immediately after the second intervention, so Hypothesis 3 was not supported. To test Hypothesis 4, that the implementation of mobile blocking would reduce the average number of weekly Black applicants over time, we examined the interaction between race and time elapsed after the second intervention. Although Figure 9 visually suggests an interaction between race and applicant rates over time after mobile blocking was implemented, this effect was not statistically significant; thus, Hypothesis 4 was not supported.
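Each of these tests involves the interaction between a race dummy and one of the segmented-regression time terms. As a sketch of how such terms are typically coded, using the intervention weeks reported in Figure 9 (UIT offered in Week 234; mobile blocking beginning Week 284), with the caveat that the exact coding in the original analysis is an assumption here:

```python
UIT_WEEK = 234    # Intervention 1: UIT option added (Figure 9)
BLOCK_WEEK = 284  # Intervention 2: mobile blocking began (Figure 9)

def its_terms(week):
    """Return (time, int1, time_after_1, int2, time_after_2) for one week,
    the standard segmented-regression predictors for a two-intervention
    interrupted time series."""
    int1 = 1 if week >= UIT_WEEK else 0
    int2 = 1 if week >= BLOCK_WEEK else 0
    return (
        week,                       # overall linear trend
        int1,                       # level (intercept) change at UIT launch
        max(0, week - UIT_WEEK),    # slope change after UIT
        int2,                       # level change at mobile blocking
        max(0, week - BLOCK_WEEK),  # slope change after mobile blocking
    )
```

The race-moderation terms tested above are then simply products of the race dummy with these predictors (e.g., Race × Intervention 1, Race × Time after Intervention 2).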

Exploratory Analyses

Although race did not significantly moderate changes in applicant counts after each intervention, we observed an increase of similar magnitude across racial groups when UIT was offered, as shown in Figure 9. Thus, we next examined changes in applicant counts without differentiating by race (i.e., the main effects). In this analysis, we observed both a significant intercept change and a significant slope change after the first intervention (UIT option; B = −0.44, p = .008, and B = 0.01, p < .001, respectively) and a significant slope change after the second intervention (mobile blocking; B = −0.02, p < .001). Specifically, after UITs were introduced, overall applicant counts initially dropped but then increased more rapidly than before UIT. Implementing mobile blocking did not immediately affect applicant counts but did slow the application rate over time. Neither of these effects was evident in the raw data (Figure 1).

Discussion

Given the importance of recruiting strategies and selection tools to the diversity of organizations, this study explored the potential of introducing UITs as an intervention to increase the diversity of applicant pools, with a specific focus on Black and White applicants. Within this context, the present study makes four major contributions. First, in contrast to the assumptions of many discussing UITs in the literature, there were no meaningful changes in applicant count by race immediately following the implementation of UITs. Specifically, there was no evidence of increased applicant pool diversity after the implementation of UITs. Second, there also appeared to be no racial differences in the impact of blocking mobile devices. Third, although differences in counts and trends by race were generally not supported, both interventions were related to subsequent changes in trends across both Black and White applicants. Specifically, after an initial drop in applications, offering UITs seemed related to a steeper upward trend in applicants over time. This provides some causal evidence for an increase in overall applicant pool size attributable to UIT. Because increasing applicant pool size allows organizations to increase cutoff scores (i.e., the score thresholds used to make hiring decisions), the present empirical evidence combined with past simulation work (i.e., Landers & Sackett, 2012) suggests that adding UIT could improve job performance under certain system designs. Fourth, after implementing mobile blocking, this trend was dampened such that the overall applicant pool size increase over time was slowed. This suggests that implementing mobile blocking to an existing UIT may work against applicant pool expansion goals sought by practitioners when implementing UIT. Combined, this provides some causal evidence that UIT enlarges applicant pools and that mobile blocking dampens the rate of that enlargement.

There are three primary limitations that should be considered. First, because this is a quasi-experimental study without random assignment or a control group, causal claims are still tempered by several assumptions. Although interrupted time series designs help mitigate internal validity threats by controlling for history effects and providing precise analysis of pre- and postintervention effects (Cook & Campbell, 1979; Wagner et al., 2002), the observed trend changes may be related to other, potentially unidentified factors. For example, the effect of UIT could have been confounded by local demographic shifts, which were common in manufacturing-dominant towns over the analyzed decade, if major shifts occurred close to the date on which UIT was added. Similarly, the effect of mobile blocking could have been confounded by a shift to mobile devices that differed by subgroup, if that shift occurred close to the mobile blocking date. Second, the generalizability of these findings is limited to one large U.S.-based manufacturing organization and to one perspective on applicant pool diversity. It is possible that in other organizational settings, with different preemployment testing norms and applicant pool characteristics, these trends would differ. It is also possible that other characteristics of applicant pool diversity changed beyond the number of Black applicants, including among minority groups that were too small for us to analyze in this data set. Lastly, the underlying causes of the change in applicant count were not specifically investigated, which was not possible given the available observational data. It is unclear why applicants may or may not have decided to complete an application process before or after each intervention.

Future research should more directly investigate the causes of these differences, for example, by asking applicants directly why they prefer to take a proctored or unproctored selection test, or by asking applicants to report their motivation and the level of burden they perceive to be associated with in-person proctored tests versus UITs. Practitioners looking to increase the size of their applicant pools should implement UITs only after careful consideration of the risks involved (e.g., cheating on cognitively loaded tests, score differences between proctored and unproctored tests; Tippins et al., 2006; Wright et al., 2014). Additional consideration should be given to whether to implement mobile blocking. Although mobile blocking appears to be related to a reduction in the desired applicant pool expansion rate in a switch to UIT, there may be valid reasons to implement it, such as avoiding disadvantaging mobile test takers on timed, cognitively loaded assessments.

Given the importance of understanding the longitudinal effects of technology-related policy changes, whether in the job applicant context or more broadly, we call on researchers to apply these methods to data sets concerning other meaningful demographic classes, including gender, race more broadly than analyzed here, ethnicity, age, social class, parental status, and disability status. Interrupted time series analysis is a powerful framework to understand such longitudinal effects in real-world data, a technique that we suspect can be applied in numerous existing practitioner data sets to uncover valuable scientific insights. Each demographic characteristic, and all their interactions, potentially bring unique concerns related to the impact of new technologies on behavior that can be better understood with this method. Yet such modeling alone will be insufficient to understand the observed effects. A better understanding of the causal underpinnings of both main and interactive effects, including but not limited to differential access to on-site tests and perceptions of the testing environment across and among groups, should therefore be a major research goal. For example, unique barriers related to the interaction of race and gender may prevent people at certain intersections between them from taking advantage of new technologies, and the nature of these barriers likely cannot be fully understood by examining the main effects of race or gender alone nor through time series analysis alone. Better theory is needed, which will require deeper investigation.

In conclusion, this study provides stronger causal evidence than available in existing literature to support that adding a UIT option is likely to increase the size of an applicant pool and that the magnitude of this effect does not appear to differ between Black and White applicants. Furthermore, removing mobile as an option in UIT is likely to dampen the applicant pool benefits of UIT, although the effects of adding mobile support to an existing UIT system without it remain unclear. Most critically, this study contradicts the popular notion that adding UITs to a selection system increases the diversity of an applicant pool (albeit limited to this study’s focus on Black job applicants in American organizations). Further, it contradicts the notion that restricting mobile use is necessarily more harmful to Black applicants than White applicants. In short, if organizational leaders want to meaningfully increase diversity in their organizations, simple technology policy changes alone, in the absence of more authentic, transformative change, appear broadly insufficient.

