Appendix A Parameter Inference and Assessing Validity of Instruments

Parameter Inference

Under the assumption of constant error variance, asymptotic theory (e.g., see Hayashi, 2000) implies that,

$$\begin{aligned} \sqrt{n}(\hat{\mathbf{b}}_j - \mathbf{b}_j) \sim \mathcal{N}\left( 0, \hat{\sigma}_j^2 \left( \mathbf{S}_{vzj}^{\prime} \mathbf{S}_{vvj}^{-1} \mathbf{S}_{vzj} \right)^{-1} \right) \end{aligned}$$ (A1)

where the estimator for the conditional error variance is,

$$\begin{aligned} \hat{\sigma}_j^2 = \frac{n-1}{n}\left( s_j^2 - 2\mathbf{S}_{xzj}^{\prime} \hat{\mathbf{b}}_j + \hat{\mathbf{b}}_j^{\prime} \mathbf{S}_{zz} \hat{\mathbf{b}}_j \right) \end{aligned}$$ (A2)

and \(s_j^2\) is the sample variance of \(X_j\), n is the sample size, \(\mathbf{S}_{xzj}\) is a vector of covariances between \(X_j\) and \(\mathbf{Z}\), and \(\mathbf{S}_{zz}\) is the variance–covariance matrix of \(\mathbf{Z}\).
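The estimator behind Eqs. A1–A2 can be expressed directly in terms of these sample covariances. The following NumPy sketch is ours, not the authors' code; the function name and the convention that rows are observations are assumptions, and the notation (`S_vv`, `S_vz`, etc.) mirrors Appendix A:

```python
import numpy as np

def tsls_equation(x, Z, V):
    """2SLS for a single equation: outcome x (n,), regressors Z (n, q),
    instruments V (n, k).  Returns the coefficient estimate, the error
    variance of Eq. A2, and the asymptotic covariance of the estimate
    implied by Eq. A1.  (Hypothetical helper; names follow Appendix A.)"""
    n = len(x)
    xc = x - x.mean()
    Zc = Z - Z.mean(axis=0)
    Vc = V - V.mean(axis=0)
    S_vv = Vc.T @ Vc / (n - 1)          # instrument covariance matrix
    S_vz = Vc.T @ Zc / (n - 1)          # instrument-regressor covariances
    S_vx = Vc.T @ xc / (n - 1)          # instrument-outcome covariances
    S_zz = Zc.T @ Zc / (n - 1)
    S_xz = Zc.T @ xc / (n - 1)
    A = np.linalg.solve(S_vv, S_vz)     # S_vv^{-1} S_vz
    # b_hat solves (S_vz' S_vv^{-1} S_vz) b = S_vz' S_vv^{-1} S_vx
    b_hat = np.linalg.solve(S_vz.T @ A, A.T @ S_vx)
    s2 = xc @ xc / (n - 1)              # sample variance of x
    sigma2 = (n - 1) / n * (s2 - 2 * S_xz @ b_hat + b_hat @ S_zz @ b_hat)
    # Eq. A1 gives the covariance of sqrt(n)(b_hat - b); divide by n
    acov = sigma2 * np.linalg.inv(S_vz.T @ A) / n
    return b_hat, sigma2, acov
```

With strong instruments and a large sample, `b_hat` recovers the structural coefficients and `sigma2` the residual variance.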

Assessing Validity of Instruments

The question of whether a latent structure is adequate is generally translated into the statistical question of whether the model fits the data. There is a vast body of work on the development and evaluation of model fit indices for structural equation models (e.g., Browne & Cudeck, 2002; Fan & Sivo, 2005; Hu & Bentler, 1999; Lance, Beck, Fan, & Carter, 2016; MacCallum, Browne, & Sugawara, 1996; McDonald & Ho, 2002; Nye & Drasgow, 2011; Vandenberg & Lance, 2000; Widaman & Thompson, 2003; Wu, West, & Taylor, 2009). Much of this prior research developed fit indices for ML estimators, although formal tests of model fit also exist for the IVs estimator. Assessing model fit for the IVs estimator amounts to assessing the quality of the instruments used to estimate model parameters. We employ Sargan’s J test for overidentification (Hayashi, 2000) to evaluate the adequacy of the 2SLS model fit. As Bollen et al. (2014) noted, the J test is used for a hypothesis test where, “The null hypothesis is that all IVs for each equation are uncorrelated with the disturbance of the same equation and this is true for each equation in the system. Rejection of the null hypothesis means that at least one IV in at least one equation is invalid” (p. 31). In the particular case of MI&PI studies, the J test statistics can be used to infer whether the measurement or structural models are misspecified. Note that, unlike typical SEM fit indices, the J tests do not detect misspecifications in the latent variable variance and covariance structure (e.g., missing covariance parameters between residual terms).

For the 2SLS estimator, Sargan’s omnibus J test of overidentification is,

$$\begin{aligned} J = n \mathop{\sum}\limits_j \frac{(\mathbf{S}_{vxj} - \mathbf{S}_{vzj} \hat{\mathbf{b}}_j)^{\prime} \mathbf{S}_{vvj}^{-1} (\mathbf{S}_{vxj} - \mathbf{S}_{vzj} \hat{\mathbf{b}}_j)}{\hat{\sigma}_j^2}, \end{aligned}$$ (A3)

which is evaluated using an asymptotic Chi-square distribution with degrees of freedom equal to the number of instruments less the number of unrestricted coefficients.
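The per-equation contribution to Eq. A3 can be computed from the same sample covariances. The sketch below is our own illustration (the function name is hypothetical); it re-estimates \(\hat{\mathbf{b}}_j\) and \(\hat{\sigma}_j^2\) internally so the example is self-contained, and returns the statistic together with its degrees of freedom:

```python
import numpy as np

def sargan_j(x, Z, V):
    """Sargan's J statistic for one equation (Eq. A3), computed after
    estimating b_hat by 2SLS from the sample covariances.  Degrees of
    freedom equal the number of instruments minus the number of
    unrestricted coefficients.  (Hypothetical helper; the p-value
    follows from the chi-square upper tail.)"""
    n = len(x)
    xc = x - x.mean()
    Zc = Z - Z.mean(axis=0)
    Vc = V - V.mean(axis=0)
    S_vv = Vc.T @ Vc / (n - 1)
    S_vz = Vc.T @ Zc / (n - 1)
    S_vx = Vc.T @ xc / (n - 1)
    S_zz = Zc.T @ Zc / (n - 1)
    S_xz = Zc.T @ xc / (n - 1)
    A = np.linalg.solve(S_vv, S_vz)         # S_vv^{-1} S_vz
    b_hat = np.linalg.solve(S_vz.T @ A, A.T @ S_vx)
    sigma2 = (n - 1) / n * (xc @ xc / (n - 1)
                            - 2 * S_xz @ b_hat + b_hat @ S_zz @ b_hat)
    r = S_vx - S_vz @ b_hat                 # instrument-residual covariances
    J = n * (r @ np.linalg.solve(S_vv, r)) / sigma2
    df = V.shape[1] - Z.shape[1]            # instruments minus coefficients
    return J, df
```

Under the null of valid instruments, `J` is approximately chi-square with `df` degrees of freedom; a large `J` flags at least one invalid instrument.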

Appendix B Monte Carlo Simulation Study Assessing the Accuracy of the 2SLS Estimator

Overview

We conducted a Monte Carlo simulation study to assess the accuracy of the 2SLS estimator for MI&PI studies, because prior research (e.g., Marsh, Wen, & Hau, 2004; Moulder & Algina, 2002) recommends against using 2SLS to estimate latent interaction effects involving continuous variables (Bollen & Paxton, 1998). Thus, our Monte Carlo study is necessary to evaluate the performance of the 2SLS estimator for latent interaction effects between categorical and continuous variables. We also compared the performance of the 2SLS estimator to the traditional multigroup ML procedure (e.g., see Jöreskog, 1971; Sörbom, 1974, 1978).

We based the Monte Carlo study upon the model in Fig. 1 where there are three observed variables (\(X_1\), \(X_2\), and \(X_3\)) as measures of a common factor \(\xi \). Additionally, we assess parameter recovery for the structural relationship between \(\xi \) and a single criterion variable, Y. Note that we fixed the correlation between \(X_4\) and \(\xi \) and the slope relating \(X_4\) to Y to zero to focus on the accuracy of estimating group differences in measurement intercepts, prediction intercepts, and prediction slopes.

We chose parameter values for the Monte Carlo simulation based on values used in prior PI research (e.g., Aguinis et al., 2010; Culpepper & Aguinis, 2011; Culpepper & Davenport, 2009; Moulder & Algina, 2002) and estimates from the application reported in the main body of our article. We manipulated the following seven parameters: sample size (i.e., \(n = 250\), 500, and 1000), proportion of the sample in the focal group (i.e., \(p= 0.1\), 0.3, and 0.5), observed variable reliabilities (i.e., \(r_{xx} = 0.5\), 0.7, and 0.9), group latent mean differences (i.e., \(\kappa _1 -\kappa _0 = 0, -0.25\), and \(-0.5\)), measurement intercept differences for \(X_2\) (i.e., \(\tau _{21} -\tau _{20} = 0, -0.25\), and \(-0.5\)), latent prediction intercept differences (i.e., \(\beta _{01} -\beta _{00} = 0, -0.25\), and \(-0.5\)), and latent slope differences (i.e., \(\beta _{11} -\beta _{10} = 0, -0.125\), and \(-0.25\)). The remaining parameters were fixed across the simulation conditions; i.e., the loadings were defined as \(\lambda _1 =\lambda _2 =\lambda _3 =1\), the latent intercept and slope for group \(g = 0\) were \(\beta _{00} =0\) and \(\beta _{10} =\sqrt{0.5}\), measurement intercepts for both groups were set to zero (i.e., \(\tau _{10} =\tau _{11} =\tau _{20} =\tau _{30} =\tau _{31} =0)\), and the criterion residual variance was \(\psi =0.5\). Note that the unique factor variances for \(X_1\), \(X_2\), and \(X_3\) (i.e., \(\theta _1\), \(\theta _2\), and \(\theta _3\)) were determined by values for \(r_{xx}\).
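One replication of this design can be generated as follows. This is a sketch, not the authors' simulation code: the function and its defaults are ours, and we assume a latent variance of 1 in both groups so that, with unit loadings, \(r_{xx} = 1/(1+\theta)\) determines the unique factor variances:

```python
import numpy as np

def simulate_mipi(n=500, p=0.3, rxx=0.7, dkappa=-0.25, dtau2=-0.25,
                  dbeta0=-0.25, dbeta1=-0.125, seed=None):
    """One replication of the Appendix B design (hypothetical helper;
    arguments mirror the seven manipulated factors)."""
    rng = np.random.default_rng(seed)
    g = (rng.random(n) < p).astype(int)             # focal-group indicator
    xi = rng.normal(loc=g * dkappa, scale=1.0)      # latent factor, var = 1
    theta = (1 - rxx) / rxx                         # unique variance so that
                                                    # rxx = 1 / (1 + theta)
    # three indicators with lambda = 1 and intercepts 0 ...
    X = np.column_stack([xi + np.sqrt(theta) * rng.normal(size=n)
                         for _ in range(3)])
    X[:, 1] += g * dtau2                            # ... except tau_21 - tau_20
    beta0 = 0.0 + g * dbeta0                        # prediction intercepts
    beta1 = np.sqrt(0.5) + g * dbeta1               # prediction slopes
    Y = beta0 + beta1 * xi + np.sqrt(0.5) * rng.normal(size=n)  # psi = 0.5
    return g, X, Y
```

With the group differences set to zero, the correlation between any two indicators equals \(r_{xx}\), which provides a quick sanity check on the generated data.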

Table 8 Type I error and power rates of ML and 2SLS estimators for measurement intercept differences, \(\tau _{21} -\tau _{20}\), by n, p, \(r_{xx}\).

Results

We performed the simulation study with a total of 2187 combinations of parameter values. The outcomes of interest for the ML and 2SLS estimators were bias, Type I error rates, and power rates for \(\tau _{21} -\tau _{20}\) (i.e., measurement intercept differences), \(\beta _{01} -\beta _{00}\) (i.e., latent intercept differences), and \(\beta _{11} -\beta _{10}\) (i.e., latent slope differences). We estimated the outcomes from 5000 replications and employed an a priori Type I error rate of 0.05 for all tests.
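These outcomes reduce to simple summaries over replications. A minimal sketch (the helper name and the normal-theory z test of \(H_0\!: \text{parameter} = 0\) are our assumptions):

```python
import numpy as np

def summarize_replications(estimates, ses, true_value):
    """Bias and rejection rate of H0: parameter = 0 across replications,
    using a two-sided z test at alpha = 0.05.  The rejection rate is the
    Type I error rate when true_value = 0 and power otherwise.
    (Hypothetical helper.)"""
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    bias = estimates.mean() - true_value
    crit = 1.959964                     # two-sided 5% normal critical value
    rejection_rate = np.mean(np.abs(estimates / ses) > crit)
    return bias, rejection_rate
```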

Overall, the 2SLS estimator provided accurate estimates for all combinations of parameter values. More specifically, the mean bias for the 2SLS estimator across conditions and parameter values was \(-0.001\), 0.000, and \(-0.001\) for \(\tau _{21} -\tau _{20} \), \(\beta _{01} -\beta _{00} \), and \(\beta _{11} -\beta _{10} \), respectively, and bias for the parameter values was less than 0.01 in absolute value for 99% of conditions. In contrast, the ML estimator failed to converge for some of the conditions with small n and p. The ML estimator demonstrated bias similar to that of the 2SLS estimator after removing 119 of the 2187 conditions for which the ML estimator did not converge. Table 8 reports Type I error rates and power for the ML and 2SLS tests of group measurement intercept differences, \(\tau _{21} -\tau _{20}\), by values of n, p, and \(r_{xx}\). Note that “a” in Table 8 denotes conditions where ML failed to converge for all replications. Table 8 provides evidence that the ML and 2SLS estimators effectively controlled Type I error rates. Furthermore, the power to detect group measurement intercept differences was affected by n, p, and \(r_{xx}\). In general, power was larger for ML than 2SLS, but the difference between the methods declined as \(\tau _{21} -\tau _{20}\), n, p, and \(r_{xx}\) increased.

Tables 9 and 10 report Type I error rates and power for the ML and 2SLS tests of group differences in latent prediction intercepts (i.e., \(\beta _{01} -\beta _{00}\)) and latent slopes (i.e., \(\beta _{11} -\beta _{10}\)). Similar to the results in Table 8, the ML and 2SLS estimators controlled the Type I error rate at the a priori level and ML tended to be more powerful than 2SLS across parameter values. Additionally, the power to detect latent prediction intercept differences tended to be larger than the power to detect latent slope differences.

Table 9 Type I error and power rates of ML and 2SLS estimators for latent prediction intercept difference, \(\beta _{01} -\beta _{00}\), by n, p, \(r_{xx}\).

Table 10 Type I error and power rates of ML and 2SLS estimators for latent score slope differences, \(\beta _{11} -\beta _{10}\), by n, p, \(r_{xx}\).

In short, results summarized in Tables 8, 9, and 10 support the use of the 2SLS estimator to perform MI&PI studies. Reassuringly, statistical power for the 2SLS estimator was satisfactory for parameter conditions typically found in high-stakes testing contexts (e.g., \(n > 500\) and \(r_{xx} > 0.7\)).