In this study, ethnic group bias of the KABC-II global scores (FCI, MPI, and NVI) was examined for a representative sample of Caucasian, Hispanic, and African-American children and adolescents in grades 1 through 12. More specifically, the study explored whether the less culturally and linguistically loaded global scores of the KABC-II, the MPI and NVI, were fairer and more accurate at predicting minority groups’ achievement outcomes than the traditionally more culturally and linguistically loaded FCI. To answer this research question, structural equation modeling was used to measure predictive invariance of the FCI, MPI, and NVI separately. The methodology applied increasingly restrictive sets of equality constraints in order to test incrementally whether three levels of equality were met across the groups: residual, slope, and intercept invariance (Meredith 1993). Despite the firm belief of many neuropsychologists and educators that less culturally loaded scales, such as the MPI and NVI, are the fairest (least biased) predictors of achievement for ethnic minority group children, the results of this study suggest that the FCI is the “fairest” predictor of achievement for Caucasian, Hispanic, and African-American school-aged children.
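
The three levels of invariance named above can be written out in a simplified, observed-variable form. This is a notational sketch only: the symbols (y for achievement, x for a global score, g for ethnic group, and the parameters τ, β, and θ) are assumptions introduced here for illustration and do not reproduce the latent-variable specification used in the study.

```latex
\begin{align*}
\text{Prediction model in group } g:\quad
  & y_{ig} = \tau_g + \beta_g x_{ig} + \varepsilon_{ig},
    \qquad \operatorname{Var}(\varepsilon_{ig}) = \theta_g \\
\text{Slope invariance:}\quad
  & \beta_1 = \beta_2 = \beta_3 \\
\text{Intercept invariance:}\quad
  & \tau_1 = \tau_2 = \tau_3 \\
\text{Residual invariance:}\quad
  & \theta_1 = \theta_2 = \theta_3
\end{align*}
```

In this notation, when slopes are equal, overprediction for a group corresponds to that group's intercept lying below the intercept of the common regression line, so that the common line predicts higher achievement than the group actually attains.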

Predictive Invariance

To our knowledge, this is the first study to use structural equation modeling to compare the prediction invariance of three global ability measures from an individually administered test of cognition across three ethnic groups and three grade groups. Comparison of the FCI, MPI, and NVI results demonstrated that the FCI, the most comprehensive global score, emerged as the least biased predictor variable for achievement, not only for Caucasian school-aged children but also, most importantly, for Hispanics and African-Americans. This finding is contrary to the KABC-II test authors’ predictions and to the beliefs of many clinicians and neuropsychologists, which rest on the inclusion of the language-oriented and fact-oriented Knowledge/Gc scale in the FCI. Additionally, the MPI and NVI, the two global indexes that are linguistically and culturally more neutral, did not accurately predict the level of the reading, math, and writing abilities of children from the two ethnic minority groups. These indexes were not biased against African-Americans and Hispanics: they correlated as highly with achievement for the two ethnic minorities as they did for Caucasians, and they did not underpredict the achievement of African-American and Hispanic children. They did not, however, accurately identify these children’s level of achievement. The MPI and NVI overpredicted their actual levels of academic achievement, especially at grades 5–8.

In sum, the results of the present study show that the MPI and NVI produced consistent overprediction in terms of their intercept when assessing African-American and Hispanic minority group children’s achievement, especially at grades 5–8. The more comprehensive FCI, on the other hand, was not biased in terms of its slope or intercept for Caucasian, Hispanic, and African-American school-aged children. The FCI findings are consistent with previous studies. Keith (1999), Weiss et al. (1993), and Weiss and Prifitera (1995) found no bias in terms of the slope and intercept when assessing prediction invariance of the WISC-III FSIQ and the GAI, both of which are comparable to the FCI in terms of content. Few studies have assessed psychometric test bias in terms of prediction invariance, and those that did are more than 15 years old (Keith 1999; Weiss et al. 1993; Weiss and Prifitera 1995). No study has previously used structural equation modeling to investigate slope and intercept bias in the prediction of achievement from culturally and linguistically free global scaled scores. Naglieri and colleagues did evaluate prediction bias of the CAS and NNAT, both with culturally reduced content, but their analytic approach relied on simple correlation coefficients rather than structural equation modeling; therefore, it is not possible to determine whether the Naglieri studies also found overprediction of achievement for the ethnic minority children. The results of the present study have important implications for neuropsychologists.

Clinical Implications for Neuropsychologists

The results demonstrate several important findings for neuropsychologists. First, some neuropsychologists believe that global scores should not be interpreted, or even used at all, because summary scores are often thought not to reflect an individual’s neuropsychological status and profile (e.g., Kaplan 1988; Luria 1979; Lezak 1988). Data from the present study, however, suggest that global scores have value. The most comprehensive global KABC-II score, the FCI, was, in fact, very accurate at predicting achievement not only for Caucasians but also, most importantly, for Hispanics and African-Americans. The FCI demonstrated no slope bias and virtually no intercept bias. The results of the present study suggest that the FCI, apart from being reliable and valid, is an unbiased and accurate predictor of academic achievement for Caucasian, Hispanic, and African-American school-aged children. Such results suggest that global scores can be useful indexes for clinical neuropsychologists to interpret, even though such scores mask the multiple individual characteristics (processes) that are more truly reflective of an individual’s cognitive functioning. Further, the findings of this study support Canivez’s (2013) argument that global scores are valid and reliable when it comes to the interpretation of an individual’s cognitive capacity. He supports his argument statistically, stating that global IQs have been found to have the strongest internal consistency, short- and long-term temporal stability, and predictive validity coefficients; they produce less error variance and account for the largest portion of the variance with a variety of criteria. The results of this study support his argument, namely that the FCI is a fair predictor variable to use in the evaluation of Caucasian, Hispanic, and African-American school-aged children.

Some researchers might argue that the FCI is the fairest predictor variable of achievement due to criterion contamination. However, it is important to note that such an argument would have been true for earlier versions of the Wechsler scales, which only consisted of the Performance and Verbal IQ scales; naturally, the Verbal IQ scale overlapped greatly with achievement variables. The FCI is composed of five indexes, only one of which—Knowledge/Gc—is akin to academic achievement. The other four indexes measure visual-spatial ability, short-term memory, long-term retrieval, and fluid reasoning; none of these abilities (or cognitive processes) are taught in school.

Secondly, outcomes of this study showed persistent evidence for intercept bias on the MPI and NVI, such that these global cognitive scales consistently overpredicted African-American and Hispanic academic achievement at grades 5–8. Even though Kaufman and Kaufman (2004a) suggest using the MPI in preference to the more comprehensive FCI when assessing children from non-mainstream backgrounds, such as Hispanics and African-Americans, the present findings suggest otherwise. Kaufman and Kaufman generally suggest using the FCI as the index of choice in most neuropsychological evaluations, for example, for the diagnosis of learning disabilities or brain damage; however, they make the exception of suggesting the MPI for ethnic minorities. The present findings suggest that the comprehensive FCI should be the index of choice for neuropsychological evaluations, even if the referred child or adolescent is African-American or Hispanic. By comparing the prediction invariance results of the three global scores, the present study showed that the FCI emerged as the least biased global index on the KABC-II, even in comparison with the NVI, which reduces language demands to an even greater extent than the MPI does. Such findings are important because both the NVI and MPI measure a limited range of abilities, as they exclude language- and fact-oriented subtests. This limitation becomes especially problematic when the referral for evaluation is based on problems in language (Flanagan et al. 2013).

Thus, overall, the results of the present study suggest that neuropsychologists should opt to use the FCI for all children, ethnic minority or otherwise, when the goal is simply to predict their current level of achievement. As the MPI and NVI consistently overpredicted minority students’ achievement, these two global indexes are likely to prove less accurate as predictors of their reading, math, and writing. However, we encourage examiners to use the MPI or NVI to identify minority students’ capability to achieve higher than their current level; these indexes are less language and achievement oriented and are better suited to identifying their cognitive potential. For example, the MPI and NVI would be better choices than the FCI for Black and Hispanic students when the KABC-II is used for gifted placement as well as for the assessment of intellectual impairment or disability. It is also important to note that even though the KABC-II global scores are valuable predictors of current achievement or future potential, the results do not imply that global scores are especially useful for planning interventions. Intelligent testing demands that clinicians rely on children’s patterns of strengths and weaknesses for selecting the best educational interventions for each student (Kaufman et al. 2015; Lichtenberger and Kaufman 2013). Here, schools have the responsibility to make use of existing cognitive strengths in order to allow students to achieve to their fullest potential.

Finally, it is important to recognize that findings of this present study not only pertain to the Kaufman tests but also generalize to other popular tests of cognition and achievement. For example, the study conducted by Kaufman et al. (2012) demonstrated that the g measured by the KABC-II is essentially the same g that is measured by the WJ III. Similarly, Reynolds et al. (2013) and Floyd et al. (2013) demonstrated that the same g underlies the KABC-II, the WISC-IV, the WJ III, and the DAS-II. Such findings provide strong evidence for the fact that the same global construct that is being measured by the KABC-II is also measured by the WISC-IV, the DAS-II, and the WJ III. Any findings pertaining to the KABC-II are, therefore, likely to be generalizable to those other tests and, by extension, to the current versions (WISC-V and WJ IV). Thus, neuropsychologists can be reasonably confident that results of the present study generalize to other popular tests of cognition and achievement.

Possible Explanations for the Overprediction at Grades 5–8

It was interesting to see the persistent intercept overprediction of the NVI and MPI for African-American and Hispanic achievement outcomes at grades 5–8. Such findings indicate that the overprediction might depend on the developmental age of the students. For some reason, ethnic minority group children have the cognitive capacity to achieve higher in middle school (as evidenced by their higher KABC-II scores) than they actually achieve. One explanation for the overprediction at grades 5–8 could be that verbal skills become extremely important for academic success in middle school, more so than in the primary grades, when the goal is to learn the basics of reading, math, and writing. For example, solving math problems in middle school not only requires moving letters and numbers but also requires the student to read the problem, sketch the situation, and solve the problem both verbally and quantitatively (Wendling and Mather 2009). Other explanations for the overprediction in middle school include the fact that early adolescence is a critical time for brain development. For example, students are moving from more concrete problem solving to abstract, analytical thinking as the prefrontal cortex is undergoing rapid development (Luria 1979). However, the early adolescent years are also marked by difficulties paying attention to several stimuli at the same time (related to short-term memory limitations). Behaviors reflect this stage of brain development when adolescents engage intensively, but briefly, in a specific activity. Also, interaction with peers and active, experimental learning are preferred. In order to assist struggling students to achieve to their fullest potential in middle school, teachers should try to focus on experimental, group learning techniques and possibly limit the number of stimuli presented to students at these ages (Wendling and Mather 2009; Ryan and Patrick 2001).
Furthermore, young adolescents are more emotionally driven, which is related to the fact that the prefrontal cortex is still developing and the amygdala is more easily activated (Somerville et al. 2010). It is also possible that some individuals from ethnic minorities develop an increased awareness of their minority status; for example, some pre-adolescents and adolescents might feel socially rejected or perceive the differences between themselves and their Caucasian schoolteachers, all of which can impact their psychological health and, therefore, their ability to succeed in school (e.g., Parkhurst and Asher 1992; Weiss et al. 2006). It is important for neuropsychologists and school psychologists to take these possibilities into consideration, especially when evaluating middle school-aged minority group students.

It is also important to note that whereas reliability of the intelligence test would have affected the slope of the regression line, intercept bias arises due to omitted variables that are separate from the predictor variable (Meade and Fetzer 2009). In this study, the NVI and MPI produced consistent intercept overprediction, but no slope bias. Such results strongly suggest that the minority group children have the cognitive capacity to achieve substantially higher than their current level of achievement, especially at grades 5–8. Intercept overprediction means that there are other variables, independent of the cognitive variables, that influence the minority groups’ ability to achieve to their fullest potential. Many of these independent variables that impact achievement are likely related to socioeconomic disparities, such as differences in income and percent of single-parent households, as well as differences in nutrition and physical health (Weiss et al. 2015). Undoubtedly, there are many socioeconomic variables that contribute to differences in test results and it is impossible to account for all of those disparities. It is for those reasons that differences in mean scores between Caucasian and minority group students should not be taken at face value and should not be interpreted as meaningful. Finally, another variable that likely contributes to the “overprediction” is the failure of the American educational system to capitalize on minority children’s strengths.
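
The pattern described above, equal slopes combined with intercept overprediction driven by an omitted variable, can be illustrated with a small simulation. The following is a minimal sketch and not the study's structural equation model: it uses ordinary least squares on simulated observed scores, and every value in it (sample size, slope, intercept, residual spread, and the size of the omitted influence) is hypothetical and chosen only for illustration. The names `fit_line` and `simulate` are likewise introduced here for the example.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    return intercept, slope


def simulate(n=5000, omitted_effect=-5.0, seed=0):
    """Simulate two groups with identical ability-achievement slopes.

    The focal group's achievement is shifted by `omitted_effect`, a
    hypothetical influence that is unrelated to the ability score.
    """
    rng = np.random.default_rng(seed)
    # Ability scores on a standard-score metric (mean 100, SD 15).
    x_ref = rng.normal(100, 15, n)
    x_focal = rng.normal(100, 15, n)
    # Same slope in both groups; only the focal group carries the shift.
    y_ref = 40 + 0.6 * x_ref + rng.normal(0, 9, n)
    y_focal = 40 + 0.6 * x_focal + omitted_effect + rng.normal(0, 9, n)
    return (x_ref, y_ref), (x_focal, y_focal)


(x_ref, y_ref), (x_focal, y_focal) = simulate()
b0_ref, b1_ref = fit_line(x_ref, y_ref)
b0_focal, b1_focal = fit_line(x_focal, y_focal)

# Predict focal-group achievement from the reference-group regression line.
predicted = b0_ref + b1_ref * x_focal
mean_overprediction = float(np.mean(predicted - y_focal))

print(f"slope, reference group: {b1_ref:.3f}")
print(f"slope, focal group:     {b1_focal:.3f}")
print(f"mean overprediction:    {mean_overprediction:.2f}")
```

In this sketch the two estimated slopes are nearly identical (no slope bias), while the reference-group line predicts focal-group achievement that is higher, on average, than the simulated actual achievement by roughly the size of the omitted influence. The ability score itself is equally informative in both groups; the gap comes entirely from the variable left out of the prediction equation.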

Limitations

The results need to be understood in the context of the study’s limitations. First, there is disagreement among researchers who have published on measures of ability and promoted theories of intelligence. Whereas some accept the notion that a test that requires knowledge and skills taught in school can be used to measure ability, others disagree because of criterion contamination (Dumont, Willis, and Elliott 2009). Furthermore, it is important to take into consideration limitations pertaining to the sample’s demographics. Only three broad ethnic groups were included in the sample. Because their sample sizes were too small, other ethnic groups, such as Asians, Pacific Islanders, and Native Americans, could not be included in the analysis. Additionally, the term “Hispanic” was used to classify a very broad and heterogeneous group of individuals who differ in terms of their cultural and historical background. Unfortunately, no representative subsamples of Hispanics were available. In order to generalize the present findings, future researchers need to replicate the analyses with different ethnic subsamples. Furthermore, it is important to note that the sample was not large enough to permit ethnic bias analysis for students from different socioeconomic backgrounds (as measured by mother’s educational attainment). Future studies should address this limitation. For example, future studies could split their groups by parental education or other socioeconomic variables to evaluate whether the results hold for different SES groups. Other limitations include the fact that the standardization sample used in this study is representative of 2001 US Census data. The demographic profile in the USA has undoubtedly changed since 2001; thus, the stratification of the sample does not exactly reflect the current US population.
Furthermore, even though we examined a developmental trend by dividing the sample into three grade groups, it is important to consider that this was a cross-sectional sample. Just as there are drawbacks to using longitudinal data sets, such as practice effects, there are also limitations to using cross-sectional data sets, such as cohort effects (Kaufman 2009; Kaufman and Weiss 2010). Future studies may want to replicate the present study using longitudinal data sets.

Finally, it is crucial to take into consideration that the sample was composed of normally developing children. However, the children who are most commonly referred for psychological testing are those who struggle with learning disabilities or other developmental disorders. Future researchers ought to address these limitations.