Earlier, I reviewed Braden’s (1994) book, Deafness, Deprivation, and IQ. A considerable number of studies have been conducted since then. The focus here is on the validity of intelligence measures in the deaf population: reliability, predictive validity, and the measurement properties of the tests.



Hearing loss and nonverbal IQ

Given Braden’s (1994) review, we would not expect nonverbal IQ to be correlated with the degree of hearing loss (except perhaps for the fact that the genetically deaf have both higher IQ and more severe hearing loss than the nongenetically deaf) because nonverbal IQ, unlike verbal IQ, is equivalent between the deaf and the hearing groups. The implication is that the cultural deprivation related to hearing loss does not affect general intelligence but is domain-specific. Braden (1994, p. 121) reported correlations between hearing loss and IQ of 0.05 for nonverbal tests (N_studies=12) and -0.49 for verbal tests (N_studies=4). With regard to nonverbal tests, we will see that the sign and magnitude of the correlations differ across studies, so that hearing loss shows no consistent relationship with nonverbal IQ.

Granick et al. (1976), apparently missed in Braden’s review, found among 47 aged males and 38 aged females that hearing loss consistently has at least a modest negative impact on the WAIS subtests, the Raven CPM and the Ammons Picture Vocabulary test (when age is partialled out). The effect is much stronger for verbal (sub)tests and for the female sample.

Sullivan & Schulte (1992) studied a fairly large sample of 77 hard-of-hearing (HOH) children and 291 deaf children, aged 6-16 with a mean age of 12. Consistent with Braden’s (1994) meta-analytic review, they report that deaf children of deaf parents (N=43, VIQ=96.51, PIQ=127.51) have higher VIQ and PIQ than deaf children of hearing parents (N=325, VIQ=85.71, PIQ=115.85), by 11 and 12 points, respectively. There is a slight difference in IQ or subtest scores between interpreted (i.e., through an interpreter) and signed administrations (VIQ=83.61 vs 88.37, PIQ=115.31 vs 118.01), in favor of signed administrations. Children with a severe degree of hearing loss outperform hard-of-hearing children on PIQ (119.73 vs 107.70) and FSIQ (101.62 vs 97.99) but not on VIQ (85.88 vs 91.08). Sullivan & Montoya (1997) report data on another sample of deaf and HOH children (N=106). Compared to those with moderate/severe hearing loss, those with profound hearing loss have higher full scale IQ (87.26 vs 83.45) and PIQ (103.10 vs 94.07) but not VIQ (75.06 vs 76.10). Both communication and administration mode appear important, and sign mode produces higher PIQ (but not FSIQ or VIQ) than oral mode (102.50 vs 95.90). Children with a known etiology of deafness have lower FSIQ (82.43 vs 90.46), VIQ (73.07 vs 77.90) and PIQ (95.43 vs 106.46) than those with an unknown etiology.

Moeller (2000) examines a sample of 112 deaf children enrolled in the Diagnostic Early Intervention Program (DEIP), of whom 84 took the performance scale of the WPPSI or WISC-III, or the Hiskey-Nebraska. The mean nonverbal IQ of these 84 children is 102.27, which is comparable to Braden’s estimate of 100. Two vocabulary tests were administered: the Expressive One-Word Picture Vocabulary Test (EOWPVT) and the Peabody Picture Vocabulary Test (PPVT). They correlate at 0.81. The PPVT also correlates highly with receptive language (0.80) and expressive language (0.74) but modestly with nonverbal IQ (0.289). The Preschool Language Assessment Instrument (PLAI), which is a verbal reasoning test, correlates negatively (-0.310) with later age of enrollment but only weakly with nonverbal IQ (0.161). The degree of hearing loss, as measured by pure tone average (PTA), is not correlated with either the PPVT (-0.033) or nonverbal IQ (0.018 and -0.088), but is correlated with the PLAI (-0.251). Interestingly, later age of enrollment in the program is negatively correlated with the PPVT (-0.464) but not with nonverbal IQ (-0.092 and -0.067). However, the conclusion that intervention has no effect on nonverbal IQ is far from settled. Kushalnagar et al. (2007) studied two samples of deaf children who were enrolled in the cochlear implant program. One group (N=23, identified as deaf at 7 months) took the Mullen Scales of Early Learning and the other group (N=23, identified as deaf at 17 months) took the Leiter-R. The Mullen and Leiter-R correlate with later age of intervention (range: 3 to 35 months) at -0.40 and -0.01, respectively. One tentative explanation for this difference is that the Mullen scales have a language component that is absent from the Leiter-R. However, the authors removed the language components of the Mullen, so both tests effectively measure nonverbal ability.

Weichbold & Herka (2003) studied the effect of hearing impairment on the Raven CPM in a sample of 52 deaf children. At dB BEHL of 20-40, 41-80 and >80, the mean Raven IQs were 104 (n=6), 100 (n=26) and 108 (n=20). The respective median IQs are 100, 102 and 105. It is surprising that hearing loss appears to have a positive impact on nonverbal IQ. The correlations of hearing loss with the overall CPM, subtest A, subtest AB and subtest B are 0.23, 0.01, 0.15 and 0.37. In subtest A the theme is identity and similarity; in subtest AB it is symmetry and location; subtest B (which contains some of the most difficult items) involves logical and spatial principles.

Zekveld et al. (2007) correlate the degree of hearing loss with a series of nonverbal IQ tests, namely the Groninger Intelligentie Test (GIT) and three subtests of the CANTAB (Pattern Recognition Memory, Rapid Visual Processing, Spatial Working Memory), among 30 children. Hearing loss correlates at 0.16 with GIT, at 0.13 with PRM, at 0.01 with RVP, and at -0.09 and -0.41 with SWM-errors and SWM-strategy. There is no evidence that the degree of hearing loss lowers nonverbal IQ.

Smiley et al. (2009) compare a group of hearing-impaired (N=13) and normal-hearing (N=13) children, matched on gender, grade level, intelligence and language ability. PIQ is measured with the WISC-III, and the hearing-impaired group scores higher than the hearing group (102.77 vs 98.54). Language ability (LQ) is measured with the CELF-3, and the hearing group scores higher than the hearing-impaired group (88.46 vs 84.69). Receptive vocabulary is measured with the PPVT-III, and the hearing group again scores higher (94 vs 86). The ability to solve computation problems is assessed with Math Advantage, a test that also contains word problems. On both kinds of problems the hearing-impaired group has the higher score. Hearing loss is correlated with the ability to solve computation problems (0.302), modestly with the ability to solve word problems (0.102), PIQ (-0.137) and LQ (-0.170), but not with receptive vocabulary (-0.072).

Phillips et al. (2014) examine a sample of 54 deaf children (mean age=4.8) on the Leiter International Performance Scale-Revised (Leiter-R Brief IQ) and the Differential Ability Scales – Second Edition (DAS-II Nonverbal Reasoning Index). The mean standard scores of the two tests are roughly similar. The two tests are correlated at 0.74, and the correlations between their subtests range from 0.41 to 0.84. The Leiter-R and DAS-II correlate negatively with the severity of hearing loss, at -0.20 and -0.15, respectively. They also correlate with income (0.44 and 0.31) and with the father’s (0.19 and 0.30) and mother’s (0.18 and 0.10) education. Both tests correlate substantially with receptive and expressive language. It is regrettable that the authors concluded there was no correlation between hearing loss and nonverbal IQ. They reached this false conclusion by relying on the significance test rather than on the effect size.
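
To make the point about significance testing concrete, here is a minimal sketch of a confidence interval for a correlation, using the Fisher z transformation. It assumes that all 54 children contribute to each correlation and that the usual bivariate normal approximation holds; neither is confirmed by the source, and the function name is only illustrative.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation."""
    z = math.atanh(r)                # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Correlations with severity of hearing loss reported above.
# N=54 for both is an assumption (the full sample size).
leiter_ci = fisher_ci(-0.20, 54)
das_ci = fisher_ci(-0.15, 54)

print("Leiter-R: %.3f to %.3f" % leiter_ci)
print("DAS-II:   %.3f to %.3f" % das_ci)
```

Under these assumptions the interval for r=-0.20 runs from about -0.44 to 0.07. It includes zero, which is why the test is non-significant, but it is also compatible with a moderate negative correlation. A non-significant result in a sample of this size is weak evidence for the absence of an effect.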

Lin (2011) examines the NHANES survey data and correlates the degree of hearing loss (dB) with the digit symbol substitution test (DSST). The correlation was -0.18. Emmett & Francis (2014) also analyzed NHANES III, but used the WISC-R block design subtest. The children were aged 12-16, and the sample having both audiometric and IQ tests is large (N=4,823). They conduct a logistic regression (with the IQ variable dichotomized) with age, gender, race, head-of-household education, family income, and unilateral and bilateral hearing loss as independent variables. When the other variables are controlled, the odds ratios for unilateral and bilateral loss are 0.73 and 5.77. Without adjustment for the control variables, the odds ratios are 0.77 and 6.94, respectively. Therefore, bilateral (but not unilateral) hearing loss has a large negative impact on block design. One problem with the analysis is that the dependent variable (block design) has been dichotomized so that a raw score <4 is the response value, which leaves only 7 persons with bilateral loss in the low-IQ group, against 39 in the high-IQ group. Such a small count in one of the categories makes the estimates very unstable.
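
The instability can be gauged from those two counts alone. Under the usual large-sample (Woolf) approximation for an unadjusted 2x2 table, the standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d), so the two known cells (7 and 39) already set a floor on it whatever the other two cells are. The sketch below is an illustration under that assumption, not a reproduction of the authors' survey-based logistic regression; the function name is hypothetical.

```python
import math

def min_se_log_or(cell_a, cell_b):
    """Lower bound on the Woolf standard error of log(OR).

    Only two of the four cell counts are used; the two unknown cells
    can only add to the variance, so the true value is at least this.
    """
    return math.sqrt(1.0 / cell_a + 1.0 / cell_b)

# Bilateral hearing loss: 7 children in the low-score group, 39 in the other.
se_floor = min_se_log_or(7, 39)

# Narrowest possible 95% interval around the unadjusted odds ratio.
odds_ratio = 6.94
lower = odds_ratio * math.exp(-1.96 * se_floor)
upper = odds_ratio * math.exp(1.96 * se_floor)

print("SE floor: %.3f, interval at least %.2f to %.2f" % (se_floor, lower, upper))
```

With these inputs the floor on the standard error is about 0.41, so even the narrowest possible interval runs from roughly 3.1 to 15.5, a fivefold range. The direction of the effect may be real, but its size is poorly determined.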

Gent (2012, p. 142) reported weak correlations of the WISC-R PIQ with degree of deafness (0.05) and cause of deafness (-0.08) for 68 deaf adolescents, aged 16.

Barbosa et al. (2013) examine a sample of 205 deaf children (mean age=14) on the TONI-3 (a nonverbal test). The split-half (Spearman-Brown) reliability of this test was 0.83. Only 156 had data on hearing loss. The authors report no score difference as a function of hearing loss, because they look at the p-value, not the effect size. The mean score was 16.90 for moderate, 19.51 for severe, and 17.44 for profound deafness. The scores thus show no linear relationship with the degree of deafness (moderate to severe to profound). If anything, the scores may tend to increase. Also, the subjects with hearing aids (n=75) have higher scores than those without hearing aids (n=93), 18.75 vs 17.19.

Khan et al. (2005) compare samples of hearing children (N=18), hearing-impaired children (N=13) and deaf children with a cochlear implant (N=24) on full scale IQ (Leiter-R or LIPS-R) and fluid reasoning (a composite score of two subtests of the Leiter-R, i.e., Sequential Order and Repeated Patterns). The cochlear implant (CI) is an electronic device that provides hearing to the patient. If nonverbal IQ is unaffected by hearing loss, we should expect no positive impact of CI on the scores. But the authors found that the hearing and CI groups have similar scores (114 and 112) while the deaf group has a much lower score (103). The hearing and CI groups also have similar scores on nonverbal IQ (103 and 105) while the deaf group has a lower score (96). However, in Braden’s review, nonverbal IQ was equal between the hearing and the deaf to begin with. For example, in Tharpe et al. (2002), the group with prelingual deafness has an IQ of 110.9 on the TONI, while the normal-hearing and CI groups have 107.4 and 97.2, respectively. Schlumberger et al. (2004) factor analyzed several nonverbal IQ and visual (sub)tests (Raven SPM, mazes, visual span, visual perception), which yielded a factor they labelled procedural. On this factor score, among children aged 7 and less, the hearing (n=25) and CI (n=17) groups perform almost equally and far above the deaf group (n=12). But among children aged 7 and more, all three groups (n=15 for hearing, n=7 for CI, n=12 for deaf) perform equally well. They also factor analyzed other tests of visual skills only, which produced a semantic factor; on this factor all groups have almost equal scores among children aged 7 and less, but the deaf group has a much lower score among children aged 7 and more. It would seem that CI enhances the development of nonverbal IQ. But even this conclusion is moderated by another finding. Luckner & McNeill (1994) examine 86 children, of whom 43 are deaf. They were given a problem-solving task that requires moving a pyramid of disks onto empty pegs. The researchers divide the sample into four age groups: 5-8, 9-10, 11-12, 13+. The deaf children perform far worse than the hearing children at all ages except 13+. This suggests that deaf children are delayed in their cognitive development but eventually catch up. Hyde et al. (2003) replicate this result, from grade 1 to 7, using a test of arithmetic word problems. Remember that none of these studies was longitudinal.

The impact of hearing loss on nonverbal IQ is truly ambiguous. Several studies show that cochlear implants help to develop better nonverbal IQ. But other studies (Wu et al., 2008; Huber & Kipman, 2012) show that deaf people with cochlear implants score clearly lower on verbal IQ but not on nonverbal IQ.

How can we explain these disparities? It seems that most studies treated the deaf as a homogeneous group. Perhaps an answer will emerge if the deaf are separated into genetically deaf and nongenetically deaf groups. Braden (1994) said that hearing loss was not correlated with nonverbal IQ but that the genetically deaf have both higher nonverbal IQ and greater hearing loss than the nongenetically deaf.

Reliability and validity

The question of reliability and validity is important. If the test is not reliable, so that persons with higher scores do not also have higher scores at retest, then the IQ test cannot be expected to measure intelligence very well. If the IQ scores are not correlated with achievement and job success, they will be devoid of meaning. In the normal-hearing population such correlations are high, and we would expect similar correlations for deaf people if the IQ tests are equally relevant for the two groups.

Blennerhassett et al. (1994) reviewed previous studies on the correlations between nonverbal IQ tests (including the Raven) and the WISC-R PIQ. They range from 0.635 to 0.870. They also reviewed the studies on the correlations between various IQ and achievement tests. These are quite varied but it is safe to say they are about 0.45-0.50. Nonverbal IQ is usually used to assess intelligence in deaf children because verbal IQ is impracticable. Because verbal IQ is known to be a better correlate of achievement, the average IQ-achievement correlation in deaf children is probably underestimated. Their own analysis includes 107 deaf students (14 years old) enrolled in residential schools for the deaf. 102 students took the Raven SPM, 37 took the WISC-R PIQ, about 85 took the SAT reading comprehension and SAT spelling, and only 36 took the SAT language. The correlation of the Raven SPM with PIQ was about 0.60, and its correlations with the SAT subscales were 0.33, 0.38 and 0.44.

Slate & Fawcett (1995) report a high correlation between the WISC-III performance scale and the WISC-R performance scale (0.93) in a sample of 47 deaf students (but 43 students over a 3-year time period). The correlations of WISC-III PIQ with the WRAT-R Reading, Spelling and Arithmetic subtests are 0.41, 0.48 and 0.64, respectively. The respective numbers for WISC-R PIQ are 0.43, 0.45 and 0.68. WISC-III PIQ (mean=87.1) was 3.8 points lower than WISC-R PIQ (mean=90.9). Students who communicated via Total Communication (mean=92.5) had a higher mean PIQ than students who communicated orally (mean=82.6). This was expected because TC includes sign language. The sampled students were all from public schools located in a southern US state.

Mackinson et al. (1997) report modest correlations between the TONI-2 (a nonverbal test) and WISC-III PIQ (r=0.67) and its subtests: picture completion (r=0.58), coding (r=0.09), picture arrangement (r=0.70), block design (r=0.63), object assembly (r=0.52) and symbol search (r=0.58), with N=24-27. The TONI-2 also correlates with the SAT subtests: reading comprehension (r=0.49), math applications (r=0.62), concepts of number (r=0.66), math computations (r=0.59), spelling (r=0.71) and language (r=0.50), with N=17-19. The correlations of WISC-III PIQ with the same SAT subtests are, respectively, 0.65, 0.73, 0.71, 0.61, 0.72 and 0.55. The mean age of the sample is 11 years. The sample is non-representative (56% blacks and 37% whites).

Hammill & Pearson (2009) report the validity of the Comprehensive Test of Nonverbal Intelligence – Second Edition (CTONI-2). The reliability (Table 9.4) for the deaf children (N=91) was very similar to that of the normative sample, as well as to that of the ethnic subgroups, on the subtests and composite scales. The full scale IQ on this test correlates strongly with the WISC-III for deaf children (0.90). Similarly, Naglieri & Brunnert (2009) present the Wechsler Nonverbal Scale of Ability (WNV). This scale does not require language or arithmetic skills but it is multidimensional in that it may require visual-spatial skills, recall of spatial information or of the sequence of information, and paper-and-pencil skills. The WNV contains 6 subtests: matrices, coding, object assembly, recognition, spatial span, picture arrangement. The reliability for deaf children is 0.77-0.98, and is identical for the hard-of-hearing. Table 12.5 shows that the hard-of-hearing (HOH) and deaf samples have mean scores on the WNV (96.7 and 102.5) similar to the means of their matched normative samples (100.5 and 100.8).

Measurement bias

According to Braden (1994), nonverbal IQ would not be affected by the kind of cultural environments that affect verbal IQ. If the deaf are nonetheless deprived of knowledge relevant to some specific items or subtests, we will find a bias (i.e., a group difference) in the difficulty parameter or the intercept(s). Furthermore, deaf persons are expected to organize their abilities in a similar way to the normal-hearing population when attempting to solve the items of a given subtest. Otherwise, there will be a group difference in item discrimination or factor loadings.

Krouse (2011) analyzed 134 deaf people (mean age=12.16) on the WISC-IV, using the CHC model reported for the norm group (N=2,200) as a point of reference. The 3 group factors of the CHC model were VCI (verbal), PRI (nonverbal), and PSI (processing speed). The analysis involves multi-group CFA (MGCFA). Factor loadings failed to be invariant only for VCI. Intercept invariance was violated for PRI and PSI. But, like nearly all psychometricians using MGCFA, Krouse did not understand the implication of measurement (intercept) invariance. At this level, the technique attempts to equate the intercepts (i.e., the mean score of each subtest when the latent variable score is zero) across groups for all subtests. If one (or more) is not equal across groups, that subtest favors either the deaf or the normal-hearing group. It is possible that subtest biases cancel out when the subtests are summed, if they are biased in both directions. There is no logical justification for claims about “incomparability” if such measurement bias does not impair either the measurement or the validity of the test (Roznowski & Reith, 1999). However, the unconstrained scalar values can reveal the magnitude and sign of the intercept differences (Table 13). All five subtests of PRI and PSI display a lower intercept for the deaf sample. The means of VCI, PRI, and PSI were 80.05, 96.18, and 94.16. This compares unfavorably with the norm group, which is said to have a mean of 100. Although PRI and PSI are thought to measure nonverbal ability, the difference of about 5 points is by no means trivial. One possible reason for this curiosity is that only 47.8% of the deaf sample is white, while 40.3% is Hispanic. It is greatly unfortunate that Krouse did not even attempt to remove this confound, but one can guess that the deaf-hearing difference would be modest or small with the Hispanic participants removed. In that case, perhaps intercept invariance would not have been violated. Finally, the subtest intercorrelations seem lower than what is usually seen, and a few (three) of them are null or negative.

Krouse (2008) had previously shown, using the same sample, that the subtest reliabilities (computed with the internal consistency method, i.e., a split-half of the test into odd and even items) of the norm group and the deaf group are very similar, but slightly higher in the deaf group.
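
For readers unfamiliar with the odd-even split-half method, here is a minimal sketch. The scores on the odd and even items are correlated, and the Spearman-Brown formula, 2r/(1+r), steps the half-test correlation up to the full test length. The response matrix is invented for illustration and has nothing to do with Krouse's data; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_reliability(responses):
    """Odd-even split-half reliability for rows of 0/1 item scores."""
    odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_scores, even_scores))

# Hypothetical data: 6 persons by 6 items, items ordered by difficulty.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(responses)
print("split-half reliability: %.3f" % reliability)
```

Comparing this coefficient between the deaf and norm groups, subtest by subtest, is the kind of comparison described above.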

Tayrose (2011) performs MGCFA on data from various earlier studies of deaf/HOH and hearing children. The CHC model was fitted to the hearing and the deaf children. It provided an excellent fit for the hearing group aged 10 and less on the ITPA, WISC and WPPSI but not the HNTLA, while for the hearing group aged 11 and more, the CHC model has a good fit for the HNTLA but a borderline fit for the WAIS-R. Concerning the deaf/HOH children aged 10 and less, the CHC model is acceptable for the WISC and WPPSI, but not for the ITPA battery. For those aged 11 and more, the CHC model yields a perfect fit for the Hiskey-Nebraska (HNTLA) battery but an acceptable fit for only one of the two WAIS-R samples. The next step was to compare the factor structures. Tables 9 & 10 show that the fit of the configural and metric invariance models does not differ for the HNTLA and WAIS-R among children aged 11 and more, and the same holds for the WISC and WPPSI among children aged 10 and less. The author then fitted the CHC model to other samples of mixed ages. For the normal-hearing group, the CHC model has a good fit on the ITPA, K-ABC, UNIT, WISC-III and WISC-R. For the deaf/HOH group, the CHC model has no acceptable fit for the K-ABC, the UNIT, the WISC-III and 4 samples of the WISC-R, a borderline fit for another WISC-R sample, and a good fit for the two remaining WISC-R samples. CFI and NNFI always indicated very good fit, and RMSEA was always the source of misfit (except for the UNIT and WISC-III, where all fit indices are unacceptable). As explained by Sharma et al. (2005, Tables 2-3), RMSEA diminishes (although modestly) when sample sizes reach 200 or more. Table 13 shows that the metric invariance model fits no worse than the configural invariance model. Except for the Illinois Test of Psycholinguistic Abilities (ITPA), all values of CFI, NNFI and RMSEA are virtually identical between the two levels of invariance. Therefore, the subtests’ factor loadings can be treated as equal across groups. Virtually all batteries involve the Wechsler scales (one study uses the UNIT, and another the K-ABC). In light of what Braden (1994) said, it is very surprising that metric invariance is not violated when using the entire Wechsler scale, including the verbal subtests.

Maller (2000) analyzed a recent IQ test recommended for deaf children, the Universal Nonverbal Intelligence Test (UNIT), by way of IRT and Mantel-Haenszel techniques. She found no evidence of meaningful item bias (Differential Item Functioning, DIF) across the 4 analyzed subtests (each containing about 30 items), as assessed by effect size measures (RMSD for IRT). Thus, the probability of a correct answer is identical between deaf and normal-hearing children when they are equated for latent ability. The problem is that RMSD is a measure of the (average) unsigned difference in probabilities of correct responses. In other words, the possibility of cumulative DIF cannot be tested with this approach.
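
A toy example shows why an unsigned index is blind to cumulative DIF. This is a deliberately simplified sketch: the actual RMSD index is computed per item over the ability distribution, whereas here each item is reduced to a single invented difference in the probability of a correct response (deaf minus hearing) at a fixed ability level. All numbers and function names are hypothetical.

```python
import math

def rmsd(differences):
    """Root mean squared difference: the sign of each difference is lost."""
    return math.sqrt(sum(d * d for d in differences) / len(differences))

def signed_sum(differences):
    """Sum of signed differences: what accumulates at the total score level."""
    return sum(differences)

# Invented per-item differences in P(correct), deaf minus hearing.
cancelling = [0.05, -0.05, 0.05, -0.05]    # items biased in both directions
cumulative = [-0.05, -0.05, -0.05, -0.05]  # every item biased against the deaf

rmsd_cancelling = rmsd(cancelling)
rmsd_cumulative = rmsd(cumulative)

print("RMSD:       %.3f vs %.3f" % (rmsd_cancelling, rmsd_cumulative))
print("signed sum: %.3f vs %.3f" % (signed_sum(cancelling), signed_sum(cumulative)))
```

Both patterns give the same RMSD, yet only the second one shifts the expected total score. A small RMSD on every item is therefore compatible with a consistent, if modest, disadvantage for one group.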

Maller (1997) analyzed the items in several subtests of the WISC-III (translated into sign language): Picture Completion, Information, Similarities, Vocabulary and Comprehension. The sample included 110 severely and profoundly deaf children whose mean PIQ was 104 (aged 8-16, mean age=12.78). The standardization sample (N=2,200) was chosen to be similar in mean age (11.52). The analysis involved the Rasch model, which is based on a logistic function that models the probability of a correct response, given the difficulty of the item and the ability of the person. The key point in Rasch modeling is that whenever an item shows insufficient fit to the Rasch model, it means some factors other than ability (e.g., group difference in item discrimination, difficulty, or guessing) influence the probability of a correct response. Simply put, the unidimensionality assumption is violated. DIF was detected by comparing the logit item difficulties, centered at zero for each sample, using Lord’s (1980) formula for testing the size of the difference in item difficulty (b-)parameters, which is expressed as: $\mathrm{DIF}_i = (b_{Di} - b_{Hi}) / \sqrt{\sigma^2_{Di} + \sigma^2_{Hi}}$, where $b_{Di}$ is the logit difficulty of item $i$ for the deaf sample, $b_{Hi}$ is that for the hearing sample, $\sigma^2_{Di}$ is the error variance around $b_{Di}$, and $\sigma^2_{Hi}$ is the error variance around $b_{Hi}$. A significant difference (as evaluated against a z distribution) indicates the presence of DIF. A negative (positive) sign in the “DIF” column means that the item favors (disfavors) the deaf children. A close look at Tables 1-5 reveals that the DIF effects across the items of each subtest tend to cancel out, and this outcome has been acknowledged by the author herself (p. 311). Many of the items had DIF statistics large enough to be flagged as DIF. The Vocabulary subtest (14 out of the 25 items without poor fit) had the largest DIF statistics while Picture Completion (6 DIFs out of 30 items) had the smallest. Curiously enough, the sign and magnitude of the DIF statistics indicate that Vocabulary is likely to be biased against normal-hearing children. The absence of cumulative DIF against deaf people is clearly unexpected. It is obvious that such an outcome cannot be trusted.
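
Lord's statistic is simple enough to compute by hand. The sketch below applies the formula above to one invented item; the difficulties and standard errors are hypothetical values chosen for illustration, not numbers from Maller's tables, and the 1.96 cutoff is the conventional two-sided 5% criterion rather than necessarily the one the author used.

```python
import math

def lord_dif_z(b_deaf, b_hearing, var_deaf, var_hearing):
    """Lord's (1980) z statistic for a difference in item difficulty.

    Positive values mean the item is harder for the deaf sample
    (it disfavors them); negative values mean it favors them.
    """
    return (b_deaf - b_hearing) / math.sqrt(var_deaf + var_hearing)

# Hypothetical item: logit difficulty 0.80 (SE 0.25) in the deaf sample
# and 0.20 (SE 0.10) in the hearing sample. Variances are squared SEs.
z = lord_dif_z(0.80, 0.20, 0.25 ** 2, 0.10 ** 2)
flagged = abs(z) > 1.96

print("z = %.3f, flagged as DIF: %s" % (z, flagged))
```

Note that the denominator is dominated by the larger error variance. With 110 deaf children against 2,200 in the standardization sample, the precision of the test is essentially set by the deaf sample.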

The likely reason why techniques such as IRT and MGCFA are largely imperfect is that they can only detect relative group differences, not absolute group differences. That is, when the latent score is equated, both techniques examine whether one (or more) item(s)/subtest(s) behave differently relative to the total set of items/subtests. But if one specific group (e.g., the deaf or blacks) is depressed equally on all items/subtests, neither IRT nor MGCFA will be able to detect this kind of cultural bias. Clauser & Mazor (1998, pp. 286, 292), Nandakumar (1994, p. 17) and Richwine (2009, p. 54) recognized this problem. This “pervasive bias” problem, usually called the “ipsativity” or “circularity” problem, is known to be difficult to circumvent (Penfield & Camilli, 2007, pp. 161-162). Internal methods of DIF detection are ipsative; that is, holding ability constant, if one group tends to miss some items unexpectedly, it must unexpectedly answer other items correctly. The consequence is that the items disfavoring one group will be cancelled by other items favoring this group. This results in spurious non-detection of bias at the total score level.
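
The ipsativity problem can be illustrated with a few lines of arithmetic. The sketch below assumes, as in the Rasch analysis described earlier, that item difficulties are centered at zero within each group before being compared. The difficulties and the size of the bias (0.7 logits) are invented, and the function names are hypothetical.

```python
def centered(difficulties):
    """Center item difficulties at zero within one group."""
    mean = sum(difficulties) / len(difficulties)
    return [b - mean for b in difficulties]

def relative_dif(b_focal, b_reference):
    """Per-item difference in difficulty after within-group centering."""
    return [f - r for f, r in zip(centered(b_focal), centered(b_reference))]

# Invented logit difficulties for the reference (hearing) group.
hearing = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Case 1: every item is 0.7 logits harder for the focal (deaf) group.
uniform_bias = [b + 0.7 for b in hearing]

# Case 2: only the first item is 0.7 logits harder for the focal group.
single_item_bias = [hearing[0] + 0.7] + hearing[1:]

uniform_dif = relative_dif(uniform_bias, hearing)
single_dif = relative_dif(single_item_bias, hearing)

print("uniform bias:     ", [round(d, 2) for d in uniform_dif])
print("single-item bias: ", [round(d, 2) for d in single_dif])
```

In the first case the centering absorbs the whole bias and every item looks clean. In the second case the biased item is detected, but the four unbiased items now appear to favor the deaf group slightly, because the differences are forced to sum to zero. The cancellation at the total score level is thus built into the method rather than being an empirical finding.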

References