Murphy and DeShon (2000) show that interrater correlations do not provide reasonable estimates of the reliability of job performance ratings, and suggest that better estimates can be obtained by applying the methods of generalizability theory. Schmidt, Viswesvaran, and Ones (2000) criticize our suggestions as radical, and argue that (a) the reliability of ratings should be evaluated using the parallel test model rather than the more general and more realistic generalizability model, (b) reliability and validity are distinct concepts that should not be confused, and (c) measurement models have little to do with substantive models of the processes that generate scores on a test or measure. All three of these ideas were once part of the psychometric mainstream, but progress in psychometrics over the last three decades has moved the field well beyond these assumptions and approaches. Modern psychometric theory calls for close linkages between measurement models and substantive models of the phenomena being measured.