Key Points

Question Are personality profiles, emotional intelligence, and situational judgment tests useful applicant screening tools for identifying successful residents?

Findings This analysis of 3 screening tool results among 51 postgraduate year 1 through 5 general surgery residents found that, although emotional intelligence and personality factors were significantly correlated with various performance dimensions, only US Medical Licensing Examination Step 1 (accounting for 12% of performance variance) and situational judgment test scores were associated with overall performance 1 year later. Both tools together accounted for 25% of overall resident performance variance.

Meaning Inclusion of situational judgment test assessments in the resident selection process may be warranted.

Abstract

Importance The ability to identify candidates who will thrive and successfully complete their residency is especially critical for general surgery programs.

Objective To assess the extent to which 3 screening tools used extensively in industrial selection settings—emotional intelligence (EQ), personality profiles, and situational judgment tests (SJTs)—could identify successful surgery residents.

Design, Setting, and Participants In this analysis, personality profiles, EQ assessments, and SJTs were administered from July through August 2015 to 51 postgraduate year 1 through 5 general surgery residents in a large general surgery residency program. Associations between these variables and residency performance were investigated through correlation and hierarchical regression analyses.

Interventions Completion of EQ, personality profiles, and SJT assessments.

Main Outcomes and Measures Performance in residency as measured by a comprehensive performance metric. A score of zero represented a resident whose performance was consistent with that of their respective cohort’s performance; below zero, worse performance; and greater than zero, better performance.

Results Of the 61 eligible residents, 51 (84%) chose to participate and 22 (43%) were women. US Medical Licensing Examination Step 1 (USMLE1), but not USMLE2, emerged as a significant factor (t 2,49 = 1.98; β = 0.30; P = .03) associated with overall performance. Neither EQ facets nor overall EQ offered significant incremental validity over USMLE1 scores. Inclusion of the personality factors did not significantly alter the test statistic and did not explain any additional portion of the variance. By contrast, inclusion of SJT scores accounted for 15% more of the variance than USMLE1 scores alone, resulting in a total of 25% of the variance explained by both USMLE1 and SJT scores (F 2,57 = 7.47; P = .002). Both USMLE1 (t = 2.21; P = .03) and SJT scores (t = 2.97; P = .005) were significantly associated with overall resident performance.

Conclusions and Relevance This study found little support for the use of EQ assessment and only weak support for some distinct personality factors (ie, agreeableness, extraversion, and independence) in surgery resident selection. Performance on the SJT was associated with overall resident performance more than traditional cognitive measures (ie, USMLE scores). These data support further exploration of these 2 screening assessments on a larger scale across specialties and institutions.

Introduction

Medical educators are increasingly investigating improved methods for screening and selecting applicants for medical training programs.1-3 Screening assessments to determine applicant fit with a residency often include US Medical Licensing Examination (USMLE) scores, medical student performance evaluations, letters of recommendation, personal statements, and in-person interviews.4,5 However, scholars have observed wide variability not only in the way each of these data points is used but also in their ability to estimate later performance in residency.6,7

The ability to select candidates who will thrive and successfully complete a residency is especially critical for general surgery programs. General surgery residency typically spans 5 to 7 years of intense training, most often followed by an additional 1 to 2 years of specialty training.8 These factors require program directors to identify candidates who not only demonstrate the competencies and aptitude required to be a surgeon but also can manage the extended length of training in a high-stress environment. However, literature reviews have shown that up to 30% of residents in surgery programs require at least 1 remediation intervention for performance issues,9 most of which involve nontechnical competencies, such as interpersonal skills and professionalism.10-12 In addition, approximately a quarter of those who enter surgery training programs do not stay, resulting in one of the highest attrition rates across medical specialties.13

There are undoubtedly a number of factors leading to these high attrition rates and thus multiple potential solutions (eg, providing more realistic previews of surgical careers to students, enhancing the quality of training programs, and incorporating methods to identify residents at risk for remediation or attrition). However, given the resources involved with current selection practices, remediation programs, and costs of attrition,3 it is critical that program directors are able to effectively and efficiently identify candidates who will be successful in their particular training programs.

We investigated whether candidate assessment practices commonly used in industry could be applied to the resident screening process to maximize applicant-organization fit. Specifically, we used correlation and hierarchical linear regression analyses to assess the extent to which emotional intelligence (EQ), personality profiles, and situational judgment tests (SJTs)—3 screening tools that have received extensive attention for their use in candidate selection in industrial settings—were associated with resident performance 1 year after administration in a large general surgery residency program.

Methods

The 3 screening tools—EQ, personality profile, and situational judgment tests—were administered from July through August 2015 to general surgery residents who were in a large residency training program, and the test results were correlated 1 year later with a multidimensional performance metric. The screening tools and resident performance metric were created and administered as described below. The institutional review board at the University of Texas Southwestern Medical Center, Dallas, waived the need for review and documentation of participant consent.

Emotional Intelligence

Emotional intelligence was assessed with the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), version 2.0 (Multi-Health Systems, Inc). This widely investigated tool consists of 141 items and measures each of the 4 EQ branches (eAppendix 1 in the Supplement). Moderate but significant correlations have been found between MSCEIT scores and measures of cognitive ability and the 5 basic dimensions of personality, termed the “Big 5” personality traits (ie, openness, conscientiousness, extraversion, agreeableness, and neuroticism), suggesting that EQ is associated with but distinguishable from intelligence and personality.14-16 In addition, positive correlations have been reported with academic achievement17 and psychological well-being.15 Scores are calculated similar to intelligence quotient assessments in that the mean (SD) score is 100 (15).

Personality

The Six Factor Personality Questionnaire (SFPQ)18 was used to assess personality. The SFPQ contains 108 items that assess the traditional Big 5 personality traits but bifurcates conscientiousness into methodicalness and industriousness facets. Each factor has 3 narrow facet scales that are assessed by 6 items each (eAppendix 2 in the Supplement). The factor scale scores range from 18 to 90. Participants responded to SFPQ items using a 5-point Likert scale, with 1 indicating strongly disagree and 5 indicating strongly agree.
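The reported item counts and score range can be checked with simple arithmetic. The following is a minimal sketch, not the SFPQ's official scoring procedure: it assumes a factor score is the plain sum of its 18 item responses (any reverse-keyed items are ignored), and the function name and responses are hypothetical.

```python
# Illustrative sketch only: assumes a factor score is the plain sum of its
# item responses. The function name and the responses are hypothetical.
ITEMS_PER_FACET = 6
FACETS_PER_FACTOR = 3
FACTORS = 6
LIKERT_MIN, LIKERT_MAX = 1, 5


def factor_score(responses):
    """Sum the 18 Likert responses (3 facets x 6 items) for one factor."""
    expected = ITEMS_PER_FACET * FACETS_PER_FACTOR
    if len(responses) != expected:
        raise ValueError(f"expected {expected} responses, got {len(responses)}")
    if any(not LIKERT_MIN <= r <= LIKERT_MAX for r in responses):
        raise ValueError("responses must be on the 1-5 Likert scale")
    return sum(responses)


items_per_factor = ITEMS_PER_FACET * FACETS_PER_FACTOR   # 18 items per factor
total_items = items_per_factor * FACTORS                  # 108 items overall
lowest = factor_score([LIKERT_MIN] * items_per_factor)    # all "strongly disagree"
highest = factor_score([LIKERT_MAX] * items_per_factor)   # all "strongly agree"
print(total_items, lowest, highest)  # 108 18 90
```

Under that assumption, 6 factors of 18 items each give the 108 items, and the extremes of the 5-point scale give the 18 to 90 range stated above.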

Situational Judgment Test

During an SJT, participants are presented with hypothetical but realistic job-relevant scenarios, and they must determine the most and least effective options from a list of potential responses. These items do not measure medical knowledge but measure judgment as well as decision-making and problem-solving skills across a wide array of situations. An example item is provided in eAppendix 3 in the Supplement.

The 50-item SJT and scoring key were created in accordance with other studies.19 The Kendall coefficient of concordance computed for each ranking item showed 0.68 concordance, indicating adequate interrater agreement. Thus, 34 of 50 items (68%) were of sufficient psychometric quality to be included in the final assessment. The maximum score on the SJT assessment was 77 points. A knowledge-based response instruction format (ie, “What should you do?” vs “What would you do?”) was used because this format is less prone to insincere responses.20,21
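For readers unfamiliar with the Kendall coefficient of concordance (W), the following is a minimal sketch of the standard no-ties formula, W = 12S / (m²(n³ − n)), where m raters each rank the same n response options and S is the sum of squared deviations of the rank sums from their mean. The rankings are hypothetical, and this is not necessarily the exact procedure used to build the scoring key.

```python
# Illustrative sketch only: textbook Kendall's W without a tie correction.
# The rater rankings below are made up for demonstration.
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank the same n options (no ties)."""
    m = len(rankings)          # number of raters
    n = len(rankings[0])       # number of response options being ranked
    rank_sums = [sum(rater[i] for rater in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three hypothetical raters ranking four response options (1 = most effective).
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
partial_agreement = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]

print(kendalls_w(full_agreement))               # 1.0
print(round(kendalls_w(partial_agreement), 2))  # 0.78
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings), so items on which raters disagree about the most and least effective responses yield lower values.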

Performance

Resident performance measures consisted of data from monthly faculty evaluations, faculty- and staff-rated professionalism metrics (ie, interpersonal and communication skills, completion of administrative tasks, conference attendance, duty hour compliance, and so forth that align with Accreditation Council for Graduate Medical Education [ACGME] milestones) that are completed monthly, procedural activity from ACGME case log data, and scholarly activity (ie, raw number of presentations and peer-reviewed publications). An overview of these performance measures is provided in Table 1. Because all programs have unique cultures and because the expectations of residents and the existing evaluation metrics (ie, milestones) at the time of this study were based primarily on monthly faculty evaluations, an institution-specific overall performance metric using the aforementioned measures was created. Nine faculty members who had been tasked with monitoring and evaluating resident performance through the department Clinical Competency Committee were asked to provide weights for each of these factors. The following is the resulting overall performance equation for the program:

Resident Performance = (0.37 × Faculty Evaluation Score) + (0.18 × Professionalism Score) + (0.17 × Case Log Score) + (0.14 × American Board of Surgery In-Training Examination score) + (0.08 × Scholarly Score) + (0.06 × Medical Student Evaluation Score).

Because senior residents are likely to have higher values on some variables (eg, scholarly publications, case logs, and professionalism) owing to their having been in the program longer, z scores were created based on the postgraduate year (PGY) mean, enabling aggregation of performance data from the 5 PGY cohorts. Overall performance was calculated according to the equation above and multiplied by 100. Thus, a score of zero represented a resident whose performance was consistent with that of their respective cohort; below zero, worse performance; and greater than zero, better performance. Given the variable weighting of each performance measure, as described above, a resident who far exceeded the cohort mean on faculty evaluations would have a substantially larger overall performance score than a resident who far exceeded the cohort mean on medical student evaluations.
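The effect of the weighting can be illustrated with a short sketch. The weights are those in the performance equation above; the function names, the use of the sample SD for standardization, and the example residents are assumptions made for illustration only.

```python
from statistics import mean, stdev

# Weights taken from the performance equation above.
WEIGHTS = {
    "faculty_evaluation": 0.37,
    "professionalism": 0.18,
    "case_log": 0.17,
    "absite": 0.14,
    "scholarly": 0.08,
    "medical_student_evaluation": 0.06,
}


def z_scores(values):
    """Standardize raw values within one PGY cohort (sample SD is an assumption)."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


def overall_performance(z_by_measure):
    """Weighted sum of cohort z scores, multiplied by 100."""
    return 100 * sum(WEIGHTS[name] * z_by_measure[name] for name in WEIGHTS)


# Hypothetical residents, expressed directly as cohort z scores.
at_cohort_mean = {name: 0.0 for name in WEIGHTS}
strong_faculty_evals = dict(at_cohort_mean, faculty_evaluation=2.0)
strong_student_evals = dict(at_cohort_mean, medical_student_evaluation=2.0)

print(round(overall_performance(at_cohort_mean), 1))        # 0.0
print(round(overall_performance(strong_faculty_evals), 1))  # 74.0
print(round(overall_performance(strong_student_evals), 1))  # 12.0
```

In this sketch, a resident 2 SDs above the cohort mean on faculty evaluations alone scores 74, whereas the same margin on medical student evaluations alone yields 12, reflecting the 0.37 vs 0.06 weights.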

Assessment Administration

Surgery residents at a single institution were invited to participate in this study at the beginning of the academic year. Trainees had 2 hours to complete the assessments. The USMLE scores and performance measures obtained 1 year later were collected for those residents who chose to participate.

Statistical Analysis

Descriptive statistics were computed for the MSCEIT, SFPQ, SJT, and performance measures. Pearson correlations were used to examine associations between variables. Independent paired-samples 2-tailed t tests were used to examine performance differences between residents who chose to complete the assessments and those who did not. Linear regression analysis was used to identify factors independently associated with the performance measures. All statistical tests were 2-sided, and P < .05 was considered statistically significant. All data were analyzed using SPSS software, version 24.0 (IBM).
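As a point of reference for the correlation analyses, the following is a minimal sketch of how a Pearson coefficient is computed and converted to a t statistic with n − 2 degrees of freedom, using t = r√(n − 2) / √(1 − r²). It is written in plain Python for illustration and is not the SPSS procedure used in this study; the function names and sample data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Hypothetical paired scores for 6 residents.
test_scores = [48, 55, 60, 62, 70, 71]
performance = [-30, -5, 10, -10, 40, 25]
r = pearson_r(test_scores, performance)
print(round(r, 2), round(t_for_r(r, len(test_scores)), 2))

# A correlation of 0.30 in a sample of 51 corresponds to t of about 2.20,
# assuming all 51 participants contribute to that correlation.
print(round(t_for_r(0.30, 51), 2))  # 2.2
```

The 2-sided P value would then be read from the t distribution with n − 2 degrees of freedom, a step that statistical packages perform automatically.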

Results

Descriptive Statistics

Of the 61 eligible residents, 51 (84%) chose to participate in this study (PGY1, 13 of 13; PGY2, 10 of 13; PGY3, 12 of 13; PGY4, 9 of 13; and PGY5, 7 of 9) and 22 (43%) were women. The results of paired-samples t tests revealed that the performance measures of those who chose to participate did not significantly differ from those who chose not to participate. Table 2 presents the means and SDs of the evaluation variables (EQ, personality profile, and SJT results) by PGY. Table 3 provides the standardized means and SDs of performance criterion variables by PGY.

Correlations

Table 4 provides the Pearson correlation coefficients between all evaluation and criterion variables. Overall EQ was associated with the personality facet of industriousness (r = 0.31; P = .03). Within the personality factors, extraversion was significantly associated with both independence (r = −0.34; P = .01) and openness (r = 0.42; P = .002). Methodicalness and industriousness were significantly associated with each other (r = 0.31; P = .02). Both USMLE1 (r = 0.48; P < .001) and USMLE2 (r = 0.37; P = .004) were significantly associated with American Board of Surgery In-Training Examination (ABSITE) scores. USMLE1 was also significantly associated with overall performance (r = 0.30; P = .02), USMLE2 (r = 0.56; P < .001), and agreeableness (r = 0.29; P = .04). Performance on the SJT was significantly associated with faculty evaluations (r = 0.31; P = .03), medical student evaluations (r = 0.38; P = .03), overall performance (r = 0.41; P = .006), and overall EQ (r = 0.35; P = .02).

Associative Validity

Hierarchical regression analyses were conducted to further examine the associations among these variables. Specifically, we wanted to know the extent to which scores on the evaluation variables (USMLE, EQ, personality factors, and SJT performance) were associated with the overall performance of residents during their residency. We began by including both USMLE scores (Steps 1 and 2) in a regression equation. We found that these scores accounted for 12% of the criterion variance (F 2,57 = 3.68; P = .03). However, only results from USMLE1 emerged as a significantly associated factor (t 2,49 = 1.98; β = 0.30; P = .03). We then entered USMLE1 in a first block of the regression equation and EQ facets in a second block. Neither EQ facets nor overall EQ offered significant incremental validity over the use of USMLE1 scores alone. We performed another set of regression analyses with USMLE1 entered in the first block and personality factors in the second block. Inclusion of personality factors did not significantly alter the test statistic and did not account for any additional portion of the variance. Finally, we conducted analyses with USMLE1 in the first block and SJT scores in the second block. This model accounted for 15% more of the variance than the USMLE1 scores alone, resulting in a total of 25% of the variance explained by USMLE1 and SJT scores together (F 2,57 = 7.47; P = .002) and indicating that SJT scores had significant incremental validity over using USMLE1 scores alone. Both USMLE1 (t = 2.21; P = .03) and SJT scores (t = 2.97; P = .005) were significantly associated with overall resident performance.
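The block-wise logic of a hierarchical regression can be sketched as follows: fit the criterion on the first-block variable, add the second-block variable, and examine the change in R². This minimal sketch uses the closed-form R² for 1 and 2 predictors and the standard F test for the R² change. The data are synthetic, the function names are hypothetical, and the code does not reproduce the SPSS analyses or the results reported here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_one_predictor(y, x1):
    """R-squared for a regression of y on a single predictor."""
    return pearson_r(x1, y) ** 2


def r2_two_predictors(y, x1, x2):
    """R-squared for a regression of y on two predictors, from their correlations."""
    r1, r2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the R-squared change when k_added predictors are added."""
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / (n - k_full - 1))


# Synthetic scores for 8 hypothetical residents (not study data).
usmle1 = [210, 225, 230, 240, 245, 250, 255, 260]
sjt = [50, 62, 48, 66, 55, 70, 58, 72]
perf = [-40, 5, -30, 20, -5, 45, 10, 60]

r2_block1 = r2_one_predictor(perf, usmle1)        # block 1: USMLE1 only
r2_block2 = r2_two_predictors(perf, usmle1, sjt)  # block 2: USMLE1 plus SJT
delta_r2 = r2_block2 - r2_block1                  # incremental variance explained
f_stat = f_change(r2_block1, r2_block2, n=len(perf), k_full=2, k_added=1)
print(round(r2_block1, 3), round(r2_block2, 3), round(delta_r2, 3), round(f_stat, 2))
```

The change in R² between blocks is the quantity described above as incremental validity: the additional share of criterion variance accounted for once the second-block variable is entered.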

Discussion

This study used correlation and hierarchical regression analyses to assess the extent to which EQ, personality, and SJT scores obtained from 51 residents were associated with their performance in a large general surgery residency program 1 year later. The results showed that the USMLE1 score accounted for a reasonable level of criterion variance and was significantly associated with resident performance, likely because of its strong association with the ABSITE score. The USMLE2 score, however, demonstrated no significant association with our criterion. These findings suggested that, although the USMLE was originally created to inform licensure decisions, the use of USMLE1 scores as 1 component in the resident selection decision can be supported at this institution.

Despite increasing interest in the construct of EQ in the surgery literature,22,23 our data do not support the use of EQ assessments as a screening tool for general surgery residency applicants. We were unable to find significant associations between any of the facets of EQ or overall EQ with any of our performance criteria. A recent review of studies assessing EQ in surgery by McKinley and Phitayakorn24 concluded that no study found a significant link between surgical resident EQ and clinical performance. Even more recently, Hollis et al25 were unable to correlate EQ with either ABSITE scores or faculty evaluations of clinical competency. Thus, despite the growing interest in EQ measures in the surgical community, no data currently exist to support their use as a selection tool.

Personality assessments are often used for applicant selection in industries outside of medicine because such assessments have been shown to have reasonable validity evidence and result in less potential discrimination of protected groups.26,27 In fact, approximately two-thirds of medium to large organizations use some type of personality or aptitude test in applicant screening.28 The present study did not find a direct association in a regression model between any personality factor examined and overall performance. Correlation analyses did, however, indicate a positive association between evaluations received from medical students and both extraversion and agreeableness, such that the more outgoing and kind residents were, the higher their evaluation scores were from the students. The data also revealed a positive association between independence and case log numbers, suggesting that residents who were less reliant on others were more likely to take advantage of opportunities to participate in surgical procedures. Thus, personality factors may contribute to important indicators of success in residency but may not play a sufficiently strong role to have a direct association with overall performance criteria that do not heavily weigh medical student evaluations and procedural activity. However, programs that place more importance on medical student evaluations and procedural activity may find that these personality factors are important factors associated with performance.

The SJT assessment in the present study consisted of written common clinical scenarios presenting residents with challenging situations likely to be encountered in residency. Residents had to make judgments regarding the potential responses under a degree of uncertainty, a concept that is receiving increasing attention in the medical education literature.3,29-33 Residents were scored against a predetermined key defined by 12 clinical faculty members entrenched in the surgical education milieu. The results indicated a positive association between performance on the SJT and overall resident performance. The findings showed that, in this institution, the SJT estimated performance significantly better than a traditional cognitive measure (ie, USMLE score) alone. This finding can considerably contribute to how residency programs screen applicants. By developing customized tools that ask applicants to respond to unique situations that are likely to be encountered in a particular residency program, decision makers may have an opportunity to not only identify who will be successful in that program but also display the organization’s unique culture and values.

Organizational consultants have noted that the use of the SJT for screening applicants provides a realistic job preview, giving the applicant common scenarios in which they would be placed on a frequent basis. In specialties that experience high rates of attrition, such as surgery,9,13 implementing SJTs for candidate selection may be additionally valuable. In fact, the Association of American Medical Colleges (AAMC) is already undertaking preliminary work to incorporate SJTs into medical student selection.33 The results of the present study and efforts such as those of the AAMC support the role of SJTs in medical trainee selection.

In addition, the powerful association between SJTs and performance observed in the present study aligns with efforts to enhance diversity in surgery.34 For programs actively pursuing these efforts, inclusion of nondiscriminatory screening tools that can estimate later performance is needed. Traditional tests of general mental ability and specific cognitive abilities (eg, numerical, verbal, or spatial ability) have elicited concerns regarding fairness because these tests can result in substantial racial differences in test performance that are not matched in job performance.35 As such, the use and weight given to written examinations, such as the USMLE, during the screening process may not align with efforts to enhance diversity. Other screening tools, such as SJTs, have been shown to be as strongly associated with performance as cognitive-based assessments but without the discriminatory potential.36 Thus, as indicated by our results, SJTs not only may offer predictive value in estimating performance in a residency but also may play a key role in enhancing diversity in surgical training programs.

Once sufficient data are accumulated to support the use of SJTs and other innovative screening tools, programs have a number of options regarding how and when to use these tools during the screening process. One of the most efficient methods may be to screen all applicants for eligibility and then invite eligible applicants to participate in an online assessment tool that must be completed in a timed setting. This process can provide program directors with standardized and program-specific information that can then be used to identify which individuals should be invited for the next round of screening, whether that consists of another round of assessments, a telephone interview, or simply fewer applicants invited to an on-site interview. Ultimately, the goal is to enhance the quality and relevance of data available to program directors, enabling them to make more informed decisions during the application review and interview invitation process.

Limitations

There are some limitations to our findings. First, these data are from a single specialty in a single institution, making the generalizability of these findings to other surgery programs and specialties unknown. However, because this institution is one of the largest general surgery residency programs in the country, there is little opportunity to create a more robust evaluation within a single institution. Multi-institutional studies can be conducted to further investigate these associations, but the unique values, culture, and performance measures within each program would need to be thoughtfully considered. In addition, despite the rigor with which the resident assessments and processes were created and collected, these evaluations are subject to biases prevalent across medical educational settings.37,38 To our knowledge, no other study has examined resident performance in such a robust manner by creating an overall “performance equation” that consists of weighted values of faculty evaluations, medical student evaluations, departmental staff evaluations of professionalism and administrative responsibilities, in-training examinations, procedural activities, and scholarship. Finally, SJT development, assessment administration, and data analyses were resource intensive. Programs without access to individuals with knowledge and experience in these domains may be unable to adopt these processes, thus limiting distribution of this methodology. However, nonmedical industries have overcome this limitation by using expert consultants in the science of selection to help create the necessary infrastructure, reasoning that they gain a return on their investment through reduced employee attrition and remediation rates. Thus, residency programs without such resources that are interested in replicating or exploring this methodology may similarly benefit by seeking professional consultation. As noted by Sklar,39 better information is not the complete solution; the right people with the right training who know what to do with the information that is collected are also required.

Conclusions

The goal of this study was to explore the extent to which 3 distinct assessments—EQ, personality profiles, and SJTs—offered enough evidence to support their use in resident selection. We found little support for the use of EQ and weak support for some distinct personality factors (ie, agreeableness, extraversion, and independence). However, performance on an SJT assessment better estimated overall performance of residents 1 year later than traditional cognitive measures (ie, USMLE scores) used alone. These data support further exploration of these screening assessments on a larger scale across specialties and institutions.

Article Information

Corresponding Author: Aimee K. Gardner, PhD, Department of Surgery, School of Allied Health Sciences, Baylor College of Medicine, MS BCM115, DeBakey Bldg, M108K, One Baylor Plaza, Houston, TX 77030 (aimee.gardner@bcm.edu).

Accepted for Publication: September 4, 2017.

Published Online: December 27, 2017. doi:10.1001/jamasurg.2017.5013

Author Contributions: Dr Gardner had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Both authors.

Acquisition, analysis, or interpretation of data: Both authors.

Drafting of the manuscript: Both authors.

Critical revision of the manuscript for important intellectual content: Both authors.

Statistical analysis: Both authors.

Obtained funding: Both authors.

Administrative, technical, or material support: Gardner.

Study supervision: Gardner.

Conflict of Interest Disclosures: Drs Gardner and Dunkin reported providing advice on selection methodology and assessment through SurgWise Consulting, LLC, in which they have ownership interest.