Conclusions After accounting for patient, surgeon, and hospital characteristics, patients treated by female surgeons had a small but statistically significant decrease in 30 day mortality and similar surgical outcomes (length of stay, complications, and readmission) compared with those treated by male surgeons. These findings support further examination of surgical outcomes and of the mechanisms related to physicians, including the underlying processes and patterns of care, to improve mortality, complications, and readmissions for all patients.

Results 104 630 patients were treated by 3314 surgeons, 774 female and 2540 male. Before matching, patients treated by female surgeons were more likely to be female and younger but had similar comorbidity, income, rurality, and year of surgery. After matching, the groups were comparable. Fewer patients treated by female surgeons died, were readmitted to hospital, or had complications within 30 days (5810 of 52 315, 11.1%, 95% confidence interval 10.9% to 11.4%) than those treated by male surgeons (6046 of 52 315, 11.6%, 11.3% to 11.8%; adjusted odds ratio 0.96, 0.92 to 0.99, P=0.02). Patients treated by female surgeons were less likely to die within 30 days (adjusted odds ratio 0.88, 0.79 to 0.99, P=0.04), but there was no significant difference in readmissions or complications. Stratified analyses by patient, physician, and hospital characteristics did not significantly modify the effect of surgeon sex on outcome. A post hoc analysis showed no difference in outcomes by surgeon sex in patients who had emergency surgery, for which patients do not usually choose their surgeon.

Participants Patients undergoing one of 25 surgical procedures performed by a female surgeon were matched by patient age, patient sex, comorbidity, surgeon volume, surgeon age, and hospital to patients undergoing the same operation by a male surgeon.

This large, population based, matched cohort analysis found small differences in surgical outcomes between patients treated by female and male surgeons, with the former having a small but statistically significant decreased risk of short term postoperative death.

Female and male physicians differ in their practice of medicine in ways that might substantially affect patient outcomes. Outcomes after surgery depend on the technical and cognitive skills of treating physicians, so findings from medical specialties might not apply to surgical specialties.

Surgical disciplines are disproportionately male despite increasing numbers of female medical students. 3 4 5 6 7 Gender equity in the surgical profession, including disparities in compensation and promotion, is a growing concern. 10 11 Assessing outcomes for female and male surgeons is important for combatting implicit bias and gender schemas that might perpetuate current inequalities. 10 Using a large, population based cohort, we sought to determine whether postoperative outcomes for patients having one of 25 surgical procedures (emergent and elective) differ between male and female operating surgeons.

Successful surgical practice has four core components: knowledge, communication skills, judgment, and technical proficiency. 1 The acquisition and maintenance of technical skills distinguishes surgeons from many other doctors, and these skills are directly associated with short term postoperative outcomes. 2 Women and men practise medicine differently, 3 4 5 6 7 although little research exists on the differences in learning styles, acquisition of skills, or outcomes for female and male surgeons. 8 Tsugawa et al found that US Medicare beneficiaries who were treated in hospital by female general internists had lower rates of 30 day mortality and readmission than those treated by male internists. 9 Suggested mechanisms for this difference include female doctors being more likely to use a patient centred approach and to follow evidence based guidelines. Surgery, however, has a major technical component, so there is less reason to expect a difference in outcomes between female and male surgeons.

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community. Individual patient consent was not sought given the use of anonymised administrative data.

Matching considerably reduced our sample size, which might adversely affect the applicability of our findings. We therefore also examined the entire cohort (n=1 159 687) using multivariable GEE regression models with a logit link, rather than matching, to account for differences in patient, surgeon, and hospital factors between patients treated by female and male surgeons. These models accounted for correlation of outcomes within surgeons, hospitals, and procedural fee codes.

We conducted subgroup analyses to assess whether patient (age, sex, and comorbidity), surgeon (specialty, age, years in practice, and annual volume), and hospital characteristics modified the association between surgeon sex and outcomes. To minimise the effect of patients selecting their physician, we performed a post hoc stratified analysis of the matched cohort according to whether the surgery was elective or emergent.

Matched data are inherently correlated, so we used multivariable generalised estimating equations (GEE) with a logit link to estimate the association between surgeon sex and the outcomes. We used Poisson regression to examine the association between surgeon sex and hospital length of stay.
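To illustrate what a GEE with a logit link estimates, the sketch below fits a single binary exposure (surgeon sex) with an independence working correlation and a cluster-robust (sandwich) standard error, in plain Python so it runs without statistical libraries. This is a simplified sketch, not the authors' model: the text does not state the working correlation structure, and the published models were multivariable. Under an independence working correlation the point estimate equals ordinary logistic regression, and clustering (here, hypothetical matched-pair labels) enters only through the variance. The function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def _mu(b0, b1, xi):
    # Inverse logit link.
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))

def gee_logit_independence(x, y, cluster, iters=25):
    """Hypothetical helper: logistic GEE, independence working correlation.

    x: 1 = female surgeon, 0 = male surgeon; y: 1 = composite event.
    cluster: cluster label per observation (e.g. matched-pair id).
    Returns (odds ratio, robust SE of the log odds ratio).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Newton-Raphson on the estimating equations sum X_i (y_i - mu_i) = 0.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = _mu(b0, b1, xi)
            w = mu * (1.0 - mu)  # binomial variance function
            g0 += yi - mu
            g1 += xi * (yi - mu)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Sandwich variance: "bread" A from the final fit, "meat" from per-cluster score sums.
    h00 = h01 = h11 = 0.0
    score = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, ci in zip(x, y, cluster):
        mu = _mu(b0, b1, xi)
        w = mu * (1.0 - mu)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
        score[ci][0] += yi - mu
        score[ci][1] += xi * (yi - mu)
    m00 = sum(u0 * u0 for u0, _ in score.values())
    m01 = sum(u0 * u1 for u0, u1 in score.values())
    m11 = sum(u1 * u1 for _, u1 in score.values())
    det = h00 * h11 - h01 * h01
    r0, r1 = -h01 / det, h00 / det  # second row of A^-1
    var_b1 = r0 * r0 * m00 + 2 * r0 * r1 * m01 + r1 * r1 * m11
    return math.exp(b1), math.sqrt(var_b1)

# Hypothetical toy data: 100 matched pairs, one patient per surgeon sex.
x, y, cluster = [], [], []
for pair in range(100):
    x += [1, 0]
    y += [1 if pair < 10 else 0, 1 if pair < 20 else 0]
    cluster += [pair, pair]
or_, se = gee_logit_independence(x, y, cluster)
lo, hi = math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With real data, a maintained implementation (for example a GEE routine in a statistics package) with the full covariate set would be used instead.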

In all analyses we examined surgeon sex as an exposure potentially associated with postoperative outcomes. We used descriptive statistics to compare patients treated by male and female surgeons, and to compare the characteristics of the male and female surgeons themselves. Given the large sample size, conventional significance tests were likely to show statistically significant differences where no clinically important differences existed. We therefore compared groups using standardised differences: the difference in the mean of a variable between two groups divided by an estimate of the standard deviation of that variable in both groups combined. 23 We defined a clinically important standardised difference as greater than 0.10. 23
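A minimal sketch of the standardised difference, assuming the commonly used pooled form d = (mean_1 - mean_2) / sqrt((s_1^2 + s_2^2) / 2) for continuous variables and d = (p_1 - p_2) / sqrt((p_1(1 - p_1) + p_2(1 - p_2)) / 2) for proportions. The text only says "an estimate of the standard deviation", so the pooling formula is an assumption; the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def std_diff_continuous(a, b):
    """Difference in means divided by the pooled SD sqrt((s_a^2 + s_b^2) / 2)."""
    return (mean(a) - mean(b)) / math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)

def std_diff_binary(p_a, p_b):
    """Standardised difference for two proportions given as fractions in [0, 1]."""
    return (p_a - p_b) / math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)

def clinically_important(d, threshold=0.10):
    # The study's threshold: |d| > 0.10 is treated as clinically important.
    return abs(d) > threshold

# Hypothetical example: proportion of female patients in two groups.
d = std_diff_binary(0.60, 0.55)
print(round(d, 3), clinically_important(d))
```

Unlike a P value, d does not shrink toward significance as the sample grows, which is why it suits a cohort of over a million patients.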

Surgeon age, years in practice, operative volumes, surgical specialty, and patient demographics differed between patients treated by female and male surgeons, so we conducted a matched analysis. Variables such as surgeon age could confound the relationship between surgeon sex and outcomes, as male surgeons were older on average. Far fewer patients were treated by female surgeons, so we identified these patients first and matched them to patients treated by male surgeons. We matched patients 1:1 using a hard match comprising procedural fee code (a unique identifier for each procedure), surgeon volume (separated into quarters based on the number of index procedures performed by each surgeon in the year before the index procedure), surgeon age (±3 years), hospital identifier, patient age (±5 years), patient sex, and patient comorbidity using the ADG score (categorical). Surgeon years in practice was collinear with surgeon age, so we used only age for matching. All matching covariates were selected a priori, without data driven variable selection.
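The hard match described above can be sketched as exact matching on the categorical keys plus calipers on surgeon age (±3 years) and patient age (±5 years). The text does not say how a control was chosen when several were eligible, so this sketch assumes a greedy, first-available 1:1 match without replacement, processing patients of female surgeons in input order. The function `hard_match` and all field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical field names for the exact-match keys.
EXACT_KEYS = ("fee_code", "volume_quarter", "hospital", "patient_sex", "adg_category")

def hard_match(treated, controls, surgeon_age_caliper=3, patient_age_caliper=5):
    """Greedy 1:1 matching without replacement (an assumed strategy)."""
    pool = defaultdict(list)
    for c in controls:
        pool[tuple(c[k] for k in EXACT_KEYS)].append(c)
    pairs = []
    for t in treated:
        candidates = pool.get(tuple(t[k] for k in EXACT_KEYS), [])
        for i, c in enumerate(candidates):
            if (abs(c["surgeon_age"] - t["surgeon_age"]) <= surgeon_age_caliper
                    and abs(c["patient_age"] - t["patient_age"]) <= patient_age_caliper):
                pairs.append((t, candidates.pop(i)))  # remove control from the pool
                break
    return pairs  # unmatched treated patients are simply dropped

def rec(pid, surgeon_age, patient_age):
    # Hypothetical record; all exact-match keys identical for the demo.
    return {"id": pid, "fee_code": "X1", "volume_quarter": 2, "hospital": "H1",
            "patient_sex": "F", "adg_category": 1,
            "surgeon_age": surgeon_age, "patient_age": patient_age}

treated = [rec("f1", 45, 60), rec("f2", 45, 60)]
controls = [rec("m1", 49, 60), rec("m2", 47, 62)]
print([(t["id"], c["id"]) for t, c in hard_match(treated, controls)])
```

Dropping treated patients with no eligible control is what shrinks the matched cohort relative to the full sample.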

We collected patient age at surgery, geographic location (local health integration networks 20 ), sex, socioeconomic status (based on geographic location), rurality, and general comorbidity according to the Johns Hopkins aggregate disease group score (ADG) 21 from linked administrative databases. The ADG score discriminates better than the Charlson score. 22 We identified data on the treating surgeons including age, sex, years in practice, specialty, and surgical volume. Surgeon sex refers to biological sex, as reported by physicians at the time of registering for licenses to practise medicine in the province of Ontario. Patients’ biological sex is captured in the Registered Persons Database, which we used for demographic information. We were, therefore, unable to identify transgender patients or surgeons. To determine surgical volume, for each patient and the specific procedure they had, we counted the number of identical procedures their operating surgeon performed in the previous year. We categorised this variable into quarters, defined separately for each procedure across the whole cohort. We collected hospital institution identifiers to account for variability between institutions. We defined a surgical procedure as emergent or elective using the CIHI-DAD database admission variables, which denote urgent, emergent, or admission from the emergency department before surgery. We considered all same day surgery procedures to be elective.
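A sketch of the two derived variables above, under stated assumptions: prior-year volume counts the same surgeon's procedures with the same fee code in the 365 days before the index date (same-day procedures excluded), quarters are cut separately per fee code with values on a cut point falling into the lower quarter, and urgency uses hypothetical boolean and category fields in place of the actual CIHI-DAD admission variables, whose names the text does not give.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta
from statistics import quantiles

def prior_year_volume(procedures):
    """Count identical procedures (same surgeon and fee code) in the 365 days before each one."""
    dates_by_key = defaultdict(list)
    for p in procedures:
        dates_by_key[(p["surgeon"], p["fee_code"])].append(p["date"])
    for dates in dates_by_key.values():
        dates.sort()
    volumes = []
    for p in procedures:
        dates = dates_by_key[(p["surgeon"], p["fee_code"])]
        start = bisect_left(dates, p["date"] - timedelta(days=365))
        end = bisect_left(dates, p["date"])  # excludes the index date itself
        volumes.append(end - start)
    return volumes

def volume_quarter(volumes, fee_codes):
    """Quarters 1-4, with cut points computed separately within each fee code."""
    by_code = defaultdict(list)
    for v, code in zip(volumes, fee_codes):
        by_code[code].append(v)
    cuts = {code: quantiles(vs, n=4) for code, vs in by_code.items() if len(vs) >= 2}
    # A value equal to a cut point is placed in the lower quarter.
    return [bisect_left(cuts[code], v) + 1 if code in cuts else 1
            for v, code in zip(volumes, fee_codes)]

def is_emergent(admission_category, admitted_via_ed, same_day_surgery):
    """Hypothetical stand-ins for the CIHI-DAD admission variables."""
    if same_day_surgery:
        return False  # all same day surgery treated as elective
    return admission_category in {"urgent", "emergent"} or admitted_via_ed

# Hypothetical usage: three cases by surgeon "A" and one by surgeon "B".
procs = [
    {"surgeon": "A", "fee_code": "X", "date": date(2014, 1, 1)},
    {"surgeon": "A", "fee_code": "X", "date": date(2014, 6, 1)},
    {"surgeon": "A", "fee_code": "X", "date": date(2015, 3, 1)},
    {"surgeon": "B", "fee_code": "X", "date": date(2014, 6, 1)},
]
vols = prior_year_volume(procs)
print(vols, volume_quarter(vols, [p["fee_code"] for p in procs]))
```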

Our primary outcome was a composite of death, complications, or readmission (to any hospital in the province of Ontario) in the 30 days after surgery, as previously described. 16 We included deaths that occurred in hospital in the initial postoperative period. We think this outcome best captures the overall burden of short term postoperative complications. Our secondary outcomes were the individual components of the primary outcome (death, complications, and readmission within 30 days of surgery) and length of stay in hospital. We used a previously published definition of surgical complications that represents major morbidity. 16 These outcomes were ascertained from health administrative data using a combination of procedural and diagnostic codes that are uniformly collected for all hospitals and patients in Ontario. 14 19
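Once event dates have been derived from the procedural and diagnostic codes, the composite outcome reduces to a date-window check. The sketch below assumes an inclusive window from the day of surgery to day 30 and takes already-derived event dates as input; the code lists that define a complication are not reproduced here, and the function names are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # assumed inclusive window: day 0 to day 30

def _within(surgery_date, event_date):
    return event_date is not None and surgery_date <= event_date <= surgery_date + WINDOW

def composite_outcome(surgery_date, death_date=None, complication_dates=(), readmission_dates=()):
    """Component flags plus the composite (any component within 30 days)."""
    death = _within(surgery_date, death_date)
    complication = any(_within(surgery_date, d) for d in complication_dates)
    readmission = any(_within(surgery_date, d) for d in readmission_dates)
    return {"death": death, "complication": complication,
            "readmission": readmission,
            "composite": death or complication or readmission}

# Hypothetical patient: readmitted on day 12, alive at day 30.
print(composite_outcome(date(2015, 1, 1), readmission_dates=[date(2015, 1, 13)]))
```

Because each secondary outcome is one of the component flags, the same pass yields both the primary and the secondary binary outcomes.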

We identified all patients who had one of the 25 index procedures in the study period (n=1 534 592). We excluded patients treated by physicians whose primary declared specialty was non-surgical (n=8753) and patients under 18 (n=29 158). We included only the first of the 25 procedures for each patient, thus excluding 282 399 procedures in patients already in the cohort and ensuring that each patient was included only once. We excluded patients for whom the date of death preceded the date of surgery (n=332). We considered these cases to be coding errors, as most of these patients had died several years before the date of surgery. A sensitivity analysis including the 10 patients who died one day before surgery did not alter the outcome of the primary analysis (see appendix). To capture data on hospital factors, we excluded patients for whom the treating institution could not be identified (n=54 263). After these exclusions, the study sample comprised 1 159 687 patients who had surgery between 2007 and 2015.
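The exclusion counts above can be checked as a simple cohort flow. This sketch applies them in the order listed, which is an assumption about how the exclusions were sequenced; it confirms that they sum to the reported final sample.

```python
start = 1_534_592
exclusions = [
    ("non-surgical primary specialty", 8_753),
    ("age under 18", 29_158),
    ("not first index procedure", 282_399),
    ("death date before surgery date", 332),
    ("treating institution unknown", 54_263),
]

remaining = start
for reason, n in exclusions:
    remaining -= n
    print(f"{reason:<32} -{n:>9,} -> {remaining:>11,}")

assert remaining == 1_159_687  # final study sample reported in the text
```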

We linked the following datasets using encrypted patient identifiers: the Ontario Health Insurance Plan (OHIP) database, which tracks claims paid for physician billings, laboratories, and out-of-province providers 16 ; the Canadian Institute for Health Information Discharge Abstract Database (CIHI-DAD), which contains records for hospital admissions 17 ; the CIHI National Ambulatory Care Reporting System, which contains records for emergency department visits; the Registered Persons Database for demographic information 18 ; and the Corporate Provider Database for surgeon data.

Although we used administrative data sources, we performed a sample size calculation to ensure that the proposed analysis was feasible. Based on previous estimates of the primary outcome rate (composite of death, readmission, and postoperative complications; 17%) 14 and effect size (odds ratio 0.96, that is, a 4% relative reduction in odds), 9 we calculated a necessary sample size of 94 270 (see appendix). We designed and conducted this study according to STROBE (strengthening the reporting of observational studies in epidemiology) guidelines 15 and the RECORD (reporting of studies conducted using observational routinely collected health data) statement. 11 Sunnybrook Health Sciences Centre Research Ethics Board approved this study (project identification number 375-2016).
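For orientation, the inputs above can be plugged into the textbook normal-approximation formula for comparing two independent proportions. The appendix method (significance level, power, allocation, and whether the matched design was accounted for) is not given here, so this sketch assumes two-sided alpha of 0.05 and 80% power, and its result need not reproduce the reported 94 270. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def p_from_odds_ratio(p_ref, odds_ratio):
    """Event probability implied by applying an odds ratio to a reference risk."""
    odds = odds_ratio * p_ref / (1 - p_ref)
    return odds / (1 + odds)

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Two-sided, two-proportion sample size without continuity correction."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

p1 = 0.17                          # composite event rate from prior work
p2 = p_from_odds_ratio(p1, 0.96)   # implied rate under odds ratio 0.96
print(f"p2 = {p2:.4f}; n per group = {n_per_group(p1, p2):,}")
```

The small absolute difference implied by an odds ratio of 0.96 (about 0.6 percentage points at a 17% baseline) is what drives the required sample into the tens of thousands.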

We conducted a population based, retrospective cohort study of patients having surgical procedures in Ontario, Canada, between 1 January 2007 and 31 December 2015 to assess the hypothesis that sex of the operating surgeon would significantly affect 30 day postoperative outcomes. The study sample included patients who had one of 25 surgical interventions, selected by multidisciplinary discussion with consultants from all surgical specialties. We selected procedures that together covered all surgical subspecialties with female surgeons 12 and that were either frequently performed in Ontario 13 or associated with an increased likelihood of complications. We selected coronary artery bypass grafting, femoral-popliteal bypass, abdominal aortic aneurysm repair, appendectomy, cholecystectomy, gastric bypass, colon resection, liver resection, hysterectomy, anterior or posterior spinal decompression, anterior or posterior spinal arthrodesis, craniotomy for brain tumour, total knee replacement, total hip replacement, open repair of femoral neck or shaft fracture, total thyroidectomy, neck dissection, lung resection, radical cystectomy, radical prostatectomy, transurethral resection of prostate, carpal tunnel release, and breast reduction.

Results

We identified 1 159 687 eligible patients who had one of the 25 index procedures in the study period. Surgery was performed by 3314 surgeons, 774 (23.4%) of whom were female and 2540 (76.6%) male (table 1). Of the 1 159 687 patients, 144 119 (12.4%) were treated by female surgeons and 1 015 568 (87.6%) by male surgeons. Patients treated by female surgeons were more likely to be female and younger (table 1). We found no difference in year of surgery, comorbidity, income, rurality, or region of residence between groups. Female surgeons were younger and had fewer years in practice. Surgeon volume (based on the number of identical surgical procedures that surgeon had performed in the preceding year) was lower for female surgeons. Compared with male surgeons, female surgeons performed proportionally more operations in general surgery, obstetrics and gynaecology, and plastic surgery. Before matching, patients treated by female surgeons had statistically significantly lower rates of the composite endpoint (15 731 of 144 119, 10.9%, 10.8% to 11.1%) than those treated by male surgeons (122 468 of 1 015 568, 12.1%, 12.0% to 12.1%; unadjusted odds ratio 0.94, 0.91 to 0.97, P<0.001).
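The unmatched event rates and their 95% confidence intervals can be recomputed from the counts above. The text does not name the interval method; this sketch assumes a normal-approximation (Wald) interval, which with these counts reproduces the reported rounded intervals. It does not attempt the odds ratio, which came from a model clustered by procedural fee code.

```python
from math import sqrt

def proportion_ci(events, n, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

for label, events, n in [("female surgeons", 15_731, 144_119),
                         ("male surgeons", 122_468, 1_015_568)]:
    p, lo, hi = proportion_ci(events, n)
    print(f"{label}: {100 * p:.1f}% ({100 * lo:.1f}% to {100 * hi:.1f}%)")
```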

Table 1 Baseline characteristics of patients included in the analysis

After matching, the two groups were well balanced with respect to patient age, sex, comorbidity, rurality, and income; surgeon age, years in practice, annual volume of the index procedure, and specialty; and hospital type. More patients treated by female surgeons had their operation in later calendar years, although the difference was not clinically important (standardised difference <0.10).

Differences in surgical specialty, procedural volume, and age of female surgeons meant that patients who were successfully matched to male surgeons (and, thus, included in the primary analytical cohort) were treated by younger surgeons with less experience and lower annual volumes than those who were excluded (supplementary table 2). Included patients were more likely to have had general surgery or an obstetrics and gynaecology related procedure and less likely to have had neurosurgery, orthopaedic surgery, or a urology related procedure. Included patients were younger and more likely to be female; comorbidity did not differ (supplementary table 2).

In the matched cohort, the primary outcome of death, readmission, or complications occurred in 5810 of 52 315 (11.1%, 10.9% to 11.4%) patients treated by female surgeons and in 6046 of 52 315 (11.6%, 11.3% to 11.8%) patients treated by male surgeons (table 2), translating to an absolute difference of 0.43% (number needed to treat to prevent one event=230).

Table 2 Outcomes in the matched study cohort, n (%, 95% CI) unless otherwise stated

In GEE models accounting for the correlation between matched individuals, patients treated by female surgeons were significantly less likely to experience the composite outcome (adjusted odds ratio 0.96, 0.92 to 0.99, P=0.02). Among the secondary outcomes, patients treated by female surgeons had a significantly lower likelihood of death within 30 days of surgery (adjusted odds ratio 0.88, 0.79 to 0.99, P=0.04) and a similar likelihood of readmission to hospital (0.96, 0.91 to 1.02, P=0.20) or complications (0.96, 0.92 to 1.01, P=0.10). Median length of stay was two days (interquartile range 0-4 days), regardless of the sex of the treating surgeon. Although mean length of stay differed only slightly between patients treated by female and male surgeons (3.9 days (SD 10.0) v 4.0 days (SD 11.1)), this difference was statistically significant in adjusted models (table 2).

We performed a post hoc sensitivity analysis of all outcomes in the matched cohort, stratified by elective or emergent surgery. Patients who had emergent procedures were less likely to be female and more likely to be treated by younger surgeons with lower surgical volumes and fewer years in practice (supplementary table 3). As expected, specialties in which emergent procedures are common (such as general surgery) were proportionately more represented than those in which they are uncommon (such as obstetrics and gynaecology and plastic surgery). Emergent procedures were neither more nor less likely to be performed by a female surgeon. Using GEE models accounting for the correlation between matched individuals, patients treated by female surgeons were significantly less likely to experience the composite outcome when the procedure was elective (adjusted odds ratio 0.94, 0.89 to 0.98, P=0.007) but not when it was emergent (adjusted odds ratio 1.01, 0.96 to 1.08, P=0.636; Pinteraction=0.048). We found no evidence of statistically significant effect modification by the urgency of the procedure for secondary outcomes, except length of stay (supplementary table 4).
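One way to see how stratum-specific estimates relate to a test for interaction is a Wald comparison of two independent log odds ratios, with each standard error recovered from its reported 95% interval. This is an approximation from rounded published intervals, not the interaction term in the authors' GEE models, so it is not expected to reproduce P=0.048 exactly. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Recover log(OR) and its SE from a reported OR and 95% CI."""
    return log(odds_ratio), (log(upper) - log(lower)) / (2 * z)

def interaction_p(stratum_a, stratum_b):
    """Two-sided Wald test that two independent stratum log ORs are equal."""
    (b1, se1), (b2, se2) = log_or_and_se(*stratum_a), log_or_and_se(*stratum_b)
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

elective = (0.94, 0.89, 0.98)  # adjusted OR and 95% CI, elective procedures
emergent = (1.01, 0.96, 1.08)  # adjusted OR and 95% CI, emergent procedures
print(f"approximate P for interaction: {interaction_p(elective, emergent):.3f}")
```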

We found no significant evidence of effect modification when analyses were stratified according to surgical specialty, surgeon age, surgeon years in practice, annual surgical volume of the index procedure, hospital status (academic or community), patient sex, patient age, or patient comorbidity (fig 1).

Fig 1 Likelihood of adverse postoperative outcomes (death, readmission, or complications) among patients treated by female and male surgeons, stratified by physician, patient, and hospital factors

Although the effect estimates remained relatively consistent, many comparisons were not statistically significant in these subgroup analyses; for example, differences across surgical specialties were not statistically significant (P=0.17). Given the large apparent effect of surgeon sex on outcomes of plastic surgery procedures (fig 1; GEE regression model adjusted odds ratio 0.24, 0.17 to 0.36, supplementary table 5), we repeated our primary analysis excluding plastic surgery. The direction and magnitude of the adjusted relative effects were unchanged (adjusted odds ratio for the composite outcome 0.963, 0.928 to 0.999, P=0.045; adjusted odds ratio for mortality 0.886, 0.787 to 0.998, P=0.0465).

Matching considerably reduced our sample size, which might limit the extrapolation of these findings to our entire patient cohort. In regression models of the entire cohort adjusting for confounding, patients treated by female surgeons had a similarly lower likelihood of the composite primary outcome (adjusted odds ratio 0.96, 0.93 to 0.99, P=0.006); the other findings, including mortality, were also similar to those of the matched analysis (supplementary table 5). Notably, year of surgery was not significantly associated with the primary outcome in the regression analysis.

To explore the degree to which case-mix variation and surgical volume affected the study conclusions, we examined outcomes in regression models with sequentially greater numbers of covariates. In each model we clustered observations by procedural fee code so that patients undergoing the same operation were compared. In unadjusted models comparing patients treated by female and male surgeons, the crude odds ratio of the composite endpoint was 0.94 (0.91 to 0.97, P=0.0004). It remained 0.94 (0.90 to 0.97, P=0.0002) after accounting for surgeon age. Finally, after accounting for case-mix and all other variables except surgical volume, the difference between female and male surgeons became non-significant (adjusted odds ratio 0.97, 0.94 to 1.00, P=0.08).