A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of idea. — John Ciardi

This is a statistical review of the GIRES report “Trans Mental Health Study 2012” by McNeil et al. (2012). What makes this report interesting is not so much the report itself as the way people talk about it and cite its findings as definitive. I do not believe it is statistically safe to draw firm conclusions from this report. I will detail a variety of technical reservations (of varying severity), but to me the most striking feature of the report is that around half of the available responses indicate that respondents had experienced childhood abuse. Whether through study design or careful analysis, it is important to know whether this had any bearing on the responses to other questions. I agree with the report’s self-description as a pilot study. It provides insights into the lived experience of a group of vulnerable people and demands careful and accurate follow-up. It does not provide evidence as to how best to respond to their needs.

I have taught critical appraisal skills to a wide range of working healthcare practitioners, and have developed a spin-off that teaches critical appraisal skills to road safety professionals. Critical appraisal is an essential component of evidence-based medicine, and my experience draws as much on what I have learnt from the professionals I taught as on my own statistical skill set, which includes conducting systematic reviews and (statistical) meta-analysis. Greenhalgh (2014) is a core guide to critically appraising papers about health. She emphasizes the primacy of study design in determining the strength of the evidence we are offered. It is also worth noting that a detailed reporting framework has been developed specifically for observational studies, namely the STROBE statement (von Elm et al. 2007).

One key idea in Greenhalgh (2014) is that of a hierarchy of evidence (in quantitative research), under which we regard the evidence from a cohort study such as Dhejne et al. (2011) as innately stronger than that from a cross-sectional study such as McNeil et al. (2012). The finding of Dhejne et al. (2011) is that the suicide rate in trans individuals after transition remains higher than in the general population. This finding is itself limited to the context of the study: the cultural context of a generally trans-supportive country, and the study’s own definition of “trans”. That one study is not definitive in itself and deserves follow-up. Cohort studies are expensive to conduct, but that is the price we have to pay for robust scientific knowledge.

Best (2012) describes the social phenomenon of the mutant statistic, where a number is taken out of context and acquires a life of its own. The results in this report have acquired this status in discussions of the role of transition in mental health. These results therefore need very careful attention, especially as we are discussing suicide, and the reporting of suicide requires great care.

This review concerns only the data collected and used by McNeil et al. (2012), and only the statistical evidence provided. A vitally important point, however, is that experts in the field note that whilst suicidal ideation and attempted suicide are serious markers of mental ill health requiring appropriate support, they do not necessarily correlate with a wish to die (Wolford-Clevenger et al., 2018). It is also widely acknowledged that such a sensitive issue needs to be examined through very carefully worded questions, or even interviews, to elicit more reliably what is meant by ideation, intent and self-harm. Web-based self-report surveys do not seem the most suitable data collection method for this purpose. Ultimately, therefore, this is a limited data collection method, and the survey wording and responses require interpretation by someone with suitable expertise in this area.

Survey non-response

One striking feature of the data in McNeil et al. (2012), common to most surveys, is that there are few responses in which every question has been answered. There is no description of any method used to correct for the biases that arise from this non-response. Little and Rubin (2014) present a state-of-the-art approach to handling missing data, but even in the absence of such methods, some information should be presented on levels of partial completion and on the method used to address it. Whether specific questions were answered less often than others is clearly pertinent to evaluating the survey’s performance. Even if a naive approach such as complete-case analysis were used, it should be reported as such. It appears that a rather ad hoc method has been used, whereby all the data available for a specific question or pair of questions have been used (an available-case analysis). Best practice, such as that set out in the STROBE statement, is that all pre-specified hypotheses are listed and that a flow diagram is provided indicating the number of relevant responses available for the analysis of each question.
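To make the distinction between available-case and complete-case counts concrete, here is a minimal Python sketch on a small, entirely hypothetical set of responses (`None` marks a skipped question, and the question names are invented for illustration). It counts the responses available for each question on its own, for a pair of questions, and for all questions together. These are the kinds of numbers a STROBE-style flow diagram would report.

```python
# Hypothetical toy data: each dict is one survey response; None = question skipped.
# Question names are illustrative only, not taken from the survey instrument.
responses = [
    {"transition": "yes", "ideation": "yes", "abuse": "no"},
    {"transition": "no", "ideation": None, "abuse": "yes"},
    {"transition": None, "ideation": "yes", "abuse": "yes"},
    {"transition": "yes", "ideation": "no", "abuse": None},
    {"transition": "yes", "ideation": "yes", "abuse": "yes"},
    {"transition": "no", "ideation": "no", "abuse": "no"},
]

def available(rows, questions):
    """Count responses that answered every question in `questions`."""
    return sum(all(r[q] is not None for q in questions) for r in rows)

questions = ["transition", "ideation", "abuse"]

# Available-case counts: each question analysed with whatever answers it has.
per_question = {q: available(responses, [q]) for q in questions}
# Pairwise count, as used when cross-tabulating two questions.
pair = available(responses, ["transition", "ideation"])
# Complete-case count: only responses that answered every question.
complete = available(responses, questions)

print(per_question)     # {'transition': 5, 'ideation': 5, 'abuse': 5}
print(pair, complete)   # 4 3
```

Each question individually has five answers, yet only three of the six responses are complete: every additional question brought into an analysis can only keep or shrink the usable denominator, which is why it needs to be reported.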

Demographics of survey respondents

An uncontrolled web survey runs the risk of respondents completing the survey more than once. The authors acknowledge this and report addressing it, but no information is given on the number of responses removed as a result.

One important feature of the survey, in terms of its representativeness, is that the proportion of respondents recording that they have transitioned or wish to transition is considerably higher than in earlier GIRES studies, which suggested that as few as 1 in 5 wish to transition surgically. In McNeil et al. (2012), the responses to the question on surgery are as follows:

Surgery to REMOVE MALE physical characteristics or to CREATE FEMALE physical characteristics: 363
Surgery to REMOVE FEMALE physical characteristics or to CREATE MALE physical characteristics: 276
BOTH of the above types of surgery: 12
I have never wanted to have, or undergone, any of these: 79
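A quick arithmetic check makes the contrast with the earlier GIRES figure explicit. This Python sketch uses only the four counts quoted above and assumes they are mutually exclusive categories of a single question.

```python
# Counts as quoted above for the surgery question in McNeil et al. (2012),
# assumed here to be mutually exclusive answer categories.
surgery = {
    "remove male / create female characteristics": 363,
    "remove female / create male characteristics": 276,
    "both types of surgery": 12,
    "never wanted or undergone any of these": 79,
}

total = sum(surgery.values())
never = surgery["never wanted or undergone any of these"]
wanted_or_had = total - never

print(total)                                   # 730
print(round(100 * wanted_or_had / total, 1))   # 89.2
print(round(100 * never / total, 1))           # 10.8
```

Roughly 89% of those answering report wanting or having had surgery, against the roughly 20% implied by the earlier GIRES estimate.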

One element of survey critique is to check that the sample is representative of the study population. Section 2 of the report acknowledges that this is a complex topic:

“We are mindful though that the sample may not be demographically representative of the trans population as a whole”

Poststratification, which reweights a sample so that its composition matches known population totals, was used by YouGov in a Stonewall survey to make the sample more representative in this way, but it was not attempted by McNeil et al. (2012).
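For readers unfamiliar with the technique, the Python sketch below shows the basic idea of poststratification. Each respondent in a stratum receives a weight equal to the stratum’s population share divided by its sample share, so that weighted totals match the population. The strata, population shares and prevalences are entirely hypothetical. Obtaining trustworthy population shares for a group without an agreed definition is exactly the hard part.

```python
# Hypothetical sample counts by stratum (e.g. age band) and hypothetical
# population shares; none of these figures come from McNeil et al. (2012).
sample_counts = {"16-24": 400, "25-44": 350, "45+": 250}
population_share = {"16-24": 0.25, "25-44": 0.40, "45+": 0.35}

n = sum(sample_counts.values())

# Poststratification weight = population share / sample share.
weights = {s: population_share[s] / (sample_counts[s] / n) for s in sample_counts}

# Hypothetical prevalence of some outcome within each stratum.
prevalence = {"16-24": 0.50, "25-44": 0.40, "45+": 0.30}

unweighted = sum(sample_counts[s] * prevalence[s] for s in sample_counts) / n
weighted = sum(
    sample_counts[s] * weights[s] * prevalence[s] for s in sample_counts
) / sum(sample_counts[s] * weights[s] for s in sample_counts)

print({s: round(w, 3) for s, w in weights.items()})
print(round(unweighted, 3), round(weighted, 3))
```

The weighted estimate reflects the assumed population composition rather than the sample’s. If the population shares are wrong or unknown, the weights cannot correct anything.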

Identifying a “trans” community is difficult, both in terms of definition and because of the breadth of what is taken to be trans. There is a design conflict here. The more precisely the target group for a survey is defined, the less heterogeneity there is in the answers, and hence the more insightful the findings. The researcher’s choices are either to define precise inclusion criteria (a particular, specific sub-group under the trans umbrella) or to use complex statistical modelling afterwards to allow for the presence of distinct groups within the data. In practical terms, it is also difficult to reach a group when there is no clear definition of the group of interest.

Certainly, there is little doubt that ingrained prejudice within society makes contacting respondents more difficult. However, there are many other sections of our society that are described as “difficult to reach groups” when we wish to study them, and there is plenty of relevant technical guidance on suitable survey methodology. Well-established reviews of these methods include Marpsat and Razafindratsima (2010) and Wagner and Lee (2015). Snowballing is about the only one of these methods that has been used here, which raises an important technical concept: the effective sample size. In surveys such as this, it is typically far smaller than the actual number of responses (Heckathorn, 1997, 2002). Where a sample is drawn entirely at random from a sampling frame, the effective and actual sample sizes should be similar. It is not possible to use a sampling frame in this case, which is acceptable in and of itself. Roughly speaking, however, respondents recruited this way may be more alike in key respects than people drawn at random from a sampling frame. The result is that, in information terms, each additional response is worth less than it would be had the respondent been selected at random. It is therefore ambitious to claim in section 2 that
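One common back-of-the-envelope way to see this is Kish’s design effect for clustered samples, deff = 1 + (m - 1)ρ, where m is the average number of respondents per cluster and ρ is the intraclass correlation (how alike respondents in the same cluster are). The Python sketch below is illustrative only: treating snowball recruitment chains as clusters is an assumption, the values of m and ρ are hypothetical, and respondent-driven sampling (Heckathorn, 1997, 2002) uses more elaborate estimators.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for clustered sampling: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample carrying the same information."""
    return n / deff

n = 1000  # nominal number of responses, roughly the report's headline figure
# Hypothetical (cluster size, intraclass correlation) pairs for illustration.
for m, rho in [(1, 0.0), (10, 0.05), (10, 0.2), (25, 0.1)]:
    deff = design_effect(m, rho)
    print(m, rho, round(deff, 2), round(effective_sample_size(n, deff)))
```

Even modest similarity within recruitment chains can reduce 1,000 nominal responses to a few hundred effective ones.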

“While our sample is essentially one of convenience, we believe that we have fairly robust findings given the sheer size of the sample”.

The effective sample size (the equivalent sample size had the sample been taken completely at random) is likely much smaller than 1,000, and the number available for any given analysis (given that few questions were answered by every respondent) is smaller still. This is well recognized in classical survey research, and corrections are made throughout the process. Indeed, specific correction methods have been proposed for this web-survey context, e.g. Valliant and Dever (2011). These have not been used, and we do not have a valid estimate of the effective sample size. Moreover, sample size matters only for understanding “random” error. It does nothing to address the other biases that can arise in data collection (non-reporting bias, recall bias and so on).
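When a sample is reweighted (for example, with propensity adjustments of the kind Valliant and Dever (2011) discuss), a widely used approximation is Kish’s effective sample size, n_eff = (Σw)² / Σw². The Python sketch below applies this approximation to hypothetical weights, which are not estimated from the McNeil et al. (2012) data.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

equal = [1.0] * 1000
# Hypothetical unequal weights: 800 over-represented respondents weighted down,
# 200 under-represented respondents weighted up (the weights still sum to 1000).
unequal = [0.5] * 800 + [3.0] * 200

print(kish_effective_n(equal))           # 1000.0
print(round(kish_effective_n(unequal)))  # 500
```

The nominal sample is 1,000 in both cases, but the unequal weights halve the effective size. This loss compounds with any clustering effect and with item non-response.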

These comments are not intended to dismiss the human cost of the distress reported in this survey. In many ways it may have been more appropriate to report this as a qualitative survey than to attempt any kind of statistical inference. The human narrative of distress that this survey reveals, and which we could address, should not be ignored. But however true these self-reports may be for the individuals at the time they completed the survey, it is ambitious to claim that the findings are representative of a broader “trans” community, because we are sure neither how to understand the community who filled in the survey nor the wider “trans” community.

What the survey records

One overriding problem is that the survey assumes a shared definition of the concepts being measured and then assumes that these can be captured in an objective, standard manner.

What do YOU think gender means?

Even measuring an uncontroversial and objective concept such as eye colour is challenging. McNeil et al. (2012) assume a level of shared understanding of the self-reported gender categories which may not exist. For example, since the 1950s “gender” has been an overloaded word in English usage. The Oxford online dictionary states:

“Either of the two sexes (male and female), especially when considered with reference to social and cultural differences rather than biological ones. The term is also used more broadly to denote a range of identities that do not correspond to established ideas of male and female.”

The footnotes on usage state that:

“The word gender has been used since the 14th century as a grammatical term, referring to classes of noun designated as masculine, feminine, or neuter in some languages. The sense denoting biological sex has also been used since the 14th century, but this did not become common until the mid 20th century. Although the words gender and sex are often used interchangeably, they have slightly different connotations; sex tends to refer to biological differences, while gender more often refers to cultural and social differences and sometimes encompasses a broader range of identities than the binary of male and female”

English had largely lost grammatical gender for nouns by around the 13th century. In English, therefore, many people will understand the word “gender” to mean biological sex, while others take gender to refer purely to cultural and social differences. Hence, if you present the following options to someone:

I have a constant and clear gender identity as a woman
I have a constant and clear gender identity as a man
I have a constant and clear non-binary gender identity
I have a variable or fluid non-binary gender identity
I have no gender identity
I am unsure of my gender identity

you can find people with a range of ideas about what “gender” means ticking the same box. Especially in an online survey, whilst those commissioning and analysing the survey may be very clear what they mean by “gender identity” (and its various specific forms), we have no idea at all what concept respondents had in mind as they filled in the survey. Similarly, without information on natal sex, we do not know how to interpret someone stating their sexuality as “lesbian”. Indeed, this ambiguity is apparent from the quoted responses given in the report.

This survey draws our attention to important topics such as the abuse and lack of freedom suffered by a section of our community, and the consequent effects on employment and healthcare. These are so important that they deserve careful and appropriate analysis. That is difficult to achieve if we claim to have an objective representation of what are subjective interpretations of individuals’ lived experience, interpretations that may differ from one response to the next. Doing so may also make it harder to find appropriate solutions.

One important reflection is that natal sex, sexuality and transition pathway are all known to be differentially associated with suicide risk, and could usefully have been carefully recorded. Wasserman et al. (2005) reviewed suicide among young people aged 15–19 globally using the World Health Organization (WHO) Mortality Database. They found that the mean suicide rate for this age group across 90 countries was 7.4 deaths per 100,000, and that rates were higher in males (10.5) than in females (4.1) in all countries except China, Cuba, Ecuador, El Salvador and Sri Lanka. Overall, suicide accounted for 9.1% of all deaths in this age group across the 90 countries. In terms of self-harm and suicidal ideation, Hawton (2000) found higher rates of deliberate self-harm and suicidal ideation in females, and suggested that deliberate self-harm was used to communicate distress or to modify the behaviour of others. Conversely, deliberate self-harm was more strongly associated with suicidal intent in males. Hawton also reported that in community samples suicidal ideation is reported more often by females. Same-gender sexual orientation has repeatedly been shown to exert an independent influence on suicidal ideation and suicide attempts, suggesting that risk factors and markers may differ in relative importance between lesbian, gay and bisexual individuals and others. Silenzio et al. (2007) reported that lesbian, gay and bisexual respondents reported higher rates of suicidal ideation and suicide attempts than heterosexual respondents. In other words, whilst it raises sensitive issues around data collection, knowing whether someone is on a male-to-female or female-to-male journey, or another journey in life, as well as their sexuality, could be very pertinent to understanding the relationship between suicidal ideation and suicide risk. Indeed, a recent pilot study (Toomey et al. 2018) suggested that there is a

“heightened risk for female to male and nonbinary transgender adolescents”.

This reinforces the idea that in order to understand the risk of suicidal ideation and attempts, it is important to understand all the potentially relevant risk factors and not to assume that there is only one cause and only one possible solution.

Transition and suicidal ideation

Whatever the original intention of this report, it is now used by others to argue about suicide risk among people from the trans community in relation to transition. This section therefore concentrates on the results relating to these topics.

Transition, and the effect of non-response

Section 4.2 states that 784 respondents answered the question on transition. Most of the participants stated that they were undergoing a process of transition. 29% had already undergone some form of transition (approximately 227 responses), 17% were proposing to start that process (133 responses) and 13% did not wish to transition (102 responses). The responses of the remaining 322 (41%) are not described in this part of the report.

By contrast, the cross-tabulation in section 4.3 (table 5) reports on 745 responses (not 784): 86 (11%) not wishing to transition, 131 (18%) proposing to transition, 259 (35%) undergoing transition, 214 (29%) having transitioned, 40 (5%) unsure and 15 (2%) other.

Table 6 has 680 responses: 85 (13%) not wanting to transition, 35 (5%) unsure, 127 (19%) proposing, 233 (34%) currently undergoing and 200 (29%) having undergone transition.
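Setting the three versions side by side shows that the percentage who have transitioned barely moves while the underlying counts and denominators drift. This Python sketch uses only the figures quoted above. The Section 4.2 count is reconstructed from the rounded 29%, so it is itself approximate.

```python
# (count reporting they had transitioned, denominator) for each tabulation.
# The Section 4.2 count is reconstructed from a rounded percentage.
tabulations = {
    "Section 4.2 (29% of 784)": (round(0.29 * 784), 784),
    "Table 5": (214, 745),
    "Table 6": (200, 680),
}

for name, (transitioned, n) in tabulations.items():
    share = 100 * transitioned / n
    print(f"{name}: {transitioned} of {n} ({share:.1f}%)")
```

Between the first and last tabulation, 104 responses disappear from the denominator with no account of who they were.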

Whilst these seem like small differences in terms of the overall picture, they illustrate the problem with inappropriate handling of missing responses. Small differences such as 227, 214 or 200 reporting transition are compounded as additional factors are considered in one analysis. When modelling the relationship between transition and suicidal tendencies conditional on a number of known risk factors, the number of available responses could be very small.
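To see how quickly this happens, the Python sketch below takes the 200 respondents in Table 6 who reported having transitioned and splits them by three binary risk factors, using prevalences reported in the study (childhood abuse 49%, disability or chronic condition 58%, caring responsibilities 18%). The sketch makes two simplifying assumptions for illustration only: that these overall prevalences apply to the transitioned subgroup, and that the factors are independent.

```python
from itertools import product

n_transitioned = 200
# Overall prevalences reported in the study, applied to the transitioned
# subgroup and treated as independent (both are illustrative assumptions).
prevalence = {"abuse": 0.49, "disability": 0.58, "caring": 0.18}

# Expected number of respondents in each of the 2 x 2 x 2 cells.
cells = {}
for levels in product([True, False], repeat=len(prevalence)):
    expected = n_transitioned
    for (_, p), present in zip(prevalence.items(), levels):
        expected *= p if present else 1 - p
    cells[levels] = expected

smallest = min(cells.values())
print(len(cells), round(smallest, 1))  # 8 7.4

# Expected past-year attempts in that cell at the survey's overall 11% rate.
past_year_attempt_rate = 0.11
print(round(smallest * past_year_attempt_rate, 2))  # 0.81
```

With only three factors, the smallest cell is expected to hold about seven people and fewer than one past-year suicide attempt. A model that also adjusts for natal sex, sexuality or age would have even thinner cells.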

Deliberate self-harm

53% (311) of the 583 responses to this question stated that they had self-harmed at some point, with 11% (62) currently self-harming.

“Just under 60% of the participants felt that there were reasons they self-harmed which related to them being trans, while 70% felt there were non-trans related reasons for their self-injury. This suggests that although being trans may be one factor, others are also relevant.”

The relevant cross-tabulations are not given. As noted elsewhere, many other factors need to be considered when determining the most effective way to prevent self-harm and the mental ill health that precedes it.
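Even without the cross-tabulation, the two marginal figures constrain it. The Python sketch below computes the Fréchet bounds on the share of respondents who gave both kinds of reason. It takes “just under 60%” as 0.6 and 70% as 0.7 (approximations) and assumes that both percentages share the same denominator, which the quoted passage does not state.

```python
def frechet_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Bounds on P(A and B) given only the marginals P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper

# Approximate marginals from the quoted passage; assumed to share one denominator.
trans_related = 0.6
non_trans_related = 0.7

low, high = frechet_bounds(trans_related, non_trans_related)
print(round(low, 2), round(high, 2))  # 0.3 0.6
```

Somewhere between about 30% and 60% of respondents must have cited both trans-related and non-trans-related reasons. The published figures cannot narrow this range further, and that is why the cross-tabulation matters.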

Suicidal ideation and suicide attempts

Page 59 reports that 84% of 581 responses, i.e. 488, had thought about ending their lives at some point. Of these, 27% (either 131 or 127 responses, depending on whether we take the calculated n=488 or the n=471 given in the report) had thought about it in the last year, with 4% thinking about it every day (20 responses). Self-reported suicide attempts in the last year were noted by 11% of 427, or 47 responses, with 48% of 436 (215 responses) reporting an attempt at some point in their life.
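Because the report gives rounded percentages against varying denominators, a count can be recovered only as a range. The Python sketch below lists every whole-number count consistent with a percentage rounded to the nearest whole number, which is an assumption about how the report rounds. It uses integer arithmetic to avoid floating-point edge cases.

```python
def consistent_counts(percent: int, n: int) -> range:
    """Counts k with 100*k/n in [percent - 0.5, percent + 0.5).

    Integer form: (2*percent - 1) * n <= 200*k < (2*percent + 1) * n.
    """
    lo = -(-(2 * percent - 1) * n // 200)    # ceiling division
    hi = ((2 * percent + 1) * n - 1) // 200  # largest k with 200*k below the bound
    return range(lo, hi + 1)

for percent, n in [(84, 581), (27, 488), (27, 471), (11, 427)]:
    ks = consistent_counts(percent, n)
    print(f"{percent}% of {n}: {ks.start}-{ks.stop - 1}")
```

The two readings of “27%” (of 488 or of 471) give non-overlapping ranges, 130 to 134 versus 125 to 129. The ambiguity therefore matters even before any design effect is considered.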

These numbers have to be contrasted with the claim made on page 8 of the report:

“While our sample is essentially one of convenience, we believe that we have fairly robust findings given the sheer size of the sample. With a total sample just short of 1000 participants this exceeds the sample size of recent online surveys with comparable ‘hard-to-reach’ populations (e.g. LGB people).”

As noted, the figure of 1,000 is not justified as a measure of the information in the sample, because the effective sample size is much lower. In the context of “just short of 1,000 participants”, we appear to have 215 reports of lifetime suicide attempts, and 47 in the last year. We do not know what is meant by “suicide attempt”. Suicide experts appear to distinguish various forms of attempt (from cries for help through to failed attempts). I reiterate that this is not to discount the human suffering being described. I am drawing attention to the difficulty of making a clear interpretation of what these survey responses tell us beyond the very clear and extreme mental distress. Suicidal intent, ideation and their precursors are clearly sensitive areas to research and require carefully constructed survey methods. Wolford-Clevenger et al. (2018) report a careful systematic review of suicidal ideation and do indeed suggest a much higher level of suicidal ideation among the trans community. However, they make a number of clear recommendations, such as attending to psychological pathology, in order to identify potential causal factors in actual suicides more clearly and to ensure that effective interventions are developed. Wolford-Clevenger et al. state:

“Across these 42 studies an average of 55% of respondents ideated about and 29% attempted suicide in their lifetimes. Within the past year, these averages were, respectively, 51% and 11%, or 14 and 22 times that of the general public. Overall, suicidal ideation was higher among individuals of a male-to-female (MTF) than female-to-male (FTM) alignment, and lowest among those who were gender non-conforming (GNC). Conversely, attempts occurred most often among FTM individuals, then decreased for MTF individuals, followed by GNC individuals.”

As noted above, the information needed to assess differences between transition routes is missing from McNeil et al. (2012). To a statistician assessing this report, the most striking observation is that the knowledge of suicide experts is vital in constructing, conducting and interpreting such a survey. The data available in McNeil et al. (2012) do not lend themselves to comparison with the conceptualisation presented by Wolford-Clevenger et al. (2018). It does not serve anyone, least of all trans people suffering mental distress, to suggest that this study points to simple answers to the mental distress that is clearly experienced. Any claim that there are simple answers needs to be examined very carefully against good-quality evidence.

Suicide ideation and attempts in relation to transition

Greenhalgh (2014) is clear that claims of causality cannot be made from a cross-sectional survey. McNeil et al. (2012) state that

“Suicidal ideation and actual attempts reduced after transition, with 63% thinking about or attempting suicide more before they transitioned and only 3% thinking about or attempting suicide more post-transition. 7% found that this increased during transition, which has implications for the support provided to those undergoing these processes (N=316)”

Whilst the claim that this has implications for the support of transitioning people is valid, the broader claim that transition reduces suicide risk is not supported by the data. First, section 4.2 implies that at most 227 people have transitioned, and only they can compare their suicidal thinking and attempts before and after transition. Such retrospective comparisons raise the issue of recall bias (Coughlin, 1990). Moreover, if 227 is the relevant figure, then at most 143 responses (63% of 227) stated that they had less suicidal ideation after transition than before. Given that few questions were answered by all respondents, the true number is likely smaller, and the information it carries is reduced further by the design effect (i.e., the effective sample size is much lower).
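To show how much the design effect alone widens the uncertainty around a figure like 63%, the Python sketch below computes a 95% Wilson score interval for a proportion. It uses n = 227 first, then a hypothetical design effect of 2 (an effective n of about 114). The design effect value is an assumption for illustration. No confidence interval addresses recall bias or confounding.

```python
from math import sqrt

def wilson_interval(p_hat: float, n: float, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

p_hat = 0.63
# Nominal n, and an effective n under a hypothetical design effect of 2.
for label, n in [("nominal n", 227), ("effective n, deff = 2", 227 / 2)]:
    lo, hi = wilson_interval(p_hat, n)
    print(f"{label}: {lo:.3f} to {hi:.3f}")
```

Halving the effective sample size noticeably widens the interval. This is before any allowance for item non-response, which would shrink n further.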

The alternative way in which these figures could have been generated is that, of the 133 respondents who were planning to transition, 63% (84 responses) reported high levels of suicidal ideation or attempts, while the post-transition group reported lower levels.

Far more importantly, the fundamental study design question is whether the two groups are comparable. The pre-transition group may differ from the post-transition group in many other ways, and there are many potential confounding variables that could be associated with suicidal ideation, suicide attempts and deliberate self-harm.

Most strikingly, almost half (49%) of 536 respondents reported some form of childhood abuse, and table 18 breaks this down by type of abuse. Childhood abuse is clearly a major risk factor for subsequent mental ill health, self-harm, suicidal ideation and actual suicide, and this is one of the most alarming findings of the study. Given the way the data were collected and analysed, suggesting that there is a single solution to the range of complex problems experienced by this survey group seems an ambitious claim.
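A deliberately simple Python sketch shows why this matters for the pre/post comparison. It assumes, purely hypothetically, that ideation depends only on childhood abuse and not at all on transition status, but that the two groups contain different proportions of people who experienced abuse. All rates and proportions are invented.

```python
# Hypothetical ideation risk by childhood abuse status; identical in both groups,
# i.e. transition has no effect at all in this toy example.
risk = {"abuse": 0.60, "no_abuse": 0.30}

# Hypothetical share of each group reporting childhood abuse.
abuse_share = {"pre_transition": 0.70, "post_transition": 0.30}

def crude_rate(group: str) -> float:
    """Overall ideation rate in a group, mixing the two abuse strata."""
    a = abuse_share[group]
    return a * risk["abuse"] + (1 - a) * risk["no_abuse"]

pre = crude_rate("pre_transition")
post = crude_rate("post_transition")
print(round(pre, 2), round(post, 2))  # 0.51 0.39
```

The crude comparison shows an apparent “transition effect” even though, by construction, the rates within each abuse stratum are identical. Comparing the groups without stratifying on, or otherwise adjusting for, abuse would be misleading.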

Other important risk factors include the 58% of 492, i.e., 285 responses identified as having a disability or chronic health condition, with nearly a fifth of the sample (up to 92 responses) experiencing some form of learning impairment, intellectual disability or other neuro-diversity. 18% of 582 responses (93) indicated that they had caring responsibilities. 62% of respondents have alcohol dependence or abuse issues. All these are known risk factors for adverse mental health outcomes.

In order to understand the relationship between transition and self-reported mental health, we need to understand how each of the plausible risk factors is related to the markers of distress. In a companion article I attempt to explain how these factors could be accounted for by careful statistical modelling. However, I doubt the sample size is large enough to do this, and it may be more appropriate to address these issues through study design.
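One standard tool for this kind of adjustment is the Mantel-Haenszel pooled odds ratio, which combines 2×2 tables across the strata of a risk factor. It is offered here as an example technique, not as the method of the companion article. The Python sketch below uses invented counts, constructed so that the within-stratum odds ratios are exactly 1 while the collapsed table suggests an association. The method also needs enough responses in every cell of every stratum, which is where sample size bites.

```python
# Each stratum: (a, b, c, d) =
#   (transitioned & ideation, transitioned & no ideation,
#    not transitioned & ideation, not transitioned & no ideation).
# Invented counts, stratified by childhood abuse.
strata = {
    "abuse": (18, 12, 42, 28),
    "no_abuse": (21, 49, 9, 21),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables."""
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Collapse the strata to get the crude (unadjusted) table.
crude = tuple(sum(t[i] for t in strata.values()) for i in range(4))

print({k: odds_ratio(*t) for k, t in strata.items()})  # {'abuse': 1.0, 'no_abuse': 1.0}
print(round(odds_ratio(*crude), 2))                     # 0.61
print(mantel_haenszel_or(strata.values()))              # 1.0
```

The crude odds ratio of about 0.61 would suggest that transition is associated with lower ideation. The stratified estimate of 1.0 shows that, in these invented data, this apparent association is entirely due to the different abuse profiles of the two groups.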

Formal Statistical Testing

Conducting many statistical tests leaves one open to the Texas Sharpshooter Fallacy: drawing the target after the shots have been fired. In the absence of a clearly stated hypothesis prior to data collection, a claim of statistical significance cannot be defended (see the STROBE statement for more details). McNeil et al. (2012) claim statistical significance for many tests, which is not valid statistical practice. Page 83 of the report makes the claim:
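The arithmetic behind this concern is simple. Suppose m independent tests are each run at the 5% level and every null hypothesis is true. The chance of at least one “significant” result is then 1 - (1 - 0.05)^m. The Python sketch below evaluates this for a few values of m. Independence is a simplifying assumption, and the sketch does not attempt to count the tests actually reported.

```python
def familywise_error(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 10, 20, 50):
    print(m, round(familywise_error(m), 3))
```

With twenty tests, a spurious “significant” finding is more likely than not.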

Transition was related to improved life satisfaction (Satisfaction with Life Scale scores being statistically significant when separated by stage of/desire to transition; F=18.506, df=5, p<0.005).

Even if this were the single pre-specified test, the wrong test has been used, it has been interpreted wrongly and, although full details are not available, it appears to have been constructed wrongly. In a companion piece, I will look harder at the use of statistical testing in this report. In brief, my view is that the report would have been better without any statistical hypothesis tests. I feel it would be more appropriate, as a self-acknowledged pilot study, to report findings without claiming any wider statistical significance, and for other studies to follow up carefully many of the issues that are raised in this report using more powerful study designs and data collection mechanisms.

Conclusion

The report does indeed acknowledge that it is a pilot (section 8, page 91) and highlights the need for further research. It suggests that such future work should be conducted in partnership with representatives of the trans community. These seem eminently reasonable summaries of the status of the work described in the report. Given the sensitivity of suicide research, it also seems important to involve expertise in that area. The problem is not so much the pilot study itself as those who award it a firmer status than it deserves. As a quick illustration of the importance of study design, consider Witcomb et al. (2015), who compare attitudes to body image among trans people, people with eating disorders and a control group. They claim to have found stronger, more negative body image issues in the trans group than in the eating disorder group. Their conclusions gain strength from explicitly carrying out a comparison.

Suicide is clearly a complex and sensitive topic which needs careful research in conjunction with appropriately experienced professionals, both to make sure that the right things are being measured and that they are measured in a way that is unlikely to harm survey respondents. In relation to suicide risk among trans people, McCann and Sharek (2016) performed a thorough literature review, identifying only ten studies that met basic quality criteria. They concluded that mental health nurses need appropriate training in “culturally competent care”. There seems little doubt that this is the case, and that much more needs to be done to provide appropriate health care for people in the trans community, whether or not they wish to share their trans identity with a mental health professional. However, using McNeil et al. (2012) to argue that transition is the answer to suicide risk seems a gross misrepresentation of the evidence this study presents.

It also seems sensible to have expert statistical peer review, as happens in some academic journals.

References

Adams, N., Hitomi, M., & Moody, C. (2017). “Varied reports of adult transgender suicidality: synthesizing and describing the peer-reviewed and gray literature”. Transgender Health, 2(1), 60–75.

Best, J. (2012). Damned lies and statistics: Untangling numbers from the media, politicians, and activists. University of California Press.

Coughlin, S. S. (1990). “Recall bias in epidemiologic studies”. Journal of clinical epidemiology, 43(1), 87–91.

Dhejne, C., Lichtenstein, P., Boman, M., Johansson, A. L. V., Långström, N., & Landén, M. (2011). “Long-term follow-up of transsexual persons undergoing sex reassignment surgery: Cohort study in Sweden”. PLoS ONE, 6(2), e16885.

Gfroerer, J., Hughes, A., & Bose, J. (2017). “Sampling strategies for substance abuse research”. In Research Methods in the Study of Substance Abuse (pp. 65–80). Springer, Cham.

Greenhalgh, T. (2014). How to read a paper: The basics of evidence-based medicine. John Wiley & Sons.

Hawton, K. (2000). “Sex and suicide: Gender differences in suicidal behaviour”. The British Journal of Psychiatry, 177(6), 484–485.

Heckathorn, D. D. (1997). “Respondent-driven sampling: A new approach to the study of hidden populations”. Social Problems, 44(2), 174–199.

Heckathorn, D. D. (2002). “Respondent-driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations”. Social Problems, 49(1), 11–34.

Kitsuse, J. I., & Cicourel, A. V. (1963). “A note on the uses of official statistics”. Social Problems, 11, 131.

Little, R. J., & Rubin, D. B. (2014). Statistical analysis with missing data. John Wiley & Sons.

Marpsat, M., & Razafindratsima, N. (2010). “Survey methods for hard-to-reach populations: Introduction to the special issue”. Methodological Innovations Online, 5(2), 3–16.

McCann, E., & Sharek, D. (2016). “Mental health needs of people who identify as transgender: A review of the literature”. Archives of Psychiatric Nursing, 30(2), 280–285.

McNeil, J., Bailey, L., Ellis, S., Morton, J., & Regan, M. (2012). Trans Mental Health Study 2012. Scottish Transgender Alliance.

Silenzio, V. M., Pena, J. B., Duberstein, P. R., Cerel, J., & Knox, K. L. (2007). “Sexual orientation and risk factors for suicidal ideation and suicide attempts among adolescents and young adults”. American Journal of Public Health, 97(11), 2017–2019.

Toomey, R. B., Syvertsen, A. K., & Shramko, M. (2018). “Transgender adolescent suicide behavior”. Pediatrics, 142(4), e20174218.

Valliant, R., & Dever, J. A. (2011). “Estimating propensity adjustments for volunteer web surveys”. Sociological Methods & Research, 40(1), 105–137.

Von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsche, P. C., Vandenbroucke, J. P., & STROBE Initiative. (2007). “The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies”. PLoS Medicine, 4(10), e296.

Wagner, J., & Lee, S. (2015). “Sampling rare populations”. In T. P. Johnson (Ed.), Handbook of health survey methods (pp. 77–106). Hoboken: Wiley.

Wasserman, D., Cheng, Q. I., & Jiang, G. X. (2005). “Global suicide rates among young people aged 15–19”. World Psychiatry, 4(2), 114.

Witcomb, G. L., Bouman, W. P., Brewin, N., Richards, C., Fernandez‐Aranda, F., & Arcelus, J. (2015). “Body image dissatisfaction and eating‐related psychopathology in trans individuals: A matched control study”. European Eating Disorders Review, 23(4), 287–293.

Wolford-Clevenger, C., Frantell, K., Smith, P. N., Flores, L. Y., & Stuart, G. L. (2018). “Correlates of suicide ideation and behaviors among transgender people: A systematic review guided by ideation-to-action theory.” Clinical Psychology Review.