Amongst food additives, aspartame is one of the most controversial, especially in the USA, but also in the UK and the EU. The most recent official attempt to settle the controversy was provided by the European Food Safety Authority’s (or EFSA) Panel on Food Additives and Nutrient Sources added to Food (or ANS) in December 2013 [1]. The ANS Panel: “… concluded that aspartame was not of safety concern at the current aspartame exposure estimates or at the ADI [acceptable daily intake] of 40 mg/kg bw/day” [2]. An ADI is a level of consumption officially deemed to be acceptably safe.

In the context of a set of proposals to transform the European Food Safety Authority into an ‘Open EFSA’, a 2014 discussion paper from the EFSA Board highlighted some of the intended benefits from the openness and transparency to which it claimed to aspire. The text stated:

“The proactive and committed adherence to openness and transparency values by the Authority facilitates an informed debate, both among experts and the public, on scientific issues within EFSA’s remit. Thereby this represents a prerequisite for constructive and informed dialogue between the agency and any interested organisation or individual.” [3].

This paper is intended, in part, to contribute to a constructive and informed dialogue with EFSA and with other stakeholders. It highlights the limited extent to which the ANS panel’s risk assessment of the artificial sweetener aspartame was in practice transparent. EFSA also stipulated that its risk assessments should be ‘reproducible’ [4]; correspondingly, this paper specifies the conditions under which the panel’s assessment might be reproducible, in terms of the assumptions that would need to be made.

This paper has two main sections. The first provides a chronological account, drawn from a documentary archive, of the key highlights of the antecedent scientific and policy debates concerning the safety and/or toxicity of aspartame from the early 1970s onwards, while the second provides a critical review of the December 2013 ANS report and explains why it did not settle the controversy but rather contributed to it. The central question addressed in the second section is whether the ANS panel’s review of toxicological evidence was symmetrically sceptical. In other words, did it even-handedly try to identify possible unreliable positives (ie studies indicating adverse effects, but unreliably) and unreliable negatives (ie studies not indicating adverse effects, though unreliably), or was it asymmetrically focussed more on one than the other?

The answer that emerges from a detailed quantitative and qualitative scrutiny of the studies, and of the ANS panel’s representations and interpretations of those studies, is that the panel frequently treated studies that provided no prima facie evidence of harm as if they were unproblematically reliable, even when they were very weak studies. Yet it discounted the results of every single one of 73 studies indicating that aspartame could be harmful, deeming all of those studies unreliable and/or insufficient, even though some were more powerful and sensitive than some of the seemingly negative studies that the panel deemed reliable.

Furthermore, the evidence shows that, if the benchmarks that the ANS panel used to evaluate the results of ‘negative’ studies had been consistently used to evaluate the results of ‘positive’ studies then the panel would have been obliged to conclude that there was sufficient evidence to indicate that aspartame is not acceptably safe. Correspondingly, if the benchmarks that the ANS panel used to evaluate the results of positive studies had been consistently used to evaluate the results of negative studies then the panel would have been obliged to conclude that there was insufficient evidence to conclude that aspartame is acceptably safe. Instead it said that aspartame is safe when consumed at currently estimated rates of consumption. This paper therefore questions the procedure followed by the ANS panel and its conclusion.

Given that animal experiments conducted to test, for example, the safety and/or efficacy of chemicals often generate findings that are equivocal and/or conflicting, considerable efforts have, in recent years, been devoted to developing suitable methods for systematically reviewing the diverse results of such studies, so as to indicate the overall implications of multiple datasets. One promising example of such novel methods is known as SYRCLE, an adapted version of the Cochrane Risk of Bias tool [5]. An alternative approach is known as CAMARADES, an acronym for ‘Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies’ [6]. Both SYRCLE and CAMARADES focus on trying to identify ‘risk of bias’ in animal studies; their applicability does not extend to epidemiological or clinical studies, although Cochrane Collaboration guidelines can be applied to those. Given that this paper is a review of a central part of one single EFSA document, those approaches are not applicable in this context. However, as and when future official public policy reviews of the safety and/or toxicity of aspartame and other chemical additives, contaminants and nutrients are conducted, such systematic approaches should be adopted, and their selection and application explained and justified.

Section 1

Regulatory, scientific and corporate context

A good case can be made for the claim that no industrial food additive has been more controversial than aspartame [7]. G D Searle, a US pharmaceutical company, first petitioned the US Food & Drug Administration (FDA) for permission to market aspartame in 1973; in 1974 the FDA announced its intention to grant permission for its use in dry goods, such as table-top sweeteners [8]. Before that decision could be implemented, objections were raised by independent scientists alleging that aspartame could cause mental retardation, brain lesions and neuroendocrine disorders [9]. Before those issues were resolved, further objections were raised focussing on documentary and interview evidence indicating that Searle, and its sub-contractor Hazleton, had failed to conduct properly at least 15 toxicology tests, and that their subsequent reports had been misleading [10].

The key allegation was that the conduct of those studies was seriously incompetent, and that when Searle managers recognised the failures that had taken place they mis-reported the studies to conceal those failures, portraying the studies as if they had been conducted competently and reported accurately. The FDA could not simply ignore the evidence it had collected, which revealed that it had initially been misled by unreliable studies and reports; instead, that evidence was discarded and discounted through several separate processes.

The FDA arranged for the 15 problematic studies to be reviewed, but in two separate sets and by different institutions. The FDA’s Bureau of Foods convened 5 of its staff into a Task Force and assigned it to review just 3 of those 15 studies [11]. The job of reviewing the other 12 studies was assigned to an organisation called the US Universities Association for Research and Evaluation in Pathology (UAREP). The FDA negotiated an agreement with G D Searle under which Searle would pay the costs of the UAREP work, in exchange for which the FDA agreed that Searle would contribute to setting the UAREP review’s terms of reference [12]. Dr. Adrian Gross, who was then an FDA pathologist, and who first uncovered the problems with Searle’s laboratory work and reporting, argued in 1976 that the members of the UAREP team were not appropriately qualified to conduct the kind of investigation that was required, and consequently that their eventual conclusions could not be considered reliable or definitive [13]. Gross had been instrumental in uncovering the shortcomings in Searle’s tests on aspartame as well as on two pharmaceutical products (Flagyl, an antibiotic, and Aldactone, a diuretic) [14]. Gross persistently criticised all 15 of the studies, ie both those reviewed by the Bureau of Foods Task Force and those reviewed by the UAREP.

Both the Bureau of Foods Task Force and the UAREP subsequently issued reports containing reassuring conclusions, but in both cases they only contrived to reach those conclusions because their terms of reference had been set particularly narrowly and in ways that failed to address the critical shortcomings of the studies. Both teams focussed their attention on characterising and comparing the remaining parts of the documentary records of those studies alongside the detectable features of laboratory samples, including samples of animals’ tissues on glass slides, but without examining the prior processes that had resulted in those documents being written and the samples remaining available [15]. The report of the UAREP never explicitly said that the reviewers took Searle’s documentary and laboratory evidence at face value, but that in practice was what happened. On a few occasions, the UAREP highlighted omissions from documents, and inconsistencies between them, but otherwise treated them as if they were unproblematically reliable. The UAREP was not provided with evidence of the incompetence of some of the laboratory staff or the fictional aspects of some of the documentation. They took the documents and the pathology slides at face value, and checked the arithmetic. Although UAREP noted “… a substantial number of minor and inconsequential discrepancies …” during its review, it found “… few, if any, discrepancies which would produce a change of greater than five percent in the final numerical data being compared” [16].

The FDA Bureau of Foods Task Force report

The conclusion of the Bureau of Foods Task Force stated that while the three tests had not been properly conducted, and although there were marked differences between raw data and the summaries submitted in the petition to the FDA, those differences “…were not of such a magnitude that they would significantly alter the conclusions of the studies” [17]. The three studies reviewed by the Bureau of Foods Task Force were listed, using Searle’s numbering system, as E5, E-89 and E-77/78. The last of those three studies has been omitted from our quantitative analysis below because it was a study of the toxicity of diketopiperazine (or DKP), one of aspartame’s breakdown products, rather than a study of aspartame itself. It was however included in Section 7.2.4.1 of the ANS panel’s report, where it was interpreted by the panel as a reliable negative.

The Task Force had considerable difficulty in evaluating the studies, in part because in some cases there were no raw data with which to compare the reported results. In other cases, it was impossible to determine which were the real raw results and which were subsequent revisions or summaries. In some contexts, the Task Force had to rely on information and assumptions provided by Searle employees who had not been involved in the original work. At worst, it was impossible to identify the occasion on which a particular animal had died. For example, the report said in relation to E-77/78, the study of the toxicity of DKP: “Observation records indicated that animal A23LM was alive at week 88, dead from week 92 through week 104, alive at week 108, and dead at week 112” [18]. Most scientists do not believe in reincarnation, and we should not expect the FDA or the ANS panel to do so either.

When reviewing the test on DKP (E-77/78), the report listed no fewer than 52 major discrepancies in the Searle submission [19]. One of the central problems concerned the quantities of DKP supposedly consumed by the rats. The FDA investigators found no fewer than 3 separate documents with different specifications for the content and the purity of the test substance, and they were unable to establish precisely which specification, if any, was correct. It was impossible to reconcile the quantity of the chemical requisitioned from stores with the quantities supposedly fed to the animals. Questions were also raised about the extent to which the DKP was uniformly incorporated into the animals’ food. There is clear evidence that the test substance was not properly ground and was inadequately mixed, so that it might have been possible for the animals to avoid the DKP while eating their food [20].

Ten years later, at a November 1987 hearing of a US Senate Committee, Dr. Jacqueline Verrett, an FDA toxicologist who had been a member of the Bureau of Foods Task Force, explained that the three studies it had examined were “… woefully inadequate …” (p 387) and that: “Almost any single one of these aberrations would suffice to negate a study designed to assess the safety of a food additive, and most certainly, a combination of many such improper practices would, since the results are bound to be compromised. It is unthinkable that any reputable toxicologist giving a completely objective evaluation of this data resulting from such a study could conclude anything other than that the study was uninterpretable and worthless and should be repeated.” [21]. Nonetheless, the ANS panel included E5 in Section 3.2 of its December 2013 review, but deemed it an unreliable positive; though it deemed E89 to be a reliable negative.

When asked to explain the contrast between the 1977 report of the Bureau of Foods Task Force and her subsequent statements to the 1987 Senate committee, Jacqueline Verrett explained that the Task Force members were: “… limited in what we could actually conclude about the studies. We were not allowed to comment on the validity of any study. It was an explicit instruction based on administrative rather than scientific considerations. We were supposed to figure out what the conclusions would have been if the studies had been fully and correctly reported. We were obliged to ignore the protocols and the non-homogeneity of the DKP … Some animals did reject the DKP. Searle initially said that it may not have been fully mixed but that that did not matter, they later said that it had been fully mixed. We were not allowed to consider those issues by the Bureau of Foods administrator … We were ham-strung in being able to comment. The fact is that the studies should not have been considered at all, and that was the position from the beginning.” [22].

The report of the UAREP

In 1978, the UAREP delivered its 1062-page report, which concluded that the 12 studies they had audited were “authentic” [23]. Gross subsequently told the November 1987 US Senate committee hearing that:

“… no amount of additional examinations of pathology material such as undertaken by the UAREP … [or] … new additional statistical analyses … and no judgmental evaluations or interpretations of any data arising from those studies can in any way rectify the basic problem …: in the absence of reasonable expectation that the experimental animals were administered the correct dosages of the test agent, any observational data carried out on those animals must be regarded as questionable or flawed. This is to say nothing of all the myriad of other problems involving the competence of those conducting such studies, and the [lack of] care they exercised in their execution. Once a study is carried out and the test animals are disposed of, all that remains are the number of tiny bits of tissue preserved from their organs for microscopic examination and the written records of observations made by those who actually carried out that study. While the tissues themselves can be examined by others long after the remains of those animals no longer exist, the reliability of the written records has already been found to be unacceptable in a great variety of ways. … Once a study is compromised in its executions, it is beyond salvation by anyone. Even with respect to those small portions of tissue preserved for microscopic examination for an indefinite period of time after any study is completed there are serious problems … there is little if any assurance that such samples of tissues as were preserved actually originate from the specific animals said … to have been their source … Furthermore, due to the unacceptably high rate of post-mortem autolysis, a great many such tissues were not collected at all from the experimental animals.” [11].

The 12 studies that the UAREP was responsible for reviewing were listed as: E-28, E-33 & 34, E-70, E-75, E-76, E-86, E-87, E-9, E-11, E-19, E-88 and E-90. Of those, E-76 was not included in the ANS panel’s discussion of the toxicity of aspartame, but it was included in relation to DKP. It is consequently omitted from our quantitative analysis below. When referring to the studies reviewed by the UAREP, Verrett said: “… the safety of aspartame and its breakdown products has still not been satisfactorily determined, since many of the flaws cited in these three studies [those reviewed by the FDA Task Force] were also present in all the other studies submitted by Searle [including those reviewed by UAREP]” [24].

Despite the reassurances provided by those two reviews, several objectors remained dissatisfied, and furthermore a new and complex set of objections to the safety of aspartame was introduced [25]. In an attempt to resolve the controversy once and for all, the FDA proposed the establishment of a so-called Public Board of Inquiry (or PBoI). This was a unique institution; the procedure had never previously been used, and in all probability will not be used again [26]. The PBoI first met in early 1980, and published its conclusions in October 1980 [27]. Two sets of issues were on its agenda. On one of the crucial questions its view was that aspartame consumption would not pose an increased risk of brain damage resulting in mental retardation, but on the other issue it concluded that the evidence available did not preclude the possibility that aspartame could induce brain tumours. Consequently, the Board recommended that aspartame should not be permitted for use, pending the results of further studies.

The FDA’s attempted follow-up

In 1976, the evidence of misconduct by Searle, in relation to two pharmaceutical products and aspartame, was sufficient to convince the FDA’s Chief Counsel (Richard Merrill) to instruct the Federal Attorney in Chicago to convene a Grand Jury to investigate: “… apparent violations of the Federal Food, Drug, and Cosmetic Act … and the False Reports to the Government Act … by G.D. Searle and Company and three of its responsible officers for their willful and knowing failure to make reports to the Food and Drug Administration required by the Act … and for concealing material facts and making false statements in reports of animal studies conducted to establish the safety of … Aspartame.” [28].

A team of investigators working for US Senator Metzenbaum subsequently gathered and then released a set of documents showing that soon after the Chicago Federal Attorney (Samuel Skinner) received Richard Merrill’s April 1976 letter, he was invited to join the firm of lawyers (Sidley & Austin) then representing G D Searle; he accepted the invitation. The Searle dossier remained suspended until the next Federal Attorney (Thomas Sullivan) was appointed; he too was invited to join Sidley & Austin, and accepted the invitation [29]. Those processes served to delay legal proceedings until the interval specified by the Statute of Limitations had expired. Searle was therefore not prosecuted, but that did not establish the corporation’s innocence.

Informing EFSA

Documents providing detailed empirical support for the above account of the toxicological and regulatory history of aspartame were included in a dossier provided by Erik Millstone to the secretariat of the ANS panel in October 2011. The Secretariat had issued a public request for “… all necessary data (published, unpublished or newly generated) …” on 1st June 2011 [30]. Millstone responded by sending EFSA an annotated list of 30 relevant documents. The Head of the ANS Unit replied, requesting digitised copies of 27 of the documents by 4 November 2011 [31]. 26 of the 27 documents were then fully digitised and dispatched to EFSA on a CD-ROM; the exception was the 1062-page UAREP report, of which only the Table of Contents was provided. In the event, the ANS panel’s reports of both January and December 2013 failed to mention that dossier, or most of the documents of which it was comprised, an omission that was unjustified when judged by reference to both scientific and policy criteria. Copies of those documents are available at http://www.sussex.ac.uk/spru/research/projects/fcs.

Aspartame’s eventual approval

It was not until 1981, and the arrival of the Reagan administration, that the FDA permitted aspartame’s commercial use, restricting it initially to dry goods [32]. In 1983 the FDA approved its use in beverages, which became its major market; it was marketed under the name ‘NutraSweet’ [33]. The eventual approval process was seriously problematic. The FDA Commissioner’s decision was taken against the advice of FDA toxicologists and the Public Board of Inquiry [34]. Commissioner Hayes approved aspartame as one of his first decisions in that post, and was subsequently employed by the US National Soft Drinks Association [35].

When the Reagan administration assumed office in early 1981, Searle’s former chief executive, Donald Rumsfeld, was reportedly promised that the new administration would get aspartame onto the market [36]. Rumsfeld was subsequently appointed as Reagan’s special envoy to the Middle East, after a terrorist group truck-bombed the US embassy in Beirut in 1983.

Aspartame was deemed acceptably safe by the World Health Organisation and UN Food and Agriculture Organisation’s Joint Expert Committee on Food Additives (JECFA) in 1980 and by the European Commission’s Scientific Committee for Food in 1985 [37]. Aspartame was also approved in the United Kingdom (UK), on advice from the Committee on Toxicology (CoT), the chair of which had his research indirectly funded by Searle [38]. All of those committees based their judgements on sets of studies that included those that Verrett had accurately characterised as ‘woefully inadequate’. They were included and evaluated as if they were no more problematic than any other studies; their severe shortcomings were ignored or discounted, or perhaps they were not drawn to the attention of the members of those committees. Several of the members of those committees were, moreover, acting as paid consultants to relevant food, beverage and chemical companies, in circumstances in which declarations of conflicts of interest were not then required [39]. It would be naive to presume that undeclared conflicts of interest could not have influenced the judgements of those committee members.

Subsequent twentieth century developments

In October 1985 it emerged that Searle had been acquired by the large chemical company Monsanto, which had long been one of the major manufacturers of saccharin [40]. Monsanto subsequently detached the aspartame business from the remainder of Searle’s operations and established the NutraSweet Company [41].

After aspartame had been approved the controversy shifted to discussions of reports from consumers of acute adverse effects [42]. The most common symptoms were (and are) neurological problems including severe headaches and blurred vision; thankfully reports of epileptic-type seizures, though serious, are rare. Such evidence has repeatedly been officially dismissed as ‘anecdotal’, though the sufferers often report that when consumption ceases so too do the symptoms. Moreover, when symptoms recur, the sufferers of those symptoms often discover that they had inadvertently consumed aspartame [43].

In the 1990s one key development was a paper by Olney et al. in the Journal of Neuropathology and Experimental Neurology suggesting that the introduction of aspartame into the USA may have resulted in a rapid increase in the incidence of a particularly aggressive type of brain tumour [44]. Their evidence was however officially discounted, despite providing convergent indications from animal studies, in vitro mutagenicity tests and human epidemiological data.

Twenty-first century debates

At the end of the twentieth century there were, consequently, good grounds for concluding that no one could be confident that aspartame was acceptably safe. In this century, the main toxicological contributions were provided by the Ramazzini Foundation’s rodent carcinogenicity studies [45].

The conventional protocols for long-term rodent feeding studies typically involve feeding a test compound to groups of 50 males and 50 females at each of three dose levels (low, mid and high), alongside a corresponding control group, making a total of 400 animals. The typical duration for carcinogenicity studies in rodents is 104 consecutive weeks [46], although the OECD has indicated that, while the duration will normally be 24 months for rodents, for specific strains of mice 18 months may be more appropriate [47].
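The arithmetic of the conventional design described above can be sketched as follows (a purely illustrative calculation, not code drawn from any cited protocol):

```python
# Conventional long-term rodent carcinogenicity design, as described above:
# 50 males + 50 females in each of three dose groups plus a concurrent control.
ANIMALS_PER_SEX_PER_GROUP = 50
SEXES = 2
GROUPS = 4  # low dose, mid dose, high dose, control

total_animals = ANIMALS_PER_SEX_PER_GROUP * SEXES * GROUPS
print(total_animals)  # 400
```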

The first Ramazzini study on aspartame, published in 2005, used 1800 rats. Instead of testing the compound at three dose levels (plus a control group) they tested it at six dose levels plus controls. Instead of killing the rats at 2 years of age, the rats were allowed to live longer so that long-term effects could be studied. Subsequently, the Ramazzini Institute published data from a mouse study of aspartame, and a further rat study that included in utero exposure [48]. Gift et al. have argued that: “The protocols characteristic of RI [Ramazzini Institute] studies can cause interpretive challenges, but aspects of the RI design, including gestational exposure, life span observation, and larger numbers of animals and dose groups, may impart advantages that provide chemical risk assessors with valuable insights for the identification of chemical-related neoplasia not obtained from other bioassays” [49].

In these and many other ways, the Ramazzini study was more thorough, sensitive, reliable and relevant to human exposure than those conducted in accordance with conventional protocols. The authors reported in 2005 that their study: “… demonstrated for the first time that APM [aspartame] is a multipotent [ …] carcinogenic agent …” with dose-related tumour increases in both males and females [50]. In 2010 the Ramazzini team published the results of a study showing that aspartame induced tumours in the livers and lungs of male mice [51].

Several official bodies, including the US FDA, JECFA, the Scientific Committee on Food (SCF) of the European Commission and the UK’s CoT, discounted those findings, complaining that the Ramazzini studies had not followed standard protocols. While the Ramazzini protocols were non-standard, their deviations from the standard (using more animals in more dose groups, and not ‘sacrificing’ them prematurely) entailed that the Ramazzini studies provided greater sensitivity than could be obtained from a standard study. Keeping the animals until they die may not be common practice, but since European Union (EU) food safety legislation stipulates that “Assuring that the EU has the highest standards of food safety is a key policy priority …” [52], we might have expected that EFSA’s benchmark would be the protection of all consumers throughout their entire lives, rather than, for example, only until they reach retirement age. Those considerations imply that the Ramazzini protocol can be expected to provide a better model of the risks to the population of Europe than studies that ‘sacrifice’ the animals prematurely, however orthodox those studies might be. Premature sacrifice might well result in ‘unreliable negatives’, and unacknowledged ones at that.
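The claim that larger groups confer greater statistical sensitivity can be illustrated with a simple Monte Carlo sketch. The incidence figures and group sizes below are invented for illustration only; they are not data from the Ramazzini, Searle or any other cited study:

```python
import math
import random

def z_statistic(x_ctrl, n_ctrl, x_test, n_test):
    """One-sided two-proportion z statistic, using a pooled standard error."""
    p_pool = (x_ctrl + x_test) / (n_ctrl + n_test)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_test))
    if se == 0:
        return 0.0
    return (x_test / n_test - x_ctrl / n_ctrl) / se

def power(n_per_group, p_ctrl=0.10, p_test=0.25, trials=2000, z_crit=1.645):
    """Fraction of simulated studies that detect the increase (alpha ~ 0.05)."""
    hits = 0
    for _ in range(trials):
        # Simulate tumour counts in a control and a treated group.
        x_ctrl = sum(random.random() < p_ctrl for _ in range(n_per_group))
        x_test = sum(random.random() < p_test for _ in range(n_per_group))
        if z_statistic(x_ctrl, n_per_group, x_test, n_per_group) > z_crit:
            hits += 1
    return hits / trials

random.seed(1)
print(power(50))   # conventional-scale group: detects the increase less often
print(power(150))  # larger group: detects the same increase more often
```

On these assumed numbers the larger groups detect the simulated dose-related increase in a substantially higher fraction of runs, which is the sense in which more animals per group yields a more sensitive study.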

One reason why the findings of the Ramazzini rat study had been officially discounted was the relatively high rates of respiratory infections in the elderly rats [53]. However, as the rates of infection in the test groups were not significantly different from those in the control group, those infections could not explain the dose-related increase in tumours. Caldwell et al. convincingly rebutted the hypothesis that the lymphomas and leukaemias were induced by infection. They noted, for example, that while respiratory infections frequently occur in old rats, and in most Ramazzini Institute rat bioassays, leukaemia and lymphoma were reported in only a few animals, namely 8 out of 112, implying that a link between the respiratory infections and those pathologies was improbable [54].

Another reason why the findings of the Ramazzini Institute’s studies have been officially deemed unreliable is that the tumour rates in treated animals were within the ranges reported for historical controls, even though they were significantly higher than those in the concurrent controls. One reason why that criterion of interpretation is problematic was articulated by the WHO’s International Agency for Research on Cancer, which emphasised in January 2006 that: “It is generally not appropriate to discount a tumour response that is significantly increased compared with concurrent controls by arguing that it falls within the range of historical controls ….” [55].

In 2013 the EFSA ANS panel, in line with other statutory risk assessors, discounted the Ramazzini findings as a set of unreliable positives, while accepting as reliable negatives, the evidence of studies that had many more, and far more serious, imperfections including those Verrett previously characterised as ‘woefully inadequate’. Inevitably, there were imperfections in the Ramazzini studies, but all studies are characterised by some imperfections. The ANS panel chose to treat many of the 15 earlier studies [56] (discussed above) as reliable, despite the fact that their imperfections were very substantially greater than those that characterised Ramazzini’s work.

Despite the efforts of EFSA, and a coalition of industrial and commercial stakeholders, to provide reassurances about the safety of aspartame, the accumulation of fresh evidence and public concerns provoked the European Parliament’s Public Health and Consumer Protection Committee to call on the Commission formally to instruct EFSA to initiate a review of aspartame’s toxicity and safety as a matter of urgency, rather than keeping to a previously-set target date of 31 Dec 2020. In May 2011 the European Commission asked EFSA to re-evaluate the safety of aspartame (E951) as a food additive, and to do so by 31 July 2012 [57].

The ANS panel issued a 245-page ‘draft report’ in January 2013, and requested comments by 15 February 2013 [58]. The abstract of that document stated that the Panel had: “… concluded that there were no safety concerns at the current ADI of 40 mg/kg bw/day. Therefore, there was no reason to revise the ADI for aspartame.” [59] The panel’s draft was problematic in numerous respects; it failed to address the key issues concerning the unreliability of the 15 studies, namely those that had previously been reviewed by the FDA Bureau of Foods Task Force (ie E5, E-89 and E-77/78) and those previously reviewed by the UAREP (ie E-28, E-33 & 34, E-70, E-75, E-76, E-86, E-87, E-9, E-11, E-19, E-88 and E-90), on which Erik Millstone had provided the ANS panel’s secretariat with detailed documentary evidence. All of those 15 studies were cited in the ANS panel’s report, and the only relevant comment on them cited the UAREP’s document, but failed to refer to Gross’s devastating critique of the relevance and reliability of the UAREP review [60].

The ANS panel’s interpretations of the studies it did include appeared consistently inconsistent [61]. The panel portrayed most of the studies that did not indicate any possible harm, except at dose levels above the nominal ‘no-observed-adverse-effect level’ (or NOAEL) of 4000 mg of aspartame per kilogramme of the body weight of the test species per day (ie mg/kg bw/day), as unproblematically reliable, while portraying each and every one of the studies indicating possible harm as unreliable and/or inconclusive, even though many of the studies providing positive evidence of toxicity were far more sensitive and rigorously conducted and reported than some of the apparently negative studies [62].
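The relationship between the 4000 mg/kg bw/day NOAEL and the 40 mg/kg bw/day ADI follows the conventional derivation, sketched below; the 100-fold default safety factor is standard toxicological practice, assumed here rather than quoted from the ANS report:

```python
# Conventional ADI derivation: divide the NOAEL by a default 100-fold safety
# factor (10x for interspecies extrapolation, 10x for human variability).
# All figures in mg/kg bw/day; the factor is the standard default, an
# assumption here rather than a figure taken from the ANS report.
NOAEL = 4000            # nominal NOAEL for aspartame cited by the panel
SAFETY_FACTOR = 10 * 10

adi = NOAEL / SAFETY_FACTOR
print(adi)  # 40.0, the ADI that the ANS panel reaffirmed
```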

EFSA issued a ‘final report’ on 10 Dec 2013. It culminated with: “Overall, the Panel concluded from the present assessment of aspartame that there were no safety concerns at the current ADI of 40 mg/kg bw/day. Therefore, there was no reason to revise the ADI for aspartame.” [63].

Millstone responded on 14 December with a 3-page critique arguing that the panel had consistently treated a very large majority of negative studies as reliable, while discounting each and every one of the studies providing evidence of harm as unreliable [64]. In response, EFSA’s Head of Regulated Products and senior colleagues held a video-conference on 14 April 2014 with Millstone. On 14 November 2014 EFSA’s Head of Regulated Products wrote to Millstone, referring to a full list of studies. The letter reported that an internal review had concluded that:

“ … we found that the number of studies not indicating harm considered in the [ANS] opinion on aspartame as unreliable was substantially higher (35%) than the number claimed in your letter (20%). Likewise our analysis showed that the number of studies indicating harm by aspartame and treated by the ANS Panel as reliable was not zero as stated in your letter. Instead, we found a similar proportion of studies in which an adverse effect by aspartame was described (typically at a dose above the NOAEL) and that were considered by the ANS Panel as unreliable as compared to reliable (43% vs 57%). Moreover, the relative proportion of studies produced or funded by industry and found reliable to be used in the safety assessment of aspartame to be very similar to the proportion of studies carried out using no-commercial funds, irrespective of the outcome of the study (54% vs 46%). Those findings do not support your claim that EFSA took a pro-industry views. I remain confident after reviewing our internal analysis that the analysis of the studies and the literature reported in the aspartame opinion was conducted in a scientifically rigorous manner and that there was no evidence of bias.” [65].

In other words, EFSA argued that it had been symmetrically sceptical with respect to both putative false positives and putative false negatives. This paper is in part a response to that claim, but it is also intended to deepen understanding of how some regulatory scientific panels discharge their duties.

Section 2

Methods and approach

It is against the background of this contested saga that we report the results of our characterisation of the ways in which the EFSA ANS panel interpreted the individual studies that were included in Section 3.2 of the ANS Panel’s December 2013 review; the section is headed ‘Toxicological data of aspartame’; it extended from pages 56 to 102. All the studies and documents are listed in full in the EFSA Panel’s report (available at https://efsa.onlinelibrary.wiley.com/doi/epdf/10.2903/j.efsa.2013.3496, see pages 151–170).

The goal of this investigation was to establish whether the ANS panel even-handedly tried to identify possible unreliable positives and unreliable negatives, or whether it asymmetrically focused more on one than the other. That issue is important because an asymmetry would constitute evidence of bias. If greater effort had been devoted to identifying and discounting false negatives, that would indicate bias against aspartame, and in favour of consumer protection, whereas if greater effort had been devoted to identifying and discounting false positives then the bias would have favoured commercial interests to the detriment of consumer protection. Symmetry would imply that the ANS panel was neutral as between those two competing interests. It is important however to acknowledge that a symmetrical perspective need not entail equal numbers of true and/or false positives and true and/or false negatives. That would depend, amongst other things, on the truth regarding the safety or otherwise of aspartame.

A toxicological risk assessment requires more than the collection of data. The data generated by empirical studies need to be evaluated and interpreted. All studies have shortcomings, but some are significantly more robust and reliable than others. That is especially the case when the data derive from studies of model systems, such as laboratory rodents or microbes in glass dishes rather than from clinical or epidemiological studies of people. The relevance of, for example, the results of a rodent feeding study to the probable effects on people can never be taken for granted. Not all studies are equally relevant, and judgements of extrapolative relevance are firstly unavoidable and secondly never determined solely by reference to the results of the study under consideration.

Judging extrapolative relevance is closely related to judgements about the existence, extent and implications of uncertainties, and about how the benefit of the consequent doubts should be allocated. This is the basis for the contrast between so-called ‘positive’ and ‘negative’ regulatory list systems. Under a negative list system, chemicals are assumed to be safe until shown to be harmful, while a positive list system assumes that they may be risky unless and until sufficient evidence of safety is provided. The EU is supposed to have a positive list system for many categories of food additives, including intense sweeteners. With respect to Section 3.2 of the ANS panel’s December 2013 review of the safety of aspartame, three main types of answers might be forthcoming to the question of whether the ANS panel's treatment was biased.

1. If the ANS panel’s treatment was symmetrically concerned with, and sensitive to, identifying apparently ‘positive’ findings of toxicity and ‘negative’ findings then its perspective could be characterised as neutral as between commercial and consumers’ interests. For the purposes of this appraisal, symmetry was the null hypothesis.

2. If the panel was more concerned to detect and discount putative unreliable negatives than unreliable positives then it could be characterised as favouring the interests of consumers over commercial ones.

3. If, however, the panel was more concerned to detect and discount putative unreliable positives than unreliable negatives then it should be characterised as favouring commercial interests over those of consumers.

We have used, as the indicators of symmetry or asymmetry, firstly the quantitative frequency with which studies were variously deemed by the ANS panel to be reliable or unreliable, and secondly the qualitative stringency of the hurdles that had to be satisfied for the panel to deem a putative positive as a reliable positive or an apparently negative study as a reliable negative. The features and severity of those hurdles were sometimes explicit in the panel’s text, but often they had to be inferred from the comments in the text, some of which were rather enigmatic, and from the subsequent discussion leading to the Panel’s conclusions.

The data for this analysis were obtained solely by focussing on the studies cited and discussed in Section 3.2 of the December 2013 report (pages 56 to 101), which was entitled ‘Toxicological data of aspartame’. This analysis therefore does not cover parts of the document, such as Section 2.8, that reviewed evidence for an ‘exposure assessment’ or Section 3.1 on ‘Absorption, distribution, metabolism and excretion of aspartame’. The corresponding discussions of the toxicity of aspartame-derived methanol or diketopiperazine (DKP), in Section 7 pp. 127–133, are also outside the scope of this analysis. While our analysis is not exhaustive, it focusses on a pivotal section and is sufficient to sustain robust and relevant conclusions. There is also no reason to think that Section 3.2 might be unrepresentative of the document as a whole; though if it were, that would be no less problematic. A detailed tabulation of all of the studies cited by the ANS panel in Section 3.2 of its December 2013 report is provided in Additional file 2.

Those studies are allocated, in Additional file 1, to the 7 categories listed below in Table 1. The resulting statistics are provided in Tables 2 and 3 below.

Table 1 Analytical categories used in this report

Table 2 ANS panel’s interpretation of the reliability of studies for those that had, and had not, indicated possible harm, by number of studies

Table 3 Numbers in Categories 6 and 7

Criteria of categorisation

The following analysis has been constructed by developing two distinct binary characterisations: firstly, the tenor of the conclusions the authors drew in the reports of their studies, and secondly, whether the ANS panel had subsequently deemed those studies reliable or unreliable. The first categorisation differentiates the studies into two main groups: those for which the authors provided some evidence of adverse effects on the one hand and those providing no evidence of adverse effects on the other.

As no animal toxicology study of which we are aware has shown adverse effects in all types of cells, tissues or systems, our default assumption has been to categorise studies as ‘positive’, ie indicating possible harm, if the evidence concerned one or more parameters. On the other hand, studies have been categorised as negative if no indications of harm were reported by the authors. The panel’s approach was to allow one important exception to that interpretative criterion. The panel would sometimes report evidence of adverse effects in particular studies that only occurred in animals that had received high doses of the test material [66]. The panel chose to interpret those studies as providing quantitative indicators of lower levels of exposure at which such adverse effects were not apparent; it referred to them in toxicologically-conventional terms as indicating ‘no observed adverse effect levels’ (or NOAELs), which are reported in terms of the dose measured in milligrams per kilogram of the body weight of the recipients per day (or mg/kg bw/day). The panel followed orthodox official practices by interpreting such levels as if they were thresholds of exposure below which adverse effects did not occur, in that variety of that species, or (at any rate) a level at which adverse effects were not observed.

Food additive policies typically invoke what are called ADIs (or acceptable daily intakes), where ADIs are defined as the lowest observed NOAEL, in the most sensitive laboratory species, divided by a ‘safety factor’ (or SF). The most commonly quoted safety factor is ‘100’, ostensibly on the assumption that a factor of 10 accommodates the difference between average humans and average laboratory animals, and another factor of 10 accommodates the variations amongst humans [67]. The prevailing ADI for aspartame in the EU is 40 mg/kg bw/day, as the ANS panel applied a safety factor of 100. We therefore took particular note of which studies provided evidence of adverse effects below or above the designated NOAEL of 4000 mg/kg bw/day. It is important to appreciate, however, that most professional toxicologists think that it is inappropriate to set an ADI for compounds deemed to be carcinogenic by reference to an estimated NOAEL from an animal study. This is because one mechanism by which compounds can act as carcinogens is by damaging the DNA in the nuclei of cells, that is by being genotoxic, and for genotoxic carcinogens one molecule could be sufficient to initiate or promote tumours, and therefore no level of exposure should be deemed acceptably safe. Given that three rodent studies conducted by the Ramazzini institute had provided evidence indicating that aspartame is a rodent carcinogen, the decision of the ANS panel to allocate an NOAEL to aspartame and then to ascribe an ADI to it was problematic.
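The arithmetic of this convention is simple; the following is only an illustrative sketch, using the aspartame figures quoted above (NOAEL of 4000 mg/kg bw/day, two factors of 10), with function and parameter names of our own invention:

```python
# Illustrative sketch of the conventional ADI calculation described above.
# Function and parameter names are our own; the figures are those quoted
# for aspartame in the text.

def acceptable_daily_intake(noael_mg_kg_bw_day: float,
                            interspecies_factor: float = 10.0,
                            intraspecies_factor: float = 10.0) -> float:
    """ADI = NOAEL / (interspecies factor x intraspecies factor)."""
    safety_factor = interspecies_factor * intraspecies_factor  # conventionally 100
    return noael_mg_kg_bw_day / safety_factor

adi = acceptable_daily_intake(4000.0)  # aspartame's designated NOAEL
print(adi)  # 40.0, ie the prevailing EU ADI of 40 mg/kg bw/day
```

The sketch makes plain how mechanically the ADI follows from the NOAEL: the entire figure of 40 mg/kg bw/day stands or falls with the judgement that 4000 mg/kg bw/day is a genuine no-effect threshold.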

We gave serious consideration to the possibility of categorising studies showing evidence of harm only at doses above 4000 mg/kg bw/day as providing positive evidence of harm, if only at the higher doses, but for the purposes of this analysis we chose to categorise such studies in line with the ANS panel’s portrayal of them, as not showing adverse effects at levels below the NOAEL. For completeness, our analysis nonetheless includes an estimate of the number of studies in which adverse effects were only evident at levels of exposure above 4000 mg/kg bw/day.
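The decision rule just described can be sketched as follows; the study records and dose values are hypothetical, and the rule mirrors the panel's portrayal of above-NOAEL findings rather than the alternative categorisation we considered:

```python
NOAEL = 4000.0  # mg/kg bw/day, as assigned by the ANS panel

def categorise(adverse_effect_doses):
    """Hypothetical sketch of the categorisation rule described above.

    adverse_effect_doses: doses (mg/kg bw/day) at which the study's
    authors reported adverse effects; empty if none were reported.
    """
    if not adverse_effect_doses:
        return "negative"  # no indications of harm reported by the authors
    if min(adverse_effect_doses) > NOAEL:
        # Following the panel's portrayal: harm evident only above the
        # NOAEL is treated as not showing adverse effects below it.
        return "negative below NOAEL"
    return "positive"  # harm on one or more parameters suffices

print(categorise([]))                 # negative
print(categorise([5000.0]))           # negative below NOAEL
print(categorise([2000.0, 5000.0]))   # positive
```

Note that a single adverse finding below the NOAEL, on any parameter, is enough to place a study in the 'positive' group, in line with our default assumption stated above.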

In the following discussion, individual studies and/or papers are referred to either by reference to the family name of the first author, along with the year of publication, or in terms of the number assigned to the study by G D Searle in its submission to the US FDA in the 1970s. All of the early Searle studies were assigned numbers prefaced by the letter E, and that is how the ANS panel referred to them. This paper follows that practice.

To differentiate studies in terms of whether or not they indicated possible harm, we relied in the first place on the ANS panel’s account. For example, when the panel said of studies E97 and E101 that: “Aspartame was tested for mutagenicity in [5] Salmonella typhimurium strains … Aspartame was not mutagenic in this test system, either in the absence or in the presence of the metabolic activation system” (p 59), it was straightforward to categorise those studies as not indicating harm. On the other hand, the panel described Halldorsen et al. 2010 as “A large prospective cohort study … based on data from the Danish National Birth Cohort [that] investigated associations between consumption of artificially sweetened and sugar-sweetened soft drinks during pregnancy and subsequent pre-term delivery” (p 86), and reported that: “Statistically significant trends were found in the risk of pre-term delivery with increasing consumption of artificially sweetened drinks (both carbonated and non-carbonated), but not for sugar-sweetened drinks. In the highest exposure groups (≥ 4 servings/day) the odds ratios relative to non-consumption were 1.78 … and 1.29 … respectively for carbonated and non-carbonated artificially sweetened drinks.” (p 87). It was therefore straightforward to categorise that study as providing some indication of possible harm from artificially sweetened beverages, a market that aspartame currently dominates.

Categorisation was not, however, always straightforward because in some cases the panel’s text was unclear, evasive or even self-contradictory. In all of those cases we checked the text of the original publication, to identify any toxicological effects that were reported by the researchers. In a few cases we made our own interpretative judgements about whether or not the findings reported by the author(s) should be characterised as adverse. We did not check all the original documents because, for the purposes of this analysis, whenever the panel reported prima facie evidence of adverse effects that was sufficient reason to categorise the studies accordingly.

To differentiate studies, in terms of whether or not the panel deemed them reliable or unreliable, we drew in the first place on the panel’s text. For example, when the panel said of study E81 that it: “… noted some discrepancies in description of doses … and that the test system employed has not received further validation and is presently considered obsolete and therefore, the results of the study were not included in the assessment…” it was straightforward to categorise the panel’s judgement; the panel deemed it unreliable.

Similarly, when the panel commented of E43 that: “The methods implemented were thought to be sufficiently robust to support the results reported” (p 209) it was straightforward to categorise the study as one that the panel had deemed reliable. On other occasions, however, the panel failed to provide any explicit indication as to whether or not particular studies were deemed reliable. In such cases (eg E55, 1973, 3.2.5.1.2 page 75) categorisations were based on an exercise of informed judgement. For example, where the panel suggested in passing that some symptoms emerged that might possibly have been related to exposure to the test compound, but subsequently never again mentioned that evidence, particularly not in the summary and conclusions of the relevant section, we judged that the putatively positive evidence had been discounted, and deemed to be unreliable.

The seven categories into which we differentiated the studies, and the panel’s interpretations of their results, are set out in Table 1.

The reasons for selecting the first four of those categories should be self-evident; the others deserve brief explanations. Categories 5, 6 and 7 were not ones that we had expected would be required, but the need for them emerged as the panel’s account of each individual study was examined. Category 7, ‘Cont’, was necessary because in several cases the main texts in Section 3.2 were subsequently contradicted by comments included in one of the appendices.

Our quantitative analysis, set out in Table 2, did not include every single study or paper referred to in Section 3.2 of the panel’s December 2013 report. For example, to minimise avoidable double counting we omitted Iwata 2006 because it was not a separate study, but a re-evaluation of the previous Ishii et al. 1981 study. It is also important to note that our categorisations were not always exclusive. For example, we assigned E81 and E44 to both uN and Cont, because the text in the appendix contradicted the wording of the main text. In adopting that categorisation, the wording in Section 3.2 is treated as representing the panel’s definitive judgement while the comment in the appendix is treated as a secondary qualifier. E11, E9, E39, E54, E55, E56, E90 and McAnulty et al. 1989 belong in both uP and ELlow. E47, E63, E51, E52 and E79 belong in uP, Cont and ELlow. E48 belongs in uP and Cont. Reynolds et al. 1976 belongs in uP and uN. Bunin et al. 2005 belongs in uP and ELhigh. Consequently the aggregate count of the number of exemplifications of the 7 categories in Table 2 is greater than the total number of individual studies.
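Because categorisations were not mutually exclusive, the per-category totals necessarily exceed the number of studies. A minimal tally illustrates the point; the assignments below are a subset of those listed above, and the tallying code itself is only a sketch:

```python
from collections import Counter

# Non-exclusive category assignments, a subset of those listed above:
# each study may carry more than one category label.
assignments = {
    "E81": ["uN", "Cont"],
    "E44": ["uN", "Cont"],
    "E48": ["uP", "Cont"],
    "Reynolds et al. 1976": ["uP", "uN"],
    "Bunin et al. 2005": ["uP", "ELhigh"],
}

# Flatten the label lists and count occurrences of each category.
tally = Counter(label for labels in assignments.values() for label in labels)

print(len(assignments))       # 5 studies
print(sum(tally.values()))    # 10 category entries: the aggregate exceeds
                              # the number of studies, as noted in the text
```

This is why the column totals in Table 2 cannot simply be summed to recover the number of individual studies reviewed.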