Why do food writers think they are competent to evaluate the scientific literature? I know of at least two who, based on their tweets, clearly are not. One is Mark Bittman, whom we have previously chastised, and now also Michael Pollan, who has been a bit more coy about promoting GMO-related anti-science. Now they've both been broadcasting the flimsy results of this paper - A long-term toxicology study on pigs fed a combined genetically modified (GM) soy and GM maize diet - published in the "Journal of Organic Systems". Why do I feel like I'm reading headlines from Climate Depot or Milloy's Junkscience? Because it's the exact same behavior.

For all you budding science journalists out there, here is your first red flag: novel, groundbreaking research is rarely reported in such journals. Not to demean the smaller journals - good science is done there - but the quality of the publication must be one of the first factors taken into account when evaluating the significance of results reported in the lay press. Note that Reuters and HuffPo both published fluffy repetitions of "press release" evaluations of the study. Neither shows any skepticism about, or depth on, the significance of the results, the other results within the paper, or whether the fundamental conclusions of the authors are even supported by the data. Let's do this now.

First, let's describe the study. It's a long-term (22.7-week) feeding study in pigs, with two groups of 84 pigs randomly selected to receive either GMO feed or non-GMO feed. During the trial all conditions were controlled, the feeds were found to be nutritionally identical (interesting given how GMOs have no nutrients!11!!!), and both were obtained from similar local sources according to the standard practices of pig farmers. The pigs were raised to the standard age at which they go to slaughter, then killed and autopsied. While alive, the animals were evaluated weekly for weight, activity level, contentment, skin problems, respiratory problems, eye problems, and stool quality; blood biochemical analyses were performed right before slaughter, and mortality was tracked. At autopsy, organs were weighed and evaluated by veterinarians for evidence of tissue pathology.

Second, the findings. A good science journalist determines these by looking at the data, not by repeating whatever the authors say. Looking at the data, there were no differences in any of the major variables evaluated by the study, such as weights, veterinary costs, illnesses, or mortality. No significant differences in blood biochemistry were found. At autopsy, most organ weights were similar between groups. There was a statistically significant (but likely clinically meaningless) increase (0.12 kg vs 0.10 kg) in uterus weights in the GM group. On pathology there were nonsignificant decreases in cardiac and liver abnormalities in the GM group (half as many); in stomach pathology there was one significant finding of more "severe inflammation" (on a 4-point scale from no inflammation to severe) in the GM group. This is the finding that has been amplified as variably "damning" or "concerning," depending on which source is reporting these dramatic new findings.

But since we're skeptics here (real skeptics, not like global warming "skeptics" in scare quotes), we ask: is it really?

Let's take a closer look at the data in table 3. Here are the relevant numbers:
[Table 3 excerpt: counts of pigs in each stomach inflammation category (nil, mild, moderate, severe) for the GM and non-GM groups]
While it is clear that along the severe inflammation row there is a difference, look at the moderate inflammation row immediately above it and see if it changes your mind. What if we were to collapse this table into a binary: no-to-mild inflammation vs moderate-to-severe? The numbers become GM 41, non-GM 38. Why would I look at it this way? Because pathologic scales of things like inflammation are subjective. (***Update: It has been pointed out that the authors also didn't actually do tissue pathology; instead they just graded how red the stomachs were on gross pathology, which makes this assay totally meaningless. See full update below.***) One should be very cautious about treating results presented on such a scale as true differences, especially when the next-nearest category on the scale is reversed and eliminates your effect once the categories are combined. Treating this data as objective evidence of an association is very much cramming a square peg through a round hole; it would not fly with most reviewers, and if I had been a reviewer I would have squashed the paper on this point alone. Fixating on one single data point in this table to the exclusion of the others and building the conclusions around it is unscientific. One needs to be a lot more cautious given the design of this study. Let me explain.
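To make the regrouping argument concrete, here is a minimal sketch of a 2x2 chi-squared test on the binary split described above. The moderate-to-severe counts (GM 41, non-GM 38) come from the text; the per-group totals of 84 are an assumption based on the number of pigs randomized to each arm, so treat the exact statistic as illustrative only.

```python
# Sketch: Pearson chi-squared (no continuity correction) on the binary
# regrouping of table 3. ASSUMPTION: 84 pigs per group, as randomized;
# the paper's per-group totals at autopsy may differ slightly.
def chi2_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# GM group:     43 no-to-mild, 41 moderate-to-severe (assumed 84 total)
# non-GM group: 46 no-to-mild, 38 moderate-to-severe (assumed 84 total)
stat = chi2_2x2(43, 41, 46, 38)
print(round(stat, 3))  # 0.215 - far below 3.84, the 1-df cutoff for p < 0.05
```

Under these assumed totals, the binary split is nowhere near significance, which is the point: the "effect" lives entirely in where the subjective category boundaries are drawn.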

This is not hypothesis-driven work. The authors did not say at the outset, "we propose stomach inflammation will be greater in GM-fed pigs because of x." No. What they did was feed pigs two different diets and then go fishing for abnormal values. This is not necessarily wrong behavior; scientists go on fishing trips all the time looking for significant effects. What is wrong is then publishing the results of your fishing trip as if they were conclusive! That is unscientific.

If you were to study some 20 variables (these authors studied far more, and I would actually have expected more abnormal results than we got), with a cutoff for significance at the standard arbitrary value of p = 0.05, one would expect, just by chance, that 1 of those variables will come up significant. A good scientist then says, "well, that's interesting, let's see if it's real," and follows up with a hypothesis-driven study specifically designed to examine the apparent effect. When the single effect is then studied in isolation, with appropriate power, one can see whether the result found, perhaps by chance, is a real effect or not.
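The arithmetic behind that expectation is simple enough to sketch (the 20-variable count is the hypothetical from the text, not the paper's actual tally):

```python
# Multiple-comparisons arithmetic: with many true-null tests at alpha = 0.05,
# "significant" results appear by chance alone.
alpha = 0.05
n_tests = 20  # hypothetical count from the text; the paper tested more

# Expected number of false positives among 20 true-null tests
expected_false_positives = n_tests * alpha

# Probability that at least one test comes up "significant" by chance,
# assuming the tests are independent
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(round(expected_false_positives, 2))  # 1.0
print(round(p_at_least_one, 2))            # 0.64
```

So even with "only" 20 independent null comparisons, there's roughly a two-in-three chance of at least one spurious hit, and with more variables, more.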

So what we have in this study is the first half of a valid investigation (the fishing trip) but no hypothesis-driven research to confirm whether this 1-in-20 result is real. There is no molecular data to suggest a mechanism. They don't determine whether it was the soy component or the corn component of the diet. There are no follow-up evaluations examining this effect alone, or trying to link ingestion of Cry proteins to stomach inflammation. So far, one can only conclude that this result is just as likely to have occurred by chance as to be an actual effect of feeding the pigs GM corn and soy. Now, is that "damning" or "concerning"? Even "concerning" is a stretch.

Third, it's important for the good science journalist to interpret these new findings in the context of the literature, and perhaps to consult an expert in the field to determine the significance of the results in the context of the field's total knowledge.

One should mention the extensive literature on the safety of GM foods. Other writers, including Mark Lynas, have evaluated this paper with conclusions similar to mine. Additionally, Mark points out the paper's favorable interpretation of Seralini's work - a bad sign. The authors appear to have ties to anti-GMO advocacy groups, and even thank Jeffrey Smith (the hysterical anti-GMO fake expert with no scientific or medical training). Andrew Kniss points out that he can't replicate their result with the appropriate statistical test. I admit I am confused about exactly how they calculated the p value: in their methods they describe variably using t tests, Mann-Whitney tests, and chi-squared tests based on the distribution or categorical nature of the variables, so half the time I was reading I was trying to figure out which test they were using at any given moment. I'm still unsure why they chose each test in each instance - in table 5 they appear to switch between a Wilcoxon and a t test at random. And although in table 3 they appear, based on the footnote, to have used an uncorrected chi-squared test, I'm not sure this was appropriate, given how the test could be constructed with different expected values. No statistical expert am I, but again this smells like statistical fishing to me. Even so, it doesn't change the relevance of the results: even if the finding technically passes statistical muster, it's still just the first step in a real scientific investigation. Another GMO expert suggests that, given the levels of mold measured on their GM corn (much higher than usually found on GM crops), the result could reflect their source selling them moldy feed.

So, to summarize: in this paper the authors performed a large, non-specific screen for potential evidence of harm from GM crops. Of the many analyses performed, one showed statistical significance - for severe stomach inflammation on a pathology scale in the GM group - but this effect rapidly disappears if one groups inflammation into broader categories. The clinical significance of this finding can only be determined by subsequent hypothesis-driven research into this potential effect, and it is equally likely to be a result of random chance.

Or you can skip all the words above and read the XKCD that one of Mark Lynas' commenters suggests.

XKCD knows stats

A final note: I'm not interested in comments saying I work for Monsanto, that I'm a corporate shill, blah blah blah. I haven't worked for, or accepted money from, a corporation in my adult life (excluding Nat Geo sending me beer money for this blog, and working as a valet for a Toyota dealership when I was 16). Address the data, the paper, relevant biological arguments, etc., or get lost.

**Update**

In reading an additional response to the Carman et al. study, I now change my opinion on this paper from "competently performed but meaningless" to "totally meaningless".

At issue is a criticism by Robert Friendship in the link above: the authors' assay for inflammation is basically meaningless. In my initial read of the paper I didn't notice this sentence: "Typical examples of each of the four categories of inflammation are shown in Figure 1. For a severe level of inflammation, almost the whole fundus had to be swollen and cherry-red in colour."

I incorrectly assumed the authors had taken sections, performed histology, and then assessed inflammation on a legitimate pathological scale. That was apparently too generous. No, they just looked at the color of the stomach on gross pathology. As Dr. Friendship points out, this is meaningless.