The number of individual samples used in the study [6] is not stated in the “Methods” section but only in the legends of Figure 3 and Additional file 10: “n = 3” (possibly meaning 3 controls versus 3 treatments) for the fetal male germ cell (MGC) comparisons and “n = 2” (possibly meaning 2 controls versus 2 treatments) for the sperm comparisons. The MIRA-chip signals of the 2 versus 2 (sperm) and 3 versus 3 (MGC) comparisons are shown in their Figure 8 and Additional figure 9. One important consequence of comparing so few individuals is that statistical testing is not powerful enough to detect differences among groups, which substantially increases the type II error rate.

A post hoc power analysis was performed with the ssize R package [16], employing an average of the standard deviations provided by Dr. Szabó and the same false discovery rate (FDR; 0.05) used for power calculations by her group. The results are shown in Table 1.

Table 1 Power analysis for 2 vs. 2 comparisons using the ssize R script

The main conclusion from this table is that the 2 versus 2 comparisons are under-powered to detect even twofold changes. In such cases, the type II error rate is 76.6 %, i.e. there is a 76.6 % chance of not detecting a change. This is far from the accepted standard of a ~20 % chance of missing an effect (a power of 0.8). Therefore, all the analyses that include sperm samples, i.e. those performing 2 versus 2 comparisons, are under-powered to detect either small (20 %) or large (100 %) changes in DNA methylation.
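To illustrate why a 2 versus 2 design has so little power, the following is a minimal simulation sketch in Python. It is not the ssize calculation behind Table 1: it tests a single comparison at a nominal significance level of 0.05 rather than controlling the FDR across many probes, and the effect size (a twofold change, i.e. 1.0 on a log2 scale) and the standard deviation (0.5) are hypothetical values chosen for illustration, not the standard deviations provided by Dr. Szabó. The names `simulated_power` and `T_CRIT` are likewise illustrative.

```python
import random
import statistics

# Two-sided 5 % critical values of Student's t for df = 2n - 2,
# keyed by n per group (standard table values: df = 2, 4 and 26).
T_CRIT = {2: 4.303, 3: 2.776, 14: 2.056}


def simulated_power(n, effect, sd, trials=20000, seed=1):
    """Estimate the power of a pooled-variance two-sample t-test by simulation.

    n      -- animals per group (must be a key of T_CRIT)
    effect -- assumed true difference between group means (hypothetical)
    sd     -- assumed within-group standard deviation (hypothetical)
    """
    rng = random.Random(seed)
    crit = T_CRIT[n]
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(effect, sd) for _ in range(n)]
        # Equal group sizes: the pooled variance is the mean of the two variances.
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        se = (2 * pooled_var / n) ** 0.5
        t = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(t) > crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical inputs: log2 fold change of 1.0 (twofold), SD of 0.5.
    for n in (2, 3, 14):
        print(n, "per group: estimated power", simulated_power(n, 1.0, 0.5))
```

Under these assumed values, the estimated power with 2 animals per group falls far below the conventional 0.8, whereas 14 per group comes close to 1. The exact figures depend entirely on the assumed effect size and standard deviation.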

As for the 3 versus 3 comparisons, Table 1 in Dr. Szabó’s response shows that to detect a 20 % change in methylation (a fairly common magnitude of change), 14 individuals would have been needed to obtain a power of 0.8. My conclusion is that although the 3 versus 3 comparisons seem sufficient to detect a 50 % change in methylation, they are also under-powered to detect common changes such as 20 %.
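The dependence of the required sample size on the size of the change can be sketched with the textbook normal-approximation formula for comparing two group means, n = 2 (z_(1-alpha/2) + z_(power))^2 (sd / delta)^2 per group. This is a simplified single-test approximation, not the FDR-based calculation used by either group, and the standard deviation passed in below is a hypothetical value; `n_per_group` is an illustrative name.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate animals needed per group to detect a mean difference `delta`.

    Uses the two-sided normal approximation for a two-sample comparison.
    `delta` and `sd` must be on the same scale; both are assumptions here.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    hypothetical_sd = 0.2  # assumed, not taken from the study
    for delta in (0.2, 0.5):
        print("change of", delta, "->", n_per_group(delta, hypothetical_sd), "per group")
```

Because the required n scales with (sd / delta)^2, detecting a 20 % change requires about (0.5 / 0.2)^2 = 6.25 times as many animals as detecting a 50 % change at the same variability.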

Another consequence of studying a small number of individuals is specifically related to epigenetic analyses. Epigenetic variation exists between animals, so methylation patterns at the same gene will vary among individuals. When a population of animals is affected by an environmental stimulus, not all individuals will be affected to the same extent, as is the case for physiological parameters in response to environmental disturbances.

Variability in DNA methylation changes among individuals can only be detected when a sufficient number of individuals is studied. In a 2 versus 2 animal testing design (as used in [6]), there is a high chance that non-responsive or less-responsive individuals are compared, leading to the erroneous conclusion that the treatment causes no change.
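The sampling argument can be made concrete with a deliberately simple model, assumed here for illustration only: each treated animal is independently a "responder" with probability p, so the chance that a treated group of size n contains no responder at all is (1 - p)^n. The responder fraction is hypothetical, as is the function name `prob_no_responder`.

```python
def prob_no_responder(n_treated, responder_fraction):
    """Probability that none of `n_treated` animals is a responder.

    Assumes animals respond independently with the same probability,
    which is a simplification of graded biological responses.
    """
    return (1.0 - responder_fraction) ** n_treated


if __name__ == "__main__":
    p = 0.5  # hypothetical fraction of animals that respond to the treatment
    for n in (2, 3, 14):
        print(n, "treated animals: P(no responder) =", prob_no_responder(n, p))
```

With an assumed responder fraction of 0.5, a treated group of 2 animals contains no responder in 1 out of 4 experiments, and a group of 3 in 1 out of 8.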

Sufficient statistical power is especially important in studies that aim to refute previous findings. Based on the current data evaluation, my conclusion is that the DNA methylation analyses presented in the Iqbal et al. study [6] do not have sufficient power to refute the aspects of TEI in question. That few genes were found to be altered depends strongly on the low power of the experiments. Moreover, it is also important to consider that even if only a few changes are found, they can still be of biological relevance.