I’ll be evaluating the arguments presented in a paper published in PNAS, the Proceedings of the National Academy of Sciences, entitled “Evening use of light-emitting eReaders negatively affects sleep, circadian timing, and next-morning alertness” (Chang et al., 2014).

This paper really makes me mad, for several reasons I’ll go into below:

There are flaws in the experimental design.

The sample size is small (12 people)

They make claims that are technically backed by evidence, but the evidence isn’t statistically significant.

Confirmation bias. The authors interpret their results in a way that confirms their hypotheses, even though the results aren’t statistically significant.

So, I thought it would be a good paper to use to highlight some flaws to look for when evaluating evidence and deciding whether or not to accept a paper’s claims as true.

I don’t go through all of the main points of the paper, so you can either read the paper (linked above) before this blog post, or just pick up the main points as we go. You should be fine either way.

The Hypothesis

Average sleep duration and quality have decreased over the last 50 years. In that time, electronics use has risen. One survey found that 90% of Americans used an electronic device within 1 hour of bedtime “at least a few days a week.” The authors believe the trend may be due to the short-wavelength–enriched light (blue light) emitted by electronic devices.

A decrease in sleep duration and quality may well be correlated with an increase in electronics use. But electronics use and sales have been correlated with plenty of other negative trends, too. For example, the increase in iPhone sales correlates pretty strongly with an increase in people dying from falling down stairs…

Science tip #1: Correlation does not imply causation.

I’m sure you’ve heard this a million times, but it’s still true. Correlations can be a useful tool for identifying potential causal relationships. But before inferring causation, it is important to look at the factors driving change in each individual trend.

The website Spurious Correlations points out some correlations found between trends of various statistics. Per capita consumption of mozzarella cheese correlates with civil engineering doctorates awarded. US spending on science, space, and technology correlates with deaths caused by people falling out of their bed.

No one is intentionally eating more cheese to keep pace with the growth of civil engineering, or urging Congress to cut funding from NASA as a preventative measure against deadly falls from bed. In these ridiculous cases it is easy to tell that the correlations have no common root cause. But some correlations mistaken for causation serve as fodder for the pseudoscience fire.

The Experiment

Twelve adults completed a 14-day inpatient protocol. All participants completed both reading conditions, in randomized order:

i) reading a light-emitting eBook (iPad) in otherwise very dim room light for ∼4 h before bedtime for five consecutive evenings;

ii) reading a printed book in the same very dim room light for ∼4 h before bedtime for five consecutive evenings.

They quantified sleep quality and duration using various methods, which we’ll get into later.

Some problems I have with the experimental design:

“During reading sessions, CP, and upon waking, the room light was ∼0.0048 W/m²”

“During the rest of the waking episodes, participants were in typical indoor room lighting of ∼0.23 W/m²”

So… the group meant to read a printed book for four hours sat in a room about 2% as bright as a typical room. Dimmer rooms are harder to read in and more relaxing. To help you imagine just how dim the reading conditions were, look at the comparison between the spectral profiles (how bright the light is at different wavelengths, and overall) of the iPad and the light reflecting off the printed book.

I would be interested to know how many pages each person read in each setting.

Other problems…

The light-emitting device was set to maximum brightness and placed in a stand that held it at a fixed angle. This stand was placed on a table directly in front of the individual at a 30- to 45-cm distance from their eyes. Participants were allowed to turn pages on the LE-eBook, but were asked not to hold it while reading or make any adjustments to the settings.

During the printed book reading sessions, participants were allowed to hold the book at any desired distance from their eyes

For the last hour, participants read while seated in bed

The participants reading the printed book had the opportunity to move their books to get more comfortable as they inevitably got tired in their dark room. The iPad readers’ seating position was limited by the position of the iPad. Would these effects be greater once participants got into bed?

And again: the sample size is very small, which is especially problematic given that so much of the publication is quantitative analysis.

The Results

Up to now, the problems I have had with the experiment have been pretty minor. What really bothered me about this paper is how the results are presented. I really feel like the results are *NOT* statistically significant.

The measurements taken to determine the eBook’s effect on sleep:

Hourly blood samples for assessment of plasma melatonin concentrations.

Time it took participants to fall asleep after getting in bed (sleep latency)

Total sleep time (TST), sleep efficiency (the percentage of time in bed spent asleep), and the time spent in each sleep stage.

Participants rated their sleepiness using a computerized Karolinska Sleepiness Scale (KSS) (9) every evening and morning, and waking electroencephalogram (EEG) measures were recorded on two evenings and two mornings of each reading condition.

First, I’m going to mention the average amount of time spent asleep by participants:

Look at that! On average, the eBook readers slept only 5 minutes less than the print-book readers. No error analysis is provided for these values, other than for the time spent in REM sleep.

“Participants also had significantly less rapid eye movement (REM) sleep following the LE-eBook condition (109.04 ± 26.25 min vs. 120.86 ± 25.32 min in the print-book condition).” But typically, if two results have reported errors that cause them to overlap, they are not considered statistically different. And the difference in time spent in REM sleep (11.8 minutes) is less than half of the reported standard deviation for each condition. What does that mean? Well…
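As a rough check of my own (not an analysis from the paper), you can take the reported REM numbers and test whether the two conditions’ mean ± SD intervals overlap:

```python
# REM-sleep durations reported in the paper (mean ± SD, minutes)
ebook_mean, ebook_sd = 109.04, 26.25
print_mean, print_sd = 120.86, 25.32

# mean ± SD interval for each condition
ebook_lo, ebook_hi = ebook_mean - ebook_sd, ebook_mean + ebook_sd
print_lo, print_hi = print_mean - print_sd, print_mean + print_sd

# the intervals overlap if each one's low end sits below the other's high end
overlap = ebook_lo < print_hi and print_lo < ebook_hi
print(f"eBook interval: {ebook_lo:.1f} to {ebook_hi:.1f} min")
print(f"print interval: {print_lo:.1f} to {print_hi:.1f} min")
print(f"intervals overlap: {overlap}")  # True

# the 11.8-minute difference is well under one standard deviation
diff = print_mean - ebook_mean
print(f"difference / SD: {diff / ebook_sd:.2f}")  # 0.45
```

The intervals overlap by tens of minutes, which is exactly why the “significantly less REM sleep” framing makes me uneasy.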

Science tip #2: To determine whether or not you trust a result, the first thing to look at is the reported error analysis. The standard deviation (SD or σ) quantifies how much dispersion there is in a set of data points, and it should be low compared to the reported value.

Measurements with high standard deviations indicate that the data points are spread over a wide range of values and don’t cluster around the mean. This is problematic, especially if you are reporting the mean as a representative value for your sample.

Imagine throwing darts at a dartboard, trying to determine where your darts land on average. If the center of the dartboard is defined to be zero, you can assign each dart a coordinate relative to the center:

In both of these cases, you could claim that your darts hit the center of the dartboard on average. But in reality, in the second case your darts always landed an equal distance away from the center. The standard deviation would tell you that your values span a wider range and that you didn’t actually hit the center.
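Here’s the dartboard idea as a quick sketch, using made-up landing positions purely for illustration: two throwers with the same average position, where only the standard deviation reveals the difference.

```python
import statistics

# Horizontal landing positions (cm from center) for two hypothetical throwers.
# Thrower A clusters near the bullseye; thrower B alternates far left and right.
thrower_a = [-1, 0, 1, -1, 0, 1]
thrower_b = [-20, 20, -20, 20, -20, 20]

for name, throws in [("A", thrower_a), ("B", thrower_b)]:
    mean = statistics.mean(throws)
    sd = statistics.pstdev(throws)  # population SD of the landing positions
    print(f"Thrower {name}: mean = {mean:.1f} cm, SD = {sd:.1f} cm")

# Both means are 0 ("always hit the center, on average"),
# but B's SD of 20 cm shows the darts never actually landed near the center.
```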

Science tip #2.1: Beware of confirmation bias: “the tendency to search for, interpret, favor, and recall information in a way that confirms one’s preexisting beliefs or hypotheses” (Wikipedia)

Science tip #2, example:

“In the LE-eBook condition, participants averaged nearly 10 min longer to fall asleep than in the print-book condition (mean ± SD, 25.65 ± 18.78 min vs. 15.75 ± 13.09 min)”

What’s wrong with this picture? For one, the reported standard deviations are not much smaller than the reported values themselves. The mean time it took participants to fall asleep in the eBook condition is only 1.4 times the standard deviation, σ; for the print-book condition, the mean is only 1.2 times the standard deviation. And again, if two results have reported errors that cause them to overlap, they are typically not considered statistically different.
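To make this concrete, here’s a back-of-the-envelope check of my own (not from the paper): a Welch two-sample t-statistic computed from the reported means and SDs, treating the two conditions as independent groups of 12. Note this is only illustrative — the study used a within-subject design, and a paired analysis on individual differences can detect smaller effects — but it shows how much the two latency distributions overlap.

```python
import math

n = 12  # participants per condition
ebook_mean, ebook_sd = 25.65, 18.78   # sleep latency, minutes (mean ± SD)
print_mean, print_sd = 15.75, 13.09

# Welch's two-sample t-statistic from summary statistics
se = math.sqrt(ebook_sd**2 / n + print_sd**2 / n)
t = (ebook_mean - print_mean) / se
print(f"standard error of the difference: {se:.2f} min")
print(f"t-statistic: {t:.2f}")  # about 1.50

# With roughly 20 degrees of freedom, |t| must exceed ~2.09 for
# significance at the 0.05 level; t ≈ 1.5 falls well short of that.
print(f"exceeds critical value (~2.09): {abs(t) > 2.09}")  # False
```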

This implies the values are actually very spread out and don’t tend toward a particular value. When scientists present results that aren’t statistically significant as if they were, they introduce possibly untrue information into a field. Other researchers can then cite the results as if they were true, and readers will likely believe them without evaluating the original source. This paper has been cited in at least 80 other papers.

Science tip #3: Compare reported results to each other

So here’s a nice little figure from their paper showing some of the measurements they took. I’m going to focus on boxes A and C (see the figure’s caption for an explanation).

A key problem with this figure:

Error bars are only shown in *one direction* rather than ±, as they should be. This misleadingly makes it look like the people reading the print book ONLY have higher melatonin levels and the people reading the eBook ONLY have lower melatonin levels.

So, here’s the figure edited to show error bars as they should be reported:

Now you can actually see that, statistically, these values could be much closer to each other than the one-sided bars suggest.
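The paper’s text doesn’t print the melatonin numbers behind that panel, so here’s a sketch with entirely made-up values showing why the direction of the bars matters: drawn one-sided, the two intervals look disjoint; drawn symmetrically, they overlap.

```python
# Hypothetical melatonin levels (arbitrary units) -- NOT the paper's data,
# just numbers chosen to show how one-sided error bars can mislead.
print_mean, ebook_mean = 30.0, 22.0
sd = 6.0

# One-sided bars: drawn up from the higher series, down from the lower one.
print_shown = (print_mean, print_mean + sd)   # 30 to 36
ebook_shown = (ebook_mean - sd, ebook_mean)   # 16 to 22
gap = print_shown[0] - ebook_shown[1]
print(f"one-sided bars leave an apparent gap of {gap:.0f} units")  # 8

# Symmetric ± bars: the full intervals actually overlap.
print_full = (print_mean - sd, print_mean + sd)   # 24 to 36
ebook_full = (ebook_mean - sd, ebook_mean + sd)   # 16 to 28
overlap = ebook_full[1] > print_full[0]
print(f"symmetric bars overlap: {overlap}")  # True
```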

Science tip #4: Be wary of subjective results, especially without quantitative values. In studies where people have to rate how they feel, results are subject to confirmation bias.

“Effects on Acute Evening Alertness and Morning Sleepiness: Reading the LE-eBook was associated with decreased sleepiness in the evening. An hour before bedtime, study participants rated themselves as less sleepy. The following morning, however, the results for self-reported sleepiness were reversed, with participants feeling sleepier the morning after reading an LE-eBook the prior evening.”

No values are provided, so we don’t know how sleepy the participants rated themselves. Even if we did, when participants are surveyed and have a preconceived idea of the effects of an experiment, systematic error is introduced.

Science tip #5: Make your own conclusions based on evidence provided!!!!!!

Their conclusion: “Our findings provide evidence that the electric light to which we are exposed between dusk and bedtime has profound biological effects.”

My conclusion:

Another thought: the authors constantly highlight the fact that the eBook emits shorter wavelengths, but they never actually test a condition in which participants are exposed to bright light with a different spectral profile, to see whether it affects any of the values measured in the publication. At least they point out this limitation themselves. Still, setting up the experiment is the hardest part, so if it was something they thought would be interesting or would reveal new information, they should have just done it.