There were several variables to code up:

sceaduwe moved apartments in August 2014, and I thought this covariate was worth inclusion as different rooms/locations may aggravate allergies differently.

Due to the lapse and including the baseline, that yields 95 days of data.

Descriptive:

As a first stab, I load up the data and regress on #2-6:

l <- lm(Allergy.rating ~ Pollen.season + Creatine + Location + Allergy.medicine + Spirulina,
        data=spirulina[spirulina$Spirulina.random,])
summary(l)
## ...Coefficients:
##                         Estimate Std. Error  t value   Pr(>|t|)
## (Intercept)            3.7497249  0.4032818  9.29803 5.9712e-13
## Pollen.seasonTRUE      0.0995488  0.2732337  0.36434  0.7169796
## Creatine              -0.5534339  0.2489801 -2.22280  0.0302834
## Location B            -1.1608519  0.3648258 -3.18193  0.0023872
## Allergy.medicineTRUE  -0.5023112  0.3210568 -1.56456  0.1233211
## Spirulina             -0.01277196 0.1390217 -0.12746  0.8990337
##
## Residual standard error: 0.631029 on 56 degrees of freedom
## Multiple R-squared: 0.250248, Adjusted R-squared: 0.183305
## F-statistic: 3.73827 on 5 and 56 DF, p-value: 0.00544645
library(MASS)
pl <- polr(as.ordered(Allergy.rating) ~ Pollen.season + Creatine + Location + Allergy.medicine + Spirulina,
           data=spirulina[spirulina$Spirulina.random,])
summary(pl)
## ...Coefficients:
##                            Value Std. Error    t value
## Pollen.seasonTRUE      0.5572250   0.865166  0.6440669
## Creatine              -1.8439587   0.846798 -2.1775671
## Location B           -16.4522527 634.533164 -0.0259281
## Allergy.medicineTRUE  -1.8680170   1.078616 -1.7318653
## Spirulina              0.0466616   0.445739  0.1046836
##
## Intercepts:
##         Value Std. Error   t value
## 2|3 -3.910809   1.418710 -2.756594
## 3|4 -0.672510   1.281926 -0.524609
## 4|5  1.977223   1.601942  1.234267
##
## Residual Deviance: 106.639735
## AIC: 122.639735

The immediate answer to the primary causal question is clear: no, the spirulina has no apparent effect either good or bad.
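To put a rough number on "no apparent effect", a 95% confidence interval can be reconstructed from the printed Spirulina row of the linear model alone. This is only a sketch: it uses the estimate, standard error, and residual degrees of freedom exactly as they appear in the summary above, the variable names are mine, and with the fitted model in hand `confint(l)` would be the more direct route.

```r
# Values copied from the Spirulina row of the printed summary(l) output.
estimate  <- -0.01277196
std_error <- 0.1390217
df_resid  <- 56   # "on 56 degrees of freedom"

# Two-sided 95% interval: estimate +/- critical t value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)

print(round(ci, 2))
```

The interval straddles zero and spans roughly 0.3 rating points in either direction, so the data rule out a large effect of spirulina but not a small one.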

The other variables are interesting, though: of course the allergy medicine has an effect (presumably that was proven by the clinical trials that got it approved), but what’s creatine doing with p=0.03 and Location at p=0.002?

Looking at Location closer, it only changes right before the data series ends in August, and August should be well past the pollen danger season, so it may be confounded with the seasonal course of allergies, which we’d expect to look something like an inverted U-curve peaking in spring/summer. This suggests adding a quadratic term to the regression model, something like I(as.integer(Date)^2), but then step() (which penalizes complexity) prefers the Location variable to the quadratic:

l2 <- lm(Allergy.rating ~ Date + I(as.integer(Date)^2) + Pollen.season + Creatine + Allergy.medicine + Spirulina + Location,
         data=spirulina[spirulina$Spirulina.random,])
summary(l2)
## ...Coefficients:
##                           Estimate  Std. Error  t value  Pr(>|t|)
## (Intercept)            2.65011e+04 4.87941e+04  0.54312 0.5892816
## Date                  -3.26909e+00 6.01720e+00 -0.54329 0.5891648
## I(as.integer(Date)^2)  1.00832e-04 1.85509e-04  0.54354 0.5889936
## Pollen.seasonTRUE      2.12637e-01 3.34582e-01  0.63553 0.5277659
## Creatine              -8.82748e-01 5.25005e-01 -1.68141 0.0984595
## Allergy.medicineTRUE  -4.49432e-01 3.33177e-01 -1.34893 0.1829898
## Spirulina             -2.31188e-01 3.59458e-01 -0.64316 0.5228438
## Location B            -1.34235e+00 4.47436e-01 -3.00010 0.0040776
##
## Residual standard error: 0.639404 on 54 degrees of freedom
## Multiple R-squared: 0.257707, Adjusted R-squared: 0.161484
## F-statistic: 2.67822 on 7 and 54 DF, p-value: 0.0186685
step(l2)
## ...Call:
## lm(formula = Allergy.rating ~ Creatine + Allergy.medicine + Location,
##     data = spirulina[spirulina$Spirulina.random, ])
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -1.200000 -0.200000 -0.200000  0.323413  1.800000
##
## Coefficients:
##                       Estimate Std. Error  t value   Pr(>|t|)
## (Intercept)           3.757143   0.267633 14.03840 < 2.22e-16
## Creatine             -0.557143   0.196389 -2.83693  0.0062644
## Allergy.medicineTRUE -0.422222   0.232107 -1.81908  0.0740620
## Location B           -1.200000   0.327782 -3.66096  0.0005447
##
## Residual standard error: 0.621037 on 58 degrees of freedom
## Multiple R-squared: 0.247869, Adjusted R-squared: 0.208965
## F-statistic: 6.3714 on 3 and 58 DF, p-value: 0.00083053

My guess here is that there’s not enough data, particularly in the June/July gap, to justify the curve when location A vs location B happens to overfit the latter points so well, but I still believe this Location B estimate is being driven by seasons.
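The suspected mechanism can be illustrated with a toy simulation. Everything here is hypothetical (the 95-day length matches the dataset, but the curve shape, noise level, and 10-day "Location B" window are assumptions of mine): ratings follow an inverted U over the season, and the location switch at the tail end has no true effect at all, yet a location-only regression still assigns it a negative coefficient.

```r
set.seed(1)

# Hypothetical toy data: 95 days, ratings follow an inverted U over the season.
n      <- 95
day    <- 1:n
season <- 1 - ((day - n / 2) / (n / 2))^2   # peaks mid-series, near 0 at both ends
rating <- 2.5 + 1.5 * season + rnorm(n, sd = 0.6)

# The move happens only in the last 10 days and has NO true effect here.
location <- factor(ifelse(day > n - 10, "B", "A"))
toy <- data.frame(day, rating, location)

# Location alone soaks up the end-of-season decline.
naive <- lm(rating ~ location, data = toy)
print(coef(summary(naive)))

# Full model with a quadratic time trend plus location, then backward
# selection by AIC (the default criterion used by step()).
full    <- lm(rating ~ day + I(day^2) + location, data = toy)
reduced <- step(full, trace = 0)
print(formula(reduced))
```

Which terms `step()` retains depends on the seed and the noise level, so this does not reproduce the selection above; it only shows that a late-series indicator can stand in for a seasonal decline when the two are nearly collinear.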

What about creatine, is that seasonal too? sceaduwe didn’t do creatine initially, but he started creatine before the pollen season and never stopped in the data, so it shouldn’t be driven by the same end-of-allergy-season effect. A plot:

library(ggplot2)
# scatterplot of self-ratings over time, one smoothed trend per creatine dose
qplot(Date, Allergy.rating, color=as.ordered(Creatine), data=spirulina) +
    theme_bw() + xlab("Date (2014)") + ylab("Allergy self-rating") +
    theme(legend.title=element_blank()) +
    geom_point(size=I(5)) + stat_smooth()

Allergy self-rating over time, colored by creatine dose (0, 1, 2 pills), and smoothed

The creatine estimate seems to be driven by the period with 2 pills. I don’t know of any prior reason to expect creatine to have any effect on allergies or post-nasal drip, so I wonder if this turned out to be a subtler form of the seasonal effect due to the extra weighting of the double-dose in the regression?

If I expand the dataset to include the baseline as well, and I look at creatine as a factor, the effect also seems to disappear:

l3 <- lm(Allergy.rating ~ Date + I(as.integer(Date)^2) + Pollen.season + as.factor(Creatine) + Allergy.medicine + Spirulina + Location,
         data=spirulina)
summary(l3)
## ...Coefficients:
##                           Estimate  Std. Error  t value  Pr(>|t|)
## (Intercept)            9.07174e+03 1.64581e+04  0.55120 0.5829234
## Date                  -1.12123e+00 2.03062e+00 -0.55216 0.5822707
## I(as.integer(Date)^2)  3.46557e-05 6.26343e-05  0.55330 0.5814923
## Pollen.seasonTRUE      1.77142e-01 2.93014e-01  0.60455 0.5470693
## as.factor(Creatine)1   3.44745e-01 3.10455e-01  1.11045 0.2699001
## as.factor(Creatine)2  -4.80779e-01 4.20889e-01 -1.14229 0.2565024
## Allergy.medicineTRUE  -4.44689e-01 3.22589e-01 -1.37850 0.1716244
## Spirulina             -1.65976e-01 1.55833e-01 -1.06509 0.2898163
## Location B            -1.15120e+00 3.89378e-01 -2.95650 0.0040155
summary(step(l3))
## ...Coefficients:
##                       Estimate Std. Error  t value   Pr(>|t|)
## (Intercept)           2.888889   0.146343 19.74048 < 2.22e-16
## as.factor(Creatine)1  0.294785   0.171125  1.72263  0.0883899
## as.factor(Creatine)2 -0.246032   0.221250 -1.11201  0.2690973
## Allergy.medicineTRUE -0.405896   0.225167 -1.80265  0.0747911
## Location B           -0.983673   0.291490 -3.37464  0.0010922
##
## Residual standard error: 0.620883 on 90 degrees of freedom
## Multiple R-squared: 0.170613, Adjusted R-squared: 0.133752
## F-statistic: 4.62848 on 4 and 90 DF, p-value: 0.00191632

If creatine were helping with allergies, one would expect the effect to be at least monotonic, whatever the details of the dose-response curve; but instead we see that 1 pill of creatine correlates with an increase in allergies and 2 pills with a decrease! So my best guess is that this is spurious too, like the Location estimate.
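The monotonicity argument can be checked mechanically against the point estimates. This sketch uses only the two creatine coefficients printed in the `summary(step(l3))` output, with the 0-pill baseline fixed at 0; the variable names are mine, and the check ignores the standard errors, which are wide enough that neither coefficient is distinguishable from zero on its own.

```r
# Creatine coefficients from the printed summary(step(l3)) output,
# relative to the 0-pill baseline (0 by construction of the factor coding).
effect_by_dose <- c(`0` = 0, `1` = 0.294785, `2` = -0.246032)

# A monotonic dose-response means every step between adjacent doses
# moves in the same direction (all >= 0 or all <= 0).
steps <- diff(effect_by_dose)
is_monotonic <- all(steps >= 0) || all(steps <= 0)

print(steps)
print(is_monotonic)
```

The step from 0 to 1 pill is positive and the step from 1 to 2 pills is negative, so the point estimates fail the check.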