In this post I will show how to calculate a repeated-measures ANOVA using Python. As far as I know, there is no Python package dedicated to repeated-measures ANOVA (you can, of course, do it using the mixed-effects methods in Statsmodels).

A repeated-measures ANOVA (rmANOVA) extends the analysis of variance to research situations that use repeated-measures designs. The logic of rmANOVA and an ordinary ANOVA is very similar. In fact, many of the formulas for rmANOVA are identical to those for ANOVA. However, an rmANOVA includes a second stage of analysis in which variability due to individual differences is subtracted out of the error term. A repeated-measures design eliminates individual differences from the between-treatments variability because the same subjects go through each treatment condition. To keep the F-ratio balanced, the individual differences must therefore also be removed from its error term. In the end we get a test statistic similar to that of an ordinary ANOVA, but with all individual differences removed.

Since the same subjects are measured in every treatment, there are no individual differences between treatments. Hence, the variability due to individual differences is not a component of the numerator of the F-ratio. Individual differences must also be removed from the denominator of the F-ratio to maintain a balanced ratio with an expected value of 1.00 when there is no treatment effect.
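The balanced ratio described above can be written out conceptually. This is a sketch of the structure of the F-ratio, not a computational formula, and the wording of the terms is mine:

```latex
F = \frac{\text{variance between treatments}}{\text{error variance}}
  = \frac{\text{treatment effect} + \text{error (individual differences removed)}}
         {\text{error (individual differences removed)}}
```

When there is no treatment effect, the numerator and the denominator estimate the same error variance, which is why the ratio is expected to be close to 1.00.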

This is accomplished in two stages (SS stands for sum of squares). First, the total variability (SS total) is partitioned into variability between treatments (SS between) and variability within treatments (SS within). Individual differences do not appear in SS between because the same sample of subjects is measured in every treatment.

Individual differences do play a role in SS total because the sample contains different subjects.

Second, we measure the individual differences by calculating the variability between subjects, SS subjects. This value is subtracted from SS within, which leaves the variability due to sampling error, SS error.
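The two stages can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used in the rest of the post: the function name `partition_ss` and the tiny 3 x 2 data set are made up for the example, and the data are assumed to be complete, with one row per subject and one column per treatment.

```python
import numpy as np


def partition_ss(scores):
    """Partition the sums of squares of a subjects x treatments array."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape  # n subjects, k treatments
    grand_mean = scores.mean()

    # Stage 1: SS total splits into SS between and SS within
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_between = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    ss_within = ((scores - scores.mean(axis=0)) ** 2).sum()

    # Stage 2: remove the individual differences from SS within
    ss_subjects = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_error = ss_within - ss_subjects

    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
    }


# Hypothetical toy data: 3 subjects measured in 2 treatments
toy = [[1, 3], [2, 6], [3, 9]]
parts = partition_ss(toy)
for name, value in parts.items():
    print(name, value)
```

For the toy data, SS total (44) equals SS between (24) plus SS within (20), and SS within in turn splits into SS subjects (16) and SS error (4).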

In the following example, Python code is interleaved with text and formulas. The first code snippet imports the needed Python modules and creates some example data. We also calculate the degrees of freedom for the data:

import pandas as pd
import numpy as np
from scipy import stats


def calc_grandmean(data, columns):
    """
    Takes a pandas dataframe and calculates the grand mean
    data = dataframe
    columns = list of column names with the response variables
    """
    gm = np.mean(data[columns].mean())
    return gm


# Create example data (one row per subject, one column per condition)
X1 = [6, 4, 5, 1, 0, 2]
X2 = [8, 5, 5, 2, 1, 3]
X3 = [10, 6, 5, 3, 2, 4]
df = pd.DataFrame({'Subid': range(1, len(X1) + 1), 'X1': X1, 'X2': X2, 'X3': X3})

# Grand mean, subject means and condition (column) means
grand_mean = calc_grandmean(df, ['X1', 'X2', 'X3'])
df['Submean'] = df[['X1', 'X2', 'X3']].mean(axis=1)
column_means = df[['X1', 'X2', 'X3']].mean(axis=0)

n = len(df['Subid'])  # number of subjects
k = len(['X1', 'X2', 'X3'])  # number of conditions

# Degrees of freedom
ncells = df[['X1', 'X2', 'X3']].size
dftotal = ncells - 1
dfbw = k - 1
dfsbj = len(df['Subid']) - 1
dfw = dftotal - dfbw
dferror = dfw - dfsbj
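The degrees of freedom follow the same two-stage partition as the sums of squares. As a small sketch (the helper name `rm_anova_dfs` is made up for this illustration), they can be derived from the number of subjects n and the number of conditions k alone:

```python
def rm_anova_dfs(n, k):
    """Degrees of freedom for a one-way repeated-measures design
    with n subjects and k conditions (complete data assumed)."""
    df_total = n * k - 1
    df_between = k - 1
    df_within = df_total - df_between    # equals k * (n - 1)
    df_subjects = n - 1
    df_error = df_within - df_subjects   # equals (n - 1) * (k - 1)
    return {
        "total": df_total,
        "between": df_between,
        "within": df_within,
        "subjects": df_subjects,
        "error": df_error,
    }


# The example data has 6 subjects and 3 conditions
dfs = rm_anova_dfs(6, 3)
print(dfs)
```

For the example data this gives 17, 2, 15, 5 and 10 degrees of freedom for total, between, within, subjects and error.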

We start with SS between. SS between is the sum of squared deviations of the sample (condition) means from the grand mean, multiplied by the number of observations in each sample (n):

SSbetween = n * sum((m - grand_mean)**2 for m in column_means)
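Written as a formula, with the example data plugged in. The notation is my own: \bar{x}_{\cdot j} is the mean of condition j and \bar{x}_{\cdot\cdot} is the grand mean. The condition means of the example data are 3, 4 and 5, the grand mean is 4, and n = 6:

```latex
SS_{between} = n \sum_{j=1}^{k} \left(\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}\right)^2
             = 6 \left[(3-4)^2 + (4-4)^2 + (5-4)^2\right]
             = 12
```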





We continue with SS within (the sum of squared deviations within each sample):

SSwithin = sum(((df[col] - column_means[col])**2).sum() for col in ['X1', 'X2', 'X3'])
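The same quantity can also be computed without an explicit loop. The sketch below is one possible vectorized variant, not the only way to do it; it rebuilds the example scores so that it runs on its own, and the variable names are chosen for this illustration:

```python
import pandas as pd

# Example scores from this post, one column per condition
scores = pd.DataFrame({
    'X1': [6, 4, 5, 1, 0, 2],
    'X2': [8, 5, 5, 2, 1, 3],
    'X3': [10, 6, 5, 3, 2, 4],
})

# Subtracting the column means aligns on the column labels
deviations = scores - scores.mean(axis=0)

# Squared deviations summed per condition, then over all conditions
ss_within_per_condition = (deviations ** 2).sum(axis=0)
ss_within = (deviations ** 2).to_numpy().sum()

print(ss_within_per_condition)
print(ss_within)
```

For the example data the three conditions contribute 28, 32 and 40, so SS within is 100.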

SS subjects: the sum of squared deviations of the subject means from the grand mean, multiplied by the number of conditions (k):

SSsubject = k * sum((m - grand_mean)**2 for m in df['Submean'])
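As a formula, again in my own notation, with \bar{x}_{i \cdot} the mean of subject i across the k conditions. The six subject means of the example data are 8, 5, 5, 2, 1 and 3, and the grand mean is 4:

```latex
SS_{subjects} = k \sum_{i=1}^{n} \left(\bar{x}_{i \cdot} - \bar{x}_{\cdot\cdot}\right)^2
              = 3 \left[(8-4)^2 + (5-4)^2 + (5-4)^2 + (2-4)^2 + (1-4)^2 + (3-4)^2\right]
              = 3 \cdot 32 = 96
```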





SS error: the sum of squared deviations due to sampling error:

SSerror = SSwithin - SSsubject

We can also calculate SS total (i.e., the sum of squared deviations of all observations from the grand mean):



SStotal = SSbetween + SSwithin
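SS total can also be computed directly from its definition, which gives a quick sanity check on the partition. The snippet is a stand-alone sketch that rebuilds the example scores as a NumPy array (rows are subjects, columns are conditions); the variable names are chosen for this illustration:

```python
import numpy as np

# Example scores from this post: 6 subjects (rows) x 3 conditions (columns)
scores = np.array([
    [6, 8, 10],
    [4, 5, 6],
    [5, 5, 5],
    [1, 2, 3],
    [0, 1, 2],
    [2, 3, 4],
], dtype=float)

grand_mean = scores.mean()
n_subjects = scores.shape[0]

# Direct definition: squared deviations of every observation from the grand mean
ss_total_direct = ((scores - grand_mean) ** 2).sum()

# The two components of the first-stage partition
ss_between = n_subjects * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_within = ((scores - scores.mean(axis=0)) ** 2).sum()

print(ss_total_direct, ss_between + ss_within)
```

Both numbers should be 112 for the example data (12 + 100).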

Once we have calculated the mean square between (MS between) and the mean square error (MS error), each being a sum of squares divided by its degrees of freedom, we can obtain the F-statistic:

# MS between
msbetween = SSbetween / dfbw

# MS error
mserror = SSerror / dferror

# F-statistic
F = msbetween / mserror



By using SciPy we can obtain a p-value. We start by setting our alpha to .05 and then we get our p-value.

alpha = 0.05
p_value = stats.f.sf(F, dfbw, dferror)
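All the steps in this post can be collected into one small function. The following is a minimal sketch under some assumptions: the function name `rm_anova_oneway` is made up for this illustration, the input is a complete subjects x conditions array with no missing values, and no sphericity correction is applied.

```python
import numpy as np
from scipy import stats


def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA on a subjects x conditions array.

    Returns the F-statistic, its two degrees of freedom and the p-value.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()

    ss_between = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    ss_within = ((scores - scores.mean(axis=0)) ** 2).sum()
    ss_subjects = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_error = ss_within - ss_subjects

    df_between = k - 1
    df_error = (n - 1) * (k - 1)

    f_value = (ss_between / df_between) / (ss_error / df_error)
    p_value = stats.f.sf(f_value, df_between, df_error)
    return f_value, df_between, df_error, p_value


# Example data from this post (rows are subjects, columns are X1, X2, X3)
example = [
    [6, 8, 10],
    [4, 5, 6],
    [5, 5, 5],
    [1, 2, 3],
    [0, 1, 2],
    [2, 3, 4],
]

alpha = 0.05
f_value, df1, df2, p_value = rm_anova_oneway(example)
print("F(%d, %d) = %.2f, p = %.5f" % (df1, df2, f_value, p_value))
print("significant at alpha = %.2f: %s" % (alpha, p_value < alpha))
```

For the example data this gives F(2, 10) = 15.00 with a p-value of roughly .001, which is below the alpha of .05.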

That was all. ANOVA is one of the most commonly used statistical methods in psychology, and it is often used when the design is within-subjects, as in our example. The computation is pretty simple for a one-way design like this one.

Any questions?