There is considerable need to develop tailored approaches to psychiatric treatment. Numerous researchers have proposed using functional magnetic resonance imaging (fMRI) biomarkers to predict therapeutic response, in particular by measuring task-evoked subgenual anterior cingulate cortex (sgACC) and amygdala activation in mood and anxiety disorders. Translating this approach to the clinic relies on the assumption that blood-oxygen-level-dependent (BOLD) responses in these regions are stable within individuals. To test this assumption, we scanned a group of 29 volunteers twice (mean test-retest interval = 14.3 days) and calculated the within-subject reliability of the amplitude of amygdala and sgACC BOLD responses to emotional faces using three paradigms: emotion identification, emotion matching, and gender classification. We also calculated the reliability of activation in a control region, the right fusiform face area (FFA). All three tasks elicited robust group-level activation in the amygdalae and sgACC, which changed little on average between scanning sessions, but within-subject reliability was surprisingly low, despite excellent reliability in the right FFA control region. Our findings demonstrate low statistical reliability of two important putative treatment biomarkers in mood and anxiety disorders.
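
The abstract does not state which reliability statistic was computed. As an illustration only, within-subject test-retest reliability of this kind is often summarised with an intraclass correlation coefficient (ICC), so the sketch below assumes a two-way ANOVA decomposition of a subjects-by-sessions matrix of response amplitudes. The function names, the simulated data, and the noise level are hypothetical and are not taken from the study.

```python
import numpy as np


def _mean_squares(scores):
    """Two-way ANOVA mean squares for an (n_subjects, k_sessions) array."""
    y = np.asarray(scores, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)   # per-subject mean across sessions
    col_means = y.mean(axis=0)   # per-session mean across subjects
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_err


def icc_consistency(scores):
    """ICC(3,1): ignores a constant shift between sessions."""
    n, k, ms_rows, ms_cols, ms_err = _mean_squares(scores)
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def icc_agreement(scores):
    """ICC(2,1): penalises systematic differences between sessions."""
    n, k, ms_rows, ms_cols, ms_err = _mean_squares(scores)
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom


if __name__ == "__main__":
    # Simulated (not real) amplitudes: 29 subjects, 2 sessions.
    # A stable per-subject component plus large session-specific noise
    # (assumed values) gives a similar group mean in both sessions while
    # individual subjects change rank from one session to the next.
    rng = np.random.default_rng(0)
    trait = rng.normal(0.0, 1.0, size=29)
    sessions = trait[:, None] + rng.normal(0.0, 1.5, size=(29, 2))
    print("session means:", sessions.mean(axis=0))
    print("ICC(3,1):", icc_consistency(sessions))
    print("ICC(2,1):", icc_agreement(sessions))
```

The simulation illustrates the dissociation the abstract describes: group-level activation can be robust and stable on average while the ICC for individual subjects stays low, because the ICC depends on how much of the between-subject variance persists across sessions rather than on the size of the group mean.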