$\begingroup$

@CHCH has provided a good broad overview, but I thought I would also add some specific experiments whose results are considered a weakness of Bayesian models. The theme of this answer is an extension of Tversky and Kahneman's program of documenting violations of rationality. All of these experiments can be fitted by some Bayesian-ish just-so model of the sort Bowers & Davis (2012) discuss. These just-so models usually require an unreasonably large state space (lots of free parameters), or graft on an unnatural heuristics-and-biases mechanism for each individual experiment without any guidance from Bayesianism. I will present my narrative as an annotated bibliography.

Shafir, E. & Tversky, A. (1992) Thinking through uncertainty: nonconsequential reasoning and choice. Cognitive Psychology 24: 449-474. Townsend, J.T., Silva, K.M., Spencer-Smith, J., & Wenger, M. (2000) Exploring the relations between categorization and decision making with regard to realistic face stimuli. Pragmatics and Cognition 8: 83-105. Tversky, A., & Shafir, E. (1992) The disjunction effect in choice under uncertainty. Psychological Science 3: 305-309.

As mentioned in the question, and elsewhere on the site, violation of the sure-thing principle is one of the most popular examples of departures from Bayesianism. Given a random variable that can have only two possible outcomes A and B, the law of total probability requires $p(X)$ to lie between $p(X|A)$ and $p(X|B)$. A violation occurs when $p(X) > p(X|A)$ and $p(X) > p(X|B)$ (or both $<$ instead). ST showed the violation in a game-theoretic setting, TS in a two-stage gambling task, and TSS-SW in a face-categorization task.
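
The bound follows in one step, assuming (as above) that A and B are mutually exclusive and exhaustive:

$$p(X) = p(A)\,p(X|A) + p(B)\,p(X|B), \qquad p(A) + p(B) = 1.$$

Since $p(X)$ is a weighted average of $p(X|A)$ and $p(X|B)$ with non-negative weights summing to one, it satisfies $\min\{p(X|A), p(X|B)\} \leq p(X) \leq \max\{p(X|A), p(X|B)\}$.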

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychological Review 90: 293-315.

In this study, TK described Linda and then asked the participant to make a probability judgement $p(\text{BT})$ of Linda being a bank-teller, or a probability judgement $p(\text{BT} \;\&\; \text{F})$ of Linda being a bank-teller and a feminist. In any Bayesian model (without some grafted mechanisms or weird latent variables) you need to have $p(\text{BT}) \geq p(\text{BT} \;\&\; \text{F})$, but the participants judged $p(\text{BT}) < p(\text{BT} \;\&\; \text{F})$ and committed a conjunction fallacy.
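
The inequality is just additivity over the two ways Linda can be a bank-teller (writing $\neg\text{F}$ for "not a feminist", a symbol introduced here for illustration):

$$p(\text{BT}) = p(\text{BT} \;\&\; \text{F}) + p(\text{BT} \;\&\; \neg\text{F}) \geq p(\text{BT} \;\&\; \text{F}).$$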

Gavanski, I., & Roskos-Ewoldsen, D.R. (1991) Representativeness and conjoint probability. Journal of Personality and Social Psychology 61(2): 181-194.

TK's ad-hoc explanation for this is the representativeness heuristic. GR-E recreate the fallacy in a setting where they show that this ad-hoc heuristic is not sufficient to explain it. This points out the weakness of ad-hoc fixes to Bayesian-ish approaches.

Sides, A., Osherson, D., Bonini, N., and Viale, R. (2002) On the reality of the conjunction fallacy. Memory & Cognition 30(2): 191-198.

It is natural to suspect that TK's result could be an artifact of participants not understanding the modern concept of probability. SOBV account for this by using a betting paradigm that uses probabilities implicitly instead of asking participants to report numeric values. SOBV show that the conjunction fallacy is independent of numeric probability reporting and thus an intrinsic 'error'.

Feldman, J.M. & Lynch, J.G. (1988) Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology 73: 421-435. Moore, D.W. (2002) Measuring new types of question-order effects. Public Opinion Quarterly 66: 80-91. Schuman, H., & Presser, S. (1981) Questions and answers in attitude surveys: experiments on question form, wording, and content.

In a questionnaire, the order in which questions are asked changes the resulting probability judgements. In a purely Bayesian approach, the joint probability of asking A then B and getting a specific pair of answers is $p(A)p(B|A) = p(A \& B) = p(B \& A) = p(B)p(A|B)$. This implies that the order in which the questions are asked should not matter, so the observed order effects are a failure of commutativity.
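
As a minimal sketch of this commutativity, the snippet below builds a made-up joint distribution over yes/no answers to two questions and checks both factorizations. The numbers and the helper names (`joint`, `marginal`, `a_then_b`, `b_then_a`) are purely illustrative assumptions, not data from any of the cited studies.

```python
from fractions import Fraction

# Hypothetical joint distribution over (answer to A, answer to B).
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

def marginal(index, value):
    """Marginal probability that question `index` (0 = A, 1 = B) gets `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def a_then_b(a, b):
    """p(A = a) * p(B = b | A = a)."""
    p_a = marginal(0, a)
    return p_a * (joint[(a, b)] / p_a)

def b_then_a(a, b):
    """p(B = b) * p(A = a | B = b)."""
    p_b = marginal(1, b)
    return p_b * (joint[(a, b)] / p_b)

# In a classical model both orders recover the same joint probability.
order_matters = any(a_then_b(a, b) != b_then_a(a, b) for (a, b) in joint)
print(order_matters)  # False
```

Any other valid joint table would give the same result, which is why an observed order effect cannot be captured without adding extra structure (for example, a latent variable for "which question came first").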

Hogarth, R.M. & Einhorn, H.J. (1992) Order effects in belief updating: the belief-adjustment model. Cognitive Psychology 24: 1-55. Shanteau, J.C. (1970) An additive model for sequential decision making. Journal of Experimental Psychology 85: 181-191.

Order effects are not confined to questions; they also appear when integrating evidence. The strongest point of Bayesianism is a clear theory of how to update hypotheses given evidence. Unfortunately, Bayes' rule implies $p(H|A \& B) = p(H|B \& A)$, but HE & S show that for humans this is not always the case and present heuristics-and-biases alternatives.
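
A small sketch of why sequential Bayesian updating is order-independent is given below. It assumes, for simplicity, that the two pieces of evidence are conditionally independent given the hypothesis; the likelihood values and the `update` helper are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def update(prior, likelihood):
    """One application of Bayes' rule over a finite set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical prior and likelihoods p(evidence | hypothesis).
prior = {"H": Fraction(1, 2), "not H": Fraction(1, 2)}
lik_a = {"H": Fraction(4, 5), "not H": Fraction(3, 10)}   # evidence A
lik_b = {"H": Fraction(1, 5), "not H": Fraction(3, 5)}    # evidence B

a_then_b = update(update(prior, lik_a), lik_b)
b_then_a = update(update(prior, lik_b), lik_a)

print(a_then_b == b_then_a)  # True
```

The posterior depends only on the product of the likelihoods, and multiplication commutes, so a Bayesian learner ends up with the same belief whichever piece of evidence arrives first.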

Bergus, G.R., Chapman, G.B., Levy, B.T., Ely, J.W., & Oppliger, R.A. (1998) Clinical diagnosis and the order of information. Medical Decision Making 18: 412-417. McKenzie, C.R.M., Lee, S.M., & Chen, K.K. (2002) When negative evidence increases confidence: change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making 15: 1-18.

It might be tempting to suspect that these order effects are confined to artificial laboratory settings. Unfortunately, they appear in natural settings associated with rational decision making such as BCLEO's study of medical diagnosis and MLC's of dispute mediation in a jury.

Aerts, D. & Sozzo, S. (2011) Quantum Structure in Cognition: Why and How Concepts Are Entangled. Quantum Interaction 7052: 116-127.

AS studied membership judgements for pairs of concept combinations, and found among their participants forms of dependence between concept pairs that violated Bell's inequalities and thus could not be fit by any reasonable classical joint distribution over the concept combinations.
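
To illustrate the kind of classical bound involved, the sketch below brute-forces the CHSH form of Bell's inequality. Taking CHSH as the relevant form is an assumption made here for illustration, and the function name `chsh` and the $\pm 1$ coding of judgements are likewise hypothetical. Every deterministic assignment of $\pm 1$ to the four judgements gives $|S| = 2$, and any classical joint distribution is a mixture of such assignments, so it cannot exceed that bound.

```python
from itertools import product

def chsh(a, a2, b, b2):
    """CHSH combination S = AB + AB' + A'B - A'B' for +/-1 valued outcomes."""
    return a * b + a * b2 + a2 * b - a2 * b2

# Enumerate all 16 deterministic assignments of +/-1 to (A, A', B, B').
values = [chsh(*assignment) for assignment in product([-1, 1], repeat=4)]
classical_bound = max(abs(s) for s in values)

print(classical_bound)  # 2
```

Expectation values estimated from data that give $|S| > 2$ are therefore incompatible with a single classical joint distribution over the four judgements.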