$\begingroup$

Just for fun I'm trying to do some simple medical diagnosis using Bayes' theorem. Right now I'm calculating

P(condition | symptoms) ∝ P(symptoms | condition) * P(condition)

for each possible condition, then choosing the most likely condition given the present symptoms as the "diagnosis" (note that for simplicity's sake I assume that the symptoms are independent given the condition). This works well when I have a complete list of the probabilities P(symptom | condition) for all symptoms and conditions.
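Roughly, the calculation can be sketched like this. The function names, condition names, and all numbers are made up for illustration, and log-probabilities are used only to avoid underflow when many symptoms are multiplied together:

```python
import math

def log_scores(priors, likelihoods, symptoms):
    """Unnormalised log-posterior per condition, assuming independent symptoms."""
    scores = {}
    for condition, prior in priors.items():
        score = math.log(prior)
        for symptom in symptoms:
            # P(symptom | condition) must be known for every present symptom.
            score += math.log(likelihoods[condition][symptom])
        scores[condition] = score
    return scores

def posteriors(priors, likelihoods, symptoms):
    """Normalise the scores so they sum to 1 over the listed conditions."""
    scores = log_scores(priors, likelihoods, symptoms)
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def diagnose(priors, likelihoods, symptoms):
    """Return the condition with the highest posterior probability."""
    scores = log_scores(priors, likelihoods, symptoms)
    return max(scores, key=scores.get)

# Hypothetical numbers, not real medical data.
priors = {"flu": 0.1, "cold": 0.9}
likelihoods = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "cold": {"fever": 0.1, "cough": 0.7},
}
present = ["fever", "cough"]
result = diagnose(priors, likelihoods, present)
```

Only the symptoms that are present enter the product here; absent symptoms are ignored, as in the description above.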

However, I want to do better in the case where I do not know how likely each symptom is under every condition. Say, for example, that I have a "patient" with a long list of symptoms and two possible conditions, A and B. For condition A I have a full list of symptoms and their probabilities, while for condition B I only know the five most common symptoms. To calculate P(condition B | symptoms), my current solution is to set P(symptom | condition B) to some base rate, e.g. 0.01, both when I know for sure that the symptom is never caused by condition B and when I simply don't know the real rate of the symptom under condition B.

This leads to problems: condition A will often end up as the "diagnosis" even if every P(symptom | condition A) is low, simply because more symptom probabilities are known for condition A than for condition B.
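To make the effect concrete, here is a small sketch with invented numbers (the symptom names, the 0.05 and 0.8 likelihoods, and the equal priors are all assumptions for illustration). The patient has eight symptoms. Condition A lists all of them with a low probability, while condition B only lists five symptoms, two of which the patient has:

```python
import math

BASE_RATE = 0.01  # fallback used for both "never happens" and "unknown"

def log_score(prior, known, symptoms, base_rate=BASE_RATE):
    """Log of prior * product of P(symptom | condition), with a base-rate fallback."""
    return math.log(prior) + sum(
        math.log(known.get(symptom, base_rate)) for symptom in symptoms
    )

symptoms = [f"s{i}" for i in range(1, 9)]  # eight present symptoms

# Condition A: every symptom is known, but each one is unlikely.
known_a = {symptom: 0.05 for symptom in symptoms}

# Condition B: only the five most common symptoms are known.
# Two of them (s1, s2) are present; t1..t3 are not among the patient's symptoms.
known_b = {"s1": 0.8, "s2": 0.8, "t1": 0.7, "t2": 0.7, "t3": 0.7}

score_a = log_score(0.5, known_a, symptoms)
score_b = log_score(0.5, known_b, symptoms)

# A wins: six unknown symptoms at 0.01 each outweigh B's two strong matches.
winner = "A" if score_a > score_b else "B"
```

With these numbers the six fallback factors of 0.01 for condition B cost more than eight factors of 0.05 for condition A, so A is chosen even though none of its likelihoods is high.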

What is the best way to properly handle this uncertainty and solve the problem presented above?