Followup to: What Evidence Filtered Evidence?

In "What Evidence Filtered Evidence?", we are asked to consider a scenario involving a coin that is either biased to land Heads 2/3rds of the time, or Tails 2/3rds of the time. Observing Heads is 1 bit of evidence for the coin being Heads-biased (because the Heads-biased coin lands Heads with probability 2/3, the Tails-biased coin does so with probability 1/3, the likelihood ratio of these is $\frac{2/3}{1/3} = 2$, and $\log_2 2 = 1$), and analogously and respectively for Tails.

If such a coin is flipped ten times by someone who doesn't make literally false statements, who then reports that the 4th, 6th, and 9th flips came up Heads, then the update to our beliefs about the coin depends on what algorithm the not-lying reporter used to decide to report those flips in particular. If they always report the 4th, 6th, and 9th flips independently of the flip outcomes—if there's no evidential entanglement between the flip outcomes and the choice of which flips get reported—then reported flip-outcomes can be treated the same as flips you observed yourself: three Headses is 3 * 1 = 3 bits of evidence in favor of the hypothesis that the coin is Heads-biased. (So if we were initially 50:50 on the question of which way the coin is biased, our posterior odds after collecting 3 bits of evidence for a Heads-biased coin would be $2^3 : 1 = 8 : 1$, or a probability of 8/(1 + 8) ≈ 0.89 that the coin is Heads-biased.)

On the other hand, if the reporter mentions only and exactly the flips that came out Heads, then we can infer that the other 7 flips came out Tails (if they didn't, the reporter would have mentioned them), giving us posterior odds of $2^3 : 2^7 = 1 : 16$, or a probability of 1/(1 + 16) ≈ 0.06 that the coin is Heads-biased.
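For readers who want to check the arithmetic, here is a minimal sketch of the two updates in Python. The function name `posterior_heads_biased` and its parameters are made up for illustration; the only modeling assumption is the one from the scenario itself, namely that each Heads doubles the odds in favor of the Heads-biased hypothesis and each Tails halves them.

```python
from fractions import Fraction


def posterior_heads_biased(heads, tails, prior_odds=Fraction(1)):
    """Posterior probability that the coin is Heads-biased.

    Each observed Heads multiplies the odds by 2 (likelihood ratio
    (2/3)/(1/3)), and each observed Tails divides them by 2.
    `prior_odds` defaults to 1:1, i.e. 50:50.
    """
    odds = prior_odds * Fraction(2) ** heads / Fraction(2) ** tails
    return odds / (1 + odds)


# Reporter always reports flips 4, 6, and 9, whatever they were:
# we learn about three Heads and nothing about the other flips.
fixed_positions = posterior_heads_biased(heads=3, tails=0)

# Reporter mentions exactly the flips that came up Heads:
# the seven unmentioned flips must have been Tails.
only_heads = posterior_heads_biased(heads=3, tails=7)

print(fixed_positions, float(fixed_positions))  # 8/9, about 0.89
print(only_heads, float(only_heads))            # 1/17, about 0.06
```

The same three reported Heads feed both calls; only the inferred number of Tails differs, and that inference comes entirely from what we believe about the reporting algorithm.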

So far, so standard. (You did read the Sequences, right??) What I'd like to emphasize about this scenario today, however, is that while a Bayesian reasoner who knows the non-lying reporter's algorithm of what flips to report will never be misled by the selective reporting of flips, a Bayesian with mistaken beliefs about the reporter's decision algorithm can be misled quite badly: compare the 0.89 and 0.06 probabilities we just derived given the same reported outcomes, but different assumptions about the reporting algorithm.

If the coin gets flipped a sufficiently large number of times, a reporter whom you trust to be impartial (but who isn't) can make you believe anything she wants without ever telling a single lie, just through appropriate selective reporting. Imagine a very biased coin that comes up Heads 99% of the time. If it gets flipped ten thousand times, 100 of those flips will be Tails (in expectation), giving a selective reporter plenty of examples to point to if she wants to convince you that the coin is extremely Tails-biased.
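To put a rough number on how badly this can go, here is a small sketch. It adds an assumption that is not in the scenario above: the listener is weighing exactly two hypotheses, "99% Heads" against "99% Tails". All variable names are invented for the example, and the flip counts are expectations rather than simulated outcomes.

```python
import math

P_HEADS = 0.99      # true bias of the coin
N_FLIPS = 10_000    # total number of flips

# Assumed hypothesis pair: 99% Heads vs. 99% Tails.
# Under that pair, one flip is worth log2(0.99 / 0.01) bits either way.
bits_per_flip = math.log2(P_HEADS / (1 - P_HEADS))

expected_tails = round(N_FLIPS * (1 - P_HEADS))   # 100 in expectation
expected_heads = N_FLIPS - expected_tails         # 9900 in expectation

# Naive listener: treats the 100 reported Tails as an unfiltered sample.
naive_bits_for_tails = expected_tails * bits_per_flip

# Informed listener: knows only Tails get reported and that there were
# 10,000 flips, so infers that the unreported 9,900 flips were Heads.
informed_bits_for_heads = (expected_heads - expected_tails) * bits_per_flip

print(f"bits per flip:            {bits_per_flip:.2f}")
print(f"naive, toward Tails:      {naive_bits_for_tails:.0f} bits")
print(f"informed, toward Heads:   {informed_bits_for_heads:.0f} bits")
```

Under these assumptions, the naive listener ends up with hundreds of bits of evidence for the wrong hypothesis, even though every individual report was true.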

Toy models about biased coins are instructive for constructing examples with explicitly calculable probabilities, but the same structure applies to any real-world situation where you're receiving evidence from other agents, and you have uncertainty about what algorithm is being used to determine what reports get to you. Reality is like the coin's bias; evidence and arguments are like the outcome of a particular flip. Wrong theories will still have some valid arguments and evidence supporting them (as even a very Heads-biased coin will come up Tails sometimes), but theories that are less wrong will have more.

If selective reporting is mostly due to the idiosyncratic bad intent of rare malicious actors, then you might hope for safety in (the law of large) numbers: if Helga in particular is systematically more likely to report Headses than Tailses that she sees, then her flip reports will diverge from everyone else's, and you can take that into account when reading Helga's reports. On the other hand, if selective reporting is mostly due to systemic structural factors that result in correlated selective reporting even among well-intentioned people who are being honest as best they know how, then you might have a more serious problem.

"A Fable of Science and Politics" depicts a fictional underground Society polarized between two partisan factions, the Blues and the Greens. "[T]here is a 'Blue' and a 'Green' position on almost every contemporary issue of political or cultural importance." If human brains consistently understood the is/ought distinction, then political or cultural alignment with the Blue or Green agenda wouldn't distort people's beliefs about reality. Unfortunately ... humans. (I'm not even going to finish the sentence.)

Reality itself isn't on anyone's side, but any particular fact, argument, sign, or portent might just so happen to be more easily construed as "supporting" the Blues or the Greens. The Blues want stronger marriage laws; the Greens want no-fault divorce. An evolutionary psychologist investigating effects of kin-recognition mechanisms on child abuse by stepparents might aspire to scientific objectivity, but being objective and staying objective is difficult when you're embedded in an intelligent social web in which your work is going to be predictably championed by Blues and reviled by Greens.

Let's make another toy model to try to understand the resulting distortions on the Undergrounders' collective epistemology. Suppose Reality is a coin—no, not a coin, a three-sided die, with faces colored blue, green, and gray. One-third of the time it comes up blue (representing a fact that is more easily construed as supporting the Blue narrative), one-third of the time it comes up green (representing a fact that is more easily construed as supporting the Green narrative), and one-third of the time it comes up gray (representing a fact that not even the worst ideologues know how to spin as "supporting" their side).

Suppose each faction has social-punishment mechanisms enforcing consensus internally. Without loss of generality, take the Greens (with the understanding that everything that follows goes just the same if you swap "Green" for "Blue" and vice versa). People observe rolls of the die of Reality, and can freely choose what rolls to report—except a resident of a Green city who reports more than 1 blue roll for every 3 green rolls is assumed to be a secret Blue Bad Guy, and faces increasing social punishment as their ratio of reported green to blue rolls falls below 3:1. (Reporting gray rolls is always safe.)

The punishment is typically informal: there's no official censorship from Green-controlled local governments, just a visible incentive gradient made out of social-media pile-ons, denied promotions, lost friends and mating opportunities, increased risk of being involuntarily committed to psychiatric prison, &c. Even people who privately agree with dissident speech might participate in punishing it, the better to evade punishment themselves.

This scenario presents a problem for people who live in Green cities who want to make and share accurate models of reality. It's impossible to report every die roll (the only 1:1 scale map of the territory, is the territory itself), but it seems clear that the most generally useful models—the ones you would expect arbitrary AIs to come up with—aren't going to be sensitive to which facts are "blue" or "green". The reports of aspiring epistemic rationalists who are just trying to make sense of the world will end up being about one-third blue, one-third green, and one-third gray, matching the distribution of the Reality die.
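The Green-city rule can be written down as a tiny check. This is only a sketch of one reading of the rule (at most one blue report for every three green reports, with gray reports ignored); the names `is_safe` and `max_safe_blue` are hypothetical.

```python
def is_safe(green, blue):
    """True if the report stays within the 3:1 green:blue Overton ratio.

    Reading of the rule assumed here: no more than 1 blue report for
    every 3 green reports. Gray reports never count against anyone.
    """
    return 3 * blue <= green


def max_safe_blue(green):
    """Largest number of blue reports that `green` green reports can cover."""
    return green // 3


# An unfiltered report on nine rolls: about 3 blue, 3 green, 3 gray.
honest_report_is_safe = is_safe(green=3, blue=3)

# The same reporter after trimming blue reports to fit the ratio.
trimmed_report_is_safe = is_safe(green=3, blue=max_safe_blue(3))

print(honest_report_is_safe)    # False
print(trimmed_report_is_safe)   # True
print(max_safe_blue(3))         # 1
```

On this reading, a reporter whose output simply mirrors the Reality die is already far outside the safe region.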

From the perspective of ordinary nice smart Green citizens who have not been trained in the Way, these reports look unthinkably Blue. Aspiring epistemic rationalists who are actually paying attention can easily distinguish Blue partisans from actual truthseekers, but the social-punishment machinery can't process more than five words at a time. The social consequences of being an actual Blue Bad Guy, or just an honest nerd who doesn't know when to keep her stupid trap shut, are the same.

In this scenario, public opinion within a subculture or community in a Green area is constrained by the 3:1 (green:blue) "Overton ratio." In particular, under these conditions, it's impossible to have a rationalist community—at least the most naïve conception of such. If your marketing literature says, "Speak the truth, even if your voice trembles," but all the savvy high-status people's actual reporting algorithm is, "Speak the truth, except when that would cause the local social-punishment machinery to mark me as a Blue Bad Guy and hurt me and any people or institutions I'm associated with—in which case, tell the most convenient lie-of-omission", then smart sincere idealists who have internalized your marketing literature as a moral ideal and trust the community to implement that ideal, are going to be misled by the community's stated beliefs—and confused at some of the pushback they get when submitting reports with a 1:1:1 blue:green:gray ratio.

Well, misled to some extent—maybe not much! In the absence of an Oracle AI (or a competing rationalist community in Blue territory) to compare notes with, it's not clear how one could get a better map than trusting what the "green rationalists" say. With a few more made-up modeling assumptions, we can quantify the distortion introduced by the Overton-ratio constraint, which will hopefully help develop an intuition for how large of a problem this sort of thing might be in real life.

Imagine that Society needs to make a decision about an Issue (like a question about divorce law or merchant taxes). Suppose that the facts relevant to making optimal decisions about an Issue are represented by nine rolls of the Reality die, and that the quality (utility) of Society's decision is proportional to the (base-two logarithm) entropy of the distribution of what facts get heard and discussed.

The maximum achievable decision quality is $\log_2 9 \approx 3.17$.

On average, Green partisans will find 3 "green" facts and 3 "gray" facts to report, and mercilessly stonewall anyone who tries to report any "blue" facts, for a decision quality of $\log_2 6 \approx 2.58$.

On average, the Overton-constrained rationalists will report the same 3 "green" and 3 "gray" facts, but something interesting happens with "blue" facts: each individual can only afford to report one "blue" fact without blowing their Overton budget—but it doesn't have to be the same fact for each person. Reports of all 3 (on average) blue rolls get to enter the public discussion, but get mentioned (cited, retweeted, &c.) 1/3 as often as green or gray rolls, in accordance with the Overton ratio. So it turns out that the constrained rationalists end up with a decision quality of $\frac{6}{7}\log_2 7 + \frac{1}{7}\log_2 21 \approx 3.03$, significantly better than the Green partisans—but still falling short of the theoretical ideal where all the relevant facts get their due attention.
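The three decision-quality figures can be reproduced with a few lines of Python. This is a sketch under the made-up modeling assumptions above: each fact gets a relative "attention weight" (1 for a fully discussed fact, 1/3 for a blue fact under the Overton ratio, absent if stonewalled), and decision quality is the base-two entropy of the normalized weights. The helper name `entropy_bits` is my own.

```python
import math


def entropy_bits(weights):
    """Shannon entropy (in bits) of the distribution proportional to `weights`."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Ideal: all nine facts get equal attention.
ideal = entropy_bits([1] * 9)

# Green partisans: 3 green + 3 gray facts, blue facts stonewalled.
partisans = entropy_bits([1] * 6)

# Overton-constrained rationalists: 3 green + 3 gray at full weight,
# 3 blue facts each mentioned 1/3 as often.
constrained = entropy_bits([1] * 6 + [1 / 3] * 3)

print(round(ideal, 2))        # 3.17
print(round(partisans, 2))    # 2.58
print(round(constrained, 2))  # 3.03
```

In the constrained case the weights sum to 7, so each green or gray fact has probability 1/7 and each blue fact 1/21, which is where the 6/7 and 1/7 coefficients come from.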

If it's just not pragmatic to expect people to defy their incentives, is this the best we can do? Accept a somewhat distorted state of discourse, forever?

At least one partial remedy seems apparent. Recall from our original coin-flipping example that a Bayesian who knows what the filtering process looks like, can take it into account and make the correct update. If you're filtering your evidence to avoid social punishment, but it's possible to clue in your fellow rationalists to your filtering algorithm without triggering the social-punishment machinery—you mustn't assume that everyone already knows!—that's potentially a big win. In other words, blatant cherry-picking is the best kind!