A shockingly high proportion of Americans believe that study findings are facts, according to a recent study.

Despite the many flaws inherent to study results, a whopping 5 in 6 Americans believed that the published results of studies were simple facts. Indeed, 100% of Americans have cited study results to “prove” that something they have said is factually correct.

This alarming result reveals a deep misunderstanding of the nature of a study’s credibility and confidence, with serious implications for the manipulation of public policy through the production and presentation of a biased set of studies.

What follows is the anatomy of a study, and all of the various opportunities for bias and inaccuracy to sneak into the findings and conclusions.

Formulation Stage:

During the initial formulation of a study, a researcher must necessarily limit the scope of what they are planning to study. Crucially, this requires that they assume a set of previous foundational studies to be accurate. Depending on the chain of confidence intervals of the studies cited, there may be a very high chance that the assumptions are incorrect.
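This compounding of assumptions is easy to make concrete with a bit of arithmetic. In this toy sketch (the 95% figure is purely hypothetical), suppose each foundational study’s conclusion is independently correct with probability p; the chance that an entire chain of n such assumptions holds is p raised to the n:

```python
# Toy sketch: probability that a chain of foundational assumptions all hold,
# assuming each is independently correct with probability p (hypothetical).

def chance_all_correct(p: float, n: int) -> float:
    """Probability that every one of n independent assumptions is correct."""
    return p ** n

for n in (1, 5, 10, 20):
    print(f"{n:2d} assumptions at p=0.95 -> {chance_all_correct(0.95, n):.2f}")
# Twenty assumptions at 95% each leave only about a 36% chance that all hold.
```

Even very confident individual assumptions multiply out to a shaky foundation once the chain gets long.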

The researcher must decide what kinds of data they are going to collect. If they are not able to identify all of the relevant variables, their analysis may produce incorrect or incomplete conclusions.

The researcher must decide how they will collect those data. If their method of collection is flawed, they may collect incorrect data even for the data they have decided to collect.

The researcher must decide how to collect samples for their data. Once again, if they do not identify all of the relevant factors, they may end up with a very biased sample that does not represent wider reality. In fact, it is almost impossible to acquire a sample of anything that is representative of reality. It works best with atoms and molecules, which are simple enough to be very standardized, as far as we understand. The more diverse the set of things under study, the more challenging it is to acquire a representative sample. Studying “people” comes to mind as an extremely diverse and thus challenging set to study.
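How badly a non-representative sample can skew results is easy to simulate. In this hedged sketch (all numbers invented), half of a hypothetical population carries a trait, but the collection method reaches carriers three times as often:

```python
import random

random.seed(0)  # reproducible toy example

# Hypothetical population: half have the trait (1), half do not (0),
# so the true prevalence is exactly 0.5.
population = [1] * 5000 + [0] * 5000

# Biased collection: trait carriers are three times as likely to be sampled.
weights = [3 if person == 1 else 1 for person in population]
biased_sample = random.choices(population, weights=weights, k=1000)

true_mean = sum(population) / len(population)
biased_mean = sum(biased_sample) / len(biased_sample)
print(f"true prevalence: {true_mean:.2f}")
print(f"biased-sample estimate: {biased_mean:.2f}")  # tends toward 0.75, not 0.50
```

A researcher who never notices the skewed collection method would confidently report a prevalence half again as large as reality.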

Experiment Stage:

Once the study has actually begun, there are significant challenges in implementation. Language must be converted into specific actions, and the potential for divergent interpretations of study procedures is not insignificant. That is, two people reading the same procedure may end up carrying out different experiments.

Even supposing the instructions are very clearly standardized, there is still the inevitability of human error, known and unknown, likely causing disparities between the plan and the actual study. Depending on the quantity and scale of errors and the weight of the effect of those errors on expected results, the data collected as a result may be deeply misleading.

Data Collection Stage:

Even supposing the instructions are perfect and carried out perfectly, there is the possibility of human error in observation. It is possible that the actual results of the study may not all be witnessed, noticed, recognized, or acknowledged by the researcher; the data they collect are but an approximation of what actually occurred. Depending on the researcher, the disparity between observation and reality may be substantial.

Even supposing the researcher manages perfect observation, they are still human, and could easily write down the wrong thing in the wrong place, depending on how focused, diligent, disciplined, and organized they are.

Some kinds of data collection require subjective judgment on the part of the researcher. Whenever this is the case, it is guaranteed that different researchers carrying out the same experiment will produce different results, even if they do everything else perfectly and identically. This adds a very thick layer of uncertainty that can only be peeled away through meta-analysis or repeated studies by multiple, diverse researchers.

Data Analysis Stage:

“Analyzing” data is another way of saying “simplifying” data. Every form of analysis necessarily erases some of the nuance and noise to produce a comprehensible “meaning.” There are basically infinite ways that a data set could be analyzed, and each of them produces a different kind of meaning. Most of those would not be very useful to know in most circumstances.

A researcher must choose what method(s) of analysis they are going to employ. They are limited by their time, tools, and talents, as well as the type(s) of data they have collected.

Some methods of analysis provide more nuanced results than others. Some methods provide so much nuance that it is almost the same as just presenting the whole data set and telling somebody to analyze it themselves. Some methods simplify data so much that most of the information is lost, and the study becomes almost pointless. What method(s) the researcher chooses to use will have a big impact on what it is that they find when they examine the data they collected.
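How much the choice of method shapes the “meaning” extracted can be seen with two of the simplest analyses there are. The income figures below are invented for illustration:

```python
import statistics

# Hypothetical, right-skewed data: nine modest values and one huge outlier
# (say, incomes in thousands).
incomes = [20, 22, 25, 27, 30, 31, 33, 35, 40, 400]

# Two "simplifications" of the same data tell two different stories.
print("mean:  ", statistics.mean(incomes))    # 66.3, dragged up by the outlier
print("median:", statistics.median(incomes))  # 30.5, indifferent to its size
```

Both numbers are honest summaries of the same ten values, yet “the typical value is about 66” and “the typical value is about 30” would lead readers to very different conclusions.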

Yet again, there is the possibility for human error. This could be as simple as plugging the wrong numbers into a formula, omitting or duplicating some of the data by mistake, plugging data into the wrong formula by mistake, or even inadvertently selecting an inappropriate formula to begin with.

Interpretation Stage:

Just looking at the name of this stage already tells you the problem with it: interpretation is necessarily subjective, and subjectivity is very dangerous to scientific research.

Even assuming no errors were made up to this point, when somebody is looking at the results of their calculations in the analysis section, it is entirely possible that they will draw illogical conclusions based on a misunderstanding of the mathematics they employed.

It is further possible that the person will have very solid mathematical interpretations of their results, but that they phrase their interpretation poorly such that the mathematical meaning of their finding does not match the wording of their interpretation of that finding.

It is further possible that the researcher will extend the implications of their findings beyond what is supported by those findings, on account of other further foundational assumptions which themselves may or may not be adequately supported.

Conclusion:

One of the strangest things about humans is our capacity to see undeniable evidence against our beliefs and to disregard or attempt to discredit that evidence.

The conclusion of a study is often the least scientific part. In this section, a researcher is supposed to describe the limitations of their own study as well as its potential broader applications, and recommendations for future research to verify or extend the findings.

It can be virtually impossible for a person to accept that they have found the opposite of what they thought they were going to find, and this often shows up in the conclusion section. When a finding does not go the way the researcher wanted, they will spend a great deal of effort coming up with all the various explanations for why they think this study is wrong and their initial expectations may still be correct, despite the findings.

In contrast, when a study goes the way it was “supposed” to, the author is still supposed to describe all the ways they know in which the study was limited. Although it is good for study authors to apply self-criticism, it is obvious that if they had recognized all of the limitations in the first place, they would have done what they could to avoid them. The limitations they do not know about will not be mentioned, obviously. Their conclusions will be based on their own biased, imperfect understandings.

Peer Review:

Naturally, there is a moral hazard when a researcher is supposed to be their own critic. Like, nobody writes a book and then does a chapter at the end telling the would-be publisher all of the reasons why it is a terrible book that doesn’t really say anything. They’d never get published! Best to pitch one’s work, not highlight all of the flaws.

Therefore, it is important that researchers have somebody to check their work to make sure that they aren’t just giving themselves an A+ on everything to guarantee they keep getting paid.

The system we developed is predicated on the idea that researchers who share a field are themselves the most qualified to evaluate each other’s research, which is almost certainly true, as they are doing basically the same work.

The problem is, again, human. Although we like to think of scientific researchers as pillars of pure objectivity and reason, they are actually just people like the rest of us. They behave in predictably human ways.

For instance, consider a group of five researchers.

One of them publishes a study, and the other four “peer review” it. Three of them give it a rubber stamp, because they are very busy with their own research and don’t really have the time to redo all the work of somebody else’s study just to see if it comes out the same. After all, they know and trust their peer, and by extension their peer’s work.

The fourth peer does his duty properly, and invests a lot of time tearing apart his peer’s work. Ultimately, his review reveals a serious flaw in reasoning that ruins the whole study. Sadly, the study had cost $300,000 to carry out, and all of that is now wasted.

The donor who had funded the study is very angry at such a huge waste. He stops funding the department, to the detriment of all five researchers.

Meanwhile, that fourth peer finishes his own work shortly thereafter. The three who rubber-stamped the last study again rubber-stamp this one. However, the first researcher, who normally rubber-stamps everything, now has a professional vendetta against the peer who sank his research, and so he works very hard to find every single possible flaw in that peer’s work, and succeeds in sinking its publication as well.

The two warring peers now spend a lot of time redoing each other’s work to try to find flaws, crippling their own ability to get work done, and to publish work when they complete it. The three rubber-stamp peers happily go on about their own work, never stopping to look at anyone else’s, and never finding themselves the target of extra scrutiny. Indeed, the dutiful peer reviewer now spends all of his time attacking one specific peer’s research, so the three rubber-stampers can worry even less about scrutiny.

Ultimately, the two warring peers do not get nearly as much published, although when they do, it has come through a very aggressive level of peer review. Consequently, their published works are nearly flawlessly composed.

The three “rubber stampers” get much more work published, but it has not been sufficiently vetted. They overlook things here and there, and nobody checks their work to catch those oversights. Their work may be total garbage, but they make a lot of it for the funders.

A sixth researcher enters the field. The three rubber stampers tell her, directly or otherwise, “You really don’t want to get involved in a peer review war. Look at how it has damaged those guys’ careers! Plus, sinking our own studies really hurts our funding as a whole department. Just stamp ours and we’ll stamp yours. Win-win.”

Over time, the only people with successful careers are those who rubber-stamp each other’s research; the only fields that receive funding are those that rubber-stamp; and research institutions thus become little more than rubber-stamp factories of approval for any study done by anybody with a credential beside their name and a paradigmatic thesis.


Apart from the flaws in our peer review system, these sources of uncertainty in study results are largely considered to be inherent to the human condition. That is, as long as imperfect humans are the ones carrying out studies, we will have to contend with these kinds of uncertainty.

Thankfully, through diversity of study conditions and researchers, as well as a massive scale of experimental repetition, we can reduce that uncertainty almost to 0%. Doing so is expensive and time-consuming, but it is how we have achieved every scientific theory so far. For instance, through massive, incredibly well-funded research and application (we are talking $BILLIONS from the weapons and manufacturing industries), physicists have been able to nail down such impossible questions as what happens to subatomic particles when you accelerate them to nearly the speed of light and slam them into each other. The whole thing sounds imaginary, like angels dancing on the head of a pin, but then you remember that time all the people of Hiroshima and Nagasaki were obliterated exactly as intended, and you decide to take physical science’s fanciful theories very seriously.
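The statistical core of that claim is the standard error of a mean, which shrinks with the square root of the number of independent observations. A minimal sketch, using a made-up spread (standard deviation) of 10 units:

```python
import math

# Standard error of a sample mean: se = sigma / sqrt(n).
# sigma here is a hypothetical population standard deviation.
sigma = 10.0
for n in (10, 100, 10_000, 1_000_000):
    se = sigma / math.sqrt(n)
    print(f"n={n:>9,} observations -> standard error {se:.3f}")
# Each 100x increase in repetitions cuts the uncertainty by 10x.
```

This is why scale and repetition are so expensive: wringing out one more decimal place of certainty costs a hundred times the observations.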

However, there is another confounding factor: the political agendas of researchers, of those who fund their research, and of the organizations which publicize and distribute that research. Politicization of the “hard sciences” basically does not happen, because the benefits of the research require true replicability for industrial purposes. In contrast, the “soft sciences” are extremely vulnerable to partisan manipulation, because the benefits of the research are mostly political rather than literally productive. For “soft science” research intended for wide publication, whether people believe it is true matters more than whether it is actually true. (I would expect the most reproducible psychological research to be corporate advertisers’ secrets.)

In short, the natural uncertainty, errors, and potential for bias create a plausibly-deniable opportunity for ideologically motivated “mistakes” to generate studies or guide meta-analyses to produce whatever outcome is desired. Accuracy would be necessary for truly socially optimal public policy, but inaccuracy can be even better for influencing public opinion toward personally or ideologically optimal public policy.

These political manipulations may take many forms:

The researcher deliberately setting up a study with limitations or selective biases such that it will produce the desired, misleading result.

The researcher deliberately failing to observe certain outcomes.

The researcher deliberately excluding known, key confounding variables.

The researcher deliberately failing to document certain outcomes.

The researcher deliberately excluding data inappropriately.

The researcher fabricating data.

The researcher deliberately miscalculating to find the desired result.

The researcher applying different methods of analysis until a favorable result is achieved.

The researcher deliberately misinterpreting data to arrive at an unsupported conclusion.

The researcher deliberately stretching implications to provide an exaggerated conclusion.

The researcher only providing peer review to opposition research, and automatically rubber-stamping “friendly” research.

The researcher declining to publish unfavorable results.

The researcher publishing the results with an abstract that says the opposite of what the study actually found.

The researcher deliberately refusing to correct misinterpretations and exaggerations of their research by “friendly” media outlets.
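One item in the list above, applying different methods of analysis until a favorable result is achieved, has a well-known quantitative face: run enough independent tests on pure noise and something will “look significant.” A hedged sketch, assuming each analysis behaves like an independent test with a 5% false-positive rate:

```python
# Toy model of "analyze until it works": with k independent tests on pure
# noise, each with false-positive rate alpha, the chance that at least one
# appears "significant" is 1 - (1 - alpha)**k.

def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k null tests crosses the threshold."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(f"{k:2d} analyses -> {chance_of_false_positive(k):.0%} "
          "chance of a spurious 'finding'")
```

Twenty attempted analyses on data containing no real effect still yield a roughly two-in-three chance of producing something publishable-looking, which is why the choice of analysis must be fixed before the data are examined.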

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

We like to think of our researchers as paragons of virtue and objectivity, with no personal stake or self-interest and with superhuman discipline and intellect. We like to believe that researchers are illuminating all darkness in an altruistic quest to better humankind.

The reality is that even the best attempts at objective research have inevitable room for bias and error. Furthermore, most research that could have partisan political implications is carried out by large, partisan institutions that reliably produce intentionally biased studies, primarily as a way of adding a veneer of “objectivity” to their ideological belief systems. Furthermore, those partisan institutions seek to spread their partisanship to as many other respected institutions as possible; if they can monopolize the whole field, their ideology becomes “fact.”

None of this is to say that scientific research is not the best method we have developed for understanding the world. It is definitely better than all of the other methods we have. Indeed, the scientific method is an attempt at escaping the natural human inclination toward biased and wrong thinking, and provides a very good, though imperfect, structure toward that end. The limitations of science are our own human failures to carry out the method correctly and honestly. We would do even worse without our flawed attempt at the scientific method than we do with it!

One of the most important features of scientific thinking is an eternal agnosticism about existing theory, and a strong emphasis on challenging assumptions and results through criticism and repetition. Study results exist to be questioned and challenged; that’s why they have to lay out everything that they did, and all of their calculations, and all of their reasoning: so that anyone can see if what they concluded is actually supported by their study, and what ambiguities remain.

Don’t believe everything that has a little data and a fancy sounding “research institute” stapled to it. Remain scientifically skeptical at all times, lest you find yourself worshiping beside a false prophet of a corrupt church.

How do you know if study results are really accurate? You never truly can, but you can greatly reduce uncertainty by looking almost exclusively at large studies and, better yet, meta-analyses comparing many study outcomes across ideologically opposed research institutes. Remember, scientific knowledge is part of the culture war, and in any case it almost always builds very slowly; it does not come from radical new findings, which usually turn out to have been misunderstandings. This is boring and slow, of course; all the real science is old news, confirming some small study that made headlines with its unfounded implications, or confirming another small study that refuted the first. In science, boring and slow is best.

A moderately educated person knows a lot of scientific facts. A very well educated scientist knows which “facts” are most important to test for accuracy next, and which are probably safe to rely upon for the time being but should never be fully trusted.