Fork Science

No, this isn’t about cutlery.

I’m proposing to fork science in the sense that Bitcoin was forked, into an adversarial science and a crowdsourced science.

As with Bitcoin, I have no expectation that the two branches will be equal.

These ideas could apply to most fields of science, but some fields need change more than others. Controversy over p-values and p-hacking is a sign that a field needs change. Fields that don’t care much about p-values don’t need as much change, e.g. physics and computer science. I’ll focus mainly on medicine and psychology, and leave aside the harder-to-improve social sciences.

What do we mean by the word Science?

The term “science” has a range of meanings.

One extreme focuses on “perform experiments in order to test hypotheses”, as in The Scientist In The Crib. I’ll call this the personal knowledge version of science.

A different extreme includes formal institutions such as peer review, RCTs, etc. I’ll call this the authoritative knowledge version of science.

Both of these meanings of the word science are floating around, with little effort to distinguish them [1]. I suspect that promotes confusion about what standards to apply to scientific claims. And I’m concerned that people will use the high status of authoritative science to encourage us to ignore knowledge that doesn’t fit within its paradigm.

Background

Up through the early 1900s, the scientific community seems to have been small enough that most decisions could be made by subgroups that were smaller than the Dunbar number. Scientific credibility could be evaluated in part by the normal social mechanisms of a medium-sized tribe. Those social mechanisms helped minimize harm from special-interest groups [2].

Fast forward to 2018: the scientific establishment is heavily influenced by some big industries: the academic citation-and-tenure industry, the drug and medical device industries, the self-help advice industry, the science story industry [aka the news media], etc. Many people in those industries are willing to p-hack their way to success, even if they know that doing so will reduce the credibility of their field a decade later. Peer pressure limits that trend a bit, but with groups much larger than the Dunbar number, and large amounts of money (or status) at stake, peer pressure isn’t strong enough.

Some scientific authorities have responded to these pressures by requiring increasingly high standards of evidence [3]. I suspect this consists mostly of ad-hoc reactions to specific symptoms of the underlying problem. Even when scientists have some awareness of the root causes, most scientific fields tend not to attract people who are interested in thinking systematically about institutional design. There may even be some pressure to signal honesty by not wanting to believe that many scientists are motivated by anything other than the truth, and inaccurate beliefs about the motives of scientists seem likely to hinder clear analysis.

High level view of the fork

We want to use very different standards of evidence for decisions made by the Supreme Court than we use when deciding whether to hire a consultant, whom to invite to a dinner party, or whether to trust an eBay seller.

Likewise, we want high standards of evidence when deciding whether a new drug should be paid for by insurance/welfare-type programs. We should be willing to tolerate much more informal standards of evidence for questions such as how much wine or coffee I should drink today.

Much of that difference is due to how much value is at stake in each decision. But some of that is due to the cost of getting more evidence – it’s hard to get new evidence about whether abortion is a right, but each drink provides new evidence about the effects on social interactions and hangovers.

I propose that scientific institutions, at least for fields that are prone to controversy, be split into institutions with these two focuses:

adversarial science: built to produce authoritative knowledge

crowdsourced science: built to encourage the production and dissemination of scientific knowledge for which it has not yet been feasible to generate authoritative knowledge

Adversarial science would focus on evidence for which there are good reasons to think researchers are motivated to mislead us, and for which the expense (in money, time, or other resources) of replication deters us from using that to limit bias and fraud.

Crowdsourced science would focus on evidence for which it is relatively easy to attempt a replication, so that the primary defense against fraud and bias would be to have many independent experiments.

I recommend that adversarial science continue to move in the direction of requiring proof beyond a reasonable doubt. Let’s stop thinking of peer-review as an adequate first line of defense against bad incentives. Instead, we could have someone be paid to point out any flaws in an adversarial science paper or in a proposed drug.

But adversarial science means that lots of effort is needed to publish knowledge, causing a lower fraction of results to be published. That limits the amount of evidence that gets published, and contributes to publication bias – why should I devote lots of effort to publishing a negative result [4] [5]?

I expect that some of you will react by imagining that the extra cost doesn’t deter many competent researchers.

I suggest imagining the difference between startups being able to choose either Indiegogo funding or a path to an SEC-regulated IPO, as opposed to only having the latter option. Even if the most important companies all use the latter, there are many medium-value smaller companies that wouldn’t get started without something as simple as Indiegogo.

Or for those of you who have been to a CFAR workshop in 2014 or later: those workshops are the result of hundreds (or maybe thousands?) of little experiments. If they needed to be done in ways that were documented in peer-reviewed papers, I’m guessing they would have taken an order of magnitude more time and effort.

Imagine how much software we’d have if software could only be distributed via brick-and-mortar retail stores, compared to what we have now that anyone can write a program and distribute it via GitHub. I think the former is roughly what we’re doing for many areas of scientific knowledge.

I’m hoping for a site that does for medical evidence what Wikipedia did for encyclopedias.

Ideas for crowdsourced approaches

Betting and/or prediction markets would be a good way to generate crowdsourced knowledge.

Alas, this probably requires a good deal of money in order to ensure that it’s expensive to manipulate prices, and to get participants interested in betting on valuable topics. The main benefits are public goods, so it pretty much requires something like a charity to ensure that it is subsidized.

It takes some effort to identify predictions that would be more valuable than the required subsidy, given that we have little evidence about how much subsidy is needed. Also, markets conspicuously increase inequality among market participants, while the benefits are typically not conspicuous, making it hard to see why it’s altruistic to subsidize markets.

So I’m pessimistic about getting prediction markets for many scientific questions. I restrict my hopes to a few of the most cost effective topics: bets on which medical hypotheses will produce the most valuable evidence if tested via expensive RCTs. I’m hoping that this can be used to direct funding from governments and/or large charities for unpatented medical interventions. E.g. does program such-and-such delay Alzheimer’s by getting people to exercise more? Does lower saturated fat consumption by people with Apoe4 alleles reduce all-cause mortality?

Another approach is to apply compression to large databases. Given one or more large public databases with relevant evidence, it shouldn’t be too hard to get people competing to produce better results as quantified by the compression.
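As a rough illustration of what “quantified by the compression” might mean, here is a minimal Python sketch. It assumes a contestant submits a model description plus that model's predictions, and is scored by the compressed size of the model together with the residuals the model fails to explain. The function names, the choice of zlib, and the 4-byte integer encoding are all assumptions made for illustration, not an established scoring rule.

```python
import zlib
from typing import List


def compressed_size(values: List[int]) -> int:
    """Bytes needed to store the integers after zlib compression.

    Each value is packed as a 4-byte signed integer (an arbitrary choice);
    values outside that range would raise OverflowError.
    """
    raw = b"".join(v.to_bytes(4, "big", signed=True) for v in values)
    return len(zlib.compress(raw, 9))


def score(data: List[int], predictions: List[int], model_bytes: bytes) -> int:
    """Hypothetical contest score: compressed model plus compressed residuals.

    Lower is better. A model that explains the data well leaves residuals
    that are mostly zero, which compress to almost nothing.
    """
    if len(data) != len(predictions):
        raise ValueError("data and predictions must have the same length")
    residuals = [d - p for d, p in zip(data, predictions)]
    return len(zlib.compress(model_bytes, 9)) + compressed_size(residuals)


# Toy example: measurements that follow a simple linear trend.
data = [3 * i for i in range(1000)]
perfect_model = score(data, [3 * i for i in range(1000)], b"y = 3 * i")
no_model = compressed_size(data)
```

In this toy case the score with the model should come out far smaller than the size of the data compressed on its own, which is the sense in which a better explanation "compresses better". Counting the model's own size is what keeps a contestant from winning by simply memorizing the data.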

Here the main stumbling block is the large public database. Lots of data are being generated. Some of it gets into a variety of databases that a few researchers can access, with nontrivial restrictions that likely prevent many people from analyzing them. A much tinier fraction gets mentioned publicly in poorly organized anecdotes on various forums and blogs. But most of the data are effectively unavailable to most people who might want to analyze them.

For example, The End of Alzheimer’s has presumably prompted many thousands of people to try interventions, and to carefully measure responses via blood tests. E.g. people taking folate, B6, and/or TMG and measuring the resulting change in their homocysteine levels. It seems unlikely that many of those results will be used to inform others who are trying similar interventions. I suspect that even many of the interventions that motivated the book’s advice have not been published.

These experiments could in principle be aggregated into a valuable database. I imagine a user submitting a blood test before starting an intervention, a description of the intervention, and one or more blood tests for the same biomarker taken after the user expects the effects to be observable. Plus, ideally, genetic info, such as 23andMe results.

I’m concerned about selective reporting, so I hope there’s some way to get most users to pre-register their interventions. It shouldn’t be too hard to make a web site that would make such pre-registration somewhat easy [6].
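To make the submission idea a little more concrete, here is a hedged sketch of what one record might look like. Every class and field name below is hypothetical, and a real site would need something far more flexible (see footnote [6]). The pre-registration check simply asks whether the plan was registered no later than every follow-up test.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class BloodTest:
    biomarker: str   # e.g. "homocysteine"
    value: float
    unit: str        # e.g. "umol/L"
    taken_on: date


@dataclass
class InterventionReport:
    user_id: str
    description: str           # free-text description of the intervention
    registered_on: date        # when the user told the site about the plan
    baseline: BloodTest        # test taken before starting the intervention
    followups: List[BloodTest] = field(default_factory=list)
    genotype_file: Optional[str] = None  # e.g. an exported 23andMe file

    def is_preregistered(self) -> bool:
        """True if the plan was registered on or before every follow-up test.

        A report with no follow-ups yet also counts as pre-registered.
        """
        return all(self.registered_on <= t.taken_on for t in self.followups)

    def changes(self) -> List[float]:
        """Change from baseline for each follow-up of the same biomarker and unit."""
        return [
            t.value - self.baseline.value
            for t in self.followups
            if t.biomarker == self.baseline.biomarker
            and t.unit == self.baseline.unit
        ]


report = InterventionReport(
    user_id="user-1",
    description="folate + B6 + TMG daily",
    registered_on=date(2018, 1, 5),
    baseline=BloodTest("homocysteine", 12.0, "umol/L", date(2018, 1, 2)),
    followups=[BloodTest("homocysteine", 8.0, "umol/L", date(2018, 3, 1))],
)
```

Here `report.is_preregistered()` is true because the plan was registered before the follow-up test, and `report.changes()` gives the drop of 4.0 from baseline. The numbers are invented for the example.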

It should be possible to get blood test results automatically sent to this database, although I expect this will be delayed a good deal by institutional resistance.

If such a database becomes important, it will risk having special interests, such as drug companies, pay people to submit false or misleading reports. So it would be nice if the database owner could require conflict of interest disclosure from users, with significant penalties for dishonesty. I’m unsure whether this can be adequately accomplished without additional legislation. But a database should still become somewhat valuable without fully addressing this issue.

Blood tests are just one example of evidence that would belong in such a database – an example that I chose because of the frequency with which reliable data goes unreported. Other examples of what might belong here:

various interventions for depression, combined with some standard measurement of depression severity.

for any drug or supplement, some measure of whether the user needed an unusually small or large dose – I’m tired of having to start with a dose that’s standardized for an average person, when at least some of the time it should be possible to tell from my genes that I want a much smaller or larger dose [7].

plans to diet, followed by reports of how well the user maintained the diet, with weight measurements over a period of several months.

survey results.

An obvious complication is the tension between privacy and ease of research.

I want to push for more openness about medical data, for reasons hinted at by David Brin (especially genetic data). I expect most of us would be better off if we all agreed to make our medical data public. But the risk/reward ratio isn’t great for individuals who do this unilaterally.

But I don’t know whether enough people will be willing to make their medical data public, so I’ll also think about alternatives. Differential privacy sounds like a somewhat promising approach to making a database fairly useful to many researchers, while still maintaining as much privacy as most participants would have a use for. I don’t understand it well enough to confidently analyze the costs and benefits.
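For readers who want a feel for how this works, here is a minimal sketch of the Laplace mechanism, one standard differential privacy technique, applied to an average biomarker value. It assumes that values are clipped to a publicly known range, that the number of records is not itself secret, and that `epsilon` is the privacy budget (smaller means more noise and more privacy). The function names and the example numbers are made up for illustration.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(
    values: List[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Noisy mean of `values`, each clipped to the range [lower, upper].

    Replacing one person's record can move the clipped mean by at most
    (upper - lower) / n, so Laplace noise with scale sensitivity / epsilon
    is added to the true mean.
    """
    if not values:
        raise ValueError("need at least one value")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    true_mean = sum(clipped) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical homocysteine readings, clipped to a 0-50 range.
readings = [7.5, 9.0, 12.5, 8.0, 10.0, 11.0]
noisy = private_mean(readings, 0.0, 50.0, epsilon=1.0, rng=random.Random(0))
```

With only six records the noise is large compared to the signal. The answer becomes useful as the database grows, because the noise scale shrinks in proportion to the number of records. That trade-off is part of the cost/benefit question raised above.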

It would be really nice if this could be implemented by a charity that could afford to reward people who contribute information [8]. But I’m not very optimistic that there are enough donors who would be willing to fund this.

Could it work like Wikipedia? I don’t quite see how there’d be enough incentive to contribute, but I’d love to be proved wrong.

So I’ll try imagining a for-profit version. Here’s a possible business model (loosely inspired by LabDoor):

Advertisers pay to send suggestions to users. Advertisers can target individuals by combinations of test results, genes, age, etc. The system could potentially customize suggested doses to individuals [9].

I envision advertisers bidding for the ability to reach users (somewhat like the bidding for Google AdWords), with users being able to see how much the advertiser bid, and able to set a minimum bid, so as to avoid seeing poorly targeted ads. Maybe users could even get a cut of these bids?
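The bidding idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the highest eligible bid wins and pays its own bid, and the user's cut is a fixed fraction. None of this is meant to describe how Google's auction works, and all of the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    advertiser: str
    amount: float      # what the advertiser offers to reach this user
    suggestion: str    # the suggestion the user would see


def select_ad(
    bids: List[Bid], user_min_bid: float, user_share: float = 0.0
) -> Optional[Tuple[Bid, float]]:
    """Pick the highest bid that meets the user's minimum.

    Returns the winning bid and the user's cut of it, or None if every
    bid falls below the user's minimum (so the user sees no ad).
    """
    eligible = [b for b in bids if b.amount >= user_min_bid]
    if not eligible:
        return None
    winner = max(eligible, key=lambda b: b.amount)
    return winner, winner.amount * user_share


bids = [
    Bid("vitamin-shop", 0.5, "Try our multivitamin"),
    Bid("b12-lab", 2.0, "Your B12 result suggests a higher dose"),
]
result = select_ad(bids, user_min_bid=1.0, user_share=0.25)
```

In this example the poorly targeted 0.5 bid is filtered out by the user's minimum, the 2.0 bid wins, and the user's cut is 0.5. Because the winning bid is returned whole, the site can show the user how much the advertiser paid.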

But making ads that are very customized is similar enough to practicing medicine that it might get regulated heavily (even if it was clearly legal to blindly advertise higher doses of the same vitamins, doses that might be dangerous to some users). Could a business like this afford to hire doctors to approve all the relevant decisions? That’s a bit too tricky for me to evaluate.

I see some risk that such a business might neglect topics for which they can’t sell ads. But I see potentially big advantages to having the most comprehensive database – see Google or Craigslist for examples where non-paying sections survive, at least when they don’t require separate maintenance.

I think this business model would work given some optimistic assumptions, but it’s easy to imagine it failing to attract users, failing to make money, or finding more profitable strategies that don’t generate much public scientific knowledge.

There’s no particular shortage of companies and other organizations that are trying to do something vaguely like what I want. Yet it seems like they’re only generating a tiny fraction of the benefits that they could.

CureTogether has a good deal of what I want. Is it still attracting users? Does it have any plans to expand the kind of info it attracts?

PatientsLikeMe has a bunch of promising rhetoric, but I’m not too happy with their means of making money, and I don’t see an easy way to tell whether they’re doing anything desirable.

All of Us appears to have fairly good ambitions; I’m a bit unclear what data they will collect and how hard it will be for researchers to access.

23andMe tries to be health-oriented, but got distracted by evidence that’s only weakly related to what interventions I should try.

Ancestry.com has lots of data; it’s unclear whether they can make it helpful for health-related decisions.

Decode Genetics has a big database of Icelandic genes. I’m unclear what’s being done with that data.

HumanDx is a non-profit that’s hard to categorize or evaluate.

I’m sure I overlooked some promising projects here – let me know if you’re aware of one that better fits my hopes.

I can’t resist mentioning a somewhat off-topic but related idea: prediction based medicine. CrowdMed does something almost as interesting, and I expect I’ll try it if I run into a problem that is hard to diagnose.

Final Section, where I wish I had a conclusion

This has been just an outline of some dreams. If I had aimed for a complete plan in whose viability I had confidence, I would probably have procrastinated indefinitely. And this post has grown to take more of my time than I wanted it to take.

My suggestions for better crowdsourced science don’t quite seem as well thought out as I want my typical blog post to be. I’m pretty confident that this post points in the right general direction for science to be moving, but I hope someone else turns these ideas into a more competently articulated business plan.

I’m not at all confident that two is the correct number of subcategories into which we should fork science. Maybe I proposed that mainly because I came up with two decent names, and because decomposing science further would require the help of someone who is better at naming things, or settling for obscure or boring names.

Would industrial science be a better name than adversarial science?

I worry that academia is organized so as to want something in the middle – crowdsourced science has limited need for people who make a career of academic science, and adversarial science seems likely to work better with more industrial-scale projects. Academia seems to be optimized more for providing careers than for generating valuable knowledge, and there’s no obvious source of incentives to change that.

Crowdsourced science needs some institutional support in order to get respect, but academia seems like the wrong place to look for that. Instead, I want to look first at the hope of newly created non-profits, and if that fails, then at for-profit startups.

Footnotes

[1] – This looks a little bit like the differences that Feynman was pointing to when he said “Science is the belief in the ignorance of experts.” It’s important to distinguish “what do experiments tell us?” from “what do high-status scientists tell us?”, but I want to ignore that for this post, and focus on situations where at least some of the people involved are trying to learn from experimental evidence.

[2] – I presume it also restricted entry so that it was hard to become much of a scientist without being an upper-class man. That was unimportant to the extent we’re just interested in the truth about physics, or maybe even the truth about where to drill for oil; when dealing with economics, the risks and benefits are much bigger.

[3] – Not all of science is heading that way. BMJ Case Reports are closer to crowdsourced science than adversarial science. But I couldn’t have named any such journal before this year, which says something about the status of such journals. Also, it appears to be mostly restricted to people who belong to institutions which pay for it, or to those willing to pay £204 per year to be one of its fellows. I think we can get closer to Wikipedia’s style than that.

[4] – About a decade ago, I looked into doing some experiments with prediction markets to try to find cost-effective ways of subsidizing them, in hopes of informing potential donors to a futarchy-related charity.

I ended up abandoning that plan. Attracting participants would have been too hard if I were doing it alone. I found a professor who was interested in experiments of that general nature, and had the ability to run relevant experiments. But he was focused on proving that he deserved tenure, so he needed to document the experiment rigorously enough to convince a skeptical audience, whereas I was only interested in convincing people who were willing to trust that I had no ulterior motive. I saw a big difference in effort between the two approaches, and got discouraged.

[5] – I sometimes wonder if that’s deliberate – do established scientists want it to be hard for new scientists to compete with them? And it’s not hard to see that some other institutions have a more general desire to avoid science.

[6] – I expect some problems with asking people to pre-register the planned time of a blood test – I often alter those plans somewhat. I expect the pre-registration to mainly be useful at ensuring we can tell how representative a sample we’re getting.

I also expect some problems with describing the interventions. E.g. I planned to take a standard daily dose of ashwagandha, but quickly saw that was a higher than optimal dose, and adjusted the dose many times. It will be hard to make a database flexible enough to handle that, while still being standardized enough to be analyzed by software.

[7] – Folate being a fairly clear-cut example: the MTHFR C677T genes seem to suggest a much higher than standard dose. I suspect there are many other situations like this which could be discovered if we had slightly better ways of sharing evidence.

[8] – It’s somewhat important to keep the payment below the cost of the tests, to minimize the incentive to make money by getting too many tests.

[9] – But there are potential privacy concerns: the advertiser will get some pretty specific info about people who respond to very targeted ads.