Ideas on how to improve scientific research

Bridging the gap from scientific discovery to new products.

Update: the idea proposed in this blog post has now launched at https://www.researchhub.com/. Please go check it out and join to show your support.

Academic research could be much more efficient. It struggles with:

prioritization of what to work on

reproducibility of results

alignment to market incentives

access to funding

cost and delays from academic journals

In this post, I’ll explore some ideas on how to improve scientific research, including how to make it work more like open source software.

Several million academic papers are published each year, but only a small handful of them actually lead to new products and services where people can really benefit from them.

Scientists often live in their own world, producing insights that are only read by other scientists. At the same time, business people often live in their own world, creating products that lack any real technological innovation. There isn’t much information flow between the groups¹.

Most new products that get created rely on marketing and branding to differentiate. This is common in the cosmetics, food, and fitness/diet industries, for instance, where private label manufacturers make many of the same products under different labels.

Here are a bunch of drinks sold at Whole Foods. They are marketed as improving energy, passion, and healing, but really they all repackage the same few FDA-approved ingredients (caffeine, theanine, sugar, etc.). Where is the drink with some kind of proprietary molecule or intellectual property in it?

The exceptions prove the rule. Companies that bridge the divide (or successfully marry technology with business) tend to be the most valuable: Tesla, Genentech, Google, Apple, SpaceX, etc. Often these seem to require a technical founder (or founding team) with sufficient understanding of both science and business².

People who understand both are rare. They are the intersection of two already rare groups. Many scientists have an allergic reaction to business, and many business people are unable to distinguish real science from pseudoscience. Perhaps, if we didn’t have to rely on these rare bilingual people, we’d see more innovative products in the world. Maybe we can create some translation tools to help improve the flow of ideas from academic research to business.

Some challenges with research today

Academia exists in a weird alternate reality where money and traditional market incentives don’t seem to matter. Tenure, citations, and the opinion of your peers are what lead to grants, so these have become the currency of academia instead³.

Here are some problems that I see with research today:

Reproducibility

In some fields, more than 50% of experiments cannot be reproduced. Many papers do not include the underlying datasets. Researchers sometimes have an incentive to hide key details in their paper to stay one step ahead of competing labs. In addition, negative results⁵ are less likely to be published.

Prioritization

We often don’t fund or pursue work that would do the most good⁶. While it can be difficult to tell what a line of inquiry will eventually lead to (many great discoveries happened while looking at something unrelated), it would be nice if there was a better feedback mechanism from private industry about the most important challenges they are facing. This ties into “market incentives” below. One scientist I spoke with while writing this post shared an anecdote: “In my field of [redacted], I see >70% of publications being on technology that will never come close to commercialization. [a recent big push of $100M in funding] yielded, in my opinion, nothing.”

Market Incentives

Research seems too disconnected from the real world at times, perhaps because scientists don’t often capture the financial upside of their work. Instead, they use alternate currencies (like citations). Perhaps licensing⁷ could be simplified to align research more with market incentives and help scientists capture more upside from their inventions.

Funding

It can take a huge amount of time for researchers to apply for grants and receive funding⁸. They often have to alter their ideas⁹ to fit them into grant proposals (e.g. could I argue that this is somehow relevant to defense spending?¹⁰) and a lot of money goes to running the university itself (about 50% at Stanford, for instance). In addition, a lot of NIH/NSF funding is arguably too conservative. The people writing the grants are trying to minimize downside, not maximize upside, which means they are likely to overlook (contrarian) breakthrough ideas.

Speed

Making a new discovery is difficult enough, but once one is made, it can still be many years before it sees the light of day. The process of applying to journals and peer review adds a lot of delay. In addition, people are hesitant to publish in-progress or half-finished work (although “pre-print” servers have helped with this), so there is a tendency to wait until something is “done” to share it.

Comprehensibility

Academic papers use a lot of jargon that is only decipherable by other people in the field. Some of this is unavoidable (research is about complex topics), but it also makes it less likely that people outside the field will understand the potential of a new discovery.

Volume

There are a large number of papers published, some without much original content. This can make it difficult to wade through all the material out there to see what might be relevant to you.

Cost/Accessibility

A lot of research is funded by taxpayer dollars and should arguably be free to the public, but instead it is locked up behind paywalls at journals. There are workarounds for this (Sci-Hub, which is illegal, or just emailing the author directly to ask for a copy), but the journals add a lot of friction to research without adding much value.

Ivory Tower

Research can take place anywhere from CERN’s Large Hadron Collider to a garage, and be done by anyone from tenured professors to hobbyists (or even anonymous people). It can also take place in private companies, and in any language (right now there is a barrier between machine learning papers published in Chinese and English). It would be nice to see a wider definition of research come together all in one community.

Journals

The academic journals seem to slow things down in terms of time and money, in exchange for providing some curation and credentialing. The aggregate revenue of the academic journals is about $20B annually, for little value add.

A proposed solution

It’s easy to point out problems, and much harder to create solutions. I’ll try the latter.

It would be nice if research happened more like open source software and was more aligned with market incentives. It would also be nice if there was prioritization like Reddit, comments like Google Docs, and pull requests like GitHub.

My proposed solution is an app or website that brings together a community of researchers in a novel way. It attempts to improve the quality and speed at which research gets done, how it gets communicated to entrepreneurs and business people, and how scientists could get funding and upside.

This idea is not very new; I just don’t think it’s been executed yet.

You’ll see elements of Reddit, GitHub, Wikipedia, StackExchange, RapGenius, and Kickstarter present. Part of this is really just taking the best of what is already working online.

The other part of this is providing a viable alternative to scientific journals. To do this we need to replicate the positive aspects of journals (curation of content and status/reputation for those getting published) and eliminate the negative aspects (cost and delays).

Below are each of the core components of my proposed solution:

Ranking/Prioritization

The sheer volume of papers published every day is overwhelming. If we can get trusted people to rate papers, we can turn this into rankings or leaderboards. Imagine being able to see the “top papers in biology this year/month/week” based on crowd sourced votes from knowledgeable people. If you only have a few hours to read papers in a month (like me), this would help a lot.

Perhaps out of some sense of politeness, people in academia seem to be reluctant to give a thumbs up/down to their colleagues’ work in a public forum. Journals often don’t publish the identity of the people doing the peer review. This strikes me as a key part of the problem.

The ratings could be similar to Google’s PageRank algorithm, meaning they are weighted by how knowledgeable or trusted the rater is.

I could imagine research being rated across a handful of metrics:

Originality

Is there a genuine breakthrough here that added a branch on the tree of knowledge?

Reproducibility

Does the paper contain sufficient detail for others to reproduce the work, and how many other people/labs have actually been able to do so?

Commercial viability

Could this research plausibly lead to something that would benefit people?

These ratings could be aggregated into an overall score for each paper, possibly eventually incorporating hundreds of variables (like PageRank). There are some basic metrics today, such as the h-index, which scores an author by how widely their papers are cited (an h-index of 10 means 10 papers with at least 10 citations each).
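One way to read the aggregation described above is as a trust-weighted average. The sketch below is only an illustration: the metric names come from the list above, while the 1-5 scale, the `rater_trust` weight, and the equal default metric weights are assumptions. It is a plain weighted mean, not PageRank itself.

```python
from dataclasses import dataclass

# Metric names follow the post; the 1-5 scoring scale is an assumption.
METRICS = ("originality", "reproducibility", "commercial_viability")


@dataclass
class Rating:
    rater_trust: float        # hypothetical reputation weight of the rater (> 0)
    scores: dict[str, float]  # metric name -> score on an assumed 1-5 scale


def weighted_metric_scores(ratings: list[Rating]) -> dict[str, float]:
    """Trust-weighted mean of each metric across all ratings of one paper."""
    total_trust = sum(r.rater_trust for r in ratings)
    if total_trust <= 0:
        raise ValueError("need at least one rating with positive trust")
    return {
        m: sum(r.rater_trust * r.scores[m] for r in ratings) / total_trust
        for m in METRICS
    }


def overall_score(ratings: list[Rating], metric_weights=None) -> float:
    """Collapse the per-metric means into one number (equal weights by default)."""
    per_metric = weighted_metric_scores(ratings)
    weights = metric_weights or {m: 1.0 for m in METRICS}
    total_weight = sum(weights[m] for m in METRICS)
    return sum(weights[m] * per_metric[m] for m in METRICS) / total_weight


ratings = [
    Rating(3.0, {"originality": 5, "reproducibility": 4, "commercial_viability": 3}),
    Rating(1.0, {"originality": 1, "reproducibility": 4, "commercial_viability": 3}),
]
print(weighted_metric_scores(ratings))
print(overall_score(ratings))
```

The trusted rater's opinion on originality dominates here because their weight is three times larger. How trust itself is earned is the reputation question discussed next.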

It would be interesting to try aggregating papers into meta studies as well, to generate a “confidence” score on a particular conclusion, and see how it has changed over time.
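As a rough sketch of what a “confidence” score over time might look like, the code below pools effect estimates from several studies using inverse-variance weighting (a standard fixed-effect meta-analysis) and recomputes the pooled value as each study is added. The choice of method, the `(year, effect, standard_error)` input format, and the function names are assumptions for illustration; the post does not specify any of them.

```python
import math


def pooled_estimate(studies):
    """Fixed-effect pooling of (effect, standard_error) pairs.

    Each study is weighted by 1 / standard_error**2, so more precise
    studies count for more. Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)


def estimate_over_time(dated_studies):
    """Recompute the pooled estimate after each study, in chronological order.

    dated_studies: list of (year, effect, standard_error).
    Returns a list of (year, pooled_effect, pooled_standard_error).
    """
    ordered = sorted(dated_studies, key=lambda s: s[0])
    history = []
    for i in range(1, len(ordered) + 1):
        effect, se = pooled_estimate([(e, s) for _, e, s in ordered[:i]])
        history.append((ordered[i - 1][0], effect, se))
    return history


# Made-up numbers, purely to show the shape of the output.
for year, effect, se in estimate_over_time([(2019, 0.4, 0.1), (2018, 0.2, 0.1)]):
    print(year, round(effect, 3), round(se, 3))
```

A shrinking standard error as studies accumulate is one possible reading of “confidence”. A real system would also need to handle studies that disagree more than chance allows, which this fixed-effect sketch ignores.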

There should be some way to clearly mark research as “in progress” as well to prevent people getting negative ratings before it is really “finished”. In theory, all research is “in progress”, so the idea of a static PDF seems woefully outdated.

Reputation

The reputation of the people doing the rating (and discussion) in the community is the other key component here. For this site to be trusted, the people with real knowledge of a field need to have their voice rise to the top, and not be drowned out or bullied by the internet trolls. Similar to other online forums (Reddit, Hacker News, StackExchange) users should develop a reputation over time based on their contributions. This can be derived from their comments, edits, original research, or it could be their reputation outside of the app (LinkedIn, more traditional academic credentials, etc) which could appear on a profile page. In other words, traditional academic credentials shouldn’t be the only way to develop a reputation in research.

It’s an interesting question whether the site should allow anonymous users. My instinct is to say yes, and allow users to be anonymous if they want to. The reason is simply that there are some fields where it is difficult to offer a dissenting opinion without repercussions. As Sam Altman points out, “nearly all ideas that turn out to be great breakthroughs start out sounding like terrible ideas”. Galileo was famously placed under house arrest for defending the idea that the earth moves around the sun. Satoshi proposed the idea of Bitcoin under a pseudonym. I’ve heard stories from a handful of scientists who faced retaliation when their research contradicted or competed with another well respected person in the field.

Users who want to remain anonymous may not be able to bring their external reputation with them to the site (this is a downside), but hopefully they can speak freely and “crazy” ideas can be evaluated on their own merit.

The most important reason to have reputation on the site is just to have sane discussion take place so that legitimate scientists feel they can interact with other rational people.

Funding

How do we align incentives amongst everyone in this community? We want to encourage participation (submitting research, commenting/editing, etc) and also fund research that the community deems worthy.

One idea on this is to do an ICO of sorts, and give away a coin that incentivises the behavior the community needs to keep growing. People could potentially also apply for grants and get paid in this new coin.

Capturing upside

Today, there is a ton of promising research that never gets commercialized. When it does happen, a company or entrepreneur typically reaches out to the tech transfer office of the university to try and license the technology. Or in some cases, the researchers themselves try to spin out companies, with varying degrees of success.

I wonder if the process of licensing technology from researchers could be made much simpler. Imagine having a “License” button at the bottom of every research paper profile page that hand-holds you through the process. Or imagine having standard licensing terms, similar to the Y Combinator SAFE documents, but for licensing the technology. Ideally, people could license your technology in 5 minutes, without ever having to pick up the phone.

Every piece of research published could be available under one of the following licenses (for instance):

Free, public domain

This could actually be a requirement depending on the source of funding.

“Standard” license terms

For instance, receive 1–5% of profit for any product derived from it for the first 5 years. Non-exclusive license.

Custom

Contact the tech transfer office (or equivalent) to discuss a custom deal.
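To make the “standard” terms concrete, here is a small sketch of how such a license could be represented and what it would pay out. The 1–5% range, the 5-year term, and the non-exclusive flag come from the example above; the class name, the 3% default, and the rule that loss-making years owe nothing are assumptions for illustration, not proposed legal terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardLicense:
    royalty_rate: float = 0.03  # assumed default inside the 1-5% range from the post
    term_years: int = 5         # royalties apply to the first 5 years only
    exclusive: bool = False     # the post suggests a non-exclusive license

    def __post_init__(self):
        if not 0.01 <= self.royalty_rate <= 0.05:
            raise ValueError("royalty_rate should be between 1% and 5%")

    def royalty_due(self, yearly_profit: list[float]) -> float:
        """Total royalty over the term; loss-making years owe nothing (assumption)."""
        in_term = yearly_profit[: self.term_years]
        return sum(self.royalty_rate * max(profit, 0.0) for profit in in_term)


# Six years of made-up profit figures; the sixth year falls outside the term.
profits = [100_000, -5_000, 200_000, 0, 50_000, 1_000_000]
print(StandardLicense().royalty_due(profits))
```

Keeping the terms to a handful of fixed fields is what would make a five-minute, no-phone-call license plausible. Anything that does not fit goes down the “Custom” route.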

Plain English summaries

I find most academic papers to be fairly challenging to read, so it would be great to see a “plain English” explanation of what each paper attempted to do, and what it found. These summaries could be crowdsourced like Wikipedia or written by paying grad students, for instance.

Most papers contain an “abstract” and “conclusion” section that takes a good step in this direction, but I don’t think it goes far enough.

For example, take this article which I selected at random (it happens to be at the top of the page on arxiv.org today as I’m writing this):

Here is the abstract:

This is a good example of the flowery language that sometimes can make research unapproachable. An even more concise summary would probably suffice, such as “does stress in the womb cause neurodegenerative disease later in life?”

Side note: I tried reading the conclusion of this paper also, but after doing so I could not tell you (1) what did they try doing? (2) did it work? (3) what are the limitations? and (4) what are the suggested next steps?

So I think summarizing research for a wider audience could be dramatically improved, perhaps targeting a high school or bachelor’s level of reading, or limiting summaries to the N most common words in the English language. Wikipedia is a great example to follow here. They have some content on how to make technical articles more understandable and how to write clear articles in general. Their guidelines suggest a “straightforward, just the facts” style that is free from opinion and “available to the widest possible general audience”.

I’ve wondered at times whether scientists are hesitant to explain things in plain language for good reason. People may inappropriately act on what they are saying as medical advice, they may be misquoted or attacked by digital mobs, or they may be accused of “hyping” their work instead of speaking in the precise language of science. Speaking in code helps ensure that only the people they want to speak with can understand what they’re saying. This is a reasonable survival instinct, but it also means that most research is happening in small closed groups. Wikipedia I think demonstrates that it is possible to explain complex topics to a wide audience while still getting into technical detail when necessary.

Discussion, editing, and collaboration

It would be great to see modern tools in this area applied to research.

Sites like Reddit, Hacker News, and StackExchange have demonstrated how powerful nested and voted/sorted comments are. It’s surprising to me that this hasn’t become ubiquitous on the internet. They are not terribly difficult to implement, yet the vast majority of sites on the internet still have comment sections that are chronological and filled with low quality content. This includes every site I’ve seen where academic research can be discussed.
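To illustrate why nested, vote-sorted comments are not hard to build, here is a minimal sketch. The `Comment` fields and the simple “upvotes minus downvotes” ordering are assumptions; real sites use more elaborate ranking, and nothing here reflects how any particular site implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    upvotes: int = 0
    downvotes: int = 0
    replies: list["Comment"] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Simplest possible ranking signal: net votes.
        return self.upvotes - self.downvotes


def render(comments: list[Comment], depth: int = 0) -> list[str]:
    """Return display lines: siblings sorted by score, replies indented under parents."""
    lines = []
    for c in sorted(comments, key=lambda c: c.score, reverse=True):
        lines.append(f"{'  ' * depth}[{c.score:+d}] {c.author}: {c.text}")
        lines.extend(render(c.replies, depth + 1))
    return lines


thread = [
    Comment("a", "first", 1, 0, [
        Comment("c", "reply low", 0, 2),
        Comment("d", "reply high", 5, 0),
    ]),
    Comment("b", "second", 10, 1),
]
print("\n".join(render(thread)))
```

Sorting happens per level, so a strong reply stays attached to its parent instead of floating to the top of the page.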

It would also be interesting to try inline comments like on Genius.com or Google Docs to discuss specific lines in papers.

Discussion is just scratching the surface though. Why not allow people to submit pull requests to papers like on GitHub, or make suggested edits like in Google Docs? Why not let people add collaborators (even if they’ve never met in real life)? Why not let people fork research and take it in a new direction?

One of the biggest issues with research today (at least in my view) is that teams seem to work in isolation until something is “done” or ready for publication. Of course, research (just like software) is rarely “done”; it is continually being refined. I think open source software has a much better model and culture here, where you make your very first commit public from day one and it is never “done”. It’s totally ok in open source to show work that is “in progress”. In fact, that is half the point because you never know who might show up to help you along the way.

Open to everyone

I have a hunch that we’d get more innovation if research was opened up beyond closed academic circles. There may be someone with an important new idea, who is just working in their garage, or a laboratory on the other side of the world.

Many great innovations happen as a byproduct of trying to build something. Bringing applied scientists, engineers, and researchers closer together in one community could help a lot.

It could be great to see tenured professors and tinkerers commenting side by side. We could expand the number of people collaborating on a particular problem by 10x.

Expanding beyond the idea of “papers”

A paper implies a finished publication with a fixed set of contributors.

It would be great if people could post in-progress research as well. Maybe just a hypothesis (something you’d like to see tested in the future), or a dataset (that you weren’t able to draw any conclusions from, but maybe someone else could). This could also take the form of a Jupyter notebook.

Maybe the more fundamental unit is really an experiment. What did you try? What was the result? What does this imply? How sure are we that it is true/correct? What are some examples where this could be useful?
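If the experiment were the fundamental unit, a record for it might look something like the sketch below. The fields map one-to-one onto the questions above; the class name, field names, the 0-1 confidence scale, and the `status` label are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Experiment:
    what_was_tried: str   # What did you try?
    result: str           # What was the result?
    implication: str      # What does this imply?
    confidence: float     # How sure are we that it is true? (assumed 0.0-1.0 scale)
    possible_uses: list[str] = field(default_factory=list)  # Where could this be useful?
    status: str = "in progress"  # assumed label; research is rarely "done"

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        """Serialize to JSON so the record can be shared, forked, or diffed."""
        return json.dumps(asdict(self), indent=2)


# Placeholder content, not a real result.
record = Experiment(
    what_was_tried="Repeated the assay from a published protocol",
    result="Effect was in the same direction but smaller",
    implication="Original finding may be overstated",
    confidence=0.6,
    possible_uses=["follow-up study design"],
)
print(record.to_json())
```

A plain structured record like this is easy to version, which fits the idea of treating research more like open source code than like a static PDF.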

What are people using today since this doesn’t exist?

Each of the items below is solving a piece of the problem.

Networking/Conferences/Friends

This is the primary method today, although it relies too much on happenstance. How many companies were never started because two people failed to meet, each one with half of a good idea?

Wikipedia

Some of the biggest/most popular scientific breakthroughs get plain English explanations written up here in a way that non-academics can understand, but there isn’t enough coverage.

GitHub

We’re starting to see more datasets published here, which is nice.

arXiv.org, PubMed, PubChem, bioRxiv, etc

The pre-print servers have been a step in the right direction (eliminating some paywalls), and at least some papers are available to purchase online now.

Arxiv-sanity

Great step in the right direction to make the overwhelming number of papers which are published a bit more accessible, but could go even further. For instance, it only covers machine learning papers.

Reddit/Hacker News/StackExchange

Some topics have garnered their own subreddits, like /r/MachineLearning. Hacker News is where I first read about the Bitcoin whitepaper (I didn’t regularly read CS papers, but I happened to read the Bitcoin whitepaper after seeing it get up-voted there). These sites give you the ranking and reputation piece.

YouTube

Channels that explain topics like cryptocurrency and machine learning make certain research topics more accessible.

Sci-Hub

People have taken to pirating papers to make them more accessible online, even though it breaks the law.

Big tech companies

The GAFA companies and a few others have done so well in part because they have great scientists working with good product/business people internally. Much of this research is proprietary.

Experiment.com/ScienceExchange.com

Crowdsourcing research.

Google Scholar, ResearchGate

Great steps in the right direction.

Meta.org, OccamzRazor

Use natural language processing to try and make discovery of related papers easier.

Center For Open Science

And the Open Science Framework.

Others have noticed that there is an opportunity here for a killer app.

Conclusion

Innovation is sometimes just peanut butter and chocolate coming together to create something greater than the sum of the parts. You have half an idea, and someone somewhere in the world has the other half of an idea. If you can both find each other, innovation happens faster.

For this to work, at least one of you needs to publish your half-baked idea (I call this putting out a bat signal) so the other person can find you. Making academic research happen more like software (GitHub, etc) would be a great step in this direction.

In the killer app I’m thinking of, people would start connecting long before a paper gets published. There could be research that is done in the open from the beginning, or private research areas where people begin to collaborate.

I’m writing this to put out my own bat signal. Maybe someone in the world has another piece of the solution. I’ve also thought about investing in a good team that might want to tackle this challenge.

If you’re interested in hearing more about this idea, have a piece of the solution you think I should know about, or might want to work on this yourself, please fill out this form here.
