Scientists live in a very fast-paced world where hundreds of new papers – each representing months or years of hard work – flash by each day. With such a large volume of information, it would be nice to have some way of organising it, making it possible to discover papers you might have missed, and also allowing someone new to your field to know where to start.

The arXiv – an open website where scientists can publish their research and make it immediately available for the whole world to read – has been around since 1991, and in that time it has amassed a staggering 865,000 papers in physics, mathematics, computer science, statistics, quantitative biology and finance. Each weekday around 300 new research papers are added, on everything from the Higgs boson to quantum teleportation to the formation of stars.

While all these topics are quite distinct, there are many cross-disciplinary works that draw from completely different areas, and we thought it would be interesting to build a map that could visualise how the whole thing fits together – a bit like six degrees of separation, but for research papers.

If you sat a person down and asked them to group arXiv papers together based on their topic (such as black holes or quantum teleportation), they could probably do a very good job of making a map, but it would just take too long. Instead, we turned to computers.

Scientific papers have an inherent structure that is perfect for automatically making a map: each paper's reference section is a bibliography of other papers on the same, or a closely related, topic. So we could put papers that refer to each other closer together on the map than those that have no such links. We used an algorithm that simulates the formation of galaxies, replacing stars with papers, and turned the attractive force of gravity into a repulsive anti-gravity in order to spread papers out across our landscape.
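
As a rough illustration of the idea (not Paperscape's actual code), the sketch below places a handful of papers in two dimensions. Every pair of papers repels with an inverse-square "anti-gravity", and, as an assumption made here to pull related papers together, each citation link acts like a spring. The function name `layout_papers` and all parameter values are hypothetical.

```python
import math
import random


def layout_papers(n_papers, citation_links, steps=300, repulsion=0.05,
                  attraction=0.1, dt=0.1, max_step=0.1, seed=0):
    """Return [x, y] positions: every pair repels, cited pairs attract."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
           for _ in range(n_papers)]

    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n_papers)]

        # "Anti-gravity": an inverse-square push between every pair of papers.
        for i in range(n_papers):
            for j in range(i + 1, n_papers):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) + 1e-9  # avoid division by zero
                push = repulsion / dist ** 2
                fx, fy = push * dx / dist, push * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy

        # Assumed attraction: each citation link behaves like a spring.
        for i, j in citation_links:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            force[i][0] += attraction * dx
            force[i][1] += attraction * dy
            force[j][0] -= attraction * dx
            force[j][1] -= attraction * dy

        # Move each paper along its net force, capping the step for stability.
        for i in range(n_papers):
            sx, sy = dt * force[i][0], dt * force[i][1]
            length = math.hypot(sx, sy)
            if length > max_step:
                sx, sy = sx * max_step / length, sy * max_step / length
            pos[i][0] += sx
            pos[i][1] += sy

    return pos


# Two small groups of papers that cite only within their own group.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
positions = layout_papers(6, links)
```

Comparing every pair of papers at every step, as this sketch does, would be far too slow for hundreds of thousands of papers. N-body simulations commonly approximate the forces from distant groups of bodies to keep the cost manageable.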

We called our online map Paperscape. Each circle represents a scientific paper, with its area proportional to the number of citations that paper has. Papers in different arXiv categories (such as physics, mathematics, computer science) are coloured differently.
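
Because it is the area, not the radius, that scales with citations, a paper with four times the citations gets a circle only twice as wide. A minimal sketch of that arithmetic follows. The function name, the scale factor and the minimum area (added here so that uncited papers stay visible) are assumptions, not Paperscape's actual values.

```python
import math


def circle_radius(citations, area_per_citation=1.0, min_area=1.0):
    """Radius of a circle whose area grows linearly with citation count."""
    # Assumed floor so that a paper with no citations is still drawn.
    area = max(min_area, area_per_citation * citations)
    return math.sqrt(area / math.pi)


# Four times the citations gives twice the radius.
ratio = circle_radius(400) / circle_radius(100)
```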

Clicking on a circle displays that paper's reference details, along with links to its arXiv page and a PDF of the paper.

Detail of Paperscape map representing high-energy theoretical physics. Image: Paperscape

What is really interesting to see is that high-energy theoretical physics (hep-th, the big blue blob) is the central structure in the map. This subsection of theoretical physics has laid down a lot of the foundations of fundamental physics, and ties together other areas such as high-energy physics phenomenology (the prediction and study of experimental results), astrophysics, condensed matter physics, quantum physics and also some parts of mathematics.

Being the centre of the map comes with a downside, however. Theoretical physics is very tightly interwoven and it is therefore difficult to isolate individual sub-topics, unlike in the other categories.

We can see that the other coloured areas are generally more tightly clumped together. This means that scientists refer to papers mostly within their arXiv-defined category. So, either the arXiv picked its initial categories well, or the arXiv is actually defining scientific research. Most likely it is the former, plus the fact that scientists usually read papers within their own category.

Detail of Paperscape map showing interface between astrophysics and high-energy physics. Image: Paperscape

Another interesting aspect is the interface between the coloured categories, where we find cross-disciplinary papers. For example, the connections between astrophysics (astro-ph) and theoretical high-energy physics (hep-th) are topics such as the study of inflation and dark energy. Both of these areas require techniques from astrophysics, such as measurements of the cosmic microwave background, as well as detailed theoretical calculations – hence the large overlap in the map.

Between astrophysics and high-energy physics phenomenology (hep-ph) lies the field of dark matter. Dark matter can explain why our universe has the structure it does today, and also why galaxies rotate faster than we would expect based on the combined mass of their constituent stars. It brings together the categories of astrophysics, concerned with measurements of galaxies and the history of the universe, and high-energy physics phenomenology, which attempts to explain dark matter in terms of new fundamental particles that can be detected in high-energy experiments such as the Large Hadron Collider at Cern.

Detail of Paperscape map, with 'hot topics' (recently published papers) highlighted in red. Image: Paperscape

We have also implemented an alternative colour scheme which depicts, instead of category, how recently a paper was written. This allows one to readily find the "hot topics" in a given area, just by looking for the bright red regions.
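
One simple way to picture such a scheme is to fade each paper from bright red to grey as it ages. The sketch below is only an illustration under that assumption. The function name, the one-year horizon and the exact colours are hypothetical and are not taken from Paperscape.

```python
from datetime import date


def recency_colour(published, today, horizon_days=365):
    """Map a paper's age to an (r, g, b) tuple: bright red when new, grey when old."""
    age_days = (today - published).days
    # 1.0 for a paper published today, 0.0 for anything older than the horizon.
    freshness = min(1.0, max(0.0, 1.0 - age_days / horizon_days))
    grey = 128
    red = round(grey + (255 - grey) * freshness)
    green = round(grey * (1.0 - freshness))
    blue = round(grey * (1.0 - freshness))
    return (red, green, blue)


today = date(2013, 5, 1)
new_paper = recency_colour(date(2013, 5, 1), today)   # (255, 0, 0)
old_paper = recency_colour(date(2000, 1, 1), today)   # (128, 128, 128)
```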

It's fantastic to see the whole arXiv in one big picture, because this way you know you haven't missed anything important. We hope that the Paperscape map makes the arXiv less overwhelming, especially for students entering the field, but also for seasoned academics, so they can perhaps read outside their field now that it's easy to find the big papers and hot topics.

Paperscape is still under active development, and we plan to add features that allow users to organise arXiv papers with tags and share collections of papers with others. But for now we are happy to finally have a real window into the world of research that is both aesthetically beautiful and, hopefully, useful for finding papers you never knew existed.