The Cambridge Structural Database (CSD) has been collecting crystal structures of organic and organometallic compounds since 1965, and has just added its one millionth structure.

To celebrate this milestone, we’ve worked with staff at the Cambridge Crystallographic Data Centre (CCDC), which curates the CSD, to pick out some individual highlights from the database’s first one million structures, along with some of the chemical and crystallographic trends that such a large collection reveals.

Each structure has its own unique ‘refcode’ identifier made up of six letters (with a two-digit suffix added for some entries, such as repeat determinations of the same compound), which we use below. According to members of the CCDC’s CSD team, they try to make the codes broadly pronounceable while excluding swear words in as many languages as they can. As you can imagine, the staff have a few favourite refcodes of their own: did you ever make a MUPPET, KITTEN, DONKEY or BATMAN in the lab? We didn’t want to ask about BADBOY or BANGED, but would be interested in studying BARBAD05 in situ.

The highlights

Take a roller coaster ride through the world’s tightest knot, reported and deposited in 2017 – and covered by your favourite monthly chemistry magazine too.

In all the structures above, carbon is grey, oxygen red, hydrogen white, nitrogen blue, chlorine bright green, fluorine pea green, sulfur yellow, phosphorus orange, ruthenium blue-grey, cobalt pink and silver, er, silver.

Trends in chemistry and crystallography

Looking back through the CSD’s entries also gives chemists a peek at how chemistry and science have changed over the years. Here are some of the trends that the CSD’s stock of structures reveals.

Elements

Unsurprisingly for a database of organic compounds, the most common elements in its molecules are carbon, hydrogen, oxygen and nitrogen (so we’ve left them out of these plots), but chemists are no slouches when it comes to using the rest of the periodic table. As they have grown more adventurous, even the extremes of the table have been combined with carbon and company: from the database’s first helium compound (YEMTUH), deposited in 2013, all the way up to the synthetic elements, with no fewer than five californium compounds.

But which elements (apart from C, H, O, N) have been most popular with chemists over the years? Place your bets…

Functional groups

So what’s the most popular functional group? There has been a tussle for the top spot over the years, but halides have the upper hand today. We have also included some notable compound classes in the mix: if you weren’t already convinced that metal–organic frameworks (MOFs) are the molecules of the moment, the surge in their CSD entries clearly shows the meteoric rise of the MOF makers.

Further, better, faster, stronger

Things have changed a lot since the CSD started back in 1965. Crystallographic techniques have improved, and we’re making and studying bigger and more complex molecules all the time. Average formula weights and unit-cell sizes keep rising and, on average, our molecules contain more distinct elements.

The R-factor (a quality measure of how well the refined structural model fits the experimental diffraction data, where lower is better) has fallen steadily too, although it has now levelled off. There may be a few reasons for this. As techniques get better, we tackle more complicated samples, so the two effects tend to cancel each other out; and an R-factor of around 5% (0.05) is widely regarded as a sufficiently good fit for most purposes.
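For readers curious about where that number comes from, the conventional crystallographic R-factor compares the observed structure-factor amplitudes |F_obs| with those calculated from the model, |F_calc|, as R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|. The Python sketch below computes it from two lists of amplitudes. It is a minimal illustration only: the function name and the example values are hypothetical, and real refinement programs also handle scale factors, weighting schemes and which reflections to include.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    f_obs and f_calc are (hypothetical) observed and calculated
    structure-factor amplitudes for the same reflections, assumed to be
    on a common scale already; no scale factor or weighting is applied.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of |F_obs| must be non-zero")
    # Sum of absolute differences between observed and calculated amplitudes
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes for four reflections, for illustration only
observed = [100.0, 80.0, 60.0, 40.0]
calculated = [95.0, 82.0, 57.0, 41.0]
print(f"R = {r_factor(observed, calculated):.3f}")  # prints "R = 0.039"
```

Here the model misses the observed amplitudes by a total of 11 out of 280, giving R of about 0.039, or roughly 3.9%, which would sit below the 5% level often treated as good enough.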

Authors

The CSD’s top contributors are crystallographers who solve these structures day in and day out. As is only fitting for a Cambridge database, a bumps chart is really the only way to show who’s who among the top crystal contributors. Of course, these scientists work together as well, so some of them will share the credit for a particular structure.

In fact, the growth in collaboration among researchers more generally is also evident in the CSD’s entries, with the number of authors per structure rising over time.

Even odds

Back in 1996, a group of chemists spotted an odd feature of the molecules listed in various chemistry databases: structures with even numbers of carbon atoms occur more frequently than those with odd numbers of carbon atoms. The disparity seems to come from the way we make molecules, as even-numbered compounds have simpler synthetic routes, and the trend is still apparent today. How odd.

Acknowledgements

One million thanks to Natalie Johnson, Clare Tovee, Seth Wiggin and Suzanna Ward of the CCDC’s CSD team for their work extracting all this data.