Polyhedral models of icosahedral architecture

Virus structures are prominent examples of icosahedral symmetry in biology. Their architectures are currently modelled and classified in terms of the series of Goldberg polyhedra14 (three-dimensional solids with pentagonal and hexagonal faces) that provide a reference frame for the positions of the capsid proteins (Fig. 1a). In particular, the polyhedral faces indicate the positions of pentagonal and hexagonal protein clusters called pentamers and hexamers, respectively. The same polyhedra also provide blueprints for the atomic positions of the fullerene cages in carbon chemistry, in particular buckminsterfullerene, known as the buckyball1, and for the structural organisation of a wide range of both man-made and natural protein nanocontainers. Their duals, the geodesic polyhedra15, are the architectural designs of the geodesic domes by Buckminster Fuller.

Goldberg polyhedra can be constructed from a hexagonal grid (lattice) by replacing 12 hexagons by pentagons (Fig. 1b), as required by Euler’s Theorem to generate a closed polyhedral shape16. The distance \(D\) between the pentagons at neighbouring fivefold vertices is the only degree of freedom in this construction, and can therefore be used to label the different geometric options in this infinite series of polyhedra. \(D\) can only take on specific values that are constrained by the underlying hexagonal lattice geometry. In particular, using the hexagonal coordinates \(h\) and \(k\), which can take any integer value (including zero) and are used to navigate between midpoints of neighbouring hexagons in the lattice, one obtains the following geometric restriction11:

$$T(h,k):= {D}^{2}(h,k)/{A}_{0}=\left({h}^{2}+hk+{k}^{2}\right).$$ (1)

Here, \({A}_{0}\) corresponds to the area of the smallest triangle between any hexagonal midpoints, that is, the case \(h=1\) and \(k=0\)—or equivalently, \(h=0\) and \(k=1\). A similar formula has been derived for elongated capsid structures17.
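The requirement of exactly 12 pentagons can be made explicit with a short counting argument. The following is a sketch, assuming (as in the \((6,6,6)\) lattice) that three faces meet at every vertex; the symbols \(p\) and \(n\) for the numbers of pentagons and hexagons, and \(V\), \(E\) and \(F\) for the numbers of vertices, edges and faces, are introduced here for illustration only. Counting each edge twice and each vertex three times over the faces gives

$$F=p+n,\qquad E=\frac{5p+6n}{2},\qquad V=\frac{5p+6n}{3},$$

and Euler’s formula for a closed polyhedral surface, \(V-E+F=2\), then yields

$$V-E+F=p+n-\frac{5p+6n}{6}=\frac{p}{6}=2\quad \Rightarrow \quad p=12.$$

The number of hexagons \(n\) drops out of this condition, which is consistent with an infinite series of polyhedra that differ only in the spacing of the 12 pentagons.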

T is called the triangulation number (Fig. 1c) owing to its geometric interpretation in terms of the icosahedral triangulations obtained by connecting midpoints of neighbouring pentagons and hexagons, i.e., in terms of the dual (geodesic) polyhedra. T indicates the number of triangular faces, called facets, in the triangulation that cover a triangular face of the icosahedron by area. The association of a protein subunit with each corner of such a triangular facet translates this infinite series of triangulations into the capsid layouts in quasiequivalence theory (Fig. 1d). Such blueprints only permit capsid layouts with 60T CPs, organised into 12 pentamers and \(10(T-1)\) hexamers11. The condition expressed by Eq. 1 is therefore a geometric restriction on the possible values of T and the possible CP numbers in the CK geometries. The initial elements of the series are \(T=\)1, 3, 4, and 7, and therefore the numbers of CPs contained in small icosahedral capsids are 60, 180, 240, and 420, respectively (Supplementary Table 1).
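The restriction in Eq. 1 and the resulting protein counts can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original analysis; the function names `t_number`, `ck_series` and `capsid_counts` are hypothetical and chosen here for illustration.

```python
from math import isqrt


def t_number(h: int, k: int) -> int:
    """Triangulation number T(h, k) = h^2 + h*k + k^2 (Eq. 1)."""
    return h * h + h * k + k * k


def ck_series(t_max: int) -> list[int]:
    """Sorted distinct T-numbers with 0 < T <= t_max.

    Only h >= k >= 0 is scanned, since T(h, k) = T(k, h).
    """
    bound = isqrt(t_max)  # h^2 <= T, so h cannot exceed this bound
    values = {
        t_number(h, k)
        for h in range(bound + 1)
        for k in range(h + 1)
        if 0 < t_number(h, k) <= t_max
    }
    return sorted(values)


def capsid_counts(t: int) -> dict[str, int]:
    """Counts implied by a CK blueprint: 60T CPs, 12 pentamers, 10(T-1) hexamers."""
    return {"proteins": 60 * t, "pentamers": 12, "hexamers": 10 * (t - 1)}


# List the smallest CK blueprints together with their protein counts.
for t in ck_series(7):
    print(t, capsid_counts(t))
```

Calling `ck_series` with a larger bound lists further permitted values and makes the gaps in the series (for example, the absence of \(T=2\), 5 and 6) directly visible.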

However, this is only one way in which an icosahedral structure can be built from repeats of the same (asymmetric) unit, and excludes geometries built from proteins of different sizes (such as a major and minor capsid protein) or capsids built from a protein in which one or several domains play distinguished roles. Such capsid layouts must be constructed from lattices in which every vertex is identical in terms of the lengths, numbers and relative angles of its protruding edges, but the relative angles between different edges at the same vertex can vary, reflecting occupation by different types of proteins or protein domains. From a geometric point of view, there are only 11 lattices (Chapter 2 in Grünbaum and Shephard18) that satisfy this generalised quasi-equivalence principle, which are the Archimedean lattices—also known as uniform lattices13,16. Among these lattices, only four contain a hexagonal sublattice (Fig. 2a). One of them is the hexagonal lattice itself on which the CK classification scheme is based. This lattice is labelled \((6,6,6)\) according to the types of regular polygons surrounding each vertex, in this case three hexagons. However, the hexagonal lattice is only the simplest grid that enables this construction. Other lattices containing hexagons at appropriate distances, that is, as a hexagonal sublattice, are equally amenable to the CK construction, but have until now been ignored. These are the trihexagonal tiling \((3,6,3,6)\), the snub hexagonal tiling \(({3}^{4},6)\), and the rhombitrihexagonal tiling \((3,4,6,4)\) (Fig. 2a). These lattices are also called hexadeltille, snub hextille, and the truncated hexadeltille lattice, respectively16.

Fig. 2 Design of icosahedral architectures from Archimedean lattices. a The four Archimedean lattices permitting the Caspar-Klug construction (from top to bottom): the hexagonal \((6,6,6)\), the trihexagonal \((3,6,3,6)\), the snub hexagonal \(({3}^{4},6)\), and the rhombitrihexagonal \((3,4,6,4)\) lattice. In each case, the asymmetric unit (repeat unit of the lattice) is highlighted. Its overlap with the hexagonal sublattice used for the construction of the icosahedral polyhedra is shown in red. Apart from the case of the hexagonal lattice, this also includes a third of a triangular surface (blue), and in addition a triangle or a half square (both shown in green) for two of the lattices, respectively. b Construction of Archimedean solids via replacement of 12 hexagons by pentagons in analogy to the Caspar-Klug construction (see also Fig. 1b). c The polyhedral shapes corresponding to the examples shown in b. They each correspond to the smallest polyhedron in an infinite series of polyhedra for the given lattice type. Folded structures for larger elements in the new series are provided in Supplementary Fig. 2. d The smallest polyhedral shapes (\({T}_{t}\), \({T}_{s}\) and \({T}_{r}\), denoting polyhedra derived from the trihexagonal, snub hexagonal and rhombitrihexagonal lattices, respectively) are shown organised according to their sizes in context with the Caspar-Klug polyhedra. As surface areas scale according to Eq. (2) with respect to the Caspar-Klug geometries, the new solutions fall into the size gaps in between polyhedra in the Caspar-Klug series, or provide alternative layouts for capsids of the same size, as is the case for \(T(2,0)={T}_{t}(1,1)=4/3T(1,1)=4\).

By analogy to Caspar and Klug’s construction, we classify the icosahedral polyhedra that can be constructed from these tilings via replacement of 12 hexagons by pentagons (Fig. 2b). Replacement of nearest neighbour hexagons results in each case in an icosahedrally symmetric Archimedean solid (Fig. 2c) that corresponds to the start of an infinite series of polyhedra, constructed by spacing the pentagonal insertions further apart. As a means to characterise different polyhedral structures in the series, we again use the hexagonal coordinates \(h\) and \(k\), now indicating steps between hexagonal midpoints in the hexagonal sublattice, to indicate the possible distances between the pentagonal insertions. In the three additional lattices, the midpoints of neighbouring hexagons are more distal than in the hexagonal lattice. Thus, the area covered by a triangular facet connecting midpoints of neighbouring hexagons (that is, the case \(h=0\) and \(k=1\), or vice versa) is larger than in the CK construction by a factor \({\alpha }_{t}=4/3\approx 1.33\) for the \((3,6,3,6)\) lattice, \({\alpha }_{s}=7/3\approx 2.33\) for the \(({3}^{4},6)\) lattice, and \({\alpha }_{r}=4/3+2/\sqrt{3}\approx 2.49\) for the \((3,4,6,4)\) lattice, i.e., by factors corresponding to the relative sizes of the asymmetric lattice units (see coloured highlights in Fig. 2a). The T-number in the CK construction can therefore be scaled accordingly for the new lattices as follows:

$${T}_{j}(h,k):= {\alpha }_{j}\left({h}^{2}+hk+{k}^{2}\right)={\alpha }_{j}\ T(h,k)\ ,$$ (2)

where \(j=t,s,r\) indicates the lattice type used in the construction, denoting the trihexagonal, the snub hexagonal, and the rhombitrihexagonal lattice, respectively. In particular, a polyhedron labelled \({T}_{j}(h,k)\) has the same number of pentagons and hexagons as a \(T(h,k)\) Caspar-Klug geometry, but the surface area covered by its faces is larger due to the additional polygons (triangles, squares) between the hexagons and pentagons. This is indicated by the scaling factor \({\alpha }_{j}\), which refers to the gain in surface area according to the planar lattice from which it is constructed, as illustrated in Fig. 2.
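The scaling factors \({\alpha }_{j}\) can be cross-checked by comparing the area of one lattice repeat (one hexagon plus its accompanying triangles and squares) with the area of the hexagon alone. The sketch below assumes unit edge length for all polygons. The tile counts per hexagon (2 triangles; 8 triangles; 2 triangles and 3 squares) are not stated explicitly in the text; they are six times the asymmetric-unit contents described in the caption of Fig. 2a. The names `TILES_PER_HEXAGON` and `alpha` are hypothetical.

```python
from math import isclose, sqrt

TRIANGLE = sqrt(3) / 4   # area of an equilateral triangle with unit edge
SQUARE = 1.0             # area of a unit square
HEXAGON = 6 * TRIANGLE   # a regular hexagon is six unit triangles

# (triangles, squares) accompanying each hexagon in one lattice repeat.
TILES_PER_HEXAGON = {
    "hexagonal": (0, 0),            # (6,6,6)
    "trihexagonal": (2, 0),         # (3,6,3,6)
    "snub hexagonal": (8, 0),       # (3^4,6)
    "rhombitrihexagonal": (2, 3),   # (3,4,6,4)
}


def alpha(lattice: str) -> float:
    """Area of one lattice repeat relative to the area of its hexagon."""
    triangles, squares = TILES_PER_HEXAGON[lattice]
    return (HEXAGON + triangles * TRIANGLE + squares * SQUARE) / HEXAGON


for name in TILES_PER_HEXAGON:
    print(f"{name}: alpha = {alpha(name):.2f}")

# Compare with the closed forms quoted in the text.
assert isclose(alpha("rhombitrihexagonal"), 4 / 3 + 2 / sqrt(3))
```

Under these assumptions the computed ratios agree with the values \(4/3\), \(7/3\) and \(4/3+2/\sqrt{3}\) quoted above, and the hexagonal lattice gives a factor of 1 as expected.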

The resulting geometries (Supplementary Tables 2–4) significantly widen the spectrum of possible icosahedral viral blueprints. For example, \({T}_{t}(1,0)=4/3\), \({T}_{s}(1,0)=7/3\) and \({T}_{r}(1,0)=(4/3+2/\sqrt{3})\) are in between the \(T(1,0)=1\) and \(T(1,1)=3\) CK blueprints in terms of capsid size (Fig. 2d) if their hexagonal (sub)lattices are assumed to have the same footprint on the capsid surface, that is, same CP sizes. Additionally, some of these geometries constitute alternative layouts for similarly-sized CK geometries, such as \({T}_{t}(1,1)=4\) and \({T}_{s}(1,1)=7\) for \(T(2,0)=4\) and \(T(2,1)=7\) structures, respectively. In these cases, the alternative capsid models have the same relative surface areas, but are predicted to have different numbers and orientations of hexamers and pentamers, with interstitial spaces between these capsomers. These alternative structures (and their duals) correspond to previously unsuspected capsid layouts and offer a unifying framework for the classification of icosahedral virus architectures.
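To illustrate how the new series interleave with the CK series, Eq. 2 can be evaluated for all four lattice types and the results sorted by relative surface area. This is a minimal sketch under the same-footprint assumption stated above; the names `ALPHA`, `t_scaled` and `blueprints_up_to` are hypothetical, and a small floating-point tolerance is used because \({\alpha }_{r}\) is irrational.

```python
from math import isclose, sqrt

# Scaling factors from the text; "ck" denotes the hexagonal (6,6,6) lattice.
ALPHA = {"ck": 1.0, "t": 4 / 3, "s": 7 / 3, "r": 4 / 3 + 2 / sqrt(3)}
TOL = 1e-9  # tolerance for floating-point comparisons


def t_scaled(lattice: str, h: int, k: int) -> float:
    """Generalised T-number T_j(h, k) = alpha_j * (h^2 + h*k + k^2) (Eq. 2)."""
    return ALPHA[lattice] * (h * h + h * k + k * k)


def blueprints_up_to(limit: float) -> list[tuple[float, str, int, int]]:
    """All (size, lattice, h, k) with h >= k >= 0 and size <= limit, sorted by size."""
    found = []
    for lattice in ALPHA:
        h = 1
        # T_j(h, 0) is the smallest value for a given h, so it bounds the scan.
        while t_scaled(lattice, h, 0) <= limit + TOL:
            for k in range(h + 1):
                size = t_scaled(lattice, h, k)
                if size <= limit + TOL:
                    found.append((size, lattice, h, k))
            h += 1
    return sorted(found)


for size, lattice, h, k in blueprints_up_to(7):
    print(f"{size:5.2f}  {lattice:>2}  (h, k) = ({h}, {k})")
```

In the resulting list, the \((1,0)\) elements of the three new series fall between \(T(1,0)\) and \(T(1,1)\), while \({T}_{t}(1,1)\) and \({T}_{s}(1,1)\) coincide in size with \(T(2,0)\) and \(T(2,1)\), respectively.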

Non-quasi-equivalent architectures in the HK97 lineage

Increasing numbers of capsid architectures are reported with CP numbers and capsid layouts that are incompatible with the geometric blueprints of CK theory. Viruses with capsids formed from a combination of a major and minor capsid protein are examples that are challenging to interpret in the classical CK theory. Here we provide examples from the HK97 lineage, demonstrating that such viruses can be rationalised in the Archimedean lattice framework proposed here.

The Bacillus phage Basilisk, for example, contains 1080 CPs, combining 540 major capsid proteins (MCPs) and 540 minor capsid proteins (mCPs)19. Using the relation \(60\ T\) for CP numbers in CK theory, this would correspond to a \(T\)-number of 18, which is excluded by the geometric restriction in CK theory given by Eq. 1. If one only focuses on the 12 pentamers (more precisely, 11 pentamers and a putative portal) and 80 hexamers, then its structure would be classified as \(T(3,0)=9\)19. However, this ignores the 180 interstitial trimers and misrepresents the relative orientations of the protein clusters as well as the surface area of the capsid (Fig. 3a). By contrast, Basilisk’s CP positions are accurately represented by a \({T}_{t}(3,0)=12\) structure based on the trihexagonal lattice series in the framework of the overarching icosahedral design principle. This classification is also consistent with measurements of Basilisk’s surface area (\(1.69\times 1{0}^{4}\ {{\rm{nm}}}^{2}\), see Methods), which is comparable to the surface area of phage SIO-2 (\(1.70\times 1{0}^{4}\ {{\rm{nm}}}^{2}\)), a classical \(T=12\) capsid20. The Basilisk capsid is thus an icosahedral structure of similar size to that of a CK geometry, but exhibits a CP number and capsid layout that are not possible in the CK formalism.
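The arithmetic behind the Basilisk example can be retraced step by step. The sketch below uses only the numbers quoted above; the helper `is_ck_number` and the variable names are hypothetical and serve illustration only.

```python
from math import isclose, isqrt


def is_ck_number(n: int) -> bool:
    """True if the positive integer n equals h^2 + h*k + k^2 for integers h >= k >= 0."""
    for h in range(1, isqrt(n) + 1):
        for k in range(h + 1):
            if h * h + h * k + k * k == n:
                return True
    return False


# Numbers quoted in the text for Basilisk.
major_cps, minor_cps, hexamers = 540, 540, 80

# Reading all 1080 CPs through the CK relation 60T gives T = 18.
naive_t = (major_cps + minor_cps) // 60

# Reading only the capsomers through 10(T - 1) hexamers gives T = 9.
capsomer_t = hexamers // 10 + 1

# Trihexagonal classification: T_t(3, 0) = (4/3) * T(3, 0).
trihexagonal_t = (4 / 3) * capsomer_t

print(naive_t, is_ck_number(naive_t))        # CP-based value and whether Eq. 1 permits it
print(capsomer_t, is_ck_number(capsomer_t))  # capsomer-based value and whether Eq. 1 permits it
print(trihexagonal_t)
```

The CP-based value fails the test of Eq. 1, whereas the capsomer-based value passes it; scaling the latter by \({\alpha }_{t}\) gives the relative surface area used for the comparison with SIO-2.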

Fig. 3 Viruses within a viral lineage adopting the same icosahedral series. Examples of viruses in the HK97 lineage, demonstrating that different members conform to the same family of icosahedral polyhedra: a Basilisk (\({T}_{t}(3,0)\)), b HSV-1 (\({T}_{t}(4,0)\)), c phage \(\lambda\) (\({T}_{t}(2,1)\)). The building blocks of their polyhedral surface lattices are shown in red (pentagons), blue (hexagons), and green (triangles) superimposed on figures adapted from (a)19, (b)23 and (c)25.

Basilisk (Fig. 3a) shares its MCP fold with other bacteriophages, archaeal and animal viruses in the HK97-lineage12,21,22. A reevaluation of other virus structures within this lineage reveals that these evolutionarily related viruses share the same underlying icosahedral lattice geometry, i.e., they belong to the same series of polyhedral designs (in this case, the trihexagonal series of \({T}_{t}\)-architectures).

For example, herpes simplex virus type 1 (HSV-1) organises its MCP (VP5) in hexamers and pentamers with orientations reminiscent of those in the Basilisk capsid (Fig. 3b). The positions of these capsomers are consistent with the current classification of HSV-1 as \(T(4,0)=16\). However, this misrepresents the relative orientations of the hexamers and ignores the secondary network of trimeric complexes between the capsomers that are formed from three mCPs (Tr1, Tr2a and Tr2b)23. By contrast, the classification as a \({T}_{t}(4,0)=64/3\) structure in the new framework (Supplementary Table 2) accurately reflects both its 960 MCPs and 960 mCPs. The same holds for human cytomegalovirus (HCMV)24 (structure not shown), which is structurally similar to HSV-1.

The mature capsid of phage \(\lambda\) (Fig. 3c) is another example of an HK97-lineage virus with a trihexagonal icosahedral structure. It is currently classified as \(T(2,1)=7\)12, but the orientation of the capsomers exhibits instead the layout of a \({T}_{t}(2,1)=28/3\) structure, because the protruding domains of the MCPs, rather than additional mCPs, occupy the triangular sublattice. These positions are also the locations of the reinforcement proteins gpD25, highlighting the importance of these trimeric positions in the surface lattice (Fig. 3c). Similarly, Halorubrum sodomense tailed virus 2 (HSTV-2), another member of the HK97-lineage, has been classified as \(T(2,1)=7\). However, its capsid contains gpD-like trimers that occupy interstitial positions between capsomers, which is consistent with the trihexagonal structure \({T}_{t}(2,1)=28/3\) (see Fig. 8 in Pietilä et al.26). This implies an increase in capsid volume (and, consequently, genome size) by a factor of \({\alpha }_{t}^{3/2}\approx 1.54\) with respect to a classical \(T(2,1)\) capsid. This prediction is consistent with the empirical observation that HSTV-2 has a genome that is ~\(1.4-1.7\) times larger than that of \(T=7\) tailed phages26, further corroborating its classification as a \({T}_{t}(2,1)=28/3\) capsid in our framework. Another example is the thermophilic bacteriophage P23-45, which is currently classed as a supersized \(T=7\) capsid architecture27.
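The volume factor quoted for HSTV-2 follows from a simple scaling argument. As a sketch, assume that the \({T}_{t}(2,1)\) and \(T(2,1)\) capsids are geometrically similar shells, so that volume scales as surface area to the power \(3/2\); the symbols \(V\), \({V}_{t}\), \(A\) and \({A}_{t}\) for the respective volumes and surface areas are introduced here for illustration only. Then

$$\frac{{V}_{t}}{V}={\left(\frac{{A}_{t}}{A}\right)}^{3/2}={\alpha }_{t}^{3/2}={\left(\frac{4}{3}\right)}^{3/2}=\frac{8}{3\sqrt{3}}\approx 1.54,$$

which lies within the reported range of ~\(1.4-1.7\) for the relative genome size.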

In summary, these examples suggest that the classification scheme for virus architecture introduced here highlights structural features shared by evolutionarily related viruses, and thus lends itself as a characteristic of viral lineages.

Alternative capsid layouts with identical stoichiometry

There are many examples of quasiequivalent viral capsids that are formed from the same number of CPs, but exhibit different CP positions and capsomers. CK theory does not distinguish between them. However, we demonstrate here, using different \(T=3\) geometries as examples, that the Archimedean lattices and their duals (called Laves lattices) provide a means to address this.

In CK theory, hexagonal surface lattices and their duals, corresponding to the triangular lattice \(({3}^{6})\), are used interchangeably. The smallest icosahedral polyhedron derived from a triangular lattice is the icosahedron, made of 20 triangles. The next largest is formed from 60 triangles, and provides a blueprint for a classical \(T=3\) structure. Using the convention of CK theory that polyhedral faces must represent groups of proteins that correspond, by number, to the rotational symmetry of the tile (e.g., triangles representing three proteins etc.), capsid layouts can be associated with polyhedral structures. Pariacoto virus (PAV; Fig. 4a), with its strong interaction between the three chains forming the triangular units, is an example of this type of \({T}^{D}(1,1)\) surface architecture.

Fig. 4 Capsid protein interfaces are constrained by icosahedral geometry. The classification of icosahedral designs distinguishes between capsid layouts of viruses formed from the same number of proteins. Examples of a triangle and rhomb tiling are shown: a Pariacoto virus (\({T}^{D}(1,1)\)); b MS2 (\({T}_{t}^{D}(1,1)\)). Tiles are shown superimposed on figures adapted from the ViPER data base (Pariacoto virus: PDB-id 1f8v64; MS2: PDB-id 2ms265).

The duals of the other Archimedean lattices (trihexagonal, snub hexagonal, rhombitrihexagonal) present alternative surface architectures to those in CK theory in terms of rhomb, floret, and kite tiles, respectively (cf. Supplementary Table 5). Strictly applying the CK rule that the symmetry of a tile must be correlated with the number of proteins represented by the tile singles out the dual trihexagonal lattices (\({T}_{t}^{D}\)), i.e. the rhomb tilings with tiles representing clusters of two proteins (CP dimers). Rhomb tilings provide alternative layouts to the CK surface lattices, describing capsids with the same protein stoichiometry but different CP organisation. Bacteriophage MS2 (Fig. 4b), a virus assembled from 90 CP dimers, is an example of a \(T=3\) rhomb tiling (\({T}_{t}^{D}(1,1)\); Supplementary Table 5). Note that whilst the protein stoichiometry in this case coincides with the CK framework, corresponding to the 180 proteins expected for a \(T=3\) structure, the identification as a \({T}_{t}^{D}(1,1)\) geometry provides a more accurate account of CP positions and their relative orientations in the capsid surface.

Non-quasi-equivalent and higher order rhomb tilings

If the CK convention is extended to allow rhombs to represent more than two CPs, as long as their positions on the tile respect the symmetry of the tile, higher numbers of proteins are also conceivable geometrically. This could be achieved, for example, by combining two dimers. The protein stoichiometry for such capsids would be \(120\ T(h,k)\), and the first elements of the series would contain 120, 360 and 480 proteins. Picobirnavirus represents an example of the first element of this series (Supplementary Fig. 3a). This virus forms rhombus-like tiles made up of two protein dimers in parallel orientation, and contains 120 proteins in total28. This structure has traditionally been described in terms of the forbidden number \(T=2\) in the CK framework, but it fits naturally into the new framework as a higher order rhomb tiling. The next elements of this series predict the existence of the forbidden numbers \(T=6\) (360 proteins) and \(8\) (480 proteins). Following this pattern, it is natural to consider rhombus-like tiles representing three protein dimers, which would also satisfy the required twofold symmetry. The protein stoichiometry for these capsids would be \(180\, T(h,k)\), and the three smallest geometries of this type would contain 180, 540 and 720 proteins. An example of the first element of this series is Zika virus (Supplementary Fig. 3b) in the Flaviviridae family. In particular, each rhomb tile in its capsid represents six elongated proteins (three dimers in parallel respecting the twofold symmetry of the tile), so that the 30 tiles represent 180 proteins in total. In pioneering work in 2002, the Rossmann lab and collaborators realised that the three E monomers in each icosahedral asymmetric unit of Dengue virus29 do not have quasiequivalent symmetric environments in the external, icosahedral scaffold formed from the 90 glycoprotein E dimers. Our approach based on the duals of the Archimedean lattices accommodates such non-quasiequivalent capsid structures.
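The protein numbers for the rhomb tilings discussed above can be tabulated with a short calculation. The sketch below assumes that a rhomb tiling with coordinates \((h,k)\) contains \(30\ T(h,k)\) rhombs; this count is not stated explicitly in the text but is inferred from the examples (90 dimers for MS2 at \(T=3\), 30 tiles for Zika virus at \(T=1\)). The function name `rhomb_tiling_proteins` and the parameter `dimers_per_rhomb` are hypothetical.

```python
def rhomb_tiling_proteins(h: int, k: int, dimers_per_rhomb: int) -> int:
    """Total CP number for a rhomb tiling with the given number of dimers per tile.

    Assumes 30 * T(h, k) rhombs, each representing `dimers_per_rhomb` CP dimers.
    """
    t = h * h + h * k + k * k      # T(h, k) from Eq. 1
    rhombs = 30 * t                # inferred tile count (see lead-in)
    return rhombs * 2 * dimers_per_rhomb


# First three elements of each series: (h, k) = (1, 0), (1, 1), (2, 0).
smallest = [(1, 0), (1, 1), (2, 0)]
for dimers in (1, 2, 3):
    counts = [rhomb_tiling_proteins(h, k, dimers) for h, k in smallest]
    print(f"{dimers} dimer(s) per rhomb: {counts}")
```

With one dimer per rhomb the series reproduces the CK stoichiometry \(60\ T(h,k)\), as for MS2, while two and three dimers per rhomb give the \(120\ T(h,k)\) and \(180\ T(h,k)\) series associated above with Picobirnavirus and Zika virus.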

Our framework thus extends the predictions of quasiequivalence theory by a more detailed understanding of capsid geometry, distinguishing between capsid architectures with different types of capsid protein organisation and interfaces given the same numbers of capsid proteins. This is important for a better understanding of the biophysical properties of viral capsids, such as their stability, and their roles in viral life cycles, e.g. during virion assembly and disassembly, and reveals geometric constraints on viral evolution.