Conventional molecular dynamics (MD) was used to explore possible protein–calcite binding geometries. In order to determine the most likely structure of SCA-1 and SCA-2 in solution, initial structural estimates were obtained from crystal structures. The model structure for SCA-2 was obtained using a homology model (Biasini et al., 2014) based on the published crystal structure (Ruiz-Arellano et al., 2015). Calcite structures were optimized using the Raiteri forcefield (Raiteri et al., 2010), selected due to its parameterization against the free energy of dissolution of calcium carbonate, which provides reliable descriptions of the structuring and interactions between water and the surface. The resultant structures possessed the known density of calcite, 2.71 g/mL, and their exposed surfaces maintained the (10.4) structure. Since the calcite blocks were of sufficient thickness, it was not necessary to freeze the atoms in the middle of the calcite block. Charges for SCA-1 and SCA-2 were calculated using Mulliken charges obtained from the AMBER Antechamber program (Wang et al., 2004). The overall charges for SCA-1 and SCA-2 were determined to be −11 and −10 respectively using the Avogadro code (Hanwell et al., 2012) at pH 8.0 to 10, the point of zero charge for calcite. Cross terms for the interaction of the atoms within the protein and peptide molecules were calculated using Lorentz-Berthelot mixing rules (Lorentz, 1881; Berthelot, 1898); cross terms for interactions between the protein and peptide molecules and the calcium carbonate surface were taken from the Raiteri potential (Raiteri et al., 2010) or otherwise fitted via scaling methods based on the charge differences of the ions as described previously in the literature (Freeman et al., 2007; Schröder et al., 1992). A mean cut-off of 12 Å was used for van der Waals interactions. All calculations were performed using the DL POLY Classic program (Todorov et al., 2006).

The surface unit cell for the (10.4) surface was defined by two perpendicular vectors of length 83.544 Å, which gives a total surface area of 6979 Å2, sufficient to accommodate the proteins or the peptide sequences. A block of 10 layers of calcite was used (giving a thickness of 32 Å). The water layer was a depth of 82 Å (about 30,000 water molecules), which is sufficient for an adequate coverage of the biomolecules.

Since performing full adsorption studies of these proteins in every possible binding configuration was not computationally tractable, it was necessary to perform an initial study of the relative binding energies of the proteins placed at the surface in various configurations. This was accomplished by performing Euler transformations on the proteins, rotating them in 5 degree increments over the angles alpha, beta, and gamma from 0 to 180 degrees. The reoriented protein was then placed at the calcite surface with 30,000 TIP3P water molecules, providing a water density of 0.99 g/mL, and subjected to an energy minimization routine in DL POLY Classic with the calcite block held rigid. Five calcium counterions were placed in each solution with one additional sodium counterion needed for SCA-1.

This enabled us to choose low energy configurations for more detailed investigation. Binding energy calculations were performed using the following set of simulations upon each: protein/amino acid sequence in aqueous solution, protein/amino acid sequence starting in an unbound position above the surface and adsorbing onto the surface and a simulation of the protein/amino acid sequence in aqueous solution bound to the surface. If compared to simulations of the calcite-water interface and pure water itself, it is possible to obtain information based solely upon the interaction of these proteins/amino acid sequences and the surfaces. The following equations were used, employing configurational energies in all cases:

(1) E ( a q u e o u s ) = E ( m o l e c u l e ; a q u e o u s ) − E ( w a t e r ) − E ( m o l e c u l e s )

(2) E ( m o l e c u l e ; b i n d i n g ) = E ( m o l e c u l e ; h y d r a t e d s u r f a c e ) − E ( h y d r a t e d s u r f a c e ) − E ( m o l e c u l e ; a q u e o u s ) + E ( w a t e r )

In Equations 1 and 2 the meaning of the symbols is as follows: E ( a q u e o u s ) is the solvation energy of the protein/amino acid sequence in water; E ( m o l e c u l e ; a q u e o u s ) is the total energy of the molecule in water; E ( w a t e r ) is the energy of a box of water containing the same number of water molecules as the calculation for E ( m o l e c u l e ; a q u e o u s ) to ensure Equation (2) is balanced; and E ( m o l e c u l e ) is the energy of the protein/amino acid sequence molecule in vacuum; E ( m o l e c u l e ; b i n d i n g ) is the energy of binding of the molecule to the surface; E ( m o l e c u l e ; h y d r a t e d s u r f a c e ) is the total energy of the molecule on the surface in the presence of water; E ( h y d r a t e d s u r f a c e ) is the energy of the hydrated surface with the same number of calcium carbonate units as the previous calculation.

Additionally we have estimated the entropic contribution to the molecular binding arising from the displacement of water molecules from the surface to bulk water. From our previous work (Freeman and Harding, 2014) we expect this to be the dominant contribution to the entropy of binding for molecules of this kind. Our results show that the number of water molecules displaced is in the range 20.2–23.1 for the proteins and 6.6–8.3 for the amino acid sequences. This corresponds to an entropic contribution of 36.4–41.6 kJ mol−1 for the proteins and 11.8–14.9 kJ mol−1 for the amino acid sequences at 300 K, sufficiently small to justify the use of configurational energies in this work to estimate the strength of binding. It should also be noted that a smaller, negative contribution to the entropy is expected from the reduced freedom of motion of the protein/amino acid sequences at the calcite surfaces, which is considered to be constant between all binding profiles and expected to be smaller than that of many water molecules.

Simulations of these protein/amino acid sequences in water were performed by studying each of the systems for 10 ps in steps of 10 K from 10 K to 330 K. The lowest energy configurations of these proteins/amino acid sequences in aqueous solution were obtained from averages of 3 separate simulations over 3 ns of simulation time post equilibration at a temperature slightly above that used for other experiments, 330 K, which was selected as an efficient method to explore configurations at energies near the lowest energy configuration without having to resort to more expensive methods such as replica exchange or metadynamics. These structures were utilized for further calculations on surfaces, and for calculations in aqueous solution at 300 K. It was found that at this given temperature these protein/amino acid sequences would adopt a low energy conformation within approximately 500 ps and would not vary in energy by more than 5% of the total energy during the remainder of the simulation. 3 ns of simulation time at 300 K with a 1 fs timestep was sufficient to provide acceptable statistics for analysis. The simulations were performed at a constant volume (NVT ensemble) employing a Nose-Hoover thermostat with a relaxation time of 1 ps. For the systems containing surfaces, the lowest energy configuration of each protein or peptide sequence is placed such that the nearest atom to the surface is within 3 Å of the surface to ensure absorption to the surface within a reasonable timeframe (accounting for the slow kinetics of molecules diffusing through the adsorbed water layers) and again simulated for 3 ns. Molecules were placed at this distance due to the large barriers to crossing the organized water layers at the surface.

Water residence times were calculated and averaged within 1.1 Å regions surrounding atoms comprising each individual amino acid within the sequences identified as being most likely to be present at the surface during protein binding to calcite. Water residence times were calculated by fitting survival probability functions as defined below:

(3) P α ( t ) = ∑ j = 1 N w 1 N − m + 1 ∑ n = 1 m p α j ( t 0 , t 0 + t / , δ t )