However, the differences between surface and bulk water have yet to fully established, because it is difficult to analyze the transition between the two scenarios using conventional simulation methods such as trajectory analysis and snapshot structural observations. Thus, this type of analysis remains one of the simplest but most important issues related to the study of water on material surfaces. In the present work, we employed a data science approach based on a persistent homology (PH) analysis 14 , 15 ) combined with molecular dynamics (MD) simulations. In contrast to the previous application of PH to a solid (i.e. silica) by Hiraoka et al., this study represents the first-ever use of PH to assess a liquid (i.e. water). 16 , 17 )

Water is a very familiar substance and is necessary for life as we know it, and so both bulk water and water on surfaces or confined in nanospaces have been studied in detail. Surface water is especially important because it can greatly affect the physical and chemical properties of material surfaces, such as wettability and friction. 1 ) Graphite surfaces have been widely used to research such effects because graphite represents a typical hydrophobic surface. 2 ) Surface water on carbon nanomaterials such as graphene 3 ) and carbon nanotubes (CNTs) 4 ) has also attracted much attention. Because these carbon nanomaterials have non-polar, atomically flat surfaces, they are ideal for the study of the microscopic structure of surface water. Recent theoretical and experimental investigations showed that both graphene and CNT surfaces can be covered with several layers of water molecules even though these surfaces are hydrophobi. 5 – 7 ) This surface water can be directly observed using high-speed frequency modulation atomic force microscopy (FM-AFM). 8 , 9 ) In addition, recent theoretical studies based on first-principles calculations have demonstrated the existence of several stable types of polygonal water clusters in bulk water based on hydrogen bonding. 10 – 12 ) In the case of surface water on graphene, several planar water clusters were also identified in work by Maekawa et al., 13 ) who established that the probability of the formation of such clusters on surfaces is different from that in bulk water.

During the statistical analysis, the two-dimensional hydrogen bond network of the first layer on graphene was examined using a polygon extraction algorithm, whereas a three-dimensional structural analysis was carried out during the PH analysis (see appendix A for details regarding the polygon extraction algorithm). PH is a method for the characterization of data structures including graphics and images in a multi-scale manner, and represents a type of topological data analysis. This method, which was first proposed by Edelsbrunner and Haere, 14 ) has recently being applied in a wide range of fields, and the details of PH have been described in a number of publications. 22 – 26 ) In brief, this technique is based on finding features in data by searching for so-called holes, such as rings and cavities. Together, the times at which a hole is born and dies comprise a persistence pair, and a plot of the birth and death times on the horizontal and vertical axes is termed a persistence diagram (PD). A PD plot in which the line is further from the diagonal suggests longer hole survival times. In the field of materials science, PD has been applied by regarding atom positions (specifically oxygen positions in the present work) as a collection of points in space. 16 , 17 ) Herein, we applied this technique to the structural analysis of water molecules for the first time ever. In order to simplify the computational treatment, a periodic boundary was not used in our MD simulations. CGAL, 27 ) Dionysus 28 ) and Diode 29 ) were used in the preparation of the PD data. As in our previous work, 13 ) we used a cell with x = 2.13 nm, y = 2.214 nm and z = 4.15 nm, in which 60, 100 or 160 water molecules were distributed. Since 60, 100 and 160 molecules formed what were essentially single, double and triple layers of water molecules, these are referred to as the 1L, 2L and 3L models, respectively. Similar calculations and analyses were also performed for a model in which 400, 250 and 150 water molecules were dispersed in a cell having dimensions of (x, y, z) = (3.409, 3.444, 6.134) nm. The results of such calculations were found not to be affected by the MD cell size, and so the analytical results for the 60, 100 and 160 molecular systems are shown in the following sections.

The structure of water on graphene, on which the present statistical analysis was based, was constructed using classical MD simulations. The MS ForcitePlus 18 ) module in the Materials Studio 8.0 software package was employed for this purpose, with force fields calculated using COMPASS II. 19 , 20 ) The cutoff for short-range interactions in these simulations was 1.0 nm while long-range interactions were calculated using the Ewald method. 21 ) The optimized structure of water molecules generated by the COMPASS II package included an O–H bond length of 0.11 nm and an O–H–O bond angle of 109.4°, and this molecular structure was fixed for use in the present MD simulations in conjunction with a time step of 1.0 fs. The MD temperature was controlled by a Nose thermostat and calculations were carried out at both room temperature (RT) and low temperature (LT). Since it is known that the MD temperature of T MD = 250 K corresponds to the actual temperature of 300 K based on the evaluation of the diffusion coefficient 13 ) in the COMPASS II force field, this value was used for RT simulation. On the other hand, a value of T MD = 100 K was used for LT simulation. The same initial configuration was used for simulations at both LT and RT. The simulations were performed over a time span of 100 ns at each temperature and the structure obtained from the last 10 000 steps was used for statistical analysis.

3.1. Density distribution of water molecules perpendicular to graphene surface

The structures of the 1L, 2L and 3L models and the distribution functions for oxygen atoms in the z-axis direction (perpendicular to the graphene surface) obtained from MD calculations are shown in Fig. 1. From these data, it is evident that water layer structures were formed at both RT and LT, particularly for the 1L and 2L models. In contrast, in the case of the 3L model, the formation of the first layer on the graphene surface is clearly visible but the second layer has a lower peak than the first while the third layer is broader and has no distinct peak structure. At RT, The second and third peaks connect smoothly. It should also be noted that subpeaks (at 0.5 and 0.8 nm) can be seen slightly below the main peaks (at 0.6 and 1 nm), indicating the appearance of second and third layers at LT. There has, to date, been little discussion of this complex peak structure, but its origin will be explored later (Sect. 3.5) in this paper. Fig. 1. (Color online) Molecular structures and oxygen distributions along z-axis of (a) 1L model (60 molecules), (b) 2L model (100 molecules) and (c) 3L model (160 molecules). The backgrounds of each figure are the structures from the final MD step at LT. Red, white and gray represent oxygen, hydrogen and carbon, respectively. Graphene, shown in gray, overlaps the horizontal axis. Download figure: Standard image High-resolution image Export PowerPoint slide

3.2. Inner-layer hydrogen bonding network on graphene surface

Here, we discuss the analysis of the in-plane hydrogen bonding network structure of the first layer on the graphene surface. The network structure in the planes of 1L and 2L models have been discussed in detail by Maekawa et al.13) and the results of the present work are in good agreement with these prior data. Figure 2 shows the number of polygons (from triangles to octagons) in the hydrogen bonding networks in the first layers on the graphene for a total of 10 000 steps sampled from MD production runs. Using the 1L and 2L models, tetragons appeared most frequently at both RT and LT and there is a characteristic increased number of appearance of tetragons at LT in the 1L model. In the 1L model, the formation of tetragons is the most efficient and stable means of covering the surface because all OH groups can interact via intra-layer hydrogen bonds. Other polygons, such as pentagons and hexagons, are less efficient. In the case that the water molecules in the layer adopt these other shapes, some OH groups must orient themselves vertically relative to the graphene surface, and so cannot connect with other water molecules by hydrogen bonding. Interestingly, the results obtained from the 3L model show the same trends as in the 1L and 2L models at RT, but the number of pentagons is highest at LT, which differs from the other scenarios. Fig. 2. (Color online) Distributions of cluster structures in 10 000 snapshot structures at 100 K for (a) 1L, (b) 2L and (c) 3L models, and at room temperature for (d) 1L, (e) 2L and (f) 3L models. Download figure: Standard image High-resolution image Export PowerPoint slide

3.3. Analyses of PH

The changes in the in-plane network structure of the first water layer on the graphene can be clearly seen from the results of the PD analysis for each model in Fig. 3. In these plots, the units for both axes are angstroms so that the data are easier to compare to the lengths of hydrogen bonds. At a distance of approximately 1.4 Å above the birth axis, the dark region is seen to move upwards (that is, in the direction in which the death time becomes longer) as the model transitions from 1L to 3L. If we consider a regular polygon for which the length of one side equals the distance between hydrogen bonds, the birth value is approximately 1.4 Å while the death values are approximately 2.0, 2.4 and 2.8 Å for a regular square, pentagon and hexagon, respectively, and the polygon moves upward along the same birth line (appendix B). Thus, it is evident that the change in the thick portion of the PD plot at which the birth value is 1.4 Å reflects the change in the in-plane cluster structure of the water in the first layer on the graphene. This result is also qualitatively consistent with the change in the proportion of polygons in the cluster structure formed by the water in the first layer on the graphene surface, as seen in Fig. 2. Fig. 3. (Color online) Persistence diagrams at 100 K for (a) 1L, (b) 2L and (c) 3L models, and at room temperature for (d) 1L, (e) 2L and (f) 3L models. Download figure: Standard image High-resolution image Export PowerPoint slide

3.4. Novel three-dimensional water structure discovered by PH

Additional features can be obtained from the PD analysis, including a structure that appears in the vicinity of birth and death values of 2.2 Å and is prominent at LT. In the 1L model, there is no persistence pair near this point. Rather, this feature begins to appear in the 2L model and is not only more prominent in the 3L model but has a definite pinnate structure. When the origin of these persistence pairs was investigated by considering their relationship with the hydrogen bond distance, it was found that they were caused by a distorted triangle with two oxygen atoms at its apex and a water molecular at the center of gravity of a tetrahedron. The data also showed that the pinnate part represents a persistence pair derived from a polygon formed by the oxygen atoms located at the center of gravity of the tetrahedron and at its second nearest neighbors [Fig. 4(a)]. Considering the efficient formation of the hydrogen bonding network, in the case of the 2L model, energetic stability is increased by closing the hydrogen bonding network through forming polygonal clusters in and between planes. That is, in the 2L model, tetrahedra are formed in a limited manner under the condition that the direction of water molecules is restricted at the graphene-water and water-vacuum interfaces. Conversely, in the 3L model, since the third layer exists, more tetrahedra can be formed by increasing the degrees of freedom related to the direction of movement of water molecules in the intermediate layer. Therefore, the difference in the total number of tetrahedral structures formed suggests that no pinnate peak should appear in the 2L model at the same LT. At RT, this peak is buried in a broad distribution, likely because of the frequent rearrangement of the hydrogen bonding network due to thermal fluctuations and the presence of water molecules in various situations, including tetrahedral structures. The PD differences between the three models at RT appear small. However, an analysis of the tetrahedral order30,31) shows an increase in tetrahedral structures in the 3L model compared to the other two models. Therefore, the behavior at LT evidently demonstrates that the formation of tetrahedral structures is more pronounced at RT. See appendix C for details. Fig. 4. (Color online) (a) The anisotropic tetrahedral structure formed by surface-affected water. Red, blue and green denote the tetrahedral centroid, the tetrahedral apex and the water in the second nearest neighbor of the tetrahedral centroid, respectively. (b) The relation among subpeak values of both 0.5 and 0.8 nm and anisotropic tetrahedral structures. The centroid oxygen of the tetrahedral structures shown in blue and deep green is considered to be the origin of the first and second subpeaks, respectively. Download figure: Standard image High-resolution image Export PowerPoint slide

3.5. Crossover between surface and free water