As chemistry and materials synthesis start to embrace an era of automation and machine learning, it is becoming vital that the quality and reliability of the resulting data are assessed. By automating and parallelizing batch chemical reactions, enough samples may be run that statistical data can be obtained on the reaction system. We monitored the crystallization of hundreds of parallel reactions using a webcam and found that crystal features in the images obtained could be used to generate true random numbers. We also found that the approximate entropy of these numbers differed between types of chemical reaction, and that the encryption capability of these numbers was greater than that of a commonly used pseudorandom number generator. This is the first time that the stochasticity of chemistry has been investigated in large datasets from experimental data.

Chemistry inherently involves a wide range of stochastic processes, yet chemists do not typically explore stochastic processes at the macroscale because of the difficulty in gathering data. We wondered whether it was possible to explore such processes, in this case crystallization, in a systematic way using an autonomous robotic platform. By performing inorganic reactions in an automated system and observing the resultant occupied macrostate (crystallization images), we developed a powerful entropy source for the generation of true random numbers. Randomness was confirmed using tests described by the National Institute of Standards and Technology (uniformity p value ≫ 0.0001). The deficit from maximum approximate entropy was found to be different between compounds (ANOVA p value ≪ 0.01), and the encryption security of these numbers was found to be greater than that of a frequently used pseudorandom number generator. This means that we can now use random number generation as a probe of the stochastic process, as well as explore potential real-world applications.

In this regard, we hypothesized that one such system with a large entropy pool is that of compound formation and subsequent crystallization, where the detectable ensemble macrostates considered are the locations and morphologies of each crystal that has grown in a period of time as a result of these processes. To explore this idea, we set out to develop a fully automated system that not only performs the chemical reactions but also grows crystals of the products, with a camera as the detector. The platform was designed to convert these data into binary sequences, as shown schematically in Figure 1, which are assessed for randomness using the methods specified in National Institute of Standards and Technology (NIST) special publication 800-22a. We find that the numbers generated in this way are random, demonstrating that crystallization can be investigated and used as an entropy pool for random numbers, and we show that this is possible by encrypting a word and validating the difficulty in breaking the code.

In a chemical system, each time a reaction is performed there is an almost infinite number of energetically equivalent ways for particular reagents to combine, resulting in both high uncertainty and entropy, and the exact pathway undertaken will never be repeated. The overall outcome of such a reaction is therefore an example of one specific state out of an almost infinite number of possible macrostates. In chemistry, the energies of different configurations of molecules under thermodynamic control can be considered as part of a canonical ensemble, in which an almost infinite number of energetically degenerate states can be accessed, and of which only one macrostate will be occupied during a particular observation. As such, the entropy of such a chemical system is extraordinarily high, and may therefore serve as a very good entropy pool for application of random number generation.

Random numbers are used extensively in many applications, such as cryptography and scientific modeling, where their non-deterministic properties and unpredictability are essential. The source of randomness (i.e., the entropy source) is a crucial factor for whether correlations between generated numbers are present, and this has a great impact on the randomness and utility of the output. As such, random number generators based on physical processes not only can help us understand the reproducibility of a process, but are also more desirable than computational methods of number generation (e.g., pseudorandom number generators). This is because they extract their randomness from a physical system with a large available pool of entropy. Importantly, the generation of random outcomes is of profound importance as a source of noise, and potentially allows the generation of unanticipated data.

Recently, the reproducibility and bias in chemical reaction data have been discussed. This is not only because machine learning in chemical and materials synthesis will require large amounts of reliable data, but also so that algorithms can be validated. The recent development of automated platforms for chemistry is transformative not only for chemical synthesis and discovery, but also for the exploration of data reliability. Another important aspect of exploring the extent to which experimental data are reproducible is the fact that many processes are intrinsically stochastic. Such stochasticity can also be useful, such as in the generation of random numbers.

Results

Figure 2. Setup of the Robotic System. Photograph showing the crystallization array inside the CNC framework and its relative position to the input stock solutions, pumps, camera, and controlling computer.

A robotic platform (Figure 2), based on a Computer Numerical Control (CNC) machine, was designed to generate images of fresh crystallizations for random number generation from chemistry. Using rapid prototyping techniques described previously, an additional set of motorized linear axes was attached to the underside of the device to support a camera on a mobile gantry. Motion in the main CNC framework and the auxiliary framework was controlled using technology originally designed for the open source "RepRap" 3D printer. Reagent stock solutions were located adjacent to the platform, and could be transferred to vials in the crystallization array using a combination of tubing and pumps.

Additional 3D-printed components were incorporated to direct the reagent outlets, fix the positions of the vials in a 10 × 10 array, and support the camera (Figures S1–S8). In each experiment, pre-set volumes of stock solutions were pumped sequentially into each 14-mL vial in the array. The subsequent growth of crystals in each vial was recorded by the mobile camera at regular 10-min intervals at a resolution of 1,280 × 800 pixels. Image analysis using an object-detection and image-segmentation algorithm (Mask R-CNN) was employed to locate the crystals in the vial. The full methodology for platform construction and operation is described in Supplemental Experimental Procedures, with supporting software found in the repository linked below.

Figure 3. Chemical Schemes for Process Investigation. (A) Chemical schemes for process investigation. CuSO₄ requires the stochastic process of crystallization alone, whereas {W₁₉} and {Co₄} require cluster formation in addition, and {Co₄} requires the further step of ligand attachment. (B) Reactions to form crystals of CuSO₄, {W₁₉}, and {Co₄}. Top: initial reaction solutions (time = 0 min). Middle: partially complete crystallization (time = 40 min). Bottom: crystallizations at the end of the experiment (time = 150 min).

Chemical inputs were chosen primarily such that they would produce macroscopically observable crystals on a time scale of minutes to hours without the formation of precipitate. The location, size, shape, orientation, and color of the crystals formed within the vial were detected and taken as the entropy source for this system (see Supplemental Information for details). We performed reactions that involved different chemical processes, namely (1) crystallization alone, (2) cluster formation, and (3) ligand attachment to cluster, and hypothesized that increasing the number of chemical processes prior to observation of crystallization would make a larger number of microstates accessible, increasing the system's entropy and therefore the randomness of the numbers generated (Figure 3A). The three investigated reactions were: recrystallization of the inorganic salt CuSO₄·5H₂O; the synthesis and crystallization of the polyoxometalate salt (C₂H₈N)₈Na₃[W₁₉Mn₂O₆₁Cl(SeO₃)₂(H₂O)₂]Cl₂·6H₂O, hereafter referred to as {W₁₉}; and the synthesis and crystallization of the coordination cluster [Co₄(2-pyridinemethanol)₄(MeOH)₄Cl₄], hereafter referred to as {Co₄}. Images of these crystallizations at different times are shown in Figure 3B and their chemical structures are shown in Figures S9–S11.
These reactions involve the chemical processes of (1), (1 and 2), and (1, 2, and 3), respectively. The synthetic procedure files are included in the online repository and were used to perform the experiments in a fully automated manner. Compound confirmation was obtained by performing single-crystal X-ray diffraction.

Images of each vial were obtained at 10-min intervals for each reaction. The crystals in each image were isolated using an object-detection and image-segmentation algorithm (Mask R-CNN) and their locations within the vial were determined using computer vision techniques (see Figures S12 and S13 for details). A raw binary sequence was then generated by assigning 5 bytes per crystal pixel based on size, orientation, and color, with crystals ordered from top left to bottom right in the vial region (see Supplemental Information for details). These bytes were concatenated with those of adjacent crystal pixels and of subsequent crystals to generate a long binary sequence. This sequence was then split into sections of 64 bytes, and the SHA-512 algorithm was applied to each of these sections. The resulting binary sequences were concatenated to form a larger output binary sequence.
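The final hashing step described above can be illustrated with a minimal Python sketch. It assumes the raw feature bytes are already available as a `bytes` object; the function name `whiten` and the choice to discard a trailing section shorter than 64 bytes are assumptions made for illustration, not details taken from the published software.

```python
import hashlib


def whiten(raw: bytes, section_len: int = 64) -> bytes:
    """Hash each full 64-byte section with SHA-512 and concatenate the digests.

    Assumption: a trailing section shorter than section_len is dropped.
    """
    out = bytearray()
    for start in range(0, len(raw) - section_len + 1, section_len):
        section = raw[start:start + section_len]
        out += hashlib.sha512(section).digest()  # each digest is 64 bytes
    return bytes(out)


# Stand-in for crystal feature bytes; real input would come from image analysis.
example_raw = bytes(range(256))
example_out = whiten(example_raw)
```

Because a SHA-512 digest is 64 bytes long, the output in this sketch has the same length as the input that was hashed.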

Figure 4. Results of NIST Testing for a Sequence Generated by {CuSO₄} at a Time of 2 h. The histogram consists of p values obtained by running the first-level testing on a single sequence of binary integers divided into 200 blocks of length 1,120,378.

The output binary sequences were evaluated for randomness using the tests recommended by NIST and published in NIST special publication 800-22a. We assessed the randomness of each experiment in different reaction vials at the same time. The p value histogram of each test for {CuSO₄} is shown in Figure 4; the pass rates and uniformity of p values confirm that in each case the numbers generated were random. Similar results for {W₁₉} and {Co₄}, along with figures showing bar charts of p value uniformity and pass rates, are presented in Figures S14–S16. It is worth noting that assessment of only one feature produced strings that were too short to be reliably assessed using the entire NIST package, and as such the randomness of individual features was not assessed.
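As an illustration of what a single first-level test and its pass rate look like, the sketch below implements the frequency (monobit) test from the NIST suite and applies it block by block. It is a simplified stand-in for the full test package: the function names and the block-splitting helper are illustrative assumptions, and the significance level of 0.01 is the default suggested in the NIST publication.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p = erfc(|S_n| / sqrt(2n)), S_n = sum of +/-1."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))


def block_pass_rate(bits, block_len, alpha=0.01):
    """Split bits into full blocks, test each, and return (pass rate, p values)."""
    blocks = [bits[i:i + block_len]
              for i in range(0, len(bits) - block_len + 1, block_len)]
    p_values = [monobit_p_value(block) for block in blocks]
    passed = sum(p >= alpha for p in p_values)
    return passed / len(p_values), p_values


# A perfectly balanced block has S_n = 0 and therefore a p value of 1.
balanced = [1, 0] * 500
rate, p_values = block_pass_rate(balanced, block_len=100)
```

A sequence is judged random by this test when the proportion of passing blocks is high and the p values are spread uniformly over [0, 1]; the uniformity check itself is not reproduced here.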

Figure 5. Deficit from Total Maximum Approximate Entropy for Reactions with CuSO₄, {W₁₉}, and {Co₄} across 30 Different Reactions in an Experiment. Results of the one-way ANOVA show that the samples are statistically different, while Dunn's test shows that all compounds are different from each other, with the order of increasing DEF₃ being {W₁₉}, CuSO₄, {Co₄}.

We also wanted to assess the entropy content of each random sequence. To do so, we calculated the deficit from maximum entropy for each sequence as described by Pincus and Singer. Results were obtained by calculating the deficit from maximum approximate entropy (DEFₘ) for independent reactions in an experiment for different values of m for one compound, and comparing these against DEFₘ values from other compound reactions at an equal time index. The results from a one-way ANOVA test show that there are statistically significant differences between the three samples for m = 1, 2, 3 (F values = 20.2, 31.8, 75.7; p values ≪ 0.001), and results from application of Dunn's test indicate that each of the compounds is different from the others. Box plots showing this information for DEF₃ for each reaction are shown in Figure 5, while other values of m are shown in Figure S17.
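To make the quantity being compared concrete, the following sketch computes approximate entropy for a binary sequence from overlapping block frequencies (with wraparound, as in the NIST formulation) and subtracts it from a maximum value. Taking ln 2 as the maximum is an assumption made here for simplicity: it is the asymptotic limit for binary sequences, whereas the exact finite-length maximum may differ. The function names are illustrative.

```python
import math
from collections import Counter


def phi(bits, m):
    """Sum of f * ln(f) over the frequencies f of overlapping m-bit blocks."""
    if m == 0:
        return 0.0
    n = len(bits)
    extended = bits + bits[:m - 1]  # wrap around so every start index has a block
    counts = Counter(tuple(extended[i:i + m]) for i in range(n))
    return sum((c / n) * math.log(c / n) for c in counts.values())


def approximate_entropy(bits, m):
    """ApEn(m) = phi(m) - phi(m + 1)."""
    return phi(bits, m) - phi(bits, m + 1)


def deficit(bits, m):
    """Deficit from maximum ApEn, assuming the maximum is ln 2 (asymptotic value)."""
    return math.log(2) - approximate_entropy(bits, m)


# An alternating sequence is fully predictable: ApEn(1) = 0, so the deficit is ln 2.
alternating = [0, 1] * 50
alternating_deficit = deficit(alternating, 1)
```

A smaller deficit indicates a sequence closer to maximal irregularity, which is the sense in which the compounds are compared above.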

Finally, since random numbers are commonly used to encrypt data, we compared the encryption capability of this random number generator with that of a frequently used pseudorandom number generator, the Mersenne Twister (MT). Since the MT output is determined by its internal state, knowledge of this state allows accurate prediction of future output. However, this is not possible in the case of the true random number generator, whose internal state is either (seemingly) non-existent or unknowable because of the amount of uncertainty/entropy in the physical process.
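The predictability of the MT can be demonstrated with Python's standard `random` module, which uses the Mersenne Twister as its core generator. The sketch below is only an illustration of the principle: it assumes an observer has obtained a copy of the internal state directly, and the seed value and function name are arbitrary choices.

```python
import random


def predict_from_state(state, count):
    """Reproduce future 32-bit outputs of a generator from a copy of its state."""
    clone = random.Random()
    clone.setstate(state)
    return [clone.getrandbits(32) for _ in range(count)]


victim = random.Random(2020)       # arbitrary seed, for illustration only
leaked_state = victim.getstate()   # assumption: the observer obtains this state
future = [victim.getrandbits(32) for _ in range(5)]

# The clone yields exactly the values the original generator goes on to produce.
predicted = predict_from_state(leaked_state, 5)
```

No comparable state can be copied from a physical entropy source, which is the distinction drawn above.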