The Real Bioinformatics Revolution

Proteins and Nucleic Acids Singing to One Another?

How a daring hypothesis may rescue bioinformatics from masses of undecipherable and hence useless genome sequences and turn biology and medicine upside down

Dr. Mae-Wan Ho

Scientists who asked and answered the right question

The real bioinformatics revolution may be here, not the one hyped in mainstream science journals, nor the ‘systems biology' supposed to make intellectual meat out of genome sequences ‘blasted' in and out of databases with little success so far [1] (No System in Systems Biology, SiS 21). It is something else, and the scientists at the heart of it started by asking a question that was unthinkable except to a very few: Is it possible that molecules recognize and find each other by singing the same note(s), or flashing the same colour(s)?

Irena Cosic, Professor of Biomedical Engineering, RMIT University, Melbourne, Australia, gained her Masters and PhD and worked in Belgrade, Serbia, before emigrating to Australia in 2002. While still in Belgrade, she began as a graduate student of Dr. Veljko Veljkovic, who now heads the Center for Multidisciplinary Research and Engineering, Institute of Nuclear Sciences Vinca, and has recently co-authored a report, Unraveling AIDS [2], with several of us in ISIS.

Veljkovic's research team had discovered in the 1970s a method for predicting which of the hundreds of new chemicals made by the rapidly expanding chemical industry were carcinogenic, by calculating certain electronic properties of the molecules [3]. This method was soon found equally applicable to predicting organic chemicals that were mutagenic, or toxic, and even those that were antibiotic, or cytostatic (anticancer) [4]. Veljkovic's institute in Belgrade has since teamed up with other European laboratories to apply the same method to drug discovery, especially against AIDS [5] (DeskTop Drug Discovery, this series).

Veljkovic and Cosic essentially asked a fundamental question in biology: what is it that enables the tens of thousands of different kinds of molecules in the organism to recognize their specific targets [6, 7]? Living processes depend on selective interactions between particular molecules, and that is so from basic metabolism to the subtlest nuances of emotion [8], all part of the Quantum Jazz of life (SiS 32) [9].

Try finding a friend in a very big very crowded ballroom in the dark

The conventional picture of a cell even now is that of a bag of molecules dissolved in water. Through bumping into one another by chance, in random collisions, those molecules that have complementary shapes lock onto each other so the appropriate biochemical reactions can take place.

This ‘lock and key' model has been refined to a more flexible (and realistic) ‘induced fit' hypothesis that allows each molecule to change shape slightly to fit the other better after they get in touch, but the main idea remains the same. It is supposed to explain how enzymes can recognize their respective substrates, how antibodies in the immune system can grab onto specific foreign invaders and disarm them. By extension, that's how proteins can ‘dock' with different partner proteins, or latch onto specific nucleic acids to control gene expression, or assemble into ribosomes for translating proteins, or other multi-molecular complexes that modify the genetic messages in various ways.

A conservative estimate of 10 000 molecular species will involve $10\,000 \times 10\,000 = 10^{8}$ pair-wise interactions. The problem is that even if the molecules have very intricate complementary shapes, it would be well nigh impossible for the right molecules to find each other in such a crowded environment as the cell, where there will only be a few copies of each kind of molecule. Imagine trying to find a friend in a very big and very crowded ballroom in the dark.

According to the conventional account, the genetic information that determines the characteristics of an organism is encoded in the linear sequence of four bases in the DNA of its genome, some segments of which are transcribed and translated into proteins. Proteins are linear sequences of 20 different amino acids that, in turn, determine how they fold up into three-dimensional structures that carry out all the living activities. Beyond these bald assumptions, however, it is not at all clear how ‘genetic information' actually translates into biological function. Besides, this conventional genetic determinist account had been thoroughly discredited even before the genome project was proposed [10] (Living with the Fluid Genome), though it still dominates much of mainstream biology.

Typically, people look for sequence similarity (homology) in DNA or protein as indicative of a common function, but that has proved wrong all too often. Just as genes and proteins with similar sequences can have very different functions, those with markedly different sequences can end up having the same function.

Everyone accepts that the conformation (the folded three-dimensional secondary and tertiary structure) of nucleic acids and proteins is more important for determining function, but there is no easy prediction of conformation from the linear sequences of bases or amino acids. It takes an enormous amount of computer time to simulate even a very simple protein folding into shape, and water molecules are now known to play a very important role [11] (Water Smoothing Protein Relationships, SiS 28). Predicting how one conformation of protein or nucleic acid can recognize another seems a thoroughly intractable problem.

The conventional account is also too mechanical, and at odds with the fuzzy picture of atoms and molecules as ‘clouds' of probability density in quantum theory [8, 11] (The Rainbow and the Worm - The Physics of Organisms, 2nd Edition). Also, macromolecules are very mobile and flexible, the binding/recognition sites especially so, and hence do not fit well with the static lock and key recognition model.

Molecular resonant recognition

Veljkovic and Cosic proposed that molecular interactions are electrical in nature, and they take place over distances that are large compared with the size of molecules [6, 7]. Cosic later introduced the idea of dynamic electromagnetic field interactions, that molecules recognize their particular targets and vice versa by electromagnetic resonance [6]. In other words, the molecules send out specific frequencies of electromagnetic waves which not only enable them to ‘see' and ‘hear' each other, as both photon and phonon modes exist for electromagnetic waves, but also to influence each other at a distance and become ineluctably drawn to each other if vibrating out of phase (in a complementary way).

Molecular resonance is a relatively well-understood phenomenon in chemistry, but it also has an analogy in the macroscopic world. A piano tuner strikes a tuning fork next to the piano, and the particular piano string, when correctly tuned to the same frequency, will start to sing back to the vibrating tuning fork. Energy is transferred from the tuning fork to the piano string and vice versa, and in that way, the vibration lasts much longer than if they were not resonating to the same frequency.

Another advantage of molecular resonance is that it is extremely selective, to less than 0.01 percent of the resonant frequency. That and the wide range of the electromagnetic spectrum make molecular resonance the mechanism of choice for specific interactions, as already pointed out in the early 1970s by physiologist Colin McClare [12, 13], who had also postulated resonant interactions in at least some biochemical reactions.

Quite independently, solid state physicist Herbert Fröhlich had proposed at around the same time that cells and organisms were more like solid state systems, packed full of dielectric molecules, and that metabolic energy could ‘pump' the system into coherently excited states, with coherent vibrations extending over a whole range of frequencies from a highly polarised state (frequency zero) to the microwave range and beyond [14, 15]. I have compared this state of coherent excitation to a laser vibrating in many frequencies, or a receiver tuned to absorb in many frequencies [11].

Testing the idea

To test the idea, Veljkovic and Cosic converted the amino acid sequence into a sequence of electron-ion interaction potential (EIIP) [6], the same as those calculated earlier by Veljkovic and other colleagues for the small organic molecules [16],

$$\mathrm{EIIP} = \frac{0.25\, Z^{*} \sin(1.04\, \pi Z^{*})}{2\pi}$$

Z* is the average quasivalence number (AQVN), obtained by summing up the number of valence electrons of all the components of each amino acid and dividing by the number of atoms in the amino acid,

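$$Z^{*} = \frac{1}{N} \sum_{i=1}^{m} n_i Z_i$$

where $Z_i$ is the number of valence electrons of the i-th kind of atom, $n_i$ is the number of atoms of that kind, $m$ is the number of different kinds of atoms, and $N$ is the total number of atoms in the amino acid.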

The EIIP represents the energy in Rydbergs (one Rydberg is approximately 13.6 eV) of mobile electrons along the protein. In the case of DNA, the four different bases along the sequence were converted to the EIIP of the bases. The reasoning is that charges moving through the (excited) protein or nucleic acid backbone will produce electromagnetic radiation and absorption at special frequencies corresponding to the electronic energy distribution along the chain.
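The two formulas above can be evaluated directly. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the valence electron counts are the standard ones for H, C, N, O and S, and free glycine (C2H5NO2) is used purely as an illustrative input. The text does not say exactly which atoms were counted for each amino acid, so the output is not claimed to reproduce any published EIIP table.

```python
import math

# Standard valence electron counts for the elements found in amino acids.
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "S": 6}


def average_quasivalence(composition):
    """Z* = (sum of n_i * Z_i) / N for a composition such as {"C": 2, "H": 5}."""
    total_valence = sum(count * VALENCE_ELECTRONS[element]
                        for element, count in composition.items())
    total_atoms = sum(composition.values())
    return total_valence / total_atoms


def eiip(z_star):
    """EIIP in Rydbergs, evaluated exactly as the formula is written in the text."""
    return 0.25 * z_star * math.sin(1.04 * math.pi * z_star) / (2 * math.pi)


# Illustrative input only (an assumption): free glycine, C2H5NO2.
glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
z_star = average_quasivalence(glycine)
print(f"Z* = {z_star:.3f}, EIIP = {eiip(z_star):.4f} Ry")
```

Replacing every amino acid (or base) in a sequence with such a number turns the sequence into a numerical series, which is what the signal processing step described next operates on.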

The next step was to look for periodicities in the distribution of free electronic energies. Cosic applied a standard signal processing mathematical tool, the Fourier Transform. A Fourier Transform enables electronic engineers to determine the frequency components in a signal. The result is a Fourier spectrum of many frequency peaks, the higher the amplitude of the peak, the more that frequency component contributes to the signal. The ‘frequencies' in this case are related to the one-dimensional string. A good analogy is a vibrating guitar string fixed at both ends, and the wavelengths (reciprocal of frequencies) will be in fractions of the total string length.
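As a rough illustration of this step, the sketch below computes a discrete Fourier spectrum of a numerical series in plain Python and reports the frequency with the largest amplitude. The function names and the toy series are hypothetical and are not Cosic's data. Frequencies are expressed in cycles per position along the chain, so they lie between 0 and 0.5, which is consistent with the values listed in Table 1.

```python
import cmath


def amplitude_spectrum(values):
    """Return (frequency, amplitude) pairs for frequencies k/N with 0 < k/N <= 0.5."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]  # remove the constant (zero-frequency) part
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                          for i, x in enumerate(centred))
        spectrum.append((k / n, abs(coefficient)))
    return spectrum


def peak_frequency(values):
    """Frequency whose component has the largest amplitude."""
    return max(amplitude_spectrum(values), key=lambda pair: pair[1])[0]


# Hypothetical series standing in for an EIIP-encoded sequence:
# the same four values repeat, so the dominant period is 4 positions.
toy_series = [0.10, 0.05, 0.00, 0.05] * 8
print(peak_frequency(toy_series))
```

For this toy series the peak falls at 0.25, the reciprocal of the 4-position period, which mirrors the guitar-string picture of wavelengths as fractions of the chain length.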

The procedure is repeated for many proteins with the same function, such as haemoglobins from different animals which transport oxygen in the blood, or proteases that break down proteins, growth factors, and so on.

In order to extract common spectral characteristics of sequences having the same or similar biological function, cross-spectral analyses were performed to arrive at a “consensus spectrum”. The peak frequencies in the consensus spectrum give frequency components that are common to all the protein sequences. In general, peak amplitudes at least 20 times the noise levels are considered significant.
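One way to picture the consensus spectrum is as a point-by-point product of the individual spectra, so that only frequencies present in every sequence survive. The sketch below makes that assumption explicit; it also assumes all series have the same length, and it takes the mean of the consensus spectrum as the "noise" level, which is a simplification chosen for illustration. All names and the toy data are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(values):
    """Amplitudes at frequencies k/N for k = 1 .. N/2, after removing the mean."""
    n = len(values)
    mean = sum(values) / n
    return [abs(sum((x - mean) * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(values)))
            for k in range(1, n // 2 + 1)]


def consensus_spectrum(series_list):
    """Pointwise product of the amplitude spectra of equal-length series."""
    if len({len(s) for s in series_list}) != 1:
        raise ValueError("this sketch assumes all series have the same length")
    spectra = [amplitude_spectrum(s) for s in series_list]
    return [math.prod(column) for column in zip(*spectra)]


def peak_and_signal_to_noise(consensus, n):
    """Peak frequency and peak/mean ratio (mean used as the assumed noise level)."""
    peak_index = max(range(len(consensus)), key=consensus.__getitem__)
    noise = sum(consensus) / len(consensus)
    return (peak_index + 1) / n, consensus[peak_index] / noise


def toy_series(shared_k, other_k, n=16):
    """Hypothetical series: a shared periodic component plus a weaker private one."""
    return [math.cos(2 * math.pi * shared_k * i / n)
            + 0.5 * math.cos(2 * math.pi * other_k * i / n) for i in range(n)]


group = [toy_series(4, 2), toy_series(4, 6), toy_series(4, 1)]
frequency, ratio = peak_and_signal_to_noise(consensus_spectrum(group), 16)
print(frequency, round(ratio, 2))
```

In the toy group each series has its own private component, but only the shared one at frequency 0.25 remains after multiplication. Real protein sequences differ in length, so an actual analysis would first need to bring the spectra onto a common frequency grid, a step this sketch leaves out.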

More than 1 000 proteins from over 30 functional groups have been analysed. Remarkably, the results showed that proteins with the same biological function share a single frequency peak while there is no significant peak in common for proteins with different functions; furthermore the characteristic peak frequency differs for different biological functions.

The same results were obtained when regulatory DNA sequences were analysed.

Cosic referred to this phenomenon as the Resonant Recognition Model (RRM) of molecular function. Some of the results obtained in 1994 are presented in Table 1 [7].

| Molecular Type | Frequency | No. | Signal/Noise | Error |
|---|---|---|---|---|
| DNA regulatory sequences | | | | |
| Promoter | 0.34375 | 53 | 128 | 0.016 |
| Operators | 0.07813 | 8 | 44 | 0.008 |
| Enhancers | 0.04883 | 10 | 467 | 0.024 |
| Protein sequences | | | | |
| Oncogenes | 0.03130 | 46 | 468 | 0.004 |
| ACH receptors | 0.49219 | 21 | 137 | 0.002 |
| Heat shock proteins | 0.09473 | 10 | 326 | 0.005 |
| Interferons | 0.08203 | 18 | 117 | 0.008 |
| Haemoglobins | 0.02340 | 187 | 119 | 0.008 |
| Protease inhibitors | 0.35550 | 27 | 203 | 0.008 |
| Proteases | 0.37700 | 80 | 511 | 0.004 |
| Amylases | 0.41211 | 12 | 170 | 0.002 |
| Neurotoxins | 0.07031 | 16 | 60 | 0.004 |
| Growth factors | 0.29297 | 105 | 200 | 0.016 |
| Glucagons | 0.32030 | 13 | 71 | 0.034 |
| Homeobox proteins | 0.04590 | 9 | 100 | 0.001 |
| Cytochrome B | 0.05900 | 16 | 201 | 0.004 |
| Actins | 0.48000 | 12 | 163 | 0.002 |
| Myosins | 0.34000 | 11 | 201 | 0.004 |
| RNA polymerases | 0.35693 | 10 | 256 | 0.001 |

Table 1. RRM frequencies for different functional groups of proteins and DNA regulatory sequences

Finding the resonant electromagnetic frequency

Cosic's findings do suggest that selective interactions between molecules are achieved by molecular resonance, though not all scientists who use the technique accept the idea behind it, or find it necessary to think about the implications.

As mentioned earlier, the common frequency identified in the consensus spectrum of proteins or DNA sharing the same function is not the actual frequency of the electromagnetic waves to which the molecules resonate. A rough estimate based on the length of a typical protein chain and the distance between neighbouring amino acids gives the maximum and minimum wavelengths of the electromagnetic radiation as 30 000 and 300 nanometres, i.e., from the very low infrared through the visible to the ultraviolet, a very wide range indeed.

In order to find out if the consensus frequency is related to the resonant electromagnetic frequency, Cosic turned her attention to light sensitive proteins.

Light-sensitive proteins absorb light strongly at certain frequencies and become excited to send out an electric signal or to re-emit light. For example, there are three kinds of rhodopsins, light sensitive proteins in the eye that absorb at red, blue and green respectively. The bioluminescent protein aequorin absorbs a similar wavelength of light as the rhodopsin that absorbs blue light. In the consensus spectrum of blue rhodopsins and aequorins, there is only one prominent peak frequency at 0.475, and this is most likely to be the one related to the absorption of blue light. In a similar way, it was estimated that the frequency of 0.355 is related to the absorption of green light, and the frequency of 0.346 to the absorption of red light. Analysing other light sensitive proteins in this way resulted in a table of correspondences between the RRM consensus frequency and the frequency of the electromagnetic radiation absorbed.

It turns out that the two frequencies are highly correlated, and there is a scaling factor between the two which averages out at K = 201. The approximate wavelength $\lambda$ of the electromagnetic radiation at molecular resonance can be calculated for each functional group of proteins or nucleic acids from its RRM frequency $f_{RRM}$ according to the relationship,

$$\lambda = \frac{K}{f_{RRM}}$$

Using this relationship, Cosic and her colleagues predicted the wavelengths of low intensity electromagnetic radiation from the common RRM frequency of various groups of growth factors that would stimulate the growth of cells. They reported that the observed effects of low intensity electromagnetic radiation on cell growth and DNA synthesis were indeed maximal at the predicted wavelengths.

In another experiment, the RRM frequency of the protease chymotrypsin was used to predict the wavelength of light that could activate the enzyme, which was 851 nm. That too coincided precisely with the experimentally observed wavelength for maximum activation of enzyme activity.
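The scaling relationship between RRM frequency and wavelength is easy to check numerically. The sketch below uses K = 201 as quoted above and assumes the resulting wavelength is in nanometres, as the 851 nm example implies; the function names are hypothetical. Because K is an average, the outputs are approximations rather than exact absorption maxima.

```python
K = 201.0  # average scaling factor quoted in the text; wavelength assumed to be in nm


def resonant_wavelength_nm(f_rrm):
    """Approximate wavelength (nm) from an RRM frequency, using wavelength = K / f_RRM."""
    if f_rrm <= 0:
        raise ValueError("RRM frequency must be positive")
    return K / f_rrm


def rrm_frequency(wavelength_nm):
    """Inverse relation: the RRM frequency implied by a wavelength in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return K / wavelength_nm


# RRM frequencies quoted in the text for the light-absorbing proteins.
for label, f_rrm in [("blue", 0.475), ("green", 0.355), ("red", 0.346)]:
    print(f"{label}: f_RRM = {f_rrm} -> about {resonant_wavelength_nm(f_rrm):.0f} nm")

# The text gives 851 nm for chymotrypsin but not its RRM frequency;
# the value printed here is only what the relation implies.
print(f"851 nm -> f_RRM of about {rrm_frequency(851):.4f}")
```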

These results were recently confirmed for the enzyme lactate dehydrogenase, which was activated at the predicted resonant frequency [17]. They clearly offer further support for the molecular resonance hypothesis.

Predicting amino acid hotspots, drug design, finding genes and disease markers

Once the characteristic frequency of a particular protein function is known, it is possible to find out which amino acid “hotspots” in the sequence contribute predominantly to the RRM frequency and hence the function, using various mathematical tools, such as the inverse Fourier Transform, or a more refined technique, wavelet analysis [18, 19]. Cosic's team found that amino acid hotspots are typically clustered in the protein tertiary structure in and around the protein active site, which again reinforced the view that the RRM frequency is strongly linked to function.

Another application of the RRM frequency is in the design of proteins and peptides with particular biological functions. From the RRM frequency of the biological function and its phase of vibration, Cosic's team has designed small peptides that act as agonists (same function) or antagonists (opposite function or inhibitor) of the proteins.

Cosic's extensive review of her work [7] was published six years before the discipline of bioinformatics was founded in 2000, in anticipation of the formal announcement of the human genome sequence. It is only very recently that other scientists have begun to take advantage of RRM. For example, bioinformatics researchers in Kerala, India, demonstrated that using RRM instead of base sequence to find genome segments that code for proteins reduced the computational overhead by 75 percent, and with much better discrimination [20]. In Saudi Arabia, a biomedical scientist showed that RRM analysis could identify proteins that might be markers for cardiovascular disease [21]. At Taiwan National Ocean University, researchers are refining the mathematical analyses of RRM, all the better to extract genetic information from nucleic acid sequences and proteins [22].

Finding peptides that inhibit HIV-1 entry into cells

Veljkovic's group has recently applied the technique, which they refer to by its old name, Information Spectrum Method (ISM), to develop new AIDS drugs [23]. It is estimated that approximately 76 percent of HIV+ patients with a measurable viral load are infected with a strain of virus that is resistant to one or more classes of antiretroviral drugs. So drug developers are on the constant lookout for drugs that target different stages in the virus's life cycle. A new generation of antiviral drugs is intended to prevent HIV-1 entry into susceptible cells. The first step in the process is the binding of the viral glycoprotein gp120 to the CD4 receptor on the cell membrane, which requires a co-receptor, CCR5 or CXCR4. Despite its high variability, gp120 has conserved its affinity for the CD4 receptor and co-receptors CCR5 and CXCR4.

Fifty HIV-1 isolates belonging to different groups and collected from different parts of the world were analysed. The consensus information spectrum (CIS) of gp120 contains only one characteristic peak at the frequency 0.1035, which probably determines the interaction of the proteins with the CCR5 co-receptor.

To find out if that was the case, the CIS of gp120 proteins was multiplied with that of CCR5, CXCR4 and CD4 respectively. The characteristic peak at frequency 0.1035 significantly increased in amplitude after multiplication with the CIS of CCR5, but decreased after multiplication with that of CXCR4 or CD4. This demonstrated that only CCR5 shares common information with gp120.

Further information spectrum scanning of the CCR5 primary structure revealed that the N-terminus of the second extracellular loop (ECL2) encompassing amino acid residues 168 – 186 has the IS frequency 0.1035. This region was previously found to interact with HIV-1 gp120 by binding to its V3 loop. Most CCR5 antagonists that effectively block HIV-1 entry bind within the pocket encompassing the ECL2 residues 168-186, and peptides derived from ECL2 (residues 169-173) elicit antibodies that block gp120/CCR5 binding.
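The scanning step can be pictured as sliding a fixed-length window along the encoded sequence and measuring, for each window, how strongly one chosen frequency is present. The sketch below is only an illustration of that idea, not the procedure used in [23]: the function names, the toy series, the window length and the frequency 0.1 are all assumptions made for the example.

```python
import cmath
import math


def amplitude_at(values, frequency):
    """Amplitude of one frequency component (cycles per position) in a series."""
    mean = sum(values) / len(values)
    return abs(sum((x - mean) * cmath.exp(-2j * cmath.pi * frequency * i)
                   for i, x in enumerate(values)))


def scan(values, frequency, window):
    """Amplitude at `frequency` for every possible window start position."""
    return [amplitude_at(values[start:start + window], frequency)
            for start in range(len(values) - window + 1)]


def best_window(values, frequency, window):
    """Start index (0-based) and amplitude of the window with the strongest component."""
    amplitudes = scan(values, frequency, window)
    start = max(range(len(amplitudes)), key=amplitudes.__getitem__)
    return start, amplitudes[start]


# Hypothetical series: flat background with a periodic stretch at positions 20..39.
series = [0.05 + (0.05 * math.cos(2 * math.pi * 0.1 * i) if 20 <= i < 40 else 0.0)
          for i in range(60)]
start, amplitude = best_window(series, frequency=0.1, window=20)
print(start, round(amplitude, 3))
```

In the toy series the strongest window starts at position 20, exactly where the periodic stretch was planted. For the case described above, the corresponding inputs would be the EIIP-encoded CCR5 sequence, the frequency 0.1035 and a window of about 19 residues.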

To find peptides that block HIV-1 entry, the CCR5-derived peptide, residues 168-186 of ECL2, was used as bait to select peptides that bind to it from a bacteriophage library. Five phage peptides were isolated, and when analysed, they showed the same common frequency of 0.1035.

As a final test, the phage peptides were examined for antiviral activity in infection assays. One of the peptides, P25, inhibited HIV-1 entry at concentrations in the nanomolar range, and more effectively than peptide T-20, the first commercially available HIV-1 entry inhibitor.

Implications for biology

Beyond the problem of intermolecular recognition, neither Cosic nor Veljkovic has said much on the implications of these findings for biology, which are potentially quite profound.

The idea of molecules communicating and exchanging energy by electromagnetic resonance fits in with accumulating evidence that cells and organisms are liquid crystalline, that all the molecules, including especially the 70 percent or so that is water, are aligned and working coherently together [9, 12]. There is little or no free diffusion in such a system, as Fröhlich [14, 15] had pointed out earlier, and before that, cell physiologist Gilbert Ling [24, 25] (Strong Medicine for Cell Biology, SiS 24) and biochemist/historian of Chinese science, Joseph Needham [26].

Instead, energy transfer, by molecular resonance or coherent excitations, probably has to occur through large distances, activating entire populations of similar molecules that are in different parts of the cell or different parts of the body, so long-range coordination of function can happen instantaneously. At least, that's what I am proposing.

The precise role of organised biological water in transmitting and perhaps amplifying electromagnetic signals has yet to be defined, but a growing number of us suspect that water may be playing the lead role in living processes [27] (Water's Effortless Action at a Distance, SiS 32). Significantly, water is largely transparent just within the narrow window of frequencies around the visible range of electromagnetic radiation where most of the molecular resonance frequencies are to be found, with steep rises in absorption on either side. This does enable resonating molecules to ‘see' one another and transfer energy. At the same time, however, there is now little doubt that electromagnetic radiation in the microwave range and far below can have biological effects [9, 28] (Confirmed: Mobile Phones Break DNA & Scramble Genomes, SiS 25). As molecules self-assemble into structures on all scales, one would not be surprised to find vibrations and resonance over the entire range of frequencies.

What is clearly emerging is the predominant electronic nature of the living matrix and living activities, which will require a complete rewrite of biochemistry and cell biology, if not also physiology and medicine.

Article first published 02/02/07

References

1. Ho MW. No system in systems biology. Science in Society 21, 46, 2004.
2. Ho MW, Burcher S, Gala R and Veljkovic V. Unraveling AIDS, Vital Health Publishing, Ridgefield, CT, 2005, https://www.i-sis.org.uk/onlinestore/books.php#236
3. Veljkovic V and Lalovic DI. Simple theoretical criterion of chemical carcinogenicity. Experientia 1977, 33, 1228-9.
4. Veljkovic V. A theoretical approach to preselection of carcinogens and chemical carcinogenesis. Gordon & Breach, New York, 1980.
5. Ho MW. Desktop Drug Discovery. I-SIS Report.
6. Veljkovic V, Cosic I, Dimitrijevic B, Lalovic D. Is it possible to analyze DNA and protein sequences by the methods of digital signal processing? IEEE Trans Biomed Eng 1985, 32(5), 337.
7. Cosic I. Macromolecular bioactivity: is it resonant interaction between macromolecules? Theory and applications. IEEE Trans Biomed Eng 1994, 41, 1101-14.
8. Pert CB. Molecules of Emotion, Pocket Books, London, 1997.
9. Ho MW. Quantum jazz, the meaning of life, the universe and everything. Science in Society 32, 11-14, 2006.
10. Ho MW. Living with the Fluid Genome, I-SIS & TWN, London and Penang, 2003, https://www.i-sis.org.uk/onlinestore/books.php#238
11. Ho MW. Water smoothing protein relationships. Science in Society 26, 51, 2005.
12. Ho MW. The Rainbow and the Worm, the Physics of Organisms, 2nd Edition, World Scientific, 1998, reprinted 2000, 2001, 2003, 2005, https://www.i-sis.org.uk/onlinestore/books.php#238
13. McClare CWF. Chemical machines, Maxwell's demon, and living organisms. J theor Biol 1971, 1-34.
14. Fröhlich H. Long range coherence and energy storage in biological systems. Int J Quantum Chem 1968, 2, 641-9.
15. Fröhlich H. The biological effects of microwaves and related questions. Adv Electronics and Electron Phys 1980, 53, 85-152.
16. Veljkovic V and Slavic I. General model of pseudopotentials. Physical Review Lett 1972, 29, 105-8.
17. Vojisavljevic V, Pirogova E and Cosic I. Investigation of the mechanisms of electromagnetic field interaction with proteins. Proceedings of the 2005 Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, September 1-4, 2005.
18. Fang Q and Cosic I. Finding characteristic bands from protein sequences using wavelet packet transform and energy map. 2nd International Conference on Bioelectromagnetism, February 1998, Melbourne, Australia.
19. Cosic I, Fang Q and Pirogova E. Modification of the RRM model using wavelets transform and ionisation constant to predict protein active sites. Proceedings of the First Joint BMES/EMBS Conference Serving Humanity, Advancing Technology, October 13-16, 1999, Atlanta, GA, USA.
20. Nair AS and Sreenadhan SP. A coding measure scheme employing electron-ion interaction pseudopotential (EIIP). Bioinformation 2006, 1, 197-202.
21. de Trad CH. Identification of selected proteins that might be used as markers for cardiovascular disease, using signal processing techniques. Int J Sci Res 2005, 15, pp. X-Y.
22. Cheng C-F and Chang H-C. Biological functions prediction via Goertzel filter bank approach. Proceedings of 2006 CACS Automatic Control Conference, St. John's University, Tamsui, Taiwan, November 10-11, 2006.
23. Veljkovic V, Veljkovic N, Esté JA, Hüther A and Dietrich U. Application of the EIIP/ISM Bioinformatics concept in development of new drugs. Current Medicinal Chemistry 2007, 14, 133-55.
24. Ling G. Life at the Cell and Below-Cell Level. Pacific Press, New York, 2001.
25. Ho MW. Strong medicine for cell biology. Science in Society 24, 32-33, 2004.
26. Needham J. Order and Life, Cambridge University Press, Cambridge, 1936.
27. Ho MW. Water's effortless action at a distance. Science in Society 32, 21-23, 2006.
28. Ho MW and Saunders PT. Confirmed: mobile phones break DNA and scramble genomes, but no health risks? Science in Society 25, 46-47, 2005.