While p53 family homologs are well described in Metazoa, and especially in vertebrates [ 22 ], the two earliest-branching homologs known to date were described outside Metazoa, in choanoflagellates [ 30 33 ]. Some p53 homologs were also mentioned in general studies of transcription factors in further premetazoan clades, but these studies did not go deeper into sequence analysis [ 34 36 ]. According to [ 6 ], the ancestral p53/63/73 protein present in early metazoans (such as sea anemones) probably plays a role in germ cell protection. These homologs also have a tetramerization domain, which is present in today’s sea anemones and is similar to the human tetramerization domain, except that it lacks a glutamine-rich region [ 20 ]. The biological role of the remote homologs found in Holozoa is unclear. To shed light on the evolution of the p53 family, we have characterized the sequences of p53 orthologs in non-animal eukaryotes, with a focus on the existing genomic and transcriptomic data of unicellular Holozoa. The investigation of sequence databases revealed the presence of p53 homologs in all clades of unicellular Holozoa (Choanoflagellatea, Filasterea, Ichthyosporea, Corallochytrea), with two new p53 homologs in two species that both belong to Ichthyosporea. No p53 homolog was found outside Holozoa. We used amino acid sequence homology analyses and 3D modeling predictions to identify structural similarities in evolutionarily close relatives and in human proteins.

The most detailed evolutionary studies focused on p53 family diversification after gene duplication and rearrangements, which led to the formation of proteins with different functional properties. These diversifications include variations in structurally disordered regions [ 19 ] and in various secondary structures [ 20 ], loss of the SAM domain in p53 [ 21 ] and changes in regulatory phosphorylation sites [ 22 ]. In contrast, the core domain containing the DBD retains high homology and preserves similar DNA binding affinities in all known p53 family proteins. The precise DNA-binding properties of these proteins are regulated not only by changes in this domain, but also by changes in the N-terminal and C-terminal domains (especially for human p53) [ 23 ]. Post-translational modifications [ 24 27 ] as well as interactions with other proteins [ 28 29 ] modulate DNA binding and the subsequent transcriptional activation or suppression of individual target genes.

p53 is an intensively studied protein due to its role in tumor suppression, where p53 mutations are found in approximately 50% of human tumors [ 1 3 ] and dysregulation of p53 activity occurs in most other cancers [ 4 ]. It is accepted that p53 family proteins (p53, p63 and p73) arose from a common ancestral gene (most similar to contemporary p63 found in human [ 5 ]) and their functions diversified after gene duplications and rearrangements [ 6 ]. All three contemporary p53 family members are transcription factors that execute their functions through binding to DNA consensus sequences that are often located within locally structured regions [ 7 10 ]. The typical p53 consensus binding site has internal symmetry and two copies of the motif 5’-RRRCWWGYYY-3’ [ 11 ], although non-canonical half sites [ 12 ], targets with long spacers [ 9 ] and with various structural features [ 13 14 ] are also recognized by p63 and p73 [ 15 ]. All p53 family members act as tetramers through their oligomerization domains, and p63 and p73 also contain a C-terminal sterile alpha motif (SAM) [ 16 17 ], which has been lost in p53. The DNA binding domain (DBD) of p53 contains four of the five most highly conserved regions of vertebrate p53 and is a hotspot for p53 mutations in cancer, underscoring its importance for function [ 4 18 ].
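As an illustration of the consensus motif described above, the following minimal sketch translates the half site 5’-RRRCWWGYYY-3’ into a regular expression using the standard IUPAC nucleotide codes (R = A/G, W = A/T, Y = C/T) and scans a sequence for two adjacent half sites. The function names, the optional spacer parameter and the test sequence are hypothetical and are not taken from the study; real binding-site prediction would also need to consider the non-canonical sites mentioned above.

```python
import re

# IUPAC nucleotide codes needed for the half-site motif RRRCWWGYYY.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "W": "[AT]", "Y": "[CT]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif string into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in motif)


HALF_SITE = motif_to_regex("RRRCWWGYYY")


def find_full_sites(seq, max_spacer=0):
    """Return (start, matched_text) for two half sites separated by
    0..max_spacer bases. A lookahead is used so overlapping sites are found."""
    pattern = re.compile(
        "(?=(" + HALF_SITE + "[ACGT]{0," + str(max_spacer) + "}" + HALF_SITE + "))"
    )
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical test sequence: two canonical half sites flanked by two bases.
example = "ttAGACATGCCCGGGCATGTCTaa"
sites = find_full_sites(example)
```

Setting `max_spacer` to a positive value allows a short spacer between the two half sites; the default of 0 requires the two copies to be directly adjacent.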

The most distant predicted structure of a p53 family homolog DBD was found in Corallochytrium limacisporum. This is a single-celled eukaryote living in coral reef lagoons; it was formerly considered to be a fungus, but contemporary phylogenetic analyses based on DNA sequencing show that this organism is a member of the Holozoa branch [ 38 ], as are all organisms with known p53 family homologs. Structure-based sequence alignment of the primary amino acid sequences of human p63 and the homologs from the six most distant non-metazoan organisms ( Figure 4 ) shows high homology (orange rectangles); complete identity is highlighted in red upper case in the consensus line. Structure-based sequence alignment combines sequence alignment with structural features (alpha helices, beta sheets), as determined by the Match Maker tool (3D superimposition of all predicted template-based structures and the experimental p63 structure, 2rmn).

We used the QUARK ab initio protein structure prediction algorithm [ 46 47 ] to model the structure of the DBDs de novo. All Holozoan p53 family homologs clearly contain functionally important beta sheets in their DBDs (as do the ab initio modelled structure of human p63 and, most importantly, the experimentally verified p53 family DBDs), whereas the alleged homolog from Entamoeba histolytica contains mainly alpha helices ( Figure 3 A). The root mean square deviation (RMSD) between the experimentally determined structure of the human p63 DBD (PDB code: 2rmn) and our ab initio modelled human p63 DBD was only 1.259 Å. Using SWISS-MODEL [ 48 ] and UCSF Chimera [ 49 ], we visualized 3D models of the DBDs of the homologs ( Figure 3 B). The overall RMSD was 0.718 Å (with the experimentally determined DBD structure of human p63 as the reference structure). Since the accepted cutoff for similarity is <2 Å, this points to a high level of DBD structural conservation in these distant homologs. Template-based modelling of the Entamoeba histolytica protein was not possible due to its low homology. All predicted models in PDB format are enclosed in Supplementary Material 3 (QUARK) and Supplementary Material 4 (SWISS-MODEL). The fact that the DBD sequences might adopt a three-dimensional conformation similar to that of human p53 suggests that these orthologs in unicellular Holozoa could bind to DNA, as observed for animal p53/63/73.
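For readers unfamiliar with the RMSD measure used above, the following minimal sketch computes it for two sets of matched atom coordinates and compares the result with the 2 Å cutoff mentioned in the text. It assumes that the two structures have already been superimposed and that atoms are paired one to one; it is not the procedure implemented in UCSF Chimera, and the coordinates are hypothetical toy values rather than data from this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates, assumed to be already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_structurally_similar(coords_a, coords_b, cutoff=2.0):
    """Apply the <2 Å similarity cutoff quoted in the text (in ångströms)."""
    return rmsd(coords_a, coords_b) < cutoff


# Hypothetical toy coordinates: the model is shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, model)
```

In practice the superposition step matters as much as the formula, because RMSD computed on unaligned coordinates mostly reflects the arbitrary placement of the two structures.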

We inspected the exon-intron structure of seven p53 homologs from four organisms: two paralogs from one, three paralogs from another, and one homolog from each of the remaining two ( Figure 2 ). All these homologous genes are very short; the longest is just over 5 kbp (XM_004994533.1; protein XP_004994590.1). For comparison, the exon-intron structure of the homologs in three closely related metazoans is shown, and it is evident that the length of introns in p53 homologs increases with the organism’s complexity (for example, the human p63 gene is over 250 kbp long). Furthermore, it is interesting that one of the homologous genes has only two exons, so the whole DBD is located in exon 2. We could not investigate the exon-intron structure of homologous genes from more distant clades, because their genomic DNA sequences are unavailable at this time.

Given the evolutionary distance and the parallel evolution of these organisms over hundreds of millions of years, the large number of conserved sequences is surprising. We compared the presence of individual amino acids in the DBD of the six most remote p53 family homologs (from clades Filasterea, Ichthyosporea and Corallochytrea) with human p53 ( Table 2 ). The 100% identity of 11 amino acid residues and the conservation of positively charged residues at positions 248 and 273 of human p53 point to their crucial importance for p53 family protein structure and function. Notably, these conserved residues are important for p53-DNA binding in human p53 and/or are hotspot mutation sites in human cancer.
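The kind of per-position comparison described above can be sketched as follows. The code takes a set of already aligned sequences of equal length, reports the columns with 100% identity, and checks whether a given column is occupied only by arginine or lysine. The aligned strings are short hypothetical toy sequences, not the DBD sequences from Table 2, and the function names are illustrative.

```python
POSITIVE = {"R", "K"}  # arginine and lysine


def conserved_columns(aligned):
    """Return (index, residue) for every alignment column in which all
    sequences carry the same residue (gap characters '-' do not count)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned}
        if len(residues) == 1 and "-" not in residues:
            columns.append((i, aligned[0][i]))
    return columns


def positive_charge_conserved(aligned, index):
    """True if every sequence has arginine or lysine at this column."""
    return all(seq[index] in POSITIVE for seq in aligned)


# Hypothetical toy alignment of three sequences (0-based column indices).
toy_alignment = ["MCRK", "MCKK", "ACRK"]
identical = conserved_columns(toy_alignment)
```

Column indices here refer to alignment columns, so mapping them back to human p53 numbering (for example positions 248 and 273) would require an additional step that accounts for gaps.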

Until now, most p53/63/73 homologs were found in the clade Metazoa. Using the BLAST algorithm (blastp and tblastn) [ 37 ], we examined various databases (non-redundant protein sequences, WGS, EST, STS, GSS, TSA and non-annotated sets of protein sequences from http://multicellgenome.com/meet-our-organisms ) [ 38 39 ]. The results revealed eleven significant hits outside Metazoa (E-value < 0.001), all designated as hypothetical proteins: five from clade Choanoflagellata (XP_001746020.1; XP_001747656.1; XP_004994590.1; XP_004991397.1 and XP_004991396.1), one from clade Filasterea (XP_004365382.2), four from clade Ichthyosporea (XP_014156832.1; CFRG4869T1; Ihof_evm3s137; Nk52_evm78s1737) and one from clade Corallochytrea (Clim_evm153s157) (see Table 1 ; significant domain homology is indicated by a plus (+) mark). All eleven non-metazoan homologous sequences are given in Supplementary Material 1 . To validate the homology of these proteins, we performed reciprocal searches using the phmmer tool [ 40 ] against the reference proteome database. These results show significant homology to multiple p53 family proteins, including human p53, p63 and p73 ( Supplementary Material 2 ).
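The E-value filtering step described above can be illustrated with a short sketch. It assumes that the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`), in which the subject identifier is the second column and the E-value the eleventh; whether the original analysis used this output format is not stated. The query and subject names and all values below are hypothetical placeholders, not results from this study.

```python
def significant_hits(tabular_text, max_evalue=1e-3):
    """Parse BLAST tabular output (12 default columns of -outfmt 6) and
    return (subject_id, evalue) pairs with evalue < max_evalue,
    sorted from most to least significant."""
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]       # column 2: sseqid
        evalue = float(fields[10])   # column 11: evalue
        if evalue < max_evalue:
            hits.append((subject_id, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    ["query_DBD", "subject_A", "35.0", "190", "110", "5", "100", "289", "40", "225", "2e-30", "120"],
    ["query_DBD", "subject_B", "28.0", "60", "40", "2", "150", "209", "10", "68", "0.5", "30"],
    ["query_DBD", "subject_C", "30.0", "150", "95", "4", "110", "259", "20", "165", "5e-4", "45"],
]
sample = "\n".join("\t".join(row) for row in rows)
hits = significant_hits(sample)
```

Hits passing such a threshold are only candidates; as in the text, a reciprocal search against a reference proteome is a sensible second check.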

3. Discussion

The high homology between the core domains of p53 family members suggests that the remote proteins found, including the newly described homologs, have DNA binding features similar to those of p53 from mammals, amphibians, fish, insects and nematodes [ 50 ] ( Table 2 ; an alignment of the entire DBD protein sequences of the novel homologs with human p53 is shown in Supplementary Material 5 ). The positive charge of the key DNA binding amino acid residues 248 and 273 of the human canonical p53 sequence is conserved in all remote homologs; interestingly, these residues are always arginines (as in the human sequence) or lysines, or a combination of the two. Furthermore, two of the four zinc-coordinating amino acid residues, namely C176 and C238, are 100% conserved in all remote homologs. Zinc ions play a critical role in stabilizing the architecture of the DBD. We also found significant homology in the tetramerization domains, which indicates that functional tetramers may form in these homologs. Interestingly, tetramerization domains were not found in Monosiga brevicollis, Chromosphaera perkinsii and Corallochytrium limacisporum, but this may be because the tetramerization domain is relatively short and its homology lies slightly below the selected E-value threshold. Obviously, the binding properties of the remote homologs should be determined by a combination of wet-lab methods.

From our data, we may speculate as to the original function of p53 homologs in unicellular Holozoans and Metazoans. It is supposed that the first evolutionary role of p53 in primitive Metazoans could be in the apoptosis regulatory network, via activation by the upstream kinase CHK2 [ 30 ], and/or in DNA repair, via activation of expression of the RNR (ribonucleotide reductase) gene, whose product supplies the deoxyribonucleotide triphosphates (dNTPs) required for DNA replication and repair [ 30 51 ]. The absence of a homologous transactivation domain in the most remote p53 family homologs is consistent with previous observations that the transactivation domain first appeared in Placozoa [ 52 ]. However, one must be aware that contemporary databases and articles contain misleading information. For example, it is clear that there is no p53 family homolog in plants, but there is a misleading GenBank-annotated p53 protein sequence in Zea mays (GenBank: AAT42177.1). This is a clear artefact due to human DNA contamination (the Zea mays sequence has 100% identity with human p53). The presence of a p53 family homolog has also been reported in Entamoeba histolytica (Amoebozoa group) [ 53 ]; however, our results indicate that this protein is not a p53 family homolog. The protein shows the highest homology to the Alpha kinase superfamily (E-value = 2.5 × 10⁻⁸) and lacks homology with the p53 family DBD ( Supplementary Material 7 , compared to thousands of significant hits for the proteins characterized in this study, including human p53, p63 and p73 proteins, in Supplementary Material 2 ).

The finding of an alleged p53 family homolog in Entamoeba histolytica provides a good example of the problems inherent in searching for distant protein homologs based solely on overall sequence similarity. It has led to misleading information in subsequent papers, including a well-known review on p53 family evolution, where Amoebozoa is included in the evolutionary tree [ 30 ]. To avoid such problems, we initially used sequence similarity of the core domain, which is the most highly conserved region in p53 family evolution and is known to be vital for functional activity. We then used structural predictions to model the 3D organization of the putative homologs and to identify those with similar structures. Based on these recent data, we have updated the p53 family ancestral tree and show the closest evolutionary branches where p53 family homologs are not present ( Figure 5 ).