Iterative HMM-based searches of marine metagenomes, on the basis of a reference panel of autolykiviruses and previously identified DJR capsid bacterial and archaeal viruses, yield approximately 15,000 proteins after stringent quality-control filtering of the approximately 45,000 sequences initially recovered. Network visualization reflects MCL clustering of BLASTp-based similarities among sequences. a, Placement of reference panel sequences within the network. b, Characterization of proteins as DJRs on the basis of sequence- and structural-similarity-based annotation. c, Best BLASTp matches to RefSeq viruses, with a bitscore requirement of 50. d, Association of Tara Oceans-derived sequences with size fraction of isolation. e, Subset of sequences selected for phylogenetic analyses (Fig. 4) on the basis of membership in protein clusters strongly supported as bacterial and archaeal virus DJR capsids and a required length of ≥200 amino acids (Methods). We note that this selection is conservative: a greater number and diversity of sequences recovered by our HMM-based search passed all quality controls but showed no structural- or sequence-based similarity to any other proteins, and were thus excluded from further analyses. The observed dominance of eukaryotic virus DJR capsids in this search is predicted to reflect four major aspects of our approach. First, inclusion of cellular metagenomes allows capture of large viruses such as the Mimiviridae (>400 nm), Iridoviridae (120–350 nm) and Phycodnaviridae (100–220 nm). Second, some Phycodnaviridae have been shown to encode up to eight sequence-diverse copies of their DJR major capsid gene (ref. 84). Third, <0.22 μm viral metagenomes are biased against recovery of bacterial and archaeal DJR viruses, as described here.
And fourth, in iterative searches the sequence content of the HMMs is defined by the search space: if eukaryotic virus DJR capsid sequences are well represented, as they are in the larger size-fraction sequence databases used here, they drive searches towards increased detection of similar sequences.
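The best-match assignment in panel c can be illustrated with a minimal Python sketch. It is not the pipeline used for this figure. It assumes BLASTp results in the standard 12-column tabular format (`-outfmt 6`, bitscore in the last column) and treats the bitscore requirement of 50 as a minimum threshold. The function name `best_hits` and all sequence identifiers in the example are hypothetical.

```python
import csv
import io


def best_hits(blast_tsv, min_bitscore=50.0):
    """Return {query_id: (subject_id, bitscore)} with the top-scoring hit per query.

    Assumes BLAST tabular output (-outfmt 6), where column 1 is the query ID,
    column 2 the subject ID and column 12 the bitscore. Hits scoring below
    min_bitscore are ignored (assumed interpretation of the threshold).
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if bitscore < min_bitscore:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical example rows: two hits for query_A, one sub-threshold hit for query_B.
EXAMPLE_ROWS = [
    ["query_A", "refseq_virus_1", "35.0", "210", "130", "4", "1", "208", "5", "212", "1e-20", "95.5"],
    ["query_A", "refseq_virus_2", "30.0", "190", "128", "5", "3", "190", "8", "195", "1e-10", "61.0"],
    ["query_B", "refseq_virus_3", "25.0", "80", "58", "2", "10", "88", "12", "90", "0.5", "32.1"],
]
EXAMPLE_TSV = "\n".join("\t".join(r) for r in EXAMPLE_ROWS)

if __name__ == "__main__":
    print(best_hits(EXAMPLE_TSV))
```

In this sketch, a query whose hits all fall below the threshold (here `query_B`) receives no RefSeq assignment, which is one plausible way to represent proteins without a qualifying match in the network.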