Taxon sampling and specimen collection

To test the hypothesis of a single species of Electrophorus, we examined 107 specimens (all sequenced for mitochondrial DNA, mtDNA, and 94 specimens for nuclear DNA, nDNA) from across Greater Amazonia including the type locality of E. electricus in Suriname (Supplementary Data 1). Outgroup species were Gymnotus carapo, G. choco, G. cylindricus, G. pantherinus, Hypopomus artedi, and Sternopygus macrurus (Supplementary Data 3). Specimens were collected and sampled in the field according to the Animal Care and Use standards of the depository institutions and the countries of origin of the tissue samples used in the DNA analyses. In addition, tissues and/or specimens were received from multiple institutions in North and South America and Europe following pertinent Material Transfer Agreements and the national and international protocols for the shipment of materials. Specimens were euthanized and muscle or fins removed and stored in 95% ethanol. All voucher specimens are deposited in the institutions listed in the abbreviation section.

DNA sequencing

Genomic DNA was isolated from muscle or fin using phenol-chloroform in the Autogen platform or DNeasy Tissue Extraction Kits (QIAGEN) following manufacturer’s instructions. The polymerase chain reaction (PCR) was used to obtain fragments of the mtDNA and nDNA and amplified using the primers compiled in Supplementary Data 4. PCRs for COI, 12s, 16s, Atpase 8/6, 361298E1, 4174E20, 55378E20, and S7-i1 were carried out for 10 µl volumes as follows: 1 µl of 10x buffer, 0.5 µl of 10 µM dNPTs, 0.4 µl of 50 µM MgCl 2 , 0.3 µl of 10uM of each primer, 5 U/M of Taq DNA polymerase, 6.4 µl of deionized water, and 1µl of DNA extract. Thermal cycling conditions for genes were: 35 cycles, 95 °C for 300 s, 95 °C for 30 s, 72 °C for 45 s, and 72 °C for 300 s. In the case of ND4, PCR was carried out for 20 µl volumes and 1 µl of DNA extract. Nested PCRs for SH 3 PX 3 were carried out for 25 µl volumes and 2 µl of DNA extracts. Thermal cycling conditions for SH 3 PX 3 were: 35 cycles, 94 °C for 60 s, 94 °C for 30 s, 72 °C for 80 s, and 72 °C for 300 s. The annealing temperatures and times are provided in Supplementary Data 4. PCR products were purified using EXOSAP. DNA sequencing followed standard protocols employed in molecular systematics laboratories and were completed through a capillary sequencing technique on the LAB MAHVN4550 sequencer. All obtained sequences were deposited in GenBank (Supplementary Data 3).

Sequence alignment

Sequences were edited in the CodonCode Aligner (www.codoncode.com) and preliminarily aligned using ClustalW in MEGA 6.0.6.40. Alignments were checked by eye and manually adjusted when necessary. Kimura Two Parameter (K2P) pairwise distances were calculated using MEGA 6.0.6. For all analyses Sternopygus, when included, was designated as the outgroup. Hypopomus artedi, and four species of Gymnotus were also included. 107 individuals of Electrophorus from throughout their range were included as the ingroup. The best model of nucleotide evolution for each locus was estimated using jModelTest41, though codon-level estimates were not inferred or enforced. For introns sequences heterozygosity was noted with degenerate IUPAC codes. Insertion/deletion mutations for the EPIC 36298E1 sequences were incorporated in the phylogenetic analyses e.g., ref. 42.

Species delimitation

Species delimitation was based on the subsequent evaluation of four molecular datasets. Dataset 1: Single locus (COI; 569 bp); Dataset 2: five mtDNA genes (COI, ND4, ATP6/8, 12S rDNA, and 16S rDNA; 2973 bp total); Dataset 3: five nDNA loci (one exon: SH 3 PX 3 ; one intron: S7-i1; and three EPICs: 36298E20, 4174E1, 55378E20; 2459 bp total); Dataset 4: concatenated mtDNA and nDNA genes (5432 bp).

Dataset 1: Application of a simple barcoding approach for species delimitation, i.e., COI sequences, in combination with pairwise distance comparisons has resulted in highly successful species-level identifications in fishes, e.g., ref. 43. In spite of this, determination of the limits between inter- and intra-specific differences and the delimitation of the appropriate level of differences between species threshold has proven difficult, particularly in under-sampled phylogenies44. For DNA taxonomy herein we utilize COI, which as noted above has previously demonstrated the ability to provide good resolution for species delimitation among fishes. We complement the traditional molecular taxonomic approach, i.e., a single tree for defining species clades and revealing the included gaps, with an independent investigative tool based on pairwise distances to automatically detect significant barcoding gaps without an a priori species hypothesis—the Automatic Barcoding Gap Discovery, ABGD16.

Dataset 2: A computationally multi-faceted parametric inferential approach. Species validation can come in many forms; herein our approach begins with a GMYC model using the 5-gene concatenated mtDNA. Given that branching events, in this case the history of haplotypes along any phylogenetic tree, should be more recent within a species and more distant between species; implementation of the GMYC model seeks to distinguish between cladogenetic (species-level differences modeled by the Yule process) and tokogenetic (intra-specific differences modeled by the Coalescent process) events. A Bayesian, single-threshold ML method45, and multi-threshold ML method46 are all available for the GMYC model and all three were used here. The single-threshold ML method is the most conservative approach, the multi-threshold method allows for variation in the depth of history at which tokogeny gives way to speciation46, and there is a Bayesian implementation that takes into account error in reconstruction of phylogeny and model uncertainty47. The full dataset (113 terminals) was reduced to unique haplotypes (63 terminals) prior to analyses. Beast 2.418 was run for 20 million generations sampling every 1000 generations, and results were assessed using Tracer v1.5 (http://beast.bio.ed.ac.uk/Tracer) to ensure stationarity and to check that all parameters had acceptable effective samples sizes (>200) for use in generating the distribution of ultrametric trees. The derived maximum clade credibility (MCC) tree from the *BEAST2.4 run was used for the two GMYC analyses based on ML. These analyses were run on the GMYC web server (http://species.h-its.org/gmyc/) using the single and multi-threshold approaches, as described above. The Poisson tree process (PTP) has been shown to better delineate species, particularly when divergences among lineages is low48, and a Bayesian based implementation of this is available (http://species.h-its.org/ptp/). As two GMYC approaches were implemented we elected to use the bPTP method, to investigate a third approach to species number.

Dataset 3: Nuclear DNA used herein for Electrophorus and related species is a combination of three EPIC loci (see Supplementary Data 3), a single nuclear ribosomal intron (S7-i1), and one exon (SH 3 PX 3 ). Further estimates of species boundaries and validation of species were completed by analyzing the data in several different ways including analyzing each nuclear locus individually to determine the posterior probability support provided by each locus, analyzing the concatenated full nuclear DNA Dataset 3, incorporating coalescent-based methods for species delimitation of nDNA loci via Bayesian Phylogenetics and Phylogeography (BP&P) v3.217, and investigating the full ten locus dataset with the Genealogical Sorting Index, GSI49 using 10000 permutations on the lattice server50 as well as joint estimation of divergence times and the gene trees species tree using *BEAST2.418. Details of these analyses are as follows: Each nuclear locus was analyzed individually and in a concatenated matrix using MrBayes v3.2.251 on the CIPRES science gateway52 to determine the posterior probability for each of the putative lineages. The concatenated matrix was run for 10 million generations sampling every 1000 generations. Determination of convergence was completed with Tracer v1.5 and using Are We There Yet, AWTY (http://ceb.csit.fsu). Each individual nuclear and mitochondrial locus were run for 200 million generations sampling every 5000 generations using both MrBayes and *BEAST2.4. Stationarity of each locus in each run was assessed with Tracer and AWTY, and the distributions of the results from these runs were plotted in DensiTree v2.2.153 with a ten percent burn-in to visualize the congruence and conflict among topologies across loci. Analyses were completed with the entire Dataset 4, but are visualized with only the focal taxon Electrophorus, and a reduced number of terminals in each putative species. This was completed by trimming the total number of terminals post-analysis using Phyutility54.

Dataset 4: Bayesian Phylogenetics and Phylogeography (BP&P) v3.2 was used to delimit species boundaries on the complete 10-locus dataset with all terminals included and with a reduced number of terminals (three individuals from each putative lineage). BP&P uses the multispecies coalescent to delimit species and infer phylogeny in a Bayesian framework. The program also accounts for population genetic uncertainties in incomplete lineage sorting associated with ancestral polymorphism conflicts in gene trees and species trees55. In this program the Gamma prior G (α,β) is assigned to both population size (θ) and age of the species tree root (τ 0 ) and in our analyses α = 2 and β = 1000; all other parameters of divergence time used the Dirichlet prior56 with the heredity scalar set to 0.25 for the mtDNA loci and to 1 for the five nuclear loci. The analyses were run twice to ensure consistency between the runs.

Phylogenetic estimation (trees with outgroups)

We estimated the phylogeny of Electrophorus using the concatenated alignment of dataset 4 (5852 nucleotides total) in RAxML and MrBayes 3.2.651 both run on the CIPRES science gateway52.

Time divergence estimates: We simultaneously estimated divergence time, based on an external calibration point, and the species trees from multilocus sequence data using *BEAST2.4 and a relaxed clock for the mtDNA loci and a strict clock for the nDNA23. A subject of much recent debate has been the chronological closure of the Central American Seaway via the Isthmus of Panama and its consequences for biotic dispersal between North and South America and vicariance between Atlantic and Pacific Oceans57,58,59,60,61,62,63,64. On the younger side, 62dated the formation of the Isthmus of Panama sensu stricto to around 2.8 Ma. Paleoceanographic studies20,65 show a decrease in the transport of deep and intermediate Pacific waters into the Caribbean by 10 to 11 Ma, probably related to a closing Central American Seaway61. Based on uranium-lead geochronology in detrital zircons, 61provided evidence that rivers originating on the Panama arc transported sediment to the shallow marine basins of northern South America by the middle Miocene (13–15 Ma). Finally, 57used both molecular and fossil data to argue for two significant waves of terrestrial dispersal at around 20 and 6 Ma. Based on these studies there a wide time scale from which to select calibration points, each with support from the literature: 2.8 Ma62; and 5.1 Ma—57,58final wave of colonization; 10.5 Ma—20,65closing of Central American Seaway; 14 Ma – 61fluvial transport of sediment from Panama arc to South America; or 20 Ma—57,58early wave of terrestrial dispersal. The date used, i.e., 10.5 ± 1.5 Ma, is in the middle of these ranges and is a conservative approach as well as one supported by multiple studies. Mitochondrial DNA indels. One other item of note with respect to the molecular sequence data generated for this study concerns the mitochondrial gene ND4 and indels. In the aligned data matrix Hypopomus artedi was found to have a full codon triplet in ND4 that no other sampled member of the outgroup or ingroup contained. Full codon gaps are known in the mtDNA of fishes, e.g., ND2 of Aphredoderus66. It is unlikely that this amplicon is a pseudogene as no stop codons in either this sequence or any outgroup or ingroup sequences were demonstrated by ORFfinder at NCBI. In addition, at position 1270 (−569) within the species of Electrophorus some individuals have an extra adenine residue.

Molecular diagnosis

Species of Electrophorus were diagnosed by unique nucleotide substitutions shared by all individuals of the distinct populations. Optimizations of the nucleotide substitutions among the species of Electrophorus were obtained from the MP topology using MEGA 6.0.6. Each numeric position was determined by the alignment between the species of Electrophorus with the outgroup. Screening for diagnosed nucleotide substitution were performed manually post alignment using Mesquite (http://mesquiteproject.org).

Phenotypic analysis

Morphometric and meristic summaries do not include data from individuals smaller than 300 mm TL. Although of large to very large sizes compared to most species of Neotropical freshwater fishes, specimens of Electrophorus less than 300 mm are juveniles with pronounced differences in some meristic (e.g., number of anal/caudal-fin rays) and morphometric values (e.g., preanal-fin distance) relative to larger specimens. Internal anatomy was studied through radiographs.

Meristics follow11 with the addition of the number of lateral-line pores posterior of the gill opening. Anal/caudal-fin ray counts include the dorsal procurrent rays, when present (made through radiographs). Morphometrics are point-to-point distances taken with digital calipers with intra-specific ranges presented in tables. Measurements were taken from the left side of individuals, when possible, as follows: body width—the distance across the body at the pectoral-fins base; branchial opening—the distance from the dorsal to the ventral extremities of the opening; eye diameter—the horizontal distance between the anterior and the posterior margins of the eye; eye-posterior naris distance—the distance from the anterior margin of the eye to the posterior margin of the posterior naris; greatest body depth—the greatest vertical extent of the body, usually at the origin of the anal fin along the posterior margin of the gill slit; head depth—the distance between the dorsal and ventral margins of the head at the vertical through the eye; head length—the distance from the tip of the lower jaw to the posterior margin of the opercle; head width—the horizontal distance between the dorsal limits of the branchial opening; internarial distance—the distance between the posterior margin of the anterior naris and the anterior margin of the posterior naris; interorbital distance—the distance between the medial margins of the eyes; mouth-eye distance—the distance from the posterior margin of the mouth to the ventral margin of the eye; mouth width—the distance between the inner corners of the mouth; pectoral-fin length—the distance from the base of the dorsal most pectoral-fin ray to the distal most point on the fin margin; postorbital distance—the distance from the posterior margin of the eye to the posterior margin of the opercle; preanal-fin distance—the distance from the tip of the lower jaw to the anal-fin origin; preanus distance—the distance from the tip of lower jaw to the anterior margin of the anus; preorbital distance—the distance from the anterior margin of the eye to the anterior margin of the lower jaw; snout-corner of mouth distance—the distance from the snout to the corner of the mouth; and total length—the distance from the tip of the lower jaw to the base of the central caudal-fin ray.

Species distribution modelling and niche analysis

According to67 the species distribution patterns are the consequences of three main factors: (1) dispersal ability; (2) the spatial distribution of environmental conditions that determine the survival of individuals and the persistence of populations; and (3) biotic interactions and the dynamics of resources. The species distribution models are based on the set of climate variables in wide resolution scales (macroscale) that determine the distribution of organisms, i.e., Grinnelian niche67.

To build the SDM models we used the MaxEnt algorithm, which works with presence data only68. Methods that use only presence data are common especially in areas with large gaps of information and high biodiversity such as the Amazon River basin, where there is no information about absence. MaxEnt estimates the probability of species distribution by fitting a function close to the uniform distribution under the environmental information associated to the occurrence points68. This method can discriminate between the environmental variables associated to the occurrence data and the background variation of the predictor variables based on 10000 random points, i.e., the algorithm contrasts presences against the background location69. We used occurrence points of the three species described in this paper, to show the niche differences among them, Electrophorus electricus had 29, E. voltai 24, and E. varii 46 spatially unique occurrence points, i.e., one occurrence point per pixel, split into 20% test and 80% training.

The environmental variables were chosen according to their potential to represent the topographical and limnological characteristics in the Amazon freshwater ecosystem based on70 who showed that broad scale variables could be used as proxies for characteristics of the local aquatic environment for modelling fish species distributions in areas like the Amazon where large gaps exist in our understanding of distributions. The climatic macroscale variables were obtained from BioClim (www.worldclim.org): annual mean precipitation (AMP), annual mean temperature (AMT), seasonality of precipitation (SP) and seasonality of temperature (ST). We also used geomorphological variables about terrain slope (SL), altitude (AL) and flow accumulation (FLA) obtained from Hydro1k (www.usgs.gov) database. Soil type characteristics (SOT 0, 3, 6, 11) were gathered from FAO’s database (www.fao.org.br). All descriptor variables were obtained for pixels of 4 × 4 km of resolution.

Model evaluation was performed using the Area Under Curve (AUC), which is a threshold-independent measure based on ranking locations, i.e. the probability to choose randomly the presence locations in relation to randomly choosing the background locations, commonly used in SDM modelling71,72. This measure could be interpreted as average of true positives values (sensitivity) of all possible false positive values (specificity), producing a global measure of fit for the model. These values are then plotted (sensitivity against 1-specificity) to generate what is known as the ROC (Receiver Operating Characteristic) curve72. Alternatives to AUC there are the threshold dependent measures that from a threshold create a presence and absence binary feature and a confusion matrix71. For that we used the Minimum Difference Threshold Criterion (MDT) that are expected to minimize omission and commission errors73.

To test the significance of the niche differentiation among lineages, we performed the multivariate analyses of variance (MANOVA). All the analyses were performed using dismo and vegan packages for Maxent, and MANOVA in R software74.

Electric organ discharge analysis

Low-voltage EOD waveform recordings. We measured the low-voltage electrolocation pulses generated irregularly (rates of ca. 0.1–10 Hz) by the Sachs’ electric organ5. Head-to-tail EOD waveforms sensu ref. 75 were recorded within 12 h of capture in inflatable swimming pools (2.5–3.0 m diameter, or rectangular ca. 3 × 1.8 m) filled to 40 cm depth with water from the collecting site. Temperature was standardized to 27.0 ± 0.2 °C. Submerged NiCr electrodes were placed at least 40 cm away from the head and tail of the fish, along the head–tail axis, and a train of low-voltage pulses acquired directly by an audio-digitizer (96 kHz sampling rate) or a National Instruments digital acquisition device (sampling rates 100–200 kHz). Here we plot a single low-voltage EOD for each individual. We measured EOD duration at a 1% threshold of the peak positive amplitude of the EOD following ref. 75.

High-voltage EOD amplitudes. We measured the high-voltage pulses generated in rapid volleys by the main electric organ and Hunter’s electric organ for predation and defense4. We used a Fluke 190–202 storage oscilloscope to measure the peak voltage in the volley of high-voltage EODs. Soon after capture the subject specimen was stretched out on a dry heavy-duty (non-conductive) plastic sheet to isolate it from the load of water. In this position a DC-coupled voltage reading from snout to the distal end of the tail was taken by gently prodding the tip of the snout to elicit a volley of high-voltage discharges. The entire procedure was accomplished in less than one minute.

Low-voltage EOD quantitative analysis. All seven EODs were conditioned to a common sampling rate, energy-normalized to root mean squared (rms) amplitude and centered to the peak of the single EOD phase. Following the procedure described in refs. 30,32, and using a custom MATLAB (The Mathworks, Natick, MA) program, we subjected the conditioned waveforms to the discrete wavelet transform (DWT), using the Symmlet-4 wavelet base, to generate a matrix of 256 DWT coefficients (256 unique coefficients at 8 wavelet scales [(2^8) −1], and one scaling coefficient). The DWT is a popular time-frequency based procedure to deconstruct signals into a smaller number of features informative of temporal (waveform shape) and spectral (frequency) differences among groups of signals76. Following ref. 30 we then subjected the matrix of 256 DWT coefficients × 7 individuals to dimension reduction by pairwise ANOVA to extract those waveform features (DWT coefficients), which permit the most effective discrimination among the three species. This yielded a final matrix of just 4 DWT coefficients × 7 individuals. Finally, using the ‘cluster’ package in Statistica 13.3 (Tibco/Statsoft, Palo Alto, CA) we performed nearest-neighbor (single linkage) hierarchical clustering of all individual EODs in the matrix of reduced DWT coefficients, with the Euclidean distance as a metric of distance in multivariate space.

Nomenclatural acts

This published work and the nomenclatural acts it contains have been registered in ZooBank, the proposed online registration system for the International Code of Zoological Nomenclature (ICZN). The ZooBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix “http://zoobank.org/”. The LSIDs for this publication are: 7598B3E4-8E0C-43CE-A57B-CD71A9C99526, 7FA17DC2-5F58-4366-8908-9E66BE922458, and 142863F0-1F6F-4789-A05B-ECECC3CC022F.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.