Comparative genomics provides insights into the aquatic adaptations of mammals

Species invasions into novel habitats mark major transitions in the evolution of life on Earth. Some of these are relatively ancient, such as the vertebrate transition from the oceans to life on land (∼375 Mya) or the evolution of arboreal vertebrate species (∼160 Mya). The transition of divergent lineages to the same novel habitat provides an opportunity to investigate the mechanisms that permit these adaptations and the relationship between similar phenotypes among lineages and their underlying genetic basis. Convergent processes may utilize homologous genomic regions in different lineages to achieve similar phenotypes (1). Alternatively, distinct genomic processes may be possible (2), or genetic drift may lead to different options for divergent lineages. Relatively recent transitions may be the most informative, on the assumption that extended periods of evolution may obscure the relationship between genomic differences and the original adaptations. A system well suited to this investigation is the adaptation of divergent, terrestrial mammalian lineages to life in aquatic environments.

Marine mammals, broadly defined as mammals whose terrestrial predecessors entered the sea and who obtain all or most of their food from a marine environment, comprise at least 129 extant species divided into three orders (3). Cetartiodactyla includes cetaceans (whales, dolphins, and porpoises); Carnivora includes pinnipeds (walruses, sea lions, and seals), sea otters, and polar bears; and Sirenia includes sea cows (now extinct), manatees, and dugongs (3). Of these, cetaceans, pinnipeds, and sirenians are considered the oldest groups of marine mammals (3). In contrast, sea otters and the polar bear emerged relatively recently, so much so that the polar bear can still hybridize with terrestrial sister taxa (3–5). The most species-rich group of marine mammals is Cetacea, which comprises ∼90 species (3). Cetaceans, pinnipeds, and sirenians represent an exceptional case of convergent evolution: the emergence of similar phenotypic traits in species separated by millions of years of evolution (6). In these separate lineages of marine mammals, phenotypic convergence is observed in all major physiological systems (7, 8). The degree to which convergence is reflected at the molecular level can now be partially answered using genomics. However, the interpretation of such results has hitherto been restricted by the limited number of high-quality genomes from marine mammals (6, 9). Remaining uncertainties include the phylogenetic relationships between and within marine mammal groups and their demographic history. To address these questions, we assembled and annotated 17 marine mammal genomes (11 cetaceans and six pinnipeds). Based on more comprehensive genomic data, we identified many putative genetic innovations for the aquatic adaptation of mammals, including those associated with thermoregulation and skeletal systems.

Results

Genome Sequencing, Assembly, and Annotation.

We performed the sequencing and de novo assembly of 17 marine mammal genomes (11 cetaceans and six pinnipeds) (SI Appendix, Table S1). Among these, 14 were assembled by Supernova (10) with 10× Genomics data (average scaffold N50 = 28.66 Mb and contig N50 = 142.33 kb) (Table 1 and SI Appendix, Tables S1–S3). The remaining three genomes were assembled using Illumina paired-end reads (SI Appendix, Tables S1–S3). Eight of the assemblies were further improved by Hi-C chromosome anchoring (SI Appendix, Fig. S1). The assembled genomes of the 17 marine mammal species range in size from 2.37 to 2.62 Gb, which is similar to k-mer–based estimations using GCE (11) (SI Appendix, Table S4) and those of published marine mammal genomes (SI Appendix, Table S5). More than 95% of each species’ short reads could be mapped to their respective assembly (SI Appendix, Fig. S2). BUSCO (Benchmarking Universal Single-Copy Orthologs) (version 3.0.2) (12) was used to assess the quality of the assemblies, revealing an average genome completeness of 90.98% (SI Appendix, Table S6). Analysis of syntenic relationships, comparing genome assemblies of closely related species, also showed high continuity of these genomes (SI Appendix, Fig. S3).
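For readers less familiar with the N50 statistic quoted above, the following minimal Python sketch shows how it is conventionally computed from a list of scaffold or contig lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Applied separately to scaffold lengths and to contig lengths, the same calculation yields the scaffold N50 and contig N50, respectively.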

Table 1.

Assembly statistics for the 17 novel marine mammal genomes generated for this study

We employed de novo– and homology-based prediction methods to annotate the genes and repeat sequences of the assembled genomes (SI Appendix, Tables S7 and S8). Annotated protein-coding genes ranged from 20,083 to 20,947 per species (Table 1). The average gene lengths were similar to those of closely related species (SI Appendix, Fig. S4), and we recovered an average of 96.44% of the BUSCO Mammalia gene set (4,104 genes) (Table 1). Overall, we generated high-quality genome sequences for 17 marine mammals, providing a good foundation for developing a better understanding of aquatic adaptation in marine mammals across three divergent ancestral lineages.

Phylogeny and Demographic History of Marine Mammals.

Combining published genome data with our 17 genomes, we were able to provide a detailed phylogenomic reconstruction of marine mammal species. Two nucleotide datasets were used (SI Appendix, Table S9): ortholog sequences from whole-genome alignment and reciprocal best hit ortholog genes from gene annotations. The maximum-likelihood trees generated from the alignments of the individual loci of the two datasets were used as input for the coalescent-based phylogenetic method ASTRAL-III (13), and these two datasets generated a consensus topology (SI Appendix, Fig. S5 and Fig. 1A). The overall phylogenetic relationship of three lineages of marine mammals is consistent with previous studies (8, 14–16). For cetaceans, the trees support the monophyly of Physeteroidea + Kogiidae, Delphinidae, Monodontidae + Phocoenidae, and Ziphiidae among odontocete taxa, with Physeteroidea as the most basal clade of odontocetes, consistent with a recent large-scale phylogenomic analysis of cetaceans (17). For pinnipeds, there is support for a sister group relationship between Musteloidea and Pinnipedia and the monophyly of Odobenidae + Otariidae, consistent with studies based on mitochondrial DNA (18).
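The reciprocal best hit criterion mentioned above can be illustrated with a short Python sketch. It assumes that pairwise alignment scores between the genes of two species are already available as dictionaries; the gene names, scores, and function names are hypothetical and do not reflect the tools or thresholds used in the study.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between genes of species A and species B.
a_vs_b = {("a1", "b1"): 900, ("a1", "b2"): 300,
          ("a2", "b2"): 500, ("a3", "b2"): 450}
b_vs_a = {("b1", "a1"): 880, ("b2", "a2"): 510,
          ("b2", "a3"): 400, ("b3", "a3"): 200}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy case, gene a3 is excluded because its best hit, b2, has a better match elsewhere, which is the situation the reciprocal requirement is designed to filter out.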

We further assessed divergence times for each marine mammal phylogenetic tree node (SI Appendix, Fig. S7). The divergence time between Cetacea and Hippopotamidae was estimated to be ∼55.5 Mya, which coincides with the Paleocene–Eocene transition and a global temperature rise, which possibly prompted terrestrial mammals to enter the sea (19). The initial split of Mysticeti (baleen whales) and Odontoceti (toothed whales) was estimated at ∼37.7 Mya. The emergence of Pinnipedia was estimated to be 27.4 Mya, while the divergence time between Odobenidae and Otariidae was about 18.6 Mya. The divergence time of sirenians and the African savanna elephant, their closest land relative, was estimated to be ∼63.9 Mya.

We also reconstructed the demographic histories of cetaceans, pinnipeds, and sirenians (SI Appendix, Table S10). The three marine mammal lineages were found to have experienced different historical changes in population size (see normalized average effective population size, Ne, in Fig. 1B and individual species profiles in SI Appendix, Fig. S8). Specifically, the Ne of cetaceans experienced a rapid decline during the last 500,000 y. Consistently, the heterozygosity rate of most cetaceans is even lower than that of the endangered giant panda [∼1.32‰ (20, 21)] (SI Appendix, Table S11), highlighting the ongoing conservation needs of cetacean species.

Genome Evolution of Marine Mammals.

We compared the genome sizes of the three marine mammal lineages with their terrestrial relatives: Cetacea versus Ruminantia, Pinnipedia versus Canidae, and Sirenia versus Proboscidea. The average genome sizes of Pinnipedia (2.4 Gb) and Sirenia (3.1 Gb) were similar to those of their terrestrial sister taxa (Fig. 2B). In contrast, the genome size of cetaceans ranged from 2.4 to 2.6 Gb and displayed a decreasing trend compared to Ruminantia (∼2.8 Gb in reindeer, cattle, and goat), their most closely related lineage (Fig. 2B). Consistent with the genome size comparisons, pinnipeds and sirenians present similar repeat contents to their terrestrial sister taxa, while cetacean genomes have ∼10% fewer repeats than ruminants. Five subtypes of repeats are more abundant in ruminant species (SI Appendix, Table S12), including LINE/RTE-BovB, LTR/ERV1, LTR/ERVK, SINE/Core-RTE, and SINE/tRNA-Core-RTE. In addition to several reported large fragments in ruminant genomes (22), we found 11 large (>1.5 Mb) deletions and three large insertions (SI Appendix, Tables S13–S15) in cetaceans compared with cattle, their terrestrial counterpart.

Fig. 2.

Structural characteristics and chromosome evolution of marine mammal genomes. (A) Circos plot of representative genomes of marine mammals: sperm whale, Indo-Pacific bottlenose dolphin (IPB dolphin), South American fur seal (SA fur seal), and spotted seal. (B) Genome sizes and transposable element content analysis of representative genomes of marine mammals. We selected three Ruminantia species, three cetacean species, three Canidae species, three pinniped species, an elephant, and a manatee. (C) Chromosome evolution of Cetacea and Pinnipedia. We reconstructed 23 and 19 ancestral chromosomes of Cetacea and Pinnipedia, respectively. The assignment of chromosomes to ancestral chromosomes is shown by colored bars. IPH dolphin, Indo-Pacific humpback dolphin.

Based on the eight chromosome-level genome assemblies that we generated (SI Appendix, Fig. S1) and two publicly available chromosome-level genomes [sperm whale (23) and Indo-Pacific humpback dolphin (24)], we reconstructed the ancestral chromosomes of Cetacea (using the Indo-Pacific bottlenose dolphin as the reference genome) and Pinnipedia (using the South American sea lion as the reference genome) with DESCHRAMBLER (25) at 300-kb resolution (Fig. 2C). In Cetacea, we identified 1,308 conserved segments and reconstructed 23 ancestral predicted chromosome fragments (APCFs), with a total length of 2.09 Gb. In Pinnipedia, we identified 194 conserved segments and reconstructed 19 APCFs, with a total length of 1.84 Gb. We traced back the source of these APCFs for both lineages and found there are fewer chromosome rearrangement events in Pinnipedia than in Cetacea (Fig. 2C).

Evolution of Genes and Gene Families.

We next assessed the expansion and contraction of gene families, positively selected genes (PSGs), and rapidly evolving genes (REGs) in the three marine mammal lineages. In total, 44, 29, and 212 gene families were identified as expanded, and 87, 15, and 12 gene families were contracted in the ancestor node of Cetacea, Pinnipedia, and Sirenia, respectively (SI Appendix, Fig. S9). Functional enrichment analysis of these gene families revealed that “olfactory transduction” is the only shared contracted Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (SI Appendix, Table S16). Several expanded gene family-associated KEGG pathways are shared among two types of marine mammals: “thermogenesis” and “oxidative phosphorylation” in Cetacea and Pinnipedia and neural plasticity (as suggested by the “alcoholism” pathway) and “estrogen signaling” in Pinnipedia and Sirenia (SI Appendix, Table S17).

To assess the selective pressures acting on marine mammal genomes, we estimated the dN/dS ratio (ω) using 7,252 orthologous, protein-coding genes. When compared with terrestrial outgroups, marine mammal branches always had a higher dN/dS ratio (SI Appendix, Fig. S10). We identified 5, 11, and 16 PSGs and 21, 17, and 295 REGs in the ancestral branches of Cetacea, Pinnipedia, and Sirenia, respectively (SI Appendix, Tables S18 and S19 and Fig. S9) (χ2 test, P < 0.05). We found that cystic fibrosis transmembrane conductance regulator (CFTR) underwent rapid evolution in both Pinnipedia and Sirenia. CFTR plays a vital role in the transport of various ions across the cell membrane, water transport, and fluid homeostasis (26, 27).
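This section does not spell out how the χ2 tests for PSGs and REGs were constructed. One common setup in dN/dS analyses is a likelihood-ratio test between two nested codon models, in which twice the difference in log-likelihood is compared with a χ2 distribution. The Python sketch below illustrates that setup under the assumption of one degree of freedom; the function name and the log-likelihood values are invented for illustration and are not taken from the study.

```python
import math


def lrt_chi2_df1(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns the statistic 2 * (lnL_alt - lnL_null), floored at zero, and its
    P value under a chi-square distribution with 1 degree of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For df = 1, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for a single gene (not values from the study).
stat, p = lrt_chi2_df1(lnl_null=-10234.6, lnl_alt=-10231.1)
print(f"2*dlnL = {stat:.2f}, P = {p:.4f}")
```

Under these assumptions, a gene would be reported at the P < 0.05 threshold when the statistic exceeds roughly 3.84.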

Conserved Noncoding Elements and ATAC-Seq.

We identified 4,518,724 and 4,341,059 conserved noncoding elements (CNEs) in Cetacea and Pinnipedia, respectively. We further performed assay for transposase-accessible chromatin sequencing (ATAC-seq) (28) of two cetaceans (Indo-Pacific bottlenose dolphin and Risso’s dolphin) and two pinnipeds (Baikal seal and South American sea lion) to identify CNEs associated with open chromatin (i.e., accessible to the transcriptional machinery). A total of 1,158 and 1,684 genes in Cetacea and Pinnipedia, respectively, have CNEs with ATAC-seq signal peaks within 3 kb upstream or downstream (SI Appendix, Tables S21 and S22). Of these genes, 371 have CNE peaks in both marine orders (SI Appendix, Table S23 and Fig. S11). Although further experimental work would be worthwhile to assess the contribution of these CNEs, our results provide a valuable resource for further studies on gene regulation in marine mammal species.
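The 3-kb proximity criterion can be sketched as a simple interval query. The Python example below assumes half-open genomic coordinates and interprets "within 3 kb upstream or downstream" as any overlap with the gene body extended by 3 kb on each side. That interpretation, the function name, and all coordinates are assumptions made for illustration.

```python
def genes_near_peaks(genes, peaks, flank=3000):
    """Return sorted names of genes with a peak inside the flanked gene window.

    genes: dict mapping gene name -> (chrom, start, end), half-open coordinates.
    peaks: list of (chrom, start, end) tuples, half-open coordinates.
    flank: number of bases added on each side of the gene.
    """
    hits = []
    for name, (chrom, start, end) in genes.items():
        window_start = max(0, start - flank)
        window_end = end + flank
        for peak_chrom, peak_start, peak_end in peaks:
            overlaps = peak_start < window_end and peak_end > window_start
            if peak_chrom == chrom and overlaps:
                hits.append(name)
                break
    return sorted(hits)


# Hypothetical genes and CNE-associated ATAC-seq peaks.
genes = {
    "GENE_A": ("chr1", 10000, 12000),
    "GENE_B": ("chr1", 50000, 52000),
    "GENE_C": ("chr2", 10000, 12000),
}
peaks = [("chr1", 7500, 7800), ("chr1", 60000, 60500), ("chr2", 14900, 15100)]
print(genes_near_peaks(genes, peaks))
```

For genome-scale data, a sorted or tree-based interval index would be preferable to this nested loop, which is kept here only for clarity.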

Signals of Convergent Evolution among Marine Mammals.

The evolution of marine mammals, the adaptation of terrestrial mammalian lineages to life histories dependent on the marine environment, is considered a seminal example of convergent evolution. The degree to which convergence is reflected at the molecular level can be addressed using genomics. Understanding this phenomenon addresses key questions about redundancy, pleiotropy, and the relationship between genotype and phenotype. We applied the “Convergence at Conservative Sites” method (29) to investigate convergent genes in the three lineages of marine mammals. Orthologous genes were assigned by synteny alignment (SI Appendix, SI Materials and Methods). We identified 195 convergent amino acid substitutions in 172 genes among marine mammals (SI Appendix, Table S24). Only three genes (FAM20B, NFIA, and KYAT1) share convergent amino acid substitutions in all three marine mammal lineages. Six genes (HERC1, MITF, EPG5, FAT1, SYNE1, and ATM) show convergent mutations at different amino acid positions in the cetacean–manatee and pinniped–manatee comparisons. For example, MITF has an L10F substitution in cetaceans and sirenians (the manatee) and a T570A substitution in pinnipeds and the manatee. Among the 94 genes with convergent amino acid substitutions in the fully aquatic cetaceans and Sirenia, but not between the amphibious pinnipeds and either cetaceans or Sirenia, five genes are within the KEGG pathway “dopaminergic synapse” (though the adjusted P value is not significant at the 0.05 level: P = 0.51; SI Appendix, Table S25). Previous studies indicate that UCP1 has been independently lost in many marine mammals, especially in cetaceans and sirenians (30, 31). We confirm and extend this inference, showing that a functional UCP1 is present in most pinnipeds, except for the Antarctic fur seal, which is the most polar of the species included in this assessment (SI Appendix, Table S26 and Fig. S12).
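To make the notion of a convergent substitution such as L10F concrete, the Python sketch below scans a toy protein alignment for sites at which all background (terrestrial) species share one residue while every foreground lineage carries the same, different residue. This is a deliberately simplified illustration and is not the “Convergence at Conservative Sites” method itself, which additionally relies on the phylogeny and on site conservation. The species labels, sequences, and function name are hypothetical.

```python
def shared_derived_sites(alignment, foreground_groups, background):
    """List substitutions shared by all foreground groups at conserved sites.

    alignment: dict mapping species -> aligned protein sequence (equal lengths).
    foreground_groups: list of lists of species, one list per lineage.
    background: list of species expected to retain the ancestral residue.
    Returns labels such as "L3F" (background residue, 1-based site, new residue).
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        background_residues = {alignment[sp][i] for sp in background}
        if len(background_residues) != 1 or "-" in background_residues:
            continue  # require one ungapped residue across the background
        ancestral = next(iter(background_residues))
        derived = set()
        consistent = True
        for group in foreground_groups:
            residues = {alignment[sp][i] for sp in group}
            if len(residues) != 1:
                consistent = False  # lineage is polymorphic at this site
                break
            derived |= residues
        if consistent and len(derived) == 1:
            residue = next(iter(derived))
            if residue not in (ancestral, "-"):
                sites.append(f"{ancestral}{i + 1}{residue}")
    return sites


# Hypothetical four-residue alignment, for illustration only.
toy_alignment = {
    "cattle": "MKLT", "dog": "MKLT", "elephant": "MKLT",
    "dolphin": "MKFT", "seal": "MKFT", "manatee": "MKFA",
}
print(shared_derived_sites(
    toy_alignment,
    foreground_groups=[["dolphin"], ["seal"], ["manatee"]],
    background=["cattle", "dog", "elephant"],
))
```

Changing `foreground_groups` to only two lineages corresponds to the pairwise comparisons described above, for example cetaceans and the manatee without pinnipeds.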

Genetic Changes Related to Cetacean Traits.

Cetaceans have many unique biological characteristics, including echolocation, deep diving, and large variation in body size. The molecular basis of echolocation has been well studied previously (32–34). However, based on more comprehensive data, we systematically reanalyzed the 504 hearing-related gene sequences in 40 species, including two groups of echolocating bats (group M: big brown bat, Natal long-fingered bat, Brandt’s bat, and little brown bat and group G: greater horseshoe bat) and 16 toothed whales (group T) (SI Appendix, Fig. S13). A total of 64 genes were identified as convergent genes, most of which were reported in previous studies (SI Appendix, Table S27).

We next compared the four whale species with the best diving abilities to 20 comparatively shallow-diving species to study the genetic basis of deep diving in cetaceans. The deep-diving species are sperm whale (reported to dive to 1,860 m for >1 h) (35), Blainville’s beaked whale (1,251 m for 57 min) (36, 37), and dwarf and pygmy sperm whales [species in the family Kogiidae with highly similar ecology and habitat (up to 1,425 m for 43 min) (38–40)]. We retrieved 1,803 genes from HypoxiaDB, a hypoxia-regulated protein database (41), and observed 39 genes with at least one specific amino acid change unique to the deep-diving group (SI Appendix, Table S28). MB encodes myoglobin, a protein critical for oxygen storage and transport (42). Deep-diving species have amino acid residue changes associated with elevated myoglobin net surface charge and maximal dive time (43). Compared with background branches, 45 genes showed significantly higher dN/dS ratios in deep-diving species and were classified as REGs (SI Appendix, Table S29) (χ2 test, P < 0.05). Of these, three genes (SETX, GIF, and TMPRSS11D) had dN/dS values above 1, indicating positive selection. Seven REGs (CEP170, DHCR7, DSP, GBE1, PLD1, SETX, and TMPRSS11D) have shared amino acid mutations in the four deep-diving species.

Cetacean bodyweight spans orders of magnitude from 50 kg (the vaquita, Phocoena sinus) up to 180,000 kg (the blue whale, Balaenoptera musculus) (44). We selected a set of 1,528 genes involved in body size development and estimated their dN/dS ratios in cetaceans with large body size: the blue whale (3) and the sperm whale (3). Compared to the background, we found 102 REGs (with significantly higher dN/dS) in giant cetaceans (SI Appendix, Table S30 and Fig. S14) (χ2 test, P < 0.05). These REGs were enriched in the Hedgehog and Wnt signaling pathways essential for bone development (45) (SI Appendix, Table S31). Additional bone development–related genes with a higher dN/dS in giant cetaceans include BMP1 in the TGF-β signaling pathway and the Notch signaling pathway genes SNW1 and CTBP2.
