A broad comparative genomics approach to understanding the pathogenicity of Complex I mutations

Variants classed as definitely pathogenic in humans

Among the 18 mutations of Complex I that are definitely pathogenic after application of the modified scoring system, 11 were found in at least one species from our dataset, and four of them occurred on the phylogenetic tree more than once (Table 1; for species names see Table S2, Supplementary Materials). It should be noted that many of the species with CPDs in the current dataset are distantly related to humans. The twelve substitutions to amino acids in ND1 that are pathogenic in humans and that occurred on our phylogenetic tree were not seen in mammals: five were seen in vertebrates and seven in invertebrates (see Table 1 and Fig. 1). Overall, among the 28 substitutions to amino acids that have been reported as pathogenic in humans across all Complex I proteins encoded by mtDNA, only eight took place in vertebrates.

Table 1 CPD in mitochondrially encoded Complex I proteins. Variants that are considered in detail below are shown in bold.
Figure 1

Substitutions to human pathogenic amino acids in sites of ND1 across the cladogram of Metazoa. White star, human branch.

Some CPDs can occur due to permissive/compensatory substitutions in the same protein

For each site with a CPD, we looked for non-human amino acids in sites that are closer than 8 Å to this site in the spatial structure, in order to find amino acid substitutions that could potentially make the human pathogenic amino acids permissible. As the average amino acid identity between Complex I proteins of humans and of species with CPDs in our study is 50%, we expected that many would carry non-human amino acids at interacting sites. Indeed, only 49 of 103 sites that putatively interact with a CPD-carrying site had the same amino acid as in humans. More interestingly, among the remaining 54 sites, 14 had non-human amino acids in all species with CPDs, potentially giving us insight into permissible combinations of amino acids (Table 2).
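The screen described above can be illustrated with a minimal sketch. It assumes one representative coordinate per residue and per-species residues indexed by site; the function names, coordinates, species labels and residues are all hypothetical toy values and are not taken from the study.

```python
import math

def neighbours_within(coords, site, cutoff=8.0):
    """Return sites whose representative atom lies closer than `cutoff` angstroms to `site`."""
    target = coords[site]
    return sorted(
        other for other, xyz in coords.items()
        if other != site and math.dist(target, xyz) < cutoff
    )

def permissive_candidates(neighbours, human_seq, cpd_species_seqs):
    """Keep neighbouring sites where every CPD-carrying species has a non-human residue."""
    return [
        s for s in neighbours
        if all(seq[s] != human_seq[s] for seq in cpd_species_seqs.values())
    ]

# Toy data (hypothetical): site -> (x, y, z) in angstroms, and site -> residue.
coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (0.0, 7.0, 0.0),
          4: (6.0, 6.0, 0.0), 5: (20.0, 0.0, 0.0)}
human = {1: "E", 2: "L", 3: "S", 4: "G", 5: "A"}
cpd_species = {
    "species_a": {1: "K", 2: "M", 3: "S", 4: "G", 5: "A"},
    "species_b": {1: "K", 2: "I", 3: "T", 4: "G", 5: "A"},
}

close_sites = neighbours_within(coords, site=1)
candidates = permissive_candidates(close_sites, human, cpd_species)
print(close_sites, candidates)
```

In this toy case only the neighbouring site that is non-human in both CPD-carrying species is retained; a site that matches the human residue in even one such species is dropped.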

Table 2 Putative permissive substitutions in spatially close sites of the same protein. The last two columns give the number of positions, with the positions themselves listed in brackets. Variants that are considered in detail below are shown in bold.

We looked in more detail at potential CPDs (or masking variants) that were found in more than one species in our dataset.

At site 59 of ND1, Glutamic Acid (E) is the major human variant. This variant is highly conserved, occupying this site in 99% of species in our dataset. Lysine (K), which is pathogenic in humans, was found only in one clade consisting of four species of mites. There are five sites that potentially interact with site 59 of ND1. Among them, three sites (60, 216, 217) contain a non-human amino acid in all four mite species with the CPD. In two of these sites (60, 216), the human amino acid is the major amino acid across the phylogenetic tree, being found in 70% and 69% of species, respectively. In contrast, the amino acids at these sites in mites with CPDs are minor, seen in a very small number of species: 0.4% and 0.4% for sites 60 and 216, respectively. At site 217, both the human and the potentially permissive amino acids are minor alleles, but comparatively frequent, being found in 29% and 20% of species, respectively. The E59K human variant changes the charge of the residue, but there are no charge-changing substitutions in spatial proximity of site 59 of ND1 in species with the CPD.

Considering site 131 of ND1, the human pathogenic substitution to Serine (S) was seen to occur three times in our dataset: in five parasitic flatworms, two mites and one beetle. This position is highly conserved throughout evolution, and its homologous site in the NuoH protein of E. coli, carrying the wild-type amino acid, was shown to play an essential role in the stability of Complex I [28]. Among 14 sites that are in close spatial proximity to site 131 of ND1, 10 contain non-human amino acids in at least one of the three clades with the CPD. At position 135 of ND1, beetles and worms have Cysteine (C) due to independent substitution events, and the mites have Valine (V). Both amino acids are very rare across the phylogeny, being seen in 0.3% and 0.25% of species, respectively. At positions 201 and 203, which are located far from site 131 in the primary structure but close in the 3D structure, mites and worms with 131S have different but rare (found in no more than 3% of species from the dataset) non-human amino acids. Interestingly, worms and mites are among the species with the highest fraction of rare amino acid variants (those found in fewer than 10 species from our dataset) in their ND1 gene. As parasitic flatworms live in hypoxic conditions inside their hosts, the selection that acts on their OXPHOS genes might be relaxed. The ratio of the rates of non-synonymous to synonymous substitutions (dN/dS) of ND1 was 0.18 between Pork tapeworm (Taenia solium) and Asian tapeworm (Taenia asiatica), which was twice as high as that seen between humans and gorillas. Nonetheless, these ratios were still far less than one, suggesting that the protein is still under negative selection.

The ND1:214K variant was seen to occur six times in our dataset, with five species of fish, one species of fly and one beetle carrying Lysine (K), thus making K the second most common amino acid in the site after the ‘wild-type’ Glutamic Acid (E). Among 10 positions structurally close to site 214, four contained non-human amino acids, and two sites, 61 and 213, had non-human amino acids in all species with the 214K pathogenic human variant: Valine (V), Isoleucine (I) and Threonine (T) in site 61 and Valine (V) in site 213. The 213V variant is an ancestral amino acid for the tree and is the most prevalent amino acid in this site, being seen in 88% of species. In contrast, the human amino acid Isoleucine (I) is seen in 9% of species. There were 34 substitutions leading to this variant, including a substitution in the lowest common ancestor of monkeys. Thus, it is possible that I213 creates a pathogenic potential for the K at site 214. Interestingly, homologous positions of the NuoH protein of E. coli carry the same amino acids as human ND1 does: I227 (homologous to I213 in human) and E228 (homologous to E214 in human), and an E228K mutation leads to assembly of a practically non-functional enzyme in E. coli [29].

At site 34 of ND3, the human amino acid Serine (S) is not a major amino acid on the tree, being found in only 8.5% of species, and this site is not highly conserved: there are 14 amino acids that occupy it in more than one species. This could mean that many amino acids are simultaneously benign in ND3:34. Alternatively, it could mean that the fitness landscape of this site changes frequently, and the occurrence of the human pathogenic amino acid 34P as a wild-type amino acid in other species supports this notion. There are only three ND3 sites structurally close to site 34, and none of them has non-human amino acids in the five species with the CPD.

We found the human wild-type variant ND3:47A in 93%, and the pathogenic human variant ND3:47T in 1%, of the 2766 species where this position was covered in our dataset. Only one of the three contacting sites carried non-human amino acids in all 34 species with this CPD, and it carried not one but 10 different amino acids. The other contacting sites carried non-human amino acids in 26 (site 45) and 0 (site 48) of the 34 species with CPDs. Thus, substitutions in one or several contacting sites may mask the pathogenic effect of ND3:47T. Both sites 34 and 47 of ND3 are included in a loop between trans-membrane regions 1 and 2 (the so-called TMH1-2 loop), which was shown to be critical for proton pumping [28,30].

Probably pathogenic amino acids

Besides mutations with “Pathogenic” status, “Probably pathogenic” amino acids also have strong evidence for association with disease in humans, especially if they have functional evidence to support their categorization. We found 10 probably pathogenic mutations as a ‘wild-type’ allele in at least one species used in our study (Table 3; for species names see Table S3, Supplementary Materials). Of these, we focused on site ND5:398 (13528A > G) [36], where the human pathogenic variant was prevalent in non-human species, and even represented the wild-type allele in some primates. Interestingly, most of the primates with the pathogenic change at site ND5:398 belonged to the Cercopithecinae subfamily of Old World monkeys. This group shares ~80% amino acid identity with humans in the ND5 gene. The variant has functional evidence of pathogenicity in humans [31].

Table 3 Probably pathogenic human variants in non-human species. Variants that are considered in detail below are shown in bold.

ND5: T398A

The human pathogenic amino acid Alanine (A) is the most prevalent amino acid in this site, found in 59% of species in our dataset, and the human amino acid Threonine (T) is found in 14% of species. The A398T substitution occurred in the lowest common ancestor (LCA) of primates, and two T398A reversions occurred: one in the Squirrel monkey, and one at the root of the Cercopithecinae subfamily, which is represented by 10 species in our dataset (Fig. 2). Interestingly, 129 of 130 species with 398T belong to the Tetrapoda clade, a clade consisting of four-limbed animals.

Figure 2

Substitutions to human wild-type (T, blue) and pathogenic (A, red) amino acids at site 398 of ND5 on the cladogram. The cladogram is colored according to the occupying amino acid: coral, A; blue, T; black, other amino acids. The primate clade has a yellow background. The upper cladogram shows the whole phylogenetic range considered, and the other cladograms show the primate clade. White star, Homo sapiens branch.

Among the 14 ND5 sites that are closer than 8 Å to site 398, five sites carried non-human amino acids in at least one primate species with the CPD. Of these, three sites (394, 401 and 478) carried non-human amino acids in all 11 primate species with the CPD (Fig. 2). In sites 401 and 478, these non-human amino acids are seen as the most prevalent amino acids in corresponding sites across the dataset (76% and 72%, for sites 401 and 478, respectively), and in site 394, this amino acid is the second most prevalent (41% of species). Among 545 species with the human pathogenic amino acid (A) in site 398, only five contained human amino acid Methionine (M) in site 401. Furthermore, none contained human amino acids Histidine (H) in site 394 and Phenylalanine (F) in site 478. This suggests that the combination of human variants H394, M401 and F478 with human pathogenic variant A398 may be undesirable.
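Counts of this kind amount to tabulating which residues co-occur at two sites across species. A minimal sketch follows, assuming each species is represented as a mapping from site number to residue; the species labels and residues are hypothetical toy values (not the study's data), and `cooccurrence` is an illustrative helper name.

```python
from collections import Counter

def cooccurrence(seqs, site_a, site_b):
    """Count (residue at site_a, residue at site_b) pairs across all species."""
    return Counter((seq[site_a], seq[site_b]) for seq in seqs.values())

# Toy data (hypothetical): species -> {site: residue}; "L" is a placeholder residue.
toy_seqs = {
    "species_1": {398: "A", 401: "L"},
    "species_2": {398: "A", 401: "L"},
    "species_3": {398: "T", 401: "M"},
    "species_4": {398: "A", 401: "M"},
}

counts = cooccurrence(toy_seqs, 398, 401)
# Species carrying A at site 398 together with M at site 401
a398_with_m401 = counts[("A", "M")]
# All species carrying A at site 398, whatever the residue at site 401
a398_total = sum(n for (res_398, _), n in counts.items() if res_398 == "A")
print(a398_with_m401, a398_total)
```

A strongly under-represented pair, relative to how often each residue occurs on its own, is the kind of pattern the paragraph above interprets as a potentially undesirable combination.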

Human LHON variants can be found in closely related species

Leber’s Hereditary Optic Neuropathy (LHON) is a debilitating disease which causes loss of retinal ganglion cells within the central retina and subsequent degeneration of the optic nerve. Patients develop acute blindness within six weeks of symptom onset. LHON-causing amino acid variants do not score highly in the current pathogenicity scoring systems, including our updated version. There are a number of reasons for this, including the possibility for LHON-causing mutations to be present as homoplasmic variants in unaffected individuals. Thus, we looked for CPDs for the “top-19” nucleotide variants associated with LHON according to MitoMap [2], which lead to 18 different amino acids. The three most common of the 18 variants (m.11778G > A, m.3460G > A and m.14484T > C) are thought to cause > 85% of LHON cases. Strikingly, we found no species with these amino acids in our data (Table D). Among the remaining 15 mitochondrial variants considered to have good evidence for association with LHON in the MitoMap database, five were associated with other syndromes in addition to LHON, and these variants were analyzed earlier in this paper as “pathogenic” or “probably pathogenic” (Table 4; for species names see Table S3, Supplementary Materials). We found eight of the 10 remaining amino acid changes in at least one metazoan species from our datasets. For one site (ND4L:65), the human LHON amino acid (A) was the major variant in the tree, and for two sites (ND1:132, ND6:58), LHON amino acids were found in primate species. We took a closer look at these variants, specifically looking at local interactions to find potential permissive substitutions in these close human relatives.

Table 4 Human LHON variants in non-human species. Variants that are considered in detail below are shown in bold.


ND1: A132T

Human LHON variant T132 was found in four species, three vertebrates and one invertebrate. Among them was the Northwest Bornean orangutan (Pongo pygmaeus pygmaeus), while its close relative, the Sumatran orangutan (Pongo abelii), carried the human variant A132. Variant T132 has already been considered in the Bornean orangutan [32]. The estimated divergence time of Bornean and Sumatran orangutans is between 400,000 and 1 million years ago, and site 132 is among the 15 of 318 ND1 amino acids that differ between Bornean and Sumatran orangutans [33].

Sixteen amino acid positions of ND1 were closer than 8 Å to ND1:132, six of which were further than five positions from it in the primary structure, and only site 201 carried a non-human amino acid in the Bornean orangutan. Both Bornean and Sumatran orangutans carried the same non-human variant T201 (Table 5, Fig. 3). Of 67 primates in our dataset, A201 was found only in humans and gorillas (common ancestry). T201 is the major amino acid in vertebrates. The A201T substitution occurred in the LCA of vertebrates, and 15 T201A reversions led to the 20 vertebrate species with A201. Thus, amino acid A201 could make T132 deleterious in humans. Sites 132 and 201 both belong to non-membrane regions of the protein, facing the same side of the inner mitochondrial membrane. These loops are highly conserved across the tree of life and play an important role in Complex I assembly, and position 132 carries the same amino acid A in both humans and E. coli [28].

Table 5 Co-occurrence of Alanine (A) and Threonine (T) in sites 132 and 201 of ND1 in our dataset.
Figure 3

Substitutions to A (blue dot) and T (red dot) in sites 132 and 201 of ND1 across the dataset (top) and on a clade of Primates (bottom). Blue and red branches are occupied by A and T, respectively. White star, Homo sapiens branch.


ND4L: V65A

At site 65 of ND4L, the human wild-type allele Valine (V) and the human pathogenic amino acid Alanine (A) are the two most prevalent amino acids in the dataset (31% and 36% of species, respectively). The closest animals to humans with Alanine (A) in position 65 are reptiles. The human amino acid Valine (V) is the major amino acid in mammals, being found in 462 of 464 mammalian species in our dataset. The human LHON-associated variant A65 is the major amino acid in fish and is also prevalent in birds and reptiles. The A65V substitution that led to V in humans occurred in the LCA of mammals. Thus, the LHON-causing V65A mutation in humans is an undesired reversion to the mammalian ancestral state.

Although just one C-T transition suffices to mutate A and V to one another, very few substitutions between these amino acids occurred in the tree (1 of 13 substitutions from V are V > A, and only 2 of 48 substitutions from A are A > V). This suggests that one of these amino acids might be unfavorable in lineages where the other one is prevalent. This is consistent with differences in their physicochemical properties: according to the ranking of amino acids by physicochemical similarity based on the Miyata matrix [34], V has rank 6 for A, and A has rank 9 for V.


ND6: I58V

A total of 237 species, mostly amniotes, carried the human LHON variant V58 (I58V) in ND6. Two of them were primates: Pongo pygmaeus (Bornean orangutan) and Lepilemur hubbardorum (Hubbard’s sportive lemur) (Fig. 4). The amino acid Isoleucine (I) is ancestral and is the most prevalent variant in site 58 on the tree. Among the 35 substitutions to 58V in our phylogenetic tree, 34 were I58V substitutions, 22 of which occurred in mammals, including the substitutions in the Bornean orangutan and Hubbard’s sportive lemur.

Figure 4

Substitutions to I (human normal variant, blue dot) and V (LHON variant, red dot) in site 58 of ND6 across the dataset and on a clade of Primates (yellow rectangle). Blue and red branches are occupied by I and V, respectively. White star, Homo sapiens branch.

Among the seven sites of ND6 that are closer than 8 Å to position 58, only one had a non-human amino acid in Bornean orangutan (V54 instead of M54), and no sites carried non-human amino acids in Hubbard’s sportive lemur. In site 54 of ND6, orangutan variant V was a major amino acid in the whole phylogenetic tree, but human variant M was a major variant in mammals. Only two M54V substitutions occurred in mammals, one of which was in Bornean orangutan.

Site ND6:58 is part of the B/C hydrophobic domain of the protein, which is conserved in humans [35]. Given the similarity of 59% between ND6 of humans and Hubbard’s sportive lemur, it is rather surprising to find no non-human amino acids in sites involved in local interactions with site 58, assuming a uniform distribution of sites with different evolutionary rates along the protein (hypergeometric probability of observing no non-human amino acids in seven sites = 0.02). Therefore, the structural region with the CPD is more evolutionarily conserved between the two species than the protein is on average.
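The hypergeometric probability quoted above can be approximated with a short sketch. It assumes a human ND6 length of 174 amino acids and converts the 59% similarity into a count of differing sites by rounding; both inputs are assumptions of this sketch rather than values stated in the text, and `prob_no_differing_sites` is an illustrative helper name.

```python
from math import comb

def prob_no_differing_sites(n_sites, n_differing, n_drawn):
    """Hypergeometric P(X = 0): none of the `n_drawn` sites is among the differing ones."""
    return comb(n_sites - n_differing, n_drawn) / comb(n_sites, n_drawn)

ND6_LENGTH = 174                                  # assumed protein length (amino acids)
N_DIFFERING = round(ND6_LENGTH * (1 - 0.59))      # assumed: ~41% of sites differ
N_CONTACT_SITES = 7                               # sites closer than 8 angstroms to site 58

p = prob_no_differing_sites(ND6_LENGTH, N_DIFFERING, N_CONTACT_SITES)
print(f"P(no non-human residues among {N_CONTACT_SITES} sites) = {p:.3f}")
```

Under these assumed inputs the result is of the same order as the reported 0.02; the exact figure depends on the alignment length and on how the number of differing sites is counted.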
