Genetic diversity and selection in Puerto Rican horses

Horses have been considered one of our most prized possessions, used for travel, work, food, and pleasure for at least five and a half millennia17,18,19,20. Nevertheless, the ancestry of various horse breeds and their characteristic traits remains unclear21. In this paper, we describe the patterns and the origins of genetic diversity in nuclear and mitochondrial markers and examine the distribution of specific gait-keeper alleles that have been reported to be responsible for the prized phenotype in two Puerto Rican horses: the purebred paso finos (PRPF) and the nonpurebred PRNPB (Criollo).

Over the centuries, the two breeds have acquired distinctive appearances, and this distinction is reflected in the genetic structure revealed by our analysis. We have shown that the PRPF and PRNPB horses are distinct from other breeds (Fig. 2) but nevertheless related, as their maternal lineages are intertwined (Fig. 1). We also demonstrated that the “gait-keeper” mutation is almost as common in PRNPB as it is in PRPF. This was surprising, as no selection for the paso gait phenotype has been previously reported for the PRNPB horses. This observation has led us to explore a possible scenario in which PRPF were not originally selected for the paso gait, but picked from the population of local nonpurebred horses (PRNPB), where the “gait-keeper” mutation was already established either by the founder effect or by centuries of selective breeding by the local farmers.

To gain further insight into the origin of the two breeds, we used modified structure plots22,23,24 and developed our own tools14 to look at genome contributions in the context of population variation among the worldwide breeds13. While we cannot clearly identify distinct sources of genomic admixture in the Puerto Rican breeds, the PCA clearly supports the notion that PRPF is a distinct and native breed among the American Horses of Iberian origin, most closely related to the PRNPB (Criollo) population, but completely distinct from all other breeds (Fig. 3). One likely interpretation of these results is that the PRPF founders were originally selected from the pool of the admixed horses on the island of Puerto Rico, represented today by the PRNPB, as over the centuries the local farmers selected individuals based on the desired phenotypes, especially the gait.

The selection for the desired phenotypes over the centuries should have left its mark on the horse genomes. However, it is not easy to detect: our genome-wide analysis of variation indicates that both Puerto Rican horses have high levels of inbreeding, comparable to those of many other horse breeds. While each population shows extensive regions devoid of heterozygosity (runs of homozygosity, ROH), there are differences in magnitude, indicating differences in population histories (Fig. 5). At the same time, even the purebred PRPF is more outbred than roughly half of the horse breeds surveyed in Petersen et al.13, and the PRNPB horses of Puerto Rico are even more outbred.

Due to the maternal inheritance of mtDNA and lack of recombination, mtDNA has been widely used for studying the history of maternal lines. Mitochondrial DNA, specifically its control region, has been used effectively to study the origin and diversification of domestic horses worldwide6,7,17. Over the years, various studies have proposed that the variability found in the mtDNA of horses can be traced and restricted to geographic regions10,25,26. One of the first to propose such a hypothesis discovered 17 frequent haplotypes (mtDNA sequences), each creating a distinctive cluster. We used mitochondrial D-loop sequences from our samples for the preliminary assessment of the origin of the domestic horses on the island, specifically using hypervariable region 1 (HVR1) in a cost-effective approach to help understand the origins and diversification of the Puerto Rican horses. The sequences were grouped into 54 unique sequences (haplotypes) that could be cross-referenced to the nomenclature used in Cieslak et al.10.

Recent studies using mtDNA confirmed Iberia as the geographic and genetic source for the New World horses, as several predomestic maternal lineages unique to the peninsula still survive in modern horses of Iberian descent27. Generally, these breeds were established by the haplotypes that came from multiple sources, but the frequency of Iberian haplotypes in New World breeds is generally consistent with the historical documentation of their origins7. Specifically, haplogroup D, as defined by Jansen et al.25 and later redefined by Cieslak et al.10 as haplogroup X, is well represented in both the Southern Iberian and New World breeds, thus suggesting the importance of Iberian breeds in founding horse varieties of the New World5,7.

Our study of the mtDNA diversity in the two Puerto Rican horses also points to mainly Iberian origins, since haplotypes D and X are the ones most represented (Fig. 1; Table S1). The two breeds in Puerto Rico share many haplotypes: of the total of 20 haplotypes found, 19 were identified among the PRNPB, and of these 19, 11 were also shared with the PRPF, which in turn had only a single unique mtDNA haplotype (Fig. 1). This particular haplotype is most likely to have been missed in the PRNPB population due to the limited sample size and may be encountered with more extensive sampling, as it belongs to the haplogroup H1 found in other Iberian and New World horses and differs from Hap15 at only two positions. This is consistent with the scenario where the ancestral pool was formed from many Iberian breeds arriving at the island and establishing the original genetic pool.
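The haplotype counts quoted above can be cross-checked with simple set arithmetic. The short sketch below uses only the numbers stated in the text (20 haplotypes in total, 19 in PRNPB, 11 shared); the variable names are illustrative and are not taken from the study's own scripts.

```python
# Counts quoted in the text; variable names are illustrative only.
total_haplotypes = 20   # distinct haplotypes across both breeds
in_prnpb = 19           # haplotypes observed in PRNPB
shared = 11             # haplotypes observed in both PRNPB and PRPF

# Haplotypes never seen in PRNPB must be unique to PRPF.
unique_to_prpf = total_haplotypes - in_prnpb

# Derived counts implied by the figures above.
in_prpf = shared + unique_to_prpf    # all haplotypes observed in PRPF
only_prnpb = in_prnpb - shared       # haplotypes seen only in PRNPB

# Fraction of PRPF haplotypes that also occur in PRNPB.
prpf_shared_fraction = shared / in_prpf

print(unique_to_prpf, in_prpf, only_prnpb, round(prpf_shared_fraction, 3))
```

Under these counts, PRPF carries 12 haplotypes, all but one of which are also present in PRNPB, which is the pattern expected if PRPF was drawn from the PRNPB maternal pool.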

The direct inheritance from mother to daughter without recombination can provide valuable clues in the preliminary assessment of ancestry in the maternal lineages that can be used in reconstructing the history of the breed’s origin. Since previous research has shown at least some genetic clustering of haplotypes, mtDNA analysis allows us to make a preliminary assessment of maternal lineages7. However, there is a high level of variability within and among horse breeds, without a clearly defined geographical pattern of distribution28, so the mtDNA evidence alone is not sufficient to fully describe the ancestry of the Puerto Rican breeds. Therefore, the identification of the population origin required a more complex genetic approach that included dense genotyping across the genome.

Thanks to the analysis of the genome-wide array data, we can see that Puerto Rican horses share genome variation components with a number of horses worldwide (Fig. 3). In particular, the PRNPB horses appear to have genomic fragments in common with the Northern European and Asian horse breeds (Fig. 3, top row). Specifically, they share the “light green” and the “orange” components with the Finnhorses, Mongolian and Tuvan breeds. This appears to be the same component present in the Iberian (Lusitano), Middle Eastern (Caspian horse), or US derivatives from the Spanish stock brought to Florida in the 1500s (Florida Cracker). The “orange” component present on the island also completely dominates the Peruvian Paso, the breed that is most closely related to the PRNPB horses outside of Puerto Rico. Both Puerto Rican breeds display a common “purple” component that seems to be unique to the local island horses and was not found in any of the other horse breeds surveyed at the time (Fig. 3). This component represents a larger part of genetic variation in the PRNPB horse (which also has green and orange components shared with other breeds) but completely dominates the PRPF genomes. The most likely explanation of this observation is that the PRNPB horse has a unique mixture that incorporates variation from a diverse set of lines brought on the ships to the island, and the PRPF has been selected for this particular set of variation from the admixed pool. If the latter statement is true, PRPF should have less genetic diversity than PRNPB.

We observed extended runs of homozygosity (ROH), contiguous uninterrupted stretches of chromosomes without any heterozygous SNPs29, which may be a consequence of natural or artificial selection on genome-wide variation, as selection for one allele would have swept variation across the linked loci30. In fact, the ROH approach is commonly used to test hypotheses for artificial selection in domesticated animals31,32. The observed differences in ROH are indeed consistent with the hypothesis that PRPF has been under selection (Fig. 4). However, the extended ROHs are not a definitive indication of recent artificial selection, as they can be derived from consanguineous mating in a small population (i.e., drift). Therefore, it is important to distinguish the signature of selection around the targeted locus from the signal of inbreeding across the entire genome. A good candidate for this analysis is the “gait-keeper” mutation in the DMRT3 gene with a known major effect on altered gait characteristics, such as the paso gait of the PRPF1,3. Nevertheless, the large regions of homozygosity spanning across portions of entire chromosomes (Fig. 3) in these horses make selection tests based on population variation difficult to use30.
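To make the definition concrete, the following is a minimal sketch of how uninterrupted homozygous stretches could be located in one animal's genotypes along a single chromosome. It is not the pipeline used in this study: the function name `find_roh`, the 0/1/2 genotype coding, and the `min_snps` threshold are assumptions made for illustration, and production tools typically add minimum length and density thresholds and tolerate occasional heterozygous or missing calls.

```python
from typing import List, Tuple


def find_roh(genotypes: List[int], positions: List[int],
             min_snps: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for each run of homozygous SNPs.

    genotypes: alternate-allele count per SNP (0 or 2 = homozygous,
               1 = heterozygous), ordered along one chromosome.
    positions: base-pair coordinate of each SNP, in the same order.
    min_snps:  hypothetical minimum number of SNPs for a run to be reported.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have equal length")

    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run, if any.
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(genotypes) - start))
    return runs


example_genotypes = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 2]
example_positions = [100 * (i + 1) for i in range(len(example_genotypes))]
example_runs = find_roh(example_genotypes, example_positions)
```

In this toy input the two-SNP stretch between the heterozygous calls is discarded by the threshold, while the three-SNP and four-SNP stretches are reported. Such a scan does not by itself distinguish selection from inbreeding, which is why locus-specific tests are needed.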

The “gait-keeper” DMRT3 mutant allele (allele A, Ser301STOP) shows high frequency in many gaited breeds and breeds bred for harness racing, while other horse breeds were homozygous for the wild-type allele (allele C) in earlier studies1. It has also been reported at high frequencies in Northern European breeds (Table 1). For instance, it appears that selective breeding for lateral gaits in the Icelandic horse population could lead to the complete loss of the C-allele33. This mutation is not common in the Iberian horses and was only reported there once at low frequency in the Pura Raza Galega breed1 (Table 1). On the other hand, many horse breeds in the New World have this allele, possibly due to the admixture with other, non-Iberian breeds. The analysis of 152 Colombian Paso horses (most with phenotypic data) demonstrated selection on the DMRT3 gene can explain differences in horse gait in that breed34. On the other hand, a similar analysis in Mangalarga Marchador and the French Trotter horses shows that DMRT3, while associated with the trait, may not be the sole locus that controls the gait ability35,36.

The frequency of the DMRT3 mutant allele in the combined PRPF sample from this and the other studies is the highest reported among all horse breeds (Table 1). Remarkably, it was also present in the majority of the PRNPB; 142 out of the 143 genotyped PRNPB horses had at least one DMRT3 mutant allele (Table 1). This stands out in comparison to the other criollo horses reported in the literature that have a low frequency of the mutant “gait-keeper” allele (e.g., Brazilian, Venezuelan and Colombian, Table 1). These breeds also arose from the mixture of different Iberian breeds, including a strong influence of Portuguese breeds. Why, then, is the PRNPB different?
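A carrier count alone brackets, but does not fix, the allele frequency. The sketch below computes the bounds implied by the statement that 142 of 143 PRNPB horses carry at least one mutant allele; the function name is hypothetical, and the observed frequency itself is the one reported in Table 1.

```python
from typing import Tuple


def mutant_allele_freq_bounds(n_carriers: int, n_total: int) -> Tuple[float, float]:
    """Bound the mutant allele frequency in diploids from a carrier count.

    Lower bound: every carrier is heterozygous (one mutant copy each).
    Upper bound: every carrier is homozygous mutant (two copies each).
    """
    if n_total <= 0 or not 0 <= n_carriers <= n_total:
        raise ValueError("need 0 <= n_carriers <= n_total and n_total > 0")
    n_alleles = 2 * n_total
    lower = n_carriers / n_alleles
    upper = 2 * n_carriers / n_alleles
    return lower, upper


# Counts quoted in the text for PRNPB.
low, high = mutant_allele_freq_bounds(142, 143)
print(f"{low:.3f} <= frequency of allele A <= {high:.3f}")
```

With these counts the frequency lies between roughly 0.50 and 0.99, so the genotype breakdown in Table 1 is needed to place PRNPB within that range.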

In theory, alleles can achieve high frequency due to mechanisms other than selection. For example, genetic drift is expected to result in the fixation of most alleles over time or even instantaneously following the founder effect. To argue for the action of recent selection (selective breeding), the genomic neighborhood of the candidate allele must be evaluated in a formal test. Since the selection for this allele should have been fairly recent, not older than the historic horse arrival to Puerto Rico, selection can be tested using the extended haplotype homozygosity (EHH), population differentiation tests, or a combination of both approaches30.

Our reasoning was that, if this allele was favored in one or both of the Puerto Rican breeds, it would be associated with long haplotypes at high frequencies (EHH), typically representing recent selection37. Somewhat surprisingly, we did not observe any signatures of selection in the PRPF with genome-wide significance (Fig. 5), which means that there was no specific selection for this genetic variant in the purebred lines. In contrast, in PRNPB horses, there is a clearly selected region on chromosome 23 located very close to the DMRT3 locus.
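As an illustration of the statistic behind this reasoning, the sketch below computes EHH in its standard form: among chromosomes carrying a given core allele, the probability that two randomly chosen chromosomes are identical at every marker from the core out to a given distance. It assumes phased haplotypes coded as 0/1 and extends in one direction only; the function name and toy data are illustrative and do not reproduce the software used in this study.

```python
from collections import Counter
from math import comb
from typing import List, Sequence


def ehh(haplotypes: Sequence[Sequence[int]], core: int,
        core_allele: int) -> List[float]:
    """EHH from the core marker to each marker to its right.

    haplotypes:  phased chromosomes, each a sequence of 0/1 alleles.
    core:        index of the core marker.
    core_allele: allele (0 or 1) whose carriers are analysed.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes with the core allele")

    total_pairs = comb(n, 2)
    values = []
    for end in range(core, len(carriers[0])):
        # Group carriers by their extended haplotype from core to `end`.
        classes = Counter(tuple(h[core:end + 1]) for h in carriers)
        identical_pairs = sum(comb(c, 2) for c in classes.values())
        values.append(identical_pairs / total_pairs)
    return values


toy_haplotypes = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],  # carries the other core allele; ignored below
]
toy_ehh = ehh(toy_haplotypes, core=0, core_allele=1)
```

EHH starts at 1 at the core and decays as recombination and mutation break up the haplotype; a recently selected allele keeps EHH high over a longer distance than expected for its frequency. When nearly all chromosomes in a sample are identical across the region, there is little decay to contrast, which limits what the statistic can show.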

The major limitation of the selection tests based on haplotypes is that they do not perform well in genomes with low genetic diversity (where selected haplotypes are difficult to identify). Therefore, it is not surprising that the iHS test, a recombination-based test that uses only the variation within the specific horse breed, did not identify any selection signatures in PRPF. This would be expected when (a) there is almost no variation in the locus and (b) only a few variable markers exist on chromosome 23, undermining the performance of EHH30. A contrast of diversity and divergence would be a better approach, with the reasoning that the haplotypes containing selected loci should show more differences between diverging populations compared to the other loci genome-wide (see Materials and Methods). This is why we used XP-EHH and RSB tests, which combine EHH statistics with the degree of population differentiation. For the addition of a phylogenetically based outgroup reference in these comparisons, we used a combination of samples from breeds in the same lineage1 (Fig. S2).

Using genome-wide XP-EHH and RSB tests, we detected a strong signature of selection in PRNPB horses compared to the outgroup composed of three other breeds (Figs. 7A, 8A). Once more, this is a single selected region in PRNPB horses and is located on chromosome 23 next to the DMRT3 locus. Neither of the tests showed any selection signatures in PRNPB compared to PRPF (Figs. 4 and 5).

The addition of population differentiation has helped to identify several targets of selection in the PRPF genome compared to PRNPB and other horse breeds (Table S2, Figs. 7A, 8A). None of these candidate selection loci were located close to the candidate DMRT3 gene. However, at least some of them could be potential candidates with functions associated with horse gait selected in PRPF. Among these, the strongest signatures are located next to MYH7 muscle myosin on chromosome 1 and a prion protein PRNP on chromosome 22.

The human homolog of the MYH7 gene is known to be expressed in human ventricles as well as in skeletal muscle tissues rich in slow-twitch type I muscle fibers, where its expression correlates with the contractile velocity of the cardiac muscle and is altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. The human homolog of the PRNP gene may play a role in neuronal development and synaptic plasticity and be required for neuronal myelin sheath maintenance. Mutations in this gene have been associated with Creutzfeldt-Jakob disease, kuru, fatal familial insomnia, Gerstmann-Straussler disease, and Huntington disease-like 1. A list of all the selected targets is presented in Table S2, and a more detailed description of these and other genes selected in PRPF is given in Table S3.

These signatures may reflect other characteristics selected for in Puerto Rico: a long torso for a more comfortable ride, a thick, abundant mane, a long, elegant tail, and yellow eyes. In addition to the naïve genome-wide tests for selection described above, we considered a distinctive trait called “tiger-eye”, characterized by a bright yellow, amber, or orange iris, which has been favored by the Puerto Rican Paso Fino breeders. A recent study reported that most of the “tiger-eye” horses were either homozygous for one of the tiger-eye-associated alleles or compound heterozygotes8. We used our data to independently evaluate the presence of a signature of selection around the SLC24A5 gene in our PRPF lineages. While this analysis cannot be performed directly on our dataset, since the four markers from that study (BIEC2_60719, BIEC2_61330, UKUL310, and BIEC2-61972) were not included in our genotypes, these markers are located very close to (within 0.5 Mb of) a peak on ECA1 (centered on 141,514,807 bp, Figs. 7, 8 and Figs. S7, S8), indicating an instance of nearby selection that differentiates PRPF from PRNPB, as would be expected. Additional genotypes covering the region of the SLC24A5 gene as well as the phenotype data would be necessary to verify this finding.

In summary, we have shown that the PRPF and PRNPB horses are related, as their mitochondrial sequences are intertwined (Fig. 1). We then demonstrated that the “gait-keeper” mutation is almost as common in PRNPB as it is in PRPF (Table 1). Somewhat unexpectedly, we did not see any signatures of selection focusing on this gene in PRPF, but a strong signature associated with this gene was found in PRNPB (Figs. 7, 8). Given our current results, we propose that the most likely historic scenario is that PRPF is a distinct horse breed that has been selected from the local nonpurebred horses (PRNPB). The genetic pool of the PRNPB was likely a result of admixture between the horses historically imported to Puerto Rico from Spain and other regions of the Old World. Some of the founders of this pool must have originally brought the “gait-keeper” DMRT3 mutant allele (allele A, Ser301STOP) with them. Local farmers must have been selectively breeding for the mutant allele, and over several centuries, it has increased in frequency in the nonpurebred population of horses on the island. Consequently, the founders of PRPF were initially picked out from the existing PRNPB pool, but since the DMRT3 mutant allele was already nearly fixed, the selection in the purebred horses was focused on other genes that may or may not be associated with the paso gait, including MYH7, PRNP and others. In order to further validate our current hypothesis and to identify the specific functional mutations that have been selected by the PRPF breeders, a comprehensive phenotype-genotype analysis based on horse pedigrees and sequencing data from these candidate genes needs to be conducted.
