In this study, we de novo assembled the plum, Prunus mira, and Prunus davidiana genomes for the first time and improved the peach and apricot genomes by integrating single-molecule real-time (SMRT) long-read sequencing (PacBio), short high-quality Illumina paired-end sequencing, and Hi-C technology. First, we used SMRT reads (99−130 Gb, 396−520-fold coverage of estimated genomes, Supplementary Table 1) to assemble contigs, and we captured 244−276 Mb initial genome assemblies consisting of 122−315 contigs with an N50 ranging from 2.27 to 8.30 Mb (Supplementary Table 2). Second, the aforementioned SMRT reads and clean Illumina paired-end reads (55−59 G, 220−236-fold coverage of estimated genomes, Supplementary Table 3) were used to correct and polish the initially assembled contigs. Third, the initially assembled contigs were categorized and ordered into pseudochromosomes using Hi-C sequencing data (25−42 G, 100−168-fold coverage of estimated genomes, Supplementary Table 4). The resulting final genome size ranged from 240 to 267 Mb, 94−99% of the genome was anchored to 8 chromosome-scale scaffolds, with the N50 of scaffolds ranging from 27.79 to 31.53 Mb and with a gap number from 75 to 229 (Supplementary Table 5). Compared to a previously published Lovell peach assembly3 (~227 Mb of genome size), which consisted of 2,525 contigs with an N50 of ~250 kb (www.rosaceae.org), our final peach assembly consisted of 315 contigs with an N50 of ~4,640 kb and resulted in a reduction in the gap number from ~2,300 to ~140 in the final assembly. Therefore, our peach assembly showed an ~18-fold increase in the length of the contig N50 and ~16-fold fewer gaps than the Lovell assembly. Compared with the recently released apricot assembly27 (~220 Mb of genome size), which consisted of 444 contigs with an N50 of 1.02 Mb, our apricot assembly consisted of 122 contigs with an N50 of 3.31 Mb and resulted in a reduction in the gap number from 241 to 163. Our apricot assembly, therefore, showed an ~3.25-fold increase in the length of the contig N50 and ~1.48-fold fewer gaps compared with the recently published apricot assembly. Our five assembled Prunus genomes depicted high congruence because the strongest signals from the Hi-C data clustered at the expected diagonal (Fig. S1). Strong collinear relationships existed among Prunus genomes (Fig. S2), indicating that our pseudochromosomes derived from anchored and oriented contigs were of high quality. We also mapped the clean Illumina short DNA reads to their respective assemblies with a mapping ratio from 94 to 98%, which further supported the accuracy and completeness of the genome assembly. These high-quality genomes offer the opportunity to study the evolutionary relationships among genomes.
Genome annotation was based on full-length RNA-iso sequencing and homology-based and de novo prediction strategies. We obtained 25,333−27,826 protein-coding gene models in these five Prunus species (Supplementary Table 6). A total of 93.70−98.10% of the complete orthologs were detected in these assemblies based on the 1,375 Benchmarking Universal Single-Copy Orthologs (BUSCOs)28 (Supplementary Table 7). Based on these complete assemblies, we predict that 43.30−50.13% of the genomes were composed of repeat sequences (Supplementary Table 8). The most highly present mobile elements in these species were of the “unknown” type (Supplementary Table 9).
To investigate the evolutionary history of these Prunus species within Rosaceae, we constructed a maximum likelihood phylogenetic tree using these five Prunus species and six other representatives Rosaceae species, including apple (Malus domestica), woodland strawberry (Fragaria vesca), almond (Prunus dulcis), peach (Prunus persica), mei (Prunus mume), and sweet cherry (Prunus avium). From the results of gene family clustering, 248 single-copy orthologous genes were used for tree construction and species divergence time estimation. As the phylogenetic tree shows (Fig. S3), plum was placed as a sister species adjacent to apricot and mei. Furthermore, wild peach species were sister species with cultivars of peach, as expected. We estimated that plum diverged from the common ancestor shared with apricot and mei ~5 million years ago (MYA). According to our data, plum diverged from its common ancestor with peach, wild peach species, and apricot ~7 million years ago (MYA).
Characterization of the 417 peach accessions
In this study, we analyzed 417 worldwide peach accessions; the 159 publicly obtained accessions had different coverage depths (3x−100x), and the remaining 258 accessions were newly sequenced in this study. Furthermore, 182 of these 258 accessions were sequenced twice and merged at ~20x (each is ~10x), and the other 78 accessions were sequenced at ~10x. We called a total of 3,749,618 high-quality SNPs and 577,154 small indels from the 417 accessions. Additionally, we identified 31,800 DELs, DUPs, and INVs and 32,338 LIs from 326 deeply sequenced cultivars. We used 99,265 pruned SNPs for the phylogenetic tree, population structure, PCA, and LD decay analyses of the 417 accessions. From the phylogenetic tree (Fig. 1a), we grouped the 417 accessions into three subpopulations, wild, landrace, and improved accessions, which was further supported by PCA, which slightly separated most landraces and improved accessions (Fig. 1b), and a population structure plot (Fig. 1c). There was no clear population structure in the PCA plot when 18 wild accessions were excluded (Fig. 1b). With LD decay analyses, we found that wild accessions had the lowest LD value (~10k; r2 = 0.39), the improved accessions had the highest value (~50k; r2 = 0.39), and the LD values for the whole population were ~25k at r2 = 0.39. These values were comparable to those estimated in a previous study29 and in cultivated maize30 (22−30 kb) but far smaller than those estimated in rice10 (~167 kb) and cotton31 (~145.5 kb). The characterization of this GWAS population suggested that this population was suitable for performing GWASs according to its small LD decay value and lack of population structure. Using this population, we identified a number of candidate gene loci for key agronomic traits in peach (Table 1).
GWAS of fruit shape
Fruit shape in peach was generally classified into round and flat, and previous studies have shown that the flat shape was dominant to the round shape and regulated by a major gene mapped to the distal end of chromosome 632,33,34. Previous GWASs have discovered that some SNPs in chromosome 6 are closely associated with this trait16,17. In this study, we first performed a SNP-based GWAS and identified a SNP with the strongest association (chr6:28,973,642; A/T; P = 1.17e−35) (Fig. S4a). We later applied an SV-based GWAS approach to this trait. We identified a large inversion event (chr6:26,847,156; P = 1.35e−84; ~1.67 Mb in size) that showed the strongest association with this trait (Fig. 2a). The upstream end of this inversion event was located ~3 kb downstream of Prupe.6G290900 (Fig. 2c), which encodes an ovate family protein (OFP) whose ortholog in tomato was a key fruit shape controlling gene35. The downstream region of this inversion variant was located upstream of Prupe.6G323700, which encodes the activator subunit of the SNF1 complex; this gene regulates energy dynamic equilibrium in cells36. The upstream region of this inversion variant is ~2.13 Mb away from the top associated SNP identified using the SNP-based GWAS approach in this study and is ~0.08 Mb away from the top associated SNP identified by a previous study17. There were two main haplotypes in the SV-based GWAS panel based on this inversion variant (Fig. 2d). All accessions (n = 274) with Hap.1 (reference genome) had round-type fruit, while all accessions (n = 34) with Hap.2 had flat-type fruit. To validate whether this inversion event was responsible for the flat peach trait, we first analyzed the expression patterns of flanking genes during fruit development based on public RNA-seq data37. We found that Prupe.6G290900 expression was significantly higher in flat-type fruit than in round-type fruit at each time point (Fig. 2e), especially at 0 and 15 DAFB (Days After Full Bloom), which are critical stages for fruit shape determination. We further detected the expression of Prupe.6G290900 at 40 DAFB in 48 accessions. At the population level scale, the expression of Prupe.6G290900 in the flat-type population (n = 23) was higher than that in the round-type population (n = 25) (Fig. 2f). At the individual level, in each flat-type fruit (n = 23), the expression of Prupe.6G290900 was always higher than that in any of the round-type fruit accessions (n = 25) (Fig. 2g). The ratio of supported reads for the two alleles of Prupe.6G290900 was similar in round-type accession (50% v 50%), while it was significantly different (28% v 72%) in flat-type accession at 80 DAFB based on RNA-seq (Fig. 2h), suggesting an allele-specific expression pattern of Prupe.6G290900 in flat-type accessions. To validate this discovery, we overexpressed Prupe.6G290900 in round-type wild-type tomato (Fig. 2i), and we observed that in three independent overexpressing lines of Prupe.6G290900 (n = 3), all fruits were flatter than wild-type fruits (Fig. 2j, m) and were flatter at each sampled time during the fruit development process (Fig. 2k). The fruit shape index of these overexpressing lines was significantly smaller than that of the wild type (Fig. 2m), and the fruit shape index was negatively proportional to the relative expression of Prupe.6G290900 (Fig. 2l, m). Based on the locations of these two inversions, we designed two pairs of primers to genotype the 64 randomly selected accessions in our resource nursery. We found that all flat-type accessions (n = 32) showed an ~300 bp DNA product with the primer pair upstream of this inversion; however, none of the round-type accessions (n = 32) showed this ~300 bp DNA product (Fig. S5). Moreover, using a primer pair downstream of this inversion, we found that an ~500 bp PCR product could be amplified from all 32 flat-type accessions, and no DNA fragment could be amplified in any of the 32 round-type accessions (Fig. S6). This study showed that Prupe.6G290900, not Prupe.6G323700 (the expression level of this gene was not related to fruit shape, Fig. S7), was the best candidate gene for the flat-shape trait of peach. We hypothesized that a cis-element from the promoter of Prupe.6G323700 was transferred downstream of Prupe.6G290900 as an enhancer to activate its expression. We considered Prupe.6G323700 the best candidate gene involved in the premature abortion of fruits trait described in a previous study33, as it was linked with flat-type peach and was a recessive trait, considering the important role of this gene in energy sensors38.
GWAS on the nonacidity trait in peach fruit
The peach fruit taste is a key internal quality that is determined by complex factors. Low acidity is dominant to high acidity, and a major gene was mapped to the beginning of scaffold Pp05 in previous studies17,39. Although the SNP/indel and SV datasets used for GWAS were all analyzed (Fig. S8), we identified a small indel (chr5:628,841; P = 1.3e−8; T/TG) located upstream of the candidate Prupe.5G005400 gene that showed the strongest association with this trait, which is a different locus from the candidate reported by a previous study17 and encodes a switch subunit three protein. After careful analysis of the candidate interval, we identified a gene involved in fruit sugar content variance, as a sugar QTL and sugar/acidity QTL were previously reported in the same interval40,41,42. We identified a single SNP (chr5:720,760; G/T; P = 5.00e−3) in the third exon of Prupe.5G006300 (Fig. 3a), which encodes a sugar transport protein whose ortholog in Arabidopsis has been validated to control sugar transport43. This SNP leads to the conversion of acidic Q to H (Fig. 3c); this site was conserved from grass species to higher fruit-bearing plant species and from nematodes to humans (Fig. 3d). A protein crystal structure study of homologs in bacteria and mammals suggested that this site was one of the amino acids composing the binding site for sugar substrates44,45. The conversion of this Q site to A caused mostly functional loss of this gene and abrogated the sugar transport capacity of the cell in an in vitro experiment44,45. Thus, this SNP may lead to functional loss of Prupe.5G006300 and generate a null allele. We analyzed the expression pattern of this gene-based on public data37 and found that it was relatively more highly expressed at the fruit ripening stage in both varieties (65 DAFB) (Fig. 3e), suggesting its role in enhancing the peach fruit sugar content at the mature stage. When we studied the genotype frequency in the West and East groups, we found that in the West population (n = 26), only individuals with the GG genotype were selected (Fig. 3f); these individuals had two copies of the functional allele for high sugar content. In contrast, in the East population (n = 372), individuals with the GT genotype were preferably selected (Fig. 3f). As all accessions in the West population in this study showed normal acidity (pH < 4) and most accessions in the East population (79% in this study) were nonacidic (pH > 4), accessions with relatively high acidity and a high sugar content were selected in the West, and accessions (TT) with a low sugar content were selected elsewhere (Fig. 3f; <5%). The relative expression level of Prupe.5G006300 at 40 DAFB was similar across 48 peach accessions (Fig. 3g), suggesting that this gene is not differentially regulated at the transcriptional level.
GWAS on the fruit development period
The fruit development period and maturity date traits are key agronomic traits that determine the harvest and shipping time in peach production. Previous studies have shown that a major gene was mapped to the middle part of scaffold Pp04 in the peach genome based on linkage analysis46, and one NAC candidate gene, Prupe.4G186800, with nine base insertions in its last exon was proposed46. In the SNP-based GWAS approach, we identified SNPs with the strongest association signals (chr3:24,599,761; C/T; P = 8.00e−11) where no suitable candidate genes were identified for this trait (Fig. S9a). We further performed an SV-based GWAS to try to discover candidate genes. An ~400 bp DEL variant with the strongest association signal (chr4:11,127,010) was located ~10 kb upstream of Prupe.4G187100 (Fig. 4a,b), which encodes an NAC transcription factor whose ortholog in tomato was the NOR (non ripening) gene closely associated with fruit ripening47. A recent study confirmed the function of this gene in fruit ripening in peach48. This candidate gene was located on a different scaffold from the SNP with the strongest association discovered by standard SNP-based GWAS in this study (Fig. S7a); however, it was only ~10 kb from the strongest signal based on the SV-based GWAS (Fig. 4a, b). A further study suggested that this ~400 bp DEL variant is within a complex genome rearrangement interval. Haplotyping analysis suggested that there were three main haplotypes (Fig. 4c) based on the large structural variant. Hap. 1 is the reference sequence (~2 kb), and Hap. 2 is the rearrangement result of an ~400 bp DEL (right dashed line), another ~190 bp DEL (left dashed line), and an ~130 bp INS (two left rectangles), in contrast to the reference genome. This ~130 bp INS included an ~100 bp sequence (the middle yellow rectangle) identical to the flanking downstream sequence at the insertion site (the right yellow rectangle). Thus, Hap. 2 is ~1,550 bp in length. Hap. 3 is the tandem repeat of Hap. 2 and has an ~3 kb length (Fig. 4c). After analyzing 187 accessions with phenotypic data, we found that accessions with Hap. 1 (n = 47) had significantly longer fruit development periods than those with Hap. 2 (n = 58; P = 1.62 × 10−7) and Hap. 3 (n = 82; P < 2 × 10−16) (Fig. 4e). Accessions with Hap. 2 (n = 58) also had significantly longer fruit development periods than those with Hap. 3 (n = 82; P < 2 × 10−16) (Fig. 4e). We later detected the expression level of Prupe.4G187100 in 31 accessions at 40 DAFB at the population level (Fig. 4f). The expression level of this gene in the population with Hap. 1 (n = 5) was significantly lower than those in the population with Hap. 2 (n = 16; P = 0.0109) and Hap. 3 (n = 10; P = 4.56 × 10−4) (Fig. 4f). The expression level of this gene in the population with Hap. 2 (n = 16) was significantly lower than that in the population with Hap. 3 (n = 10; P = 9.46 × 10−8) (Fig. 4f). At the individual level, the expression pattern of Prupe.4G187100 was significantly different across 31 peach accessions (Fig. 4i); most accessions with a short fruit development period had the highest expression level, and most accessions with a long fruit development period had the lowest expression level, suggesting that this gene is regulated at the transcriptional level. However, some accessions, such as 4, 14, 20, 21, 22, and 23, showed an opposite trend. This suggested that other loci were involved in this trait, as it is a QTL controlled by many gene loci49. When grouping accessions based on haplotypes, we found that in accessions with Hap. 1, the expression level of this gene was the lowest; in accessions with Hap. 2, the expression level of this gene was the median; and in accessions with Hap. 3, the expression level of this gene was the highest (Fig. 4j). The expression pattern of Prupe.4G187100 during fruit development was analyzed based on public data37, and we found that the expression level of Prupe.4G187100 was very high at the fruit ripening stage (Fig. 4g; 65 DAFB), suggesting that this gene is closely associated with fruit ripening. Furthermore, using transcriptome and genome data of the same accession (n = 2; heterozygous Hap. 1/Hap. 3), we found a clear allele-specific expression pattern of Prupe.4G187100 at the mature fruit stage (Fig. 4h; 99% v 1%; 80 DAFB), suggesting that there is a cis-element regulating its expression. This complex structural rearrangement interval may be the best candidate for the underlying cause of this allele-specific expression pattern. In a previous study, a homozygous ~26.6 kb DEL located 700 bp upstream of Prupe.4G187100, which includes the complex structural rearrangement interval in the peach cultivar “Venus”, abolished fruit ripening50, further indicating that this complex structural rearrangement was associated with peach fruit development and maturity. The ortholog in the apple was also confirmed to be closely associated with the fruit development period in a previous GWAS51; this interval showed collinearity in apple, peach, apricot, and berry52. Thus, these genes may control fruit development using a conserved mechanism in these species. The GWAS results on the maturity date trait were the same as those on the development period trait (Fig. S10).
GWAS on the double flower trait
Double flowers with extra petals are important for artificial selection because of their attractive appearance and commercial value in several ornamental plants, such as peach. Two distinct loci were described as the underlying genetic causes for the double flower traits in peach. The first locus responsible for a recessive trait (Dl/dl) for double flowers was described by Lammerts53 and was mapped to chromosome 217,54. The second locus, identified as a single dominant gene (Dl2/dl2), which was first described by Beckman et al.55, was assigned to chromosome 6 and identified as a TOE-type AP2 gene. Deletion of the miR172 target site in this gene is responsible for the dominant double-flower trait in Rosaceae56,57. However, to date, the Dl gene is still controversial. In this study, we first performed SNP-based GWAS and identified the top associated SNP (chr6:21,609,914; A/G; 3.52e−40; Fig. S11), which was located on a different scaffold from the Dl gene, which was located on chromosome 2. We later performed SV-based GWAS and discovered that the top associated signal was in the promoter of Prupe.2G237700 (chr2: 25,860,343; 9.99e−83; Fig. 5a, b), which is a different gene locus from the candidate reported by a previous study17 and annotated as a 70 amino acid peptide without a functional domain. Using 1860 bp of genomic sequence as a query for blasting the NCBI nucleotide collection (nr/nt) dataset (blast.ncbi.nlm.nih.gov/Blast.cgi), we discovered that this gene is actually a noncoding RNA that transcribes the miR172d precursor (yellow-colored sequence in Fig. 5e). The strongest associated signal (chr2: 25,860,343) was only 333 bp from mature miR172d (Fig. 5d; blue-colored sequence in Fig. 5e). After analyzing all peach accessions with a double flower phenotype (n = 7), we identified two independent insertion events within the promoter of Prupe.2G237700 and a total of three main haplotypes based on a large insertion SV (Fig. 5d). Hap. 3 included the top associated signal (chr2: 25,860,343), which is an ~5.5 kb transposable element with an ~250 bp LRT (Fig. S12). Hap. 2 harbored an ~1.2 kb transposable element without an LRT (chr2: 25,860,413; ~263 bp from the mature miR172d) (Fig. S13). These two independent insertion events may prevent the transcription of the miR172d precursor and result in decreased levels of mature miR172d, and this decrease may lead to an increase in petal number, as the orthologs in other species and the target gene AP2 family were closely associated with petal number58,59,60.
GWAS on the nectarine trait
Nectarine in peach is a key agricultural character affecting both appearance and ecological adaptation. The nectarine trait is recessive to normal peach, and the major gene controlling this trait is the MYB gene Prupe.5G196100 in the middle of scaffold Pp056. A large insertion variant was discovered in the third exon of this gene and was shown to cause loss of function and the hairlessness phenotype6. To determine if this event was the only event responsible for all hairlessness phenotypes, we first performed a SNP-based GWAS and found that the top associated SNP (chr5:16,633,286; G/A; P = 4.90e−35) was located ~700 kb downstream of this gene locus (Fig. S14a), where no proper candidate genes could be identified. We next performed an LI-based GWAS approach and showed that the strongest associated signal (chr5:15,893,169; P = 3.93e−15) (Fig. S14d) was located within the third exon of this MYB gene (as previously reported6). We further analyzed this insertion event in all nectarine accessions and found that all nectarine accessions from Gansu Province did not have this large insertion variant at this site. To determine other variants responsible for hairlessness traits in accessions from Gansu Province, we first performed a SNP-based GWAS on this trait, but we defined other nectarine accessions with known LIs (large insertion structural variants) as hairy peaches. We identified one SNP with the strongest association signal (chr5:15,893,290; A/G; P = 2.15e−43) located in the third exon of the same gene, Prupe.5G196100 (Fig. 6a, c). Haplotyping analysis suggested that at least 3 haplotypes were responsible for the nectarine trait (Fig. 6d). Hap. 1 included the different and newly identified LI in the second exon of this gene in a landrace from Jiyuan city of Henan Province; Hap. 2 included the known LI in this study; and Hap. 3 included two newly identified SNPs for nectarine traits in accessions from Gansu Province. The SNP causing the conversion of the Q250 amino acid to R250 (Hap. 3) was located in a conserved position (Fig. 6e). Another observation indicated that Hap. 3 (G719G749) was the cause of the hairlessness trait: all accessions with only one copy of the insertion (Hap. 2) were normal peach, but an accession with one copy of the insertion (Hap. 2) and SNP749 (Hap. 3) was a nectarine. Hap. 3 was only present in nectarines originating from Gansu Province. These observations suggested that the hairlessness trait originated from at least three independent mutation events.
GWAS on the fruit flesh color trait
Flesh color (white or yellow) in peach affects fruit nutritional value and consumer preference. White color is dominant to yellow color. The candidate gene Prupe.1G255500 was considered responsible for the trait, and three independent mutation events were considered responsible for the origin of yellow flesh5,61. In the present study, we first performed a SNP-based GWAS to try to relate the top associated SNPs to this gene. However, we discovered that the top SNP (chr1:29,627,336; T/G; P = 4.21e−10) was ~3 Mb away from this gene (Fig. S15a). We later performed LI-based GWAS and identified the eighth strongest associated signal (chr1: 26,614,905; P = 1.21e−05) in the intron of Prupe.1G255500 (Fig. S15d), which was identified by a previous study5. A previous study suggested that the interval including the candidate gene was narrowed to ~500 kb (25,842,123−26,865,123) on linkage group 161; only the eighth strongest signal in this study was in this interval and thus discovered.
Read more here: Source link