Genome resequencing approach for genotyping 306 sugar beet germplasm resources
In this study, we performed high-depth genome-wide resequencing of 306 sugar beet accessions using an Illumina HiSeq 2000 sequencer, obtaining 1977.12 Gb of sequencing data. The collection included 72 endemic accessions from Northeast China (Harbin), 114 endemic accessions from North China (Hohhot), 100 endemic accessions from Northwest China (50 from Urumqi and 50 from Shihezi), and 20 accessions from abroad. This population spans diverse ecotypes, geographical origins, and traits. After filtering the raw sequencing data (see Materials and Methods), the high-quality clean data were aligned to the sugar beet reference genome, RefBeet-1.2.2 (plants.ensembl.org/Beta_vulgaris/Info/Index). The resulting high-density, high-quality genotype data covered 566,181,630 bp with an average sequencing depth of 11× and a maximum depth of 22.5× per accession. The average alignment rate for the population samples was 96.43% ± 0.5%, with an effective localization rate of 74.8%. The average sequencing depth across the genome (excluding gap regions) was 12.2× ± 1.1×, and the genome coverage was 86% ± 1.8%. The high mapping and coverage rates support the reliability and quality of the sequencing data. This dataset allowed a more accurate assessment of genetic diversity and the identification of potential candidate genes associated with agronomic traits in sugar beet accessions.
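As a rough consistency check on the figures above, the raw per-accession depth can be estimated by dividing the total sequencing yield by the number of accessions and by the covered genome length. The sketch below uses only numbers quoted in the text. It assumes the yield is spread evenly across accessions and ignores read filtering and unmapped reads, so it should land near the reported average depth rather than match it exactly.

```python
# Values quoted in the text.
total_yield_bp = 1977.12e9       # 1977.12 Gb of sequencing data
n_accessions = 306
covered_genome_bp = 566_181_630  # bp covered by the genotype data

# Assumption (not from the source): yield is split evenly across accessions.
yield_per_accession_bp = total_yield_bp / n_accessions

# Raw depth = bases sequenced per accession / bases in the covered genome.
raw_depth = yield_per_accession_bp / covered_genome_bp

print(f"yield per accession: {yield_per_accession_bp / 1e9:.2f} Gb")
print(f"raw depth estimate:  {raw_depth:.1f}x")
```

The estimate comes out between 11× and 12×, which is in line with the average depths reported above.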
A total of 18,875,282 variant loci were detected, comprising 14,900,035 single nucleotide polymorphism (SNP) loci and 3,975,247 insertion/deletion (INDEL) loci. After filtering for a minor allele frequency (MAF) greater than 0.05, 8,258,753 SNPs were retained. Among them, 872,623 were located within 1 kb upstream of genes, and 512,777 variants were in exonic regions, resulting in 242,816 non-synonymous mutations and 269,961 synonymous mutations. The ratio of non-synonymous SNPs to synonymous SNPs was 0.899. A total of 2,961,245 variants were found in intronic regions, 3,309 variants were in splice sites, and 748,749 variants were upstream of one gene and downstream of another gene. Additionally, 12,439,140 variants were located in intergenic regions (Table 1).
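The MAF filter and the non-synonymous to synonymous ratio described above can be illustrated with a minimal sketch. The function names and the genotype encoding (alternate-allele counts of 0, 1, or 2 per diploid accession) are assumptions made for illustration; the source does not describe the software or encoding used for filtering.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one biallelic SNP.

    genotypes: alternate-allele counts per diploid accession (0, 1, or 2);
    None marks a missing call and is skipped. (Hypothetical encoding.)
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(genotypes, threshold=0.05):
    """Keep a SNP only if its MAF is strictly greater than the threshold."""
    return minor_allele_frequency(genotypes) > threshold


# Counts reported in the text.
non_synonymous = 242_816
synonymous = 269_961
ratio = non_synonymous / synonymous
print(f"non-synonymous / synonymous = {ratio:.3f}")
```

With the reported counts, the ratio rounds to 0.899, matching the value given in the text.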
Population structure analysis of 306 sugar beet germplasm resources in different production areas
We investigated the phylogenetic relationships among 306 germplasm resources by genome-wide SNP analysis. Phylogenetic trees were constructed for the resources from the four major production areas using the neighbor-joining (NJ) method, and the diversity of genotypes among these materials was assessed. The 306 accessions were divided into four groups based on the phylogenetic tree and genetic distance (Fig. 1A). The first group of 105 materials was dominated by introduced materials from Hohhot and Europe, with 91 from Hohhot and 14 from Europe. The second group, comprising 24 materials, consisted mostly of samples from Hohhot and Urumqi, with ten from Hohhot, 12 from Urumqi, and two from Shihezi. The third group included 86 materials, with 16 from Harbin, 14 from Hohhot, 26 from Urumqi, 24 from Shihezi, and three each from Europe and the United States. The fourth group, containing 91 materials, was mainly from Harbin but also included samples from Shihezi and Urumqi, with 57 from Harbin, 23 from Shihezi, and 11 from Urumqi (Fig. 1B).
Population structure analysis of 306 sugar beet accessions by subpopulation and production area. (A) Phylogenetic tree constructed for all sugar beet accessions. Groups 1–4 are indicated by different colors, and materials from different production areas are indicated by different shapes. (B) Production area source statistics for each subpopulation. (C) Mixed ancestry analysis for sugar beet subpopulations. Each color represents an ancestral component. K is set from 2 to 5 to track different ancestral components. (D) Principal component analysis of the first two eigenvectors for all sugar beet accessions. Materials from different production areas are shown in different shapes, while subpopulations are shown in different colors.
We conducted a structure analysis of the 306 sugar beet samples using Admixture software to gain insight into the genetic makeup and relationships of the germplasm resources. Each column in the plot represents one individual, and the length of each coloured segment indicates the proportion of the individual’s genome derived from one ancestral component (Fig. 1C). When K = 2, the major ancestral component (yellow) of group 4 splits, indicating the highest level of selection in group 4. When K = 3, a new ancestral component (red) appears, and two ancestral components, red and yellow, dominate in group 4. At K = 4, the optimal number of ancestral groups, the yellow component dominates in group 4, which has relatively homogeneous ancestry and a simpler genetic background than the other three groups. In contrast, the ancestral origins of groups 1, 2, and 3 were relatively diverse: the yellow, green, and blue components were relatively balanced in group 1; blue was the most abundant in group 3, followed by green; and the red and yellow components were dominant in group 2.
Principal component analysis (PCA) showed that some samples from group 2 were clearly separated from the other three subgroups in the upper left corner; groups 1 and 3 both radiated to the lower left; and group 4 samples were more centrally distributed, with only a few samples deviating from the other three subgroups and diverging to the lower right. With respect to the different production areas, samples from the Harbin region were tightly clustered. In contrast, samples from the Hohhot region showed the greatest dispersion, and the few samples from the Urumqi production area showed significant separation (Fig. 1D). Overall, however, the materials from the four regions did not show significant segregation and did not form clearly differentiated populations.
The above analysis showed that materials from different subpopulations and production areas had similar genetic backgrounds, indicating a high degree of gene exchange and interpenetration between resource materials within subpopulations. The analysis also indicates that since the introduction of sugar beet germplasm resources in China, some resource materials from the Hohhot area have been domesticated for many years, resulting in segregation from materials from other production areas, and materials from the Harbin area have a relatively homogeneous genetic background. This observation suggests that in China, sugar beet, as an exotic species, has a simple genetic background and low genetic diversity due to the lack of wild resources. After its introduction to China, sugar beet had a short reproductive period and underwent frequent genetic exchange between materials from different regions without forming independent subgroups with regional characteristics.
Genetic diversity and selective scanning analysis
The results of linkage disequilibrium (LD) analysis showed that the LD decay distances of the four subpopulations were not significantly different, with G2 > G1 > G3 and G4 (Fig. 2A). Further analysis of the LD decay distances of the four production areas indicated that Hohhot had the largest distance, followed by Urumqi, Harbin, and Shihezi (Fig. 2B). These results suggest that the materials in the Hohhot region have experienced a higher degree of domestication and selection and greater selection intensity and have lower genetic diversity than those in the other three production areas. The results also show that the LD of materials in the Hohhot region is higher than that in the other three production areas.
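The text does not state which LD statistic underlies the decay curves. A common choice is r², which for unphased data is often approximated as the squared Pearson correlation between genotype dosages at two SNPs. The sketch below shows that approximation; the function name and the 0/1/2 dosage encoding are assumptions for illustration, not the authors' pipeline.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs' genotype dosages.

    snp_a, snp_b: alternate-allele counts (0, 1, or 2) for the same
    accessions, in the same order. (Hypothetical encoding.)
    """
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no LD information.
        return 0.0
    return cov * cov / (var_a * var_b)


# Toy data: the two SNPs are perfectly coupled, so r2 is 1.
print(genotype_r2([0, 1, 2, 0, 1], [0, 1, 2, 0, 1]))
```

An LD decay curve is then obtained by averaging r² over SNP pairs binned by physical distance; slower decay (a larger decay distance) means LD extends over longer stretches of the genome.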
Genetic diversity analysis and putative selective regions of sugar beet resources from different subpopulations and production areas. (A) LD decay plots for sugar beet subpopulations. (B) LD decay plots for Harbin, Hohhot, Shihezi, and Urumqi. (C) IBS value distribution for Harbin, Hohhot, Shihezi, and Urumqi. (D) IBS value distribution for sugar beet subpopulations. (E) Comparison of θπ values for sugar beet subpopulations. (F) Comparison of θπ values for Harbin, Hohhot, Shihezi, and Urumqi. (G) Comparison of Fst values among Harbin, Hohhot, Shihezi, and Urumqi. (H) Landscape of sugar beet genetic diversity across the whole genome. (a) Chromosomes. (b) Density of genes. (c) Density of SNPs (red). (d) LD value distribution for Group1 (green), Group2 (orange), Group3 (blue), and Group4 (yellow). (e) LD value distribution for Harbin (yellow), Hohhot (green), Shihezi (purple), and Urumqi (blue). (f) Tajima’s D value distribution for Group1 (green), Group2 (orange), Group3 (blue), and Group4 (yellow). (g) Tajima’s D value distribution for Harbin (yellow), Hohhot (green), Shihezi (purple), and Urumqi (blue). (h) θπ value distribution for Group1 (green), Group2 (orange), Group3 (blue), and Group4 (yellow). (i) θπ value distribution for Harbin (yellow), Hohhot (green), Shihezi (purple), and Urumqi (blue). LD, linkage disequilibrium; IBS, identity-by-state; SNP, single-nucleotide polymorphism.
Identity-by-state (IBS) analysis measures the concordance of all genetic markers between individuals and thus reflects their degree of relatedness. This analysis revealed that group 2 had the smallest IBS values among the four subgroups, while groups 1, 3, and 4 were similar (Fig. 2C). The average IBS values in the Hohhot production area were slightly higher than those in the other three production areas, which were very similar to one another. Group 2 had the largest LD decay distance but the smallest mean IBS value (Fig. 2D). Among the four production areas, IBS values and LD decay distances showed the same trend.
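To make the IBS measure concrete, the sketch below computes a pairwise IBS similarity as the average proportion of alleles shared between two accessions across loci. This is one standard definition; the function name and the 0/1/2 genotype encoding are assumptions, and the source does not specify the exact tool or formula used.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Proportion of alleles shared identical-by-state between two accessions.

    genotypes_a, genotypes_b: alternate-allele counts (0, 1, or 2) at the
    same loci; loci where either call is None are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        # Two diploid genotypes share 2, 1, or 0 alleles at a locus.
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no loci with calls in both accessions")
    return shared / (2 * compared)


# Toy example: the accessions differ by one allele at one of three loci.
print(ibs_similarity([0, 1, 2], [0, 2, 2]))
```

Values close to 1 indicate nearly identical genotypes, so a lower mean IBS within a group points to greater differences among its members.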
Genetic diversity within a population can be measured using the θπ value (nucleotide diversity), which represents the average number of nucleotide differences per site between any two sequences sampled from the population. This value is useful for understanding the genetic variation within a group and can provide information and guidance for breeding programs. The θπ values of the four subpopulations and of the four production areas did not differ significantly, indicating that genetic diversity was similar among materials within different subpopulations and different production areas (Fig. 2E, F).
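A minimal sketch of how θπ can be computed for one genomic window is given below. It counts, for each biallelic SNP, the fraction of chromosome pairs that differ, sums over SNPs, and divides by the window length. The function name, argument names, and the window-based setup are assumptions for illustration; the source does not give the window size or software used.

```python
def theta_pi(alt_allele_counts, n_chromosomes, window_length_bp):
    """Per-site nucleotide diversity for one genomic window.

    alt_allele_counts: alternate-allele count at each biallelic SNP
    n_chromosomes: number of sampled chromosomes (2 x diploid accessions)
    window_length_bp: number of sites in the window, invariant ones included
    """
    if n_chromosomes < 2 or window_length_bp <= 0:
        raise ValueError("need at least 2 chromosomes and a positive window")
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2
    total = 0.0
    for count in alt_allele_counts:
        # A pair differs at this site when exactly one of the two
        # chromosomes carries the alternate allele.
        differing_pairs = count * (n_chromosomes - count)
        total += differing_pairs / n_pairs
    return total / window_length_bp


# Toy window: 3 SNPs in 1,000 bp, sampled from 10 chromosomes.
print(theta_pi([1, 5, 9], n_chromosomes=10, window_length_bp=1000))
```

Because invariant sites contribute zero, a window with few or low-frequency SNPs yields a small θπ, which is how reduced diversity shows up in a genome-wide scan.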
The fixation index (Fst) was calculated to represent the genetic differentiation between populations. Among the four subpopulations, G3 and G4 had the lowest Fst value, indicating little differentiation and a close genetic distance, whereas G2 and G4 had the highest value, indicating the greatest degree of differentiation between these two subgroups (Fig. 2G). Fst values were also calculated among the four production areas. The highest Fst value was found between Harbin and Hohhot, indicating that the resource materials in these two areas were the most differentiated. Compared with the values among the other three areas, the Fst values between Hohhot and each of those areas all indicated a considerable degree of differentiation. Harbin and Urumqi had the lowest Fst value, suggesting that these two areas had the least differentiation, the closest genetic distance, and the most frequent gene exchange.
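The idea behind Fst can be shown with the classical single-locus definition, Fst = (H_T − H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. The sketch below applies it to two populations with equal weights. This is an illustrative simplification: the source does not state which estimator was used, and genome-wide analyses typically rely on sample-size-corrected estimators averaged over windows.

```python
def wright_fst(p1, p2):
    """Single-locus Fst for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                     # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        # The locus is fixed for the same allele in both populations.
        return 0.0
    return (h_total - h_within) / h_total


print(wright_fst(0.5, 0.5))  # identical frequencies: no differentiation
print(wright_fst(0.2, 0.8))  # divergent frequencies: higher Fst
```

Fst ranges from 0 (identical allele frequencies) to 1 (populations fixed for different alleles), so a higher value between two production areas indicates stronger differentiation.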
LD, Tajima’s D, and Fst are effective metrics for studying genetic variation, and in this study, they were employed in pairs to investigate the genetic diversity of all materials, as shown in Fig. 2H, which illustrates the genome-wide genetic diversity of sugar beets.
Phenotypically associated loci and genes identified using GWAS
We conducted a GWAS of 26 agronomic traits in 306 sugar beet accessions from four geographical locations (Harbin, Hohhot, Urumqi, and Shihezi) in three major ecological regions (Northeast, North, and Northwest China). Relationships among traits were assessed by calculating pairwise Pearson correlation coefficients (Supplementary Figure S1); root yield, sucrose content, and sugar yield were also statistically analyzed (Supplementary Table 1). The results showed that root yield and sugar yield were positively correlated, with the highest correlation coefficient of 0.81. Plant height was positively correlated with both flesh coarseness and root yield, with correlation coefficients greater than 0.5. Four trait pairs were negatively correlated, with correlation coefficients less than −0.5: flourishing growth vigour and plant height, plant height and crown size, cotyledon leaf area and sucrose content, and seedling growth vigour and damping-off.
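The trait correlations above are Pearson coefficients, which can be computed as in the sketch below. The function name is illustrative and the trait values are made-up numbers used only to show the call; they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait values for five accessions (not from the study).
root_yield = [52.0, 61.5, 48.3, 70.2, 65.0]
sugar_yield = [8.1, 9.4, 7.9, 10.8, 9.6]
print(f"r = {pearson_r(root_yield, sugar_yield):.2f}")
```

Coefficients near +1 or −1 indicate strong positive or negative linear relationships, which is the sense in which thresholds such as 0.5 and −0.5 are used in the text.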
We obtained 8,358,753 SNPs for subsequent analysis by filtering for a minor allele frequency (MAF) greater than 0.05. The GWAS of the 26 agronomic traits was performed using the Genomic Association and Prediction Integrated Tool (GAPIT) with a mixed linear model (MLM) fitted by maximum likelihood, and Manhattan and QQ plots were generated (Supplementary Figures S2–S27). In total, 3,904 associated genes were identified, and the distribution of each trait on different chromosomes is shown in Fig. 3. The phenotype ID cross-reference table is shown in Table 2.
Genes associated with skin roughness
Primary roots with a smooth epidermis reduce soil carryover at harvest, and the smooth epidermis also acts as a barrier that prevents bacteria and viruses from invading the primary roots [23]. In this study, a total of 599 SNP loci associated with skin roughness were identified, with 16 loci being associated (Fig. 4) and 14 loci ultimately annotated. The two genes with the strongest association signals among the 14 annotated genes were DFAX2 and P5CS (Table 3), with the highest −log10(P) of 9.17703. Both genes are located on chromosome 2 and share a common mutation site at position 2,312,199 in the intergenic region. Gene annotation revealed that DFAX2 encodes the defensin-like protein AX2 and P5CS encodes δ-1-pyrroline-5-carboxylate synthase. In subsequent qRT-PCR validation, the expression of DFAX2 was significantly higher in the rough-epidermis material than in the smooth-epidermis material. The expression of P5CS also showed a significant change.
Genes associated with sugar yield
Sugar yield is a critical trait for sugar beet growers; it is calculated from the root weight per hectare and the root sucrose percentage (°S) [24]. Previous studies on sugar beet have consistently demonstrated a strong negative correlation between sucrose content and root yield. Several factors in sugar beet can affect yield (e.g., mass/area) and physiological components (e.g., the proportion of total mass). For this reason, the present study treats the composite trait of sugar yield as a means to investigate yield quality. For the sugar yield trait, a total of 10 candidate genes were identified. After further analysis, three genes, FRO5 (BVRB_3g053570), GL24 (BVRB_3g053550), and PPR91 (BVRB_1g005550), two on chromosome 3 and one on chromosome 1, were found to be associated with both root yield and sugar yield (Fig. 5A, Table 4). The FRO5 mutations were located downstream of the gene, the GL24 mutations in the intergenic region, and the PPR91 mutations in the 5′ UTR. Gene annotation revealed that FRO5 encodes a ferric reduction oxidase involved in iron reduction, GL24 is a member of germin-like protein subfamily 2, and PPR91 corresponds to At1g62670, a mitochondrial pentatricopeptide repeat-containing protein. Functional mutations in FRO5 and GL24 yielded three haplotypes, AA, AG, and GG, for each gene. GG mutations were primarily concentrated in the breeding materials of Urumqi and belonged to group 4. Correlation analysis by phenotype showed that the correlation coefficient between root yield and sugar yield was 0.81 (Fig. 5C), indicating a significant positive correlation. In haplotype comparisons, both AA and AG haplotypes showed significant or highly significant correlations. Functional PPR91 gene mutations, i.e., CC and CT, showed a significant correlation (Fig. 5B).
Following qRT-PCR validation, the expression of two genes, FRO5 and GL24, was found to decrease with decreasing root yield and sugar yield, whereas the expression of the PPR91 gene increased considerably with decreasing root yield and sugar yield.
Three genes related to sugar beet root yield and sugar yield identified by GWAS. (A) Manhattan plot of root yield and sugar yield and the candidate genes FRO5, GL24, and PPR91. (B) Distribution of the two traits associated with sugar yield for the haplotypes of PPR91, GL24, and FRO5. *P < 0.05; **P < 0.01; ns, not significant. (C) Phenotype correlation of traits related to root yield and sugar yield.
POLX was associated with sugar yield
The five traits associated with sugar yield (flourishing growth vigour, plant height, crown size, flesh coarseness, and sugar yield) shared a common gene, POLX (BVRB_1g140620), located on chromosome 6 (Fig. 6A). Mutations occurred at two positions, 21,650,496 and 21,669,194, from G to A (Supplementary Table 2). The relationship between the two functional mutations was further analysed, with four mutations located upstream and one in the intronic region (Table 5). Phenotypic correlations were observed: flourishing growth vigour and crown size, plant height and flesh coarseness, plant height and sugar yield, and flesh coarseness and sugar yield showed positive correlations, while flourishing growth vigour and plant height, flourishing growth vigour and flesh coarseness, plant height and crown size, and crown size and flesh coarseness exhibited negative correlations (Fig. 6C).
POLX was associated with sugar yield traits. (A) Manhattan and QQ plots of candidate genes related to flourishing growth vigour, plant height, crown size, flesh coarseness, and sugar yield. (B) Distribution of the five traits associated with sugar yield for the haplotypes of POLX. **P < 0.01; ***P < 0.001; ****P < 0.0001; ns, not significant. (C) Phenotype correlation of the five traits associated with sugar yield.
The results reveal that the functional mutations formed three haplotypes, AA, GA, and GG, for each gene. Comparisons of haplotypes and phenotypes separately showed that for flourishing growth vigour and plant height, the phenotypes of GA and GG displayed a highly significant correlation; for crown size and flesh coarseness, AA, GA, and GG all exhibited a highly significant correlation; and for sugar yield, the phenotypes of GA and GG exhibited a significant correlation (Fig. 6B). Following qRT-PCR validation, the results showed that the expression of POLX changed significantly with different haplotype phenotypic changes.
Construction of phenotype-gene networks using central gene modules and multiple genes
Through an in-depth exploration of the GWAS results, we identified instances where one trait was associated with multiple genes, and one gene was linked to multiple traits. Furthermore, we discovered complex networks between various phenotypes and genes due to extensive protein-level interactions among genes. To gain further insights, we performed a functional mutation-based haplotype test to construct a protein–protein interaction (PPI) network map and a phenotype-gene network map, which included 14 traits and 256 annotated pleiotropic genes (Figs. 7, 8). We categorized these traits into six categories: seedling traits, morphological traits, root traits, yield quality traits, root rot resistance, and economic type (Supplementary Table 3). Traits within the same category showed close links throughout the network.
In this genetic network, the six trait categories were linked by 14 central nodes containing 256 genes, with the largest central node being a root trait. In the PPI network diagram, the genetic network was dominated by ten major genes that linked different pleiotropic genes. The gene PPI network diagram allowed the identification of the pleiotropic genes of the major nodes. The combination of phenotypic associations and protein interactions with these primary genes provided a basis for identifying important candidate genes. This comprehensive information could be utilized to develop targeted breeding strategies, enhance the efficiency of genetic selection, and accelerate the process of cultivar development.