Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species

  • Giovannoni, J. J. Genetic regulation of fruit development and ripening. Plant Cell 16, S170–S180 (2004).

  • Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355, 391–394 (2017).

  • Peralta, I. E., Spooner, D. M. & Knapp, S. Taxonomy of wild tomatoes and their relatives (Solanum sect. Lycopersicoides, sect. Juglandifolia, sect. Lycopersicon; Solanaceae). Syst. Bot. Monogr. 84, 1–186 (2008).


  • Rick, C. M. Perspectives from plant genetics: the Tomato Genetics Stock Center. In Genetic resources at risk: scientific issues, technologies, and funding policies. Proceedings of a symposium, American Association for the Advancement of Science annual meeting, San Francisco, California, USA, 16 January 1989 (Eds McGuire, P. E. & Qualset, C. O.) 11–19 (Genetic Resources Conservation Program, University of California, 1990).

  • Mutschler, M. A. et al. QTL analysis of pest resistance in the wild tomato Lycopersicon pennellii: QTLs controlling acylsugar level and composition. Theor. Appl. Genet. 92, 709–718 (1996).

  • Spooner, D. M., Peralta, I. E. & Knapp, S. Comparison of AFLPs with other markers for phylogenetic inference in wild tomatoes [Solanum L. section Lycopersicon (Mill.) Wettst.]. TAXON 54, 43–61 (2005).


  • Beckles, D. M., Hong, N., Stamova, L. & Luengwilai, K. Biochemical factors contributing to tomato fruit sugar content: a review. Fruits 67, 49–64 (2012).

  • The Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).


  • Hosmani, P. S. et al. An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps. Preprint at bioRxiv doi.org/10.1101/767764 (2019).

  • Lin, T. et al. Genomic analyses provide insights into the history of tomato breeding. Nat. Genet. 46, 1220–1226 (2014).

  • Aflitos, S. et al. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136–148 (2014).

  • Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161.e23 (2020).

  • Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).

  • Sherman, R. M. & Salzberg, S. L. Pan-genomics in the human genome era. Nat. Rev. Genet. 21, 243–254 (2020).

  • Della Coletta, R., Qiu, Y., Ou, S., Hufford, M. B. & Hirsch, C. N. How the pan-genome is changing crop genomics and improvement. Genome Biol. 22, 3 (2021).

  • Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022).

  • Yu, X. et al. Chromosome-scale genome assemblies of wild tomato relatives Solanum habrochaites and Solanum galapagense reveal structural variants associated with stress tolerance and terpene biosynthesis. Hortic. Res. 9, uhac139 (2022).

  • Bolger, A. et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat. Genet. 46, 1034–1038 (2014).

  • Schmidt, M. H.-W. et al. De novo assembly of a new Solanum pennellii accession using nanopore sequencing. Plant Cell 29, 2336–2348 (2017).

  • Wang, X. et al. Genome of Solanum pimpinellifolium provides insights into structural variants during tomato breeding. Nat. Commun. 11, 1–11 (2020).


  • Takei, H. et al. De novo genome assembly of two tomato ancestors, Solanum pimpinellifolium and Solanum lycopersicum var. cerasiforme, by long-read sequencing. DNA Res. 28, dsaa029 (2021).

  • Powell, A. F. et al. A Solanum lycopersicoides reference genome facilitates insights into tomato specialized metabolism and immunity. Plant J. 110, 1791–1810 (2022).

  • Khan, A. W. et al. Super-pangenome by integrating the wild side of a species for accelerated crop improvement. Trends Plant Sci. 25, 148–158 (2020).

  • Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  • Chen, J. et al. Tracking the origin of two genetic components associated with transposable element bursts in domesticated rice. Nat. Commun. 10, 1–10 (2019).


  • Stein, J. C. et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat. Genet. 50, 285–296 (2018).

  • Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176.e13 (2020).

  • Mu, Q. I. et al. Fruit weight is controlled by Cell Size Regulator encoding a novel protein that is expressed in maturing tomato fruits. PLoS Genet. 13, e1006930 (2017).

  • Mora-García, S. & Yanovsky, M. J. A large deletion within the clock gene LNK2 contributed to the spread of tomato cultivation from Central America to Europe. Proc. Natl Acad. Sci. USA 115, 6888–6890 (2018).

  • Yuste-Lisbona, F. J. et al. ENO regulates tomato fruit size through the floral meristem development network. Proc. Natl Acad. Sci. USA 117, 8187–8195 (2020).

  • Wellenreuther, M. & Bernatchez, L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).

  • Huang, K. & Rieseberg, L. H. Frequency, origins, and evolutionary role of chromosomal inversions in plants. Front. Plant Sci. 11, 296 (2020).

  • Xia, X. et al. Brassinosteroid signaling integrates multiple pathways to release apical dominance in tomato. Proc. Natl Acad. Sci. USA 118, e2004384118 (2021).

  • Vasav, A. P. & Barvkar, V. T. Phylogenomic analysis of cytochrome P450 multigene family and their differential expression analysis in Solanum lycopersicum L. suggested tissue specific promoters. BMC Genomics 20, 1–13 (2019).


  • Eshed, Y. & Zamir, D. An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141, 1147–1162 (1995).

  • Gamuyao, R. et al. The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. Nature 488, 535–539 (2012).

  • Zhang, Z. et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell 27, 1595–1604 (2015).

  • Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).

  • Ameur, A. Goodbye reference, hello genome graphs. Nat. Biotechnol. 37, 866–868 (2019).

  • Zhu, G. et al. Rewiring of the fruit metabolome in tomato breeding. Cell 172, 249–261.e12 (2018).

  • Darwin, S. C., Knapp, S. & Peralta, I. E. Taxonomy of tomatoes in the Galapagos islands: native and introduced species of Solanum section Lycopersicon (Solanaceae). Syst. Biodivers. 1, 29–53 (2003).


  • Peralta, I. E., Knapp, S. & Spooner, D. M. New species of wild tomatoes (Solanum section Lycopersicon: Solanaceae) from Northern Peru. Syst. Bot. 30, 424–434 (2005).


  • Bayer, P. E., Golicz, A. A., Scheben, A., Batley, J. & Edwards, D. Plant pan-genomes are the new reference. Nat. Plants 6, 914–920 (2020).

  • Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).

  • Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33, 623–630 (2015).

  • Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).

  • Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).

  • English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS ONE 7, e47768 (2012).

  • Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  • Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 1–11 (2015).


  • Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).

  • Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).

  • Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  • Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).

  • Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).

  • Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).

  • Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21, i152–i158 (2005).

  • Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 1–6 (2015).


  • Hoede, C. et al. PASTEC: an automatic transposable element classification tool. PLoS ONE 9, e91929 (2014).

  • Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.1–4.10.14 (2004).


  • Rice, P., Longden, I. & Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).

  • Ma, J. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. Proc. Natl Acad. Sci. USA 101, 12404–12410 (2004).

  • Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods Mol. Biol. 1962, 161–177 (2019).

  • Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).

  • Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  • Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  • Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  • Jia, H. et al. PASA: identifying more credible structural variants of Hedou12. IEEE/ACM Trans. Comput. Biol. Bioinformatics 17, 1493–1503 (2019).


  • Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  • Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).

  • Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).

  • Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 1–22 (2008).


  • Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  • Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).

  • Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  • Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).

  • Tang, H. et al. Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics 12, 102 (2011).

  • Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).

  • Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).

  • Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  • Särkinen, T., Bohs, L., Olmstead, R. G. & Knapp, S. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree. BMC Evol. Biol. 13, 214 (2013).

  • Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).

  • Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  • Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).

  • Chakraborty, M., Emerson, J. J., Macdonald, S. J. & Long, A. D. Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).

  • Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).

  • Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  • Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).

  • Sirén, J. et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 374, abg8871 (2022).


  • McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  • Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  • Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

  • Li, M.-X., Yeung, J. M. Y., Cherny, S. S. & Sham, P. C. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet. 131, 747–756 (2012).

  • Li, H. Scripts and codes used in the tomato super-pangenome paper (1.0). Zenodo doi.org/10.5281/zenodo.7396707 (2022).
