Crossing design shapes patterns of genetic variation in synthetic recombinant populations of Saccharomyces cerevisiae

Population creation

All yeast strains used in this study originated from heterothallic, haploid, barcoded derivatives of the SGRP yeast strain collection30. A subset of 12 of these haploid strains, originally isolated from distinct geographic locations worldwide, was used to create the synthetic populations we describe here (see Supplementary Fig. S1 for phylogeny). These 12 strains were all genetically modified as described in detail by Linder et al.23 to enable easy crossing and diploid recovery; these modified strains were kindly provided by Anthony D. Long (UC Irvine) in 2017. Briefly, these strains were modified so that MATa and MATα strains both contain ho deletions to prevent mating-type switching, but each contains a different drug-resistance marker in a pseudogene (YCR043C) tightly linked to the mating type locus (MATa, hoΔ, ura3::KanMXbarcode, ycr043C::NatMX and MATα, hoΔ, ura3::KanMXbarcode, ycr043C::hphMX). These genotypes enable haploids of each mating type to be recovered using media supplemented with either hygromycin B or nourseothricin sulfate, and they enable newly mated a/α diploids to be recovered in media supplemented with both drugs.

Two different crossing strategies were used to create genetically diverse populations using 4, 8, and 12 strains as founders (Table 1). “K-type” populations (named for what we call the “kitchen sink method”, or the practice of pooling isogenic strains together without careful focus on representation) were created by simply pooling equal volumes of saturated overnight cultures of the respective haploid founders and allowing those cells to mate. To accomplish this, single colonies of each haploid founder strain were sampled and grown overnight (at 30 °C/200 rpm) in 1 mL of rich media consisting of 1% yeast extract, 2% peptone, and 2% dextrose (YPD). After ~24 h, cultures were washed in fresh YPD media, pooled with the other relevant overnight cultures in a 50 mL conical tube, and vortexed; the mixed cultures were then allowed to settle and mate for 90 min at room temperature. These cultures were then transferred in 200 μL aliquots to agar plates containing 100 mg/mL nourseothricin sulfate (“NTC”), 300 mg/mL hygromycin B (“hyg”), and 200 mg/mL G418; this strategy ensured that only newly mated diploids would grow. The resulting lawns of mated diploid cells were collected by scraping with a sterile glass slide into fresh YPD media. This “cell bank” was archived as a frozen stock at −80 °C for each of the K-type populations made with 4, 8, and 12 haploid founders (K4, K8, and K12, respectively).

Table 1 Strains used to create the synthetic populations featured in this study. Strains are arranged in the rows in the table to indicate which specific pairs were crossed in the S-type populations.

“S-type” populations (named S due to the manipulation of spores to achieve better representation of founder genotypes) were built to mimic more careful crossing designs in which founding lines are crossed in pairs and/or a round-robin. To accomplish this, each haploid strain was paired with a different strain of the opposite mating type and mated as described above. Successful diploid colonies were isolated, grown overnight in 1 mL of YPD, washed and resuspended in sporulation media (1 mL 1% potassium acetate), then cultured for 72 h at 30 °C/200 rpm. Tetrads from these diploid cells were dissected using a SporePlay dissecting microscope (Singer). The four meiotic products (spores) were then collected, allowed to grow for 2 days, and replica plated to plates containing either NTC or hyg to verify the proper segregation of drug resistance markers and thus mating types. Once validated, the meiotic products were grown overnight in 1 mL of YPD. Overnight cultures were standardized to the same optical density (OD600) before being pooled in equal volumes in a 50 mL conical tube. Populations were given 90 min to mate at room temperature, then were plated on agar plates supplemented with NTC, hyg, and G418 so that only newly mated diploid cells could grow. The resulting lawns of mated diploid cells were collected by scraping with a sterile glass slide into fresh YPD media. This “cell bank” was archived as a frozen stock at −80 °C for each of the S-type populations made with 4, 8, and 12 founders (S4, S8, and S12, respectively; see Supplementary Fig. S2 for crossing schematics).

Population maintenance and 12 cycles of outcrossing

After the creation of the 3 “K-type” and 3 “S-type” synthetic recombinant populations described above, all populations were taken through 12 consecutive cycles of intentional outcrossing; in other words, the populations were subjected in parallel to a series of steps that induced regular sporulation, spore isolation, and mating. Detailed methods are described by Burke et al.31. Briefly, newly mated diploid cells from the last step of the population creation protocol (i.e. the “cell banks”) were grown overnight in 10 mL YPD media. These cultures were washed and resuspended in sporulation media, and incubated with shaking for 72 h (30 °C/200 rpm). Cells were then subjected to several treatments to disrupt asci and isolate/randomize spores, including incubation with Y-PER yeast protein extraction reagent (Thermo) to kill vegetative diploids, digestion with 1% zymolyase (Zymo Research) to weaken ascus walls, as well as high-speed shaking with 0.5 mm silica beads (BioSpec) to mechanically agitate the asci. After these steps, spores were resuspended in 10 mL YPD and allowed to settle and mate for 90 min at room temperature. Diploids were recovered as described above; cultures were transferred to 10 individual YPD agar plates supplemented with NTC/hyg/G418 in 200 μL aliquots and incubated at 30 °C for 48 h. The resulting lawns of mated diploid cells were collected by scraping with a sterile glass slide into fresh YPD media. This “cell bank” was again sampled for archiving at −80 °C, and used to initiate an overnight culture for the next outcrossing cycle. We estimate that 15–20 asexual generations occurred between every outcrossing cycle of the experiment. Based on counting colonies from dilutions of cultures plated at various benchmarks during the protocol, we expect that 7.5–11 generations elapse during the overnight culture in YPD media, and another 7.5–11 generations elapse during the period of diploid recovery on agar plates. Thus, a minimum of 15 × 12 = 180 cell doublings likely took place over the 12 cycles of outcrossing in each synthetic recombinant population.

Genome sequencing and SNP identification

Each of the recombinant K- or S-type populations was sequenced at three specific timepoints: initially (we also call this timepoint “cycle 0”), after 6 cycles of outcrossing (“cycle 6”), and after 12 cycles of outcrossing (“cycle 12”). We also sequenced each haploid founder strain so that we could estimate the relative contributions of each to the recombinant populations. Each of the founding SGRP strains was plated as a haploid on plates containing either NTC or hygromycin to verify the presence of the appropriate drug resistance markers. Individual colonies were isolated from each strain for verification of identifying barcodes at the URA3 locus using Sanger sequencing (Cubillos et al.30 provide barcode and primer sequences). Once validated, single colonies were again isolated for whole-genome sequencing. One milliliter of YPD media was inoculated with a single colony and grown overnight, and gDNA was extracted from the resulting culture using the Qiagen Puregene Yeast/Bact. Kit. Purified gDNA from each haploid founder was then prepared for sequencing using the Nextera DNA Sample Preparation Kit (Illumina). Some minor modifications to the manufacturer’s protocol were implemented to optimize throughput (cf. Baym et al.32). Genomic DNA libraries were prepared for experimental recombinant populations in the same way and all samples were pooled to generate a single multiplexed library. Because the recombinant (i.e. genetically variable) populations require significantly higher coverage to accurately estimate allele frequencies at variable sites, these populations were added to the library at 10X the molarity of each haploid founder sample. The multiplexed library was run on two SE150 lanes on the HiSeq3000 at the OSU Center for Genomic Research and Biocomputing (CGRB). Data for the S4 populations were previously published in Burke et al.31 and raw fastq files are available through NCBI SRA (BioProject ID: PRJNA678990). Raw fastq files for all other populations are available through NCBI SRA (BioProject ID: PRJNA732717).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We used GATK v4.0 (refs. 33, 34) to align raw data to the S. cerevisiae S288C reference genome (R64-2-1) and create a single VCF file for all variants identified across all replicate populations, using standard best practices workflows and default filtering options. We also downloaded and indexed a reference VCF file with SNP information for a number of distinct natural isolates of S. cerevisiae35; this is a recommended best practice for calibrating base quality with GATK v4.0. The VCF file of identified variants was converted into a SNP frequency table by extracting the AD (allele depth) and DP (unfiltered depth) fields for all SNPs passing quality filters; the former field was used as the minor allele count and the latter was used as the total coverage. The Python scripts used to generate and convert VCF files to tables suitable for downstream analyses in R (www.R-project.org) are available through GitHub (see Data Availability statement for details on where to find all major scripts used to process and analyze data).
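The conversion from VCF records to a count table can be illustrated with a minimal sketch. This is not the authors' script: the function names, the sample names, and the example record are hypothetical, and the sketch assumes biallelic SNPs whose FORMAT column contains AD and DP, taking the second AD entry (the alternate-allele depth) as the allele count.

```python
def parse_sample_field(format_keys, sample_value):
    """Return (alt_count, total_depth) from one VCF sample column."""
    fields = dict(zip(format_keys.split(":"), sample_value.split(":")))
    # AD lists the reference depth first, then one depth per alternate allele.
    ad = fields["AD"].split(",")
    alt_count = int(ad[1])
    total_depth = int(fields["DP"])
    return alt_count, total_depth


def vcf_line_to_row(line, sample_names):
    """Convert one biallelic VCF record into a dict of counts and coverages."""
    cols = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then samples.
    row = {"chrom": cols[0], "pos": int(cols[1]), "ref": cols[3], "alt": cols[4]}
    for name, value in zip(sample_names, cols[9:]):
        alt_count, depth = parse_sample_field(cols[8], value)
        row[name + "_alt"] = alt_count
        row[name + "_cov"] = depth
    return row


# Hypothetical record with two pooled samples (values are made up).
record = "chrI\t1045\t.\tA\tG\t812.3\tPASS\t.\tGT:AD:DP\t0/1:30,12:42\t0/1:55,5:60"
row = vcf_line_to_row(record, ["K4_cycle0", "S4_cycle0"])
print(row)
```

A real pipeline would also need to handle missing values (`.`) and multiallelic sites, which this sketch deliberately ignores.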

Our general SNP analysis strategy involved partitioning the data to create three separate SNP tables, with each table corresponding to a set of founders and the populations derived from them (e.g. a table containing the S4 and K4 populations and their founders). In each table, we chose to only include sites with a minimum coverage > 20X in the synthetic populations as a quality control measure. Next, sites were filtered based on data from the founder strains. We excluded all sites that appeared to be polymorphic within a given founder, and sites where a single nucleotide was fixed across all founders. The first filter was applied because apparent polymorphism within a founder could indicate sequencing error, given that our founder strains are haploid and isogenic; the second was applied because a site is unlikely to be polymorphic in our synthetic populations if it is fixed across all of the founders. After these filters were applied, we retained a collection of high-quality SNPs in each population to subject to further analysis. The total number of SNPs identified in each population is given in Table 1, and the average genome-wide coverage (i.e. depth of sequence coverage) of each population is given in Supplementary Table S1. All populations had mean coverages > 50X, with all but one population (S4 cycle 0) having greater than 70X mean coverage (Supplementary Table S1).
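The site filters described above can be sketched as a single predicate. This is an illustration under stated assumptions, not the authors' code: the function name `keep_site`, the `tol` parameter, and the input layout (per-population coverages plus one alternate-allele frequency per haploid founder) are hypothetical.

```python
def keep_site(pop_coverages, founder_alt_freqs, min_cov=20, tol=0.0):
    """Return True if a site passes the coverage and founder filters.

    pop_coverages     : read depth at this site in each synthetic population
    founder_alt_freqs : alternate-allele frequency in each haploid founder
    min_cov           : sites need coverage strictly greater than this value
    tol               : hypothetical allowance for sequencing error in founders
    """
    # Filter 1: coverage must exceed the threshold in every synthetic population.
    if any(cov <= min_cov for cov in pop_coverages):
        return False
    # Filter 2: a haploid, isogenic founder should be fixed (frequency 0 or 1).
    if any(tol < f < 1.0 - tol for f in founder_alt_freqs):
        return False
    # Filter 3: drop sites where every founder carries the same allele.
    founder_alleles = {round(f) for f in founder_alt_freqs}
    if len(founder_alleles) == 1:
        return False
    return True


print(keep_site([45, 60], [0.0, 1.0, 0.0, 0.0]))  # one founder differs
print(keep_site([45, 60], [1.0, 1.0, 1.0, 1.0]))  # fixed across founders
```

The default `tol=0.0` treats any intermediate founder frequency as polymorphism; whether a small tolerance is appropriate depends on founder coverage and is left as a choice for the analyst.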

SNP variation

Our main objective was to evaluate how crossing strategy and the number of founder strains impact patterns of SNP variation in synthetic recombinant populations. To that end, we assessed SNP-level variation in our recombinant populations using several metrics. First, we simply determined the number of polymorphic sites segregating in each population immediately following their creation (cycle 0), and monitored how that number changed over time (i.e. after 6 or 12 outcrossing cycles). This approach of tracking the total number of SNPs should reveal whether particular crossing strategies (i.e. using a certain number of founders, and/or one of the two crossing strategies) consistently produced populations with more SNPs, and whether these SNPs were maintained or lost over 12 outcrossing cycles. We also generated UpSet plots using the UpSetR package36 in R to visualize patterns of overlap between the total number of SNPs possible for a given combination of founder strains, and the SNPs we observed in our actual populations. We define the total number of possible SNPs as all loci for which at least one of the founding strains used has an allele different from the others; this number will therefore differ among the 4-way, 8-way, and 12-way crosses.
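The definition of "possible" SNPs, and its overlap with observed SNPs, can be made concrete with a small set-based sketch. The founder labels, site keys, and frequencies below are invented for illustration, and the function names are hypothetical.

```python
def possible_snps(founder_alleles, founders):
    """Sites where the chosen founders do not all carry the same allele."""
    return {
        site
        for site, alleles in founder_alleles.items()
        if len({alleles[f] for f in founders}) > 1
    }


def observed_snps(pop_freqs):
    """Sites still segregating (frequency strictly between 0 and 1) in a population."""
    return {site for site, freq in pop_freqs.items() if 0.0 < freq < 1.0}


# Hypothetical alleles for four founders (A-D) at three sites.
founder_alleles = {
    ("chrI", 100): {"A": "G", "B": "G", "C": "T", "D": "G"},
    ("chrI", 250): {"A": "C", "B": "C", "C": "C", "D": "C"},
    ("chrII", 40): {"A": "A", "B": "T", "C": "A", "D": "T"},
}
# Hypothetical alternate-allele frequencies in one recombinant population.
pop_freqs = {("chrI", 100): 0.0, ("chrI", 250): 0.0, ("chrII", 40): 0.46}

possible = possible_snps(founder_alleles, ["A", "B", "C", "D"])
observed = observed_snps(pop_freqs)
print(len(possible), len(observed), len(possible & observed))
```

Intersections of this kind, computed per population and timepoint, are the quantities an UpSet plot displays.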

In addition to SNP number, we also characterized the distribution of SNP frequencies in each population, which allows more direct comparisons between populations with different numbers of founders but the same crossing strategy, or the same number of founders but different crossing strategy. To do this, we focused on two metrics: the site frequency spectrum (SFS), and genome-wide heterozygosity. Here heterozygosity refers to 2pq, the product of the reference (i.e. the S288C allele) and alternate allele frequency at a given site multiplied by 2. In addition to looking at differences in mean genome-wide heterozygosity between populations, we also generated sliding window plots showing patterns of variation across each chromosome. To define windows, we used the GenWin package37 in R with the following parameters: “smoothness = 6000, method = 3.” GenWin itself uses a smoothing spline technique to define windows based on breakpoints in the data. While we ultimately used “smoothness = 6000”, we did initially try a range of values. Our final selection was made based on what most clearly represented trends in the data. For interested parties, plots with more or less smoothness can be easily generated using data and scripts we have made available through Dryad and GitHub (see “Data availability” statement for details).
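As a minimal sketch of the heterozygosity metric, the per-site value 2pq and its genome-wide mean can be computed as follows. The function names are hypothetical, and the input is assumed to be the alternate-allele frequency q at each retained SNP, so that p = 1 - q is the reference (S288C) allele frequency.

```python
def heterozygosity(alt_freq):
    """Expected heterozygosity 2pq at one biallelic site."""
    q = alt_freq
    p = 1.0 - q
    return 2.0 * p * q


def mean_heterozygosity(alt_freqs):
    """Average 2pq across a collection of sites."""
    values = [heterozygosity(q) for q in alt_freqs]
    return sum(values) / len(values)


# 2pq peaks at 0.5 when both alleles are equally frequent and is 0 at fixation.
print(heterozygosity(0.5), heterozygosity(0.25), heterozygosity(0.0))
print(mean_heterozygosity([0.5, 0.25, 0.0]))
```

Because 2pq cannot exceed 0.5 at a biallelic site, mean values are comparable across populations regardless of how many founders were used.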

It is worth noting that our ability to assess levels of genetic variation across our synthetic populations is limited by the fact that we have only collected Pool-seq data. Given the complex life-history of the yeast populations in this experiment, which involves periods of 7–15 generations of asexual growth punctuated by discrete outcrossing events, it is not possible for the genotypes of all individuals in the population to be shuffled by recombination every generation. Therefore, asexual lineages will evolve by clonal interference for relatively short periods of time, until the next outcrossing event decouples individual adaptive alleles from a particular genetic background. It is possible that during these periods of clonal interference, particular diploid lineages will dominate, and if these lineages are heterozygous at a given locus, that will lead to an artificially elevated heterozygosity value at that SNP. However, we do not believe this is a major complication for understanding nucleotide diversity in our experiment, for several reasons: our outcrossing protocol includes several measures that maximize outcrossing efficiency (i.e. any asexual diploids that fail to sporulate are killed), the rate of recombination in yeast is generally high, and the periods of asexual growth are short and unlikely to exceed ~20 cell doublings.

SNP frequency changes over 12 cycles of outcrossing

Although statistical power in this study is limited due to a lack of replication, we attempted to identify regions of the genome showing obvious responses to selection in each synthetic population. Specifically, we used Pearson’s χ2 test as implemented in the poolSeq38 package in R to compare SNP frequencies between cycle 0 and cycle 12 in each synthetic population. We chose this particular test based on a benchmarking effort that suggests it is well-suited to detecting selection in E&R experiments lacking replication9. After results were generated for each synthetic population, log-transformed p values were plotted for each chromosome across sliding windows. The GenWin package in R (parameters: “smoothness = 2000, method = 3”) was once again used to define windows based on breakpoints in the data. Plots were then examined to see if there were any genomic regions showing signs of selection based on significance levels relative to the background.
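To illustrate the kind of test involved, the sketch below applies a plain Pearson χ2 test (1 degree of freedom, no continuity correction) to a 2 × 2 table of allele counts at cycle 0 and cycle 12. It is a textbook version written for illustration, not the poolSeq implementation, whose handling of edge cases and corrections may differ; the function name and example counts are hypothetical.

```python
import math


def chi_square_pvalue(alt0, cov0, alt12, cov12):
    """Pearson chi-square p value comparing allele counts at two timepoints."""
    table = [[alt0, cov0 - alt0], [alt12, cov12 - alt12]]
    row_sums = [cov0, cov12]
    col_sums = [alt0 + alt12, (cov0 - alt0) + (cov12 - alt12)]
    total = cov0 + cov12
    # With an empty row or column the test is undefined; report no evidence.
    if 0 in row_sums or 0 in col_sums:
        return 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 d.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical site: alternate allele rises from 20/100 reads to 30/100 reads.
p_value = chi_square_pvalue(20, 100, 30, 100)
print(p_value, -math.log10(p_value))
```

The `-log10(p)` transform in the last line is the quantity that would be smoothed across windows and plotted along each chromosome.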

Haplotype representation

In addition to describing SNP diversity, we also describe the diversity of founder haplotypes represented in our synthetic populations. We were particularly interested in evaluating whether the S-type strategy might produce populations in which founder haplotypes are more evenly represented (at intermediate frequency) compared to the K-type strategy. Given the stochasticity inherent in the K-type strategy, we thought it probable that founder genotypes with especially high sporulation and/or mating efficiencies (i.e. those with the highest reproductive outputs in the outcrossing context) might come to dominate. To this end, we estimated haplotype frequencies in all experimental populations initially, and after 6 and 12 cycles of outcrossing to determine how evenly haplotypes were represented, and how this might have changed over time. We used the sliding-window haplotype caller described in Linder et al. (2020) and software the authors have made available as a community resource: github.com/tdlong/yeast_SNP-HAP. Our results were generated by using the haplotyper.limSolve.code.R script and estimates were made across 30 kb windows with a 1 kb step size. This particular haplotype caller was developed specifically to estimate haplotype frequencies in multiparent populations when founder haplotypes are known. A full description of the algorithm being used, and results of empirical validation can be found in Linder et al. (2020). To quantify haplotype variation in each population, we calculated haplotype diversity (H) using the following formula:

$$H = 1 - \sum_{i=1}^{n} x_{i}^{2}$$

(1)

where xi is the frequency of the ith haplotype of the n founders used to create a given population39. It is worth noting that the maximum expected H will vary depending on the number of founders used to create a given population, as

$$H_{max} = 1 - \left(\left(\frac{1}{n}\right)^{2} \cdot n\right)$$

(2)

which simplifies to 1 − 1/n.
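The two haplotype diversity expressions above are simple enough to check numerically. The following sketch uses hypothetical function names and made-up founder frequencies; it assumes the frequencies within a window sum to 1.

```python
def haplotype_diversity(freqs):
    """H = 1 - sum of squared founder haplotype frequencies."""
    return 1.0 - sum(x * x for x in freqs)


def max_haplotype_diversity(n):
    """Maximum H for n founders, reached when every frequency equals 1/n."""
    return 1.0 - 1.0 / n


# Evenly represented founders reach the maximum; a skewed population falls short.
even = haplotype_diversity([0.25, 0.25, 0.25, 0.25])
skewed = haplotype_diversity([0.70, 0.10, 0.10, 0.10])
print(even, skewed)
print([max_haplotype_diversity(n) for n in (4, 8, 12)])
```

Because the maximum rises with n, comparisons of H between 4-, 8-, and 12-founder populations are easier to interpret when H is read against the corresponding maximum.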

Phenotypic characterization of experimental populations

To evaluate the possibility that populations might be phenotypically differentiated, we measured two life-history traits: sporulation efficiency and growth rate. We estimated the 3-day sporulation efficiency for each recombinant population at the beginning and end of the experiment, as this is a life-history trait that might have reasonably responded to the selection imposed by the regular outcrossing protocol. All populations archived at “cycle 0” (i.e. the pool of diploid cells used to initiate each K- or S-type population) and “cycle 12” (i.e. diploid cells recovered from each population after the 12th outcrossing cycle) were revived by plating 1 mL of thawed culture onto a YPD agar plate and incubating at 30 °C for 48 h. In order to sample the genetic diversity of each population, a sterile wooden applicator was scraped in a zig-zag pattern across the lawn of cells on each plate to collect a pinhead-sized clump of yeast. Each clump was mixed in 10 mL YPD in a 50 mL conical tube and vortexed. Tubes were then incubated at 30 °C/200 rpm for ~24 h. After confirming that each tube had comparable cell densities (this was done by verifying that the OD600 absorbance value of a 1:100 dilution ranged between 0.095 and 0.2), cell pellets were collected by spinning for 5 min at 5000 rpm. Cell pellets were washed in 1 mL of sterile water, spun down again, and resuspended in 40 mL of minimal sporulation media (1% potassium acetate w/v). Each culture was transferred to a sterile 250 mL Erlenmeyer flask, covered loosely with foil, and incubated at 30 °C/200 rpm for ~72 h to sporulate. After sporulation, aliquots of each culture were loaded onto a hemacytometer (Incyto C-Chip, type NI) and visualized under 40× magnification on a Singer SporePlay microscope. For each culture, ~200 cells were counted (specific range: 190–230 cells), and sporulation efficiencies were estimated as the proportion of tetrads observed over the total number of cells in the field of view. Sporulation efficiency for each of the 12 recombinant populations (6 “cycle 0” and 6 “cycle 12”) was assessed by averaging these proportions over 2–3 independent biological replicates.

In addition to characterizing sporulation efficiencies for each of the “cycle 0” and “cycle 12” recombinant populations, we also measured growth rate with high-throughput absorbance-based assays in liquid YPD. We also included the 12 founder strains in this assay, for comparison with the recombinant populations. S- and K-type recombinant populations were sampled from each freezer recovery plate as described above. Haploid founder strains were revived from freezer stocks by streaking for single colonies onto YPD agar plates. Each population or strain was assayed in two biological replicates; recombinant populations were sampled to inoculate two separate overnight cultures in liquid YPD, and strains were sampled by picking two distinct colonies to initiate two separate overnight cultures (one colony per culture). All biological replicates were incubated for ~24 h at 30 °C/200 rpm. On the day of the assay, OD600 was measured in all cultures and the readings were used to standardize them to a target OD600 of 0.05 in fresh YPD (observed values ranged 0.042–0.061). 200 μL of each culture was aliquoted to separate wells of a 96-well plate, with two technical replicates per biological replicate. Technical replicates were arranged on the plate in an attempt to control for possible edge effects. The growth rate assay was carried out in a Tecan Spark Multimode Microplate Reader, set to record the absorbance at 600 nm for each well every 30 min for 48 h at 30 °C, without plate agitation/aeration. The R package “Growthcurver” (Sprouffske and Wagner40) was used to estimate population growth parameters from the raw data. In order to determine the carrying capacity and doubling time of the culture in each well, the absorbance measurements taken during the assay were fit to the following equation:

$$N_{t} = \frac{N_{0} K}{N_{0} + (K - N_{0}) e^{-rt}}$$

(3)

where Nt is the absorbance reading at time t, N0 is the initial absorbance, K is the carrying capacity, and r is the intrinsic growth rate, from which the doubling time is calculated as ln(2)/r. Here, doubling time refers to the time necessary for the size of a population to double under non-restricted conditions, while carrying capacity is the maximum population size under the given conditions. The values for each biological replicate were averaged across technical replicates, and the values for each strain/population were determined by averaging across biological replicates.
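For readers who want to see how the logistic parameters relate to the fitted curve, the sketch below evaluates the equation above and derives a doubling time. It does not reproduce Growthcurver's fitting procedure; the function names and parameter values are hypothetical, and the doubling time is assumed to be ln(2)/r, the usual relationship for exponential growth at rate r.

```python
import math


def logistic_absorbance(t, n0, k, r):
    """Absorbance at time t under logistic growth (N0 initial, K capacity, r rate)."""
    return (n0 * k) / (n0 + (k - n0) * math.exp(-r * t))


def doubling_time(r):
    """Time for the population to double while growth is unrestricted."""
    return math.log(2) / r


# Hypothetical parameters: start at OD600 0.05, plateau at 1.2, r = 0.8 per hour.
n0, k, r = 0.05, 1.2, 0.8
curve = [logistic_absorbance(t, n0, k, r) for t in (0, 6, 12, 24, 48)]
print([round(v, 3) for v in curve])
print(round(doubling_time(r), 3))
```

The curve starts at N0, rises fastest near K/2, and approaches K at long times, which is why K and r can be estimated from 48 h of absorbance readings.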
