Shallow shotgun sequencing reduces technical variation in microbiome analysis

Participant selection and sample collection

Informed consent was obtained for five adult volunteers. The study protocols were reviewed and approved by the Advarra Institutional Review Board (Advarra, Inc., Columbia, MD). All analyses were performed according to the relevant guidelines and regulations. Fecal collection was completed by self-sampling, which has proven to be highly successful in previous work25. Briefly, participants each collected a small quantity of fecal material into 90% ethanol using a sterile swab once per day during each day of the planned collection period. Subjects collected one sample to begin their sampling cycle (day 1), with another sample taken on day 2. Subsequent samples were taken exactly one week after the first samples (days 8 and 9). Participants were given postage-paid pre-addressed boxes and asked to drop samples in the US postal service mail to return samples to Diversigen for processing after all four samples were collected. After shipping, all samples were stored at − 80 °C prior to processing.

DNA extraction, library preparation and sequencing (16S amplicon)

Samples were extracted using a QIAamp PowerFecal DNA Kit (QIAGEN, Germantown, MD, USA) automated for high throughput on a QiaCube, with bead beating in 0.1 mm glass bead plates. Samples were then quantified via qPCR using primers for hypervariable region 4 (V4) (515f/806r) of the 16S rRNA gene. Libraries were prepared using a protocol derived from previous methods21 using KAPA HiFi Polymerase to amplify the V4 region (515f/806r). Samples were indexed using Illumina Unique Dual Indexes (UDI), followed by pooling of libraries. The resulting pooled libraries were denatured with NaOH, diluted to a loading concentration of 8 pM in HT1 buffer (Illumina Inc., San Diego, CA, USA), and spiked with 15% PhiX. The final libraries were sequenced on an Illumina MiSeq instrument using paired-end 2 × 250 reads and the MiSeq Reagent Kit v3 (Illumina). Sequences were demultiplexed on the sequencer and then converted to FASTQ files using bcl2fastq (Illumina). DNA sequences were filtered for low quality (Q-Score < 20) and length (< 50 bp), and adapter sequences were trimmed using cutadapt26.

DNA extraction, library preparation and sequencing (shallow shotgun metagenomics)

Samples were extracted in the same manner as above for amplicon sequencing, followed by quantification using a Quant-iT PicoGreen dsDNA assay kit (Thermo Fisher Scientific, Waltham, MA, USA), with fluorescence measured on a TECAN plate reader (Tecan Group Ltd., Männedorf, Switzerland). Samples were indexed using Illumina Unique Dual Indexes (UDI), followed by pooling of libraries. Libraries were pooled, followed by SPRI bead purification and concentration using SpeedBead Magnetic Carboxylate Modified Particles (GE Healthcare Life Sciences). The resulting pooled libraries were denatured with NaOH, diluted to a loading concentration of 1.8 pM in Illumina’s HT1 buffer, and spiked with 2% PhiX. Shotgun metagenomic sequencing was performed on an Illumina NextSeq 500 instrument using a NextSeq 500/550 High Output 150 cycle kit (1 × 145 bp reads). Sequence reads were demultiplexed and quality filtered in the same manner as above for amplicon sequencing. Finally, FASTQ files were merged and converted into a single FASTA using SHI727. All sequences were trimmed to a maximum length of 100 bp prior to alignment.

Sequence alignment and annotation (16S amplicon)

Amplicon Sequence Variants (ASVs) were determined using DADA2 (v1.12)28 with default parameters except where specified. Briefly, filterAndTrim(…truncLen = c(240,150)) was run to trim low-quality tails from sequence reads, followed by learnErrors to learn error rates of the full dataset. ASVs were inferred using the dada algorithm, and reads were then merged using mergePairs. Chimeras were removed using removeBimeraDenovo. Finally, ASVs were assigned taxonomies via classification to the Silva reference database (v132)29,30 using assignTaxonomy, and, where possible, species level assignments using exact matching were made using addSpecies. ASVs with identical taxonomic assignments in the ASV table were collapsed to create a taxa table. This collapsed taxa table was used in all downstream analyses.

Sequence alignment and annotation (shallow shotgun metagenomics)

DNA sequences were aligned to a curated database containing all representative genomes in RefSeq (release 81) for bacteria31, with additional manually curated strains (i.e. Diversigen’s Venti database). Alignments were made at 97% identity against all reference genomes. Every input sequence was compared to every reference sequence in Venti using fully gapped alignment with BURST32. Ties were broken by minimizing the overall number of unique Operational Taxonomic Units (OTUs) based on taxonomic assignments (i.e. “CAPITALIST” method in BURST). Specifically, each input sequence was assigned the lowest common ancestor that was consistent across at least 80% of all reference sequences tied for best hit. The number of counts for each OTU was normalized to the average genome length. OTUs accounting for less than one millionth of all species-level markers and those with less than 0.01% of their unique genome regions covered (and < 1% of the whole genome) were discarded. Samples with fewer than 10,000 sequences mapping to the database were also discarded. Count data was then converted to relative abundance for each sample. The normalized and filtered tables were used for all downstream analyses. For functional annotation, Kyoto Encyclopedia of Genes and Genomes Orthology groups (KEGG KOs)33,34,35 were observed directly using alignment at 97% identity against a gene database derived from the strain database used above. The KO table was used to derive the downstream KEGG Enzyme table by collapsing KOs to the enzyme level, followed by conversion to relative abundances. This KEGG enzyme table was used for all downstream analyses.

Diversity metrics calculations, statistical analyses and visualization

Alpha diversity (Chao1, Shannon index and Observed features) and beta diversity (Bray–Curtis dissimilarity) of taxa and functions were calculated using the R36 package vegan (v2.6-4)37, using the ASV/OTU/KEGG Enzyme tables rarified to the lowest sample depth for their respective sequencing types (amplicon vs. shallow shotgun). To assess differences in alpha diversity between subjects, the nonparametric Kruskal–Wallis one-way analysis of variance followed by Dunn’s post-hoc test were used. False discovery rate (FDR) correction was used to account for multiple hypothesis testing. Differences in beta diversity between groups of interest was assessed via multivariate homogeneity of groups dispersions, followed by permutational multivariate analysis of variance (PERMANOVA) as implemented in vegan’s betadisper and adonis functions, respectively. To quantify the magnitude of variation from different sources, the Bray–Curtis dissimilarity matrix was subset to include only samples for each given source of interest as follows: (1) technical variation caused by DNA extraction of the same sample, (2) technical variation caused by the same extraction replicate prepped into separate libraries and run on separate sequencing runs, (3) biological variation caused by day of sampling, (4) biological variation caused by week of sampling, and (5) biological variation caused by subject-to-subject variation. Differences in variation by source were assessed using Kruskal–Wallis followed by Dunn’s post-hoc test. Specific differences in technical variation between 16S and shallow shotgun sequencing were assessed using two-way ANOVA followed by student’s t-tests. All visualizations were generated using the R package ggplot2 (v3.3.6)38.

Read more here: Source link