Comparison of fecal and blood metabolome reveals inconsistent associations of the gut microbiota with cardiometabolic diseases

Study cohorts

This study was based on the Guangzhou Nutrition and Health Study (GNHS). The detailed description of GNHS cohort could be found in previous studies22,23. Briefly, a total of 4048 Chinese participants aged 40–75 years living in the urban area of Guangzhou, China, for at least 5 years were recruited between 2008–2013. We followed these participants every three years. During 2014–2018 follow-up visits, stool and fasting blood samples were collected at the same time. We included participants with paired fecal and blood samples (N = 1008), and one paired fecal and blood samples were available per individual. We excluded participants who have taken antibiotics within 2 weeks (N = 1). Finally, 1007 participants (age: 64.7 ± 5.6; BMI: 23.6 ± 3.1) were remained for subsequent analysis (Table 1).

Table 1 Characteristics of the study participants

We used the control arm of a case-control study for hip fraction as the validation cohort24. The participants were enrolled between 2009 and 2012 in Guangdong Province, China. Stool and fasting blood samples were collected at the follow-up visits between February 2017 and May 2017 at the same time point. We included participants with paired fecal and blood samples in the present study (N = 103; age: 71.5 ± 7.1; BMI: 24.2 ± 3.7; Table 1).

All participants involved in this study provided written informed consent prior to sample collection. The Ethics Committee of the School of Public Health at Sun Yat-sen University (2018048) and Westlake University (20190114ZJS0003) approved the study protocols.

Shotgun metagenomic sequencing and preprocessing

Fecal samples from all participants in the discovery and validation cohort were collected on the examination day. During a follow-up visit to the study center, we gave each participant a stool sampler and provided the detailed instructions for stool sample collection. Participants collected their stool samples after defecation and delivered the sample to the staff immediately. The stool samples were temporarily stored in an ice box, and transported to the research laboratory and stored in a −80 °C freezer within four hours.

Microbial DNA was isolated using the QIAamp DNA Stool Mini Kit (Qiagen, Hilden, Germany) based on the manufacturer’s instruction. DNA concentrations were determined by the Qubit quantification system (Thermo Scientific, Delaware, US). The Illumina HiSeq platform (Illumina Inc., CA, USA) was used for shotgun metagenome sequencing with 2 × 300-bp paired-end reads protocol. The microbial DNA extraction and shotgun metagenome sequencing were performed at Novogene Company (Beijing, China). After sequencing, we obtained an average of 41.9 million (minimum: 22.1 million; maximum: 65.2 million) paired-end raw reads for each sample. The detailed information on bioinformatics analysis of the metagenome data could be found in our previous paper37. PRINSEQ (version 0.20.447) was employed to filter the reads with low-quality scores, with the following filtering parameters: (1) trim the reads by quality score from the 5′ end and 3′ end with a quality threshold of 20; (2) remove read pairs when either read was <60 bp, contained “N” bases or quality score mean bellow 30; and (3) deduplicate the reads. Reads that could be aligned to the human genome (H. sapiens, UCSC hg19) were removed (aligned with Bowtie2 v2.2.5 using –reorder –no-contain –dovetail)38. Taxonomic profiling of the metagenomic samples was performed using MetaPhlAn2 (version 2.6.02) with default parameters which used a library of clade-specific markers to provide pan-microbial (bacterial, archaeal, viral and eukaryotic) quantification at the species level39. We used the HUMAnN2 (version 2.8.1) with default parameters for functional profiling of metagenomic samples40, in which microbial pathways were generated based on MetaCyc metabolic pathway database41,42. We included microbial species and pathways with a minimum detective relative abundance of 0.01% in at least 10% of the samples, which yielded 149 species and 214 pathways. We log-transformed the relative abundance of species and pathway features before subsequent analysis and scaled them to zero-mean and unit-variance.

Fecal and blood metabolomics profiling and preprocessing

We performed targeted metabolomics profiling for fecal and serum samples by an ultra-performance liquid chromatography coupled to tandem mass spectrometry (UPLC-MS/MS) system (ACQUITY UPLC-Xevo TQ-S, Waters Corp., Milford, MA, 570 USA). The Q300 Kit provided by Metabo-Profile Corp. (Shanghai, China), coving up to 310 metabolites and 12 biochemical classes, was used for targeted metabolomics profiling, which was commonly used in several recent studies43,44,45. It mainly includes known gut microbiota-derived metabolites including SCFAs, bile acids, indoles, etc. and other key host metabolites, such as amino acids, carbohydrates, organic acids, and so on. Briefly, serum/lyophilized fecal sample vortexed vigorously with ice methanol (internal standards contained), and the supernatant was obtained. Then, ice-cold 50% methanol solution was added to dilute the sample with 4000 × g centrifugation, and the supernatant mixed with internal standards for each sample was sealed before UPLC-MS/MS profiling. The instrument parameters were setting as follows: C18 analytical column (2.1 × 100 mm,1.7 μM); column temperature 40 °C; mobile phases A (water with 0.1% formic acid), mobile phases B (acetonitrile: IPA, 90:10). The whole profiling process was performed at Metabo-Profile Corp. (Shanghai, China). The quality control (QC) samples were made up of pooled samples and were run every 14 samples. Raw data generated by UPLC-MS/MS were processed using the QuanMET software (v2.0, Metabo-Profile, Shanghai, China) to perform peak integration, calibration, and quantification for each metabolite. Metabolomic features were annotated to metabolites with MSI level 1 of confidence by comparing them to the standards of targeted metabolites. The gut metagenomic sequencing and fecal metabolomics profiling were performed with independent randomization procedures to ensure that the sample orders were not the same.

We quantified the concentrations of 204 fecal metabolites and 232 blood metabolites in the targeted metabolomics measurements. There were 173 overlapped metabolites between fecal and blood metabolites. After removing fecal or blood metabolites that were detected in less than 80% of samples, 159 overlapped metabolites were obtained. We further excluded metabolites with the relative standard deviation (standard deviation/mean) value in QC samples larger than 0.3 in fecal or blood samples. Finally, 132 paired fecal and blood metabolites were included in the present study. These metabolites mainly include amino acids, fatty acids, organic acids, carbohydrates, bile acids, benzenoids, carnitines, phenylpropanoic acids, pyridines, indoles, organooxygen compounds, and nucleosides. We imputed the missing values of metabolites by half the minimal concentration of the corresponding metabolites in the remaining non-missing samples. We performed the log-transformation for fecal and blood metabolomics data and standardized them into Z-scores (mean = 0, variance = 1).

Cardiometabolic disease ascertainment

Type 2 diabetes (T2D) was defined as fasting blood glucose ≥ 7.0 mmol/L (126 mg/dL) or HbA1c ≥ 6.5% (48 mmol/mol) or self-reported drug medications for T2D, according to T2D diagnosis criteria from the American Diabetes Association46. Hypertension was ascertained based on systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 90 mmHg or current antihypertensive medication use, according to the hypertension diagnostic standards published by WHO/International Society of Hypertension Committee47. Nonalcoholic fatty liver disease (NAFLD) was identified based on abdominal ultrasonography using a Doppler sonography machine (Sonoscape SSI-5500, Shenzhen, China). NAFLD was diagnosed according to criteria of the Fatty Liver Disease Study Group of the Chinese Liver Disease Association48. Obesity was defined as BMI ≥ 28 based on the suggestion of Working Group On Obesity In China for Chinese populations49.

Statistical analysis

Phenotypic correlation and genetic correlation analysis

We estimated the phenotypic correlations (a direct comparison of each metabolite’s values between feces and blood) between paired fecal and blood metabolites using partial Spearman correlation analysis, adjusted for age, sex and BMI. In addition, we calculated the genetic correlation (the proportion of shared heritability between paired fecal and blood metabolites) between fecal and blood metabolites using the bivariate GREML analysis by GCTA25,50. Genetic correlation analysis was performed for 596 participants with matched genotyping and fecal and blood metabolomics data. The detailed information for the genotyping data was described in our previous paper51. We used the Benjamini-Hochberg method to control the false discovery rate (FDR) caused by multiple testing. Phenotypic or genetic correlations between paired fecal and blood metabolites with the absolute value of correlation coefficients \(|r|\)> 0.3 and FDR < 0.05 were considered significant.

Gut microbiota-fecal/blood metabolome association analysis

Given that there were strong symbiotic relationships among gut microbes, we treated the gut microbiota as a whole and estimated how well taxonomic composition or microbial pathways could predict the concentrations of fecal and blood metabolites using several machine learning pipelines. Random Forest (RF) model with default hyperparameters has recently been adopted to predict the fecal metabolites based on gut microbiota after comparing several machine learning pipelines18. Meanwhile, Light Gradient Boosting Machine (LightGBM) model has been used to estimate the associations of gut microbiota with serum metabolites5. As RF and LightGBM models were commonly used in the gut microbiota-metabolome association studies, we first used both of them to predict the levels of fecal and blood metabolites based on taxonomic composition or microbial pathways and compared their performances. The machine learning pipelines including the RF and LightGBM models were conducted using the five-fold cross-validation strategy to avoid the potential overfitting, with 202, 202, 201, 201, 201 held-out samples, respectively, for these five-folds. We calculated Spearman’s correlation coefficient r and Spearman’s correlation P value between the measured and predicted metabolite levels for held-out samples. Spearman’s correlation P value was further transformed into FDR value to correct for multiple testing. We defined metabolites with r > 0.3 and FDR < 0.05 as well-predicted metabolites. The cut-off r value was based on previous studies18,26. We also performed sensitivity analyses by setting the cut-off r value as 0.2 and 0.4. Sensitivity analyses for participants without T2D medications (N = 923), hypertension medications (N = 706), dyslipidemia medications (N = 749), or without any of the above three medications (N = 530) were also performed, respectively. The RF and LightGBM methods were implemented using the R package randomForest (version: 4.6-14)52 and lightgbm (version: 3.3.1)53, respectively. We used the RF model with default hyperparameters recommended by Muller et al.’s study18. The hyperparameters of LightGBM model were determined according to Bar et al.’s study:5 learning_rate = 0.005, max_depth = default, feature_fraction = 0.2, num_leaves = default, min_data_in_leaf = 15, metric = L2, early_stopping_rounds = None, n_estimators = 2000, bagging_fraction = 0.8, bagging_freq = 1. Eventually, as the RF model had a better performance in terms of root mean square error (RMSE) and predictability than LightGBM model (Supplementary Notes 1; Supplementary Fig. 1), the main results were reported based on the RF method throughout this study.

We assessed the differences between the associations of taxonomic composition/microbial pathways with paired fecal and blood metabolites for each well-predicted metabolite using the method proposed by Hittner et al.54, which was implemented by R package cocor (version: 1.1-4)55.

Independent validation for the identified gut microbiota-fecal/blood metabolite associations

We then attempted to replicate the above identified associations in an independent validation cohort. Firstly, the RF model was built in the GNHS (discovery) cohort to predict the levels of fecal/blood metabolites based on the taxonomic composition/microbial pathway data. Then, the constructed models were directly applied in the validation cohort, and the corresponding Spearman’s correlation coefficient r and Spearman’s correlation FDR values between measured and predicted metabolite levels were obtained for each metabolite. We considered the associations with Spearman’s correlation coefficient > 0.3 and FDR < 0.05 as being validated.

Associations between well-predicted fecal/blood metabolites and cardiometabolic diseases

We examined the associations of well-predicted fecal and blood metabolites with prevalent cardiometabolic diseases (obesity [Npatients = 86], T2D [Npatients = 160], hypertension [Npatients = 472], and NAFLD [Npatients = 668]) using multivariable logistic models, adjusted by age, sex, smoking status, alcohol status, education, income, physical activity, and total energy intake for obesity, and by age, sex, BMI, smoking status, alcohol status, education, income, physical activity, and total energy intake for T2D, hypertension and NAFLD. We also performed sensitivity analysis to additionally adjust for T2D, hypertension, and dyslipidemia medications in multivariable logistic models. Associations with FDR < 0.05 were considered statistically significant. The results were further replicated in the validation cohort. Only T2D (Npatients = 27), hypertension (Npatients = 23), and obesity (Npatients = 14) were available in the validation cohort. Pearson correlation between the partial regression coefficients obtained from the discovery and validation cohort was calculated.

All statistical analyses were performed using R software (version: 4.1.1) unless otherwise specified.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Read more here: Source link