Chapter 9 Differential abundance analysis

Here, we analyse abundances with three different methods: the Wilcoxon test (on CLR-transformed abundances), DESeq2,
and ANCOM-BC. All three test for statistically significant differences in abundance between groups.
We will analyse genus-level abundances.

We might first want to perform prevalence filtering to reduce the number of tests, and with it the multiple-testing burden. In this particular dataset, all genera pass a prevalence threshold of 10%, so we do not perform filtering.
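For datasets where filtering is needed, the idea can be sketched in base R as follows. The count matrix, genus names, and sample names below are simulated purely for illustration and are not the chapter's data; prevalence is taken here to mean the fraction of samples in which a genus has a non-zero count.

```r
# Simulated genus-level count matrix (rows = genera, columns = samples).
# All names and values are hypothetical, not the chapter's dataset.
set.seed(1)
counts <- matrix(rpois(5 * 20, lambda = 3), nrow = 5,
                 dimnames = list(paste0("Genus", 1:5), paste0("S", 1:20)))
counts["Genus5", ] <- 0
counts["Genus5", 1] <- 12   # detected in only 1 of 20 samples (5%)

# Prevalence: fraction of samples in which a genus is detected (count > 0)
prevalence <- rowMeans(counts > 0)

# Keep only genera detected in at least 10% of the samples
prevalence_threshold <- 0.10
counts_filtered <- counts[prevalence >= prevalence_threshold, , drop = FALSE]

rownames(counts_filtered)
```

Every genus removed at this stage is one fewer hypothesis to test, which makes the later p-value adjustment less severe.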

Wilcoxon test

A Wilcoxon test assesses whether an outcome differs between two groups. It is a
non-parametric alternative to a t-test, which means that the Wilcoxon test
does not assume that the data follow a particular distribution, such as the normal distribution.

Let’s first combine the data for testing.
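A minimal base-R sketch of this workflow is given below. It is not the chapter's original code: the counts are simulated, the column name `patient_group` and the genus names are hypothetical, and a pseudocount of 1 is assumed for the CLR transformation. The sketch combines CLR-transformed abundances with the grouping variable, runs one Wilcoxon rank-sum test per genus, and adjusts the p-values for multiple testing.

```r
set.seed(42)
n_per_group <- 15
group <- factor(rep(c("Control", "Case"), each = n_per_group))

# Simulated counts (rows = samples, columns = genera); hypothetical data
n_genera <- 6
counts <- matrix(rnbinom(2 * n_per_group * n_genera, mu = 50, size = 5),
                 ncol = n_genera,
                 dimnames = list(NULL, paste0("Genus", 1:n_genera)))
# Make Genus1 clearly more abundant in the "Case" group
counts[group == "Case", "Genus1"] <- counts[group == "Case", "Genus1"] * 10

# CLR transform per sample: log counts minus the sample's mean log count
# (a pseudocount of 1 is an assumption made here to avoid log(0))
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

# Combine abundances and the grouping variable into one data frame
abundance_data <- data.frame(clr, patient_group = group)

# One Wilcoxon rank-sum test per genus
genera <- colnames(clr)
p_values <- sapply(genera, function(g) {
  x <- abundance_data[abundance_data$patient_group == "Case", g]
  y <- abundance_data[abundance_data$patient_group == "Control", g]
  wilcox.test(x, y, exact = FALSE)$p.value
})

# Adjust for multiple testing (Benjamini-Hochberg FDR)
results <- data.frame(genus = genera,
                      p_raw = p_values,
                      p_adjusted = p.adjust(p_values, method = "fdr"))
results[order(results$p_raw), ]
```

Because CLR values are relative to each sample's mean, a strong increase in one genus also lowers the CLR values of the others, so small p-values for unrelated genera should be interpreted with care.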

DESeq2

DESeq2 performs data normalization automatically. It also takes care of the p-value
adjustment, so we don’t have to worry about that.

DESeq2 uses a negative binomial distribution to detect differences in
read counts between groups. Its normalization accounts for
differences in library size and composition. A DESeq2 analysis
consists of multiple steps, but they are all run automatically. More
information can be found, e.g., in the Harvard Chan Bioinformatics Core’s
tutorial "Introduction to DGE" (archived).

Now let us show how to do this. First, run the DESeq2 analysis.
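As a hedged sketch of what such a run looks like, the block below uses `makeExampleDESeqDataSet()`, a helper from the DESeq2 package that simulates a small dataset with a two-level factor `condition` (levels A and B). It is not microbiome data and not the chapter's original code. With real data, a `DESeqDataSet` would instead be built from the genus-level count matrix and the sample metadata, for example with `DESeqDataSetFromMatrix()`.

```r
library(DESeq2)

# Simulated example data from the DESeq2 package (not the chapter's data):
# 200 features, 12 samples, two-level factor `condition` (A vs. B).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12, betaSD = 1)

# One call runs size-factor normalization, dispersion estimation,
# negative binomial model fitting, and Wald tests
dds <- DESeq(dds)

# Extract results: log2 fold changes, raw p-values (pvalue), and
# Benjamini-Hochberg adjusted p-values (padj)
res <- results(dds, contrast = c("condition", "B", "A"))
res_df <- as.data.frame(res)

# Show the features with the smallest adjusted p-values
head(res_df[order(res_df$padj),
            c("baseMean", "log2FoldChange", "pvalue", "padj")])
```

Some `padj` values may be `NA`, because `results()` applies independent filtering by default and excludes features with very low mean counts from the adjustment.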

ANCOM-BC

The analysis of composition of microbiomes with bias correction (ANCOM-BC)
is a recently developed method for differential abundance testing. It is based on an
earlier published approach, ANCOM.
The earlier version can be recommended as one of several approaches to apply:
a recent study compared several mainstream methods and found that ANCOM, together with one other method, produced the most consistent results and is probably a conservative approach. Please note that, based on this and other comparisons, no single method can be recommended across all datasets. Rather, it is advisable to apply several methods and examine where their results overlap and where they differ.

ANCOM-BC is the only method covered here that incorporates the so-called sampling fraction into the model. The sampling fraction can be estimated empirically as the ratio of the library size to the microbial load. If ignored, variation in the sampling fraction across samples would bias differential abundance analyses. Furthermore, this method provides p-values and confidence intervals for each taxon.
It also controls the FDR and is computationally simple to implement.
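The following toy calculation illustrates why the sampling fraction matters. All abundances, taxon names, and library sizes are hypothetical, and the microbial load is treated as known, which is rarely the case in practice; ANCOM-BC estimates the sample-specific bias within its model instead of requiring measured loads.

```r
# Hypothetical true absolute abundances in two samples.
# Taxon A has the same true abundance in both.
true_abundance <- cbind(sample1 = c(TaxonA = 1000, TaxonB = 4000),
                        sample2 = c(TaxonA = 1000, TaxonB = 9000))
microbial_load <- colSums(true_abundance)            # 5000 and 10000

# Both samples are sequenced to the same depth
library_size <- c(sample1 = 500, sample2 = 500)

# Sampling fraction = library size / microbial load
sampling_fraction <- library_size / microbial_load   # 0.10 and 0.05

# Expected observed counts = true abundance * sampling fraction
expected_counts <- sweep(true_abundance, 2, sampling_fraction, `*`)
expected_counts["TaxonA", ]   # 100 vs. 50, although the true abundance is equal

# Dividing by the sampling fraction recovers the true abundances
corrected <- sweep(expected_counts, 2, sampling_fraction, `/`)
corrected
```

Taxon A appears to halve between the two samples in the observed counts, purely because sample 2 has a higher microbial load at the same sequencing depth.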

As we will see below, all that is needed to obtain results is to pass
a phyloseq object to the ancombc() function. Therefore, we first convert
our tse object to a phyloseq object. Then, we specify the formula. Other covariates can be included in this formula to adjust for confounding.
Please check the function documentation
to learn about the additional arguments that we specify below. Another example, covering more than one group comparison, is also available.

See the documentation of the function,
under Value, for an explanation of all the output objects. Our question can be answered
by looking at the res object, which contains data frames with the coefficients,
standard errors, p-values and q-values. Conveniently, there is also a data frame called diff_abn,
which indicates the differentially abundant taxa. Below we show the first 6 entries of this data frame:
