The notebook on 'Differential abundance - Basics' only explained two different methods for doing DA analysis. There are, however, many different methods, and choosing the best one is not a simple task. Benchmark studies have shown that different methods have different strengths, and the choice of method therefore depends in part on the specific dataset in question. The DAtest package tries to solve this problem by testing many different methods on your own dataset and providing statistics on their performance. The details can be found in the wiki, but the basic idea is as follows: the predictor of interest (e.g. treatment/control) is shuffled, which removes any true effect present in the dataset. Then, for some random features (ASVs), the abundance is multiplied by some number (e.g. 5), but only in the shuffled treatment samples. The different DA methods are then applied, and we check whether they can find the ASVs whose abundance was spiked for the shuffled predictor. This procedure is repeated several times, by default 20, to evaluate the consistency of the performance.
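To make the procedure concrete, here is a minimal sketch of the shuffle-and-spike idea in plain R. The toy count matrix and variable names are made up for illustration; this is not DAtest's actual implementation.

In R:

# Toy count matrix: 20 features (rows) x 10 samples (columns)
set.seed(42)
counts <- matrix(rpois(20 * 10, lambda = 50), nrow = 20)
# Shuffle the group labels so any true effect is destroyed
predictor <- sample(rep(c("control", "treatment"), each = 5))
# Spike 2 random features: multiply their counts (e.g. by 5),
# but only in the shuffled 'treatment' samples
spiked <- sample(nrow(counts), 2)
counts[spiked, predictor == "treatment"] <- counts[spiked, predictor == "treatment"] * 5
# A good DA method should now flag exactly the rows in 'spiked' as significant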
Another feature of DAtest is that it is easy to run several different methods and compare their output. One could for example run 3 methods known to perform well, and then use their combined results for a less biased DA analysis.
# Load the phyloseq and DAtest packages
library(phyloseq)
library(DAtest)
# Load the phyloseq object (named 'phy')
load("../data/physeq.RData")
As explained in the Basics notebook, we always want to prepare our data first, especially by pruning low-abundance taxa. preDA is identical to the ps_prune function in the MicEco package.
In R:
# Agglomerate to Genus level
phy_genus <- tax_glom(phy, "Genus")
# Only keep genera present in at least 10 samples
phy_genus_pre <- preDA(phy_genus, min.samples = 10)
testDA runs all the different methods and compares them by their empirical power (the ability to detect the spiked features), their False Discovery Rate (FDR; the fraction of significant features that are false positives), and their AUC (area under the ROC curve; the ability to rank features from most to least associated with the predictor). These measures are combined into a Score, which ranks the methods; the method with the highest Score should perform best.
It is important to note that DAtest assumes that most features are NOT associated with the predictor. If, for example, your control and treatment samples are very different (clearly separated in a PCoA ordination), then only the False Positive Rate (FPR) can be used to weed out bad methods; the other measures are not trustworthy.
In R:
# Compare the methods on shuffled-and-spiked versions of the data
res <- testDA(phy_genus_pre, predictor = "Delivery")
summary(res)
The MetagenomeSeq Feature (msf) method appears to be the best. However, the bootstrapped confidence intervals (Score.5% and Score.95%) show that many methods are indistinguishable from msf (marked with * in the summary).
Let's run the msf method:
# Run the winning method on the actual (unshuffled) data
res_msf <- DA.msf(phy_genus_pre, predictor = "Delivery")
# Features significant after adjustment for multiple testing
res_msf[res_msf$pval.adj <= 0.05, ]
Lots of significant ones!
If the best Score among the methods is 0 (zero), then none of the methods are good enough to detect anything in your dataset. One problem could be that the predictor you are testing has too large an effect, in which case you should only use testDA to remove methods with a too high FPR. You could also re-run testDA with a higher effectSize, which should make it easier for the methods to detect differentially abundant features.
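As a sketch, such a re-run could look like this (effectSize = 10 is an arbitrary illustrative value, larger than the default spike-in):

In R:

# Re-run the comparison with a larger spike-in multiplier
res_large <- testDA(phy_genus_pre, predictor = "Delivery", effectSize = 10)
summary(res_large)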
allDA runs the different methods on the actual dataset, which makes it easy to compare their output. Let's try it out.
# Run the different methods on the actual (unshuffled) data
res_all <- allDA(phy_genus_pre, predictor = "Delivery")
An easy way to compare them is to create Euler (Venn) diagrams of the features found significant by each method. Let's create a diagram for the 3 best methods:
vennDA(res_all, tests = c("msf", "ds2", "ttt"))
The 11 features found significant by all 3 methods might be truly differentially abundant, but let's make sure the sign of the log fold change agrees across methods:
vennDA(res_all, tests = c("msf", "ds2", "ttt"), split = TRUE)
Looks good! 7 features have a positive log fold change with all 3 methods, and 4 features have a negative log fold change with all 3 methods. If the signs did not match, we would not know in which group the features have the highest abundance.
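As a final step, we could pull out the names of the features on which all 3 methods agree. This is only a hedged sketch: it assumes the allDA output contains an adj data.frame with a Feature column and one column of adjusted p-values per method, named by the test abbreviations; check str(res_all) to confirm the layout in your version of DAtest.

In R:

# Features called significant by all 3 methods (assumed column layout, see above)
adj <- res_all$adj
sig_all <- adj$Feature[!is.na(adj$msf) & adj$msf <= 0.05 &
                       !is.na(adj$ds2) & adj$ds2 <= 0.05 &
                       !is.na(adj$ttt) & adj$ttt <= 0.05]
sig_all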