Differential abundance - Extended

The notebook on 'Differential abundance - Basics' only explained two different methods for doing DA analysis. There are however many different methods, and choosing the best is not a simple task. Benchmark studies have shown that different methods have different strengths, and the choice of method therefore in part relies on the specific dataset in question. The DAtest package tries to solve this problem by testing a lot if different methods on your own dataset and provide some statistics on their performance. The details can be found in the wiki, but the basic idea is that the predictor of interest (e.g. treatment/control) is shuffled. Then for some random features (ASVs) we multiply the abundance with some number (e.g. 5), but only for the shuffled treatment samples. We then apply the diferent DA methods and check if they can find the ASVs which were increased in abundance for the shuffled predictor. The predictor is shuffled to remove any true effect present in the dataset. This procedure is repeated several times, by default 20, to evaluate the consistency of the performance.

Another feature of DAtest is that it is easy to run several different methods and compare their output. One could for example run 3 methods known to perform well, and then use their combined results for a less biased DA analysis.

In [1]:
# Load phyloseq object and phyloseq package
library(phyloseq)
load("../data/physeq.RData")
In [2]:
library(DAtest)
DAtest version 2.7.18

preDA

As explained in the Basics notebook, we always want to prepare our data first, especially by pruning low abundant taxa

preDA is identical to the ps_prune function in the MicEco package

In R:

In [3]:
# Agglomorate to Genus level
phy_genus <- tax_glom(phy, "Genus")

# Only keep genera present in at least 10 samples
phy_genus_pre <- preDA(phy_genus, min.samples = 10)
204 features grouped as 'Others' in the output

testDA

testDA runs all the different methods and compares them by their empirical power (ability to detect spiked features), their False Discovery Rate (FDR, fraction of significant features which are false positives), and their AUC (area under the ROC curve, ability to rank the features from most to least associated to the predictor). This is all combined into a score, which ranks the different methods. The method with the highest score should perform best.

It is important to note that DAtest assumes that most features are NOT associated with the predictor. Therefore, if for example your control and treatment samples are very different (clearly separated in a PCoA ordination), then only the False Positive Rate (FPR) can be used to weed out bad methods, but the other measures are not trustworthy.

In R:

In [4]:
res <- testDA(phy_genus_pre, predictor = "Delivery")
Running on 7 cores

predictor is assumed to be a categorical variable with 2 levels: Sectio, Vaginal

Spikeing...
Testing 26 methods 20 times each...
  |======================================================================| 100%
ds2x, erq2, ere2 were excluded due to failure

In [5]:
summary(res)
                     Method  AUC  FPR  FDR Power Score Score.5% Score.95%  
        MgSeq Feature (msf) 0.80 0.04 0.00  0.13  0.04    -1.00      0.16 *
 DESeq2 man. geoMeans (ds2) 0.72 0.04 0.00  0.07  0.01    -1.00      0.06 *
               t-test (ttt) 0.70 0.05 0.00  0.07  0.01    -0.51      0.02 *
          Permutation (per) 0.63 0.06 0.00  0.07  0.01    -0.50      0.01 *
               Wilcox (wil) 0.55 0.06 0.00  0.03  0.00    -0.50      0.01 *
          LIMMA - ALR (lia) 0.62 0.09 0.00  0.00  0.00    -1.09      0.00 *
         t-test - ALR (tta) 0.62 0.09 0.00  0.00  0.00    -1.08      0.02 *
          LIMMA - CLR (lic) 0.60 0.06 0.00  0.00  0.00    -1.00      0.01 *
            Log LIMMA (lli) 0.57 0.04 0.00  0.00  0.00    -1.00      0.00 *
        t-test - Rank (ttr) 0.56 0.05 0.00  0.00  0.00    -1.00      0.00 *
         Log t-test2 (ltt2) 0.67 0.04 0.00  0.00  0.00    -0.51      0.02 *
         t-test - CLR (ttc) 0.60 0.06 0.00  0.00  0.00    -0.50      0.01 *
         Log LIMMA 2 (lli2) 0.67 0.04 0.00  0.00  0.00    -0.50      0.02 *
           Log t-test (ltt) 0.57 0.04 0.00  0.00  0.00    -0.50      0.00 *
    Quasi-Poisson GLM (qpo) 0.67 0.06 0.12  0.07 -0.11    -1.00      0.02 *
        ZI-NegBin GLM (znb) 0.73 0.10 0.55  0.20 -0.50    -0.73      0.11 *
           LIMMA voom (vli) 0.58 0.12 0.57  0.03 -0.57    -1.00      0.00 *
       ZI-Poisson GLM (zpo) 0.75 0.93 0.86  1.00 -0.61    -0.88     -0.52 *
    EdgeR exact - TMM (ere) 0.72 0.15 0.67  0.20 -0.62    -0.87     -0.14 *
      EdgeR qll - TMM (erq) 0.71 0.23 0.71  0.33 -0.64    -0.82     -0.29 *
          Poisson GLM (poi) 0.71 0.95 0.86  1.00 -0.65    -0.85     -0.55 *
            MgSeq ZIG (zig) 0.70 0.55 0.82  0.80 -0.66    -0.95     -0.51 *
         Negbinom GLM (neb) 0.57 0.19 0.78  0.27 -0.76    -0.88     -0.62 *

MetagenomeSeq Feature (msf) method appears to be the best. However, the bootstrapped confidence interval (Score.5% and Score.95%) show that many methods are indistugishiable from msf (*).

Let's run the msf method:

In [6]:
res_msf <- DA.msf(phy_genus_pre, predictor = "Delivery")
Default value being used.

In [7]:
res_msf[res_msf$pval.adj <= 0.05, ]
A data.frame: 17 × 19
Feature+samples in group 0+samples in group 1counts in group 0counts in group 1logFCsepvaladjPvaluespval.adjorderingMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr>
408b21f8171b35ef001832ba1df9b8fbc6956232864 81668-1.44414160.30838432.828047e-061.583706e-041.583706e-04Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
9107063e8453bf94891c4d7a17b00af07 417 319 9511 2.89379220.97050502.866166e-032.140070e-022.140070e-02Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella_A NA
1011d9c9ed1d2d58a9cb0bbcdc93aa3e55 326 41 27727 4.73444481.26753311.875924e-043.082105e-033.082105e-03Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
24333fc718c6bccdbf0fd4e4f84d0dd38e3435143352 11880-2.67860340.60489469.501622e-063.547272e-043.547272e-04Sectio>VaginalMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Prevotella NA
2944d9fc4de8898b6c82c69654435a9f0b3225 26682 6755-1.80195160.56449061.412069e-031.757241e-021.757241e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus NA
3045c4f6f269ec95ff0c8dd594104519e92717159140 40544-1.79961740.59014512.292617e-032.139776e-022.139776e-02Sectio>VaginalMgSeq Feature (msf)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Citrobacter NA
486dbcf153b0182c45e43924e6a0814551 9 2 378 9-3.85800861.36234844.627506e-033.092891e-023.092891e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia TANB77 CAG-508 CAG-354 NA
50705da2e88a1aaa0c02b27eab4a642b72 114 14 7183 0.97388010.26009361.808657e-043.082105e-033.082105e-03Vaginal>SectioMgSeq Feature (msf)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Morganella NA
5372bda5561e865a5539c4f19f992ee8305438117888 49799-1.24853140.41736852.776685e-032.140070e-022.140070e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Ruminococcus_B NA
6487c2912f70537a375ec52d2e374795341133 1086 33112 2.44043070.77995541.754404e-031.882993e-021.882993e-02Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes Bacilli ErysipelotrichalesErysipelatoclostridiaceaeErysipelatoclostridiumNA
6895b523dacb244bb891da4c48923d8788 8 6 614 42-3.16032521.04708602.542727e-032.140070e-022.140070e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Tissierellales Peptoniphilaceae Peptoniphilus_A NA
75a3f2abd64e462dc86f8242a90d98ee66 9 1 2353 38-0.56500410.18147421.849368e-031.882993e-021.882993e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Oscillospirales DTU089 Eubacterium_R NA
76a4ac1cd75e12a7e924dd49b96366a5df 713 1078 11544 2.76568280.85786031.264437e-031.757241e-021.757241e-02Vaginal>SectioMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides_A NA
90ccc3cce7144df93b827dc9a9fd18dfaf2311 18890 1173-2.26766070.60819841.926316e-043.082105e-033.082105e-03Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_C NA
93d20d3658de331c939f54d8acaed7c4c16473317989773805 1.86399490.31666253.947523e-094.421225e-074.421225e-07Vaginal>SectioMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides NA
103efe2ef829cdc0a2cf36f8c27f96678d71411 3493 68-2.33657920.82644154.694566e-033.092891e-023.092891e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Tissierellales Helcococcaceae Anaerococcus NA
111Others 7370 19298 11601-1.05464270.24186111.297471e-053.632920e-043.632920e-04Sectio>VaginalMgSeq Feature (msf)NA NA NA NA NA NA NA

Lots of significant ones!

Troubleshoot

If the best Score among the methods is 0 (zero), then none of the methods are good enough to detect anything in your dataset. One problem could be that the predictor you are testing has too large an effect, in which case you should only use testDA to remove methods which have a too high FPR. You could also re-run testDA with a higher effectSize, which should make it easier for the methods to detect differentially abundant features.

allDA

allDA runs the different methods on the actual dataset. This way you can easily compare the different methods. Let's try it out.

In [8]:
res_all <- allDA(phy_genus_pre, predictor = "Delivery")
Running on 7 cores

predictor is assumed to be a categorical variable with 2 levels: Sectio, Vaginal

Running 26 methods...
  |======================================================================| 100%
ds2x, erq2, ere2 were excluded due to failure

An easy comparison is by creating Euler (Venn) diagrams of features which are found significant.

Let's try to create a diagram with 3 best methods:

In [9]:
vennDA(res_all, tests = c("msf", "ds2", "ttt"))
Registered S3 method overwritten by 'eulerr':
  method    from  
  plot.venn gplots

The 11 features found significant by all 3 methods might be truly differentially abundant, but let's ensure that the sign of the log fold change is similar:

In [10]:
vennDA(res_all, tests = c("msf", "ds2", "ttt"), split = TRUE)

Looks good! 7 features have a positive log fold change with all 3 methods, and 4 features have a negative log fold change with all 3 methods. If the signs were not matching, then we would not know in which group the features have the highest abundance.