Differential abundance

Neither alpha or beta diversity analysis tells us which taxa are different in our treatment groups. For this, we use differential abundance (DA) analysis. DA methods compare the abundance of each taxa between our treatment groups to tell which are differentially abundant.

There are many different tools for conducting DA analysis, and no gold standard. The problem is that there are many statistical challenges with analysing amplicon data, which the different tools try to solve in different ways:

  • Compositionality: The abundances of taxa are not indepedent of each other (see also the Compositionality notebook)
  • Zero-inflation: There are many zeroes; that is, most taxa are absent in most samples
  • Skewed distributions: Few taxa are abundant, and most taxa are rare
  • Counts: Abundances are represented by counts of reads, which changes the statistical assumptions
  • Multiple hypothesis problem: Conducting many statistical tests require corrections (See Multiple correction notebook)

Example dataset and DAtest package

In [1]:
# Load phyloseq object and phyloseq package
library(phyloseq)
load("../data/physeq.RData")

Here we will use the DAtest package for DA analysis. The advantage with DAtest is that all the different methods require the same standardized input, also making it simple to conduct many analysis and compare them.

In [2]:
library(DAtest)
DAtest version 2.8.0

In [3]:
# Load MicEco for helper functions
library(MicEco)

Data preparation and agglomoration

Agglomoration

First step before doing differential abundance analysis is to decide at which taxonomic/phylogenetic level you want to test your hypothesis. Do you want to be specific and do it at the ASV level, or is it more fitting with your hypothesis to test at a higher taxonomic level.

Things to consider when choosing whether to agglomorate:

  • Agglomoration makes the tests less specific, and might hide large variations at the ASV level. For example, maybe one Staphylococcus ASV is highly abundant in the control, and another Staphylococcus is highly abundant in the treatment group. Agglomrating to the genus level will hide this variation.
  • Agglomoration combines the reads of the ASVs, and will therefore give better estimations of abundances
  • Agglomoration will give less statistical hypotheses to test and hence more power to detect differences, because of multiple hypothesis correction

The choice depends entirely on the hypotheses of the study.

Filtering

Removing low abundant taxa (ASVs or another level, depending on agglomoration), is a useful and widely used preparation step before performing DA analysis. The reason is that many taxa in microbiome studies are rare, and often many are so rare that there is no way they would be statistically significant. An obvious example is an ASV only present in one sample. Depending on your sample size and experimental setup we would often want ASVs to be present in at least 5 samples (unless your treatment groups are less than this size). Another threshold to filter by is the abundance/reads; if an ASV only have few reads the estimation of its abundance will be poor.

There are, unfortunately, no gold standard on how to set the thresholds when pruning low abundant ASVs. It is a trade-off between keeping rare ASVs which could be interesting to test, and removing ASVs to increase statistical power when testing the remaining ASVs.

In R:

We use the ps_prune function to filter, which is also presented in the phyloseq operations notebook.

In [4]:
# Agglomorate to Genus level
phy_genus <- tax_glom(phy, "Genus")

# Only keep genera present in at least 10 samples
phy_genus_pre <- ps_prune(phy_genus, min.samples = 10)
204 features grouped as 'Others' in the output

Wilcoxon

Wilcoxon Rank Sum test is used to test if the median of two statistical samples are different (See more in the Statistics 101 notebook). If we have a simple experimental design with for example a control and treatment group, we can conduct a Wilcoxon Rank Sum test for each taxa, comparing whether they are different in control versus treatment group.

In R:

In [5]:
res_wil <- DA.wil(phy_genus_pre, predictor = "Delivery")
In [6]:
head(res_wil)
A data.frame: 6 × 13
Featurepvalpval.adjlog2FCorderingMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr>
102985759c1ee39be5cc8ca5d48054b1c8.486175e-020.2880156514-0.4378944Sectio>VaginalWilcox (wil)BacteriaBacteroidota Bacteroidia Bacteroidales Barnesiellaceae Barnesiella NA
203a722bba911de891254934069dc89591.548125e-010.3940681809 0.4023601Vaginal>SectioWilcox (wil)BacteriaFirmicutes_A Clostridia Clostridiales Clostridiaceae Clostridium NA
304ac1aaec7096c309bacf3c0fb0d23119.722624e-010.9940574390-0.1034501Sectio>VaginalWilcox (wil)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Clostridium_QNA
408b21f8171b35ef001832ba1df9b8fbc2.880426e-060.0001075359-1.1695647Sectio>VaginalWilcox (wil)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
50978d7032689ab3eeef2fc60e4490e114.493579e-010.7258196586 1.1092544Vaginal>SectioWilcox (wil)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Lachnospira NA
609efd4982be79756a70f3b739241efda2.200392e-010.4902661368-1.1414941Sectio>VaginalWilcox (wil)BacteriaActinobacteriotaActinobacteriaActinomycetalesDermabacteraceaeDermabacter NA

Output explanation:

  • Each row is a test of one ASV
  • Feature: This is the ASV id
  • pval: The p-value
  • pval.adj: The adjusted p-value. This is the p-value to use!
  • log2FC: Log2 fold change. The log ratio of abundance between the two groups
  • ordering: In which group is it high abundant, and in which group low abundant
  • Method: Name of our method
  • The rest are taxonomy information of the ASV

Usually we are only interested in those that are significantly abundant. So we can subset the data.frame with results:

In [7]:
res_wil[res_wil$pval.adj <= 0.05, ]
A data.frame: 12 × 13
Featurepvalpval.adjlog2FCorderingMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr>
408b21f8171b35ef001832ba1df9b8fbc2.880426e-061.075359e-04-1.1695647Sectio>VaginalWilcox (wil)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
9107063e8453bf94891c4d7a17b00af071.618262e-031.812453e-02 2.1279914Vaginal>SectioWilcox (wil)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella_A NA
1011d9c9ed1d2d58a9cb0bbcdc93aa3e551.087545e-066.090250e-05 3.4407200Vaginal>SectioWilcox (wil)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
1420cfe7f61d18f6525cc71caae0ab28dc1.186511e-031.476547e-02 0.9864273Vaginal>SectioWilcox (wil)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Escherichia NA
50705da2e88a1aaa0c02b27eab4a642b723.714679e-047.637397e-03 1.6567590Vaginal>SectioWilcox (wil)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Morganella NA
5372bda5561e865a5539c4f19f992ee8302.121554e-032.160128e-02-1.0713722Sectio>VaginalWilcox (wil)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Ruminococcus_B NA
6487c2912f70537a375ec52d2e374795342.832326e-057.930512e-04 2.5400747Vaginal>SectioWilcox (wil)BacteriaFirmicutes Bacilli ErysipelotrichalesErysipelatoclostridiaceaeErysipelatoclostridiumNA
658c3ff6c4c4b125d5e72f5276d57be4d05.992308e-049.587693e-03 1.1677965Vaginal>SectioWilcox (wil)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Bittarella NA
82b6a1675fa3f43bec37b92541779954a15.187787e-034.841934e-02-1.6748575Sectio>VaginalWilcox (wil)BacteriaFirmicutes Bacilli Lactobacillales Enterococcaceae Family_EnterococcaceaeNA
93d20d3658de331c939f54d8acaed7c4c11.057463e-071.184359e-05 1.0913261Vaginal>SectioWilcox (wil)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides NA
95dc85940d84ddbd7315db16b14390cb8d4.091463e-047.637397e-03-1.4262647Sectio>VaginalWilcox (wil)BacteriaFirmicutes_A Clostridia Clostridiales Clostridiaceae Clostridium_P NA
111Others 1.118138e-031.476547e-02-1.1671921Sectio>VaginalWilcox (wil)NA NA NA NA NA NA NA

11 significantly abundant genera!

We can also order it, and see which have the highest fold change:

In [8]:
res_wil_ordered <- res_wil[rev(order(abs(res_wil$log2FC))), ]
In [9]:
head(res_wil_ordered)
A data.frame: 6 × 13
Featurepvalpval.adjlog2FCorderingMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr>
1011d9c9ed1d2d58a9cb0bbcdc93aa3e551.087545e-060.0000609025 3.440720Vaginal>SectioWilcox (wil)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
24333fc718c6bccdbf0fd4e4f84d0dd38e2.254789e-010.4902661368-3.289467Sectio>VaginalWilcox (wil)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Prevotella NA
89ca5152dd2313e7fe7d25872f10c57f6d9.802693e-030.0713851898 3.213467Vaginal>SectioWilcox (wil)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_B NA
6487c2912f70537a375ec52d2e374795342.832326e-050.0007930512 2.540075Vaginal>SectioWilcox (wil)BacteriaFirmicutes Bacilli ErysipelotrichalesErysipelatoclostridiaceaeErysipelatoclostridiumNA
273aac5ec59c4f41b64a70357d5b971d461.083525e-020.0713851898 2.180310Vaginal>SectioWilcox (wil)BacteriaActinobacteriotaCoriobacteriiaCoriobacteriales Coriobacteriaceae Collinsella NA
9107063e8453bf94891c4d7a17b00af071.618262e-030.0181245295 2.127991Vaginal>SectioWilcox (wil)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella_A NA

MetagenomeSeq Featuretest

MetagenomeSeq Featuretest was developed for microbiome data, and is usually the most powerful for finding differentially abundant taxa.

Let's run MetagenomeSeq to test the effect of Delivery:

In [9]:
res_msf <- DA.msf(phy_genus_pre, predictor = "Delivery")
Registered S3 method overwritten by 'gplots':
  method    from  
  plot.venn eulerr

Default value being used.

MetagenomeSeq output:

  • Each row is a test of one ASV
  • Feature: This is the ASV id
  • +samples in group 0: Samples in the first group where the ASV is present
  • +samples in group 1: Samples in the second group where the ASV is present
  • counts in group 0: Total reads of this ASV in first group
  • counts in group 1: Total reads of this ASV in second group
  • logFC: The log2 ratio of the abundance in second level of the predictor compared to the first level
  • se: Standard error of the log2 fold change
  • pval: The p-value
  • adjPvalues/pval.adj: The adjusted p-value. This is the p-value to use!
  • ordering: In which group is it high abundant, and in which group low abundant
  • Method: Name of our method
  • The rest are taxonomy information of the ASV

Let's only look at significant ASVs:

In [10]:
res_msf[res_msf$pval.adj <= 0.05, ]
A data.frame: 17 × 19
Feature+samples in group 0+samples in group 1counts in group 0counts in group 1logFCsepvaladjPvaluespval.adjorderingMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr>
408b21f8171b35ef001832ba1df9b8fbc6956232864 81668-1.44414160.30838432.828047e-061.583706e-041.583706e-04Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
9107063e8453bf94891c4d7a17b00af07 417 319 9511 2.89379220.97050502.866166e-032.140070e-022.140070e-02Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella_A NA
1011d9c9ed1d2d58a9cb0bbcdc93aa3e55 326 41 27727 4.73444481.26753311.875924e-043.082105e-033.082105e-03Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
24333fc718c6bccdbf0fd4e4f84d0dd38e3435143352 11880-2.67860340.60489469.501622e-063.547272e-043.547272e-04Sectio>VaginalMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Prevotella NA
2944d9fc4de8898b6c82c69654435a9f0b3225 26682 6755-1.80195160.56449061.412069e-031.757241e-021.757241e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus NA
3045c4f6f269ec95ff0c8dd594104519e92717159140 40544-1.79961740.59014512.292617e-032.139776e-022.139776e-02Sectio>VaginalMgSeq Feature (msf)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Citrobacter NA
486dbcf153b0182c45e43924e6a0814551 9 2 378 9-3.85800861.36234844.627506e-033.092891e-023.092891e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia TANB77 CAG-508 CAG-354 NA
50705da2e88a1aaa0c02b27eab4a642b72 114 14 7183 0.97388010.26009361.808657e-043.082105e-033.082105e-03Vaginal>SectioMgSeq Feature (msf)BacteriaProteobacteriaGammaproteobacteriaEnterobacterales Enterobacteriaceae Morganella NA
5372bda5561e865a5539c4f19f992ee8305438117888 49799-1.24853140.41736852.776685e-032.140070e-022.140070e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Ruminococcus_B NA
6487c2912f70537a375ec52d2e374795341133 1086 33112 2.44043070.77995541.754404e-031.882993e-021.882993e-02Vaginal>SectioMgSeq Feature (msf)BacteriaFirmicutes Bacilli ErysipelotrichalesErysipelatoclostridiaceaeErysipelatoclostridiumNA
6895b523dacb244bb891da4c48923d8788 8 6 614 42-3.16032521.04708602.542727e-032.140070e-022.140070e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Tissierellales Peptoniphilaceae Peptoniphilus_A NA
75a3f2abd64e462dc86f8242a90d98ee66 9 1 2353 38-0.56500410.18147421.849368e-031.882993e-021.882993e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Oscillospirales DTU089 Eubacterium_R NA
76a4ac1cd75e12a7e924dd49b96366a5df 713 1078 11544 2.76568280.85786031.264437e-031.757241e-021.757241e-02Vaginal>SectioMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides_A NA
90ccc3cce7144df93b827dc9a9fd18dfaf2311 18890 1173-2.26766070.60819841.926316e-043.082105e-033.082105e-03Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_C NA
93d20d3658de331c939f54d8acaed7c4c16473317989773805 1.86399490.31666253.947523e-094.421225e-074.421225e-07Vaginal>SectioMgSeq Feature (msf)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides NA
103efe2ef829cdc0a2cf36f8c27f96678d71411 3493 68-2.33657920.82644154.694566e-033.092891e-023.092891e-02Sectio>VaginalMgSeq Feature (msf)BacteriaFirmicutes_A Clostridia Tissierellales Helcococcaceae Anaerococcus NA
111Others 7370 19298 11601-1.05464270.24186111.297471e-053.632920e-043.632920e-04Sectio>VaginalMgSeq Feature (msf)NA NA NA NA NA NA NA

16 significantly abundant genera with MetagenomeSeq!

DESeq2

DEseq2 was originally developed for RNA-seq (transcriptomic) data, but this is very similar to 16S rRNA gene amplicon data, and DESeq2 has been shown to perform well on this kind of data. Furthermore, with DESeq2 you can create more complicated models, with for example multiple variables both categorical and quantitative.

Let's run DESeq2 to test the effect of Time:

In [11]:
res_ds2 <- DA.ds2(phy_genus_pre, predictor = "Time")
converting counts to integer mode

using pre-existing size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 33 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

In [12]:
head(res_ds2)
A data.frame: 6 × 16
FeaturebaseMeanlog2FoldChangelfcSEstatpvalorderingpval.adjMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><chr><dbl><chr><chr><chr><chr><chr><chr><chr><chr>
102985759c1ee39be5cc8ca5d48054b1c 330.878518-0.55257270.7448826 0.80536056.685258e-011m>1w6.932860e-01DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Barnesiellaceae Barnesiella NA
203a722bba911de891254934069dc8959 651.496487 0.04597720.5704140 23.97880606.209669e-061w>1m1.987094e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Clostridiales Clostridiaceae Clostridium NA
304ac1aaec7096c309bacf3c0fb0d2311 2.197331 0.00000002.2866770 8.39549931.502936e-02NA 2.337900e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Clostridium_QNA
408b21f8171b35ef001832ba1df9b8fbc2545.681519-1.09617230.5678429 19.82202614.962514e-051m>1w1.292562e-04DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
50978d7032689ab3eeef2fc60e4490e11 85.004312-1.07486370.8558008152.70027406.943154e-341m>1w3.888166e-32DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Lachnospira NA
609efd4982be79756a70f3b739241efda 4.527625-5.18143512.5475795 7.88327181.941643e-021m>1w2.938702e-02DESeq2 man. geoMeans (ds2)BacteriaActinobacteriotaActinobacteriaActinomycetalesDermabacteraceaeDermabacter NA

It is important to remember that our time variable has three levels: 1w, 1m, and 1y. By default we only get the log fold change of the second level (1w here) against the first level (1m here). The p-value is an overall p-value telling us whether any of the levels (timepoints) are different.

Output explanation:

  • Each row is a test of one ASV
  • Feature: This is the ASV id
  • baseMean: The average abundance of the first level in the predictor
  • log2FoldChange: The log ratio of the abundance in second level of the predictor compared to the first level
  • lfcSE: Standard error of the log2 fold change
  • stat: Wald statistic
  • pval: The p-value
  • pval.adj: The adjusted p-value. This is the p-value to use!
  • ordering: In which group is it high abundant, and in which group low abundant
  • Method: Name of our method
  • The rest are taxonomy information of the ASV

Let's see if there any significant:

In [13]:
res_ds2[res_ds2$pval.adj <= 0.05, ]
A data.frame: 85 × 16
FeaturebaseMeanlog2FoldChangelfcSEstatpvalorderingpval.adjMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><chr><dbl><chr><chr><chr><chr><chr><chr><chr><chr>
203a722bba911de891254934069dc8959 651.496487 0.04597720.5704140 23.9788066.209669e-061w>1m1.987094e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Clostridiales Clostridiaceae Clostridium NA
304ac1aaec7096c309bacf3c0fb0d2311 2.197331 0.00000002.2866770 8.3954991.502936e-02NA 2.337900e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Clostridium_Q NA
408b21f8171b35ef001832ba1df9b8fbc 2545.681519 -1.09617230.5678429 19.8220264.962514e-051m>1w1.292562e-04DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
50978d7032689ab3eeef2fc60e4490e11 85.004312 -1.07486370.8558008152.7002746.943154e-341m>1w3.888166e-32DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Lachnospira NA
609efd4982be79756a70f3b739241efda 4.527625 -5.18143512.5475795 7.8832721.941643e-021m>1w2.938702e-02DESeq2 man. geoMeans (ds2)BacteriaActinobacteriotaActinobacteria Actinomycetales Dermabacteraceae Dermabacter NA
80c4ba36f81a215044822bfc2b6533315 51.062266 5.14641011.3012732 26.3255551.920783e-061w>1m6.874525e-06DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaPseudomonadales Moraxellaceae Acinetobacter NA
9107063e8453bf94891c4d7a17b00af07 39.170525 2.34217181.7008926 16.9858662.049114e-041w>1m4.882995e-04DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella_A NA
1011d9c9ed1d2d58a9cb0bbcdc93aa3e55 278.826390-11.50428331.4722362 33.0061206.804750e-081m>1w3.048528e-07DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
121c88ca8aa0ed62fb2fb8bebb8cb25670 194.291604 -3.41154830.9727130 67.7037321.987565e-151m>1w2.226073e-14DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Blautia_A NA
131d0831cd5450dbff194da1403d5f94f8 8.698501 0.00000001.8863857 14.8672325.910464e-04NA 1.297984e-03DESeq2 man. geoMeans (ds2)BacteriaProteobacteria AlphaproteobacteriaRF32 CAG-239 CAG-495 NA
1420cfe7f61d18f6525cc71caae0ab28dc 5977.376654 -1.45339500.5561586 23.7682286.899139e-061m>1w2.146399e-05DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaEnterobacterales Enterobacteriaceae Escherichia NA
1523fd935effd62858dd84fbd5ecf064e6 8.455607-17.41591612.2969786 9.6357938.083773e-031m>1w1.422842e-02DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaEnterobacterales Pasteurellaceae Haemophilus_A NA
1625cd6c22fd2bb5fd97c63c527a7ae9c8 36.492521 -0.52286731.2066239 24.7107764.306488e-061m>1w1.418608e-05DESeq2 man. geoMeans (ds2)BacteriaActinobacteriotaActinobacteria Actinomycetales Micrococcaceae Rothia NA
18269e64276a91bc92a9ba3ed05a0e0e0d 11.628046 0.00000001.6303090 22.0688471.613655e-05NA 4.884577e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales DTU089 DTU053 NA
192bb792dc76a244c5aff56572237e9ffa 3.182132 0.00000002.2959381 9.5204458.563702e-03NA 1.475592e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Eisenbergiella NA
212dc26dda3a7735af9bf29c6bc37c2ecc 108.104811 -8.51461621.3707566 25.9459302.322270e-061m>1w7.881645e-06DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Acidaminococcales Acidaminococcaceae Acidaminococcus NA
24333fc718c6bccdbf0fd4e4f84d0dd38e 676.125928 -3.41599800.7698422137.4008441.458095e-301m>1w5.443554e-29DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Prevotella NA
263fe60a0670280b071f93d770e1ff16ee 10.176028 1.91791481.3019275 17.0581901.976338e-041w>1m4.811953e-04DESeq2 man. geoMeans (ds2)BacteriaActinobacteriotaActinobacteria Corynebacteriales Corynebacteriaceae Corynebacterium NA
2844cec8943a6e384a4f1abadc3e9b7930 41.688197-26.86356161.8384787 11.3667223.402104e-031m>1w6.458232e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Megasphaeraceae Anaeroglobus NA
2944d9fc4de8898b6c82c69654435a9f0b 338.048768 -2.19225791.0307000 40.7385671.424730e-091m>1w8.398409e-09DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus NA
3045c4f6f269ec95ff0c8dd594104519e9 1937.877564 1.63613001.4945891 10.5332525.160995e-031w>1m9.633856e-03DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaEnterobacterales Enterobacteriaceae Citrobacter NA
314bc8e35f3e060dd71a46d502972c5b45 15.418482-17.75922372.0877818 14.0202899.026783e-041m>1w1.884100e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Ruminococcus_D NA
324c407a2fae7af12f863fec08f5e1d8d4 3.370780 0.00000001.6029932 19.9509314.652756e-05NA 1.240735e-04DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Class_Clostridia Class_Clostridia Class_Clostridia NA
334e8b51098d3b598eadf069dcc96e81aa10278.456469 0.76684310.6770911 71.4180173.102950e-161w>1m3.861448e-15DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaEnterobacterales Enterobacteriaceae Family_Enterobacteriaceae NA
344fad93adaab891f153808884f1da1fec 3.166222 0.00000001.8915104 14.0076399.084055e-04NA 1.884100e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia CAG-41 UBA1381 CAG-41 NA
3554576c4896c625acff83191d8d1a004e 12.538527 -1.57580931.4230260 30.2310052.725340e-071m>1w1.130511e-06DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Coprobacteraceae Coprobacter NA
3654fe524342380c8bf7ed42580bd17deb 19.384864 0.00000001.4887565 30.8554391.994465e-07NA 8.591544e-07DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Order_OscillospiralesOrder_Oscillospirales NA
3756eb226a214d3ad80951bea6b746946a 54.144485 -3.59628811.2368297 7.0887022.888736e-021m>1w4.044230e-02DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaBetaproteobacterialesBurkholderiaceae Sutterella NA
3857face48ec571894748b66cab7c52d5e 62.572846-20.07770131.2131998 60.0285789.224864e-141m>1w7.947575e-13DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Muribaculaceae Family_Muribaculaceae NA
3959edac319b901da207ccca15fa602dfc 19.511346 24.58401332.5331029 7.4906172.362834e-021w>1m3.392787e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Peptostreptococcales PeptostreptococcaceaeFamily_PeptostreptococcaceaeNA
75a3f2abd64e462dc86f8242a90d98ee66 2.5019316 0.00000002.5644845 7.0135902.999288e-02NA 4.093176e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales DTU089 Eubacterium_R NA
76a4ac1cd75e12a7e924dd49b96366a5df 55.4287225-17.82853451.5441694 36.1416121.418890e-081m>1w7.223438e-08DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides_A NA
77a4e5fa843fb3038e70fc6cb1bb062c58 0.8085203 0.00000001.9775665 6.9007283.173408e-02NA 4.231211e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Anaerotignaceae Anaerotignum NA
79ae90b0ce379ef144d9bb740516265a6c 10.1088867 -5.11460311.5818332 7.7999982.024193e-021m>1w2.944281e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Hungatella NA
80b213fefdccc6c0e773fe51b1e24aa002 4.9661895 13.24559462.3193437 7.8120322.012050e-021w>1m2.944281e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Subdoligranulum NA
81b6592341b951857fdd6c3889f7c90e92 16.4927068 0.00000001.3352112 44.9772051.711291e-10NA 1.369033e-09DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Eubacterium_E NA
82b6a1675fa3f43bec37b92541779954a1 578.3718680 2.67365451.1553018 26.2809021.964150e-061w>1m6.874525e-06DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Enterococcaceae Family_EnterococcaceaeNA
83b6b05223adf86d071fd279f79dc2533c 655.7311607 -1.21709550.9307766107.8476793.812186e-241m>1w1.067412e-22DESeq2 man. geoMeans (ds2)BacteriaVerrucomicrobiotaVerrucomicrobiae Verrucomicrobiales Akkermansiaceae Akkermansia NA
85bfe0ef267b9437cc26842e05f6c3e13e 17.7315412 0.00000000.9892403 86.6126631.557038e-19NA 2.906470e-18DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Dorea NA
86c073eaf335852ce09f9b1a8527e5fa75 3.2947808 0.00000002.2968789 9.6242598.130528e-03NA 1.422842e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Blautia NA
87c3337ad671c2e29ae30b948f8c25041d 12.4370796 -3.39896501.8320575 7.8071042.017014e-021m>1w2.944281e-02DESeq2 man. geoMeans (ds2)BacteriaActinobacteriota Actinobacteria Actinomycetales Actinomycetaceae Actinomyces_B NA
88c97908d2d3661f798e98fbf84f1640e9 6.5339909 0.00000001.9842414 12.1330092.319267e-03NA 4.478584e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Ruminiclostridium_E NA
89ca5152dd2313e7fe7d25872f10c57f6d 191.2020633 11.97784731.4492424 43.0954284.384769e-101w>1m3.069338e-09DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_B NA
90ccc3cce7144df93b827dc9a9fd18dfaf 200.1484297 -4.08547321.2620855 42.2546316.676111e-101m>1w4.398379e-09DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_C NA
91cde00646e8aecf8aaac49a9bb9c967291933.0976610 2.70733550.8906384 65.2841756.663793e-151w>1m6.219540e-14DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Staphylococcales StaphylococcaceaeStaphylococcus NA
92cf39b94723be8d7c4e5b41f4b20dab44 23.7922573 0.00000001.8302025 21.0742072.653347e-05NA 7.429373e-05DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaBetaproteobacterialesBurkholderiaceae Parasutterella NA
93d20d3658de331c939f54d8acaed7c4c15407.9470911 -0.77786700.4835447 7.0459332.951175e-021m>1w4.080638e-02DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides NA
95dc85940d84ddbd7315db16b14390cb8d1310.7737326 1.18141060.9745981 21.8956151.759655e-051w>1m5.186352e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Clostridiales Clostridiaceae Clostridium_P NA
96e0d214a60c3baff72cd75100c6e4cc28 6.2643931 0.00000002.2914749 8.8932021.171833e-02NA 1.930078e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Haloplasmatales TuricibacteraceaeTuricibacter NA
99e2f7aa19c11e3adf96a7030321b7e2e4 9.0859651 0.00000001.4437836 33.1093106.462561e-08NA 3.015862e-07DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae CAG-81 NA
101e8c0f8c73ce984d11e89e3e13b6b21e4 25.0735943 -2.48961351.4743068 29.5514213.828164e-071m>1w1.478463e-06DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Oscillospiraceae Flavonifractor NA
102ef5a779b6dab7f364d042205f39beb43 28.8310823 -2.36670771.5707066 13.3512181.261304e-031m>1w2.568473e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Dialisteraceae Dialister NA
103efe2ef829cdc0a2cf36f8c27f96678d7 30.9641012 -7.82534981.3905539 35.8119201.673172e-081m>1w8.147618e-08DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Tissierellales Helcococcaceae Anaerococcus NA
104f19f6eb0e557605030b949da68e33a3d 37.3058357-19.62890571.3709060 38.6699994.007895e-091m>1w2.244421e-08DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae 992a NA
106f4a2f44c453707eee491e184a4c7b528 7.3065976 -3.01346942.2993407 9.3291489.423260e-031m>1w1.575231e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Hungatella_A NA
107f5ab8244460d1988b40f85a9347297c7 710.9478067 -3.06162070.6834898299.2640231.036676e-651m>1w1.161077e-63DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Faecalibacterium NA
108fb88d5698f31eb7529434b5287e82b1b 40.1967341-18.41087171.3586099 43.5881883.427245e-101m>1w2.559010e-09DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Agathobacter NA
109fccf9b38f0596644eab0612a5cd57597 21.4374612-19.64788441.5077437 28.8325915.483810e-071m>1w2.047289e-06DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae CAG-65 NA
110feea23e3d913f7f737786c4b8ddcacba 9.9895276 0.00000001.9121429 15.1112865.231496e-04NA 1.171855e-03DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Paraprevotella NA
112aa20f3a40eaf07050443d2555f000f971340.2700549 0.29897550.6815946 37.6296386.742610e-091w>1m3.596059e-08DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus NA

Many genera are significantly differentially abundant between the different time points

We can see that Clostridium, the first one, is more abundant in 1 week than in 1 month, but what about 1 year?

We can change the coeff argument. If we set coeff=3 then we will compare the third level (1 year) with the first level (1 month). The p-value is the same, but the fold change and ordering will change.

In [14]:
res2 <- DA.ds2(phy_genus_pre, predictor = "Time", coeff = 3)
converting counts to integer mode

using pre-existing size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 33 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

See results for the Clostridium:

In [15]:
res2[res2$Feature == "03a722bba911de891254934069dc8959", ]
A data.frame: 1 × 16
FeaturebaseMeanlog2FoldChangelfcSEstatpvalorderingpval.adjMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><chr><dbl><chr><chr><chr><chr><chr><chr><chr><chr>
203a722bba911de891254934069dc8959651.4965-2.7851520.570547623.978816.209669e-061m>1y1.987094e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_AClostridiaClostridialesClostridiaceaeClostridiumNA

We can in the same way change the reference level for calculating the fold change. Let's change it to 2 (1 week):

In [16]:
res3 <- DA.ds2(phy_genus_pre, predictor = "Time", coeff = 3, coeff.ref = 2)
converting counts to integer mode

using pre-existing size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 33 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

In [17]:
res3[res3$Feature == "03a722bba911de891254934069dc8959", ]
A data.frame: 1 × 16
FeaturebaseMeanlog2FoldChangelfcSEstatpvalorderingpval.adjMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><chr><dbl><chr><chr><chr><chr><chr><chr><chr><chr>
203a722bba911de891254934069dc8959651.4965-2.831130.570529723.978816.209669e-061w>1y1.987094e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_AClostridiaClostridialesClostridiaceaeClostridiumNA

Based on this we can conclude that Clostridium is most abundant at 1 week, hereafter at 1 month, and least abundant at 1 year.

How to interpret fold change

The log2 fold change (log2FC) is the log2 ratio between the abundance in one group versus another group. For example, in the above analysis it is the ratio in abundance of Clostridium at 1 year versus at 1 week. We can see that it is -2.8. A negative log2FC means that Clostridium is in higher abundance in the reference group (1 week). Since it is on a log2 scale, a log2FC of 1 means the abundance is double compared to the reference, 2 means it is quadruple that of the reference, -1 means half, and -2 means a quarter, and so on. It is quite common to ignore log2FC between -2 and 2, and only focus on large differences. This choice, however, depends on the hypothesis in question.

Include covariates

With DESeq2 we can include covariates in the analysis. If we want to test the effect of mode of Delivery, we can take into account the differences across time points. Here we would say that we "control" for the effect of time, when testing the effect of Delivery.

In [18]:
res_covar <- DA.ds2(phy_genus_pre, predictor = "Delivery", covars = "Time")
converting counts to integer mode

Warning message in DESeqDataSet(se, design = design, ignoreRank):
"some variables in design formula are characters, converting to factors"
using pre-existing size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 17 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

In [19]:
res_covar[res_covar$pval.adj <= 0.05, ]
A data.frame: 16 × 16
FeaturebaseMeanlog2FoldChangelfcSEstatpvalorderingpval.adjMethodKingdomPhylumClassOrderFamilyGenusSpecies
<chr><dbl><dbl><dbl><dbl><dbl><chr><dbl><chr><chr><chr><chr><chr><chr><chr><chr>
408b21f8171b35ef001832ba1df9b8fbc2545.68152-1.6804550.4496852-3.7369581.862599e-04Sectio>Vaginal2.086111e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Veillonellaceae Veillonella NA
609efd4982be79756a70f3b739241efda 38.49512-4.7651101.6552294-2.8788213.991644e-03Sectio>Vaginal2.980427e-02DESeq2 man. geoMeans (ds2)BacteriaActinobacteriotaActinobacteria Actinomycetales Dermabacteraceae Dermabacter NA
1011d9c9ed1d2d58a9cb0bbcdc93aa3e55 278.82639 8.1916741.0352716 7.9125842.520998e-15Vaginal>Sectio2.823517e-13DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_H NA
121c88ca8aa0ed62fb2fb8bebb8cb25670 194.29160-3.4913610.7406899-4.7136602.433065e-06Sectio>Vaginal4.541721e-05DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Blautia_A NA
3857face48ec571894748b66cab7c52d5e 62.57285-2.7322550.9714832-2.8124574.916462e-03Sectio>Vaginal3.441523e-02DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Muribaculaceae Family_Muribaculaceae NA
4262392dc20bb4fd6c46d3cb038b13cb9e 30.82848-2.0616670.6777708-3.0418352.351407e-03Sectio>Vaginal1.881125e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Family_LachnospiraceaeNA
50705da2e88a1aaa0c02b27eab4a642b72 30.93052 9.0740521.3893145 6.5313166.519447e-11Vaginal>Sectio1.825445e-09DESeq2 man. geoMeans (ds2)BacteriaProteobacteria GammaproteobacteriaEnterobacterales Enterobacteriaceae Morganella NA
628639874b1f9d7eaf752febcf6f7ca9dd 23.79640-5.0495011.6083548-3.1395441.692109e-03Sectio>Vaginal1.457817e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Lachnospirales Lachnospiraceae Fusicatenibacter NA
6487c2912f70537a375ec52d2e37479534 200.58614 6.8311901.0107731 6.7583821.395413e-11Vaginal>Sectio5.209543e-10DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli ErysipelotrichalesErysipelatoclostridiaceaeErysipelatoclostridiumNA
658c3ff6c4c4b125d5e72f5276d57be4d02298.43780 3.1570370.7310758 4.3183451.572035e-05Vaginal>Sectio2.515256e-04DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales Ruminococcaceae Bittarella NA
75a3f2abd64e462dc86f8242a90d98ee66 11.22191-6.4774111.8015019-3.5955623.236926e-04Sectio>Vaginal3.295779e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_A Clostridia Oscillospirales DTU089 Eubacterium_R NA
89ca5152dd2313e7fe7d25872f10c57f6d 191.20206 4.3997571.1625159 3.7846861.539030e-04Vaginal>Sectio1.915237e-03DESeq2 man. geoMeans (ds2)BacteriaFirmicutes Bacilli Lactobacillales Lactobacillaceae Lactobacillus_B NA
93d20d3658de331c939f54d8acaed7c4c15407.94709 2.5399920.3676458 6.9088044.887583e-12Vaginal>Sectio2.737047e-10DESeq2 man. geoMeans (ds2)BacteriaBacteroidota Bacteroidia Bacteroidales Bacteroidaceae Bacteroides NA
102ef5a779b6dab7f364d042205f39beb43 30.11445 4.1126501.2668198 3.2464361.168595e-03Vaginal>Sectio1.090689e-02DESeq2 man. geoMeans (ds2)BacteriaFirmicutes_C Negativicutes Veillonellales Dialisteraceae Dialister NA
105f476bfba3de5b53f358787c28048d6ea 10.55600-9.5224261.7129281-5.5591512.710905e-08Sectio>Vaginal6.072427e-07DESeq2 man. geoMeans (ds2)BacteriaDesulfobacterotaDesulfovibrionia DesulfovibrionalesDesulfovibrionaceae Desulfovibrio NA
111Others 196.32759-1.2588190.3141851-4.0066156.159501e-05Sectio>Vaginal8.623302e-04DESeq2 man. geoMeans (ds2)NA NA NA NA NA NA NA

This does not tells at which time points the different genera are significantly associated with mode of delivery. For this, we would subset the data to the different time points, and conduct the analysis separately for each time point.

Plotting

A common way to plot results from a differential abundance analysis is the volcano plot. In a volcano plot we plot the p-value against the log2 fold change. One can then color according to for example taxonomy.

To make the plot easier to read, we take the log10 of the p-value and reverse it, such that significant taxa have a high value.

Let's plot the Delivery results from above:

In [20]:
# Plotting package
library(ggplot2)

# Define plot
p <- ggplot(res_covar, aes(x = log2FoldChange, y = -log10(pval.adj))) +
    theme_bw() +
    geom_point()

# View plot
p

We can add a line to discrimate between significant and non-significant. And we can color by Phylum

In [21]:
# Define plot
p <- ggplot(res_covar, aes(x = log2FoldChange, y = -log10(pval.adj), color = Phylum)) +
    theme_bw() +
    geom_point() +
    geom_hline(yintercept = -log10(0.05))

# View plot
p

Annotate the significant ones:

In [22]:
# Annotation package (install first time)
# install.packages("ggrepel")
library(ggrepel)

# Subset only highly significant
res_covar_sig <- res_covar[res_covar$pval.adj <= 0.01, ]

# Define plot
p <- ggplot(res_covar, aes(x = log2FoldChange, y = -log10(pval.adj), color = Phylum)) +
    theme_bw() +
    geom_point() +
    geom_hline(yintercept = -log10(0.05)) +
    geom_text_repel(data = res_covar_sig,
                    aes(x = log2FoldChange, y = -log10(pval.adj), label = Genus),
                    size = 3)

# View plot
p
Warning message:
"Removed 1 rows containing missing values (geom_text_repel)."