Microbial association networks

With a large set of samples we can infer co-occurrence and co-exclusion patterns between taxa. The interpretation of these associations depends on the experimental design. Say we include oral and fecal samples from human subjects, then the associations would largely be dominated by the environmental preference of the taxa, and the network would likely have two clusters; an oral cluster and a fecal cluster. These networks are often created to infer microbial interactions, but it is difficult to dicern between the effects of interactions on co-occurrence and the effect of shared environment (microbes preferring the same nutrients or aboitic condititions are likely to co-exist). Association networks can probably only be used to infer interactions if great care is taken in the experimental design. Read more here.

Types of networks

Correlation network

Networks can be based on correlations where we calculate the correlation between each pair of taxa, independent of all the other taxa. These correlations are not affected by which taxa we include in the network, except that the p-value multiple correction is affected by the number of included taxa.

Examples of correlation network methods:

Inverse covariance network

With inverse covariance we seek to remove indirect associations. Say that taxa A has a positive effect on taxa B, and taxa B has a positive effect on taxa C, but taxa A has no effect on taxa C, and vice versa. Taxa A and C would look to be associated, but this is an indirect effect, mediated by taxa B. Inverse covariance seeks to remove these, and all associations are therefore contingent on which taxa are included in the network

Examples of inverse covariance methods:

Data

In [1]:
library(phyloseq)
library(DAtest)
load("../data/physeq.RData")

# Agglomorate to Genus level
phy_genus <- tax_glom(phy, "Genus")

# Only keep genera present in at least 10 samples
phy_genus_pre <- preDA(phy_genus, min.samples = 10)

# Extract count table:
otu <- data.frame(otu_table(phy_genus_pre))
DAtest version 2.7.18

204 features grouped as 'Others' in the output

Spearman

The simplest network is to run spearman correlations on all pairs of taxa on relative abundances.

Transform to relative abundances

In [2]:
otu_rel <- apply(otu, 2, function(x) x/sum(x))

Calculate correlations

In [3]:
net <- cor(t(otu_rel), method = "spearman")

Set redundant correlations to NA

In [4]:
net[lower.tri(net)] <- NA

Transform the correlation matrix to a data.frame:

In [5]:
net_df <- as.data.frame.table(net)

Remove correlations between similar taxa, and the redundant ones:

In [6]:
net_df <- net_df[net_df$Var1 != net_df$Var2 & !is.na(net_df$Freq), ]

Remove correlations below 0.6.

In [7]:
net_df <- net_df[abs(net_df$Freq) > 0.6, ]

This data.frame can then be used to plot the network in for example Gephi, CytoScape, or with ggnet2 in R

In [8]:
colnames(net_df) <- c("Source", "Target", "Cor")
net_df$Weight <- ifelse(net_df$Cor > 0, net_df$Cor, 0)
net_df$Sign <- ifelse(net_df$Cor > 0, "Positive", "Negative")
net_df
A data.frame: 33 × 5
SourceTargetCorWeightSign
<fct><fct><dbl><dbl><chr>
63102dc26dda3a7735af9bf29c6bc37c2eccf5ab8244460d1988b40f85a9347297c70.60103850.6010385Positive
7561f5ab8244460d1988b40f85a9347297c78036a70608aff1a11c728ceaeabb73420.64282840.6428284Positive
7897f5ab8244460d1988b40f85a9347297c70978d7032689ab3eeef2fc60e4490e110.76905960.7690596Positive
89190978d7032689ab3eeef2fc60e4490e11b6592341b951857fdd6c3889f7c90e920.61007800.6100780Positive
9129f5ab8244460d1988b40f85a9347297c7b6b05223adf86d071fd279f79dc2533c0.68309700.6830970Positive
91430978d7032689ab3eeef2fc60e4490e11b6b05223adf86d071fd279f79dc2533c0.63610420.6361042Positive
9353f5ab8244460d1988b40f85a9347297c79676d0bf5f8468ad67ed9525f3320aee0.76718110.7671811Positive
93670978d7032689ab3eeef2fc60e4490e119676d0bf5f8468ad67ed9525f3320aee0.69516790.6951679Positive
9378b6b05223adf86d071fd279f79dc2533c9676d0bf5f8468ad67ed9525f3320aee0.64687340.6468734Positive
10025f5ab8244460d1988b40f85a9347297c7333fc718c6bccdbf0fd4e4f84d0dd38e0.74138740.7413874Positive
100368036a70608aff1a11c728ceaeabb7342333fc718c6bccdbf0fd4e4f84d0dd38e0.60815220.6081522Positive
100390978d7032689ab3eeef2fc60e4490e11333fc718c6bccdbf0fd4e4f84d0dd38e0.64462760.6446276Positive
100529676d0bf5f8468ad67ed9525f3320aee333fc718c6bccdbf0fd4e4f84d0dd38e0.67412270.6741227Positive
10697f5ab8244460d1988b40f85a9347297c757face48ec571894748b66cab7c52d5e0.61618060.6161806Positive
10809f5ab8244460d1988b40f85a9347297c762392dc20bb4fd6c46d3cb038b13cb9e0.77642030.7764203Positive
108230978d7032689ab3eeef2fc60e4490e1162392dc20bb4fd6c46d3cb038b13cb9e0.62864430.6286443Positive
108369676d0bf5f8468ad67ed9525f3320aee62392dc20bb4fd6c46d3cb038b13cb9e0.70743520.7074352Positive
10842333fc718c6bccdbf0fd4e4f84d0dd38e62392dc20bb4fd6c46d3cb038b13cb9e0.66229970.6622997Positive
1084857face48ec571894748b66cab7c52d5e62392dc20bb4fd6c46d3cb038b13cb9e0.61701070.6170107Positive
11145f5ab8244460d1988b40f85a9347297c77ce4e4dc201a6701bff5771bfcfa6d290.64639640.6463964Positive
111590978d7032689ab3eeef2fc60e4490e117ce4e4dc201a6701bff5771bfcfa6d290.63159470.6315947Positive
111729676d0bf5f8468ad67ed9525f3320aee7ce4e4dc201a6701bff5771bfcfa6d290.62594730.6259473Positive
11593f5ab8244460d1988b40f85a9347297c7bfe0ef267b9437cc26842e05f6c3e13e0.68554130.6855413Positive
116070978d7032689ab3eeef2fc60e4490e11bfe0ef267b9437cc26842e05f6c3e13e0.68778670.6877867Positive
116209676d0bf5f8468ad67ed9525f3320aeebfe0ef267b9437cc26842e05f6c3e13e0.66062110.6606211Positive
1163362392dc20bb4fd6c46d3cb038b13cb9ebfe0ef267b9437cc26842e05f6c3e13e0.66521480.6652148Positive
11817f5ab8244460d1988b40f85a9347297c71c88ca8aa0ed62fb2fb8bebb8cb256700.76376610.7637661Positive
118310978d7032689ab3eeef2fc60e4490e111c88ca8aa0ed62fb2fb8bebb8cb256700.71702730.7170273Positive
11842b6b05223adf86d071fd279f79dc2533c1c88ca8aa0ed62fb2fb8bebb8cb256700.61552950.6155295Positive
118449676d0bf5f8468ad67ed9525f3320aee1c88ca8aa0ed62fb2fb8bebb8cb256700.68807140.6880714Positive
11850333fc718c6bccdbf0fd4e4f84d0dd38e1c88ca8aa0ed62fb2fb8bebb8cb256700.69446230.6944623Positive
1185762392dc20bb4fd6c46d3cb038b13cb9e1c88ca8aa0ed62fb2fb8bebb8cb256700.73115140.7311514Positive
11864bfe0ef267b9437cc26842e05f6c3e13e1c88ca8aa0ed62fb2fb8bebb8cb256700.61493760.6149376Positive

The weight can be used for clustering algorithms, for example Force Atlas in Gephi. In Gephi, negative weights are not supported, so we set all negative correlations to zero (they have no influence on clustering).

Proportionality

Proportionality is a correlation method specifically developed for compositional microbial data. It is included in the MicEco package and can be run directly on a phyloseq object:

In [9]:
library(MicEco)
net <- proportionality(phy_genus_pre)

We can then run the same steps as for the Spearman correlation:

In [10]:
net_df <- as.data.frame.table(net)
net_df <- net_df[net_df$Var1 != net_df$Var2, ]
net_df <- net_df[abs(net_df$Freq) > 0.6, ]

As for the Spearman correaltion, you can choose to just set an arbitrary correlation cut-off for including correlations, for example 0.6. It is also possible to calculate p-values through permutations if this is prefered.

In [11]:
colnames(net_df) <- c("Source", "Target", "Cor")
net_df$Weight <- ifelse(net_df$Cor > 0, net_df$Cor, 0)
net_df$Sign <- ifelse(net_df$Cor > 0, "Positive", "Negative")
net_df
A data.frame: 22 × 5
SourceTargetCorWeightSign
<fct><fct><dbl><dbl><chr>
26195b523dacb244bb891da4c48923d87888e61d77c1f6555e76891ac27d58707c6 0.60281160.6028116Positive
40358e61d77c1f6555e76891ac27d58707c695b523dacb244bb891da4c48923d8788 0.60281160.6028116Positive
63430978d7032689ab3eeef2fc60e4490e11f5ab8244460d1988b40f85a9347297c7 0.66322130.6632213Positive
6353cde00646e8aecf8aaac49a9bb9c96729f5ab8244460d1988b40f85a9347297c7-0.61703300.0000000Negative
6354b6b05223adf86d071fd279f79dc2533cf5ab8244460d1988b40f85a9347297c7 0.69279750.6927975Positive
63569676d0bf5f8468ad67ed9525f3320aeef5ab8244460d1988b40f85a9347297c7 0.77861870.7786187Positive
6362333fc718c6bccdbf0fd4e4f84d0dd38ef5ab8244460d1988b40f85a9347297c7 0.71924860.7192486Positive
63781c88ca8aa0ed62fb2fb8bebb8cb25670f5ab8244460d1988b40f85a9347297c7 0.67278800.6727880Positive
7897f5ab8244460d1988b40f85a9347297c70978d7032689ab3eeef2fc60e4490e11 0.66322130.6632213Positive
9017f5ab8244460d1988b40f85a9347297c7cde00646e8aecf8aaac49a9bb9c96729-0.61703300.0000000Negative
9129f5ab8244460d1988b40f85a9347297c7b6b05223adf86d071fd279f79dc2533c 0.69279750.6927975Positive
91569676d0bf5f8468ad67ed9525f3320aeeb6b05223adf86d071fd279f79dc2533c 0.64601490.6460149Positive
9353f5ab8244460d1988b40f85a9347297c79676d0bf5f8468ad67ed9525f3320aee 0.77861870.7786187Positive
9378b6b05223adf86d071fd279f79dc2533c9676d0bf5f8468ad67ed9525f3320aee 0.64601490.6460149Positive
9386333fc718c6bccdbf0fd4e4f84d0dd38e9676d0bf5f8468ad67ed9525f3320aee 0.63996070.6399607Positive
10025f5ab8244460d1988b40f85a9347297c7333fc718c6bccdbf0fd4e4f84d0dd38e 0.71924860.7192486Positive
100529676d0bf5f8468ad67ed9525f3320aee333fc718c6bccdbf0fd4e4f84d0dd38e 0.63996070.6399607Positive
100741c88ca8aa0ed62fb2fb8bebb8cb25670333fc718c6bccdbf0fd4e4f84d0dd38e 0.62886760.6288676Positive
108581c88ca8aa0ed62fb2fb8bebb8cb2567062392dc20bb4fd6c46d3cb038b13cb9e 0.60690130.6069013Positive
11817f5ab8244460d1988b40f85a9347297c71c88ca8aa0ed62fb2fb8bebb8cb25670 0.67278800.6727880Positive
11850333fc718c6bccdbf0fd4e4f84d0dd38e1c88ca8aa0ed62fb2fb8bebb8cb25670 0.62886760.6288676Positive
1185762392dc20bb4fd6c46d3cb038b13cb9e1c88ca8aa0ed62fb2fb8bebb8cb25670 0.60690130.6069013Positive

SparCC

SparCC is a correlation method which is using various compositional transformations and bootstrapping methods to give reliable correlations on compositional data. The original implemetation is in python, but it is included in the SPIEC-EASI R package.

In [12]:
library(SpiecEasi)
Registered S3 methods overwritten by 'huge':
  method    from
  plot.roc  pROC
  print.roc pROC


Vedhæfter pakke: 'SpiecEasi'


Det følgende objekt er maskeret fra 'package:MicEco':

    clr


SparCC bootstraps (be patient..., it is best to use 1000 or 10000 bootstraps, but it takes too long for this example)

In [13]:
sparcc_boot <- sparccboot(t(otu), R = 100)

SparCC p-values

In [14]:
sparcc_pval <- pval.sparccboot(sparcc_boot)

Turn into a data.frame with taxa names

In [15]:
# Correlation data.frame
net <- matrix(NA, nrow(otu), nrow(otu))
net[upper.tri(net, diag=FALSE)] <- sparcc_pval$cors
rownames(net) <- colnames(net) <- rownames(otu)
net_df <- as.data.frame.table(net)

# P-value data.frame
netp <- matrix(NA, nrow(otu), nrow(otu))
netp[upper.tri(netp, diag=FALSE)] <- sparcc_pval$pval
rownames(netp) <- colnames(netp) <- rownames(otu)
netp_df <- as.data.frame.table(netp)

# Combine
sparcc_df <- merge(net_df, netp_df, by = c("Var1", "Var2"))
colnames(sparcc_df) <- c("Source", "Target", "Cor", "Pval")

# Remove redundant
sparcc_df <- sparcc_df[!is.na(sparcc_df$Cor), ]

Add adjusted p-values

In [16]:
sparcc_df$Pval_adj <- p.adjust(sparcc_df$Pval, method = "fdr")

Keep only significant

In [17]:
sparcc_df <- sparcc_df[sparcc_df$Pval_adj <= 0.05, ]

SPIEC-EASI

SPIEC-EASI is a method based on graphical model inference developed specifically for microbial sequence data. There are 3 steps in the SPIEC-EASI analysis: data transformation, graphical model inference, model selection.

SPIEC-EASI might at first seem much more complicated than the other methods, but it has a big strength in that one does not have to choose an arbitrary correlation or p-value cut-off for determning inclusion of edges, SPIEC-EASI tries to estimate this for us automatically.

Data transformation

Data are CLR-transformed, similar to what is done prior to calculating Proportionality.

Graphical model inference

The graphical model can be inferred through either neighborhood selection (mb) or inverse covariance selection (glasso). Inverse covariance is in itself simple to estimate, but since we are usually having many more taxa than samples, we cannot infer inverse covariance through the usual calculations. We therefore use methods for sparse inverse covariance estimation, relying on the assumption that most covariances are small or zero. The amount of sparsity is controlled with a lambda parameter. Both the mb and the glasso methods estimate models over a range of lambda values, and we need a way to pick the final model to use.

Model selection

SPIEC-EASI uses the (bounded) stars criterion for model selection, to choose the lambda parameter which produces the best graphical model.

In R:

In [18]:
se <- spiec.easi(phy_genus_pre, method='glasso', lambda.min.ratio=1e-3, nlambda=30,
               sel.criterion='bstars', pulsar.select=TRUE, pulsar.params=list(rep.num=50, seed=42))
Applying data transformations...

Selecting model with pulsar using bstars...

Fitting final estimate with glasso...

done

In [19]:
# Get covariances and transform to correlations
netc <- cov2cor(as.matrix(getOptCov(se)))
rownames(netc) <- colnames(netc) <- taxa_names(phy_genus_pre)
netc_df <- as.data.frame.table(netc)

# Get chosen edges
net <- as.matrix(getRefit(se))
rownames(net) <- colnames(net) <- taxa_names(phy_genus_pre)
net_df <- as.data.frame.table(net)

# Combine
se_df <- merge(netc_df, net_df, by = c("Var1", "Var2"))
colnames(se_df) <- c("Source", "Target", "Cor", "Selected")

Subset to only the chosen edges

In [20]:
se_df <- se_df[se_df$Selected == 1, ]

Note that since the covariances are penalized, they are much smaller than from the other methods, and not directly comparable. They can, however, still be used as weights when plotting the network.

Heatmap

You can plot the networks as a heatmap with the pheatmap package:

In [21]:
library(pheatmap)
In [22]:
net <- proportionality(phy_genus_pre)
pheatmap(net)