Phyloseq is an R package for organizing and working with microbiome data. With phyloseq, all our microbiome amplicon sequence data can be stored in a single R object. With functions from the package, most common operations for preparing data for analysis are possible with a few simple commands.
This document is an overview of how phyloseq objects are organized and how they can be accessed.
The paper presenting phyloseq: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217
Comprehensive documentation of the phyloseq package: https://joey711.github.io/phyloseq/
To work with phyloseq objects, we first have to load the package:
library(phyloseq)
Let's load our test dataset, and see how phyloseq is organized.
load("../data/physeq.RData")
If we type the name of the phyloseq object, R prints a summary of what it contains:
phy
The phy object contains all our data and associated metadata, organized in 5 different sub-objects: otu_table, sample_data, tax_table, phy_tree, and refseq.
Note: "phy" is an arbitrary name; the object could be called anything else.
Below is a section on each of the objects describing what they contain and how to access them.
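To make the layout concrete, here is a minimal sketch that assembles a small phyloseq object from scratch. The taxa names (ASV1 to ASV3), sample names, counts, and metadata are made up for illustration and are not part of the test dataset, and only three of the five sub-objects (otu_table, sample_data, tax_table) are included.

```r
library(phyloseq)

# Hypothetical counts: 3 taxa (rows) x 2 samples (columns)
counts <- matrix(c(0, 37, 12, 5, 8, 0), nrow = 3, byrow = TRUE,
                 dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))

# Hypothetical sample metadata; row names must match the sample names
meta <- data.frame(Patient = c("P1", "P2"), Time = c(0, 7),
                   row.names = c("S1", "S2"))

# Hypothetical taxonomy; row names must match the taxa names
taxa <- matrix(c("Firmicutes", "Bacteroidetes", "Firmicutes",
                 "Lactobacillus", "Bacteroides", "Clostridium"),
               nrow = 3,
               dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Phylum", "Genus")))

# Combine the components into a single phyloseq object
toy <- phyloseq(otu_table(counts, taxa_are_rows = TRUE),
                sample_data(meta),
                tax_table(taxa))

toy            # prints a summary of the components
ntaxa(toy)     # number of taxa
nsamples(toy)  # number of samples
```

The constructor matches taxa and samples by name across the components, which is why the row and column names above have to agree.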
The otu_table contains the abundance of each OTU/ASV in each sample. We can see from the printout above that it contains data for 1310 taxa and 150 samples. We can access it with the otu_table() function:
otu_table(phy)
Here we can see that ASV a805200f08abbfa1a4679a264e851398 was not detected in sample S1, but that 37 reads from sample S2 were assigned to that ASV, and so on.
We can subset specific taxa with the object[subset] notation:
otu_table(phy)["6ec6d03fbef9f16e3581ccdc60e7d266"]
otu_table(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266", "a805200f08abbfa1a4679a264e851398")]
Similarly, we can subset samples by placing the sample name after a comma inside the [ ]. (For the sake of this tutorial, we use head() to print only the first 6 rows.)
head(otu_table(phy)[, "S6"])
These operations can be combined:
otu_table(phy)["6ec6d03fbef9f16e3581ccdc60e7d266", c("S6", "S144")]
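Subsetting by name presupposes that we know the names. The following is a minimal sketch, using a made-up otu_table rather than the test dataset, of how taxa_names() and sample_names() list them and how a subset can be converted to a plain matrix with as(). The names ASV1 to ASV3 and the counts are hypothetical.

```r
library(phyloseq)

# Hypothetical counts: 3 taxa (rows) x 2 samples (columns)
counts <- matrix(c(0, 37, 12, 5, 8, 0), nrow = 3, byrow = TRUE,
                 dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))
otu <- otu_table(counts, taxa_are_rows = TRUE)

taxa_names(otu)     # names usable as row subsets
sample_names(otu)   # names usable as column subsets
taxa_are_rows(otu)  # TRUE here: taxa are rows, samples are columns

# Subset two taxa in one sample, then convert to a base R matrix
sub <- otu[c("ASV1", "ASV3"), "S2"]
sub_mat <- as(sub, "matrix")
sub_mat
```

The same functions also accept a full phyloseq object, so taxa_names(phy) and sample_names(phy) should work on the tutorial data as well.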
The sample_data object contains metadata for our samples. We can access it with the sample_data() function. (For the sake of this tutorial, we use head() to print only the first 6 rows.)
Note that, in contrast to the otu_table, the samples are now rows.
head(sample_data(phy))
We can subset it in the same way as we did with the otu_table:
sample_data(phy)["S11",]
sample_data(phy)[c("S2", "S150"), c("Patient", "Time")]
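Because sample_data behaves much like a data.frame, single variables can also be pulled out with the $ notation, and the whole table can be converted to an ordinary data.frame for use with other packages. A minimal sketch with made-up metadata (the Patient and Time values below are hypothetical, not taken from the test dataset):

```r
library(phyloseq)

# Hypothetical metadata for three samples
meta <- data.frame(Patient = c("P1", "P2", "P1"), Time = c(0, 0, 7),
                   row.names = c("S1", "S2", "S3"))
sdat <- sample_data(meta)

sdat$Patient            # one variable as a plain vector
sample_variables(sdat)  # names of all metadata variables

# Convert to a regular data.frame
df <- as(sdat, "data.frame")
class(df)
```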
The tax_table contains the taxonomic annotations of our taxa/ASVs. It can optionally also contain other metadata on our taxa/ASVs. We can access it with the tax_table() function.
Subsetting is done as with the other objects:
tax_table(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266")]
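Columns of the tax_table correspond to taxonomic ranks, so the column position inside the [ ] can be used to pick out a single rank. A minimal sketch on a made-up tax_table (the ASV names and the two ranks shown are assumptions for illustration; the ranks in the test dataset may differ):

```r
library(phyloseq)

# Hypothetical taxonomy: 3 taxa (rows) x 2 ranks (columns)
taxa <- matrix(c("Firmicutes", "Bacteroidetes", "Firmicutes",
                 "Lactobacillus", "Bacteroides", "Clostridium"),
               nrow = 3,
               dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Phylum", "Genus")))
tt <- tax_table(taxa)

rank_names(tt)        # the available ranks (column names)
tt[, "Genus"]         # one rank for all taxa
tt["ASV2", "Phylum"]  # one rank for one taxon

# Extract the value itself as a character string
phylum_asv2 <- as(tt["ASV2", "Phylum"], "matrix")[1, 1]
phylum_asv2
```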
The phy_tree contains our phylogenetic tree, constructed from an alignment of the 16S rRNA gene sequences of our ASVs. We can access it with the phy_tree() function.
phy_tree(phy)
This prints some basic information about our tree. The individual elements of the tree can be accessed with the $ notation:
# The first 10 labels:
phy_tree(phy)$tip.label[1:10]
and we can plot it (cex sets the size of the labels):
plot(phy_tree(phy), cex = 0.5)
refseq contains the actual DNA sequences of our ASVs (or alternatively the reference sequences of OTUs). We can access it with the refseq() function.
refseq(phy)
Again, we can subset with the [ ] notation:
refseq(phy)["6ec6d03fbef9f16e3581ccdc60e7d266"]
To see the entire sequence, convert it to a string ("character" in R jargon):
as.character(refseq(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266")])
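Sequence sets like this are usually stored as a DNAStringSet from the Biostrings package, so functions from that package can be applied to them. A minimal sketch, assuming refseq(phy) returns a DNAStringSet; the two short sequences and their names below are made up for illustration:

```r
library(Biostrings)

# Hypothetical sequences, named by ASV
seqs <- DNAStringSet(c(ASV1 = "ACGTACGTAC", ASV2 = "GGGCCCTTTAAA"))

names(seqs)   # ASV names, usable for subsetting
width(seqs)   # length of each sequence in bases

# Full sequence of one ASV as a character string
asv2 <- as.character(seqs["ASV2"])
asv2
```

If the assumption holds, width(refseq(phy)) would give the length of every ASV sequence in the tutorial data.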