The phyloseq package¶

Phyloseq is a package made for organizing and working with microbiome data in R. With the phyloseq package we can have all our microbiome amplicon sequence data in a single R object. With functions from the phyloseq package, most common operations for preparing data for analysis is possible with few simple commands.

This document is an overview on how phyloseq objects are organized and how they can be accessed.

The paper presenting phyloseq: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217

A comprehensive documetation of the phyloseq package: https://joey711.github.io/phyloseq/

To work with phyloseq objects we first have to load the package

library(phyloseq)

The phyloseq object ¶

Let's load our test dataset, and see how phyloseq is organized.

load("../data/physeq.RData")

If we print the name of the phyloseq object, we can see what it contains

phy

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1310 taxa and 150 samples ]
sample_data() Sample Data:       [ 150 samples by 3 sample variables ]
tax_table()   Taxonomy Table:    [ 1310 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 1310 tips and 1309 internal nodes ]
refseq()      DNAStringSet:      [ 1310 reference sequences ]

The phy object contains all our data and associated metadata. This is organized in 5 different sub-objects:

otu_table: Contains a matrix with the abundance of each taxa (ASV) for each sample
sample_data: Contains the metadata for each sample
tax_table: Contains the taxonomical annotation for each taxon (ASV)
phy_tree: Contains a phylogenetic tree
refseq: Contains sequences (16S rRNA gene sequence) for each taxon (ASV)

Note: "phy" is an arbitrary name, it could be anything else

Below is a section on each of the objects describing what they contain and how to access them.

otu_table ¶

The otu_table contains the abundance of each OTU/ASV for each sample. We can see from above that it contains data for 1310 taxa and 150 samples. We can access it with the otu_table() function

otu_table(phy)

Here we can see that ASV a805200f08abbfa1a4679a264e851398 was not detected in sample S1, but that 37 reads from sample S2 was assigned to that ASV, and so on.

We can subset specific taxa with the object[subset] notation

otu_table(phy)["6ec6d03fbef9f16e3581ccdc60e7d266"]

otu_table(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266", "a805200f08abbfa1a4679a264e851398")]

Similarly for samples by preceeding a , inside the [ ]. (For the sake of this tutoial, we use head() to only print the first 6 rows)

head(otu_table(phy)[, "S6"])

These operations can be combined:

otu_table(phy)["6ec6d03fbef9f16e3581ccdc60e7d266", c("S6", "S144")]

sample_data ¶

The sample_data object contains metadata for our samples. We can access it with the sample_data() function. (For the sake of this tutoial, we use head() to only print the first 6 rows).

Note than in contrast to the otu_table, the samples are now rows.

head(sample_data(phy))

We can subset it in the same way as we did with the otu_table.

sample_data(phy)["S11",]

sample_data(phy)[c("S2", "S150"), c("Patient", "Time")]

tax_table ¶

The tax_table contains the taxonomical annotations of our taxa/ASVs. It can optionally also contain other metadata on our taxa/ASVs. We can access it with the tax_table() function.

Subsetting is done as with the other objects

tax_table(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266")]

phy_tree ¶

The phy_tree contains our phylogenetic tree, constructed from an aligment of the 16S rRNA gene sequences of our ASVs. We can access it with the phy_tree() function.

phy_tree(phy)

Phylogenetic tree with 1310 tips and 1309 internal nodes.

Tip labels:
  dc467f0f8b8aa389aa106d751bb9a569, c387bc64fb22cd96d2b79dbfa932ce1e, 42a23e6f4764f572f4d7c6d8e08769c3, 2ea17744c7eeab459b7f41d4f9e22894, 332ef16f5660bfe8ecaabda3404fc08b, 0a39c4f216e370df5d1c10b2ac1f5286, ...
Node labels:
  Root, 0.766, 0.917, 0.895, 0.747, 0.870, ...

Rooted; includes branch lengths.

This prints some basic info about our tree, which we can access with the $ notation

# The 10 first labels:
phy_tree(phy)$tip.label[1:10]

and we can plot it (cex sets the size of the labels):

plot(phy_tree(phy), cex = 0.5)

refseq ¶

refseq contains the actual DNA sequences of our ASVs (or alternatively the reference sequences of OTUs). We can access it with the refseq() function.

refseq(phy)

DNAStringSet object of length 1310:
       width seq                                            names               
   [1]   237 TGGCGAGCGTTGTTCGGATTTA...CAGGGACGAAAGCATGGGGAG dc467f0f8b8aa389a...
   [2]   236 GTGCAAGCGTTAATCGGAATTA...GAAGCGCGAAAGCGTGGGTAG c387bc64fb22cd96d...
   [3]   236 TCGCAAGCGTTATCCGGATTTA...AAAGCGCGAAAGCGTGGGTAG 42a23e6f4764f572f...
   [4]   236 TCACAAGCGTTATCCGGATTTA...AAAGCGCGAAAGCGTGGGTAG 2ea17744c7eeab459...
   [5]   236 TCACGAGCGTTATCCGGATTTA...GAAGCGCGAAAGCGTGGGTAG 332ef16f5660bfe8e...
   ...   ... ...
[1306]   237 GGGCAAGCGTTATCCGGATTTA...GAGGCTCGAAAGCGTGGGGAG f43c36b22c1bc8798...
[1307]   237 GGGCCAGCGTTATCCGGATTTA...GAGGCTCGAAAGCGTGGGGAG 7342bde0691e85a0a...
[1308]   237 GGGCAAGCGTTATCCGGATTTA...GAGGCTCGAAAGCGTGGGGAG b2c5f0d55453383ac...
[1309]   237 GGGCAAGCGTTATCCGGATTTA...GAGGCTCGAAAGCGTGGGGAG ab8cf03d7d5eb6d8a...
[1310]   237 GGGCAAGCGTTATCCGGATTTA...GAGGCTCGAAAGCGTGGGGAG e5e51d9940a3c4553...

Again, we can subset with the [ ] notation

refseq(phy)["6ec6d03fbef9f16e3581ccdc60e7d266"]

DNAStringSet object of length 1:
    width seq                                               names               
[1]   237 GTGCAAGCGTTATCCGGAATTAT...CTGAGGAGCGAAAGCGTGGGGAG 6ec6d03fbef9f16e3...

To see the entire sequence, convert it to a string ("character" in R jargon)

as.character(refseq(phy)[c("6ec6d03fbef9f16e3581ccdc60e7d266")])

	S1	S2	S3	S4	S5	S6	S7	S8	S9	S10	⋯	S141	S142	S143	S144	S145	S146	S147	S148	S149	S150
dc467f0f8b8aa389aa106d751bb9a569	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	19	0	0	0
c387bc64fb22cd96d2b79dbfa932ce1e	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
42a23e6f4764f572f4d7c6d8e08769c3	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
2ea17744c7eeab459b7f41d4f9e22894	83	0	0	468	0	0	0	0	0	0	⋯	0	0	0	0	12	0	0	0	0	0
332ef16f5660bfe8ecaabda3404fc08b	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
0a39c4f216e370df5d1c10b2ac1f5286	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
2928490ca329c46e1649ff8d8c072efc	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	52	0	0	0	0
a805200f08abbfa1a4679a264e851398	0	37	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
095f00e04148b0db5e970c32232272f1	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
bb243e746b385279c86c7c8b365601d3	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
8557aacf799775fc8664c28e239764fc	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
48520a6dc94cba12ffb6452e1335b5c5	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
e5199a623272b9b25c65f0455a1cd77b	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
f3e368bdade206389143a53691dd3403	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
4ecaf2314cf477a3c8f559db0812131f	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
d00281a614fbe1834f8ae99f0ca8a4b8	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
4c304a27bc0520a7c398410713645502	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
10304568dd5843ac94284f138298496d	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
891a50a9672c9fe4fb071d53349a1087	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
cf2732becb3d1d66dfcaf9e85416a3bd	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
806c67148f3ec58a0f5a4c0aff5b18db	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
93d3df70f762ddfbb08ec06ae5ea894c	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	7	0	0
29f6c8a92557d221576220686357171d	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
d50b1472ab5d1b933604d13377ea8141	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
5805c09d874afccb8923b15c48bce80b	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
6ec6d03fbef9f16e3581ccdc60e7d266	10564	24580	49020	41283	0	13425	2319	12337	6588	16213	⋯	1956	0	1498	11625	0	0	0	187	0	913
8acb42cec03ba45f8b7e7dbc90595383	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
327333264d2d3fd218350b8201a128fa	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
8534c358254b1ce9593b641dfc2cb314	0	0	0	0	0	0	0	0	0	0	⋯	0	64	0	0	0	0	0	0	0	0
57ac1b63c5ca76f2e29fb844c9f804e3	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋱	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
108aeef6545d4a8f2694af63c9b704bb	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
35b60479cc1958c19f27e3e6266e3fcd	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
05924d3a2fbe70d70ed0e7ab11d21c70	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
592ef2acca5a8061c8012f3bd4791c06	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
2f3bdd821f386dc4926db3596aeeeaca	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
51653ce8e5aef59c159c8cc40e3a7a47	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
47376d68d57c18330cf01210f9dae399	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
529eb774e3e9fe2a75c4e5e3fa08c021	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
b79d081871ba6301d1b26d1d900e20a2	0	0	0	0	0	0	0	0	0	0	⋯	5	12	0	0	0	0	0	0	0	0
458d301b7e2fbbed866d30fbd47353ed	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
ccd2d2e1ebdbda1ee4bf03fb8f401307	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	14	0	0	0	0
ae3db49f681a714480b1b6042e120346	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
20bddea1062c34fe82e1cd18adfbf32b	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
aa4045a865a1a92f2ad7c9911bd5bee7	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
c9cf0dccbc3516d0cd12bb9899048e12	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
3f05c76103eade8bd104421817eadad7	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
e2f7aa19c11e3adf96a7030321b7e2e4	0	0	0	0	0	0	0	0	0	0	⋯	0	368	0	0	0	0	0	0	152	0
a581dd58cef1f78d892b8bc14a1f5c4e	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	27	0	0
b16519ba8dcea827c9839d43ef1c1580	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
ce74dce376a5dedddcc0ff2342dc82e7	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
bd930ec718da906ce0ae573b6ed85022	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
36e52ff1c5961d932d931d0f148ed591	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
ff011bda3c950839aa99faf6cedc0ec5	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	85
34e55c0f0706224ff8a1aca1ede0a505	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
30ef39708ebcdcddbcdc2f2439c3c398	0	55	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	63
f43c36b22c1bc8798a8d7f25125919f6	0	0	0	0	0	0	77	0	0	11	⋯	0	47	0	0	0	0	0	0	0	0
7342bde0691e85a0a6fac0728be2aa56	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
b2c5f0d55453383ac1deda19b8c3bed2	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
ab8cf03d7d5eb6d8aedd296ab4225414	0	0	0	0	0	0	0	0	0	0	⋯	33	0	0	0	0	0	0	12	0	0
e5e51d9940a3c45538fd7d9542c958eb	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	85	0

	S1	S2	S3	S4	S5	S6	S7	S8	S9	S10	⋯	S141	S142	S143	S144	S145	S146	S147	S148	S149	S150
6ec6d03fbef9f16e3581ccdc60e7d266	10564	24580	49020	41283	0	13425	2319	12337	6588	16213	⋯	1956	0	1498	11625	0	0	0	187	0	913
a805200f08abbfa1a4679a264e851398	0	37	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0

	Patient	Time	Delivery
	<chr>	<chr>	<chr>
S1	870	1m	Vaginal
S2	b33	1m	Vaginal
S3	9b1	1m	Sectio
S4	9ba	1m	Vaginal
S5	728	1m	Vaginal
S6	960	1m	Sectio

	Patient	Time	Delivery
	<chr>	<chr>	<chr>
S11	a70	1m	Vaginal

	Patient	Time
	<chr>	<chr>
S2	b33	1m
S150	c61	1y

The phyloseq package¶

The phyloseq object¶

otu_table¶

sample_data¶

tax_table¶