Curated Data and Analysis Tools from MicrobiomeDB.org • MicrobiomeDB

A suite of R packages containing all of the data from MicrobiomeDB.org, and tools for analyzing and visualizing the same. It also supports importing data from the following formats:

TreeSummarizedExperiment
phyloseq
BIOM
QIIME2 output files
Mothur output files
dada2 output files
HUMAnN output files

If you have questions or comments of any kind, join us on our Discord Community Server! We’d love to hear from you.

Installation

Use the R package remotes to install MicrobiomeDB. From the R command prompt:

remotes::install_github('microbiomeDB/MicrobiomeDB')

If you would like to import data from any format other than TreeSummarizedExperiment, you will also need to install mia from Bioconductor with:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mia")

Please note that Linux and MacOS with source-level installations require the ‘gsl’ system dependency for mia.

On Debian or Ubuntu

sudo apt-get install -y libgsl-dev

On Fedora, CentOS or RHEL

sudo yum install libgsl-devel

On macOS (source installations are not common on macOS, so this step is not usually necessary)

brew install gsl

Usage

This package is intended to be used to explore the curated datasets from MicrobiomeDB.org, as well as your own datasets. It comes pre-packaged with the same functions used to power the analysis tools from the website. Functions to facilitate easily importing data from phyloseq objects, TreeSummarizedExperiment objects and .biom files (among others) are available.

This package is paired with a dedicated data package called microbiomeData which includes a number of pre-built MbioDataset objects representing the curated data from the MicrobiomeDB.org website. You can see their names like:

remotes::install_github('microbiomeDB/microbiomeData')
microbiomeData::getCuratedDatasetNames()

This will return a list of names of data objects which were installed with the package. One such dataset is ‘DiabImmune’, which we’ll use in the following example. MbioDataset objects contain two things metadata and collections. Metadata typically include details about samples in the dataset, such as the age of the person they were collected from. Collections can be any group of variables which represent a unified biological concept measured over a consistent or comparable range of values. An example might be relative abundances of various genera.

getCollectionNames(DiabImmune) # will print the names of collections
myCollection <- getCollection(DiabImmune, '16S Species') # NOTE: you can also use the `format` argument here to get these as phyloseq objects

Once you have myCollection, you can start using our microbiomeComputations package (which was installed for you when you installed this one) to do fun things like:

alphaDivResults <- alphaDiv(myCollection)
correlationResults <- correlation(myCollection)
differentialAbundanceResults <- differentialAbundance(
  myCollection, 
  "breastfed_duration", # see getMetadataVariableNames() and getMetadataVariableSummary()
  groupA = function(x) {x < 300},
  groupB = function(x) {x >= 300},
  method = 'Maaslin2')

This will give you a ComputeResult object, with slots for data and statistics that you can explore. These objects can be difficult to parse, so we’ve added some functions to help format these results in more usable and exciting ways! They are called getComputeResult and getComputeResultWithMetadata which will return data.tables (and sometimes igraph objects) which you can use like this:

myCorrelationDT <- getComputeResult(correlationResults)
myCorrelationGraph <- getComputeResult(correlationResults, format = 'igraph')
MicrobiomeDB::correlationNetwork(myCorrelationDT) # will render a network visualization of the results using widgets from our own `corGraph` project

getMetadataVariableNames(DiabImmune) # will print names of metadata variables you can ask for
myAlphaDivDT <- getComputeResultWithMetadata(alphaDivResults, DiabImmune, 'host_body_site')

Or you can take these results as a data.table object and use them to build plots and things with ggplot2 or any other tool you like.

Let us know if you build something interesting, encounter any bugs, or just wish something were easier to do. We’d love to hear from you! We consider each new issue filed to be a compliment.

Contributing

Pull requests are welcome and should be made to the dev branch.

For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

Apache 2.0