Title: | Quantify Compositional Variability Across Relative Abundance Vectors |
---|---|
Description: | Implements the statistic FAVA, an Fst-based Assessment of Variability across vectors of relative Abundances, as well as a suite of helper functions which enable the visualization and statistical analysis of relative abundance data. The 'FAVA' R package accompanies the paper, “Quantifying compositional variability in microbial communities with FAVA” by Morrison, Xue, and Rosenberg (2024) <doi:10.1101/2024.07.03.601929>. |
Authors: | Maike Morrison [aut, cre, cph] |
Maintainer: | Maike Morrison <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.7 |
Built: | 2024-12-12 03:07:18 UTC |
Source: | https://github.com/maikemorrison/fava |
bootstrap_fava
uses bootstrapping to statistically compare FAVA values between pairs of relative abundance matrices. bootstrap_fava
takes the same options as fava
, so, as with fava
, you can separately analyze multiple populations or groups of samples (specify group
), and account for similarity among categories (specify S
) or uneven weighting of rows (specify w
or time
). bootstrap_fava
follows the bootstrapping procedure defined by Efron and Tibshirani (1993). Details on the bootstrapping procedure are available in the Methods section of the accompanying paper.
bootstrap_fava( relab_matrix, n_replicates = 1000, group, K = NULL, S = NULL, w = NULL, time = NULL, normalized = FALSE, seed = NULL, alternative = "two.sided" )
bootstrap_fava( relab_matrix, n_replicates = 1000, group, K = NULL, S = NULL, w = NULL, time = NULL, normalized = FALSE, seed = NULL, alternative = "two.sided" )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents a sample, each column represents a category, and each entry represents the abundance of that category in the sample. If |
n_replicates |
The number of bootstrap replicate matrices to generate. Default is |
group |
A string (or vector of strings) specifying the name(s) of the column(s) that describes which group(s) each row (sample) belongs to. Use if |
K |
Optional; an integer specifying the number of categories in the data. Default is |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
w |
Optional; a vector of length |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
normalized |
Optional; should normalized FAVA be used? Default is |
seed |
Optional; an integer to be used as a random seed for the simulations. |
alternative |
Optional; do you want to do a one- or two.sided test? Default is |
A named list containing the following entries:
p_values
: The probability of observing the observed difference in variability between each pair of groups if there were no difference between groups. Computed as the fraction of bootstrap differences greater than or equal to the observed difference. Depends on what alternative
is specified ("greater", "lesser", or "two.sided").
bootstrap_distribution_plot
: The distribution of bootstrap replicate differences in each variability value. The observed differences are shown in red. The further the red points are from 0, the more significant the statistical difference between groups.
observed_stats
: The observed diversity statistics for the groups.
bootstrap_stats
: The bootstrap replicate diversity statistics for the groups.
# Statistically compare values of FAVA between # subjects in the xue_microbiome_sample data: boot_out = bootstrap_fava(relab_matrix = xue_microbiome_sample, n_replicates = 20, # should use 1000 for a real analysis seed = 1, group = "subject", K = 524, S = xue_species_similarity) # Table of P-values comparing values of FAVA between group 1 and group 2: boot_out$P_values # Plots of the bootstrap distributions of differences in FAVA between each pair of matrices, # and how the true observed differences (red dots) compare to the distribution. boot_out$bootstrap_distribution_plot
# Statistically compare values of FAVA between # subjects in the xue_microbiome_sample data: boot_out = bootstrap_fava(relab_matrix = xue_microbiome_sample, n_replicates = 20, # should use 1000 for a real analysis seed = 1, group = "subject", K = 524, S = xue_species_similarity) # Table of P-values comparing values of FAVA between group 1 and group 2: boot_out$P_values # Plots of the bootstrap distributions of differences in FAVA between each pair of matrices, # and how the true observed differences (red dots) compare to the distribution. boot_out$bootstrap_distribution_plot
This function computes the population-genetic statistic Fst on any matrix with rows that sum to 1. Values of 0 are achieved when each row is a permutation of (1,0,..., 0) and at least two categories have non-zero abundance across all rows. The value equals 1 when each row is identical.
fava( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL, normalized = FALSE )
fava( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL, normalized = FALSE )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
K |
Optional; an integer specifying the number of categories in the data. Default is |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
w |
Optional; a vector of length |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
group |
Optional; a string (or vector of strings) specifying the name(s) of the column(s) that describes which group(s) each row (sample) belongs to. Use if |
normalized |
Optional; should normalized FAVA be used? Default is |
A numeric value between 0 and 1.
# Compute the Fst of # the following compositional vectors: q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) fava(relative_abundances) # Incoporating weights: # Compute fava ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) fava(relative_abundances, w = row_weights) # Compute fava assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 fava(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) fava(relative_abundances, w = row_weights, S = similarity_matrix)
# Compute the Fst of # the following compositional vectors: q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) fava(relative_abundances) # Incoporating weights: # Compute fava ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) fava(relative_abundances, w = row_weights) # Compute fava assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 fava(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) fava(relative_abundances, w = row_weights, S = similarity_matrix)
This function computes the normalized Fst given the number of rows and the mean abundance of the most abundant category. We employ the normalization employed in the FSTruct package by Morrison, Alcala, and Rosenberg (2020) doi:10.1111/1755-0998.13647.
fava_norm(relab_matrix, K = ncol(relab_matrix))
fava_norm(relab_matrix, K = ncol(relab_matrix))
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
K |
Optional; an integer specifying the number of categories in the data. Default is |
A numeric value between 0 and 1.
# Compute the weighted fava of # the following compositional vectors: q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) fava_norm(relative_abundances)
# Compute the weighted fava of # the following compositional vectors: q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) fava_norm(relative_abundances)
This function computes the Gini-Simpson index, a statistical measure of variability known in population genetics as heterozygosity, of avector of non-negative entries which sum to 1. The function returns a number between 0 and 1 which quantifies the variability of the vector. Values of 0 are achieved when the vector is a permutation of (1,0,..., 0). The value approaches 1 as the number of categories K increases when the vector is equal to (1/K, 1/K, ..., 1/K).
gini_simpson(q, K = length(q), S = diag(K))
gini_simpson(q, K = length(q), S = diag(K))
q |
A vector with |
K |
Optional; an integer specifying the number of categories in the data. Default is |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
A numeric value between 0 and 1.
# Compute unweighted Gini-Simpson index: gini_simpson(q = c(0.4, 0.3, 0.3)) # Compute Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(3) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson(q = c(0.4, 0.3, 0.3), S = similarity_matrix)
# Compute unweighted Gini-Simpson index: gini_simpson(q = c(0.4, 0.3, 0.3)) # Compute Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(3) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson(q = c(0.4, 0.3, 0.3), S = similarity_matrix)
This function computes the mean Gini-Simpson index, a statistical measure of variability known in population genetics as heterozygosity, of a set of vectors of non-negative entries which sum to 1. The function returns a number between 0 and 1 which quantifies the mean variability of the vectors. Values of 0 are achieved when each vector is a permutation of (1,0,..., 0). The value approaches 1 as the number of categories K increases when the vectors are equal to (1/K, 1/K, ..., 1/K).
gini_simpson_mean( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL )
gini_simpson_mean( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
K |
Optional; an integer specifying the number of categories in the data. Default is |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
w |
Optional; a vector of length |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
group |
Optional; a string (or vector of strings) specifying the name(s) of the column(s) that describes which group(s) each row (sample) belongs to. Use if |
A numeric value between 0 and 1.
# To compute the mean Gini-Simpson index of # the following compositional vectors... q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) # we could compute the mean manually: mean(sapply(list(q1, q2, q3, q4), gini_simpson)) # Or we could use gini_simpson_mean: relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) gini_simpson_mean(relative_abundances) # Incoporating weights: # Compute mean Gini-Simpson index ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) gini_simpson_mean(relative_abundances, w = row_weights) # Compute mean Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson_mean(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) gini_simpson_mean(relative_abundances, w = row_weights, S = similarity_matrix)
# To compute the mean Gini-Simpson index of # the following compositional vectors... q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) # we could compute the mean manually: mean(sapply(list(q1, q2, q3, q4), gini_simpson)) # Or we could use gini_simpson_mean: relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) gini_simpson_mean(relative_abundances) # Incoporating weights: # Compute mean Gini-Simpson index ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) gini_simpson_mean(relative_abundances, w = row_weights) # Compute mean Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson_mean(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) gini_simpson_mean(relative_abundances, w = row_weights, S = similarity_matrix)
This function computes the Gini-Simpson index of a "pooled" vector equal to colMeans(relab_matrix)
. Values of 0 are achieved when this pooled vector is a permutation of (1,0,..., 0). The value approaches 1 as the number of categories K increases when this pooled vector is equal to (1/K, 1/K, ..., 1/K).
gini_simpson_pooled( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL )
gini_simpson_pooled( relab_matrix, K = NULL, S = NULL, w = NULL, time = NULL, group = NULL )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
K |
Optional; an integer specifying the number of categories in the data. Default is |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
w |
Optional; a vector of length |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
group |
Optional; a string (or vector of strings) specifying the name(s) of the column(s) that describes which group(s) each row (sample) belongs to. Use if |
A numeric value between 0 and 1.
# To compute the pooled Gini-Simpson index of # the following compositional vectors... q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) # we could compute the mean manually: qPooled = (q1 + q2 + q3 + q4)/4 gini_simpson(qPooled) # Or we could use gini_simpson_pooled: relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) gini_simpson_pooled(relative_abundances) # Incoporating weights: # Compute pooled Gini-Simpson index ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) gini_simpson_pooled(relative_abundances, w = row_weights) # Compute pooled Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson_pooled(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) gini_simpson_pooled(relative_abundances, w = row_weights, S = similarity_matrix)
# To compute the pooled Gini-Simpson index of # the following compositional vectors... q1 = c(1, 0, 0, 0) q2 = c(0.5, 0.5, 0, 0) q3 = c(1/4, 1/4, 1/4, 1/4) q4 = c(0, 0, 1, 0) # we could compute the mean manually: qPooled = (q1 + q2 + q3 + q4)/4 gini_simpson(qPooled) # Or we could use gini_simpson_pooled: relative_abundances = matrix(c(q1, q2, q3, q4), byrow = TRUE, nrow = 4) gini_simpson_pooled(relative_abundances) # Incoporating weights: # Compute pooled Gini-Simpson index ignoring # rows 2 and 3 row_weights = c(0.5, 0, 0, 0.5) gini_simpson_pooled(relative_abundances, w = row_weights) # Compute pooled Gini-Simpson index assuming that # categories 1 and 2 are identical: similarity_matrix = diag(4) similarity_matrix[1,2] = 1 similarity_matrix[2,1] = 1 gini_simpson_pooled(relative_abundances, S = similarity_matrix) # Assume categories 1 and 2 are identical AND # ignore rows 2 and 4: row_weights = c(0.5, 0, 0.5, 0) gini_simpson_pooled(relative_abundances, w = row_weights, S = similarity_matrix)
This function enables graphical visualization of a matrix of compostional data. In the output plot, each vertical bar represents a single vector; the height of each color in the bar corresponds to the abundance of each category in that vector. Because this function produces a ggplot object, its output can be modified using standard ggplot2 syntax.
plot_relabund( relab_matrix, group = NULL, time = NULL, w = NULL, K = NULL, arrange = FALSE )
plot_relabund( relab_matrix, group = NULL, time = NULL, w = NULL, K = NULL, arrange = FALSE )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
group |
Optional; a string specifying the name of the column that describes which group each row (sample) belongs to. Use if |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
w |
Optional; a vector of length |
K |
Optional; an integer specifying the number of categories in the data. Default is |
arrange |
Optional; controls horizontal ordering of samples and vertical ordering of categories.
If |
A ggplot object containing a bar plot visualization of the relative abundance matrix.
# Make an example matrix of compositional data # Each row is an individual. Rows sum to 1. population_A = matrix(c( .5, .3, .2, .4, .2, .4, .5, .4, .1, .6, .1, .3, .2, 0, .8 ), nrow = 5, byrow = TRUE ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = FALSE ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = "horizontal" ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = "vertical" ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = TRUE # could also be "both" ) # You can modify the plot as you would any ggplot2 object plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = TRUE ) + # Below are example, optional modifications to the default plot ggplot2::ggtitle("Population A") + ggplot2::scale_fill_brewer("Blues") + ggplot2::scale_color_brewer("Blues") + ggplot2::xlab("Individuals") # Note that both scale_fill and scale_color are needed to change the color of the bars. # Plot a dataset which has 2 populations population_B = matrix(c( .9, 0, .1, .6, .4, 0, .7, 0, .3, .3, .4, .3, .5, .3, .2 ), nrow = 5, byrow = TRUE ) populations_AB = cbind(data.frame(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")), rbind(population_A, population_B)) colnames(populations_AB) = c("population", "category_1", "category_2", "category_3") plot_relabund(relab_matrix = populations_AB, group = "population") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "vertical") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "horizontal") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "both")
# Make an example matrix of compositional data # Each row is an individual. Rows sum to 1. population_A = matrix(c( .5, .3, .2, .4, .2, .4, .5, .4, .1, .6, .1, .3, .2, 0, .8 ), nrow = 5, byrow = TRUE ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = FALSE ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = "horizontal" ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = "vertical" ) plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = TRUE # could also be "both" ) # You can modify the plot as you would any ggplot2 object plot_relabund(relab_matrix = population_A, K = 3, # How many categories per vector? arrange = TRUE ) + # Below are example, optional modifications to the default plot ggplot2::ggtitle("Population A") + ggplot2::scale_fill_brewer("Blues") + ggplot2::scale_color_brewer("Blues") + ggplot2::xlab("Individuals") # Note that both scale_fill and scale_color are needed to change the color of the bars. # Plot a dataset which has 2 populations population_B = matrix(c( .9, 0, .1, .6, .4, 0, .7, 0, .3, .3, .4, .3, .5, .3, .2 ), nrow = 5, byrow = TRUE ) populations_AB = cbind(data.frame(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")), rbind(population_A, population_B)) colnames(populations_AB) = c("population", "category_1", "category_2", "category_3") plot_relabund(relab_matrix = populations_AB, group = "population") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "vertical") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "horizontal") plot_relabund(relab_matrix = populations_AB, group = "population", arrange = "both")
The R package phyloseq streamlines the storage and analysis of microbiome sequence data. This function takes a phyloseq object and extracts the OTU table and the sample metadata and combines them into one relative abundance matrix with rows corresponding to samples, metadata on the left-hand side, and OTU relative abundances on the right-hand side.
relab_phyloseq(phyloseq_object)
relab_phyloseq(phyloseq_object)
phyloseq_object |
A phyloseq object containing both an OTU table ( |
A data frame with rows representing samples and columns representing sample data categories or OTU relative abundances. OTU abundances are automatically normalized so that they sum to 1 for each sample, though a warning will be provided if a renormalization was necessary.
if (requireNamespace("phyloseq", quietly = TRUE)) { data(GlobalPatterns, package = "phyloseq") # Make a small phyloseq object for demonstration phyloseq_subset = phyloseq::subset_taxa(phyloseq::subset_samples(GlobalPatterns, X.SampleID %in% c("CL3", "CC1")), Order == "Cenarchaeales") otu_table = relab_phyloseq(phyloseq_subset) otu_table[, 1:10] }
if (requireNamespace("phyloseq", quietly = TRUE)) { data(GlobalPatterns, package = "phyloseq") # Make a small phyloseq object for demonstration phyloseq_subset = phyloseq::subset_taxa(phyloseq::subset_samples(GlobalPatterns, X.SampleID %in% c("CL3", "CC1")), Order == "Cenarchaeales") otu_table = relab_phyloseq(phyloseq_subset) otu_table[, 1:10] }
This function takes a vector of sampling times,
and computes a normalized vector which can be used to weight each sample based on
the time between the subsequent and the preceding samples. The weighting vector
is defined such that each entry,
, where
and
for
not equal to 1 or I.
and
.
time_weights(times, group = NULL)
time_weights(times, group = NULL)
times |
A numeric vector of sampling times. Each entry must be greater than the previous entry. |
group |
Optional; a character vector specifying the group identity of each sampling time. Use if there are samples from multiple replicates or subjects in one dataset. |
A numeric vector. Each entry provides a weight for each entry in the
provided times
vector. If group
is not specified, the vector sums to 1. If
group
is specified, the vector sums to the number of distinct groups.
time_vector = c(1, 8, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 44, 50, 57, 64) time_weights(times = time_vector)
time_vector = c(1, 8, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 44, 50, 57, 64) time_weights(times = time_vector)
This function computes FAVA in sliding window slices of a dataset.
window_fava( relab_matrix, window_size, window_step = 1, group = NULL, index = NULL, time = NULL, w = NULL, S = NULL, K = NULL, normalized = FALSE, alpha = 0.5 )
window_fava( relab_matrix, window_size, window_step = 1, group = NULL, index = NULL, time = NULL, w = NULL, S = NULL, K = NULL, normalized = FALSE, alpha = 0.5 )
relab_matrix |
A matrix or data frame with rows containing non-negative entries that sum to 1. Each row represents
a sample, each column represents a category, and each entry represents the abundance of that category in the sample.
If |
window_size |
An integer number specifying the number of samples per window. |
window_step |
Optional; an integer specifying the distance between the first entry of adjacent windows. Default is |
group |
Optional; a string specifying the name of the column that describes which group each row (sample) belongs to. Use if |
index |
Optional; a string specifying the name of the column in |
time |
Optional; a string specifying the name of the column that describes the sampling time for each row. Include if you wish to weight FAVA by the distance between samples. |
w |
Optional; a vector of length |
S |
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry |
K |
Optional; an integer specifying the number of categories in the data. Default is |
normalized |
Optional; should normalized FAVA be used? Default is |
alpha |
Optional; number between 0 and 1 specifying the opacity of the horizontal
lines plotted. Default is |
A list of values of FAVA for each window.
A = matrix(c(.3,.7,0,.1,0,.9,.2,.5,.3,.1,.8,.1,.3,.4,.3,.6,.4,0,0,.5,.5), ncol = 3, byrow = TRUE) window_out = window_fava(relab_matrix = A, window_size = 4, normalized = TRUE)
A = matrix(c(.3,.7,0,.1,0,.9,.2,.5,.3,.1,.8,.1,.3,.4,.3,.6,.4,0,0,.5,.5), ncol = 3, byrow = TRUE) window_out = window_fava(relab_matrix = A, window_size = 4, normalized = TRUE)
This function generates a list of of sliding windows conditional on two parameters: the length of each window (number of samples) and the total number of samples present in the data.
window_list(window_size, length, window_step = 1)
window_list(window_size, length, window_step = 1)
window_size |
An integer number specifying the number of samples per window. |
length |
An integer number specifying the total number of samples. |
window_step |
Optional; an integer number specifying the distance between the first entry of adjacent windows. Default is |
A list of samples of sample indices. Each list entry represents one window.
window_list(window_size = 6, length = 40) window_list(window_size = 6, length = 40, window_step = 2)
window_list(window_size = 6, length = 40) window_list(window_size = 6, length = 40, window_step = 2)
This function generates a plot of normalized or unnormalized, weighted or unweighted FAVA computed in sliding windows across samples for one or many groups of samples.
window_plot(window_fava, alpha = 0.5)
window_plot(window_fava, alpha = 0.5)
window_fava |
The output of |
alpha |
Optional; number between 0 and 1 specifying the opacity of the horizontal
lines plotted. Default is |
A ggplot2 object.
A = matrix(c(.3,.7,0,.1,0,.9,.2,.5,.3,.1,.8,.1,.3,.4,.3,.6,.4,0,0,.5,.5), ncol = 3, byrow = TRUE) window_out = window_fava(relab_matrix = A, window_size = 4, normalized = TRUE) window_out$window_data window_out$window_plot
A = matrix(c(.3,.7,0,.1,0,.9,.2,.5,.3,.1,.8,.1,.3,.4,.3,.6,.4,0,0,.5,.5), ncol = 3, byrow = TRUE) window_out = window_fava(relab_matrix = A, window_size = 4, normalized = TRUE) window_out$window_data window_out$window_plot
A subset of the data generated by Xue et al. (2024) detailing longitudinal composition of the human gut microbiome for three subjects who experience an antiobitic perturbation between days 29 and 34. We include only the subjects XAA, XBA, and XCA. Data from doi:10.1101/2023.09.26.559480.
xue_microbiome_sample
xue_microbiome_sample
xue_microbiome_sample
A data frame with 75 rows and 1,348 columns:
Subject ID: XBA, XDA, or XMA
Time (days) of sample collection
Species names
...
A data frame providing taxonomic information for the species included in xue_species_tree.
xue_species_info
xue_species_info
xue_species_info
A data frame with 1346 rows and 9 columns:
The species_id given in xue_microbiome_sample
The corresponding taxonomic category for each species
The numeric code associated with each species, as used in xue_species_tree
A similarity matrix, with entry (i,j) corresponding to the pairwise similarity between species i and species j. This similarity matrix was derived from a phylogenetic distance matrix, inferred from the tree xue_species_tree, using the expression s(i,j) = exp(-d(i,j)), where d(i,j) is the phylogenetic distance between species i and j.
xue_species_similarity
xue_species_similarity
xue_species_similarity
A data frame with 524 rows and 524 columns, each corresponding to one species.
A phylogenetic tree in the Newick format.
xue_species_tree
xue_species_tree
xue_species_tree
A Newick tree.