Title: | Dynamic Estimator of Microbial Communities |
---|---|
Description: | Multi-sample algorithm based on contigs and coverage values, to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. Yuan Gao and Hongzhe Li (2018) <doi:10.1038/s41592-018-0182-0>. |
Authors: | Yuan Gao [aut, cph], Charlie Bushman [cre] |
Maintainer: | Charlie Bushman <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0.0.9000 |
Built: | 2025-02-18 05:42:00 UTC |
Source: | https://github.com/ulthran/demic |
Compares contig subset x against contig subset y
compare_contig_subsets(est_ptrs_x, est_ptrs_y, pipeline_x, pipeline_y, cor_cutoff, max_cor)
est_ptrs_x | PTR estimates from contig subset x |
est_ptrs_y | PTR estimates from contig subset y |
pipeline_x | pipeline for contig subset x |
pipeline_y | pipeline for contig subset y |
cor_cutoff | the correlation cutoff |
max_cor | the max correlation |
a named list including the est_ptr dataframe and a max_cor value
sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
max_cor: the max correlation achieved
Compares sample subset x against sample subset y
compare_sample_subsets(est_ptrs_x, est_ptrs_y, pipeline_x, pipeline_y, cor_cutoff, max_cor)
est_ptrs_x | PTR estimates from sample subset x |
est_ptrs_y | PTR estimates from sample subset y |
pipeline_x | pipeline for sample subset x |
pipeline_y | pipeline for sample subset y |
cor_cutoff | the correlation cutoff |
max_cor | the max correlation |
a named list including the est_ptr dataframe and a max_cor value
sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
max_cor: the max correlation achieved
A function for data frame integration
consist_transfer(x, y, i)
x | first data frame |
y | second data frame |
i | 'sample' column |
a data frame whose other column holds the mean or max of that column in the two original data frames
A function to return the first dimension of PCA on an input matrix
contig_pca(X)
X | a matrix to undergo PCA |
first dimension of the PCA results
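As a rough illustration of what "first dimension of PCA" means here, the sketch below uses base R's `prcomp` on a toy matrix. It is not the package's implementation: the helper name `first_pc`, the choice to center without scaling, and the rows-as-contigs layout are all assumptions made for the example.

```r
# Hypothetical helper (not part of demic): scores on the first principal
# component of a numeric matrix, one score per row.
first_pc <- function(X) {
  pca <- stats::prcomp(X, center = TRUE, scale. = FALSE)
  pca$x[, 1]
}

# Toy matrix (assumed layout): 6 contigs as rows, 3 samples as columns.
set.seed(1)
trend <- seq(-1, 1, length.out = 6)
X <- cbind(s1 = trend, s2 = 2 * trend, s3 = 0.5 * trend) +
  matrix(rnorm(18, sd = 0.01), nrow = 6)

pc1 <- first_pc(X)
pc1
```

Because every toy column follows the same underlying trend, the first component orders the rows along that trend; its sign is arbitrary.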
Data associated with DEMIC paper (on SourceForge)
ContigCluster1
A data frame with 120,897 rows and 5 columns:
log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig
https://sourceforge.net/projects/demic/files/
Data associated with DEMIC paper (on SourceForge)
ContigCluster2
A data frame with 66,735 rows and 5 columns:
log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig
https://sourceforge.net/projects/demic/files/
Determine the majority orientation of the input PTR estimate correlations
cor_diff(Z)
Z | a vector of values |
a minor subset, where each value has the same orientation
A function for data frame transfer
df_transfer(x, y)
x | first data frame with six columns |
y | second data frame with six columns |
a data frame with the same six columns but integrated info
Estimate PTRs using all input data as well as using subsets of contigs and samples
est_ptr(X)
X | dataframe with coverage matrix (column names: "log_cov", "GC_content", "sample", "contig", "length") |
a named list with results from all three methods:
all_ptr: dataframe with the estimated PTRs on success, NULL otherwise, with columns:
est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
contigs_ptr: dataframe with the estimated PTRs on success, NULL otherwise, with the same columns as all_ptr
samples_ptr: dataframe with the estimated PTRs on success, NULL otherwise, with the same columns as all_ptr
est_ptrs_001 <- est_ptr(max_bin_003)
est_ptrs_001
Requires a minimum of 2 * num_subsets contigs/samples
est_ptr_on(X, subset_on, max_attempts = 10, num_subsets = 3, cor_cutoff = 0.98)
X | cov3 dataframe |
subset_on | either "contig" or "sample" |
max_attempts | max number of attempts to find a valid PTR estimate |
num_subsets | number of subsets to split contigs/samples into |
cor_cutoff | minimum correlation coefficient to accept PTR estimate |
est_ptrs dataframe on success, null otherwise
est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
est_ptrs_001_on_contigs <- est_ptr_on(max_bin_003, "contig", num_subsets = 5)
est_ptrs_001_on_contigs
est_ptrs_001_on_samples <- est_ptr_on(max_bin_003, "sample")
is.null(est_ptrs_001_on_samples)
Estimates PTRs based on the whole input dataset
est_ptr_on_all(X)
X | cov3 dataframe |
est_ptrs dataframe on success, null otherwise
est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
est_ptrs_001 <- est_ptr_on_all(max_bin_003)
est_ptrs_001
Get PTR estimates for output of the core pipeline on a subset of data
est_ptrs_subset(p)
p | the pipeline named list |
a dataframe
sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage
A function for sample filtration
Input requirements:
1. have values in more than half of the contigs
2. average log2(cov) > 0 in all these contigs
filter_sample(Z, avg_cutoff, cutoff_ratio)
Z | a matrix |
avg_cutoff | threshold of average |
cutoff_ratio | threshold of ratio |
the coefficient and p value of linear regression
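The two input requirements listed for sample filtration can be sketched in base R as follows. This is only an illustration of the stated criteria, not the package's code: the helper name `keep_samples`, the contigs-as-rows and samples-as-columns layout, the use of `NA` for missing values, and the return value (a vector of sample names) are all assumptions.

```r
# Hypothetical helper (not part of demic): keep samples (columns) that have
# non-missing values in more than `cutoff_ratio` of the contigs (rows) and
# whose mean over the non-missing contigs exceeds `avg_cutoff`.
keep_samples <- function(M, avg_cutoff = 0, cutoff_ratio = 0.5) {
  frac_present <- colMeans(!is.na(M))
  avg_value <- colMeans(M, na.rm = TRUE)
  # which() drops NA comparisons that arise for all-missing columns
  colnames(M)[which(frac_present > cutoff_ratio & avg_value > avg_cutoff)]
}

# Toy log2 coverage matrix: 4 contigs by 3 samples
M <- cbind(
  good   = c(1.2, 0.8, 1.5, 0.9),    # complete, positive mean
  sparse = c(2.0, NA, NA, NA),       # present in only 1 of 4 contigs
  low    = c(-1.0, -0.5, 0.2, -0.3)  # complete, negative mean
)

kept <- keep_samples(M)
kept
```

The defaults mirror the thresholds named in the description (more than half of the contigs, average above 0); `avg_cutoff` and `cutoff_ratio` let both be tuned.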
Generate a variety of stats on PTR estimates for a given dataset
get_eptr_stats(X, iterations = 30)
X | cov3 dataframe |
iterations | number of iterations to run |
named list of stats on PTR estimates
all_sd: standard deviation of PTR estimates from all method
all_mean: mean of PTR estimates from all method
contigs_sd: standard deviation of PTR estimates from contigs method
contigs_mean: mean of PTR estimates from contigs method
samples_sd: standard deviation of PTR estimates from samples method
samples_mean: mean of PTR estimates from samples method
stats <- get_eptr_stats(max_bin_001[max_bin_001$sample %in% c('Akk0_001', 'Akk1_001'), ], 2)
stats
A function for iteration of pipeline until convergence
iterate_pipelines(Z)
Z | a matrix of coverages |
a named list
samples: vector of final filtered samples
correct_ys: matrix of sample, contig and corrected coverages
pc1: matrix of contig and PC1 values
pc1_range: vector of PC1 range
samples_y: samples filtered for reliable coverage
A convenient function for KS test of uniform distribution
ks(x)
x | a vector without NA |
the p value of KS test
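A KS test against a uniform distribution can be run with base R's `ks.test`. The sketch below shows one way to do it; the helper name `ks_uniform_p` and the choice of `[min(x), max(x)]` as the uniform range are assumptions for illustration and may differ from what the package does.

```r
# Hypothetical helper (not part of demic): p-value of a one-sample KS test
# of x against a uniform distribution on [min(x), max(x)].
ks_uniform_p <- function(x) {
  stats::ks.test(x, "punif", min(x), max(x))$p.value
}

# Evenly spread values look uniform; values clumped in the middle do not.
evenly_spread <- seq(0, 1, length.out = 50)
clumped <- c(0, seq(0.45, 0.55, length.out = 48), 1)

p_even <- ks_uniform_p(evenly_spread)
p_clumped <- ks_uniform_p(clumped)
c(even = p_even, clumped = p_clumped)
```

A large p-value means the values are compatible with a uniform spread; a small one suggests outliers or clumping.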
A convenient function for ordinary linear regression on two vectors
lm_column(x, y)
x | first vector |
y | second vector |
the coefficient and p value of linear regression
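For readers unfamiliar with extracting a slope and its p-value from a fit, here is a minimal base R sketch. The helper name `slope_and_p`, the direction of the regression (`y` on `x`), and the names of the returned elements are assumptions, not the package's definition.

```r
# Hypothetical helper (not part of demic): slope and its p-value from an
# ordinary least-squares fit of y on x.
slope_and_p <- function(x, y) {
  fit <- stats::lm(y ~ x)
  coefs <- summary(fit)$coefficients
  c(coefficient = coefs["x", "Estimate"], pValue = coefs["x", "Pr(>|t|)"])
}

# Toy data with a true slope of 3 plus a little noise
set.seed(42)
x <- 1:20
y <- 3 * x + rnorm(20, sd = 0.5)

res <- slope_and_p(x, y)
res
```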
Run mixed linear model with random effect using lme4
lme4_model(X)
X | input data frame |
a dataframe
Generated by PyCov3 on simulated test data
max_bin_001
A data frame with 79,740 rows and 5 columns:
log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig
https://sourceforge.net/projects/demic/files/
Generated by PyCov3 on simulated test data
max_bin_002
A data frame with 148,638 rows and 5 columns:
log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig
https://sourceforge.net/projects/demic/files/
Generated by PyCov3 on simulated test data
max_bin_003
A data frame with 124,578 rows and 5 columns:
log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig
https://sourceforge.net/projects/demic/files/
A function representing the pipeline of four steps including GC bias correction, sample filtration, PCA and contig filtration
pipeline(Y, i)
Y | a matrix of coverages |
i | selects the cutoff used when filtering samples: with i = 1, cutoffRatio is 0.5; with i = 2, cutoffRatio is 1, as the contigs are clean |
a named list
samples: final list of filtered samples
correct_ys: dataframe with correct Y values per contig/sample
pc1: PC1 results of PCA per contig
pc1_range: range of PC1
samples_y: samples filtered for reliable coverage
A function to reshape the coverage matrix for PCA, removing all contigs with missing values for the designated samples
reshape_filtered(samples_filtered, Z)
samples_filtered | a vector of samples |
Z | a matrix of coverage |
a reshaped matrix of coverage
A function to remove outlier contigs using KS test
select_by_ks_test(sort_values)
sort_values | a vector of sorted values |
a vector with all values following a uniform distribution
A function to test whether the result is reasonable
test_reasonable(a, b)
a | first vector of values |
b | second vector of values |
the test result
Verify that the input dataframe/matrix is valid
verify_input(X)
X | dataframe/matrix with cov3 information |
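The manual does not spell out which checks are performed. As a hedged illustration only, the sketch below tests for the five column names listed in the `est_ptr` argument description. The helper `has_cov3_columns` is hypothetical, handles data frames only, and may not match what `verify_input` actually checks.

```r
# Hypothetical helper (not part of demic): TRUE when X is a data frame that
# contains the five columns named in the est_ptr documentation.
has_cov3_columns <- function(X) {
  required <- c("log_cov", "GC_content", "sample", "contig", "length")
  is.data.frame(X) && all(required %in% colnames(X))
}

# Toy input with the expected column names
toy <- data.frame(
  log_cov = c(0.1, 0.3),
  GC_content = c(0.45, 0.52),
  sample = c("s1", "s1"),
  contig = c("c1", "c2"),
  length = c(5000L, 7200L)
)

ok <- has_cov3_columns(toy)
bad <- has_cov3_columns(toy[, c("log_cov", "sample")])
c(ok = ok, bad = bad)
```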