Package 'demic' reference manual

Title:	Dynamic Estimator of Microbial Communities
Description:	Multi-sample algorithm based on contigs and coverage values, to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. Yuan Gao and Hongzhe Li (2018) <doi:10.1038/s41592-018-0182-0>.
Authors:	Yuan Gao [aut, cph], Charlie Bushman [cre]
Maintainer:	Charlie Bushman <[email protected]>
License:	GPL (>= 3)
Version:	2.0.0.9000
Built:	2025-03-20 04:46:11 UTC
Source:	https://github.com/ulthran/demic

Compares contig subset x against contig subset y

Description

Compares contig subset x against contig subset y

Usage

compare_contig_subsets(
  est_ptrs_x,
  est_ptrs_y,
  pipeline_x,
  pipeline_y,
  cor_cutoff,
  max_cor
)

Arguments

`est_ptrs_x`	PTR estimates from contig subset x
`est_ptrs_y`	PTR estimates from contig subset y
`pipeline_x`	pipeline for contig subset x
`pipeline_y`	pipeline for contig subset y
`cor_cutoff`	the correlation cutoff
`max_cor`	the max correlation

Value

a named list including the est_ptr dataframe and a max_cor value

sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

max_cor: the max correlation achieved

Compares sample subset x against sample subset y

Description

Compares sample subset x against sample subset y

Usage

compare_sample_subsets(
  est_ptrs_x,
  est_ptrs_y,
  pipeline_x,
  pipeline_y,
  cor_cutoff,
  max_cor
)

Arguments

`est_ptrs_x`	PTR estimates from sample subset x
`est_ptrs_y`	PTR estimates from sample subset y
`pipeline_x`	pipeline for sample subset x
`pipeline_y`	pipeline for sample subset y
`cor_cutoff`	the correlation cutoff
`max_cor`	the max correlation

Value

a named list including the est_ptr dataframe and a max_cor value

sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

A function for data frame integration

Description

A function for data frame integration

Usage

consist_transfer(x, y, i)

Arguments

`x`	first data frame
`y`	second data frame
`i`	'sample' column

Value

a data frame with the other column as mean or max of that in the original two
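
The integration described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name `combine_by_sample`, its `by`, `value` and `fun` parameters, and the toy data are all assumptions made for illustration. The `by` parameter plays the role of the documented `i` ('sample' column).

```r
# Hypothetical helper (not part of demic): join two data frames on the
# sample column and combine the shared value column with mean or max.
combine_by_sample <- function(x, y, by = "sample", value = "est_ptr", fun = mean) {
  # Inner join; the shared value column is suffixed .x and .y
  merged <- merge(x, y, by = by, suffixes = c(".x", ".y"))
  pair <- merged[, paste0(value, c(".x", ".y"))]
  # Row-wise combination of the two original values
  merged[[value]] <- unname(apply(pair, 1, fun))
  merged[, c(by, value)]
}

# Toy inputs (assumed shape: one row per sample)
x <- data.frame(sample = c("s1", "s2"), est_ptr = c(1.2, 1.6))
y <- data.frame(sample = c("s1", "s2", "s3"), est_ptr = c(1.4, 1.8, 2.0))

by_mean <- combine_by_sample(x, y)
by_max <- combine_by_sample(x, y, fun = max)
```

Only samples present in both inputs survive the join in this sketch; whether the package keeps unmatched samples is not stated in this manual.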

A function to return the first dimension of PCA on an input matrix

Description

A function to return the first dimension of PCA on an input matrix

Usage

contig_pca(X)

Arguments

`X`	a matrix to undergo PCA

Value

first dimension of the PCA results
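
As a rough illustration of what "first dimension of PCA" means, the sketch below uses `stats::prcomp` from base R. The helper name `first_pc`, the centering and scaling choices, and the toy matrix are assumptions; the package may orient or preprocess its input differently.

```r
# Hypothetical helper (not part of demic): scores on the first principal
# component of a numeric matrix, one score per row.
first_pc <- function(X) {
  pca <- stats::prcomp(X, center = TRUE, scale. = FALSE)
  pca$x[, 1]
}

set.seed(1)
# Toy matrix: 6 contigs (rows) by 3 samples (columns)
X <- matrix(
  rnorm(18),
  nrow = 6,
  dimnames = list(paste0("contig", 1:6), paste0("s", 1:3))
)
pc1 <- first_pc(X)
```

Because the columns are centered before the decomposition, the returned scores average to zero, and each score keeps the row name of its contig.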

Contig Cluster 1

Description

Data associated with DEMIC paper (on SourceForge)

Usage

ContigCluster1

Format

`ContigCluster1`

A data frame with 120,897 rows and 5 columns:

log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig

Source

https://sourceforge.net/projects/demic/files/

Contig Cluster 2

Description

Data associated with DEMIC paper (on SourceForge)

Usage

ContigCluster2

Format

`ContigCluster2`

A data frame with 66,735 rows and 5 columns:

log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig

Source

https://sourceforge.net/projects/demic/files/

Determine the majority orientation of the input PTR estimates correlations

Description

Determine the majority orientation of the input PTR estimates correlations

Usage

cor_diff(Z)

Arguments

`Z`	a vector of values

Value

a minor subset, where each value has the same orientation

A function for data frame transfer

Description

A function for data frame transfer

Usage

df_transfer(x, y)

Arguments

`x`	first data frame with six columns
`y`	second data frame with six columns

Value

a data frame with the same six columns but integrated info

Estimate PTRs using all input data as well as using subsets of contigs and samples

Description

Estimate PTRs using all input data as well as using subsets of contigs and samples

Usage

est_ptr(X)

Arguments

`X`	dataframe with coverage matrix (column names: "log_cov", "GC_content", "sample", "contig", "length")

Value

named list with results from all three methods

all_ptr: dataframe with the estimated PTRs on success, NULL otherwise

est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

contigs_ptr: dataframe with the estimated PTRs on success, NULL otherwise

est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

samples_ptr: dataframe with the estimated PTRs on success, NULL otherwise

est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

Examples

est_ptrs_001 <- est_ptr(max_bin_003)
est_ptrs_001

Tries up to max_attempts times to compare each permutation of removing random subsets of contigs/samples from X, and returns the PTR estimate if a valid one comes back from the comparisons

Description

Requires a minimum of 2 * num_subsets contigs/samples

Usage

est_ptr_on(X, subset_on, max_attempts = 10, num_subsets = 3, cor_cutoff = 0.98)

Arguments

`X`	cov3 dataframe
`subset_on`	either "contig" or "sample"
`max_attempts`	max number of attempts to find a valid ptr estimate
`num_subsets`	number of subsets to split contigs/samples into
`cor_cutoff`	minimum correlation coefficient to accept PTR estimate

Value

est_ptrs dataframe on success, NULL otherwise

est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

Examples

est_ptrs_001_on_contigs <- est_ptr_on(max_bin_003, "contig", num_subsets = 5)
est_ptrs_001_on_contigs

est_ptrs_001_on_samples <- est_ptr_on(max_bin_003, "sample")
is.null(est_ptrs_001_on_samples)
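
The description states that at least 2 * num_subsets contigs/samples are required. The base R sketch below shows one way such a requirement could be enforced while partitioning identifiers into random subsets. It is not the package's implementation: the helper name `split_into_subsets` and the round-robin assignment are assumptions made for illustration.

```r
# Hypothetical helper (not part of demic): randomly partition contig or
# sample identifiers into num_subsets groups, or return NULL when there
# are fewer than 2 * num_subsets identifiers.
split_into_subsets <- function(ids, num_subsets = 3) {
  ids <- unique(ids)
  if (length(ids) < 2 * num_subsets) {
    return(NULL) # too few contigs/samples for the documented minimum
  }
  shuffled <- sample(ids)
  # Round-robin assignment keeps group sizes within one of each other
  groups <- rep(seq_len(num_subsets), length.out = length(shuffled))
  split(shuffled, groups)
}

set.seed(7)
subsets <- split_into_subsets(paste0("contig", 1:10), num_subsets = 3)
too_few <- split_into_subsets(paste0("contig", 1:5), num_subsets = 3)
```

Returning NULL for undersized input mirrors the documented "NULL otherwise" behaviour, which is why the second example above checks the result with `is.null()`.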

Estimates PTRs based on the whole input dataset

Description

Estimates PTRs based on the whole input dataset

Usage

est_ptr_on_all(X)

Arguments

`X`	cov3 dataframe

Value

est_ptrs dataframe on success, NULL otherwise

est_ptr: estimated PTR values
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

Examples

est_ptrs_001 <- est_ptr_on_all(max_bin_003)
est_ptrs_001

Get PTR estimates for output of the core pipeline on a subset of data

Description

Get PTR estimates for output of the core pipeline on a subset of data

Usage

est_ptrs_subset(p)

Arguments

`p`	is the pipeline named list

Value

a dataframe

sample: sample
est_ptr: PTR estimate
coefficient: coefficient of linear regression
pValue: p-value of linear regression
cor: correlation coefficient
correctY: corrected coverage

A function for sample filtration. Input requirements: (1) have values in more than half of the contigs; (2) average log2(cov) > 0 in all these contigs

Description

A function for sample filtration. Input requirements:

1. have values in more than half of the contigs
2. average log2(cov) > 0 in all these contigs

Usage

filter_sample(Z, avg_cutoff, cutoff_ratio)

Arguments

`Z`	a matrix
`avg_cutoff`	threshold of average
`cutoff_ratio`	threshold of ratio

Value

the coefficient and p value of linear regression
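
The two input requirements in the description can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the helper name `keep_samples`, the long-format layout of `Z` (columns `sample`, `contig`, `log_cov`), and the use of a single per-sample mean are assumptions.

```r
# Hypothetical helper (not part of demic): keep samples that cover more
# than cutoff_ratio of the contigs and whose mean log coverage exceeds
# avg_cutoff.
keep_samples <- function(Z, avg_cutoff = 0, cutoff_ratio = 0.5) {
  n_contigs <- length(unique(Z$contig))
  # Number of distinct contigs with a value, per sample
  contigs_per_sample <- tapply(Z$contig, Z$sample, function(v) length(unique(v)))
  # Mean log coverage, per sample (same ordering as above)
  mean_per_sample <- tapply(Z$log_cov, Z$sample, mean)
  keep <- contigs_per_sample > cutoff_ratio * n_contigs &
    mean_per_sample > avg_cutoff
  names(mean_per_sample)[as.vector(keep)]
}

# Toy data: s1 passes, s2 covers too few contigs, s3 has low coverage
Z <- data.frame(
  sample = c(rep("s1", 4), "s2", rep("s3", 4)),
  contig = c(paste0("c", 1:4), "c1", paste0("c", 1:4)),
  log_cov = c(1, 2, 1.5, 0.5, 3, -1, -2, 0.5, -0.5)
)
kept <- keep_samples(Z)
```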

Generate a variety of stats on PTR estimates for a given dataset

Description

Generate a variety of stats on PTR estimates for a given dataset

Usage

get_eptr_stats(X, iterations = 30)

Arguments

`X`	cov3 dataframe
`iterations`	number of iterations to run

Value

named list of stats on PTR estimates

all_sd: standard deviation of PTR estimates from all method
all_mean: mean of PTR estimates from all method
contigs_sd: standard deviation of PTR estimates from contigs method
contigs_mean: mean of PTR estimates from contigs method
samples_sd: standard deviation of PTR estimates from samples method
samples_mean: mean of PTR estimates from samples method

Examples

stats <- get_eptr_stats(max_bin_001[max_bin_001$sample %in% c('Akk0_001', 'Akk1_001'), ], 2)
stats

A function for iteration of pipeline until convergence

Description

A function for iteration of pipeline until convergence

Usage

iterate_pipelines(Z)

Arguments

`Z`	a matrix of coverages

Value

a named list

samples: vector of final filtered samples
correct_ys: matrix of sample, contig and corrected coverages
pc1: matrix of contig and PC1 values
pc1_range: vector of PC1 range
samples_y: samples filtered for reliable coverage

A convenient function for KS test of uniform distribution

Description

A convenient function for KS test of uniform distribution

Usage

ks(x)

Arguments

`x`	a vector without NA

Value

the p value of KS test
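
A KS test against a uniform distribution can be run in base R with `stats::ks.test`. The sketch below is an illustration under assumptions: the helper name `ks_uniform_p` is hypothetical, and taking the uniform bounds from the observed minimum and maximum is one plausible choice that the package may not share.

```r
# Hypothetical helper (not part of demic): p-value of a one-sample KS test
# against a uniform distribution spanning the observed range of x.
ks_uniform_p <- function(x) {
  stats::ks.test(x, "punif", min(x), max(x))$p.value
}

# Evenly spaced values look uniform; their fourth powers pile up near zero
even <- seq(0, 1, length.out = 101)
p_even <- ks_uniform_p(even)
p_skew <- ks_uniform_p(even^4)
```

A large p-value (as for `even`) gives no evidence against uniformity, while a small one (as for `even^4`) indicates that the values are unevenly spread over their range.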

A convenient function for ordinary linear regression on two vectors

Description

A convenient function for ordinary linear regression on two vectors

Usage

lm_column(x, y)

Arguments

`x`	first vector
`y`	second vector

Value

the coefficient and p value of linear regression
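
For reference, the coefficient and p-value of an ordinary linear regression on two vectors can be extracted in base R as sketched below. The helper name `fit_xy` and the names of the returned elements are assumptions; the package's return format is not specified in this manual.

```r
# Hypothetical helper (not part of demic): slope and slope p-value from
# an ordinary least squares fit of y on x.
fit_xy <- function(x, y) {
  fit <- stats::lm(y ~ x)
  coefs <- summary(fit)$coefficients
  c(
    coefficient = coefs["x", "Estimate"],
    pValue = coefs["x", "Pr(>|t|)"]
  )
}

# Toy data: y is roughly 2 * x with small deterministic noise
x <- 1:10
y <- 2 * x + c(0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.1, -0.1, 0.15, -0.15)
res <- fit_xy(x, y)
```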

Run mixed linear model with random effect using lme4

Description

Run mixed linear model with random effect using lme4

Usage

lme4_model(X)

Arguments

`X`	input data frame

Value

a dataframe

MaxBin2 Cluster 001

Description

Generated by PyCov3 on simulated test data

Usage

max_bin_001

Format

`max_bin_001`

A data frame with 79,740 rows and 5 columns:

log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig

Source

https://sourceforge.net/projects/demic/files/

MaxBin2 Cluster 002

Description

Generated by PyCov3 on simulated test data

Usage

max_bin_002

Format

`max_bin_002`

A data frame with 148,638 rows and 5 columns:

log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig

Source

https://sourceforge.net/projects/demic/files/

MaxBin2 Cluster 003

Description

Generated by PyCov3 on simulated test data

Usage

max_bin_003

Format

`max_bin_003`

A data frame with 124,578 rows and 5 columns:

log_cov: Log Coverage for Sliding Windows over Contigs
GC_content: GC Content for Sliding Windows over Contigs
sample: Sample Name
contig: Contig Name
length: Length of Contig

Source

https://sourceforge.net/projects/demic/files/

A function representing the pipeline of four steps including GC bias correction, sample filtration, PCA and contig filtration

Description

A function representing the pipeline of four steps including GC bias correction, sample filtration, PCA and contig filtration

Usage

pipeline(Y, i)

Arguments

`Y`	a matrix of coverages
`i`	selects the cutoff used when filtering samples: with i = 1, cutoffRatio is 0.5; with i = 2, cutoffRatio is 1 because the contigs are already clean

Value

a named list

samples: final list of filtered samples
correct_ys: dataframe with correct Y values per contig/sample
pc1: PC1 results of PCA per contig
pc1_range: range of PC1
samples_y: samples filtered for reliable coverage

A function for reshape to facilitate PCA, removing all contigs with missing values for designated samples

Description

A function for reshape to facilitate PCA, removing all contigs with missing values for designated samples

Usage

reshape_filtered(samples_filtered, Z)

Arguments

`samples_filtered`	a vector of samples
`Z`	a matrix of coverage

Value

a reshaped matrix of coverage
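
The reshape described above (long to wide, then dropping contigs with missing values for the chosen samples) can be sketched in base R. This is an illustration only: the helper name `wide_complete` and the assumed long-format columns `sample`, `contig` and `correctY` are not confirmed by this manual.

```r
# Hypothetical helper (not part of demic): build a contig-by-sample matrix
# for the selected samples and drop contigs with any missing value.
wide_complete <- function(samples_filtered, Z) {
  Z <- Z[Z$sample %in% samples_filtered, ]
  # Rows are contigs, columns are samples; absent pairs become NA
  wide <- tapply(
    Z$correctY,
    list(as.character(Z$contig), as.character(Z$sample)),
    mean
  )
  wide[stats::complete.cases(wide), , drop = FALSE]
}

# Toy data: contig c3 has no value for sample s2, so it is dropped
Z <- data.frame(
  sample = c("s1", "s2", "s1", "s2", "s1", "s3"),
  contig = c("c1", "c1", "c2", "c2", "c3", "c3"),
  correctY = c(1.0, 1.1, 2.0, 2.2, 3.0, 9.9)
)
m <- wide_complete(c("s1", "s2"), Z)
```

A complete matrix is what `stats::prcomp` and similar PCA routines expect, which is why contigs with gaps are removed before the PCA step.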

A function to remove outlier contigs using KS test

Description

A function to remove outlier contigs using KS test

Usage

select_by_ks_test(sort_values)

Arguments

`sort_values`	a vector of sorted values

Value

a vector with all values following a uniform distribution

A function to test whether the result is reasonable

Description

A function to test whether the result is reasonable

Usage

test_reasonable(a, b)

Arguments

`a`	first vector of values
`b`	second vector of values

Value

the test result

Verify that the input dataframe/matrix is valid

Description

Verify that the input dataframe/matrix is valid

Usage

verify_input(X)

Arguments

`X`	dataframe/matrix with cov3 information
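
The column names listed for `est_ptr` ("log_cov", "GC_content", "sample", "contig", "length") suggest what a valid input looks like. The base R sketch below checks only for those columns. The helper name `has_cov3_columns` is hypothetical, and the package's own validation may check more than this, or behave differently on failure.

```r
# Column names taken from the est_ptr argument documentation
required_cols <- c("log_cov", "GC_content", "sample", "contig", "length")

# Hypothetical helper (not part of demic): TRUE when X is a data frame
# that contains every required column.
has_cov3_columns <- function(X) {
  is.data.frame(X) && all(required_cols %in% colnames(X))
}

# Toy input with the expected layout
toy <- data.frame(
  log_cov = c(1.2, 0.8),
  GC_content = c(0.45, 0.50),
  sample = c("s1", "s2"),
  contig = c("c1", "c1"),
  length = c(5000, 5000)
)
ok <- has_cov3_columns(toy)
bad <- has_cov3_columns(toy[, c("log_cov", "sample")])
```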
