scran_variances
Model per-gene variance in expression
Variance modelling for single-cell expression data.
Classes
struct | ChooseHighlyVariableGenesOptions | Options for choose_highly_variable_genes(). |
struct | FitVarianceTrendOptions | Options for fit_variance_trend(). |
struct | FitVarianceTrendResults | Results of fit_variance_trend(). |
struct | FitVarianceTrendWorkspace | Workspace for fit_variance_trend(). |
struct | ModelGeneVariancesBlockedBuffers | Buffers for model_gene_variances_blocked(). |
struct | ModelGeneVariancesBlockedResults | Results of model_gene_variances_blocked(). |
struct | ModelGeneVariancesBuffers | Buffers for model_gene_variances() and friends. |
struct | ModelGeneVariancesOptions | Options for model_gene_variances() and friends. |
struct | ModelGeneVariancesResults | Results of model_gene_variances(). |
void scran_variances::choose_highly_variable_genes | ( | size_t | n, |
const Stat_ * | statistic, | ||
Bool_ * | output, | ||
const ChooseHighlyVariableGenesOptions & | options | ||
) |
Stat_ | Type of the variance statistic. |
Bool_ | Type to be used as a boolean. |
n | Number of genes. | |
[in] | statistic | Pointer to an array of length n containing the per-gene variance statistics. |
[out] | output | Pointer to an array of length n . On output, this is filled with true if the gene is to be retained and false otherwise. |
options | Further options. |
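As a rough illustration of how output might be filled, the sketch below marks the genes with the largest statistics. It is a standalone stand-in that uses only the standard library: the name choose_top_sketch and its top argument are hypothetical, ties are resolved in favour of the earlier gene, and none of this should be read as the library's implementation or as the behaviour selected by ChooseHighlyVariableGenesOptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in: mark the `top` genes with the largest statistics.
// Ties are resolved in favour of the earlier gene (an assumption of this sketch).
template<typename Stat_, typename Bool_>
void choose_top_sketch(std::size_t n, const Stat_* statistic, Bool_* output, std::size_t top) {
    assert(statistic != nullptr || n == 0);
    std::fill(output, output + n, static_cast<Bool_>(false));
    top = std::min(top, n);

    // Order gene indices by decreasing statistic; the stable sort keeps gene order among ties.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    std::stable_sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
        return statistic[l] > statistic[r];
    });

    // Flag the first `top` genes in that ordering as retained.
    for (std::size_t i = 0; i < top; ++i) {
        output[order[i]] = static_cast<Bool_>(true);
    }
}
```

A caller would typically pass the residuals from the variance modelling as statistic, so that the retained genes are those with the largest variance above the trend.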
std::vector< Bool_ > scran_variances::choose_highly_variable_genes | ( | size_t | n, |
const Stat_ * | statistic, | ||
const ChooseHighlyVariableGenesOptions & | options | ||
) |
Stat_ | Type of the variance statistic. |
Bool_ | Type to be used as a boolean. |
n | Number of genes. | |
[in] | statistic | Pointer to an array of length n containing the per-gene variance statistics. |
options | Further options. |
Returns a vector of length n, indicating whether each gene is to be retained.
std::vector< Index_ > scran_variances::choose_highly_variable_genes_index | ( | Index_ | n, |
const Stat_ * | statistic, | ||
const ChooseHighlyVariableGenesOptions & | options | ||
) |
Index_ | Type of the indices. |
Stat_ | Type of the variance statistic. |
n | Number of genes. | |
[in] | statistic | Pointer to an array of length n containing the per-gene variance statistics. |
options | Further options. |
Returns a vector of indices of the chosen genes, each less than n.
void scran_variances::fit_variance_trend | ( | size_t | n, |
const Float_ * | mean, | ||
const Float_ * | variance, | ||
Float_ * | fitted, | ||
Float_ * | residuals, | ||
FitVarianceTrendWorkspace< Float_ > & | workspace, | ||
const FitVarianceTrendOptions & | options | ||
) |
We fit a trend to the per-feature variances against the means, both of which are computed from log-normalized expression data. The trend is fitted with a LOWESS smoother, applied in several steps.
Float_ | Floating-point type for the statistics. |
n | Number of features. | |
[in] | mean | Pointer to an array of length n , containing the means for all features. |
[in] | variance | Pointer to an array of length n , containing the variances for all features. |
[out] | fitted | Pointer to an array of length n , to store the fitted values. |
[out] | residuals | Pointer to an array of length n , to store the residuals. |
workspace | Collection of temporary data structures. This can be re-used across multiple fit_variance_trend() calls. | |
options | Further options. |
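To make the roles of fitted and residuals concrete, the sketch below fills both arrays from the means and variances. It deliberately swaps LOWESS for a much cruder smoother, a moving average over features that are adjacent in rank of the mean, and it takes each residual as the variance minus the fitted value. The name fit_trend_sketch and the window argument are hypothetical and are not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the trend fit: a moving average over the features
// closest in rank of the mean. This is NOT LOWESS, only a simple substitute.
template<typename Float_>
void fit_trend_sketch(std::size_t n, const Float_* mean, const Float_* variance,
                      Float_* fitted, Float_* residuals, std::size_t window) {
    assert(window > 0);

    // Sort features by mean so that neighbours in rank have similar abundance.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    std::sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
        return mean[l] < mean[r];
    });

    const std::size_t half = window / 2;
    for (std::size_t i = 0; i < n; ++i) {
        // Clamp the window to the available range of ranks.
        const std::size_t start = (i > half ? i - half : 0);
        const std::size_t end = std::min(n, i + half + 1);
        Float_ total = 0;
        for (std::size_t j = start; j < end; ++j) {
            total += variance[order[j]];
        }
        const std::size_t feature = order[i];
        fitted[feature] = total / static_cast<Float_>(end - start);
        // Residual: observed variance minus the trend at this feature.
        residuals[feature] = variance[feature] - fitted[feature];
    }
}
```

Results are written back in the original feature order, so fitted[i] and residuals[i] always refer to the same feature as mean[i] and variance[i].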
FitVarianceTrendResults< Float_ > scran_variances::fit_variance_trend | ( | size_t | n, |
const Float_ * | mean, | ||
const Float_ * | variance, | ||
const FitVarianceTrendOptions & | options | ||
) |
Overload of fit_variance_trend() that allocates the output vectors.
Float_ | Floating-point type for the statistics. |
n | Number of features. | |
[in] | mean | Pointer to an array of length n , containing the means for all features. |
[in] | variance | Pointer to an array of length n , containing the variances for all features. |
options | Further options. |
void scran_variances::model_gene_variances_blocked | ( | const tatami::Matrix< Value_, Index_ > & | mat, |
const Block_ * | block, | ||
const ModelGeneVariancesBlockedBuffers< Stat_ > & | buffers, | ||
const ModelGeneVariancesOptions & | options | ||
) |
Compute and model the per-feature variances from a log-expression matrix with blocking. The mean and variance of each gene are computed separately for the cells in each block, and a separate trend is fitted within each block to obtain residuals (see model_gene_variances()). This ensures that sample and batch effects do not confound the variance estimates.
We also compute the average of each statistic across blocks, using the weighting strategy specified in ModelGeneVariancesOptions::block_weight_policy. The average residual is particularly useful for feature selection with choose_highly_variable_genes().
Value_ | Data type of the matrix. |
Index_ | Integer type for the row/column indices. |
Block_ | Integer type to hold the block IDs. |
Stat_ | Floating-point type for the output statistics. |
mat | A tatami matrix containing log-expression values. Rows should be genes while columns should be cells. | |
[in] | block | Pointer to an array of length equal to the number of cells. Each entry should be a 0-based block identifier in \([0, B)\) where \(B\) is the total number of blocks. block can also be a nullptr , in which case all cells are assumed to belong to the same block. |
[out] | buffers | Collection of pointers to arrays in which to store the output statistics. The length of ModelGeneVariancesBlockedBuffers::per_block should be equal to the number of blocks. |
options | Further options. |
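The blocking logic described above can be sketched for a single gene as follows, shown for the mean only. Both helpers are hypothetical and standalone: per_block_means_sketch groups cells by their 0-based block ID (treating a nullptr as one block), and average_across_blocks_sketch combines the per-block values with equal weights. Equal weighting is an assumption made for brevity; the library chooses its weights according to ModelGeneVariancesOptions::block_weight_policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-block means of one gene's log-expression values.
// `block` holds 0-based block IDs in [0, num_blocks); a nullptr means one block.
template<typename Value_, typename Block_>
std::vector<double> per_block_means_sketch(std::size_t num_cells, const Value_* values,
                                           const Block_* block, std::size_t num_blocks) {
    if (block == nullptr) {
        num_blocks = 1;
    }
    std::vector<double> sums(num_blocks), counts(num_blocks);
    for (std::size_t c = 0; c < num_cells; ++c) {
        const std::size_t b = (block == nullptr ? 0 : static_cast<std::size_t>(block[c]));
        assert(b < num_blocks);
        sums[b] += static_cast<double>(values[c]);
        counts[b] += 1;
    }
    for (std::size_t b = 0; b < num_blocks; ++b) {
        // Blocks without any cells are left at zero in this sketch.
        if (counts[b] > 0) {
            sums[b] /= counts[b];
        }
    }
    return sums;
}

// Equal-weight average of a per-block statistic (an assumption of this sketch;
// the real weighting follows the chosen policy).
inline double average_across_blocks_sketch(const std::vector<double>& per_block) {
    assert(!per_block.empty());
    double total = 0;
    for (double x : per_block) {
        total += x;
    }
    return total / static_cast<double>(per_block.size());
}
```

The same pattern would apply to the variance, fitted value and residual: compute each within a block, then average across blocks.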
void scran_variances::model_gene_variances | ( | const tatami::Matrix< Value_, Index_ > & | mat, |
ModelGeneVariancesBuffers< Stat_ > | buffers, | ||
const ModelGeneVariancesOptions & | options | ||
) |
Here, we scan through a log-transformed normalized expression matrix and compute per-gene means and variances. We then fit a trend to the variances with respect to the means using fit_variance_trend(). We assume that most genes at any given abundance are not highly variable, such that the fitted value of the trend is interpreted as the "uninteresting" variance. This is mostly attributed to technical variation like sequencing noise, but can also represent constitutive biological noise like transcriptional bursting. Under this assumption, the residual can be treated as a measure of biologically interesting variation, and can be used to identify relevant features for downstream analyses.
Value_ | Data type of the matrix. |
Index_ | Integer type for the row/column indices. |
Stat_ | Floating-point type for the output statistics. |
mat | A tatami matrix containing log-expression values. Rows should be genes while columns should be cells. |
buffers | Collection of buffers in which to store the computed statistics. |
options | Further options. |
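The first stage of this procedure, computing per-gene means and variances, might look like the sketch below. It is a simplification under stated assumptions: the input is a dense row-major std::vector rather than a tatami matrix, the variance uses the usual n - 1 denominator, and GeneStatsSketch and compute_gene_stats_sketch are hypothetical names that do not exist in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the per-gene statistics of this sketch.
struct GeneStatsSketch {
    std::vector<double> means, variances;
};

// Compute per-gene means and sample variances from a dense row-major matrix
// (rows are genes, columns are cells). The real function takes a tatami matrix.
inline GeneStatsSketch compute_gene_stats_sketch(const std::vector<double>& values,
                                                 std::size_t num_genes, std::size_t num_cells) {
    assert(values.size() == num_genes * num_cells);
    assert(num_cells > 1); // the sample variance needs at least two cells.
    GeneStatsSketch out;
    out.means.resize(num_genes);
    out.variances.resize(num_genes);
    for (std::size_t g = 0; g < num_genes; ++g) {
        const double* row = values.data() + g * num_cells;
        double mean = 0;
        for (std::size_t c = 0; c < num_cells; ++c) {
            mean += row[c];
        }
        mean /= static_cast<double>(num_cells);

        // Second pass: sum of squared deviations from the mean.
        double ss = 0;
        for (std::size_t c = 0; c < num_cells; ++c) {
            const double delta = row[c] - mean;
            ss += delta * delta;
        }
        out.means[g] = mean;
        out.variances[g] = ss / static_cast<double>(num_cells - 1);
    }
    return out;
}
```

The resulting means and variances are the inputs to the trend fit; the residual of each gene from that trend is then the statistic of interest.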
ModelGeneVariancesResults< Stat_ > scran_variances::model_gene_variances | ( | const tatami::Matrix< Value_, Index_ > & | mat, |
const ModelGeneVariancesOptions & | options | ||
) |
Overload of model_gene_variances() that allocates space for the output statistics.
Stat_ | Floating-point type for the output statistics. |
Value_ | Data type of the matrix. |
Index_ | Integer type for the row/column indices. |
mat | A tatami matrix containing log-expression values. Rows should be genes while columns should be cells. |
options | Further options. |
ModelGeneVariancesBlockedResults< Stat_ > scran_variances::model_gene_variances_blocked | ( | const tatami::Matrix< Value_, Index_ > & | mat, |
const Block_ * | block, | ||
const ModelGeneVariancesOptions & | options | ||
) |
Overload of model_gene_variances_blocked() that allocates space for the output statistics.
Stat_ | Floating-point type for the output statistics. |
Value_ | Data type of the matrix. |
Index_ | Integer type for the row/column indices. |
Block_ | Integer type to hold the block IDs. |
mat | A tatami matrix containing log-expression values. Rows should be genes while columns should be cells. | |
[in] | block | Pointer to an array of length equal to the number of cells, containing 0-based block identifiers. This may also be a nullptr in which case all cells are assumed to belong to the same block. |
options | Further options. |
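Returns the results of the variance modelling in each block. The average of each statistic across blocks is also computed if ModelGeneVariancesOptions::compute_average = true.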