scran_variances
Model per-gene variance in expression
scran_variances Namespace Reference

Variance modelling for single-cell expression data.

Classes

struct  ChooseHighlyVariableGenesOptions
 Options for choose_highly_variable_genes().
 
struct  FitVarianceTrendOptions
 Options for fit_variance_trend().
 
struct  FitVarianceTrendResults
 Results of fit_variance_trend().
 
struct  FitVarianceTrendWorkspace
 Workspace for fit_variance_trend().
 
struct  ModelGeneVariancesBlockedBuffers
 Buffers for model_gene_variances_blocked().
 
struct  ModelGeneVariancesBlockedResults
 Results of model_gene_variances_blocked().
 
struct  ModelGeneVariancesBuffers
 Buffers for model_gene_variances() and friends.
 
struct  ModelGeneVariancesOptions
 Options for model_gene_variances() and friends.
 
struct  ModelGeneVariancesResults
 Results of model_gene_variances().
 

Enumerations

enum class  BlockAveragePolicy : unsigned char { MEAN , QUANTILE , NONE }
 

Functions

template<typename Stat_ , typename Bool_ >
void choose_highly_variable_genes (const std::size_t n, const Stat_ *const statistic, Bool_ *const output, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Bool_ = char, typename Stat_ >
std::vector< Bool_ > choose_highly_variable_genes (const std::size_t n, const Stat_ *const statistic, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Index_ , typename Stat_ >
std::vector< Index_ > choose_highly_variable_genes_index (const Index_ n, const Stat_ *const statistic, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Float_ >
void fit_variance_trend (const std::size_t n, const Float_ *const mean, const Float_ *const variance, Float_ *const fitted, Float_ *const residual, FitVarianceTrendWorkspace< Float_ > &workspace, const FitVarianceTrendOptions &options)
 
template<typename Float_ >
FitVarianceTrendResults< Float_ > fit_variance_trend (const std::size_t n, const Float_ *const mean, const Float_ *const variance, const FitVarianceTrendOptions &options)
 
template<typename Value_ , typename Index_ , typename Stat_ >
void model_gene_variances (const tatami::Matrix< Value_, Index_ > &mat, const ModelGeneVariancesBuffers< Stat_ > buffers, const ModelGeneVariancesOptions &options)
 
template<typename Stat_ = double, typename Value_ , typename Index_ >
ModelGeneVariancesResults< Stat_ > model_gene_variances (const tatami::Matrix< Value_, Index_ > &mat, const ModelGeneVariancesOptions &options)
 
template<typename Value_ , typename Index_ , typename Block_ , typename Stat_ >
void model_gene_variances_blocked (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *const block, const std::size_t num_blocks, const ModelGeneVariancesBlockedBuffers< Stat_ > &buffers, const ModelGeneVariancesOptions &options)
 
template<typename Stat_ = double, typename Value_ , typename Index_ , typename Block_ >
ModelGeneVariancesBlockedResults< Stat_ > model_gene_variances_blocked (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *const block, const std::size_t num_blocks, const ModelGeneVariancesOptions &options)
 

Detailed Description

Variance modelling for single-cell expression data.

Enumeration Type Documentation

◆ BlockAveragePolicy

enum class scran_variances::BlockAveragePolicy : unsigned char

Policy for averaging statistics across blocks.

  • MEAN: weighted mean, where weights are computed using scran_blocks::compute_weights().
  • QUANTILE: quantile, defaulting to 50%, a.k.a., the median.
  • NONE: do not report any inter-block average.
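
The difference between the MEAN and QUANTILE policies can be illustrated for a single gene with a short sketch. The helper names `weighted_mean` and `quantile_average` are hypothetical and not part of this library. The block weights are taken as given rather than derived with scran_blocks::compute_weights(), and the linear interpolation between order statistics is an assumed rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted mean of one gene's statistic across blocks.
// The weights are supplied by the caller; how they are derived is out of scope.
inline double weighted_mean(const std::vector<double>& values, const std::vector<double>& weights) {
    assert(values.size() == weights.size());
    double total = 0, total_weight = 0;
    for (std::size_t b = 0; b < values.size(); ++b) {
        total += values[b] * weights[b];
        total_weight += weights[b];
    }
    assert(total_weight > 0);
    return total / total_weight;
}

// Hypothetical helper: quantile of one gene's statistic across blocks, using
// linear interpolation between order statistics (an assumed rule).
// Setting prob = 0.5 gives the median.
inline double quantile_average(std::vector<double> values, const double prob) {
    assert(!values.empty() && prob >= 0 && prob <= 1);
    std::sort(values.begin(), values.end());
    const double position = prob * static_cast<double>(values.size() - 1);
    const std::size_t lower = static_cast<std::size_t>(position);
    const std::size_t upper = std::min(lower + 1, values.size() - 1);
    const double fraction = position - static_cast<double>(lower);
    return values[lower] + fraction * (values[upper] - values[lower]);
}
```

A weighted mean lets larger blocks contribute more to the average, whereas a quantile such as the median is less sensitive to a single block with an extreme value.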

Function Documentation

◆ choose_highly_variable_genes() [1/2]

template<typename Stat_ , typename Bool_ >
void scran_variances::choose_highly_variable_genes ( const std::size_t n,
const Stat_ *const statistic,
Bool_ *const output,
const ChooseHighlyVariableGenesOptions & options )
Template Parameters
Stat_: Type of the variance statistic.
Bool_: Type to be used as a boolean.
Parameters
n: Number of genes.
[in] statistic: Pointer to an array of length n containing the per-gene variance statistics. These are typically the residuals from model_gene_variances().
[out] output: Pointer to an array of length n. On output, the i-th entry is true if the i-th gene is to be retained and false otherwise.
options: Further options.

◆ choose_highly_variable_genes() [2/2]

template<typename Bool_ = char, typename Stat_ >
std::vector< Bool_ > scran_variances::choose_highly_variable_genes ( const std::size_t n,
const Stat_ *const statistic,
const ChooseHighlyVariableGenesOptions & options )
Template Parameters
Bool_: Type to be used as a boolean.
Stat_: Type of the variance statistic.
Parameters
n: Number of genes.
[in] statistic: Pointer to an array of length n containing the per-gene variance statistics. These are typically the residuals from model_gene_variances().
options: Further options.
Returns
A vector of booleans of length n, indicating whether each gene is to be retained.
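
As a rough illustration of what such a boolean vector looks like, the sketch below flags the genes with the largest statistics. The function `top_mask` and its `top` argument are hypothetical; the actual selection criteria are set through ChooseHighlyVariableGenesOptions and are not reproduced here. Breaking ties by gene order is an assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: flag the `top` genes with the largest statistics.
// Ties at the cutoff are broken by gene order, which is an assumption of
// this sketch and not a statement about the library's behaviour.
inline std::vector<char> top_mask(const std::size_t n, const double* const statistic, const std::size_t top) {
    assert(n == 0 || statistic != nullptr);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    const std::size_t keep = std::min(top, n);

    // Move the `keep` largest statistics to the front, earlier genes first on ties.
    std::partial_sort(order.begin(), order.begin() + keep, order.end(),
        [&](const std::size_t left, const std::size_t right) {
            if (statistic[left] != statistic[right]) {
                return statistic[left] > statistic[right];
            }
            return left < right;
        });

    // The i-th entry is 1 if the i-th gene is retained and 0 otherwise.
    std::vector<char> output(n, 0);
    for (std::size_t i = 0; i < keep; ++i) {
        output[order[i]] = 1;
    }
    return output;
}
```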

◆ choose_highly_variable_genes_index()

template<typename Index_ , typename Stat_ >
std::vector< Index_ > scran_variances::choose_highly_variable_genes_index ( const Index_ n,
const Stat_ *const statistic,
const ChooseHighlyVariableGenesOptions & options )
Template Parameters
Index_: Type of the indices.
Stat_: Type of the variance statistic.
Parameters
n: Number of genes.
[in] statistic: Pointer to an array of length n containing the per-gene variance statistics. These are typically the residuals from model_gene_variances().
options: Further options.
Returns
Vector of sorted and unique indices for the chosen genes. All indices are guaranteed to be non-negative and less than n.
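
The shape of this return value can be pictured as the positions of the retained genes in a boolean mask. The sketch below uses a hypothetical helper, `mask_to_indices`, which is not part of this library. Scanning the mask in order yields indices that are already sorted, unique and less than n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: collect the positions of the retained genes.
// A single in-order pass guarantees sorted, unique indices in [0, n).
template<typename Index_, typename Bool_>
std::vector<Index_> mask_to_indices(const Index_ n, const Bool_* const mask) {
    assert(n == 0 || mask != nullptr);
    std::vector<Index_> indices;
    for (Index_ i = 0; i < n; ++i) {
        if (mask[i]) {
            indices.push_back(i);
        }
    }
    return indices;
}
```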

◆ fit_variance_trend() [1/2]

template<typename Float_ >
void scran_variances::fit_variance_trend ( const std::size_t n,
const Float_ *const mean,
const Float_ *const variance,
Float_ *const fitted,
Float_ *const residual,
FitVarianceTrendWorkspace< Float_ > & workspace,
const FitVarianceTrendOptions & options )

Fit a trend to the per-feature variances against the means, both of which are typically computed from log-normalized expression data. This involves several steps:

  1. Filter out low-abundance genes, to ensure the span of the smoother is not skewed by many low-abundance genes. This step is omitted if FitVarianceTrendOptions::mean_filter = false.
  2. Take the quarter-root of the variances, to squeeze the trend towards 1. This makes the trend more "linear" to improve the performance of the LOWESS smoother; it also reduces the chance of obtaining negative fitted values. This step is omitted if FitVarianceTrendOptions::transform = false.
  3. Apply the LOWESS smoother to the quarter-root variances. This is done using the implementation in the WeightedLowess library.
  4. Reverse the quarter-root transformation to obtain the fitted values for all non-low-abundance genes. This step is omitted if FitVarianceTrendOptions::transform = false.
  5. Extrapolate linearly from the left-most fitted value to the origin to obtain fitted values for the previously filtered genes. This is empirically justified by the observation that mean-variance trends of log-expression data are linear at very low abundances. This step is omitted if FitVarianceTrendOptions::mean_filter = false.
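
The transformation and extrapolation steps above can be sketched as follows. This is a minimal illustration under stated assumptions and not the library's implementation: the names `quarter_root`, `reverse_quarter_root`, `extrapolate_to_origin`, `finish_trend` and `TrendSketch` are hypothetical, the LOWESS fit itself is not shown (its output on the quarter-root scale is passed in as `smoothed`), and genes are assumed to be retained when their mean is at least `min_mean`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Step 2: quarter-root transformation of a variance.
inline double quarter_root(const double variance) {
    assert(variance >= 0);
    return std::sqrt(std::sqrt(variance));
}

// Step 4: reverse the transformation by raising to the fourth power.
inline double reverse_quarter_root(const double smoothed) {
    const double squared = smoothed * smoothed;
    return squared * squared;
}

// Step 5: straight line through the origin and the left-most fitted point.
inline double extrapolate_to_origin(const double mean, const double left_mean, const double left_fitted) {
    assert(left_mean > 0);
    return left_fitted * (mean / left_mean);
}

// Hypothetical container for the outputs of this sketch.
struct TrendSketch {
    std::vector<double> fitted, residual;
};

// Hypothetical driver for steps 4 and 5. `smoothed` holds the smoother output
// on the quarter-root scale and is only read for genes with mean >= min_mean.
inline TrendSketch finish_trend(
    const std::vector<double>& mean,
    const std::vector<double>& variance,
    const std::vector<double>& smoothed,
    const double min_mean)
{
    const std::size_t n = mean.size();
    assert(variance.size() == n && smoothed.size() == n && min_mean > 0);

    // Find the retained gene with the smallest mean, i.e., the left-most fitted point.
    bool found = false;
    std::size_t left = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (mean[i] >= min_mean && (!found || mean[i] < mean[left])) {
            left = i;
            found = true;
        }
    }
    assert(found);
    const double left_fitted = reverse_quarter_root(smoothed[left]);

    TrendSketch out;
    out.fitted.resize(n);
    out.residual.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.fitted[i] = (mean[i] >= min_mean)
            ? reverse_quarter_root(smoothed[i])
            : extrapolate_to_origin(mean[i], mean[left], left_fitted);
        out.residual[i] = variance[i] - out.fitted[i];
    }
    return out;
}
```

Because the extrapolated line passes through the origin, a filtered gene with a mean of zero receives a fitted value of zero in this sketch.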
Template Parameters
Float_: Floating-point type of the statistics.
Parameters
n: Number of features.
[in] mean: Pointer to an array of length n, containing the means for all features.
[in] variance: Pointer to an array of length n, containing the variances for all features.
[out] fitted: Pointer to an array of length n, to store the fitted values.
[out] residual: Pointer to an array of length n, to store the residuals.
workspace: Collection of temporary data structures. This can be re-used across multiple fit_variance_trend() calls.
options: Further options.

◆ fit_variance_trend() [2/2]

template<typename Float_ >
FitVarianceTrendResults< Float_ > scran_variances::fit_variance_trend ( const std::size_t n,
const Float_ *const mean,
const Float_ *const variance,
const FitVarianceTrendOptions & options )

Overload of fit_variance_trend() that allocates the output vectors.

Template Parameters
Float_: Floating-point type of the statistics.
Parameters
n: Number of features.
[in] mean: Pointer to an array of length n, containing the means for all features.
[in] variance: Pointer to an array of length n, containing the variances for all features.
options: Further options.
Returns
Result of the trend fit, containing the fitted values and residuals for each gene.

◆ model_gene_variances() [1/2]

template<typename Value_ , typename Index_ , typename Stat_ >
void scran_variances::model_gene_variances ( const tatami::Matrix< Value_, Index_ > & mat,
const ModelGeneVariancesBuffers< Stat_ > buffers,
const ModelGeneVariancesOptions & options )

Model the per-gene variances as a function of the mean in single-cell expression data. We compute the mean and variance for each gene and fit a trend to the variances with respect to the means using fit_variance_trend(). We assume that most genes at any given abundance are not highly variable, so the fitted value of the trend can be interpreted as the "uninteresting" variance. This is mostly attributed to technical variation like sequencing noise, but can also represent constitutive biological noise like transcriptional bursting. Under this assumption, the residual can be treated as a measure of biologically interesting variation. Genes with large residuals can then be selected for downstream analyses, e.g., with choose_highly_variable_genes().
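
The per-gene statistics and the residual can be sketched in plain C++ as follows. The helpers `mean_and_variance` and `compute_residuals` are hypothetical, operate on ordinary vectors instead of a tatami::Matrix, and assume a sample variance with an n - 1 denominator. The fitted values are taken as given, since the trend fit is handled by fit_variance_trend().

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: mean and sample variance of one gene across cells.
// The n - 1 denominator is an assumption of this sketch.
inline std::pair<double, double> mean_and_variance(const std::vector<double>& values) {
    assert(values.size() > 1);
    const double n = static_cast<double>(values.size());
    double mean = 0;
    for (const double v : values) {
        mean += v;
    }
    mean /= n;

    double sum_squares = 0;
    for (const double v : values) {
        const double delta = v - mean;
        sum_squares += delta * delta;
    }
    return std::make_pair(mean, sum_squares / (n - 1));
}

// Hypothetical helper: residual = observed variance - fitted trend value.
// Large positive residuals suggest variation beyond the trend.
inline std::vector<double> compute_residuals(const std::vector<double>& variances, const std::vector<double>& fitted) {
    assert(variances.size() == fitted.size());
    std::vector<double> residuals(variances.size());
    for (std::size_t g = 0; g < variances.size(); ++g) {
        residuals[g] = variances[g] - fitted[g];
    }
    return residuals;
}
```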

Template Parameters
Value_: Data type of the matrix.
Index_: Integer type of the row/column indices.
Stat_: Floating-point type of the output statistics.
Parameters
mat: Matrix of expression values, typically after normalization and log-transformation. Rows should be genes while columns should be cells.
buffers: Collection of buffers in which to store the computed statistics.
options: Further options.

◆ model_gene_variances() [2/2]

template<typename Stat_ = double, typename Value_ , typename Index_ >
ModelGeneVariancesResults< Stat_ > scran_variances::model_gene_variances ( const tatami::Matrix< Value_, Index_ > & mat,
const ModelGeneVariancesOptions & options )

Overload of model_gene_variances() that allocates space for the output statistics.

Template Parameters
Stat_: Floating-point type of the output statistics.
Value_: Data type of the matrix.
Index_: Integer type of the row/column indices.
Parameters
mat: Matrix of expression values, typically after normalization and log-transformation. Rows should be genes while columns should be cells.
options: Further options.
Returns
Results of the variance modelling.

◆ model_gene_variances_blocked() [1/2]

template<typename Value_ , typename Index_ , typename Block_ , typename Stat_ >
void scran_variances::model_gene_variances_blocked ( const tatami::Matrix< Value_, Index_ > & mat,
const Block_ *const block,
const std::size_t num_blocks,
const ModelGeneVariancesBlockedBuffers< Stat_ > & buffers,
const ModelGeneVariancesOptions & options )

Model the per-feature variances from a log-expression matrix with blocking. The mean and variance of each gene are computed separately for the cells in each block, and a separate trend is fitted within each block to obtain residuals (see model_gene_variances()). This ensures that sample and batch effects do not confound the variance estimates.

We also compute the average of each statistic across blocks, using the policy described in ModelGeneVariancesOptions::average_policy. This is either a quantile (i.e., median, by default) or weighted mean of values for each gene. Weights are determined by ModelGeneVariancesOptions::block_weight_policy and are based on the size of each block. The average residual is particularly useful for feature selection with choose_highly_variable_genes().
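
To make the per-block computation concrete, the sketch below computes the mean and variance of one gene separately within each block, given 0-based block identifiers. The names `BlockStats` and `per_block_stats` are hypothetical. The n - 1 denominator and the use of NaN for blocks with too few cells are assumptions of this example, not documented behaviour of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container: one mean and one variance per block for a single gene.
struct BlockStats {
    std::vector<double> means, variances;
};

// Hypothetical helper: per-block mean and sample variance for one gene.
// Blocks with no cells (mean) or fewer than two cells (variance) are set to NaN,
// which is an assumption of this sketch.
inline BlockStats per_block_stats(const std::vector<double>& values, const std::vector<int>& block, const std::size_t num_blocks) {
    assert(values.size() == block.size());
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> sums(num_blocks, 0.0);
    std::vector<std::size_t> counts(num_blocks, 0);

    // First pass: accumulate sums and cell counts per block.
    for (std::size_t c = 0; c < values.size(); ++c) {
        assert(block[c] >= 0 && static_cast<std::size_t>(block[c]) < num_blocks);
        const std::size_t b = static_cast<std::size_t>(block[c]);
        sums[b] += values[c];
        ++counts[b];
    }

    BlockStats out;
    out.means.assign(num_blocks, nan);
    out.variances.assign(num_blocks, nan);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        if (counts[b] > 0) {
            out.means[b] = sums[b] / static_cast<double>(counts[b]);
        }
    }

    // Second pass: accumulate squared deviations from each block's own mean.
    std::vector<double> sum_squares(num_blocks, 0.0);
    for (std::size_t c = 0; c < values.size(); ++c) {
        const std::size_t b = static_cast<std::size_t>(block[c]);
        const double delta = values[c] - out.means[b];
        sum_squares[b] += delta * delta;
    }
    for (std::size_t b = 0; b < num_blocks; ++b) {
        if (counts[b] > 1) {
            out.variances[b] = sum_squares[b] / static_cast<double>(counts[b] - 1);
        }
    }
    return out;
}
```

Because each block uses its own mean, a shift in expression between blocks does not inflate the variance, which is the motivation for blocking described above.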

Template Parameters
Value_: Data type of the matrix.
Index_: Integer type of the row/column indices.
Block_: Integer type of the block IDs.
Stat_: Floating-point type of the output statistics.
Parameters
mat: Matrix of expression values, typically after normalization and log-transformation. Rows should be genes while columns should be cells.
[in] block: Pointer to an array of length equal to the number of cells. Each entry should be a 0-based block identifier in \([0, B)\) where \(B\) is the total number of blocks.
num_blocks: Total number of blocks, a.k.a., \(B\).
[out] buffers: Collection of pointers to arrays in which to store the output statistics. The length of ModelGeneVariancesBlockedBuffers::per_block should be equal to the number of blocks.
options: Further options.

◆ model_gene_variances_blocked() [2/2]

template<typename Stat_ = double, typename Value_ , typename Index_ , typename Block_ >
ModelGeneVariancesBlockedResults< Stat_ > scran_variances::model_gene_variances_blocked ( const tatami::Matrix< Value_, Index_ > & mat,
const Block_ *const block,
const std::size_t num_blocks,
const ModelGeneVariancesOptions & options )

Overload of model_gene_variances_blocked() that allocates space for the output statistics.

Template Parameters
Stat_: Floating-point type of the output statistics.
Value_: Data type of the matrix.
Index_: Integer type of the row/column indices.
Block_: Integer type of the block IDs.
Parameters
mat: Matrix of expression values, typically after normalization and log-transformation. Rows should be genes while columns should be cells.
[in] block: Pointer to an array of length equal to the number of cells, containing 0-based block identifiers.
num_blocks: Total number of blocks.
options: Further options.
Returns
Results of the variance modelling in each block. An average for each statistic is also computed if ModelGeneVariancesOptions::average_policy is not BlockAveragePolicy::NONE.