scran_variances
Model per-gene variance in expression
Loading...
Searching...
No Matches
Classes | Functions
scran_variances Namespace Reference

Variance modelling for single-cell expression data. More...

Classes

struct  ChooseHighlyVariableGenesOptions
 Options for choose_highly_variable_genes(). More...
 
struct  FitVarianceTrendOptions
 Options for fit_variance_trend(). More...
 
struct  FitVarianceTrendResults
 Results of fit_variance_trend(). More...
 
struct  FitVarianceTrendWorkspace
 Workspace for fit_variance_trend(). More...
 
struct  ModelGeneVariancesBlockedBuffers
 Buffers for model_gene_variances_blocked(). More...
 
struct  ModelGeneVariancesBlockedResults
 Results of model_gene_variances_blocked(). More...
 
struct  ModelGeneVariancesBuffers
 Buffers for model_gene_variances() and friends. More...
 
struct  ModelGeneVariancesOptions
 Options for model_gene_variances() and friends. More...
 
struct  ModelGeneVariancesResults
 Results of model_gene_variances(). More...
 

Functions

template<typename Stat_ , typename Bool_ >
void choose_highly_variable_genes (size_t n, const Stat_ *statistic, Bool_ *output, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Bool_ = uint8_t, typename Stat_ >
std::vector< Bool_choose_highly_variable_genes (size_t n, const Stat_ *statistic, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Index_ , typename Stat_ >
std::vector< Index_ > choose_highly_variable_genes_index (Index_ n, const Stat_ *statistic, const ChooseHighlyVariableGenesOptions &options)
 
template<typename Float_ >
void fit_variance_trend (size_t n, const Float_ *mean, const Float_ *variance, Float_ *fitted, Float_ *residuals, FitVarianceTrendWorkspace< Float_ > &workspace, const FitVarianceTrendOptions &options)
 
template<typename Float_ >
FitVarianceTrendResults< Float_fit_variance_trend (size_t n, const Float_ *mean, const Float_ *variance, const FitVarianceTrendOptions &options)
 
template<typename Value_ , typename Index_ , typename Block_ , typename Stat_ >
void model_gene_variances_blocked (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *block, const ModelGeneVariancesBlockedBuffers< Stat_ > &buffers, const ModelGeneVariancesOptions &options)
 
template<typename Value_ , typename Index_ , typename Stat_ >
void model_gene_variances (const tatami::Matrix< Value_, Index_ > &mat, ModelGeneVariancesBuffers< Stat_ > buffers, const ModelGeneVariancesOptions &options)
 
template<typename Stat_ = double, typename Value_ , typename Index_ >
ModelGeneVariancesResults< Stat_model_gene_variances (const tatami::Matrix< Value_, Index_ > &mat, const ModelGeneVariancesOptions &options)
 
template<typename Stat_ = double, typename Value_ , typename Index_ , typename Block_ >
ModelGeneVariancesBlockedResults< Stat_model_gene_variances_blocked (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *block, const ModelGeneVariancesOptions &options)
 

Detailed Description

Variance modelling for single-cell expression data.

Function Documentation

◆ choose_highly_variable_genes() [1/2]

template<typename Stat_ , typename Bool_ >
void scran_variances::choose_highly_variable_genes ( size_t  n,
const Stat_ statistic,
Bool_ output,
const ChooseHighlyVariableGenesOptions options 
)
Template Parameters
Stat_Type of the variance statistic.
Bool_Type to be used as a boolean.
Parameters
nNumber of genes.
[in]statisticPointer to an array of length n containing the per-gene variance statistics.
[out]outputPointer to an array of length n. On output, this is filled with true if the gene is to be retained and false otherwise.
optionsFurther options.

◆ choose_highly_variable_genes() [2/2]

template<typename Bool_ = uint8_t, typename Stat_ >
std::vector< Bool_ > scran_variances::choose_highly_variable_genes ( size_t  n,
const Stat_ statistic,
const ChooseHighlyVariableGenesOptions options 
)
Template Parameters
Stat_Type of the variance statistic.
Bool_Type to be used as a boolean.
Parameters
nNumber of genes.
[in]statisticPointer to an array of length n containing the per-gene variance statistics.
optionsFurther options.
Returns
A vector of booleans of length n, indicating whether each gene is to be retained.

◆ choose_highly_variable_genes_index()

template<typename Index_ , typename Stat_ >
std::vector< Index_ > scran_variances::choose_highly_variable_genes_index ( Index_  n,
const Stat_ *  statistic,
const ChooseHighlyVariableGenesOptions options 
)
Template Parameters
Index_Type of the indices.
Stat_Type of the variance statistic.
Parameters
nNumber of genes.
[in]statisticPointer to an array of length n containing the per-gene variance statistics.
optionsFurther options.
Returns
Vector of sorted and unique indices for the chosen genes. All indices are guaranteed to be non-negative and less than n.

◆ fit_variance_trend() [1/2]

template<typename Float_ >
void scran_variances::fit_variance_trend ( size_t  n,
const Float_ mean,
const Float_ variance,
Float_ fitted,
Float_ residuals,
FitVarianceTrendWorkspace< Float_ > &  workspace,
const FitVarianceTrendOptions options 
)

We fit a trend to the per-feature variances against the means, both of which are computed from log-normalized expression data. We use a LOWESS smoother in several steps:

  1. Filter out low-abundance genes, to ensure the span of the smoother is not skewed by many low-abundance genes.
  2. Take the quarter-root of the variances, to squeeze the trend towards 1. This makes the trend more "linear" to improve the performance of the LOWESS smoother; it also reduces the chance of obtaining negative fitted values.
  3. Apply the LOWESS smoother to the quarter-root variances. This is done using the implementation in the WeightedLowess library.
  4. Reverse the quarter-root transformation to obtain the fitted values for all non-low-abundance genes.
  5. Extrapolate linearly from the left-most fitted value to the origin to obtain fitted values for the previously filtered genes. This is empirically justified by the observation that mean-variance trends of log-expression data are linear at very low abundances.
Template Parameters
Float_Floating-point type for the statistics.
Parameters
nNumber of features.
[in]meanPointer to an array of length n, containing the means for all features.
[in]variancePointer to an array of length n, containing the variances for all features.
[out]fittedPointer to an array of length n, to store the fitted values.
[out]residualsPointer to an array of length n, to store the residuals.
workspaceCollection of temporary data structures. This can be re-used across multiple fit_variance_trend() calls.
optionsFurther options.

◆ fit_variance_trend() [2/2]

template<typename Float_ >
FitVarianceTrendResults< Float_ > scran_variances::fit_variance_trend ( size_t  n,
const Float_ mean,
const Float_ variance,
const FitVarianceTrendOptions options 
)

Overload of fit_variance_trend() that allocates the output vectors.

Template Parameters
Float_Floating-point type for the statistics.
Parameters
nNumber of features.
[in]meanPointer to an array of length n, containing the means for all features.
[in]variancePointer to an array of length n, containing the variances for all features.
optionsFurther options.
Returns
Result of the trend fit, containing the fitted values and residuals for each gene.

◆ model_gene_variances_blocked() [1/2]

void scran_variances::model_gene_variances_blocked ( const tatami::Matrix< Value_, Index_ > &  mat,
const Block_ block,
const ModelGeneVariancesBlockedBuffers< Stat_ > &  buffers,
const ModelGeneVariancesOptions options 
)

Compute and model the per-feature variances from a log-expression matrix with blocking. The mean and variance of each gene is computed separately for all cells in each block, and a separate trend is fitted to each block to obtain residuals (see model_gene_variances()). This ensures that sample and batch effects do not confound the variance estimates.

We also compute the average of each statistic across blocks, using the weighting strategy specified in ModelGeneVariancesOptions::block_weight_policy. The average residual is particularly useful for feature selection with choose_highly_variable_genes().

Template Parameters
Value_Data type of the matrix.
Index_Integer type for the row/column indices.
Block_Integer type to hold the block IDs.
Stat_Floating-point type for the output statistics.
Parameters
matA tatami matrix containing log-expression values. Rows should be genes while columns should be cells.
[in]blockPointer to an array of length equal to the number of cells. Each entry should be a 0-based block identifier in \([0, B)\) where \(B\) is the total number of blocks. block can also be a nullptr, in which case all cells are assumed to belong to the same block.
[out]buffersCollection of pointers of arrays in which to store the output statistics. The length of ModelGeneVariancesBlockedResults::per_block should be equal to the number of blocks.
optionsFurther options.

◆ model_gene_variances() [1/2]

void scran_variances::model_gene_variances ( const tatami::Matrix< Value_, Index_ > &  mat,
ModelGeneVariancesBuffers< Stat_ buffers,
const ModelGeneVariancesOptions options 
)

Here, we scan through a log-transformed normalized expression matrix and compute per-gene means and variances. We then fit a trend to the variances with respect to the means using fit_variance_trend(). We assume that most genes at any given abundance are not highly variable, such that the fitted value of the trend is interpreted as the "uninteresting" variance - this is mostly attributed to technical variation like sequencing noise, but can also represent constitutive biological noise like transcriptional bursting. Under this assumption, the residual can be treated as a measure of biologically interesting variation, and can be used to identify relevant features for downstream analyses.

Template Parameters
Value_Data type of the matrix.
Index_Integer type for the row/column indices.
Stat_Floating-point type for the output statistics.
Parameters
matA tatami matrix containing log-expression values. Rows should be genes while columns should be cells.
buffersCollection of buffers in which to store the computed statistics.
optionsFurther options.

◆ model_gene_variances() [2/2]

template<typename Stat_ = double, typename Value_ , typename Index_ >
ModelGeneVariancesResults< Stat_ > scran_variances::model_gene_variances ( const tatami::Matrix< Value_, Index_ > &  mat,
const ModelGeneVariancesOptions options 
)

Overload of model_gene_variances() that allocates space for the output statistics.

Template Parameters
Stat_Floating-point type for the output statistics.
Value_Data type of the matrix.
Index_Integer type for the row/column indices.
Parameters
matA tatami matrix containing log-expression values. Rows should be genes while columns should be cells.
optionsFurther options.
Returns
Results of the variance modelling.

◆ model_gene_variances_blocked() [2/2]

ModelGeneVariancesBlockedResults< Stat_ > scran_variances::model_gene_variances_blocked ( const tatami::Matrix< Value_, Index_ > &  mat,
const Block_ block,
const ModelGeneVariancesOptions options 
)

Overload of model_gene_variances_blocked() that allocates space for the output statistics.

Template Parameters
Stat_Floating-point type for the output statistics.
Value_Data type of the matrix.
Index_Integer type for the row/column indices.
Block_Integer type, to hold the block IDs.
Parameters
matA tatami matrix containing log-expression values. Rows should be genes while columns should be cells.
[in]blockPointer to an array of length equal to the number of cells, containing 0-based block identifiers. This may also be a nullptr in which case all cells are assumed to belong to the same block.
optionsFurther options.
Returns
Results of the variance modelling in each block. An average for each statistic is also computed if ModelGeneVariancesOptions::compute_average = true.