gsdecon
C++ port of the GSDecon algorithm
gsdecon Namespace Reference

Gene set scoring with gsdecon.

Classes

struct  Buffers
 Buffers for the results of compute() and compute_blocked().
 
struct  Options
 Options for compute() and compute_blocked().
 
struct  Results
 Results of compute() and compute_blocked().
 

Functions

template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options, const Buffers< Float_ > &output)
 
template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >
Results< Float_ > compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options)
 
template<typename Value_ , typename Index_ , typename Float_ >
void compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options, const Buffers< Float_ > &output)
 
template<typename Float_ = double, typename Value_ , typename Index_ >
Results< Float_ > compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options)
 

Detailed Description

Gene set scoring with gsdecon.

Function Documentation

◆ compute() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ >
Results< Float_ > gsdecon::compute ( const tatami::Matrix< Value_, Index_ > &  matrix,
const Options &  options 
)

Overload of compute() that allocates memory for the results.

Template Parameters
Float_  Floating-point type for the output.
Value_  Floating-point type for the data.
Index_  Integer type for the indices.
Parameters
[in] matrix  An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
options  Further options.
Returns
Results of the gene set score calculation.

◆ compute() [2/2]

template<typename Value_ , typename Index_ , typename Float_ >
void gsdecon::compute ( const tatami::Matrix< Value_, Index_ > &  matrix,
const Options &  options,
const Buffers< Float_ > &  output 
)

Given an input matrix containing log-expression values for genes in a set of interest, per-cell scores are defined as the column means of the low-rank approximation of that matrix. The assumption here is that the primary activity of the gene set can be quantified by the largest component of variance amongst its genes. (If this were not the case, one could argue that this gene set is not well-suited to capture the biology attributed to it.) In effect, the rotation vector from the PCA defines weights for all genes in the set, focusing on genes that contribute to the primary activity.
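The definition above can be illustrated with a minimal, standalone sketch that does not use tatami or the gsdecon API. Everything in it is an assumption made for illustration: the `Rank1Sketch` struct and `rank1_scores()` function are hypothetical names, the matrix is a plain vector of gene rows, the first rotation vector is found by simple power iteration rather than a proper SVD, and the gene means are added back to the approximation before taking column means.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result holder for this sketch (not the library's Results struct).
struct Rank1Sketch {
    std::vector<double> scores;  // one score per cell (column)
    std::vector<double> weights; // one weight per gene (row)
};

// 'matrix' holds genes in rows and cells in columns. It is taken by value
// so that the caller's data is left untouched by the centering step.
inline Rank1Sketch rank1_scores(std::vector<std::vector<double> > matrix, int iterations = 100) {
    assert(!matrix.empty() && !matrix.front().empty());
    const std::size_t ngenes = matrix.size(), ncells = matrix.front().size();

    // Center each gene and remember the average of the gene means.
    double grand_mean = 0;
    for (auto& row : matrix) {
        assert(row.size() == ncells);
        double mean = 0;
        for (double x : row) { mean += x; }
        mean /= static_cast<double>(ncells);
        for (double& x : row) { x -= mean; }
        grand_mean += mean / static_cast<double>(ngenes);
    }

    // component[c] = sum over genes of rotation[g] * centered[g][c].
    std::vector<double> rotation(ngenes), component(ncells);
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            component[c] = 0;
            for (std::size_t g = 0; g < ngenes; ++g) {
                component[c] += rotation[g] * matrix[g][c];
            }
        }
    };

    // Power iteration on X * X^T. The arbitrary non-uniform start could in
    // principle be orthogonal to the leading vector; a real implementation
    // would use a proper SVD instead.
    for (std::size_t g = 0; g < ngenes; ++g) { rotation[g] = 1.0 / (g + 1.0); }
    for (int it = 0; it < iterations; ++it) {
        project();
        std::vector<double> next(ngenes, 0.0);
        double norm2 = 0;
        for (std::size_t g = 0; g < ngenes; ++g) {
            for (std::size_t c = 0; c < ncells; ++c) {
                next[g] += matrix[g][c] * component[c];
            }
            norm2 += next[g] * next[g];
        }
        if (norm2 == 0) { // no variance in this direction: fall back to zero weights.
            rotation.assign(ngenes, 0.0);
            break;
        }
        const double norm = std::sqrt(norm2);
        for (std::size_t g = 0; g < ngenes; ++g) { rotation[g] = next[g] / norm; }
    }
    project();

    // The column mean of the rank-1 matrix rotation[g] * component[c] is
    // mean(rotation) * component[c]; the gene means are then added back.
    double mean_rotation = 0;
    for (double r : rotation) { mean_rotation += r / static_cast<double>(ngenes); }

    Rank1Sketch out;
    out.scores.resize(ncells);
    out.weights.resize(ngenes);
    for (std::size_t c = 0; c < ncells; ++c) {
        out.scores[c] = grand_mean + mean_rotation * component[c];
    }
    for (std::size_t g = 0; g < ngenes; ++g) {
        out.weights[g] = std::abs(rotation[g]);
    }
    return out;
}
```

Because both the rotation vector and the per-cell component flip together, the sign ambiguity of the PCA does not affect the scores in this sketch. For a matrix that is exactly rank 1 after centering, the scores reduce to the ordinary column means.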

By default, we use a rank-1 approximation (see Options::rank). The reported weight for each gene (in Results::weights) is simply the absolute value of that gene's entry in the rotation vector from the PCA. Increasing the rank of the approximation may capture more biological signal but also increases noise in the per-cell scores. If higher ranks are used, each gene's weight is instead defined as the root mean square of that gene's values across all rotation vectors.
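The weight definition for higher ranks can be sketched as follows. This is an illustrative standalone helper, not part of the gsdecon API: the name `gene_weights()` and the layout of `rotation` (one inner vector per rotation vector, one entry per gene) are assumptions. With a single rotation vector, the root mean square reduces to the absolute value, which matches the rank-1 definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// rotation[k][g] is the value for gene g in the k-th rotation vector
// (hypothetical layout chosen for this sketch).
inline std::vector<double> gene_weights(const std::vector<std::vector<double> >& rotation) {
    assert(!rotation.empty());
    const std::size_t rank = rotation.size();
    const std::size_t ngenes = rotation.front().size();

    // Accumulate the sum of squares for each gene across rotation vectors.
    std::vector<double> weights(ngenes, 0.0);
    for (const auto& vec : rotation) {
        assert(vec.size() == ngenes);
        for (std::size_t g = 0; g < ngenes; ++g) {
            weights[g] += vec[g] * vec[g];
        }
    }

    // Root mean square: divide by the rank, then take the square root.
    for (double& w : weights) {
        w = std::sqrt(w / static_cast<double>(rank));
    }
    return weights;
}
```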

Template Parameters
Value_  Floating-point type for the data.
Index_  Integer type for the indices.
Float_  Floating-point type for the output.
Parameters
[in] matrix  An input tatami matrix. Columns should contain cells while rows should contain genes in the set of interest. Entries are typically log-expression values.
options  Further options.
[out] output  Collection of buffers in which to store the scores and weights.

◆ compute_blocked() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >
Results< Float_ > gsdecon::compute_blocked ( const tatami::Matrix< Value_, Index_ > &  matrix,
const Block_ *  block,
const Options &  options 
)

Overload of compute_blocked() that allocates memory for the results.

Template Parameters
Float_  Floating-point type for the output.
Value_  Floating-point type for the data.
Index_  Integer type for the indices.
Block_  Integer type for the block assignments.
Parameters
[in] matrix  An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
[in] block  Pointer to an array of length equal to the number of columns in matrix. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\), with at least one entry for each block).
options  Further options.
Returns
Results of the gene set score calculation.
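The requirements on `block` (0-based identities with no empty blocks) can be checked before calling compute_blocked(). The helper below is a hypothetical sketch and not part of the gsdecon API; the name `count_blocks()` and its choice to throw `std::invalid_argument` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the number of cells in each block, or throws if the assignments
// are negative or if some block in [0, N) has no cells.
template<typename Block_>
std::vector<std::size_t> count_blocks(const Block_* block, std::size_t ncells) {
    assert(block != nullptr || ncells == 0);

    std::vector<std::size_t> sizes;
    for (std::size_t i = 0; i < ncells; ++i) {
        if (block[i] < 0) {
            throw std::invalid_argument("block assignments must be non-negative");
        }
        const std::size_t b = static_cast<std::size_t>(block[i]);
        if (b >= sizes.size()) {
            sizes.resize(b + 1, 0); // grow to cover the largest block seen so far.
        }
        ++sizes[b];
    }

    for (std::size_t s : sizes) {
        if (s == 0) {
            throw std::invalid_argument("each block must contain at least one cell");
        }
    }
    return sizes;
}
```

The returned sizes are also what one would need to weight blocks equally or to compute per-block means.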

◆ compute_blocked() [2/2]

template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void gsdecon::compute_blocked ( const tatami::Matrix< Value_, Index_ > &  matrix,
const Block_ *  block,
const Options &  options,
const Buffers< Float_ > &  output 
)

Extension of the algorithm described in compute() to datasets containing multiple blocks (e.g., batches, samples).

In the presence of strong block effects, naively running compute() would yield a first PC that is driven by uninteresting inter-block differences. Here, we perform the PCA on the residuals after centering each block, ensuring that the first PC focuses on the interesting variation within each block. Blocks can also be weighted so that they contribute equally to the rotation vector, regardless of the number of cells.

Note that the purpose of the blocking is to ensure that inter-block differences do not drive the first few PCs, not to remove the block effects themselves. Using residuals for batch correction requires strong assumptions such as identical block composition and consistent shifts across subpopulations; we do not attempt to make that claim. The caller is instead responsible for ensuring that the block structure is still considered in any further analysis of the computed scores.
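The per-block centering and equal block weighting described above can be sketched in standalone form. This is not the library's implementation: `center_within_blocks()` and `equal_block_weights()` are hypothetical helpers, the matrix is a plain vector of gene rows, and the inverse-size weighting is just one simple way to make blocks contribute equally (the actual scheme is controlled through Options).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces each gene's values with residuals around its per-block mean.
// 'matrix' has genes in rows and cells in columns; 'block' has one
// 0-based assignment per cell.
inline void center_within_blocks(std::vector<std::vector<double> >& matrix, const std::vector<int>& block, int nblocks) {
    std::vector<std::size_t> sizes(nblocks, 0);
    for (int b : block) {
        assert(b >= 0 && b < nblocks);
        ++sizes[b];
    }

    for (auto& row : matrix) {
        assert(row.size() == block.size());
        std::vector<double> means(nblocks, 0.0);
        for (std::size_t c = 0; c < row.size(); ++c) {
            means[block[c]] += row[c];
        }
        for (int b = 0; b < nblocks; ++b) {
            if (sizes[b] > 0) {
                means[b] /= static_cast<double>(sizes[b]);
            }
        }
        for (std::size_t c = 0; c < row.size(); ++c) {
            row[c] -= means[block[c]];
        }
    }
}

// Per-cell weights such that the weights within every block sum to 1,
// so each block contributes equally regardless of its number of cells.
inline std::vector<double> equal_block_weights(const std::vector<int>& block, int nblocks) {
    std::vector<std::size_t> sizes(nblocks, 0);
    for (int b : block) {
        assert(b >= 0 && b < nblocks);
        ++sizes[b];
    }
    std::vector<double> weights(block.size());
    for (std::size_t c = 0; c < block.size(); ++c) {
        weights[c] = 1.0 / static_cast<double>(sizes[block[c]]);
    }
    return weights;
}
```

After centering, a gene with values 1, 3 in one block and 10, 14 in another retains only its within-block variation, so the large shift between the two blocks can no longer dominate the first PC.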

Template Parameters
Value_  Floating-point type for the data.
Index_  Integer type for the indices.
Block_  Integer type for the block assignments.
Float_  Floating-point type for the output.
Parameters
[in] matrix  An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
[in] block  Pointer to an array of length equal to the number of columns in matrix. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\), with at least one entry for each block).
options  Further options.
[out] output  Collection of buffers in which to store the scores and weights.