gsdecon
C++ port of the GSDecon algorithm
gsdecon Namespace Reference

Gene set scoring with gsdecon.

Classes

struct  Buffers
 Buffers for the results of compute() and compute_blocked().
 
struct  Options
 Options for compute() and compute_blocked().
 
struct  Results
 Results of compute() and compute_blocked().
 

Functions

template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *const block, const Options &options, const Buffers< Float_ > &output)
 
template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >
Results< Float_ > compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *const block, const Options &options)
 
template<typename Value_ , typename Index_ , typename Float_ >
void compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options, const Buffers< Float_ > &output)
 
template<typename Float_ = double, typename Value_ , typename Index_ >
Results< Float_ > compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options)
 

Detailed Description

Gene set scoring with gsdecon.

Function Documentation

◆ compute() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ >
Results< Float_ > gsdecon::compute ( const tatami::Matrix< Value_, Index_ > & matrix,
const Options & options )

Overload of compute() that allocates memory for the results.

Template Parameters
Float_  Floating-point type of the output.
Value_  Floating-point type of the data.
Index_  Integer type of the indices.
Parameters
[in] matrix  A matrix where columns correspond to cells and rows correspond to genes. Entries are typically log-expression values.
options  Further options.
Returns
Results of the gene set score calculation.

◆ compute() [2/2]

template<typename Value_ , typename Index_ , typename Float_ >
void gsdecon::compute ( const tatami::Matrix< Value_, Index_ > & matrix,
const Options & options,
const Buffers< Float_ > & output )

Given an input matrix containing log-expression values for genes in a set of interest, per-cell scores are defined as the column means of the low-rank approximation of that matrix, obtained from a principal components analysis (PCA). The assumption here is that the primary activity of the gene set can be quantified by the largest component of variance amongst its genes. (If this were not the case, one could argue that this gene set is not well-suited to capture the biology attributed to it.) In effect, the rotation vector from the PCA defines weights for all genes in the set, focusing on genes that contribute to the primary activity.
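The scoring idea above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the names `Rank1Result` and `rank1_scores` are hypothetical, the matrix is assumed to be a dense column-major genes-by-cells array, and the first rotation vector is found by plain power iteration with a fixed number of iterations, which is a stand-in for whatever PCA routine gsdecon actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not part of the gsdecon API.
struct Rank1Result {
    std::vector<double> scores;   // one score per cell (column)
    std::vector<double> rotation; // first rotation vector, one entry per gene (row)
};

// 'x' is a dense genes-by-cells matrix in column-major order (an assumption of this sketch).
Rank1Result rank1_scores(const std::vector<double>& x, std::size_t ngenes, std::size_t ncells, int iterations = 100) {
    assert(ngenes > 0 && ncells > 0);
    assert(x.size() == ngenes * ncells);

    // Center each gene across cells, as is usual before a PCA.
    std::vector<double> center(ngenes, 0.0);
    for (std::size_t c = 0; c < ncells; ++c) {
        for (std::size_t g = 0; g < ngenes; ++g) {
            center[g] += x[c * ngenes + g];
        }
    }
    for (auto& m : center) {
        m /= static_cast<double>(ncells);
    }
    std::vector<double> xc(x);
    for (std::size_t c = 0; c < ncells; ++c) {
        for (std::size_t g = 0; g < ngenes; ++g) {
            xc[c * ngenes + g] -= center[g];
        }
    }

    // Scales a vector to unit length; returns false if it is all-zero.
    auto normalize = [](std::vector<double>& v) -> bool {
        double ss = 0.0;
        for (double e : v) { ss += e * e; }
        if (ss == 0.0) { return false; }
        const double norm = std::sqrt(ss);
        for (auto& e : v) { e /= norm; }
        return true;
    };

    // Deterministic, non-uniform starting vector for the power iteration.
    std::vector<double> v(ngenes), next(ngenes), proj(ncells);
    for (std::size_t g = 0; g < ngenes; ++g) {
        v[g] = 1.0 / static_cast<double>(g + 1);
    }
    normalize(v);

    // Projects every centered cell onto the current rotation vector.
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            double p = 0.0;
            for (std::size_t g = 0; g < ngenes; ++g) { p += xc[c * ngenes + g] * v[g]; }
            proj[c] = p;
        }
    };

    // Power iteration on xc * xc^T, without forming that matrix explicitly.
    for (int it = 0; it < iterations; ++it) {
        project();
        next.assign(ngenes, 0.0);
        for (std::size_t c = 0; c < ncells; ++c) {
            for (std::size_t g = 0; g < ngenes; ++g) { next[g] += xc[c * ngenes + g] * proj[c]; }
        }
        if (!normalize(next)) { break; } // no variance left to explain
        v.swap(next);
    }
    project();

    // Rank-1 approximation of column c is center + v * proj[c]; its column mean is the score.
    double mean_center = 0.0, mean_v = 0.0;
    for (std::size_t g = 0; g < ngenes; ++g) { mean_center += center[g]; mean_v += v[g]; }
    mean_center /= static_cast<double>(ngenes);
    mean_v /= static_cast<double>(ngenes);

    Rank1Result out;
    out.rotation = v;
    out.scores.resize(ncells);
    for (std::size_t c = 0; c < ncells; ++c) {
        out.scores[c] = mean_center + mean_v * proj[c];
    }
    return out;
}
```

Because the score multiplies the mean of the rotation vector by the projection, flipping the sign of the rotation vector leaves the score unchanged. When the centered matrix is exactly rank 1, the score reduces to the ordinary column mean of the input.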

By default, we use a rank-1 approximation (see Options::rank). The reported weight for each gene (in Results::weights) is simply the absolute value of that gene's entry in the rotation vector from the PCA. Increasing the rank of the approximation may capture more biological signal but also increases noise in the per-cell scores. If higher ranks are used, each gene's weight is instead defined as the root mean square of that gene's values across all rotation vectors.
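The weight definition above can be written as a short sketch. The function name `gene_weights` and the column-major genes-by-rank layout of the rotation matrix are assumptions made for illustration; they are not taken from the gsdecon API. The root mean square is taken as the square root of the mean of squares across rotation vectors, which for a single vector reduces to the absolute value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 'rotation' is a genes-by-rank matrix in column-major order (an assumption of this sketch),
// where each column is one rotation vector from the PCA.
std::vector<double> gene_weights(const std::vector<double>& rotation, std::size_t ngenes, std::size_t rank) {
    assert(rank > 0);
    assert(rotation.size() == ngenes * rank);

    // Accumulate the sum of squares for each gene across all rotation vectors.
    std::vector<double> weights(ngenes, 0.0);
    for (std::size_t k = 0; k < rank; ++k) {
        for (std::size_t g = 0; g < ngenes; ++g) {
            const double r = rotation[k * ngenes + g];
            weights[g] += r * r;
        }
    }

    // Root mean square; with rank == 1 this is just the absolute value.
    for (auto& w : weights) {
        w = std::sqrt(w / static_cast<double>(rank));
    }
    return weights;
}
```

Treating the rank-1 case as a special case of the root mean square keeps the two definitions consistent, so a caller does not need to branch on the rank.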

Template Parameters
Value_  Floating-point type of the data.
Index_  Integer type of the indices.
Float_  Floating-point type of the output.
Parameters
[in] matrix  A matrix where columns correspond to cells and rows correspond to genes. Entries are typically log-expression values.
options  Further options.
[out] output  Collection of buffers in which to store the scores and weights.

◆ compute_blocked() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >
Results< Float_ > gsdecon::compute_blocked ( const tatami::Matrix< Value_, Index_ > & matrix,
const Block_ *const block,
const Options & options )

Overload of compute_blocked() that allocates memory for the results.

Template Parameters
Float_  Floating-point type of the output.
Value_  Floating-point type of the data.
Index_  Integer type of the indices.
Block_  Integer type of the block assignments.
Parameters
[in] matrix  A matrix where columns correspond to cells and rows correspond to genes. Entries are typically log-expression values.
[in] block  Pointer to an array of length equal to the number of columns in matrix. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\) with at least one entry for each block).
options  Further options.
Returns
Results of the gene set score calculation.
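The requirement on the block array (0-based identities with no empty block) can be checked before calling compute_blocked(). The helper below is a hypothetical sketch, not part of gsdecon, and assumes `int` block assignments for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if 'block' holds 0-based assignments in which every block from
// 0 to N-1 has at least one entry. On return, 'sizes' holds the number of cells
// seen per block (up to the point of failure, if any).
bool is_valid_blocking(const int* block, std::size_t ncells, std::vector<std::size_t>& sizes) {
    assert(block != nullptr || ncells == 0);
    sizes.clear();
    for (std::size_t c = 0; c < ncells; ++c) {
        if (block[c] < 0) {
            return false; // negative identities are not 0-based assignments
        }
        const auto b = static_cast<std::size_t>(block[c]);
        if (b >= sizes.size()) {
            sizes.resize(b + 1, 0);
        }
        ++sizes[b];
    }
    for (std::size_t s : sizes) {
        if (s == 0) {
            return false; // a block identity below the maximum was skipped
        }
    }
    return true;
}
```

For example, the assignments {0, 1, 1, 2, 0} pass this check, while {0, 2, 2} fail because block 1 has no cells.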

◆ compute_blocked() [2/2]

template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void gsdecon::compute_blocked ( const tatami::Matrix< Value_, Index_ > & matrix,
const Block_ *const block,
const Options & options,
const Buffers< Float_ > & output )

Extension of the algorithm described in compute() to datasets containing multiple blocks (e.g., batches, samples).

In the presence of strong block effects, naively running compute() would yield a first PC that is driven by uninteresting inter-block differences. Here, we perform the PCA on the residuals after centering each block, ensuring that the first PC focuses on the interesting variation within each block. Blocks can also be weighted so that they contribute equally to the rotation vector, regardless of the number of cells. The score for each cell is obtained by adding the block-specific centers to the low-rank approximation and computing the column means.
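The centering step described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `BlockCentered` and `center_by_block` are hypothetical names, the matrix is a dense column-major genes-by-cells array, block assignments are `int`, and the optional block weighting is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not part of the gsdecon API.
struct BlockCentered {
    std::vector<double> centers;   // genes-by-blocks, column-major: per-block mean of each gene
    std::vector<double> residuals; // genes-by-cells, column-major: input minus its block's center
};

BlockCentered center_by_block(const std::vector<double>& x, std::size_t ngenes, std::size_t ncells,
                              const int* block, std::size_t nblocks) {
    assert(x.size() == ngenes * ncells);

    BlockCentered out;
    out.centers.assign(ngenes * nblocks, 0.0);
    std::vector<std::size_t> sizes(nblocks, 0);

    // Sum each gene within each block.
    for (std::size_t c = 0; c < ncells; ++c) {
        assert(block[c] >= 0 && static_cast<std::size_t>(block[c]) < nblocks);
        const auto b = static_cast<std::size_t>(block[c]);
        ++sizes[b];
        for (std::size_t g = 0; g < ngenes; ++g) {
            out.centers[b * ngenes + g] += x[c * ngenes + g];
        }
    }

    // Convert sums to means, skipping empty blocks to avoid dividing by zero.
    for (std::size_t b = 0; b < nblocks; ++b) {
        if (sizes[b] == 0) { continue; }
        for (std::size_t g = 0; g < ngenes; ++g) {
            out.centers[b * ngenes + g] /= static_cast<double>(sizes[b]);
        }
    }

    // Residuals are what the PCA would see: within-block variation only.
    out.residuals = x;
    for (std::size_t c = 0; c < ncells; ++c) {
        const auto b = static_cast<std::size_t>(block[c]);
        for (std::size_t g = 0; g < ngenes; ++g) {
            out.residuals[c * ngenes + g] -= out.centers[b * ngenes + g];
        }
    }
    return out;
}
```

After the PCA on the residuals, the score for a cell would be the column mean of its block's center plus the low-rank approximation of its residuals, which is why the centers are kept alongside the residuals.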

Note that the purpose of the blocking is to ensure that inter-block differences do not drive the first few PCs, not to remove the block effects themselves. Using residuals for batch correction requires strong assumptions such as identical block composition and consistent shifts across subpopulations; we do not attempt to make that claim. The caller is instead responsible for ensuring that the block structure is still considered in any further analysis of the computed scores.

Template Parameters
Value_  Floating-point type of the data.
Index_  Integer type of the indices.
Block_  Integer type of the block assignments.
Float_  Floating-point type of the output.
Parameters
[in] matrix  A matrix where columns correspond to cells and rows correspond to genes. Entries are typically log-expression values.
[in] block  Pointer to an array of length equal to the number of columns in matrix. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\) with at least one entry for each block).
options  Further options.
[out] output  Collection of buffers in which to store the scores and weights.