|
template<typename Value_ , typename Index_ , typename Block_ , typename Float_ > |
void | compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options, const Buffers< Float_ > &output) |
|
template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ > |
Results< Float_ > | compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options) |
|
template<typename Value_ , typename Index_ , typename Float_ > |
void | compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options, const Buffers< Float_ > &output) |
|
template<typename Float_ = double, typename Value_ , typename Index_ > |
Results< Float_ > | compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options) |
|
Gene set scoring with gsdecon.
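template<typename Value_ , typename Index_ , typename Float_ >
void gsdecon::compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options, const Buffers< Float_ > &output)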
Given an input matrix containing log-expression values for genes in a set of interest, per-cell scores are defined as the column means of the low-rank approximation of that matrix. The assumption here is that the primary activity of the gene set can be quantified by the largest component of variance amongst its genes. (If this were not the case, one could argue that this gene set is not well-suited to capture the biology attributed to it.) In effect, the rotation vector from the PCA defines weights for all genes in the set, focusing on genes that contribute to the primary activity.
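The scoring idea above can be illustrated with a minimal, self-contained sketch. It is not the gsdecon implementation. It assumes a small dense column-major matrix, uses plain power iteration in place of a proper truncated SVD solver, and assumes that the low-rank approximation includes the per-gene centers. The names `Rank1Sketch` and `rank1_scores` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not part of gsdecon.
struct Rank1Sketch {
    std::vector<double> scores;   // one entry per cell (column)
    std::vector<double> rotation; // one entry per gene (row), unit length
};

// 'x' is a dense column-major matrix with 'ngenes' rows and 'ncells' columns.
inline Rank1Sketch rank1_scores(const std::vector<double>& x, std::size_t ngenes, std::size_t ncells, int iterations = 200) {
    assert(ngenes > 0 && ncells > 0 && x.size() == ngenes * ncells);

    // Center each gene so that the PCA captures variance around the gene means.
    std::vector<double> center(ngenes, 0.0);
    for (std::size_t c = 0; c < ncells; ++c) {
        for (std::size_t g = 0; g < ngenes; ++g) {
            center[g] += x[g + c * ngenes];
        }
    }
    for (auto& m : center) {
        m /= static_cast<double>(ncells);
    }

    // Deterministic, non-uniform start vector, normalized to unit length.
    std::vector<double> rotation(ngenes), pcs(ncells);
    double start_norm = 0;
    for (std::size_t g = 0; g < ngenes; ++g) {
        rotation[g] = 1.0 / (1.0 + static_cast<double>(g));
        start_norm += rotation[g] * rotation[g];
    }
    for (auto& r : rotation) {
        r /= std::sqrt(start_norm);
    }

    // Projects every centered cell onto the current rotation vector.
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            pcs[c] = 0;
            for (std::size_t g = 0; g < ngenes; ++g) {
                pcs[c] += rotation[g] * (x[g + c * ngenes] - center[g]);
            }
        }
    };

    // Power iteration on the gene-by-gene cross-product of the centered matrix.
    for (int it = 0; it < iterations; ++it) {
        project();
        std::vector<double> next(ngenes, 0.0);
        for (std::size_t c = 0; c < ncells; ++c) {
            for (std::size_t g = 0; g < ngenes; ++g) {
                next[g] += (x[g + c * ngenes] - center[g]) * pcs[c];
            }
        }
        double norm = 0;
        for (auto v : next) {
            norm += v * v;
        }
        norm = std::sqrt(norm);
        if (norm == 0) {
            break; // no variance, or the start vector is orthogonal to the first PC
        }
        for (std::size_t g = 0; g < ngenes; ++g) {
            rotation[g] = next[g] / norm;
        }
    }
    project();

    // Column means of (rotation * pcs + center), computed without forming the matrix.
    double mean_rotation = 0, grand_mean = 0;
    for (std::size_t g = 0; g < ngenes; ++g) {
        mean_rotation += rotation[g];
        grand_mean += center[g];
    }
    mean_rotation /= static_cast<double>(ngenes);
    grand_mean /= static_cast<double>(ngenes);

    Rank1Sketch out;
    out.rotation = rotation;
    out.scores.resize(ncells);
    for (std::size_t c = 0; c < ncells; ++c) {
        out.scores[c] = mean_rotation * pcs[c] + grand_mean;
    }
    return out;
}
```

The sign of the rotation vector is arbitrary, but the scores are not affected because the rotation and the projected values flip sign together. When the centered matrix has exactly rank 1, the scores from this sketch reduce to the plain column means of the input.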
By default, we use a rank-1 approximation (see Options::rank). The reported weight for each gene (in Results::weights) is simply the absolute value of that gene's entry in the rotation vector from the PCA. Increasing the rank of the approximation may capture more biological signal but also increases noise in the per-cell scores. If higher ranks are used, each gene's weight is instead defined as the root mean square of that gene's values across all rotation vectors.
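The weight definition above can be sketched as a small standalone function. This is an illustration only, assuming the rotation vectors are stored as a dense column-major matrix with one column per rank; the name `rms_weights` is hypothetical and not part of gsdecon.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 'rotation' is column-major with 'ngenes' rows and 'rank' columns,
// i.e., one rotation vector per column (assumed layout for this sketch).
inline std::vector<double> rms_weights(const std::vector<double>& rotation, std::size_t ngenes, std::size_t rank) {
    assert(rank > 0 && rotation.size() == ngenes * rank);

    // Accumulate the squared rotation values for each gene across all ranks.
    std::vector<double> weights(ngenes, 0.0);
    for (std::size_t k = 0; k < rank; ++k) {
        for (std::size_t g = 0; g < ngenes; ++g) {
            const double v = rotation[g + k * ngenes];
            weights[g] += v * v;
        }
    }

    // Root mean square; for rank 1 this is just the absolute value.
    for (auto& w : weights) {
        w = std::sqrt(w / static_cast<double>(rank));
    }
    return weights;
}
```

Using the root mean square means that the rank-1 case needs no special handling, as the square root of a single squared value is its absolute value.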
- Template Parameters
-
Value_ | Floating-point type for the data. |
Index_ | Integer type for the indices. |
Float_ | Floating-point type for the output. |
- Parameters
-
[in] | matrix | An input tatami matrix. Columns should contain cells while rows should contain genes in the set of interest. Entries are typically log-expression values. |
| options | Further options. |
[out] | output | Collection of buffers in which to store the scores and weights. |
template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void gsdecon::compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options, const Buffers< Float_ > &output)
Extension of the algorithm described in compute() to datasets containing multiple blocks (e.g., batches, samples).
In the presence of strong block effects, naively running compute() would yield a first PC that is driven by uninteresting inter-block differences. Here, we perform the PCA on the residuals after centering each block, ensuring that the first PC focuses on the interesting variation within each block. Blocks can also be weighted so that they contribute equally to the rotation vector, regardless of the number of cells.
Note that the purpose of the blocking is to ensure that inter-block differences do not drive the first few PCs, not to remove the block effects themselves. Using residuals for batch correction requires strong assumptions such as identical block composition and consistent shifts across subpopulations; we do not attempt to make that claim. The caller is instead responsible for ensuring that the block structure is still considered in any further analysis of the computed scores.
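The per-block centering and equal weighting described above can be sketched as follows. This is a simplified illustration on a dense column-major matrix, not the library's implementation; `center_within_blocks` and `equal_block_weights` are hypothetical names, and the weighting scheme shown (each cell weighted by the reciprocal of its block size) is only one assumed way of making blocks contribute equally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Subtracts each block's per-gene mean from the cells of that block.
// 'x' is column-major with 'ngenes' rows and block.size() columns;
// block assignments are assumed to lie in [0, nblocks).
inline std::vector<double> center_within_blocks(const std::vector<double>& x, std::size_t ngenes,
                                                const std::vector<int>& block, std::size_t nblocks) {
    const std::size_t ncells = block.size();
    assert(x.size() == ngenes * ncells);

    // Per-block sums for each gene, plus the number of cells in each block.
    std::vector<double> sums(ngenes * nblocks, 0.0);
    std::vector<std::size_t> sizes(nblocks, 0);
    for (std::size_t c = 0; c < ncells; ++c) {
        assert(block[c] >= 0 && static_cast<std::size_t>(block[c]) < nblocks);
        const auto b = static_cast<std::size_t>(block[c]);
        ++sizes[b];
        for (std::size_t g = 0; g < ngenes; ++g) {
            sums[g + b * ngenes] += x[g + c * ngenes];
        }
    }

    // Residuals after removing the block-specific mean of each gene.
    std::vector<double> residuals(x.size());
    for (std::size_t c = 0; c < ncells; ++c) {
        const auto b = static_cast<std::size_t>(block[c]);
        for (std::size_t g = 0; g < ngenes; ++g) {
            residuals[g + c * ngenes] = x[g + c * ngenes] - sums[g + b * ngenes] / static_cast<double>(sizes[b]);
        }
    }
    return residuals;
}

// Per-cell weights chosen so that the weights within every block sum to 1.
inline std::vector<double> equal_block_weights(const std::vector<int>& block, std::size_t nblocks) {
    std::vector<std::size_t> sizes(nblocks, 0);
    for (auto b : block) {
        assert(b >= 0 && static_cast<std::size_t>(b) < nblocks);
        ++sizes[static_cast<std::size_t>(b)];
    }
    std::vector<double> weights(block.size());
    for (std::size_t c = 0; c < block.size(); ++c) {
        weights[c] = 1.0 / static_cast<double>(sizes[static_cast<std::size_t>(block[c])]);
    }
    return weights;
}
```

The residuals retain the within-block variation only, so a PCA on them is not dominated by shifts between blocks. One way to apply the weights in a PCA would be to scale each residual column by the square root of its weight before the decomposition.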
- Template Parameters
-
Value_ | Floating-point type for the data. |
Index_ | Integer type for the indices. |
Block_ | Integer type for the block assignments. |
Float_ | Floating-point type for the output. |
- Parameters
-
[in] | matrix | An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values. |
[in] | block | Pointer to an array of length equal to the number of columns in matrix. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\) with at least one entry for each block). |
| options | Further options. |
[out] | output | Collection of buffers in which to store the scores and weights. |
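Callers may wish to check the block array against the requirements above before passing it in. The following is a minimal sketch of such a check; `count_blocks` is a hypothetical helper, not part of gsdecon, and it assumes a signed integer `Block_`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the number of blocks N if the assignments are 0-based and every
// block in [0, N) has at least one cell; throws std::invalid_argument otherwise.
template<typename Block_>
std::size_t count_blocks(const Block_* block, std::size_t ncells) {
    assert(block != nullptr || ncells == 0);

    // Track which block identities have been observed.
    // Note that memory use grows with the largest block identity.
    std::vector<bool> seen;
    for (std::size_t c = 0; c < ncells; ++c) {
        if (block[c] < 0) {
            throw std::invalid_argument("block assignments must be non-negative");
        }
        const auto b = static_cast<std::size_t>(block[c]);
        if (b >= seen.size()) {
            seen.resize(b + 1, false);
        }
        seen[b] = true;
    }

    // Every identity from 0 to N-1 must appear at least once.
    for (std::size_t b = 0; b < seen.size(); ++b) {
        if (!seen[b]) {
            throw std::invalid_argument("block " + std::to_string(b) + " has no cells");
        }
    }
    return seen.size();
}
```

For example, assignments of 0, 1, 1, 2 describe three blocks and pass the check, whereas assignments of 0, 2 are rejected because block 1 is empty.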