Gene set scoring with gsdecon.

Classes
struct	Buffers
	Buffers for the results of `compute()` and `compute_blocked()`.

struct	Options
	Options for `compute()` and `compute_blocked()`.

struct	Results
	Results of `compute()` and `compute_blocked()`.

Functions
template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >
void	compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options, const Buffers< Float_ > &output)

template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >
Results< Float_ >	compute_blocked (const tatami::Matrix< Value_, Index_ > &matrix, const Block_ *block, const Options &options)

template<typename Value_ , typename Index_ , typename Float_ >
void	compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options, const Buffers< Float_ > &output)

template<typename Float_ = double, typename Value_ , typename Index_ >
Results< Float_ >	compute (const tatami::Matrix< Value_, Index_ > &matrix, const Options &options)

Detailed Description

Gene set scoring with gsdecon.

Function Documentation

◆ compute() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ >

Results< Float_ > gsdecon::compute	(	const tatami::Matrix< Value_, Index_ > &	matrix,
		const Options &	options
	)

Overload of compute() that allocates memory for the results.

Template Parameters

Float_	Floating-point type for the output.
Value_	Floating-point type for the data.
Index_	Integer type for the indices.

Parameters

[in]	matrix	An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
	options	Further options.

Returns: Results of the gene set score calculation.

◆ compute() [2/2]

template<typename Value_ , typename Index_ , typename Float_ >

void gsdecon::compute	(	const tatami::Matrix< Value_, Index_ > &	matrix,
		const Options &	options,
		const Buffers< Float_ > &	output
	)

Given an input matrix containing log-expression values for genes in a set of interest, per-cell scores are defined as the column means of a low-rank approximation of that matrix, obtained by PCA. The assumption here is that the primary activity of the gene set can be quantified by the largest component of variance amongst its genes. (If this were not the case, one could argue that this gene set is not well-suited to capture the biology attributed to it.) In effect, the rotation vector defines weights for all genes in the set, focusing on genes that contribute to the primary activity.
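The column-mean definition above can be sketched for the rank-1 case. This is only an illustration under stated assumptions, not the library's implementation: it assumes the approximation takes the form `center[i] + rotation[i] * component[j]`, where `center`, `rotation` and `component` are hypothetical inputs (per-gene means, the first rotation vector and the per-cell coordinates on the first PC), and `rank1_column_means()` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of gsdecon: column means of the rank-1
// approximation L(i, j) = center[i] + rotation[i] * component[j].
std::vector<double> rank1_column_means(
    const std::vector<double>& center,    // per-gene means, one entry per gene
    const std::vector<double>& rotation,  // first rotation vector, one entry per gene
    const std::vector<double>& component) // first PC coordinates, one entry per cell
{
    assert(!center.empty());
    assert(center.size() == rotation.size());

    const double ngenes = static_cast<double>(center.size());
    const double mean_center = std::accumulate(center.begin(), center.end(), 0.0) / ngenes;
    const double mean_rotation = std::accumulate(rotation.begin(), rotation.end(), 0.0) / ngenes;

    // Averaging L(i, j) over genes i gives mean(center) + mean(rotation) * component[j],
    // so the full approximation never needs to be materialized.
    std::vector<double> scores(component.size());
    for (std::size_t j = 0; j < component.size(); ++j) {
        scores[j] = mean_center + mean_rotation * component[j];
    }
    return scores;
}
```

Under this assumed form, each per-cell score is a shifted and scaled copy of that cell's coordinate on the first PC.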

By default, we use a rank-1 approximation (see Options::rank). The reported weight for each gene (in Results::weights) is simply the absolute value of that gene's entry in the rotation vector from the PCA. Increasing the rank of the approximation may capture more biological signal but also increases noise in the per-cell scores. If higher ranks are used, each gene's weight is instead defined as the root mean square of that gene's values across all rotation vectors.
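The weight definition can be written as a single formula, since the root mean square over one value is its absolute value. The following is a minimal sketch rather than the library's code: `gene_weights()` is a hypothetical helper, and the column-major layout of `rotation` (genes in rows, rotation vectors in columns) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of gsdecon: per-gene weights from a rotation matrix.
// `rotation` is assumed to be column-major with `ngenes` rows and `rank` columns.
std::vector<double> gene_weights(const std::vector<double>& rotation, std::size_t ngenes, std::size_t rank) {
    assert(rank > 0);
    assert(rotation.size() == ngenes * rank);

    std::vector<double> weights(ngenes);
    for (std::size_t g = 0; g < ngenes; ++g) {
        double sum_squares = 0;
        for (std::size_t r = 0; r < rank; ++r) {
            const double value = rotation[g + r * ngenes];
            sum_squares += value * value;
        }
        // Root mean square across rotation vectors; for rank = 1 this is |value|.
        weights[g] = std::sqrt(sum_squares / static_cast<double>(rank));
    }
    return weights;
}
```

With `rank = 1`, a rotation vector of `{-0.6, 0.8}` gives weights of approximately `{0.6, 0.8}`, matching the absolute-value definition.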

Template Parameters

Value_	Floating-point type for the data.
Index_	Integer type for the indices.
Float_	Floating-point type for the output.

Parameters

[in]	matrix	An input tatami matrix. Columns should contain cells while rows should contain genes in the set of interest. Entries are typically log-expression values.
	options	Further options.
[out]	output	Collection of buffers in which to store the scores and weights.

◆ compute_blocked() [1/2]

template<typename Float_ = double, typename Value_ , typename Index_ , typename Block_ >

Results< Float_ > gsdecon::compute_blocked	(	const tatami::Matrix< Value_, Index_ > &	matrix,
		const Block_ *	block,
		const Options &	options
	)

Overload of compute_blocked() that allocates memory for the results.

Template Parameters

Float_	Floating-point type for the output.
Value_	Floating-point type for the data.
Index_	Integer type for the indices.
Block_	Integer type for the block assignments.

Parameters

[in]	matrix	An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
[in]	block	Pointer to an array of length equal to the number of columns in `matrix`. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\) with at least one entry for each block).
	options	Further options.

Returns: Results of the gene set score calculation.

◆ compute_blocked() [2/2]

template<typename Value_ , typename Index_ , typename Block_ , typename Float_ >

void gsdecon::compute_blocked	(	const tatami::Matrix< Value_, Index_ > &	matrix,
		const Block_ *	block,
		const Options &	options,
		const Buffers< Float_ > &	output
	)

Extension of the algorithm described in compute() to datasets containing multiple blocks (e.g., batches, samples).

In the presence of strong block effects, naively running compute() would yield a first PC that is driven by uninteresting inter-block differences. Here, we perform the PCA on the residuals after centering each block, ensuring that the first PC focuses on the interesting variation within each block. Blocks can also be weighted so that they contribute equally to the rotation vector, regardless of the number of cells.
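As a rough illustration of what "residuals after centering each block" means for a single gene, the sketch below subtracts each block's mean from the values of the cells in that block. `center_within_blocks()` is a hypothetical helper written for this example only; it does not reproduce the library's internals and ignores the optional block weighting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of gsdecon: centers one gene's values within
// each block, in place. `block` holds 0-based block assignments, one per cell.
void center_within_blocks(std::vector<double>& values, const std::vector<int>& block, std::size_t nblocks) {
    assert(values.size() == block.size());

    // First pass: accumulate the sum and the number of cells for each block.
    std::vector<double> sums(nblocks, 0.0);
    std::vector<std::size_t> counts(nblocks, 0);
    for (std::size_t c = 0; c < values.size(); ++c) {
        const auto b = static_cast<std::size_t>(block[c]);
        assert(b < nblocks);
        sums[b] += values[c];
        ++counts[b];
    }

    // Second pass: subtract the block mean from each cell to obtain residuals.
    for (std::size_t c = 0; c < values.size(); ++c) {
        const auto b = static_cast<std::size_t>(block[c]);
        values[c] -= sums[b] / static_cast<double>(counts[b]);
    }
}
```

For values `{1, 3, 10, 14}` with blocks `{0, 0, 1, 1}`, the residuals are `{-1, 1, -2, 2}`: the large shift between blocks disappears while the variation within each block is retained.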

Note that the purpose of the blocking is to ensure that inter-block differences do not drive the first few PCs, not to remove the block effects themselves. Using residuals for batch correction requires strong assumptions such as identical block composition and consistent shifts across subpopulations; we do not attempt to make that claim. The caller is instead responsible for ensuring that the block structure is still considered in any further analysis of the computed scores.

Template Parameters

Value_	Floating-point type for the data.
Index_	Integer type for the indices.
Block_	Integer type for the block assignments.
Float_	Floating-point type for the output.

Parameters

[in]	matrix	An input tatami matrix. Columns should contain cells while rows should contain genes. Entries are typically log-expression values.
[in]	block	Pointer to an array of length equal to the number of columns in `matrix`. This should contain the blocking factor as 0-based block assignments (i.e., for \(N\) blocks, block identities should run from 0 to \(N-1\) with at least one entry for each block).
	options	Further options.
[out]	output	Collection of buffers in which to store the scores and weights.
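Block labels often start out as arbitrary strings (e.g., batch names) rather than the 0-based assignments that `block` requires. One possible way to prepare them is sketched below; `to_block_ids()` is a hypothetical helper and not part of gsdecon. It assigns IDs in order of first appearance, so the IDs run from 0 to \(N-1\) and every ID is used at least once.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of gsdecon: converts arbitrary per-cell labels
// into consecutive 0-based block assignments, numbered by first appearance.
std::vector<int> to_block_ids(const std::vector<std::string>& labels) {
    std::unordered_map<std::string, int> seen;
    std::vector<int> ids;
    ids.reserve(labels.size());
    for (const auto& label : labels) {
        // emplace() leaves an existing entry untouched, so a new ID is only
        // assigned the first time a label is encountered.
        const auto it = seen.emplace(label, static_cast<int>(seen.size())).first;
        ids.push_back(it->second);
    }
    return ids;
}
```

The pointer returned by `ids.data()` then refers to an array of the form described for `block`, provided that `labels` has one entry per column of `matrix`.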
