gsdecon
C++ port of the GSDecon algorithm


Introduction

This repository implements a C++ port of the GSDecon algorithm for computing gene set scores from a single-cell expression matrix. The assumption is that the genes in the set share one main axis of variation, typically corresponding to a coordinated biological pathway. To capture this variation, we compute the first principal component (PC) of the submatrix consisting of the genes of interest. We then construct a rank-1 approximation of the submatrix and define the gene set score for each cell as the corresponding column mean of the approximated matrix. In effect, each cell's score is the average expression across the genes in the set after removing noise and other secondary factors of variation from the expression submatrix. This corresponds to the expression of an "eigengene" in the context of the original GSDecon.
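To make the rank-1 idea concrete, here is a minimal, self-contained sketch on a small dense genes-by-cells matrix. It is not the library's implementation. The names `SketchResult` and `rank1_scores` are hypothetical, the first PC is found by plain power iteration, and the sketch assumes that the per-gene means are added back to the approximation before taking column means.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not the library's own result class.
struct SketchResult {
    std::vector<double> scores;  // one per cell (column).
    std::vector<double> weights; // one per gene (row).
};

// Illustrative rank-1 scoring for a dense matrix stored as x[gene][cell].
inline SketchResult rank1_scores(const std::vector<std::vector<double> >& x, int iterations = 100) {
    const std::size_t ngenes = x.size();
    assert(ngenes > 0);
    const std::size_t ncells = x[0].size();
    assert(ncells > 0);

    // Center each gene so that the PC describes variation around its mean.
    std::vector<double> means(ngenes, 0.0);
    std::vector<std::vector<double> > centered = x;
    for (std::size_t g = 0; g < ngenes; ++g) {
        assert(x[g].size() == ncells);
        for (std::size_t c = 0; c < ncells; ++c) { means[g] += x[g][c]; }
        means[g] /= static_cast<double>(ncells);
        for (std::size_t c = 0; c < ncells; ++c) { centered[g][c] -= means[g]; }
    }

    // Projects every cell onto the current rotation vector.
    std::vector<double> rotation(ngenes, 1.0 / std::sqrt(static_cast<double>(ngenes)));
    std::vector<double> proj(ncells, 0.0);
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            proj[c] = 0.0;
            for (std::size_t g = 0; g < ngenes; ++g) { proj[c] += centered[g][c] * rotation[g]; }
        }
    };

    // Power iteration: repeatedly apply X * X^T to the rotation vector and renormalize.
    for (int it = 0; it < iterations; ++it) {
        project();
        std::vector<double> next(ngenes, 0.0);
        double norm = 0.0;
        for (std::size_t g = 0; g < ngenes; ++g) {
            for (std::size_t c = 0; c < ncells; ++c) { next[g] += centered[g][c] * proj[c]; }
            norm += next[g] * next[g];
        }
        norm = std::sqrt(norm);
        if (norm == 0.0) { break; } // no variance left to capture.
        for (std::size_t g = 0; g < ngenes; ++g) { rotation[g] = next[g] / norm; }
    }
    project(); // final PC coordinate for each cell.

    // Column mean of (means + rotation * proj^T) simplifies to the expression below.
    double mean_center = 0.0, mean_rotation = 0.0;
    for (std::size_t g = 0; g < ngenes; ++g) {
        mean_center += means[g];
        mean_rotation += rotation[g];
    }
    mean_center /= static_cast<double>(ngenes);
    mean_rotation /= static_cast<double>(ngenes);

    SketchResult res;
    res.scores.resize(ncells);
    res.weights.resize(ngenes);
    for (std::size_t c = 0; c < ncells; ++c) { res.scores[c] = mean_center + mean_rotation * proj[c]; }
    for (std::size_t g = 0; g < ngenes; ++g) { res.weights[g] = std::abs(rotation[g]); }
    return res;
}
```

The sign of a PC is arbitrary, but the product of the rotation vector and the per-cell coordinates is not, so the scores in this sketch do not depend on which sign the power iteration settles on.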

Quick start

Given a tatami::Matrix corresponding to the submatrix of genes in the set, the gsdecon::compute() function will compute the scores for each cell:

#include "gsdecon/gsdecon.hpp"

// This should already be subsetted to the features of interest:
const tatami::Matrix<double, int>& mat = some_data_source();

gsdecon::Options opt; // default options.
auto res = gsdecon::compute(mat, opt);
res.scores; // one per cell.
res.weights; // one per gene in the set.

The per-gene weights are the absolute values of the entries of the rotation matrix corresponding to the first PC. They can be used as diagnostics to identify the genes that contribute most to the rank-1 approximation.
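As one possible diagnostic, the weights can be ranked to list the genes that contribute most. The helper below is a hypothetical sketch (the name `top_genes` is not part of the library) that takes a plain vector of weights and returns the row indices of the `k` largest ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k genes with the largest weights,
// ordered from largest to smallest weight.
inline std::vector<std::size_t> top_genes(const std::vector<double>& weights, std::size_t k) {
    assert(k <= weights.size());
    std::vector<std::size_t> order(weights.size());
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    // Only the first k positions need to be fully sorted.
    std::partial_sort(
        order.begin(),
        order.begin() + static_cast<std::ptrdiff_t>(k),
        order.end(),
        [&](std::size_t a, std::size_t b) { return weights[a] > weights[b]; }
    );
    order.resize(k);
    return order;
}
```

The returned indices refer to rows of the submatrix, so they need to be mapped back to gene identifiers by whatever subsetting produced the submatrix in the first place.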

In the case of multiple blocks (e.g., samples, batches), we regress out the block effects and compute the first PC from the residuals. This ensures that the major axis of variation is not driven by uninteresting differences between blocks. However, it does not remove the block effects themselves from the results, so any further analysis steps should still be block-aware.

std::vector<int> block_ids(mat.ncol()); // fill with block assignments.
auto bres = gsdecon::compute_blocked(mat, block_ids.data(), opt);
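For a single gene, regressing out the block effects can be pictured as subtracting each block's mean from the cells in that block. The sketch below illustrates only that step. The function name `subtract_block_means` is hypothetical, block assignments are assumed to be integers in `[0, nblocks)`, and the library's internals may differ (for example, in how blocks are weighted).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: residuals for one gene after removing per-block means.
inline std::vector<double> subtract_block_means(
    const std::vector<double>& values, // expression of one gene, one entry per cell.
    const std::vector<int>& block,     // block assignment for each cell.
    int nblocks)                       // total number of blocks.
{
    assert(values.size() == block.size());
    assert(nblocks > 0);

    // Accumulate the sum and count for each block.
    std::vector<double> sums(static_cast<std::size_t>(nblocks), 0.0);
    std::vector<int> counts(static_cast<std::size_t>(nblocks), 0);
    for (std::size_t c = 0; c < values.size(); ++c) {
        assert(block[c] >= 0 && block[c] < nblocks);
        sums[static_cast<std::size_t>(block[c])] += values[c];
        ++counts[static_cast<std::size_t>(block[c])];
    }

    // Subtract the block mean from each cell; every block seen here has count > 0.
    std::vector<double> residuals(values.size());
    for (std::size_t c = 0; c < values.size(); ++c) {
        const std::size_t b = static_cast<std::size_t>(block[c]);
        residuals[c] = values[c] - sums[b] / counts[b];
    }
    return residuals;
}
```

Computing the PC on such residuals means that a gene which differs only between blocks contributes no variance, which is why block differences cannot dominate the first PC.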

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
gsdecon
GIT_REPOSITORY https://github.com/libscran/gsdecon
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(gsdecon)

Then you can link to gsdecon to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::gsdecon)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::gsdecon)

CMake with find_package()

find_package(libscran_gsdecon CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::gsdecon)

To install the library, use:

mkdir build && cd build
cmake .. -DGSDECON_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DGSDECON_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.