scran_aggregate
Aggregate expression values across cells


Overview

This repository contains functions to aggregate statistics for groups of cells or sets of genes from a gene-by-cell matrix of expression values. It was primarily developed for computing pseudo-bulk expression profiles for clusters of cells, which can then be used for differential expression analysis with packages like edgeR. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix and an array of group assignments, the aggregate_across_cells() function will compute the aggregate statistics across all genes for each group.

const tatami::Matrix<double, int>& mat = some_data_source();
// Array of groupings should contain integer assignments to groups 0, 1, 2, etc.
std::vector<int> groupings = some_groupings();
scran_aggregate::AggregateAcrossCellsOptions opt;
auto res = scran_aggregate::aggregate_across_cells(mat, groupings.data(), opt);
res.sums; // vector of vectors of per-group sums across genes.
res.sums[0]; // vector of sums for the first group across genes.
res.detected; // vector of vectors of the number of detected cells per gene.
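
To make the layout of the results concrete, here is a minimal sketch of the same per-group aggregation on a plain dense matrix. It is not the library's implementation and does not use tatami: `GroupAggregates` and `aggregate_dense()` are hypothetical names introduced for illustration, the matrix is assumed to be stored row-major (genes by cells) in a `std::vector<double>`, and a gene is assumed to count as "detected" in a cell when its value is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container, not part of scran_aggregate.
// Both members are indexed as [group][gene], mirroring res.sums[0] above.
struct GroupAggregates {
    std::vector<std::vector<double>> sums;
    std::vector<std::vector<int>> detected;
};

// Hypothetical helper: aggregates a dense, row-major genes-by-cells matrix.
// groupings[c] is the group (0, 1, 2, ...) of cell c.
GroupAggregates aggregate_dense(
    const std::vector<double>& values,
    std::size_t num_genes,
    std::size_t num_cells,
    const std::vector<int>& groupings,
    std::size_t num_groups)
{
    GroupAggregates out;
    out.sums.assign(num_groups, std::vector<double>(num_genes, 0.0));
    out.detected.assign(num_groups, std::vector<int>(num_genes, 0));

    for (std::size_t gene = 0; gene < num_genes; ++gene) {
        for (std::size_t cell = 0; cell < num_cells; ++cell) {
            const double v = values[gene * num_cells + cell];
            const std::size_t g = static_cast<std::size_t>(groupings[cell]);
            out.sums[g][gene] += v;
            // Assumption: "detected" means a strictly positive value.
            if (v > 0) {
                ++out.detected[g][gene];
            }
        }
    }
    return out;
}
```

For example, with two genes, four cells and `groupings` of `{ 0, 0, 1, 1 }`, `out.sums[0]` holds the two per-gene sums over the first two cells, and `out.detected[0]` holds how many of those two cells have a positive value for each gene.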

We can also use the aggregate_across_genes() function to sum expression values across gene sets, e.g., to compute the activity of a gene signature. This can be done with any number of gene sets, possibly with a different weight for each gene in each set.

std::vector<std::tuple<size_t, const int*, const double*> > gene_sets;
std::vector<int> set1 { 0, 5, 10, 20 };
gene_sets.emplace_back(set1.size(), set1.data(), static_cast<const double*>(nullptr)); // no weight
std::vector<int> set2 { 0, 2, 4, 6, 8, 10 };
std::vector<double> weight2 { 0.1, 0.3, 0.3, 0.2, 0.1, 0.05 };
gene_sets.emplace_back(set2.size(), set2.data(), weight2.data()); // weighted
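scran_aggregate::AggregateAcrossGenesOptions g_opt;
auto g_res = scran_aggregate::aggregate_across_genes(
    mat,
    gene_sets,
    g_opt
);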
g_res.sum[0]; // vector of sums for set 1 in each cell.
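
The gene set calculation can be sketched in the same way. The function below is a hypothetical stand-in, not the library's code: `sum_gene_sets_dense()` is a name introduced here, the matrix is again assumed to be a dense row-major `std::vector<double>`, and a null weight pointer is assumed to mean that every gene in the set has a weight of 1. It accepts the same tuple format for `gene_sets` as shown above.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper, not part of scran_aggregate.
// Returns one vector per gene set, holding the (weighted) sum for each cell.
std::vector<std::vector<double>> sum_gene_sets_dense(
    const std::vector<double>& values, // row-major, genes by cells
    std::size_t num_cells,
    const std::vector<std::tuple<std::size_t, const int*, const double*>>& gene_sets)
{
    std::vector<std::vector<double>> sums;
    sums.reserve(gene_sets.size());

    for (const auto& set : gene_sets) {
        const std::size_t set_size = std::get<0>(set);
        const int* genes = std::get<1>(set);
        const double* weights = std::get<2>(set);

        std::vector<double> per_cell(num_cells, 0.0);
        for (std::size_t k = 0; k < set_size; ++k) {
            // Assumption: a null weight pointer means unit weights.
            const double w = (weights == nullptr ? 1.0 : weights[k]);
            const std::size_t row = static_cast<std::size_t>(genes[k]);
            for (std::size_t cell = 0; cell < num_cells; ++cell) {
                per_cell[cell] += w * values[row * num_cells + cell];
            }
        }
        sums.push_back(std::move(per_cell));
    }
    return sums;
}
```

Each output vector has one entry per cell, which matches the way `g_res.sum[0]` is described above.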

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
scran_aggregate
GIT_REPOSITORY https://github.com/libscran/scran_aggregate
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_aggregate)

Then you can link to scran_aggregate to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_aggregate)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)

CMake with find_package()

find_package(libscran_scran_aggregate CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_AGGREGATE_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_AGGREGATE_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This also requires the external dependencies listed in extern/CMakeLists.txt.