scran_aggregate
Aggregate expression values across cells

Overview

This repository contains a function to aggregate statistics for groups of cells from a gene-by-cell matrix of expression values. It was primarily developed for computing pseudo-bulk expression profiles for clusters of cells, which can then be used for differential expression analysis. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix and an array of group assignments, the aggregate_across_cells() function will compute the aggregate statistics across all genes for each group.

#include "scran_aggregate/scran_aggregate.hpp"
const tatami::Matrix<double, int>& mat = some_data_source();
std::vector<int> groupings = some_groupings();
scran_aggregate::AggregateAcrossCellsOptions opt;
auto res = scran_aggregate::aggregate_across_cells(mat, groupings.data(), opt);
res.sums; // vector of vectors of per-group sums across genes.
res.sums[0]; // vector of sums for the first group across genes.
res.detected; // vector of vectors of the number of detected cells per gene.
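To make the shape of the output concrete, here is a minimal sketch of the same kind of aggregation written against a plain row-major std::vector instead of a tatami::Matrix. The naive_aggregate() helper and the GroupAggregates struct are hypothetical names for illustration only, not part of scran_aggregate, and the sketch assumes that a gene counts as "detected" in a cell when its value is positive.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container mirroring the layout described above:
// one vector per group, each holding one entry per gene.
struct GroupAggregates {
    std::vector<std::vector<double> > sums;  // sums[g][i]: sum of gene i over cells in group g.
    std::vector<std::vector<int> > detected; // detected[g][i]: cells in group g where gene i is positive.
};

// Hypothetical helper (not the library's implementation).
// 'values' is a dense gene-by-cell matrix stored row-major, i.e.,
// the value for gene i in cell c is values[i * ncells + c].
// 'groups' holds one 0-based group assignment per cell.
GroupAggregates naive_aggregate(
    const std::vector<double>& values,
    std::size_t ngenes,
    std::size_t ncells,
    const std::vector<int>& groups,
    std::size_t ngroups)
{
    GroupAggregates out;
    out.sums.assign(ngroups, std::vector<double>(ngenes, 0.0));
    out.detected.assign(ngroups, std::vector<int>(ngenes, 0));

    for (std::size_t i = 0; i < ngenes; ++i) {
        for (std::size_t c = 0; c < ncells; ++c) {
            double v = values[i * ncells + c];
            std::size_t g = static_cast<std::size_t>(groups[c]);
            out.sums[g][i] += v;
            if (v > 0) { // assumption: "detected" means a positive value.
                ++out.detected[g][i];
            }
        }
    }
    return out;
}
```

For a 2-gene, 4-cell matrix with rows { 1, 0, 2, 3 } and { 0, 0, 5, 1 } and groups { 0, 0, 1, 1 }, this gives sums of { 1, 0 } for group 0 and { 5, 6 } for group 1. The library function operates on a tatami::Matrix and offers more options, so treat this only as a description of the result layout.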

The array of groupings should contain integer assignments to groups 0, 1, 2, etc. For more complex groupings defined from combinations of multiple factors, the combine_factors() utility will create group assignments from unique combinations of those factors:

std::vector<int> grouping1 { 0, 0, 1, 1, 2, 2 };
std::vector<int> grouping2 { 0, 1, 0, 1, 0, 1 };
std::vector<int> combined(grouping1.size());
auto res = scran_aggregate::combine_factors(
    grouping1.size(),
    std::vector<const int*>{ grouping1.data(), grouping2.data() },
    combined.data()
);
combined; // index of the unique (grouping1, grouping2) combination for each cell.
res[0]; // values of grouping1 for each unique combination.
res[1]; // values of grouping2 for each unique combination.
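The idea behind combining factors can be sketched with the standard library alone. The naive_combine_two() helper below is a hypothetical illustration restricted to exactly two factors; it is not the library's combine_factors(). It numbers the unique pairs in sorted order, which is an assumption of this sketch, so check the reference documentation for the ordering that the library guarantees.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of scran_aggregate).
// Returns the sorted unique (f1, f2) pairs, and fills 'combined' with
// the index of the pair observed at each position.
std::vector<std::pair<int, int> > naive_combine_two(
    const std::vector<int>& f1,
    const std::vector<int>& f2,
    std::vector<int>& combined)
{
    // First pass: collect the unique pairs; std::map keeps them sorted.
    std::map<std::pair<int, int>, int> index;
    for (std::size_t i = 0; i < f1.size(); ++i) {
        index[std::make_pair(f1[i], f2[i])] = 0;
    }

    // Assign consecutive indices in sorted order.
    std::vector<std::pair<int, int> > levels;
    int counter = 0;
    for (auto& kv : index) {
        kv.second = counter++;
        levels.push_back(kv.first);
    }

    // Second pass: translate each observation into its combination index.
    combined.resize(f1.size());
    for (std::size_t i = 0; i < f1.size(); ++i) {
        combined[i] = index.at(std::make_pair(f1[i], f2[i]));
    }
    return levels;
}
```

With f1 = { 1, 0, 1, 0 } and f2 = { 0, 0, 0, 1 }, the unique pairs are (0, 0), (0, 1) and (1, 0), and the combined assignments are { 2, 0, 2, 1 }. The combined vector can then be used as the group assignment for aggregation.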

We can also use the aggregate_across_genes() function to sum expression values across gene sets, e.g., to compute the activity of a gene signature. This can be done with any number of gene sets, possibly with a different weight for each gene in each set.

std::vector<std::tuple<size_t, const int*, const double*> > gene_sets;
std::vector<int> set1 { 0, 5, 10, 20 };
gene_sets.emplace_back(set1.size(), set1.data(), static_cast<const double*>(nullptr)); // unweighted
std::vector<int> set2 { 0, 2, 4, 6, 8, 10 };
std::vector<double> weight2 { 0.1, 0.3, 0.3, 0.2, 0.1, 0.05 };
gene_sets.emplace_back(set2.size(), set2.data(), weight2.data()); // weighted
scran_aggregate::AggregateAcrossGenesOptions g_opt;
auto g_res = scran_aggregate::aggregate_across_genes(
    mat,
    gene_sets,
    g_opt
);
g_res.sum[0]; // vector of sums for the first set in each cell.
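As a rough illustration of what a per-set sum means, the sketch below computes a weighted sum for a single gene set over a plain row-major std::vector. The naive_set_sum() helper is a hypothetical name, not the library's API, and it assumes that an absent weight vector is equivalent to a weight of 1 for every gene in the set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran_aggregate).
// 'values' is a dense gene-by-cell matrix stored row-major.
// 'genes' holds the row indices of the genes in the set.
// 'weights' is either empty (assumed to mean unit weights) or has one
// entry per element of 'genes'.
std::vector<double> naive_set_sum(
    const std::vector<double>& values,
    std::size_t ncells,
    const std::vector<int>& genes,
    const std::vector<double>& weights)
{
    std::vector<double> out(ncells, 0.0);
    for (std::size_t k = 0; k < genes.size(); ++k) {
        double w = weights.empty() ? 1.0 : weights[k];
        const double* row = values.data() + static_cast<std::size_t>(genes[k]) * ncells;
        for (std::size_t c = 0; c < ncells; ++c) {
            out[c] += w * row[c]; // accumulate this gene's weighted contribution.
        }
    }
    return out;
}
```

For a 3-gene, 2-cell matrix with rows { 1, 2 }, { 3, 4 } and { 5, 6 }, the unweighted sum over genes 0 and 2 is { 6, 8 }, and weights of { 0.5, 2.0 } give { 10.5, 13 }.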

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
  scran_aggregate
  GIT_REPOSITORY https://github.com/libscran/scran_aggregate
  GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_aggregate)

Then you can link to scran_aggregate to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_aggregate)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)

CMake with find_package()

find_package(libscran_scran_aggregate CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_AGGREGATE_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_AGGREGATE_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.