Overview
This repository contains a function to aggregate statistics for groups of cells from a gene-by-cell matrix of expression values. It was primarily developed for computing pseudo-bulk expression profiles for clusters of cells, which can then be used for differential expression analysis. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.
Quick start
Given a tatami::Matrix and an array of group assignments, the aggregate_across_cells() function will compute the aggregate statistics across all genes for each group.
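const tatami::Matrix<double, int>& mat = some_data_source();
std::vector<int> groupings = some_groupings();
scran_aggregate::AggregateAcrossCellsOptions opt;
auto res = scran_aggregate::aggregate_across_cells(mat, groupings.data(), opt);
res.sums; // vector of vectors of per-group sums across genes.
res.sums[0]; // vector of sums for the first group across genes.
res.detected; // vector of vectors of per-group counts of detected cells for each gene.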
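To make the shape of these results concrete, the following is a minimal standalone sketch of what a per-group aggregation computes. It works on a plain dense row-major array rather than a tatami::Matrix, and the names naive_aggregate and NaiveAggregates are hypothetical; this is not the library's implementation. It assumes that the outer vector is indexed by group and the inner vector by gene, and that a gene counts as "detected" in a cell when its value is positive.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: sums[g][i] is the sum for gene i in group g,
// detected[g][i] is the number of cells in group g where gene i is positive.
struct NaiveAggregates {
    std::vector<std::vector<double> > sums;
    std::vector<std::vector<int> > detected;
};

// 'values' is a dense gene-by-cell matrix stored row-major (one row per gene).
// 'groups' holds one assignment in [0, ngroups) per cell.
inline NaiveAggregates naive_aggregate(
    const std::vector<double>& values,
    std::size_t ngenes,
    std::size_t ncells,
    const std::vector<int>& groups,
    std::size_t ngroups)
{
    NaiveAggregates out;
    out.sums.assign(ngroups, std::vector<double>(ngenes, 0.0));
    out.detected.assign(ngroups, std::vector<int>(ngenes, 0));

    for (std::size_t g = 0; g < ngenes; ++g) {
        for (std::size_t c = 0; c < ncells; ++c) {
            double v = values[g * ncells + c];
            std::size_t grp = static_cast<std::size_t>(groups[c]);
            out.sums[grp][g] += v;
            // Assumption: "detected" means a strictly positive value.
            if (v > 0) {
                ++out.detected[grp][g];
            }
        }
    }
    return out;
}
```

Calling naive_aggregate(values, ngenes, ncells, groups, ngroups) on a small matrix is a convenient way to cross-check pseudo-bulk profiles by hand; for real datasets, aggregate_across_cells() is the function to use, since the sketch ignores sparsity and parallelization.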
The array of groupings should contain integer assignments to groups 0, 1, 2, etc. For more complex groupings defined from combinations of multiple factors, the combine_factors() utility will create group assignments from unique combinations of those factors:
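std::vector<int> grouping1 { 0, 0, 1, 1, 2, 2 };
std::vector<int> grouping2 { 0, 1, 0, 1, 0, 1 };
std::vector<int> combined(grouping1.size());
auto res = scran_aggregate::combine_factors(
    grouping1.size(),
    std::vector<const int*>{ grouping1.data(), grouping2.data() },
    combined.data()
);
combined; // combined group assignment for each cell.
res[0]; // values of grouping1 for each unique combination.
res[1]; // values of grouping2 for each unique combination.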
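The idea of turning combinations of factors into a single grouping can be sketched in a few lines of standalone C++. The helper below, naive_combine, is hypothetical and handles exactly two factors; it is not how the library implements combine_factors(). It assumes that combined groups are numbered in sorted order of the unique (first, second) pairs, which may differ from the library's actual ordering.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: writes a combined group index for each cell into
// 'combined' and returns the unique (f1, f2) pairs, one per combined group.
inline std::vector<std::pair<int, int> > naive_combine(
    const std::vector<int>& f1,
    const std::vector<int>& f2,
    std::vector<int>& combined)
{
    // std::map keeps its keys sorted, so iterating over it yields the
    // unique pairs in sorted order.
    std::map<std::pair<int, int>, int> index;
    for (std::size_t i = 0; i < f1.size(); ++i) {
        index[std::make_pair(f1[i], f2[i])] = 0;
    }

    // Number the unique pairs consecutively from zero.
    std::vector<std::pair<int, int> > levels;
    levels.reserve(index.size());
    for (auto& kv : index) {
        kv.second = static_cast<int>(levels.size());
        levels.push_back(kv.first);
    }

    // Look up the combined index of each cell.
    combined.resize(f1.size());
    for (std::size_t i = 0; i < f1.size(); ++i) {
        combined[i] = index[std::make_pair(f1[i], f2[i])];
    }
    return levels;
}
```

The resulting combined vector can be used as the array of group assignments for aggregation, while the returned pairs record which factor levels each combined group corresponds to.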
We can also use the aggregate_across_genes() function to sum expression values across gene sets, e.g., to compute the activity of a gene signature. This can be done with any number of gene sets, possibly with a different weight for each gene in each set.
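std::vector<std::tuple<size_t, const int*, const double*> > gene_sets;
std::vector<int> set1 { 0, 5, 10, 20 }; // unweighted set: no weight array.
gene_sets.emplace_back(set1.size(), set1.data(), static_cast<const double*>(nullptr));
std::vector<int> set2 { 0, 2, 4, 6, 8, 10 }; // weighted set: one weight per gene.
std::vector<double> weight2 { 0.1, 0.3, 0.3, 0.2, 0.1, 0.05 };
gene_sets.emplace_back(set2.size(), set2.data(), weight2.data());
scran_aggregate::AggregateAcrossGenesOptions g_opt;
auto g_res = scran_aggregate::aggregate_across_genes(
    mat,
    gene_sets,
    g_opt
);
g_res.sum[0]; // vector of sums for the first gene set across cells.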
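As a rough illustration of what a gene set sum means, here is a standalone sketch that computes the (optionally weighted) sum for one gene set in every cell of a dense row-major gene-by-cell array. The helper naive_gene_set_sum is hypothetical and is not the library's implementation; it assumes that a null weight pointer means every gene in the set has a weight of 1.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical helper: 'values' is a dense gene-by-cell matrix stored
// row-major (one row per gene). 'gene_set' holds the number of genes,
// a pointer to their row indices and an optional pointer to their weights.
inline std::vector<double> naive_gene_set_sum(
    const std::vector<double>& values,
    std::size_t ncells,
    const std::tuple<std::size_t, const int*, const double*>& gene_set)
{
    std::size_t n = std::get<0>(gene_set);
    const int* genes = std::get<1>(gene_set);
    const double* weights = std::get<2>(gene_set);

    std::vector<double> sums(ncells, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        // Assumption: a null weight pointer is treated as all weights = 1.
        double w = (weights == nullptr ? 1.0 : weights[k]);
        const double* row = values.data() + static_cast<std::size_t>(genes[k]) * ncells;
        for (std::size_t c = 0; c < ncells; ++c) {
            sums[c] += w * row[c];
        }
    }
    return sums;
}
```

The returned vector has one entry per cell, so a signature score for each cell can be read off directly. The sketch does no bounds checking on the gene indices, which the caller is expected to keep within the number of rows.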
Check out the reference documentation for more details.
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_aggregate
GIT_REPOSITORY https://github.com/libscran/scran_aggregate
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_aggregate)
Then you can link to scran_aggregate to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_aggregate)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)
CMake with find_package()
find_package(libscran_scran_aggregate CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_AGGREGATE_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_AGGREGATE_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
Manual
If you're not using CMake, the simplest approach is to copy the files in include/ (either directly or with Git submodules) and add their path to the compiler's include search path, e.g., with GCC's -I. The external dependencies listed in extern/CMakeLists.txt must also be made available during compilation.