Marker detection for single-cell data
Marker detection for groups of cells

This library contains functions for detecting group-specific markers (e.g., for clusters or cell types) from a single-cell expression matrix. It performs differential analyses between pairs of groups, computing a variety of effect sizes like Cohen's d and the AUC. The effect sizes are then summarized for each group to obtain some rankings for prioritizing interesting genes. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix and an array of group assignments, the score_markers_summary() function will compute the aggregate statistics across all genes for each group.

// Expression matrix, usually log-normalized.
const tatami::Matrix<double, int>& matrix = some_data_source();
// Array containing integer assignments to groups 0, 1, 2, etc.
std::vector<int> groupings = some_groupings();
scran_markers::ScoreMarkersSummaryOptions opt;
auto res = scran_markers::score_markers_summary(matrix, groupings.data(), opt);
res.mean[0]; // mean of each gene in the first group.
res.detected[0]; // detected proportion of each gene in the first group.

The most interesting part of this result is the effect size summary for each group. For each group, we compute the Cohen's d, AUC, delta-mean and delta-detected by comparing that group against every other group. We then combine the effect sizes from all comparisons into some summary statistics like the mean or median. Ranking by these summaries yields a list of potential group-specific marker genes for further examination. Picking a different summary statistic and/or effect size will favor different types of markers in the ranking.

res.cohens_d[0].mean; // for each gene, the mean Cohen's d of the first group across comparisons to all other groups.
res.auc[0].median; // for each gene, the median AUC of the first group across comparisons to all other groups.
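To make the effect sizes more concrete, the following is a minimal standalone sketch of Cohen's d and the AUC for a single gene in one pairwise comparison. It does not use the library and is not its implementation; the helper names (mean_of, variance_of, cohens_d, auc) are hypothetical, and pooling the variance as a plain average of the two group variances is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of a non-empty vector.
double mean_of(const std::vector<double>& x) {
    double total = 0;
    for (double v : x) { total += v; }
    return total / x.size();
}

// Sample variance (denominator n - 1); assumes at least two observations.
double variance_of(const std::vector<double>& x, double mean) {
    double ss = 0;
    for (double v : x) { ss += (v - mean) * (v - mean); }
    return ss / (x.size() - 1);
}

// Cohen's d: difference in means divided by a pooled standard deviation.
// Assumption: the pooled variance is the plain average of the two variances.
double cohens_d(const std::vector<double>& a, const std::vector<double>& b) {
    double ma = mean_of(a), mb = mean_of(b);
    double pooled = (variance_of(a, ma) + variance_of(b, mb)) / 2;
    return (ma - mb) / std::sqrt(pooled);
}

// AUC: probability that a random cell in 'a' has higher expression than a
// random cell in 'b', counting ties as half. Quadratic time, for clarity only.
double auc(const std::vector<double>& a, const std::vector<double>& b) {
    double wins = 0;
    for (double x : a) {
        for (double y : b) {
            if (x > y) { wins += 1; }
            else if (x == y) { wins += 0.5; }
        }
    }
    return wins / (static_cast<double>(a.size()) * b.size());
}
```

For example, with expression values {1, 2, 3} in the first group and {0, 1, 2} in the second, this sketch gives a Cohen's d of 1 and an AUC of 7/9. Cohen's d is sensitive to the magnitude of the shift relative to the spread, while the AUC only depends on the ordering of the values, which is one reason why different effect sizes favor different markers.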

If the dataset contains some uninteresting variation (e.g., batches, samples), we can ensure that it does not affect the effect size calculation by blocking on that factor. This performs the comparisons within each level of the blocking factor so as to ignore the irrelevant variation.

// Array containing integer assignments to blocks 0, 1, 2, etc.
std::vector<int> blocks = some_blocks();
auto block_res = scran_markers::score_markers_summary_blocked(matrix, groupings.data(), blocks.data(), opt);

If more detail is necessary, we can obtain effect sizes from all pairwise comparisons using the score_markers_pairwise() function.

scran_markers::ScoreMarkersPairwiseOptions popt;
auto pair_res = scran_markers::score_markers_pairwise(matrix, groupings.data(), popt);
pair_res.cohens_d; // 3D array of effect sizes for each pairwise comparison and gene.

Alternatively, if we already have an array of effect sizes, we can use the summarize_effects() function to obtain summaries for each group. In fact, score_markers_summary() is just a more memory-efficient version of score_markers_pairwise() followed by summarize_effects().

scran_markers::SummarizeEffectsOptions sopt;
auto cohen_summary = scran_markers::summarize_effects(
    matrix.nrow(), // i.e., number of genes
    pair_res.mean.size(), // i.e., number of groups
    pair_res.cohens_d.data(), sopt);
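Conceptually, the summarization step reduces the effect sizes from all comparisons involving a group into a few numbers per gene, which can then be used for ranking. The sketch below shows this for plain vectors. EffectSummary, summarize and rank_genes are hypothetical names that are not part of the library, and the minimum is included only as an extra illustration alongside the mean and median.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct EffectSummary {
    double min, mean, median;
};

// Summarize the effect sizes of one gene from comparisons of one group
// against every other group. Assumes at least one comparison.
EffectSummary summarize(std::vector<double> effects) {
    std::sort(effects.begin(), effects.end());
    std::size_t n = effects.size();
    double total = std::accumulate(effects.begin(), effects.end(), 0.0);
    double median = (n % 2 == 1)
        ? effects[n / 2]
        : (effects[n / 2 - 1] + effects[n / 2]) / 2;
    return EffectSummary{ effects.front(), total / n, median };
}

// Order genes by decreasing summary value, so that the strongest candidate
// markers come first. Ties keep their original order.
std::vector<std::size_t> rank_genes(const std::vector<double>& summary) {
    std::vector<std::size_t> order(summary.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t l, std::size_t r) { return summary[l] > summary[r]; });
    return order;
}
```

For effect sizes {3, 1, 2, 6}, this gives a minimum of 1, a mean of 3 and a median of 2.5. Ranking by the minimum only rewards genes that are upregulated against every other group, whereas the mean can be driven by a single large effect, so the choice of summary changes which genes reach the top of the list.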

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

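include(FetchContent)

FetchContent_Declare(
  scran_markers
  GIT_REPOSITORY https://github.com/libscran/scran_markers
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(scran_markers)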

Then you can link to scran_markers to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_markers)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_markers)

CMake with find_package()

find_package(libscran_scran_markers CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_markers)

To install the library, use:

mkdir build && cd build
cmake ..
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_MARKERS_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.


Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.