scran_qc
Simple quality control on single-cell data
Quality control on single-cell data

Overview

This repository contains functions to perform quality control on cells, using metrics computed from a gene-by-cell matrix of expression values. Cells with "unusual" values for these metrics are considered to be of low quality and are filtered out prior to downstream analysis. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.
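
To make "unusual" concrete, one common convention is to flag values that lie more than a few median absolute deviations (MADs) from the median. The sketch below is self-contained and does not call scran_qc. The names `median_of`, `MadThresholds` and `mad_thresholds`, the multiplier of 3, and the 1.4826 scaling constant are all assumptions chosen for illustration; consult the reference documentation for what the library actually does by default.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a non-empty vector (takes a copy so it can sort).
double median_of(std::vector<double> x) {
    std::sort(x.begin(), x.end());
    std::size_t n = x.size();
    return (n % 2 == 1) ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0;
}

// Hypothetical result type for this sketch only.
struct MadThresholds {
    double lower;
    double upper;
};

// Thresholds at `num_mads` scaled MADs on either side of the median.
MadThresholds mad_thresholds(const std::vector<double>& x, double num_mads = 3.0) {
    double med = median_of(x);

    std::vector<double> deviations;
    deviations.reserve(x.size());
    for (double v : x) {
        deviations.push_back(std::abs(v - med));
    }

    // 1.4826 makes the MAD comparable to a standard deviation for normal data.
    double mad = 1.4826 * median_of(deviations);
    return { med - num_mads * mad, med + num_mads * mad };
}
```

For the values 1, 2, 3, 4 and 100, the median is 3 and the scaled MAD is about 1.48, so the upper threshold is about 7.45 and only the value 100 would be flagged.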

Quick start

Given a tatami::Matrix containing RNA data, we can compute some common statistics like the sum of counts, number of detected genes, and the mitochondrial proportion:

#include "scran_qc/scran_qc.hpp"

std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
std::vector<std::vector<int> > subsets;
subsets.push_back(some_mito_subsets()); // vector of row indices for mitochondrial genes.

scran_qc::ComputeRnaQcMetricsOptions mopt; // default options.
auto metrics = scran_qc::compute_rna_qc_metrics(*mat, subsets, mopt);
metrics.sum; // vector of count sums, one per cell.
metrics.detected; // vector of the number of detected genes, one per cell.
metrics.subset_proportion[0]; // vector of mitochondrial proportions, one per cell.
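
To illustrate what these three statistics mean, here is a minimal standalone sketch that computes them from a dense matrix without tatami or scran_qc. The type `SimpleRnaMetrics`, the function `simple_rna_metrics`, the column-major layout, and the handling of empty cells are assumptions of this example, not the library's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for this illustration; not the library's result type.
struct SimpleRnaMetrics {
    std::vector<double> sum;               // total count per cell.
    std::vector<int> detected;             // number of genes with a non-zero count per cell.
    std::vector<double> subset_proportion; // fraction of each cell's counts in the subset.
};

// `counts` is a dense gene-by-cell matrix in column-major order,
// so the genes of one cell are contiguous in memory.
SimpleRnaMetrics simple_rna_metrics(
    const std::vector<double>& counts,
    std::size_t num_genes,
    std::size_t num_cells,
    const std::vector<int>& subset) // row indices, e.g. mitochondrial genes.
{
    SimpleRnaMetrics out;
    out.sum.assign(num_cells, 0.0);
    out.detected.assign(num_cells, 0);
    out.subset_proportion.assign(num_cells, 0.0);

    for (std::size_t c = 0; c < num_cells; ++c) {
        const double* col = counts.data() + c * num_genes;
        for (std::size_t g = 0; g < num_genes; ++g) {
            out.sum[c] += col[g];
            out.detected[c] += (col[g] > 0);
        }

        double subset_sum = 0;
        for (int g : subset) {
            subset_sum += col[g];
        }

        // Assumption of this sketch: a cell with no counts keeps a proportion of 0
        // to avoid dividing by zero.
        if (out.sum[c] > 0) {
            out.subset_proportion[c] = subset_sum / out.sum[c];
        }
    }
    return out;
}
```

For a three-gene matrix where the first cell has counts 1, 0, 3 and the third gene is mitochondrial, this gives a sum of 4, two detected genes and a mitochondrial proportion of 0.75.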

We can then use these metrics to derive filter thresholds and identify high-quality cells:

scran_qc::ComputeRnaQcFiltersOptions fopt; // default options.
auto filters = scran_qc::compute_rna_qc_filters(metrics, fopt);
filters.get_sum(); // lower threshold on the sum.
filters.get_detected(); // lower threshold on the number of detected genes.
filters.get_subset_proportion()[0]; // upper threshold on the mitochondrial proportion.
auto keep = filters.filter(metrics); // vector where 1 = high-quality, 0 = low-quality.

Users can also manually adjust the thresholds before filtering:

filters.get_sum() = 500;
filters.get_subset_proportion()[0] = 0.1;
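
The effect of such thresholds can be sketched in plain C++. This is an illustration only: the function `apply_thresholds` is hypothetical, it ignores the threshold on detected genes for brevity, and whether the library treats values exactly at a threshold as passing is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns 1 for cells that pass both thresholds, 0 otherwise.
// Assumes `sum` and `proportion` have the same length (one entry per cell).
std::vector<std::uint8_t> apply_thresholds(
    const std::vector<double>& sum,
    const std::vector<double>& proportion,
    double min_sum,        // lower threshold on the sum of counts.
    double max_proportion) // upper threshold on the subset proportion.
{
    std::vector<std::uint8_t> keep(sum.size());
    for (std::size_t i = 0; i < sum.size(); ++i) {
        // Assumption: values equal to a threshold are retained.
        keep[i] = (sum[i] >= min_sum && proportion[i] <= max_proportion);
    }
    return keep;
}
```

With thresholds of 500 and 0.1, a cell with a sum of 400 fails the lower bound, a cell with a proportion of 0.2 fails the upper bound, and a cell with a sum of 1000 and a proportion of 0.05 is kept.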

The same general approach applies to ADT and CRISPR data, albeit with the metrics most relevant to each modality. For example, we use the sum of counts for the IgG isotype controls when filtering ADT data:

std::shared_ptr<tatami::Matrix<double, int> > adt_mat = some_adt_data_source();
std::vector<std::vector<int> > asubsets;
asubsets.push_back(some_IgG_subsets()); // vector of row indices for IgG controls.

scran_qc::ComputeAdtQcMetricsOptions amopt; // default options.
auto ametrics = scran_qc::compute_adt_qc_metrics(*adt_mat, asubsets, amopt);

scran_qc::ComputeAdtQcFiltersOptions afopt; // default options.
auto afilters = scran_qc::compute_adt_qc_filters(ametrics, afopt);
auto akeep = afilters.filter(ametrics); // use the ADT filters, not the RNA filters.

Once we have our filter(s), we can subset our dataset so that only the columns corresponding to high-quality cells are used for downstream analysis:

auto submatrix = tatami::make_DelayedSubset(
    mat,
    scran_qc::filter_index<int>(keep.size(), keep.data()),
    /* by_row = */ false
);

// Combine filters from multiple modalities:
auto submatrix2 = tatami::make_DelayedSubset(
    mat,
    scran_qc::combine_filters_index<int>(keep.size(), { keep.data(), akeep.data() }),
    /* by_row = */ false
);
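
Conceptually, these two steps turn one or more keep vectors into a list of column indices to retain. The standalone sketch below shows that idea under stated assumptions: `keep_to_index` and `combined_keep_to_index` are hypothetical names, and combining by requiring a cell to pass every filter is how this example interprets the multi-modality case rather than a statement about the library's internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: indices of the cells where keep[i] is non-zero.
std::vector<int> keep_to_index(const std::vector<std::uint8_t>& keep) {
    std::vector<int> index;
    for (std::size_t i = 0; i < keep.size(); ++i) {
        if (keep[i]) {
            index.push_back(static_cast<int>(i));
        }
    }
    return index;
}

// Hypothetical helper: indices of the cells that pass every filter.
// Assumes all keep vectors have the same length and at least one is supplied.
std::vector<int> combined_keep_to_index(const std::vector<std::vector<std::uint8_t> >& keeps) {
    std::vector<int> index;
    std::size_t num_cells = keeps.empty() ? 0 : keeps.front().size();
    for (std::size_t i = 0; i < num_cells; ++i) {
        bool pass = true;
        for (const auto& k : keeps) {
            if (!k[i]) {
                pass = false;
                break;
            }
        }
        if (pass) {
            index.push_back(static_cast<int>(i));
        }
    }
    return index;
}
```

For an RNA keep vector of 1, 0, 1, 1 and an ADT keep vector of 1, 1, 0, 1, the RNA-only indices are 0, 2, 3, while the combined indices are 0 and 3.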

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
  scran_qc
  GIT_REPOSITORY https://github.com/libscran/scran_qc
  GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_qc)

Then you can link to scran_qc to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_qc)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_qc)

CMake with find_package()

find_package(libscran_scran_qc CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_qc)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_QC_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_QC_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simplest approach is to copy the files in include/ (either directly or with Git submodules) and add their path during compilation with, e.g., GCC's -I. The external dependencies listed in extern/CMakeLists.txt must also be made available during compilation.