Overview
This repository contains functions to perform quality control on cells, using metrics computed from a gene-by-cell matrix of expression values. Cells with "unusual" values for these metrics are considered to be of low quality and are filtered out prior to downstream analysis. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.
Quick start
Given a tatami::Matrix containing RNA data, we can compute some common statistics like the sum of counts, number of detected genes, and the mitochondrial proportion:
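#include "scran_qc/scran_qc.hpp"

std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
std::vector<std::vector<int> > subsets;
subsets.push_back(some_mito_subsets());

scran_qc::ComputeRnaQcMetricsOptions mopt;
auto metrics = scran_qc::compute_rna_qc_metrics(*mat, subsets, mopt);
metrics.sum; // sum of counts for each cell
metrics.detected; // number of detected genes for each cell
metrics.subset_proportion[0]; // mitochondrial proportion for each cell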
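To make the three statistics concrete, here is a minimal standalone sketch of how they could be computed for a small dense matrix. It uses only the standard library and is not the library's implementation: the `SimpleQcMetrics` struct, the `simple_qc_metrics()` function, and the dense column-major layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for per-cell metrics; not the library's result type.
struct SimpleQcMetrics {
    std::vector<double> sum;               // total count in each cell
    std::vector<int> detected;             // number of genes with a non-zero count in each cell
    std::vector<double> subset_proportion; // fraction of each cell's total count in the subset
};

// 'counts' is assumed to be a dense column-major matrix with 'num_genes' rows
// and 'num_cells' columns. 'subset' holds the row indices of the genes of
// interest, e.g., mitochondrial genes.
inline SimpleQcMetrics simple_qc_metrics(
    const std::vector<double>& counts,
    std::size_t num_genes,
    std::size_t num_cells,
    const std::vector<int>& subset)
{
    assert(counts.size() == num_genes * num_cells);
    SimpleQcMetrics out;
    out.sum.resize(num_cells);
    out.detected.resize(num_cells);
    out.subset_proportion.resize(num_cells);

    for (std::size_t c = 0; c < num_cells; ++c) {
        const double* col = counts.data() + c * num_genes;

        double total = 0;
        int detected = 0;
        for (std::size_t g = 0; g < num_genes; ++g) {
            total += col[g];
            detected += (col[g] > 0);
        }

        double subset_total = 0;
        for (int g : subset) {
            subset_total += col[g];
        }

        out.sum[c] = total;
        out.detected[c] = detected;
        // Assumption of this sketch: an empty cell gets a proportion of zero
        // so that we never divide by zero.
        out.subset_proportion[c] = (total > 0 ? subset_total / total : 0.0);
    }
    return out;
}
```

For example, a cell with counts 1, 0 and 3 across three genes, where the third gene is in the subset, has a sum of 4, two detected genes and a subset proportion of 0.75.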
We can then use this to identify high-quality cells:
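scran_qc::ComputeRnaQcFiltersOptions fopt;
auto filters = scran_qc::compute_rna_qc_filters(metrics, fopt);
filters.get_sum(); // threshold on the sum of counts
filters.get_detected(); // threshold on the number of detected genes
filters.get_subset_proportion()[0]; // threshold on the mitochondrial proportion

auto keep = filters.filter(metrics);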
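This README does not spell out how "unusual" values are defined. One common choice, assumed here purely for illustration, is an outlier-based threshold placed a fixed number of median absolute deviations (MADs) from the median. The sketch below uses only the standard library; `simple_median()`, `simple_lower_threshold()` and `simple_filter()` are hypothetical names, and any transformation or scaling that the library may apply is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty vector. Takes a copy because it sorts in place.
inline double simple_median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    std::size_t n = values.size();
    return (n % 2 == 1) ? values[n / 2] : (values[n / 2 - 1] + values[n / 2]) / 2;
}

// Lower threshold placed 'num_mads' raw (unscaled) MADs below the median.
inline double simple_lower_threshold(const std::vector<double>& values, double num_mads) {
    double med = simple_median(values);
    std::vector<double> deviations;
    deviations.reserve(values.size());
    for (double v : values) {
        deviations.push_back(std::abs(v - med));
    }
    double mad = simple_median(deviations);
    return med - num_mads * mad;
}

// Marks each cell as retained (1) if its metric is not below the threshold,
// or as discarded (0) otherwise.
inline std::vector<unsigned char> simple_filter(const std::vector<double>& values, double threshold) {
    std::vector<unsigned char> keep(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        keep[i] = (values[i] >= threshold);
    }
    return keep;
}
```

With per-cell sums of 10, 11, 12, 13 and 1, the median is 11 and the raw MAD is 1, so a threshold at 3 MADs below the median is 8 and only the last cell is discarded. A metric where large values are the problem, such as a mitochondrial proportion, would use an upper threshold instead.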
Users can also manually adjust the thresholds before filtering:
filters.get_sum() = 500;
filters.get_subset_proportion()[0] = 0.1;
The same general approach applies to ADT and CRISPR data, albeit with metrics that are more relevant to each modality. For example, we use the sum of counts for the IgG isotype control when filtering ADT data:
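std::shared_ptr<tatami::Matrix<double, int> > adt_mat = some_adt_data_source();
std::vector<std::vector<int> > asubsets;
asubsets.push_back(some_IgG_subsets());

scran_qc::ComputeAdtQcMetricsOptions amopt;
auto ametrics = scran_qc::compute_adt_qc_metrics(*adt_mat, asubsets, amopt);

scran_qc::ComputeAdtQcFiltersOptions afopt;
auto afilters = scran_qc::compute_adt_qc_filters(ametrics, afopt);
auto akeep = afilters.filter(ametrics);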
Once we have our filter(s), we can subset our dataset so that only the columns corresponding to high-quality cells are used for downstream analysis:
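auto filtered = tatami::make_DelayedSubset(
    mat,
    scran_qc::filter_index<int>(keep.size(), keep.data()),
    false // subset by column, i.e., by cell
);

auto afiltered = tatami::make_DelayedSubset(
    mat,
    scran_qc::filter_index<int>(akeep.size(), akeep.data()),
    false
);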
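Subsetting needs the indices of the retained columns rather than a keep/discard flag per cell, and when several modalities are available we may want to retain only the cells that pass every filter. The following standalone sketch shows both steps using only the standard library. The helpers `combine_keep()` and `keep_to_indices()` are hypothetical names for illustration, not functions from this library or from tatami.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise AND of two keep vectors of the same length, so that a cell is
// retained only if it passes both filters.
inline std::vector<unsigned char> combine_keep(
    const std::vector<unsigned char>& a,
    const std::vector<unsigned char>& b)
{
    assert(a.size() == b.size());
    std::vector<unsigned char> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = (a[i] && b[i]);
    }
    return out;
}

// Converts a keep vector into the sorted column indices of the retained cells.
inline std::vector<int> keep_to_indices(const std::vector<unsigned char>& keep) {
    std::vector<int> indices;
    for (std::size_t i = 0; i < keep.size(); ++i) {
        if (keep[i]) {
            indices.push_back(static_cast<int>(i));
        }
    }
    return indices;
}
```

For instance, if the RNA filter keeps cells 0, 1 and 3 while the ADT filter keeps cells 0, 2 and 3, the combined filter keeps only cells 0 and 3. The resulting index vector is the kind of value that could be supplied as the subset when subsetting by column.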
Check out the reference documentation for more details.
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_qc
GIT_REPOSITORY https://github.com/libscran/scran_qc
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_qc)
Then you can link to scran_qc to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_qc)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_qc)
CMake with find_package()
find_package(libscran_scran_qc CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_qc)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_QC_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_QC_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
Manual
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.