scran_norm
Scaling normalization of single-cell data
Scaling normalization of single-cell count data


Overview

This repository contains functions to perform scaling normalization and log-transformation of a gene-by-cell count matrix. Normalization removes per-cell scaling biases such as differences in capture efficiency and sequencing depth, while the log-transformation provides some variance stabilization and allows differences in values to be interpreted as log-fold changes. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.

Quick start

Given a measure of the per-cell scaling bias (for example, the sum of counts for each cell), we can convert these values into centered size factors:

// Assuming that counts is a std::shared_ptr<tatami::Matrix>
std::vector<double> bias = tatami_stats::sums::by_column(*counts);
scran_norm::CenterSizeFactorsOptions copt; // default options
scran_norm::center_size_factors(bias.size(), bias.data(), copt);
// 'bias' is now centered at unity and can be used as size factors.
auto& size_factors = bias;
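To make the idea of centering concrete, here is a minimal standalone sketch that does not use scran_norm at all. The helper name `center_to_unit_mean` is hypothetical, and the sketch assumes every input value is positive and finite; the library itself may treat invalid values differently.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of scran_norm): scale the values in place so
// that their mean becomes 1, and return the mean that was divided out.
// Assumes all values are positive and finite.
double center_to_unit_mean(std::vector<double>& factors) {
    assert(!factors.empty());
    const double mean = std::accumulate(factors.begin(), factors.end(), 0.0) / factors.size();
    if (mean > 0) { // leave the values untouched rather than divide by zero
        for (auto& f : factors) {
            f /= mean;
        }
    }
    return mean;
}
```

Centering at unity keeps the normalized values on roughly the same scale as the original counts, which makes them easier to interpret.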

Alternatively, in the presence of blocks, we adjust our centering so that the mean size factor in each block is no less than 1. This avoids inflated variances from applying small size factors to low-coverage blocks.

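scran_norm::CenterSizeFactorsBlockedOptions cbopt; // default options for the blocked variant
scran_norm::center_size_factors_blocked(
    bias.size(), // number of cells
    bias.data(), // pointer to an array of relative biases, e.g., library sizes.
    block.data(), // pointer to an array of block assignments
    num_blocks, // number of unique blocks in the block array.
    cbopt // further options
);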
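One way to satisfy the "no less than 1" condition is to divide every size factor by the lowest per-block mean. The sketch below illustrates that strategy on plain vectors. It is an assumption about how such centering could work, not a copy of the library's code; the helper name `center_by_lowest_block_mean` is hypothetical, and all inputs are assumed to be positive and finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of scran_norm): divide all factors by the
// lowest per-block mean, so that block ends up with a mean of 1 and every
// other block has a mean above 1. Returns the block means before scaling.
std::vector<double> center_by_lowest_block_mean(
    std::vector<double>& factors,
    const std::vector<int>& block,
    std::size_t num_blocks)
{
    assert(factors.size() == block.size());
    std::vector<double> sums(num_blocks), counts(num_blocks), means(num_blocks);
    for (std::size_t i = 0; i < factors.size(); ++i) {
        sums[block[i]] += factors[i];
        counts[block[i]] += 1;
    }

    double lowest = std::numeric_limits<double>::infinity();
    for (std::size_t b = 0; b < num_blocks; ++b) {
        if (counts[b] > 0) { // empty blocks do not contribute
            means[b] = sums[b] / counts[b];
            lowest = std::min(lowest, means[b]);
        }
    }

    if (std::isfinite(lowest) && lowest > 0) {
        for (auto& f : factors) {
            f /= lowest;
        }
    }
    return means;
}
```

Because a single scaling is applied to all cells, the relative differences in size factors between blocks are preserved.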

If our size factors might contain invalid values (i.e., zero, negative, or non-finite), we can sanitize them prior to the construction of the log-normalized matrix:

scran_norm::SanitizeSizeFactorsOptions sopt;
sopt.handle_zero = scran_norm::SanitizeAction::IGNORE; // don't alter zeros
sopt.handle_infinite = scran_norm::SanitizeAction::SANITIZE; // sanitize infinities
sopt.handle_nan = scran_norm::SanitizeAction::ERROR; // error on seeing NaNs
scran_norm::sanitize_size_factors(size_factors.size(), size_factors.data(), sopt);
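The following standalone sketch shows what sanitizing invalid size factors might look like. The replacement policy here is an assumption chosen for illustration: zeros and negatives become the smallest valid factor, positive infinities become the largest valid factor, and NaNs become 1. The helper name `sanitize_sketch` is hypothetical, and the library's actual replacement rules should be checked in its reference documentation.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not part of scran_norm) with an assumed policy:
//  - NaN               -> 1
//  - +infinity         -> largest valid (positive, finite) factor
//  - zero or negative  -> smallest valid factor
void sanitize_sketch(std::vector<double>& factors) {
    double smallest = std::numeric_limits<double>::infinity();
    double largest = 0;
    for (const double f : factors) {
        if (std::isfinite(f) && f > 0) {
            smallest = std::min(smallest, f);
            largest = std::max(largest, f);
        }
    }
    if (!std::isfinite(smallest)) { // no valid factors at all: fall back to 1
        smallest = 1;
        largest = 1;
    }

    for (auto& f : factors) {
        if (std::isnan(f)) {
            f = 1;
        } else if (std::isinf(f) && f > 0) {
            f = largest;
        } else if (f <= 0) { // also catches negative infinity
            f = smallest;
        }
    }
}
```

After this step every factor is positive and finite, so dividing counts by it cannot yield infinities or NaNs.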

Finally, we convert our tatami::Matrix of counts into a log-transformed normalized matrix:

scran_norm::NormalizeCountsOptions lopt; // default options
auto logcounts = scran_norm::normalize_counts(counts, size_factors, lopt);
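For intuition, the arithmetic behind a log-normalized value can be sketched on a plain dense matrix. This example assumes a pseudo-count of 1 and a base-2 logarithm, which may not match the options in use; the helper name `log_normalize_dense` and the column-major `std::vector` layout are hypothetical and stand in for the `tatami::Matrix` that the library operates on.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran_norm): 'counts' is a dense
// column-major matrix with 'num_genes' rows and one column per cell.
// Each count is divided by its cell's size factor and then transformed
// as log2(x + 1). The pseudo-count and base are assumptions.
std::vector<double> log_normalize_dense(
    const std::vector<double>& counts,
    std::size_t num_genes,
    const std::vector<double>& size_factors)
{
    assert(counts.size() == num_genes * size_factors.size());
    std::vector<double> out(counts.size());
    for (std::size_t c = 0; c < size_factors.size(); ++c) {
        for (std::size_t g = 0; g < num_genes; ++g) {
            const std::size_t idx = c * num_genes + g;
            out[idx] = std::log2(counts[idx] / size_factors[c] + 1.0);
        }
    }
    return out;
}
```

With a base-2 logarithm, a difference of 1 between two log-values corresponds to roughly a two-fold difference in normalized expression when counts are large relative to the pseudo-count.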

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(
  scran_norm
  GIT_REPOSITORY https://github.com/libscran/scran_norm
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(scran_norm)

Then you can link to scran_norm to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_norm)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_norm)

CMake with find_package()

find_package(libscran_scran_norm CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_norm)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_NORM_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_NORM_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This also requires the external dependencies listed in extern/CMakeLists.txt.