scran_norm
Scaling normalization of single-cell data
Options for center_size_factors() and center_size_factors_blocked().
#include <center_size_factors.hpp>
Public Attributes | |
CenterBlockMode | block_mode = CenterBlockMode::LOWEST |
bool | ignore_invalid = true |
CenterBlockMode scran_norm::CenterSizeFactorsOptions::block_mode = CenterBlockMode::LOWEST |
Strategy for handling blocks in center_size_factors_blocked().
With the PER_BLOCK strategy, size factors are scaled separately within each block so that each block has a mean size factor of 1. The scaled size factors are identical to those obtained by calling center_size_factors() separately on the size factors of each block. This is useful for consistency with independent analyses of each block, as the centering would otherwise depend on the size factors in other blocks. However, any systematic differences in the size factors between blocks are lost, i.e., systematic differences in coverage between blocks will not be normalized.
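The PER_BLOCK behaviour described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name center_per_block_sketch and its signature are hypothetical, block assignments are assumed to be indices from 0 to num_blocks - 1, and all size factors are assumed to be valid (finite and positive).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of scran_norm.
// Scales 'size_factors' in place so that each block has a mean of 1.
// 'block[i]' is the block index of cell 'i', from 0 to 'num_blocks - 1'.
// Assumes that every size factor is finite and positive.
inline void center_per_block_sketch(
    std::vector<double>& size_factors,
    const std::vector<std::size_t>& block,
    std::size_t num_blocks)
{
    assert(size_factors.size() == block.size());

    // Accumulate the sum and number of size factors in each block.
    std::vector<double> sums(num_blocks, 0.0);
    std::vector<std::size_t> counts(num_blocks);
    for (std::size_t i = 0; i < size_factors.size(); ++i) {
        assert(block[i] < num_blocks);
        sums[block[i]] += size_factors[i];
        ++counts[block[i]];
    }

    // Divide each size factor by the mean of its own block.
    for (std::size_t i = 0; i < size_factors.size(); ++i) {
        const std::size_t b = block[i];
        const double mean = sums[b] / static_cast<double>(counts[b]);
        if (mean > 0) {
            size_factors[i] /= mean;
        }
    }
}
```

For size factors {1, 3, 10, 30} with blocks {0, 0, 1, 1}, this yields {0.5, 1.5, 0.5, 1.5}. Both blocks now have a mean of 1, and the ten-fold difference in coverage between them is no longer reflected in the size factors.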
With the LOWEST strategy, we compute the mean size factor for each block and divide all size factors by the lowest mean. In effect, all blocks are downscaled to match the coverage of the lowest-coverage block. This is useful for datasets with highly variable coverage between blocks, as it avoids egregious upscaling of low-coverage blocks. Strong upscaling inflates the scaled counts to the point where the pseudo-count in the log-transformation provides little shrinkage. This is problematic as it inflates differences between cells at log-values derived from low counts, increasing noise and overstating log-fold changes. Downscaling is safer as it allows the pseudo-count to shrink the log-differences between cells towards zero at low counts. This effectively sacrifices some information in the higher-coverage blocks so that they can be compared to the low-coverage blocks, which is preferable to exaggerating the informativeness of the latter for comparison to the former.
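The LOWEST behaviour described above can be sketched in the same way. Again, this is only an illustration under stated assumptions and not the library's implementation: center_lowest_sketch is a hypothetical name, block assignments are assumed to be indices from 0 to num_blocks - 1, and all size factors are assumed to be valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, NOT part of scran_norm.
// Divides all size factors by the lowest per-block mean.
// 'block[i]' is the block index of cell 'i', from 0 to 'num_blocks - 1'.
// Assumes that every size factor is finite and positive.
inline void center_lowest_sketch(
    std::vector<double>& size_factors,
    const std::vector<std::size_t>& block,
    std::size_t num_blocks)
{
    assert(size_factors.size() == block.size());

    // Accumulate the sum and number of size factors in each block.
    std::vector<double> sums(num_blocks, 0.0);
    std::vector<std::size_t> counts(num_blocks);
    for (std::size_t i = 0; i < size_factors.size(); ++i) {
        assert(block[i] < num_blocks);
        sums[block[i]] += size_factors[i];
        ++counts[block[i]];
    }

    // Find the lowest mean across all non-empty blocks.
    const double inf = std::numeric_limits<double>::infinity();
    double lowest = inf;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        if (counts[b] > 0) {
            lowest = std::min(lowest, sums[b] / static_cast<double>(counts[b]));
        }
    }

    // Do nothing if there are no cells or the lowest mean is not positive.
    if (lowest == inf || lowest <= 0) {
        return;
    }

    // A single common divisor preserves the differences between blocks.
    for (double& s : size_factors) {
        s /= lowest;
    }
}
```

For size factors {1, 3, 10, 30} with blocks {0, 0, 1, 1}, the block means are 2 and 20, so every size factor is divided by 2 to give {0.5, 1.5, 5, 15}. The low-coverage block has a mean of 1 while the high-coverage block has a mean of 10, so counts in the latter are scaled down during normalization rather than counts in the former being scaled up.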
bool scran_norm::CenterSizeFactorsOptions::ignore_invalid = true |
Whether to ignore invalid size factors when computing the mean size factor. Size factors that are infinite, NaN or non-positive may occur in datasets that have not been properly filtered to remove low-quality cells. If such values might be present, setting this flag to true will check for and ignore them during the mean calculations.
Note that this setting does not actually remove any of the invalid size factors. If these are present, users should call sanitize_size_factors() after centering. The diagnostics value in center_size_factors() and center_size_factors_blocked() can be used to determine whether such a call is necessary. (In general, sanitization should be performed after centering so that the replacement size factors do not interfere with the mean calculations.)
If users know that invalid size factors cannot be present, they can set this flag to false for greater efficiency.
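To make the effect of ignore_invalid concrete, the sketch below computes a mean that skips invalid size factors and counts how many were skipped. The names ValidMeanSketch, is_valid_size_factor_sketch and mean_of_valid_sketch are hypothetical and are not part of the library; in particular, the struct is not the diagnostics value mentioned above, whose contents are not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type, NOT the library's diagnostics value.
struct ValidMeanSketch {
    double mean;             // mean of the valid size factors, or 0 if there are none
    std::size_t num_invalid; // number of size factors that were skipped
};

// A size factor is treated as valid if it is finite and positive.
// This rejects infinity, NaN, zero and negative values.
inline bool is_valid_size_factor_sketch(double x) {
    return std::isfinite(x) && x > 0;
}

// Hypothetical helper, NOT part of scran_norm.
// Computes the mean over valid size factors only, leaving the input untouched.
inline ValidMeanSketch mean_of_valid_sketch(const std::vector<double>& size_factors) {
    double sum = 0;
    std::size_t num_valid = 0;
    std::size_t num_invalid = 0;

    for (double x : size_factors) {
        if (is_valid_size_factor_sketch(x)) {
            sum += x;
            ++num_valid;
        } else {
            ++num_invalid;
        }
    }

    ValidMeanSketch output;
    output.mean = (num_valid > 0 ? sum / static_cast<double>(num_valid) : 0.0);
    output.num_invalid = num_invalid;
    return output;
}
```

For the size factors {1, 3, NaN, -1, infinity, 0}, the mean of the valid values is 2 and four values are reported as invalid. The invalid values remain in the input, which is why a subsequent sanitization step is still required.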