scran_blocks
Blocking utilities for libscran

Overview

This repository contains utilities for blocked analyses for the other libscran repositories. Any uninteresting factor of variation (usually across the cells) can be used as a blocking factor, e.g., experimental batches, sample/patient of origin. In the presence of blocks, our general strategy is to perform the analysis within each block before combining our conclusions across blocks. This ensures that our results are not affected by the uninteresting differences between blocks. See the reference documentation for more details.

Parallel means

The parallel_means() function will compute the element-wise mean of any number of equi-length arrays. This is typically used to average statistics across blocks, where each array contains the statistics for a single block over all genes.

#include "scran_blocks/scran_blocks.hpp"
#include <vector>

std::vector<double> stat1 { 1.0, 2.0, 3.0, 4.0 };
std::vector<double> stat2 { 5.0, 6.0, 7.0, 8.0 };

// Contains { 3, 4, 5, 6 }.
auto averaged = scran_blocks::parallel_means(
    stat1.size(),
    std::vector<double*>{ stat1.data(), stat2.data() },
    /* skip_nan = */ false
);

If NaNs are present, they can be ignored:

#include <limits>

auto nan = std::numeric_limits<double>::quiet_NaN();
std::vector<double> stat1n { 1.0, nan, 3.0, nan };
std::vector<double> stat2n { 5.0, 6.0, nan, nan };

// Contains { 3, 6, 3, nan }.
auto averaged_nan = scran_blocks::parallel_means(
    stat1n.size(),
    std::vector<double*>{ stat1n.data(), stat2n.data() },
    /* skip_nan = */ true
);

We also support per-vector weights:

std::vector<double> stat1w { 1.0, 2.0, 3.0, 4.0 };
std::vector<double> stat2w { 5.0, 6.0, 7.0, 8.0 };
std::vector<double> weights { 1.0, 9.0 };

// Contains { 4.6, 5.6, 6.6, 7.6 }.
auto weighted = scran_blocks::parallel_weighted_means(
    stat1w.size(),
    std::vector<double*>{ stat1w.data(), stat2w.data() },
    weights.data(),
    /* skip_nan = */ false
);
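To make the arithmetic behind these results explicit, here is a minimal sketch of an element-wise weighted mean in plain C++. The function `weighted_means_sketch()` is a hypothetical stand-in, not the scran_blocks implementation; in particular, it assumes that skipping a NaN also removes the corresponding weight from the denominator, and that an element with no usable values is reported as NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for illustration only; not the scran_blocks implementation.
std::vector<double> weighted_means_sketch(
    std::size_t n,
    const std::vector<const double*>& in,
    const std::vector<double>& weights,
    bool skip_nan)
{
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        double numerator = 0, denominator = 0;
        for (std::size_t b = 0; b < in.size(); ++b) {
            const double val = in[b][i];
            if (skip_nan && std::isnan(val)) {
                continue; // assumption: drop the value and its weight together
            }
            numerator += weights[b] * val;
            denominator += weights[b];
        }
        // No usable values (or zero total weight) yields NaN.
        out[i] = (denominator > 0
            ? numerator / denominator
            : std::numeric_limits<double>::quiet_NaN());
    }
    return out;
}
```

For the first element of the weighted example, this gives (1 * 1.0 + 9 * 5.0) / (1 + 9) = 4.6. Setting all weights to 1 recovers the unweighted mean.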

Weighting blocks

When combining statistics across blocks, it may be desirable to weight each block by its size, favoring larger blocks that can emit more stable statistics. This is done using the compute_weights() function that calculates a weight for each block based on its size.

std::vector<size_t> block_sizes { 10, 100, 1000 };

auto weights = scran_blocks::compute_weights(
    block_sizes,
    scran_blocks::WeightPolicy::VARIABLE,
    []{
        scran_blocks::VariableWeightParameters vparams;
        vparams.upper_bound = 200;
        return vparams;
    }()
);

The above code chunk uses a variable block weight that increases linearly with block size up to the upper bound of 200, after which the weight is capped at 1. This VARIABLE policy penalizes very small blocks to ensure that their unstable statistics do not overly influence the average. Blocks are equally weighted once they are "large enough", ensuring that the average is not dominated by a single very large block.

Users can also change the policy to SIZE, where weights are equal to the block size; or EQUAL, where all blocks are equally weighted regardless of size (assuming they are non-empty). In such cases, the variable argument is ignored. Check out the reference documentation for more details.
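The three policies can be summarized with a short sketch. The names `PolicySketch` and `block_weight_sketch()` are hypothetical and exist only for illustration; the sketch also assumes that the variable weight rises linearly from 0 for an empty block to 1 at `upper_bound`, and that empty blocks always receive a weight of 0. The reference documentation remains the authority on the library's exact behavior.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical names for illustration; the library itself exposes
// scran_blocks::WeightPolicy and scran_blocks::VariableWeightParameters.
enum class PolicySketch { SIZE, EQUAL, VARIABLE };

double block_weight_sketch(std::size_t size, PolicySketch policy, double upper_bound) {
    if (size == 0) {
        return 0; // assumption: empty blocks never contribute
    }
    switch (policy) {
        case PolicySketch::SIZE:
            return static_cast<double>(size); // weight equals the block size
        case PolicySketch::EQUAL:
            return 1; // every non-empty block counts the same
        case PolicySketch::VARIABLE:
            // assumption: linear ramp from 0, capped at 1 once size >= upper_bound
            return std::min(1.0, static_cast<double>(size) / upper_bound);
    }
    return 0; // unreachable for valid policies
}
```

Under these assumptions, block sizes of 10, 100 and 1000 with an upper bound of 200 would receive variable weights of 0.05, 0.5 and 1 respectively.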

Parallel quantiles

Much like parallel_means(), we can compute the element-wise quantile for any number of equi-length arrays with parallel_quantiles(). This is again used to summarize statistics across blocks, where a quantile probability of 0.5 yields the element-wise median.

std::vector<double> stat1 { 1.0, 2.0, 3.0, 4.0 };
std::vector<double> stat2 { 5.0, 8.0, 7.0, 9.0 };
std::vector<double> stat3 { 2.0, 6.0, 4.0, 8.0 };

// Contains { 2, 6, 4, 8 }.
auto medians = scran_blocks::parallel_quantiles(
    stat1.size(),
    std::vector<double*>{ stat1.data(), stat2.data(), stat3.data() },
    /* quantile = */ 0.5,
    /* skip_nan = */ false
);

This function computes the type 7 quantile for each gene. Weighted quantiles are currently not supported, mostly because it's too hard to do correctly (at least, for a continuous quantile function).
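For readers unfamiliar with the type 7 definition, the sketch below shows one way to compute it: the sorted values are linearly interpolated at the fractional index `(n - 1) * p`. Both `quantile_type7()` and `quantiles_sketch()` are hypothetical helpers written for this illustration rather than scran_blocks functions, and they assume non-empty inputs without NaNs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Type 7 quantile of a non-empty vector; taken by value so it can be sorted.
// Hypothetical helper for illustration only.
double quantile_type7(std::vector<double> values, double prob) {
    std::sort(values.begin(), values.end());
    const double position = (values.size() - 1) * prob; // fractional 0-based index
    const std::size_t lower = static_cast<std::size_t>(std::floor(position));
    const std::size_t upper = std::min(lower + 1, values.size() - 1);
    const double fraction = position - lower;
    return values[lower] + fraction * (values[upper] - values[lower]);
}

// Element-wise quantile across several equal-length arrays.
std::vector<double> quantiles_sketch(std::size_t n, const std::vector<const double*>& in, double prob) {
    std::vector<double> out(n), buffer(in.size());
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t b = 0; b < in.size(); ++b) {
            buffer[b] = in[b][i]; // gather the i-th statistic from every block
        }
        out[i] = quantile_type7(buffer, prob);
    }
    return out;
}
```

With three blocks and a probability of 0.5, the fractional index is exactly 1, so the result is the middle value of each sorted triplet. A probability of 0.25 gives an index of 0.5, which lies halfway between the two smallest values.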

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
scran_blocks
GIT_REPOSITORY https://github.com/libscran/scran_blocks
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_blocks)

Then you can link to scran_blocks to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_blocks)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_blocks)

CMake with find_package()

find_package(libscran_scran_blocks CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_blocks)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_BLOCKS_TESTS=OFF
cmake --build . --target install

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I.