scran_variances
Model per-gene variance in expression
Loading...
Searching...
No Matches
Model per-gene variance in expression

Unit tests Documentation Codecov

Overview

This repository contains functions to model the per-gene expression from a gene-by-cell matrix of (log-transformed) expression values. Genes with high variance are considered to be more interesting and are prioritized for further analyses. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix of log-expression values for each gene in each cell, we can compute the per-gene variances and model the trend with respect to the mean across genes:

std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
res.means; // vector of means across genes.
res.variances; // vector of variances across genes.
res.fitted; // vector of fitted values of the mean-variance trend for each gene.
res.residuals; // vector of residuals from the trend.
void model_gene_variances(const tatami::Matrix< Value_, Index_ > &mat, ModelGeneVariancesBuffers< Stat_ > buffers, const ModelGeneVariancesOptions &options)
Definition model_gene_variances.hpp:542
Variance modelling for single-cell expression data.
Options for model_gene_variances() and friends.
Definition model_gene_variances.hpp:24

Typically, the residuals are used for feature selection, as these account for non-trivial mean-variance trends in transformed count data.

copt.top = 5000;
res.residuals.size(),
res.residuals.data(),
copt
);
// Create the HVG submatrix for downstream analysis.
auto hvg_subset = tatami::make_DelayedSubset(mat, chosen, /* by_row = */ true);
std::vector< Index_ > choose_highly_variable_genes_index(Index_ n, const Stat_ *statistic, const ChooseHighlyVariableGenesOptions &options)
Definition choose_highly_variable_genes.hpp:247
std::shared_ptr< Matrix< Value_, Index_ > > make_DelayedSubset(std::shared_ptr< const Matrix< Value_, Index_ > > matrix, SubsetStorage_ subset, bool by_row)
Options for choose_highly_variable_genes().
Definition choose_highly_variable_genes.hpp:19
size_t top
Definition choose_highly_variable_genes.hpp:25

Users can also fit a trend directly to their own statistics.

fopt.span = 0.5;
fopt.minimum_mean = 1;
auto fit = scran_variances::fit_variance_trend(100, means, variances, fopt);
fit.fitted; // fitted values for all genes.
fit.residuals; // residuals values for all genes.
void fit_variance_trend(size_t n, const Float_ *mean, const Float_ *variance, Float_ *fitted, Float_ *residuals, FitVarianceTrendWorkspace< Float_ > &workspace, const FitVarianceTrendOptions &options)
Definition fit_variance_trend.hpp:120
Options for fit_variance_trend().
Definition fit_variance_trend.hpp:19
double span
Definition fit_variance_trend.hpp:43
double minimum_mean
Definition fit_variance_trend.hpp:25

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
scran_variances
GIT_REPOSITORY https://github.com/libscran/scran_variances
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_variances)

Then you can link to scran_variances to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_variances)
# For libaries
target_link_libraries(mylib INTERFACE libscran::scran_variances)

CMake with find_package()

find_package(libscran_scran_variances CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_variances)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_VARIANCES_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_VARIANCES_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.