Overview
This repository contains functions to model the per-gene variance in expression from a gene-by-cell matrix of (log-transformed) expression values. Genes with high variance are considered to be more interesting and are prioritized for further analyses. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.
Quick start
Given a tatami::Matrix of log-expression values for each gene in each cell, we can compute the per-gene variances and model the trend with respect to the mean across genes:
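#include "scran_variances/scran_variances.hpp"
std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
scran_variances::ModelGeneVariancesOptions opt;
auto res = scran_variances::model_gene_variances(*mat, opt);
res.means; // mean log-expression of each gene
res.variances; // variance of each gene
res.fitted; // fitted value of the trend for each gene
res.residuals; // residual from the trend for each gene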
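To make the first step concrete, here is a minimal standalone sketch of what "per-gene means and variances" refers to. It does not use tatami or scran_variances: the GeneStats struct and compute_gene_stats() function are hypothetical names, the input is assumed to be a dense row-major matrix with one row per gene, and the use of the sample variance (dividing by the number of cells minus one) is our assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for per-gene statistics; not part of scran_variances.
struct GeneStats {
    std::vector<double> means;
    std::vector<double> variances;
};

// 'values' is assumed to be a dense row-major matrix,
// with one row per gene and one column per cell.
GeneStats compute_gene_stats(const std::vector<double>& values, std::size_t ngenes, std::size_t ncells) {
    assert(values.size() == ngenes * ncells);
    assert(ncells > 1); // the sample variance needs at least two cells

    GeneStats out;
    out.means.resize(ngenes);
    out.variances.resize(ngenes);

    for (std::size_t g = 0; g < ngenes; ++g) {
        const double* row = values.data() + g * ncells;

        // First pass: mean across cells.
        double sum = 0;
        for (std::size_t c = 0; c < ncells; ++c) {
            sum += row[c];
        }
        const double mean = sum / static_cast<double>(ncells);

        // Second pass: sum of squared deviations from the mean.
        double ss = 0;
        for (std::size_t c = 0; c < ncells; ++c) {
            const double delta = row[c] - mean;
            ss += delta * delta;
        }

        out.means[g] = mean;
        out.variances[g] = ss / static_cast<double>(ncells - 1);
    }

    return out;
}
```

The two-pass calculation is chosen here for numerical stability on small examples; the library works on a tatami::Matrix and may compute these statistics differently.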
Typically, the residuals are used for feature selection, as these account for non-trivial mean-variance trends in transformed count data.
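scran_variances::ChooseHighlyVariableGenesOptions copt;
copt.top = 5000; // example value; set this to the number of genes to keep
auto chosen = scran_variances::choose_highly_variable_genes_index(
res.residuals.size(),
res.residuals.data(),
copt
);
auto submat = tatami::make_DelayedSubset(mat, std::move(chosen), true); // by_row = true, as genes are rows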
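For readers who want to see the idea behind the selection step, the following standalone sketch picks the indices of the largest residuals from a plain vector. It is an illustration only and not the library's implementation: top_residual_indices() is a hypothetical helper, and the tie-breaking rule (lower index first) and the sorted output order are our assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of scran_variances: returns the indices of the
// 'top' largest residuals, reported in increasing index order.
std::vector<std::size_t> top_residual_indices(const std::vector<double>& residuals, std::size_t top) {
    // NaNs would break the ordering used below, so we reject them up front.
    assert(std::none_of(residuals.begin(), residuals.end(), [](double r) { return r != r; }));

    std::vector<std::size_t> order(residuals.size());
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    top = std::min(top, order.size());

    // Move the 'top' largest residuals to the front; ties go to the lower index.
    auto middle = order.begin() + static_cast<std::ptrdiff_t>(top);
    std::partial_sort(order.begin(), middle, order.end(),
        [&](std::size_t a, std::size_t b) {
            if (residuals[a] != residuals[b]) {
                return residuals[a] > residuals[b];
            }
            return a < b;
        });

    // Keep only the selected genes and report them in index order.
    order.resize(top);
    std::sort(order.begin(), order.end());
    return order;
}
```

Returning indices rather than a boolean mask makes the result easy to pass to a row-subsetting step, which is how the chosen genes are used above.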
Users can also fit a trend directly to their own statistics.
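scran_variances::FitVarianceTrendOptions fopt; // adjust fopt.span or fopt.minimum_mean here if needed
auto fit = scran_variances::fit_variance_trend(n, means, variances, fopt); // n, means, variances are placeholders for your own statistics
fit.fitted;
fit.residuals;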
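To show how fitted values and residuals relate to user-supplied statistics, here is a deliberately crude standalone sketch. It swaps in an ordinary least-squares straight line for the smoother that the library fits, so it is not a substitute for fit_variance_trend(). The TrendFit struct and fit_linear_trend() function are hypothetical names, and defining the residual as the variance minus the fitted value is our assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; not part of scran_variances.
struct TrendFit {
    std::vector<double> fitted;
    std::vector<double> residuals;
};

// Fits variance = intercept + slope * mean by ordinary least squares.
// This is a simple stand-in for a proper smoother, for illustration only.
TrendFit fit_linear_trend(const std::vector<double>& means, const std::vector<double>& variances) {
    assert(means.size() == variances.size());
    const std::size_t n = means.size();
    assert(n >= 2);

    // Centre both variables.
    double mean_x = 0, mean_y = 0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += means[i];
        mean_y += variances[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    // Accumulate the sums of squares and cross-products.
    double sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = means[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (variances[i] - mean_y);
    }
    assert(sxx > 0); // the means must not all be identical

    const double slope = sxy / sxx;
    const double intercept = mean_y - slope * mean_x;

    TrendFit out;
    out.fitted.resize(n);
    out.residuals.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.fitted[i] = intercept + slope * means[i];
        out.residuals[i] = variances[i] - out.fitted[i]; // positive = more variable than the trend
    }
    return out;
}
```

Genes with large positive residuals are more variable than expected for their mean, which is why the residuals, rather than the raw variances, are the usual input to feature selection.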
Check out the reference documentation for more details.
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_variances
GIT_REPOSITORY https://github.com/libscran/scran_variances
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_variances)
Then you can link to scran_variances to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_variances)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_variances)
CMake with find_package()
find_package(libscran_scran_variances CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_variances)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_VARIANCES_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_VARIANCES_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
Manual
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.