scran_pca
Principal component analysis for single-cell data
Principal components analysis, duh

Overview

As the name suggests, this repository implements functions to perform a PCA on the gene-by-cell expression matrix, returning low-dimensional coordinates for each cell that can be used for efficient downstream analyses, e.g., clustering and visualization. The code was originally derived from the scran and batchelor R packages and factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix, the scran_pca::simple_pca() function will compute the PCA to obtain a low-dimensional representation of the cells:

#include "scran_pca/scran_pca.hpp"

const tatami::Matrix<double, int>& mat = some_data_source();

// Take the top 20 PCs:
scran_pca::SimplePcaOptions opt;
opt.rank = 20;
auto res = scran_pca::simple_pca(mat, opt);

res.components; // rows are PCs, columns are cells.
res.rotation; // rows are genes, columns correspond to PCs.
res.variance_explained; // one per PC, in decreasing order.
res.total_variance; // total variance in the dataset.
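
The variance_explained and total_variance fields are often combined into the proportion of variance captured by each PC. The following is a minimal sketch using only the standard library; proportion_explained() and num_pcs_for() are hypothetical helpers written for illustration and are not part of scran_pca. They take plain std::vector inputs, so the Eigen vector in the results would need to be copied over first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: convert per-PC variances into proportions of the total.
// The arguments mirror the 'variance_explained' and 'total_variance' result fields.
std::vector<double> proportion_explained(const std::vector<double>& variance_explained, double total_variance) {
    std::vector<double> output;
    output.reserve(variance_explained.size());
    for (double v : variance_explained) {
        // Guard against a dataset with no variance at all.
        output.push_back(total_variance > 0 ? v / total_variance : 0.0);
    }
    return output;
}

// Hypothetical helper: smallest number of leading PCs whose cumulative
// proportion reaches 'threshold'. Returns the number of available PCs
// if the threshold is never reached.
std::size_t num_pcs_for(const std::vector<double>& proportions, double threshold) {
    double cumulative = 0;
    for (std::size_t i = 0; i < proportions.size(); ++i) {
        cumulative += proportions[i];
        if (cumulative >= threshold) {
            return i + 1;
        }
    }
    return proportions.size();
}
```

For example, variances of 4, 2 and 1 against a total variance of 8 give proportions of 0.5, 0.25 and 0.125, so two PCs are enough to reach a 70% threshold.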

Advanced users can fiddle with more of the options:

opt.scale = true;
opt.num_threads = 4;
opt.realize_matrix = false;
auto res2 = scran_pca::simple_pca(mat, opt);
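
To give a rough idea of what the scale option changes, the sketch below centers one gene's expression values and optionally divides them by their standard deviation. This assumes that scale = true standardizes each gene to unit variance before the PCA and that zero-variance genes are left unscaled; standardize_gene() is a hypothetical helper for illustration only, not the library's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: center one gene's values across cells and,
// if 'scale' is true, divide by the sample standard deviation.
std::vector<double> standardize_gene(std::vector<double> values, bool scale) {
    const std::size_t n = values.size();
    if (n < 2) {
        // Variance is undefined for fewer than two cells; return zeros.
        return std::vector<double>(n, 0.0);
    }

    double mean = 0;
    for (double v : values) {
        mean += v;
    }
    mean /= n;

    double variance = 0;
    for (double& v : values) {
        v -= mean; // center in place.
        variance += v * v;
    }
    variance /= (n - 1);

    // Assumption: genes with zero variance are centered but not scaled.
    if (scale && variance > 0) {
        const double sd = std::sqrt(variance);
        for (double& v : values) {
            v /= sd;
        }
    }
    return values;
}
```

With scaling, a gene with values 2, 4, 6 and another with values 10, 20, 30 both become -1, 0, 1, so highly variable genes no longer dominate the first few PCs by magnitude alone.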

In the presence of multiple blocks, we can perform the PCA on the residuals after regressing out the blocking factor. This ensures that the inter-block differences do not contribute to the first few PCs, instead favoring the representation of intra-block variation.

std::vector<int> blocks = some_blocks();

scran_pca::BlockedPcaOptions bopt;
bopt.rank = 10; // taking the top 10 PCs this time.
auto bres = scran_pca::blocked_pca(mat, blocks.data(), bopt);

bres.components; // rows are PCs, columns are cells.
bres.center; // rows are blocks, columns are genes.
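
To make the idea of regressing out the blocking factor more concrete, the sketch below subtracts each block's mean from every gene of a small dense matrix. This is only an illustration of the concept: block_residuals() is a hypothetical helper, the nested std::vector layout is chosen for readability, and block IDs are assumed to be integers in [0, num_blocks). The library's own implementation may work quite differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: compute residuals of a gene-by-cell matrix after
// removing per-block means. 'matrix[g][c]' is the expression of gene g in
// cell c, and 'block[c]' is the block ID of cell c, assumed to lie in
// [0, num_blocks).
std::vector<std::vector<double> > block_residuals(
    const std::vector<std::vector<double> >& matrix,
    const std::vector<int>& block,
    int num_blocks)
{
    std::vector<std::vector<double> > residuals = matrix;
    for (auto& gene : residuals) {
        // Accumulate the sum and cell count for each block.
        std::vector<double> sums(num_blocks, 0.0);
        std::vector<int> counts(num_blocks, 0);
        for (std::size_t c = 0; c < gene.size(); ++c) {
            sums[block[c]] += gene[c];
            ++counts[block[c]];
        }
        // Subtract the block mean from each cell. Any block looked up here
        // contains at least the current cell, so its count is positive.
        for (std::size_t c = 0; c < gene.size(); ++c) {
            gene[c] -= sums[block[c]] / counts[block[c]];
        }
    }
    return residuals;
}
```

For a gene with values 1, 3, 10, 14 across two blocks of two cells each, the block means are 2 and 12, so the residuals are -1, 1, -2, 2. The large shift between blocks is gone and only the intra-block variation remains.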

The components derived from the residuals will only be free of inter-block differences under certain conditions (equal population composition with a consistent shift between blocks). If this is not the case, more sophisticated batch correction methods are required. If those methods accept a low-dimensional representation for the cells as input, we can use scran_pca::blocked_pca() to obtain an appropriate matrix that focuses on intra-block variation without making assumptions about the inter-block differences:

bopt.components_from_residuals = false; // do not derive the components from the residuals.
auto bres2 = scran_pca::blocked_pca(mat, blocks.data(), bopt);

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
scran_pca
GIT_REPOSITORY https://github.com/libscran/scran_pca
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_pca)

Then you can link to scran_pca to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_pca)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_pca)

CMake with find_package()

find_package(libscran_scran_pca CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_pca)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_PCA_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_PCA_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.