scran_pca
Principal component analysis for single-cell data
As the name suggests, this repository implements functions to perform a PCA on the gene-by-cell expression matrix, returning low-dimensional coordinates for each cell that can be used for efficient downstream analyses such as clustering and visualization. The code was originally derived from the scran and batchelor R packages and has been factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix, the scran_pca::simple_pca() function will compute the PCA to obtain a low-dimensional representation of the cells:
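To make concrete what a PCA on a gene-by-cell matrix produces, here is a self-contained sketch that extracts only the first PC by plain power iteration. It is not scran_pca's API or implementation: the `Matrix` alias, the `FirstPc` struct and the `first_pc()` function are hypothetical names invented for this illustration, and the dense `std::vector` storage stands in for a tatami::Matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense storage: one inner vector per gene (row), one entry per cell.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical result type for this sketch only.
struct FirstPc {
    std::vector<double> rotation; // one loading per gene, unit length
    std::vector<double> scores;   // one coordinate per cell
    std::vector<double> center;   // per-gene mean that was subtracted
};

FirstPc first_pc(const Matrix& mat, int iterations = 100) {
    const std::size_t ngenes = mat.size();
    assert(ngenes > 0);
    const std::size_t ncells = mat[0].size();
    assert(ncells > 1);

    // Center each gene so that the PC captures variance, not the mean.
    FirstPc out;
    out.center.assign(ngenes, 0.0);
    Matrix centered = mat;
    for (std::size_t g = 0; g < ngenes; ++g) {
        assert(mat[g].size() == ncells);
        for (double x : mat[g]) out.center[g] += x;
        out.center[g] /= static_cast<double>(ncells);
        for (double& x : centered[g]) x -= out.center[g];
    }

    // Start from a non-uniform unit vector. Power iteration fails if the
    // start is exactly orthogonal to the leading direction; a real
    // implementation would use a random start.
    std::vector<double> v(ngenes), w(ngenes), s(ncells);
    double norm = 0.0;
    for (std::size_t g = 0; g < ngenes; ++g) {
        v[g] = static_cast<double>(g) + 1.0;
        norm += v[g] * v[g];
    }
    norm = std::sqrt(norm);
    for (double& x : v) x /= norm;

    // s = X^T v, i.e. the projection of every cell onto the direction v.
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            s[c] = 0.0;
            for (std::size_t g = 0; g < ngenes; ++g) s[c] += centered[g][c] * v[g];
        }
    };

    // Repeatedly apply X X^T to v without ever forming that product.
    for (int it = 0; it < iterations; ++it) {
        project();
        double wnorm = 0.0;
        for (std::size_t g = 0; g < ngenes; ++g) {
            w[g] = 0.0;
            for (std::size_t c = 0; c < ncells; ++c) w[g] += centered[g][c] * s[c];
            wnorm += w[g] * w[g];
        }
        wnorm = std::sqrt(wnorm);
        if (wnorm == 0.0) break; // no variance left to explain
        for (std::size_t g = 0; g < ngenes; ++g) v[g] = w[g] / wnorm;
    }

    project();
    out.rotation = v;
    out.scores = s;
    return out;
}
```

Calling `first_pc()` on a two-gene matrix where the second gene is always twice the first yields loadings in a 1:2 ratio and one score per cell. A fixed iteration count keeps the sketch short; a convergence check would be the natural refinement.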
Advanced users can fiddle with more of the options:
Check out the reference documentation for more details.
In the presence of multiple blocks, we can perform the PCA on the residuals after regressing out the blocking factor. This ensures that the inter-block differences do not contribute to the first few PCs, instead favoring the representation of intra-block variation.
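For a categorical blocking factor, regressing it out amounts to subtracting each block's mean from the cells in that block, gene by gene. The sketch below shows that step for a single gene; `center_within_blocks()` and its arguments are hypothetical names for illustration and are not part of scran_pca.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replace one gene's expression values with residuals after removing
// the per-block mean. 'block[c]' is the block ID of cell c and must lie
// in [0, nblocks). (Hypothetical helper, not a scran_pca function.)
void center_within_blocks(std::vector<double>& values,
                          const std::vector<int>& block,
                          int nblocks) {
    assert(values.size() == block.size());
    std::vector<double> sum(nblocks, 0.0);
    std::vector<int> count(nblocks, 0);

    // First pass: accumulate the total and the cell count of each block.
    for (std::size_t c = 0; c < values.size(); ++c) {
        assert(block[c] >= 0 && block[c] < nblocks);
        sum[block[c]] += values[c];
        ++count[block[c]];
    }

    // Second pass: subtract the block mean. Any block referenced here
    // has at least one cell, so the division is safe.
    for (std::size_t c = 0; c < values.size(); ++c) {
        const int b = block[c];
        values[c] -= sum[b] / count[b];
    }
}
```

After this step every block has a mean of zero for the gene, so a PCA on the residuals cannot pick up a constant shift between blocks; only the spread of cells within each block remains.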
The components derived from the residuals will only be free of inter-block differences under certain conditions (equal population composition with a consistent shift between blocks). If this is not the case, more sophisticated batch correction methods, such as MNN correction, are required. If those methods accept a low-dimensional representation for the cells as input, we can use scran_pca::blocked_pca() to obtain an appropriate matrix that focuses on intra-block variation without making assumptions about the inter-block differences:
If we have only a subset of features of interest, the obvious approach is to subset the input matrix like so:
This is fine for the PC scores but will only report the rotation matrix and centering/scaling vectors for the subset of features. If we want to, say, create a low-rank approximation of the entire input matrix, we should instead do:
This returns a rotation matrix that contains entries for all features, not just those in the subset of interest. We can then easily compute the low-rank approximation for any feature in our input matrix:
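The arithmetic behind such an approximation can be sketched as follows, assuming centering without scaling, a rotation matrix laid out as genes by PCs and a components matrix laid out as PCs by cells. The `Dense` alias and `low_rank_row()` are hypothetical names for this illustration, not scran_pca's types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense storage: one inner vector per row.
using Dense = std::vector<std::vector<double>>;

// Approximate the expression of one gene across all cells as
//   center[gene] + sum_k rotation[gene][k] * components[k][cell].
// Assumes rotation is genes x PCs and components is PCs x cells.
std::vector<double> low_rank_row(const Dense& rotation,
                                 const Dense& components,
                                 const std::vector<double>& center,
                                 std::size_t gene) {
    assert(gene < rotation.size() && gene < center.size());
    const std::vector<double>& loadings = rotation[gene];
    assert(loadings.size() == components.size());

    const std::size_t ncells = components.empty() ? 0 : components[0].size();

    // Start every cell at the gene's mean, then add each PC's contribution.
    std::vector<double> out(ncells, center[gene]);
    for (std::size_t k = 0; k < loadings.size(); ++k) {
        assert(components[k].size() == ncells);
        for (std::size_t c = 0; c < ncells; ++c) {
            out[c] += loadings[k] * components[k][c];
        }
    }
    return out;
}
```

Because only the gene's own row of the rotation matrix is needed, the approximation can be computed for any feature that has loadings, including features outside the subset that drove the PCA. If scaling was applied during the PCA, each loading would also need to be multiplied by that gene's scaling factor.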
FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to scran_pca to make the headers available during compilation:
find_package()
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_PCA_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt.