scran_pca
Principal component analysis for single-cell data
As the name suggests, this repository implements functions to perform a PCA on the gene-by-cell expression matrix, returning low-dimensional coordinates for each cell that can be used for efficient downstream analyses such as clustering and visualization. The code itself was originally derived from the scran and batchelor R packages, and has been factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix, the scran_pca::simple_pca() function will compute the PCA to obtain a low-dimensional representation of the cells.
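To make the idea concrete without relying on the library's API, here is a minimal sketch of what "computing the PCA" means for a gene-by-cell matrix. It centers each gene, finds the first principal axis by power iteration, and projects the cells onto it. The `DenseMatrix` alias, the `FirstPc` struct and the `first_pc()` function are hypothetical names made up for this illustration; they are not part of scran_pca or tatami, and a real implementation would use a proper truncated SVD to get several PCs at once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense layout for illustration: expr[g][c] is gene g in cell c.
using DenseMatrix = std::vector<std::vector<double>>;

struct FirstPc {
    std::vector<double> rotation; // one loading per gene
    std::vector<double> scores;   // one coordinate per cell
    double variance = 0;          // variance of the scores across cells
};

// Computes the first principal component by power iteration.
// This is an illustrative sketch, not the algorithm used by scran_pca.
inline FirstPc first_pc(DenseMatrix expr, int iterations = 100) {
    assert(!expr.empty() && expr.front().size() > 1);
    const std::size_t ngenes = expr.size();
    const std::size_t ncells = expr.front().size();

    // Center each gene so that its mean across cells is zero.
    for (auto& row : expr) {
        assert(row.size() == ncells);
        double mean = 0;
        for (double v : row) mean += v;
        mean /= static_cast<double>(ncells);
        for (double& v : row) v -= mean;
    }

    FirstPc out;
    out.scores.assign(ncells, 0.0);
    out.rotation.assign(ngenes, 0.0);

    // Arbitrary deterministic start; it only needs to be non-orthogonal
    // to the leading axis of variation.
    for (std::size_t g = 0; g < ngenes; ++g) {
        out.rotation[g] = 1.0 + 0.1 * static_cast<double>(g);
    }

    // scores = t(X) * rotation, i.e., one coordinate per cell.
    auto project = [&]() {
        for (std::size_t c = 0; c < ncells; ++c) {
            double s = 0;
            for (std::size_t g = 0; g < ngenes; ++g) s += expr[g][c] * out.rotation[g];
            out.scores[c] = s;
        }
    };

    for (int it = 0; it < iterations; ++it) {
        project();
        // next = X * scores, then rescale to unit length.
        std::vector<double> next(ngenes, 0.0);
        double norm = 0;
        for (std::size_t g = 0; g < ngenes; ++g) {
            for (std::size_t c = 0; c < ncells; ++c) next[g] += expr[g][c] * out.scores[c];
            norm += next[g] * next[g];
        }
        norm = std::sqrt(norm);
        if (norm == 0) break; // no variance in the data, nothing to find
        for (std::size_t g = 0; g < ngenes; ++g) out.rotation[g] = next[g] / norm;
    }

    project();
    for (double s : out.scores) out.variance += s * s;
    out.variance /= static_cast<double>(ncells - 1);
    return out;
}
```

Calling `first_pc()` on a matrix with two perfectly correlated genes returns a rotation vector along their shared direction and a variance equal to the total variance of the dataset, since a single PC captures everything in that case.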
Advanced users can fiddle with more of the options:
In the presence of multiple blocks, we can perform the PCA on the residuals after regressing out the blocking factor. This ensures that the inter-block differences do not contribute to the first few PCs, instead favoring the representation of intra-block variation.
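As a rough illustration of what "regressing out the blocking factor" amounts to when the block is the only covariate, the sketch below subtracts each gene's per-block mean from the cells in that block. The `block_residuals()` function and the `DenseMatrix` alias are hypothetical helpers written for this example and are not part of scran_pca; the library may compute the same quantity differently (e.g., without forming a dense residual matrix).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense layout for illustration: expr[g][c] is gene g in cell c.
using DenseMatrix = std::vector<std::vector<double>>;

// Returns the residuals after subtracting, for each gene, the mean of the
// block that each cell belongs to. block[c] is the block index of cell c,
// taking values in [0, nblocks).
inline DenseMatrix block_residuals(const DenseMatrix& expr, const std::vector<int>& block, int nblocks) {
    assert(nblocks > 0);
    std::vector<int> counts(static_cast<std::size_t>(nblocks), 0);
    for (int b : block) {
        assert(b >= 0 && b < nblocks);
        ++counts[static_cast<std::size_t>(b)];
    }

    DenseMatrix resid = expr;
    std::vector<double> sums(static_cast<std::size_t>(nblocks));
    for (auto& row : resid) {
        assert(row.size() == block.size());

        // Sum of this gene's expression within each block.
        for (double& s : sums) s = 0;
        for (std::size_t c = 0; c < row.size(); ++c) {
            sums[static_cast<std::size_t>(block[c])] += row[c];
        }

        // Subtract the block mean; every block seen here has a count >= 1.
        for (std::size_t c = 0; c < row.size(); ++c) {
            const std::size_t b = static_cast<std::size_t>(block[c]);
            row[c] -= sums[b] / static_cast<double>(counts[b]);
        }
    }
    return resid;
}
```

For a gene with values 1, 3 in one block and 11, 13 in another, the residuals are -1, 1, -1, 1: the shift of 10 between blocks disappears and only the intra-block variation is left for the PCA to pick up.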
The components derived from the residuals will only be free of inter-block differences under certain conditions (equal population composition with a consistent shift between blocks). If this is not the case, more sophisticated batch correction methods are required. If those methods accept a low-dimensional representation for the cells as input, we can use scran_pca::blocked_pca() to obtain an appropriate matrix that focuses on intra-block variation without making assumptions about the inter-block differences.
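One plausible way to get such a matrix, assumed here for illustration rather than taken from the text above, is to estimate the rotation vectors from the within-block residuals but then project the original expression values onto them. The axes then reflect intra-block variation, while any inter-block shift along those axes is kept in the coordinates for a downstream batch correction method to handle. The sketch below shows only the projection step for one rotation vector; `project_cells()` is a hypothetical helper, and centering each gene on its overall mean is a choice made for this example, not necessarily what scran_pca does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense layout for illustration: expr[g][c] is gene g in cell c.
using DenseMatrix = std::vector<std::vector<double>>;

// Projects every cell onto a single rotation vector (one loading per gene).
// Each gene is centered on its overall mean, NOT on per-block means, so any
// shift between blocks along this axis is preserved in the scores.
inline std::vector<double> project_cells(const DenseMatrix& expr, const std::vector<double>& rotation) {
    assert(expr.size() == rotation.size());
    const std::size_t ncells = expr.empty() ? 0 : expr.front().size();

    std::vector<double> scores(ncells, 0.0);
    for (std::size_t g = 0; g < expr.size(); ++g) {
        assert(expr[g].size() == ncells);

        // Overall mean of this gene across all cells, ignoring blocks.
        double mean = 0;
        for (double v : expr[g]) mean += v;
        if (ncells > 0) mean /= static_cast<double>(ncells);

        for (std::size_t c = 0; c < ncells; ++c) {
            scores[c] += rotation[g] * (expr[g][c] - mean);
        }
    }
    return scores;
}
```

With a rotation vector that loads only on a gene with values 0, 2 in one block and 10, 12 in another, the scores are -6, -4, 4, 6. The cells from the two blocks remain separated, in contrast to a PCA on the residuals, where the same gene would give -1, 1, -1, 1.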
Check out the reference documentation for more details.
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to scran_pca to make the headers available during compilation:
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_PCA_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.