scran_pca
Principal component analysis for single-cell data
Loading...
Searching...
No Matches
scran_pca Namespace Reference

Principal component analysis on single-cell data. More...

Classes

struct  BlockedPcaOptions
 Options for blocked_pca(). More...
 
struct  BlockedPcaResults
 Results of blocked_pca(). More...
 
struct  SimplePcaOptions
 Options for simple_pca(). More...
 
struct  SimplePcaResults
 Results of simple_pca(). More...
 

Typedefs

typedef SimplePcaOptions SubsetPcaOptions
 
template<typename EigenMatrix_ , class EigenVector_ >
using SubsetPcaResults = SimplePcaResults<EigenMatrix_, EigenVector_>
 
typedef BlockedPcaOptions SubsetPcaBlockedOptions
 
template<typename EigenMatrix_ , class EigenVector_ >
using SubsetPcaBlockedResults = BlockedPcaResults<EigenMatrix_, EigenVector_>
 

Functions

template<typename Value_ , typename Index_ , typename EigenMatrix_ , class EigenVector_ >
void simple_pca (const tatami::Matrix< Value_, Index_ > &mat, const SimplePcaOptions &options, SimplePcaResults< EigenMatrix_, EigenVector_ > &output)
 
template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ >
SimplePcaResults< EigenMatrix_, EigenVector_ > simple_pca (const tatami::Matrix< Value_, Index_ > &mat, const SimplePcaOptions &options)
 
template<typename Value_ , typename Index_ , typename Block_ , typename EigenMatrix_ , class EigenVector_ >
void blocked_pca (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *block, const BlockedPcaOptions &options, BlockedPcaResults< EigenMatrix_, EigenVector_ > &output)
 
template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , typename Block_ >
BlockedPcaResults< EigenMatrix_, EigenVector_ > blocked_pca (const tatami::Matrix< Value_, Index_ > &mat, const Block_ *block, const BlockedPcaOptions &options)
 
template<typename Value_ , typename Index_ , typename SubsetVector_ , typename EigenMatrix_ , class EigenVector_ >
void subset_pca (const tatami::Matrix< Value_, Index_ > &mat, const SubsetVector_ &subset, const SubsetPcaOptions &options, SubsetPcaResults< EigenMatrix_, EigenVector_ > &output)
 
template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , class SubsetVector_ >
SubsetPcaResults< EigenMatrix_, EigenVector_ > subset_pca (const tatami::Matrix< Value_, Index_ > &mat, const SubsetVector_ &subset, const SubsetPcaOptions &options)
 
template<typename Value_ , typename Index_ , class SubsetVector_ , typename Block_ , typename EigenMatrix_ , class EigenVector_ >
void subset_pca_blocked (const tatami::Matrix< Value_, Index_ > &mat, const SubsetVector_ &subset, const Block_ *block, const SubsetPcaBlockedOptions &options, SubsetPcaBlockedResults< EigenMatrix_, EigenVector_ > &output)
 
template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , class SubsetVector_ , typename Block_ >
SubsetPcaBlockedResults< EigenMatrix_, EigenVector_ > subset_pca_blocked (const tatami::Matrix< Value_, Index_ > &mat, const SubsetVector_ &subset, const Block_ *block, const SubsetPcaBlockedOptions &options)
 

Detailed Description

Principal component analysis on single-cell data.

Typedef Documentation

◆ SubsetPcaBlockedOptions

Options for subset_pca_blocked(). These are identical to the options for blocked_pca().

◆ SubsetPcaBlockedResults

template<typename EigenMatrix_ , class EigenVector_ >
using scran_pca::SubsetPcaBlockedResults = BlockedPcaResults<EigenMatrix_, EigenVector_>

Results of subset_pca_blocked().

These are mostly the same as the results for blocked_pca(). The only difference is that the number of PCs is the smaller of BlockedPcaOptions::number and min(subset.size(), NC) - 1, where subset is the subset vector and NC is the number of columns of the input matrix.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.

◆ SubsetPcaOptions

Options for subset_pca(). These are identical to those for simple_pca().

◆ SubsetPcaResults

template<typename EigenMatrix_ , class EigenVector_ >
using scran_pca::SubsetPcaResults = SimplePcaResults<EigenMatrix_, EigenVector_>

Results of subset_pca().

These are mostly the same as the results for simple_pca(). The only difference is that the number of PCs is the smaller of SimplePcaOptions::number and min(subset.size(), NC) - 1, where subset is the subset vector and NC is the number of columns of the input matrix.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.

Function Documentation

◆ blocked_pca() [1/2]

template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , typename Block_ >
BlockedPcaResults< EigenMatrix_, EigenVector_ > scran_pca::blocked_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const Block_ * block,
const BlockedPcaOptions & options )

Overload of blocked_pca() that allocates memory for the output.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Value_Type of the matrix data.
Index_Integer type for the indices.
Block_Integer type for the blocking factor.
Parameters
[in]matInput matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell. Each assignment should be an integer in \([0, N)\) where \(N\) is the number of blocks.
optionsFurther options.
Returns
Results of the PCA on the residuals.

◆ blocked_pca() [2/2]

template<typename Value_ , typename Index_ , typename Block_ , typename EigenMatrix_ , class EigenVector_ >
void scran_pca::blocked_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const Block_ * block,
const BlockedPcaOptions & options,
BlockedPcaResults< EigenMatrix_, EigenVector_ > & output )

Principal components analysis on residuals, after regressing out a blocking factor across cells.

As discussed in simple_pca(), we extract the top PCs from a single-cell dataset for downstream cell-based procedures like clustering. In the presence of a blocking factor (e.g., batches, samples), we want to ensure that the PCA is not driven by uninteresting differences between blocks of cells. To achieve this, blocked_pca() centers the expression of each gene in each blocking level and uses the residuals for PCA. This means that the gene-gene covariance matrix will only contain variation within each batch, ensuring that the top rotation vectors/principal components capture biological heterogeneity instead of inter-block differences.

The BlockedPcaOptions::components_from_residuals option determines exactly how the PC scores are calculated:

  • If true (the default), the PC scores are computed from the matrix of residuals. This yields a low-dimensional space where inter-block differences have been removed, assuming that all blocks have the same subpopulation composition and the inter-block differences are consistent for all cell subpopulations. Under these assumptions, we could use these components for downstream analysis without any concern for block-wise effects.
  • If false, the rotation vectors are first computed from the matrix of residuals. To obtain PC scores, each cell is then projected onto the associated subspace using its original expression values. This approach ensures that inter-block differences do not contribute to the PCA but does not attempt to explicitly remove them.

In complex datasets, the assumptions mentioned above for true do not hold, and more sophisticated batch correction methods like MNN correction are required. Some of these methods accept a low-dimensional embedding of cells that can be created as described above with BlockedPcaOptions::components_from_residuals = false.

blocked_pca() will weight the contribution from blocks of cells so that each block contributes more or less equally to the PCA. This ensures that the definition of the axes of maximum variance are not dominated by the largest block, potentially masking interesting variation in the smaller blocks. blocked_pca() scales the expression values for each block so that each "sufficiently large" block contributes equally to the gene-gene covariance matrix and thus the rotation vectors. (See BlockedPcaOptions::block_weight_policy for the choice of weighting scheme.) The vector of residuals for each cell - or the original expression values, if BlockedPcaOptions::components_from_residuals = false - is then projected to the subspace defined by these rotation vectors to obtain that cell's PC scores.

Internally, blocked_pca() defers the residual calculation until the matrix multiplication steps within IRLBA. This yields the same results as the naive calculation of residuals but is much faster as it can take advantage of efficient sparse operations.

Template Parameters
Value_Type of the matrix data.
Index_Integer type for the indices.
Block_Integer type for the blocking factor.
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Parameters
[in]matInput matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell. Each assignment should be an integer in \([0, N)\) where \(N\) is the number of blocks.
optionsFurther options.
[out]outputOn output, the results of the PCA on the residuals. This can be re-used across multiple calls to blocked_pca().

◆ simple_pca() [1/2]

template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ >
SimplePcaResults< EigenMatrix_, EigenVector_ > scran_pca::simple_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const SimplePcaOptions & options )

Overload of simple_pca() that allocates memory for the output.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Value_Type of the matrix data.
Index_Integer type for the indices.
Parameters
[in]matThe input matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
optionsFurther options.
Returns
Results of the PCA.

◆ simple_pca() [2/2]

template<typename Value_ , typename Index_ , typename EigenMatrix_ , class EigenVector_ >
void scran_pca::simple_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const SimplePcaOptions & options,
SimplePcaResults< EigenMatrix_, EigenVector_ > & output )

Principal components analysis (PCA) for compression and denoising of single-cell expression data.

We assume that most variation in the dataset is driven by biological differences between subpopulations that drive coordinated changes across multiple genes in the same pathways. In contrast, technical noise is random and not synchronized across any one axis in the high-dimensional space. This suggests that the earlier principal components (PCs) should be enriched for biological heterogeneity while the later PCs capture random noise.

Our aim is to reduce the size of the data and eliminate noise by only using the earlier PCs for downstream cell-based analyses (e.g., neighbor detection, clustering). Most practitioners will keep the first 10-50 PCs, though the exact choice is fairly arbitrary - see SimplePcaOptions::number to specify the number of PCs. As we are only interested in the top PCs, we can use approximate algorithms for faster computation, in particular IRLBA.

Template Parameters
Value_Type of the matrix data.
Index_Integer type for the indices.
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Parameters
[in]matThe input matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
optionsFurther options.
[out]outputOn output, the results of the PCA on mat. This can be re-used across multiple calls to simple_pca().

◆ subset_pca() [1/2]

template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , class SubsetVector_ >
SubsetPcaResults< EigenMatrix_, EigenVector_ > scran_pca::subset_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const SubsetVector_ & subset,
const SubsetPcaOptions & options )

Overload of subset_pca() that allocates memory for the output.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Value_Type of the matrix data.
Index_Integer type for the indices.
SubsetVector_Container of the row indices. Should support [], size() and copy construction.
Parameters
[in]matThe input matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
subsetVector of indices for rows to be used in the PCA. This should be sorted and unique.
optionsFurther options.
Returns
Results of the subsetted PCA.

◆ subset_pca() [2/2]

template<typename Value_ , typename Index_ , typename SubsetVector_ , typename EigenMatrix_ , class EigenVector_ >
void scran_pca::subset_pca ( const tatami::Matrix< Value_, Index_ > & mat,
const SubsetVector_ & subset,
const SubsetPcaOptions & options,
SubsetPcaResults< EigenMatrix_, EigenVector_ > & output )

Principal components analysis on a subset of features.

This function performs PCA on a subset of features (e.g., from highly variable genes) in the input matrix. The results are almost equivalent to subsetting the input matrix and then running simple_pca(). However, subset_pca() will also populate the rotation matrix, centering vector and scaling vector for features outside of the subset. For the rotation matrix, this is done by projecting the unused features into the low-dimensional space defined by the PCs. The goal is to allow callers to create a low-rank approximation of the entire input matrix, even if only a subset of the features are relevant to the PCA.

Template Parameters
Value_Type of the matrix data.
Index_Integer type for the indices.
SubsetVector_Container of the row indices. Should support [], size() and copy construction.
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Parameters
[in]matThe input matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
subsetVector of indices for rows to be used in the PCA. This should be sorted and unique.
optionsFurther options.
[out]outputOn output, the results of the PCA on mat. This can be re-used across multiple calls to subset_pca().

◆ subset_pca_blocked() [1/2]

template<typename EigenMatrix_ = Eigen::MatrixXd, class EigenVector_ = Eigen::VectorXd, typename Value_ , typename Index_ , class SubsetVector_ , typename Block_ >
SubsetPcaBlockedResults< EigenMatrix_, EigenVector_ > scran_pca::subset_pca_blocked ( const tatami::Matrix< Value_, Index_ > & mat,
const SubsetVector_ & subset,
const Block_ * block,
const SubsetPcaBlockedOptions & options )

Overload of subset_pca_blocked() that allocates memory for the output.

Template Parameters
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Value_Type of the matrix data.
Index_Integer type for the indices.
SubsetVector_Container of the row indices. Should support [], size() and copy construction.
Block_Integer type for the blocking factor.
Parameters
[in]matInput matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
subsetVector of indices for rows to be used in the PCA. This should be sorted and unique.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell. Each assignment should be an integer in \([0, N)\) where \(N\) is the number of blocks.
optionsFurther options.
Returns
Results of the blocked and subsetted PCA.

◆ subset_pca_blocked() [2/2]

template<typename Value_ , typename Index_ , class SubsetVector_ , typename Block_ , typename EigenMatrix_ , class EigenVector_ >
void scran_pca::subset_pca_blocked ( const tatami::Matrix< Value_, Index_ > & mat,
const SubsetVector_ & subset,
const Block_ * block,
const SubsetPcaBlockedOptions & options,
SubsetPcaBlockedResults< EigenMatrix_, EigenVector_ > & output )

Principal components analysis on a subset of features in the input matrix, with blocking.

This function performs PCA on a subset of interesting features (e.g., from highly variable genes) while accounting for a blocking factor. The results are almost equivalent to subsetting the input matrix and then running blocked_pca(). However, subset_pca_blocked() will also populate the rotation matrix, centering matrix and scaling vector for features outside of the subset. For the rotation matrix, this is done by projecting the unused features into the low-dimensional space defined by the top PCs. The goal is to allow callers to create a low-rank approximation of the entire input matrix, even if only a subset of the features are relevant to the PCA.

Template Parameters
Value_Type of the matrix data.
Index_Integer type for the indices.
SubsetVector_Container of the row indices. Should support [], size() and copy construction.
EigenMatrix_A floating-point column-major Eigen::Matrix class.
EigenVector_A floating-point Eigen::Vector class.
Parameters
[in]matThe input matrix. Columns should contain cells while rows should contain genes. Matrix entries are typically log-expression values.
subsetVector of indices for rows to be used in the PCA. This should be sorted and unique.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell. Each assignment should be an integer in \([0, N)\) where \(N\) is the number of blocks.
optionsFurther options.
[out]outputOn output, the results of the PCA on mat. This can be re-used across multiple calls to subset_pca_blocked().