template<typename Index_, typename Distance_>
std::pair<Distance_, Distance_> compute_distance(const Index_ num_cells, Distance_ *const distances)

template<typename Index_, typename Input_, typename Distance_>
std::pair<Distance_, Distance_> compute_distance(const knncolle::Prebuilt<Index_, Input_, Distance_> &prebuilt, const Options &options)

template<typename Index_, typename Input_, typename Distance_, class Matrix_ = knncolle::Matrix<Index_, Input_>>
std::pair<Distance_, Distance_> compute_distance(const std::size_t num_dim, const Index_ num_cells, const Input_ *const data, const knncolle::Builder<Index_, Input_, Distance_, Matrix_> &builder, const Options &options)

template<typename Distance_>
Distance_ compute_scale(const std::pair<Distance_, Distance_> &ref, const std::pair<Distance_, Distance_> &target)

template<typename Distance_>
std::vector<Distance_> compute_scale(const std::vector<std::pair<Distance_, Distance_>> &distances)

template<typename Index_, typename Input_, typename Scale_, typename Output_>
void combine_scaled_embeddings(const std::vector<std::size_t> &num_dims, const Index_ num_cells, const std::vector<Input_ *> &embeddings, const std::vector<Scale_> &scaling, Output_ *const output)

Scale multi-modal embeddings to adjust for differences in variance.
template<typename Distance_>
Distance_ mumosa::compute_scale(const std::pair<Distance_, Distance_> &ref, const std::pair<Distance_, Distance_> &target)
Compute the scaling factor to be applied to an embedding of a "target" modality relative to a "reference" modality. The aim is to scale the target so that the within-population variance is equal to that of the reference, to ensure that high noise in one modality does not drown out interesting biology in another modality in downstream analyses.
Advanced users may want to scale the target so that its variance is some \(S\)-fold of the reference, e.g., to give more weight to more important modalities. This can be achieved by multiplying the returned factor by \(\sqrt{S}\) prior to the actual scaling.
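The \(\sqrt{S}\) adjustment follows because multiplying every coordinate by a constant multiplies the variance by the square of that constant. Writing \(c\) for the returned factor and \(\sigma^2\) for the within-population variance, and assuming \(c\) equalizes the variances as intended:

```latex
c^2 \sigma^2_{\text{target}} = \sigma^2_{\text{ref}}
\quad\Longrightarrow\quad
\bigl(c\sqrt{S}\bigr)^2 \sigma^2_{\text{target}} = S\,c^2\sigma^2_{\text{target}} = S\,\sigma^2_{\text{ref}}
```

For instance, giving the target twice the variance of the reference (\(S = 2\)) means multiplying the returned factor by \(\sqrt{2} \approx 1.414\).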
This approach assumes that the median distance to the Options::num_neighbors-th nearest neighbor is approximately proportional to the square root of the within-population variance (i.e., the within-population standard deviation), since distances scale linearly with the coordinates. The scaling factor is defined as the ratio of the median distance in the reference to that in the target. If either median distance is zero, this function instead returns the ratio of the RMSDs as a fallback.
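To make the two summary statistics concrete, here is a minimal sketch that computes each cell's distance to its k-th nearest neighbor and summarizes those distances as a (median, RMSD) pair. Everything here is an assumption for illustration: `kth_neighbor_distances()` and `summarize_distances()` are hypothetical helpers, not mumosa functions; the library's own compute_distance() delegates neighbor search to knncolle, whereas this sketch uses brute-force Euclidean search; and taking the RMSD as the root mean of the squared k-th neighbor distances, plus averaging the two middle values for an even count, are guesses at conventions the text does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mumosa): brute-force Euclidean distance from
// each cell to its k-th nearest neighbor, excluding the cell itself.
// 'data' is column-major: num_dim rows (dimensions) by num_cells columns (cells).
std::vector<double> kth_neighbor_distances(std::size_t num_dim, std::size_t num_cells,
                                           const double* data, std::size_t k) {
    assert(k >= 1 && k < num_cells);
    std::vector<double> out(num_cells);
    std::vector<double> dists;
    dists.reserve(num_cells - 1);

    for (std::size_t i = 0; i < num_cells; ++i) {
        const double* xi = data + i * num_dim;  // coordinates of cell i
        dists.clear();
        for (std::size_t j = 0; j < num_cells; ++j) {
            if (j == i) {
                continue;
            }
            const double* xj = data + j * num_dim;
            double sq = 0;
            for (std::size_t d = 0; d < num_dim; ++d) {
                const double delta = xi[d] - xj[d];
                sq += delta * delta;
            }
            dists.push_back(std::sqrt(sq));
        }
        // Move the k-th smallest distance into position k - 1.
        auto kth = dists.begin() + static_cast<std::ptrdiff_t>(k - 1);
        std::nth_element(dists.begin(), kth, dists.end());
        out[i] = *kth;
    }
    return out;
}

// Hypothetical helper: summarize per-cell distances as (median, RMSD), matching
// the pair layout documented for compute_distance().
std::pair<double, double> summarize_distances(std::vector<double> dists) {
    assert(!dists.empty());
    const std::size_t n = dists.size();

    double sum_sq = 0;
    for (double d : dists) {
        sum_sq += d * d;
    }
    const double rmsd = std::sqrt(sum_sq / static_cast<double>(n));

    std::sort(dists.begin(), dists.end());
    const double median = (n % 2 == 1) ? dists[n / 2]
                                       : (dists[n / 2 - 1] + dists[n / 2]) / 2;
    return {median, rmsd};
}
```

Brute-force search is quadratic in the number of cells, which is why a real implementation would use an index such as those provided by knncolle; the sketch only aims to show what the two numbers in the pair represent.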
- Template Parameters
  - Distance_: Floating-point type of the distances.
- Parameters
  - ref: Results of compute_distance() for the embedding of the reference modality. The first value contains the median distance, while the second value contains the root-mean-squared distance (RMSD).
  - target: Results of compute_distance() for the embedding of the target modality.
- Returns
  - A scaling factor by which to multiply the embedding coordinates of the target modality.
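The selection rule can be sketched as a standalone function. This is a minimal re-implementation under stated assumptions, not mumosa's own code: `scale_factor()` and `weighted_scale_factor()` are hypothetical names, `double` stands in for the `Distance_` template parameter, and returning 0 when the target RMSD is also zero is an assumption, since the text above does not say what happens in that degenerate case.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical re-implementation of the documented rule; not mumosa's own code.
// Each pair holds (median distance, RMSD), as returned by compute_distance().
double scale_factor(const std::pair<double, double>& ref,
                    const std::pair<double, double>& target) {
    if (ref.first != 0 && target.first != 0) {
        // Usual case: ratio of median distances, reference over target.
        return ref.first / target.first;
    }
    // Fallback when either median is zero: ratio of RMSDs.
    // Assumption: behavior for a zero target RMSD is not documented;
    // this sketch returns 0 so that the degenerate target is zeroed out.
    if (target.second == 0) {
        return 0;
    }
    return ref.second / target.second;
}

// Optional weighting: multiplying by sqrt(S) makes the scaled target's
// variance S-fold that of the reference.
double weighted_scale_factor(const std::pair<double, double>& ref,
                             const std::pair<double, double>& target, double S) {
    assert(S >= 0);
    return scale_factor(ref, target) * std::sqrt(S);
}
```

For example, a reference median of 2 and a target median of 4 give a factor of 0.5, shrinking the noisier target; with \(S = 4\), the weighted factor becomes 1.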
template<typename Index_, typename Input_, typename Scale_, typename Output_>
void mumosa::combine_scaled_embeddings(const std::vector<std::size_t> &num_dims, const Index_ num_cells, const std::vector<Input_ *> &embeddings, const std::vector<Scale_> &scaling, Output_ *const output)
Scale the embedding for each modality and combine all embeddings from different modalities into a single matrix for further analyses. Each cell in the combined matrix will contain a concatenation of the scaled coordinates from all of the individual embeddings.
- Template Parameters
  - Index_: Integer type of the number of cells.
  - Input_: Floating-point type of the input data.
  - Scale_: Floating-point type of the scaling factor.
  - Output_: Floating-point type of the output data.
- Parameters
  - num_dims: Vector containing the number of dimensions in each embedding.
  - num_cells: Number of cells in each embedding.
  - embeddings: Vector of pointers of length equal to that of num_dims. Each pointer refers to an array containing an embedding matrix for a single modality, which should be in column-major format with dimensions in rows and cells in columns. The number of rows of the i-th matrix should be equal to num_dims[i] and the number of columns should be equal to num_cells.
  - scaling: Scaling to apply to each embedding, usually from compute_scale(). This should be of length equal to that of num_dims.
  - [out] output: Pointer to the output array. This should be of length equal to the product of num_cells and the sum of num_dims. On completion, output is filled with the combined embeddings in column-major format. Each row corresponds to a dimension while each column corresponds to a cell.
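The layout requirements above can be illustrated with a minimal sketch of the scale-and-concatenate step. It is not the library's implementation: `combine_scaled()` is a hypothetical name, it uses `double` and `const` pointers in place of the `Input_`, `Scale_`, and `Output_` template parameters, and stacking the modalities' rows in the same order as `num_dims` is an assumption about how the concatenation is ordered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the concatenation; not mumosa's own code.
// Each embeddings[m] is column-major: num_dims[m] rows (dimensions) x num_cells columns.
// output must hold num_cells * sum(num_dims) values and is also column-major.
void combine_scaled(const std::vector<std::size_t>& num_dims, std::size_t num_cells,
                    const std::vector<const double*>& embeddings,
                    const std::vector<double>& scaling, double* output) {
    assert(embeddings.size() == num_dims.size());
    assert(scaling.size() == num_dims.size());

    std::size_t total_dim = 0;
    for (std::size_t nd : num_dims) {
        total_dim += nd;
    }

    for (std::size_t c = 0; c < num_cells; ++c) {
        double* out_col = output + c * total_dim;  // output column for cell c
        std::size_t offset = 0;                    // rows already filled for this cell
        for (std::size_t m = 0; m < num_dims.size(); ++m) {
            const std::size_t nd = num_dims[m];
            const double* in_col = embeddings[m] + c * nd;  // cell c in modality m
            for (std::size_t d = 0; d < nd; ++d) {
                out_col[offset + d] = in_col[d] * scaling[m];
            }
            offset += nd;
        }
    }
}
```

With a two-dimensional modality scaled by 2 and a one-dimensional modality scaled by 0.5, a cell with coordinates (1, 2) and (10) ends up with the combined column (2, 4, 5). Looping over cells in the outer loop writes each output column contiguously, which suits the column-major layout.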