|
template<typename Index_ , typename Float_ > |
std::pair< Float_, Float_ > | compute_distance (Index_ num_cells, Float_ *distances) |
|
template<typename Dim_ , typename Index_ , typename Float_ > |
std::pair< Float_, Float_ > | compute_distance (const knncolle::Prebuilt< Dim_, Index_, Float_ > &prebuilt, const Options &options) |
|
template<typename Dim_ , typename Index_ , typename Float_ > |
std::pair< Float_, Float_ > | compute_distance (Dim_ num_dim, Index_ num_cells, const Float_ *data, const knncolle::Builder< knncolle::SimpleMatrix< Dim_, Index_, Float_ >, Float_ > &builder, const Options &options) |
|
template<typename Float_ > |
Float_ | compute_scale (const std::pair< Float_, Float_ > &ref, const std::pair< Float_, Float_ > &target) |
|
template<typename Float_ > |
std::vector< Float_ > | compute_scale (const std::vector< std::pair< Float_, Float_ > > &distances) |
|
template<typename Dim_ , typename Index_ , typename Input_ , typename Scale_ , typename Output_ > |
void | combine_scaled_embeddings (const std::vector< Dim_ > &num_dims, Index_ num_cells, const std::vector< Input_ * > &embeddings, const std::vector< Scale_ > &scaling, Output_ *output) |
|
Scale multi-modal embeddings to adjust for differences in variance.
template<typename Float_ >
Float_ mumosa::compute_scale (const std::pair< Float_, Float_ > &ref, const std::pair< Float_, Float_ > &target)
Compute the scaling factor to be applied to an embedding of a "target" modality relative to a reference modality. This aims to scale the target so that the within-population variance is equal to that of the reference.
Advanced users may want to scale the target so that its variance is some \(S\)-fold of the reference, e.g., to give more weight to more important modalities. This can be achieved by multiplying the scaling factor by \(\sqrt{S}\).
- Template Parameters
-
Float_ | Floating-point type for the distances. |
- Parameters
-
ref | Output of compute_distance() for the embedding of the reference modality. The first value contains the median distance while the second value contains the root-mean-square distance (RMSD). |
target | Output of compute_distance() for the embedding of the target modality. |
- Returns
- A scaling factor to apply to the embedding of the target modality, defined as the ratio of the median distances (reference over target). If either of the median distances is zero, this function instead returns the ratio of the RMSDs. If the reference RMSD is zero, this function will return zero; if the target RMSD is zero, this function will return positive infinity.
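The return-value rules above can be sketched as a short function. This is a minimal re-implementation written from the description, not the mumosa source: the names `compute_scale_sketch` and `weighted_scale_sketch` are hypothetical, the reference-over-target direction is inferred from the zero and infinity cases, and the behaviour when both RMSDs are zero is an assumption because the text does not specify it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical sketch for illustration; not the mumosa implementation.
// Each pair holds (median distance, RMSD), as described for compute_distance().
template<typename Float_>
Float_ compute_scale_sketch(const std::pair<Float_, Float_>& ref, const std::pair<Float_, Float_>& target) {
    if (ref.first != 0 && target.first != 0) {
        // Usual case: ratio of the median distances, reference over target.
        return ref.first / target.first;
    }

    // Fallback to the RMSDs when either median distance is zero.
    if (target.second == 0) {
        // Assumption: this check takes precedence when both RMSDs are zero.
        return std::numeric_limits<Float_>::infinity();
    }
    if (ref.second == 0) {
        return 0;
    }
    return ref.second / target.second;
}

// Give the target S-fold the variance of the reference by multiplying
// the scaling factor by sqrt(S), as suggested for advanced users.
template<typename Float_>
Float_ weighted_scale_sketch(Float_ scale, Float_ S) {
    assert(S >= 0);
    return scale * std::sqrt(S);
}
```

For example, with a reference of (2, 3) and a target of (4, 5), the sketch returns 2 / 4 = 0.5; passing that factor with S = 4 to `weighted_scale_sketch` gives 1.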
template<typename Dim_ , typename Index_ , typename Input_ , typename Scale_ , typename Output_ >
void mumosa::combine_scaled_embeddings (const std::vector< Dim_ > &num_dims, Index_ num_cells, const std::vector< Input_ * > &embeddings, const std::vector< Scale_ > &scaling, Output_ *output)
Combine multiple embeddings for different modalities into a single embedding matrix, possibly after scaling each embedding. The embeddings are stacked row-wise, i.e., for each column (cell), the coordinates are concatenated across embeddings.
- Template Parameters
-
Dim_ | Integer type for the number of dimensions. |
Index_ | Integer type for the number of cells. |
Input_ | Floating-point type for the input data. |
Scale_ | Floating-point type for the scaling factor. |
Output_ | Floating-point type for the output data. |
- Parameters
-
| num_dims | Vector containing the number of dimensions in each embedding. |
| num_cells | Number of cells in each embedding. |
| embeddings | Vector of pointers of length equal to that of num_dims. Each pointer refers to an array containing an embedding matrix for a single modality, which should be in column-major format with dimensions in rows and cells in columns. The number of rows of the i-th matrix should be equal to num_dims[i] and the number of columns should be equal to num_cells. |
| scaling | Scaling to apply to each embedding, usually from compute_scale(). This should be of length equal to that of num_dims. |
[out] | output | Pointer to the output array. This should be of length equal to the product of num_cells and the sum of num_dims. On completion, output is filled with the combined embeddings in column-major format. Each row corresponds to a dimension while each column corresponds to a cell. |
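The memory layout described by these parameters can be illustrated with a small stand-alone sketch. It is not the mumosa implementation: `combine_scaled_embeddings_sketch` and `combine_example` are hypothetical names, the signature simply mirrors the documented one, and the sketch multiplies by each scaling factor directly without any special handling of zero or infinite factors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch for illustration; not the mumosa implementation.
// All matrices are column-major with dimensions in rows and cells in columns.
template<typename Dim_, typename Index_, typename Input_, typename Scale_, typename Output_>
void combine_scaled_embeddings_sketch(
    const std::vector<Dim_>& num_dims,
    Index_ num_cells,
    const std::vector<Input_*>& embeddings,
    const std::vector<Scale_>& scaling,
    Output_* output)
{
    assert(embeddings.size() == num_dims.size());
    assert(scaling.size() == num_dims.size());

    // Number of rows in the combined matrix.
    std::size_t total_dims = 0;
    for (auto d : num_dims) {
        total_dims += static_cast<std::size_t>(d);
    }

    const std::size_t ncells = static_cast<std::size_t>(num_cells);
    std::size_t row_offset = 0; // first output row used by the current embedding
    for (std::size_t e = 0; e < num_dims.size(); ++e) {
        const std::size_t nd = static_cast<std::size_t>(num_dims[e]);
        for (std::size_t c = 0; c < ncells; ++c) {
            const Input_* in = embeddings[e] + c * nd;
            Output_* out = output + c * total_dims + row_offset;
            for (std::size_t d = 0; d < nd; ++d) {
                out[d] = in[d] * scaling[e];
            }
        }
        row_offset += nd;
    }
}

// Two cells, one 2-dimensional and one 1-dimensional embedding.
inline std::vector<double> combine_example() {
    std::vector<double> first{1, 2, 3, 4}; // cell 0 = (1, 2), cell 1 = (3, 4)
    std::vector<double> second{10, 20};    // cell 0 = (10),   cell 1 = (20)
    std::vector<double> output(6);         // 3 rows x 2 cells
    combine_scaled_embeddings_sketch<int, int, double, double, double>(
        {2, 1}, 2, {first.data(), second.data()}, {1.0, 0.5}, output.data());
    return output;
}
```

In `combine_example()`, the second embedding is halved, so the combined 3 x 2 matrix holds (1, 2, 5) for the first cell followed by (3, 4, 10) for the second.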