Scale multi-modal embeddings to adjust for differences in variance.

Classes
struct	Options
	Options for `compute_distance()`.

Functions
template<typename Index_ , typename Distance_ >
std::pair< Distance_, Distance_ >	compute_distance (Index_ num_cells, Distance_ *distances)

template<typename Index_ , typename Input_ , typename Distance_ >
std::pair< Distance_, Distance_ >	compute_distance (const knncolle::Prebuilt< Index_, Input_, Distance_ > &prebuilt, const Options &options)

template<typename Index_ , typename Input_ , typename Distance_ , class Matrix_ = knncolle::Matrix<Index_, Input_>>
std::pair< Distance_, Distance_ >	compute_distance (std::size_t num_dim, Index_ num_cells, const Input_ *data, const knncolle::Builder< Index_, Input_, Distance_, Matrix_ > &builder, const Options &options)

template<typename Distance_ >
Distance_	compute_scale (const std::pair< Distance_, Distance_ > &ref, const std::pair< Distance_, Distance_ > &target)

template<typename Distance_ >
std::vector< Distance_ >	compute_scale (const std::vector< std::pair< Distance_, Distance_ > > &distances)

template<typename Index_ , typename Input_ , typename Scale_ , typename Output_ >
void	combine_scaled_embeddings (const std::vector< std::size_t > &num_dims, Index_ num_cells, const std::vector< Input_ * > &embeddings, const std::vector< Scale_ > &scaling, Output_ *output)

Detailed Description

Scale multi-modal embeddings to adjust for differences in variance.

Function Documentation

◆ compute_distance() [1/3]

template<typename Index_ , typename Distance_ >

std::pair< Distance_, Distance_ > mumosa::compute_distance	(	Index_	num_cells,
		Distance_ *	distances )

Template Parameters

Index_	Integer type for the number of cells.
Distance_	Floating-point type for the distances.

Parameters

	num_cells	Number of cells.
[in,out]	distances	Pointer to an array of length `num_cells`, containing the distance from each cell to its \(k\)-th nearest neighbor. It is expected that the same \(k\) was used for each cell. On output, the values may be arbitrarily reordered by the median calculation; if this is undesirable, users should pass in a copy of the array.

Returns: Pair containing the median distance to the \(k\)-th nearest neighbor (first) and the root-mean-squared distance across all cells (second). These values can be used in compute_scale().
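The two summary statistics described above can be sketched in a few lines of standard C++. This is only an illustration, not the mumosa implementation: the function name `sketch_median_and_rmsd` is hypothetical, and the choice of the upper middle element as the median for even-length input, along with the zero return for empty input, are assumptions that the library may not share.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for illustration; not the mumosa implementation.
template<typename Index_, typename Distance_>
std::pair<Distance_, Distance_> sketch_median_and_rmsd(Index_ num_cells, Distance_* distances) {
    if (num_cells == 0) {
        // Assumption: an empty input yields zeros for both statistics.
        return std::make_pair(Distance_(0), Distance_(0));
    }
    std::size_t n = static_cast<std::size_t>(num_cells);

    // Partial sort around the midpoint. This permutes the array in place,
    // which is why the caller may want to pass in a copy.
    // Assumption: for even n, the upper middle element is taken as the median.
    std::size_t mid = n / 2;
    std::nth_element(distances, distances + mid, distances + n);
    Distance_ median = distances[mid];

    // Root-mean-squared distance across all cells.
    Distance_ sum_sq = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum_sq += distances[i] * distances[i];
    }
    Distance_ rmsd = std::sqrt(sum_sq / static_cast<Distance_>(n));

    return std::make_pair(median, rmsd);
}
```

The in-place `std::nth_element` call is what makes `distances` an `[in,out]` argument in this sketch.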

◆ compute_distance() [2/3]

template<typename Index_ , typename Input_ , typename Distance_ >

std::pair< Distance_, Distance_ > mumosa::compute_distance	(	const knncolle::Prebuilt< Index_, Input_, Distance_ > &	prebuilt,
		const Options &	options )

Template Parameters

Index_	Integer type for the number of cells.
Input_	Numeric type for the input data used to build the search index. This is only required to define the `knncolle::Prebuilt` class and is otherwise ignored.
Distance_	Floating-point type for the distances.

Parameters

prebuilt	A prebuilt neighbor search index for a modality-specific embedding.
options	Further options.

Returns: Pair containing the median distance to the Options::num_neighbors-th nearest neighbor (first) and the root-mean-squared distance across all cells (second). These values can be used in compute_scale().

◆ compute_distance() [3/3]

template<typename Index_ , typename Input_ , typename Distance_ , class Matrix_ = knncolle::Matrix<Index_, Input_>>

std::pair< Distance_, Distance_ > mumosa::compute_distance	(	std::size_t	num_dim,
		Index_	num_cells,
		const Input_ *	data,
		const knncolle::Builder< Index_, Input_, Distance_, Matrix_ > &	builder,
		const Options &	options )

Template Parameters

Index_	Integer type for the number of cells.
Input_	Numeric type for the input data.
Distance_	Floating-point type for the distances.
Matrix_	Class of the input data matrix for the neighbor search. This should satisfy the `knncolle::Matrix` interface.

Parameters

	num_dim	Number of dimensions in the embedding.
	num_cells	Number of cells in the embedding.
[in]	data	Pointer to an array containing the embedding matrix for a modality. This should be stored in column-major layout where each row is a dimension and each column is a cell.
	builder	Algorithm to use for the neighbor search.
	options	Further options.

Returns: Pair containing the median distance to the Options::num_neighbors-th nearest neighbor (first) and the root-mean-squared distance across all cells (second). These values can be used in compute_scale().
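To make the column-major layout of `data` concrete, the following brute-force sketch computes the distance from each cell to its \(k\)-th nearest neighbor without any search library. The function name `sketch_kth_neighbor_distances` is hypothetical, Euclidean distance is an assumption (the actual metric depends on the `builder`), and returning zeros when there are not enough cells is also an assumption. It is not how knncolle performs the search.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical brute-force helper for illustration only.
// 'data' is column-major: cell c occupies data[c * num_dim, (c + 1) * num_dim).
template<typename Input_, typename Distance_ = double>
std::vector<Distance_> sketch_kth_neighbor_distances(
    std::size_t num_dim,
    std::size_t num_cells,
    const Input_* data,
    std::size_t k)
{
    std::vector<Distance_> output(num_cells, 0);
    if (k == 0 || num_cells <= k) {
        return output; // Assumption: zeros if fewer than k other cells exist.
    }

    std::vector<Distance_> buffer;
    buffer.reserve(num_cells - 1);

    for (std::size_t c = 0; c < num_cells; ++c) {
        buffer.clear();
        const Input_* self = data + c * num_dim;

        // Euclidean distance (an assumption) from cell c to every other cell.
        for (std::size_t o = 0; o < num_cells; ++o) {
            if (o == c) {
                continue;
            }
            const Input_* other = data + o * num_dim;
            Distance_ sq = 0;
            for (std::size_t d = 0; d < num_dim; ++d) {
                Distance_ delta = static_cast<Distance_>(self[d]) - static_cast<Distance_>(other[d]);
                sq += delta * delta;
            }
            buffer.push_back(std::sqrt(sq));
        }

        // The k-th smallest distance sits at position k - 1 after selection.
        std::nth_element(buffer.begin(), buffer.begin() + (k - 1), buffer.end());
        output[c] = buffer[k - 1];
    }

    return output;
}
```

The resulting vector has one entry per cell, which is the shape of input expected by the pointer-based overload of compute_distance().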

◆ compute_scale() [1/2]

template<typename Distance_ >

Distance_ mumosa::compute_scale	(	const std::pair< Distance_, Distance_ > &	ref,
		const std::pair< Distance_, Distance_ > &	target )

Compute the scaling factor to be applied to an embedding of a "target" modality relative to a reference modality. This aims to scale the target so that the within-population variance is equal to that of the reference.

Advanced users may want to scale the target so that its variance is some \(S\)-fold of the reference, e.g., to give more weight to more important modalities. This can be achieved by multiplying the scaling factor by \(\sqrt{S}\).
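The \(\sqrt{S}\) factor follows from how variance responds to multiplication by a constant. As a short derivation, let \(x\) denote a coordinate of the target embedding and \(a\) the scaling factor returned by this function (both symbols are introduced here for illustration only):

```latex
\[
\operatorname{Var}\left(a \sqrt{S} \, x\right)
  = a^2 S \operatorname{Var}(x)
  = S \operatorname{Var}(a x)
\]
```

Since \(\operatorname{Var}(a x)\) is intended to match the reference variance, multiplying the scaling factor by \(\sqrt{S}\) yields a target variance that is \(S\)-fold that of the reference.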

Template Parameters

Distance_ Floating-point type for the distances.

Parameters

ref	Output of `compute_distance()` for the embedding of the reference modality. The first value contains the median distance while the second value contains the root-mean-squared distance (RMSD).
target	Output of `compute_distance()` for the embedding of the target modality.

Returns: A scaling factor to apply to the embedding of the target modality, defined as the ratio of the reference median distance to the target median distance. If either of the median distances is zero, this function instead returns the ratio of the reference RMSD to the target RMSD. If the reference RMSD is zero, this function will return zero; if the target RMSD is zero, this function will return positive infinity.
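The fallback rules above can be written out as a small decision chain. This is a sketch under stated assumptions rather than the library code: `sketch_scale` and `sketch_weighted_scale` are hypothetical names, and when both RMSDs are zero the description does not say which rule wins, so the sketch arbitrarily checks the reference first.

```cpp
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical stand-in for illustration; not the mumosa implementation.
// Each pair holds (median distance, RMSD), as returned by compute_distance().
template<typename Distance_>
Distance_ sketch_scale(
    const std::pair<Distance_, Distance_>& ref,
    const std::pair<Distance_, Distance_>& target)
{
    // Usual case: ratio of the median distances, reference over target.
    if (ref.first != 0 && target.first != 0) {
        return ref.first / target.first;
    }

    // Fallback to the RMSDs when either median is zero.
    // Assumption: if both RMSDs are zero, the reference check takes priority.
    if (ref.second == 0) {
        return 0;
    }
    if (target.second == 0) {
        return std::numeric_limits<Distance_>::infinity();
    }
    return ref.second / target.second;
}

// Give the target 'fold' times the variance of the reference by
// multiplying the scaling factor by the square root of 'fold'.
template<typename Distance_>
Distance_ sketch_weighted_scale(
    const std::pair<Distance_, Distance_>& ref,
    const std::pair<Distance_, Distance_>& target,
    Distance_ fold)
{
    return sketch_scale(ref, target) * std::sqrt(fold);
}
```

For example, with a reference median of 2 and a target median of 4, the sketch returns 0.5, so the target embedding is shrunk to match the reference.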

◆ compute_scale() [2/2]

template<typename Distance_ >

std::vector< Distance_ > mumosa::compute_scale ( const std::vector< std::pair< Distance_, Distance_ > > & distances )

Compute the scaling factors for a group of embeddings, given the neighbor distances computed by compute_distance(). This aims to scale each embedding so that the within-population variances are equal across embeddings. The "reference" modality is defined as the first embedding with a non-zero RMSD; other than this requirement, the exact choice of reference has no actual impact on the relative values of the scaling factors.

Template Parameters

Distance_ Floating-point type for the distances.

Parameters

distances Vector of distances for embeddings, as computed by compute_distance() on each embedding.

Returns: Vector of scaling factors of length equal to that of distances, to be applied to each embedding. This is equivalent to running compute_scale() on each entry of distances against the chosen reference.

◆ combine_scaled_embeddings()

template<typename Index_ , typename Input_ , typename Scale_ , typename Output_ >

void mumosa::combine_scaled_embeddings	(	const std::vector< std::size_t > &	num_dims,
		Index_	num_cells,
		const std::vector< Input_ * > &	embeddings,
		const std::vector< Scale_ > &	scaling,
		Output_ *	output )

Combine multiple embeddings for different modalities into a single embedding matrix, possibly after scaling each embedding. This is done row-wise, i.e., the coordinates are concatenated across embeddings for each column.

Template Parameters

Index_	Integer type for the number of cells.
Input_	Floating-point type for the input data.
Scale_	Floating-point type for the scaling factor.
Output_	Floating-point type for the output data.

Parameters

	num_dims	Vector containing the number of dimensions in each embedding.
	num_cells	Number of cells in each embedding.
	embeddings	Vector of pointers of length equal to that of `num_dims`. Each pointer refers to an array containing an embedding matrix for a single modality, which should be in column-major format with dimensions in rows and cells in columns. The number of rows of the `i`-th matrix should be equal to `num_dims[i]` and the number of columns should be equal to `num_cells`.
	scaling	Scaling to apply to each embedding, usually from `compute_scale()`. This should be of length equal to that of `num_dims`.
[out]	output	Pointer to the output array. This should be of length equal to the product of `num_cells` and the sum of `num_dims`. On completion, `output` is filled with the combined embeddings in column-major format. Each row corresponds to a dimension while each column corresponds to a cell.
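The layout of `output` may be easier to follow with a small sketch of the concatenation. The function name `sketch_combine` is hypothetical and the body is a plain multiply-and-copy; it does not attempt to reproduce any special handling the library may apply (for example, to non-finite scaling factors), and it takes pointers to `const` data for convenience.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for illustration; not the mumosa implementation.
// All matrices are column-major, with dimensions in rows and cells in columns.
template<typename Index_, typename Input_, typename Scale_, typename Output_>
void sketch_combine(
    const std::vector<std::size_t>& num_dims,
    Index_ num_cells,
    const std::vector<const Input_*>& embeddings,
    const std::vector<Scale_>& scaling,
    Output_* output)
{
    // Number of rows in the combined matrix.
    std::size_t total = std::accumulate(num_dims.begin(), num_dims.end(), std::size_t(0));
    std::size_t ncells = static_cast<std::size_t>(num_cells);

    std::size_t offset = 0; // first output row used by the current embedding
    for (std::size_t e = 0; e < num_dims.size(); ++e) {
        std::size_t nd = num_dims[e];
        for (std::size_t c = 0; c < ncells; ++c) {
            const Input_* src = embeddings[e] + c * nd;
            Output_* dest = output + c * total + offset;
            for (std::size_t d = 0; d < nd; ++d) {
                dest[d] = static_cast<Output_>(src[d] * scaling[e]);
            }
        }
        offset += nd;
    }
}
```

Each output column therefore holds the scaled coordinates of one cell, with the dimensions of the first embedding followed by those of the second, and so on.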
