Methods for UMAP.

Classes
struct	Options
	Options for `initialize()`.

class	Status
	Status of the UMAP optimization iterations.

Typedefs
template<typename Index_ , typename Float_ >
using	NeighborList = knncolle::NeighborList<Index_, Float_>
	Lists of neighbors for each observation.

Enumerations
enum	InitializeMethod : char { SPECTRAL , SPECTRAL_ONLY , RANDOM , NONE }

Functions
template<typename Index_ , typename Float_ >
Status< Index_, Float_ >	initialize (NeighborList< Index_, Float_ > x, std::size_t num_dim, Float_ *embedding, Options options)

template<typename Index_ , typename Input_ , typename Float_ >
Status< Index_, Float_ >	initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, std::size_t num_dim, Float_ *embedding, Options options)

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status< Index_, Float_ >	initialize (std::size_t data_dim, std::size_t num_obs, const Float_ *data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, std::size_t num_dim, Float_ *embedding, Options options)

Detailed Description

Methods for UMAP.

Typedef Documentation

◆ NeighborList

template<typename Index_ , typename Float_ >

using umappp::NeighborList = knncolle::NeighborList<Index_, Float_>

Lists of neighbors for each observation.

Template Parameters

Index_	Integer type of the neighbor indices.
Float_	Floating-point type for the distances.

This is a convenient alias for the `knncolle::NeighborList` class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique, i.e., each index should occur at most once in each inner vector. In addition, the inner vector for observation `i` should not contain any neighbor with index `i`.
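The expectations above can be checked before a neighbor list is passed to `initialize()`. The following is a minimal sketch, not part of umappp: `NeighborListSketch`, `is_valid_neighbor_list()` and `assert_valid_neighbor_list()` are hypothetical names, and the alias assumes that `knncolle::NeighborList` has the shape of one vector of (index, distance) pairs per observation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumption: stand-in with the same shape as knncolle::NeighborList,
// i.e., one vector of (index, distance) pairs per observation.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper (not part of umappp). Returns true if every inner
// vector is sorted by increasing distance, contains each index at most once,
// and does not list the observation as its own neighbor.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::unordered_set<Index_> seen;
        for (std::size_t j = 0; j < x[i].size(); ++j) {
            const auto& nn = x[i][j];
            if (static_cast<std::size_t>(nn.first) == i) {
                return false; // observation i lists itself as a neighbor
            }
            if (!seen.insert(nn.first).second) {
                return false; // the same index occurs more than once
            }
            if (j > 0 && nn.second < x[i][j - 1].second) {
                return false; // distances are not in increasing order
            }
        }
    }
    return true;
}

// Debug-time guard that could be called before handing the list to initialize().
template<typename Index_, typename Float_>
void assert_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& x) {
    assert(is_valid_neighbor_list(x) && "neighbor list violates the documented expectations");
}
```

Returning a `bool` keeps the check usable in release builds, while the asserting wrapper is only active when `NDEBUG` is not defined.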

Enumeration Type Documentation

◆ InitializeMethod

enum umappp::InitializeMethod : char

How should the initial coordinates of the embedding be obtained?

SPECTRAL: attempts initialization based on spectral decomposition of the graph Laplacian. If that fails, we fall back to random draws from a normal distribution.
SPECTRAL_ONLY: attempts spectral initialization as before, but if that fails, we use the existing values in the supplied embedding array.
RANDOM: fills the embedding with random draws from a normal distribution.
NONE: uses the existing values in the supplied embedding array.
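The fallback rules listed above can be summarized as a small decision function. This is only an illustration of the documented behavior, not the library's implementation: `InitializeMethodSketch` is a local mirror of the enumeration so that the example compiles on its own, and `CoordinateSource` and `choose_source()` are hypothetical names.

```cpp
#include <cassert>

// Assumption: local mirror of umappp::InitializeMethod, so that the sketch
// compiles without the library.
enum class InitializeMethodSketch : char { SPECTRAL, SPECTRAL_ONLY, RANDOM, NONE };

// Hypothetical label for where the starting coordinates come from.
enum class CoordinateSource { SPECTRAL_DECOMPOSITION, RANDOM_NORMAL, EXISTING_VALUES };

// Maps each method, plus the outcome of the spectral step, to the source of
// the initial coordinates as described in the enumeration documentation.
inline CoordinateSource choose_source(InitializeMethodSketch method, bool spectral_succeeded) {
    switch (method) {
        case InitializeMethodSketch::SPECTRAL:
            return spectral_succeeded ? CoordinateSource::SPECTRAL_DECOMPOSITION
                                      : CoordinateSource::RANDOM_NORMAL;
        case InitializeMethodSketch::SPECTRAL_ONLY:
            return spectral_succeeded ? CoordinateSource::SPECTRAL_DECOMPOSITION
                                      : CoordinateSource::EXISTING_VALUES;
        case InitializeMethodSketch::RANDOM:
            return CoordinateSource::RANDOM_NORMAL;
        case InitializeMethodSketch::NONE:
            return CoordinateSource::EXISTING_VALUES;
    }
    assert(false && "unreachable: all enumerators are handled above");
    return CoordinateSource::EXISTING_VALUES;
}
```

One practical consequence is that the embedding array should be filled with meaningful values before `initialize()` is called whenever `NONE` or `SPECTRAL_ONLY` is chosen, since both can read the existing contents.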

Function Documentation

◆ initialize() [1/3]

template<typename Index_ , typename Input_ , typename Float_ >

Status< Index_, Float_ > umappp::initialize	(	const knncolle::Prebuilt< Index_, Input_, Float_ > &	prebuilt,
		std::size_t	num_dim,
		Float_ *	embedding,
		Options	options )

Template Parameters

Index_	Integer type of the observation indices.
Input_	Floating-point type of the input data for the neighbor search. This is not used other than to define the `knncolle::Prebuilt` type.
Float_	Floating-point type of the neighbor distances and output embedding.

Parameters

	prebuilt	A neighbor search index built on the dataset of interest.
	num_dim	Number of dimensions of the UMAP embedding.
[in,out]	embedding	Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (`num_dim`) and columns are observations (the number of observations in `prebuilt`). Existing values in this array will be used as input if `Options::initialize = InitializeMethod::NONE`, and may be used as input if `Options::initialize = InitializeMethod::SPECTRAL_ONLY`; otherwise it is only used as output. The lifetime of the array should be no shorter than the final call to `Status::run()`.
	options	Further options.

Returns: A `Status` object containing the initial state of the UMAP algorithm. Further calls to `Status::run()` will update the embedding in `embedding`.

◆ initialize() [2/3]

template<typename Index_ , typename Float_ >

Status< Index_, Float_ > umappp::initialize	(	NeighborList< Index_, Float_ >	x,
		std::size_t	num_dim,
		Float_ *	embedding,
		Options	options )

Template Parameters

Index_	Integer type of the neighbor indices.
Float_	Floating-point type for the distances.

Parameters

	x	Indices and distances to the nearest neighbors for each observation. Note the expectations in the `NeighborList` documentation.
	num_dim	Number of dimensions of the embedding.
[in,out]	embedding	Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (`num_dim`) and columns are observations (`x.size()`). Existing values in this array will be used as input if `Options::initialize = InitializeMethod::NONE`, and may be used as input if `Options::initialize = InitializeMethod::SPECTRAL_ONLY`; otherwise it is only used as output. The lifetime of the array should be no shorter than the final call to `Status::run()`.
	options	Further options. Note that `Options::num_neighbors` is ignored here.

Returns: A `Status` object containing the initial state of the UMAP algorithm. Further calls to `Status::run()` will update the embedding in `embedding`.
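The column-major layout of `embedding` means that all coordinates of one observation are stored contiguously. A minimal sketch of allocating and indexing such a buffer is shown below; `allocate_embedding()` and `coordinate()` are hypothetical helpers for illustration and are not provided by umappp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates a zero-filled buffer large enough for a
// num_dim-by-num_obs column-major matrix.
template<typename Float_>
std::vector<Float_> allocate_embedding(std::size_t num_dim, std::size_t num_obs) {
    return std::vector<Float_>(num_dim * num_obs);
}

// Hypothetical helper: reference to coordinate `dim` of observation `obs`.
// Rows are dimensions and columns are observations, so each observation
// occupies num_dim consecutive elements.
template<typename Float_>
Float_& coordinate(Float_* embedding, std::size_t num_dim, std::size_t obs, std::size_t dim) {
    assert(dim < num_dim);
    return embedding[obs * num_dim + dim];
}
```

If a `std::vector` is used as the backing storage, it should stay alive and should not be resized until the final call to `Status::run()`, because the pointer passed to `initialize()` must remain valid for that long.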

◆ initialize() [3/3]

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>

Status< Index_, Float_ > umappp::initialize	(	std::size_t	data_dim,
		std::size_t	num_obs,
		const Float_ *	data,
		const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &	builder,
		std::size_t	num_dim,
		Float_ *	embedding,
		Options	options )

Template Parameters

Index_	Integer type of the observation indices.
Float_	Floating-point type of the input data, neighbor distances and output embedding.
Matrix_	Class of the input matrix for the neighbor search. This should be a `knncolle::SimpleMatrix` or its base class (i.e., `knncolle::Matrix`).

Parameters

	data_dim	Number of dimensions of the input dataset.
	num_obs	Number of observations in the input dataset.
[in]	data	Pointer to an array containing the input high-dimensional data as a column-major matrix. Each row corresponds to a dimension (`data_dim`) and each column corresponds to an observation (`num_obs`).
	builder	Algorithm to use for the neighbor search.
	num_dim	Number of dimensions of the embedding.
[in,out]	embedding	Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (`num_dim`) and columns are observations (`num_obs`). Existing values in this array will be used as input if `Options::initialize = InitializeMethod::NONE`, and may be used as input if `Options::initialize = InitializeMethod::SPECTRAL_ONLY`; otherwise it is only used as output. The lifetime of the array should be no shorter than the final call to `Status::run()`.
	options	Further options.

Returns: A `Status` object containing the initial state of the UMAP algorithm. Further calls to `Status::run()` will update the embedding in `embedding`.
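Input data is often held as one vector per observation, whereas `data` expects a single column-major array with `data_dim` rows and `num_obs` columns. The sketch below shows one way to flatten such a container; `pack_observations()` is a hypothetical helper and is not part of umappp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flattens one vector per observation into a single
// column-major buffer. Each observation becomes one column, so its data_dim
// values are appended contiguously in observation order.
template<typename Float_>
std::vector<Float_> pack_observations(const std::vector<std::vector<Float_> >& points, std::size_t data_dim) {
    std::vector<Float_> data;
    data.reserve(points.size() * data_dim);
    for (const auto& p : points) {
        assert(p.size() == data_dim); // every observation needs the same dimensionality
        data.insert(data.end(), p.begin(), p.end());
    }
    return data;
}
```

With this layout, the value for dimension `d` of observation `i` sits at offset `i * data_dim + d`, and `points.size()` would be supplied as `num_obs`.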
