umappp
A C++ library for UMAP
umappp Namespace Reference

Methods for UMAP.

Classes

struct  Options
 Options for initialize().
 
class  Status
 Status of the UMAP optimization iterations.
 

Typedefs

template<typename Index_ , typename Float_ >
using NeighborList = knncolle::NeighborList<Index_, Float_>
 Lists of neighbors for each observation.
 
typedef std::mt19937_64 RngEngine
 

Enumerations

enum  InitializeMethod : char { SPECTRAL , RANDOM , NONE }
 

Functions

template<typename Index_ , typename Float_ >
Status< Index_, Float_ > initialize (NeighborList< Index_, Float_ > x, const std::size_t num_dim, Float_ *const embedding, Options options)
 
template<typename Index_ , typename Input_ , typename Float_ >
Status< Index_, Float_ > initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options)
 
template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status< Index_, Float_ > initialize (const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const std::size_t num_dim, Float_ *const embedding, Options options)
 

Detailed Description

Methods for UMAP.

Typedef Documentation

◆ NeighborList

template<typename Index_ , typename Float_ >
using umappp::NeighborList = knncolle::NeighborList<Index_, Float_>

Lists of neighbors for each observation.

Template Parameters
Index_: Integer type of the neighbor indices.
Float_: Floating-point type for the distances.

This is a convenient alias for the knncolle::NeighborList type. Each inner vector corresponds to an observation and contains the nearest neighbors of that observation, sorted by increasing distance. Neighbors must be unique within each inner vector, i.e., each index should occur at most once. Also, the inner vector for observation i should not contain an entry with index i, i.e., no observation is listed as its own neighbor.
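The requirements above can be checked before a list is passed to initialize(). The sketch below does not include knncolle; it defines a local alias, NeighborListSketch, assuming that knncolle::NeighborList has the same shape (a vector of inner vectors of (neighbor index, distance) pairs). The helper is_valid_neighbor_list is hypothetical and not part of umappp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for knncolle::NeighborList, assuming the same layout: one inner
// vector per observation, each entry a (neighbor index, distance) pair.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if, for every observation i, the inner
// vector only holds valid indices other than i, holds each index at most
// once, and is sorted by increasing distance.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& x) {
    const std::size_t num_obs = x.size();
    std::vector<char> seen(num_obs, 0); // scratch marks, reset after each observation
    for (std::size_t i = 0; i < num_obs; ++i) {
        const auto& current = x[i];
        for (std::size_t k = 0; k < current.size(); ++k) {
            // Negative indices wrap to large values and fail the range check.
            const std::size_t j = static_cast<std::size_t>(current[k].first);
            if (j >= num_obs || j == i || seen[j]) {
                return false; // out of range, self-neighbor or duplicate
            }
            seen[j] = 1;
            if (k > 0 && current[k].second < current[k - 1].second) {
                return false; // distances must not decrease
            }
        }
        for (const auto& entry : current) {
            seen[static_cast<std::size_t>(entry.first)] = 0;
        }
    }
    return true;
}
```

A check like this is linear in the total number of neighbor entries, so it is cheap relative to the UMAP optimization itself.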

◆ RngEngine

typedef std::mt19937_64 umappp::RngEngine

Class of the random number generator used in umappp.

Enumeration Type Documentation

◆ InitializeMethod

How should the initial coordinates of the embedding be obtained?

  • SPECTRAL: attempts initialization based on spectral decomposition of the graph Laplacian.
  • RANDOM: fills the embedding with random draws from a normal distribution.
  • NONE: uses the existing values in the supplied embedding array.
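To illustrate the RANDOM method together with the RngEngine type, the following sketch fills a column-major embedding with standard normal draws from a seeded std::mt19937_64. The function name random_init_sketch, the unit variance and the seeding scheme are assumptions for illustration; umappp's own initialization may scale or seed its draws differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper illustrating InitializeMethod::RANDOM. Every entry of
// the column-major (num_dim by num_obs) embedding receives an independent
// standard normal draw from a std::mt19937_64, the type aliased by
// umappp::RngEngine. The unit scale and seeding scheme are assumptions.
template<typename Float_>
void random_init_sketch(std::size_t num_dim, std::size_t num_obs, Float_* embedding, std::uint64_t seed) {
    assert(embedding != nullptr);
    std::mt19937_64 rng(seed); // same seed gives the same embedding
    std::normal_distribution<Float_> dist(0, 1);
    const std::size_t total = num_dim * num_obs;
    for (std::size_t i = 0; i < total; ++i) {
        embedding[i] = dist(rng);
    }
}
```

Under InitializeMethod::NONE no such step takes place: whatever the caller wrote into the embedding array beforehand is the starting point of the optimization.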

Function Documentation

◆ initialize() [1/3]

template<typename Index_ , typename Input_ , typename Float_ >
Status< Index_, Float_ > umappp::initialize ( const knncolle::Prebuilt< Index_, Input_, Float_ > & prebuilt,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the observation indices.
Input_: Floating-point type of the input data for the neighbor search. This is not used other than to define the knncolle::Prebuilt type.
Float_: Floating-point type of the neighbor distances and the output embedding.
Parameters
prebuilt: A neighbor search index built on the dataset of interest.
num_dim: Number of dimensions of the UMAP embedding.
[in,out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (one column per observation in prebuilt). Existing values in this array are used as input if Options::initialize = InitializeMethod::NONE, and may be used as input if Options::initialize = InitializeMethod::SPECTRAL and Options::initialize_random_on_spectral_fail = false; otherwise the array is only used as output. The array should remain valid until the final call to Status::run().
options: Further options.
Returns
A Status object containing the initial state of the UMAP algorithm. Subsequent calls to Status::run() update the coordinates stored in embedding.
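The embedding layout is the same for all three overloads: with dimensions as rows and observations as columns in column-major order, the coordinates of each observation occupy a contiguous block of num_dim values. The sketch below shows how such a buffer is addressed; embedding_offset and get_coordinates are hypothetical helpers, not part of umappp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major layout: rows are embedding dimensions, columns are
// observations, so observation `obs` occupies the contiguous block
// [obs * num_dim, (obs + 1) * num_dim).
inline std::size_t embedding_offset(std::size_t num_dim, std::size_t obs, std::size_t dim) {
    assert(dim < num_dim);
    return obs * num_dim + dim;
}

// Hypothetical helper: copy out the coordinates of one observation.
template<typename Float_>
std::vector<Float_> get_coordinates(const std::vector<Float_>& embedding, std::size_t num_dim, std::size_t obs) {
    assert((obs + 1) * num_dim <= embedding.size());
    const auto start = embedding.begin() + static_cast<std::ptrdiff_t>(obs * num_dim);
    return std::vector<Float_>(start, start + static_cast<std::ptrdiff_t>(num_dim));
}
```

In a program linked against umappp, a std::vector of size num_dim times the number of observations could supply the embedding pointer through data(), provided the vector is neither resized nor destroyed before the last Status::run() call.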

◆ initialize() [2/3]

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status< Index_, Float_ > umappp::initialize ( const std::size_t data_dim,
const Index_ num_obs,
const Float_ *const data,
const knncolle::Builder< Index_, Float_, Float_, Matrix_ > & builder,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the observation indices.
Float_: Floating-point type of the input data, neighbor distances and output embedding.
Matrix_: Class of the input matrix for the neighbor search. This should be a knncolle::SimpleMatrix or its base class (i.e., knncolle::Matrix).
Parameters
data_dim: Number of dimensions of the input dataset.
num_obs: Number of observations in the input dataset.
[in] data: Pointer to an array containing the input high-dimensional data as a column-major matrix. Each row corresponds to a dimension (data_dim) and each column corresponds to an observation (num_obs).
builder: Algorithm to use for the neighbor search.
num_dim: Number of dimensions of the embedding.
[in,out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (num_obs). Existing values in this array are used as input if Options::initialize = InitializeMethod::NONE, and may be used as input if Options::initialize = InitializeMethod::SPECTRAL and Options::initialize_random_on_spectral_fail = false; otherwise the array is only used as output. The array should remain valid until the final call to Status::run().
options: Further options.
Returns
A Status object containing the initial state of the UMAP algorithm. Subsequent calls to Status::run() update the coordinates stored in embedding.
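Because data is column-major with one column per observation, the data_dim values of each observation are contiguous; this is the same memory layout as a row-major table with one row per observation. The sketch below packs per-observation feature vectors into such a buffer, whose data() pointer would then be a suitable data argument. The helper pack_column_major is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pack per-observation feature vectors into the
// column-major (data_dim by num_obs) array expected by the `data` argument.
// Observation o ends up in positions [o * data_dim, (o + 1) * data_dim).
template<typename Float_>
std::vector<Float_> pack_column_major(const std::vector<std::vector<Float_> >& observations, std::size_t data_dim) {
    std::vector<Float_> data;
    data.reserve(observations.size() * data_dim);
    for (const auto& obs : observations) {
        assert(obs.size() == data_dim); // every observation needs the same number of dimensions
        data.insert(data.end(), obs.begin(), obs.end());
    }
    return data;
}
```

No transposition is needed here: appending each observation in turn already yields the column-major order described above.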

◆ initialize() [3/3]

template<typename Index_ , typename Float_ >
Status< Index_, Float_ > umappp::initialize ( NeighborList< Index_, Float_ > x,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the neighbor indices.
Float_: Floating-point type of the neighbor distances and the output embedding.
Parameters
x: Indices and distances to the nearest neighbors for each observation. See the NeighborList documentation for the requirements on sorting, uniqueness and self-exclusion.
num_dim: Number of dimensions of the embedding.
[in,out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (x.size()). Existing values in this array are used as input if Options::initialize = InitializeMethod::NONE, and may be used as input if Options::initialize = InitializeMethod::SPECTRAL and Options::initialize_random_on_spectral_fail = false; otherwise the array is only used as output. The array should remain valid until the final call to Status::run().
options: Further options. Options::num_neighbors is ignored by this overload, as the neighbors are supplied directly in x.
Returns
A Status object containing the initial state of the UMAP algorithm. Subsequent calls to Status::run() update the coordinates stored in embedding.
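When neighbors are computed outside of knncolle, the resulting list must still satisfy the NeighborList requirements. The following sketch computes exact Euclidean k-nearest neighbors by brute force (O(n² d), so only suitable for small inputs) from data in the same column-major layout used by the other overloads. The alias NeighborListSketch and the function brute_force_knn are hypothetical, and the (index, distance) pair layout is assumed to match knncolle::NeighborList.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for knncolle::NeighborList (assumed (index, distance) pairs).
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Exact Euclidean k-nearest neighbors by brute force. `data` is column-major
// with `data_dim` rows and `num_obs` columns. Each observation's list excludes
// itself, has unique indices and is sorted by increasing distance.
template<typename Index_, typename Float_>
NeighborListSketch<Index_, Float_> brute_force_knn(std::size_t data_dim, Index_ num_obs, const Float_* data, Index_ k) {
    assert(k < num_obs); // at most num_obs - 1 other observations exist
    NeighborListSketch<Index_, Float_> output(num_obs);
    std::vector<std::pair<Float_, Index_> > candidates;
    for (Index_ i = 0; i < num_obs; ++i) {
        candidates.clear();
        const Float_* xi = data + static_cast<std::size_t>(i) * data_dim;
        for (Index_ j = 0; j < num_obs; ++j) {
            if (j == i) {
                continue; // an observation is never its own neighbor
            }
            const Float_* xj = data + static_cast<std::size_t>(j) * data_dim;
            Float_ d2 = 0;
            for (std::size_t d = 0; d < data_dim; ++d) {
                const Float_ diff = xi[d] - xj[d];
                d2 += diff * diff;
            }
            candidates.emplace_back(std::sqrt(d2), j);
        }
        // Move the k closest candidates to the front, ordered by (distance, index).
        std::partial_sort(candidates.begin(), candidates.begin() + k, candidates.end());
        auto& current = output[i];
        current.reserve(k);
        for (Index_ n = 0; n < k; ++n) {
            current.emplace_back(candidates[n].second, candidates[n].first);
        }
    }
    return output;
}
```

Since Options::num_neighbors is ignored by this overload, k is the only place where the neighborhood size is chosen in this workflow. Ties in distance are broken by index, which keeps the output deterministic.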