umappp
A C++ library for UMAP
umappp Namespace Reference

Functions for creating UMAP embeddings.

Classes

struct  Options
 Options for initialize().
 
class  Status
 Status of the UMAP optimization iterations.
 

Typedefs

template<typename Index_ , typename Float_ >
using NeighborList = knncolle::NeighborList<Index_, Float_>
 Lists of neighbors for each observation.
 
typedef std::mt19937_64 RngEngine
 

Enumerations

enum  InitializeMethod : char { SPECTRAL , RANDOM , NONE }
 

Functions

template<typename Index_ , typename Float_ >
Status< Index_, Float_ > initialize (NeighborList< Index_, Float_ > x, const std::size_t num_dim, Float_ *const embedding, Options options)
 
template<typename Index_ , typename Input_ , typename Float_ >
Status< Index_, Float_ > initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options)
 
template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status< Index_, Float_ > initialize (const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const std::size_t num_dim, Float_ *const embedding, Options options)
 
template<typename Task_ , class Run_ >
void parallelize (const int num_workers, const Task_ num_tasks, Run_ run_task_range)
 

Detailed Description

Functions for creating UMAP embeddings.

Typedef Documentation

◆ NeighborList

template<typename Index_ , typename Float_ >
using umappp::NeighborList = knncolle::NeighborList<Index_, Float_>

Lists of neighbors for each observation.

Template Parameters
Index_: Integer type of the neighbor indices.
Float_: Floating-point type of the distances.

This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique: each index should occur at most once in each inner vector. In addition, the inner vector for observation i should not contain any neighbor with index i.
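The three requirements above (sorted distances, unique indices, no self-neighbors) can be checked before calling initialize(). The following is a minimal sketch, not part of umappp: it assumes that knncolle::NeighborList is a vector of vectors of (index, distance) pairs, and the names NeighborListSketch and is_valid_neighbor_list are hypothetical.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumed layout of knncolle::NeighborList: one inner vector of
// (index, distance) pairs per observation.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every inner vector is sorted by
// increasing distance, contains no repeated index, and does not list the
// observation as its own neighbor.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        const auto& current = x[i];
        std::unordered_set<Index_> seen;
        for (std::size_t j = 0; j < current.size(); ++j) {
            const auto& nb = current[j];
            if (static_cast<std::size_t>(nb.first) == i) {
                return false; // observation i lists itself
            }
            if (!seen.insert(nb.first).second) {
                return false; // index occurs more than once
            }
            if (j > 0 && nb.second < current[j - 1].second) {
                return false; // distances are not in increasing order
            }
        }
    }
    return true;
}
```

A check like this is cheap relative to the neighbor search and may be useful when the neighbor lists come from an external source rather than from knncolle itself.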

◆ RngEngine

typedef std::mt19937_64 umappp::RngEngine

Class of the random number generator used in umappp.

Enumeration Type Documentation

◆ InitializeMethod

How should the initial coordinates of the embedding be obtained?

  • SPECTRAL: spectral decomposition of the normalized graph Laplacian. Specifically, the initial coordinates are defined from the eigenvectors corresponding to the smallest non-zero eigenvalues. This fails in the presence of multiple graph components or if the approximate SVD (via irlba::compute()) fails to converge.
  • RANDOM: fills the embedding with random draws from a normal distribution.
  • NONE: uses existing values in the supplied embedding array.
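As an illustration of what RANDOM initialization amounts to, the sketch below fills a column-major embedding with normal draws from a std::mt19937_64 engine, the same type that RngEngine aliases. The helper name random_init_sketch, the explicit seed argument, and the choice of a standard normal (mean 0, standard deviation 1) are assumptions for this example; the distribution parameters used by umappp itself are not stated here.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Same engine type that umappp::RngEngine aliases.
typedef std::mt19937_64 RngEngineSketch;

// Hypothetical helper: fill a column-major (num_dim x num_obs) embedding
// with draws from a normal distribution. The mean, standard deviation and
// seed handling are assumptions, not values taken from umappp.
template<typename Float_>
void random_init_sketch(std::size_t num_dim, std::size_t num_obs, Float_* embedding, unsigned long long seed) {
    RngEngineSketch rng(seed);
    std::normal_distribution<Float_> dist(0, 1);
    const std::size_t total = num_dim * num_obs;
    for (std::size_t i = 0; i < total; ++i) {
        embedding[i] = dist(rng);
    }
}
```

Seeding the engine explicitly makes the starting coordinates reproducible on a given platform. With NONE, a caller could run a routine like this (or any other scheme) on the embedding array before calling initialize().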

Function Documentation

◆ initialize() [1/3]

template<typename Index_ , typename Input_ , typename Float_ >
Status< Index_, Float_ > umappp::initialize ( const knncolle::Prebuilt< Index_, Input_, Float_ > & prebuilt,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the observation indices.
Input_: Floating-point type of the input data for the neighbor search. This is only used to define the knncolle::Prebuilt type and is otherwise ignored.
Float_: Floating-point type of the input data, neighbor distances and output embedding.
Parameters
prebuilt: A neighbor search index built on the dataset of interest.
num_dim: Number of dimensions of the UMAP embedding.
[out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (the number of observations in prebuilt). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
options: Further options.
Returns
A Status object containing the initial state of the UMAP algorithm.

◆ initialize() [2/3]

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status< Index_, Float_ > umappp::initialize ( const std::size_t data_dim,
const Index_ num_obs,
const Float_ *const data,
const knncolle::Builder< Index_, Float_, Float_, Matrix_ > & builder,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the observation indices.
Float_: Floating-point type of the input data, neighbor distances and output embedding.
Matrix_: Class of the input matrix for the neighbor search. This should be a knncolle::SimpleMatrix or knncolle::Matrix.
Parameters
data_dim: Number of dimensions of the input dataset.
num_obs: Number of observations in the input dataset.
[in] data: Pointer to an array containing the input dataset as a column-major matrix. Each row corresponds to a dimension (data_dim) and each column corresponds to an observation (num_obs).
builder: Algorithm for the nearest neighbor search.
num_dim: Number of dimensions of the embedding.
[out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (num_obs). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
options: Further options.
Returns
A Status object containing the initial state of the UMAP algorithm.
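Both the input data and the embedding are described as column-major matrices with one column per observation, so the coordinates of a single observation are contiguous in memory. The sketch below shows the resulting index arithmetic; column_major_offset and coordinates_of are hypothetical helpers written for this example and are not part of umappp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of (dimension d, observation o) in a
// column-major matrix with num_dim rows.
inline std::size_t column_major_offset(std::size_t d, std::size_t o, std::size_t num_dim) {
    return o * num_dim + d;
}

// Hypothetical helper: copy the coordinates of one observation out of a
// column-major array such as the embedding.
template<typename Float_>
std::vector<Float_> coordinates_of(const Float_* matrix, std::size_t num_dim, std::size_t obs) {
    const Float_* start = matrix + column_major_offset(0, obs, num_dim);
    return std::vector<Float_>(start, start + num_dim);
}
```

For a two-dimensional embedding of three observations, the array holds six values, and observation 1 occupies elements 2 and 3.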

◆ initialize() [3/3]

template<typename Index_ , typename Float_ >
Status< Index_, Float_ > umappp::initialize ( NeighborList< Index_, Float_ > x,
const std::size_t num_dim,
Float_ *const embedding,
Options options )
Template Parameters
Index_: Integer type of the neighbor indices.
Float_: Floating-point type of the distances.
Parameters
x: Indices and distances to the nearest neighbors for each observation. For each observation, neighbors should be unique and sorted in order of increasing distance; see the NeighborList description for details.
num_dim: Number of dimensions of the embedding.
[out] embedding: Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (x.size()). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
options: Further options. Note that Options::num_neighbors is ignored here.
Returns
A Status object containing the initial state of the UMAP algorithm.

◆ parallelize()

template<typename Task_ , class Run_ >
void umappp::parallelize ( const int num_workers,
const Task_ num_tasks,
Run_ run_task_range )
Template Parameters
Task_: Integer type of the number of tasks.
Run_: Function to execute a range of tasks.
Parameters
num_workers: Number of workers.
num_tasks: Number of tasks.
run_task_range: Function to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the UMAPPP_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
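To make the expected shape of a replacement concrete, the sketch below is a serial stand-in that takes the same three arguments and splits the tasks into contiguous ranges, one per worker. It rests on assumptions: the name serial_parallelize_range is hypothetical, the callback is assumed to be invoked as run_task_range(worker, start, length), and the macro definition shown in the comment is assumed to be needed before any umappp header is included.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in with the argument list documented for
// parallelize(): (num_workers, num_tasks, run_task_range).
// Assumption: the callback is invoked as run_task_range(worker, start, length).
//
// Possible hookup (an assumption, to be placed before including umappp):
//   #define UMAPPP_CUSTOM_PARALLEL(nw, nt, fn) serial_parallelize_range(nw, nt, fn)
template<typename Task_, class Run_>
void serial_parallelize_range(const int num_workers, const Task_ num_tasks, Run_ run_task_range) {
    if (!(num_tasks > 0)) {
        return; // nothing to do
    }
    const Task_ workers = static_cast<Task_>(num_workers > 1 ? num_workers : 1);
    const Task_ per_worker = num_tasks / workers;
    const Task_ remainder = num_tasks % workers;

    Task_ start = 0;
    for (Task_ w = 0; w < workers; ++w) {
        // The first 'remainder' workers each take one extra task.
        const Task_ length = per_worker + (w < remainder ? 1 : 0);
        if (length == 0) {
            break; // more workers than tasks
        }
        run_task_range(static_cast<int>(w), start, length);
        start += length;
    }
}
```

This version runs every range on the calling thread, which can be handy for debugging or for environments without threading support. A real replacement would dispatch each range to its own worker while keeping the same callback contract.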