umappp
A C++ library for UMAP
Functions for creating UMAP embeddings.
Classes
| struct | Options | Options for initialize(). |
| class | Status | Status of the UMAP optimization iterations. |
Typedefs
| template<typename Index_, typename Float_> using | NeighborList = knncolle::NeighborList<Index_, Float_> | Lists of neighbors for each observation. |
| typedef std::mt19937_64 | RngEngine | Class of the random number generator used in umappp. |
Enumerations
| enum | InitializeMethod : char { SPECTRAL, RANDOM, NONE } | How should the initial coordinates of the embedding be obtained? |
Functions
| template<typename Index_, typename Float_> Status<Index_, Float_> | initialize(NeighborList<Index_, Float_> x, const std::size_t num_dim, Float_ *const embedding, Options options) | |
| template<typename Index_, typename Input_, typename Float_> Status<Index_, Float_> | initialize(const knncolle::Prebuilt<Index_, Input_, Float_> &prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options) | |
| template<typename Index_, typename Float_, class Matrix_ = knncolle::Matrix<Index_, Float_>> Status<Index_, Float_> | initialize(const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder<Index_, Float_, Float_, Matrix_> &builder, const std::size_t num_dim, Float_ *const embedding, Options options) | |
| template<typename Task_, class Run_> void | parallelize(const int num_workers, const Task_ num_tasks, Run_ run_task_range) | |
Functions for creating UMAP embeddings.
template<typename Index_, typename Float_>
using umappp::NeighborList = knncolle::NeighborList<Index_, Float_>
Lists of neighbors for each observation.
Template parameters
| Index_ | Integer type of the neighbor indices. |
| Float_ | Floating-point type of the distances. |
This is a convenient alias for knncolle::NeighborList. Each inner vector corresponds to an observation and contains the nearest neighbors of that observation, sorted by increasing distance. Neighbors should be unique within each inner vector: no index should occur more than once. In addition, the inner vector for observation i should not contain a neighbor with index i.
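The ordering, uniqueness and self-exclusion rules above can be checked before calling initialize(). The sketch below assumes that knncolle::NeighborList is a vector of per-observation vectors of (index, distance) pairs and mirrors it with a local alias, NeighborListSketch. Both that alias and the is_valid_neighbor_list() helper are hypothetical and not part of umappp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for knncolle::NeighborList, assumed here to be a
// vector of per-observation vectors of (neighbor index, distance) pairs.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every observation's neighbors are
// sorted by increasing distance, contain no duplicate indices, and do not
// include the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        const auto& current = x[i];
        std::vector<Index_> seen;
        seen.reserve(current.size());
        for (std::size_t j = 0; j < current.size(); ++j) {
            // Distances must be non-decreasing along the inner vector.
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false;
            }
            // Observation i must not list itself as a neighbor.
            if (static_cast<std::size_t>(current[j].first) == i) {
                return false;
            }
            seen.push_back(current[j].first);
        }
        // Each index may occur at most once per inner vector.
        std::sort(seen.begin(), seen.end());
        if (std::adjacent_find(seen.begin(), seen.end()) != seen.end()) {
            return false;
        }
    }
    return true;
}
```

A list that fails this check would need to be deduplicated or re-sorted before it is passed to the NeighborList overload of initialize().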
typedef std::mt19937_64 umappp::RngEngine
Class of the random number generator used in umappp.
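Because RngEngine is std::mt19937_64, a fixed seed yields a reproducible stream of raw values (the engine's output sequence is fully specified by the C++ standard). The sketch below uses that to fill a column-major embedding with normal draws, in the spirit of InitializeMethod::RANDOM. The random_embedding() helper is hypothetical, and std::normal_distribution is only an illustrative choice whose output is implementation-defined; umappp may generate its normal draws differently, so these values will not match the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Same engine type as the umappp::RngEngine typedef.
typedef std::mt19937_64 RngEngineSketch;

// Hypothetical helper: fills a num_dim x num_obs column-major embedding with
// standard normal draws from an engine seeded with `seed`. This illustrates
// the idea behind random initialization; it is not umappp's own code.
std::vector<double> random_embedding(std::size_t num_dim, std::size_t num_obs, std::uint64_t seed) {
    RngEngineSketch rng(seed);
    std::normal_distribution<double> norm(0.0, 1.0);
    std::vector<double> embedding(num_dim * num_obs);
    for (auto& value : embedding) {
        value = norm(rng);
    }
    return embedding;
}
```

Calling random_embedding(2, 10, 42) twice gives identical buffers within the same build, while a different seed gives a different buffer.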
enum umappp::InitializeMethod : char
How should the initial coordinates of the embedding be obtained?
| SPECTRAL | Spectral decomposition of the normalized graph Laplacian. Specifically, the initial coordinates are defined from the eigenvectors corresponding to the smallest non-zero eigenvalues. This fails in the presence of multiple graph components or if the approximate SVD (via irlba::compute()) fails to converge. |
| RANDOM | Fills the embedding with random draws from a normal distribution. |
| NONE | Uses the existing values in the supplied embedding array. |

template<typename Index_, typename Input_, typename Float_>
Status<Index_, Float_> umappp::initialize(const knncolle::Prebuilt<Index_, Input_, Float_> &prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options)
Template parameters
| Index_ | Integer type of the observation indices. |
| Input_ | Floating-point type of the input data for the neighbor search. This is only used to define the knncolle::Prebuilt type and is otherwise ignored. |
| Float_ | Floating-point type of the neighbor distances and output embedding. |
Parameters
| | prebuilt | A neighbor search index built on the dataset of interest. |
| | num_dim | Number of dimensions of the UMAP embedding. |
| [out] | embedding | Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (one column per observation in prebuilt). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false. |
| | options | Further options. |
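To make the column-major layout concrete: with num_dim rows, the coordinates of each observation are stored contiguously, so dimension d of observation i sits at offset i * num_dim + d. The embedding_at() accessor below is a hypothetical helper for illustration only; the caller would allocate num_dim times the number of observations values and pass the buffer's data() pointer as embedding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor for a column-major embedding with num_dim rows
// (dimensions) and one column per observation. Observation `obs` occupies
// the contiguous slice [obs * num_dim, (obs + 1) * num_dim).
template<typename Float_>
Float_& embedding_at(std::vector<Float_>& embedding, std::size_t num_dim, std::size_t obs, std::size_t dim) {
    assert(dim < num_dim);
    assert((obs + 1) * num_dim <= embedding.size());
    return embedding[obs * num_dim + dim];
}
```

With Options::initialize_method = InitializeMethod::NONE, the buffer would be filled this way before calling initialize(), since its existing values are then used as the starting coordinates.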
Returns
Status object containing the initial state of the UMAP algorithm.

template<typename Index_, typename Float_, class Matrix_ = knncolle::Matrix<Index_, Float_>>
Status<Index_, Float_> umappp::initialize(const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder<Index_, Float_, Float_, Matrix_> &builder, const std::size_t num_dim, Float_ *const embedding, Options options)
Template parameters
| Index_ | Integer type of the observation indices. |
| Float_ | Floating-point type of the input data, neighbor distances and output embedding. |
| Matrix_ | Class of the input matrix for the neighbor search. This should be a knncolle::SimpleMatrix or knncolle::Matrix. |
Parameters
| | data_dim | Number of dimensions of the input dataset. |
| | num_obs | Number of observations in the input dataset. |
| [in] | data | Pointer to an array containing the input dataset as a column-major matrix. Each row corresponds to a dimension (data_dim) and each column corresponds to an observation (num_obs). |
| | builder | Algorithm for the nearest neighbor search. |
| | num_dim | Number of dimensions of the embedding. |
| [out] | embedding | Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (num_obs). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false. |
| | options | Further options. |
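The data argument uses the same column-major convention, so observation j occupies data[j * data_dim] through data[(j + 1) * data_dim - 1]. As a rough illustration of what the builder's neighbor search has to produce, the hypothetical brute_force_neighbors() below computes exact Euclidean k-nearest neighbors from such a matrix, excluding each observation from its own list and sorting by increasing distance. It is a standalone quadratic-time stand-in, not knncolle's implementation, and the Euclidean metric is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical brute-force stand-in for a knncolle::Builder: finds the k
// nearest neighbors (Euclidean) of every observation in a column-major
// data_dim x num_obs matrix. Self-matches are excluded and each list is
// sorted by increasing distance, as the NeighborList description requires.
std::vector<std::vector<std::pair<int, double> > > brute_force_neighbors(
    std::size_t data_dim, int num_obs, const double* data, int k)
{
    assert(k >= 0 && k < num_obs);
    std::vector<std::vector<std::pair<int, double> > > output(num_obs);
    for (int i = 0; i < num_obs; ++i) {
        const double* obs_i = data + static_cast<std::size_t>(i) * data_dim; // column i
        std::vector<std::pair<int, double> > candidates;
        for (int j = 0; j < num_obs; ++j) {
            if (j == i) {
                continue; // no self-neighbors
            }
            const double* obs_j = data + static_cast<std::size_t>(j) * data_dim;
            double sq = 0;
            for (std::size_t d = 0; d < data_dim; ++d) {
                double diff = obs_i[d] - obs_j[d];
                sq += diff * diff;
            }
            candidates.emplace_back(j, std::sqrt(sq));
        }
        // Keep only the k closest candidates, ordered by increasing distance.
        std::partial_sort(candidates.begin(), candidates.begin() + k, candidates.end(),
            [](const std::pair<int, double>& a, const std::pair<int, double>& b) {
                return a.second < b.second;
            });
        candidates.resize(k);
        output[i] = std::move(candidates);
    }
    return output;
}
```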
Returns
Status object containing the initial state of the UMAP algorithm.

template<typename Index_, typename Float_>
Status<Index_, Float_> umappp::initialize(NeighborList<Index_, Float_> x, const std::size_t num_dim, Float_ *const embedding, Options options)
Template parameters
| Index_ | Integer type of the neighbor indices. |
| Float_ | Floating-point type of the distances. |
Parameters
| | x | Indices and distances to the nearest neighbors for each observation. For each observation, neighbors should be unique and sorted in order of increasing distance; see the NeighborList description for details. |
| | num_dim | Number of dimensions of the embedding. |
| [out] | embedding | Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (x.size()). On output, this contains the initial coordinates of the embedding. Existing values in this array will not be modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false. |
| | options | Further options. Note that Options::num_neighbors is ignored here. |
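Since Options::num_neighbors is ignored by this overload, the number of neighbors is presumably determined by the lengths of the inner vectors of x. If a precomputed list holds more neighbors than desired, one option is to trim it beforehand; because each inner vector is sorted by increasing distance, keeping its first k entries retains the k nearest neighbors. truncate_neighbors() is a hypothetical helper, and x is assumed to be a vector of vectors of (index, distance) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: trims every observation's neighbor list to at most k
// entries. Assumes each inner vector is already sorted by increasing
// distance, so the retained entries are the k nearest neighbors.
template<typename Index_, typename Float_>
void truncate_neighbors(std::vector<std::vector<std::pair<Index_, Float_> > >& x, std::size_t k) {
    for (auto& current : x) {
        if (current.size() > k) {
            current.resize(k);
        }
    }
}
```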
Returns
Status object containing the initial state of the UMAP algorithm.

template<typename Task_, class Run_>
void umappp::parallelize(const int num_workers, const Task_ num_tasks, Run_ run_task_range)
Template parameters
| Task_ | Integer type of the number of tasks. |
| Run_ | Function to execute a range of tasks. |
Parameters
| | num_workers | Number of workers. |
| | num_tasks | Number of tasks. |
| | run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range(). However, if the UMAPPP_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
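The sketch below shows one way a replacement for subpar::parallelize_range() could be written with std::thread. It assumes the calling convention run_task_range(worker, start, length); check the subpar documentation before relying on that. parallelize_sketch() is hypothetical, and the commented #define shows how it might be wired in through UMAPPP_CUSTOM_PARALLEL.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// To route umappp through this function (assumption: define it before
// including umappp; the variadic form tolerates commas inside lambdas):
// #define UMAPPP_CUSTOM_PARALLEL(...) parallelize_sketch(__VA_ARGS__)

// Hypothetical stand-in for subpar::parallelize_range(), assuming the call
// run_task_range(worker, start, length). Tasks [0, num_tasks) are split into
// contiguous ranges, one per worker, each executed on its own thread.
template<typename Task_, class Run_>
void parallelize_sketch(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_workers < 1) {
        num_workers = 1;
    }
    Task_ per_worker = num_tasks / num_workers;
    Task_ remainder = num_tasks % num_workers;
    std::vector<std::thread> threads;
    Task_ start = 0;
    for (int w = 0; w < num_workers && start < num_tasks; ++w) {
        // The first `remainder` workers take one extra task each.
        Task_ length = per_worker + (static_cast<Task_>(w) < remainder ? 1 : 0);
        threads.emplace_back(run_task_range, w, start, length);
        start += length;
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

Contiguous ranges give each worker a disjoint block of tasks, so a run_task_range that only writes to its own indices needs no locking.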