umappp
A C++ library for UMAP
Functions for creating UMAP embeddings.
Classes
  struct Options
      Options for initialize().
  class Status
      Status of the UMAP optimization iterations.
Typedefs
  template<typename Index_, typename Float_>
  using NeighborList = knncolle::NeighborList<Index_, Float_>
      Lists of neighbors for each observation.
  typedef std::mt19937_64 RngEngine
Enumerations
  enum InitializeMethod : char { SPECTRAL, RANDOM, NONE }
Functions
  template<typename Index_, typename Float_>
  Status<Index_, Float_> initialize(NeighborList<Index_, Float_> x, const std::size_t num_dim, Float_ *const embedding, Options options)

  template<typename Index_, typename Input_, typename Float_>
  Status<Index_, Float_> initialize(const knncolle::Prebuilt<Index_, Input_, Float_>& prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options)

  template<typename Index_, typename Float_, class Matrix_ = knncolle::Matrix<Index_, Float_>>
  Status<Index_, Float_> initialize(const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder<Index_, Float_, Float_, Matrix_>& builder, const std::size_t num_dim, Float_ *const embedding, Options options)

  template<typename Task_, class Run_>
  void parallelize(const int num_workers, const Task_ num_tasks, Run_ run_task_range)
using umappp::NeighborList = knncolle::NeighborList<Index_, Float_>
Lists of neighbors for each observation.
Template Parameters
  Index_   Integer type of the neighbor indices.
  Float_   Floating-point type of the distances.
This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique: there should be no more than one occurrence of each index in each inner vector. Also, the inner vector for observation i should not contain any neighbor with index i.
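The requirements above can be checked mechanically before calling initialize(). This sketch assumes that knncolle::NeighborList<Index_, Float_> is a vector of vectors of (index, distance) pairs and defines a local alias, PairList, with that shape instead of including knncolle. The helper is_valid_neighbor_list is hypothetical, and its range check on indices goes slightly beyond what is stated here.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in assumed to have the same shape as knncolle::NeighborList<Index_, Float_>:
// one inner vector per observation, each entry a (neighbor index, distance) pair.
template<typename Index_, typename Float_>
using PairList = std::vector<std::vector<std::pair<Index_, Float_>>>;

// Hypothetical checker for the documented requirements on each inner vector:
// indices refer to existing observations, no index repeats, observation i does
// not list itself, and distances never decrease.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const PairList<Index_, Float_>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        const auto& current = x[i];
        std::unordered_set<Index_> seen;
        for (std::size_t k = 0; k < current.size(); ++k) {
            const auto idx = static_cast<std::size_t>(current[k].first);
            if (idx >= x.size() || idx == i) {
                return false; // out-of-range index or self-neighbor
            }
            if (!seen.insert(current[k].first).second) {
                return false; // duplicate neighbor index
            }
            if (k > 0 && current[k].second < current[k - 1].second) {
                return false; // not sorted by increasing distance
            }
        }
    }
    return true;
}
```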
typedef std::mt19937_64 umappp::RngEngine
Class of the random number generator used in umappp.
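Because RngEngine is std::mt19937_64, engines constructed with the same seed yield identical streams, which is what allows seeded runs to be reproduced. As an illustration only, the sketch below uses that engine type to fill a column-major num_dim by num_obs array with standard normal draws, in the spirit of the RANDOM initialization method. The helper random_fill and the choice of std::normal_distribution with mean 0 and standard deviation 1 are assumptions, not umappp's actual initialization code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Same engine type as umappp::RngEngine.
using Engine = std::mt19937_64;

// Hypothetical helper: fill a column-major (num_dim x num_obs) array with
// standard normal draws from a seeded engine. Not umappp's internal code.
std::vector<double> random_fill(std::size_t num_dim, std::size_t num_obs, Engine::result_type seed) {
    Engine rng(seed);
    std::normal_distribution<double> norm(0.0, 1.0);
    std::vector<double> embedding(num_dim * num_obs);
    for (auto& value : embedding) {
        value = norm(rng);
    }
    return embedding;
}
```

The engine's output sequence is fixed by the C++ standard, but the algorithm behind std::normal_distribution is not, so the exact draws may differ between standard library implementations even with the same seed.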
enum umappp::InitializeMethod : char
How should the initial coordinates of the embedding be obtained?
  SPECTRAL: spectral decomposition of the normalized graph Laplacian. Specifically, the initial coordinates are defined from the eigenvectors corresponding to the smallest non-zero eigenvalues. This fails if the graph has multiple connected components or if the approximate SVD (via irlba::compute()) fails to converge.
  RANDOM: fills the embedding with random draws from a normal distribution.
  NONE: uses existing values in the supplied embedding array.

Status<Index_, Float_> umappp::initialize(const knncolle::Prebuilt<Index_, Input_, Float_>& prebuilt, const std::size_t num_dim, Float_ *const embedding, Options options)
Template Parameters
  Index_   Integer type of the observation indices.
  Input_   Floating-point type of the input data for the neighbor search. This is only used to define the knncolle::Prebuilt type and is otherwise ignored.
  Float_   Floating-point type of the input data, neighbor distances and output embedding.
Parameters
  prebuilt          A neighbor search index built on the dataset of interest.
  num_dim           Number of dimensions of the UMAP embedding.
  [out] embedding   Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (one per observation in prebuilt). On output, this contains the initial coordinates of the embedding. Existing values in this array are not modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
  options           Further options.
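Because embedding is column-major with num_dim rows, the coordinates of a single observation are contiguous: dimension d of observation i is stored at embedding[i * num_dim + d]. A minimal sketch of reading one observation's coordinates back out; get_coordinates is a hypothetical helper, not part of umappp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the coordinates of observation `obs` out of a
// column-major (num_dim x num_obs) embedding array.
template<typename Float_>
std::vector<Float_> get_coordinates(const Float_* embedding, std::size_t num_dim, std::size_t obs) {
    const Float_* start = embedding + obs * num_dim; // column `obs` is contiguous
    return std::vector<Float_>(start, start + num_dim);
}
```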
Returns
  Status object containing the initial state of the UMAP algorithm.

Status<Index_, Float_> umappp::initialize(const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder<Index_, Float_, Float_, Matrix_>& builder, const std::size_t num_dim, Float_ *const embedding, Options options)
Template Parameters
  Index_    Integer type of the observation indices.
  Float_    Floating-point type of the input data, neighbor distances and output embedding.
  Matrix_   Class of the input matrix for the neighbor search. This should be a knncolle::SimpleMatrix or knncolle::Matrix.
Parameters
  data_dim          Number of dimensions of the input dataset.
  num_obs           Number of observations in the input dataset.
  [in] data         Pointer to an array containing the input dataset as a column-major matrix. Each row corresponds to a dimension (data_dim) and each column corresponds to an observation (num_obs).
  builder           Algorithm for the nearest neighbor search.
  num_dim           Number of dimensions of the embedding.
  [out] embedding   Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (num_obs). On output, this contains the initial coordinates of the embedding. Existing values in this array are not modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
  options           Further options.
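The data pointer expects one contiguous column per observation, so element j of observation i sits at data[i * data_dim + j]. If a dataset is held as one std::vector per observation, it can be flattened observation by observation into that layout. A minimal sketch; to_column_major is a hypothetical helper and the double element type is an assumption for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: flatten per-observation vectors into the column-major
// (data_dim x num_obs) layout expected by the `data` argument.
std::vector<double> to_column_major(const std::vector<std::vector<double>>& points) {
    if (points.empty()) {
        return {};
    }
    const std::size_t data_dim = points.front().size();
    std::vector<double> data;
    data.reserve(data_dim * points.size());
    for (const auto& p : points) {
        if (p.size() != data_dim) {
            throw std::invalid_argument("all observations must have the same number of dimensions");
        }
        data.insert(data.end(), p.begin(), p.end()); // column i holds observation i
    }
    return data;
}
```

The resulting vector's data() pointer, with data_dim equal to the inner vector length and num_obs equal to points.size(), matches the layout described for data.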
Returns
  Status object containing the initial state of the UMAP algorithm.

Status<Index_, Float_> umappp::initialize(NeighborList<Index_, Float_> x, const std::size_t num_dim, Float_ *const embedding, Options options)
Template Parameters
  Index_   Integer type of the neighbor indices.
  Float_   Floating-point type of the distances.
Parameters
  x                 Indices and distances to the nearest neighbors for each observation. For each observation, neighbors should be unique and sorted in order of increasing distance; see the NeighborList description for details.
  num_dim           Number of dimensions of the embedding.
  [out] embedding   Pointer to an array in which to store the embedding. This is treated as a column-major matrix where rows are dimensions (num_dim) and columns are observations (x.size()). On output, this contains the initial coordinates of the embedding. Existing values in this array are not modified if Options::initialize_method = InitializeMethod::NONE, or if Options::initialize_method = InitializeMethod::SPECTRAL, spectral initialization fails, and Options::initialize_random_on_spectral_fail = false.
  options           Further options. Note that Options::num_neighbors is ignored here, as the neighbors are already supplied in x.
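For small datasets, x can be built with an exhaustive search. The sketch below computes Euclidean distances from a column-major array and produces lists that satisfy the NeighborList requirements: each observation is excluded from its own list, each other index appears at most once, and entries are sorted by increasing distance. It assumes that knncolle::NeighborList<int, double> is a vector of vectors of (index, distance) pairs and uses a local alias, SimpleNeighborList, with that shape. brute_force_neighbors is hypothetical and is not how umappp or knncolle search for neighbors.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in assumed to match knncolle::NeighborList<int, double>:
// one inner vector per observation, each entry is (neighbor index, distance).
using SimpleNeighborList = std::vector<std::vector<std::pair<int, double>>>;

// Exhaustive Euclidean k-nearest-neighbor search over a column-major
// (data_dim x num_obs) array. O(num_obs^2 * data_dim); for illustration only.
SimpleNeighborList brute_force_neighbors(const double* data, std::size_t data_dim, int num_obs, int k) {
    SimpleNeighborList output(num_obs);
    for (int i = 0; i < num_obs; ++i) {
        const double* xi = data + static_cast<std::size_t>(i) * data_dim;
        std::vector<std::pair<int, double>> candidates;
        for (int j = 0; j < num_obs; ++j) {
            if (j == i) {
                continue; // an observation must not list itself as a neighbor
            }
            const double* xj = data + static_cast<std::size_t>(j) * data_dim;
            double sq = 0;
            for (std::size_t d = 0; d < data_dim; ++d) {
                const double diff = xi[d] - xj[d];
                sq += diff * diff;
            }
            candidates.emplace_back(j, std::sqrt(sq));
        }
        // Sort by increasing distance (ties broken by index) and keep the first k.
        std::sort(candidates.begin(), candidates.end(), [](const auto& a, const auto& b) {
            return a.second < b.second || (a.second == b.second && a.first < b.first);
        });
        if (candidates.size() > static_cast<std::size_t>(k)) {
            candidates.resize(k);
        }
        output[i] = std::move(candidates);
    }
    return output;
}
```

If the stand-in does have the same type as knncolle::NeighborList<int, double>, the result could be passed directly as x.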
Returns
  Status object containing the initial state of the UMAP algorithm.

void umappp::parallelize(const int num_workers, const Task_ num_tasks, Run_ run_task_range)
Template Parameters
  Task_   Integer type of the number of tasks.
  Run_    Function to execute a range of tasks.
Parameters
  num_workers      Number of workers.
  num_tasks        Number of tasks.
  run_task_range   Function to iterate over a range of tasks within a worker.
By default, this is an alias to subpar::parallelize_range(). However, if the UMAPPP_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
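A custom replacement only has to split num_tasks into ranges and invoke run_task_range on each. The sketch below does this with std::thread, giving each worker one contiguous block. The three-argument callback (worker index, first task, number of tasks) is our reading of the subpar::parallelize_range() convention and should be checked against the subpar documentation. The name simple_parallelize_range is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical replacement: run_task_range(worker, start, length) is called once
// per worker on a contiguous block of tasks, each in its own std::thread.
template<typename Task_, class Run_>
void simple_parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_tasks <= 0) {
        return;
    }
    // Use at least one worker and never more workers than tasks.
    const Task_ workers = std::min<Task_>(static_cast<Task_>(std::max(num_workers, 1)), num_tasks);
    const Task_ per_worker = num_tasks / workers;
    const Task_ remainder = num_tasks % workers;

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(workers));
    Task_ start = 0;
    for (Task_ w = 0; w < workers; ++w) {
        // The first `remainder` workers take one extra task each.
        const Task_ length = per_worker + (w < remainder ? 1 : 0);
        threads.emplace_back(run_task_range, static_cast<int>(w), start, length);
        start += length;
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

If the callback convention matches, defining something like `#define UMAPPP_CUSTOM_PARALLEL(nw, nt, run) simple_parallelize_range(nw, nt, run)` before including umappp should route its parallel sections through this function; that wiring is untested here. Unlike a production implementation, this sketch does not forward exceptions: one thrown inside run_task_range terminates the program.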