umappp
A C++ library for UMAP
umappp::Options Struct Reference

Options for initialize().

#include <Options.hpp>

Public Attributes

double local_connectivity = 1
 
double bandwidth = 1
 
double mix_ratio = 1
 
double spread = 1
 
double min_dist = 0.1
 
double a = 0
 
double b = 0
 
double repulsion_strength = 1
 
InitializeMethod initialize = InitializeMethod::SPECTRAL
 
int num_epochs = -1
 
double learning_rate = 1
 
double negative_sample_rate = 5
 
int num_neighbors = 15
 
uint64_t seed = 1234567890
 
int num_threads = 1
 
int parallel_optimization = false
 

Detailed Description

Options for initialize().

Member Data Documentation

◆ a

double umappp::Options::a = 0

Positive value for the \(a\) parameter for the fuzzy set membership strength calculations. Larger values yield a sharper decay in membership strength with increasing distance between observations.

If this or Options::b is set to zero, a suitable value for this parameter is automatically determined from the values provided to Options::spread and Options::min_dist.

◆ b

double umappp::Options::b = 0

Value in \((0, 1)\) for the \(b\) parameter for the fuzzy set membership strength calculations. Larger values yield an earlier decay in membership strength with increasing distance between observations.

If this or Options::a is set to zero, a suitable value for this parameter is automatically determined from the values provided to Options::spread and Options::min_dist.
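The automatic choice of \(a\) and \(b\) can be illustrated with a minimal sketch. It assumes the convention of the reference UMAP implementation, where the smooth curve \(1 / (1 + a x^{2b})\) is fitted to a target that equals 1 up to min_dist and then decays exponentially at a rate set by spread. The function names are hypothetical, and the brute-force grid search is a stand-in for illustration only; it is not the solver that umappp uses.

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Target curve (assumed, following the reference UMAP implementation):
// full membership up to min_dist, then exponential decay scaled by spread.
double target_curve(double x, double spread, double min_dist) {
    return x <= min_dist ? 1.0 : std::exp(-(x - min_dist) / spread);
}

// Smooth curve parametrized by a and b.
double membership_curve(double x, double a, double b) {
    return 1.0 / (1.0 + a * std::pow(x, 2.0 * b));
}

// Hypothetical helper: least-squares fit of (a, b) by exhaustive grid search.
std::pair<double, double> fit_ab(double spread, double min_dist) {
    const int num_points = 300;
    std::vector<double> xs(num_points), targets(num_points);
    for (int i = 0; i < num_points; ++i) {
        xs[i] = 3.0 * spread * i / (num_points - 1); // grid on [0, 3 * spread]
        targets[i] = target_curve(xs[i], spread, min_dist);
    }

    double best_a = 1.0, best_b = 0.5;
    double best_loss = std::numeric_limits<double>::infinity();
    for (int ia = 0; ia <= 250; ++ia) {
        const double a = 0.5 + 0.01 * ia;     // a in [0.5, 3.0]
        for (int ib = 0; ib < 50; ++ib) {
            const double b = 0.5 + 0.01 * ib; // b in [0.5, 0.99]
            double loss = 0.0;
            for (int i = 0; i < num_points; ++i) {
                const double diff = membership_curve(xs[i], a, b) - targets[i];
                loss += diff * diff;
            }
            if (loss < best_loss) {
                best_loss = loss;
                best_a = a;
                best_b = b;
            }
        }
    }
    return {best_a, best_b};
}
```

Calling fit_ab(1.0, 0.1) with the default spread and min_dist should land close to the values commonly quoted for UMAP defaults, roughly 1.58 for \(a\) and 0.90 for \(b\), up to the resolution of the grid.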

◆ bandwidth

double umappp::Options::bandwidth = 1

Effective bandwidth of the kernel when converting the distance to a neighbor into a fuzzy set membership confidence. Larger values reduce the decay in confidence with respect to distance, increasing connectivity and favoring global structure.

◆ initialize

InitializeMethod umappp::Options::initialize = InitializeMethod::SPECTRAL

How to initialize the embedding. Some choices may use the existing coordinates provided to initialize() via the embedding argument.

◆ learning_rate

double umappp::Options::learning_rate = 1

Initial learning rate used in the gradient descent. Larger values can improve the speed of convergence but at the cost of stability.
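As a rough illustration of why this is the "initial" rate, the sketch below assumes the linear decay schedule used by the reference UMAP implementation, where the rate falls from its initial value towards zero over the epochs. The function name is hypothetical and the schedule is an assumption, not something this page confirms for umappp.

```cpp
// Hypothetical helper: learning rate at a given epoch under an assumed
// linear decay from `initial` (at epoch 0) towards zero (at num_epochs).
double learning_rate_at(double initial, int epoch, int num_epochs) {
    return initial * (1.0 - static_cast<double>(epoch) / num_epochs);
}
```

Under this assumption, a larger Options::learning_rate scales every step of the schedule, which is why it speeds up early movement but can destabilize the layout.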

◆ local_connectivity

double umappp::Options::local_connectivity = 1

The number of nearest neighbors that are assumed to be always connected, with maximum membership confidence. Larger values increase the connectivity of the embedding and reduce the focus on local structure. This may be a fractional number of neighbors.

◆ min_dist

double umappp::Options::min_dist = 0.1

Minimum distance between observations in the final low-dimensional embedding. Smaller values will increase local clustering while larger values favor a more even distribution of points throughout the low-dimensional space. This is interpreted relative to the spread of points in Options::spread.

◆ mix_ratio

double umappp::Options::mix_ratio = 1

Mixing ratio to use when combining fuzzy sets. This symmetrizes the sets by ensuring that the confidence of point \(A\) belonging to point \(B\)'s set is the same as the confidence of \(B\) belonging to \(A\)'s set. A mixing ratio of 1 will take the union of confidences, a ratio of 0 will take the intersection, and intermediate values will interpolate between them. Larger values (up to 1) favor connectivity and more global structure.
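The interpolation can be sketched for a single pair of points. This assumes the fuzzy set operations of the reference UMAP implementation, i.e., the product for the intersection and the probabilistic sum for the union; the function name is hypothetical.

```cpp
// Hypothetical helper: symmetrized confidence for a pair of points, given the
// confidence of A in B's set (a_to_b) and of B in A's set (b_to_a).
double combine_confidences(double a_to_b, double b_to_a, double mix_ratio) {
    const double fuzzy_intersection = a_to_b * b_to_a;                  // assumed: product
    const double fuzzy_union = a_to_b + b_to_a - fuzzy_intersection;    // assumed: probabilistic sum
    // mix_ratio = 1 gives the union, 0 gives the intersection.
    return mix_ratio * fuzzy_union + (1.0 - mix_ratio) * fuzzy_intersection;
}
```

The result is the same when the two confidences are swapped, which is the symmetry property described above. Since the union is never smaller than the intersection, larger mixing ratios can only raise the combined confidence.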

◆ negative_sample_rate

double umappp::Options::negative_sample_rate = 5

Rate of sampling negative observations to compute repulsive forces. This is interpreted with respect to the number of neighbors with attractive forces, i.e., for each attractive interaction, a number of negative samples equal to Options::negative_sample_rate is taken for repulsive interactions. Smaller values can improve the speed of convergence but at the cost of stability.

◆ num_epochs

int umappp::Options::num_epochs = -1

Number of epochs for the gradient descent, i.e., optimization iterations. Larger values improve accuracy at the cost of computational work. If the requested number of epochs is negative, a value is automatically chosen based on the size of the dataset:

  • For datasets with no more than 10000 observations, the number of epochs is set to 500.
  • For larger datasets, the number of epochs decreases from 500 according to the number of observations beyond 10000, to a lower limit of 200.

This choice aims to reduce computational work for very large datasets.
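The rule above can be sketched as follows. The thresholds of 10000 observations, 500 epochs and 200 epochs come from the description; the exact shape of the decrease is not stated on this page, so the inverse-proportional decay used here is an assumption, as is the function name.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: pick the number of epochs for a dataset of num_obs
// observations, honoring an explicit non-negative request.
int choose_num_epochs(int requested, std::size_t num_obs) {
    if (requested >= 0) {
        return requested; // the caller asked for a specific number
    }
    const double limit = 10000;  // dataset size below which 500 epochs are used
    const int floor_epochs = 200; // lower limit approached for very large datasets
    const int extra_epochs = 300; // additional epochs at or below the limit
    if (static_cast<double>(num_obs) <= limit) {
        return floor_epochs + extra_epochs;
    }
    // Assumed decay: the extra epochs shrink in proportion to limit / num_obs.
    const double scaled = (extra_epochs * limit) / static_cast<double>(num_obs);
    return floor_epochs + static_cast<int>(std::ceil(scaled));
}
```

With this particular decay, the work spent on the extra epochs stays roughly constant as the dataset grows, so the total cost beyond the 200-epoch floor does not increase with the number of observations.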

◆ num_neighbors

int umappp::Options::num_neighbors = 15

Number of neighbors to use to define the fuzzy sets. Larger values improve connectivity and favor preservation of global structure, at the cost of increased computational work. This argument is only used in certain initialize() overloads that perform identification of the nearest neighbors.

◆ num_threads

int umappp::Options::num_threads = 1

Number of threads to use. The parallelization scheme is determined by parallelize() for most calculations. The exception is the nearest-neighbor search in some of the initialize() overloads, where the scheme is determined by knncolle::parallelize() instead.

If Options::parallel_optimization = true, this option will also affect the layout optimization, i.e., the gradient descent iterations.

◆ parallel_optimization

int umappp::Options::parallel_optimization = false

Whether to enable parallel optimization. If set to true, this will use the number of threads specified in Options::num_threads for the layout optimization step.

By default, this is set to false as the increase in the number of threads is usually not cost-effective for layout optimization. Specifically, while CPU usage scales with the number of threads, the time spent does not decrease by the same factor. We also expect that the number of available CPUs is at least equal to the requested number of threads; otherwise, contention will greatly degrade performance. Nonetheless, users can enable parallel optimization if cost is no issue. A higher number of threads (above 4) is usually required to see a reduction in time.

If the UMAPPP_NO_PARALLEL_OPTIMIZATION macro is defined, umappp will not be compiled with support for parallel optimization. This may be desirable in environments that have no support for threading or atomics, or to reduce the binary size if parallelization is not of interest. In such cases, enabling parallel optimization and calling Status::run() will raise an error.

◆ repulsion_strength

double umappp::Options::repulsion_strength = 1

Modifier for the repulsive force. Larger values increase repulsion and favor local structure.

◆ seed

uint64_t umappp::Options::seed = 1234567890

Seed to use for the Mersenne Twister when sampling negative observations.

◆ spread

double umappp::Options::spread = 1

Scale of the coordinates of the final low-dimensional embedding.


The documentation for this struct was generated from the following file: