umappp
A C++ library for UMAP
Options for initialize().
#include <Options.hpp>
Public Attributes
double | local_connectivity = 1
double | bandwidth = 1
double | mix_ratio = 1
double | spread = 1
double | min_dist = 0.1
double | a = 0
double | b = 0
double | repulsion_strength = 1
InitializeMethod | initialize = InitializeMethod::SPECTRAL
int | num_epochs = -1
double | learning_rate = 1
double | negative_sample_rate = 5
int | num_neighbors = 15
uint64_t | seed = 1234567890
int | num_threads = 1
int | parallel_optimization = false
double umappp::Options::a = 0
Positive value for the \(a\) parameter for the fuzzy set membership strength calculations. Larger values yield a sharper decay in membership strength with increasing distance between observations.
If this or Options::b is set to zero, a suitable value for this parameter is automatically determined from Options::spread and Options::min_dist.
double umappp::Options::b = 0
Value in \((0, 1)\) for the \(b\) parameter for the fuzzy set membership strength calculations. Larger values yield an earlier decay in membership strength with increasing distance between observations.
If this or Options::a is set to zero, a suitable value for this parameter is automatically determined from the values provided to Options::spread and Options::min_dist.
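Assuming umappp uses the standard UMAP curve, the membership strength between two embedded points at distance \(d\) is \(1/(1 + a d^{2b})\). The hypothetical helper below evaluates that curve, which makes the effect of \(a\) easy to check: at \(d = 1\) the strength is \(1/(1 + a)\), so raising \(a\) from 1 to 2 lowers it from 0.5 to about 0.33.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of umappp): low-dimensional membership strength
// in the standard UMAP formulation, 1 / (1 + a * d^(2b)), for a distance d >= 0.
double membership_strength(double d, double a, double b) {
    return 1.0 / (1.0 + a * std::pow(d, 2.0 * b));
}
```

At \(d = 0\) the strength is always 1, regardless of \(a\) and \(b\).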
double umappp::Options::bandwidth = 1
Effective bandwidth of the kernel when converting the distance to a neighbor into a fuzzy set membership confidence. Larger values reduce the decay in confidence with respect to distance, increasing connectivity and favoring global structure.
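The reference UMAP implementation (umap-learn) calibrates a per-observation kernel width \(\sigma\) by binary search, so that the memberships \(\exp(-\max(0, d_j - \rho)/\sigma)\) of the \(k\) neighbors sum to \(\log_2(k)\) times the bandwidth. A larger bandwidth therefore yields a wider kernel and a slower decay. The sketch below follows that scheme under the assumption that umappp applies the bandwidth in the same way; `find_sigma` is a hypothetical name, and `rho` is the distance within which membership is maximal.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch modelled on the reference UMAP implementation, not umappp's code.
// Finds sigma such that sum_j exp(-max(0, d_j - rho) / sigma) == log2(k) * bandwidth,
// where k = dists.size(). The sum increases with sigma, so bisection applies.
double find_sigma(const std::vector<double>& dists, double rho, double bandwidth) {
    const double target = std::log2(static_cast<double>(dists.size())) * bandwidth;
    double lo = 0.0, hi = 1e6, mid = 1.0;
    for (int iter = 0; iter < 100; ++iter) {
        mid = 0.5 * (lo + hi);
        double total = 0.0;
        for (double d : dists) {
            total += std::exp(-std::max(0.0, d - rho) / mid);
        }
        if (total > target) {
            hi = mid; // too much membership mass: narrow the kernel
        } else {
            lo = mid; // too little membership mass: widen the kernel
        }
    }
    return mid;
}
```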
InitializeMethod umappp::Options::initialize = InitializeMethod::SPECTRAL
How to initialize the embedding. Some choices may use the existing coordinates provided to initialize() via the embedding argument.
double umappp::Options::learning_rate = 1
Initial learning rate used in the gradient descent. Larger values can improve the speed of convergence but at the cost of stability.
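In the reference UMAP implementation, the learning rate decays linearly from its initial value toward zero over the course of the epochs. Assuming umappp uses a similar schedule, the rate at a given epoch can be sketched as follows (`learning_rate_at` is a hypothetical helper):

```cpp
#include <cassert>

// Hypothetical helper: linearly decaying learning rate, as in the reference
// UMAP implementation. `epoch` runs from 0 to num_epochs - 1.
double learning_rate_at(double initial, int epoch, int num_epochs) {
    return initial * (1.0 - static_cast<double>(epoch) / num_epochs);
}
```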
double umappp::Options::local_connectivity = 1
The number of nearest neighbors that are assumed to be always connected, with maximum membership confidence. Larger values increase the connectivity of the embedding and reduce the focus on local structure. This may be a fractional number of neighbors.
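The reference UMAP implementation handles a fractional value by interpolating between neighbor distances to obtain \(\rho\), the distance within which neighbors receive maximum membership. The hypothetical `compute_rho` below mirrors that logic for a single observation, assuming umappp behaves similarly. For example, with neighbor distances 1, 2 and 4, a `local_connectivity` of 1.5 places \(\rho\) halfway between the first and second neighbors, at 1.5.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not umappp's code): derive rho from a possibly fractional
// local_connectivity. `dists` holds the sorted, non-zero distances from one
// observation to its nearest neighbors.
double compute_rho(const std::vector<double>& dists, double local_connectivity) {
    if (dists.empty()) {
        return 0.0;
    }
    const double whole = std::floor(local_connectivity);
    const double frac = local_connectivity - whole;
    const auto index = static_cast<std::size_t>(whole);
    if (index >= dists.size()) {
        return dists.back(); // more connectivity requested than neighbors available
    }
    if (index == 0) {
        return frac * dists[0]; // a fraction of the way to the first neighbor
    }
    double rho = dists[index - 1];
    if (frac > 0) {
        rho += frac * (dists[index] - dists[index - 1]); // interpolate toward the next neighbor
    }
    return rho;
}
```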
double umappp::Options::min_dist = 0.1
Minimum distance between observations in the final low-dimensional embedding. Smaller values will increase local clustering while larger values favor a more even distribution of points throughout the low-dimensional space. This is interpreted relative to the spread of points in Options::spread.
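When Options::a or Options::b is zero, both must be derived from Options::spread and Options::min_dist. The reference UMAP implementation does this by least-squares fitting \(1/(1 + a d^{2b})\) to a target curve that equals 1 for \(d < m\) and \(\exp(-(d - m)/s)\) otherwise, where \(m\) is min_dist and \(s\) is spread, evaluated at 300 points in \([0, 3s]\). The sketch below swaps the least-squares solver for a brute-force grid search, so it only approximates that fit. `target_curve` and `fit_ab` are hypothetical names, and whether umappp uses the same target curve is an assumption. For the defaults (spread 1, min_dist 0.1), the reference fit gives roughly \(a \approx 1.58\) and \(b \approx 0.90\).

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Target curve used by the reference UMAP implementation: flat at 1 below
// min_dist, then exponential decay with scale `spread`.
double target_curve(double d, double spread, double min_dist) {
    return d < min_dist ? 1.0 : std::exp(-(d - min_dist) / spread);
}

// Hypothetical grid-search fit of (a, b); a stand-in for a least-squares solver.
std::pair<double, double> fit_ab(double spread, double min_dist) {
    const int num_points = 300;
    std::vector<double> xs(num_points), ys(num_points);
    for (int i = 0; i < num_points; ++i) {
        xs[i] = 3.0 * spread * i / (num_points - 1);
        ys[i] = target_curve(xs[i], spread, min_dist);
    }
    double best_a = 0.0, best_b = 0.0;
    double best_err = std::numeric_limits<double>::infinity();
    for (int ia = 1; ia <= 500; ++ia) {        // a in (0, 5], step 0.01
        const double a = ia * 0.01;
        for (int ib = 1; ib <= 99; ++ib) {     // b in (0, 1), step 0.01
            const double b = ib * 0.01;
            double err = 0.0;
            for (int i = 0; i < num_points; ++i) {
                const double diff = 1.0 / (1.0 + a * std::pow(xs[i], 2.0 * b)) - ys[i];
                err += diff * diff;            // sum of squared residuals
            }
            if (err < best_err) {
                best_err = err;
                best_a = a;
                best_b = b;
            }
        }
    }
    return {best_a, best_b};
}
```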
double umappp::Options::mix_ratio = 1
Mixing ratio to use when combining fuzzy sets. This symmetrizes the sets by ensuring that the confidence of point \(A\) belonging to point \(B\)'s set is the same as the confidence of \(B\) belonging to \(A\)'s set. A mixing ratio of 1 will take the union of confidences, a ratio of 0 will take the intersection, and intermediate values will interpolate between them. Larger values (up to 1) favor connectivity and more global structure.
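The combination rule fits in a one-line function. The sketch below uses the formula from the reference UMAP implementation, assuming umappp does the same. Given directed memberships \(w_{AB}\) and \(w_{BA}\), the union is \(w_{AB} + w_{BA} - w_{AB} w_{BA}\), the intersection is \(w_{AB} w_{BA}\), and the mixing ratio interpolates linearly between them. `combine_memberships` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: symmetric weight from two directed fuzzy memberships.
// mix_ratio = 1 gives the fuzzy union, mix_ratio = 0 the fuzzy intersection.
double combine_memberships(double w_ab, double w_ba, double mix_ratio) {
    const double product = w_ab * w_ba;               // intersection
    const double fuzzy_union = w_ab + w_ba - product; // union
    return mix_ratio * fuzzy_union + (1.0 - mix_ratio) * product;
}
```

Because both the union and the intersection are symmetric in their arguments, the result is the same whichever direction is passed first.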
double umappp::Options::negative_sample_rate = 5
Rate of sampling negative observations to compute repulsive forces. This is interpreted with respect to the number of neighbors with attractive forces, i.e., for each attractive interaction, \(n\) negative samples are taken for repulsive interactions, where \(n\) is the value of this option. Smaller values can improve the speed of convergence but at the cost of stability.
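A simplified sketch of negative sampling for a single attractive interaction follows, assuming a 64-bit Mersenne Twister (`std::mt19937_64`) seeded from Options::seed. `sample_negatives` is hypothetical. It truncates a fractional rate to a whole count, whereas the reference UMAP implementation instead schedules negative samples across epochs. As in that reference code, a draw that lands on the source observation itself is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch (not umappp's code): draw up to `negative_sample_rate`
// random observations to be pushed away from `source`. Requires num_obs > 0.
std::vector<std::size_t> sample_negatives(std::size_t source, std::size_t num_obs,
                                          double negative_sample_rate,
                                          std::mt19937_64& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, num_obs - 1);
    std::vector<std::size_t> out;
    const int count = static_cast<int>(negative_sample_rate); // fractional part dropped
    for (int i = 0; i < count; ++i) {
        const std::size_t other = pick(rng);
        if (other != source) { // no repulsion of a point from itself
            out.push_back(other);
        }
    }
    return out;
}
```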
int umappp::Options::num_epochs = -1
Number of epochs for the gradient descent, i.e., optimization iterations. Larger values improve accuracy at the cost of computational work. If the requested number of epochs is negative, a value is automatically chosen based on the size of the dataset:
This choice aims to reduce computational work for very large datasets.
int umappp::Options::num_neighbors = 15
Number of neighbors to use to define the fuzzy sets. Larger values improve connectivity and favor preservation of global structure, at the cost of increased computational work. This argument is only used in certain initialize() overloads that perform identification of the nearest neighbors.
int umappp::Options::num_threads = 1
Number of threads to use. For most calculations, the parallelization scheme is determined by parallelize(). The exception is the nearest-neighbor search in some of the initialize() overloads, where the scheme is determined by knncolle::parallelize() instead.
If Options::parallel_optimization is true, this option will also affect the layout optimization, i.e., the gradient descent iterations.
int umappp::Options::parallel_optimization = false
Whether to enable parallel optimization. If set to true, this will use the number of threads specified in Options::num_threads for the layout optimization step.
By default, this is set to false as the increase in the number of threads is usually not cost-effective for layout optimization. Specifically, while CPU usage scales with the number of threads, the time spent does not decrease by the same factor. This also assumes that the number of available CPUs is at least equal to the requested number of threads; otherwise, contention will greatly degrade performance. Nonetheless, users can enable parallel optimization if cost is not a concern, though a higher number of threads (above 4) is usually required to see a reduction in time.
If the UMAPPP_NO_PARALLEL_OPTIMIZATION macro is defined, umappp will not be compiled with support for parallel optimization. This may be desirable in environments that have no support for threading or atomics, or to reduce the binary size if parallelization is not of interest. In such cases, enabling parallel optimization and calling Status::run() will raise an error.
double umappp::Options::repulsion_strength = 1
Modifier for the repulsive force. Larger values increase repulsion and favor local structure.
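In the reference UMAP implementation, the repulsive gradient coefficient for a negative sample at squared embedding distance \(d^2\) is \(2 \gamma b / ((0.001 + d^2)(1 + a d^{2b}))\), where \(\gamma\) is the repulsion strength. The force therefore scales linearly with \(\gamma\). The hypothetical helpers below compute this coefficient alongside the attractive one for comparison, assuming umappp uses the same expressions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers following the reference UMAP gradient expressions;
// d2 is the squared distance between two points in the embedding.

// Attractive coefficient (negative: pulls neighbors together).
double attractive_coef(double d2, double a, double b) {
    if (d2 <= 0.0) {
        return 0.0;
    }
    return -2.0 * a * b * std::pow(d2, b - 1.0) / (1.0 + a * std::pow(d2, b));
}

// Repulsive coefficient, scaled linearly by repulsion_strength (gamma).
// The 0.001 offset keeps the coefficient finite for very close points.
double repulsive_coef(double d2, double a, double b, double repulsion_strength) {
    if (d2 <= 0.0) {
        return 0.0;
    }
    return 2.0 * repulsion_strength * b / ((0.001 + d2) * (1.0 + a * std::pow(d2, b)));
}
```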
uint64_t umappp::Options::seed = 1234567890
Seed to use for the Mersenne Twister when sampling negative observations.
double umappp::Options::spread = 1
Scale of the coordinates of the final low-dimensional embedding.