Quick and dirty t-SNE.

template<int num_dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(NeighborList<Index_, Float_> neighbors, const Options &options)

template<int num_dim_, typename Dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(const knncolle::Prebuilt<Dim_, Index_, Float_> &prebuilt, const Options &options)

template<int num_dim_, typename Dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(Dim_ data_dim, Index_ num_points, const Float_ *data, const knncolle::Builder<knncolle::SimpleMatrix<Dim_, Index_, Float_>, Float_> &builder, const Options &options)

int perplexity_to_k(double perplexity)

template<int num_dim_, typename Float_ = double>
void initialize_random(Float_ *Y, size_t num_points, int seed = 42)

template<int num_dim_, typename Float_ = double>
std::vector<Float_> initialize_random(size_t num_points, int seed = 42)

template<typename Task_, class Run_>
void parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)

|
◆ NeighborList
Lists of neighbors for each observation.
This is a convenient alias for knncolle::NeighborList. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique, i.e., each index should occur at most once in each inner vector. Also, the inner vector for observation i should not contain any Neighbor with index i.
- Template Parameters
-
Index_ | Integer type to use for the indices. |
Float_ | Floating-point type to use for the calculations. |
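The requirements above can be checked mechanically. The following is a minimal sketch, assuming that NeighborList<Index_, Float_> is a vector (one entry per observation) of vectors of (index, distance) pairs; SketchNeighborList and is_valid_neighbor_list are hypothetical names used only for illustration and are not part of qdtsne or knncolle.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for NeighborList<Index_, Float_>, ASSUMING it is a
// vector (one entry per observation) of vectors of (index, distance) pairs.
template<typename Index_, typename Float_>
using SketchNeighborList = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every inner vector is sorted by
// increasing distance, contains no duplicate indices, contains only valid
// observation indices and does not list the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const SketchNeighborList<Index_, Float_>& neighbors) {
    const std::size_t n = neighbors.size();
    for (std::size_t i = 0; i < n; ++i) {
        const auto& current = neighbors[i];
        std::vector<bool> seen(n, false);
        for (std::size_t j = 0; j < current.size(); ++j) {
            // Negative indices wrap to huge values here and fail the range check.
            const auto idx = static_cast<std::size_t>(current[j].first);
            if (idx >= n || idx == i || seen[idx]) {
                return false; // out of range, self-neighbor or duplicate index
            }
            seen[idx] = true;
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false; // not sorted by increasing distance
            }
        }
    }
    return true;
}
```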
◆ initialize() [1/3]
Overload that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
- Template Parameters
-
num_dim_ | Number of dimensions of the final embedding. |
Dim_ | Integer type for the dataset dimensions. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
- Parameters
-
prebuilt | A knncolle::Prebuilt instance containing a neighbor search index built on the dataset of interest. |
options | Further options. |
- Returns
- A Status object representing an initial state of the t-SNE algorithm.
◆ initialize() [2/3]
Overload that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
- Template Parameters
-
num_dim_ | Number of dimensions of the final embedding. |
Dim_ | Integer type for the dataset dimensions. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
- Parameters
-
| data_dim | Number of rows of the matrix at data, corresponding to the dimensions of the input dataset. |
| num_points | Number of columns of the matrix at data, corresponding to the points of the input dataset. |
[in] | data | Pointer to an array containing a column-major matrix with data_dim rows and num_points columns. |
| builder | A knncolle::Builder instance specifying the nearest-neighbor algorithm to use. |
| options | Further options. |
- Returns
- A Status object representing an initial state of the t-SNE algorithm.
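To illustrate what computing the nearest neighbors of each observation involves for a column-major matrix, here is a minimal sketch that uses an exact brute-force search in place of the algorithm supplied through knncolle::Builder. The function brute_force_neighbors is hypothetical: it assumes Euclidean distances and returns its results in the (index, distance) layout assumed for NeighborList, so it is not the code path used by initialize(). Note that column i, i.e., observation i, starts at data + i * data_dim.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical brute-force search: for each of the num_points columns of a
// column-major data_dim x num_points matrix, find the k nearest OTHER columns
// by Euclidean distance. Each result is a list of (index, distance) pairs
// sorted by increasing distance, excluding the observation itself.
template<typename Index_, typename Float_>
std::vector<std::vector<std::pair<Index_, Float_> > > brute_force_neighbors(
    std::size_t data_dim, Index_ num_points, const Float_* data, Index_ k)
{
    std::vector<std::vector<std::pair<Index_, Float_> > > output(num_points);
    for (Index_ i = 0; i < num_points; ++i) {
        const Float_* xi = data + static_cast<std::size_t>(i) * data_dim; // column i
        std::vector<std::pair<Index_, Float_> > candidates;
        for (Index_ j = 0; j < num_points; ++j) {
            if (j == i) {
                continue; // an observation is never its own neighbor
            }
            const Float_* xj = data + static_cast<std::size_t>(j) * data_dim;
            Float_ d2 = 0;
            for (std::size_t d = 0; d < data_dim; ++d) {
                const Float_ diff = xi[d] - xj[d];
                d2 += diff * diff;
            }
            candidates.emplace_back(j, std::sqrt(d2));
        }
        // Keep only the k closest candidates, in order of increasing distance.
        const Index_ keep = std::min<Index_>(k, static_cast<Index_>(candidates.size()));
        std::partial_sort(candidates.begin(), candidates.begin() + keep, candidates.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        candidates.resize(keep);
        output[i] = std::move(candidates);
    }
    return output;
}
```

This search costs O(num_points^2 * data_dim) time, which is why a dedicated neighbor search index is preferable for large datasets.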
◆ initialize() [3/3]
Initialize the data structures for the t-SNE algorithm, given the nearest neighbors of each observation.
- Template Parameters
-
num_dim_ | Number of dimensions of the final embedding. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
- Parameters
-
neighbors | List of indices and distances to the nearest neighbors of each observation. Each observation should have the same number of neighbors, sorted by increasing distance, and an observation's list should not include the observation itself. |
options | Further options. If Options::infer_perplexity = true, the perplexity is determined from neighbors and the value in Options::perplexity is ignored. |
- Returns
- A Status object representing an initial state of the t-SNE algorithm.
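The description above does not spell out what initializing the data structures involves. In the standard t-SNE formulation, each observation's neighbor distances are converted into conditional probabilities whose perplexity matches a target value. The following is a minimal sketch of that calibration for a single observation using a bisection search on the Gaussian precision; calibrate_row is a hypothetical name, and this is not claimed to be qdtsne's internal code. The target perplexity should not exceed the number of neighbors, since the entropy of k probabilities is at most log(k).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: given the (non-empty) distances from one observation to
// its neighbors, sorted by increasing distance, find the precision beta such
// that p_j = exp(-beta * d_j^2) / Z has entropy log(perplexity) in nats,
// using bisection on beta. Returns the normalized probabilities.
std::vector<double> calibrate_row(const std::vector<double>& dist, double perplexity,
                                  double tol = 1e-8, int max_iter = 200)
{
    const double target = std::log(perplexity);
    const std::size_t k = dist.size();
    std::vector<double> p(k);
    double beta = 1.0, lo = 0.0, hi = std::numeric_limits<double>::infinity();

    for (int iter = 0; iter < max_iter; ++iter) {
        // Shift by the smallest squared distance for numerical stability;
        // the shift cancels out after normalization.
        const double d0 = dist[0] * dist[0];
        double sum = 0, weighted = 0;
        for (std::size_t j = 0; j < k; ++j) {
            const double s = dist[j] * dist[j] - d0;
            p[j] = std::exp(-beta * s);
            sum += p[j];
            weighted += s * p[j];
        }
        // Entropy H = log(Z) + beta * E[s], which is unaffected by the shift.
        const double entropy = std::log(sum) + beta * weighted / sum;
        for (auto& x : p) {
            x /= sum;
        }

        const double diff = entropy - target;
        if (std::abs(diff) < tol) {
            break;
        }
        if (diff > 0) {
            // Distribution too flat: increase the precision.
            lo = beta;
            beta = std::isinf(hi) ? beta * 2 : (beta + hi) / 2;
        } else {
            // Distribution too peaked: decrease the precision.
            hi = beta;
            beta = (beta + lo) / 2;
        }
    }
    return p;
}
```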
◆ initialize_random() [1/2]
Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.
- Template Parameters
-
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type to use for the calculations. |
- Parameters
-
[out] | Y | Pointer to a 2D array with number of rows and columns equal to num_dim_ and num_points, respectively. On output, Y is filled with random draws from a standard normal distribution. |
| num_points | Number of points in the embedding. |
| seed | Seed for the random number generator. |
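A minimal sketch of the Box-Muller approach, assuming std::mt19937_64 as the underlying engine; qdtsne's actual generator and bit-level details may differ, and sketch_initialize_random is a hypothetical name. The uniform draws are built directly from the engine output, because the output sequence of std::mt19937_64 is fixed by the C++ standard whereas the algorithm behind std::normal_distribution varies between standard library implementations. Results may still differ in the last bits across platforms, since std::log, std::cos and std::sin are not required to be correctly rounded.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical sketch: fill a num_dim_ x num_points array with standard normal
// draws via the Box-Muller transform, using std::mt19937_64 (an ASSUMED
// choice of engine) rather than std::normal_distribution.
template<int num_dim_, typename Float_ = double>
void sketch_initialize_random(Float_* Y, std::size_t num_points, int seed = 42) {
    std::mt19937_64 rng(seed);
    const double two_pi = 6.283185307179586;

    // Uniform draw in (0, 1]: take the top 53 bits and add 1 so that
    // std::log() below never receives zero.
    auto unif = [&rng]() -> double {
        return (static_cast<double>(rng() >> 11) + 1.0) / 9007199254740992.0; // 2^53
    };

    const std::size_t total = num_points * static_cast<std::size_t>(num_dim_);
    for (std::size_t i = 0; i < total; i += 2) {
        // Each pair of uniforms yields two independent standard normals.
        const double radius = std::sqrt(-2.0 * std::log(unif()));
        const double angle = two_pi * unif();
        Y[i] = static_cast<Float_>(radius * std::cos(angle));
        if (i + 1 < total) {
            Y[i + 1] = static_cast<Float_>(radius * std::sin(angle));
        }
    }
}
```

When num_points * num_dim_ is odd, the sine half of the last pair is simply not used.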
◆ initialize_random() [2/2]
template<int num_dim_, typename Float_ = double>
std::vector<Float_> qdtsne::initialize_random(size_t num_points, int seed = 42)
Creates the initial locations of each observation in the embedding.
- Template Parameters
-
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type to use for the calculations. |
- Parameters
-
num_points | Number of observations. |
seed | Seed for the random number generator. |
- Returns
- A vector of length num_points * num_dim_ containing random draws from a standard normal distribution.
◆ parallelize()
template<typename Task_, class Run_>
void qdtsne::parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)
- Template Parameters
-
Task_ | Integer type for the number of tasks. |
Run_ | Function to execute a range of tasks. |
- Parameters
-
num_workers | Number of workers. |
num_tasks | Number of tasks. |
run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias for subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
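The following is a minimal sketch of a parallelize_range()-style helper built on std::thread, of the kind a user might forward QDTSNE_CUSTOM_PARALLEL to. It assumes that run_task_range is called with three arguments (the worker ID, the first task, and the number of tasks), which should be checked against the subpar::parallelize_range() documentation; simple_parallelize is a hypothetical name and does not handle exceptions thrown by the callback.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallelize_range()-style helper. Splits the
// tasks [0, num_tasks) into at most num_workers contiguous ranges and calls
// run_task_range(worker, start, length) once per range, each on its own
// std::thread. Exceptions thrown by run_task_range are not handled.
template<typename Task_, class Run_>
void simple_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_tasks <= 0) {
        return; // nothing to do
    }

    // Never use more workers than tasks, and always use at least one.
    const int requested = std::max(1, num_workers);
    const Task_ workers = std::min(static_cast<Task_>(requested), num_tasks);
    const Task_ per_worker = num_tasks / workers;
    const Task_ remainder = num_tasks % workers;

    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(workers));
    Task_ start = 0;
    for (Task_ w = 0; w < workers; ++w) {
        // The first 'remainder' workers each take one extra task.
        const Task_ length = per_worker + (w < remainder ? 1 : 0);
        threads.emplace_back(run_task_range, static_cast<int>(w), start, length);
        start += length;
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

Contiguous ranges keep each worker's memory accesses local, and the callback sees whole ranges rather than single tasks, so any per-worker setup is paid once per worker.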
◆ perplexity_to_k()
inline int qdtsne::perplexity_to_k(double perplexity)
Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().
- Parameters
-
perplexity | Perplexity to use in the t-SNE algorithm. |
- Returns
- Number of nearest neighbors to find.
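A minimal sketch, assuming the common rule from Barnes-Hut t-SNE of searching for three times the perplexity, rounded up; the exact rounding used by qdtsne::perplexity_to_k() is not stated here, so treat this as illustrative. Both function names are hypothetical. The inverse shows one way a perplexity could be inferred from a fixed number of neighbors, as with Options::infer_perplexity, but the formula qdtsne uses for that is likewise an assumption.

```cpp
#include <cmath>

// Hypothetical sketch of the mapping, ASSUMING the "3 x perplexity" rule
// from Barnes-Hut t-SNE; qdtsne's exact rounding may differ.
inline int sketch_perplexity_to_k(double perplexity) {
    return static_cast<int>(std::ceil(perplexity * 3));
}

// Hypothetical inverse, e.g. for inferring a perplexity from a fixed number
// of neighbors; the formula is an assumption.
inline double sketch_k_to_perplexity(int k) {
    return static_cast<double>(k) / 3;
}
```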