qdtsne
Quick and dirty t-SNE in C++
Quick and dirty t-SNE.
Classes | |
struct | Options |
Options for initialize(). | |
class | Status |
Status of the t-SNE iterations. | |
Typedefs | |
template<typename Index_ , typename Float_ > | |
using | NeighborList = knncolle::NeighborList<Index_, Float_> |
Lists of neighbors for each observation. | |
Functions | |
template<std::size_t num_dim_, typename Index_ , typename Float_ > | |
Status< num_dim_, Index_, Float_ > | initialize (NeighborList< Index_, Float_ > neighbors, const Options &options) |
template<std::size_t num_dim_, typename Index_ , typename Input_ , typename Float_ > | |
Status< num_dim_, Index_, Float_ > | initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const Options &options) |
template<std::size_t num_dim_, typename Index_ , typename Float_ , class Matrix_ > | |
Status< num_dim_, Index_, Float_ > | initialize (const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const Options &options) |
template<typename Index_ = int> | |
Index_ | perplexity_to_k (const double perplexity) |
template<std::size_t num_dim_, typename Float_ = double> | |
void | initialize_random (Float_ *const Y, const std::size_t num_points, const unsigned long long seed=42) |
template<std::size_t num_dim_, typename Float_ = double> | |
std::vector< Float_ > | initialize_random (const std::size_t num_points, const unsigned long long seed=42) |
template<typename Task_ , class Run_ > | |
void | parallelize (const int num_workers, const Task_ num_tasks, Run_ run_task_range) |
using qdtsne::NeighborList = knncolle::NeighborList<Index_, Float_>
Lists of neighbors for each observation.
This is a convenient alias for the knncolle::NeighborList type. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique, and the list of neighbors for observation i should not contain i itself.
Index_ | Integer type of the observation indices. |
Float_ | Floating-point type of the neighbor distances. |
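The requirements on each inner list (sorted by increasing distance, unique neighbors, no self-matches) can be checked before the list is passed to initialize(). The sketch below assumes that knncolle::NeighborList is a vector of vectors of (index, distance) pairs, which is not stated on this page. NeighborListSketch and is_valid_neighbor_list are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumed layout of knncolle::NeighborList: one inner vector per observation,
// each holding (neighbor index, distance) pairs. This is an assumption.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every inner list is sorted by
// non-decreasing distance, has no duplicate indices and excludes the
// observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& neighbors) {
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        const auto& current = neighbors[i];
        for (std::size_t j = 0; j < current.size(); ++j) {
            if (static_cast<std::size_t>(current[j].first) == i) {
                return false; // observation is listed as its own neighbor.
            }
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false; // distances are not sorted.
            }
            for (std::size_t k = 0; k < j; ++k) {
                if (current[k].first == current[j].first) {
                    return false; // duplicate neighbor index.
                }
            }
        }
    }
    return true;
}
```

The duplicate check is quadratic in the number of neighbors per observation, which is usually acceptable because that number is small relative to the dataset.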
Status< num_dim_, Index_, Float_ > qdtsne::initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const Options &options)
Overload of initialize() that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
num_dim_ | Number of dimensions of the final embedding. |
Index_ | Integer type of the observation indices. |
Input_ | Floating-point type of the input data for the neighbor search. This is only used to define the knncolle::Prebuilt type and is otherwise ignored. |
Float_ | Floating-point type of the neighbor distances and output embedding. |
prebuilt | A pre-built neighbor search index for the dataset of interest. |
options | Further options. |
Returns a Status object representing an initial state of the t-SNE algorithm.
Status< num_dim_, Index_, Float_ > qdtsne::initialize (const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const Options &options)
Overload of initialize() that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
num_dim_ | Number of dimensions of the final embedding. |
Index_ | Integer type of the observation indices. |
Float_ | Floating-point type of the input data, neighbor distances and output embedding. |
Matrix_ | Class of the input matrix for the neighbor search. This should be knncolle::SimpleMatrix or knncolle::Matrix. |
data_dim | Number of rows of the matrix at data, corresponding to the dimensions of the input dataset. |
num_obs | Number of columns of the matrix at data, corresponding to the observations of the input dataset. |
data | [in] Pointer to an array containing a column-major matrix with data_dim rows and num_obs columns. |
builder | A knncolle::Builder instance specifying the nearest-neighbor algorithm to use. |
options | Further options. |
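The column-major layout expected for data means that all coordinates of one observation are contiguous in memory. A minimal sketch of that indexing follows. get_coordinate and pack_column_major are hypothetical helpers and are not part of qdtsne or knncolle.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: looks up one coordinate of one observation in a
// column-major matrix where each column is an observation.
template<typename Float_>
Float_ get_coordinate(const Float_* data, std::size_t data_dim, std::size_t obs, std::size_t dim) {
    return data[obs * data_dim + dim];
}

// Hypothetical helper: packs per-observation coordinate vectors into a
// contiguous column-major buffer of length data_dim * num_obs.
// Each inner vector is assumed to hold at least data_dim values.
template<typename Float_>
std::vector<Float_> pack_column_major(const std::vector<std::vector<Float_> >& observations, std::size_t data_dim) {
    std::vector<Float_> data;
    data.reserve(observations.size() * data_dim);
    for (const auto& obs : observations) {
        data.insert(data.end(), obs.begin(), obs.begin() + data_dim);
    }
    return data;
}
```

A buffer built this way could then be passed as data, with data_dim and the number of observations supplied alongside it.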
Returns a Status object representing an initial state of the t-SNE algorithm.
Status< num_dim_, Index_, Float_ > qdtsne::initialize (NeighborList< Index_, Float_ > neighbors, const Options &options)
Initializes the data structures for the t-SNE algorithm, given the nearest neighbors of each observation in the dataset.
num_dim_ | Number of dimensions of the final embedding. |
Index_ | Integer type of the observation indices. |
Float_ | Floating-point type of the neighbor distances and output embedding. |
neighbors | List of indices and distances to nearest neighbors for each observation. Each observation should have the same number of neighbors, sorted by increasing distance. Each observation should not be included in its own list of neighbors. It is assumed that neighbors.size() will fit in an Index_. |
options | Further options. If Options::infer_perplexity = true, the perplexity is determined from neighbors and the value in Options::perplexity is ignored. |
Returns a Status object representing an initial state of the t-SNE algorithm.
std::vector< Float_ > qdtsne::initialize_random (const std::size_t num_points, const unsigned long long seed=42)
Creates the initial locations of each observation in the embedding.
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type of the embedding. |
num_points | Number of observations. |
seed | Seed for the random number generator. |
Returns a vector of length num_points * num_dim_ containing random draws from a standard normal distribution.
void qdtsne::initialize_random (Float_ *const Y, const std::size_t num_points, const unsigned long long seed=42)
Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type of the embedding. |
Y | [out] Pointer to a 2D array with number of rows and columns equal to num_dim_ and num_points, respectively. On output, Y is filled with random draws from a standard normal distribution. |
num_points | Number of points in the embedding. |
seed | Seed for the random number generator. |
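To illustrate why a hand-written Box-Muller transform gives reproducible results across standard libraries, the sketch below fills an array with standard normal draws using only the raw output of std::mt19937_64, whose sequence is fixed by the C++ standard. It is not the library's implementation. fill_standard_normal, the choice of engine and the way uniform values are derived are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Sketch of Box-Muller sampling: fills Y with num_dim * num_points draws.
// std::normal_distribution is avoided because its algorithm is
// implementation-defined, whereas the engine output is not.
template<typename Float_>
void fill_standard_normal(Float_* Y, std::size_t num_dim, std::size_t num_points, unsigned long long seed) {
    std::mt19937_64 rng(seed);
    const double two_pi = 6.283185307179586;
    auto uniform01 = [&]() -> double {
        // Top 53 bits scaled into (0, 1]; never zero, so log() stays finite.
        return (static_cast<double>(rng() >> 11) + 1.0) / 9007199254740992.0;
    };
    const std::size_t total = num_dim * num_points;
    for (std::size_t i = 0; i < total; i += 2) {
        // Each pair of uniform draws yields two independent normal draws.
        const double radius = std::sqrt(-2.0 * std::log(uniform01()));
        const double angle = two_pi * uniform01();
        Y[i] = static_cast<Float_>(radius * std::cos(angle));
        if (i + 1 < total) {
            Y[i + 1] = static_cast<Float_>(radius * std::sin(angle));
        }
    }
}
```

Calling the function twice with the same seed yields the same values, so a fixed seed gives a repeatable starting embedding under these assumptions.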
void qdtsne::parallelize (const int num_workers, const Task_ num_tasks, Run_ run_task_range)
Task_ | Integer type of the number of tasks. |
Run_ | Function to execute a range of tasks. |
num_workers | Number of workers. |
num_tasks | Number of tasks. |
run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
Index_ qdtsne::perplexity_to_k (const double perplexity)
Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().
Index_ | Integer type of the number of neighbors. |
perplexity | Perplexity value, see Options::perplexity. |
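The exact rule used to convert a perplexity into a neighbor count is not given here. A common convention in t-SNE implementations is to use three times the perplexity, rounded up. The sketch below assumes that convention purely for illustration. perplexity_to_k_sketch is a hypothetical name and may not match the behavior of perplexity_to_k().

```cpp
#include <cmath>

// Sketch assuming the common "three times the perplexity" rule, rounded up.
// This rule is an assumption, not taken from the documentation above.
template<typename Index_ = int>
Index_ perplexity_to_k_sketch(const double perplexity) {
    return static_cast<Index_>(std::ceil(perplexity * 3));
}
```

Under this assumption, a perplexity of 30 corresponds to 90 neighbors per observation in an external neighbor search.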