qdtsne
Quick and dirty t-SNE in C++
qdtsne Namespace Reference

Quick and dirty t-SNE.

Classes

struct  Options
 Options for initialize().
 
class  Status
 Status of the t-SNE iterations.
 

Typedefs

template<typename Index_ , typename Float_ >
using NeighborList = knncolle::NeighborList<Index_, Float_>
 Lists of neighbors for each observation.
 

Functions

template<std::size_t num_dim_, typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > initialize (NeighborList< Index_, Float_ > neighbors, const Options &options)
 
template<std::size_t num_dim_, typename Index_ , typename Input_ , typename Float_ >
Status< num_dim_, Index_, Float_ > initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const Options &options)
 
template<std::size_t num_dim_, typename Index_ , typename Float_ , class Matrix_ >
Status< num_dim_, Index_, Float_ > initialize (const std::size_t data_dim, const Index_ num_obs, const Float_ *const data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const Options &options)
 
template<typename Index_ = int>
Index_ perplexity_to_k (const double perplexity)
 
template<std::size_t num_dim_, typename Float_ = double>
void initialize_random (Float_ *const Y, const std::size_t num_points, const unsigned long long seed=42)
 
template<std::size_t num_dim_, typename Float_ = double>
std::vector< Float_ > initialize_random (const std::size_t num_points, const unsigned long long seed=42)
 
template<typename Task_ , class Run_ >
void parallelize (const int num_workers, const Task_ num_tasks, Run_ run_task_range)
 

Detailed Description

Quick and dirty t-SNE.

Typedef Documentation

◆ NeighborList

template<typename Index_ , typename Float_ >
using qdtsne::NeighborList = knncolle::NeighborList<Index_, Float_>

Lists of neighbors for each observation.

This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique, and the list of neighbors for observation i should not contain itself.

Template Parameters
Index_: Integer type of the observation indices.
Float_: Floating-point type of the neighbor distances.
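
As a rough illustration of the constraints on a neighbor list, the sketch below checks that each observation's neighbors are sorted by increasing distance, unique, and free of self-matches. It assumes that knncolle::NeighborList is a vector of vectors of (index, distance) pairs. The local NeighborList alias and the is_valid_neighbor_list() helper are hypothetical and not part of qdtsne.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumed layout of knncolle::NeighborList: one vector of (index, distance)
// pairs per observation. This local alias exists for illustration only.
template<typename Index_, typename Float_>
using NeighborList = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every observation's neighbors are
// sorted by increasing distance, unique, and exclude the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborList<Index_, Float_>& neighbors) {
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        const auto& current = neighbors[i];
        std::unordered_set<Index_> seen;
        for (std::size_t j = 0; j < current.size(); ++j) {
            if (static_cast<std::size_t>(current[j].first) == i) {
                return false; // observation is listed as its own neighbor.
            }
            if (!seen.insert(current[j].first).second) {
                return false; // the same neighbor appears twice.
            }
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false; // distances are not in increasing order.
            }
        }
    }
    return true;
}
```

A check like this is cheap relative to the neighbor search itself, so it may be worth running once on externally computed neighbors before passing them to initialize().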

Function Documentation

◆ initialize() [1/3]

template<std::size_t num_dim_, typename Index_ , typename Input_ , typename Float_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( const knncolle::Prebuilt< Index_, Input_, Float_ > & prebuilt,
const Options & options )

Overload of initialize() that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters
num_dim_: Number of dimensions of the final embedding.
Index_: Integer type of the observation indices.
Input_: Floating-point type of the input data for the neighbor search. This is only used to define the knncolle::Prebuilt type and is otherwise ignored.
Float_: Floating-point type of the neighbor distances and output embedding.
Parameters
prebuilt: A pre-built neighbor search index for the dataset of interest.
options: Further options.
Returns
A Status object representing an initial state of the t-SNE algorithm.

◆ initialize() [2/3]

template<std::size_t num_dim_, typename Index_ , typename Float_ , class Matrix_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( const std::size_t data_dim,
const Index_ num_obs,
const Float_ *const data,
const knncolle::Builder< Index_, Float_, Float_, Matrix_ > & builder,
const Options & options )

Overload of initialize() that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters
num_dim_: Number of dimensions of the final embedding.
Index_: Integer type of the observation indices.
Float_: Floating-point type of the input data, neighbor distances and output embedding.
Matrix_: Class of the input matrix for the neighbor search. This should be knncolle::SimpleMatrix or knncolle::Matrix.
Parameters
data_dim: Number of rows of the matrix at data, corresponding to the dimensions of the input dataset.
num_obs: Number of columns of the matrix at data, corresponding to the observations of the input dataset.
[in] data: Pointer to an array containing a column-major matrix with data_dim rows and num_obs columns.
builder: A knncolle::Builder instance specifying the nearest-neighbor algorithm to use.
options: Further options.
Returns
A Status object representing an initial state of the t-SNE algorithm.
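
To make the expected layout of data concrete, the following sketch shows how coordinates are located in a column-major array with data_dim rows and num_obs columns, so that the coordinates of each observation are contiguous. Both helpers, column_major_at() and pack_column_major(), are hypothetical and only illustrate the indexing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: value of dimension 'dim' for observation 'obs' in a
// column-major matrix with 'data_dim' rows and one column per observation.
template<typename Float_>
Float_ column_major_at(const Float_* data, std::size_t data_dim, std::size_t dim, std::size_t obs) {
    return data[obs * data_dim + dim];
}

// Hypothetical helper: packs per-observation coordinate vectors into a single
// column-major array. Each inner vector is assumed to hold at least
// 'data_dim' values, of which the first 'data_dim' are copied.
template<typename Float_>
std::vector<Float_> pack_column_major(const std::vector<std::vector<Float_> >& observations, std::size_t data_dim) {
    std::vector<Float_> output;
    output.reserve(observations.size() * data_dim);
    for (const auto& obs : observations) {
        output.insert(output.end(), obs.begin(), obs.begin() + data_dim);
    }
    return output;
}
```

With this layout, the pointer returned by data() on the packed vector could be supplied as the data argument, with observations.size() as num_obs.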

◆ initialize() [3/3]

template<std::size_t num_dim_, typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( NeighborList< Index_, Float_ > neighbors,
const Options & options )

Initializes the data structures for the t-SNE algorithm, given the nearest neighbors of each observation in the dataset.

Template Parameters
num_dim_: Number of dimensions of the final embedding.
Index_: Integer type of the observation indices.
Float_: Floating-point type of the neighbor distances and output embedding.
Parameters
neighbors: List of indices and distances to nearest neighbors for each observation. Each observation should have the same number of neighbors, sorted by increasing distance. Each observation should not be included in its own list of neighbors. It is assumed that neighbors.size() will fit in an Index_.
options: Further options. If Options::infer_perplexity = true, the perplexity is determined from neighbors and the value in Options::perplexity is ignored.
Returns
A Status object representing an initial state of the t-SNE algorithm.
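
The two size-related assumptions on neighbors can be checked up front. The sketch below is a hypothetical helper, not part of qdtsne, that verifies that every observation has the same number of neighbors and that the number of observations is representable in Index_. It assumes the vector-of-vectors-of-pairs layout for the neighbor list.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: returns true if all observations have the same number
// of neighbors and the number of observations fits in an Index_.
template<typename Index_, typename Float_>
bool has_uniform_neighbors(const std::vector<std::vector<std::pair<Index_, Float_> > >& neighbors) {
    const auto max_index = static_cast<unsigned long long>(std::numeric_limits<Index_>::max());
    if (static_cast<unsigned long long>(neighbors.size()) > max_index) {
        return false; // neighbors.size() would overflow Index_.
    }
    for (const auto& current : neighbors) {
        if (current.size() != neighbors.front().size()) {
            return false; // ragged neighbor list.
        }
    }
    return true;
}
```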

◆ initialize_random() [1/2]

template<std::size_t num_dim_, typename Float_ = double>
std::vector< Float_ > qdtsne::initialize_random ( const std::size_t num_points,
const unsigned long long seed = 42 )

Creates the initial locations of each observation in the embedding.

Template Parameters
num_dim_: Number of embedding dimensions.
Float_: Floating-point type of the embedding.
Parameters
num_points: Number of observations.
seed: Seed for the random number generator.
Returns
A vector of length num_points * num_dim_ containing random draws from a standard normal distribution.

◆ initialize_random() [2/2]

template<std::size_t num_dim_, typename Float_ = double>
void qdtsne::initialize_random ( Float_ *const Y,
const std::size_t num_points,
const unsigned long long seed = 42 )

Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.

Template Parameters
num_dim_: Number of embedding dimensions.
Float_: Floating-point type of the embedding.
Parameters
[out] Y: Pointer to a 2D array with number of rows and columns equal to num_dim_ and num_points, respectively. On output, Y is filled with random draws from a standard normal distribution.
num_points: Number of points in the embedding.
seed: Seed for the random number generator.
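
For readers unfamiliar with the Box-Muller transform, here is a minimal sketch of how it turns pairs of uniform draws into standard normal draws without relying on std::normal_distribution, whose output is implementation-defined. This is not qdtsne's actual implementation: the fill_standard_normal() name, the choice of std::mt19937_64 and the way uniform values are derived from it are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical sketch: fills Y[0], ..., Y[length - 1] with draws from a
// standard normal distribution using the Box-Muller transform.
template<typename Float_>
void fill_standard_normal(Float_* Y, std::size_t length, unsigned long long seed) {
    std::mt19937_64 rng(seed); // raw output sequence is fixed by the C++ standard.
    const double two_pi = 6.283185307179586;

    // Convert the top 53 bits of each raw draw into a uniform value in (0, 1],
    // so that the logarithm below is always finite.
    auto uniform = [&rng]() -> double {
        return 1.0 - static_cast<double>(rng() >> 11) * (1.0 / 9007199254740992.0);
    };

    // Each pair of uniform draws yields two independent normal draws.
    for (std::size_t i = 0; i < length; i += 2) {
        const double radius = std::sqrt(-2.0 * std::log(uniform()));
        const double angle = two_pi * uniform();
        Y[i] = static_cast<Float_>(radius * std::cos(angle));
        if (i + 1 < length) {
            Y[i + 1] = static_cast<Float_>(radius * std::sin(angle));
        }
    }
}
```

Calling this with length equal to num_dim_ * num_points would fill a whole embedding. Repeated calls with the same seed produce the same values on a given platform.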

◆ parallelize()

template<typename Task_ , class Run_ >
void qdtsne::parallelize ( const int num_workers,
const Task_ num_tasks,
Run_ run_task_range )
Template Parameters
Task_: Integer type of the number of tasks.
Run_: Function to execute a range of tasks.
Parameters
num_workers: Number of workers.
num_tasks: Number of tasks.
run_task_range: Function to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().

◆ perplexity_to_k()

template<typename Index_ = int>
Index_ qdtsne::perplexity_to_k ( const double perplexity)

Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().

Template Parameters
Index_: Integer type of the number of neighbors.
Parameters
perplexity: Perplexity value, see Options::perplexity.
Returns
Number of nearest neighbors to find.
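
A common rule of thumb, used by the Barnes-Hut t-SNE reference implementation, is to search for three times the perplexity in neighbors. The sketch below applies that rule. Whether perplexity_to_k() uses exactly this formula is an assumption, and neighbors_for_perplexity() is a hypothetical stand-in rather than the library function.

```cpp
#include <cmath>

// Hypothetical stand-in: number of neighbors as three times the perplexity,
// rounded up to the next integer. The factor of 3 is an assumption.
template<typename Index_ = int>
Index_ neighbors_for_perplexity(const double perplexity) {
    return static_cast<Index_>(std::ceil(perplexity * 3));
}
```

Under this rule, a perplexity of 30 corresponds to a search for 90 neighbors per observation, so the dataset needs more than 90 observations for such a search to be possible.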