qdtsne
A quick and dirty t-SNE C++ library
qdtsne Namespace Reference

Quick and dirty t-SNE.

Classes

struct  Options
 Options for initialize().
 
class  Status
 Status of the t-SNE iterations.
 

Typedefs

template<typename Index_ , typename Float_ >
using NeighborList = knncolle::NeighborList< Index_, Float_ >
 Lists of neighbors for each observation.
 

Functions

template<int num_dim_, typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > initialize (NeighborList< Index_, Float_ > neighbors, const Options &options)
 
template<int num_dim_, typename Dim_ , typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > initialize (const knncolle::Prebuilt< Dim_, Index_, Float_ > &prebuilt, const Options &options)
 
template<int num_dim_, typename Dim_ , typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > initialize (Dim_ data_dim, Index_ num_points, const Float_ *data, const knncolle::Builder< knncolle::SimpleMatrix< Dim_, Index_, Float_ >, Float_ > &builder, const Options &options)
 
int perplexity_to_k (double perplexity)
 
template<int num_dim_, typename Float_ = double>
void initialize_random (Float_ *Y, size_t num_points, int seed=42)
 
template<int num_dim_, typename Float_ = double>
std::vector< Float_ > initialize_random (size_t num_points, int seed=42)
 
template<typename Task_ , class Run_ >
void parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range)
 

Detailed Description

Quick and dirty t-SNE.

Typedef Documentation

◆ NeighborList

Lists of neighbors for each observation.

This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique - there should be no more than one occurrence of each index in each inner vector. Also, the inner vector for observation i should not contain any Neighbor with index i.

Template Parameters
Index_  Integer type to use for the indices.
Float_  Floating-point type to use for the calculations.
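
The constraints on each inner vector can be checked before calling initialize(). The following is a minimal sketch, not part of qdtsne: `NeighborListSketch` is a stand-in that assumes knncolle::NeighborList is a vector of vectors of (index, distance) pairs, and `is_valid_neighbor_list` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in with the shape assumed for knncolle::NeighborList:
// one inner vector of (index, distance) pairs per observation.
template<typename Index_, typename Float_>
using NeighborListSketch = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every inner vector is sorted by
// increasing distance, contains no duplicate indices, and does not list
// the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const NeighborListSketch<Index_, Float_>& neighbors) {
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        const auto& current = neighbors[i];
        std::unordered_set<Index_> seen;
        for (std::size_t j = 0; j < current.size(); ++j) {
            if (static_cast<std::size_t>(current[j].first) == i) {
                return false; // observation i is listed as its own neighbor
            }
            if (!seen.insert(current[j].first).second) {
                return false; // the same index occurs twice
            }
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false; // distances are not in increasing order
            }
        }
    }
    return true;
}
```

Running such a check once on externally computed neighbors is cheap relative to the t-SNE iterations themselves.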

Function Documentation

◆ initialize() [1/3]

template<int num_dim_, typename Dim_ , typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( const knncolle::Prebuilt< Dim_, Index_, Float_ > &  prebuilt,
const Options &  options 
)

Overload that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters
num_dim_  Number of dimensions of the final embedding.
Dim_  Integer type for the dataset dimensions.
Index_  Integer type for the neighbor indices.
Float_  Floating-point type to use for the calculations.
Parameters
prebuilt  A knncolle::Prebuilt instance containing a neighbor search index built on the dataset of interest.
options  Further options.
Returns
A Status object representing an initial state of the t-SNE algorithm.

◆ initialize() [2/3]

template<int num_dim_, typename Dim_ , typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( Dim_  data_dim,
Index_  num_points,
const Float_ *  data,
const knncolle::Builder< knncolle::SimpleMatrix< Dim_, Index_, Float_ >, Float_ > &  builder,
const Options &  options 
)

Overload that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters
num_dim_  Number of dimensions of the final embedding.
Dim_  Integer type for the dataset dimensions.
Index_  Integer type for the neighbor indices.
Float_  Floating-point type to use for the calculations.
Parameters
data_dim  Number of rows of the matrix at data, corresponding to the dimensions of the input dataset.
num_points  Number of columns of the matrix at data, corresponding to the points of the input dataset.
[in] data  Pointer to an array containing a column-major matrix with data_dim rows and num_points columns.
builder  A knncolle::Builder instance specifying the nearest-neighbor algorithm to use.
options  Further options.
Returns
A Status object representing an initial state of the t-SNE algorithm.
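
In the column-major layout expected at data, the data_dim coordinates of each point are stored contiguously, so coordinate d of point p sits at offset p * data_dim + d. A minimal sketch of packing per-point vectors into such a buffer follows; `pack_column_major` is a hypothetical helper and not part of qdtsne.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: flattens a list of points into a column-major matrix
// with data_dim rows and points.size() columns, i.e. one column per point.
template<typename Float_>
std::vector<Float_> pack_column_major(const std::vector<std::vector<Float_> >& points, std::size_t data_dim) {
    std::vector<Float_> data(points.size() * data_dim);
    for (std::size_t p = 0; p < points.size(); ++p) {
        if (points[p].size() != data_dim) {
            throw std::invalid_argument("every point must have data_dim coordinates");
        }
        for (std::size_t d = 0; d < data_dim; ++d) {
            data[p * data_dim + d] = points[p][d]; // column p, row d
        }
    }
    return data;
}
```

The resulting `data.data()` pointer, together with data_dim and the number of points, has the shape described for this overload.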

◆ initialize() [3/3]

template<int num_dim_, typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ > qdtsne::initialize ( NeighborList< Index_, Float_ >  neighbors,
const Options &  options 
)

Initialize the data structures for the t-SNE algorithm, given the nearest neighbors of each observation.

Template Parameters
num_dim_  Number of dimensions of the final embedding.
Index_  Integer type for the neighbor indices.
Float_  Floating-point type to use for the calculations.
Parameters
neighbors  List of indices and distances to nearest neighbors for each observation. Each observation should have the same number of neighbors, sorted by increasing distance, and should not be listed among its own neighbors.
options  Further options. If Options::infer_perplexity = true, the perplexity is determined from neighbors and the value in Options::perplexity is ignored.
Returns
A Status object representing an initial state of the t-SNE algorithm.

◆ initialize_random() [1/2]

template<int num_dim_, typename Float_ = double>
void qdtsne::initialize_random ( Float_ *  Y,
size_t  num_points,
int  seed = 42 
)

Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.

Template Parameters
num_dim_  Number of embedding dimensions.
Float_  Floating-point type to use for the calculations.
Parameters
[out] Y  Pointer to a 2D array with num_dim_ rows and num_points columns. On output, Y is filled with random draws from a standard normal distribution.
num_points  Number of points in the embedding.
seed  Seed for the random number generator.
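
To illustrate the idea, here is a minimal sketch of filling a buffer with standard normal draws via the Box-Muller transform. It is not qdtsne's actual implementation: the function name `box_muller_sketch`, the choice of `std::mt19937_64` as the engine, and the way uniform values are derived from it are all assumptions. The sketch relies on the fact that the output of the standard engines is fully specified, whereas `std::normal_distribution` may produce different values on different standard library implementations.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative only: fills num_points * num_dim_ values with standard normal
// draws using the Box-Muller transform on top of a fully specified engine.
template<int num_dim_, typename Float_ = double>
std::vector<Float_> box_muller_sketch(std::size_t num_points, int seed = 42) {
    std::mt19937_64 rng(seed);
    const double two_pi = 6.283185307179586;

    // Uniform draw in (0, 1], built from the top 53 bits of the engine output
    // so that taking the logarithm is always safe.
    auto uniform01 = [&]() -> double {
        return (static_cast<double>(rng() >> 11) + 1.0) / 9007199254740992.0;
    };

    std::vector<Float_> output(num_points * static_cast<std::size_t>(num_dim_));
    std::size_t i = 0;
    while (i < output.size()) {
        // Each pair of uniform draws yields two independent normal draws.
        const double radius = std::sqrt(-2.0 * std::log(uniform01()));
        const double angle = two_pi * uniform01();
        output[i++] = static_cast<Float_>(radius * std::cos(angle));
        if (i < output.size()) {
            output[i++] = static_cast<Float_>(radius * std::sin(angle));
        }
    }
    return output;
}
```

Two calls with the same seed return identical vectors, which is the property that makes seeded initial coordinates reproducible.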

◆ initialize_random() [2/2]

template<int num_dim_, typename Float_ = double>
std::vector< Float_ > qdtsne::initialize_random ( size_t  num_points,
int  seed = 42 
)

Creates the initial locations of each observation in the embedding.

Template Parameters
num_dim_  Number of embedding dimensions.
Float_  Floating-point type to use for the calculations.
Parameters
num_points  Number of observations.
seed  Seed for the random number generator.
Returns
A vector of length num_points * num_dim_ containing random draws from a standard normal distribution.

◆ parallelize()

template<typename Task_ , class Run_ >
void qdtsne::parallelize ( int  num_workers,
Task_  num_tasks,
Run_  run_task_range 
)
Template Parameters
Task_  Integer type for the number of tasks.
Run_  Function to execute a range of tasks.
Parameters
num_workers  Number of workers.
num_tasks  Number of tasks.
run_task_range  Function to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().

◆ perplexity_to_k()

int qdtsne::perplexity_to_k ( double  perplexity)
inline

Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().

Parameters
perplexity  Perplexity to use in the t-SNE algorithm.
Returns
Number of nearest neighbors to find.
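
The exact formula is not stated on this page. As a hedged illustration, the sketch below uses the rule commonly seen in Barnes-Hut t-SNE implementations, namely three neighbors per unit of perplexity, rounded up. Both the name `perplexity_to_k_sketch` and the factor of three are assumptions here; prefer calling qdtsne::perplexity_to_k() itself in real code.

```cpp
#include <cmath>

// Assumption: search for three times the perplexity, rounded up.
// This mirrors a common convention and may differ from the library's rule.
inline int perplexity_to_k_sketch(double perplexity) {
    return static_cast<int>(std::ceil(perplexity * 3));
}
```

The returned value is the number of neighbors to request from an external search before passing the results to the NeighborList overload of initialize().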