Quick and dirty t-SNE.

Classes
struct	Options
	Options for `initialize()`.

class	Status
	Status of the t-SNE iterations.

Typedefs
template<typename Index_ , typename Float_ >
using	NeighborList = knncolle::NeighborList<Index_, Float_>
	Lists of neighbors for each observation.

Functions
template<std::size_t num_dim_, typename Index_ , typename Float_ >
Status< num_dim_, Index_, Float_ >	initialize (NeighborList< Index_, Float_ > neighbors, const Options &options)

template<std::size_t num_dim_, typename Index_ , typename Input_ , typename Float_ >
Status< num_dim_, Index_, Float_ >	initialize (const knncolle::Prebuilt< Index_, Input_, Float_ > &prebuilt, const Options &options)

template<std::size_t num_dim_, typename Index_ , typename Float_ , class Matrix_ >
Status< num_dim_, Index_, Float_ >	initialize (std::size_t data_dim, std::size_t num_points, const Float_ *data, const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &builder, const Options &options)

int	perplexity_to_k (double perplexity)

template<std::size_t num_dim_, typename Float_ = double>
void	initialize_random (Float_ *Y, std::size_t num_points, int seed=42)

template<std::size_t num_dim_, typename Float_ = double>
std::vector< Float_ >	initialize_random (std::size_t num_points, int seed=42)

template<typename Task_ , class Run_ >
void	parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range)

Detailed Description

Quick and dirty t-SNE.

Typedef Documentation

◆ NeighborList

template<typename Index_ , typename Float_ >

using qdtsne::NeighborList = knncolle::NeighborList<Index_, Float_>

Lists of neighbors for each observation.

This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique: each index should occur at most once in each inner vector. In addition, the inner vector for observation i should not contain any neighbor with index i.
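
The requirements above (sorted distances, unique indices, no self-matches) can be checked before calling `initialize()`. The following is a minimal sketch, not part of qdtsne. It assumes that the neighbor list is laid out as a vector of vectors of (index, distance) pairs; `SketchNeighborList` and `is_valid_neighbor_list` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumed layout: one inner vector of (index, distance) pairs per observation.
template<typename Index_, typename Float_>
using SketchNeighborList = std::vector<std::vector<std::pair<Index_, Float_> > >;

// Hypothetical helper: returns true if every inner vector is sorted by
// increasing distance, has no repeated index, and omits the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const SketchNeighborList<Index_, Float_>& neighbors) {
    const std::size_t num_obs = neighbors.size();
    for (std::size_t i = 0; i < num_obs; ++i) {
        std::unordered_set<Index_> seen;
        Float_ previous = 0; // distances are non-negative
        for (const auto& nb : neighbors[i]) {
            if (static_cast<std::size_t>(nb.first) == i) {
                return false; // observation listed as its own neighbor
            }
            if (!seen.insert(nb.first).second) {
                return false; // index occurs more than once
            }
            if (nb.second < previous) {
                return false; // distances are not sorted
            }
            previous = nb.second;
        }
    }
    return true;
}
```

A check like this is cheap relative to the neighbor search itself, so it may be worth running once on externally computed neighbors.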

Template Parameters

Index_	Integer type of the observation indices.
Float_	Floating-point type of the neighbor distances.

Function Documentation

◆ initialize() [1/3]

template<std::size_t num_dim_, typename Index_ , typename Input_ , typename Float_ >

Status< num_dim_, Index_, Float_ > qdtsne::initialize	(	const knncolle::Prebuilt< Index_, Input_, Float_ > &	prebuilt,
		const Options &	options )

Overload that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters

num_dim_	Number of dimensions of the final embedding.
Index_	Integer type of the observation indices.
Input_	Floating-point type of the input data for the neighbor search. This is not used other than to define the `knncolle::Prebuilt` type.
Float_	Floating-point type of the neighbor distances and output embedding.

Parameters

prebuilt	A neighbor search index built on the dataset of interest.
options	Further options.

Returns: A Status object representing an initial state of the t-SNE algorithm.

◆ initialize() [2/3]

template<std::size_t num_dim_, typename Index_ , typename Float_ >

Status< num_dim_, Index_, Float_ > qdtsne::initialize	(	NeighborList< Index_, Float_ >	neighbors,
		const Options &	options )

Initializes the data structures for the t-SNE algorithm, given the nearest neighbors of each observation.

Template Parameters

num_dim_	Number of dimensions of the final embedding.
Index_	Integer type of the observation indices.
Float_	Floating-point type of the neighbor distances and output embedding.

Parameters

neighbors	List of indices and distances to nearest neighbors for each observation. Each observation should have the same number of neighbors, sorted by increasing distance, and should not be listed among its own neighbors.
options	Further options. If `Options::infer_perplexity = true`, the perplexity is determined from `neighbors` and the value in `Options::perplexity` is ignored.

Returns: A Status object representing an initial state of the t-SNE algorithm.

◆ initialize() [3/3]

template<std::size_t num_dim_, typename Index_ , typename Float_ , class Matrix_ >

Status< num_dim_, Index_, Float_ > qdtsne::initialize	(	std::size_t	data_dim,
		std::size_t	num_points,
		const Float_ *	data,
		const knncolle::Builder< Index_, Float_, Float_, Matrix_ > &	builder,
		const Options &	options )

Overload that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.

Template Parameters

num_dim_	Number of dimensions of the final embedding.
Index_	Integer type of the observation indices.
Float_	Floating-point type of the input data, neighbor distances and output embedding.
Matrix_	Class of the input matrix for the neighbor search. This should be a `knncolle::SimpleMatrix` or its base class (i.e., `knncolle::Matrix`).

Parameters

	data_dim	Number of rows of the matrix at `data`, corresponding to the dimensions of the input dataset.
	num_points	Number of columns of the matrix at `data`, corresponding to the points of the input dataset.
[in]	data	Pointer to an array containing a column-major matrix with `data_dim` rows and `num_points` columns.
	builder	A `knncolle::Builder` instance specifying the nearest-neighbor algorithm to use.
	options	Further options.

Returns: A Status object representing an initial state of the t-SNE algorithm.
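
In a column-major matrix with `data_dim` rows, the coordinates of point `i` are contiguous and start at `data[i * data_dim]`. The sketch below illustrates that layout with a brute-force neighbor search that produces a list in the shape expected by the `neighbors` overload. It is an illustration only and does not use knncolle; `column_distance` and `brute_force_neighbors` are hypothetical helpers, and the (index, distance) pair layout of the output is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Euclidean distance between points i and j of a column-major matrix
// with data_dim rows; point i occupies data[i * data_dim] onwards.
inline double column_distance(std::size_t data_dim, const double* data, std::size_t i, std::size_t j) {
    const double* a = data + i * data_dim;
    const double* b = data + j * data_dim;
    double sum = 0;
    for (std::size_t d = 0; d < data_dim; ++d) {
        const double delta = a[d] - b[d];
        sum += delta * delta;
    }
    return std::sqrt(sum);
}

// Hypothetical brute-force search: for each point, the k closest other
// points, sorted by increasing distance and excluding the point itself.
inline std::vector<std::vector<std::pair<int, double> > > brute_force_neighbors(
    std::size_t data_dim, std::size_t num_points, const double* data, std::size_t k)
{
    assert(k < num_points);
    std::vector<std::vector<std::pair<int, double> > > output(num_points);
    for (std::size_t i = 0; i < num_points; ++i) {
        auto& current = output[i];
        current.reserve(num_points - 1);
        for (std::size_t j = 0; j < num_points; ++j) {
            if (j != i) {
                current.emplace_back(static_cast<int>(j), column_distance(data_dim, data, i, j));
            }
        }
        std::sort(current.begin(), current.end(),
            [](const std::pair<int, double>& l, const std::pair<int, double>& r) {
                return l.second < r.second;
            });
        current.resize(k); // keep only the k nearest
    }
    return output;
}
```

The quadratic cost makes this suitable only for small datasets; for anything larger, a `knncolle::Builder` passed to this overload is the intended route.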

◆ initialize_random() [1/2]

template<std::size_t num_dim_, typename Float_ = double>

void qdtsne::initialize_random	(	Float_ *	Y,
		std::size_t	num_points,
		int	seed = 42 )

Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.

Template Parameters

num_dim_	Number of embedding dimensions.
Float_	Floating-point type of the embedding.

Parameters

[out]	Y	Pointer to a 2D array with number of rows and columns equal to `num_dim_` and `num_points`, respectively. On output, `Y` is filled with random draws from a standard normal distribution.
	num_points	Number of points in the embedding.
	seed	Seed for the random number generator.
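
For readers unfamiliar with the Box-Muller transform, the sketch below shows the general technique: two uniform draws are converted into two independent standard normal draws. This is not the library's implementation. The helper name `sketch_fill_normal`, the choice of `std::mt19937_64`, and the uniform conversion are all assumptions made for illustration. The engine's output sequence is fixed by the C++ standard, whereas `std::normal_distribution` is implementation-defined, which is the kind of difference the description above refers to.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fills Y[0 .. num_dim * num_points) with draws from
// a standard normal distribution using the basic Box-Muller transform.
inline void sketch_fill_normal(double* Y, std::size_t num_dim, std::size_t num_points, int seed) {
    const std::size_t total = num_dim * num_points;
    assert(Y != nullptr || total == 0);

    std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
    const double two_pi = 6.283185307179586;

    // Uniform draw in (0, 1]: take the top 53 bits, shift away from zero
    // so that the logarithm below is always finite.
    auto unit = [&rng]() -> double {
        return (static_cast<double>(rng() >> 11) + 1.0) * (1.0 / 9007199254740992.0);
    };

    for (std::size_t i = 0; i < total; i += 2) {
        const double u1 = unit();
        const double u2 = unit();
        const double radius = std::sqrt(-2.0 * std::log(u1));
        Y[i] = radius * std::cos(two_pi * u2);
        if (i + 1 < total) {
            Y[i + 1] = radius * std::sin(two_pi * u2); // second draw of the pair
        }
    }
}
```

Because every step is written out explicitly, the same seed yields the same sequence of draws on any conforming standard library, up to floating-point differences in `std::log`, `std::cos` and `std::sin`.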

◆ initialize_random() [2/2]

template<std::size_t num_dim_, typename Float_ = double>

std::vector< Float_ > qdtsne::initialize_random	(	std::size_t	num_points,
		int	seed = 42 )

Creates the initial locations of each observation in the embedding.

Template Parameters

num_dim_	Number of embedding dimensions.
Float_	Floating-point type of the embedding.

Parameters

num_points	Number of observations.
seed	Seed for the random number generator.

Returns: A vector of length num_points * num_dim_ containing random draws from a standard normal distribution.

◆ parallelize()

template<typename Task_ , class Run_ >

void qdtsne::parallelize	(	int	num_workers,
		Task_	num_tasks,
		Run_	run_task_range )

Template Parameters

Task_	Integer type for the number of tasks.
Run_	Function to execute a range of tasks.

Parameters

num_workers	Number of workers.
num_tasks	Number of tasks.
run_task_range	Function to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
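
As a rough illustration of what a user-defined replacement might look like, the sketch below splits the tasks into contiguous ranges and hands each range to `run_task_range`. Several things here are assumptions rather than documented behavior: the name `sketch_parallelize` is hypothetical, the callback is assumed to be invoked as `run_task_range(worker, start, length)`, and the ranges are processed serially to keep the example short. A real replacement would dispatch each range to its own thread or task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement with the same argument shape as parallelize().
// Assumption: run_task_range is called as run_task_range(worker, start, length).
template<typename Task_, class Run_>
void sketch_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_tasks < 1) {
        return;
    }
    if (num_workers < 1) {
        num_workers = 1;
    }

    // Spread tasks as evenly as possible; the first 'remainder' workers get one extra.
    const Task_ per_worker = num_tasks / static_cast<Task_>(num_workers);
    const Task_ remainder = num_tasks % static_cast<Task_>(num_workers);

    Task_ start = 0;
    for (int w = 0; w < num_workers; ++w) {
        const Task_ length = per_worker + (static_cast<Task_>(w) < remainder ? 1 : 0);
        if (length == 0) {
            break; // more workers than tasks
        }
        run_task_range(w, start, length); // serial here; a real version would run this concurrently
        start += length;
    }
    assert(start == num_tasks); // every task was assigned exactly once
}

// To be defined before including the qdtsne headers.
#define QDTSNE_CUSTOM_PARALLEL(nw, nt, fn) sketch_parallelize(nw, nt, fn)
```

Keeping the ranges contiguous means that each worker touches a compact block of observations, which tends to be friendlier to caches than interleaved assignment.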

◆ perplexity_to_k()

int qdtsne::perplexity_to_k ( double perplexity )

inline

Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().

Parameters

perplexity	Perplexity to use in the t-SNE algorithm.

Returns: Number of nearest neighbors to find.
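
The exact rule used by `perplexity_to_k()` is not stated here. The sketch below assumes the convention common to Barnes-Hut t-SNE implementations, where the number of neighbors is three times the perplexity, rounded up. Both the factor of 3 and the name `sketch_perplexity_to_k` are assumptions for illustration; call the library function itself when the real value is needed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for perplexity_to_k(). The factor of 3 is an
// assumption borrowed from common Barnes-Hut t-SNE implementations.
inline int sketch_perplexity_to_k(double perplexity) {
    assert(perplexity > 0);
    return static_cast<int>(std::ceil(perplexity * 3));
}
```

Under this assumption, a perplexity of 30 would call for 90 neighbors per observation in an external neighbor search.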
