qdtsne
A quick and dirty t-SNE C++ library
Quick and dirty t-SNE.
Classes
struct Options | Options for initialize(). |
class Status | Status of the t-SNE iterations. |
Typedefs
template<typename Index_, typename Float_>
using NeighborList = knncolle::NeighborList<Index_, Float_> | Lists of neighbors for each observation. |
Functions
template<int num_dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(NeighborList<Index_, Float_> neighbors, const Options& options)
template<int num_dim_, typename Dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(const knncolle::Prebuilt<Dim_, Index_, Float_>& prebuilt, const Options& options)
template<int num_dim_, typename Dim_, typename Index_, typename Float_>
Status<num_dim_, Index_, Float_> initialize(Dim_ data_dim, Index_ num_points, const Float_* data, const knncolle::Builder<knncolle::SimpleMatrix<Dim_, Index_, Float_>, Float_>& builder, const Options& options)
int perplexity_to_k(double perplexity)
template<int num_dim_, typename Float_ = double>
void initialize_random(Float_* Y, size_t num_points, int seed = 42)
template<int num_dim_, typename Float_ = double>
std::vector<Float_> initialize_random(size_t num_points, int seed = 42)
template<typename Task_, class Run_>
void parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)
using qdtsne::NeighborList = knncolle::NeighborList<Index_, Float_>
Lists of neighbors for each observation.
This is a convenient alias for the knncolle::NeighborList class. Each inner vector corresponds to an observation and contains the list of nearest neighbors for that observation, sorted by increasing distance. Neighbors for each observation should be unique: each index should occur at most once in each inner vector. Also, the inner vector for observation i should not contain any neighbor with index i.
Index_ | Integer type to use for the indices. |
Float_ | Floating-point type to use for the calculations. |
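The requirements on the neighbor list (sorted by increasing distance, unique indices, no self-matches) can be checked before calling initialize(). The following is a minimal sketch, assuming that knncolle::NeighborList is laid out as a vector of vectors of (index, distance) pairs; SketchNeighborList and is_valid_neighbor_list() are hypothetical names that are not part of qdtsne or knncolle.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for knncolle::NeighborList<Index_, Float_>.
// Assumption: one inner vector of (index, distance) pairs per observation.
template<typename Index_, typename Float_>
using SketchNeighborList = std::vector<std::vector<std::pair<Index_, Float_>>>;

// Returns true if every inner vector is sorted by increasing distance,
// contains each index at most once, and does not list the observation itself.
template<typename Index_, typename Float_>
bool is_valid_neighbor_list(const SketchNeighborList<Index_, Float_>& neighbors) {
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        const auto& current = neighbors[i];
        std::unordered_set<Index_> seen;
        for (std::size_t j = 0; j < current.size(); ++j) {
            // Observation i must not be its own neighbor.
            if (static_cast<std::size_t>(current[j].first) == i) {
                return false;
            }
            // Each neighbor index may appear only once.
            if (!seen.insert(current[j].first).second) {
                return false;
            }
            // Distances must be non-decreasing along the inner vector.
            if (j > 0 && current[j].second < current[j - 1].second) {
                return false;
            }
        }
    }
    return true;
}
```

A check like this is only a debugging aid for neighbor lists produced outside of knncolle; it does not verify that every observation has the same number of neighbors.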
Status<num_dim_, Index_, Float_> qdtsne::initialize(const knncolle::Prebuilt<Dim_, Index_, Float_>& prebuilt, const Options& options)
Overload that accepts a neighbor search index and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
num_dim_ | Number of dimensions of the final embedding. |
Dim_ | Integer type for the dataset dimensions. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
prebuilt | A knncolle::Prebuilt instance containing a neighbor search index built on the dataset of interest. |
options | Further options. |
Returns | Status object representing an initial state of the t-SNE algorithm. |
Status<num_dim_, Index_, Float_> qdtsne::initialize(Dim_ data_dim, Index_ num_points, const Float_* data, const knncolle::Builder<knncolle::SimpleMatrix<Dim_, Index_, Float_>, Float_>& builder, const Options& options)
Overload that accepts a column-major matrix of coordinates and computes the nearest neighbors for each observation, before proceeding with the initialization of the t-SNE algorithm.
num_dim_ | Number of dimensions of the final embedding. |
Dim_ | Integer type for the dataset dimensions. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
data_dim | Number of rows of the matrix at data, corresponding to the dimensions of the input dataset. |
num_points | Number of columns of the matrix at data, corresponding to the points of the input dataset. |
[in] data | Pointer to an array containing a column-major matrix with data_dim rows and num_points columns. |
builder | A knncolle::Builder instance specifying the nearest-neighbor algorithm to use. |
options | Further options. |
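In a column-major matrix with data_dim rows and num_points columns, the coordinates of each point are contiguous, so coordinate d of point p sits at offset p * data_dim + d. The sketch below shows one way to pack per-point coordinate vectors into that layout; pack_column_major() is a hypothetical helper introduced for illustration and is not part of qdtsne.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of qdtsne): copies one coordinate vector per
// point into a single column-major array with data_dim rows and
// points.size() columns, i.e., one column per point.
template<typename Float_>
std::vector<Float_> pack_column_major(const std::vector<std::vector<Float_>>& points, std::size_t data_dim) {
    std::vector<Float_> data(points.size() * data_dim);
    for (std::size_t p = 0; p < points.size(); ++p) {
        // Every point must supply exactly data_dim coordinates.
        assert(points[p].size() == data_dim);
        for (std::size_t d = 0; d < data_dim; ++d) {
            // Row d, column p of a column-major matrix.
            data[p * data_dim + d] = points[p][d];
        }
    }
    return data;
}
```

The pointer returned by calling data() on the packed vector would then be a candidate for the data argument, with data_dim and the number of points passed alongside it.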
Returns | Status object representing an initial state of the t-SNE algorithm. |
Status<num_dim_, Index_, Float_> qdtsne::initialize(NeighborList<Index_, Float_> neighbors, const Options& options)
Initializes the data structures for the t-SNE algorithm, given the nearest neighbors of each observation.
num_dim_ | Number of dimensions of the final embedding. |
Index_ | Integer type for the neighbor indices. |
Float_ | Floating-point type to use for the calculations. |
neighbors | List of indices and distances to nearest neighbors for each observation. Each observation should have the same number of neighbors, sorted by increasing distance, which should not include itself. |
options | Further options. If Options::infer_perplexity = true, the perplexity is determined from neighbors and the value in Options::perplexity is ignored. |
Returns | Status object representing an initial state of the t-SNE algorithm. |
void qdtsne::initialize_random(Float_* Y, size_t num_points, int seed = 42)
Initializes the starting locations of each observation in the embedding. We do so using our own implementation of the Box-Muller transform, to avoid problems with differences in the distribution functions across C++ standard library implementations.
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type to use for the calculations. |
[out] Y | Pointer to a 2D array with number of rows and columns equal to num_dim_ and num_points, respectively. On output, Y is filled with random draws from a standard normal distribution. |
num_points | Number of points in the embedding. |
seed | Seed for the random number generator. |
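To illustrate why a hand-written Box-Muller transform gives reproducible output, the sketch below draws raw integers from std::mt19937_64 (whose output sequence is fixed by the C++ standard) and converts them to standard normal values without using std::normal_distribution. This is not qdtsne's actual implementation; sketch_initialize_random() is a hypothetical name, and the choice of generator and the integer-to-uniform mapping are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch, not qdtsne's code: fills a vector of length
// num_points * num_dim_ with standard normal draws via Box-Muller.
template<int num_dim_, typename Float_ = double>
std::vector<Float_> sketch_initialize_random(std::size_t num_points, int seed = 42) {
    std::mt19937_64 rng(seed);
    const double two_pi = 6.283185307179586;
    const double denom = 18446744073709551616.0; // 2^64

    std::vector<Float_> Y(num_points * num_dim_);
    std::size_t i = 0;
    while (i < Y.size()) {
        // Map raw 64-bit draws to uniforms; u1 lies in (0, 1] so log(u1) is finite.
        double u1 = (static_cast<double>(rng()) + 1.0) / denom;
        double u2 = static_cast<double>(rng()) / denom;

        // Box-Muller: one pair of uniforms yields two independent normals.
        double radius = std::sqrt(-2.0 * std::log(u1));
        double angle = two_pi * u2;
        Y[i++] = static_cast<Float_>(radius * std::cos(angle));
        if (i < Y.size()) {
            Y[i++] = static_cast<Float_>(radius * std::sin(angle));
        }
    }
    return Y;
}
```

Because only the raw generator output and elementary math functions are involved, repeated calls with the same seed yield the same sequence of draws, whereas the algorithm behind std::normal_distribution is left to each standard library.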
std::vector<Float_> qdtsne::initialize_random(size_t num_points, int seed = 42)
Creates the initial locations of each observation in the embedding.
num_dim_ | Number of embedding dimensions. |
Float_ | Floating-point type to use for the calculations. |
num_points | Number of observations. |
seed | Seed for the random number generator. |
Returns | Vector of length num_points * num_dim_ containing random draws from a standard normal distribution. |
void qdtsne::parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)
Task_ | Integer type for the number of tasks. |
Run_ | Function to execute a range of tasks. |
num_workers | Number of workers. |
num_tasks | Number of tasks. |
run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range(). However, if the QDTSNE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
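A custom replacement has to split num_tasks into contiguous ranges and hand each range to run_task_range. The sketch below is a serial stand-in that shows only the partitioning; it assumes the callback is invoked as run_task_range(worker, start, length), which is not stated on this page and should be checked against the subpar documentation. sketch_parallelize() is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical serial stand-in with the same argument list as parallelize().
// Assumption: run_task_range is called as run_task_range(worker, start, length).
template<typename Task_, class Run_>
void sketch_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_workers < 1) {
        num_workers = 1;
    }

    // Spread tasks as evenly as possible; the first 'remainder' workers get one extra.
    Task_ per_worker = num_tasks / num_workers;
    Task_ remainder = num_tasks % num_workers;

    Task_ start = 0;
    for (int w = 0; w < num_workers; ++w) {
        Task_ length = per_worker + (static_cast<Task_>(w) < remainder ? 1 : 0);
        if (length == 0) {
            break; // more workers than tasks; nothing left to assign.
        }
        // A real replacement would dispatch this call to a thread or task pool.
        run_task_range(w, start, length);
        start += length;
    }
}
```

If the assumed callback contract holds, a function of this shape could presumably be supplied through QDTSNE_CUSTOM_PARALLEL, with the loop body handing each range to the caller's own threading framework.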
inline int qdtsne::perplexity_to_k(double perplexity)
Determines the appropriate number of neighbors, given a perplexity value. Useful when the neighbor search is conducted outside of initialize().
perplexity | Perplexity to use in the t-SNE algorithm. |
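This page does not state how the number of neighbors is derived from the perplexity. A common convention, used by the original Barnes-Hut t-SNE implementation, is three times the perplexity. The sketch below applies that rule with rounding up; whether qdtsne uses exactly this formula is an assumption, and sketch_perplexity_to_k() is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: assumes the "three times the perplexity" convention,
// rounded up to the next integer. Not confirmed as qdtsne's own rule.
inline int sketch_perplexity_to_k(double perplexity) {
    return static_cast<int>(std::ceil(perplexity * 3));
}
```

Under this assumption, a perplexity of 30 corresponds to 90 neighbors per observation, which is the number to request from an external neighbor search before passing the results to initialize().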