mnncorrect
Batch correction with mutual nearest neighbors
mnncorrect::Options< Index_, Float_, Matrix_ > Struct Template Reference

Options for compute().

#include <Options.hpp>

Public Attributes

int num_neighbors = 15
 
double num_mads = 3
 
std::shared_ptr< knncolle::Builder< Index_, Float_, Float_, Matrix_ > > builder
 
std::vector< BatchIndex > order
 
bool automatic_order = true
 
int robust_iterations = 2
 
double robust_trim = 0.25
 
ReferencePolicy reference_policy = ReferencePolicy::MAX_RSS
 
Index_ mass_cap = 0
 
int num_threads = 1
 

Detailed Description

template<typename Index_, typename Float_, class Matrix_ = knncolle::Matrix<Index_, Float_>>
struct mnncorrect::Options< Index_, Float_, Matrix_ >

Options for compute().

Template Parameters
Index_: Integer type for the observation indices.
Float_: Floating-point type for the input/output data.
Matrix_: Class of the input data matrix for the neighbor search. This should satisfy the knncolle::Matrix interface. Alternatively, it may be a knncolle::SimpleMatrix.

Member Data Documentation

◆ automatic_order

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
bool mnncorrect::Options< Index_, Float_, Matrix_ >::automatic_order = true

Should batches be merged in an automatically-determined order?

If true and Options::order is empty, the reference batch is chosen according to Options::reference_policy and the other batches are successively merged onto it. At each merge step, we choose the batch that forms the largest number of MNNs with the current reference, and the merged dataset becomes the new reference.

If false and Options::order is empty, batches are merged in the order in which they were supplied to compute(). If a batch array was supplied, the batches are merged in order of their identifiers, i.e., batch 0 is the reference.

If Options::order is non-empty, this setting is ignored and the manually specified order is always used.
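
The precedence between Options::order and Options::automatic_order can be summarized with a small sketch. This is not library code: MergeMode and resolve_merge_mode() are hypothetical names introduced for illustration, and std::size_t stands in for BatchIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical labels for the three merge strategies described above.
enum class MergeMode { Manual, Automatic, Input };

// Illustrative only: 'order' and 'automatic_order' mimic the fields of
// mnncorrect::Options, but this function is not part of the library.
inline MergeMode resolve_merge_mode(
    const std::vector<std::size_t>& order,
    bool automatic_order,
    std::size_t num_batches)
{
    if (!order.empty()) {
        // A manual order must mention every batch, and it always wins.
        assert(order.size() == num_batches);
        return MergeMode::Manual;
    }
    // With no manual order, fall back to automatic or input ordering.
    return automatic_order ? MergeMode::Automatic : MergeMode::Input;
}
```

In this sketch, an empty order with automatic_order = false yields MergeMode::Input, i.e., batch 0 acts as the first reference.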

◆ builder

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
std::shared_ptr<knncolle::Builder<Index_, Float_, Float_, Matrix_> > mnncorrect::Options< Index_, Float_, Matrix_ >::builder

Algorithm to use for building the nearest-neighbor search indices. If NULL, defaults to an exact search via knncolle::VptreeBuilder with Euclidean distances.

◆ mass_cap

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
Index_ mnncorrect::Options< Index_, Float_, Matrix_ >::mass_cap = 0

Cap on the number of observations used to compute the center of mass for each MNN-involved observation in the reference dataset. The reference dataset is effectively downsampled to mass_cap observations for this specific calculation, which speeds up multiple correction iterations at the cost of some precision. If zero, no cap is used.

◆ num_mads

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
double mnncorrect::Options< Index_, Float_, Matrix_ >::num_mads = 3

Number of median absolute deviations to use to define the distance threshold for the center of mass calculations. Larger values reduce biases from the kissing effect but increase the risk of including inappropriately distant subpopulations in the center of mass.
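
To make the role of num_mads concrete, the following sketch derives a distance threshold as the median plus num_mads times the MAD. The helper names median() and distance_threshold() are hypothetical, and the exact formula used by the library (for example, any scaling constant applied to the MAD) is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty vector; taken by value so that it can be reordered.
inline double median(std::vector<double> x) {
    assert(!x.empty());
    const std::size_t n = x.size(), half = n / 2;
    std::nth_element(x.begin(), x.begin() + half, x.end());
    const double upper = x[half];
    if (n % 2 == 1) {
        return upper;
    }
    // For even lengths, average the two middle values.
    const double lower = *std::max_element(x.begin(), x.begin() + half);
    return (lower + upper) / 2;
}

// Assumed form of the threshold: median + num_mads * MAD (unscaled).
inline double distance_threshold(const std::vector<double>& distances, double num_mads) {
    const double med = median(distances);
    std::vector<double> deviations;
    deviations.reserve(distances.size());
    for (double d : distances) {
        deviations.push_back(std::abs(d - med));
    }
    return med + num_mads * median(deviations);
}
```

Under this assumed formula, raising num_mads moves the threshold outwards, so more distant observations are admitted into the center of mass.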

◆ num_neighbors

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
int mnncorrect::Options< Index_, Float_, Matrix_ >::num_neighbors = 15

Number of neighbors used in various search steps, primarily to identify MNN pairs. Larger values increase the number of MNN pairs and improve the stability of the correction, at the cost of reduced resolution of matching subpopulations across batches.

The number of neighbors is also used to identify the closest MNN pairs when computing the average correction vector for each target observation. Again, this improves stability at the cost of resolution for local variations in the correction vectors.
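
As a reminder of what an MNN pair is, the sketch below identifies mutual nearest neighbors from two precomputed neighbor lists. It assumes that the lists of up to num_neighbors neighbors have already been obtained by some search; NeighborLists and find_mnn_pairs() are hypothetical names and do not reflect the library's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container: entry i holds the neighbor indices of observation i.
using NeighborLists = std::vector<std::vector<std::size_t>>;

// a_to_b[i] lists the neighbors in batch B of observation i in batch A;
// b_to_a[j] lists the neighbors in batch A of observation j in batch B.
// A pair (i, j) is mutual if each observation appears in the other's list.
inline std::vector<std::pair<std::size_t, std::size_t>> find_mnn_pairs(
    const NeighborLists& a_to_b,
    const NeighborLists& b_to_a)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < a_to_b.size(); ++i) {
        for (std::size_t j : a_to_b[i]) {
            assert(j < b_to_a.size());
            const auto& back = b_to_a[j];
            if (std::find(back.begin(), back.end(), i) != back.end()) {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

Longer neighbor lists can only add mutual pairs in this scheme, which is why a larger num_neighbors yields more MNN pairs.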

◆ num_threads

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
int mnncorrect::Options< Index_, Float_, Matrix_ >::num_threads = 1

Number of threads to use. The parallelization scheme is defined by parallelize().

◆ order

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
std::vector<BatchIndex> mnncorrect::Options< Index_, Float_, Matrix_ >::order

Manually specified merge order for the batches. This should contain a permutation of all integers in \(\{0, 1, 2, \ldots, N-1\}\) where \(N\) is the number of batches. Each entry of this vector corresponds to a batch.

At the first merge step, the order[0] batch is considered to be the reference. order[1] is corrected against the reference and merged to form a new reference. This is repeated for each remaining batch in the order specified by order.

If this is empty and Options::automatic_order = false, batches are merged in the order that they were supplied in compute(). If a batch array was supplied, the batches are merged in order of their identifiers, i.e., batch 0 is the reference.
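
A caller may wish to check a manual order before passing it on. The sketch below tests the permutation requirement and extracts the first reference; is_valid_order() and first_reference() are hypothetical helpers, and std::size_t is used as a stand-in for BatchIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if 'order' contains each of 0, 1, ..., num_batches - 1 exactly once.
inline bool is_valid_order(const std::vector<std::size_t>& order, std::size_t num_batches) {
    if (order.size() != num_batches) {
        return false;
    }
    std::vector<bool> seen(num_batches, false);
    for (std::size_t o : order) {
        if (o >= num_batches || seen[o]) {
            return false; // out of range or duplicated.
        }
        seen[o] = true;
    }
    return true;
}

// The first entry of a non-empty manual order is the initial reference batch.
inline std::size_t first_reference(const std::vector<std::size_t>& order) {
    assert(!order.empty());
    return order.front();
}
```

For three batches, {2, 0, 1} passes this check and makes batch 2 the initial reference, whereas {0, 0, 1} is rejected because batch 0 is repeated.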

◆ reference_policy

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
ReferencePolicy mnncorrect::Options< Index_, Float_, Matrix_ >::reference_policy = ReferencePolicy::MAX_RSS

Policy to use to choose the reference batch when Options::automatic_order = true.

◆ robust_iterations

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
int mnncorrect::Options< Index_, Float_, Matrix_ >::robust_iterations = 2

Number of iterations to use for robustification when computing the center of mass for each MNN-involved observation. At each iteration, the observations furthest from the center are removed, and the center is recomputed with the remaining observations.

◆ robust_trim

template<typename Index_ , typename Float_ , class Matrix_ = knncolle::Matrix<Index_, Float_>>
double mnncorrect::Options< Index_, Float_, Matrix_ >::robust_trim = 0.25

Trimming proportion to use for robustification when computing the center of mass. At each iteration, this proportion of observations, namely those with the largest distances from the current center, is removed before the center is recomputed.
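
The interplay between robust_iterations and robust_trim can be illustrated with a trimmed-mean sketch. Several details are assumptions rather than the library's behavior: all observations are re-ranked against the current center at every iteration, the number retained is rounded up, and the names Point, mean_of(), squared_distance() and robust_center() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Point = std::vector<double>;

// Mean of the points selected by 'keep'.
inline Point mean_of(const std::vector<Point>& pts, const std::vector<std::size_t>& keep) {
    assert(!keep.empty());
    Point center(pts[keep.front()].size(), 0.0);
    for (std::size_t i : keep) {
        for (std::size_t d = 0; d < center.size(); ++d) {
            center[d] += pts[i][d];
        }
    }
    for (double& c : center) {
        c /= static_cast<double>(keep.size());
    }
    return center;
}

inline double squared_distance(const Point& a, const Point& b) {
    double total = 0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        total += (a[d] - b[d]) * (a[d] - b[d]);
    }
    return total;
}

// Start from the plain mean, then repeatedly drop the furthest points and recompute.
inline Point robust_center(const std::vector<Point>& pts, int robust_iterations, double robust_trim) {
    std::vector<std::size_t> all(pts.size());
    std::iota(all.begin(), all.end(), std::size_t(0));
    Point center = mean_of(pts, all);

    // Assumed rounding rule: keep ceil((1 - trim) * n) points, and at least one.
    const double target = std::ceil((1.0 - robust_trim) * static_cast<double>(pts.size()));
    const std::size_t num_keep = std::max<std::size_t>(1, static_cast<std::size_t>(target));

    for (int it = 0; it < robust_iterations; ++it) {
        std::vector<std::size_t> ranked = all;
        std::sort(ranked.begin(), ranked.end(), [&](std::size_t l, std::size_t r) {
            return squared_distance(pts[l], center) < squared_distance(pts[r], center);
        });
        ranked.resize(num_keep); // discard the furthest observations.
        center = mean_of(pts, ranked);
    }
    return center;
}
```

With the one-dimensional points 0, 1, 2 and 101, the plain mean is 26, while two iterations with a trim of 0.25 discard the outlier and return a center of 1.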


The documentation for this struct was generated from the following file: