mnncorrect
Batch correction with mutual nearest neighbors
Loading...
Searching...
No Matches
Public Attributes | List of all members
mnncorrect::Options< Dim_, Index_, Float_ > Struct Template Reference

Options for compute(). More...

#include <Options.hpp>

Public Attributes

int num_neighbors = 15
 
double num_mads = 3
 
std::shared_ptr< knncolle::Builder< knncolle::SimpleMatrix< Dim_, Index_, Float_ >, Float_ > > builder
 
std::vector< size_t > order
 
bool automatic_order = true
 
int robust_iterations = 2
 
double robust_trim = 0.25
 
ReferencePolicy reference_policy = ReferencePolicy::MAX_RSS
 
size_t mass_cap = -1
 
int num_threads = 1
 

Detailed Description

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
struct mnncorrect::Options< Dim_, Index_, Float_ >

Options for compute().

Template Parameters
Dim_Integer type for the dimensions of the neighbor search.
Index_Integer type for the observation index of the neighbor search.
Float_Floating-point type for the distances in the neighbor search.

Member Data Documentation

◆ automatic_order

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
bool mnncorrect::Options< Dim_, Index_, Float_ >::automatic_order = true

Should batches be merged in an automatically-determined order?

If true and Options::order is empty, the largest batch is used as the reference and other batches are successively merged onto it. At each merge step, we choose the batch that forms the largest number of MNNs with the current reference, and the merged dataset is defined as the new reference.

If this is empty and Options::automatic_order = false, batches are merged in the order that they were supplied in compute(). If a batch array was supplied, the batches are merged in order of their identifiers, i.e., batch 0 is the reference.

If Options::order is non-empty, this setting is ignored and the manually specified order is always used.

◆ builder

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
std::shared_ptr<knncolle::Builder<knncolle::SimpleMatrix<Dim_, Index_, Float_>, Float_> > mnncorrect::Options< Dim_, Index_, Float_ >::builder

Algorithm to use for building the nearest-neighbor search indices. If NULL, defaults to an exact search via knncolle::VptreeBuilder with Euclidean distances.

◆ mass_cap

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
size_t mnncorrect::Options< Dim_, Index_, Float_ >::mass_cap = -1

Cap on the number of observations used to compute the center of mass for each MNN-involved observation in the reference dataset. The reference dataset is effectively downsampled to mass_cap observations for this specific calculation, which speeds up multiple correction iterations at the cost of some precision. If -1, no cap is used.

◆ num_mads

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
double mnncorrect::Options< Dim_, Index_, Float_ >::num_mads = 3

Number of median absolute deviations to use to define the distance threshold for the center of mass calculations. Larger values reduce biases from the kissing effect but increase the risk of including inappropriately distant subpopulations into the center of mass.

◆ num_neighbors

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
int mnncorrect::Options< Dim_, Index_, Float_ >::num_neighbors = 15

Number of neighbors used in various search steps, primarily to identify MNN pairs. Larger values increase the number of MNN pairs and improve the stability of the correction, at the cost of reduced resolution of matching subpopulations across batches.

The number of neighbors is also used to identify the closest MNN pairs when computing the average correction vector for each target observation. Again, this improves stability at the cost of resolution for local variations in the correction vectors.

◆ num_threads

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
int mnncorrect::Options< Dim_, Index_, Float_ >::num_threads = 1

Number of threads to use. The parallelization scheme is defined by parallelize().

◆ order

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
std::vector<size_t> mnncorrect::Options< Dim_, Index_, Float_ >::order

Manually specified merge order for the batches. This should contain a permutation of all integers in \({0, 1, 2, ..., N-1}\) where \(N\) is the number of batches. Each entry of this vector corresponds to a batch.

At the first merge step, the order[0] batch is considered to be the reference. order[1] is corrected against the reference and merged to form a new reference. This is repeated for each remaining batch in the order specified by order.

If this is empty and Options::automatic_order = false, batches are merged in the order that they were supplied in compute(). If a batch array was supplied, the batches are merged in order of their identifiers, i.e., batch 0 is the reference.

◆ reference_policy

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
ReferencePolicy mnncorrect::Options< Dim_, Index_, Float_ >::reference_policy = ReferencePolicy::MAX_RSS

Policy to use to choose the reference batch when Options::automatic_order = true.

◆ robust_iterations

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
int mnncorrect::Options< Dim_, Index_, Float_ >::robust_iterations = 2

Number of iterations to use for robustification when computing the center of mass for each MNN-involved cell. At each iteration, the observations furthest from the center are removed, and the center is recomputed with the remaining observations.

◆ robust_trim

template<typename Dim_ = int, typename Index_ = int, typename Float_ = double>
double mnncorrect::Options< Dim_, Index_, Float_ >::robust_trim = 0.25

Trimming proportion to use for robustification when computing the center of mass. The proportion of observations with the largest distances from the center are removed for the next iteration of the center calculation.


The documentation for this struct was generated from the following file: