template<typename Index_, typename Float_, class Matrix_>
Details compute(std::size_t num_dim, const std::vector<Index_>& num_obs, const std::vector<const Float_*>& batches, Float_* output, const Options<Index_, Float_, Matrix_>& options)

template<typename Index_, typename Float_, class Matrix_>
Details compute(std::size_t num_dim, const std::vector<Index_>& num_obs, const Float_* input, Float_* output, const Options<Index_, Float_, Matrix_>& options)

template<typename Index_, typename Float_, typename Batch_, class Matrix_>
Details compute(std::size_t num_dim, Index_ num_obs, const Float_* input, const Batch_* batch, Float_* output, const Options<Index_, Float_, Matrix_>& options)

template<typename Task_, class Run_>
void parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)
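Judging from their signatures, the three compute() overloads seem to differ mainly in how the uncorrected data is supplied: as one pointer per batch, as a single contiguous array plus per-batch counts, or as a single array plus a per-observation batch ID. The sketch below shows how the latter two layouts could be mapped onto the first. It assumes (not confirmed here) that the contiguous input stores each batch's columns consecutively in column-major order and that batch IDs lie in [0, number of batches). The helpers split_contiguous() and count_per_batch() are hypothetical and not part of mnncorrect.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mnncorrect): given a column-major
// num_dim-by-N array in which each batch's columns are stored consecutively
// (an assumption about the contiguous overload), return a pointer to the
// first column of each batch.
template<typename Index_, typename Float_>
std::vector<const Float_*> split_contiguous(std::size_t num_dim, const std::vector<Index_>& num_obs, const Float_* input) {
    std::vector<const Float_*> batches;
    batches.reserve(num_obs.size());
    const Float_* current = input;
    for (auto n : num_obs) {
        batches.push_back(current);
        current += num_dim * static_cast<std::size_t>(n); // skip this batch's columns
    }
    return batches;
}

// Hypothetical helper: count the observations in each batch from a
// per-observation batch ID array, assuming IDs lie in [0, num_batches).
template<typename Index_, typename Batch_>
std::vector<Index_> count_per_batch(Index_ num_obs, const Batch_* batch, std::size_t num_batches) {
    std::vector<Index_> counts(num_batches);
    for (Index_ i = 0; i < num_obs; ++i) {
        ++counts[static_cast<std::size_t>(batch[i])];
    }
    return counts;
}
```

Note that a batch-ID layout with interleaved batches would also need its columns regrouped by batch before the pointers could be taken.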
|
Batch correction with mutual nearest neighbors.
template<typename Index_, typename Float_, class Matrix_>
Details mnncorrect::compute(
    std::size_t num_dim,
    const std::vector<Index_>& num_obs,
    const std::vector<const Float_*>& batches,
    Float_* output,
    const Options<Index_, Float_, Matrix_>& options
)
Batch correction using mutual nearest neighbors.
This function implements a variant of the MNN correction method described by Haghverdi et al. (2018). Two cells from different batches form an MNN pair if each is among the other's nearest neighbors in the opposite batch. The MNN pairs are assumed to represent cells from corresponding subpopulations across the two batches, so any difference in location between the paired cells can be interpreted as the batch effect and targeted for removal.
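To make the MNN pair definition concrete, here is a minimal sketch that identifies MNN pairs between two small batches. It uses a brute-force Euclidean k-nearest-neighbor search as a stand-in for the knncolle-based search that mnncorrect actually performs; the function names and the choice of k are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Squared Euclidean distance between column i of x and column j of y
// (both column-major with num_dim rows).
inline double sq_dist(std::size_t num_dim, const double* x, int i, const double* y, int j) {
    const double* xi = x + static_cast<std::size_t>(i) * num_dim;
    const double* yj = y + static_cast<std::size_t>(j) * num_dim;
    double d = 0;
    for (std::size_t k = 0; k < num_dim; ++k) {
        double diff = xi[k] - yj[k];
        d += diff * diff;
    }
    return d;
}

// Brute-force search: for each column of 'query', the indices of its
// k closest columns in 'data'.
inline std::vector<std::vector<int>> brute_knn(std::size_t num_dim, const double* query, int nq,
                                               const double* data, int nd, int k) {
    std::vector<std::vector<int>> out(nq);
    std::vector<int> order(nd);
    int keep = std::min(k, nd);
    for (int q = 0; q < nq; ++q) {
        std::iota(order.begin(), order.end(), 0);
        std::partial_sort(order.begin(), order.begin() + keep, order.end(), [&](int a, int b) {
            return sq_dist(num_dim, query, q, data, a) < sq_dist(num_dim, query, q, data, b);
        });
        out[q].assign(order.begin(), order.begin() + keep);
    }
    return out;
}

// Hypothetical sketch: MNN pairs as (reference index, target index), where each
// observation is among the other's k nearest neighbors in the opposite batch.
inline std::vector<std::pair<int, int>> find_mnn_pairs(std::size_t num_dim, const double* ref, int nref,
                                                       const double* target, int ntarget, int k) {
    auto ref_nn = brute_knn(num_dim, ref, nref, target, ntarget, k);    // neighbors of reference obs in target
    auto target_nn = brute_knn(num_dim, target, ntarget, ref, nref, k); // neighbors of target obs in reference
    std::vector<std::pair<int, int>> pairs;
    for (int r = 0; r < nref; ++r) {
        for (int t : ref_nn[r]) {
            const auto& back = target_nn[t];
            if (std::find(back.begin(), back.end(), r) != back.end()) {
                pairs.emplace_back(r, t); // the relationship is mutual
            }
        }
    }
    return pairs;
}
```

Brute force costs O(n^2) distance evaluations per batch pair, which is why a real implementation delegates to an indexed neighbor search.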
We consider one batch to be the "reference" and the other to be the "target"; the aim is to correct the target batch towards the (unchanged) reference. For each observation in the target batch, we find the closest MNN pairs, based on the distance to each pair's observation in the target batch, and compute a robust average of the correction vectors of those pairs. This average is used as a single correction vector that is applied to the target observation to obtain its corrected values.
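The per-observation correction can be sketched as follows. This is a simplified illustration, not mnncorrect's implementation: a plain mean stands in for the robust average, and the hypothetical parameter k (the number of closest pairs used) is an assumption. Each pair is represented by the location of its target-batch observation (its "anchor") and its correction vector, i.e., the reference location minus the target location.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: correct one target observation 'obs' (length num_dim).
// 'anchors' holds, column-major, the target-batch location of each MNN pair;
// 'corrections' holds each pair's correction vector (reference minus target).
// A plain mean stands in for the robust average used by mnncorrect.
inline std::vector<double> correct_one(std::size_t num_dim, const double* obs,
                                       const double* anchors, const double* corrections,
                                       int num_pairs, int k) {
    // Squared distance from 'obs' to each pair's target-batch anchor.
    std::vector<double> dist(num_pairs);
    for (int p = 0; p < num_pairs; ++p) {
        double d = 0;
        for (std::size_t j = 0; j < num_dim; ++j) {
            double diff = obs[j] - anchors[static_cast<std::size_t>(p) * num_dim + j];
            d += diff * diff;
        }
        dist[p] = d;
    }

    // Select the k pairs whose anchors are closest to 'obs'.
    std::vector<int> order(num_pairs);
    std::iota(order.begin(), order.end(), 0);
    int keep = std::min(k, num_pairs);
    std::partial_sort(order.begin(), order.begin() + keep, order.end(),
                      [&](int a, int b) { return dist[a] < dist[b]; });

    // Average their correction vectors and add the result to the observation.
    std::vector<double> out(obs, obs + num_dim);
    if (keep == 0) {
        return out; // no pairs available, leave the observation unchanged
    }
    for (int i = 0; i < keep; ++i) {
        const double* c = corrections + static_cast<std::size_t>(order[i]) * num_dim;
        for (std::size_t j = 0; j < num_dim; ++j) {
            out[j] += c[j] / keep;
        }
    }
    return out;
}
```

Averaging over several nearby pairs, rather than using only the single closest pair, smooths the correction so that neighboring target observations receive similar shifts.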
Each MNN pair's correction vector is computed between the "center of mass" locations of the paired observations. The center of mass for each observation is defined as a robust average of a subset of neighboring observations from the same batch. Robustness is achieved by iteratively trimming the observations that are furthest from the mean. In addition, we explicitly remove observations that are more than a certain distance from the observation in the MNN pair.
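A center of mass of this kind can be sketched as below. The distance cap max_dist, the number of trimming iterations and the trimmed fraction trim are hypothetical parameters chosen for illustration; how mnncorrect selects the neighbors, the distance threshold and the trimming schedule is not described here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a "center of mass" for one MNN observation.
// 'neighbors' is a column-major num_dim-by-n array of same-batch neighbors,
// 'self' is the MNN observation and 'max_dist' caps the allowed distance
// from 'self'. Each of 'iterations' rounds drops the fraction 'trim' of the
// retained points furthest from the current mean, then recomputes the mean.
inline std::vector<double> center_of_mass(std::size_t num_dim, const double* self,
                                          const double* neighbors, int n, double max_dist,
                                          int iterations, double trim) {
    auto dist = [&](const double* a, const double* b) {
        double d = 0;
        for (std::size_t j = 0; j < num_dim; ++j) {
            double x = a[j] - b[j];
            d += x * x;
        }
        return std::sqrt(d);
    };

    // Explicitly remove neighbors that are too far from the MNN observation.
    std::vector<const double*> kept;
    for (int i = 0; i < n; ++i) {
        const double* col = neighbors + static_cast<std::size_t>(i) * num_dim;
        if (dist(col, self) <= max_dist) {
            kept.push_back(col);
        }
    }

    std::vector<double> mean(self, self + num_dim); // fall back to 'self' if nothing survives
    auto recompute = [&]() {
        std::fill(mean.begin(), mean.end(), 0.0);
        for (auto col : kept) {
            for (std::size_t j = 0; j < num_dim; ++j) mean[j] += col[j];
        }
        for (auto& m : mean) m /= static_cast<double>(kept.size());
    };
    if (kept.empty()) {
        return mean;
    }
    recompute();

    for (int it = 0; it < iterations; ++it) {
        std::size_t drop = static_cast<std::size_t>(kept.size() * trim);
        if (drop == 0 || drop >= kept.size()) break;
        // Sort by distance to the current mean and discard the furthest points.
        std::sort(kept.begin(), kept.end(), [&](const double* a, const double* b) {
            return dist(a, mean.data()) < dist(b, mean.data());
        });
        kept.resize(kept.size() - drop);
        recompute();
    }
    return mean;
}
```

Trimming repeatedly, instead of once, lets the mean move away from outliers before the next round decides which points are furthest from it.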
- See also
- Haghverdi L et al. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology 36, 421-427.
- Template Parameters
Index_ | Integer type for the observation index.
Float_ | Floating-point type for the input/output data.
Matrix_ | Class of the input data matrix for the neighbor search. This should satisfy the knncolle::Matrix interface; alternatively, it may be a knncolle::SimpleMatrix.
- Parameters
num_dim | Number of dimensions.
num_obs | Vector of length equal to the number of batches, where the i-th entry contains the number of observations in batch i.
[in] batches | Vector of length equal to the number of batches, where the i-th entry points to a column-major dimension-by-observation array containing the uncorrected data for batch i. This array has num_dim rows and num_obs[i] columns.
[out] output | Pointer to an array containing a column-major matrix with num_dim rows and a number of columns equal to the sum of num_obs. On output, the first num_obs[0] columns contain the corrected values for the first batch, the next num_obs[1] columns contain the corrected values for the second batch, and so on.
options | Further options.
- Returns
- Statistics about the merge process.
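Because output stores the batches one after another, the corrected values for a given batch can be located from a running sum of num_obs. The hypothetical helpers below (not part of mnncorrect) compute the first output column of each batch and the array position of a single corrected value, following the column-major layout described for output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first output column for each batch,
// given that batch b's corrected columns follow those of batches 0..b-1.
template<typename Index_>
std::vector<std::size_t> batch_column_offsets(const std::vector<Index_>& num_obs) {
    std::vector<std::size_t> offsets(num_obs.size());
    std::size_t running = 0;
    for (std::size_t b = 0; b < num_obs.size(); ++b) {
        offsets[b] = running;
        running += static_cast<std::size_t>(num_obs[b]);
    }
    return offsets;
}

// Hypothetical helper: position of dimension d of observation i (within
// batch b) in the column-major 'output' array with num_dim rows.
inline std::size_t output_position(std::size_t num_dim, const std::vector<std::size_t>& offsets,
                                   std::size_t b, std::size_t i, std::size_t d) {
    return (offsets[b] + i) * num_dim + d;
}
```

The same running sum also gives the required size of output: num_dim multiplied by the total number of observations.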
template<typename Task_, class Run_>
void mnncorrect::parallelize(
    int num_workers,
    Task_ num_tasks,
    Run_ run_task_range
)
- Template Parameters
Task_ | Integer type for the number of tasks.
Run_ | Function to execute a range of tasks.
- Parameters
num_workers | Number of workers.
num_tasks | Number of tasks.
run_task_range | Function to iterate over a range of tasks within a worker.
By default, this is an alias to subpar::parallelize_range(). However, if the MNNCORRECT_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
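As an illustration of a user-defined replacement, here is a minimal std::thread-based sketch. It assumes, following subpar's convention, that run_task_range is invoked as run_task_range(worker_index, first_task, number_of_tasks); the name my_parallelize_range is hypothetical. Unlike a production implementation, this sketch does not propagate exceptions thrown inside worker threads.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical drop-in for subpar::parallelize_range(), assuming the callback
// is invoked as run_task_range(worker_index, first_task, number_of_tasks).
// Tasks are split into contiguous, near-equal ranges, one std::thread per range.
template<typename Task_, class Run_>
void my_parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_tasks <= 0) return;
    if (num_workers < 1) num_workers = 1;
    Task_ workers = std::min(static_cast<Task_>(num_workers), num_tasks);
    Task_ per_worker = num_tasks / workers;
    Task_ remainder = num_tasks % workers;

    std::vector<std::thread> threads;
    Task_ start = 0;
    for (Task_ w = 0; w < workers; ++w) {
        Task_ length = per_worker + (w < remainder ? 1 : 0); // spread the remainder over the first workers
        threads.emplace_back(run_task_range, static_cast<int>(w), start, length);
        start += length;
    }
    for (auto& t : threads) {
        t.join(); // wait for every range to finish
    }
}

// To use it, define the macro before including the mnncorrect headers, e.g.:
// #define MNNCORRECT_CUSTOM_PARALLEL(n, t, r) my_parallelize_range(n, t, r)
```

Assigning each worker one contiguous block of tasks keeps the callback simple, since it only needs a start index and a length rather than an arbitrary set of task indices.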