scran_qc
Simple quality control on single-cell data
Options for choose_filter_thresholds().
#include <choose_filter_thresholds.hpp>
Public Attributes
bool lower = true
bool upper = true
double num_mads = 3
double min_diff = 0
bool log = false
bool scran_qc::ChooseFilterThresholdsOptions::lower = true
Should low values be considered as potential outliers? If false, no lower threshold is applied when defining outliers.
bool scran_qc::ChooseFilterThresholdsOptions::upper = true
Should high values be considered as potential outliers? If false, no upper threshold is applied when defining outliers.
double scran_qc::ChooseFilterThresholdsOptions::num_mads = 3
Number of MADs from the median to use to define outliers. Larger values result in more relaxed thresholds. The default of 3 MADs is motivated by the low probability (less than 1%) of observing a value that far from the median in normally distributed data.
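The interplay of lower, upper and num_mads can be illustrated with a minimal, standalone sketch. It is not the scran_qc implementation: all names below are hypothetical, the MAD is assumed to be scaled by 1.4826 so that it is comparable to a standard deviation for normal data, and a disabled threshold is assumed to be represented by an infinite value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the library's code.
struct SketchThresholds {
    double lower;
    double upper;
};

// Median of a copy of the input (assumes a non-empty vector without NaNs).
inline double sketch_median(std::vector<double> x) {
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    std::size_t n = x.size();
    return (n % 2 == 1) ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0;
}

// Median absolute deviation, scaled by 1.4826 (an assumption of this sketch).
inline double sketch_mad(const std::vector<double>& x, double med) {
    std::vector<double> dev;
    dev.reserve(x.size());
    for (double v : x) {
        dev.push_back(std::abs(v - med));
    }
    return 1.4826 * sketch_median(dev);
}

// Thresholds at median -/+ num_mads * MAD; a disabled side becomes infinite.
inline SketchThresholds sketch_thresholds(const std::vector<double>& x, bool lower, bool upper, double num_mads) {
    double med = sketch_median(x);
    double delta = num_mads * sketch_mad(x, med);
    const double inf = std::numeric_limits<double>::infinity();
    return { lower ? med - delta : -inf, upper ? med + delta : inf };
}
```

In this sketch, a value is treated as an outlier if it falls below the lower threshold or above the upper threshold, so setting lower = false means that no value can be flagged for being too small.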
double scran_qc::ChooseFilterThresholdsOptions::min_diff = 0
Minimum difference from the median to define outliers. This enforces a more relaxed threshold in cases where the MAD may be too small. If ChooseFilterThresholdsOptions::log = true, this difference is interpreted as a unit on the natural log-scale.
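One plausible reading of min_diff is as a floor on the distance between the median and each threshold. The following sketch uses a hypothetical helper name and is an assumption about the behavior rather than the library's actual code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: distance from the median to each threshold.
// The MAD-based distance is used unless it is smaller than min_diff.
inline double sketch_half_width(double mad, double num_mads, double min_diff) {
    assert(mad >= 0 && num_mads >= 0 && min_diff >= 0);
    return std::max(min_diff, num_mads * mad);
}
```

Under this reading, a MAD of zero (e.g., when most values are identical) no longer collapses both thresholds onto the median. On the log-scale, a min_diff of std::log(2.0) would require at least a two-fold change from the median before a value is considered an outlier.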
bool scran_qc::ChooseFilterThresholdsOptions::log = false
Whether the median and MAD should be computed on the log-scale, i.e., FindMedianMadOptions::log = true. (Or, for the overload that accepts a FindMedianMadResult, whether the median and MAD were already computed on the log-scale.)
Using a log-transformation instructs the outlier definition to focus on the fold-change from the median. This has several benefits for right-skewed distributions of (mostly) positive values, where the log-transformation symmetrizes the distribution and makes it more normal-like. This makes the interpretation of ChooseFilterThresholdsOptions::num_mads more relevant. When defining a lower threshold, the log-transformation also ensures that the defined threshold is always positive.
Some caution is required for distributions close to zero, e.g., proportions. The conversion of near-zero values to large negative log-values can unexpectedly inflate the MAD. This could be mitigated by adding a pseudo-count prior to log-transformation, but a large pseudo-count would cause the log-transformation to converge to a linear transformation, rendering this option meaningless for distributions consisting of small values.
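The effect of near-zero values on the log-scale MAD can be checked with a small standalone sketch. The function names and the example values are hypothetical, and the MAD is left unscaled for simplicity:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a copy of the input (assumes a non-empty vector without NaNs).
inline double median_of(std::vector<double> x) {
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    std::size_t n = x.size();
    return (n % 2 == 1) ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0;
}

// Unscaled MAD of log(x + pseudo_count); hypothetical helper for illustration.
inline double sketch_log_mad(const std::vector<double>& x, double pseudo_count) {
    std::vector<double> logged;
    logged.reserve(x.size());
    for (double v : x) {
        logged.push_back(std::log(v + pseudo_count));
    }
    double med = median_of(logged);
    for (double& v : logged) {
        v = std::abs(v - med);
    }
    return median_of(logged);
}
```

For example, for the proportions {1e-6, 1e-4, 0.01, 0.02, 0.05}, the two smallest values map to large negative logs and dominate the deviations, so sketch_log_mad() with a pseudo-count of 0 returns a larger MAD than with a pseudo-count of 0.01.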
If this is true, the reported thresholds are still converted back to the original scale of the metrics.
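The conversion back to the original scale can be sketched as follows, assuming that the median and MAD were computed on the natural log-scale. The struct and function names are hypothetical and the code is not taken from scran_qc:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for thresholds on the original scale of the metric.
struct SketchLogThresholds {
    double lower;
    double upper;
};

// Thresholds are defined on the log-scale and then exponentiated,
// so the lower threshold is always positive.
inline SketchLogThresholds sketch_log_thresholds(double log_median, double log_mad, double num_mads) {
    assert(log_mad >= 0 && num_mads >= 0);
    double delta = num_mads * log_mad;
    return { std::exp(log_median - delta), std::exp(log_median + delta) };
}
```

With a log-median of std::log(100.0) and num_mads * log_mad equal to std::log(2.0), the thresholds sit at a two-fold change on either side of the median, i.e., roughly 50 and 200 on the original scale.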