topicks
Pick top genes for downstream analyses
The topicks library implements a pick_top_genes() function to pick the top genes based on some statistic. The idea is to use this to choose highly variable genes based on their variances (e.g., from scran_variances), or to pick the best markers based on a differential expression statistic (e.g., from scran_markers). This functionality is surprisingly complex when we need to consider ties, absolute bounds, and whether to return a boolean filter or an array of indices.
We can obtain an array of booleans indicating whether each gene was picked based on its stats:
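To illustrate what such a boolean filter looks like, here is a minimal self-contained sketch. It does not call topicks; sketch_pick_top_filter() is a hypothetical stand-in that assumes larger statistics are better and that ties at the boundary are kept, and the real pick_top_genes() signature should be taken from the reference documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in, NOT the topicks API: flags the `top` largest
// statistics and keeps every gene tied with the value at the boundary.
std::vector<bool> sketch_pick_top_filter(const std::vector<double>& stats, std::size_t top) {
    std::vector<bool> picked(stats.size(), false);
    if (top == 0 || stats.empty()) {
        return picked; // nothing requested or nothing to choose from
    }
    if (top >= stats.size()) {
        picked.assign(stats.size(), true); // everything is picked
        return picked;
    }
    assert(top >= 1 && top < stats.size()); // invariant after the early returns

    // Find the value of the `top`-th largest statistic on a copy.
    std::vector<double> sorted(stats);
    auto boundary = sorted.begin() + static_cast<std::ptrdiff_t>(top - 1);
    std::nth_element(sorted.begin(), boundary, sorted.end(), std::greater<double>());
    const double threshold = *boundary;

    // Anything at or above the threshold is picked, so ties are retained.
    for (std::size_t i = 0; i < stats.size(); ++i) {
        picked[i] = (stats[i] >= threshold);
    }
    return picked;
}
```

With statistics {5, 1, 3, 3, 2} and top = 2, this sketch flags three genes, because the two genes with a value of 3 are tied at the boundary.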
Alternatively we can obtain an array of integer indices:
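The two output formats carry the same information. As a rough illustration (again not using topicks itself), the hypothetical helper sketch_filter_to_indices() below converts a boolean filter into the indices of the picked genes. Returning the indices in ascending order is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of topicks: lists the positions of all
// `true` entries of a boolean filter, in ascending order.
std::vector<std::size_t> sketch_filter_to_indices(const std::vector<bool>& filter) {
    const std::size_t n_picked =
        static_cast<std::size_t>(std::count(filter.begin(), filter.end(), true));

    std::vector<std::size_t> indices;
    indices.reserve(n_picked);
    for (std::size_t i = 0; i < filter.size(); ++i) {
        if (filter[i]) {
            indices.push_back(i);
        }
    }

    assert(indices.size() == n_picked); // one index per picked gene
    return indices;
}
```

The index form is usually more compact when only a small fraction of genes is picked, while the boolean form is convenient for subsetting arrays that run over all genes.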
By default, ties at the selection boundary are retained so the actual number of chosen genes may be greater than what was requested. This can be disabled via the PickTopGenesOptions options:
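The effect of tie handling can be sketched as follows. SketchOptions and its keep_ties field are hypothetical names chosen for this illustration, not a statement about the fields of PickTopGenesOptions. Breaking ties in favor of the earlier gene when ties are not kept is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical options struct for this sketch only.
struct SketchOptions {
    bool keep_ties = true; // retain genes tied with the last picked value
};

// Hypothetical stand-in, NOT the topicks API: returns the indices of the
// `top` largest statistics, ordered from largest to smallest statistic.
std::vector<std::size_t> sketch_pick_top_ties(const std::vector<double>& stats,
                                              std::size_t top,
                                              const SketchOptions& opt) {
    std::vector<std::size_t> order(stats.size());
    std::iota(order.begin(), order.end(), std::size_t(0));

    // Stable sort by decreasing statistic, so tied genes keep their input order.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t l, std::size_t r) { return stats[l] > stats[r]; });

    std::size_t keep = std::min(top, order.size());
    if (opt.keep_ties) {
        // Extend the selection while the next gene ties with the last kept one.
        while (keep > 0 && keep < order.size() &&
               stats[order[keep]] == stats[order[keep - 1]]) {
            ++keep;
        }
    }

    assert(keep <= order.size());
    order.resize(keep);
    return order;
}
```

For statistics {5, 1, 3, 3, 2} and top = 2, the sketch returns three indices with keep_ties = true and exactly two with keep_ties = false.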
We can also set an absolute bound on the statistic, e.g., to ensure that we never select marker genes with log-fold changes below some threshold:
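As a rough sketch of how an absolute bound interacts with the requested number of genes, the hypothetical function below first discards genes below the bound and then picks the top genes among the remainder. It does not use topicks. Treating the bound as inclusive and not retaining ties are simplifying assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT the topicks API: returns the indices (ascending)
// of at most `top` genes with the largest log-fold changes, never selecting
// a gene whose log-fold change is below `bound`.
std::vector<std::size_t> sketch_pick_top_bounded(const std::vector<double>& lfc,
                                                 std::size_t top,
                                                 double bound) {
    // Only genes at or above the bound are eligible (inclusive by assumption).
    std::vector<std::size_t> eligible;
    for (std::size_t i = 0; i < lfc.size(); ++i) {
        if (lfc[i] >= bound) {
            eligible.push_back(i);
        }
    }

    // If there are more eligible genes than requested, keep the `top` largest,
    // breaking ties in favor of the earlier gene.
    if (top < eligible.size()) {
        std::partial_sort(eligible.begin(),
                          eligible.begin() + static_cast<std::ptrdiff_t>(top),
                          eligible.end(),
                          [&](std::size_t l, std::size_t r) {
                              if (lfc[l] != lfc[r]) {
                                  return lfc[l] > lfc[r];
                              }
                              return l < r;
                          });
        eligible.resize(top);
    }

    assert(eligible.size() <= top);
    std::sort(eligible.begin(), eligible.end());
    return eligible;
}
```

With log-fold changes {2.5, 0.2, 1.5, 0.9, 3.0}, a bound of 1.0 and top = 4, only three genes are returned, since the bound takes priority over the requested count.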
Check out the reference documentation for more details.
FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to topicks to make the headers available during compilation:
find_package()
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DTOPICKER_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt.