topicks
Pick top genes for downstream analyses
Pick top genes


Overview

The topicks library implements a pick_top_genes() function that picks the top genes based on some per-gene statistic. For example, we can use it to choose highly variable genes based on their variances (e.g., from scran_variances), or to pick the best markers based on a differential expression statistic (e.g., from scran_markers). This is surprisingly complex once we need to consider ties, absolute bounds, and whether to return a boolean filter or an array of indices.

Quick start

We can obtain an array of booleans indicating whether each gene was picked based on its stats:

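#include "topicks/topicks.hpp"
#include <vector>

std::vector<double> stats(100); // vector of per-gene statistics.
topicks::PickTopGenesOptions<double> opts;
std::vector<char> keep(stats.size()); // output: whether each gene was picked.
topicks::pick_top_genes(
    stats.size(),
    stats.data(),
    10, // number of top genes to pick.
    true, // whether to pick genes with the largest 'stats'.
    keep.data(),
    opts
);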
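To make the boolean output concrete, here is a standalone sketch of plain top-N selection in standard C++. It is not how topicks is implemented: `sketch_top_mask()` is a hypothetical name, this version breaks ties at the boundary arbitrarily, and it assumes the statistics contain no NaNs.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Standalone sketch, NOT the topicks implementation: mark the 'top' genes
// with the largest (or smallest, if larger = false) statistics. Ties at the
// boundary are broken arbitrarily, so exactly min(top, n) genes are marked.
// Assumes no NaNs, which would break the comparator's ordering.
std::vector<char> sketch_top_mask(const std::vector<double>& stats, std::size_t top, bool larger) {
    const std::size_t n = stats.size();
    top = std::min(top, n);

    // Gene indices 0, 1, ..., n - 1.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Comparator that puts "better" genes first.
    auto better = [&](std::size_t a, std::size_t b) {
        return larger ? stats[a] > stats[b] : stats[a] < stats[b];
    };

    // Only the first 'top' positions need to be sorted.
    std::partial_sort(order.begin(), order.begin() + static_cast<std::ptrdiff_t>(top), order.end(), better);

    std::vector<char> mask(n, 0);
    for (std::size_t i = 0; i < top; ++i) {
        mask[order[i]] = 1;
    }
    return mask;
}
```

Using `std::partial_sort` avoids fully sorting all genes when only the first few are needed.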

Alternatively we can obtain an array of integer indices:

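// 'n' and 'top' are both deduced as Index_, so pass them with the same integer type.
auto idx = topicks::pick_top_genes_index(stats.size(), stats.data(), static_cast<std::size_t>(10), true, opts);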
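Both output forms carry the same information. As a hypothetical illustration (`mask_to_indices()` is not part of topicks), a boolean mask like the one filled by pick_top_genes() can be converted into gene indices as below. This sketch returns indices in increasing order; check the reference documentation for the ordering actually used by pick_top_genes_index().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of topicks: convert a boolean-like mask
// into the indices of the genes whose entries are non-zero.
template<typename Index_, typename Bool_>
std::vector<Index_> mask_to_indices(const std::vector<Bool_>& mask) {
    std::vector<Index_> indices;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            indices.push_back(static_cast<Index_>(i));
        }
    }
    return indices; // in increasing order, as the loop scans genes in order.
}
```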

By default, ties at the selection boundary are retained, so the actual number of chosen genes may be greater than what was requested. This can be disabled via the keep_ties field of PickTopGenesOptions:

opts.keep_ties = false;
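The tie-keeping behavior can be pictured with the following sketch. It is an assumption about the semantics, not topicks' code: `sketch_top_mask_with_ties()` is a hypothetical name, it finds the statistic of the top-th best gene and keeps every gene at least as good as that threshold, and it assumes no NaNs.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of "keep ties" semantics (an assumption, not topicks' code):
// every gene that is at least as good as the top-th best gene is kept,
// so genes tied at the boundary may push the count above 'top'.
std::vector<char> sketch_top_mask_with_ties(const std::vector<double>& stats, std::size_t top, bool larger) {
    const std::size_t n = stats.size();
    std::vector<char> mask(n, 0);
    if (top == 0 || n == 0) {
        return mask;
    }
    top = std::min(top, n);

    // Sort a copy so that the best values come first.
    std::vector<double> sorted(stats);
    if (larger) {
        std::sort(sorted.begin(), sorted.end(), std::greater<double>());
    } else {
        std::sort(sorted.begin(), sorted.end());
    }
    const double threshold = sorted[top - 1];

    // Keep all genes that are at least as good as the threshold.
    for (std::size_t i = 0; i < n; ++i) {
        mask[i] = larger ? (stats[i] >= threshold) : (stats[i] <= threshold);
    }
    return mask;
}
```

For example, with statistics {1, 2, 2, 3} and top = 2, both genes tied at 2 are kept, so three genes are picked.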

We can also set an absolute bound on the statistic, e.g., to ensure that we never select marker genes with log-fold changes below some threshold:

opts.bound = 0.5;
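One way to picture the bound is as a post-filter on the picked genes. The sketch below is hypothetical (`sketch_apply_bound()` is not a topicks function) and assumes a strict comparison against the bound; see the reference documentation for whether the library treats the bound as strict.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical post-filter, not topicks' implementation: un-pick any gene
// whose statistic does not pass the absolute bound. With larger = true a
// gene needs stat > bound; with larger = false it needs stat < bound.
// Strict comparisons are an assumption of this sketch.
void sketch_apply_bound(const std::vector<double>& stats, std::optional<double> bound, bool larger, std::vector<char>& mask) {
    if (!bound) {
        return; // no bound was requested.
    }
    for (std::size_t i = 0; i < stats.size(); ++i) {
        const bool passes = larger ? (stats[i] > *bound) : (stats[i] < *bound);
        if (!passes) {
            mask[i] = 0;
        }
    }
}
```

Filtering after picking gives the same result as filtering first, because every gene that fails the bound ranks below every gene that passes it.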

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
topicks
GIT_REPOSITORY https://github.com/libscran/topicks
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(topicks)

Then you can link to topicks to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::topicks)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::topicks)

CMake with find_package()

find_package(libscran_topicks CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::topicks)

To install the library, use:

mkdir build && cd build
cmake .. -DTOPICKER_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DTOPICKER_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simplest approach is to copy the files in include/ (either directly or with Git submodules) and add their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt.