umappp
A C++ library for UMAP
umappp is a header-only C++ implementation of the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes, Healy and Melville, 2018). UMAP is a non-linear dimensionality reduction technique that is most commonly used to visualize complex datasets. It places each observation in a low-dimensional (usually 2D) embedding in a manner that preserves the neighborhood of each observation from the high-dimensional original space. The aim is to ensure that the local structure of the data is faithfully recapitulated in lower dimensions. Further theoretical details can be found in the original UMAP documentation; the implementation here is derived from the C++ code in the uwot R package.
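To make "preserves the neighborhood" more concrete, the sketch below computes the per-observation membership weights described in the UMAP paper: the distance rho to the nearest neighbor is subtracted from each neighbor distance, and a bandwidth sigma is found by bisection so that the weights exp(-max(0, d - rho) / sigma) sum to log2(k) for k neighbors. This is a simplified illustration, not code from umappp or uwot; it assumes there are no duplicate points (no zero distances), and the function name membership_weights() is hypothetical. UMAP then combines these weights across observations into a symmetric graph and optimizes the embedding to reproduce it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Simplified sketch of UMAP's per-observation fuzzy membership weights.
// `dists` holds the distances from one observation to its k nearest
// neighbors (excluding itself), sorted in increasing order.
std::vector<double> membership_weights(const std::vector<double>& dists) {
    assert(!dists.empty());
    const double rho = dists.front();  // distance to the closest neighbor
    const double target = std::log2(static_cast<double>(dists.size()));

    // Sum of exp(-max(0, d - rho) / sigma); non-decreasing in sigma.
    auto total = [&](double sigma) {
        double sum = 0.0;
        for (double d : dists) {
            sum += std::exp(-std::max(0.0, d - rho) / sigma);
        }
        return sum;
    };

    // Find an upper bound for sigma by doubling, then bisect towards the target.
    double lo = 0.0, hi = 1.0;
    for (int i = 0; i < 1000 && total(hi) < target; ++i) {
        hi *= 2.0;
    }
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (total(mid) < target) {
            lo = mid;
        } else {
            hi = mid;
        }
    }

    // The nearest neighbor always receives weight 1; farther ones decay.
    std::vector<double> weights;
    weights.reserve(dists.size());
    for (double d : dists) {
        weights.push_back(std::exp(-std::max(0.0, d - rho) / hi));
    }
    return weights;
}
```

For distances {1, 2, 3, 4}, the weights start at 1 for the nearest neighbor, decrease with distance, and sum to log2(4) = 2.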
Given a pointer to a column-major input array with ndim rows and nobs columns, we use initialize() to set up the UMAP algorithm and run() to run it across all epochs.
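The exact initialize() signature is given in the reference documentation and is not reproduced here. As a minimal sketch of the input layout only, the helpers below (hypothetical names coordinate() and to_column_major(), not part of umappp) show where each value lives in a column-major array with ndim rows and nobs columns, and how a table stored with one row per dimension could be transposed into that layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (dim, obs) of a column-major array with ndim rows and nobs columns.
// Each column is one observation, so its ndim coordinates are contiguous.
inline double coordinate(const double* data, std::size_t ndim, std::size_t dim, std::size_t obs) {
    return data[obs * ndim + dim];
}

// Transpose a row-major ndim x nobs array (one row per dimension, e.g. a
// feature-by-sample table) into the column-major layout described above.
std::vector<double> to_column_major(const std::vector<double>& row_major,
                                    std::size_t ndim, std::size_t nobs) {
    assert(row_major.size() == ndim * nobs);
    std::vector<double> out(ndim * nobs);
    for (std::size_t dim = 0; dim < ndim; ++dim) {
        for (std::size_t obs = 0; obs < nobs; ++obs) {
            out[obs * ndim + dim] = row_major[dim * nobs + obs];
        }
    }
    return out;
}
```

Note that a row-major array with nobs rows and ndim columns (one observation per row) already has exactly this memory layout, so it can be used without copying.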
We can modify parameters through the Options class that is passed to initialize().
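As a hedged illustration of the pattern only, the struct below mimics an options object that is created with defaults and selectively overridden before being passed to an initialization function. The names ToyOptions and validate(), the fields, and their default values are hypothetical placeholders chosen to resemble common UMAP parameters; they are not umappp's Options members, which are listed in the reference documentation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical options struct: the field names and defaults are placeholders
// resembling common UMAP parameters, NOT umappp's Options members.
struct ToyOptions {
    int num_neighbors = 15;  // neighborhood size used to build the graph
    double min_dist = 0.1;   // effective minimum distance between embedded points
    int num_epochs = 500;    // number of optimization epochs
};

// Basic sanity checks before the options are handed to an initializer.
void validate(const ToyOptions& opt, int nobs) {
    assert(nobs > 0);
    if (opt.num_neighbors < 1 || opt.num_neighbors >= nobs) {
        throw std::invalid_argument("num_neighbors must lie in [1, nobs)");
    }
    if (opt.min_dist < 0.0) {
        throw std::invalid_argument("min_dist must be non-negative");
    }
    if (opt.num_epochs < 1) {
        throw std::invalid_argument("num_epochs must be positive");
    }
}
```

Typical use is to default-construct the options, override only the fields of interest (e.g. opt.num_neighbors = 30;), and pass the object along unchanged otherwise.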
We can also run the algorithm up to a specified number of epochs, which is occasionally useful for inspecting intermediate states of the embedding.
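The sketch below shows the idea of resumable, epoch-limited optimization with a hypothetical ToyStatus class (not umappp's API): run(limit) advances only up to the given epoch, and a later run() resumes from there. It also records the linearly decaying learning rate used by UMAP's stochastic gradient descent; a real optimizer would update the embedding coordinates at that point instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an optimizer state object; NOT umappp's class.
class ToyStatus {
public:
    ToyStatus(int num_epochs, double initial_alpha)
        : num_epochs_(num_epochs), initial_alpha_(initial_alpha) {
        assert(num_epochs > 0);
    }

    // Advance until epoch_limit (capped at num_epochs); a negative limit
    // runs to completion. Calling run() again resumes where it stopped.
    void run(int epoch_limit = -1) {
        const int target = (epoch_limit < 0) ? num_epochs_ : std::min(epoch_limit, num_epochs_);
        for (; epoch_ < target; ++epoch_) {
            // Learning rate decays linearly from initial_alpha towards zero.
            const double alpha =
                initial_alpha_ * (1.0 - static_cast<double>(epoch_) / num_epochs_);
            // A real optimizer would apply gradient updates scaled by alpha here.
            alphas_.push_back(alpha);
        }
    }

    int epoch() const { return epoch_; }
    const std::vector<double>& alphas() const { return alphas_; }

private:
    int num_epochs_;
    double initial_alpha_;
    int epoch_ = 0;
    std::vector<double> alphas_;
};
```

For example, run(100) on a 500-epoch status stops at epoch 100, where the intermediate embedding could be copied out, and a subsequent run() continues from epoch 100 to 500.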
Advanced users can control the neighbor search either by providing the search results directly (as a vector of vectors of index-distance pairs) or by passing an appropriate knncolle subclass to initialize().
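For small datasets, the precomputed-results route can be illustrated with an exact brute-force search. The sketch below, with the hypothetical name brute_force_knn(), returns one vector of (index, distance) pairs per observation, sorted by increasing Euclidean distance and excluding the observation itself. The int/double element types are an assumption; check the reference documentation for the exact type initialize() expects. Its cost grows with the square of nobs, so a knncolle search is preferable for larger data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One entry per observation: its neighbors as (index, distance) pairs.
using NeighborList = std::vector<std::vector<std::pair<int, double>>>;

// Exact k-nearest-neighbor search by brute force over a column-major
// ndim x nobs array, using Euclidean distance.
NeighborList brute_force_knn(const double* data, int ndim, int nobs, int k) {
    assert(k >= 0);
    NeighborList out(nobs);
    for (int i = 0; i < nobs; ++i) {
        const double* xi = data + static_cast<std::size_t>(i) * ndim;
        std::vector<std::pair<int, double>> cand;
        cand.reserve(nobs - 1);
        for (int j = 0; j < nobs; ++j) {
            if (j == i) {
                continue;  // an observation is not its own neighbor
            }
            const double* xj = data + static_cast<std::size_t>(j) * ndim;
            double d2 = 0.0;
            for (int d = 0; d < ndim; ++d) {
                const double diff = xi[d] - xj[d];
                d2 += diff * diff;
            }
            cand.emplace_back(j, std::sqrt(d2));
        }
        // Keep only the k closest candidates, sorted by distance.
        const int keep = std::min(k, nobs - 1);
        std::partial_sort(cand.begin(), cand.begin() + keep, cand.end(),
                          [](const std::pair<int, double>& a, const std::pair<int, double>& b) {
                              return a.second < b.second;
                          });
        cand.resize(keep);
        out[i] = std::move(cand);
    }
    return out;
}
```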
See the reference documentation for more details.
FetchContent
If you're already using CMake, you can add something like this to your CMakeLists.txt:
And then:
find_package()
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DUMAPPP_FETCH_EXTERN=OFF. See extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simplest approach is to copy the files in include/ (either directly or with Git submodules) and add their path during compilation, e.g., with GCC's -I flag. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.
McInnes L, Healy J, Melville J (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv, https://arxiv.org/abs/1802.03426