factorize
Create factors from categorical variables
This repository contains functions to create R-style factors from categorical variables. Each factor is represented by (i) an array of integer codes in the interval $[0, N)$ and (ii) an array of length $N$ containing sorted and unique levels. For any given observation, its value in the categorical variable can be retrieved by indexing the array of levels by its code. Factors are useful as they map arbitrary variables onto integer codes that can be easily processed by other functions.
We can create a factor from any categorical variable:
We can also easily create a factor from multiple variables, where the "levels" will be sorted and unique combinations of the variables.
Check out the reference documentation for more details.
FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to factorize to make the headers available during compilation:
find_package()
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DFACTORIZE_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simplest approach is to copy the files in include/, either directly or with Git submodules, and add their location to the compiler's include search path with, e.g., GCC's -I. This also requires the external dependencies listed in extern/CMakeLists.txt.