Create factors from categorical variables.
|
| template<typename Input_ , typename Code_ > |
| std::vector< std::vector< Input_ > > | combine_to_factor (const std::size_t n, const std::vector< const Input_ * > &inputs, Code_ *const codes) |
| |
| template<typename Input_ , typename Number_ , typename Code_ > |
| std::vector< std::vector< Input_ > > | combine_to_factor_unused (const std::size_t n, const std::vector< std::pair< const Input_ *, Number_ > > &inputs, Code_ *const codes) |
| |
| template<typename Input_ , typename Code_ > |
| std::vector< Input_ > | create_factor (const std::size_t n, const Input_ *const input, Code_ *const codes) |
| |
◆ combine_to_factor()
template<typename Input_ , typename Code_ >
| std::vector< std::vector< Input_ > > | factorize::combine_to_factor (const std::size_t n, const std::vector< const Input_ * > &inputs, Code_ *const codes) |
- Template Parameters
-
| Input_ | Type of the categorical variables to be combined. Any type may be used here as long as it implements the comparison operators. |
| Code_ | Integer type of the codes of the combined factor. This should be large enough to hold the number of unique combinations. |
- Parameters
-
| n | Number of observations (i.e., cells). |
| [in] | inputs | Vector of pointers to arrays of length n, each containing a different categorical variable. |
| [out] | codes | Pointer to an array of length n in which the codes of the combined factor are to be stored. On output, the code for observation i refers to the factor level defined by indexing into the inner vectors of the output vector, i.e., for j := codes[i], the factor level is defined by the combination (output[0][j], output[1][j], ...). |
- Returns
- Vector of vectors containing the levels of the combined factor. Each inner vector corresponds to a variable in inputs, and all inner vectors have the same length. Corresponding entries of the inner vectors together represent one level of the combined factor, in the form of a combination of values from the input variables: the first level is (output[0][0], output[1][0], ...), the second level is (output[0][1], output[1][1], ...), and so on. Each entry in output[i] is guaranteed to be a value in inputs[i]. Combinations are guaranteed to be unique and lexicographically sorted (i.e., by the value of the first variable, then the second, and so on).
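The contract described above can be illustrated with a small self-contained sketch. `combine_to_factor_sketch()` and `combine_to_factor_example()` are hypothetical names introduced here for illustration only; the sketch uses an ordered `std::map` to obtain unique, lexicographically sorted combinations, which is an assumption about one possible approach and not necessarily how the library does it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in that follows the documented contract of
// combine_to_factor(); it is not the library's implementation.
template<typename Input_, typename Code_>
std::vector<std::vector<Input_> > combine_to_factor_sketch(
    const std::size_t n,
    const std::vector<const Input_*>& inputs,
    Code_* const codes)
{
    const std::size_t nvar = inputs.size();

    // Collect the observed combinations. std::map compares its std::vector
    // keys lexicographically, matching the documented ordering of levels.
    std::map<std::vector<Input_>, Code_> found;
    std::vector<Input_> combo(nvar);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t v = 0; v < nvar; ++v) {
            combo[v] = inputs[v][i];
        }
        found.emplace(combo, Code_(0));
    }

    // Number the combinations in sorted order, filling one inner vector
    // of levels per input variable.
    std::vector<std::vector<Input_> > levels(nvar);
    Code_ next = 0;
    for (auto& entry : found) {
        entry.second = next++;
        for (std::size_t v = 0; v < nvar; ++v) {
            levels[v].push_back(entry.first[v]);
        }
    }

    // Look up the code of each observation's combination.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t v = 0; v < nvar; ++v) {
            combo[v] = inputs[v][i];
        }
        codes[i] = found.find(combo)->second;
    }
    return levels;
}

// Usage: two variables observed over five cells.
inline void combine_to_factor_example() {
    const std::vector<int> a{1, 0, 1, 0, 1};
    const std::vector<int> b{5, 5, 3, 5, 3};
    const std::vector<const int*> inputs{a.data(), b.data()};
    std::vector<int> codes(a.size());
    const auto levels = combine_to_factor_sketch(a.size(), inputs, codes.data());

    // Sorted unique combinations: (0,5), (1,3), (1,5).
    assert((levels[0] == std::vector<int>{0, 1, 1}));
    assert((levels[1] == std::vector<int>{5, 3, 5}));
    assert((codes == std::vector<int>{2, 0, 1, 0, 1}));
}
```

For any observation i and variable v, the sketch preserves the documented relationship `levels[v][codes[i]] == inputs[v][i]`.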
◆ combine_to_factor_unused()
template<typename Input_ , typename Number_ , typename Code_ >
| std::vector< std::vector< Input_ > > | factorize::combine_to_factor_unused (const std::size_t n, const std::vector< std::pair< const Input_ *, Number_ > > &inputs, Code_ *const codes) |
This function is a variation of combine_to_factor() that considers unobserved combinations of variables.
- Template Parameters
-
| Input_ | Factor type. Any type may be used here as long as it is comparable. |
| Number_ | Integer type for the number of unique values in each variable. |
| Code_ | Integer type for the combined factor. This should be large enough to hold the number of unique (possibly unused) combinations. |
- Parameters
-
| n | Number of observations (i.e., cells). |
| [in] | inputs | Vector of pairs, each of which corresponds to a categorical variable. The first element of the pair is a pointer to an array of length n, containing the values of the variable for each observation. The second element is the total number of unique values for this variable, which may be greater than the largest observed level. |
| [out] | codes | Pointer to an array of length n in which the codes of the combined factor are to be stored. On output, each entry determines the corresponding observation's combination of levels by indexing into the inner vectors of the returned object; see the argument of the same name in combine_to_factor() for more details. |
- Returns
- Vector of vectors containing all unique and sorted combinations of the input variables. This has the same structure as the output of combine_to_factor(), with the only difference being that unobserved combinations are also reported.
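To make the difference from combine_to_factor() concrete, here is a minimal sketch under a stated assumption: each variable's values are taken to be integer codes in the range from 0 up to (but excluding) the supplied number of unique values. This assumption is not confirmed by the text above. `combine_to_factor_unused_sketch()` and `combine_to_factor_unused_example()` are hypothetical names, and the mixed-radix encoding is one possible way to satisfy the contract, not necessarily the library's method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for combine_to_factor_unused(), ASSUMING that the
// values of each variable are integer codes in [0, number of unique values).
template<typename Input_, typename Number_, typename Code_>
std::vector<std::vector<Input_> > combine_to_factor_unused_sketch(
    const std::size_t n,
    const std::vector<std::pair<const Input_*, Number_> >& inputs,
    Code_* const codes)
{
    const std::size_t nvar = inputs.size();
    std::vector<std::vector<Input_> > levels(nvar);

    // Total number of combinations, whether observed or not.
    std::size_t total = 1;
    for (const auto& in : inputs) {
        total *= static_cast<std::size_t>(in.second);
    }
    if (total == 0) {
        return levels; // some variable has no levels, so n must be zero.
    }

    // Mixed-radix encoding with the first variable as the most significant
    // digit, so that increasing codes follow lexicographic order.
    std::fill_n(codes, n, Code_(0));
    std::size_t stride = total;
    for (std::size_t v = 0; v < nvar; ++v) {
        const std::size_t nlevels = static_cast<std::size_t>(inputs[v].second);
        stride /= nlevels;

        // Enumerate this variable's value in every combination.
        levels[v].reserve(total);
        for (std::size_t c = 0; c < total; ++c) {
            levels[v].push_back(static_cast<Input_>((c / stride) % nlevels));
        }

        // Add this variable's contribution to each observation's code.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t value = static_cast<std::size_t>(inputs[v].first[i]);
            codes[i] += static_cast<Code_>(value * stride);
        }
    }
    return levels;
}

// Usage: a variable with 2 levels and a variable with 3 levels.
inline void combine_to_factor_unused_example() {
    const std::vector<int> a{0, 1, 1};
    const std::vector<int> b{2, 0, 2};
    const std::vector<std::pair<const int*, int> > inputs{
        {a.data(), 2}, {b.data(), 3}
    };
    std::vector<int> codes(a.size());
    const auto levels = combine_to_factor_unused_sketch(a.size(), inputs, codes.data());

    // All 6 combinations are reported, although only 3 are observed.
    assert((levels[0] == std::vector<int>{0, 0, 0, 1, 1, 1}));
    assert((levels[1] == std::vector<int>{0, 1, 2, 0, 1, 2}));
    assert((codes == std::vector<int>{2, 3, 5}));
}
```

In this example, the combinations (0,0), (0,1) and (1,1) never occur in the data but still appear among the returned levels, which is the behavior that distinguishes this function from combine_to_factor().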
◆ create_factor()
template<typename Input_ , typename Code_ >
| std::vector< Input_ > | factorize::create_factor (const std::size_t n, const Input_ *const input, Code_ *const codes) |
Convert a categorical variable into a factor. Factors are defined as in the R programming language, i.e., as an array of integer codes, each of which indexes into an array of unique levels.
- Template Parameters
-
| Input_ | Type of the categorical variable. Any type may be used here as long as it is hashable and has an equality operator. |
| Code_ | Integer type for the output factor codes. |
- Parameters
-
| n | Number of observations. |
| [in] | input | Pointer to an array of length n containing the input categorical variable. |
| [out] | codes | Pointer to an array of length n in which the factor codes are to be stored. All values are integers in \([0, N)\) where \(N\) is the length of the output vector; all integers in this range are guaranteed to be present at least once in codes. |
- Returns
- A vector of the unique and sorted values of input, i.e., the factor levels. For any observation i, it is guaranteed that output[codes[i]] == input[i].
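The guarantee `output[codes[i]] == input[i]` can be illustrated with a short sketch. `create_factor_sketch()` and `create_factor_example()` are hypothetical names; the sketch sorts and deduplicates the values and then uses binary search, so it relies on `operator<` for `Input_` and does not use hashing. This is an assumption made for brevity and may differ from the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in that follows the documented contract of
// create_factor(); it is not the library's implementation.
template<typename Input_, typename Code_>
std::vector<Input_> create_factor_sketch(
    const std::size_t n,
    const Input_* const input,
    Code_* const codes)
{
    // Levels are the sorted unique values of the input.
    std::vector<Input_> levels(input, input + n);
    std::sort(levels.begin(), levels.end());
    levels.erase(std::unique(levels.begin(), levels.end()), levels.end());

    // Each code is the position of the observation's value among the levels.
    for (std::size_t i = 0; i < n; ++i) {
        const auto pos = std::lower_bound(levels.begin(), levels.end(), input[i]);
        codes[i] = static_cast<Code_>(pos - levels.begin());
    }
    return levels;
}

// Usage: a string variable with three distinct values.
inline void create_factor_example() {
    const std::vector<std::string> input{"b", "a", "c", "a"};
    std::vector<int> codes(input.size());
    const auto levels = create_factor_sketch(input.size(), input.data(), codes.data());

    assert((levels == std::vector<std::string>{"a", "b", "c"}));
    assert((codes == std::vector<int>{1, 0, 2, 0}));

    // The documented round-trip guarantee holds for every observation.
    for (std::size_t i = 0; i < input.size(); ++i) {
        assert(levels[codes[i]] == input[i]);
    }
}
```

Because the levels are sorted, the codes preserve the ordering of the original values: `codes[i] < codes[j]` exactly when `input[i] < input[j]`.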