scran_graph_cluster
Graph-based clustering of cells
|
Graph-based clustering of single-cell data. More...
Classes | |
struct | BuildSnnGraphOptions |
Options for SNN graph construction. More... | |
struct | BuildSnnGraphResults |
Results of SNN graph construction. More... | |
struct | ClusterLeidenOptions |
Options for cluster_leiden() . More... | |
struct | ClusterLeidenResults |
Result of cluster_leiden() . More... | |
struct | ClusterMultilevelOptions |
Options for cluster_multilevel() . More... | |
struct | ClusterMultilevelResults |
Result of cluster_multilevel() . More... | |
struct | ClusterWalktrapOptions |
Options for cluster_walktrap() . More... | |
struct | ClusterWalktrapResults |
Result of cluster_walktrap() . More... | |
Typedefs | |
typedef int | DefaultNode |
typedef double | DefaultWeight |
Enumerations | |
enum class | SnnWeightScheme : char { RANKED , NUMBER , JACCARD } |
Functions | |
template<typename Node_ = DefaultNode, typename Weight_ = DefaultWeight, typename Index_ , class GetNeighbors_ , class GetIndex_ > | |
void | build_snn_graph (const Index_ num_cells, const GetNeighbors_ get_neighbors, const GetIndex_ get_index, const BuildSnnGraphOptions &options, BuildSnnGraphResults< Node_, Weight_ > &output) |
template<typename Node_ = DefaultNode, typename Weight_ = DefaultWeight, typename Index_ , typename Distance_ > | |
BuildSnnGraphResults< Node_, Weight_ > | build_snn_graph (const knncolle::NeighborList< Index_, Distance_ > &neighbors, const BuildSnnGraphOptions &options) |
template<typename Node_ = int, typename Weight_ = double, typename Index_ > | |
BuildSnnGraphResults< Node_, Weight_ > | build_snn_graph (const std::vector< std::vector< Index_ > > &neighbors, const BuildSnnGraphOptions &options) |
template<typename Node_ = DefaultNode, typename Weight_ = DefaultWeight, typename Index_ , typename Input_ , typename Distance_ > | |
BuildSnnGraphResults< Node_, Weight_ > | build_snn_graph (const knncolle::Prebuilt< Index_, Input_, Distance_ > &prebuilt, const BuildSnnGraphOptions &options) |
template<typename Node_ = DefaultNode, typename Weight_ = DefaultWeight, typename Index_ , typename Input_ , typename Distance_ , class Matrix_ = knncolle::Matrix<Index_, Input_>> | |
BuildSnnGraphResults< Node_, Weight_ > | build_snn_graph (const std::size_t num_dims, const Index_ num_cells, const Input_ *data, const knncolle::Builder< Index_, Input_, Distance_, Matrix_ > &knn_method, const BuildSnnGraphOptions &options) |
void | cluster_leiden (const igraph_t *graph, const igraph_vector_t *weights, const ClusterLeidenOptions &options, ClusterLeidenResults &output) |
ClusterLeidenResults | cluster_leiden (const raiigraph::Graph &graph, const std::vector< igraph_real_t > &weights, const ClusterLeidenOptions &options) |
void | cluster_multilevel (const igraph_t *graph, const igraph_vector_t *weights, const ClusterMultilevelOptions &options, ClusterMultilevelResults &output) |
ClusterMultilevelResults | cluster_multilevel (const raiigraph::Graph &graph, const std::vector< igraph_real_t > &weights, const ClusterMultilevelOptions &options) |
void | cluster_walktrap (const igraph_t *graph, const igraph_vector_t *weights, const ClusterWalktrapOptions &options, ClusterWalktrapResults &output) |
ClusterWalktrapResults | cluster_walktrap (const raiigraph::Graph &graph, const std::vector< igraph_real_t > &weights, const ClusterWalktrapOptions &options) |
template<typename Vertex_ > | |
raiigraph::Graph | edges_to_graph (const std::size_t double_edges, const Vertex_ *const edges, const std::size_t num_vertices, const igraph_bool_t directed) |
Graph-based clustering of single-cell data.
typedef int scran_graph_cluster::DefaultNode |
Default type of the node indices. Set to igraph_integer_t
if igraph is available.
typedef double scran_graph_cluster::DefaultWeight |
Default type of the edge weights. Set to igraph_real_t
if igraph is available.
|
strong |
Choice of edge weighting schemes during graph construction in build_snn_graph()
.
Let \(k\) be the number of nearest neighbors for each node, not including the node itself.
RANKED
defines the weight of the edge between two nodes as \(k - r/2\) where \(r\) is the smallest sum of ranks for any shared neighboring node (Xu and Su, 2015). More shared neighbors, or shared neighbors that are close to both observations, will generally yield larger weights. For the purposes of this ranking, each node has a rank of zero in its own nearest-neighbor set. If only the furthest neighbor is shared between nodes (i.e., \(r = 2f\), the weight is set to 1e-6 to distinguish this edge from pairs of cells with no shared neighbors.NUMBER
defines the weight of the edge between two nodes as the number of shared nearest neighbors between them. The weight can range from zero to \(k + 1\), as we include the node itself. This is a simpler scheme that is also slightly faster but does not account for the ranking of neighbors within each set.JACCARD
defines the weight of the edge between two nodes as the Jaccard index of their neighbor sets, motivated by the algorithm used by the Seurat R package. This weight can range from zero to 1, and is a monotonic transformation of the weight used by NUMBER
.void scran_graph_cluster::build_snn_graph | ( | const Index_ | num_cells, |
const GetNeighbors_ | get_neighbors, | ||
const GetIndex_ | get_index, | ||
const BuildSnnGraphOptions & | options, | ||
BuildSnnGraphResults< Node_, Weight_ > & | output ) |
In a shared nearest-neighbor graph, two cells are connected to each other by an edge if they share any of their nearest neighbors. The weight of this edge is determined from the number or ranking of their shared nearest neighbors. If two cells are close together but have distinct sets of neighbors, the corresponding edge is downweighted as the two cells are unlikely to be part of the same neighborhood. In this manner, strongly weighted edges will only form within highly interconnected neighborhoods where many cells share the same neighbors. This provides a more sophisticated definition of similarity between cells compared to a simpler (unweighted) nearest neighbor graph that just focuses on immediate proximity. Community detection algorithms (e.g., cluster_multilevel()
) can then be applied to the graph to identify clusters of cells.
Node_ | Integer type of the node indices. |
Weight_ | Floating-point type of the edge weights. |
Index_ | Integer type of the observation index. |
GetNeighbors_ | Function that accepts an Index_ cell index and returns a (const reference to) a container-like object. The container should be iterable in a range-based for loop, support the [] operator, and have a size() method. |
GetIndex_ | Function that accepts an element of the container type returned by GetNeighbors_ and returns an Index_ containing its observation index. |
num_cells | Number of cells in the dataset. | |
get_neighbors | Function that accepts an integer cell index in [0, num_cells) and returns a container of that cell's neighbors. Each element of the container corresponds to a neighbor, and neighbors should be sorted by increasing distance from the cell. The same number of neighbors should be identified for each cell. | |
get_index | Function to return the index of each neighbor, given an element of the container returned by get_neighbors . In trivial cases, this is the identity function but it can be more complex depending on the contents of the inner container. | |
options | Further options for graph construction. Note that BuildSnnGraphOptions::num_neighbors is ignored here. | |
[out] | output | On output, the edges and weights of the SNN graph. The input value is ignored so this can be re-used across multiple calls to build_snn_graph() . |
BuildSnnGraphResults< Node_, Weight_ > scran_graph_cluster::build_snn_graph | ( | const knncolle::NeighborList< Index_, Distance_ > & | neighbors, |
const BuildSnnGraphOptions & | options ) |
Overload of build_snn_graph()
to enable convenient usage with pre-computed neighbors from knncolle. Distances are ignored here; only the ordering of neighbor indices is used.
Node_ | Integer type of the node indices. |
Weight_ | Floating-point type of the edge weights. |
Index_ | Integer type of the neighbor indices. |
Distance_ | Floating-point type of the distances. |
neighbors | Vector of nearest-neighbor search results for each cell. Each entry is a pair containing a vector of neighbor indices and a vector of distances to those neighbors. Neighbors should be sorted by increasing distance. The same number of neighbors should be present for each cell. |
options | Further options for graph construction. Note that BuildSnnGraphOptions::num_neighbors is ignored here. |
BuildSnnGraphResults< Node_, Weight_ > scran_graph_cluster::build_snn_graph | ( | const std::vector< std::vector< Index_ > > & | neighbors, |
const BuildSnnGraphOptions & | options ) |
Overload of build_snn_graph()
to enable convenient usage with pre-computed neighbors from knncolle.
Node_ | Integer type of the node indices. |
Weight_ | Floating-point type of the edge weights. |
Index_ | Integer type of the neighbor indices. |
neighbors | Vector of vectors of indices for the neighbors for each cell, sorted by increasing distance. It is generally expected (though not strictly required) that the same number of neighbors are present for each cell. |
options | Further options for graph construction. Note that BuildSnnGraphOptions::num_neighbors is ignored here. |
BuildSnnGraphResults< Node_, Weight_ > scran_graph_cluster::build_snn_graph | ( | const knncolle::Prebuilt< Index_, Input_, Distance_ > & | prebuilt, |
const BuildSnnGraphOptions & | options ) |
Overload of build_snn_graph()
to enable convenient usage with a prebuilt nearest-neighbor search index from knncolle.
Node_ | Integer type of the node indices. |
Weight_ | Floating-point type of the edge weights. |
Index_ | Integer type of the cell index. |
Input_ | Numeric type of the input data used to build the search index. This is only required to define the knncolle::Prebuilt class and is otherwise ignored. |
Distance_ | Floating-point type of the distances. |
[in] | prebuilt | A prebuilt nearest-neighbor search index on the cells of interest. |
options | Further options for graph construction. |
BuildSnnGraphResults< Node_, Weight_ > scran_graph_cluster::build_snn_graph | ( | const std::size_t | num_dims, |
const Index_ | num_cells, | ||
const Input_ * | data, | ||
const knncolle::Builder< Index_, Input_, Distance_, Matrix_ > & | knn_method, | ||
const BuildSnnGraphOptions & | options ) |
Overload of build_snn_graph()
to enable convenient usage with a column-major array of cell coordinates.
Node_ | Integer type of the node indices. |
Weight_ | Floating-point type of the edge weights. |
Index_ | Integer type of the cell index. |
Input_ | Numeric type of the input data. |
Distance_ | Floating-point type of the distances. |
Matrix_ | Class of the input data matrix for the neighbor search. This should satisfy the knncolle::Matrix interface. |
num_dims | Number of dimensions for the cell coordinates. | |
num_cells | Number of cells in the dataset. | |
[in] | data | Pointer to a num_dims -by-num_cells column-major array of cell coordinates where rows are dimensions and columns are cells. |
knn_method | Specification of the nearest-neighbor search algorithm, e.g., knncolle::VptreeBuilder . | |
options | Further options for graph construction. |
|
inline |
Run the Leiden community detection algorithm on a pre-constructed graph to obtain communities of highly inter-connected nodes. See here for more details on the Leiden algorithm.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. | |
weights | Pointer to an array of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . Alternatively NULL , if the graph is unweighted. | |
options | Further options. | |
[out] | output | On output, this is filtered with the clustering results. The input value is ignored, so this object can be re-used across multiple calls to cluster_leiden() . |
|
inline |
Overload of cluster_leiden()
that accepts C++ containers instead of the raw igraph pointers.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. |
weights | Vector of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . |
options | Further options. |
|
inline |
Run the multi-level community detection algorithm on a pre-constructed graph to obtain communities of highly inter-connected nodes. See here for more details on the multi-level algorithm.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. | |
weights | Pointer to an array of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . Alternatively NULL , if the graph is unweighted. | |
options | Further options. | |
[out] | output | On output, this is filtered with the clustering results. The input value is ignored, so this object can be re-used across multiple calls to cluster_multilevel() . |
|
inline |
Overload of cluster_multilevel()
that accepts C++ containers instead of the raw igraph pointers.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. |
weights | Vector of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . |
options | Further options. |
|
inline |
Run the Walktrap community detection algorithm on a pre-constructed graph to obtain communities of highly inter-connected nodes. See here for more details on the Walktrap algorithm.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. | |
weights | Pointer to an array of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . Alternatively NULL , if the graph is unweighted. | |
options | Further options. | |
[out] | output | On output, this is filtered with the clustering results. The input value is ignored, so this object can be re-used across multiple calls to cluster_walktrap() . |
|
inline |
Overload of cluster_walktrap()
that accepts C++ containers instead of the raw igraph pointers.
graph | A graph. Typically, the nodes are cells and edges are formed between similar cells. |
weights | Vector of weights of length equal to the number of edges in graph . This should be in the same order as the edge list in graph . |
options | Further options. |
raiigraph::Graph scran_graph_cluster::edges_to_graph | ( | const std::size_t | double_edges, |
const Vertex_ *const | edges, | ||
const std::size_t | num_vertices, | ||
const igraph_bool_t | directed ) |
Create an raiigraph:Graph
object from the edges.
Vertex_ | Integer type of the vertex IDs. |
double_edges | The number of edges multiplied by two. | |
[in] | edges | Pointer to an array of length double_edges . edges[2*i] and edges[2*i+1] define the vertices for edge i . For directed graphs, the edge starts from the first vertex and ends at the second vertex. |
num_vertices | Number of vertices in the graph. | |
directed | Whether the graph is directed. This should be one of IGRAPH_DIRECTED or IGRAPH_UNDIRECTED . |
edges
.