scranpy package
Submodules
scranpy.adt_quality_control module
- class scranpy.adt_quality_control.ComputeAdtQcMetricsResults(sum, detected, subset_sum)
Bases: object
Results of compute_adt_qc_metrics().
- detected: ndarray
Integer array of length equal to the number of cells, containing the number of detected ADTs in each cell.
- subset_sum: NamedList
List of length equal to the number of subsets in compute_adt_qc_metrics(). Each element corresponds to a subset of ADTs and is a NumPy array of length equal to the number of cells. Each entry of the array contains the sum of counts for that subset in each cell.
- sum: ndarray
Floating-point array of length equal to the number of cells, containing the sum of counts across all ADTs for each cell.
- to_biocframe(flatten=True)
Convert the results into a BiocFrame.
- Parameters:
flatten (bool) – Whether to flatten the subset sums into separate columns. If True, each entry of subset_sum is represented by a subset_sum_<NAME> column, where <NAME> is the name of each entry (if available) or its index (otherwise). If False, subset_sum is represented by a nested BiocFrame.
- Returns:
A BiocFrame where each row corresponds to a cell and each column is one of the metrics.
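The column naming used when flatten is True can be illustrated without the library. This is a minimal sketch under stated assumptions, not scranpy's implementation: the helper flatten_metrics is hypothetical, and a plain dict of columns stands in for the BiocFrame.

```python
def flatten_metrics(total, detected, subset_sum, names=None):
    """Hypothetical helper mimicking the column layout described for flatten=True."""
    columns = {"sum": list(total), "detected": list(detected)}
    for i, values in enumerate(subset_sum):
        # Use the entry's name if available, otherwise fall back to its index.
        label = names[i] if names is not None else str(i)
        columns["subset_sum_" + label] = list(values)
    return columns


# Two cells, one ADT subset.
named = flatten_metrics([10.0, 4.0], [3, 2], [[1.0, 0.0]], names=["igg"])
unnamed = flatten_metrics([10.0, 4.0], [3, 2], [[1.0, 0.0]])

print(list(named))    # ['sum', 'detected', 'subset_sum_igg']
print(list(unnamed))  # ['sum', 'detected', 'subset_sum_0']
```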
- class scranpy.adt_quality_control.SuggestAdtQcThresholdsResults(detected, subset_sum, block)
Bases: object
Results of suggest_adt_qc_thresholds().
- block: Optional[list]
Levels of the blocking factor. Each entry corresponds to an element of detected (and of each entry of subset_sum) if block was provided in suggest_adt_qc_thresholds(). This is set to None if no blocking was performed.
- detected: Union[NamedList, float]
Threshold on the number of detected ADTs. Cells with lower numbers of detected ADTs are considered to be of low quality.
If block is provided in suggest_adt_qc_thresholds(), a list is returned containing a separate threshold for each level of the factor. Otherwise, a single float is returned containing the threshold for all cells.
- subset_sum: NamedList
Thresholds on the sum of counts in each ADT subset. Each element of the list corresponds to an ADT subset. Cells with higher sums than the threshold for any subset are considered to be of low quality.
If block is provided in suggest_adt_qc_thresholds(), each entry of the returned list is another NamedList containing a separate threshold for each level. Otherwise, each entry of the list is a single float containing the threshold for all cells.
- scranpy.adt_quality_control.compute_adt_qc_metrics(x, subsets, num_threads=1)
Compute quality control metrics from ADT count data.
- Parameters:
x (Any) – A matrix-like object containing ADT counts.
subsets (Union[Mapping, Sequence]) – Subsets of ADTs corresponding to control features like IgGs. This may be one of the following:
A list of arrays. Each array corresponds to an ADT subset and can contain either boolean or integer values. For booleans, the array should be of length equal to the number of rows, and values should be truthy for rows that belong in the subset. For integers, each element of the array is treated as the row index of an ADT in the subset.
A dictionary where keys are the names of each ADT subset and the values are arrays as described above.
A NamedList where each element is an array as described above, possibly with names.
num_threads (int) – Number of threads to use.
- Return type:
ComputeAdtQcMetricsResults
- Returns:
QC metrics computed from the ADT count matrix for each cell.
References
The compute_adt_qc_metrics function in the scran_qc C++ library, which describes the rationale behind these QC metrics.
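To make the three metrics and the two array forms of subsets concrete, here is a pure-Python sketch. It is not the library's code: adt_qc_metrics_sketch is a hypothetical name, the matrix is a list of ADT rows, and an ADT is assumed to be "detected" in a cell when its count is greater than zero.

```python
def adt_qc_metrics_sketch(counts, subsets):
    """counts: list of rows (ADTs), each a list of per-cell counts.
    subsets: dict mapping a subset name to either a list of booleans
    (one per row) or a list of integer row indices."""
    num_cells = len(counts[0])
    total = [float(sum(row[c] for row in counts)) for c in range(num_cells)]
    # Assumption: "detected" means a count strictly greater than zero.
    detected = [sum(1 for row in counts if row[c] > 0) for c in range(num_cells)]

    subset_sum = {}
    for name, spec in subsets.items():
        if all(isinstance(v, bool) for v in spec):
            # Boolean form: one flag per row, truthy for rows in the subset.
            rows = [i for i, keep in enumerate(spec) if keep]
        else:
            # Integer form: each element is a row index.
            rows = list(spec)
        subset_sum[name] = [
            float(sum(counts[i][c] for i in rows)) for c in range(num_cells)
        ]
    return total, detected, subset_sum


# 3 ADTs (rows) by 3 cells (columns).
counts = [[5, 0, 2], [1, 3, 0], [0, 0, 4]]
subsets = {"igg": [False, False, True], "by_index": [0, 1]}
total, detected, subset_sum = adt_qc_metrics_sketch(counts, subsets)

print(total)                   # [6.0, 3.0, 6.0]
print(detected)                # [2, 1, 2]
print(subset_sum["igg"])       # [0.0, 0.0, 4.0]
print(subset_sum["by_index"])  # [6.0, 3.0, 2.0]
```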
- scranpy.adt_quality_control.filter_adt_qc_metrics(thresholds, metrics, block=None)
Filter for high-quality cells based on ADT-derived QC metrics.
- Parameters:
thresholds (SuggestAdtQcThresholdsResults) – Filter thresholds on the QC metrics, typically computed with suggest_adt_qc_thresholds().
metrics (ComputeAdtQcMetricsResults) – ADT-derived QC metrics, typically computed with compute_adt_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. The levels should be a subset of those used in suggest_adt_qc_thresholds().
- Return type:
ndarray
- Returns:
A NumPy vector of length equal to the number of cells in metrics, containing truthy values for putative high-quality cells.
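The filtering rule can be sketched as follows, based on the threshold descriptions above: a cell is kept when its number of detected ADTs is not below the threshold and none of its subset sums exceeds the corresponding threshold. This is an illustration only. The function name, the dict-based inputs, and the treatment of values exactly equal to a threshold are assumptions rather than documented behavior.

```python
def filter_adt_sketch(detected, subset_sum, thresholds, block=None):
    """thresholds: {"detected": float or {level: float},
                    "subset_sum": {name: float or {level: float}}}"""

    def pick(threshold, cell):
        # With blocking, look up the threshold for this cell's block.
        if block is None:
            return threshold
        return threshold[block[cell]]

    keep = []
    for cell in range(len(detected)):
        # Assumption: values equal to the threshold are retained.
        ok = detected[cell] >= pick(thresholds["detected"], cell)
        for name, values in subset_sum.items():
            ok = ok and values[cell] <= pick(thresholds["subset_sum"][name], cell)
        keep.append(ok)
    return keep


detected = [5, 2, 6, 1]
subset_sum = {"igg": [1.0, 0.0, 9.0, 0.0]}

# One threshold for all cells.
unblocked = filter_adt_sketch(
    detected, subset_sum, {"detected": 3.0, "subset_sum": {"igg": 4.0}}
)

# A separate threshold for each level of the blocking factor.
blocked = filter_adt_sketch(
    detected,
    subset_sum,
    {"detected": {"a": 3.0, "b": 1.0}, "subset_sum": {"igg": {"a": 4.0, "b": 5.0}}},
    block=["a", "a", "b", "b"],
)

print(unblocked)  # [True, False, False, False]
print(blocked)    # [True, False, False, True]
```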
- scranpy.adt_quality_control.suggest_adt_qc_thresholds(metrics, block=None, min_detected_drop=0.1, num_mads=3.0)
Suggest filter thresholds for the ADT-derived QC metrics, typically generated from compute_adt_qc_metrics().
- Parameters:
metrics (ComputeAdtQcMetricsResults) – ADT-derived QC metrics from compute_adt_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. If supplied, a separate threshold is computed from the cells in each block. Alternatively, None if all cells are from the same block.
min_detected_drop (float) – Minimum proportional drop in the number of detected ADTs to consider a cell to be of low quality. Specifically, the filter threshold on metrics.detected must be no higher than the product of min_detected_drop and the median number of ADTs, regardless of num_mads.
num_mads (float) – Number of median absolute deviations (MADs) from the median to define the threshold for outliers in each QC metric.
- Return type:
SuggestAdtQcThresholdsResults
- Returns:
Suggested filters on the relevant QC metrics.
References
The compute_adt_qc_filters and compute_adt_qc_filters_blocked functions in the scran_qc C++ library, which describe the rationale behind the suggested filters.
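The general idea of a median/MAD outlier threshold, with an optional blocking factor, can be sketched as below. This is a simplified illustration and should not be read as the scran_qc computation: it assumes a raw (unscaled) MAD on the original scale, does not model min_detected_drop, and both function names are hypothetical.

```python
from statistics import median


def mad_threshold(values, num_mads=3.0, upper=False):
    """Threshold at num_mads MADs above (upper=True) or below the median."""
    center = median(values)
    # Assumption: raw MAD with no scaling constant or transformation.
    mad = median(abs(v - center) for v in values)
    return center + num_mads * mad if upper else center - num_mads * mad


def suggest_detected_sketch(detected, block=None, num_mads=3.0):
    """A single threshold, or one threshold per level of the blocking factor."""
    if block is None:
        return mad_threshold(detected, num_mads)
    return {
        level: mad_threshold(
            [d for d, b in zip(detected, block) if b == level], num_mads
        )
        for level in sorted(set(block))
    }


# Lower threshold on the number of detected ADTs, no blocking.
single = suggest_detected_sketch([10, 12, 11, 13, 2])

# Separate lower thresholds per block.
per_block = suggest_detected_sketch(
    [10, 12, 11, 20, 22, 30], block=["a", "a", "a", "b", "b", "b"]
)

# Upper threshold on a subset sum.
upper = mad_threshold([1.0, 2.0, 1.5, 9.0], upper=True)

print(single)     # 8.0
print(per_block)  # {'a': 8.0, 'b': 16.0}
print(upper)      # 3.25
```

Computing thresholds within each block keeps a block with systematically higher values from distorting the threshold for the others, which is the motivation given for the block argument.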
scranpy.aggregate_across_cells module
- class scranpy.aggregate_across_cells.AggregateAcrossCellsResults(sum, detected, combinations, counts, index)
Bases: object
Results of aggregate_across_cells().
- combinations: NamedList
Sorted and unique combination of levels across all factors in aggregate_across_cells(). Each entry of the list is another list that corresponds to an entry of factors, where the i-th combination is defined as the i-th elements of all inner lists. Combinations are in the same order as the columns of sum and detected.
- counts: ndarray
Number of cells associated with each combination. Each entry corresponds to a combination in combinations.
- detected: ndarray
Integer matrix where each row corresponds to a gene and each column corresponds to a unique combination of grouping levels. Each entry contains the number of cells with detected expression in that combination.
- index: ndarray
Integer vector of length equal to the number of cells. This specifies the combination in combinations associated with each cell.
- sum: ndarray
Floating-point matrix where each row corresponds to a gene and each column corresponds to a unique combination of grouping levels. Each matrix entry contains the summed expression across all cells with that combination.
- to_summarizedexperiment(include_counts=True)
Convert the results to a SummarizedExperiment.
- Parameters:
include_counts (bool) – Whether to include counts in the column data. Users may need to set this to False if a "counts" factor is present in combinations.
- Returns:
A SummarizedExperiment where sum and detected are assays and combinations is stored in the column data.
- scranpy.aggregate_across_cells.aggregate_across_cells(x, factors, num_threads=1)
Aggregate expression values across cells based on one or more grouping factors. This is primarily used to create pseudo-bulk profiles for each cluster/sample combination.
- Parameters:
x (Any) – A matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. Values are expected to be counts.
factors (Sequence) – One or more grouping factors, see combine_factors(). If this is a NamedList, any names will be retained in the output.
num_threads (int) – Number of threads to use for aggregation.
- Return type:
AggregateAcrossCellsResults
- Returns:
Results of the aggregation, including the sum and the number of detected cells in each group for each gene.
References
The aggregate_across_cells function in the scran_aggregate C++ library, which implements the aggregation.
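The relationship between combinations, index, counts, sum and detected can be seen in a small pure-Python sketch. It is an illustration of the documented outputs, not the scran_aggregate implementation. The function name is hypothetical, index is assumed to be zero-based, and a gene is assumed to be "detected" in a cell when its value is greater than zero.

```python
def aggregate_cells_sketch(matrix, factors):
    """matrix: list of gene rows, each a list of per-cell values.
    factors: list of grouping factors, each with one level per cell."""
    per_cell = list(zip(*factors))   # combination of levels for each cell
    unique = sorted(set(per_cell))   # sorted and unique combinations
    position = {combo: i for i, combo in enumerate(unique)}
    index = [position[combo] for combo in per_cell]  # assumed zero-based

    counts = [0] * len(unique)
    for i in index:
        counts[i] += 1

    sums = [[0.0] * len(unique) for _ in matrix]
    detected = [[0] * len(unique) for _ in matrix]
    for g, row in enumerate(matrix):
        for cell, value in enumerate(row):
            sums[g][index[cell]] += value
            # Assumption: "detected" means a value strictly greater than zero.
            detected[g][index[cell]] += 1 if value > 0 else 0

    # One inner list per factor; the i-th elements form the i-th combination.
    combinations = [list(levels) for levels in zip(*unique)]
    return sums, detected, combinations, counts, index


# 2 genes by 4 cells, grouped by cluster and sample.
matrix = [[1, 0, 2, 3], [0, 0, 5, 1]]
factors = [["c1", "c2", "c1", "c2"], ["s1", "s1", "s1", "s2"]]
sums, detected, combinations, counts, index = aggregate_cells_sketch(matrix, factors)

print(combinations)  # [['c1', 'c2', 'c2'], ['s1', 's1', 's2']]
print(index)         # [0, 1, 0, 2]
print(counts)        # [2, 1, 1]
print(sums)          # [[3.0, 0.0, 3.0], [5.0, 0.0, 1.0]]
print(detected)      # [[2, 0, 1], [1, 0, 1]]
```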
scranpy.aggregate_across_genes module
- scranpy.aggregate_across_genes.aggregate_across_genes(x, sets, average=False, num_threads=1)
Aggregate expression values across genes, potentially with weights. This is typically used to summarize expression values for gene sets into a single per-cell score.
- Parameters:
x (Any) – Matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. Values are expected to be log-expression values.
sets (Sequence) – Sequence of integer arrays containing the row indices of genes in each set. Alternatively, each entry may be a tuple of length 2, containing an integer vector (row indices) and a numeric vector (weights). If this is a NamedList, the names will be preserved in the output.
average (bool) – Whether to compute the average rather than the sum.
num_threads (int) – Number of threads to be used for aggregation.
- Return type:
- Returns:
List of length equal to that of sets. Each entry is a numeric vector of length equal to the number of columns in x, containing the (weighted) sum/mean of expression values for the corresponding set across all cells.
References
The aggregate_across_genes function in the scran_aggregate C++ library, which implements the aggregation.
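The two accepted forms of each entry in sets (plain row indices, or an (indices, weights) tuple) can be illustrated with a short sketch. This is not the library's code. The function name is hypothetical, and the weighted average is assumed to divide by the sum of the weights, which the text above does not specify.

```python
def aggregate_genes_sketch(matrix, sets, average=False):
    """matrix: list of gene rows, each a list of per-cell values.
    sets: each entry is a list of row indices, or an (indices, weights) tuple."""
    num_cells = len(matrix[0])
    results = []
    for entry in sets:
        if isinstance(entry, tuple):
            rows, weights = entry
        else:
            # Unweighted sets behave as if every gene had a weight of 1.
            rows, weights = entry, [1.0] * len(entry)
        scores = [
            sum(w * matrix[r][c] for r, w in zip(rows, weights))
            for c in range(num_cells)
        ]
        if average:
            # Assumption: the weighted mean divides by the total weight.
            total_weight = sum(weights)
            scores = [s / total_weight for s in scores]
        results.append(scores)
    return results


# 3 genes by 2 cells.
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sets = [[0, 2], ([0, 1], [2.0, 1.0])]

set_sums = aggregate_genes_sketch(matrix, sets)
set_means = aggregate_genes_sketch(matrix, sets, average=True)

print(set_sums)      # [[6.0, 8.0], [5.0, 8.0]]
print(set_means[0])  # [3.0, 4.0]
```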
scranpy.analyze module
- class scranpy.analyze.AnalyzeResults(rna_qc_metrics, rna_qc_thresholds, rna_qc_filter, adt_qc_metrics, adt_qc_thresholds, adt_qc_filter, crispr_qc_metrics, crispr_qc_thresholds, crispr_qc_filter, combined_qc_filter, rna_filtered, adt_filtered, crispr_filtered, rna_size_factors, rna_normalized, adt_size_factors, adt_normalized, crispr_size_factors, crispr_normalized, rna_gene_variances, rna_highly_variable_genes, rna_pca, adt_pca, crispr_pca, combined_pca, block, mnn_corrected, tsne, umap, snn_graph, graph_clusters, kmeans_clusters, clusters, rna_markers, adt_markers, crispr_markers, rna_row_names, adt_row_names, crispr_row_names, column_names)
Bases: object
Results of analyze().
- adt_filtered: Optional[DelayedArray]
Matrix of ADT counts that has been filtered to only contain the high-quality cells in combined_qc_filter. If ADT data is not available, this is set to None instead.
- adt_markers: Optional[RunPcaResults]
Results of calling score_markers() on adt_normalized. If ADT data is not available, this is set to None instead. This will also be None if no suitable clusterings are available.
- adt_normalized: Optional[DelayedArray]
Matrix of (log-)normalized expression values derived from ADT counts, as computed by normalize_counts() using adt_size_factors. If ADT data is not available, this is set to None instead.
- adt_pca: Optional[RunPcaResults]
Results of calling run_pca() on adt_normalized. If ADT data is not available, this is set to None instead.
- adt_qc_filter: Optional[ndarray]
Results of filter_adt_qc_metrics(). If ADT data is not available, this is set to None instead.
- adt_qc_metrics: Optional[ComputeAdtQcMetricsResults]
Results of compute_adt_qc_metrics(). If ADT data is not available, this is set to None instead.
- adt_qc_thresholds: Optional[SuggestAdtQcThresholdsResults]
Results of suggest_adt_qc_thresholds(). If ADT data is not available, this is set to None instead.
- adt_row_names: Names
Names for the tags in the ADT data. This is None if ADT data is not available or no names were supplied to analyze().
- adt_size_factors: Optional[ndarray]
Size factors for the ADT count matrix, computed by compute_clrm1_factors() and centered with center_size_factors(). If ADT data is not available, this is set to None instead.
- block: Optional[Sequence]
Sequence containing the blocking factor for all cells (after filtering, if filter_cells = True in analyze()). This is set to None if no blocking factor was supplied.
- clusters: Optional[ndarray]
Array containing a cluster assignment for each cell (after filtering, if filter_cells = True in analyze()). This may be derived from graph_clusters or kmeans_clusters, depending on the choice of clusters_for_markers in analyze(). If no suitable clusterings are available, this is set to None.
- combined_pca: Union[Literal['rna_pca', 'adt_pca', 'crispr_pca'], ScaleByNeighborsResults]
If only one modality is used for the downstream analysis, this is a string specifying the attribute containing the components to be used. If multiple modalities are to be combined for downstream analysis, this contains the results of scale_by_neighbors() on the PCs of those modalities.
- combined_qc_filter: ndarray
Array of booleans indicating which cells are of high quality and should be retained for downstream analyses.
-
crispr_filtered:
Optional
[DelayedArray
]¶ Matrix of CRISPR counts that has been filtered to only contain the high-quality cells in
combined_qc_filter
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_markers:
Optional
[RunPcaResults
]¶ Results of calling
score_markers()
oncrispr_normalized
. If CRISPR data is not available, this is set toNone
instead. This will also beNone
if no suitable clusterings are available.
-
crispr_normalized:
Optional
[DelayedArray
]¶ Matrix of (log-)normalized expression values derived from CRISPR counts, as computed by
normalize_counts()
usingcrispr_size_factors
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_pca:
Optional
[RunPcaResults
]¶ Results of calling
run_pca()
oncrispr_normalized
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_qc_filter:
Optional
[ndarray
]¶ Results of
filter_crispr_qc_metrics()
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_qc_metrics:
Optional
[ComputeCrisprQcMetricsResults
]¶ Results of
compute_crispr_qc_metrics()
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_qc_thresholds:
Optional
[SuggestCrisprQcThresholdsResults
]¶ Results of
suggest_crispr_qc_thresholds()
. If CRISPR data is not available, this is set toNone
instead.
-
crispr_row_names:
Names
¶ Names for the guides in the CRISPR data. This is
None
if CRISPR data is not available or no names were supplied toanalyze()
.
-
crispr_size_factors:
Optional
[ndarray
]¶ Size factors for the CRISPR count matrix, derived from the sum of counts for each cell and centered with
center_size_factors()
. If CRISPR data is not available, this is set toNone
instead.
-
graph_clusters:
Optional
[ClusterGraphResults
]¶ Results of
cluster_graph()
. This isNone
if graph-based clustering was not performed.
-
kmeans_clusters:
Optional
[ClusterGraphResults
]¶ Results of
cluster_kmeans()
. This isNone
if k-means clustering was not performed.
-
mnn_corrected:
Optional
[CorrectMnnResults
]¶ Results of
correct_mnn()
on the PCs in or referenced bycombined_pca
. If no blocking factor is supplied, this is set toNone
instead.
-
rna_filtered:
Optional
[DelayedArray
]¶ Matrix of RNA counts that has been filtered to only contain the high-quality cells in
combined_qc_filter
. If RNA data is not available, this is set toNone
instead.
-
rna_gene_variances:
Optional
[ModelGeneVariancesResults
]¶ Results of
model_gene_variances()
. If RNA data is not available, this is set toNone
instead.
-
rna_highly_variable_genes:
Optional
[ndarray
]¶ Results of
choose_highly_variable_genes()
. If RNA data is not available, this is set toNone
instead.
-
rna_markers:
Optional
[RunPcaResults
]¶ Results of calling
score_markers()
onrna_normalized
. If RNA data is not available, this is set toNone
instead. This will also beNone
if no suitable clusterings are available.
-
rna_normalized:
Optional
[DelayedArray
]¶ Matrix of (log-)normalized expression values derived from RNA counts, as computed by
normalize_counts()
usingrna_size_factors
. If RNA data is not available, this is set toNone
instead.
-
rna_pca:
Optional
[RunPcaResults
]¶ Results of calling
run_pca()
onrna_normalized
using therna_highly_variable_genes
subset. If RNA data is not available, this is set toNone
instead.
-
rna_qc_filter:
Optional
[ndarray
]¶ Results of
filter_rna_qc_metrics()
. If RNA data is not available, this is set toNone
instead.
-
rna_qc_metrics:
Optional
[ComputeRnaQcMetricsResults
]¶ Results of
compute_rna_qc_metrics()
. If RNA data is not available, this is set toNone
instead.
-
rna_qc_thresholds:
Optional
[SuggestRnaQcThresholdsResults
]¶ Results of
suggest_rna_qc_thresholds()
. If RNA data is not available, this is set toNone
instead.
-
rna_row_names:
Optional
[Names
]¶ Names for the genes in the RNA data. This is
None
if RNA data is not available or no names were supplied toanalyze()
.
-
rna_size_factors:
Optional
[ndarray
]¶ Size factors for the RNA count matrix, derived from the sum of counts for each cell and centered with
center_size_factors()
. If RNA data is not available, this is set toNone
instead.
-
snn_graph:
Optional
[GraphComponents
]¶ Results of
build_snn_graph()
. This isNone
if graph-based clustering was not performed.
- to_singlecellexperiment(main_modality=None, flatten_qc_subsets=True, include_per_block_variances=False)[source]¶
Convert the results into a SingleCellExperiment.
- Parameters:
main_modality (Optional[Literal['rna', 'adt', 'crispr']]) – Modality to use as the main experiment. If other modalities are present, they are stored in the alternative experiments. If None, it defaults to RNA, then ADT, then CRISPR, depending on which modalities are available.
flatten_qc_subsets (bool) – Whether to flatten QC feature subsets; see the to_biocframe() method of the ComputeRnaQcMetricsResults class for more details.
include_per_block_variances (bool) – Whether to compute the per-block variances; see the to_biocframe() method of the ModelGeneVariancesResults class for more details.
- Returns:
A SingleCellExperiment containing the filtered and normalized matrices in the assays. QC metrics, size factors and clustering results are stored in the column data. PCA and other low-dimensional embeddings are stored in the reduced dimensions. Additional modalities are stored as alternative experiments.
-
tsne:
Optional
[ndarray
]¶ Results of
run_tsne()
. This isNone
if t-SNE was not performed.
-
umap:
Optional
[ndarray
]¶ Results of
run_umap()
. This isNone
if UMAP was not performed.
- scranpy.analyze.analyze(rna_x, adt_x=None, crispr_x=None, block=None, rna_subsets=[], adt_subsets=[], suggest_rna_qc_thresholds_options={}, suggest_adt_qc_thresholds_options={}, suggest_crispr_qc_thresholds_options={}, filter_cells=True, center_size_factors_options={}, compute_clrm1_factors_options={}, normalize_counts_options={}, model_gene_variances_options={}, choose_highly_variable_genes_options={}, run_pca_options={}, use_rna_pcs=True, use_adt_pcs=True, use_crispr_pcs=True, scale_by_neighbors_options={}, correct_mnn_options={}, run_umap_options={}, run_tsne_options={}, build_snn_graph_options={}, cluster_graph_options={}, run_all_neighbor_steps_options={}, kmeans_clusters=None, cluster_kmeans_options={}, clusters_for_markers=['graph', 'kmeans'], score_markers_options={}, nn_parameters=<knncolle.annoy.AnnoyParameters object>, rna_assay=0, adt_assay=0, crispr_assay=0, num_threads=3)[source]¶
Run through a simple single-cell analysis pipeline, starting from a count matrix and ending with clusters, visualizations and markers. This also supports integration of multiple modalities and correction of batch effects.
- Parameters:
rna_x – A matrix-like object containing RNA counts. This should have the same number of columns as the other *_x arguments. Alternatively, a SummarizedExperiment object containing such a matrix in its rna_assay. Alternatively None, if no RNA counts are available.
adt_x – A matrix-like object containing ADT counts. This should have the same number of columns as the other *_x arguments. Alternatively, a SummarizedExperiment object containing such a matrix in its adt_assay. Alternatively None, if no ADT counts are available.
crispr_x – A matrix-like object containing CRISPR counts. This should have the same number of columns as the other *_x arguments. Alternatively, a SummarizedExperiment object containing such a matrix in its crispr_assay. Alternatively None, if no CRISPR counts are available.
block (
Optional
[Sequence
]) – Factor specifying the block of origin (e.g., batch, sample) for each cell in the*_x
matrices. AlternativelyNone
, if all cells are from the same block.rna_subsets (
Union
[Mapping
,Sequence
]) – Gene subsets for quality control, typically used for mitochondrial genes. Check out thesubsets
arguments incompute_rna_qc_metrics()
for details.adt_subsets (
Union
[Mapping
,Sequence
]) – ADT subsets for quality control, typically used for IgG controls. Check out thesubsets
arguments incompute_adt_qc_metrics()
for details.suggest_rna_qc_thresholds_options (
dict
) – Arguments to pass tosuggest_rna_qc_thresholds()
.suggest_adt_qc_thresholds_options (
dict
) – Arguments to pass tosuggest_adt_qc_thresholds()
.suggest_crispr_qc_thresholds_options (
dict
) – Arguments to pass tosuggest_crispr_qc_thresholds()
.filter_cells (
bool
) – Whether to filter the count matrices to only retain high-quality cells in all modalities. IfFalse
, QC metrics and thresholds are still computed but are not used to filter the count matrices.center_size_factors_options (
dict
) – Arguments to pass tocenter_size_factors()
.compute_clrm1_factors_options (
dict
) – Arguments to pass tocompute_clrm1_factors()
. Only used if adt_x is provided.
normalize_counts_options (dict) – Arguments to pass to normalize_counts().
model_gene_variances_options (dict) – Arguments to pass to model_gene_variances(). Only used if rna_x is provided.
choose_highly_variable_genes_options (dict) – Arguments to pass to choose_highly_variable_genes(). Only used if rna_x is provided.
use_rna_pcs (bool) – Whether to use the RNA-derived PCs for downstream steps (i.e., clustering, visualization). Only used if rna_x is provided.
use_adt_pcs (bool) – Whether to use the ADT-derived PCs for downstream steps (i.e., clustering, visualization). Only used if adt_x is provided.
use_crispr_pcs (bool) – Whether to use the CRISPR-derived PCs for downstream steps (i.e., clustering, visualization). Only used if crispr_x is provided.
scale_by_neighbors_options (dict) – Arguments to pass to scale_by_neighbors(). Only used if multiple modalities are available and their corresponding use_*_pcs arguments are True.
correct_mnn_options (
dict
) – Arguments to pass tocorrect_mnn()
. Only used ifblock
is supplied.run_tsne_options (
Optional
[dict
]) – Arguments to pass torun_tsne()
. IfNone
, t-SNE is not performed.run_umap_options (
Optional
[dict
]) – Arguments to pass torun_umap()
. IfNone
, UMAP is not performed.build_snn_graph_options (
Optional
[dict
]) – Arguments to pass tobuild_snn_graph()
. Ignored ifcluster_graph_options = None
.cluster_graph_options (
dict
) – Arguments to pass tocluster_graph()
. IfNone
, graph-based clustering is not performed.run_all_neighbor_steps_options (
dict
) – Arguments to pass torun_all_neighbor_steps()
.kmeans_clusters (
Optional
[int
]) – Number of clusters to use in k-means clustering. IfNone
, k-means clustering is not performed.cluster_kmeans_options (
dict
) – Arguments to pass tocluster_kmeans()
. Ignored ifkmeans_clusters = None
.clusters_for_markers (
list
) – List of clustering algorithms (eithergraph
orkmeans
), specifying the clustering to be used for marker detection. The first available clustering will be chosen.score_markers_options (
dict
) – Arguments to pass toscore_markers()
. Ignored if no suitable clusterings are available.nn_parameters (
Parameters
) – Algorithm to use for nearest-neighbor searches in the various steps.rna_assay (
Union
[int
,str
]) – Integer or string specifying the assay to use ifrna_x
is aSummarizedExperiment
.adt_assay (
Union
[int
,str
]) – Integer or string specifying the assay to use ifadt_x
is aSummarizedExperiment
.crispr_assay (
Union
[int
,str
]) – Integer or string specifying the assay to use ifcrispr_x
is aSummarizedExperiment
.num_threads (
int
) – Number of threads to use in each step.
- Return type:
- Returns:
The results of the entire analysis, including the results from each step.
References
C++ libraries in the libscran GitHub organization, which implement all of these steps.
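The "first available clustering" rule described for clusters_for_markers can be sketched in plain Python. This is only an illustration of the selection logic, not scranpy's implementation; the helper name choose_marker_clusters and the dictionary of available clusterings are hypothetical.

```python
from typing import Optional, Sequence


def choose_marker_clusters(
    clusters_for_markers: Sequence[str],
    available: dict,
) -> Optional[list]:
    """Return the first clustering named in clusters_for_markers that exists.

    `available` maps a clustering name ("graph" or "kmeans") to its cluster
    assignments, or to None if that clustering was not performed.
    """
    for name in clusters_for_markers:
        assignments = available.get(name)
        if assignments is not None:
            return assignments
    # No suitable clustering: marker detection would be skipped.
    return None


# Graph-based clustering was skipped here, so the k-means result is chosen.
available = {"graph": None, "kmeans": [0, 1, 1, 0]}
chosen = choose_marker_clusters(["graph", "kmeans"], available)
print(chosen)
```

When no listed clustering is available, the helper returns None, mirroring how the clusters and *_markers attributes are documented as None in that case.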
scranpy.build_snn_graph module¶
- class scranpy.build_snn_graph.GraphComponents(vertices, edges, weights)[source]¶
Bases:
object
Components of a (possibly weighted) graph. Typically, nodes are cells and edges are formed between cells with similar expression profiles.
- __annotations__ = {'edges': <class 'numpy.ndarray'>, 'vertices': <class 'int'>, 'weights': typing.Optional[numpy.ndarray]}¶
- __dataclass_fields__ = {'edges': Field(name='edges',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'vertices': Field(name='vertices',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'weights': Field(name='weights',type=typing.Optional[numpy.ndarray],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('vertices', 'edges', 'weights')¶
- __repr__()¶
Return repr(self).
- as_igraph()[source]¶
Convert to a Graph from the igraph package.
- Returns:
A Graph for use with methods in the igraph package.
- scranpy.build_snn_graph.build_snn_graph(x, num_neighbors=10, weight_scheme='ranked', num_threads=1, nn_parameters=<knncolle.annoy.AnnoyParameters object>)[source]¶
Build a shared nearest neighbor (SNN) graph where each node is a cell. Edges are formed between cells that share one or more nearest neighbors, weighted by the number or importance of those shared neighbors.
- Parameters:
x (
Union
[ndarray
,FindKnnResults
,Index
]) –Numeric matrix where rows are dimensions and columns are cells, typically containing a low-dimensional representation from, e.g.,
run_pca()
.Alternatively, a
FindKnnResults
object containing existing neighbor search results. The number of neighbors should be the same asnum_neighbors
, otherwise a warning is raised. Alternatively, an
Index
object.num_neighbors (
int
) – Number of neighbors in the nearest-neighbor graph. Larger values generally result in broader clusters during community detection.weight_scheme (
Literal
['ranked'
,'number'
,'jaccard'
]) – Weighting scheme to use for the edges of the SNN graph, based on the number or ranking of the shared nearest neighbors.num_threads (
int
) – Number of threads to use.nn_parameters (
Parameters
) – The algorithm to use for the nearest-neighbor search. Only used ifx
is not a pre-built nearest-neighbor search index or a list of existing nearest-neighbor search results.
- Return type:
- Returns:
The components of the SNN graph, to be used in community detection.
References
The
build_snn_graph
function in the scran_graph_cluster C++ library, which provides some more details on the weighting.
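To make the idea of a shared nearest neighbor graph concrete, here is a minimal pure-Python sketch. It is not the scran_graph_cluster algorithm: it uses a brute-force neighbor search, assumes each cell counts as a member of its own neighbor set, and only covers simplified versions of the "number" and "jaccard" weightings. The function name snn_edges_sketch is hypothetical.

```python
from itertools import combinations
from math import dist


def snn_edges_sketch(x, num_neighbors, weight_scheme="number"):
    """Toy SNN graph. `x` is a list of rows (dimensions); columns are cells."""
    cells = list(zip(*x))  # one coordinate tuple per cell
    n = len(cells)
    neighbor_sets = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(cells[i], cells[j]),
        )
        # Assumption of this sketch: a cell belongs to its own neighbor set.
        neighbor_sets.append({i, *others[:num_neighbors]})

    edges = {}
    for i, j in combinations(range(n), 2):
        shared = neighbor_sets[i] & neighbor_sets[j]
        if not shared:
            continue  # no shared neighbors, so no edge
        if weight_scheme == "number":
            edges[(i, j)] = float(len(shared))
        elif weight_scheme == "jaccard":
            edges[(i, j)] = len(shared) / len(neighbor_sets[i] | neighbor_sets[j])
        else:
            raise ValueError("this sketch only covers 'number' and 'jaccard'")
    return edges


# Two well-separated groups of three cells each, in two dimensions.
x = [
    [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
]
edges_number = snn_edges_sketch(x, num_neighbors=2, weight_scheme="number")
edges_jaccard = snn_edges_sketch(x, num_neighbors=2, weight_scheme="jaccard")
print(sorted(edges_number))
```

In this toy data, edges form only within each group, which is the structure that community detection later exploits.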
scranpy.center_size_factors module¶
- scranpy.center_size_factors.center_size_factors(size_factors, block=None, mode='lowest', in_place=False)[source]¶
Center size factors before computing normalized values from the count matrix. This ensures that the normalized values are on the same scale as the original counts for easier interpretation.
- Parameters:
size_factors (
ndarray
) – Floating-point array containing size factors for all cells.block (
Optional
[Sequence
]) – Block assignment for each cell. If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults toNone
, where all cells are treated as being part of the same block.mode (
Literal
['lowest'
,'per-block'
]) – How to scale size factors across blocks.lowest
will scale all size factors by the lowest per-block average.per-block
will center the size factors in each block separately. This argument is only used ifblock
is provided.in_place (
bool
) – Whether to modifysize_factors
in place. IfFalse
, a new array is returned. This argument is only used if size_factors
is double-precision, otherwise a new array is always returned.
- Return type:
- Returns:
Array containing centered size factors. If
in_place = True
, this is a reference tosize_factors
.
References
The
center_size_factors
andcenter_size_factors_blocked
functions in the scran_norm C++ library, which describes the rationale behind centering.
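The difference between the two mode settings can be illustrated with a small pure-Python sketch. It assumes that "centering" means dividing by a mean so that the result averages to 1, and that "lowest" divides every size factor by the smallest per-block mean; both are readings of the description above rather than a copy of the scran_norm code. The function name is hypothetical.

```python
from collections import defaultdict


def center_size_factors_sketch(size_factors, block=None, mode="lowest"):
    """Return a new list of centered size factors (never modifies the input)."""
    if block is None:
        mean = sum(size_factors) / len(size_factors)
        return [sf / mean for sf in size_factors]

    # Per-block means of the size factors.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sf, b in zip(size_factors, block):
        totals[b] += sf
        counts[b] += 1
    means = {b: totals[b] / counts[b] for b in totals}

    if mode == "lowest":
        # One common scaling: the block with the lowest mean ends up centered at 1.
        scale = min(means.values())
        return [sf / scale for sf in size_factors]
    if mode == "per-block":
        # Each block is centered at 1 separately.
        return [sf / means[b] for sf, b in zip(size_factors, block)]
    raise ValueError("mode must be 'lowest' or 'per-block'")


size_factors = [1.0, 3.0, 4.0, 12.0]
block = ["a", "a", "b", "b"]
lowest = center_size_factors_sketch(size_factors, block, mode="lowest")
per_block = center_size_factors_sketch(size_factors, block, mode="per-block")
print(lowest, per_block)
```

With "lowest", the relative scale between blocks is preserved (block "b" stays four times larger than block "a"), whereas "per-block" removes that difference.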
scranpy.choose_highly_variable_genes module¶
- scranpy.choose_highly_variable_genes.choose_highly_variable_genes(stats, top=4000, larger=True, keep_ties=True, bound=None)[source]¶
Choose highly variable genes (HVGs), typically based on a variance-related statistic.
- Parameters:
stats (
ndarray
) – Array of variances (or a related statistic) across all genes. Typically, the residuals from model_gene_variances() are used here.
top (
int
) – Number of top genes to retain. Note that the actual number of retained genes may not be equal totop
, depending on the other options.larger (
bool
) – Whether larger values ofstats
represent more variable genes. If true, HVGs are defined from the largest values ofstats
.keep_ties (
bool
) – Whether to keep ties at thetop
-th most variable gene. This avoids arbitrary breaking of tied values.bound (
Optional
[float
]) – The lower bound (iflarger = True
) or upper bound (otherwise) to be applied tostats
. Genes are not considered to be HVGs if they do not pass this bound, even if they are within thetop
genes. Ignored ifNone
.
- Return type:
- Returns:
Array containing the indices of genes in
stats
that are considered to be highly variable.
References
The
choose_highly_variable_genes
function from the scran_variances library, which provides the underlying implementation.
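How top, larger, keep_ties and bound interact can be sketched in a few lines of plain Python. This is an illustrative reading of the parameter descriptions, not the scran_variances implementation; in particular, it assumes that a gene must strictly exceed the bound and that indices are returned in increasing order. The function name is hypothetical.

```python
def choose_hvgs_sketch(stats, top, larger=True, keep_ties=True, bound=None):
    """Return sorted indices of the genes chosen as highly variable."""
    sign = 1.0 if larger else -1.0
    # Flip signs if needed so that a bigger score always means "more variable".
    scored = [(sign * s, i) for i, s in enumerate(stats)]
    if bound is not None:
        # Assumption: the bound is strict.
        scored = [(s, i) for s, i in scored if s > sign * bound]
    scored.sort(key=lambda t: (-t[0], t[1]))

    chosen = scored[:top]
    if keep_ties and chosen and len(scored) > top:
        cutoff = chosen[-1][0]
        # Retain every gene tied with the top-th gene.
        chosen += [t for t in scored[top:] if t[0] == cutoff]
    return sorted(i for _, i in chosen)


stats = [0.5, 2.0, 1.0, 2.0, 1.0, -0.3]
with_ties = choose_hvgs_sketch(stats, top=3)
without_ties = choose_hvgs_sketch(stats, top=3, keep_ties=False)
bounded = choose_hvgs_sketch(stats, top=3, bound=1.0)
print(with_ties, without_ties, bounded)
```

The example shows why the number of retained genes may differ from top: ties can add genes, and the bound can remove them.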
scranpy.choose_pseudo_count module¶
- scranpy.choose_pseudo_count.choose_pseudo_count(size_factors, quantile=0.05, max_bias=1, min_value=1)[source]¶
Choose a suitable pseudo-count to control the bias introduced by log-transformation of normalized counts.
- Parameters:
- Return type:
- Returns:
Choice of pseudo-count, for use in
normalize_counts()
.
References
The
choose_pseudo_count
function in the scran_norm C++ library, which describes the rationale behind the choice of pseudo-count.
scranpy.cluster_graph module¶
- class scranpy.cluster_graph.ClusterGraphLeidenResults(status, membership, quality)[source]¶
Bases:
ClusterGraphResults
Clustering results from
cluster_graph()
whenmethod = "leiden"
.- __annotations__ = {'quality': <class 'float'>}¶
- __dataclass_fields__ = {'membership': Field(name='membership',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'quality': Field(name='quality',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'status': Field(name='status',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('status', 'membership', 'quality')¶
- __repr__()¶
Return repr(self).
- class scranpy.cluster_graph.ClusterGraphMultilevelResults(status, membership, levels, modularity)[source]¶
Bases:
ClusterGraphResults
Clustering results from
cluster_graph()
whenmethod = "multilevel"
.- __annotations__ = {'levels': tuple[numpy.ndarray], 'modularity': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'levels': Field(name='levels',type=tuple[numpy.ndarray],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'membership': Field(name='membership',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'modularity': Field(name='modularity',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'status': Field(name='status',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('status', 'membership', 'levels', 'modularity')¶
- __repr__()¶
Return repr(self).
-
levels:
tuple
[ndarray
]¶ Clustering at each level of the algorithm. Each array corresponds to one level and contains the cluster assignment for each cell at that level.
-
modularity:
ndarray
¶ Modularity at each level. This has the same length as levels, and the largest value corresponds to the assignments reported in membership.
- class scranpy.cluster_graph.ClusterGraphResults(status, membership)[source]¶
Bases:
object
Clustering results from
cluster_graph()
.- __annotations__ = {'membership': <class 'numpy.ndarray'>, 'status': <class 'int'>}¶
- __dataclass_fields__ = {'membership': Field(name='membership',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'status': Field(name='status',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('status', 'membership')¶
- __repr__()¶
Return repr(self).
- class scranpy.cluster_graph.ClusterGraphWalktrapResults(status, membership, merges, modularity)[source]¶
Bases:
ClusterGraphResults
Clustering results from
cluster_graph()
whenmethod = "walktrap"
.- __annotations__ = {'merges': <class 'numpy.ndarray'>, 'modularity': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'membership': Field(name='membership',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'merges': Field(name='merges',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'modularity': Field(name='modularity',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'status': Field(name='status',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('status', 'membership', 'merges', 'modularity')¶
- __repr__()¶
Return repr(self).
- scranpy.cluster_graph.cluster_graph(x, method='multilevel', multilevel_resolution=1, leiden_resolution=1, leiden_objective='modularity', walktrap_steps=4, seed=42)[source]¶
Identify clusters of cells using a variety of community detection methods from a graph where similar cells are connected.
- Parameters:
x (
GraphComponents
) – Components of the graph to be clustered, typically produced bybuild_snn_graph()
.method (
Literal
['multilevel'
,'leiden'
,'walktrap'
]) – Community detection algorithm to use.multilevel_resolution (
float
) – Resolution of the clustering whenmethod = "multilevel"
. Larger values result in finer clusters.leiden_resolution (
float
) – Resolution of the clustering whenmethod = "leiden"
. Larger values result in finer clusters.leiden_objective (
Literal
['modularity'
,'cpm'
]) – Objective function to use whenmethod = "leiden"
.walktrap_steps (
int
) – Number of steps to use whenmethod = "walktrap"
.seed (
int
) – Random seed to use formethod = "multilevel"
or"leiden"
.
- Returns:
Clustering results, as a:
ClusterGraphMultilevelResults, if method = "multilevel".
ClusterGraphLeidenResults, if method = "leiden".
ClusterGraphWalktrapResults, if method = "walktrap".
All objects contain at least status, an indicator of whether the algorithm successfully completed; and membership, an array of cluster assignments for each node in x.
References
https://igraph.org/c/html/latest/igraph-Community.html, for the underlying implementation of each clustering method.
The various
cluster_*
functions in the scran_graph_cluster C++ library, which wraps the igraph functions.
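Several of the result classes above report modularity. As a rough guide to what that number measures, the following pure-Python sketch computes the standard modularity of a clustering on an undirected weighted graph. It assumes edges are given as a list of (u, v) pairs, which may differ from the layout of GraphComponents.edges, and the function name is hypothetical.

```python
def modularity_sketch(num_vertices, edges, membership, weights=None, resolution=1.0):
    """Modularity of `membership` on an undirected graph without self-loops."""
    if weights is None:
        weights = [1.0] * len(edges)
    total = sum(weights)  # total edge weight, m

    degree = [0.0] * num_vertices
    within = {}  # summed weight of edges inside each cluster
    for (u, v), w in zip(edges, weights):
        degree[u] += w
        degree[v] += w
        if membership[u] == membership[v]:
            within[membership[u]] = within.get(membership[u], 0.0) + w

    cluster_degree = {}  # summed weighted degree of each cluster
    for node, c in enumerate(membership):
        cluster_degree[c] = cluster_degree.get(c, 0.0) + degree[node]

    # Q = sum over clusters of (within / m) - resolution * (degree / 2m)^2
    return sum(
        within.get(c, 0.0) / total - resolution * (d / (2 * total)) ** 2
        for c, d in cluster_degree.items()
    )


# Two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity_sketch(6, edges, [0, 0, 0, 1, 1, 1])
q_single = modularity_sketch(6, edges, [0, 0, 0, 0, 0, 0])
print(q_split, q_single)
```

Splitting the two triangles gives a higher modularity than lumping all nodes together. Raising resolution penalizes large clusters more strongly, consistent with the note that larger resolution values result in finer clusters.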
scranpy.cluster_kmeans module¶
- class scranpy.cluster_kmeans.ClusterKmeansResults(clusters, centers, iterations, status)[source]¶
Bases:
object
Results of
cluster_kmeans()
.- __annotations__ = {'centers': <class 'numpy.ndarray'>, 'clusters': <class 'numpy.ndarray'>, 'iterations': <class 'int'>, 'status': <class 'int'>}¶
- __dataclass_fields__ = {'centers': Field(name='centers',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'clusters': Field(name='clusters',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'iterations': Field(name='iterations',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'status': Field(name='status',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('clusters', 'centers', 'iterations', 'status')¶
- __repr__()¶
Return repr(self).
-
centers:
ndarray
¶ Matrix containing the coordinates of the cluster centroids. Dimensions are in the rows while centers are in the columns.
-
clusters:
ndarray
¶ Array containing the cluster assignment for each cell. Values are integers in [0, N) where N is the total number of clusters.
-
status:
int
¶ Convergence status. Any non-zero value indicates a convergence failure though the exact meaning depends on the choice of
refine_method
incluster_kmeans()
.
- scranpy.cluster_kmeans.cluster_kmeans(x, k, init_method='var-part', refine_method='hartigan-wong', var_part_optimize_partition=True, var_part_size_adjustment=1, lloyd_iterations=100, hartigan_wong_iterations=10, hartigan_wong_quick_transfer_iterations=50, hartigan_wong_quit_quick_transfer_failure=False, seed=5489, num_threads=1)[source]¶
Perform k-means clustering with a variety of different initialization and refinement algorithms.
- Parameters:
x (
ndarray
) – Input data matrix where rows are dimensions and columns are observations (i.e., cells).k (
int
) – Number of clusters.init_method (
Literal
['var-part'
,'kmeans++'
,'random'
]) – Initialization method for defining the initial centroid coordinates. Choices are variance partitioning (var-part
), kmeans++ (kmeans++
) or random initialization (random
).refine_method (
Literal
['hartigan-wong'
,'lloyd'
]) – Method to use to refine the cluster assignments and centroid coordinates. Choices are Lloyd’s algorithm (lloyd
) or the Hartigan-Wong algorithm (hartigan-wong
).var_part_optimize_partition (
bool
) – Whether each partition boundary should be optimized to reduce the sum of squares in the child partitions. Only used ifinit_method = "var-part"
.var_part_size_adjustment (
float
) – Floating-point value between 0 and 1, specifying the adjustment to the cluster size when prioritizing the next cluster to partition. Setting this to 0 will ignore the cluster size while setting this to 1 will generally favor larger clusters. Only used ifinit_method = "var-part"
.
lloyd_iterations (int) – Maximum number of iterations for the Lloyd algorithm.
hartigan_wong_iterations (int) – Maximum number of iterations for the Hartigan-Wong algorithm.
hartigan_wong_quick_transfer_iterations (int) – Maximum number of quick transfer iterations for the Hartigan-Wong algorithm.
hartigan_wong_quit_quick_transfer_failure (bool) – Whether to quit the Hartigan-Wong algorithm upon convergence failure during quick transfer iterations.
seed (int) – Seed to use for random or kmeans++ initialization.
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
Results of k-means clustering on the observations.
References
https://ltla.github.io/CppKmeans, which describes the various initialization and refinement algorithms in more detail.
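For readers unfamiliar with the refinement step, here is a minimal pure-Python sketch of Lloyd's algorithm using the same orientation as above (rows are dimensions, columns are observations). It is not the CppKmeans implementation: initialization simply takes the first k observations instead of variance partitioning or kmeans++, and the status convention (0 for convergence, 1 for hitting the iteration limit) is an assumption. The function name is hypothetical.

```python
from math import dist


def lloyd_kmeans_sketch(x, k, max_iterations=100):
    """Return (clusters, centers, iterations, status) for the data in `x`."""
    points = list(zip(*x))  # one coordinate tuple per observation
    centers = [list(p) for p in points[:k]]  # naive init: first k observations
    clusters = [-1] * len(points)
    status = 1  # non-zero: stopped at the iteration limit (assumed convention)
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # Assignment step: each observation goes to its closest center.
        new_clusters = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_clusters == clusters:
            status = 0  # assignments are stable, so the algorithm converged
            break
        clusters = new_clusters
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, clusters) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Report centers with dimensions in rows and centers in columns.
    centers_by_dim = [list(row) for row in zip(*centers)]
    return clusters, centers_by_dim, iterations, status


x = [
    [0.0, 10.0, 0.2, 10.2, 0.1],
    [0.0, 5.0, 0.0, 5.0, 0.1],
]
clusters, centers, iterations, status = lloyd_kmeans_sketch(x, k=2)
print(clusters, centers, iterations, status)
```

The returned tuple mirrors the fields of ClusterKmeansResults, with cluster assignments as integers in [0, k).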
scranpy.combine_factors module¶
- scranpy.combine_factors.combine_factors(factors, keep_unused=False)[source]¶
Combine multiple categorical factors based on the unique combinations of levels from each factor.
- Parameters:
factors (
Sequence
) – Sequence containing factors of interest. Each entry corresponds to a factor and should be a sequence of the same length. Corresponding elements across all factors represent the combination of levels for a single observation.keep_unused (
bool
) – Whether to report unused combinations of levels. If any entry offactors
is aFactor
object, any unused levels will also be preserved.
- Returns:
Tuple containing:
Sorted and unique combinations of levels as a tuple. Each entry of the tuple is a list that corresponds to a factor in factors. Corresponding elements of each list define a single combination, i.e., the i-th combination is defined by taking the i-th element of each sequence in the tuple.
Integer array of length equal to each sequence of factors, specifying the combination for each observation. Each entry is an index i into the sequences in the previous tuple.
References
The
combine_factors
function in the scran_aggregate library, which provides the underlying implementation.
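The shape of the returned tuple is easiest to see on a small example. The sketch below reproduces the described behavior in plain Python for the default keep_unused = False case; it is an illustration under that assumption, not the scran_aggregate implementation, and the function name is hypothetical.

```python
def combine_factors_sketch(factors):
    """Return (levels, index) for the observed combinations of factor levels."""
    observations = list(zip(*factors))  # one combination per observation
    unique = sorted(set(observations))  # sorted, unique combinations
    lookup = {combo: i for i, combo in enumerate(unique)}
    # One list per input factor; the i-th elements together form combination i.
    levels = tuple(list(column) for column in zip(*unique))
    index = [lookup[obs] for obs in observations]
    return levels, index


batch = ["A", "B", "A", "B", "A"]
treatment = ["x", "x", "y", "y", "x"]
levels, index = combine_factors_sketch([batch, treatment])
print(levels)
print(index)
```

Here the first and last observations share the combination ("A", "x"), so they receive the same index.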
scranpy.compute_clrm1_factors module¶
- scranpy.compute_clrm1_factors.compute_clrm1_factors(x, num_threads=1)[source]¶
Compute size factors from an ADT count matrix using the CLRm1 method.
- Parameters:
- Return type:
- Returns:
Array containing the CLRm1 size factor for each cell. Note that these size factors are not centered and should be passed through, e.g.,
center_size_factors()
before normalization.
References
https://github.com/libscran/clrm1, for a description of the CLRm1 method.
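As a rough illustration of why centering is recommended before normalization, the sketch below rescales a set of size factors so that their mean is 1. It assumes that centering means division by the mean; center_sketch is a hypothetical helper and not the scranpy center_size_factors() function.

```python
from typing import List, Sequence


def center_sketch(size_factors: Sequence[float]) -> List[float]:
    """Scale size factors so that their mean is 1 (assumed meaning of centering)."""
    mean = sum(size_factors) / len(size_factors)
    return [s / mean for s in size_factors]


centered = center_sketch([0.5, 1.0, 4.5])
print(centered)  # [0.25, 0.5, 2.25]
```

With a mean of 1, the normalized values stay on roughly the same scale as the original counts.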
scranpy.correct_mnn module¶
- class scranpy.correct_mnn.CorrectMnnResults(corrected, merge_order, num_pairs)[source]¶
Bases:
object
Results of
correct_mnn()
.- __annotations__ = {'corrected': <class 'numpy.ndarray'>, 'merge_order': list[str], 'num_pairs': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'corrected': Field(name='corrected',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'merge_order': Field(name='merge_order',type=list[str],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'num_pairs': Field(name='num_pairs',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('corrected', 'merge_order', 'num_pairs')¶
- __repr__()¶
Return repr(self).
-
corrected:
ndarray
¶ Floating-point matrix of the same dimensions as the x used in correct_mnn(), containing the corrected values.
- scranpy.correct_mnn.correct_mnn(x, block, num_neighbors=15, num_mads=3, robust_iterations=2, robust_trim=0.25, mass_cap=None, order=None, reference_policy='max-rss', nn_parameters=<knncolle.annoy.AnnoyParameters object>, num_threads=1)[source]¶
Apply mutual nearest neighbor (MNN) correction to remove batch effects from a low-dimensional matrix.
- Parameters:
x (ndarray) – Matrix of coordinates where rows are dimensions and columns are cells, typically generated by run_pca().
block (Sequence) – Factor specifying the block of origin (e.g., batch, sample) for each cell. Length should equal the number of columns in x.
num_neighbors (int) – Number of neighbors to use when identifying MNN pairs.
num_mads (int) – Number of median absolute deviations to use for removing outliers in the center-of-mass calculations.
robust_iterations (int) – Number of iterations for robust calculation of the center of mass.
robust_trim (float) – Trimming proportion for robust calculation of the center of mass. This should be a value in [0, 1).
mass_cap (Optional[int]) – Cap on the number of observations to use for center-of-mass calculations on the reference dataset. A value of 100,000 may be appropriate for speeding up correction of very large datasets. If None, no cap is used.
order (Optional[Sequence]) – Sequence containing the unique levels of block in the desired merge order. If None, a suitable merge order is automatically determined.
reference_policy (Literal['max-rss', 'max-size', 'max-variance', 'input']) – Policy to use to choose the first reference batch. This can be based on the largest batch (max-size), the most variable batch (max-variance), the batch with the largest residual sum of squares (max-rss), or the first specified input (input). Only used for automatic merges, i.e., when order = None.
nn_parameters (Parameters) – The nearest-neighbor algorithm to use.
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
The results of the MNN correction, including a matrix of the corrected coordinates and some additional diagnostics.
References
https://libscran.github.io/mnncorrect, which describes the MNN correction algorithm in more detail.
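To make the role of num_neighbors more concrete, the following sketch identifies mutual nearest neighbor pairs between two small batches by brute force. It only illustrates the pair-finding idea. The helper names are hypothetical, cells are stored as a list of coordinate tuples rather than the column-per-cell matrix that correct_mnn() expects, and none of the center-of-mass or merge-order logic is shown.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def nearest(queries: Sequence[Point], targets: Sequence[Point], k: int) -> List[Set[int]]:
    """For each query point, the indices of its k nearest targets (brute force)."""
    out = []
    for q in queries:
        order = sorted(range(len(targets)), key=lambda j: math.dist(q, targets[j]))
        out.append(set(order[:k]))
    return out


def find_mnn_pairs(batch_a: Sequence[Point], batch_b: Sequence[Point], k: int) -> List[Tuple[int, int]]:
    """Pairs (i, j) where cell i in A and cell j in B are in each other's k nearest neighbors."""
    a_to_b = nearest(batch_a, batch_b, k)
    b_to_a = nearest(batch_b, batch_a, k)
    return [
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in sorted(neighbors)
        if i in b_to_a[j]
    ]


batch_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
batch_b = [(0.1, 0.0), (9.0, 10.0)]
pairs = find_mnn_pairs(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (2, 1)]
```

The second cell of batch_a is closest to the first cell of batch_b, but that relationship is not mutual, so it does not form a pair.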
scranpy.crispr_quality_control module¶
- class scranpy.crispr_quality_control.ComputeCrisprQcMetricsResults(sum, detected, max_value, max_index)[source]¶
Bases:
object
Results of
compute_crispr_qc_metrics()
.- __annotations__ = {'detected': <class 'numpy.ndarray'>, 'max_index': <class 'numpy.ndarray'>, 'max_value': <class 'numpy.ndarray'>, 'sum': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'detected': Field(name='detected',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'max_index': Field(name='max_index',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'max_value': Field(name='max_value',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'sum': Field(name='sum',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('sum', 'detected', 'max_value', 'max_index')¶
- __repr__()¶
Return repr(self).
-
detected:
ndarray
¶ Integer array of length equal to the number of cells, containing the number of detected guides in each cell.
-
max_index:
ndarray
¶ Integer array of length equal to the number of cells, containing the row index of the guide with the maximum count in each cell.
-
max_value:
ndarray
¶ Floating-point array of length equal to the number of cells, containing the maximum count for each cell.
- class scranpy.crispr_quality_control.SuggestCrisprQcThresholdsResults(max_value, block)[source]¶
Bases:
object
Results of
suggest_crispr_qc_thresholds()
.- __annotations__ = {'block': typing.Optional[list], 'max_value': typing.Union[biocutils.NamedList.NamedList, float]}¶
- __dataclass_fields__ = {'block': Field(name='block',type=typing.Optional[list],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'max_value': Field(name='max_value',type=typing.Union[biocutils.NamedList.NamedList, float],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('max_value', 'block')¶
- __repr__()¶
Return repr(self).
-
block:
Optional
[list
]¶ Levels of the blocking factor. Each entry corresponds to an element of max_value if block was provided in suggest_crispr_qc_thresholds(). This is set to None if no blocking was performed.
-
max_value:
Union
[NamedList
,float
]¶ Threshold on the maximum count in each cell. Cells with lower maxima are considered to be of low quality.
If block is provided in suggest_crispr_qc_thresholds(), a list is returned containing a separate threshold for each level of the factor. Otherwise, a single float is returned containing the threshold for all cells.
- scranpy.crispr_quality_control.compute_crispr_qc_metrics(x, num_threads=1)[source]¶
Compute quality control metrics from CRISPR count data.
- Parameters:
- Returns:
QC metrics computed from the count matrix for each cell.
References
The
compute_crispr_qc_metrics
function in the scran_qc C++ library, which describes the rationale behind these QC metrics.
- scranpy.crispr_quality_control.filter_crispr_qc_metrics(thresholds, metrics, block=None)[source]¶
Filter for high-quality cells based on CRISPR-derived QC metrics.
- Parameters:
thresholds (SuggestCrisprQcThresholdsResults) – Filter thresholds on the QC metrics, typically computed with suggest_crispr_qc_thresholds().
metrics (ComputeCrisprQcMetricsResults) – CRISPR-derived QC metrics, typically computed with compute_crispr_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. The levels should be a subset of those used in suggest_crispr_qc_thresholds().
- Return type:
- Returns:
A NumPy vector of length equal to the number of cells in
metrics
, containing truthy values for putative high-quality cells.
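The interaction between per-block thresholds and the block argument can be sketched as follows. This is an illustration only: filter_by_max_value is a hypothetical helper, the thresholds are passed as a plain dictionary instead of a SuggestCrisprQcThresholdsResults object, and treating a value equal to the threshold as passing is an assumption.

```python
from typing import Hashable, List, Mapping, Optional, Sequence, Union


def filter_by_max_value(
    max_value: Sequence[float],
    threshold: Union[float, Mapping[Hashable, float]],
    block: Optional[Sequence[Hashable]] = None,
) -> List[bool]:
    """Keep cells whose maximum guide count is not below the relevant threshold."""
    if block is None:
        # Unblocked case: one threshold for all cells.
        return [v >= threshold for v in max_value]
    # Blocked case: look up the threshold for each cell's block of origin.
    return [v >= threshold[b] for v, b in zip(max_value, block)]


keep = filter_by_max_value(
    max_value=[50.0, 3.0, 20.0, 8.0],
    threshold={"a": 10.0, "b": 15.0},
    block=["a", "a", "b", "b"],
)
print(keep)  # [True, False, True, False]
```

Every level in block needs a matching threshold, which is why the levels should be a subset of those used when the thresholds were suggested.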
- scranpy.crispr_quality_control.suggest_crispr_qc_thresholds(metrics, block=None, num_mads=3.0)[source]¶
Suggest filter thresholds for the CRISPR-derived QC metrics, typically generated from compute_crispr_qc_metrics().
- Parameters:
metrics (ComputeCrisprQcMetricsResults) – CRISPR-derived QC metrics from compute_crispr_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. If supplied, a separate threshold is computed from the cells in each block. Alternatively None, if all cells are from the same block.
num_mads (float) – Number of MADs from the median to define the threshold for outliers in each QC metric.
- Return type:
- Returns:
Suggested filters on the relevant QC metrics.
References
The compute_crispr_qc_filters and compute_crispr_qc_filters_blocked functions in the scran_qc C++ library, which describe the rationale behind the suggested filters.
scranpy.fit_variance_trend module¶
- scranpy.fit_variance_trend.fit_variance_trend(mean, variance, mean_filter=True, min_mean=0.1, transform=True, span=0.3, use_min_width=False, min_width=1, min_window_count=200, num_threads=1)[source]¶
Fit a trend to the per-gene variances with respect to the mean.
- Parameters:
mean (ndarray) – Array containing the mean (log-)expression for each gene.
variance (ndarray) – Array containing the variance in the (log-)expression for each gene. This should have length equal to mean.
mean_filter (bool) – Whether to filter on the means before trend fitting.
min_mean (float) – The minimum mean of genes to use in trend fitting. Only used if mean_filter = True.
transform (bool) – Whether a quarter-root transformation should be applied before trend fitting.
span (float) – Span of the LOWESS smoother. Ignored if use_min_width = True.
use_min_width (bool) – Whether a minimum width constraint should be applied to the LOWESS smoother. This is useful to avoid overfitting in high-density intervals.
min_width (float) – Minimum width of the window to use when use_min_width = True.
min_window_count (int) – Minimum number of observations in each window. Only used if use_min_width = True.
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
A tuple of two arrays. The first array contains the fitted value of the trend for each gene while the second array contains the residual.
References
The
fit_variance_trend
function in the scran_variances C++ library, for the underlying implementation.
scranpy.lib_scranpy module¶
- scranpy.lib_scranpy.aggregate_across_cells(arg0: int, arg1: numpy.ndarray, arg2: int) tuple ¶
- scranpy.lib_scranpy.build_snn_graph(arg0: numpy.ndarray, arg1: str, arg2: int) tuple ¶
- scranpy.lib_scranpy.center_size_factors(arg0: numpy.ndarray, arg1: numpy.ndarray | None, arg2: bool) None ¶
- scranpy.lib_scranpy.choose_highly_variable_genes(arg0: numpy.ndarray, arg1: int, arg2: bool, arg3: bool, arg4: float | None) numpy.ndarray ¶
- scranpy.lib_scranpy.choose_pseudo_count(arg0: numpy.ndarray, arg1: float, arg2: float, arg3: float) float ¶
- scranpy.lib_scranpy.cluster_kmeans(arg0: numpy.ndarray, arg1: int, arg2: str, arg3: str, arg4: bool, arg5: float, arg6: int, arg7: int, arg8: int, arg9: bool, arg10: int, arg11: int) tuple ¶
- scranpy.lib_scranpy.combine_factors(arg0: tuple, arg1: bool, arg2: numpy.ndarray) tuple ¶
- scranpy.lib_scranpy.compute_clrm1_factors(arg0: int, arg1: int) numpy.ndarray ¶
- scranpy.lib_scranpy.correct_mnn(arg0: numpy.ndarray, arg1: numpy.ndarray, arg2: int, arg3: float, arg4: int, arg5: float, arg6: int, arg7: int, arg8: numpy.ndarray | None, arg9: str, arg10: int) tuple ¶
- scranpy.lib_scranpy.filter_adt_qc_metrics(arg0: tuple, arg1: tuple, arg2: numpy.ndarray | None) numpy.ndarray ¶
- scranpy.lib_scranpy.filter_crispr_qc_metrics(arg0: tuple, arg1: tuple, arg2: numpy.ndarray | None) numpy.ndarray ¶
- scranpy.lib_scranpy.filter_rna_qc_metrics(arg0: tuple, arg1: tuple, arg2: numpy.ndarray | None) numpy.ndarray ¶
- scranpy.lib_scranpy.fit_variance_trend(arg0: numpy.ndarray, arg1: numpy.ndarray, arg2: bool, arg3: float, arg4: bool, arg5: float, arg6: bool, arg7: float, arg8: int, arg9: int) tuple ¶
- scranpy.lib_scranpy.model_gene_variances(arg0: int, arg1: numpy.ndarray | None, arg2: int, arg3: str, arg4: tuple, arg5: bool, arg6: float, arg7: bool, arg8: float, arg9: bool, arg10: float, arg11: int, arg12: int) tuple ¶
- scranpy.lib_scranpy.normalize_counts(arg0: int, arg1: numpy.ndarray, arg2: bool, arg3: float, arg4: float, arg5: bool) int ¶
- scranpy.lib_scranpy.run_pca(arg0: int, arg1: int, arg2: numpy.ndarray | None, arg3: str, arg4: tuple, arg5: bool, arg6: bool, arg7: bool, arg8: int, arg9: int, arg10: int, arg11: int) tuple ¶
- scranpy.lib_scranpy.run_tsne(arg0: numpy.ndarray, arg1: numpy.ndarray, arg2: float, arg3: int, arg4: int, arg5: int, arg6: int, arg7: int) numpy.ndarray ¶
- scranpy.lib_scranpy.run_umap(arg0: numpy.ndarray, arg1: numpy.ndarray, arg2: int, arg3: float, arg4: int, arg5: int, arg6: int, arg7: bool) numpy.ndarray ¶
- scranpy.lib_scranpy.sanitize_size_factors(arg0: numpy.ndarray, arg1: bool, arg2: bool, arg3: bool, arg4: bool) None ¶
- scranpy.lib_scranpy.scale_by_neighbors(arg0: list) numpy.ndarray ¶
- scranpy.lib_scranpy.score_gene_set(arg0: int, arg1: int, arg2: numpy.ndarray | None, arg3: str, arg4: tuple, arg5: bool, arg6: bool, arg7: int, arg8: int, arg9: int, arg10: int) tuple ¶
- scranpy.lib_scranpy.score_markers_pairwise(arg0: int, arg1: numpy.ndarray, arg2: int, arg3: numpy.ndarray | None, arg4: str, arg5: tuple, arg6: float, arg7: int, arg8: bool, arg9: bool, arg10: bool, arg11: bool) tuple ¶
- scranpy.lib_scranpy.score_markers_summary(arg0: int, arg1: numpy.ndarray, arg2: int, arg3: numpy.ndarray | None, arg4: str, arg5: tuple, arg6: float, arg7: int, arg8: bool, arg9: bool, arg10: bool, arg11: bool) tuple ¶
- scranpy.lib_scranpy.subsample_by_neighbors(arg0: numpy.ndarray, arg1: numpy.ndarray, arg2: int) numpy.ndarray ¶
- scranpy.lib_scranpy.suggest_adt_qc_thresholds(arg0: tuple, arg1: numpy.ndarray | None, arg2: float, arg3: float) tuple ¶
- scranpy.lib_scranpy.suggest_crispr_qc_thresholds(arg0: tuple, arg1: numpy.ndarray | None, arg2: float) tuple ¶
- scranpy.lib_scranpy.suggest_rna_qc_thresholds(arg0: tuple, arg1: numpy.ndarray | None, arg2: float) tuple ¶
- scranpy.lib_scranpy.summarize_effects(arg0: numpy.ndarray, arg1: int) tuple ¶
- scranpy.lib_scranpy.test_enrichment(arg0: numpy.ndarray, arg1: int, arg2: numpy.ndarray, arg3: int, arg4: bool, arg5: int) numpy.ndarray ¶
scranpy.model_gene_variances module¶
- class scranpy.model_gene_variances.ModelGeneVariancesResults(mean, variance, fitted, residual, per_block)[source]¶
Bases:
object
Results of
model_gene_variances()
.- __annotations__ = {'fitted': <class 'numpy.ndarray'>, 'mean': <class 'numpy.ndarray'>, 'per_block': typing.Optional[biocutils.NamedList.NamedList], 'residual': <class 'numpy.ndarray'>, 'variance': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'fitted': Field(name='fitted',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'mean': Field(name='mean',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'per_block': Field(name='per_block',type=typing.Optional[biocutils.NamedList.NamedList],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'residual': Field(name='residual',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'variance': Field(name='variance',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('mean', 'variance', 'fitted', 'residual', 'per_block')¶
- __repr__()¶
Return repr(self).
-
fitted:
ndarray
¶ Floating-point array of length equal to the number of genes, containing the fitted value of the mean-variance trend for each gene.
-
mean:
ndarray
¶ Floating-point array of length equal to the number of genes, containing the mean (log-)expression for each gene.
-
per_block:
Optional
[NamedList
]¶ List of per-block results, obtained from modelling the variances separately for each block of cells. Each entry is another ModelGeneVariancesResults object, containing the statistics for the corresponding block. This is only filled if block was used in model_gene_variances(), otherwise it is set to None.
-
residual:
ndarray
¶ Floating-point array of length equal to the number of genes, containing the residual from the mean-variance trend for each gene.
- scranpy.model_gene_variances.model_gene_variances(x, block=None, block_weight_policy='variable', variable_block_weight=(0, 1000), mean_filter=True, min_mean=0.1, transform=True, span=0.3, use_min_width=False, min_width=1, min_window_count=200, num_threads=1)[source]¶
Compute the variance in (log-)expression values for each gene, and model the trend in the variances with respect to the mean.
- Parameters:
x (Any) – A matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. It is typically expected to contain log-expression values, e.g., from normalize_counts().
block (Optional[Sequence]) – Array of length equal to the number of columns of x, containing the block of origin (e.g., batch, sample) for each cell. Alternatively None, if all cells are from the same block.
block_weight_policy (Literal['variable', 'equal', 'none']) – Policy to use for weighting different blocks when computing the average for each statistic. Only used if block is provided.
variable_block_weight (Tuple) – Parameters for variable block weighting. This should be a tuple of length 2 where the first and second values are used as the lower and upper bounds, respectively, for the variable weight calculation. Only used if block is provided and block_weight_policy = "variable".
mean_filter (bool) – Whether to filter on the means before trend fitting.
min_mean (float) – The minimum mean of genes to use in trend fitting. Only used if mean_filter = True.
transform (bool) – Whether a quarter-root transformation should be applied before trend fitting.
span (float) – Span of the LOWESS smoother for trend fitting, see fit_variance_trend().
use_min_width (bool) – Whether a minimum width constraint should be applied during trend fitting, see fit_variance_trend().
min_width (float) – Minimum width of the smoothing window for trend fitting, see fit_variance_trend().
min_window_count (int) – Minimum number of observations in each smoothing window for trend fitting, see fit_variance_trend().
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
The results of the variance modelling for each gene.
References
The
model_gene_variances
function in the scran_variances C++ library, for the underlying implementation.
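The lower and upper bounds in variable_block_weight can be read as defining a ramp on the block size. The sketch below assumes a weight of 0 below the lower bound, 1 above the upper bound, and linear interpolation in between, then uses the weights to average a per-block statistic. Both function names are hypothetical and the exact rule used by the underlying library is an assumption here.

```python
from typing import Sequence, Tuple


def variable_weight(size: int, bounds: Tuple[float, float] = (0, 1000)) -> float:
    """Assumed ramp: 0 below the lower bound, 1 above the upper bound, linear between."""
    lower, upper = bounds
    if size == 0 or size < lower:
        return 0.0
    if size > upper:
        return 1.0
    return (size - lower) / (upper - lower)


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of one per-block statistic (e.g., the variance of one gene)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


block_sizes = [500, 2000]
weights = [variable_weight(n) for n in block_sizes]
average = weighted_average([2.0, 5.0], weights)
print(weights)  # [0.5, 1.0]
print(average)  # 4.0
```

Under this reading, large blocks contribute equally once they exceed the upper bound, while small blocks are down-weighted instead of being ignored outright.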
scranpy.normalize_counts module¶
- scranpy.normalize_counts.normalize_counts(x, size_factors, log=True, pseudo_count=1, log_base=2, preserve_sparsity=False)[source]¶
Create a matrix of (log-transformed) normalized expression values. The normalization removes uninteresting per-cell differences due to sequencing efficiency and library size. The log-transformation ensures that any differences represent log-fold changes in downstream analysis steps; such relative changes in expression are more relevant than absolute changes.
- Parameters:
x (Any) – Matrix-like object containing cells in columns and features in rows, typically with count data.
Alternatively, an InitializedMatrix representing a count matrix, typically created by initialize.
size_factors (Sequence) – Size factor for each cell. This should have length equal to the number of columns in x.
log (bool) – Whether log-transformation should be performed.
pseudo_count (float) – Positive pseudo-count to add before log-transformation. Ignored if log = False.
log_base (float) – Base of the log-transformation, ignored if log = False.
preserve_sparsity (bool) – Whether to preserve sparsity when pseudo_count != 1. If True, users should manually add log(pseudo_count, log_base) to the returned matrix to obtain the desired log-transformed expression values. Ignored if log = False or pseudo_count = 1.
- Return type:
- Returns:
If x is a matrix-like object, a DelayedArray is returned containing the (log-transformed) normalized expression matrix.
If x is an InitializedMatrix, a new InitializedMatrix is returned containing the normalized expression matrix.
References
The
normalize_counts
function in the scran_norm C++ library, for the rationale behind normalization and log-transformation.
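The note on preserve_sparsity follows from a simple identity, sketched below for a single count. The two helper functions are hypothetical and operate on scalars. They assume that the sparsity-preserving mode divides the size factor-normalized value by pseudo_count before adding 1, which is one way to arrive at the documented correction of log(pseudo_count, log_base).

```python
import math


def lognorm(count: float, size_factor: float, pseudo_count: float = 1.0, log_base: float = 2.0) -> float:
    """Standard form: log(count / size_factor + pseudo_count)."""
    return math.log(count / size_factor + pseudo_count, log_base)


def lognorm_sparse(count: float, size_factor: float, pseudo_count: float, log_base: float = 2.0) -> float:
    """Assumed sparsity-preserving form: zero counts stay at exactly zero."""
    return math.log(count / (size_factor * pseudo_count) + 1.0, log_base)


count, size_factor, pseudo_count = 6.0, 2.0, 4.0
dense = lognorm(count, size_factor, pseudo_count)
sparse = lognorm_sparse(count, size_factor, pseudo_count)
recovered = sparse + math.log(pseudo_count, 2.0)

print(math.isclose(dense, recovered))              # True
print(lognorm_sparse(0.0, size_factor, pseudo_count))  # 0.0
```

Because zeros remain zeros in the second form, a sparse matrix stays sparse, and the constant offset can be added later if the exact values are needed.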
scranpy.rna_quality_control module¶
- class scranpy.rna_quality_control.ComputeRnaQcMetricsResults(sum, detected, subset_proportion)[source]¶
Bases:
object
Results of
compute_rna_qc_metrics()
.- __annotations__ = {'detected': <class 'numpy.ndarray'>, 'subset_proportion': <class 'biocutils.NamedList.NamedList'>, 'sum': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'detected': Field(name='detected',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'subset_proportion': Field(name='subset_proportion',type=<class 'biocutils.NamedList.NamedList'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'sum': Field(name='sum',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('sum', 'detected', 'subset_proportion')¶
- __repr__()¶
Return repr(self).
-
detected:
ndarray
¶ Integer array of length equal to the number of cells, containing the number of detected genes in each cell.
-
subset_proportion:
NamedList
¶ Proportion of counts in each gene subset in each cell. Each list element corresponds to a gene subset and is a NumPy array of length equal to the number of cells. Each entry of the array contains the proportion of counts in that subset in each cell.
-
sum:
ndarray
¶ Floating-point array of length equal to the number of cells, containing the sum of counts across all genes for each cell.
- to_biocframe(flatten=True)[source]¶
Convert the results into a BiocFrame.
- Parameters:
flatten (bool) – Whether to flatten the subset proportions into separate columns. If True, each entry of subset_proportion is represented by a subset_proportion_<NAME> column, where <NAME> is the name of each entry (if available) or its index (otherwise). If False, subset_proportion is represented by a nested BiocFrame.
- Returns:
A BiocFrame where each row corresponds to a cell and each column is one of the metrics.
- class scranpy.rna_quality_control.SuggestRnaQcThresholdsResults(sum, detected, subset_proportion, block)[source]¶
Bases:
object
Results of
suggest_rna_qc_thresholds()
.- __annotations__ = {'block': typing.Optional[list], 'detected': typing.Union[biocutils.NamedList.NamedList, float], 'subset_proportion': <class 'biocutils.NamedList.NamedList'>, 'sum': typing.Union[biocutils.NamedList.NamedList, float]}¶
- __dataclass_fields__ = {'block': Field(name='block',type=typing.Optional[list],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'detected': Field(name='detected',type=typing.Union[biocutils.NamedList.NamedList, float],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'subset_proportion': Field(name='subset_proportion',type=<class 'biocutils.NamedList.NamedList'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'sum': Field(name='sum',type=typing.Union[biocutils.NamedList.NamedList, float],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('sum', 'detected', 'subset_proportion', 'block')¶
- __repr__()¶
Return repr(self).
-
block:
Optional
[list
]¶ Levels of the blocking factor. Each entry corresponds to an element of sum, detected, etc., if block was provided in suggest_rna_qc_thresholds(). This is set to None if no blocking was performed.
-
detected:
Union
[NamedList
,float
]¶ Threshold on the number of detected genes. Cells with lower numbers of detected genes are considered to be of low quality.
If block is provided in suggest_rna_qc_thresholds(), a list is returned containing a separate threshold for each level of the factor. Otherwise, a single float is returned containing the threshold for all cells.
-
subset_proportion:
NamedList
¶ Thresholds on the proportion of counts in each gene subset. Each element of the list corresponds to a gene subset. Cells with higher proportions than the threshold for any subset are considered to be of low quality.
If block is provided in suggest_rna_qc_thresholds(), each entry of the returned list is another NamedList containing a separate threshold for each level. Otherwise, each entry of the list is a single float containing the threshold for all cells.
-
sum:
Union
[NamedList
,float
]¶ Threshold on the sum of counts in each cell. Cells with lower totals are considered to be of low quality.
If block is provided in suggest_rna_qc_thresholds(), a list is returned containing a separate threshold for each level of the factor. Otherwise, a single float is returned containing the threshold for all cells.
- scranpy.rna_quality_control.compute_rna_qc_metrics(x, subsets, num_threads=1)[source]¶
Compute quality control metrics from RNA count data.
- Parameters:
x (Any) – A matrix-like object containing RNA counts.
subsets (Union[Mapping, Sequence]) – Subsets of genes corresponding to “control” features like mitochondrial genes. This may be either:
- A list of arrays. Each array corresponds to a gene subset and can contain either boolean or integer values. For booleans, the array should be of length equal to the number of rows, and values should be truthy for rows that belong in the subset. For integers, each element of the array is treated as the row index of a gene in the subset.
- A dictionary where keys are the names of each gene subset and the values are arrays as described above.
- A NamedList where each element is an array as described above, possibly with names.
num_threads (int) – Number of threads to use.
- Returns:
QC metrics computed from the count matrix for each cell.
References
The
compute_rna_qc_metrics
function in the scran_qc C++ library, which describes the rationale behind these QC metrics.
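The three metrics can be illustrated on a tiny gene-by-cell matrix stored as nested lists. This sketch is not the library's implementation: rna_qc_sketch is a hypothetical helper, it accepts only a dictionary of integer row indices for subsets, and returning NaN for cells with a zero total is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def rna_qc_sketch(
    counts: Sequence[Sequence[float]],
    subsets: Dict[str, Sequence[int]],
) -> Tuple[List[float], List[int], Dict[str, List[float]]]:
    """Per-cell sum, number of detected genes, and subset proportions."""
    num_cells = len(counts[0])
    # Total count per cell (column sums).
    sums = [sum(row[c] for row in counts) for c in range(num_cells)]
    # Number of genes with a non-zero count in each cell.
    detected = [sum(1 for row in counts if row[c] > 0) for c in range(num_cells)]
    # Fraction of each cell's total that falls in each gene subset.
    proportions = {}
    for name, rows in subsets.items():
        proportions[name] = [
            sum(counts[r][c] for r in rows) / sums[c] if sums[c] else float("nan")
            for c in range(num_cells)
        ]
    return sums, detected, proportions


# Three genes (rows) by two cells (columns); the third gene is "mitochondrial".
counts = [[5, 0], [3, 2], [2, 0]]
sums, detected, proportions = rna_qc_sketch(counts, {"mito": [2]})
print(sums)         # [10, 2]
print(detected)     # [3, 1]
print(proportions)  # {'mito': [0.2, 0.0]}
```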
- scranpy.rna_quality_control.filter_rna_qc_metrics(thresholds, metrics, block=None)[source]¶
Filter for high-quality cells based on RNA-derived QC metrics.
- Parameters:
thresholds (SuggestRnaQcThresholdsResults) – Filter thresholds on the QC metrics, typically computed with suggest_rna_qc_thresholds().
metrics (ComputeRnaQcMetricsResults) – RNA-derived QC metrics, typically computed with compute_rna_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. The levels should be a subset of those used in suggest_rna_qc_thresholds().
- Return type:
- Returns:
A NumPy vector of length equal to the number of cells in
metrics
, containing truthy values for putative high-quality cells.
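A cell is retained only if it passes every threshold, which the sketch below spells out for the unblocked case. The function filter_rna_sketch and its flat argument list are hypothetical, and the treatment of values exactly equal to a threshold is an assumption.

```python
from typing import Dict, List, Sequence


def filter_rna_sketch(
    sums: Sequence[float],
    detected: Sequence[int],
    subset_proportion: Dict[str, Sequence[float]],
    sum_threshold: float,
    detected_threshold: float,
    proportion_thresholds: Dict[str, float],
) -> List[bool]:
    """Combine lower bounds on sum/detected with upper bounds on subset proportions."""
    keep = []
    for c in range(len(sums)):
        # Low totals or few detected genes suggest a low-quality cell.
        ok = sums[c] >= sum_threshold and detected[c] >= detected_threshold
        # High proportions in any control subset also flag the cell.
        for name, limit in proportion_thresholds.items():
            ok = ok and subset_proportion[name][c] <= limit
        keep.append(ok)
    return keep


keep = filter_rna_sketch(
    sums=[5000.0, 300.0, 4200.0],
    detected=[1800, 150, 1500],
    subset_proportion={"mito": [0.05, 0.02, 0.40]},
    sum_threshold=1000.0,
    detected_threshold=500.0,
    proportion_thresholds={"mito": 0.2},
)
print(keep)  # [True, False, False]
```

The second cell fails on its total count and the third on its mitochondrial proportion, so only the first is kept.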
- scranpy.rna_quality_control.suggest_rna_qc_thresholds(metrics, block=None, num_mads=3.0)[source]¶
Suggest filter thresholds for the RNA-derived QC metrics, typically generated from compute_rna_qc_metrics().
- Parameters:
metrics (ComputeRnaQcMetricsResults) – RNA-derived QC metrics from compute_rna_qc_metrics().
block (Optional[Sequence]) – Blocking factor specifying the block of origin (e.g., batch, sample) for each cell in metrics. If supplied, a separate threshold is computed from the cells in each block. Alternatively None, if all cells are from the same block.
num_mads (float) – Number of MADs from the median to define the threshold for outliers in each QC metric.
- Return type:
- Returns:
Suggested filters on the relevant QC metrics.
References
The compute_rna_qc_filters and compute_rna_qc_filters_blocked functions in the scran_qc C++ library, which describe the rationale behind the suggested filters.
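The meaning of num_mads and block can be sketched with a simplified outlier rule: a lower threshold placed num_mads median absolute deviations below the median, computed separately per block when a blocking factor is given. This is a deliberately reduced illustration with hypothetical helper names. It uses the raw, unscaled MAD on untransformed values, whereas the underlying library may transform the metrics or scale the MAD.

```python
import statistics
from typing import Dict, Hashable, List, Optional, Sequence, Union


def lower_threshold(values: Sequence[float], num_mads: float = 3.0) -> float:
    """Median minus num_mads times the (unscaled) median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med - num_mads * mad


def suggest_lower_thresholds(
    values: Sequence[float],
    block: Optional[Sequence[Hashable]] = None,
    num_mads: float = 3.0,
) -> Union[float, Dict[Hashable, float]]:
    """One threshold overall, or one per block if a blocking factor is supplied."""
    if block is None:
        return lower_threshold(values, num_mads)
    groups: Dict[Hashable, List[float]] = {}
    for v, b in zip(values, block):
        groups.setdefault(b, []).append(v)
    return {b: lower_threshold(vs, num_mads) for b, vs in groups.items()}


single = suggest_lower_thresholds([100, 110, 90, 105, 95, 5])
blocked = suggest_lower_thresholds(
    [10, 12, 14, 100, 120, 140], block=["a", "a", "a", "b", "b", "b"]
)
print(single)   # 75.0
print(blocked)  # {'a': 6, 'b': 60}
```

Blocking matters when batches differ in depth: in the second call, a single threshold would be dominated by one of the two groups.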
scranpy.run_all_neighbor_steps module¶
- class scranpy.run_all_neighbor_steps.RunAllNeighborStepsResults(run_tsne, run_umap, build_snn_graph, cluster_graph)[source]¶
Bases:
object
Results of
run_all_neighbor_steps()
.- __annotations__ = {'build_snn_graph': typing.Optional[scranpy.build_snn_graph.GraphComponents], 'cluster_graph': typing.Optional[scranpy.cluster_graph.ClusterGraphResults], 'run_tsne': typing.Optional[numpy.ndarray], 'run_umap': typing.Optional[numpy.ndarray]}¶
- __dataclass_fields__ = {'build_snn_graph': Field(name='build_snn_graph',type=typing.Optional[scranpy.build_snn_graph.GraphComponents],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'cluster_graph': Field(name='cluster_graph',type=typing.Optional[scranpy.cluster_graph.ClusterGraphResults],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'run_tsne': Field(name='run_tsne',type=typing.Optional[numpy.ndarray],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'run_umap': Field(name='run_umap',type=typing.Optional[numpy.ndarray],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('run_tsne', 'run_umap', 'build_snn_graph', 'cluster_graph')¶
- __repr__()¶
Return repr(self).
- build_snn_graph: Optional[GraphComponents]¶
Results of build_snn_graph(). This is None if clustering was not performed.
- cluster_graph: Optional[ClusterGraphResults]¶
Results of cluster_graph(). This is None if clustering was not performed.
- run_tsne: Optional[ndarray]¶
Results of run_tsne(). This is None if t-SNE was not performed.
- run_umap: Optional[ndarray]¶
Results of run_umap(). This is None if UMAP was not performed.
- scranpy.run_all_neighbor_steps.run_all_neighbor_steps(x, run_umap_options={}, run_tsne_options={}, build_snn_graph_options={}, cluster_graph_options={}, nn_parameters=<knncolle.annoy.AnnoyParameters object>, collapse_search=False, num_threads=3)[source]¶
Run all steps that depend on the nearest neighbor search - namely, run_tsne(), run_umap(), build_snn_graph(), and cluster_graph(). This builds the index once and re-uses it for the neighbor search in each step; the various steps are also run in parallel to save more time.
- Parameters:
x – Matrix of principal components where rows are dimensions (i.e., PCs) and columns are cells, typically produced by run_pca(). Alternatively, an Index instance containing a prebuilt search index for the cells.
run_umap_options (Optional[dict]) – Optional arguments for run_umap(). If None, UMAP is not performed.
run_tsne_options (Optional[dict]) – Optional arguments for run_tsne(). If None, t-SNE is not performed.
build_snn_graph_options (Optional[dict]) – Optional arguments for build_snn_graph(). Ignored if cluster_graph_options = None.
cluster_graph_options (dict) – Optional arguments for cluster_graph(). If None, graph-based clustering is not performed.
nn_parameters (Parameters) – Parameters for the nearest-neighbor search.
collapse_search (bool) – Whether to collapse the nearest-neighbor search for each step into a single search. Steps that need fewer neighbors will use a subset of the neighbors from the collapsed search. This is faster but may not give the same results as separate searches for some approximate search algorithms.
num_threads (int) – Number of threads to use for the parallel execution of UMAP, t-SNE and SNN graph construction. This overrides the specified number of threads in the various *_options arguments.
- Return type:
RunAllNeighborStepsResults
- Returns:
The results of each step. These should be equivalent to the result of running each step in serial.
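The idea behind collapse_search can be sketched with a brute-force exact search: search once with the largest number of neighbors that any step needs, then truncate the results for steps that need fewer. This is only an illustration in plain Python; the knn helper and the per-step neighbor counts are hypothetical and do not reflect how scranpy or knncolle implement the search.

```python
import math


def knn(points, k):
    """Exact brute-force search; returns the k nearest neighbor indices per point."""
    out = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append([j for _, j in dists[:k]])
    return out


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (15.0, 0.0)]
needs = {"run_umap": 3, "build_snn_graph": 2}  # hypothetical neighbor counts

# Collapsed search: one search with the largest k, truncated for each step.
collapsed = knn(points, max(needs.values()))
per_step = {step: [nbrs[:k] for nbrs in collapsed] for step, k in needs.items()}
```

With an exact search, the truncated results match a separate search for each k. An approximate algorithm may explore the index differently for different k, which is why the collapsed results can differ in that case.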
scranpy.run_pca module¶
- class scranpy.run_pca.RunPcaResults(components, rotation, variance_explained, total_variance, center, scale, block)[source]¶
Bases: object
Results of run_pca().
- __annotations__ = {'block': typing.Optional[list], 'center': <class 'numpy.ndarray'>, 'components': <class 'numpy.ndarray'>, 'rotation': <class 'numpy.ndarray'>, 'scale': typing.Optional[numpy.ndarray], 'total_variance': <class 'float'>, 'variance_explained': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'block': Field(name='block',type=typing.Optional[list],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'center': Field(name='center',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'components': Field(name='components',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'rotation': Field(name='rotation',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'scale': Field(name='scale',type=typing.Optional[numpy.ndarray],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'total_variance': Field(name='total_variance',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'variance_explained': Field(name='variance_explained',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('components', 'rotation', 'variance_explained', 'total_variance', 'center', 'scale', 'block')¶
- __repr__()¶
Return repr(self).
- block: Optional[list]¶
Levels of the blocking factor, corresponding to each row of center. This is None if no blocking was performed.
- center: ndarray¶
If block was used in run_pca(), this is a floating-point matrix containing the mean for each gene (column) in each block of cells (row). Otherwise, this is a floating-point array of length equal to the number of genes, containing the mean for each gene across all cells.
- components: ndarray¶
Floating-point matrix of principal component (PC) scores. Rows are dimensions (i.e., PCs) and columns are cells.
- rotation: ndarray¶
Floating-point rotation matrix. Rows are genes and columns are dimensions (i.e., PCs).
- scranpy.run_pca.run_pca(x, number=25, scale=False, block=None, block_weight_policy='variable', variable_block_weight=(0, 1000), components_from_residuals=False, extra_work=7, iterations=1000, seed=5489, realized=True, num_threads=1)[source]¶
Run a PCA on the gene-by-cell log-expression matrix to obtain a low-dimensional representation for downstream analyses.
- Parameters:
x (Any) – A matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. Typically, the matrix is expected to contain log-expression values, and the rows should be filtered to relevant (e.g., highly variable) genes.
number (int) – Number of PCs to retain.
scale (bool) – Whether to scale all genes to have the same variance.
block (Optional[Sequence]) – Array of length equal to the number of columns of x, containing the block of origin (e.g., batch, sample) for each cell. Alternatively None, if all cells are from the same block.
block_weight_policy (Literal['variable', 'equal', 'none']) – Policy to use for weighting different blocks when computing the average for each statistic. Only used if block is provided.
variable_block_weight (Tuple) – Parameters for variable block weighting. This should be a tuple of length 2 where the first and second values are used as the lower and upper bounds, respectively, for the variable weight calculation. Only used if block is provided and block_weight_policy = "variable".
components_from_residuals (bool) – Whether to compute the PC scores from the residuals in the presence of a blocking factor. If False, the residuals are only used to compute the rotation matrix, and the original expression values of the cells are projected onto this new space. Only used if block is provided.
extra_work (int) – Number of extra dimensions for the IRLBA workspace.
iterations (int) – Maximum number of restart iterations for IRLBA.
seed (int) – Seed for the initial random vector in IRLBA.
realized (bool) – Whether to realize x into an optimal memory layout for IRLBA. This speeds up computation at the cost of increased memory usage.
num_threads (int) – Number of threads to use.
- Return type:
RunPcaResults
- Returns:
The results of the PCA.
References
https://libscran.github.io/scran_pca, which describes the approach in more detail. In particular, the documentation for the blocked_pca function explains the blocking strategy.
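To make the role of the variable_block_weight bounds more concrete, the sketch below shows one plausible reading of the "variable" policy: a block's weight rises linearly from 0 at the lower bound to 1 at the upper bound, so that very large blocks do not dominate. The function name and the exact ramp are assumptions for illustration and are not taken from the scranpy source.

```python
def variable_block_weight(size, lower=0, upper=1000):
    """Hypothetical weight for a block containing `size` cells."""
    if size == 0 or size < lower:
        return 0.0  # empty or too-small blocks contribute nothing
    if size >= upper:
        return 1.0  # large blocks are capped at the same weight
    # Linear ramp between the lower and upper bounds.
    return (size - lower) / (upper - lower)


block_sizes = [0, 250, 1000, 5000]
weights = [variable_block_weight(s) for s in block_sizes]
```

Under this reading, blocks with 1000 and 5000 cells receive the same weight, whereas "equal" would weight every non-empty block the same and "none" would weight each block by its size.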
scranpy.run_tsne module¶
- scranpy.run_tsne.run_tsne(x, perplexity=30, num_neighbors=None, max_depth=20, leaf_approximation=False, max_iterations=500, seed=42, num_threads=1, nn_parameters=<knncolle.annoy.AnnoyParameters object>)[source]¶
Compute t-SNE coordinates to visualize similarities between cells.
- Parameters:
x (Union[ndarray, FindKnnResults, Index]) – Numeric matrix where rows are dimensions and columns are cells, typically containing a low-dimensional representation from, e.g., run_pca(). Alternatively, a FindKnnResults object containing existing neighbor search results. The number of neighbors should be the same as num_neighbors, otherwise a warning is raised. Alternatively, an Index object.
perplexity (float) – Perplexity to use in the t-SNE algorithm. Larger values cause the embedding to focus on global structure.
num_neighbors (Optional[int]) – Number of neighbors in the nearest-neighbor graph. Typically derived from perplexity using tsne_perplexity_to_neighbors().
max_depth (int) – Maximum depth of the Barnes-Hut quadtree. Smaller values (7-10) improve speed at the cost of accuracy.
leaf_approximation (bool) – Whether to use the “leaf approximation” approach, which sacrifices some accuracy for greater speed. Only effective when max_depth is small enough for multiple cells to be assigned to the same leaf node of the quadtree.
max_iterations (int) – Maximum number of iterations to perform.
seed (int) – Random seed to use for generating the initial coordinates.
num_threads (int) – Number of threads to use.
nn_parameters (Parameters) – The algorithm to use for the nearest-neighbor search. Only used if x is not a pre-built nearest-neighbor search index or a list of existing nearest-neighbor search results.
- Return type:
- Returns:
Array containing the coordinates of each cell in a 2-dimensional embedding. Each row corresponds to a dimension and each column represents a cell.
References
https://libscran.github.io/qdtsne, for some more details on the approximations.
- scranpy.run_tsne.tsne_perplexity_to_neighbors(perplexity)[source]¶
Determine the number of nearest neighbors required to support a given perplexity in the t-SNE algorithm.
- Parameters:
perplexity (float) – Perplexity to use in run_tsne().
- Return type:
- Returns:
The corresponding number of nearest neighbors.
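A common convention in t-SNE implementations is to search for roughly three times the perplexity in nearest neighbors. The sketch below applies that rule; whether tsne_perplexity_to_neighbors() uses exactly this formula is an assumption, and the function name here is hypothetical.

```python
import math


def perplexity_to_neighbors(perplexity):
    """Hypothetical rule: three neighbors per unit of perplexity, rounded up."""
    return math.ceil(3 * perplexity)


default_k = perplexity_to_neighbors(30)
fractional_k = perplexity_to_neighbors(12.5)
```

The practical consequence is that pre-computed neighbor search results passed to run_tsne() should contain this many neighbors for the chosen perplexity.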
scranpy.run_umap module¶
- scranpy.run_umap.run_umap(x, num_dim=2, num_neighbors=15, num_epochs=None, min_dist=0.1, seed=1234567890, num_threads=1, parallel_optimization=False, nn_parameters=<knncolle.annoy.AnnoyParameters object>)[source]¶
Compute UMAP coordinates to visualize similarities between cells.
- Parameters:
x (Union[ndarray, FindKnnResults, Index]) – Numeric matrix where rows are dimensions and columns are cells, typically containing a low-dimensional representation from, e.g., run_pca(). Alternatively, a FindKnnResults object containing existing neighbor search results. The number of neighbors should be the same as num_neighbors, otherwise a warning is raised. Alternatively, an Index object.
num_dim (int) – Number of dimensions in the UMAP embedding.
num_neighbors (int) – Number of neighbors to use in the UMAP algorithm. Larger values cause the embedding to focus on global structure.
num_epochs (Optional[int]) – Number of epochs to perform. If set to None, an appropriate number of epochs is chosen based on the number of points in x.
min_dist (float) – Minimum distance between points in the embedding. Larger values result in visual clusters that are more dispersed.
seed (int) – Integer scalar specifying the seed to use.
num_threads (int) – Number of threads to use.
parallel_optimization (bool) – Whether to parallelize the optimization step.
nn_parameters (Parameters) – The algorithm to use for the nearest-neighbor search. Only used if x is not a pre-built nearest-neighbor search index or a list of existing nearest-neighbor search results.
- Return type:
- Returns:
Array containing the coordinates of each cell in the embedding. Each row corresponds to one of the num_dim dimensions and each column represents a cell.
References
https://libscran.github.io/umappp, for the underlying implementation.
scranpy.sanitize_size_factors module¶
- scranpy.sanitize_size_factors.sanitize_size_factors(size_factors, replace_zero=True, replace_negative=True, replace_infinite=True, replace_nan=True, in_place=False)[source]¶
Replace invalid size factors, i.e., zero, negative, infinite or NaN values.
- Parameters:
size_factors (ndarray) – Floating-point array containing size factors for all cells.
replace_zero (bool) – Whether to replace size factors of zero with the lowest positive factor. If False, zeros are retained.
replace_negative (bool) – Whether to replace negative size factors with the lowest positive factor. If False, negative values are retained.
replace_infinite (bool) – Whether to replace infinite size factors with the largest positive factor. If False, infinite values are retained.
replace_nan (bool) – Whether to replace NaN size factors with unity. If False, NaN values are retained.
in_place (bool) – Whether to modify size_factors in place. If False, a new array is returned. This argument is only used if size_factors is double-precision, otherwise a new array is always returned.
- Return type:
- Returns:
Array containing sanitized size factors. If in_place = True, this is a reference to size_factors.
References
The sanitize_size_factors function in the scran_norm C++ library, which provides the underlying implementation.
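The replacement rules described above can be sketched in plain Python as follows. This is not the scran_norm implementation: the function name is hypothetical, negative infinity is treated as a negative value, and the fallback of 1.0 when no valid factor exists is an assumption of this example.

```python
import math


def sanitize(size_factors, replace_zero=True, replace_negative=True,
             replace_infinite=True, replace_nan=True):
    """Hypothetical pure-Python version of the documented replacement rules."""
    valid = [s for s in size_factors if math.isfinite(s) and s > 0]
    smallest = min(valid) if valid else 1.0  # fallback is an assumption
    largest = max(valid) if valid else 1.0   # fallback is an assumption
    out = []
    for s in size_factors:
        if math.isnan(s):
            out.append(1.0 if replace_nan else s)
        elif s == math.inf:
            out.append(largest if replace_infinite else s)
        elif s == 0:
            out.append(smallest if replace_zero else s)
        elif s < 0:  # includes negative infinity in this sketch
            out.append(smallest if replace_negative else s)
        else:
            out.append(s)
    return out


raw = [2.0, 0.0, -1.0, math.inf, math.nan, 0.5]
clean = sanitize(raw)
```

After sanitization every factor is finite and positive, so downstream normalization does not produce divide-by-zero or undefined values.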
scranpy.scale_by_neighbors module¶
- class scranpy.scale_by_neighbors.ScaleByNeighborsResults(scaling, combined)[source]¶
Bases: object
Results of scale_by_neighbors().
- __annotations__ = {'combined': <class 'numpy.ndarray'>, 'scaling': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'combined': Field(name='combined',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'scaling': Field(name='scaling',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('scaling', 'combined')¶
- __repr__()¶
Return repr(self).
- combined: ndarray¶
Floating-point matrix of scaled embeddings. Each row corresponds to a dimension and each column corresponds to a cell. Formed by scaling each embedding in the x supplied to scale_by_neighbors() by its corresponding entry of scaling, and then concatenating them together by row.
- scranpy.scale_by_neighbors.scale_by_neighbors(x, num_neighbors=20, num_threads=1, weights=None, nn_parameters=<knncolle.annoy.AnnoyParameters object>)[source]¶
Scale multiple embeddings (usually derived from different modalities across the same set of cells) so that their within-population variances are comparable. Then, combine them into a single embedding matrix for combined downstream analysis.
- Parameters:
x (Sequence) – Sequence of numeric matrices of principal components or other embeddings, one for each modality. For each entry, rows are dimensions and columns are cells. All entries should have the same number of columns but may have different numbers of rows.
num_neighbors (int) – Number of neighbors to use to define the scaling factor.
num_threads (int) – Number of threads to use.
nn_parameters (Parameters) – Algorithm for the nearest-neighbor search.
weights (Optional[Sequence]) – Array of length equal to the length of x, specifying the weights to apply to each modality. Each value represents a multiplier of the within-population variance of its modality, i.e., larger values increase the contribution of that modality in the combined output matrix. The default of None is equivalent to an all-1 vector, i.e., all modalities are scaled to have the same within-population variance.
- Return type:
ScaleByNeighborsResults
- Returns:
Scaling factors and the combined matrix from all modalities.
References
https://libscran.github.io/mumosa, for the basis and caveats of this approach.
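A minimal sketch of the scale-then-combine idea is shown below. It assumes that the scaling factor for each embedding is derived from the median distance to the k-th nearest neighbor, with the first embedding used as the reference; these details, the helper names, and the omission of weights are simplifications for illustration rather than a description of the mumosa implementation.

```python
import math
from statistics import median


def median_kth_distance(embedding, k):
    """Median distance from each cell to its k-th nearest neighbor.

    `embedding` is a list of rows (dimensions), each a list over cells.
    """
    cells = list(zip(*embedding))
    kth = []
    for i, p in enumerate(cells):
        dists = sorted(math.dist(p, q) for j, q in enumerate(cells) if j != i)
        kth.append(dists[k - 1])
    return median(kth)


def scale_and_combine(embeddings, k):
    # Assumption: the first embedding serves as the reference scale.
    ref = median_kth_distance(embeddings[0], k)
    scaling = [ref / median_kth_distance(e, k) for e in embeddings]
    # Scale each embedding, then concatenate all of them by row.
    combined = [[v * s for v in row] for e, s in zip(embeddings, scaling) for row in e]
    return scaling, combined


rna = [[0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0]]  # 2 dimensions x 4 cells
adt = [[0.0, 4.0, 8.0, 12.0]]                       # 1 dimension x 4 cells
scaling, combined = scale_and_combine([rna, adt], k=1)
```

The second modality is shrunk so that its neighbor distances match those of the first, and the combined matrix has three rows (two from the first modality, one from the second) and four columns.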
scranpy.score_gene_set module¶
- class scranpy.score_gene_set.ScoreGeneSetResults(scores, weights)[source]¶
Bases: object
Results of score_gene_set().
- __annotations__ = {'scores': <class 'numpy.ndarray'>, 'weights': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'scores': Field(name='scores',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'weights': Field(name='weights',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('scores', 'weights')¶
- __repr__()¶
Return repr(self).
- scranpy.score_gene_set.score_gene_set(x, set, rank=1, scale=False, block=None, block_weight_policy='variable', variable_block_weight=(0, 1000), extra_work=7, iterations=1000, seed=5489, realized=True, num_threads=1)[source]¶
Compute per-cell scores for a gene set, defined as the column sums of a rank-1 approximation to the submatrix for the feature set. This uses the same approach implemented in the GSDecon package by Jason Hackney.
- Parameters:
x (Any) – A matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. The matrix is expected to contain log-expression values.
set (Sequence) – Array of integer indices specifying the rows of x belonging to the gene set. Alternatively, a sequence of boolean values of length equal to the number of rows, where truthy elements indicate that the corresponding row belongs to the gene set.
rank (int) – Rank of the approximation.
scale (bool) – Whether to scale all genes to have the same variance.
block (Optional[Sequence]) – Array of length equal to the number of columns of x, containing the block of origin (e.g., batch, sample) for each cell. Alternatively None, if all cells are from the same block.
block_weight_policy (Literal['variable', 'equal', 'none']) – Policy to use for weighting different blocks when computing the average for each statistic. Only used if block is provided.
variable_block_weight (Tuple) – Parameters for variable block weighting. This should be a tuple of length 2 where the first and second values are used as the lower and upper bounds, respectively, for the variable weight calculation. Only used if block is provided and block_weight_policy = "variable".
extra_work (int) – Number of extra dimensions for the IRLBA workspace.
iterations (int) – Maximum number of restart iterations for IRLBA.
seed (int) – Seed for the initial random vector in IRLBA.
realized (bool) – Whether to realize x into an optimal memory layout for IRLBA. This speeds up computation at the cost of increased memory usage.
num_threads (int) – Number of threads to use.
- Return type:
ScoreGeneSetResults
- Returns:
Array of per-cell scores and per-gene weights.
References
https://libscran.github.io/gsdecon, which describes the approach in more detail. In particular, the documentation for the compute_blocked function explains the blocking strategy.
scranpy.score_markers module¶
- class scranpy.score_markers.ScoreMarkersResults(groups, mean, detected, cohens_d, auc, delta_mean, delta_detected)[source]¶
Bases: object
Results of score_markers().
- __annotations__ = {'auc': typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType], 'cohens_d': typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType], 'delta_detected': typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType], 'delta_mean': typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType], 'detected': <class 'numpy.ndarray'>, 'groups': <class 'list'>, 'mean': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'auc': Field(name='auc',type=typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'cohens_d': Field(name='cohens_d',type=typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'delta_detected': Field(name='delta_detected',type=typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'delta_mean': Field(name='delta_mean',type=typing.Union[numpy.ndarray, biocutils.NamedList.NamedList, NoneType],default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'detected': Field(name='detected',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'groups': Field(name='groups',type=<class 'list'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'mean': Field(name='mean',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('groups', 'mean', 'detected', 'cohens_d', 'auc', 'delta_mean', 'delta_detected')¶
- __repr__()¶
Return repr(self).
- auc: Union[ndarray, NamedList, None]¶
Same as cohens_d but for the AUCs. If compute_auc = False, this is None.
- cohens_d: Union[ndarray, NamedList, None]¶
If all_pairwise = False, this is a named list of GroupwiseSummarizedEffects objects. Each object corresponds to a group in the same order as groups, and contains a summary of Cohen’s d from pairwise comparisons to all other groups. This includes the min, mean, median, max and min-rank.
If all_pairwise = True, this is a 3-dimensional numeric array containing the Cohen’s d from each pairwise comparison between groups. The extents of the first two dimensions are equal to the number of groups, while the extent of the final dimension is equal to the number of genes. The entry [i, j, k] represents Cohen’s d from the comparison of group j over group i for gene k.
If compute_cohens_d = False, this is None.
- delta_detected: Union[ndarray, NamedList, None]¶
Same as cohens_d but for the delta-detected. If compute_delta_detected = False, this is None.
- delta_mean: Union[ndarray, NamedList, None]¶
Same as cohens_d but for the delta-mean. If compute_delta_mean = False, this is None.
- detected: ndarray¶
Floating-point matrix containing the proportion of cells with detected expression for each gene in each group. Each row is a gene and each column is a group, ordered as in groups.
- mean: ndarray¶
Floating-point matrix containing the mean expression for each gene in each group. Each row is a gene and each column is a group, ordered as in groups.
- to_biocframes(effect_sizes=None, summaries=None, include_mean=True, include_detected=True)[source]¶
Convert the effect size summaries into a BiocFrame for each group. This should only be used if all_pairwise = False in score_markers().
- Parameters:
effect_sizes (Optional[list]) – List of effect sizes to include in each BiocFrame. This can contain any of cohens_d, auc, delta_mean, and delta_detected. If None, all non-None effect sizes are reported.
summaries (Optional[list]) – List of summary statistics to include in each BiocFrame. This can contain any of min, mean, median, max, and min_rank. If None, all summary statistics are reported.
include_mean (bool) – Whether to include the mean for each group.
include_detected (bool) – Whether to include the detected proportion for each group.
- Return type:
- Returns:
A list of length equal to the length of groups, containing a BiocFrame with the effect size summaries for each group. Each row of the BiocFrame corresponds to a gene. Each effect size summary is represented by a column named <EFFECT>_<SUMMARY>. If include_mean = True or include_detected = True, additional columns will be present with the mean and detected proportion, respectively.
The list itself is named according to groups if the elements can be converted to strings, otherwise it is unnamed.
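As a small illustration of the <EFFECT>_<SUMMARY> naming scheme, the sketch below builds the column names that would be expected for a chosen subset of effect sizes and summaries. The ordering of the columns is an assumption of this example and may differ from what to_biocframes() produces.

```python
# Hypothetical selections, using the names documented for the two arguments.
effect_sizes = ["cohens_d", "auc"]
summaries = ["min", "mean", "min_rank"]

# One column per combination of effect size and summary statistic.
columns = [f"{effect}_{summary}" for effect in effect_sizes for summary in summaries]
```

For instance, the minimum Cohen's d across comparisons would be found in a column named cohens_d_min, and the min-rank of the AUC in auc_min_rank.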
- scranpy.score_markers.score_markers(x, groups, block=None, block_weight_policy='variable', variable_block_weight=(0, 1000), compute_delta_mean=True, compute_delta_detected=True, compute_cohens_d=True, compute_auc=True, threshold=0, all_pairwise=False, num_threads=1)[source]¶
Score marker genes for each group using a variety of effect sizes from pairwise comparisons between groups. This includes Cohen’s d, the area under the curve (AUC), the difference in the means (delta-mean) and the difference in the proportion of detected cells (delta-detected).
- Parameters:
x (Any) – A matrix-like object where rows correspond to genes or genomic features and columns correspond to cells. It is typically expected to contain log-expression values, e.g., from normalize_counts().
groups (Sequence) – Group assignment for each cell in x. This should have length equal to the number of columns in x.
block (Optional[Sequence]) – Array of length equal to the number of columns of x, containing the block of origin (e.g., batch, sample) for each cell. Alternatively None, if all cells are from the same block.
block_weight_policy (Literal['variable', 'equal', 'none']) – Policy to use for weighting different blocks when computing the average for each statistic. Only used if block is provided.
variable_block_weight (Tuple) – Parameters for variable block weighting. This should be a tuple of length 2 where the first and second values are used as the lower and upper bounds, respectively, for the variable weight calculation. Only used if block is provided and block_weight_policy = "variable".
compute_delta_mean (bool) – Whether to compute the delta-means, i.e., the log-fold change when x contains log-expression values.
compute_delta_detected (bool) – Whether to compute the delta-detected, i.e., differences in the proportion of cells with detected expression.
compute_cohens_d (bool) – Whether to compute Cohen’s d.
compute_auc (bool) – Whether to compute the AUC. Setting this to False can improve speed and memory efficiency.
threshold (float) – Non-negative value specifying the minimum threshold on the differences in means (i.e., the log-fold change, if x contains log-expression values). This is incorporated into the calculation for Cohen’s d and the AUC.
all_pairwise (bool) – Whether to report the full effects for every pairwise comparison between groups. If False, only summaries are reported.
num_threads (int) – Number of threads to use.
- Return type:
ScoreMarkersResults
- Returns:
Scores for ranking marker genes in each group, based on the effect sizes for pairwise comparisons between groups.
References
The score_markers_summary and score_markers_pairwise functions in the scran_markers C++ library, which describe the rationale behind the choice of effect sizes and summary statistics. Also see their blocked equivalents score_markers_summary_blocked and score_markers_pairwise_blocked when block is provided.
scranpy.subsample_by_neighbors module¶
- scranpy.subsample_by_neighbors.subsample_by_neighbors(x, num_neighbors=20, min_remaining=10, nn_parameters=<knncolle.annoy.AnnoyParameters object>, num_threads=1)[source]¶
Subsample a dataset by selecting cells to represent all of their nearest neighbors.
- Parameters:
x (Union[ndarray, FindKnnResults, Index]) – Numeric matrix where rows are dimensions and columns are cells, typically containing a low-dimensional representation from, e.g., run_pca(). Alternatively, an Index object containing a pre-built search index for a dataset. Alternatively, a FindKnnResults object containing pre-computed search results for a dataset. The number of neighbors should be equal to num_neighbors, otherwise a warning is raised.
num_neighbors (int) – Number of neighbors to use. Larger values result in greater downsampling. Only used if x does not contain existing neighbor search results.
nn_parameters (Parameters) – Neighbor search algorithm to use. Only used if x does not contain existing neighbor search results.
min_remaining (int) – Minimum number of remaining (i.e., unselected) neighbors that a cell must have in order to be considered for selection. This should be less than or equal to num_neighbors.
num_threads (int) – Number of threads to use for the nearest-neighbor search. Only used if x does not contain existing neighbor search results.
- Return type:
- Returns:
Integer array with indices of the cells selected to be in the subsample.
References
https://libscran.github.io/nenesub, for the rationale behind this approach.
scranpy.summarize_effects module¶
- class scranpy.summarize_effects.GroupwiseSummarizedEffects(min, mean, median, max, min_rank)[source]¶
Bases: object
Summarized effect sizes for a single group, typically created by summarize_effects() or score_markers().
- __annotations__ = {'max': <class 'numpy.ndarray'>, 'mean': <class 'numpy.ndarray'>, 'median': <class 'numpy.ndarray'>, 'min': <class 'numpy.ndarray'>, 'min_rank': <class 'numpy.ndarray'>}¶
- __dataclass_fields__ = {'max': Field(name='max',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'mean': Field(name='mean',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'median': Field(name='median',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'min': Field(name='min',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'min_rank': Field(name='min_rank',type=<class 'numpy.ndarray'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)¶
Return self==value.
- __hash__ = None¶
- __match_args__ = ('min', 'mean', 'median', 'max', 'min_rank')¶
- __repr__()¶
Return repr(self).
- max: ndarray¶
Floating-point array of length equal to the number of genes. Each entry is the maximum effect size for that gene from all pairwise comparisons to other groups.
- mean: ndarray¶
Floating-point array of length equal to the number of genes. Each entry is the mean effect size for that gene from all pairwise comparisons to other groups.
- median: ndarray¶
Floating-point array of length equal to the number of genes. Each entry is the median effect size for that gene from all pairwise comparisons to other groups.
- min: ndarray¶
Floating-point array of length equal to the number of genes. Each entry is the minimum effect size for that gene from all pairwise comparisons to other groups.
- scranpy.summarize_effects.summarize_effects(effects, num_threads=1)[source]¶
For each group, summarize the effect sizes for all pairwise comparisons to other groups. This yields a set of summary statistics that can be used to rank marker genes for each group.
- Parameters:
effects (ndarray) – A 3-dimensional numeric array containing the effect sizes from each pairwise comparison between groups. The extents of the first two dimensions should be equal to the number of groups, while the extent of the final dimension should be equal to the number of genes. The entry [i, j, k] should represent the effect size from the comparison of group j against group i for gene k. See also the output of score_markers() with all_pairwise = True.
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
List of length equal to the number of groups (i.e., the extents of the first two dimensions of effects). Each entry contains the summary statistics of the effect sizes of the comparisons involving the corresponding group.
References
The summarize_effects function in the scran_markers C++ library, for more details on the statistics.
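The summary statistics described above can be illustrated with a small NumPy sketch. This is not the scranpy implementation: the helper name summarize_effects_sketch is hypothetical, the results are plain dictionaries rather than GroupwiseSummarizedEffects objects, and min_rank is deliberately left out. The sketch assumes that the comparisons for group g are the entries effects[i, g, :] for all i != g, which is one reading of the index convention stated for effects.

```python
import numpy as np


def summarize_effects_sketch(effects):
    """Illustrative only: per-group min/mean/median/max across pairwise comparisons.

    Assumes effects[i, j, k] holds the effect size for group j against group i
    for gene k, so the comparisons for group g are effects[i, g, :] with i != g.
    """
    effects = np.asarray(effects, dtype=float)
    if effects.ndim != 3 or effects.shape[0] != effects.shape[1]:
        raise ValueError("expected an array of shape (groups, groups, genes)")
    ngroups = effects.shape[0]
    if ngroups < 2:
        raise ValueError("need at least two groups to have any comparison")

    summaries = []
    for g in range(ngroups):
        # Drop the self-comparison, which carries no information.
        others = [i for i in range(ngroups) if i != g]
        sub = effects[others, g, :]  # shape: (ngroups - 1, ngenes)
        summaries.append({
            "min": sub.min(axis=0),
            "mean": sub.mean(axis=0),
            "median": np.median(sub, axis=0),
            "max": sub.max(axis=0),
        })
    return summaries


# Toy input: 3 groups, 2 genes, filled with consecutive integers.
toy_effects = np.arange(18, dtype=float).reshape(3, 3, 2)
toy_summaries = summarize_effects_sketch(toy_effects)

# Rank genes for group 0 by decreasing mean effect size.
group0_ranking = np.argsort(-toy_summaries[0]["mean"])
```

Ranking on the minimum is stricter than ranking on the mean, because a gene scores well only if it is favored in every comparison to another group.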
scranpy.test_enrichment module¶
- scranpy.test_enrichment.test_enrichment(x, sets, universe, log=False, num_threads=1)[source]¶
Perform a hypergeometric test for enrichment of interesting genes (e.g., markers) in one or more pre-defined gene sets.
- Parameters:
x (Sequence) – Sequence of identifiers for the interesting genes.
sets (Sequence) – Sequence of gene sets, where each entry corresponds to a gene set and contains a sequence of identifiers for genes in that set.
universe (Union[int, Sequence]) – Sequence of identifiers for the universe of genes in the dataset. It is expected that x is a subset of universe. Alternatively, an integer specifying the number of genes in the universe.
log (bool) – Whether to report the log-transformed p-values.
num_threads (int) – Number of threads to use.
- Return type:
- Returns:
Array of (log-transformed) p-values to test for significant enrichment of x in each entry of sets.
References
https://libscran.github.io/phyper, for the underlying implementation.
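For readers unfamiliar with the hypergeometric test, the following is a minimal sketch of the calculation for a single gene set, using only the Python standard library. It is not the scranpy or phyper implementation: the helper name enrichment_pvalue_sketch is hypothetical, and the sketch assumes the usual upper-tail convention, i.e., the probability of seeing at least the observed overlap when drawing len(x) genes at random from the universe. It also assumes that identifiers outside the universe are ignored.

```python
from math import comb


def enrichment_pvalue_sketch(x, gene_set, universe):
    """Illustrative upper-tail hypergeometric p-value for one gene set.

    Assumes genes not present in the universe are ignored.
    """
    universe = set(universe)
    interesting = set(x) & universe
    members = set(gene_set) & universe

    total_genes = len(universe)      # population size
    set_size = len(members)          # number of "successes" in the population
    num_drawn = len(interesting)     # number of draws
    overlap = len(interesting & members)

    # Sum the probabilities of every overlap at least as large as the observed one.
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing.
    upper = min(set_size, num_drawn)
    tail = sum(
        comb(set_size, k) * comb(total_genes - set_size, num_drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, num_drawn)


# Toy example: 10 genes in the universe, a set of 4, and 3 interesting genes
# of which 2 fall in the set.
toy_universe = [f"g{i}" for i in range(10)]
toy_set = ["g0", "g1", "g2", "g3"]
toy_interesting = ["g0", "g1", "g5"]
toy_pvalue = enrichment_pvalue_sketch(toy_interesting, toy_set, toy_universe)
```

Exact integer binomial coefficients keep the toy example simple. For large universes, working with log-probabilities (as the log option of test_enrichment() suggests) avoids underflow when p-values are very small.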