Single-cell RNA-seq analysis with scrapper
Version: 0.99.0
Last updated: 2026-01-18
Last built: 2026-01-18
Introduction
Single-cell RNA-sequencing (scRNA-seq) - the name says it all, really. Long story short, we isolate single cells and we sequence their transcriptomes to quantify the expression of each gene in each cell (Kołodziejczyk et al. 2015). Our aim is to explore heterogeneity in a cell population at the resolution of individual cells, typically to identify subpopulations or states that would not be apparent from population-level (i.e., “bulk”) assays. Since its inception, scRNA-seq has emerged as one of the premier techniques for publishing genomics papers. Occasionally, it is even used to do some actual science.
This book describes a computational workflow for analyzing scRNA-seq data using the R/Bioconductor ecosystem (Huber et al. 2015). Most of the heavy lifting is performed using the scrapper package, while scater handles the plotting (McCarthy et al. 2017). We rely heavily on Bioconductor data structures like the SingleCellExperiment class, so readers should check out the associated documentation if they haven’t already.
Each chapter is devoted to a particular step in the analysis where we provide its theoretical rationale, the associated code, and some typical results from real public datasets.
This includes:
- Quality control, to filter out cells that were damaged or not properly sequenced.
- Normalization, to remove cell-specific biases.
- Feature selection, to identify genes with interesting biological variation.
- Principal components analysis, to compact and denoise the data.
- Visualization, to generate the all-important Figure 1 of our manuscript.
- Clustering, to summarize the data into groups of similar cells.
- Marker detection, to assign biological meaning to each cluster based on its upregulated genes.
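To make these steps a little more concrete, here is a minimal sketch of the same sequence on simulated data, using only base R. None of this reflects how scrapper actually does things: the simulation, the outlier threshold, the number of genes and components, and the use of k-means are all assumptions made purely for illustration. The real functions are covered in their respective chapters.

```r
# Toy walk-through of the steps above in base R. This only conveys the logic
# of each step; it is not how scrapper implements any of them.
set.seed(42)

# Simulate counts for 200 genes in 60 cells. Cells come from two hypothetical
# groups that differ in the first 20 genes, and each cell has its own depth.
n.genes <- 200
n.cells <- 60
group <- rep(c("A", "B"), each = n.cells / 2)
mu <- matrix(10, n.genes, n.cells)
mu[1:20, group == "B"] <- 100
depth <- runif(n.cells, 0.5, 2)
mu <- sweep(mu, 2, depth, "*")
counts <- matrix(rpois(length(mu), lambda = mu), n.genes, n.cells,
    dimnames = list(paste0("gene", seq_len(n.genes)),
                    paste0("cell", seq_len(n.cells))))

# Quality control: drop cells with unusually low total counts
# (assumed rule: more than 3 MADs below the median on the log scale).
log.totals <- log10(colSums(counts))
keep <- log.totals > median(log.totals) - 3 * mad(log.totals)
counts <- counts[, keep, drop = FALSE]
group <- group[keep]

# Normalization: divide by library size factors, then log-transform.
size.factors <- colSums(counts) / mean(colSums(counts))
logcounts <- log2(sweep(counts, 2, size.factors, "/") + 1)

# Feature selection: keep the 20 genes with the largest variance.
gene.var <- apply(logcounts, 1, var)
hvgs <- names(sort(gene.var, decreasing = TRUE))[1:20]

# Principal components analysis on the selected genes (cells in rows).
pca <- prcomp(t(logcounts[hvgs, ]), rank. = 5)

# Clustering: k-means on the PCs, as a simple stand-in for graph-based methods.
clusters <- kmeans(pca$x, centers = 2, nstart = 10)$cluster

# Visualization: first two PCs, coloured by cluster.
plot(pca$x[, 1], pca$x[, 2], col = clusters, xlab = "PC1", ylab = "PC2")

# Marker detection: difference in mean log-expression between the clusters.
lfc <- rowMeans(logcounts[, clusters == 1, drop = FALSE]) -
    rowMeans(logcounts[, clusters == 2, drop = FALSE])
head(sort(abs(lfc), decreasing = TRUE))

# Compare the clusters to the simulated groups.
table(group, clusters)
```

On real data, each of these one-liners hides a fair amount of nuance (which is, after all, why the rest of the book exists).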
Much of this content was scraped together from the “Orchestrating Single-Cell Analysis with Bioconductor” (OSCA) series of books (Amezquita et al. 2020), which were primarily based on the older scran package. scrapper is just a rewrite of the most important parts of scran with improved efficiency and less historical baggage. Similarly, this book is a more streamlined rewrite of the OSCA books that (hopefully) will be easier to read and run.
Truth be told, you don’t actually need to read this book if you don’t care about how/why things are done. Just copy and paste the following into your R session:
# Pulling out an example dataset.
library(scRNAseq)
sce.zeisel <- ZeiselBrainData()
# Running the full analysis pipeline.
library(scrapper)
is.mito.zeisel <- grep("^mt-", rownames(sce.zeisel))
res.zeisel <- analyze.se(sce.zeisel, rna.qc.subsets=list(MT=is.mito.zeisel))
# Visualizing the cluster assignments for each cell:
library(scater)
plotReducedDim(res.zeisel$x, "TSNE", colour_by="graph.cluster")
# Looking at the top markers for cluster 1:
previewMarkers(res.zeisel$markers$rna[["1"]])
## DataFrame with 10 rows and 3 columns
##              mean  detected       lfc
##         <numeric> <numeric> <numeric>
## Gad1      4.79503  1.000000   4.56949
## Gad2      4.44192  0.996503   4.25766
## Ndrg4     4.40310  0.996503   2.59179
## Vstm2a    2.94119  0.965035   2.67985
## Stmn3     4.71546  0.993007   2.64538
## Slc6a1    3.75820  0.993007   3.08908
## Tspyl4    3.36568  1.000000   2.15128
## Nap1l5    4.32495  1.000000   3.09812
## Rab3c     3.91161  0.982517   2.98746
## Slc32a1   2.04411  0.909091   2.01340
And that’s it. Sometimes, ignorance is bliss, and it’s better to not know how the sausage is made. But hey - you’re already here, so why not keep reading?
Any questions can be posted on the Bioconductor support site or the GitHub page for this book.
References
Amezquita, R. A., A. T. L. Lun, E. Becht, V. J. Carey, L. N. Carpp, L. Geistlinger, F. Marini, et al. 2020. “Orchestrating single-cell analysis with Bioconductor.” Nat. Methods 17 (2): 137–45.
Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating high-throughput genomic analysis with Bioconductor.” Nat. Methods 12 (2): 115–21.
Kołodziejczyk, A. A., J. K. Kim, V. Svensson, J. C. Marioni, and S. A. Teichmann. 2015. “The technology and biology of single-cell RNA sequencing.” Mol. Cell 58 (4): 610–20.
McCarthy, D. J., K. R. Campbell, A. T. Lun, and Q. F. Wills. 2017. “Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.” Bioinformatics 33 (8): 1179–86.
