Single-cell RNA sequencing (scRNA-seq) is one of the fastest-growing technologies in the life sciences, and Seurat is among the most widely used R packages for analyzing it. This guide walks through the complete workflow, from raw count matrix to UMAP clustering, with annotations on each step aimed at beginners.
Installation and Loading
# Requires R 4.1 or higher, Seurat v5
install.packages("Seurat")
library(Seurat)
library(ggplot2)
Reading Data
# 10x Genomics output directory should contain:
# barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz
counts <- Read10X(data.dir = "data/sample1/")
seurat <- CreateSeuratObject(
counts = counts,
project = "my_project",
min.cells = 3, # gene expressed in at least 3 cells (filter noise)
min.features = 200 # cell expresses at least 200 genes (filter empty droplets)
)
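To make the two filters concrete, here is a minimal base-R sketch on a made-up 5 x 4 count matrix. The helper `filter_counts` and the toy values are hypothetical and only mimic what `min.cells` and `min.features` do. It assumes cells are filtered before genes, which is how I understand Seurat applies them, so treat the order as an assumption.

```r
# Toy count matrix: 5 genes x 4 cells (values are made up for illustration).
counts_toy <- matrix(
  c(1, 0, 2, 3,
    0, 0, 0, 1,
    4, 1, 1, 0,
    0, 0, 0, 0,
    2, 2, 0, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Hypothetical helper, not part of Seurat.
filter_counts <- function(counts, min_cells, min_features) {
  # Cells first: keep cells with at least `min_features` detected genes.
  genes_per_cell <- colSums(counts > 0)
  counts <- counts[, genes_per_cell >= min_features, drop = FALSE]
  # Then genes: keep genes detected in at least `min_cells` remaining cells.
  cells_per_gene <- rowSums(counts > 0)
  counts[cells_per_gene >= min_cells, , drop = FALSE]
}

# A tiny min_features is used here only because the toy matrix has 5 genes.
filtered <- filter_counts(counts_toy, min_cells = 3, min_features = 2)
rownames(filtered)  # gene2 and gene4 are dropped (detected in fewer than 3 cells)
```

On real data the thresholds of 3 and 200 play the same roles, just at a larger scale.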
Quality Control (QC)
# Calculate mitochondrial gene percentage (high % often indicates dying/lysed cells)
seurat[["percent.mt"]] <- PercentageFeatureSet(seurat, pattern = "^MT-")
# For mouse data use "^mt-"
# Visualize QC metrics
VlnPlot(seurat, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
# Filter (thresholds depend on your data; these are examples)
seurat <- subset(seurat,
nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 20)
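The mitochondrial percentage is simply the share of a cell's counts that come from genes matching the pattern. The base-R sketch below reproduces that arithmetic on a made-up matrix. The gene and cell names and the counts are illustrative assumptions, not output from a real sample.

```r
# Toy counts: 4 genes x 3 cells, each cell sums to 100 for easy reading.
counts_toy <- matrix(
  c(10,  0, 25,
     5,  5,  5,
    85, 15, 30,
     0, 80, 40),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "CD3D"),
                  c("cellA", "cellB", "cellC"))
)

# Select mitochondrial genes with the same pattern used above for human data.
is_mt <- grepl("^MT-", rownames(counts_toy))

# Percentage of each cell's total counts that are mitochondrial.
percent_mt <- 100 * colSums(counts_toy[is_mt, , drop = FALSE]) / colSums(counts_toy)
percent_mt

# Apply the example cutoff from the filter above.
keep <- percent_mt < 20
names(keep)[keep]
```

Here cellC exceeds the 20% cutoff and would be removed, while cellA and cellB are kept.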
Normalization, Variable Features, and Scaling
seurat <- NormalizeData(seurat) # Normalize total UMI per cell to 10,000, then log-transform
seurat <- FindVariableFeatures(seurat, nfeatures = 2000) # Identify highly variable genes
seurat <- ScaleData(seurat) # Center and scale each gene (z-score) so highly expressed genes do not dominate PCA
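The two transformations do different jobs, which a small base-R sketch can show. It assumes the default log-normalization (scale each cell to 10,000 counts, then `log1p`) and a plain per-gene z-score. The toy matrix is made up, and the sketch leaves out details of the real functions such as `ScaleData` clipping extreme values.

```r
# Toy counts: "deep" is "shallow" sequenced 10x deeper; "other" differs in composition.
counts_toy <- matrix(
  c(1, 10, 5,
    3, 30, 4,
    6, 60, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("shallow", "deep", "other"))
)

# Step 1: library-size normalization, then log1p (removes sequencing-depth differences).
scale_factor <- 10000
norm <- log1p(sweep(counts_toy, 2, colSums(counts_toy), "/") * scale_factor)

# "shallow" and "deep" now have identical profiles.
all.equal(norm[, "shallow"], norm[, "deep"])

# Step 2: z-score each gene across cells (mean 0, sd 1 per gene).
scaled <- t(scale(t(norm)))
round(rowMeans(scaled), 10)
```

Depth differences are handled in step 1. Step 2 only puts genes on a comparable scale before PCA.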
Dimensionality Reduction and Clustering
seurat <- RunPCA(seurat) # PCA dimensionality reduction
ElbowPlot(seurat) # Check elbow to decide number of PCs (usually 10-20)
seurat <- FindNeighbors(seurat, dims = 1:15) # Build KNN and shared-nearest-neighbor (SNN) graphs on the first 15 PCs
seurat <- FindClusters(seurat, resolution = 0.5) # Higher resolution = more clusters
seurat <- RunUMAP(seurat, dims = 1:15) # UMAP visualization
DimPlot(seurat, label = TRUE) # Plot UMAP, each point is a cell
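For intuition about what the elbow plot shows, the sketch below runs base R's `prcomp` on simulated data with two strong underlying signals plus noise, then lists the standard deviation and variance share of each PC. This is an analogy under stated assumptions: the data are simulated, and `prcomp` is not the routine Seurat uses internally.

```r
set.seed(42)

# Simulated data: 100 "cells" x 20 "genes", driven by 2 latent factors plus noise.
n_cells <- 100
n_genes <- 20
factors  <- matrix(rnorm(n_cells * 2), ncol = 2)
loadings <- matrix(rnorm(2 * n_genes, sd = 3), nrow = 2)
noise    <- matrix(rnorm(n_cells * n_genes), ncol = n_genes)
expr     <- factors %*% loadings + noise

# PCA on centered, scaled data.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Standard deviation per PC is the quantity an elbow plot displays.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(pct_var)

# Inspect the first few PCs; the drop after the informative PCs is the "elbow".
data.frame(PC = 1:5, sdev = pca$sdev[1:5], pct_var = pct_var[1:5], cum_var = cum_var[1:5])
```

In real data the elbow is rarely this sharp, so it is worth rerunning clustering with a few nearby values of `dims` to check that the results are stable.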
Finding Marker Genes and Annotating Cell Types
# Find specifically upregulated genes for each cluster
markers <- FindAllMarkers(
seurat,
only.pos = TRUE, # only upregulated genes
min.pct = 0.25, # gene detected in at least 25% of cells in either group being compared
logfc.threshold = 0.25 # minimum average log2 fold change
)
# View the first 10 markers for cluster 0 (rows are ordered by p-value)
head(subset(markers, cluster == 0), 10)
# Visualize known markers (T cells, monocytes, etc.)
FeaturePlot(seurat, features = c("CD3D", "CD14", "MS4A1"))
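To look at every cluster at once, you can pull the top few markers per cluster from the results table. The sketch below does this in base R on a made-up data frame that borrows the `cluster`, `gene`, `avg_log2FC`, and `p_val_adj` column names from `FindAllMarkers` output. The values and the helper `top_n_per_cluster` are hypothetical.

```r
# Made-up marker table with the column layout of FindAllMarkers output.
markers_toy <- data.frame(
  cluster    = factor(c(0, 0, 0, 1, 1, 1)),
  gene       = c("CD3D", "IL7R", "CCR7", "CD14", "LYZ", "S100A9"),
  avg_log2FC = c(2.1, 1.4, 1.8, 3.0, 2.5, 2.8),
  p_val_adj  = c(1e-50, 1e-20, 1e-30, 1e-80, 1e-60, 1e-70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n genes per cluster, ranked by average log2 fold change.
top_n_per_cluster <- function(markers, n) {
  ordered <- markers[order(markers$cluster, -markers$avg_log2FC), ]
  do.call(rbind, lapply(split(ordered, ordered$cluster), head, n = n))
}

top2 <- top_n_per_cluster(markers_toy, n = 2)
top2[, c("cluster", "gene", "avg_log2FC")]
```

Ranking by fold change rather than p-value is a design choice. With thousands of cells many p-values are vanishingly small, so fold change is often the more informative sort key.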
Runtime and Memory Reference
- 10,000 cells: ~10-15 minutes on a standard laptop; 8 GB RAM is sufficient
- 50,000 cells: 32 GB+ RAM recommended; ScaleData and FindClusters are the bottlenecks, so run on a server
- 100,000+ cells: Consider the Seurat v5 sketch workflow or switch to AnnData/Scanpy (Python)
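A back-of-the-envelope calculation helps explain why scaling becomes a bottleneck: scaled data is stored as a dense matrix of doubles, so its size grows with genes times cells. The helper `dense_gib` below is hypothetical and assumes 8 bytes per value. It ignores object overhead and temporary copies, so real usage will be higher.

```r
# Hypothetical helper: approximate size in GiB of a dense numeric matrix.
dense_gib <- function(n_genes, n_cells, bytes_per_value = 8) {
  n_genes * n_cells * bytes_per_value / 1024^3
}

# Scaled matrix for 2,000 variable genes and 50,000 cells.
dense_gib(2000, 50000)

# Same cells if all ~20,000 genes were scaled (assumed gene count for illustration).
dense_gib(20000, 50000)
```

The first case comes to roughly 0.75 GiB and the second to roughly 7.5 GiB, which is one reason to restrict scaling to the variable features on larger datasets.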