A robust R pipeline designed for the identification and characterization of rare cell populations in single-cell RNA sequencing data, utilizing Seurat and BigSur.
UMAP projection highlighting rare cell populations identified by the BigSur algorithm
immune-chord offers a standardized and reproducible workflow for identifying rare cell populations, such as neural crest stem cells and uncommon immune subsets, in single-cell RNA sequencing data. This pipeline encompasses the entire analysis lifecycle, from raw data processing to advanced statistical analysis and visualization.
- Rare Cell Detection: Utilizes the BigSur algorithm for precise identification of rare populations
- Complete Workflow: Provides end-to-end processing, from quality control to biological interpretation
- Reproducible: Features Conda environment management and thorough documentation for consistency
- Adaptive Analysis: Capable of addressing scenarios where strict criteria for rare cells are not met
- Publication-Ready: Produces high-quality visualizations and comprehensive reports suitable for publication
- Input: A Seurat object or raw count matrix from scRNA-seq data
- Output: Identified rare cell populations, marker genes, differential expression results, and publication-quality visualizations
- Create and activate Conda Environment:

```bash
conda create -n immune-chord -c conda-forge r-base=4.3.2 r-essentials
conda activate immune-chord
```

- Install R Dependencies:

```r
# Install CRAN packages
install.packages(c("Seurat", "tidyverse", "devtools", "remotes", "BiocManager"))

# Install Bioconductor packages (scRNAseq provides the example dataset used below)
BiocManager::install(c("SingleCellExperiment", "scran", "scRNAseq"))

# Install BigSur
remotes::install_github("landerlabcode/BigSurR")
```

- Run the pipeline:

```r
# Load your data (example with test data)
library(Seurat)    # provides CreateSeuratObject()
library(scRNAseq)  # provides BaronPancreasData()
pancreas_data <- BaronPancreasData(which = "human")
seu_obj <- CreateSeuratObject(counts = counts(pancreas_data))

# Execute the full pipeline
source("R/01_chord_quality_control_normalization.R")
source("R/02_chord_clustering_celltype_id.R")
source("R/03_chord_rare_population_analysis.R")  # Uses BigSur
source("R/04_chord_differential_expression_visualization.R")
```

```
immune-chord/
├── data/
│   ├── raw_data/          # Raw data (with README for download instructions)
│   └── processed_data/    # Processed datasets (.rds files)
├── R/                     # Pipeline scripts
│   ├── 01_chord_quality_control_normalization.R
│   ├── 02_chord_clustering_celltype_id.R
│   ├── 03_chord_rare_population_analysis.R    # Uses BigSur
│   ├── 04_chord_differential_expression_visualization.R
│   └── functions.R        # Helper functions
├── analysis/
│   └── vignette.Rmd       # Complete tutorial
├── docs/
│   └── tutorial.md        # Rendered tutorial
├── figures/               # Output plots
├── session_info.txt       # Session information for reproducibility
└── README.md
```
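
The four numbered scripts under `R/` are meant to run in order, so it can help to confirm that all of them are present before sourcing any of them. The following is a minimal sketch of such a check. `run_pipeline()`, its `dry_run` argument, and its `envir` argument are hypothetical and not part of the repository, and the sketch assumes the scripts pass objects to each other through a shared environment.

```r
# Hypothetical wrapper; not part of immune-chord itself.
pipeline_scripts <- c(
  "01_chord_quality_control_normalization.R",
  "02_chord_clustering_celltype_id.R",
  "03_chord_rare_population_analysis.R",
  "04_chord_differential_expression_visualization.R"
)

run_pipeline <- function(script_dir = "R", dry_run = FALSE, envir = globalenv()) {
  paths <- file.path(script_dir, pipeline_scripts)

  # Fail early if any step is missing, before running anything.
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing pipeline scripts: ", paste(missing, collapse = ", "))
  }

  # Source each script in order, evaluating it in the shared environment.
  if (!dry_run) {
    for (p in paths) {
      message("Running ", p)
      source(p, local = envir)
    }
  }
  invisible(paths)
}
```

Calling `run_pipeline(dry_run = TRUE)` from the repository root only performs the existence check.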
- BaronPancreasData (easiest for testing):

```r
library(scRNAseq)
data <- BaronPancreasData(which = "human")
```

- 10x Genomics PBMC (standard benchmark):
  - Download: 10x Genomics Datasets
  - Contains rare dendritic cells and progenitors
- Tabula Sapiens (comprehensive atlas):
  - Download: Tabula Sapiens Portal
  - Includes rare cell types across multiple tissues
| Parameter | Default | Description |
|---|---|---|
| min_features | 200 | Minimum features per cell |
| max_mito | 10 | Maximum mitochondrial percentage |
| fano.alpha | 0.05 | FDR cutoff for variable features |
| min.fano | 1.5 | Minimum Fano factor threshold |
| resolution | 1.2 | Clustering resolution |
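
To make the thresholds in the table concrete, here is a minimal base-R sketch of how `min_features`, `max_mito`, and `min.fano` could act on a count matrix. It is an illustration under stated assumptions, not the pipeline's code: `qc_keep()` and `fano_factor()` are hypothetical helpers, the toy counts are made up, mitochondrial genes are assumed to carry an `MT-` prefix, and the plain variance-to-mean ratio stands in for BigSur's own statistic and its FDR test (`fano.alpha`), neither of which is reproduced here.

```r
# Toy counts: genes in rows, cells in columns (values are made up).
counts <- rbind(
  "MT-CO1" = c(1, 8, 0, 1),
  GENE1    = c(5, 1, 4, 6),
  GENE2    = c(3, 1, 0, 2),
  GENE3    = c(1, 0, 0, 11)
)
colnames(counts) <- paste0("cell", 1:4)

# Hypothetical helper: TRUE for cells that pass both QC thresholds.
qc_keep <- function(counts, min_features = 200, max_mito = 10) {
  n_features <- colSums(counts > 0)  # genes detected per cell
  total <- colSums(counts)
  mito <- colSums(counts[grepl("^MT-", rownames(counts)), , drop = FALSE])
  pct_mito <- ifelse(total > 0, 100 * mito / total, 0)
  n_features >= min_features & pct_mito <= max_mito
}

# Hypothetical helper: uncorrected Fano factor (variance / mean) per gene.
fano_factor <- function(counts) {
  mu <- rowMeans(counts)
  v <- apply(counts, 1, var)
  ifelse(mu > 0, v / mu, NA_real_)
}

# The toy matrix has only four genes, so min_features is lowered for the demo.
keep <- qc_keep(counts, min_features = 3, max_mito = 10)
fano <- fano_factor(counts[, keep, drop = FALSE])
variable_genes <- names(fano)[!is.na(fano) & fano >= 1.5]  # min.fano default
```

Here `cell2` fails on mitochondrial percentage and `cell3` on detected features. Among the remaining cells only `GENE3` exceeds the `min.fano` threshold of 1.5.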

```r
# Custom parameter analysis
results <- BigSur(
  seurat.obj = your_data,
  assay = "RNA",
  counts.slot = "counts",
  variable.features = TRUE,
  correlations = FALSE,
  fano.alpha = 0.05,
  min.fano = 1.5
)
```

The pipeline generates various visualizations and results:
- Violin plots showing quality metrics (nFeature_RNA, nCount_RNA, percent.mito)
- UMAP visualization showing cell clustering
- UMAP highlighting rare cell populations identified by BigSur
- Volcano plot of differentially expressed genes in rare populations
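
A volcano plot places the log2 fold change on the x-axis and the negative log10 of the adjusted p-value on the y-axis. The sketch below shows one way to label genes before plotting. It is an illustration only: the data frame is made up, `classify_de()` and its cutoffs (|log2FC| of at least 1, adjusted p below 0.05) are assumptions rather than the pipeline's settings, and the column names follow the output of Seurat's `FindMarkers()`.

```r
# Made-up differential expression results with FindMarkers-style column names.
de <- data.frame(
  gene       = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  avg_log2FC = c(2.1, -1.6, 0.3, 1.4),
  p_val_adj  = c(1e-8, 1e-4, 0.2, 0.3)
)

# Hypothetical helper: label each gene as up, down, or not significant.
classify_de <- function(de, lfc_cutoff = 1, padj_cutoff = 0.05) {
  sig <- de$p_val_adj < padj_cutoff
  status <- rep("not significant", nrow(de))
  status[sig & de$avg_log2FC >= lfc_cutoff] <- "up"
  status[sig & de$avg_log2FC <= -lfc_cutoff] <- "down"
  factor(status, levels = c("down", "not significant", "up"))
}

de$status <- classify_de(de)

# Base-graphics volcano plot, coloured by status.
status_colours <- c("blue", "grey60", "red")
plot(
  de$avg_log2FC, -log10(de$p_val_adj),
  col = status_colours[as.integer(de$status)], pch = 19,
  xlab = "log2 fold change", ylab = "-log10 adjusted p-value"
)
```

`GENE_D` has a large fold change but stays in the "not significant" class because its adjusted p-value is above the cutoff.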
- No rare cells detected:

```r
# Try adjusting parameters
results <- BigSur(
  seurat.obj = your_data,
  fano.alpha = 0.1,  # Less strict FDR
  min.fano = 1.2     # Lower Fano threshold
)
```

- Memory issues:

```r
# Raise the size limit on globals exported to future workers (8000 MiB, roughly 8 GB)
options(future.globals.maxSize = 8000 * 1024^2)
```

- Installation problems:
  - Check `session_info.txt` for package versions
  - Ensure all system dependencies are installed
- Check
To replicate the environment exactly, refer to `session_info.txt`, which contains:
- R version and platform information
- Loaded package versions
- System dependencies
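
A file like this can be regenerated after a run using base R alone. The following is a minimal sketch; `write_session_info()` is a hypothetical helper, not a function the repository is known to provide. Note that `sessionInfo()` reports the R version, platform, and package versions, so system libraries installed outside R would have to be recorded separately.

```r
# Hypothetical helper: write the current session details to a text file.
write_session_info <- function(path = "session_info.txt") {
  info <- utils::capture.output(utils::sessionInfo())
  writeLines(info, path)
  invisible(path)
}

# Write to a temporary file here; call write_session_info() with no
# arguments to write session_info.txt in the working directory instead.
info_path <- write_session_info(tempfile(fileext = ".txt"))
```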
We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- BigSur developers: landerlabcode/BigSur
- Seurat team: For the comprehensive single-cell analysis framework
- 10x Genomics: For providing benchmark datasets
- Bioconductor: For maintaining essential bioinformatics packages
If you use immune-chord in your research, please cite:

```bibtex
@software{immune_chord,
  title = {immune-chord: An R Pipeline for Rare Cell Population Identification},
  author = {Perez, Constanza},
  year = {2024},
  url = {https://github.com/ceugenia/immune-chord},
  note = {Version 1.0}
}
```

Note: This pipeline is under active development. Please report any issues or suggestions for improvement through the GitHub issues page.