immune-chord: R Pipeline for Identifying Rare Cell Populations

A robust R pipeline designed for the identification and characterization of rare cell populations in single-cell RNA sequencing data, utilizing Seurat and BigSur.

UMAP projection highlighting rare cell populations identified by the BigSur algorithm

📋 Overview

immune-chord offers a standardized, reproducible workflow for identifying rare cell populations, such as neural crest stem cells and uncommon immune subsets, in single-cell RNA sequencing (scRNA-seq) data. The pipeline runs in four stages: quality control and normalization, clustering and cell type identification, rare population analysis with BigSur, and differential expression with visualization.

🎯 Key Features

- Rare Cell Detection: Utilizes the BigSur algorithm for precise identification of rare populations
- Complete Workflow: Provides end-to-end processing, from quality control to biological interpretation
- Reproducible: Features Conda environment management and thorough documentation for consistency
- Adaptive Analysis: Capable of addressing scenarios where strict criteria for rare cells are not met
- Publication-Ready: Produces high-quality visualizations and comprehensive reports suitable for publication

🚀 Quick Start

Input & Output

Input: A Seurat object or raw count matrix from scRNA-seq data
Output: Identified rare cell populations, marker genes, differential expression results, and publication-quality visualizations

Installation

Create and activate Conda Environment:

conda create -n immune-chord -c conda-forge r-base=4.3.2 r-essentials
conda activate immune-chord

Install R Dependencies:

# Install CRAN packages
install.packages(c("Seurat", "tidyverse", "devtools", "remotes", "BiocManager"))

# Install Bioconductor packages (scRNAseq provides the test dataset used in the Quick Start example)
BiocManager::install(c("SingleCellExperiment", "scran", "scRNAseq"))

# Install BigSur
remotes::install_github("landerlabcode/BigSurR")

Run the pipeline:

# Load your data (example with test data)
library(Seurat)    # provides CreateSeuratObject()
library(scRNAseq)  # provides BaronPancreasData()
pancreas_data <- BaronPancreasData(which = "human")
seu_obj <- CreateSeuratObject(counts = counts(pancreas_data))

# Execute the full pipeline
source("R/01_chord_quality_control_normalization.R")
source("R/02_chord_clustering_celltype_id.R")
source("R/03_chord_rare_population_analysis.R")  # Uses BigSur
source("R/04_chord_differential_expression_visualization.R")

📁 Project Structure

immune-chord/
├── data/
│   ├── raw_data/                 # Raw data (with README for download instructions)
│   └── processed_data/           # Processed datasets (.rds files)
├── R/                            # Pipeline scripts
│   ├── 01_chord_quality_control_normalization.R
│   ├── 02_chord_clustering_celltype_id.R
│   ├── 03_chord_rare_population_analysis.R      # Uses BigSur
│   ├── 04_chord_differential_expression_visualization.R
│   └── functions.R               # Helper functions
├── analysis/
│   └── vignette.Rmd              # Complete tutorial
├── docs/
│   └── tutorial.md               # Rendered tutorial
├── figures/                      # Output plots
├── session_info.txt              # Session information for reproducibility
└── README.md

📊 Recommended Datasets

BaronPancreasData (easiest for testing):

library(scRNAseq)
data <- BaronPancreasData(which = "human")

10x Genomics PBMC (standard benchmark):
- Download: 10x Genomics Datasets
- Contains rare dendritic cells and progenitors

Tabula Sapiens (comprehensive atlas):
- Download: Tabula Sapiens Portal
- Includes rare cell types across multiple tissues

🔧 Configuration

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| min_features | 200 | Minimum number of detected features (genes) per cell |
| max_mito | 10 | Maximum mitochondrial percentage per cell (%) |
| fano.alpha | 0.05 | FDR cutoff for variable features |
| min.fano | 1.5 | Minimum Fano factor threshold |
| resolution | 1.2 | Clustering resolution |
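
To make the two quality-control thresholds concrete, here is a minimal base-R sketch that applies min_features and max_mito to a toy count matrix. The matrix, the cell names, the "MT-" gene prefix, and the inclusive comparisons (>= and <=) are assumptions for illustration; the pipeline scripts may implement the filter differently.

```r
# Toy count matrix (made-up values): rows are genes, columns are cells.
gene_names <- c(paste0("GENE", 1:300), "MT-CO1", "MT-ND1", "MT-ATP6")
counts <- matrix(0, nrow = length(gene_names), ncol = 4,
                 dimnames = list(gene_names, paste0("cell", 1:4)))
counts[1:300, "cell1"] <- 1      # 300 features, no mitochondrial reads
counts[1:100, "cell2"] <- 1      # only 100 features
counts[1:300, "cell3"] <- 1
counts[301:303, "cell3"] <- 50   # about 33% mitochondrial reads
counts[1:250, "cell4"] <- 2
counts[301:303, "cell4"] <- 10   # about 5.7% mitochondrial reads

# Defaults from the Key Parameters table.
min_features <- 200
max_mito <- 10

# Per-cell QC metrics.
n_features <- colSums(counts > 0)
is_mito <- grepl("^MT-", rownames(counts))
percent_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)

# Keep cells that pass both thresholds (inclusive bounds are an assumption).
keep <- n_features >= min_features & percent_mito <= max_mito
filtered_counts <- counts[, keep, drop = FALSE]
print(colnames(filtered_counts))
```

In this toy example cell2 fails the feature threshold and cell3 fails the mitochondrial threshold, so only cell1 and cell4 remain.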

Example Analysis

# Custom parameter analysis
results <- BigSur(
  seurat.obj = your_data,
  assay = "RNA",
  counts.slot = "counts",
  variable.features = TRUE,
  correlations = FALSE,
  fano.alpha = 0.05,
  min.fano = 1.5
)
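
For intuition about what min.fano thresholds, the sketch below computes the plain Fano factor (variance divided by mean) for each gene in a toy matrix. This is not BigSur's statistic: BigSur applies its own corrections and significance testing (which fano.alpha controls), none of which is reproduced here. The matrix and gene names are made up.

```r
# Toy counts (made-up values): 3 genes x 6 cells, all with mean 2.
counts <- rbind(
  flat   = c(2, 2, 2, 2, 2, 2),   # identical in every cell
  noisy  = c(1, 3, 2, 2, 1, 3),   # mild variation around the mean
  bursty = c(0, 0, 0, 0, 0, 12)   # expressed in a single cell only
)

# Plain (uncorrected) Fano factor: per-gene variance over per-gene mean.
gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
fano <- gene_var / gene_mean

# Genes above the threshold; 1.5 mirrors the min.fano default.
min_fano <- 1.5
variable_genes <- names(fano)[fano > min_fano]
print(variable_genes)
```

Only the "bursty" gene passes, which matches the intuition that a gene expressed in a small subset of cells is far more dispersed than its mean alone would suggest.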

📈 Example Outputs

The pipeline generates various visualizations and results:

Quality Control

Violin plots showing quality metrics (nFeature_RNA, nCount_RNA, percent.mito)

Dimensionality Reduction

UMAP visualization showing cell clustering

Rare Cell Identification

UMAP highlighting rare cell populations identified by BigSur

Differential Expression

Volcano plot of differentially expressed genes in rare populations
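
As a rough sketch of how a volcano plot is built from a differential expression table, the base-R example below labels genes as up, down, or not significant and plots log2 fold change against -log10 adjusted p-value. The data are made up, the column names avg_log2FC and p_val_adj are assumed to follow Seurat's marker output, and the cutoffs are illustrative choices rather than the pipeline's settings.

```r
# Made-up differential expression results for four genes.
de <- data.frame(
  gene       = c("G1", "G2", "G3", "G4"),
  avg_log2FC = c(2.1, -1.8, 0.2, 1.5),
  p_val_adj  = c(1e-10, 1e-4, 0.5, 0.2)
)

# Illustrative cutoffs (assumptions, not pipeline defaults).
lfc_cutoff <- 1
padj_cutoff <- 0.05

# Classify each gene by significance and direction of change.
de$status <- "not significant"
de$status[de$p_val_adj < padj_cutoff & de$avg_log2FC >= lfc_cutoff] <- "up"
de$status[de$p_val_adj < padj_cutoff & de$avg_log2FC <= -lfc_cutoff] <- "down"

# Draw the volcano plot to a temporary PDF.
status_colors <- c("up" = "red", "down" = "blue", "not significant" = "grey50")
plot_file <- file.path(tempdir(), "volcano_sketch.pdf")
pdf(plot_file)
plot(de$avg_log2FC, -log10(de$p_val_adj),
     col = status_colors[de$status], pch = 19,
     xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
abline(v = c(-lfc_cutoff, lfc_cutoff), h = -log10(padj_cutoff), lty = 2)
dev.off()
```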

🐛 Troubleshooting

Common Issues

No rare cells detected:

# Try adjusting parameters
results <- BigSur(
  seurat.obj = your_data,
  fano.alpha = 0.1,  # Less strict FDR
  min.fano = 1.2     # Lower Fano threshold
)

Memory issues:

# Raise the size limit for global objects exported to parallel workers (future package)
options(future.globals.maxSize = 8000 * 1024^2)  # 8000 MiB, roughly 8 GB
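
The limit is given in bytes, so 8000 * 1024^2 corresponds to 8,388,608,000 bytes (8000 MiB, slightly under 8 GiB). A small sketch of the same setting expressed in GiB, where the variable names are only illustrative:

```r
# Convert a limit in GiB to bytes and apply it.
limit_gib <- 8
limit_bytes <- limit_gib * 1024^3
options(future.globals.maxSize = limit_bytes)

# Confirm the value that is now in effect.
print(getOption("future.globals.maxSize"))
```

Choose a limit that fits within the RAM actually available on the machine; the option only lifts a safety check and does not add memory.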

Installation problems:
- Check session_info.txt for package versions
- Ensure all system dependencies are installed

Reproducibility

For exact environment replication, refer to session_info.txt, which contains:

- R version and platform information
- Loaded package versions
- System dependencies
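
A file like this can be regenerated after a run with base R. The sketch below is one possible way to do it, not necessarily how the repository's session_info.txt was produced; the helper name write_session_info is hypothetical, and the example writes to a temporary directory to avoid overwriting the tracked file.

```r
# Hypothetical helper: capture sessionInfo() output and write it to a text file.
write_session_info <- function(path = "session_info.txt") {
  info <- capture.output(sessionInfo())
  writeLines(info, path)
  invisible(path)
}

# Write to a temporary location for demonstration.
out_file <- write_session_info(file.path(tempdir(), "session_info.txt"))
cat(readLines(out_file, n = 3), sep = "\n")
```

Calling write_session_info() with no arguments at the end of an analysis would write session_info.txt to the current working directory.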

🤝 Contributing

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

BigSur developers: landerlabcode/BigSur
Seurat team: For the comprehensive single-cell analysis framework
10x Genomics: For providing benchmark datasets
Bioconductor: For maintaining essential bioinformatics packages

📚 Citation

If you use immune-chord in your research, please cite:

@software{immune_chord,
  title = {immune-chord: An R Pipeline for Rare Cell Population Identification},
  author = {Perez, Constanza},
  year = {2024},
  url = {https://github.com/ceugenia/immune-chord},
  note = {Version 1.0}
}

🔗 Useful Links

Note: This pipeline is under active development. Please report any issues or suggestions for improvement through the GitHub issues page.
