immune-chord: R Pipeline for Identifying Rare Cell Populations

An R pipeline for identifying and characterizing rare cell populations in single-cell RNA sequencing (scRNA-seq) data, built on Seurat and BigSur.

Figure (UMAP Visualization of Rare Cells): UMAP projection highlighting rare cell populations identified by the BigSur algorithm

📋 Overview

immune-chord offers a standardized and reproducible workflow for identifying rare cell populations, such as neural crest stem cells and uncommon immune subsets, in single-cell RNA sequencing data. This pipeline encompasses the entire analysis lifecycle, from raw data processing to advanced statistical analysis and visualization.

🎯 Key Features

  • Rare Cell Detection: Utilizes the BigSur algorithm for precise identification of rare populations
  • Complete Workflow: Provides end-to-end processing, from quality control to biological interpretation
  • Reproducible: Features Conda environment management and thorough documentation for consistency
  • Adaptive Analysis: Handles datasets in which the strict rare-cell criteria are not met
  • Publication-Ready: Produces high-quality visualizations and comprehensive reports suitable for publication

🚀 Quick Start

Input & Output

  • Input: A Seurat object or raw count matrix from scRNA-seq data
  • Output: Identified rare cell populations, marker genes, differential expression results, and publication-quality visualizations

Installation

  1. Create and activate the Conda environment:
conda create -n immune-chord -c conda-forge r-base=4.3.2 r-essentials
conda activate immune-chord
  2. Install the R dependencies:
# Install CRAN packages
install.packages(c("Seurat", "tidyverse", "devtools", "remotes", "BiocManager"))

# Install Bioconductor packages (scRNAseq provides the example dataset used in step 3)
BiocManager::install(c("SingleCellExperiment", "scran", "scRNAseq"))

# Install BigSur
remotes::install_github("landerlabcode/BigSurR")
  3. Run the pipeline:
# Load your data (example with test data)
library(Seurat)                # CreateSeuratObject()
library(SingleCellExperiment)  # counts()
library(scRNAseq)              # BaronPancreasData()
pancreas_data <- BaronPancreasData(which = "human")
seu_obj <- CreateSeuratObject(counts = counts(pancreas_data))

# Execute the full pipeline, in order
source("R/01_chord_quality_control_normalization.R")
source("R/02_chord_clustering_celltype_id.R")
source("R/03_chord_rare_population_analysis.R")  # Uses BigSur
source("R/04_chord_differential_expression_visualization.R")
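The four scripts are meant to run in sequence, so a missing file is best caught before any step starts. The following is a minimal sketch of a wrapper that does this; `run_pipeline` is a hypothetical helper that is not part of the repository, and it assumes the scripts pass objects to one another through a shared environment.

```r
# Hypothetical helper (not part of immune-chord): source the pipeline
# scripts in order, failing early if any file is missing.
run_pipeline <- function(scripts, envir = globalenv()) {
  missing_scripts <- scripts[!file.exists(scripts)]
  if (length(missing_scripts) > 0) {
    stop("Missing pipeline scripts: ", paste(missing_scripts, collapse = ", "))
  }
  for (script in scripts) {
    message("Running ", script)
    # Every script is evaluated in the same environment, so objects created
    # by one step are visible to the next (an assumption about the scripts).
    source(script, local = envir)
  }
  invisible(scripts)
}

pipeline_scripts <- file.path("R", c(
  "01_chord_quality_control_normalization.R",
  "02_chord_clustering_celltype_id.R",
  "03_chord_rare_population_analysis.R",
  "04_chord_differential_expression_visualization.R"
))

# From the repository root, after creating seu_obj:
# run_pipeline(pipeline_scripts)
```

Checking all paths up front avoids running a long quality-control step only to fail on a later script.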

📁 Project Structure

immune-chord/
├── data/
│   ├── raw_data/                 # Raw data (with README for download instructions)
│   └── processed_data/           # Processed datasets (.rds files)
├── R/                            # Pipeline scripts
│   ├── 01_chord_quality_control_normalization.R
│   ├── 02_chord_clustering_celltype_id.R
│   ├── 03_chord_rare_population_analysis.R      # Uses BigSur
│   ├── 04_chord_differential_expression_visualization.R
│   └── functions.R               # Helper functions
├── analysis/
│   └── vignette.Rmd              # Complete tutorial
├── docs/
│   └── tutorial.md               # Rendered tutorial
├── figures/                      # Output plots
├── session_info.txt              # Session information for reproducibility
└── README.md

📊 Recommended Datasets

  1. BaronPancreasData (easiest for testing):
library(scRNAseq)
data <- BaronPancreasData(which = "human")
  2. 10x Genomics PBMC (standard benchmark)

  3. Tabula Sapiens (comprehensive atlas)

πŸ”§ Configuration

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| min_features | 200 | Minimum features per cell |
| max_mito | 10 | Maximum mitochondrial percentage (%) |
| fano.alpha | 0.05 | FDR cutoff for variable features |
| min.fano | 1.5 | Minimum Fano factor threshold |
| resolution | 1.2 | Clustering resolution |
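To make the two quality-control thresholds concrete, here is a base-R sketch of how `min_features` and `max_mito` could be applied to a raw count matrix. It is an illustration only, not the pipeline's own code: the function name `qc_filter`, the toy matrix, the inclusive comparisons, and the `^MT-` pattern for human mitochondrial genes are all assumptions.

```r
# Toy count matrix with made-up values: genes in rows, cells in columns.
counts <- matrix(
  c(5, 0, 3, 0,
    2, 1, 0, 0,
    0, 4, 1, 0,
    1, 9, 0, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "GENE3", "MT-CO1"),
                  c("cell_a", "cell_b", "cell_c", "cell_d"))
)

# Hypothetical helper: keep cells with enough detected genes and a low
# enough share of mitochondrial counts.
qc_filter <- function(counts, min_features = 200, max_mito = 10) {
  n_features <- colSums(counts > 0)            # genes detected per cell
  total <- colSums(counts)                     # total counts per cell
  mito_rows <- grepl("^MT-", rownames(counts)) # assumed human MT gene prefix
  percent_mito <- 100 * colSums(counts[mito_rows, , drop = FALSE]) / total
  keep <- total > 0 & n_features >= min_features & percent_mito <= max_mito
  counts[, which(keep), drop = FALSE]
}

# Thresholds are scaled down to suit the toy matrix.
filtered <- qc_filter(counts, min_features = 2, max_mito = 20)
colnames(filtered)  # "cell_a" "cell_c"
```

In the toy data, `cell_b` and `cell_d` are dropped because most of their counts come from the mitochondrial gene, and `cell_d` also has only one detected gene.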

Example Analysis

# Custom parameter analysis
results <- BigSur(
  seurat.obj = your_data,
  assay = "RNA",
  counts.slot = "counts",
  variable.features = TRUE,
  correlations = FALSE,
  fano.alpha = 0.05,
  min.fano = 1.5
)
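The `min.fano` threshold refers to a Fano factor, the ratio of a gene's variance to its mean across cells. BigSur applies its own corrected version of this statistic, so the sketch below computes only the plain, uncorrected ratio on made-up counts to show why a gene expressed in very few cells stands out. The function `fano_factor` and the toy values are illustrative assumptions, not BigSur's implementation.

```r
# Plain (uncorrected) Fano factor per gene: variance / mean across cells.
# Genes with zero mean get NA rather than a division by zero.
fano_factor <- function(counts) {
  gene_mean <- rowMeans(counts)
  gene_var <- apply(counts, 1, var)
  ifelse(gene_mean > 0, gene_var / gene_mean, NA_real_)
}

# Made-up counts: three genes measured in four cells.
toy_counts <- rbind(
  flat   = c(2, 2, 2, 2),  # same count in every cell
  bursty = c(0, 0, 0, 8),  # expressed in a single cell
  silent = c(0, 0, 0, 0)   # never detected
)

fano <- fano_factor(toy_counts)

# Genes passing a min.fano-style cutoff of 1.5
names(fano)[!is.na(fano) & fano >= 1.5]  # "bursty"
```

Both `flat` and `bursty` have the same mean, but only the gene concentrated in one cell has a high variance-to-mean ratio, which is the kind of signal a rare population produces.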

📈 Example Outputs

The pipeline generates various visualizations and results:

Quality Control

Figure (Quality Control Plots): Violin plots showing quality metrics (nFeature_RNA, nCount_RNA, percent.mito)

Dimensionality Reduction

Figure (UMAP Clustering): UMAP visualization showing cell clustering

Rare Cell Identification

Figure (Rare Cells): UMAP highlighting rare cell populations identified by BigSur

Differential Expression

Figure (Volcano Plot): Volcano plot of differentially expressed genes in rare populations

🐛 Troubleshooting

Common Issues

  1. No rare cells detected:
# Try relaxing the thresholds
results <- BigSur(
  seurat.obj = your_data,
  fano.alpha = 0.1,  # Less strict FDR cutoff
  min.fano = 1.2     # Lower Fano factor threshold
)
  2. Memory issues:
# Raise the future package's size limit for globals exported to parallel workers
options(future.globals.maxSize = 8000 * 1024^2)  # 8000 MiB (about 8 GB)
  3. Installation problems:
    • Check session_info.txt for package versions
    • Ensure all system dependencies are installed
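For installation problems, a quick first check is whether every package the Quick Start relies on is actually installed. The sketch below does this without loading any package; `find_missing_packages` is a hypothetical helper, and the list omits BigSur because the name under which it installs is not stated here.

```r
# Packages named in the installation and Quick Start steps.
required_pkgs <- c("Seurat", "tidyverse", "devtools", "remotes", "BiocManager",
                   "SingleCellExperiment", "scran", "scRNAseq")

# Hypothetical helper: return the packages that are not installed.
# system.file() returns "" when a package cannot be found, so nothing
# has to be loaded to run the check.
find_missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(pkg) nzchar(system.file(package = pkg)),
    logical(1)
  )
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

For the packages that are installed, `packageVersion("Seurat")` and similar calls give the versions to compare against session_info.txt.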

Reproducibility

To replicate the environment exactly, refer to session_info.txt, which contains:

  • R version and platform information
  • Loaded package versions
  • System dependencies
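A file like this can be produced with base R after an analysis run. The following is a minimal sketch, assuming the file is simply the printed output of `sessionInfo()`; how the repository's own session_info.txt was generated is not stated, and `write_session_info` is a hypothetical helper.

```r
# Hypothetical helper: save the printed sessionInfo() output to a text file.
write_session_info <- function(path = "session_info.txt") {
  info <- utils::capture.output(utils::sessionInfo())
  writeLines(info, path)
  invisible(path)
}

# Example: write to a temporary directory and preview the first lines.
out_file <- write_session_info(file.path(tempdir(), "session_info.txt"))
head(readLines(out_file), 3)
```

Running it at the end of the last pipeline script captures the versions of the packages that were actually loaded during the analysis.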

🤝 Contributing

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • BigSur developers: landerlabcode/BigSur
  • Seurat team: For the comprehensive single-cell analysis framework
  • 10x Genomics: For providing benchmark datasets
  • Bioconductor: For maintaining essential bioinformatics packages

📚 Citation

If you use immune-chord in your research, please cite:

@software{immune_chord,
  title = {immune-chord: An R Pipeline for Rare Cell Population Identification},
  author = {Perez, Constanza},
  year = {2024},
  url = {https://github.com/ceugenia/immune-chord},
  note = {Version 1.0}
}


Note: This pipeline is under active development. Please report any issues or suggestions for improvement through the GitHub issues page.
