EmptyNN

EmptyNN is a novel cell-calling algorithm based on positive-unlabeled (PU) learning. It removes cell-free droplets and recovers lost cells in droplet-based single-cell RNA sequencing data. For more information, see our EmptyNN paper.

Workflow. EmptyNN uses positive-unlabeled learning to classify droplets as cell-free or cell-containing. (a) Cells and barcodes are combined in oil droplets. Some droplets lack a cell but still contain ambient RNA. The EmptyNN classifier distinguishes cell-free from cell-containing droplets. (b) Schematic of the EmptyNN workflow. The black curve shows the distribution of total counts (y-axis) across sorted barcodes (x-axis). The blue bar marks the barcodes with very low total counts, set P. The grey bar marks the barcodes with higher total counts, set U, which is a mixture of cell-containing and cell-free droplets. EmptyNN trains a classifier in which barcodes from P are labeled as cell-free droplets (blue) and a fraction of the barcodes from U is labeled as cell-containing droplets (pink). The classifier is then applied to the remaining barcodes in U and the predictions are recorded. Over the k folds, each barcode in U is predicted k-1 times. This process is repeated for N iterations (default: 10). The average predicted probability for each barcode in U determines whether it is called a cell-free or a cell-containing droplet.
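
The workflow in (b) can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the package's implementation: the expression profiles and counts are simulated, `pu_predict` is a hypothetical helper, logistic regression (`glm`) stands in for the neural network, and the 0.5 cutoff on the averaged probability is an assumption.

```r
set.seed(1)

# Hypothetical expression profiles over four genes (illustration only).
ambient_profile <- c(0.4, 0.3, 0.2, 0.1)
cell_profile    <- c(0.1, 0.2, 0.3, 0.4)

# Simulated barcodes-by-genes matrix: 200 near-empty droplets,
# 100 cell-free droplets with more ambient RNA, 100 cell-containing droplets.
counts <- t(cbind(
  rmultinom(200, size = 20,  prob = ambient_profile),
  rmultinom(100, size = 150, prob = ambient_profile),
  rmultinom(100, size = 500, prob = cell_profile)
))
colnames(counts) <- paste0("gene", 1:4)

# Set P: total counts below the threshold. Set U: everything else.
threshold <- 100
in_P <- rowSums(counts) < threshold

# Features: per-barcode gene proportions (last gene dropped, since the
# proportions sum to one).
features <- (counts / rowSums(counts))[, 1:3]

# Hypothetical helper: k-fold PU prediction over U, repeated for several iterations.
pu_predict <- function(features, in_P, k = 5, iterations = 3) {
  P <- which(in_P)
  U <- which(!in_P)
  prob_sum <- numeric(length(U))
  n_pred   <- numeric(length(U))
  for (it in seq_len(iterations)) {
    # Randomly assign every barcode in U to one of k folds.
    fold <- sample(rep(seq_len(k), length.out = length(U)))
    for (j in seq_len(k)) {
      in_train <- fold == j
      # P is labeled 0 (cell-free); one fold of U is labeled 1 (cell-containing).
      train <- data.frame(
        features[c(P, U[in_train]), , drop = FALSE],
        y = c(rep(0, length(P)), rep(1, sum(in_train)))
      )
      test <- data.frame(features[U[!in_train], , drop = FALSE])
      # Logistic regression as a lightweight stand-in for the neural network.
      fit <- suppressWarnings(glm(y ~ ., data = train, family = binomial()))
      p <- suppressWarnings(predict(fit, newdata = test, type = "response"))
      # Record predictions for the k-1 folds of U that were not used for training.
      prob_sum[!in_train] <- prob_sum[!in_train] + p
      n_pred[!in_train]   <- n_pred[!in_train] + 1
    }
  }
  list(prob = prob_sum / n_pred, n_pred = n_pred)
}

res <- pu_predict(features, in_P, k = 5, iterations = 3)

# Assumed cutoff of 0.5 on the averaged probability (not taken from the source).
is_cell <- res$prob > 0.5
table(truth = rep(c("cell-free", "cell"), each = 100), predicted_cell = is_cell)
```

In this sketch each barcode in U is predicted k-1 times per iteration, so `res$n_pred` equals `iterations * (k - 1)` for every barcode.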

Animated workflow: see Summary_of_algorithm.gif.

Reproducibility

To reproduce the analysis and figures presented in our manuscript, see the Reproducibility folder.

Tutorial

Check out our Jupyter notebook tutorial (R kernel): EmptyNN - Cell Hashing Dataset Tutorial.

Installation

EmptyNN is implemented in R and depends on the keras and Matrix R packages.

Option 1

$ git clone http://github.com/lkmklsmn/empty_nn
$ cd empty_nn

## enter R and install packages
$ R

> install.packages("EmptyNN_1.0.tar.gz", repos = NULL, type = "source")

Option 2

> install.packages("devtools")
> library(devtools)
> install_github("lkmklsmn/empty_nn")

Usage

Input

Raw unfiltered count matrix in mtx or h5 format
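
For mtx input, a rough sketch of loading the matrix with the Matrix package is shown here. The toy matrix, the temporary file, and the gene and barcode names are made up for illustration. Whether `emptynn()` requires a dense or a sparse matrix is not stated in this README, so treat the final object as an assumption.

```r
library(Matrix)

# Toy genes-by-barcodes count matrix (hypothetical data) written to a
# temporary Matrix Market file so the example is self-contained.
set.seed(1)
toy <- Matrix(matrix(rpois(24, lambda = 1), nrow = 6), sparse = TRUE)
mtx_path <- tempfile(fileext = ".mtx")
writeMM(toy, file = mtx_path)

# Read the matrix back; readMM() returns a sparse matrix without dimnames.
raw <- readMM(mtx_path)
rownames(raw) <- paste0("gene", seq_len(nrow(raw)))
colnames(raw) <- paste0("barcode", seq_len(ncol(raw)))

# Transpose so that rows are barcodes and columns are genes.
counts <- t(as(raw, "CsparseMatrix"))
dim(counts)
```

For a 10x-style directory, Seurat's `Read10X()` reads the matrix together with its barcode and feature names.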

Output

A boolean vector with one entry per barcode, indicating whether the droplet is predicted to be cell-containing or cell-free
The predicted probability for each barcode in set U

Download example datasets

$ cd empty_nn
$ sh ./code/download_data.sh

library(EmptyNN)
library(Seurat)
library(ggplot2)  # provides ggtitle()

# Load the raw, unfiltered count matrix (genes in rows, barcodes in columns)
counts <- Read10X_h5("./data/neurons_900_raw.h5", use.names = TRUE, unique.features = TRUE)

# Run emptynn() on the transposed matrix, so rows are cells and columns are genes
nn.res <- emptynn(t(counts), threshold = 100, k = 10, iteration = 10, verbose = TRUE)

# Downstream analysis: keep the barcodes flagged in nn.keep.
# nn.keep has one entry per barcode, so it indexes the columns of the
# untransposed (genes x barcodes) matrix.
retained <- runSeurat(counts = counts[, nn.res$nn.keep], resolution = 0.2)
DimPlot(retained, reduction = 'tsne') + ggtitle("EmptyNN") + NoLegend()
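
The subsetting step can be checked on toy data. A logical vector with one entry per barcode selects columns of a genes-by-barcodes matrix, or rows of its transpose. The matrix and the `keep` vector are made up here, with `keep` standing in for `nn.res$nn.keep`.

```r
# Toy genes-by-barcodes matrix (hypothetical data).
toy_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("bc", 1:4))
)

# Stand-in for nn.res$nn.keep: one logical value per barcode.
keep <- c(TRUE, FALSE, TRUE, TRUE)

# How many barcodes are retained and how many are removed.
table(keep)

# Genes-by-barcodes orientation: subset the columns.
retained <- toy_counts[, keep, drop = FALSE]

# Barcodes-by-genes orientation: subset the rows instead.
retained_t <- t(toy_counts)[keep, , drop = FALSE]

# Both routes select the same barcodes.
stopifnot(identical(retained, t(retained_t)))
```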
