EmptyNN is a novel cell-calling algorithm based on positive-unlabeled (PU) learning. It removes cell-free droplets and recovers lost cells in droplet-based single-cell RNA sequencing data. For more information, please see our EmptyNN paper.

Workflow. EmptyNN leverages positive-unlabeled learning to classify cell-free and cell-containing droplets. (a) Cells and barcodes are combined in oil droplets. Some droplets lack a cell but still contain ambient RNA. The EmptyNN classifier distinguishes cell-free from cell-containing droplets. (b) Schematic of the EmptyNN workflow. The black curve shows total counts (y-axis) across sorted barcodes (x-axis). The blue bar marks the barcodes with very low total counts, set P. The grey bar marks the barcodes with higher total counts, a mix of cell-containing and cell-free droplets, set U. EmptyNN trains a classifier in which barcodes from P are labeled as cell-free droplets (blue) and a fraction of barcodes from U is labeled as cell-containing droplets (pink). The classifier is then applied to the remaining barcodes in U and the predictions are recorded. Over one round of k folds, each barcode in U is predicted k-1 times. This process is repeated for N iterations (default: 10). The average prediction probability of each barcode in U determines whether it is called a cell-free or cell-containing droplet.
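The fold-and-average scheme described in (b) can be illustrated with a small base-R sketch. This is not the EmptyNN implementation: it uses simulated two-feature data, swaps the neural network for a logistic regression (`glm`), and assumes a 0.5 cutoff on the averaged probability. All variable names are made up for illustration.

```r
# Toy sketch of the PU-learning scheme; NOT the EmptyNN implementation.
# Assumptions: simulated features, logistic regression instead of a neural
# network, and a 0.5 cutoff on the averaged probability.
set.seed(1)

n_p <- 200                                    # barcodes in P (treated as cell-free)
n_u <- 300                                    # barcodes in U (unknown mix)
truth_u <- rep(c(TRUE, FALSE), each = 150)    # hidden truth: TRUE = cell-containing

# Two made-up expression features; cell-containing droplets are shifted upward.
make_features <- function(is_cell) {
  n <- length(is_cell)
  data.frame(f1 = rnorm(n, mean = ifelse(is_cell, 3, 0)),
             f2 = rnorm(n, mean = ifelse(is_cell, 3, 0)))
}
x_p <- make_features(rep(FALSE, n_p))
x_u <- make_features(truth_u)

k <- 5                      # number of folds
n_iter <- 10                # number of repetitions (N in the text)
prob_sum <- numeric(n_u)    # running sum of predicted probabilities
n_pred <- integer(n_u)      # how often each barcode in U was predicted

for (i in seq_len(n_iter)) {
  fold <- sample(rep(seq_len(k), length.out = n_u))   # random fold assignment
  for (j in seq_len(k)) {
    in_fold <- fold == j
    # P is labeled cell-free (0); the current fold of U is labeled cell-containing (1).
    train <- rbind(cbind(x_p, y = 0), cbind(x_u[in_fold, ], y = 1))
    fit <- glm(y ~ f1 + f2, data = train, family = binomial())
    # Predict the remaining barcodes in U and record the probabilities.
    p <- predict(fit, newdata = x_u[!in_fold, ], type = "response")
    prob_sum[!in_fold] <- prob_sum[!in_fold] + p
    n_pred[!in_fold] <- n_pred[!in_fold] + 1L
  }
}

avg_prob <- prob_sum / n_pred   # average prediction probability per barcode in U
keep <- avg_prob > 0.5          # TRUE = called cell-containing (assumed cutoff)
print(table(predicted = keep, truth = truth_u))
```

Each barcode in U serves as a training label in exactly one fold per iteration and is predicted in the other k-1 folds, so `n_pred` ends up at `n_iter * (k - 1)` for every barcode.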

Animated workflow:


To reproduce the analysis and figures presented in our manuscript please see the Reproducibility folder.


Check out our Jupyter notebook tutorial (R environment): EmptyNN - Cell Hashing Dataset Tutorial.


EmptyNN is implemented in R and depends on the keras and Matrix R packages.

Option 1

$ git clone
$ cd empty_nn

## enter R and install packages
$ R

> install.packages("EmptyNN_1.0.tar.gz", repos = NULL, type = "source")

Option 2

> install.packages("devtools")
> library(devtools)
> install_github("lkmklsmn/empty_nn")



Input

  1. Raw unfiltered count matrix in mtx or h5 format
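As a rough illustration of the mtx route, the sketch below round-trips a toy matrix through a Matrix Market file using the Matrix package and computes total counts per barcode, the quantity on the y-axis of the workflow curve. The toy values and names are invented. A real 10x dataset also ships barcode and feature files, which this sketch does not handle.

```r
library(Matrix)

# Toy genes x barcodes count matrix (made-up values), stored sparsely.
toy <- sparseMatrix(i = c(1, 2, 3, 1, 3),
                    j = c(1, 1, 2, 3, 3),
                    x = c(5, 2, 1, 40, 60),
                    dims = c(3, 3),
                    dimnames = list(paste0("gene", 1:3), paste0("barcode", 1:3)))

# Write it out in Matrix Market (.mtx) format and read it back.
mtx_file <- tempfile(fileext = ".mtx")
writeMM(toy, mtx_file)
counts <- as(readMM(mtx_file), "CsparseMatrix")

# The .mtx file stores no row or column names, so restore them by hand here.
dimnames(counts) <- dimnames(toy)

# Transpose so rows are barcodes and columns are genes, then sum per barcode.
counts <- t(counts)
total_counts <- rowSums(counts)
print(sort(total_counts, decreasing = TRUE))
```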


Output

  1. A boolean vector indicating, for each barcode, whether the droplet is predicted to be cell-free or cell-containing
  2. Prediction probabilities for each barcode in set U

Download example datasets

$ cd empty_nn
$ sh ./code/

# Load packages
library(EmptyNN)
library(Seurat)
library(ggplot2)

# Load data (genes in rows, barcodes in columns)
counts <- Read10X_h5("./data/neurons_900_raw.h5", use.names = TRUE, unique.features = TRUE)

# Run emptynn() on the transposed matrix, so rows are barcodes and columns are genes
nn.res <- emptynn(t(counts), threshold = 100, k = 10, iteration = 10, verbose = TRUE)

# Downstream analysis: keep only the barcodes (columns) called cell-containing
retained <- runSeurat(counts = counts[, nn.res$nn.keep], resolution = 0.2)
DimPlot(retained, reduction = 'tsne') + ggtitle("EmptyNN") + NoLegend()


Learning empty droplets in scRNAseq






