CellTypeAgent

Introduction

Cell type annotation is a critical yet laborious step in single-cell RNA sequencing analysis. We present CellTypeAgent, a trustworthy large language model (LLM) agent that integrates LLMs with verification against relevant databases. CellTypeAgent achieves higher accuracy than existing methods while mitigating hallucinations. We evaluated CellTypeAgent across nine real datasets involving 303 cell types from 36 tissues. Combining LLM predictions with database verification holds promise for more efficient and reliable cell type annotation.

Requirements

  1. Clone the repository
git clone https://github.com/jianghao-zhang/CellTypeAgent.git
cd CellTypeAgent
  2. Create a conda environment and install the dependencies
conda create -n CellTypeAgent python=3.10
conda activate CellTypeAgent
pip install -r requirements.txt
  3. Configure your OpenAI/Anthropic/DeepSeek API keys in the 'CellTypeAgent/APIs' folder

  4. Prepare the data

  • The datasets used in the paper are stored in the 'CellTypeAgent/data' folder.
  • Please download the gene expression data used in this paper from Google Drive and place it in the 'CellTypeAgent/data/CELLxGENE' directory.
  • Please check the README.md in the 'CellTypeAgent/data' folder for more information.
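
Before running anything, it can help to confirm that the folders named in the steps above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the check only looks at whether the directories exist, not at what they contain.

```python
from pathlib import Path

# Directories named in the setup steps above, relative to the repository root.
EXPECTED_DIRS = (
    "CellTypeAgent/APIs",
    "CellTypeAgent/data",
    "CellTypeAgent/data/CELLxGENE",
)


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs()` from the repository root returns an empty list when all three folders are present; any path it returns still needs to be created or populated as described above.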

Example Usage

  • Run an experiment on all datasets:
python CellTypeAgent/get_prediction.py
python CellTypeAgent/get_selection.py
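
The two commands above can also be chained from Python so that the second step is skipped if the first one fails. This is a small sketch under the assumption that the scripts are meant to run in the order listed; `run_pipeline` and `build_commands` are hypothetical helper names, not functions provided by the repository.

```python
import subprocess
import sys

# The two scripts from the Example Usage section, in the order listed there.
PIPELINE_STEPS = (
    "CellTypeAgent/get_prediction.py",
    "CellTypeAgent/get_selection.py",
)


def build_commands(python=sys.executable):
    """Build one command per step, using the current Python interpreter."""
    return [[python, script] for script in PIPELINE_STEPS]


def run_pipeline(runner=subprocess.run):
    """Run each step in turn and stop at the first failure."""
    for cmd in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps are not run after a failed one.
        runner(cmd, check=True)
```

Call `run_pipeline()` from the repository root with the CellTypeAgent conda environment active, so that `sys.executable` points at the interpreter with the installed dependencies.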

Adapting the Framework to Custom Datasets

To utilize CellTypeAgent with your own datasets, follow these steps:

  1. Format your data according to the structure in CellTypeAgent/data/GPTCellType/datasets
  2. Download the corresponding gene expression data from the CZ CellxGene - Gene Expression Atlas. For more details, refer to the README.md in the 'CellTypeAgent/data' folder
  3. Modify the dataset settings in get_prediction.py and get_selection.py
  4. Configure model parameters as needed (e.g., model, top_n, max_markers)
  5. Run the pipeline as described in the Example Usage section
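
When adjusting the parameters in step 4, one option is to gather them in a single place and validate them before a run. The sketch below is illustrative only: `RunSettings` and `cap_markers` are hypothetical names, and the reading of `max_markers` as a cap on the number of marker genes used per cluster is an assumption based on the parameter name, not on the scripts themselves. How `get_prediction.py` and `get_selection.py` actually store and interpret these values should be checked in the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container; the real scripts may store these differently."""

    model: str
    top_n: int
    max_markers: int

    def __post_init__(self):
        # Reject values that cannot describe a meaningful run.
        if not self.model:
            raise ValueError("model must be a non-empty string")
        if self.top_n < 1:
            raise ValueError("top_n must be at least 1")
        if self.max_markers < 1:
            raise ValueError("max_markers must be at least 1")


def cap_markers(markers, settings):
    """Assumed meaning of max_markers: keep only the first N marker genes."""
    return list(markers)[: settings.max_markers]
```

For example, `RunSettings(model="example-model", top_n=3, max_markers=10)` builds a settings object with a placeholder model name, and passing it to `cap_markers` along with a list of marker genes trims that list to at most ten entries.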

About

CellTypeAgent: Trustworthy cell type annotation with Large Language Models
