LongevityForest AI Scientist Agent

LongevityForest is a multi-agent bioinformatics system for analysing protein structures, sequences, and functional outcomes in the context of longevity and ageing.

LongevityForest science agents ecosystem

The LongevityForest science agents ecosystem is a set of tools for studying genes and proteins that influence lifespan. It currently includes:

longevity_forest (this repository): multi-agent gene analysis system with specialised bioinformatics agents
protein_hunter_mcp: MCP server for protein structure analysis, protein target selection, and targeted protein degradation design
cell2sentence4longevity-mcp: MCP server for in-silico knockout experiments using the cell2sentence4longevity model to predict age from gene expression patterns

Used together, these tools link cellular observations, sequence analysis, and protein structure analysis across multiple biological scales.

What is this?

This repository provides a delegated multi-agent architecture. Instead of a single monolithic agent, the system uses seven agents: a Query Agent that orchestrates the analysis and six specialists, each focused on specific databases or data sources.

The system can analyse a gene by integrating:

Genomic sequences and orthologs (BioMART)
Protein 3D structures and domains (AlphaFold, PDB, InterPro)
Protein-protein interactions (STRING, OmniPath)
Scientific literature and clinical trials (PubMed, EuropePMC)
Longevity and aging data (OpenGenes)
Functional variants and their effects (web search + databases)

The output is a markdown report with source attribution, structured in WikiCrow format.

Quick overview

Query Agent (Orchestrator)
├── Google Agent (web search)
├── Literature Agent (PubMed, clinical trials)
├── Structure Agent (3D structures, domains)
├── BioMART Agent (genomic sequences)
├── OpenGenes Agent (longevity/aging)
└── OmniPath Agent (pathways, interactions)

Quick start

Prerequisites

Python 3.12+
uv package manager (see the uv installation instructions)
Environment variables configured (see Setup section)

Installation

# Clone the repository
git clone https://github.com/longevity-genie/longevity_forest
cd longevity_forest

# Install dependencies with uv
uv sync

# Copy .env.template to .env and fill in your API keys
cp .env.template .env

# Edit .env with your API keys:
# - ANTHROPIC_API_KEY (required) - Used by literature, structure, biomart, and query agents
# - GEMINI_API_KEY (required) - Used by google, opengenes, and omnipath agents
# - Google Cloud credentials (optional - for Vertex AI)
# - Other database credentials as needed

Running gene analysis

Note: You can use either longevity_forest or the shorter alias forest for all commands.

# Analyze the default gene (NRF2)
uv run forest analyze-gene
# or: uv run longevity_forest analyze-gene

# Analyze a specific gene by name
uv run forest analyze-gene TP53

# Analyze multiple genes
uv run forest analyze-genes NRF2 TP53 FOXO3
# Note: this can take a long time and uses a lot of Claude credits
# Available options:
# --config, -c: Path to configuration YAML file
# --cache/--no-cache: Enable/disable cached interim results (default: enabled)
# --debug, -d: Show debug information including tool distribution
# --show-history/--no-history: Display conversation history (default: enabled for single gene)

Running protein degradation design (hunt-protein)

⚠️ WARNING: GPU-intensive workflow. This command uses the protein_hunter_mcp server, which requires significant GPU resources (H100 GPU). Protein design tasks take 5-10 minutes per design. Run it sparingly, because the system has no advanced GPU VRAM management.

# Design a degradation peptide for a target gene/protein (default: KLF6)
uv run forest hunt-protein
# or: uv run longevity_forest hunt-protein

# Design for a specific target
uv run forest hunt-protein TP53

# Available options:
# --config, -c: Path to protein hunter configuration YAML file
# --debug, -d: Show debug information
# --show-history/--no-history: Display conversation history after design (default: enabled)

This workflow:

Resolves gene names to protein sequences using UniProt
Designs high-affinity protein binders using Boltz/Chai AI models
Creates degradation adaptors by fusing ubiquitin to the binder
Provides comprehensive reports with sequences, metrics, and structure files

Running in-silico knockout analysis (insilico-knockout)

⚠️ WARNING: GPU-intensive workflow. This command uses the cell2sentence4longevity MCP server, which requires significant GPU resources (H100 GPU) and performs computationally expensive cellular simulations. Run it sparingly, because the system has no advanced GPU VRAM management.

Prerequisites: Ensure the cell2sentence4longevity MCP server is running:

# In the cell2sentence4longevity-mcp directory
uv run cell2sentence4longevity-mcp-run --host 0.0.0.0 --port 3002

Usage:

# Perform in-silico knockout analysis (default: KLF6)
uv run forest insilico-knockout
# or: uv run longevity_forest insilico-knockout

# Analyze a specific gene
uv run forest insilico-knockout TP53

# Provide a custom gene expression sentence and metadata
uv run forest insilico-knockout KLF6 \
  --gene-sentence "MT-CO1 FTL EEF1A1 HLA-B LST1 KLF6 S100A4 HLA-C" \
  --sex female \
  --tissue blood \
  --cell-type "CD14-low, CD16-positive monocyte" \
  --smoking-status 0

# Available options:
# --gene-sentence, -g: Gene expression sentence (space-separated gene symbols in descending order of expression)
# --sex, -s: Sex metadata (male/female)
# --tissue, -t: Tissue type (e.g., blood, brain, liver)
# --cell-type, -ct: Cell type (e.g., "CD14-low, CD16-positive monocyte")
# --smoking-status, -sm: Smoking status (0 = non-smoker, 1 = smoker)
# --config, -c: Path to configuration YAML file
# --debug, -d: Show debug information
# --show-history/--no-history: Display conversation history (default: enabled)

This workflow will:

Construct a gene expression sentence from aging-related genes, or use the one provided
Simulate gene knockout by removing the specified gene
Predict biological age before and after knockout using the Cell2Sentence4Longevity model
Calculate delta age and interpret the gene's impact on aging:
- Positive delta: Gene knockout increases age (gene may be protective/anti-aging)
- Negative delta: Gene knockout decreases age (gene may be pro-aging)
- Near-zero delta: Gene has minimal impact on age prediction
Generate comprehensive reports with biological context and interpretation
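
The knockout and interpretation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: `knock_out`, `interpret_delta`, and the 0.5-year `tolerance` cutoff are hypothetical names and values, and the ages in the usage lines are placeholders standing in for model predictions.

```python
def knock_out(gene_sentence: str, gene: str) -> str:
    """Remove every occurrence of `gene` from a space-separated gene sentence."""
    return " ".join(g for g in gene_sentence.split() if g != gene)


def interpret_delta(age_before: float, age_after: float, tolerance: float = 0.5) -> str:
    """Classify delta age (knockout minus original); `tolerance` is an assumed cutoff."""
    delta = age_after - age_before
    if abs(delta) <= tolerance:
        return "minimal impact"
    # Positive delta: knockout increases predicted age, so the gene may be protective.
    return "possibly protective" if delta > 0 else "possibly pro-aging"


sentence = "MT-CO1 FTL EEF1A1 HLA-B LST1 KLF6 S100A4 HLA-C"
print(knock_out(sentence, "KLF6"))
# Placeholder ages, not real predictions from the model.
print(interpret_delta(age_before=50.0, age_after=53.0))
```

In the real workflow the two ages would come from the Cell2Sentence4Longevity model; only the sign convention (knockout age minus original age) is taken from the list above.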

Example Output:

Results are saved to data/output/insilico_knockout_GENENAME_TIMESTAMP.md and include:

Table comparing original vs knockout predictions
Delta age calculation and interpretation
Gene expression sentences (original and knockout)
Biological context and known functions
All metadata used in the analysis

Output

Results are saved to data/output/ with format: GENENAME_TIMESTAMP.md

Example output structure:

# NRF2 - Sequence to Function Analysis

## 1. Sequences & Orthologs
## 2. Key Variants
## 3. Functional Domains
## 4. Interaction Network
## 5. Structural Modifications
## 6. References
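
As a rough illustration of the GENENAME_TIMESTAMP.md naming scheme, the sketch below builds such a path. The helper name `report_path` and the `%Y%m%d_%H%M%S` timestamp format are assumptions; the README does not state the exact format the CLI uses.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_path(gene: str, output_dir: str = "data/output",
                now: Optional[datetime] = None) -> Path:
    """Build data/output/GENENAME_TIMESTAMP.md; the timestamp format is an assumption."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{gene}_{stamp}.md"


# A fixed datetime keeps the example reproducible.
print(report_path("NRF2", now=datetime(2025, 1, 2, 3, 4, 5)).as_posix())
```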

Features

Multi-source data integration from several specialised biological databases
Results backed by citations with PubMed IDs and DOIs
Task-specific agents and prompts for different parts of the analysis
Conversation history stored for transparency and debugging
Architecture that makes it straightforward to add new agents or databases
Reduced context size through delegation between agents
Automatic continuation when a report is incomplete
Intermediate results cached in data/interim/ for later inspection

⚠️ Important Disclaimers

Resource Requirements

This system provides three distinct agentic workflows with different resource requirements:

analyze-gene / analyze-genes (CPU only, runs locally)
- Uses only LLM APIs (Anthropic Claude, Google Gemini)
- No GPU required
- Safe to run anytime, though may take time depending on gene complexity
- Can be run freely without GPU resource concerns (LLM API credits still apply)
hunt-protein (GPU-intensive)
- Uses protein_hunter_mcp server
- Requires significant GPU VRAM; currently deployed on an H100 instance together with the cell2sentence4longevity model
- Takes 5-10 minutes per design
- Must be run carefully to avoid overloading the H100 instance
insilico-knockout (GPU-intensive)
- Uses the cell2sentence4longevity-mcp server deployed on a remote H100 instance
- Requires significant GPU VRAM; currently shares the H100 instance with protein_hunter_mcp
- Takes 5-10 minutes per simulation
- Must be run carefully to avoid overloading the H100 instance

CRITICAL: The GPU-intensive workflows (hunt-protein and insilico-knockout) share the same H100 GPU instance and do not have advanced GPU VRAM management. Please run these workflows mindfully - avoid running multiple GPU-intensive tasks simultaneously to prevent out-of-memory errors.
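
One simple way to avoid launching two GPU-intensive workflows at once from the same machine is a lock file. The sketch below is an assumption-laden illustration, not something the project ships: `gpu_lock` and the lock path are hypothetical, and a local lock file cannot coordinate different users hitting the same remote H100 instance.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def gpu_lock(path: str):
    """Hold an exclusive lock file; raise if another local GPU job already holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another GPU workflow is running (lock: {path})") from None
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


# Demo uses a throwaway directory; in practice use one fixed, shared path.
lock_file = os.path.join(tempfile.mkdtemp(), "longevity_forest_gpu.lock")
with gpu_lock(lock_file):
    # Launch `uv run forest hunt-protein ...` or `insilico-knockout ...` here.
    print("GPU lock acquired")
```

If a job crashes hard, the lock file can be left behind and has to be removed by hand, which is the usual trade-off of this approach.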

Workflow Limitations

analyze-gene and hunt-protein: The most complete workflows; they accept any gene or protein as input.
insilico-knockout: Currently limited to a predefined set of genes because of time constraints during development. Full gene coverage is planned for future releases.

Configuration

Main configuration files

src/longevity_forest/config/agents/web_search_delegated.yaml: Agent profiles and tool mappings (primary)
src/longevity_forest/config/agents/web_search_full.yaml: Alternative monolithic configuration
src/longevity_forest/config/llm.py: LLM settings (Anthropic Claude 4.5 Haiku)
src/longevity_forest/config/prompts.py: System prompts for each agent
src/longevity_forest/config/mcp.py: Database connections (BioMART, OpenGenes, etc.)

Customising gene analysis

To analyze a different gene, use the CLI command:

uv run forest analyze-gene GENE_NAME

To customize the analysis prompt, edit src/longevity_forest/config/prompts.py:

def get_gene_analysis_prompt(gene_name: str) -> str:
    return f"""For the gene {gene_name} retrieve or identify:
    1) Known gene sequences & functional orthologs
    2) Key variants with longevity implications
    3) Interaction partners
    4) Active/functional sites
    5) Sequence modifications and effects
    6) PDB structures
    """

Project structure

longevity_forest/
├── README.md                    # This file
├── pyproject.toml               # Project metadata & dependencies
├── src/
│   └── longevity_forest/       # Main package
│       ├── __init__.py
│       ├── main.py             # Entry point (CLI via entry point)
│       ├── config/
│       │   ├── llm.py           # LLM configuration
│       │   ├── prompts.py       # Agent system prompts
│       │   ├── mcp.py           # Database MCP (Model Context Protocol) servers
│       │   ├── gene_analysis_mcp.py # Slim MCPs for gene analysis
│       │   └── agents/
│       │       ├── web_search_delegated.yaml # Delegated architecture config
│       │       └── web_search_full.yaml # Monolithic architecture config (legacy)
│       └── core/
│           ├── helpers.py       # Utility functions (save, validate, serialize)
│           └── experts.py       # Agent delegation logic
├── data/
│   ├── input/                   # Input data files
│   ├── interim/                 # Intermediate cache (YAML & text outputs)
│   ├── output/                  # Final markdown reports (*.md)
│   └── example/                 # Example outputs
├── logs/                        # Execution logs (JSON + text)
├── .env                         # Environment variables (API keys) - create from .env.template
└── .env.template                # Template for environment variables with all required keys

Data flow

Input: Gene name (e.g., "NRF2")
Orchestration: Query Agent delegates to 6 specialists
Collection: Each agent queries its specialized databases
Integration: Query Agent synthesizes findings
Output: Markdown report with full citations
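
The delegation pattern in the data flow above can be sketched with plain functions. Everything here is a hypothetical stand-in: the real agents are configured in YAML and call LLMs and databases, whereas `make_stub`, `SPECIALISTS`, and `analyze` only show the shape of "delegate, collect, integrate".

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the six specialist agents.
Specialist = Callable[[str], str]


def make_stub(source: str) -> Specialist:
    """Return a fake specialist that labels its findings with its source."""
    return lambda gene: f"[{source}] findings for {gene}"


SPECIALISTS: Dict[str, Specialist] = {
    name: make_stub(name)
    for name in ("google", "literature", "structure", "biomart", "opengenes", "omnipath")
}


def analyze(gene: str) -> str:
    """Delegate to each specialist, then merge findings into one attributed report."""
    findings = {name: agent(gene) for name, agent in SPECIALISTS.items()}
    sections = [f"## {name}\n{text}" for name, text in findings.items()]
    return f"# {gene}\n\n" + "\n\n".join(sections)


print(analyze("NRF2"))
```

Because each specialist only receives the gene name and returns a summary, the orchestrator never has to hold every raw database response in its own context, which is the motivation for delegation given in the Features list.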

Example: NRF2 analysis

# Run the NRF2 analysis explicitly
uv run forest analyze-gene NRF2

# Or use the default (NRF2)
uv run forest analyze-gene

The default NRF2 analysis performs:

BioMART lookup: Human ENSG00000116044 (NFE2L2, the official symbol for NRF2), mouse/rat orthologs
Literature search: 500+ papers on NRF2 function and variants
OpenGenes query: NRF2 association with longevity and aging
Structure analysis: Domains, AlphaFold confidence, PDB codes
OmniPath query: Antioxidant response elements, pathway context
Integration: Cross-referenced findings with source attribution

The output is saved to data/output/NRF2_TIMESTAMP.md.

Dependencies

just-agents >= 0.8.8: Multi-agent framework
typer: CLI framework
python-dotenv: Environment configuration
win-unicode-console: Windows UTF-8 support

See pyproject.toml for complete dependency list.

Environment setup

This project uses two LLM providers: Anthropic (Claude) and Google Gemini. Different agents use different models:

Anthropic Claude: Used by literature_agent, structure_agent, biomart_agent (Haiku), and query_agent (Sonnet)
Google Gemini: Used by google_agent, opengenes_agent, and omnipath_agent (Gemini 2.5 Pro)

Copy the template file:

cp .env.template .env

Edit .env and add your API keys. You need:
- ANTHROPIC_API_KEY (required) - For Claude models
- GEMINI_API_KEY (required) - For Gemini models
Optional:
- Google Cloud credentials (GOOGLE_CLOUD_PROJECT, GOOGLE_API_KEY, etc.) - For Vertex AI usage

See .env.template for the complete list of configuration options.

The environment variables are automatically loaded when running the CLI:

from dotenv import load_dotenv
load_dotenv()
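
Before a long run it can help to confirm that the two required keys are actually set. The following is a minimal sketch; `missing_keys` is a hypothetical helper, not part of the package, and it assumes the variables have already been loaded into the environment (for example by `load_dotenv()` as shown above).

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing required keys:", ", ".join(missing))
else:
    print("All required API keys are set")
```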

Advanced usage

Running with detailed logging

Logs are automatically saved to the logs/ directory:

logs/
├── TIMESTAMP_XXXX.log       # Text logs
└── TIMESTAMP_XXXX.json.log  # JSON formatted logs

To enable debug output, use the --debug flag:

uv run forest analyze-gene NRF2 --debug

This will show tool distribution across agents and other debugging information.

Intermediate results

Cached intermediate results are stored in data/interim/:

interim/
├── *_result.txt    # Agent output text
└── *.yaml          # Agent memory (YAML serialized)

Use the helper functions in core/helpers.py to inspect them:

from longevity_forest.core.helpers import serialize_memory_to_yaml, serialize_content
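
Since the signatures of those helpers are not documented here, a quick way to see what is cached is to list the files directly. This sketch relies only on the two naming patterns shown above (`*_result.txt` and `*.yaml`); `list_interim` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def list_interim(interim_dir: str = "data/interim") -> Dict[str, List[str]]:
    """Group cached files by kind using the naming patterns shown above."""
    root = Path(interim_dir)
    return {
        "results": sorted(p.name for p in root.glob("*_result.txt")),
        "memory": sorted(p.name for p in root.glob("*.yaml")),
    }


for kind, names in list_interim().items():
    print(kind, names)
```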

Extending with new agents

Add agent profile to config/agents/web_search_delegated.yaml
Define system prompt in config/prompts.py
Configure tools/MCPs in config/mcp.py
Query Agent will automatically delegate to new agent

Testing and validation

All results are validated before completion:

if is_valid:
    print(f"✓ Query result successfully saved and validated: {filepath}")
else:
    print(f"⚠ Query result saved but validation had issues: {filepath}")

Validation checks include:

Markdown syntax integrity
UTF-8 encoding correctness
File write success
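
As an illustration of what such checks might look like, the sketch below computes an `is_valid` flag for a saved report. It is not the project's validator: `validate_report` is a hypothetical function, and "balanced code fences" is only one assumed proxy for Markdown syntax integrity.

```python
import tempfile
from pathlib import Path

FENCE = "`" * 3  # Markdown code-fence marker


def validate_report(path: Path) -> bool:
    """Check that the file exists, decodes as UTF-8, is non-empty, and has balanced fences."""
    try:
        text = path.read_bytes().decode("utf-8")
    except (OSError, UnicodeDecodeError):
        return False  # write failed (file missing/unreadable) or bad encoding
    fences = sum(1 for line in text.splitlines() if line.lstrip().startswith(FENCE))
    return bool(text.strip()) and fences % 2 == 0


# Demo on a throwaway file.
demo = Path(tempfile.mkdtemp()) / "NRF2_demo.md"
demo.write_text("# NRF2 - Sequence to Function Analysis\n", encoding="utf-8")
is_valid = validate_report(demo)
print("valid" if is_valid else "invalid")
```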

Use cases

Gene function analysis: sequence-to-function relationships
Variant impact assessment: effects of genetic variants
Longevity research: ageing-related genes and pathways
Drug target analysis: protein targets and their interactions
Protein degradation design (GPU): design targeted protein degraders using hunt-protein
In-silico knockout (GPU, currently limited to a predefined gene set): simulate gene knockout effects at the cellular level
Literature mining and research synthesis
Structural bioinformatics: combining sequence and 3D structure data

Performance

Context efficiency: 73–88% reduction compared to monolithic agents
Token usage: roughly 3–5K tokens per gene analysis (vs 10–15K for monolithic setups)
Execution time: typically 2–10 minutes depending on sources and gene complexity
Automatic continuation for incomplete responses
Cross-source validation to reduce hallucinations

Troubleshooting

"Agent with shortname X not found"

Verify that the agent is defined in src/longevity_forest/config/agents/web_search_delegated.yaml
Check that the agent is loaded in the agents list in src/longevity_forest/main.py

"API rate limit exceeded"

Wait before re-running
Use cached intermediate results
Consider running agent calls sequentially rather than in parallel to lower the request rate
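
"Wait before re-running" is commonly automated with exponential backoff. The sketch below shows the general technique only; `RateLimitError` is a hypothetical stand-in for whatever exception the LLM client raises, and nothing here is taken from the project's code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def with_backoff(call: Callable[[], T], retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimitError, doubling the wait (plus jitter) each time."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise ValueError("retries must be at least 1")
```

A call would be wrapped as `with_backoff(lambda: run_agent(gene))`, where `run_agent` is whatever function issues the rate-limited request.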

"REPORT_END marker not found"

System automatically continues generation
Check logs in logs/ directory for details
Increase continuation attempts if needed
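
The continuation behaviour can be pictured as a loop that keeps asking for more text until the end marker appears. This is a sketch of the idea rather than the project's logic: `generate_until_complete`, the continuation prompt wording, and the `max_continuations` parameter are all assumptions.

```python
from typing import Callable

END_MARKER = "REPORT_END"


def generate_until_complete(generate: Callable[[str], str], prompt: str,
                            max_continuations: int = 3) -> str:
    """Call `generate` repeatedly until END_MARKER appears or attempts run out."""
    report = generate(prompt)
    for _ in range(max_continuations):
        if END_MARKER in report:
            break
        # Ask the model to pick up where the partial report stopped.
        report += generate("Continue the report from where it stopped:\n" + report)
    return report
```

Raising `max_continuations` corresponds to "increase continuation attempts" above; where that setting actually lives in the project is not stated in this README.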

UTF-8 Encoding Issues (Windows)

System automatically reconfigures stdout/stderr to UTF-8
Verify Windows locale settings support Unicode
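
If console output still shows encoding errors, the streams can be reconfigured manually. The sketch below uses the standard `reconfigure()` method of Python's text streams (Python 3.7+); whether the project does it exactly this way is an assumption, and `force_utf8_streams` is a hypothetical name.

```python
import sys


def force_utf8_streams() -> None:
    """Switch stdout/stderr to UTF-8 where the stream supports reconfigure()."""
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if callable(reconfigure):  # replaced streams (e.g. StringIO) lack it
            reconfigure(encoding="utf-8", errors="replace")


force_utf8_streams()
print("✓ UTF-8 output configured")
```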

References

Architecture documentation

For detailed information about the system architecture, see the agent configuration files:

Agent profiles: src/longevity_forest/config/agents/web_search_delegated.yaml
Agent prompts: src/longevity_forest/config/prompts.py
MCP configurations: src/longevity_forest/config/mcp.py
Tool mappings: docs/GENE_ANALYSIS_TOOL_MAPPING.md

Contributing

To extend this system:

Add new agent: Modify src/longevity_forest/config/agents/web_search_delegated.yaml + add prompt
Add new database: Create MCP in src/longevity_forest/config/mcp.py
Modify analysis prompt: Edit src/longevity_forest/config/prompts.py functions
Change output format: Modify report generation in agents

License

See LICENSE file for details.

Scientific use

When publishing results based on this system:

cite the original data sources
verify important findings against the underlying literature
treat the agent outputs as assistance for expert analysis, not a substitute for it

For agent behaviour configuration, see src/longevity_forest/config/prompts.py. For tool mappings, see docs/GENE_ANALYSIS_TOOL_MAPPING.md.
