LongevityForest is a multi-agent bioinformatics system for analysing protein structures, sequences, and functional outcomes in the context of longevity and ageing.
The LongevityForest science agents ecosystem is a set of tools for studying genes and proteins that influence lifespan. It currently includes:
- longevity_forest (this repository): multi-agent gene analysis system with specialised bioinformatics agents
- protein_hunter_mcp: MCP server for protein structure analysis, protein target selection, and targeted protein degradation design
- cell2sentence4longevity-mcp: MCP server for in-silico knockout experiments using the cell2sentence4longevity model to predict age from gene expression patterns
Used together, these tools link cellular observations, sequence analysis, and protein structure analysis across multiple biological scales.
This repository provides a delegated multi-agent architecture. Instead of a single monolithic agent, the system orchestrates seven specialised agents, each focused on specific databases or data sources.
The system can analyse a gene by integrating:
- Genomic sequences and orthologs (BioMART)
- Protein 3D structures and domains (AlphaFold, PDB, InterPro)
- Protein-protein interactions (STRING, OmniPath)
- Scientific literature and clinical trials (PubMed, EuropePMC)
- Longevity and aging data (OpenGenes)
- Functional variants and their effects (web search + databases)
The output is a markdown report with source attribution, structured in WikiCrow format.
Query Agent (Orchestrator)
├── Google Agent (web search)
├── Literature Agent (PubMed, clinical trials)
├── Structure Agent (3D structures, domains)
├── BioMART Agent (genomic sequences)
├── OpenGenes Agent (longevity/aging)
└── OmniPath Agent (pathways, interactions)
- Python 3.12+
- uv package manager (install uv)
- Environment variables configured (see Setup section)
# Clone the repository
git clone https://github.com/longevity-genie/longevity_forest
cd longevity_forest
# Install dependencies with uv
uv sync
# Copy .env.template to .env and fill in your API keys
cp .env.template .env
# Edit .env with your API keys:
# - ANTHROPIC_API_KEY (required) - Used by literature, structure, biomart, and query agents
# - GEMINI_API_KEY (required) - Used by google, opengenes, and omnipath agents
# - Google Cloud credentials (optional - for Vertex AI)
# - Other database credentials as needed
Note: You can use either longevity_forest or the shorter alias forest for all commands.
# Analyze a specific gene (default: NRF2)
uv run forest analyze-gene
# or: uv run longevity_forest analyze-gene
# Analyze a specific gene by name
uv run forest analyze-gene TP53
# Analyze multiple genes
uv run forest analyze-genes NRF2 TP53 FOXO3
# Note: this can take a long time and consume a lot of Claude credits
# Available options:
# --config, -c: Path to configuration YAML file
# --cache/--no-cache: Enable/disable cached interim results (default: enabled)
# --debug, -d: Show debug information including tool distribution
# --show-history/--no-history: Display conversation history (default: enabled for single gene)
# Design a degradation peptide for a target gene/protein (default: KLF6)
uv run forest hunt-protein
# or: uv run longevity_forest hunt-protein
# Design for a specific target
uv run forest hunt-protein TP53
# Available options:
# --config, -c: Path to protein hunter configuration YAML file
# --debug, -d: Show debug information
# --show-history/--no-history: Display conversation history after design (default: enabled)
This workflow:
- Resolves gene names to protein sequences using UniProt
- Designs high-affinity protein binders using Boltz/Chai AI models
- Creates degradation adaptors by fusing ubiquitin to the binder
- Provides comprehensive reports with sequences, metrics, and structure files
Prerequisites: Ensure the cell2sentence4longevity MCP server is running:
# In the cell2sentence4longevity-mcp directory
uv run cell2sentence4longevity-mcp-run --host 0.0.0.0 --port 3002
Usage:
# Perform in-silico knockout analysis (default: KLF6)
uv run forest insilico-knockout
# or: uv run longevity_forest insilico-knockout
# Analyze a specific gene
uv run forest insilico-knockout TP53
# Provide a custom gene expression sentence and metadata
uv run forest insilico-knockout KLF6 \
--gene-sentence "MT-CO1 FTL EEF1A1 HLA-B LST1 KLF6 S100A4 HLA-C" \
--sex female \
--tissue blood \
--cell-type "CD14-low, CD16-positive monocyte" \
--smoking-status 0
# Available options:
# --gene-sentence, -g: Gene expression sentence (space-separated, descending order)
# --sex, -s: Sex metadata (male/female)
# --tissue, -t: Tissue type (e.g., blood, brain, liver)
# --cell-type, -ct: Cell type (e.g., "CD14-low, CD16-positive monocyte")
# --smoking-status, -sm: Smoking status (0 = non-smoker, 1 = smoker)
# --config, -c: Path to configuration YAML file
# --debug, -d: Show debug information
# --show-history/--no-history: Display conversation history (default: enabled)
This workflow will:
- Construct a gene expression sentence from aging-related genes, or use the one provided
- Simulate gene knockout by removing the specified gene
- Predict biological age before and after knockout using the Cell2Sentence4Longevity model
- Calculate delta age and interpret the gene's impact on aging:
  - Positive delta: Gene knockout increases age (gene may be protective/anti-aging)
  - Negative delta: Gene knockout decreases age (gene may be pro-aging)
  - Near-zero delta: Gene has minimal impact on age prediction
- Generate comprehensive reports with biological context and interpretation
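The knockout and delta-age steps above can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: `knock_out`, `knockout_delta`, `interpret_delta`, the `tolerance` threshold for "near-zero", and the `toy_predict_age` stand-in for the Cell2Sentence4Longevity model are all hypothetical names and values chosen for the example.

```python
from typing import Callable


def knock_out(gene_sentence: str, gene: str) -> str:
    """Simulate a knockout by dropping `gene` from a space-separated sentence."""
    return " ".join(g for g in gene_sentence.split() if g != gene)


def knockout_delta(
    gene_sentence: str, gene: str, predict_age: Callable[[str], float]
) -> float:
    """Delta age = predicted age after knockout minus predicted age before."""
    return predict_age(knock_out(gene_sentence, gene)) - predict_age(gene_sentence)


def interpret_delta(delta: float, tolerance: float = 0.5) -> str:
    """Map a delta to the three cases listed above (tolerance is an assumption)."""
    if abs(delta) <= tolerance:
        return "minimal impact on age prediction"
    if delta > 0:
        return "knockout increases predicted age (gene may be protective/anti-aging)"
    return "knockout decreases predicted age (gene may be pro-aging)"


def toy_predict_age(sentence: str) -> float:
    """Toy stand-in for the real model: KLF6 absence adds 3 years."""
    return 40.0 + (0.0 if "KLF6" in sentence.split() else 3.0)


sentence = "MT-CO1 FTL EEF1A1 HLA-B LST1 KLF6 S100A4 HLA-C"
delta = knockout_delta(sentence, "KLF6", toy_predict_age)
print(knock_out(sentence, "KLF6"))
print(delta, "->", interpret_delta(delta))
```

In the real workflow the age predictions come from the MCP server; the sketch only shows how the sign of the delta maps to the interpretation.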
Example Output:
Results are saved to data/output/insilico_knockout_GENENAME_TIMESTAMP.md and include:
- Table comparing original vs knockout predictions
- Delta age calculation and interpretation
- Gene expression sentences (original and knockout)
- Biological context and known functions
- All metadata used in the analysis
Results are saved to data/output/ with format: GENENAME_TIMESTAMP.md
Example output structure:
# NRF2 - Sequence to Function Analysis
## 1. Sequences & Orthologs
## 2. Key Variants
## 3. Functional Domains
## 4. Interaction Network
## 5. Structural Modifications
## 6. References
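A report with this structure can be checked for completeness by looking for the six section headings. The sketch below is one possible check, assuming the headings appear exactly as in the example structure above; `EXPECTED_SECTIONS` and `missing_sections` are hypothetical names and are not taken from the repository.

```python
import re

# Section titles copied from the example output structure above.
EXPECTED_SECTIONS = [
    "Sequences & Orthologs",
    "Key Variants",
    "Functional Domains",
    "Interaction Network",
    "Structural Modifications",
    "References",
]


def missing_sections(report: str) -> list[str]:
    """Return the expected level-2 sections that the report does not contain."""
    found = {
        # Drop an optional "N. " numbering prefix from each "## " heading.
        re.sub(r"^\d+\.\s*", "", match.group(1)).strip()
        for match in re.finditer(r"^##\s+(.+)$", report, flags=re.MULTILINE)
    }
    return [title for title in EXPECTED_SECTIONS if title not in found]


partial = (
    "# NRF2 - Sequence to Function Analysis\n"
    "## 1. Sequences & Orthologs\n"
    "## 2. Key Variants\n"
)
print(missing_sections(partial))
```

A non-empty result would indicate a truncated report, which is the kind of situation a continuation step is meant to handle.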
- Multi-source data integration from several specialised biological databases
- Results backed by citations with PubMed IDs and DOIs
- Task-specific agents and prompts for different parts of the analysis
- Conversation history stored for transparency and debugging
- Architecture that makes it straightforward to add new agents or databases
- Reduced context size through delegation between agents
- Automatic continuation when a report is incomplete
- Intermediate results cached in data/interim/ for later inspection
This system provides three distinct agentic workflows with different resource requirements:
- analyze-gene / analyze-genes (CPU only, runs locally)
  - Uses only LLM APIs (Anthropic Claude, Google Gemini)
  - No GPU required
  - Safe to run at any time, though runtime depends on gene complexity
  - Can be run freely without GPU resource concerns
- hunt-protein (GPU-intensive)
  - Uses the protein_hunter_mcp server
  - Requires significant GPU VRAM; currently deployed on an H100 instance shared with the cell2sentence4longevity model
  - Takes 5-10 minutes per design
  - Must be run carefully to avoid overloading the H100 instance
- insilico-knockout (GPU-intensive)
  - Uses the cell2sentence4longevity-mcp server deployed on a remote H100 instance
  - Requires significant GPU VRAM; currently shares the H100 instance with protein_hunter_mcp
  - Takes 5-10 minutes per simulation
  - Must be run carefully to avoid overloading the H100 instance
CRITICAL: The GPU-intensive workflows (hunt-protein and insilico-knockout) share the same H100 GPU instance and do not have advanced GPU VRAM management. Please run these workflows mindfully - avoid running multiple GPU-intensive tasks simultaneously to prevent out-of-memory errors.
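One simple way to avoid launching two GPU-intensive runs at once from the same machine is a lock file. The sketch below is an assumption, not something the repository provides: `gpu_slot`, `GpuBusyError`, and the lock path are hypothetical. It coordinates only processes that share the lock path, so it does not protect against other users of the remote H100 instance.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; any path shared by the cooperating processes works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "forest_gpu.lock")


class GpuBusyError(RuntimeError):
    """Raised when another GPU-intensive workflow already holds the lock."""


@contextmanager
def gpu_slot(lock_path: str = LOCK_PATH):
    """Hold an exclusive lock file for the duration of one GPU workflow."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise GpuBusyError(f"GPU workflow already running ({lock_path})") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


if __name__ == "__main__":
    with gpu_slot():
        print("safe to start hunt-protein or insilico-knockout here")
```

A crashed process can leave a stale lock file behind; in that case the file has to be removed by hand.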
- analyze-gene and hunt-protein: the most complete workflows; they can handle any gene/protein as input
- insilico-knockout: Currently limited to a predefined set of genes due to time constraints during development. Full gene coverage will be added in future releases.
- src/longevity_forest/config/agents/web_search_delegated.yaml: Agent profiles and tool mappings (primary)
- src/longevity_forest/config/agents/web_search_full.yaml: Alternative monolithic configuration
- src/longevity_forest/config/llm.py: LLM settings (Anthropic Claude 4.5 Haiku)
- src/longevity_forest/config/prompts.py: System prompts for each agent
- src/longevity_forest/config/mcp.py: Database connections (BioMART, OpenGenes, etc.)
To analyze a different gene, use the CLI command:
uv run forest analyze-gene GENE_NAME
To customize the analysis prompt, edit src/longevity_forest/config/prompts.py:
def get_gene_analysis_prompt(gene_name: str) -> str:
    # Build the per-gene prompt; each numbered item is one thing to retrieve.
    return f"""For the gene {gene_name} retrieve or identify:
1) Known gene sequences & functional orthologs
2) Key variants with longevity implications
3) Interaction partners
4) Active/functional sites
5) Sequence modifications and effects
6) PDB structures
"""
longevity_forest/
├── README.md  # This file
├── pyproject.toml  # Project metadata & dependencies
├── src/
│   └── longevity_forest/  # Main package
│       ├── __init__.py
│       ├── main.py  # Entry point (CLI via entry point)
│       ├── config/
│       │   ├── llm.py  # LLM configuration
│       │   ├── prompts.py  # Agent system prompts
│       │   ├── mcp.py  # Database MCPs (Model Context Protocols)
│       │   ├── gene_analysis_mcp.py  # Slim MCPs for gene analysis
│       │   └── agents/
│       │       ├── web_search_delegated.yaml  # Delegated architecture config
│       │       └── web_search_full.yaml  # Monolithic architecture config (legacy)
│       └── core/
│           ├── helpers.py  # Utility functions (save, validate, serialize)
│           └── experts.py  # Agent delegation logic
├── data/
│   ├── input/  # Input data files
│   ├── interim/  # Intermediate cache (YAML & text outputs)
│   ├── output/  # Final markdown reports (*.md)
│   └── example/  # Example outputs
├── logs/  # Execution logs (JSON + text)
├── .env  # Environment variables (API keys) - create from .env.template
└── .env.template  # Template for environment variables with all required keys
- Input: Gene name (e.g., "NRF2")
- Orchestration: Query Agent delegates to 6 specialists
- Collection: Each agent queries its specialized databases
- Integration: Query Agent synthesizes findings
- Output: Markdown report with full citations
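The five stages above can be sketched as a small delegation loop. This is an illustration under stated assumptions, not the just-agents API or the project's code: `run_pipeline` and `make_stub` are hypothetical, and each specialist is replaced by a stub function that returns a canned string instead of querying a database.

```python
from typing import Callable, Dict

# A specialist is modelled as a function from gene name to a text finding.
Specialist = Callable[[str], str]


def make_stub(name: str) -> Specialist:
    """Create a placeholder specialist (a real agent would query its sources)."""
    return lambda gene: f"{name} findings for {gene}"


def run_pipeline(gene: str, specialists: Dict[str, Specialist]) -> str:
    # Orchestration + collection: ask every specialist about the gene.
    findings = {name: agent(gene) for name, agent in specialists.items()}
    # Integration + output: merge findings into markdown with source attribution.
    lines = [f"# {gene} - Sequence to Function Analysis"]
    lines += [f"- {text} [source: {name}]" for name, text in findings.items()]
    return "\n".join(lines)


SPECIALISTS = {
    name: make_stub(name)
    for name in ["google", "literature", "structure", "biomart", "opengenes", "omnipath"]
}

report = run_pipeline("NRF2", SPECIALISTS)
print(report)
```

Keeping each specialist behind a narrow interface is what lets the orchestrator receive only the findings rather than each agent's full tool context.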
# Run default NRF2 analysis
uv run forest analyze-gene NRF2
# Or use the default (NRF2)
uv run forest analyze-gene
The default NRF2 analysis performs:
- BioMART lookup: Human ENSG00000116044 (NFE2L2), mouse/rat orthologs
- Literature search: 500+ papers on NRF2 function and variants
- OpenGenes query: NRF2 association with longevity and aging
- Structure analysis: Domains, AlphaFold confidence, PDB codes
- OmniPath query: Antioxidant response elements, pathway context
- Integration: Cross-referenced findings with source attribution
The output is saved to data/output/NRF2_TIMESTAMP.md.
- just-agents >= 0.8.8: Multi-agent framework
- typer: CLI framework
- python-dotenv: Environment configuration
- win-unicode-console: Windows UTF-8 support
See pyproject.toml for complete dependency list.
This project uses two LLM providers: Anthropic (Claude) and Google Gemini. Different agents use different models:
- Anthropic Claude: Used by literature_agent, structure_agent, biomart_agent (Haiku), and query_agent (Sonnet)
- Google Gemini: Used by google_agent, opengenes_agent, and omnipath_agent (Gemini 2.5 Pro)
- Copy the template file:
cp .env.template .env
- Edit .env and add your API keys. You need:
  - ANTHROPIC_API_KEY (required) - For Claude models
  - GEMINI_API_KEY (required) - For Gemini models
  Optional:
  - Google Cloud credentials (GOOGLE_CLOUD_PROJECT, GOOGLE_API_KEY, etc.) - For Vertex AI usage
See .env.template for the complete list of configuration options.
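Because each agent depends on one of the two required keys, a quick pre-flight check can report which agents would be unable to run. The sketch below is hypothetical: the `AGENT_KEYS` mapping is transcribed from the agent/provider lists in this README, and `agents_without_key` is an illustrative helper, not a function from the package.

```python
import os
from typing import Mapping

# Agent -> required API key, as described in this README.
AGENT_KEYS = {
    "literature": "ANTHROPIC_API_KEY",
    "structure": "ANTHROPIC_API_KEY",
    "biomart": "ANTHROPIC_API_KEY",
    "query": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
    "opengenes": "GEMINI_API_KEY",
    "omnipath": "GEMINI_API_KEY",
}


def agents_without_key(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return agents whose required key is missing or empty in `env`."""
    return sorted(agent for agent, key in AGENT_KEYS.items() if not env.get(key))


# Example with only the Anthropic key set (placeholder value, not a real key).
print(agents_without_key({"ANTHROPIC_API_KEY": "placeholder"}))
# prints ['google', 'omnipath', 'opengenes']
```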
The environment variables are automatically loaded when running the CLI:
from dotenv import load_dotenv
load_dotenv()
Logs are automatically saved to the logs/ directory:
logs/
├── TIMESTAMP_XXXX.log # Text logs
└── TIMESTAMP_XXXX.json.log # JSON formatted logs
To enable debug output, use the --debug flag:
uv run forest analyze-gene NRF2 --debug
This will show tool distribution across agents and other debugging information.
Cached intermediate results are stored in data/interim/:
interim/
├── *_result.txt # Agent output text
└── *.yaml # Agent memory (YAML serialized)
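To see what is currently cached, the two file patterns above can be listed with the standard library alone. This is a minimal sketch; `list_interim` is a hypothetical helper, and the default directory assumes the command is run from the repository root.

```python
from pathlib import Path


def list_interim(interim_dir: str = "data/interim") -> dict[str, list[str]]:
    """Group cached files by kind, using the two patterns shown above."""
    root = Path(interim_dir)
    return {
        "results": sorted(p.name for p in root.glob("*_result.txt")),
        "memory": sorted(p.name for p in root.glob("*.yaml")),
    }


if __name__ == "__main__":
    for kind, names in list_interim().items():
        print(f"{kind}: {names}")
```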
Use helper functions to inspect:
from longevity_forest.core.helpers import serialize_memory_to_yaml, serialize_content
To add a new agent:
- Add agent profile to config/agents/web_search_delegated.yaml
- Define system prompt in config/prompts.py
- Configure tools/MCPs in config/mcp.py
- Query Agent will automatically delegate to new agent
All results are validated before completion:
if is_valid:
    print(f"✓ Query result successfully saved and validated: {filepath}")
else:
    print(f"⚠ Query result saved but validation had issues: {filepath}")
Validation checks include:
- Markdown syntax integrity
- UTF-8 encoding correctness
- File write success
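The three checks above could look roughly like the following. This is a sketch under assumptions, not the validation code in `core/helpers.py`: `validate_report` is a hypothetical name, and "Markdown syntax integrity" is approximated here by a single test that code fences are balanced.

```python
from pathlib import Path

FENCE = "`" * 3  # a markdown code-fence marker


def validate_report(filepath: str) -> bool:
    """Return True if the saved report passes three basic checks."""
    path = Path(filepath)
    # File write success: the file must exist and be non-empty.
    if not path.is_file() or path.stat().st_size == 0:
        return False
    # UTF-8 encoding correctness: the bytes must decode cleanly.
    try:
        text = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Markdown integrity (approximation): every opened fence is closed.
    fence_lines = sum(1 for line in text.splitlines() if line.lstrip().startswith(FENCE))
    return fence_lines % 2 == 0
```

The return value plays the role of `is_valid` in the snippet above.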
- Gene function analysis: sequence-to-function relationships
- Variant impact assessment for genetic variants
- Longevity research: ageing-related genes and pathways
- Drug target analysis for protein targets and interactions
- Protein degradation design (GPU): design targeted protein degraders using hunt-protein
- In-silico knockout (GPU, currently limited to a predefined gene set): simulate gene knockout effects at cellular level
- Literature mining and research synthesis
- Structural bioinformatics: combining sequence and 3D structure data
- Context efficiency: 73–88% reduction compared to monolithic agents
- Token usage: roughly 3–5K tokens per gene analysis (vs 10–15K for monolithic setups)
- Execution time: typically 2–10 minutes depending on sources and gene complexity
- Automatic continuation for incomplete responses
- Cross-source validation to reduce hallucinations
- Verify agent is defined in src/longevity_forest/config/agents/web_search_delegated.yaml
- Check agent is loaded in the src/longevity_forest/main.py agents list
- Wait before re-running
- Use cached intermediate results
- Consider parallel vs sequential agent calls
- System automatically continues generation
- Check logs in the logs/ directory for details
- Increase continuation attempts if needed
- System automatically reconfigures stdout/stderr to UTF-8
- Verify Windows locale settings support Unicode
- just-agents Documentation
- Model Context Protocol (MCP)
- BioMART
- OpenGenes
- OmniPath
- InterPro
- STRING Database
For detailed information about the system architecture, see the agent configuration files:
- Agent profiles: src/longevity_forest/config/agents/web_search_delegated.yaml
- Agent prompts: src/longevity_forest/config/prompts.py
- MCP configurations: src/longevity_forest/config/mcp.py
- Tool mappings: docs/GENE_ANALYSIS_TOOL_MAPPING.md
To extend this system:
- Add new agent: Modify src/longevity_forest/config/agents/web_search_delegated.yaml + add prompt
- Add new database: Create MCP in src/longevity_forest/config/mcp.py
- Modify analysis prompt: Edit src/longevity_forest/config/prompts.py functions
- Change output format: Modify report generation in agents
See LICENSE file for details.
When publishing results based on this system:
- cite the original data sources
- verify important findings against the underlying literature
- treat the agent outputs as assistance for expert analysis, not a substitute for it
For agent behaviour configuration, see src/longevity_forest/config/prompts.py. For tool mappings, see docs/GENE_ANALYSIS_TOOL_MAPPING.md.
