Welcome to the Alliance of Genome Resources (Alliance) open data repository on AWS. This documentation helps you access and use comprehensive genomic, genetic, and molecular data from multiple model organisms.
The Alliance of Genome Resources is a consortium integrating data from leading model organism databases:
- Drosophila melanogaster and other Drosophila species
- Caenorhabditis elegans
- Danio rerio (zebrafish)
- Mus musculus (mouse)
- Rattus norvegicus (rat)
- Saccharomyces cerevisiae (yeast)
- Xenopus laevis and Xenopus tropicalis (frogs)
- Homo sapiens (human reference data)
Mission: Provide unified, high-quality genomic data to accelerate biological research and human disease understanding.
Visit the Alliance downloads page or explore the S3 buckets:
FlyBase Data (Public HTTPS):
# Browse via wget
wget -qO- https://s3ftp.flybase.org/releases/current/precomputed_files/ | grep '.tsv.gz'

Alliance Data (S3):
# List Alliance releases
aws s3 ls s3://mod-datadumps/ --no-sign-request
# List disease data in latest release
aws s3 ls s3://mod-datadumps/8.3.0/DISEASE-ALLIANCE/ --no-sign-request

# Download gene annotation IDs
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz
# Decompress
gunzip fbgn_annotation_ID_current.tsv.gz
# View
head fbgn_annotation_ID_current.tsv

Check out TUTORIAL.md for step-by-step guides on:
- Downloading and accessing data
- Gene annotation lookups
- Disease gene discovery
- Expression analysis
- Building interaction networks
- Cross-species orthology mapping
Complete reference guide covering:
- Dataset Overview - Scale, update frequency, data categories
- S3 Bucket Structure - Directory organization and file naming
- Data Categories - Detailed descriptions of all data types:
  - Gene annotations
  - Expression data (bulk RNA-Seq, single-cell RNA-Seq)
  - Disease associations
  - Molecular and genetic interactions
  - Orthology relationships
  - Variants and alleles
  - Genome sequences and annotations
- Access Methods - S3, HTTPS, Python boto3, PostgreSQL, API
- File Formats - TSV, JSON, FASTA, GFF3, GTF, VCF, MITAB
- Common Use Cases - Real-world examples
Hands-on tutorials with working code examples:
- Getting Started - Download your first dataset (10 min)
- Gene Annotation Lookups - ID conversion and batch processing (15 min)
- Disease Gene Discovery - Find disease-associated genes (20 min)
- RNA-Seq Expression Analysis - Analyze developmental expression (25 min)
- Protein Interaction Networks - Build and visualize PPI networks (20 min)
- Cross-Species Orthology - Map genes between species (15 min)
- Gene Annotations - IDs, symbols, descriptions, mappings
- Gene Ontology - Molecular function, biological process, cellular component
- Gene Groups - Pathway and functional groupings
- Bulk RNA-Seq - RPKM/FPKM expression matrices across tissues and stages
- Single-Cell RNA-Seq - Cell cluster expression from multiple datasets
- Curated Expression - Spatiotemporal expression patterns with ontology terms
- Disease Associations - Links to human diseases (Disease Ontology)
- Phenotypes - Mutant and variant phenotype annotations
- Human Disease Models - Model organism connections to human disease
- Physical Interactions - Protein-protein, protein-RNA, RNA-RNA (PSI-MI TAB format)
- Genetic Interactions - Suppression, enhancement, synthetic lethality
- Orthology - Cross-species gene relationships with DIOPT scores
- Paralogy - Within-species gene duplications
- VCF Files - Genomic variants in standard format
- Allele Annotations - Detailed allele and variant descriptions
- Genotype-Phenotype - Links between genetic changes and phenotypes
- FASTA - Chromosomes, genes, transcripts, proteins
- GFF3/GTF - Genome annotations for analysis pipelines
- Transposable Elements - TE sequences and insertion sites
- Alliance Portal: https://www.alliancegenome.org
- Downloads Page: https://www.alliancegenome.org/downloads
- FTP Browser: https://s3ftp.flybase.org/releases/current/
# Anonymous access - no AWS account needed
# FlyBase data
aws s3 ls s3://s3ftp.flybase.org/releases/current/precomputed_files/genes/ --no-sign-request
aws s3 cp s3://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz . --no-sign-request
# Alliance data
aws s3 ls s3://mod-datadumps/8.3.0/DISEASE-ALLIANCE/COMBINED/ --no-sign-request
aws s3 cp s3://mod-datadumps/8.3.0/DISEASE-ALLIANCE/COMBINED/DISEASE-ALLIANCE_COMBINED_2.tsv.gz . --no-sign-request

# wget
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz
# curl
curl -O https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz

import boto3
from botocore import UNSIGNED
from botocore.client import Config
# Create S3 client with anonymous access
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
# Download FlyBase data
s3.download_file('s3ftp.flybase.org',
                 'releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz',
                 'fbgn_annotation_ID_current.tsv.gz')
# Download Alliance data
s3.download_file('mod-datadumps',
                 '8.3.0/DISEASE-ALLIANCE/COMBINED/DISEASE-ALLIANCE_COMBINED_2.tsv.gz',
                 'DISEASE-ALLIANCE_COMBINED_2.tsv.gz')

# Public read-only access
psql -h chado.flybase.org -U flybase flybase

# Get gene information
curl https://www.alliancegenome.org/api/gene/FBgn0000001
# Search genes
# Quote the URL so the shell does not treat "&" as a background operator
curl 'https://www.alliancegenome.org/api/search?category=gene&q=white'

Alliance files follow predictable patterns:
<data_type>_<details>_current.tsv.gz
dmel-all-<feature>-current.fasta.gz
Examples:
- fbgn_annotation_ID_current.tsv.gz
- gene_rpkm_matrix_current.tsv.gz
<data_type>_<details>_fb_YYYY_MM.tsv.gz
dmel-all-<feature>-rX.YY.fasta.gz
Examples:
- fbgn_annotation_ID_fb_2023_06.tsv.gz (FB2023_06 release)
- dmel-all-chromosome-r6.55.fasta.gz (genome release 6.55)
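The naming patterns above can be checked mechanically. The following is a minimal sketch, assuming every file of interest fits one of the two patterns shown; the regular expressions and the helper name `parse_file_name` are illustrative and not part of any Alliance or FlyBase tooling.

```python
import re

# Patterns transcribed from the naming conventions above. They are an
# assumption for illustration: real releases may contain names that fit neither.
TSV_NAME = re.compile(r"(?P<stem>.+)_(?P<release>current|fb_\d{4}_\d{2})\.tsv\.gz")
FASTA_NAME = re.compile(
    r"dmel-all-(?P<feature>.+)-(?P<release>current|r\d+\.\d+)\.fasta\.gz"
)


def parse_file_name(name):
    """Return (stem_or_feature, release) for a recognized name, else None."""
    match = TSV_NAME.fullmatch(name)
    if match:
        return match.group("stem"), match.group("release")
    match = FASTA_NAME.fullmatch(name)
    if match:
        return match.group("feature"), match.group("release")
    return None


# Try the helper on the example names listed above.
for example in (
    "fbgn_annotation_ID_current.tsv.gz",
    "fbgn_annotation_ID_fb_2023_06.tsv.gz",
    "dmel-all-chromosome-r6.55.fasta.gz",
):
    print(example, "->", parse_file_name(example))
```

A name that matches neither pattern returns `None`, which makes unexpected files easy to spot when scanning a directory listing.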
Release Schedule:
- Major Releases: Quarterly (every ~3 months)
- Hot Fixes: As needed for critical corrections
- Continuous Updates: Daily for time-sensitive annotations
Versioning:
- FlyBase releases: FB[YEAR]_[MONTH] (e.g., FB2023_06)
- Genome releases: r[MAJOR].[MINOR] (e.g., r6.55)
Current vs. Archived:
- /releases/current/ - Always points to the latest release
- /releases/FB2023_06/ - Specific archived release
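For reproducible scripts it can help to make the release an explicit parameter rather than hard-coding `current`. The sketch below builds HTTPS URLs from the directory layout described above; `release_url` and `RELEASE_TAG` are hypothetical names, and the sketch assumes archived releases mirror the layout of `/releases/current/`, which should be verified against the actual listing.

```python
import re

# Base URL taken from the download examples in this document.
BASE_URL = "https://s3ftp.flybase.org/releases"

# Release tags of the form FB[YEAR]_[MONTH], e.g. FB2023_06.
RELEASE_TAG = re.compile(r"FB\d{4}_\d{2}")


def release_url(relative_path, release="current"):
    """Build a URL under /releases/<release>/ for the given relative path.

    `release` is either "current" or a tag such as "FB2023_06".
    """
    if release != "current" and not RELEASE_TAG.fullmatch(release):
        raise ValueError(f"unrecognized release tag: {release!r}")
    return f"{BASE_URL}/{release}/{relative_path.lstrip('/')}"


print(release_url("precomputed_files/genes/"))
print(release_url("precomputed_files/genes/", release="FB2023_06"))
```

Note that file names inside an archived release carry the release suffix (for example `_fb_2023_06`) instead of `_current`, so the relative path usually changes along with the release tag.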
- Convert gene symbols to database IDs
- Retrieve gene descriptions and annotations
- Find genes in specific pathways or GO terms
- Identify genes associated with human diseases
- Find model organism disease models
- Map disease genes to orthologs
- Compare gene expression across developmental stages
- Analyze tissue-specific expression
- Explore single-cell expression patterns
- Build protein-protein interaction networks
- Analyze genetic interactions
- Find interaction partners for proteins of interest
- Map orthologs between species
- Find conserved genes and pathways
- Compare genomic features across model organisms
- Access genomic variants in VCF format
- Link variants to phenotypes
- Study allele effects
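As an illustration of the first use case listed above (converting gene symbols to database IDs), here is a minimal sketch that works on already-parsed table rows. The column positions are assumptions (symbol in the first column, primary ID in the third) and should be checked against the header comment lines of the file you actually download; the rows in the demo are placeholders, not real annotations.

```python
def build_symbol_index(rows, symbol_col=0, id_col=2):
    """Map gene symbol -> primary ID from parsed TSV rows.

    The default column positions are assumptions; adjust them to the file.
    """
    index = {}
    for row in rows:
        if len(row) <= max(symbol_col, id_col):
            continue  # skip short or malformed rows
        # Keep the first ID seen for a symbol rather than overwriting it.
        index.setdefault(row[symbol_col], row[id_col])
    return index


def convert_symbols(symbols, index):
    """Return (found, missing): a symbol -> ID dict and the unmatched symbols."""
    found = {symbol: index[symbol] for symbol in symbols if symbol in index}
    missing = [symbol for symbol in symbols if symbol not in index]
    return found, missing


# Placeholder rows standing in for a parsed annotation table.
demo_rows = [
    ["geneA", "Dmel", "ID_A"],
    ["geneB", "Dmel", "ID_B"],
    ["truncated"],
]
demo_index = build_symbol_index(demo_rows)
print(convert_symbols(["geneA", "geneC"], demo_index))
```

Returning the unmatched symbols separately makes it easier to spot synonyms or outdated names that need manual review.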
| Format | Description | Use Case |
|---|---|---|
| TSV | Tab-separated values | General data tables |
| JSON | JavaScript Object Notation | Structured data, API responses |
| FASTA | Sequence data | Genomic/protein sequences |
| GFF3 | Genome annotations | Genome browsers, analysis |
| GTF | Gene Transfer Format | RNA-Seq pipelines |
| VCF | Variant Call Format | Variant analysis |
| MITAB | PSI-MI TAB | Protein interactions |
| XML | Chado XML | Complete database dumps |
All compressed files use gzip (.gz extension).
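Because the files are gzip-compressed, they can be read directly without running `gunzip` first. The sketch below streams rows from a compressed TSV using only the Python standard library; it assumes comment lines start with `#`, as in the command-line examples in this document, and `iter_tsv_rows` is a hypothetical helper name.

```python
import csv
import gzip
import io


def iter_tsv_rows(source, comment_prefix="#"):
    """Yield rows (lists of strings) from a gzip-compressed TSV.

    `source` may be a file path or a binary file object. Blank lines and
    lines starting with `comment_prefix` are skipped.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        data_lines = (
            line
            for line in handle
            if line.strip() and not line.startswith(comment_prefix)
        )
        # QUOTE_NONE keeps stray quote characters inside fields intact.
        yield from csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)


# Offline demo with placeholder content compressed in memory.
demo = gzip.compress(b"## placeholder header\ngeneA\tID_A\n\ngeneB\tID_B\n")
for row in iter_tsv_rows(io.BytesIO(demo)):
    print(row)
```

With a downloaded file, the same function can be called with a path, for example `iter_tsv_rows("fbgn_annotation_ID_current.tsv.gz")`.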
- Web browser
- No special software needed
- Linux, Mac, or Windows (WSL)
- wget or curl
- gunzip (usually pre-installed)
Python:
- Python 3.7+
- pandas, boto3, matplotlib, biopython
R:
- R 4.0+
- tidyverse, data.table
Tools:
- AWS CLI (optional, for S3 access)
- IGV, JBrowse (genome visualization)
- Cytoscape (network analysis)
Primary Citation:
Alliance of Genome Resources Consortium. Alliance of Genome Resources Portal: unified model organism research platform. Nucleic Acids Research (2023). https://doi.org/10.1093/nar/gkac1003
Most Alliance data is available under CC0 1.0 Universal (Public Domain Dedication). Some datasets may use CC-BY 4.0 (attribution required).
License Details: https://www.alliancegenome.org/terms-of-use
When publishing research using Alliance data:
- Cite the Alliance consortium paper (above)
- Include release version numbers for reproducibility
- Acknowledge specific data sources when applicable
- Link to https://www.alliancegenome.org in web applications
- Data Documentation: DATA_DOCUMENTATION.md
- Tutorials: TUTORIAL.md
- Alliance Homepage: https://www.alliancegenome.org
- API Docs: https://www.alliancegenome.org/api/swagger-ui
- Email: [email protected]
- GitHub Issues: https://github.com/alliance-genome/agr_java_software/issues
- Community Forum: https://community.alliancegenome.org/categories
- Facebook: https://www.facebook.com/AllianceOfGenomeResources
- Mastodon: https://genomic.social/@AllianceGenome
- Bluesky: https://bsky.app/profile/alliancegenome.bsky.social
This dataset is part of the AWS Open Data Sponsorship Program, which provides free hosting for publicly available high-value datasets.
Benefits:
- ✓ No AWS account required for downloads
- ✓ No data egress fees
- ✓ High-speed S3 access
- ✓ Global availability
- ✓ Automatic backups and archiving
Registry Entry: https://registry.opendata.aws/alliance-genome-resources/
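Since no AWS account is required, bucket contents can also be listed programmatically with unsigned requests. The following is a sketch, assuming boto3 is installed and that the buckets allow anonymous listing (as the `--no-sign-request` CLI examples in this document suggest); the helper names `anonymous_s3_client` and `iter_objects` are hypothetical.

```python
def anonymous_s3_client():
    """Create an S3 client that sends unsigned (anonymous) requests."""
    # Imported here so the listing helper below can be used and tested
    # with any client-like object, even where boto3 is not installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def iter_objects(client, bucket, prefix=""):
    """Yield (key, size_in_bytes) for every object under `prefix`.

    Uses the list_objects_v2 paginator so listings longer than one page
    (1000 keys) are followed automatically.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

For example, `iter_objects(anonymous_s3_client(), "mod-datadumps", "8.3.0/DISEASE-ALLIANCE/")` would walk the same prefix as the `aws s3 ls` command shown earlier, assuming that release is still present in the bucket.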
| Data Type | File | Location |
|---|---|---|
| Gene IDs | fbgn_annotation_ID_*.tsv.gz | precomputed_files/genes/ |
| Expression Matrix | gene_rpkm_matrix_*.tsv.gz | precomputed_files/genes/ |
| Disease Data | disease_model_annotations_*.tsv.gz | precomputed_files/disease/ |
| Interactions | physical_interactions_mitab_*.tsv.gz | precomputed_files/interactions/ |
| Orthologs | dmel_human_orthologs_disease_*.tsv.gz | precomputed_files/orthologs/ |
| Genome Sequence | dmel-all-chromosome-*.fasta.gz | genomes/.../fasta/ |
| Genome Annotation | dmel-all-*.gff.gz | genomes/.../gff/ |
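The wildcard patterns in the table above can be used directly to classify file names, for instance when filtering a directory listing. This is a small sketch using shell-style matching from the standard library; the pattern list is copied from the table, and `data_type_for` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# (data type, file name pattern) pairs copied from the quick-reference table.
QUICK_REFERENCE = [
    ("Gene IDs", "fbgn_annotation_ID_*.tsv.gz"),
    ("Expression Matrix", "gene_rpkm_matrix_*.tsv.gz"),
    ("Disease Data", "disease_model_annotations_*.tsv.gz"),
    ("Interactions", "physical_interactions_mitab_*.tsv.gz"),
    ("Orthologs", "dmel_human_orthologs_disease_*.tsv.gz"),
    ("Genome Sequence", "dmel-all-chromosome-*.fasta.gz"),
    ("Genome Annotation", "dmel-all-*.gff.gz"),
]


def data_type_for(file_name):
    """Return the data type of the first matching pattern, or None."""
    for data_type, pattern in QUICK_REFERENCE:
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(file_name, pattern):
            return data_type
    return None


print(data_type_for("fbgn_annotation_ID_current.tsv.gz"))
print(data_type_for("dmel-all-chromosome-r6.55.fasta.gz"))
```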
# Download gene annotations via wget
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz
# Download FlyBase data using AWS CLI
aws s3 cp s3://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz . --no-sign-request
# Download Alliance data using AWS CLI
aws s3 cp s3://mod-datadumps/8.3.0/DISEASE-ALLIANCE/COMBINED/DISEASE-ALLIANCE_COMBINED_2.tsv.gz . --no-sign-request
# List files in FlyBase
aws s3 ls s3://s3ftp.flybase.org/releases/current/precomputed_files/genes/ --no-sign-request
# List files in Alliance
aws s3 ls s3://mod-datadumps/8.3.0/DISEASE-ALLIANCE/ --no-sign-request
# Decompress
gunzip fbgn_annotation_ID_current.tsv.gz
# View first 10 data rows (skip comment lines)
grep -v '^#' fbgn_annotation_ID_current.tsv | head -10

- Initial documentation release
- Comprehensive data organization guide
- Six hands-on tutorials
- AWS S3 access instructions
- Registry of Open Data submission
We welcome feedback and contributions:
- Report Issues: Use GitHub issues for bugs or documentation improvements
- Suggest Tutorials: Email [email protected] with tutorial ideas
- Share Use Cases: Tell us how you're using Alliance data
- Contribute Code: Submit pull requests for example scripts
- WormBase: https://www.wormbase.org
- ZFIN: https://zfin.org
- MGI: http://www.informatics.jax.org
- RGD: https://rgd.mcw.edu
- SGD: https://www.yeastgenome.org
- Xenbase: http://www.xenbase.org
- Gene Ontology: http://geneontology.org
- Disease Ontology: https://disease-ontology.org
- UniProt: https://www.uniprot.org
- NCBI: https://www.ncbi.nlm.nih.gov
- JBrowse: https://jbrowse.org
- IGV: https://software.broadinstitute.org/software/igv/
- Cytoscape: https://cytoscape.org
Alliance of Genome Resources
- Website: https://www.alliancegenome.org
- Email: [email protected]
- Community Forum: https://community.alliancegenome.org/categories
- Facebook: https://www.facebook.com/AllianceOfGenomeResources
- Mastodon: https://genomic.social/@AllianceGenome
- Bluesky: https://bsky.app/profile/alliancegenome.bsky.social
- GitHub: https://github.com/alliance-genome
Documentation Version: 1.0
Last Updated: 2025-10-17
Maintained by: Alliance of Genome Resources Consortium