This is the first ClickHouse-ready release of the IGVF Catalog backend.
Notable changes:
regulatory_regions renamed to genomic_elements. BREAKING CHANGE
numeric field names containing :long no longer include this string. BREAKING CHANGE
underscores replaced with dashes in API endpoints. BREAKING CHANGE
Full rewrite of loading to JSONL
ClickHouse database in alpha release: https://datastore.catalog.igvf.org/
partial release of Starita VAMP-seq data
All edges have “name” and “inverse name” fields representing the semantic nature of the connection
Bug fixes and API updates
- variants/coding-variants API endpoint added
- coding-variants/phenotypes (VAMP-seq data, only for CYP2C19) API endpoint added
- pathways and pathways/pathways API endpoints added
- genes/predictions API endpoint added
- variant rsid added to /variants/phenotypes response
- motif endpoints return protein complexes
- added ClinGen allele registry numbers to variant nodes
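Client code written against earlier releases will need updating for the breaking changes above. The following is a minimal sketch of one way to adapt old endpoint paths and old field names, not an official migration tool. The helper names, the example path /regulatory_regions/genes, and the example key start:long are hypothetical, and the sketch assumes that :long appears at the end of a field name and that the rename and the dash rule combine to give paths such as /genomic-elements/genes. Check the API documentation for the exact endpoint and field names.

```python
LONG_MARKER = ":long"


def migrate_path(path: str) -> str:
    """Rewrite an old-style endpoint path to the assumed v0.4b form.

    Only pass the path portion of a URL; query strings are not handled.
    """
    # Apply the collection rename first, then swap underscores for dashes.
    renamed = path.replace("regulatory_regions", "genomic_elements")
    return renamed.replace("_", "-")


def strip_long_marker(record: dict) -> dict:
    """Return a copy of the record with a trailing ':long' removed from keys."""
    cleaned = {}
    for key, value in record.items():
        if key.endswith(LONG_MARKER):
            key = key[: -len(LONG_MARKER)]
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    # Hypothetical path and record, for illustration only.
    print(migrate_path("/regulatory_regions/genes"))
    print(strip_long_marker({"start:long": 100, "end:long": 200, "chr": "chr1"}))
```

Keeping the two rewrites in separate functions lets a client apply the path change to outgoing requests and the key change to stored records independently.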
LONG VERSION:
Release notes - Catalog - v0.4b
Story
DSERV-504 change coding_variants name/API
DSERV-507 coding_variants_proteins edge names
DSERV-508 complexes_proteins edge names
DSERV-510 GO annotations (complexes_terms, go_terms_annotations) edge names
DSERV-511 diseases_genes edge names
DSERV-512 genes_genes & mm_genes_mm_genes edge names
DSERV-514 genes_mm_genes edge names
DSERV-515 genes_pathways edge names
DSERV-516 genes_terms (depMap, rename to genes_biosamples) edge names
DSERV-517 genes_transcripts, transcripts_proteins edge names
DSERV-519 motifs_proteins edge names
DSERV-521 ontology_terms edge names
DSERV-523 pathways_pathways edge names
DSERV-524 proteins_proteins edge names
DSERV-527 fix all regulatory_region biosample hyper edges
DSERV-531 variants_coding_variants edge names
DSERV-532 variants_diseases + _genes edge names
DSERV-533 variants_drugs +_genes edge names
DSERV-534 variants_genes edge names + _terms (biosamples) (QTLs)
DSERV-535 variants_phenotypes (+_studies) GWAS edge names
DSERV-536 variants_proteins (+terms/biosamples) pQTLs/ASB edge names
DSERV-538 variants_variants edge names
DSERV-541 Add ClinGen allele registry to variants JSONL
DSERV-545 data loading inconsistency in eQTL, sQTL, caQTL and pQTL
DSERV-558 Create /variants/coding_variants endpoint
DSERV-564 Make gene query filters consistent for gene edge endpoints
DSERV-565 Add query filters for proteins/proteins endpoint
DSERV-567 Add biosample filter for regulatory_regions_genes endpoints
DSERV-569 set up monitoring for catalog servers
DSERV-570 load clickhouse db from JSONL
DSERV-571 test load arangodb from JSONL
DSERV-580 deduplicate pathway data
DSERV-582 ESLint not working from pre-commit
DSERV-589 Find adapters that use biocypher yield pattern and replace with JSON writing and loading
DSERV-590 gene related endpoints should avoid override behavior
DSERV-595 Create /genes/predictions endpoint similar to /variants/predictions
DSERV-596 Create aggregate allele frequency per region endpoint
DSERV-597 Build API for genes structure
DSERV-598 need edges from gene_structure collections to transcripts
DSERV-601 load SEM predictions from Boyle lab
DSERV-605 Create API for pathway
DSERV-609 update organism field for pathways, pathways_pathways, and genes_pathways.
DSERV-611 Load JSONLs into S3 for each collection and create a source file with data.igvf.org links for each dataset
DSERV-613 API for genes_pathways
DSERV-614 API for pathways_pathways
DSERV-615 load coding variants abundance scores from Starita
DSERV-619 adjust motifs/proteins endpoints to accommodate complexes
DSERV-623 remove proteins_transcripts from transcripts_proteins collection
DSERV-631 adjust variants/proteins endpoints to accommodate complexes
DSERV-632 Include variant rsid in the /variants/phenotype response
DSERV-637 create system for release tags or branches
DSERV-640 Refactor SEM motif and SEM prediction adapters.
DSERV-666 get rid of :long fields in adapters
DSERV-674 reload starita data while enumerating amino acid variants based on 2x and 3x mutations
DSERV-680 get rid of :long fields in APIs
DSERV-705 update index used in API
DSERV-717 Rename regulatory regions API endpoints to genomic elements
DSERV-729 import genomic elements data into clickhouse
DSERV-740 simple coding_variants/phenotypes API
Task
DSERV-618 document JSONL loading process accounts, buckets, instances
Epic
DSERV-649 CATALOG: actually load all arango collections from JSONL
Bug
DSERV-566 Fix orphanet_association_type filter for diseases/genes endpoint
DSERV-591 remove unused gnomad adapter
DSERV-592 ReactomePathway adapter failing due to 404 from Reactome API
DSERV-606 Variants/predictions endpoint failing for retired genes AND missing pagination query param
DSERV-621 genes/diseases working differently on filters from clinGen vs orphanet
DSERV-624 fix limit bug in query code
DSERV-625 in API for coding_variants coding_variants/variants changes
DSERV-629 replace underscores with dash in API
DSERV-702 fix collection variants_proteins and variants_genes index
DSERV-732 coding_variants_proteins collection points to ENSP not UNIPROT