Gene Feature Enumeration
sjmack edited this page Feb 21, 2015 · 2 revisions
A proposal has been made for a system of enumerated gene features (untranslated regions [UTRs], exons and introns) as an extension of the HLA allele nomenclature (http://biorxiv.org/content/early/2015/02/15/015222).
We expanded and refined the elements of the original GFE proposal, as summarized in GFE_update_02202015.pdf:
- Change GFE notation for partial sequences from a decimal (e.g., 8.443) enumeration to a separate enumeration of partial sequences denoted with p, for 'partial' (e.g., p1, p2, p3). A partial sequence is defined as a sequence that is not full-length for a given feature due to a limitation of the typing methodology (e.g., different primer locations). Since a partial sequence can potentially match multiple full-length feature sequences, it may not be valid to identify a given partial sequence as a short version of a particular full-length feature.
- Treat unavailable/untyped/untested sequence for a feature as a partial sequence, and denote it as p0. Essentially, an unavailable sequence is a potential match to all full-length feature sequences.
- Treat indels as sequence variants and enumerate them as full sequences; these sequences are not full length for a given feature because of biological variation.
- Similarly treat deleted features as legitimate sequence variants and enumerate them as full sequences.
- Treat duplications of sequence features (e.g., two intron 1 (i1) and exon 2 (e2) sequences) in a single gene as nucleotide variants of the second duplicated feature; see GFE_update_02202015.pdf. If i1 and e2 are duplicated (e.g., 5'UTR~e1~i1~e2~i1~e2~3'UTR), treat the second i1~e2 as part of the sequence of the first e2. This maintains the field structure for each gene.
- Change the delimiter from colons (:) to semi-colons (;) to further distinguish GFE notation from allele names.
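The notation changes listed above can be illustrated with a minimal Python sketch. It is not part of the proposal: the function names `format_field` and `format_gfe`, and the convention of passing `None` for an untyped feature, are assumptions made only for illustration. The sketch covers just the per-feature fields (full-length numbers, `p`-prefixed partial enumerations, `p0` for unavailable sequence) joined with the semi-colon delimiter.

```python
from typing import Sequence, Union

# Hypothetical input representation (an assumption of this sketch):
#   int   -> enumeration of a full-length feature sequence (1, 2, 3, ...)
#   "pN"  -> enumeration of a partial feature sequence ("p1", "p2", ...)
#   None  -> unavailable/untyped/untested feature, written as "p0"
Field = Union[int, str, None]


def format_field(value: Field) -> str:
    """Render one feature enumeration as a GFE field."""
    if value is None:
        return "p0"  # untyped sequence is treated as a partial sequence
    if isinstance(value, int):
        if value < 1:
            raise ValueError("full-length enumerations start at 1")
        return str(value)
    if isinstance(value, str) and value.startswith("p") and value[1:].isdigit():
        return value  # already a partial enumeration such as "p2"
    raise ValueError(f"unrecognized enumeration: {value!r}")


def format_gfe(fields: Sequence[Field], delimiter: str = ";") -> str:
    """Join the ordered feature fields with the semi-colon delimiter."""
    return delimiter.join(format_field(f) for f in fields)


# Five features in gene order: two full-length, one untyped,
# one partial, one full-length.
print(format_gfe([1, 3, None, "p2", 7]))  # 1;3;p0;p2;7
```

Keeping one field per feature, even when a feature is untyped, is what preserves the fixed field structure for each gene.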
We also discussed ways to implement an effective GFE service, and apparent obstacles to an effective service.
- It is not clear how the respective 5' and 3' ends of the 5' and 3' UTRs are defined in the IMGT/HLA Database. The basis of such definitions needs to be clarified for the purpose of defining a full-length UTR sequence.
- To distinguish short feature sequences that are legitimate length variants from partial sequences, the service will need to inspect short sequences for indels via comparison to a reference sequence.
- To persist enumerations (and therefore GFE notations) across IMGT/HLA Database release updates, all numbered full-length and partial GFEs should first be re-evaluated against the new database annotations. New, extended or deleted sequences in that database release are evaluated only after all extant enumerations have been evaluated, and are then assigned new (higher-numbered) full-length and partial enumerations.
- It would be effective to hash each feature sequence, and then enumerate each unique hashed sequence.
- Each hashed sequence feature, and its associated enumeration, should be maintained in the GFE service, even if it appears to have been superseded by a change in the reference database.
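The hashing and persistence points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of an actual GFE service: the class name `FeatureRegistry`, its methods, the feature labels, and the choice of SHA-256 are all hypothetical, since the discussion did not name a hash function or a storage design.

```python
import hashlib
from typing import Dict, Tuple


class FeatureRegistry:
    """Hypothetical registry mapping hashed feature sequences to enumerations."""

    def __init__(self) -> None:
        # (feature, is_partial, digest) -> assigned number; never deleted,
        # so an enumeration survives even if the reference database changes.
        self._numbers: Dict[Tuple[str, bool, str], int] = {}
        # (feature, is_partial) -> highest number assigned so far
        self._highest: Dict[Tuple[str, bool], int] = {}

    @staticmethod
    def digest(sequence: str) -> str:
        # Normalize case and whitespace so the same sequence always hashes
        # identically. SHA-256 is an assumption of this sketch.
        normalized = "".join(sequence.split()).upper()
        return hashlib.sha256(normalized.encode("ascii")).hexdigest()

    def enumerate(self, feature: str, sequence: str, partial: bool = False) -> str:
        """Return the existing enumeration, or assign the next higher one."""
        key = (feature, partial, self.digest(sequence))
        if key not in self._numbers:
            counter = (feature, partial)
            self._highest[counter] = self._highest.get(counter, 0) + 1
            self._numbers[key] = self._highest[counter]
        number = self._numbers[key]
        return f"p{number}" if partial else str(number)


registry = FeatureRegistry()
print(registry.enumerate("exon2", "ACGT"))               # 1
print(registry.enumerate("exon2", "ACGA"))               # 2
print(registry.enumerate("exon2", "acgt"))               # 1 (same sequence)
print(registry.enumerate("exon2", "ACG", partial=True))  # p1
```

Because entries are only ever added, sequences seen in earlier releases keep their numbers, and sequences first seen in a later release receive higher numbers. Full-length and partial sequences are counted separately, matching the separate `p` enumeration.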