Add GTF tabix adapter, inline GTF parser #5577
Open
cmdcolin wants to merge 2 commits into
GFF3: bump to gff-nostream@3.0.10, which fixes multi-segment CDS sharing one ID being dropped after the first segment. Add regression fixtures and tests (plain + tabix) for the discontinuous-feature idiom, which the existing Parent-only volvox data never exercised.

GTF: replace the heavyweight gtf-nostream dependency with an inlined parseGtf (parse + transcript_id grouping). Synthesized transcripts no longer leak the first CDS child's reading frame. Add GtfTabixAdapter (redispatch off spanning transcript/gene lines), a shared aggregateGtfFeatures helper, consistent attribute quote stripping, and route .gtf.gz through the tabix adapter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
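To make the "parse + transcript_id grouping" and quote-stripping steps concrete, here is a minimal sketch. It is not the PR's actual parseGtf: the names `GtfRecord`, `parseAttributes`, `parseGtfLine`, `SynthesizedTranscript`, and `groupByTranscriptId` are hypothetical, the input is assumed to be bare exon/CDS lines, and attribute values are assumed not to contain semicolons.

```typescript
// Hypothetical sketch; not the PR's parseGtf implementation.
interface GtfRecord {
  seqname: string
  source: string
  type: string
  start: number // 1-based inclusive, as written in the GTF line
  end: number
  strand: string
  frame: string // '.', '0', '1' or '2'
  attributes: Record<string, string>
}

// Parse the 9th column (`key "value"; key2 "value2";`) and strip the quotes.
// Assumption: values contain no ';' characters.
function parseAttributes(column: string): Record<string, string> {
  const attributes: Record<string, string> = {}
  for (const part of column.split(';')) {
    const trimmed = part.trim()
    const space = trimmed.indexOf(' ')
    if (trimmed === '' || space === -1) {
      continue // empty or valueless entry: skip rather than guess
    }
    const key = trimmed.slice(0, space)
    attributes[key] = trimmed.slice(space + 1).trim().replace(/^"|"$/g, '')
  }
  return attributes
}

// Parse one tab-separated GTF line; comments and short lines yield undefined.
function parseGtfLine(line: string): GtfRecord | undefined {
  if (line === '' || line.startsWith('#')) {
    return undefined
  }
  const cols = line.split('\t')
  if (cols.length < 9) {
    return undefined
  }
  const [seqname = '', source = '', type = '', start = '', end = '', , strand = '.', frame = '.', attrs = ''] = cols
  return {
    seqname,
    source,
    type,
    start: Number(start),
    end: Number(end),
    strand,
    frame,
    attributes: parseAttributes(attrs),
  }
}

// The synthesized parent deliberately has no frame field, so a child's
// reading frame cannot be copied onto it.
interface SynthesizedTranscript {
  id: string
  seqname: string
  start: number
  end: number
  strand: string
  children: GtfRecord[]
}

// Group records by transcript_id, widening the parent span to cover all children.
function groupByTranscriptId(records: GtfRecord[]): SynthesizedTranscript[] {
  const byId = new Map<string, SynthesizedTranscript>()
  for (const record of records) {
    const id = record.attributes['transcript_id']
    if (!id) {
      continue
    }
    const existing = byId.get(id)
    if (existing) {
      existing.start = Math.min(existing.start, record.start)
      existing.end = Math.max(existing.end, record.end)
      existing.children.push(record)
    } else {
      byId.set(id, {
        id,
        seqname: record.seqname,
        start: record.start,
        end: record.end,
        strand: record.strand,
        children: [record],
      })
    }
  }
  return Array.from(byId.values())
}
```

In this sketch the frame stays on each CDS child only, which is one way to avoid the leak described above; the PR may solve it differently.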
…ingTie/UCSC demos

Remove volvox.sorted.gtf (bare exon/CDS, no spanning lines) from web/plugin/cli test data and its volvox demo track.

Add small real-format GTF excerpts of the same TP53 hg19 model in four tool conventions, as config_demo.json tracks: GENCODE and StringTie via GtfTabixAdapter (gene_name aggregation), AUGUSTUS via GtfTabixAdapter (gene_id; redispatches off its spanning gene/transcript lines), and UCSC genePredToGtf via the in-memory GtfAdapter (bare, so no tabix redispatch anchor; gene_id).

aggregateGtfFeatures now also drops childless transcripts, e.g. AUGUSTUS's bare `transcript` line whose 9th column has no parseable attributes. Repoint the plugin and CLI GTF tests at a real GENCODE excerpt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
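The aggregation behavior described in this commit (group transcripts under a gene by `gene_name` or `gene_id`, and drop transcripts with no children) can be sketched roughly as follows. This is an illustration only, not the PR's aggregateGtfFeatures: `TranscriptFeature`, `GeneFeature`, and `aggregateByAttribute` are hypothetical names, and falling back to the transcript's own id when the attribute is missing is an assumption.

```typescript
// Hypothetical sketch; not the PR's aggregateGtfFeatures helper.
interface TranscriptFeature {
  id: string
  start: number
  end: number
  attributes: Record<string, string>
  subfeatures: { type: string; start: number; end: number }[]
}

interface GeneFeature {
  id: string
  start: number
  end: number
  transcripts: TranscriptFeature[]
}

// Group transcripts into genes keyed on one attribute, e.g. 'gene_name'
// (GENCODE, StringTie) or 'gene_id' (AUGUSTUS, UCSC genePredToGtf).
function aggregateByAttribute(
  transcripts: TranscriptFeature[],
  key: string,
): GeneFeature[] {
  const genes = new Map<string, GeneFeature>()
  for (const transcript of transcripts) {
    // Drop childless transcripts, such as a bare `transcript` line whose
    // 9th column yielded no usable attributes.
    if (transcript.subfeatures.length === 0) {
      continue
    }
    // Assumption: fall back to the transcript id when the key is absent.
    const groupId = transcript.attributes[key] ?? transcript.id
    const gene = genes.get(groupId)
    if (gene) {
      gene.start = Math.min(gene.start, transcript.start)
      gene.end = Math.max(gene.end, transcript.end)
      gene.transcripts.push(transcript)
    } else {
      genes.set(groupId, {
        id: groupId,
        start: transcript.start,
        end: transcript.end,
        transcripts: [transcript],
      })
    }
  }
  return Array.from(genes.values())
}
```

Passing the key as a parameter mirrors the per-track choice in the commit message, where some demo tracks aggregate on `gene_name` and others on `gene_id`.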
GTF: replace the heavyweight gtf-nostream dependency with an inlined parseGtf (parse + transcript_id grouping). Synthesized transcripts no longer leak the first CDS child's reading frame. Add GtfTabixAdapter (redispatch off spanning transcript/gene lines), a shared aggregateGtfFeatures helper, consistent attribute quote stripping, and route .gtf.gz through the tabix adapter.
Also fixes a gff-nostream regression that was never released (i.e. it exists only on mainline).
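As a rough illustration of the `.gtf.gz` routing mentioned in the description, a file-name check along these lines would pick between the two adapters. The adapter type names come from the PR text; the function `guessGtfAdapterType` is hypothetical and does not reflect how JBrowse actually registers its adapter guessers.

```typescript
// Adapter type names are taken from the PR text; the function is hypothetical.
type GtfAdapterType = 'GtfTabixAdapter' | 'GtfAdapter'

// Pick an adapter from the file name: compressed GTF goes to the tabix
// adapter, plain GTF to the in-memory adapter.
function guessGtfAdapterType(fileName: string): GtfAdapterType | undefined {
  const lower = fileName.toLowerCase()
  if (lower.endsWith('.gtf.gz')) {
    return 'GtfTabixAdapter'
  }
  if (lower.endsWith('.gtf')) {
    return 'GtfAdapter'
  }
  return undefined // not a GTF file name
}
```

Tabix access also requires the file to be coordinate-sorted, bgzip-compressed, and accompanied by an index, which this name check does not verify.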