
Add GTF tabix adapter, inline GTF parser #5577

Open
cmdcolin wants to merge 2 commits into main from gff3_cds_fix_gtf_tabix

cmdcolin (Collaborator) commented Jun 1, 2026

GTF: replace the heavyweight gtf-nostream dependency with an inlined parseGtf (parse + transcript_id grouping). Synthesized transcripts no longer leak the first CDS child's reading frame. Add GtfTabixAdapter (redispatch off spanning transcript/gene lines), a shared aggregateGtfFeatures helper, consistent attribute quote stripping, and route .gtf.gz through the tabix adapter.

Also fixes an unreleased (i.e. main-branch-only) gff-nostream regression.
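For readers unfamiliar with GTF, here is a minimal sketch of what an inlined "parse + transcript_id grouping" step can look like. It assumes tab-separated 9-column lines and `key "value";` attributes with no `;` inside quoted values. `GtfRecord`, `parseAttributes`, `parseGtfLine`, and `groupByTranscript` are hypothetical names for illustration, not the PR's actual `parseGtf` API.

```typescript
// Hypothetical record shape for illustration; not the PR's actual types.
interface GtfRecord {
  seqName: string
  source: string
  type: string
  start: number // 1-based, inclusive, as written in the file
  end: number
  score: number | null
  strand: string
  frame: number | null // '.' becomes null
  attributes: Record<string, string[]>
}

// Parse column 9 (`key "value"; key value;`) into key -> values. Surrounding
// double quotes are stripped the same way whether or not a value was quoted.
// Assumes no ';' appears inside quoted values.
function parseAttributes(col9: string): Record<string, string[]> {
  const attrs: Record<string, string[]> = {}
  for (const part of col9.split(';')) {
    const trimmed = part.trim()
    const space = trimmed.indexOf(' ')
    if (space === -1) {
      continue // empty or key-only fragment: not a parseable pair
    }
    const key = trimmed.slice(0, space)
    const value = trimmed
      .slice(space + 1)
      .trim()
      .replace(/^"(.*)"$/, '$1')
    const existing = attrs[key]
    if (existing) {
      existing.push(value) // repeated keys such as `tag` accumulate
    } else {
      attrs[key] = [value]
    }
  }
  return attrs
}

// Parse one tab-separated line; comments, blank and short lines yield undefined.
function parseGtfLine(line: string): GtfRecord | undefined {
  const cols = line.replace(/\r$/, '').split('\t')
  if (line.startsWith('#') || cols.length < 9) {
    return undefined
  }
  const [seqName, source, type, start, end, score, strand, frame, col9] = cols
  return {
    seqName,
    source,
    type,
    start: Number(start),
    end: Number(end),
    score: score === '.' ? null : Number(score),
    strand,
    frame: frame === '.' ? null : Number(frame),
    attributes: parseAttributes(col9),
  }
}

// Group parsed records by transcript_id; lines without one (e.g. `gene`
// lines) are left out of the grouping.
function groupByTranscript(text: string): Map<string, GtfRecord[]> {
  const groups = new Map<string, GtfRecord[]>()
  for (const line of text.split('\n')) {
    const rec = parseGtfLine(line)
    const tid = rec?.attributes['transcript_id']?.[0]
    if (rec === undefined || tid === undefined) {
      continue
    }
    const list = groups.get(tid)
    if (list) {
      list.push(rec)
    } else {
      groups.set(tid, [rec])
    }
  }
  return groups
}
```

Keeping every attribute value as an array is a design choice in this sketch: GENCODE repeats keys such as `tag` on one line, and collapsing them to a single string would silently drop values.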

GFF3: bump to gff-nostream@3.0.10, which fixes a bug where a
multi-segment CDS whose segments share one ID lost every segment after
the first. Add regression fixtures and tests (plain + tabix) for this
discontinuous-feature idiom, which the existing Parent-only volvox data
never exercised.
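In GFF3, several lines carrying the same `ID` describe one discontinuous feature, typically a CDS split across exons. A minimal sketch of grouping that keeps every segment, assuming lines are already parsed and ignoring the Parent hierarchy; `Gff3Line`, `Gff3Feature`, and `mergeSegmentsById` are hypothetical names, not gff-nostream's API.

```typescript
// Hypothetical, minimal line/feature shapes for illustration only.
interface Gff3Line {
  type: string
  start: number
  end: number
  attributes: Record<string, string | undefined>
}

interface Gff3Feature {
  id: string | undefined
  type: string
  // One entry per line sharing this ID (the discontinuous-feature idiom).
  segments: { start: number; end: number }[]
}

// Merge lines that share an ID into one feature, keeping every segment.
function mergeSegmentsById(lines: Gff3Line[]): Gff3Feature[] {
  const byId = new Map<string, Gff3Feature>()
  const features: Gff3Feature[] = []
  for (const line of lines) {
    const segment = { start: line.start, end: line.end }
    const id = line.attributes['ID']
    const existing = id === undefined ? undefined : byId.get(id)
    if (existing) {
      // Append instead of skipping: later lines with the same ID are more
      // segments of the same feature, not duplicates.
      existing.segments.push(segment)
      continue
    }
    const feature: Gff3Feature = { id, type: line.type, segments: [segment] }
    if (id !== undefined) {
      byId.set(id, feature)
    }
    features.push(feature)
  }
  return features
}
```

A fixture that only uses `Parent` (no shared `ID`s) never reaches the `existing` branch, which is why such data cannot catch this class of bug.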

GTF: replace the heavyweight gtf-nostream dependency with an inlined
parseGtf (parse + transcript_id grouping). Synthesized transcripts no
longer leak the first CDS child's reading frame. Add GtfTabixAdapter
(redispatch off spanning transcript/gene lines), a shared
aggregateGtfFeatures helper, consistent attribute quote stripping, and
route .gtf.gz through the tabix adapter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
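To illustrate the reading-frame leak: when a GTF has only exon/CDS rows for a transcript_id, a parent transcript must be synthesized from its children. A minimal sketch, assuming the parent's extent is the min/max of its children; `GtfChild`, `SynthesizedTranscript`, and `synthesizeTranscript` are hypothetical names, not the PR's code.

```typescript
// Hypothetical child-row shape; `frame` only has meaning on CDS rows.
interface GtfChild {
  seqName: string
  type: string
  start: number
  end: number
  strand: string
  frame: number | null
}

interface SynthesizedTranscript {
  type: 'transcript'
  seqName: string
  start: number
  end: number
  strand: string
  frame: null
  children: GtfChild[]
}

// Build a parent transcript spanning all children of one transcript_id.
function synthesizeTranscript(children: GtfChild[]): SynthesizedTranscript | undefined {
  if (children.length === 0) {
    return undefined
  }
  const first = children[0]
  let start = first.start
  let end = first.end
  for (const child of children) {
    start = Math.min(start, child.start)
    end = Math.max(end, child.end)
  }
  return {
    type: 'transcript',
    seqName: first.seqName,
    start,
    end,
    strand: first.strand,
    // Not copied from `first`: a reading frame describes a single CDS row,
    // so the synthesized parent carries none even when `first` is a CDS.
    frame: null,
    children,
  }
}
```

Building the parent field by field, rather than spreading the first child's fields, is what keeps row-specific values like `frame` from leaking upward.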
cmdcolin force-pushed the gff3_cds_fix_gtf_tabix branch from 3e3296c to e1be22d on June 1, 2026 21:03
…ingTie/UCSC demos

Remove volvox.sorted.gtf (bare exon/CDS, no spanning lines) from web/plugin/cli
test data and its volvox demo track. Add small real-format GTF excerpts of the
same TP53 hg19 model in four tool conventions, as config_demo.json tracks:
GENCODE and StringTie via GtfTabixAdapter (gene_name aggregation), AUGUSTUS via
GtfTabixAdapter (gene_id; redispatches off its spanning gene/transcript lines),
and UCSC genePredToGtf via the in-memory GtfAdapter (gene_id; its output is bare,
so there is no spanning line to anchor a tabix redispatch).
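The following sketch shows why a spanning gene/transcript line matters for a tabix-backed adapter. A region query returns only rows that overlap the window, so exons elsewhere in the same model are missed unless the query is widened to the extent of any spanning line it hit. This is one reading of "redispatch", under stated assumptions; `Row`, `FetchRows`, `fetchWithRedispatch`, and `makeFetcher` are hypothetical, and `makeFetcher` is an in-memory stand-in for a real tabix lookup.

```typescript
// Hypothetical row shape; coordinates are 1-based and inclusive.
interface Row {
  type: string
  start: number
  end: number
}

// Stand-in for a tabix region query: returns rows overlapping [start, end].
type FetchRows = (start: number, end: number) => Row[]

// Row types assumed to span a whole gene model.
const SPANNING_TYPES = new Set(['gene', 'transcript'])

function fetchWithRedispatch(fetchRows: FetchRows, start: number, end: number): Row[] {
  const firstPass = fetchRows(start, end)
  let wideStart = start
  let wideEnd = end
  for (const row of firstPass) {
    if (SPANNING_TYPES.has(row.type)) {
      wideStart = Math.min(wideStart, row.start)
      wideEnd = Math.max(wideEnd, row.end)
    }
  }
  // Re-query only if a spanning line reaches past the requested window, so
  // exons outside the window but inside the model are fetched too.
  return wideStart < start || wideEnd > end
    ? fetchRows(wideStart, wideEnd)
    : firstPass
}

// In-memory stand-in for a tabix-indexed file, for demonstration only.
function makeFetcher(rows: Row[]): FetchRows {
  return (start, end) => rows.filter(r => r.start <= end && r.end >= start)
}
```

With bare exon/CDS rows, as in the UCSC output, the first pass contains no spanning row and nothing triggers the wider query, so a window between exons returns a partial model. Loading such files in memory avoids that. The sketch widens only once; a model-overlapping neighbor that extends further would need another pass.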

aggregateGtfFeatures now also drops childless transcripts, e.g. AUGUSTUS's bare
`transcript` line whose 9th column has no parseable attributes.
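A minimal sketch of aggregation with childless transcripts dropped, assuming transcripts have already been grouped with their exon/CDS children. `Transcript`, `GeneGroup`, and `aggregateTranscripts` are hypothetical names, not the PR's `aggregateGtfFeatures` signature. Falling back to the transcript's own id when the aggregation field is missing is also an assumption of this sketch.

```typescript
// Hypothetical transcript shape for illustration.
interface Transcript {
  id: string
  attributes: Record<string, string | undefined>
  children: { type: string; start: number; end: number }[]
}

interface GeneGroup {
  key: string
  transcripts: Transcript[]
}

// Group transcripts under a gene key taken from `aggregateField`
// (e.g. 'gene_name' or 'gene_id'), dropping transcripts with no children.
function aggregateTranscripts(
  transcripts: Transcript[],
  aggregateField: string,
): GeneGroup[] {
  const groups = new Map<string, GeneGroup>()
  for (const t of transcripts) {
    if (t.children.length === 0) {
      continue // e.g. a bare `transcript` line that no exon/CDS attached to
    }
    // Assumption: fall back to the transcript's own id when the field is absent.
    const key = t.attributes[aggregateField] ?? t.id
    let group = groups.get(key)
    if (!group) {
      group = { key, transcripts: [] }
      groups.set(key, group)
    }
    group.transcripts.push(t)
  }
  return Array.from(groups.values())
}
```

Making the aggregation field a parameter fits the demos above: GENCODE and StringTie group by gene_name, AUGUSTUS and UCSC by gene_id.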

Repoint the plugin and CLI GTF tests at a real GENCODE excerpt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cmdcolin force-pushed the gff3_cds_fix_gtf_tabix branch from e1be22d to 0350c14 on June 1, 2026 21:09