TxIdentityInfo fails to handle non-coding (NR_) transcripts due to non-nullable CDS fields #255

@nh13

Summary

Normalizing variants on non-coding RNA transcripts (NR_ accessions) fails with "problem accessing data" because TxIdentityInfo defines cds_start_i and cds_end_i as i32 instead of Option<i32>. Non-coding transcripts have NULL values for these fields in UTA, causing the database row conversion to fail.

Steps to Reproduce

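use hgvs::parser::HgvsVariant;
use std::str::FromStr;

// Assumes `normalizer` is an already-constructed normalizer backed by a UTA
// data provider, and that this code runs inside a function returning a `Result`.

// Parse and normalize an NR_ variant
let var = HgvsVariant::from_str("NR_072979.2:n.2217G>A")?;
let normalized = normalizer.normalize(&var)?;  // Fails with "problem accessing data"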

Expected Behavior

Normalization should succeed. The biocommons/hgvs Python library handles NR_ transcripts correctly.

Actual Behavior

Fails with error: Normalization error: problem accessing data

Root Cause

In src/data/interface.rs, lines 134-145:

pub struct TxIdentityInfo {
    pub tx_ac: String,
    pub alt_ac: String,
    pub alt_aln_method: String,
    pub cds_start_i: i32,      // Should be Option<i32>
    pub cds_end_i: i32,        // Should be Option<i32>
    pub lengths: Vec<i32>,
    pub hgnc: String,
    pub translation_table: TranslationTable,
}

Compare with TxInfoRecord (lines 155-163), which correctly uses Option<i32>:

pub struct TxInfoRecord {
    pub hgnc: String,
    pub cds_start_i: Option<i32>,  // Correctly nullable
    pub cds_end_i: Option<i32>,    // Correctly nullable
    // ...
}

The UTA database returns NULL for cds_start_i and cds_end_i for non-coding transcripts:

SELECT tx_ac, cds_start_i, cds_end_i FROM uta_20210129b.tx_def_summary_v WHERE tx_ac = 'NR_072979.2';
--     tx_ac    | cds_start_i | cds_end_i 
-- -------------+-------------+-----------
--  NR_072979.2 |             |           
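To illustrate the failure mode without a database, here is a minimal sketch. `RawRow`, `StrictCds`, and `NullableCds` are hypothetical stand-ins (not types from hgvs-rs), and `None` models a SQL NULL; the real conversion lives in the crate's `TryFrom<Row>` implementation and may differ in detail.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a database row; `None` models SQL NULL.
pub struct RawRow {
    pub tx_ac: String,
    pub cds_start_i: Option<i32>,
    pub cds_end_i: Option<i32>,
}

/// Mirrors the current, non-nullable field types.
#[derive(Debug, PartialEq)]
pub struct StrictCds {
    pub cds_start_i: i32,
    pub cds_end_i: i32,
}

impl TryFrom<&RawRow> for StrictCds {
    type Error = String;

    /// A plain `i32` cannot represent NULL, so the conversion has to fail.
    fn try_from(row: &RawRow) -> Result<Self, Self::Error> {
        let cds_start_i = row
            .cds_start_i
            .ok_or_else(|| format!("{}: cds_start_i is NULL", row.tx_ac))?;
        let cds_end_i = row
            .cds_end_i
            .ok_or_else(|| format!("{}: cds_end_i is NULL", row.tx_ac))?;
        Ok(StrictCds { cds_start_i, cds_end_i })
    }
}

/// Mirrors the proposed, nullable field types.
#[derive(Debug, PartialEq)]
pub struct NullableCds {
    pub cds_start_i: Option<i32>,
    pub cds_end_i: Option<i32>,
}

impl From<&RawRow> for NullableCds {
    /// NULL maps to `None`, so the conversion cannot fail on these columns.
    fn from(row: &RawRow) -> Self {
        NullableCds {
            cds_start_i: row.cds_start_i,
            cds_end_i: row.cds_end_i,
        }
    }
}
```

With a row shaped like the query result above (both CDS columns NULL), the strict conversion returns an error while the nullable one succeeds.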

Suggested Fix

  1. Change TxIdentityInfo to use Option<i32> for cds_start_i and cds_end_i
  2. Update the TryFrom<Row> implementation in src/data/uta.rs
  3. Update any code that uses these fields to handle None
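The steps above could look roughly like the following sketch. It is a simplified stand-in, not the crate's actual code: `translation_table` is omitted to keep it self-contained, and `cds_bounds` and `describe` are hypothetical names introduced only to show how calling code might handle `None`.

```rust
/// Simplified stand-in for `TxIdentityInfo` after the proposed change.
/// `translation_table` is omitted to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub struct TxIdentityInfo {
    pub tx_ac: String,
    pub alt_ac: String,
    pub alt_aln_method: String,
    pub cds_start_i: Option<i32>, // None for non-coding transcripts
    pub cds_end_i: Option<i32>,   // None for non-coding transcripts
    pub lengths: Vec<i32>,
    pub hgnc: String,
}

impl TxIdentityInfo {
    /// Hypothetical helper: CDS bounds only when both ends are present.
    pub fn cds_bounds(&self) -> Option<(i32, i32)> {
        self.cds_start_i.zip(self.cds_end_i)
    }
}

/// Hypothetical consumer showing one way to branch on coding vs. non-coding.
pub fn describe(info: &TxIdentityInfo) -> String {
    match info.cds_bounds() {
        Some((start, end)) => format!("{}: coding, CDS [{}, {})", info.tx_ac, start, end),
        None => format!("{}: non-coding, no CDS", info.tx_ac),
    }
}
```

Returning both bounds together through one `Option` keeps callers from handling a half-populated CDS, which should not occur in UTA but would otherwise need its own branch.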

Workaround

As a temporary workaround, you can set dummy values in the database:

UPDATE uta_20210129b.transcript SET cds_start_i = 0, cds_end_i = 0 WHERE ac LIKE 'NR_%' AND cds_start_i IS NULL;
REFRESH MATERIALIZED VIEW uta_20210129b.tx_def_summary_mv;

Environment

  • hgvs-rs version: 0.19.1
  • UTA schema: uta_20210129b
