According to the VCFv4.2 specification:

"For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event..."
Although the specification notes that this padding base is not always required, all of the INDEL files from Sanger, as well as the files mentioned as testing data for MEA, follow this padding rule. This is visible in the identical first base of the REF and ALT columns:
(example is taken from 129S1_SvImJ.mgp.v5.indels.dbSNP142.normed.vcf.gz)
This means that the actual inserted/deleted sequences for the INDEL file mentioned above are TTTG, T, GCG, CC.
Running alea.jar on SNP files returns correct results, since it is enough to substitute REF with ALT, but INDEL processing adds extra base pairs to the output fasta file, which is not correct.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129S1_SvImJ
1 60 . A AG 92 PASS INDEL;DP4=0,0,5,0;DP=5;CSQ=A||||intergenic_variant|||||||| GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:19:5:0.2:134,19,0:120,15,0:2:32:5:0,0,5,0:0:-0.590765:.:1
1 110 . T TAAAAA 228 PASS INDEL;DP4=19,7,12,26;DP=64;CSQ=TTTTT||||intergenic_variant|||||||| GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:19:64:0:278,19,0:255,11,0:2:57:38:19,7,12,26:27:-0.693143:.:1
1 210 . AGGAT A 228 PASS INDEL;DP4=1,0,25,21;DP=47;CSQ=-||||intergenic_variant|||||||| GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:127:47:0:294,140,0:255,124,0:2:60:46:1,0,25,21:0:-0.693147:.:1
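To make the padding rule concrete for the three records above, here is a minimal Python sketch that strips the shared leading base and reports the event itself. The function name `strip_padding` is hypothetical, and the sketch assumes simple left-padded indels only; it does not handle the position-1 case, where the padding base follows the event.

```python
def strip_padding(ref: str, alt: str):
    """Classify a left-padded simple indel and return the event bases.

    Assumption: REF and ALT share one leading padding base, as in the
    records above. Hypothetical helper, not part of alea.jar.
    """
    if not ref or not alt or ref[0] != alt[0]:
        raise ValueError("expected REF and ALT to share a leading padding base")
    if len(ref) == 1 and len(alt) > 1:
        return ("insertion", alt[1:])   # bases inserted after the padding base
    if len(alt) == 1 and len(ref) > 1:
        return ("deletion", ref[1:])    # bases removed after the padding base
    raise ValueError("not a simple padded insertion or deletion")


# REF/ALT pairs copied from the three records above.
records = [("A", "AG"), ("T", "TAAAAA"), ("AGGAT", "A")]
events = [strip_padding(ref, alt) for ref, alt in records]
for (ref, alt), (kind, bases) in zip(records, events):
    print(f"{ref}>{alt}: {kind} of {bases}")
```

For these records the events are an insertion of G, an insertion of AAAAA, and a deletion of GGAT.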
Testing example (the records follow the VCFv4.2 padding rule quoted above):
Reference fasta file reference_genome.fa:
Indels file indels.vcf (without header): the three records listed above
Expected output:
Received output:
As we can see, INDEL processing was done exactly as if it were a SNP file (REF was completely substituted by ALT):
AG instead of G
TAAAAA instead of AAAAA
A instead of NULL
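The expected behaviour can be sketched in Python as follows. This is only an illustration under stated assumptions, not alea.jar's code: the function name `apply_padded_indels`, the toy sequence, and the toy variants are all hypothetical, variants are assumed to be non-overlapping and left-padded, and POS is 1-based as in VCF. The padding base is kept, and only the bases after it are inserted or removed.

```python
def apply_padded_indels(sequence, variants):
    """Apply left-padded (pos, ref, alt) variants to a sequence.

    Hypothetical helper: keeps the padding base at POS and changes only
    the bases that follow it. Assumes non-overlapping variants.
    """
    result = sequence
    # Work from the highest POS down so earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        if ref[0] != alt[0] or result[pos - 1] != ref[0]:
            raise ValueError(f"padding base mismatch at POS {pos}")
        after_padding = pos  # 0-based index of the base right after the padding base
        removed = ref[1:]    # empty for an insertion
        if result[after_padding:after_padding + len(removed)] != removed:
            raise ValueError(f"REF {ref!r} does not match sequence at POS {pos}")
        result = (result[:after_padding] + alt[1:]
                  + result[after_padding + len(removed):])
    return result


toy_reference = "ACGTACGT"                          # hypothetical sequence
toy_variants = [(2, "C", "CTT"), (5, "ACG", "A")]   # hypothetical records
patched = apply_padded_indels(toy_reference, toy_variants)
print(patched)
```

Applying variants from right to left avoids having to track how earlier insertions and deletions shift later coordinates.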